Working with Data 📦¶

Learn how to store and return data from your Jobs.

Returning Data from Functions¶

When your function returns data, specify what should be stored using the returns parameter:

from examples.common.functions import write_parameter
from runnable import PythonJob, metric, pickled

def main():
    job = PythonJob(
        function=write_parameter,
        returns=[
            pickled("df"),        # pandas DataFrame (complex object)
            "integer",            # JSON-serializable integer
            "floater",            # JSON-serializable float
            "stringer",           # JSON-serializable string
            "pydantic_param",     # Pydantic model (auto-handled)
            metric("score"),      # Metric for tracking
        ],
    )

    job.execute()
    return job

if __name__ == "__main__":
    main()

See complete runnable code

examples/11-jobs/passing_parameters_python.py

"""
The example below shows how to return parameters from a
Python job.

The function, write_parameter, returns
    - JSON serializable types
    - pydantic models
    - pandas dataframe, any "object" type

pydantic models are implicitly handled by runnable
but "object" types should be marked as "pickled".

Using pickled is advised even for native python data types
when the collections are reasonably large.

Run the example below as:
    python examples/11-jobs/passing_parameters_python.py

"""

from examples.common.functions import write_parameter
from runnable import PythonJob, metric, pickled


def main():
    job = PythonJob(
        function=write_parameter,
        returns=[
            pickled("df"),
            "integer",
            "floater",
            "stringer",
            "pydantic_param",
            metric("score"),
        ],
    )

    job.execute()

    return job


if __name__ == "__main__":
    main()

Try it now:

uv run examples/11-jobs/passing_parameters_python.py

What Gets Stored¶

For the example above, the job summary printed to the terminal looks like this:

{
    "Output parameters": [
        ("df", "Pickled object stored in catalog as: df"),
        ("integer", 1),
        ("floater", 3.14),
        ("stringer", "hello"),
        ("pydantic_param", {"x": 10, "foo": "bar"}),
        ("score", 0.9)
    ],
    "Metrics": [("score", 0.9)],
    "status": "SUCCESS"
}

Return Type Guide¶

| Type | Usage | Storage Location | Example |
|------|-------|------------------|---------|
| `pickled("name")` | Complex objects (DataFrames, models) | `.catalog/{run-id}/name.dill` | `pickled("model")` |
| `"name"` | JSON-serializable values (int, float, str, dict) | Job summary | `"count"` |
| `metric("name")` | Trackable metrics | Metrics section + job summary | `metric("accuracy")` |
| `"name"` (Pydantic model) | Pydantic models, handled automatically | Job summary as JSON | `"user_profile"` |
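
As a rough way to decide between a plain name and `pickled(...)`, you can check whether a value survives `json.dumps`. The sketch below uses only the standard library. `is_json_serializable` and `suggest_return_kind` are hypothetical helpers written for illustration and are not part of runnable. Note that this check would report a Pydantic model as not JSON-serializable even though, per the table above, runnable handles those automatically.

```python
import json


def is_json_serializable(value) -> bool:
    """Return True if the standard json module can encode the value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


def suggest_return_kind(value) -> str:
    """Hypothetical helper: suggest how to declare a value in `returns`."""
    # JSON-friendly values can be declared by plain name;
    # anything else is a candidate for pickled("name").
    return "plain name" if is_json_serializable(value) else "pickled"


if __name__ == "__main__":
    print(suggest_return_kind({"total_sales": 50000, "growth": 0.15}))
    print(suggest_return_kind({1, 2, 3}))  # sets are not JSON-serializable
```

A check like this is only a starting point: large but JSON-serializable collections may still be better off pickled.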

Practical Examples¶

Data Analysis Job¶

from runnable import PythonJob

def analyze_sales():
    # Your analysis logic here
    summary = {"total_sales": 50000, "growth": 0.15}
    return summary

# "summary" is a JSON-serializable dict, so a plain name is enough
job = PythonJob(
    function=analyze_sales,
    returns=["summary"],
)

Model Training Job¶

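from runnable import PythonJob, metric, pickled

def train_model():
    # Training logic here
    model = create_model()  # placeholder: replace with your own model-building code
    accuracy = 0.95
    return model, accuracy

# The model is a complex object, so it is pickled; accuracy is tracked as a metric
job = PythonJob(
    function=train_model,
    returns=[pickled("model"), metric("accuracy")],
)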
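
In the examples on this page, the entries in `returns` appear in the same order as the values the function returns, which suggests the names are paired with the values by position. The sketch below illustrates that pairing in plain Python. It is an assumption drawn from these examples, not runnable's actual implementation, and `pair_returns` is a hypothetical helper.

```python
def train_model():
    """Stand-in for the training function above, with a dict as the 'model'."""
    model = {"weights": [0.1, 0.2]}
    accuracy = 0.95
    return model, accuracy


def pair_returns(names, result):
    """Pair declared return names with returned values by position."""
    # A function that returns a single value does not return a tuple,
    # so wrap it to treat both cases uniformly.
    values = result if isinstance(result, tuple) else (result,)
    if len(values) != len(names):
        raise ValueError(
            f"expected {len(names)} return value(s), got {len(values)}"
        )
    return dict(zip(names, values))


if __name__ == "__main__":
    print(pair_returns(["model", "accuracy"], train_model()))
```

If the pairing is positional, reordering the names in `returns` without reordering the `return` statement would silently attach values to the wrong names, so it is worth keeping the two in sync.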

Report Generation Job¶

from runnable import PythonJob

def generate_report():
    # Report logic here
    report_path = "monthly_report.pdf"
    metrics = {"pages": 12, "charts": 5}
    return report_path, metrics

# Both values are JSON-serializable: a string path and a dict
job = PythonJob(
    function=generate_report,
    returns=["report_path", "metrics"],
)

Viewing Stored Data¶

After execution, check what was stored:

# List catalog contents (replace {run-id} with the ID of your run)
ls .catalog/{run-id}/

# Pickled objects are stored as .dill files; load them from Python to inspect them

# Simple values appear in the job summary printed to the terminal
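
To inspect a pickled object from Python, a minimal sketch could look like the following. It assumes the `.catalog/{run-id}/name.dill` layout from the Return Type Guide and that the files can be read with `dill.load`. `list_catalog` and `load_pickled` are hypothetical helpers, not part of runnable. The fallback to the standard `pickle` module is there only so the sketch runs without `dill` installed; it is not guaranteed to read every `.dill` file.

```python
from pathlib import Path

try:
    import dill as serializer  # assumed from the ".dill" file extension
except ImportError:
    import pickle as serializer  # fallback so the sketch runs without dill


def list_catalog(run_id: str, catalog_root: str = ".catalog") -> list[str]:
    """Return the sorted names of the files stored for one run."""
    run_dir = Path(catalog_root) / run_id
    return sorted(p.name for p in run_dir.iterdir() if p.is_file())


def load_pickled(run_id: str, name: str, catalog_root: str = ".catalog"):
    """Load the object stored at <catalog_root>/<run_id>/<name>.dill."""
    path = Path(catalog_root) / run_id / f"{name}.dill"
    with path.open("rb") as handle:
        return serializer.load(handle)
```

For example, `load_pickled("my-run-id", "model")` would return the object declared as `pickled("model")`, where `"my-run-id"` stands in for your actual run ID. Unpickling can execute arbitrary code, so only load files you trust.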

Best Practices¶

✅ Always Specify Returns¶

# Good - explicit about what to keep
job = PythonJob(
    function=my_function,
    returns=["result", metric("score")]
)

❌ Don't Forget Returns¶

# Bad - function output will be lost
job = PythonJob(function=my_function)  # No returns specified!

✅ Use Appropriate Types¶

returns=[
    pickled("large_dataframe"),  # For complex objects
    "simple_count",              # For basic values
    metric("accuracy"),          # For trackable metrics
]

What's Next?¶

You can now store Job outputs! Next steps:

- Parameters & Environment - Configure Jobs dynamically
- File Storage - Store files created during execution
- Job Types - Shell and Notebook Jobs

Ready to make your Jobs configurable? Continue to Parameters & Environment!