Working with Data 📦

Learn how to store and return data from your Jobs.

Returning Data from Functions

When your function returns data, specify what should be stored using the returns parameter:

from examples.common.functions import write_parameter
from runnable import PythonJob, metric, pickled

def main():
    job = PythonJob(
        function=write_parameter,
        returns=[
            pickled("df"),        # pandas DataFrame (complex object)
            "integer",            # JSON-serializable integer
            "floater",            # JSON-serializable float
            "stringer",           # JSON-serializable string
            "pydantic_param",     # Pydantic model (auto-handled)
            metric("score"),      # Metric for tracking
        ],
    )

    job.execute()
    return job

if __name__ == "__main__":
    main()
Complete runnable code: examples/11-jobs/passing_parameters_python.py
"""
The example below shows how to return parameters from a Python job.

The function, write_parameter, returns
    - JSON serializable types
    - pydantic models
    - pandas dataframe, any "object" type

pydantic models are implicitly handled by runnable
but "object" types should be marked as "pickled".

Using pickled is also advised for plain Python data types
when the collections are reasonably large.

Run the below example as:
    python examples/11-jobs/passing_parameters_python.py

"""

from examples.common.functions import write_parameter
from runnable import PythonJob, metric, pickled


def main():
    job = PythonJob(
        function=write_parameter,
        returns=[
            pickled("df"),
            "integer",
            "floater",
            "stringer",
            "pydantic_param",
            metric("score"),
        ],
    )

    job.execute()

    return job


if __name__ == "__main__":
    main()

Try it now:

uv run examples/11-jobs/passing_parameters_python.py
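
The examples on this page suggest that each name in returns is paired with the function's return value at the same position. The sketch below illustrates that pairing with the standard library only. It assumes positional matching and is not runnable's implementation; `name_returns` is a hypothetical helper, and the `train_model` stand-in returns a plain dict instead of a real model.

```python
from typing import Any, Sequence


def name_returns(names: Sequence[str], values: Any) -> dict[str, Any]:
    """Pair each declared return name with the value at the same position."""
    # In this sketch, a single non-tuple return is treated as a one-element tuple.
    if not isinstance(values, tuple):
        values = (values,)
    if len(names) != len(values):
        raise ValueError(
            f"expected {len(names)} return values, got {len(values)}"
        )
    return dict(zip(names, values))


def train_model():
    # Stand-in for a training function: returns (model, accuracy).
    model = {"weights": [0.1, 0.2]}
    accuracy = 0.95
    return model, accuracy


named = name_returns(["model", "accuracy"], train_model())
print(named)
```

If the assumption holds, reordering the names in returns without reordering the function's return statement would attach values to the wrong names, so it is worth keeping the two in the same order.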

What Gets Stored

{
    "Output parameters": [
        ("df", "Pickled object stored in catalog as: df"),
        ("integer", 1),
        ("floater", 3.14),
        ("stringer", "hello"),
        ("pydantic_param", {"x": 10, "foo": "bar"}),
        ("score", 0.9)
    ],
    "Metrics": [("score", 0.9)],
    "status": "SUCCESS"
}

Return Type Guide

| Type | Usage | Storage Location | Example |
|---|---|---|---|
| pickled("name") | Complex objects (DataFrames, models) | .catalog/{run-id}/name.dill | pickled("model") |
| "name" | JSON-serializable (int, float, str, dict) | Job summary | "count" |
| metric("name") | Trackable metrics | Metrics section + summary | metric("accuracy") |
| Pydantic models | Auto-handled objects | Job summary as JSON | "user_profile" |
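
When it is unclear whether a value belongs under a plain name or under pickled(...), one rough check is whether the standard library can serialize it to JSON. The helper below is a minimal sketch of that check, not part of runnable; `is_json_serializable` and the `Model` class are hypothetical names used only for illustration.

```python
import json
from typing import Any


def is_json_serializable(value: Any) -> bool:
    """Return True if json.dumps accepts the value as-is."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        # TypeError: unsupported type; ValueError: e.g. circular references.
        return False


class Model:
    """Stand-in for a trained model or other arbitrary object."""


candidates = {
    "count": 42,
    "summary": {"total_sales": 50000, "growth": 0.15},
    "model": Model(),
}

for name, value in candidates.items():
    kind = "plain name" if is_json_serializable(value) else "pickled(...)"
    print(f"{name}: {kind}")
```

Values that pass the check can still be worth pickling when they are large collections, as the docstring of the complete example advises.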

Practical Examples

Data Analysis Job

from runnable import PythonJob

def analyze_sales():
    # Your analysis logic here
    summary = {"total_sales": 50000, "growth": 0.15}
    return summary

job = PythonJob(
    function=analyze_sales,
    returns=["summary"],  # dict is JSON-serializable, so a plain name is enough
)

Model Training Job

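from runnable import PythonJob, metric, pickled

def train_model():
    # Training logic here; create_model() is a placeholder for your own code
    model = create_model()
    accuracy = 0.95
    return model, accuracy

job = PythonJob(
    function=train_model,
    returns=[pickled("model"), metric("accuracy")],
)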

Report Generation Job

from runnable import PythonJob

def generate_report():
    # Report logic here
    report_path = "monthly_report.pdf"
    metrics = {"pages": 12, "charts": 5}
    return report_path, metrics

job = PythonJob(
    function=generate_report,
    returns=["report_path", "metrics"],
)

Viewing Stored Data

After execution, check what was stored:

# List catalog contents
ls .catalog/{run-id}/

# View pickled objects (requires Python)
# Complex objects are in .dill files

# Simple values appear in job summary
# Check terminal output for JSON summary
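
The same inspection can be done from Python. The sketch below assumes the `.catalog/{run-id}/name.dill` layout from the Return Type Guide and that the third-party `dill` package is installed for loading; `list_catalog` and `load_pickled` are hypothetical helper names, not part of runnable.

```python
from pathlib import Path
from typing import Any


def list_catalog(run_id: str, catalog_root: str = ".catalog") -> list[str]:
    """Return the sorted file names stored for a run, or [] if none exist."""
    run_dir = Path(catalog_root) / run_id
    if not run_dir.is_dir():
        return []
    return sorted(p.name for p in run_dir.iterdir() if p.is_file())


def load_pickled(run_id: str, name: str, catalog_root: str = ".catalog") -> Any:
    """Load a pickled return value, assuming it was saved as <name>.dill."""
    # Imported lazily so that listing works even when dill is not installed.
    import dill

    path = Path(catalog_root) / run_id / f"{name}.dill"
    with path.open("rb") as handle:
        return dill.load(handle)
```

For example, `list_catalog("<your-run-id>")` would show which files exist, and `load_pickled("<your-run-id>", "df")` would then return the stored DataFrame, provided the libraries needed to unpickle it (such as pandas) are importable. Only load files from runs you trust, since unpickling can execute arbitrary code.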

Best Practices

Always Specify Returns

# Good - explicit about what to keep
job = PythonJob(
    function=my_function,
    returns=["result", metric("score")]
)

Don't Forget Returns

# Bad - function output will be lost
job = PythonJob(function=my_function)  # No returns specified!

Use Appropriate Types

returns=[
    pickled("large_dataframe"),  # For complex objects
    "simple_count",              # For basic values
    metric("accuracy"),          # For trackable metrics
]

What's Next?

You can now store Job outputs!

Ready to make your Jobs configurable? Continue to Parameters & Environment!