Working with Data 📦¶
Learn how to store and return data from your Jobs.
Returning Data from Functions¶
When your function returns data, specify what should be stored using the `returns` parameter:
```python
from examples.common.functions import write_parameter
from runnable import PythonJob, metric, pickled


def main():
    job = PythonJob(
        function=write_parameter,
        returns=[
            pickled("df"),  # pandas DataFrame (complex object)
            "integer",  # JSON-serializable integer
            "floater",  # JSON-serializable float
            "stringer",  # JSON-serializable string
            "pydantic_param",  # Pydantic model (auto-handled)
            metric("score"),  # metric for tracking
        ],
    )

    job.execute()
    return job


if __name__ == "__main__":
    main()
```
See the complete runnable code in `examples/11-jobs/passing_parameters_python.py`.
"""
The below example shows how to set/get parameters in python
tasks of the pipeline.
The function, set_parameter, returns
- JSON serializable types
- pydantic models
- pandas dataframe, any "object" type
pydantic models are implicitly handled by runnable
but "object" types should be marked as "pickled".
Use pickled even for python data types is advised for
reasonably large collections.
Run the below example as:
python examples/03-parameters/passing_parameters_python.py
"""
What Gets Stored¶
```
{
    "Output parameters": [
        ("df", "Pickled object stored in catalog as: df"),
        ("integer", 1),
        ("floater", 3.14),
        ("stringer", "hello"),
        ("pydantic_param", {"x": 10, "foo": "bar"}),
        ("score", 0.9)
    ],
    "Metrics": [("score", 0.9)],
    "status": "SUCCESS"
}
```
Return Type Guide¶
| Type | Usage | Storage Location | Example |
|---|---|---|---|
| `pickled("name")` | Complex objects (DataFrames, models) | `.catalog/{run-id}/name.dill` | `pickled("model")` |
| `"name"` | JSON-serializable (int, float, str, dict) | Job summary | `"count"` |
| `metric("name")` | Trackable metrics | Metrics section + summary | `metric("accuracy")` |
| Pydantic models | Auto-handled objects | Job summary as JSON | `"user_profile"` |
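
Pydantic models need no special wrapper; runnable serializes them to JSON in the job summary. A minimal sketch (the `UserProfile` model and `build_profile` function here are hypothetical):

```python
from pydantic import BaseModel

from runnable import PythonJob


class UserProfile(BaseModel):  # hypothetical model for illustration
    name: str
    age: int


def build_profile():
    return UserProfile(name="Ada", age=36)


# The returned model is stored in the job summary as JSON
job = PythonJob(function=build_profile, returns=["user_profile"])
```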
Practical Examples¶
Data Analysis Job¶
```python
from runnable import PythonJob


def analyze_sales():
    # Your analysis logic here
    summary = {"total_sales": 50000, "growth": 0.15}
    return summary


job = PythonJob(
    function=analyze_sales,
    returns=["summary"],
)
```
Model Training Job¶
```python
from runnable import PythonJob, metric, pickled


def train_model():
    # Training logic here; create_model() is a placeholder for your own code
    model = create_model()
    accuracy = 0.95
    return model, accuracy


job = PythonJob(
    function=train_model,
    returns=[pickled("model"), metric("accuracy")],
)
```
Report Generation Job¶
```python
from runnable import PythonJob


def generate_report():
    # Report logic here
    report_path = "monthly_report.pdf"
    metrics = {"pages": 12, "charts": 5}
    return report_path, metrics


job = PythonJob(
    function=generate_report,
    returns=["report_path", "metrics"],
)
```
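
Each of these jobs runs the same way as the first example:

```python
job.execute()  # runs the function and stores the declared returns
```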
Viewing Stored Data¶
After execution, check what was stored:
```bash
# List catalog contents
ls .catalog/{run-id}/

# View pickled objects (requires Python)
# Complex objects are in .dill files
# Simple values appear in the job summary
# Check the terminal output for the JSON summary
```
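
To inspect a pickled object, you can load its `.dill` file back in Python. A minimal sketch, assuming the `df` return from the first example and a hypothetical run id of `my-run-id` (the `.dill` extension above suggests the file is a dill pickle; treat that as an assumption):

```python
import dill  # assumption: .dill files are plain dill pickles

# "my-run-id" is a placeholder; substitute your actual run id
with open(".catalog/my-run-id/df.dill", "rb") as f:
    df = dill.load(f)

print(df.head())
```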
Best Practices¶
✅ Always Specify Returns¶
```python
# Good - explicit about what to keep
job = PythonJob(
    function=my_function,
    returns=["result", metric("score")],
)
```
❌ Don't Forget Returns¶
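If you omit `returns`, anything the function returns is not stored. A sketch of the anti-pattern:

```python
# Bad - return values are discarded because no returns are declared
job = PythonJob(
    function=my_function,
)
```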
✅ Use Appropriate Types¶
```python
returns=[
    pickled("large_dataframe"),  # for complex objects
    "simple_count",              # for basic values
    metric("accuracy"),          # for trackable metrics
]
```
What's Next?¶
You can now store Job outputs! Next steps:
- Parameters & Environment - Configure Jobs dynamically
- File Storage - Store files created during execution
- Job Types - Shell and Notebook Jobs
Ready to make your Jobs configurable? Continue to Parameters & Environment!