# 🎯 Jobs vs Pipelines: When to Use Which?
Both jobs and pipelines run your functions. The difference is intent.
🎯 Jobs: "Run this once"¶
Perfect for standalone tasks:
```python
from runnable import PythonJob


def analyze_sales_data():
    # Load data, run analysis, generate report
    return "Analysis complete!"


def main():
    # Job: Just run it
    job = PythonJob(function=analyze_sales_data)
    job.execute()
    return job  # REQUIRED: Always return the job object


if __name__ == "__main__":
    main()
```
See complete runnable code
"""
You can execute this pipeline by:
python examples/01-tasks/python_tasks.py
The stdout of "Hello World!" would be captured as execution
log and stored in the catalog.
An example of the catalog structure:
.catalog
└── baked-heyrovsky-0602
└── hello.execution.log
2 directories, 1 file
The hello.execution.log has the captured stdout of "Hello World!".
"""
from examples.common.functions import hello
from runnable import PythonJob
def main():
job = PythonJob(function=hello)
job.execute()
return job
if __name__ == "__main__":
main()
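The imported `hello` helper isn't shown above; based on the docstring, it presumably just prints the greeting that ends up in `hello.execution.log`. A minimal sketch, assuming that behaviour:

```python
# Plausible stand-in for examples.common.functions.hello (an assumption, not
# the actual source): it only needs to print, since stdout is what the catalog captures.
def hello():
    print("Hello World!")
```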
Try it now: `python examples/01-tasks/python_tasks.py`
### When to use jobs
- One-off analysis: "Analyze this dataset"
- Testing functions: "Does my code work?"
- Standalone reports: "Generate monthly summary"
- Data exploration: "What's in this file?"
## 🔗 Pipelines: "This is step X of many"
Perfect for multi-step workflows:
```python
from runnable import Pipeline, PythonTask


def load_data():
    return {"users": 1000, "sales": 50000}


def clean_data(raw_data):
    return {"clean_users": raw_data["users"], "clean_sales": raw_data["sales"]}


def train_model(cleaned_data):
    return f"Model trained on {cleaned_data['clean_users']} users"


def main():
    # Pipeline: Chain them together
    pipeline = Pipeline(
        steps=[
            PythonTask(function=load_data, returns=["raw_data"]),
            PythonTask(function=clean_data, returns=["cleaned_data"]),
            PythonTask(function=train_model, returns=["model"]),
        ]
    )
    pipeline.execute()
    return pipeline  # REQUIRED: Always return the pipeline object


if __name__ == "__main__":
    main()
```
See complete runnable code
"""
The below example shows how to set/get parameters in python
tasks of the pipeline.
The function, set_parameter, returns
- JSON serializable types
- pydantic models
- pandas dataframe, any "object" type
pydantic models are implicitly handled by runnable
but "object" types should be marked as "pickled".
Use pickled even for python data types is advised for
reasonably large collections.
Run the below example as:
python examples/03-parameters/passing_parameters_python.py
"""
from examples.common.functions import read_parameter, write_parameter
from runnable import Pipeline, PythonTask, metric, pickled
def main():
write_parameters = PythonTask(
function=write_parameter,
returns=[
pickled("df"),
"integer",
"floater",
"stringer",
"pydantic_param",
metric("score"),
],
name="set_parameter",
)
read_parameters = PythonTask(
function=read_parameter,
terminate_with_success=True,
name="get_parameters",
)
pipeline = Pipeline(
steps=[write_parameters, read_parameters],
)
_ = pipeline.execute()
return pipeline
if __name__ == "__main__":
main()
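The helpers `write_parameter` and `read_parameter` are imported from `examples.common.functions` and not shown here. A minimal sketch of what they might look like, assuming multiple return values are bound, in order, to the names declared in `returns` and handed to downstream tasks by name (the pydantic model and dataframe contents below are illustrative):

```python
import pandas as pd
from pydantic import BaseModel


class EggsModel(BaseModel):
    # Hypothetical pydantic parameter used only for this sketch.
    ham: str


def write_parameter():
    # Values map, in order, to: pickled("df"), "integer", "floater",
    # "stringer", "pydantic_param", metric("score").
    df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
    return df, 1, 3.14, "hello", EggsModel(ham="eggs"), 0.9


def read_parameter(df, integer, floater, stringer, pydantic_param, score):
    # Downstream tasks receive earlier returns as named arguments.
    assert integer == 1
    assert stringer == "hello"
    print(df.shape, pydantic_param.ham, score)
```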
Try it now: `python examples/03-parameters/passing_parameters_python.py`
### When to use pipelines
- Multi-step workflows: "Load → Clean → Train → Deploy"
- Data pipelines: "Extract → Transform → Load"
- Reproducible processes: "Run the same steps every time"
- Complex dependencies: "Step 3 needs outputs from steps 1 and 2"
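For the "complex dependencies" case, a later step can consume outputs from several earlier steps. A minimal sketch (not from the docs), following the same by-name parameter convention as the pipeline example above:

```python
from runnable import Pipeline, PythonTask


def load_users():
    return {"count": 1000}


def load_sales():
    return {"total": 50000}


def build_report(users, sales):
    # Needs the outputs of both earlier steps, matched by return name.
    return f"{users['count']} users generated {sales['total']} in sales"


def main():
    pipeline = Pipeline(
        steps=[
            PythonTask(function=load_users, returns=["users"]),
            PythonTask(function=load_sales, returns=["sales"]),
            PythonTask(function=build_report, returns=["report"]),
        ]
    )
    pipeline.execute()
    return pipeline


if __name__ == "__main__":
    main()
```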
## 🔄 Same function, different contexts
Here's the same function used both ways:
As a job:
```python
from examples.common.functions import hello
from runnable import PythonJob


def main():
    job = PythonJob(function=hello)
    job.execute()
    return job  # REQUIRED: Always return the job object


if __name__ == "__main__":
    main()
```
As a pipeline task:
```python
from examples.common.functions import hello
from runnable import Pipeline, PythonTask


def main():
    task = PythonTask(function=hello, name="say_hello")
    pipeline = Pipeline(steps=[task])
    pipeline.execute()
    return pipeline  # REQUIRED: Always return the pipeline object


if __name__ == "__main__":
    main()
```
## Quick decision guide
| I want to... | Use |
|---|---|
| Test my function | Job |
| Run analysis once | Job |
| Generate a report | Job |
| Process data in multiple steps | Pipeline |
| Chain different functions | Pipeline |
| Run the same workflow repeatedly | Pipeline |
**You can always switch**
Start with a job to test your function, then move it into a pipeline when you're ready to build a workflow.
**Essential Pattern: Always Return Objects**

Both jobs and pipelines must be returned from your `main()` function. This pattern is critical for:

- 🔍 Execution Tracking: Runnable tracks run status, timing, and metadata through the returned object
- 📊 Result Access: The returned object contains execution results, logs, and run IDs
- 🔗 Integration: External tools and monitoring systems need the object for further processing
- 🐛 Debugging: Error details and execution context are accessible via the returned object
❌ Missing returns break functionality:

```python
def main():
    job = PythonJob(function=my_function)
    job.execute()
    # Missing return - loses execution tracking!


def main():
    pipeline = Pipeline(steps=[...])
    pipeline.execute()
    # Missing return - no access to results!
```
✅ Always use this pattern:
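```python
def main():
    job = PythonJob(function=my_function)
    job.execute()
    return job  # REQUIRED: Always return the job object


def main():
    pipeline = Pipeline(steps=[...])
    pipeline.execute()
    return pipeline  # REQUIRED: Always return the pipeline object
```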
**Custom Execution Models**
Need to run jobs beyond Python, Shell, and Notebooks? Create custom task types and executors for any infrastructure or execution model using Runnable's extensible plugin architecture.
## What's Next?
- Pipeline Parameters - Configure pipelines with parameters and custom run IDs
- Task Types - Different ways to define pipeline steps (Python, notebooks, shell scripts)
- Visualization - Visualize pipeline execution with interactive timelines