Define pipeline
Without any orchestrator, the simplest pipeline could be the below functions:
| def generate():
...
# write some files, data.csv
...
# return objects or simple python data types.
return x, y
def consume(x, y):
...
# read from data.csv
# do some computation with x and y
# Stich the functions together
# This is the driver pattern.
x, y = generate()
consume(x, y)
|
Runnable representation
The workflow in runnable
would be:
| from runnable import PythonTask, pickled, catalog, Pipeline
generate_task = PythonTask(name="generate", function=generate,
returns=[pickled("x"), y],
catalog=Catalog(put=["data.csv"])
consume_task = PythonTask(name="consume", function=consume,
catalog=Catalog(get=["data.csv"])
pipeline = Pipeline(steps=[generate_task, consume_task])
pipeline.execute()
|
runnable
wraps the functions generate
and consume
as tasks.
- Tasks can access and return parameters.
- Tasks can also share files between them using catalog.
- Tasks are stitched together as pipeline
- The execution environment is configured via # todo
Examples
All the concepts are accompanied by examples.