Tasks
Task nodes are the execution units of the pipeline.
They can be Python functions, notebooks, shell scripts, or stubs.
In the examples below, the highlighted lines are the relevant parts; the rest of the Python code (or YAML) defines and runs a pipeline that executes the Python function, notebook, shell script, or stub.
Python functions
Python functions can be used as tasks.
Example
Structuring
If you are using the Python SDK, it is best to keep application-specific functions in a separate module from the pipeline definition.
Dotted path
Assuming the below project structure:
- The command for the outer_function should be outer_functions.outer_function
- The command for inner_function should be module_inner.inner_functions.inner_function
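As a rough illustration of how a dotted path maps onto modules, the sketch below rebuilds a project layout in a temporary directory and resolves both commands with `importlib`. The layout (an `outer_functions.py` file next to a `module_inner` package) is inferred from the two dotted paths above, and `resolve_dotted_path` is a hypothetical helper, not runnable's own resolution logic.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve_dotted_path(dotted_path: str):
    """Hypothetical helper: split 'pkg.module.func' and import the function."""
    module_name, _, function_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


# Assumed layout, recreated in a temporary directory:
#   outer_functions.py               defines outer_function
#   module_inner/inner_functions.py  defines inner_function
project_root = Path(tempfile.mkdtemp())
(project_root / "outer_functions.py").write_text(
    "def outer_function():\n    return 'outer'\n"
)
(project_root / "module_inner").mkdir()
(project_root / "module_inner" / "__init__.py").write_text("")
(project_root / "module_inner" / "inner_functions.py").write_text(
    "def inner_function():\n    return 'inner'\n"
)

# Dotted paths only resolve if the project root is importable.
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

outer = resolve_dotted_path("outer_functions.outer_function")
inner = resolve_dotted_path("module_inner.inner_functions.inner_function")
```

The point of the sketch is that the dotted path is just the import path of the module followed by the function name, so it depends on which directory is treated as the project root.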
Notebook
Jupyter notebooks are supported as tasks. We internally use Ploomber engine for executing notebooks.
The output is saved to the same location as the input notebook, with _out appended to the notebook's name, and is also saved in the catalog for logging and ease of debugging.
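The naming rule can be sketched with `pathlib`. This is only an illustration: `output_notebook_path` is a hypothetical helper, and placing `_out` before the file extension is an assumption about how the postfix is applied.

```python
from pathlib import Path


def output_notebook_path(notebook: str) -> Path:
    """Hypothetical helper: same folder, with '_out' added to the notebook name."""
    source = Path(notebook)
    # Assumption: the postfix goes between the name and the .ipynb extension.
    return source.with_name(f"{source.stem}_out{source.suffix}")


result = output_notebook_path("notebooks/eda.ipynb")
```

Under that assumption, an input of `notebooks/eda.ipynb` would produce an executed copy at `notebooks/eda_out.ipynb`.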
Example
Shell
Python functions and Jupyter notebooks provide a rich interface to the Python ecosystem, while shell tasks provide an interface to non-Python executables.
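Conceptually, a shell task hands a command string to the system shell and treats the exit code as success or failure. The sketch below shows that idea with the standard library's `subprocess` module; `run_shell_task` is a hypothetical name, and this is not how runnable itself executes shell tasks.

```python
import subprocess


def run_shell_task(command: str) -> str:
    """Hypothetical helper: run a command in the shell and return its stdout."""
    completed = subprocess.run(
        command,
        shell=True,           # let the shell interpret the command string
        capture_output=True,  # collect stdout and stderr
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


output = run_shell_task("echo hello from shell")
```

Any executable reachable from the shell can be wrapped this way, which is what makes shell tasks a bridge to tools written in other languages.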
Example
Stub
Stub nodes in runnable are just like pass or ... in Python code.
A stub is a placeholder, useful when you want to debug or design your pipeline.
Stub nodes can take an arbitrary number of parameters and always succeed.
Example
Intuition
Designing a pipeline is similar to writing a modular program. Stub nodes are handy to create a placeholder for some step that will be implemented in the future.
During debugging, changing a node to a stub lets you focus on the actual bug without having to execute the additional steps.
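The idea can be sketched in plain Python without the library. Everything here is hypothetical (`stub`, `run_pipeline`, `load_data`, and the step names are illustrative, not runnable's API): a stub accepts any parameters and reports success, so unfinished or irrelevant steps can be swapped out while the rest of the pipeline still runs.

```python
def stub(**parameters):
    """Accept any parameters, do nothing, and always report success."""
    return True


def load_data(path):
    """A step that is already implemented."""
    return f"loaded {path}"


def run_pipeline(steps):
    """Run each (name, function, parameters) step in order and collect results."""
    results = {}
    for name, step, parameters in steps:
        results[name] = step(**parameters)
    return results


steps = [
    ("load", load_data, {"path": "data.csv"}),
    # Not implemented yet, or skipped while debugging "load":
    ("train", stub, {"epochs": 10, "learning_rate": 0.01}),
    ("report", stub, {}),
]

results = run_pipeline(steps)
```

Because the stubbed steps keep their place in the sequence, the pipeline's shape can be designed and exercised end to end before every step has a real implementation.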