Parameters

Parameters are data that can be passed from one task to another.
Concept¶
For example, in the below snippet, the parameters `x` and `y` are passed from `generate` to `consume`.
The data types of `x` and `y` can be:

- JSON serializable: int, string, float, list, dict, including pydantic models.
- Objects: any dill-friendly objects.
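The distinction can be illustrated with a plain-python sketch (this is an analogy, not runnable's own machinery): JSON-serializable values survive `json.dumps`, while other objects need binary serialization (runnable uses `dill`; the stdlib `pickle` stands in here):

```python
import json
import pickle

# JSON-serializable values: these can pass between tasks as plain JSON.
for value in [1, "hello", 3.14, [1, 2, 3], {"x": 10}]:
    json.dumps(value)  # succeeds for every value above

# A set is not JSON-serializable ...
try:
    json.dumps({1, 2, 3})
    raise AssertionError("unreachable: sets are not JSON-serializable")
except TypeError:
    pass  # json cannot handle a set

# ... but it is picklable, so it can travel as an object parameter.
blob = pickle.dumps({1, 2, 3})
assert pickle.loads(blob) == {1, 2, 3}
```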
Compatibility¶

The below table summarizes the input/output types of different task types. For example: notebooks can only take JSON-serializable parameters as input but can return json/pydantic/object parameters.

| Task | Input | Output |
|---|---|---|
| python | json, pydantic, object via function arguments | json, pydantic, object as returns |
| notebook | json via cell tagged with `parameters` | json, pydantic, object as returns |
| shell | json via environment variables | json environment variables as returns |
Project parameters¶

Project parameters can be defined using a `yaml` file. These parameters can then be overridden by tasks of the pipeline.

They can also be provided via environment variables prefixed with `RUNNABLE_PRM_`. Environment variables override `yaml` parameters.
Type casting

Annotating the arguments of a python function ensures the right data types of the arguments. It is advised to cast the parameters explicitly in notebook and shell tasks.
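Since notebook and shell tasks receive parameters without python type information, an explicit cast keeps downstream logic honest. A minimal, plain-python illustration:

```python
# Parameters arrive in notebooks/shell as strings (or untyped values),
# so cast explicitly before using them in arithmetic or comparisons.
integer = int("1")
floater = float("3.14")

assert isinstance(integer, int)
assert isinstance(floater, float)
assert integer + 1 == 2
```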
Deeply nested yaml objects are supported.

The yaml formatted parameters can also be defined as:

```shell
export RUNNABLE_PRM_integer="1"
export RUNNABLE_PRM_floater="3.14"
export RUNNABLE_PRM_stringer="hello"
export RUNNABLE_PRM_pydantic_param='{"x": 10, "foo": "bar"}'
export RUNNABLE_PRM_chunks="[1, 2, 3]"
```
Parameters defined by environment variables override parameters defined in `yaml`. This can be useful for quick experimentation without changing code.
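The override behaviour can be pictured with a small hypothetical sketch (an analogy, not runnable's actual resolution code): prefixed environment variables are parsed as JSON where possible and merged over the yaml values:

```python
import json
import os

# yaml-defined defaults (as if loaded from the parameters file)
yaml_params = {"integer": 1, "floater": 3.14}

# an override supplied through the environment
os.environ["RUNNABLE_PRM_integer"] = "2"

def env_overrides(prefix="RUNNABLE_PRM_"):
    """Collect parameters from prefixed environment variables."""
    overrides = {}
    for key, value in os.environ.items():
        if key.startswith(prefix):
            try:
                overrides[key[len(prefix):]] = json.loads(value)
            except json.JSONDecodeError:
                overrides[key[len(prefix):]] = value  # keep plain strings
    return overrides

# environment values win over yaml values
params = {**yaml_params, **env_overrides()}
print(params["integer"])  # → 2
```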
Accessing parameters¶

The functions have arguments that correspond to the project parameters. Without annotations, nested parameters are sent in as dictionaries.
"""
The below example showcases setting up known initial parameters for a pipeline
of only python tasks
The initial parameters as defined in the yaml file are:
simple: 1
complex_param:
x: 10
y: "hello world!!"
runnable allows using pydantic models for deeply nested parameters and
casts appropriately based on annotation. eg: read_initial_params_as_pydantic
If no annotation is provided, the parameter is assumed to be a dictionary.
eg: read_initial_params_as_json
You can set the initial parameters from environment variables as well.
eg: Any environment variable prefixed by "RUNNABLE_PRM_" will be picked up by runnable
Run this pipeline as:
python examples/03-parameters/static_parameters_python.py
"""
import os
from examples.common.functions import (
read_initial_params_as_json,
read_initial_params_as_pydantic,
)
from runnable import Pipeline, PythonTask
def main():
"""
Signature of read_initial_params_as_pydantic
def read_initial_params_as_pydantic(
integer: int,
floater: float,
stringer: str,
pydantic_param: ComplexParams,
envvar: str,
):
"""
read_params_as_pydantic = PythonTask(
function=read_initial_params_as_pydantic,
name="read_params_as_pydantic",
)
"""
Signature of read_initial_params_as_json
def read_initial_params_as_json(
integer: int,
floater: float,
stringer: str,
pydantic_param: Dict[str, Union[int, str]],
):
"""
read_params_as_json = PythonTask(
function=read_initial_params_as_json,
terminate_with_success=True,
name="read_params_json",
)
pipeline = Pipeline(
steps=[read_params_as_pydantic, read_params_as_json],
)
_ = pipeline.execute(parameters_file="examples/common/initial_parameters.yaml")
return pipeline
if __name__ == "__main__":
# Any parameter prefixed by "RUNNABLE_PRM_" will be picked up by runnable
os.environ["RUNNABLE_PRM_envvar"] = "from env"
main()
The notebook has a cell tagged with `parameters`, whose values are substituted at run time. The shell script has access to them as environment variables.
"""
The below example showcases setting up known initial parameters for a pipeline
of notebook and shell based commands.
The initial parameters as defined in the yaml file are:
integer: 1
floater : 3.14
stringer : hello
pydantic_param:
x: 10
foo: bar
runnable exposes the nested parameters as dictionary for notebook based tasks
and as a json string for the shell based tasks.
You can set the initial parameters from environment variables as well.
eg: Any environment variable prefixed by "RUNNABLE_PRM_" will be picked up by runnable
Run this pipeline as:
python examples/03-parameters/static_parameters_non_python.py
"""
from runnable import NotebookTask, Pipeline, ShellTask
def main():
read_params_in_notebook = NotebookTask(
name="read_params_in_notebook",
notebook="examples/common/read_parameters.ipynb",
)
shell_command = """
if [ "$integer" = 1 ] \
&& [ "$floater" = 3.14 ] \
&& [ "$stringer" = "hello" ] \
&& [ "$pydantic_param" = '{"x": 10, "foo": "bar"}' ]; then
echo "yaay"
exit 0;
else
echo "naay"
exit 1;
fi
"""
read_params_in_shell = ShellTask(
name="read_params_in_shell",
command=shell_command,
terminate_with_success=True,
)
pipeline = Pipeline(
steps=[read_params_in_notebook, read_params_in_shell],
)
_ = pipeline.execute(parameters_file="examples/common/initial_parameters.yaml")
return pipeline
if __name__ == "__main__":
main()
Access & returns¶

access¶

Accessing parameters returned by upstream tasks is similar to accessing project parameters.
returns¶

Tasks can return parameters which can then be accessed by downstream tasks.

The syntax is inspired by:

```python
def generate():
    ...
    return x, y

def consume(x, y):
    ...

x, y = generate()  # returns x and y as output
consume(x, y)  # consumes x, y as input arguments.
```

and is implemented in `runnable` by declaring the names of the returned values in a task's `returns` and consuming them as function arguments downstream.
Order of returns

The order of `returns` should match the order in which the python function returns them.
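The positional pairing can be seen in plain python (a `returns` declaration would list the same names in this order; the orchestration itself is only sketched here):

```python
def generate():
    df = {"rows": 3}  # first return value
    score = 0.9       # second return value
    return df, score

# Downstream, the names must line up positionally with the returns:
df, score = generate()

# Swapping the order would silently bind the wrong values:
wrong_score, wrong_df = generate()
assert wrong_score == {"rows": 3}  # not a score at all
```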
Marking returns as `metric` or `object`¶

JSON-style parameters can be marked as a `metric` in python functions, notebooks, or shell tasks. Metric parameters can be accessed as normal parameters in downstream steps.

Returns marked as `pickled` in python functions and notebooks are serialized using `dill`.
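What `pickled` implies can be sketched with the stdlib `pickle` as a stand-in for `dill` (an analogy only): the object is serialized to bytes when returned and restored when the downstream task consumes it:

```python
import pickle

class Model:
    """A stand-in for any non-JSON object returned by a task."""
    def __init__(self, weights):
        self.weights = weights

# "return" side: serialize the object for transport
blob = pickle.dumps(Model([0.1, 0.2]))

# "consume" side: restore the object in the downstream task
restored = pickle.loads(blob)
print(restored.weights)  # → [0.1, 0.2]
```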
Example¶

```python
import pandas as pd

# Assuming a function returns a pandas dataframe and a score
def generate():
    ...
    return df, score

# Downstream step consuming the df and score
def consume(df: pd.DataFrame, score: float):
    ...
```
Complete Example¶

To access parameters, the notebook cell should be tagged with `parameters`. Only JSON-style parameters can be injected in.

Any python variable defined during the execution of the notebook that matches a name in `returns` is inferred as a parameter. The variable can be either a JSON type or an object.

Shell tasks can only access/return JSON-style parameters.
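A plain-python analog of a shell task's view of parameters (the injection here is simulated; runnable performs it for you): nested parameters arrive as a JSON string in an environment variable and must be parsed by hand:

```python
import json
import os

# runnable would set this from the pydantic_param project parameter;
# here we simulate the injection ourselves.
os.environ["pydantic_param"] = json.dumps({"x": 10, "foo": "bar"})

# Inside the shell task, the value is just a JSON string.
param = json.loads(os.environ["pydantic_param"])
print(param["x"], param["foo"])  # → 10 bar
```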