container environments
Pipeline definition¶
Executing pipelines in containers requires a yaml-based definition of the pipeline, which is referenced during task execution.
Any execution of a pipeline defined with the SDK generates the pipeline definition in yaml format for all executors except the local executor.
Follow the steps below to execute a pipeline defined with the SDK.
- Execute the pipeline by running the python script as you normally would, to generate the yaml-based definition.
- Optionally (but highly recommended), version your code using git.
- Build the docker image with the yaml-based definition file as part of the image. We recommend tagging the docker image with the short git sha to uniquely identify it (1).
- If the docker image name is not known in advance, define a variable in the pipeline definition to temporarily hold it.
- Execute the pipeline using the runnable CLI.
(1) Avoid using generic tags such as latest.
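The tagging and variable steps above can be sketched in Python. This is an illustrative sketch, not part of runnable: the repository name my-registry/my-pipeline and the identifier docker_image are hypothetical, and the runnable CLI call is left out because its arguments depend on your setup. The block only builds the docker command and the environment variable; it does not run them.

```python
import subprocess

VAR_PREFIX = "runnable_VAR_"  # prefix for runnable configuration variables


def short_git_sha() -> str:
    """Return the short sha of the current commit (needs a git checkout)."""
    result = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def image_reference(repository: str, tag: str) -> str:
    """Build `repository:tag`, rejecting the generic `latest` tag."""
    if not tag or tag == "latest":
        raise ValueError("use a unique tag such as the short git sha")
    return f"{repository}:{tag}"


def build_plan(repository: str, tag: str, identifier: str = "docker_image"):
    """Return the docker build command and the environment variable to export.

    `identifier` is a hypothetical variable name; it must match the
    $identifier placeholder used in your own configuration.
    """
    image = image_reference(repository, tag)
    command = ["docker", "build", "-t", image, "."]
    environment = {f"{VAR_PREFIX}{identifier}": image}
    return command, environment


# Example with a fixed tag; in practice pass short_git_sha() instead.
command, environment = build_plan("my-registry/my-pipeline", "3f2a9c1")
print(" ".join(command))
print(environment)
```

Keeping the command construction separate from its execution makes it easy to check the image name before anything is built or pushed.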
Dynamic name of the image¶
All containerized executors have a circular dependency problem.
- The docker image tag is only known after the creation of the image with the
yaml
based definition. - But the
yaml
based definition needs the docker image tag as part of the definition.
Warning
Not providing the required environment variable will raise an exception.
To resolve this, runnable supports variables in the configuration of executors, both global and in step overrides. Variables should follow the python template strings syntax and are replaced with the value of the environment variable named runnable_VAR_<identifier>.
Concretely, $identifier is replaced by the value of the environment variable runnable_VAR_<identifier>.
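The substitution rule can be illustrated with Python's standard string.Template. This is a minimal sketch of the behaviour described above, not runnable's actual implementation; the function name resolve_placeholders, the configuration key image, and the identifier docker_image are assumptions made for the example.

```python
import os
from string import Template

VAR_PREFIX = "runnable_VAR_"  # prefix named in the text above


def resolve_placeholders(text: str, environ=None) -> str:
    """Replace $identifier with the value of runnable_VAR_<identifier>."""
    environ = os.environ if environ is None else environ
    # Keep only the prefixed variables and strip the prefix,
    # so that runnable_VAR_docker_image becomes docker_image.
    values = {
        key[len(VAR_PREFIX):]: value
        for key, value in environ.items()
        if key.startswith(VAR_PREFIX)
    }
    try:
        return Template(text).substitute(values)
    except KeyError as missing:
        # A placeholder without a matching environment variable is an error.
        name = f"{VAR_PREFIX}{missing.args[0]}"
        raise KeyError(f"environment variable {name} is not set") from missing


# Hypothetical configuration line with a placeholder for the image name.
config_line = "image: $docker_image"
env = {"runnable_VAR_docker_image": "my-registry/my-pipeline:3f2a9c1"}
print(resolve_placeholders(config_line, env))
# prints: image: my-registry/my-pipeline:3f2a9c1
```

Passing an explicit mapping instead of os.environ keeps the example deterministic; calling resolve_placeholders(config_line) without the second argument reads the real environment.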
Dockerfile¶
runnable must be installed in the docker image and available on the PATH. An example Dockerfile is provided below.
non-native orchestration
Making runnable part of the docker image requires python to be present in the image as well. In that sense, runnable is technically a non-native container orchestration tool.
Facilitating native container orchestration, without runnable as part of the docker image, would result in a complicated specification of files, parameters, and experiment tracking, and would lose the value of native interfaces to these essential orchestration concepts.
With the improvements in the python packaging ecosystem, it should be possible to distribute runnable as a self-contained binary and so reduce the dependencies placed on the docker image.