Local Container Pipeline Execution

Execute pipelines using Docker containers with optional parallel processing - perfect for testing container-based deployments locally with environment isolation.

Installation Required

Container execution requires the optional Docker dependency:

pip install runnable[docker]

Container Setup Made Simple

Build a Docker image from your project root so the image bundles your code, dependencies, and environment:

docker build -t my-project:latest .
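
If your project does not have a Dockerfile yet, a minimal sketch might look like this (the base image and install command are assumptions; adapt them to how your project is packaged):

FROM python:3.11-slim

WORKDIR /app

# Copy the project in and install it. This assumes your project declares
# runnable as a dependency so tasks can execute inside the container.
COPY . .
RUN pip install .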

Getting Started

Basic Configuration

pipeline-executor:
  type: local-container
  config:
    docker_image: "my-project:latest"

Simple Example

from runnable import Pipeline, PythonTask

def hello_from_container():
    import platform
    print(f"Hello from container running: {platform.platform()}")
    return "success"

def main():
    task = PythonTask(
        function=hello_from_container,
        name="hello"
    )

    pipeline = Pipeline(steps=[task])
    pipeline.execute()

if __name__ == "__main__":
    main()

config.yaml:

pipeline-executor:
  type: local-container
  config:
    docker_image: "my-project:latest"

Run the pipeline:

RUNNABLE_CONFIGURATION_FILE=config.yaml uv run pipeline.py
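
Equivalently, you can pass the configuration file in code instead of via the environment variable (the parallel example below uses this form):

pipeline.execute(configuration_file="config.yaml")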

Container Isolation

Each task runs in a fresh container, giving you clean isolation between steps.
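
A quick way to see this: files written inside one step's container are gone by the next step. A minimal sketch (the scratch path and step names are illustrative):

from runnable import Pipeline, PythonTask

def write_scratch():
    # Writes to the container's local filesystem.
    with open("/tmp/scratch.txt", "w") as f:
        f.write("hello")

def check_scratch():
    import os
    # Runs in a fresh container, so the previous step's file is absent.
    print("scratch exists:", os.path.exists("/tmp/scratch.txt"))  # False

def main():
    write = PythonTask(function=write_scratch, name="write")
    check = PythonTask(function=check_scratch, name="check")
    pipeline = Pipeline(steps=[write, check])
    pipeline.execute()

if __name__ == "__main__":
    main()

To move files between steps deliberately, use the catalog rather than the container filesystem.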

Why Use Containers Locally?

Perfect for Production Testing

  • Environment reproduction: Test exactly what runs in production
  • Dependency isolation: Each step gets a clean container environment
  • Local validation: Catch container issues before cloud deployment
  • Multiple environments: Different containers for different pipeline steps

Execution Models

Sequential (Default):

  • 🔄 One step at a time: Tasks run sequentially for simplicity
  • 🐳 Container per step: Each task gets a fresh, isolated container
  • 💻 Local resources: Uses your machine's CPU/memory limits

Parallel (Optional):

  • ⚡ Parallel branches: parallel and map nodes can run simultaneously
  • 🐳 Multiple containers: Each branch gets its own container
  • 📋 Requires compatible run log store: Use chunked-fs for parallel writes

Parallel Execution

Enable parallel processing for container-based workflows:

from runnable import Pipeline, PythonTask, Parallel

def process_in_container(data_chunk: str = "demo"):
    import platform
    # data_chunk is resolved from the pipeline's parameter store when set;
    # the default keeps this example self-contained.
    print(f"Processing chunk {data_chunk} on {platform.platform()}")
    return f"processed_{data_chunk}"

def main():
    # Parallel branches that run in separate containers
    parallel_node = Parallel(
        name="container_parallel",
        branches={
            "process_a": Pipeline(steps=[PythonTask(function=process_in_container, name="task_a")]),
            "process_b": Pipeline(steps=[PythonTask(function=process_in_container, name="task_b")]),
            "process_c": Pipeline(steps=[PythonTask(function=process_in_container, name="task_c")]),
        }
    )

    pipeline = Pipeline(steps=[parallel_node])

    # Execute with parallel container support
    pipeline.execute(configuration_file="parallel_container.yaml")

    return pipeline

if __name__ == "__main__":
    main()

parallel_container.yaml:

pipeline-executor:
  type: local-container
  config:
    docker_image: "my-project:latest"
    enable_parallel: true

# Required for parallel execution
run-log-store:
  type: chunked-fs

catalog:
  type: file-system

Run with parallel containers:

# Build your image first
docker build -t my-project:latest .

# Execute the pipeline
uv run pipeline.py

Parallel Container Benefits

  • True isolation: Each parallel branch runs in its own container
  • Resource utilization: Uses multiple CPU cores simultaneously
  • Production testing: Test parallel behavior before deploying to Kubernetes

Advanced Usage

Dynamic Container Images

Runtime Image Selection

Use different images at runtime with environment variables:

pipeline-executor:
  type: local-container
  config:
    docker_image: $my_docker_image

# Set the image dynamically
export RUNNABLE_VAR_my_docker_image="my-project:v2.0"
RUNNABLE_CONFIGURATION_FILE=config.yaml uv run pipeline.py

Step-Specific Containers

Different steps can use different container images - useful when you need specialized environments for different parts of your pipeline.

How it works:

  1. Define multiple configurations in your config file using overrides
  2. Reference the override in your task using the overrides parameter
  3. Each task runs in its specified container environment

from runnable import Pipeline, ShellTask

def main():
    # Uses default Python container (from main config)
    step1 = ShellTask(
        name="python_analysis",
        command="python --version && python analyze.py"
    )

    # Uses specialized R container (from "r_override" configuration)
    step2 = ShellTask(
        name="r_modeling",
        command="Rscript model.R",
        overrides={"local-container": "r_override"}  # References config below
    )

    pipeline = Pipeline(steps=[step1, step2])
    pipeline.execute()

if __name__ == "__main__":
    main()

Understanding the Override

overrides={"local-container": "r_override"} means:

  • "local-container": The executor type we're overriding
  • "r_override": The name of the override configuration (defined in config.yaml)
  • Result: This task will use the R container instead of the default Python container

config.yaml:

pipeline-executor:
  type: local-container
  config:
    docker_image: "my-python:latest"  # Default for most steps
  overrides:
    r_override:
      docker_image: "my-r-env:latest"  # Specialized R environment

Debugging Failed Containers

Keep containers around for debugging:

pipeline-executor:
  type: local-container
  config:
    docker_image: "my-project:latest"
    auto_remove_container: false  # Keep containers after they exit for inspection

Then inspect the failed container:

# List containers to find the failed one
docker ps -a

# Check its logs
docker logs <container-id>

# Copy files out of the stopped container if needed
docker cp <container-id>:/path/in/container .

# A stopped container cannot be exec'd into; commit it and run the snapshot for a shell
docker commit <container-id> debug-snapshot
docker run -it --entrypoint /bin/bash debug-snapshot

Configuration Reference

pipeline-executor:
  type: local-container
  config:
    docker_image: "my-project:latest"  # Required: Docker image to use
    enable_parallel: false             # Enable parallel execution
    auto_remove_container: true        # Remove containers after execution
    environment:                       # Environment variables for containers
      VAR_NAME: "value"
    overrides:                        # Step-specific configurations
      alt_config:
        docker_image: "alternative:latest"
        auto_remove_container: false
        environment:
          SPECIAL_VAR: "special_value"
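
The values under environment are injected into each task's container at runtime. As a minimal sketch, a task can read one back (VAR_NAME matches the reference config above):

import os

from runnable import Pipeline, PythonTask

def read_config():
    # VAR_NAME is supplied by the executor's `environment` block.
    print("VAR_NAME =", os.environ.get("VAR_NAME"))

def main():
    pipeline = Pipeline(steps=[PythonTask(function=read_config, name="read_config")])
    pipeline.execute()

if __name__ == "__main__":
    main()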

When to Use Local Container

Choose Local Container When

  • Testing container-based deployments before going to cloud
  • Need environment isolation between pipeline steps
  • Want to replicate production container behavior locally
  • Different steps require different software environments

Use Regular Local Executor When

  • Simple development and experimentation
  • All steps use the same environment
  • Want fastest possible execution (no container overhead)

Upgrade to Cloud Executors When

  • Need true parallel execution (Argo)
  • Want distributed compute resources
  • Running production workloads