
🔄 Perfect Reproducibility Every Time

Tired of "it worked on my machine" problems? Runnable automatically captures everything needed to reproduce your workflows.

The old way (hope and pray)

import pandas as pd

def analyze_data():
    # Which version of pandas was this?
    # What were the input files?
    # Which git commit was this?
    df = pd.read_csv("data.csv")  # Hope it's the same data...
    return df.groupby('category').mean()  # Hope same pandas behavior...

The Runnable way (automatic tracking)

Every run captures:

from runnable import Pipeline, PythonTask

pipeline = Pipeline(steps=[
    PythonTask(function=analyze_data, name="analysis")
])
result = pipeline.execute()  # Everything is automatically tracked!

After running, you get:

  • 🆔 Unique run ID: clever-einstein-1234
  • 📝 Complete execution log: .run_log_store/clever-einstein-1234.json
  • 💾 All data artifacts: .catalog/clever-einstein-1234/
  • 🔍 Full metadata: Parameters, timings, code versions

What gets tracked automatically

Code & Environment:

{
  "code_identities": [{
    "code_identifier": "7079b8df5c4bf826d3baf6e3f5839ba6193d88dd",
    "code_identifier_type": "git",
    "code_identifier_url": "https://github.com/your-org/project.git"
  }]
}

Parameters & Data Flow:

{
  "input_parameters": {"threshold": 0.95},
  "output_parameters": {"accuracy": 0.87},
  "data_catalog": [{
    "name": "model.pkl",
    "data_hash": "8650858600ce25b35e978ecb162414d9"
  }]
}

Execution Context:

{
  "start_time": "2025-11-04 22:48:42.128195",
  "status": "SUCCESS",
  "pipeline_executor": {"service_name": "local"},
  "dag_hash": "d26e1287acb814e58c72a1c67914033eb0fb2e26"
}
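
Because the run log is plain JSON, any of these fields can be read back with the standard library. A minimal sketch, assuming the default local run log store and the hypothetical clever-einstein-1234 run ID from above; since the exact nesting of code_identities can vary, it searches the document recursively rather than assuming a fixed path:

import json

def find_key(node, key):
    # Recursively yield every value stored under `key` in a nested JSON document
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            else:
                yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)

with open(".run_log_store/clever-einstein-1234.json") as f:
    run_log = json.load(f)

# Print every recorded code identity, wherever it appears in the run log
for identities in find_key(run_log, "code_identities"):
    for entry in identities:
        print(entry["code_identifier_type"], entry["code_identifier"])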

Complete workflow example

from runnable import Pipeline, PythonTask, Catalog, pickled

def train_model(learning_rate: float = 0.01):
    model = train_ml_model(learning_rate)  # train_ml_model: placeholder for your own training code
    return {"model": model, "accuracy": 0.87}

def evaluate_model(model, test_data_path: str):
    accuracy = evaluate(model, test_data_path)  # evaluate: placeholder for your own evaluation code
    return {"final_accuracy": accuracy}

pipeline = Pipeline(steps=[
    PythonTask(
        function=train_model,
        returns=[pickled("model"), ("accuracy", "json")],
        catalog=Catalog(get=["train.csv"], put=["model.pkl"])
    ),
    PythonTask(
        function=evaluate_model,
        catalog=Catalog(get=["test.csv"])
    )
])

# Everything gets tracked automatically
result = pipeline.execute()
print(f"Run ID: {result.run_id}")  # clever-einstein-1234
See complete runnable code
examples/03-parameters/passing_parameters_python.py
"""
The example below shows how to set and get parameters in Python
tasks of the pipeline.

The function, write_parameter, returns
    - JSON serializable types
    - pydantic models
    - pandas dataframes, or any "object" type

Pydantic models are handled implicitly by runnable,
but "object" types should be marked as "pickled".

Using pickled even for native Python data types is advised for
reasonably large collections.

Run the example as:
    python examples/03-parameters/passing_parameters_python.py

"""

from examples.common.functions import read_parameter, write_parameter
from runnable import Pipeline, PythonTask, metric, pickled


def main():
    write_parameters = PythonTask(
        function=write_parameter,
        returns=[
            pickled("df"),
            "integer",
            "floater",
            "stringer",
            "pydantic_param",
            metric("score"),
        ],
        name="set_parameter",
    )

    read_parameters = PythonTask(
        function=read_parameter,
        terminate_with_success=True,
        name="get_parameters",
    )

    pipeline = Pipeline(
        steps=[write_parameters, read_parameters],
    )

    _ = pipeline.execute()

    return pipeline


if __name__ == "__main__":
    main()

Try it now:

uv run examples/03-parameters/passing_parameters_python.py

Exploring your run history

Find your run:

ls .run_log_store/
# clever-einstein-1234.json
# nervous-tesla-5678.json

Examine what happened:

import json
with open('.run_log_store/clever-einstein-1234.json') as f:
    run_log = json.load(f)

print(f"Status: {run_log['status']}")
print(f"Final accuracy: {run_log['parameters']['final_accuracy']}")
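
The same idea extends to comparing results across runs. A minimal sketch, assuming each run log in .run_log_store exposes the top-level status and parameters keys used above:

import json
from pathlib import Path

# Summarize every run recorded in the local run log store
for path in sorted(Path(".run_log_store").glob("*.json")):
    run_log = json.loads(path.read_text())
    print(path.stem, run_log.get("status"), run_log.get("parameters", {}))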

Access the data:

ls .catalog/clever-einstein-1234/
# model.pkl
# train.csv
# test.csv
# step1_execution.log
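
Because the catalog is an ordinary directory, artifacts can be pulled back into an interactive session for inspection. A minimal sketch, assuming model.pkl was serialized with the standard pickle module and train.csv is a regular CSV file:

import pickle
import pandas as pd

run_id = "clever-einstein-1234"  # hypothetical run ID from above

# Reload the pickled model produced by the training step
with open(f".catalog/{run_id}/model.pkl", "rb") as f:
    model = pickle.load(f)

# Reload the cataloged training data
train_df = pd.read_csv(f".catalog/{run_id}/train.csv")
print(type(model), train_df.shape)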

Real example: Catalog tracking

Let's see how file management gets tracked:

import os

import pandas as pd
from runnable import Pipeline, PythonTask, Catalog

def generate_data():
    # Create files that will be tracked
    os.makedirs("data_folder", exist_ok=True)
    df = pd.DataFrame({"category": ["a", "b"], "value": [1, 2]})
    df.to_csv("df.csv", index=False)
    with open("data_folder/data.txt", "w") as f:
        f.write("Important data")

def process_data():
    # Files are automatically available here
    df = pd.read_csv("df.csv")
    with open("data_folder/data.txt") as f:
        content = f.read()

pipeline = Pipeline(steps=[
    PythonTask(
        function=generate_data,
        catalog=Catalog(put=["df.csv", "data_folder/data.txt"]),
        name="generate"
    ),
    PythonTask(
        function=process_data,
        catalog=Catalog(get=["df.csv", "data_folder/data.txt"]),
        name="process"
    )
])
pipeline.execute()
See complete runnable code
examples/04-catalog/catalog_python.py
"""
You can execute this pipeline by:

    python examples/04-catalog/catalog_python.py
"""

from examples.common.functions import read_files, write_files
from runnable import Catalog, Pipeline, PythonTask, ShellTask


def main():
    write_catalog = Catalog(put=["df.csv", "data_folder/data.txt"])
    generate_data = PythonTask(
        name="generate_data_python",
        function=write_files,
        catalog=write_catalog,
    )

    delete_files_command = """
        rm df.csv || true && \
        rm data_folder/data.txt || true
    """
    # Delete the local files after generation; since this is the local
    # catalog, we delete them to demonstrate "get from catalog" below.
    delete_local_after_generate = ShellTask(
        name="delete_after_generate",
        command=delete_files_command,
    )

    read_catalog = Catalog(get=["df.csv", "data_folder/data.txt"])
    read_data_python = PythonTask(
        name="read_data_python",
        function=read_files,
        catalog=read_catalog,
        terminate_with_success=True,
    )

    pipeline = Pipeline(
        steps=[
            generate_data,
            delete_local_after_generate,
            read_data_python,
        ]
    )
    _ = pipeline.execute()

    return pipeline


if __name__ == "__main__":
    main()

Try it now:

uv run examples/04-catalog/catalog_python.py

What gets logged for each step

Step Summary:

{
  "Name": "generate_data",
  "Input catalog content": [],
  "Available parameters": [],
  "Output catalog content": ["df.csv", "data_folder/data.txt"],
  "Output parameters": [],
  "Metrics": [],
  "Code identities": ["git:7079b8df5c4bf826d3baf6e3f5839ba6193d88dd"],
  "status": "SUCCESS"
}

File Tracking:

{
  "data_catalog": [
    {
      "name": "df.csv",
      "data_hash": "8650858600ce25b35e978ecb162414d9",
      "catalog_relative_path": "run-id-123/df.csv",
      "stage": "put"
    }
  ]
}
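
The recorded data_hash makes it possible to check whether a cataloged file has changed since the run. A minimal sketch, assuming the hash is an MD5 digest of the file contents (the 32-character value above is consistent with MD5) and that the data_catalog entries sit at the top level of the run log; adjust the lookup to match your run log's actual structure:

import hashlib
import json

def md5_of(path):
    # Compute the MD5 digest of a file, reading it in chunks
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

with open(".run_log_store/run-id-123.json") as f:  # hypothetical run ID from above
    run_log = json.load(f)

for entry in run_log.get("data_catalog", []):
    actual = md5_of(f".catalog/{entry['catalog_relative_path']}")
    print(entry["name"], "unchanged" if actual == entry["data_hash"] else "CHANGED")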

Why this matters

Without automatic tracking:

  • ❌ "It worked last week" debugging sessions
  • ❌ Lost parameter combinations that worked
  • ❌ No way to reproduce important results
  • ❌ Manual documentation that gets stale

With Runnable's tracking:

  • ✅ Every run is completely reproducible
  • ✅ Compare results across different runs
  • ✅ Debug with full execution context
  • ✅ Zero-effort audit trails for compliance

Advanced: Custom run tracking

# Tag important runs
pipeline.execute(tag="production-candidate")

# Environment-specific tracking
pipeline.execute(config="configs/production.yaml")

Custom Run IDs via Environment

Control pipeline execution tracking with custom identifiers:

# Set custom run ID for tracking and debugging
export RUNNABLE_RUN_ID="experiment-learning-rate-001"
uv run ml_pipeline.py

# Daily ETL runs with dates
export RUNNABLE_RUN_ID="etl-daily-$(date +%Y-%m-%d)"
uv run data_pipeline.py

# Experiment tracking with git context
export RUNNABLE_RUN_ID="experiment-$(git branch --show-current)-v2"
uv run research_pipeline.py
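
The same variable can also be set from Python when the run ID is computed at runtime, for example inside a CI job. A minimal sketch, assuming the executor reads RUNNABLE_RUN_ID at the time pipeline.execute() is called:

import os
from datetime import date

# Derive a predictable run ID, e.g. for a daily ETL job
os.environ["RUNNABLE_RUN_ID"] = f"etl-daily-{date.today():%Y-%m-%d}"

# `pipeline` is a Pipeline object defined as in the examples above
result = pipeline.execute()
print(result.run_id)  # expected to match the custom ID if the variable is honored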

Benefits for reproducibility:

  • Predictable naming for experiment tracking
  • Easy identification in run history and logs
  • Integration with external systems and CI/CD
  • Consistent tracking across related pipeline executions

Run ID patterns

Runnable generates memorable run IDs automatically:

  • obnoxious-williams-2248 - From our catalog example
  • nervous-sinoussi-2248 - From our parameters example
  • clever-einstein-1234 - Hypothetical example

Each ID is unique and helps you easily reference specific runs in conversations and debugging.

Best practices

  • Let Runnable generate run IDs for exploration
  • Use tags for important experimental runs
  • Keep your git repo clean for reliable code tracking
  • Use the catalog for all data that flows between steps

Next: Learn how to deploy anywhere while keeping the same reproducibility guarantees.