Run Log Store Configuration

Run logs make reproducibility simple - they capture everything needed to understand, debug, and recreate your pipeline executions.

Why Run Logs Matter for Reproducibility

Complete Execution History

Every run is fully documented. Run logs capture the complete context of your pipeline execution:

  • 📊 Parameters and inputs: What data and settings were used
  • 🔄 Execution timeline: When each step ran and how long it took
  • 💾 Data lineage: Which data artifacts were created and consumed
  • 📝 Code snapshots: Git commits and code versions used
  • ❌ Failure details: Exact error messages and stack traces
  • 🏷️ Environment metadata: Configuration and infrastructure used

Reproducibility Made Easy

Run logs enable you to:

  • Debug production failures by recreating exact conditions locally
  • Compare experiments across different parameter sets or code versions
  • Audit model training with complete training history and data lineage
  • Resume failed pipelines from the exact point of failure

Available Run Log Stores

Store Type            | Environment                             | Best For
buffered              | In-memory only                          | Quick testing and development
file-system           | Any environment with mounted log_folder | Sequential execution, simple setup
chunked-fs            | Any environment with mounted log_folder | Parallel execution, universal choice
minio / chunked-minio | Object storage                          | Distributed systems without shared filesystem

buffered

Stores run logs in-memory only. No persistence - data is lost when execution completes.

In-Memory Only

  • No persistence: Run logs are lost after execution
  • Testing only: Not suitable for production or reproducibility
  • No parallel support: Race conditions occur with concurrent execution

Use case: Quick testing and debugging during development.

Configuration

run-log-store:
  type: buffered

file-system

Stores run logs as single JSON files in the filesystem - simple and reliable for sequential execution.

Works Everywhere with Mounted Storage

Runs in any environment where log_folder is accessible

  • 💾 Persistent storage: Run logs saved to mounted filesystem
  • 📁 Simple structure: One JSON file per pipeline run
  • 🔍 Easy debugging: Human-readable JSON format
  • 🏠 Local development: Direct filesystem access
  • 🐳 Containers: Works with volume mounts
  • ☸️ Kubernetes: Works with persistent volumes

Sequential Only

Not suitable for parallel execution - use chunked-fs for parallel workflows

Configuration

run-log-store:
  type: file-system
  config:
    log_folder: ".run_log_store"  # Optional: defaults to ".run_log_store"

Example

from runnable import PythonJob
from examples.common.functions import hello

def main():
    job = PythonJob(function=hello)
    job.execute()
    return job

if __name__ == "__main__":
    main()

config.yaml:

run-log-store:
  type: file-system
  config:
    log_folder: ".run_log_store"

Run with file-system logging:

RUNNABLE_CONFIGURATION_FILE=config.yaml uv run job.py

Result: Run log stored as .run_log_store/{run_id}.json with complete execution metadata for reproducibility.
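
Because the file-system store writes one human-readable JSON file per run, a run log can be inspected with the standard library alone. The sketch below assumes the `.run_log_store/{run_id}.json` layout described above. `load_run_log` and `summarize` are hypothetical helpers, not part of Runnable, and nothing is assumed about the JSON schema beyond the top level being an object.

```python
import json
from pathlib import Path


def load_run_log(run_id: str, log_folder: str = ".run_log_store") -> dict:
    """Read the single JSON file written for a run (hypothetical helper)."""
    path = Path(log_folder) / f"{run_id}.json"
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def summarize(run_log: dict) -> list[str]:
    """Return the sorted top-level keys; the schema itself is not assumed."""
    return sorted(run_log)
```

Calling `summarize(load_run_log("<run_id>"))` with a real run id is a quick way to see which sections a given run log contains before digging into any one of them.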

chunked-fs

Thread-safe run log store with parallel execution support. It works in any environment where log_folder is accessible and is the recommended choice for most use cases.

Works Everywhere with Mounted Storage

Runs in any environment where log_folder is accessible

  • ✅ Thread-safe: Supports parallel execution without race conditions
  • 🏠 Local development: Direct filesystem access
  • 🐳 Containers: Works with volume mounts (Docker, local-container executor)
  • ☸️ Kubernetes: Works with persistent volumes (Argo, k8s-job executor)
  • ⚡ Parallel execution: Set enable_parallel: true safely
  • 💾 Persistent: Full reproducibility with detailed execution history

Recommended Default

Use chunked-fs unless you have specific requirements - it provides parallel safety and works in all execution environments where the log_folder can be mounted.

Configuration

run-log-store:
  type: chunked-fs
  config:
    log_folder: ".run_log_store"  # Optional: defaults to ".run_log_store"

Example

from runnable import Pipeline, PythonTask, Parallel
from examples.common.functions import hello

def main():
    # Parallel execution safe with chunked-fs
    parallel_node = Parallel(
        name="parallel_tasks",
        branches={
            "task_a": PythonTask(function=hello, name="hello_a"),
            "task_b": PythonTask(function=hello, name="hello_b")
        }
    )

    pipeline = Pipeline(steps=[parallel_node])
    pipeline.execute()
    return pipeline

if __name__ == "__main__":
    main()

config.yaml:

run-log-store:
  type: chunked-fs

pipeline-executor:
  type: local
  config:
    enable_parallel: true  # Safe with chunked-fs

Run with chunked-fs logging:

RUNNABLE_CONFIGURATION_FILE=config.yaml uv run pipeline.py

Result: Run logs are stored as separate files in the .run_log_store/{run_id}/ directory:

  • RunLog.json - Pipeline metadata and configuration
  • StepLog-{step}-{timestamp}.json - Individual step execution details

This chunked structure enables thread-safe parallel writes while maintaining complete execution history for reproducibility.
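
The idea behind chunking can be illustrated without Runnable: when every step writes its own file, concurrent writers never perform a read-modify-write on a shared file. The following is a minimal sketch of that pattern, not Runnable's implementation. The helper names are hypothetical, and the file names omit the `{timestamp}` part of the real `StepLog-{step}-{timestamp}.json` naming for simplicity.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_step_chunk(run_dir: Path, step_name: str, payload: dict) -> Path:
    """Write one step to its own file so writers never share a path."""
    path = run_dir / f"StepLog-{step_name}.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


def read_all_chunks(run_dir: Path) -> dict:
    """Reassemble the run by reading every step chunk back."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(run_dir.glob("StepLog-*.json"))
    }


def demo(step_names: list[str]) -> dict:
    """Write one chunk per step from a thread pool, then read them all back."""
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "demo-run"
        run_dir.mkdir()
        with ThreadPoolExecutor(max_workers=4) as pool:
            futures = [
                pool.submit(write_step_chunk, run_dir, name, {"step": name})
                for name in step_names
            ]
            for future in futures:
                future.result()  # re-raise any exception from a worker
        return read_all_chunks(run_dir)
```

With a single-file layout, the same threads would each have to load, modify, and rewrite one JSON document, which is where lost updates come from.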

Object Storage (minio / chunked-minio)

For distributed systems and cloud deployments, use object storage-based run log stores:

minio

run-log-store:
  type: minio
  config:
    endpoint: "https://s3.amazonaws.com"
    access_key: "your-access-key"
    secret_key: "your-secret-key"
    bucket_name: "runnable-logs"

chunked-minio

run-log-store:
  type: chunked-minio
  config:
    endpoint: "https://s3.amazonaws.com"
    access_key: "your-access-key"
    secret_key: "your-secret-key"
    bucket_name: "runnable-logs"

Cloud Deployment

Use chunked-minio for distributed systems - it provides the same parallel execution safety as chunked-fs but with cloud storage scalability.

Choosing the Right Run Log Store

Decision Guide

For most users: Use chunked-fs - works in any environment with mounted storage and supports parallel execution

For development/testing: Use buffered for quick iterations where persistence isn't needed

Sequential workflows: Use file-system - works in any environment with mounted storage but only for sequential execution

Distributed systems without shared filesystem: Use chunked-minio when execution environments can't mount a shared log_folder
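
The decision guide above can be condensed into a small function. This is one possible reading of the guide, not an official selection rule: `choose_run_log_store` and its parameter names are hypothetical, and for sequential runs chunked-fs remains a valid choice even though the sketch returns file-system.

```python
def choose_run_log_store(
    needs_persistence: bool,
    parallel: bool,
    shared_log_folder: bool,
) -> str:
    """Map the three questions from the decision guide to a store type."""
    # buffered keeps logs in memory only and is not safe for concurrent execution.
    if not needs_persistence and not parallel:
        return "buffered"
    # Without a mountable log_folder, fall back to object storage.
    if not shared_log_folder:
        return "chunked-minio"
    # chunked-fs is the recommended default; file-system is the simpler
    # single-file option when execution is strictly sequential.
    return "chunked-fs" if parallel else "file-system"
```

For example, a parallel pipeline on Kubernetes with a persistent volume maps to `chunked-fs`, while the same pipeline without any shared volume maps to `chunked-minio`.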

Filesystem vs Object Storage

Filesystem stores (file-system, chunked-fs): Work in any execution environment where the log_folder can be mounted

  • ✅ Local development (direct filesystem access)
  • ✅ Docker containers (volume mounts)
  • ✅ Kubernetes (persistent volumes)
  • ✅ Any containerized environment with volume mounting

Object storage (minio, chunked-minio): Use when shared filesystem mounting isn't available

Remember: Run logs are your key to reproducibility - they capture everything needed to understand, debug, and recreate your pipeline executions.

Custom Run Log Stores

Need to integrate with your existing logging infrastructure? Build custom run log stores that send execution data anywhere using Runnable's extensible architecture.

Enterprise Integration

Integrate with your existing systems: Never be limited by built-in storage options

  • 📊 Enterprise logging: Send to Splunk, ELK Stack, Datadog, New Relic
  • 🏒 Corporate databases: Store in existing data warehouses, time-series databases
  • πŸ” Compliance systems: Meet audit and governance requirements
  • 🌐 Multi-region storage: Distribute logs across geographic regions

Building Custom Run Log Stores

Learn how to create production-ready custom run log stores:

📖 Custom Run Log Stores Development Guide

The guide provides:

  • Complete stubbed implementation for database and cloud storage integration
  • YAML to Pydantic configuration mapping with validation
  • Storage system patterns for SQL, NoSQL, and cloud storage
  • Performance optimization for high-volume deployments

Quick Example

Create a custom run log store in just 3 steps:

  1. Implement key methods by extending BaseRunLogStore
  2. Register via entry point in your pyproject.toml
  3. Configure via YAML for seamless integration

from runnable.datastore import BaseRunLogStore

class MyDatabaseRunLogStore(BaseRunLogStore):
    service_name: str = "my-database"

    def create_run_log(self, run_id: str, **kwargs):
        # Your database integration here
        pass
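
To give a feel for what the "database integration" body might contain, here is a standalone sketch backed by the standard library's `sqlite3`. It deliberately does not subclass `BaseRunLogStore`, because the full set of required methods is covered in the development guide rather than on this page. The class name, the `get_run_log` method, and the table layout are all assumptions made for illustration, and the keyword arguments are assumed to be JSON-serializable.

```python
import json
import sqlite3


class SqliteRunLogSketch:
    """Stand-in for a database-backed store; NOT a BaseRunLogStore subclass."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS run_logs "
            "(run_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def create_run_log(self, run_id: str, **kwargs) -> dict:
        """Insert a new row; a duplicate run_id raises sqlite3.IntegrityError."""
        record = {"run_id": run_id, **kwargs}
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT INTO run_logs (run_id, body) VALUES (?, ?)",
                (run_id, json.dumps(record)),
            )
        return record

    def get_run_log(self, run_id: str) -> dict:
        """Fetch a run log by id (hypothetical method name)."""
        row = self._conn.execute(
            "SELECT body FROM run_logs WHERE run_id = ?", (run_id,)
        ).fetchone()
        if row is None:
            raise KeyError(run_id)
        return json.loads(row[0])
```

In a real store, the same insert and lookup logic would live inside the methods that `BaseRunLogStore` requires, with connection settings supplied through the YAML configuration.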

Ready to build? See the development guide for implementation patterns and examples.