Skip to content

🆚 Runnable vs Kedro: Simplicity Wins

Both Runnable and Kedro solve pipeline orchestration, but with radically different philosophies. Here's a side-by-side comparison using a real ML workflow.

The Example: Existing ML Functions

Let's start with typical Python functions you might already have:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import xgboost as xgb
import joblib

def load_and_clean_data():
    """Your existing data loading function."""
    customers = pd.read_csv("s3://bucket/raw-data/customers.csv")
    transactions = pd.read_csv("s3://bucket/raw-data/transactions.csv")

    data = customers.merge(transactions, on="customer_id").dropna()
    X = data.drop(['target'], axis=1)
    y = data['target']

    X.to_csv("features.csv", index=False)
    y.to_csv("target.csv", index=False)
    return {"n_samples": len(X), "n_features": X.shape[1]}

def train_random_forest(n_samples, n_features, max_depth=10):
    """Your existing RF training function."""
    X = pd.read_csv("features.csv")
    y = pd.read_csv("target.csv").values.ravel()

    model = RandomForestClassifier(max_depth=max_depth, random_state=42)
    model.fit(X, y)
    joblib.dump(model, "rf_model.pkl")

    return {"model_type": "RandomForest", "accuracy": model.score(X, y)}

def train_xgboost(n_samples, n_features, max_depth=10):
    """Your existing XGBoost training function."""
    X = pd.read_csv("features.csv")
    y = pd.read_csv("target.csv").values.ravel()

    model = xgb.XGBClassifier(max_depth=max_depth, random_state=42)
    model.fit(X, y)
    joblib.dump(model, "xgb_model.pkl")

    return {"model_type": "XGBoost", "accuracy": model.score(X, y)}

def select_best_model(model_results):
    """Your existing model selection function."""
    best_model = max(model_results, key=lambda x: x['accuracy'])
    # Copy best model logic...
    return best_model

Goal: Create a pipeline that runs these functions with parallel model training.


Making It Work with Runnable

Work required: Add pipeline wrapper (functions stay unchanged)

from runnable import Pipeline, PythonTask, Parallel, Catalog

# Import your existing functions (no changes needed)
from your_ml_code import load_and_clean_data, train_random_forest, train_xgboost, select_best_model

def main():
    pipeline = Pipeline(steps=[
        PythonTask(function=load_and_clean_data, returns=["n_samples", "n_features"]),
        Parallel(branches={
            "rf": PythonTask(function=train_random_forest, returns=["rf_results"]).as_pipeline(),
            "xgb": PythonTask(function=train_xgboost, returns=["xgb_results"]).as_pipeline()
        }),
        PythonTask(function=select_best_model, returns=["best_model"])
    ])
    pipeline.execute()
    return pipeline  # Required for Runnable

if __name__ == "__main__":
    main()

That's it. Functions unchanged, single wrapper file.


Making It Work with Kedro

Work required: Project restructuring + configuration files

Required Project Structure

ml-kedro-project/
├── conf/base/
│   ├── catalog.yml          # Data source/destination definitions
│   ├── parameters.yml       # Pipeline parameters
│   └── logging.yml          # Logging configuration
├── src/ml_kedro_project/
│   ├── pipelines/
│   │   ├── data_engineering/
│   │   │   ├── nodes.py     # Data processing functions
│   │   │   └── pipeline.py  # Pipeline definition
│   │   └── data_science/
│   │       ├── nodes.py     # ML model functions
│   │       └── pipeline.py  # ML pipeline definition
│   └── pipeline_registry.py # Register all pipelines
└── pyproject.toml

Configuration Files Required

Data Catalog (conf/base/catalog.yml)

# Must define every data input/output with type and location
customers_raw:
  type: pandas.CSVDataSet
  filepath: data/01_raw/customers.csv

features:
  type: pandas.CSVDataSet
  filepath: data/03_primary/features.csv

rf_model:
  type: pickle.PickleDataSet
  filepath: data/06_models/rf_model.pkl
# ... repeat for all data assets

Parameters (conf/base/parameters.yml)

model_options:
  max_depth: 15
  random_state: 42

Functions Must Be Restructured

Original function:

def train_random_forest(n_samples, n_features, max_depth=10):
    # Your existing logic

Kedro requires changing to:

def train_random_forest(features: pd.DataFrame, target: pd.Series,
                       parameters: Dict[str, Any]) -> Dict[str, Any]:
    # Must accept data from catalog, parameters from config
    model = RandomForestClassifier(max_depth=parameters["model_options"]["max_depth"])
    # Restructured logic to fit Kedro patterns
    return {"model": model, "accuracy": accuracy}

Pipeline Registration Required:

# src/ml_kedro_project/pipeline_registry.py
def register_pipelines() -> Dict[str, Pipeline]:
    return {
        "__default__": data_engineering.create_pipeline() + data_science.create_pipeline()
    }

Running the Pipeline:

kedro new --starter=pandas-iris ml-kedro-project
# Implement node functions, pipeline definitions, configurations
kedro run


Core Capabilities Comparison

Workflow Features

Feature Runnable Approach Kedro Approach
Pipeline Definition Single Python file with minimal setup Structured project layout with enforced conventions
Task Types Python, Notebooks, Shell, Stubs Python nodes
Parallel Execution Parallel() with explicit branching Automatic dependency resolution
Conditional Logic Native Conditional() support Manual implementation in node logic
Map/Reduce Native Map() with custom reducers Manual implementation required

Data Handling

Feature Runnable Approach Kedro Approach
File Management Simple Catalog(put/get) with minimal config Rich catalog.yml definitions with fine control
Data Versioning Content-based hashing for change detection Timestamp-based versioning
Storage Backends File, S3, Minio via plugins 20+ built-in dataset types with validation
Data Lineage Automatic via run logs kedro-viz visualization

Production Deployment

Feature Runnable Approach Kedro Approach
Environment Portability Same code runs local/container/K8s/Argo Requires deployment-specific configurations
Container Execution Same containerized code runs across environments May require deployment-specific configurations
Extensibility Entry points auto-discovery - custom executors, catalogs, secrets in your codebase Plugin system - public kedro-* packages or custom internal plugins
Monitoring Basic run logs Rich hooks ecosystem
MLOps Integration Tool-agnostic - choose your own MLOps stack Plugin ecosystem (MLflow, Airflow via kedro-* packages)

When to Choose Each Tool

Choose Runnable When:

  • Working with existing Python functions without refactoring
  • Need multi-environment portability (local → container → K8s → Argo)
  • Require advanced workflow patterns (parallel, conditional, map-reduce)
  • Want immediate productivity with minimal setup
  • Working with mixed task types (Python + notebooks + shell)

Choose Kedro When:

  • Need standardized project structure across large teams
  • Require rich data catalog features and validation
  • Heavy ETL pipelines with extensive data governance needs
  • Want established MLOps ecosystem integrations (MLflow, Airflow)
  • Already invested in Kedro infrastructure and expertise

Implementation Structure Comparison

Runnable Approach:

  • Minimal disruption: Wrap existing functions directly without changes
  • Single file: Complete pipeline in one Python file
  • No restructuring: Keep your current code organization and patterns
  • Optional configuration: Add YAML configs only when needed for specific environments

Kedro Approach:

  • Project restructuring: Requires adopting Kedro's directory structure and conventions
  • Multi-file organization: Separate files for nodes, pipelines, catalogs, and configurations
  • Function refactoring: Convert existing functions to fit Kedro node patterns
  • Required configuration: YAML files for catalog, parameters, and logging are essential

🚀 Try Both Yourself

Test Runnable (2 minutes):

pip install runnable
# Copy the Runnable example above
python ml_pipeline.py

Test Kedro (2+ hours):

pip install kedro
kedro new --starter=pandas-iris my-project
# Implement all the files shown above
kedro run

The productivity difference speaks for itself.


Next: See how Runnable compares to Metaflow and other orchestration tools.