Adding Flexibility¶
Now let's solve another major problem: hardcoded parameters. We'll make our ML function configurable so you can run different experiments without touching any code.
The Problem with Hardcoded Parameters¶
In Chapter 2, our function still had hardcoded values:
# Fixed values - need code changes for experiments
preprocessed = preprocess_data(df, test_size=0.2, random_state=42)
model_data = train_model(preprocessed, n_estimators=100, random_state=42)
Want to try n_estimators=200? Edit the code. Different train/test split? Edit the code. This doesn't scale for experimentation.
The Solution: Parameterized Functions¶
Let's create a flexible version that accepts parameters:
def train_ml_model_flexible(
    data_path="data.csv",
    test_size=0.2,
    n_estimators=100,
    random_state=42,
    model_path="model.pkl",
    results_path="results.json",
):
    """Same ML logic, now configurable!"""
    print("Loading data...")
    df = load_data(data_path)

    print("Preprocessing...")
    preprocessed = preprocess_data(df, test_size=test_size, random_state=random_state)

    print(f"Training model with {n_estimators} estimators...")
    model_data = train_model(preprocessed, n_estimators=n_estimators, random_state=random_state)

    # ... rest unchanged, but uses the parameters above
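The tutorial script then hands this function to Runnable, which is what lets parameters come from outside the code. As a minimal sketch (the exact contents of 03_adding_flexibility.py may differ), the function is wrapped in a PythonTask, and Runnable matches incoming parameters to the function's keyword arguments by name:

from runnable import Pipeline, PythonTask

# Sketch only: the real 03_adding_flexibility.py may differ in detail.
# Runnable matches parameters (from RUNNABLE_PRM_* variables or a
# parameters file) to the function's keyword arguments by name.
train_task = PythonTask(
    name="train_flexible",
    function=train_ml_model_flexible,
    terminate_with_success=True,
)

pipeline = Pipeline(steps=[train_task])

if __name__ == "__main__":
    pipeline.execute()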
Running with Parameters¶
Now you can run different experiments without changing code:
Environment Variables¶
# Default parameters
uv run examples/tutorials/getting-started/03_adding_flexibility.py
# Large forest experiment
RUNNABLE_PRM_n_estimators=200 uv run examples/tutorials/getting-started/03_adding_flexibility.py
# Different train/test split
RUNNABLE_PRM_test_size=0.3 RUNNABLE_PRM_n_estimators=150 uv run examples/tutorials/getting-started/03_adding_flexibility.py
Configuration Files¶
Create experiment configurations:
# experiment_configs/basic.yaml
test_size: 0.2
n_estimators: 50
random_state: 42
model_path: "models/basic_model.pkl"
results_path: "results/basic_results.json"

# experiment_configs/large_forest.yaml
test_size: 0.25
n_estimators: 200
random_state: 123
model_path: "models/large_forest.pkl"
results_path: "results/large_forest_results.json"
Run different experiments:
# Basic experiment
uv run examples/tutorials/getting-started/03_adding_flexibility.py --parameters-file experiment_configs/basic.yaml
# Large forest experiment
uv run examples/tutorials/getting-started/03_adding_flexibility.py --parameters-file experiment_configs/large_forest.yaml
Parameter Precedence¶
Runnable resolves parameter conflicts in a fixed order of precedence:

1. Environment variables (highest priority): RUNNABLE_PRM_n_estimators=300
2. Parameters file passed on the command line: --parameters-file config.yaml
3. Function defaults (lowest priority): what you defined in the function signature
This means you can have a base configuration file but override specific values with environment variables.
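For example, you can start from the basic.yaml configuration shown above and override only the forest size; the environment variable wins for n_estimators, while every other value still comes from the file:

# Base config from file, one value overridden by an environment variable
RUNNABLE_PRM_n_estimators=300 uv run examples/tutorials/getting-started/03_adding_flexibility.py --parameters-file experiment_configs/basic.yaml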
What You Get Now¶
Easy Experimentation¶
- Test different hyperparameters instantly
- Compare multiple approaches without code changes
- Save each experiment configuration for reproducibility
Automatic Experiment Tracking¶
Every run is logged with the exact parameters it used, so you can always trace a result back to its configuration.
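If you want to inspect those logs programmatically, here is a minimal sketch, assuming each run is stored as a JSON file under .run_log_store/ (the directory listed at the end of this page):

import json
from pathlib import Path

# Minimal sketch: assumes one JSON file per run in .run_log_store/.
# The exact schema of the run log belongs to Runnable; this only
# lists the top-level keys of each file.
for log_file in sorted(Path(".run_log_store").glob("*.json")):
    run_log = json.loads(log_file.read_text())
    print(log_file.name, "->", sorted(run_log))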
Reproducible Experiments¶
Want to recreate that great result from last week? Just rerun with the same config file.
Clean Separation¶
- Your ML logic: Stays in the function, unchanged
- Experiment configuration: Lives in config files or environment variables
- Execution tracking: Handled automatically by Runnable
Try It Yourself¶
Run these experiments and watch how each gets tracked separately:
cd examples/tutorials/getting-started
# Experiment 1: Default
uv run 03_adding_flexibility.py
# Experiment 2: Large forest
RUNNABLE_PRM_n_estimators=200 uv run 03_adding_flexibility.py
# Experiment 3: From config file
uv run 03_adding_flexibility.py --parameters-file experiment_configs/large_forest.yaml
# Check the logs - each run preserved with its parameters
ls .run_log_store/
Compare: Before vs After¶
Before:
- Parameters hardcoded in functions
- Code changes needed for experiments
- Hard to track which parameters produced which results
After:
- Functions accept parameters with sensible defaults
- Experiments configurable via environment or config files
- Every run logged with the exact parameters used
- Easy to reproduce any experiment
Next, we'll break our monolithic function into a proper multi-step ML pipeline.
Next: Connecting the Workflow - Multi-step ML pipeline with automatic data flow