Getting Started

This guide walks you through designing, executing, and analyzing geographic marketing experiments with GeoStep.

1. Installation

First, install GeoStep and its dependencies. The library requires Python 3.8+:

# Clone the repository (if not already done)
git clone https://github.com/your-github-org/geostep.git
cd geostep

# Install in editable mode with all dependencies
pip install -e .

# Or for development setup with additional tools
make dev-setup

Verify Installation

import geostep
print(f"GeoStep version {geostep.__version__} installed successfully.")

2. Quick Start Example

Experiment Workflow Figure: Complete workflow for running a geo-experiment from initial design through power analysis, data collection, statistical analysis, and final decision making. The flowchart shows decision points and iterative refinement steps.

Here’s a complete example showing how to design and analyze a geo-experiment:

import numpy as np
import pandas as pd
from datetime import datetime
from geostep.designer import SimpleRandomizationDesigner
from geostep.analyzer import LiftAnalyzer
from geostep.visualizer import plot_lift_distribution

# Step 1: Design the experiment
# Create a list of geographic units (e.g., cities, DMAs, zip codes)
geo_data = pd.DataFrame({
    'geo_id': [f'geo_{i:03d}' for i in range(50)],
    'population': [100000 + i*5000 for i in range(50)]
})

# Randomly assign geos to treatment and control
designer = SimpleRandomizationDesigner(seed=42)
assigned_df = designer.design(geo_data, geo_col='geo_id')
print(f"Assigned {len(assigned_df[assigned_df['assignment'] == 'Treatment'])} geos to Treatment")
print(f"Assigned {len(assigned_df[assigned_df['assignment'] == 'Control'])} geos to Control")

# Step 2: Simulate experiment data (in practice, this would be your actual data)
# Generate pre-period and test-period data
dates = pd.date_range('2024-09-01', '2024-12-31', freq='D')
experiment_data = []

for geo_id, assignment in zip(assigned_df['geo_id'], assigned_df['assignment']):
    for date in dates:
        # Base sales with some randomness
        base_sales = 10000 + np.random.normal(0, 500)
        
        # Add treatment effect after test period starts (3% lift)
        if assignment == 'Treatment' and date >= datetime(2024, 10, 1):
            sales = base_sales * 1.03  # 3% lift
        else:
            sales = base_sales
            
        experiment_data.append({
            'geo_id': geo_id,
            'date': date,
            'assignment': assignment,
            'sales': sales
        })

df = pd.DataFrame(experiment_data)

# Step 3: Analyze the results
analyzer = LiftAnalyzer()
results = analyzer.analyze(
    df=df,
    geo_col='geo_id',
    assignment_col='assignment',
    date_col='date',
    kpi_col='sales',
    pre_period_end='2024-09-30',
    test_period_start='2024-10-01',
    test_period_end='2024-12-31'
)

# Step 4: View the results
print(f"Lift Estimate: {results.estimate:.4f} ({results.estimate*100:.2f}%)")
print(f"P-value: {results.p_value:.4e}")
print(f"Confidence Interval: [{results.confidence_interval[0]:.4f}, {results.confidence_interval[1]:.4f}]")

# Access additional metrics
if 'raw_volume_uplift' in results.metadata:
    print(f"Raw Volume Uplift: {results.metadata['raw_volume_uplift']:,.2f} units")
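
To make the lift estimate more concrete, here is a minimal sketch of one common way to compute a relative lift from pre-period and test-period means. It is an illustration only and not necessarily the method LiftAnalyzer implements; the function name ratio_lift and the toy data are hypothetical.

```python
import pandas as pd

def ratio_lift(df, pre_period_end, test_period_start):
    """Treatment's test/pre growth divided by control's growth, minus 1.

    Hypothetical helper for illustration; not part of GeoStep.
    """
    pre = df[df["date"] <= pd.Timestamp(pre_period_end)]
    test = df[df["date"] >= pd.Timestamp(test_period_start)]
    pre_means = pre.groupby("assignment")["sales"].mean()
    test_means = test.groupby("assignment")["sales"].mean()
    growth = test_means / pre_means  # per-group growth from pre to test period
    return growth["Treatment"] / growth["Control"] - 1.0

# Toy data: treatment grows from 100 to 103, control stays at 100.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-09-15", "2024-10-15"] * 2),
    "assignment": ["Treatment", "Treatment", "Control", "Control"],
    "sales": [100.0, 103.0, 100.0, 100.0],
})
toy_lift = ratio_lift(toy, "2024-09-30", "2024-10-01")
print(f"Toy lift: {toy_lift:.4f}")
```

Dividing by the control group's growth removes changes that affect both groups, such as seasonality, so the remainder can be attributed to the treatment.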

3. Three Ways to Run Experiments

GeoStep provides three convenient ways to execute your analysis pipeline:

Method 1: Simple Example Script

python examples/run_example_analysis.py

This runs a pre-configured analysis on synthetic data and prints Rich-formatted output to the terminal.

Method 2: Advanced CLI Runner

# Basic lift analysis
python run_pipeline.py \
  --data examples/synthetic_market_data.csv \
  --analyzer lift \
  --pre-end 2024-09-30 \
  --test-start 2024-10-01 \
  --test-end 2024-12-31

# With bootstrap confidence intervals
python run_pipeline.py \
  --data your_data.csv \
  --analyzer lift \
  --bootstrap-ci \
  --n-bootstrap 2000 \
  --random-seed 42

# Difference-in-Differences analysis
python run_pipeline.py \
  --data your_data.csv \
  --analyzer did \
  --test-start 2024-10-01
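
The --bootstrap-ci and --n-bootstrap options above request bootstrap confidence intervals. As a rough illustration of the idea, the sketch below assumes a percentile bootstrap that resamples geos with replacement within each group; GeoStep's own resampling scheme may differ. The function bootstrap_ci and the simulated per-geo KPI values are hypothetical.

```python
import random

def bootstrap_ci(treatment, control, n_bootstrap=2000,
                 confidence_level=0.95, seed=42):
    """Percentile bootstrap CI for relative lift (illustrative, not GeoStep's code)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_bootstrap):
        # Resample geos with replacement within each group.
        t = rng.choices(treatment, k=len(treatment))
        c = rng.choices(control, k=len(control))
        estimates.append((sum(t) / len(t)) / (sum(c) / len(c)) - 1.0)
    estimates.sort()
    tail = (1 - confidence_level) / 2
    lower = estimates[int(tail * n_bootstrap)]
    upper = estimates[int((1 - tail) * n_bootstrap) - 1]
    return lower, upper

# Simulated per-geo KPI means: 25 control geos and 25 treatment geos (~3% lift).
data_rng = random.Random(0)
control_kpi = [data_rng.gauss(10000, 500) for _ in range(25)]
treatment_kpi = [data_rng.gauss(10300, 500) for _ in range(25)]

lower, upper = bootstrap_ci(treatment_kpi, control_kpi)
print(f"95% CI for relative lift: [{lower:.4f}, {upper:.4f}]")
```

Fixing the seed, as --random-seed does on the command line, makes the interval reproducible across runs.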

Method 3: Makefile Commands

# Run with default settings
make run-pipeline

# Run specific analysis types
make run-lift     # Lift analysis
make run-did      # Difference-in-Differences
make run-crt      # Cluster Randomized Trial

# Run with bootstrap
make run-bootstrap

# See all available commands
make help

4. Understanding the Output

When you run an analysis, you’ll see Rich-formatted terminal output like this:

╭───────────────────────────────────────────────────────────────────────────╮
│ Primary Analysis Results                                                  │
╰───────────────────────────────────────────────────────────────────────────╯
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric              ┃ Value                             ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Lift Estimate       │ 0.0291 (2.91%)                    │
│ P-Value             │ 3.21e-25 *** (highly significant) │
│ Confidence Interval │ [0.0236, 0.0346] ([2.36%, 3.46%]) │
│ Raw Volume Uplift   │ 558.72                            │
└─────────────────────┴───────────────────────────────────┘

Key Metrics Explained:

  • Lift Estimate: The estimated relative change in your KPI, shown as a fraction and as a percentage (e.g., 0.0291 is a 2.91% sales increase)

  • P-Value: Statistical significance with indicators:

    • *** = highly significant (p < 0.001)

    • ** = very significant (p < 0.01)

    • * = significant (p < 0.05)

    • (not significant) = p ≥ 0.05

  • Confidence Interval: Range of plausible values for the true effect at the chosen confidence level (default: 0.95)

  • Raw Volume Uplift: Absolute increase in units/dollars
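
The significance thresholds listed above can be expressed as a small helper. This is a sketch of the mapping only; significance_label and as_percent are hypothetical names, and the exact label strings GeoStep prints may differ from the ones assumed here.

```python
def significance_label(p_value):
    """Map a p-value to the star notation described in the list above."""
    if p_value < 0.001:
        return "*** (highly significant)"
    if p_value < 0.01:
        return "** (very significant)"
    if p_value < 0.05:
        return "* (significant)"
    return "(not significant)"

def as_percent(fraction):
    """Format a fractional lift such as 0.0291 as a percentage string."""
    return f"{fraction * 100:.2f}%"

print(f"{3.21e-25:.2e} {significance_label(3.21e-25)}")
print(f"[{as_percent(0.0236)}, {as_percent(0.0346)}]")
```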

5. Visualizing Results

GeoStep can save visualization plots to the results/ directory. To generate them from Python, call the plotting functions with a save_path:

from geostep.visualizer import plot_lift_distribution, plot_analysis_results

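# Plot lift distribution
# Note: this example assumes analysis_df contains the column named in lift_col
# ('lift_index' here); the raw daily df built in the Quick Start does not.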
plot_lift_distribution(
    analysis_df=df,
    assignment_col='assignment',
    lift_col='lift_index',
    save_path='results/lift_distribution.png'
)

# Plot time series results
plot_analysis_results(
    results=results,
    df=df,
    date_col='date',
    kpi_col='sales',
    assignment_col='assignment',
    test_period_start='2024-10-01',
    save_path='results/kpi_trend.png'
)

6. Advanced Experiment Designs

Stratified Randomization

For better balance across important covariates:

from geostep.designer import StratifiedRandomizationDesigner

designer = StratifiedRandomizationDesigner(seed=42)
assigned_df = designer.design(
    df=geo_data,
    geo_col='geo_id',
    stratify_cols=['region', 'population_bucket']  # these columns must exist in geo_data
)
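
The geo_data frame from the Quick Start has no region or population_bucket column, so a stratification column has to be created first. The sketch below derives a hypothetical population_bucket with pd.qcut and then shows, under stated assumptions, what stratified randomization does: shuffle geos within each stratum and assign half to Treatment. The function stratified_assign is illustrative and is not StratifiedRandomizationDesigner's actual implementation.

```python
import numpy as np
import pandas as pd

geo_data = pd.DataFrame({
    "geo_id": [f"geo_{i:03d}" for i in range(50)],
    "population": [100000 + i * 5000 for i in range(50)],
})
# Hypothetical stratification column: population quartiles.
geo_data["population_bucket"] = pd.qcut(
    geo_data["population"], q=4, labels=["q1", "q2", "q3", "q4"]
)

def stratified_assign(df, stratify_cols, seed=42):
    """Randomize within each stratum so groups are balanced on the strata."""
    rng = np.random.default_rng(seed)
    out = df.copy()
    out["assignment"] = "Control"
    for _, idx in out.groupby(stratify_cols, observed=True).groups.items():
        shuffled = rng.permutation(np.asarray(idx))
        treated = shuffled[: len(shuffled) // 2]  # half of each stratum
        out.loc[treated, "assignment"] = "Treatment"
    return out

assigned = stratified_assign(geo_data, ["population_bucket"])
print(assigned.groupby(["population_bucket", "assignment"], observed=True).size())
```

Within every bucket the Treatment and Control counts differ by at most one geo, which is the balance property that simple randomization does not guarantee.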

Staircase Design

For phased rollouts with efficient data collection:

from geostep.designer import StaircaseDesigner

designer = StaircaseDesigner(
    n_steps=4,
    step_duration_periods=2,
    n_pre_periods=2,
    n_post_periods=2,
    seed=42
)
design_df = designer.design(
    df=geo_data,
    geo_col='geo_id'
)
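
To show what such a schedule can look like, the sketch below builds one under an assumed reading of the parameters: geos are split into n_steps cohorts, cohort k switches to treatment at period n_pre_periods + k * step_duration_periods, and each cohort is observed only for n_pre_periods before and n_post_periods after its switch. This interpretation, the function staircase_schedule, and its output columns are assumptions, not StaircaseDesigner's documented behavior.

```python
import random

def staircase_schedule(geo_ids, n_steps=4, step_duration_periods=2,
                       n_pre_periods=2, n_post_periods=2, seed=42):
    """Build an illustrative staircase schedule as a list of row dicts."""
    rng = random.Random(seed)
    shuffled = list(geo_ids)
    rng.shuffle(shuffled)
    rows = []
    for position, geo_id in enumerate(shuffled):
        step = position % n_steps  # round-robin keeps cohort sizes balanced
        switch = n_pre_periods + step * step_duration_periods
        # Observe each geo only in a window around its own switch period.
        for period in range(switch - n_pre_periods, switch + n_post_periods):
            rows.append({
                "geo_id": geo_id,
                "step": step,
                "period": period,
                "treated": period >= switch,
            })
    return rows

schedule = staircase_schedule([f"geo_{i:03d}" for i in range(8)])
for row in schedule[:4]:
    print(row)
```

Because each cohort contributes data only around its switch, the total number of geo-periods collected is smaller than in a design that observes every geo for the full duration.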

7. Power Analysis

Before running an experiment, determine the required sample size and duration:

from geostep.power import run_power_analysis
from geostep.visualizer import plot_power_analysis

# Run power analysis
power_results = run_power_analysis(
    n_geos_list=[20, 30, 40, 50],
    effect_sizes=[0.01, 0.02, 0.03, 0.05],
    baseline_mean=10000,
    baseline_std=1000,
    alpha=0.05,
    n_simulations=1000
)

# Visualize power curves
plot_power_analysis(
    power_results,
    save_path='results/power_analysis.png'
)
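
The idea behind simulation-based power analysis can be sketched in a few lines: simulate many experiments with a known effect, test each one, and count how often the effect is detected. The sketch below assumes one aggregated KPI value per geo and a two-sided z-test with a normal approximation; it is not the algorithm run_power_analysis uses, and simulate_power is a hypothetical name.

```python
import random
from statistics import NormalDist, mean, variance

def simulate_power(n_geos, effect_size, baseline_mean=10000, baseline_std=1000,
                   alpha=0.05, n_simulations=1000, seed=42):
    """Share of simulated experiments in which the effect is detected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    n_treat = n_geos // 2
    n_ctrl = n_geos - n_treat
    treated_mean = baseline_mean * (1 + effect_size)
    rejections = 0
    for _ in range(n_simulations):
        control = [rng.gauss(baseline_mean, baseline_std) for _ in range(n_ctrl)]
        treatment = [rng.gauss(treated_mean, baseline_std) for _ in range(n_treat)]
        se = (variance(control) / n_ctrl + variance(treatment) / n_treat) ** 0.5
        z = (mean(treatment) - mean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_simulations

power_by_effect = {
    effect: simulate_power(n_geos=50, effect_size=effect, n_simulations=500)
    for effect in (0.01, 0.03, 0.05)
}
for effect, power in power_by_effect.items():
    print(f"effect={effect:.2f}  power={power:.2f}")
```

Power rises with the effect size and with the number of geos, so small expected lifts generally call for more geos or a longer test period.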

8. Working with Real Data

Your data should have the following structure:

Column       Description             Example
-----------  ----------------------  ------------------------------------
geo_id       Geographic identifier   “DMA_501”, “ZIP_10001”
date         Date of observation     “2024-10-15”
assignment   Treatment group         “Treatment” or “Control”
kpi          Metric to measure       12500.50 (sales, conversions, etc.)

import pandas as pd
from geostep.analyzer import LiftAnalyzer

# Load your data
df = pd.read_csv('your_experiment_data.csv')

# Ensure proper data types
df['date'] = pd.to_datetime(df['date'])
df['kpi'] = pd.to_numeric(df['kpi'])

# Run analysis
analyzer = LiftAnalyzer()
results = analyzer.analyze(
    df=df,
    geo_col='geo_id',
    assignment_col='assignment',
    date_col='date',
    kpi_col='kpi',
    pre_period_end='2024-09-30',
    test_period_start='2024-10-01',
    test_period_end='2024-12-31'
)
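
Before running the analysis on real data, it can help to check that the frame matches the structure described above. The following is a minimal validation sketch using only pandas; validate_experiment_data is a hypothetical helper, not a GeoStep function, and the one-assignment-per-geo check assumes a simple two-group design rather than a phased rollout.

```python
import pandas as pd

REQUIRED_COLUMNS = ["geo_id", "date", "assignment", "kpi"]

def validate_experiment_data(df):
    """Return a list of problems found; an empty list means all checks passed."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if df[REQUIRED_COLUMNS].isna().any().any():
        problems.append("missing values in required columns")
    unexpected = set(df["assignment"].dropna()) - {"Treatment", "Control"}
    if unexpected:
        problems.append(f"unexpected assignment labels: {sorted(unexpected)}")
    if (df.groupby("geo_id")["assignment"].nunique() > 1).any():
        problems.append("some geos have more than one assignment")
    if df.duplicated(subset=["geo_id", "date"]).any():
        problems.append("duplicate geo_id/date rows")
    return problems

sample = pd.DataFrame({
    "geo_id": ["DMA_501", "DMA_501", "ZIP_10001"],
    "date": pd.to_datetime(["2024-10-15", "2024-10-16", "2024-10-15"]),
    "assignment": ["Treatment", "Treatment", "Control"],
    "kpi": [12500.50, 12810.00, 11990.25],
})
issues = validate_experiment_data(sample)
print(issues)  # prints [] because the sample passes every check
```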

9. Next Steps

10. Quick Reference

Common CLI Options

--data              # Path to data file
--analyzer          # Analysis type: lift, did, crt
--geo-col           # Geographic ID column name
--kpi-col           # KPI column name
--assignment-col    # Assignment column name
--date-col          # Date column name
--pre-start         # Pre-period start date
--pre-end           # Pre-period end date
--test-start        # Test period start date
--test-end          # Test period end date
--bootstrap-ci      # Use bootstrap confidence intervals
--n-bootstrap       # Number of bootstrap samples
--confidence-level  # Confidence level (default: 0.95)
--random-seed       # Random seed for reproducibility
--output-dir        # Output directory for results
--save-plots        # Save visualization plots

Makefile Commands

make install        # Install package
make install-dev    # Install with dev dependencies
make test           # Run tests
make clean          # Clean build artifacts
make run-pipeline   # Run default pipeline
make run-example    # Run example analysis
make help           # Show all commands