# Getting Started

This guide will walk you through using GeoStep to design, execute, and analyze geographic marketing experiments.

![Terminal Output](images/terminal.png)

## 1. Installation

First, install GeoStep and its dependencies. The library requires Python 3.8+:

```bash
# Clone the repository (if not already done)
git clone https://github.com/your-github-org/geostep.git
cd geostep

# Install in editable mode with all dependencies
pip install -e .

# Or for development setup with additional tools
make dev-setup
```

### Verify Installation

```python
import geostep
print(f"GeoStep version {geostep.__version__} installed successfully.")
```
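
If the import above fails, or the package does not expose `__version__`, a standard-library check can help narrow down the cause. The sketch below is a minimal example, assuming the distribution is registered under the name `geostep` (an assumption; the name under which `pip` registers it may differ). The helper `installed_version` is hypothetical and not part of GeoStep.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The guide states that Python 3.8+ is required.
python_ok = sys.version_info >= (3, 8)
print(f"Python {sys.version_info.major}.{sys.version_info.minor} supported: {python_ok}")

# "geostep" as the distribution name is an assumption.
version = installed_version("geostep")
print(f"geostep: {version if version else 'not installed'}")
```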

## 2. Quick Start Example

![Experiment Workflow](images/experiment_workflow.png)
*Figure: Complete workflow for running a geo-experiment from initial design through power analysis, data collection, statistical analysis, and final decision making. The flowchart shows decision points and iterative refinement steps.*

Here's a complete example showing how to design and analyze a geo-experiment:

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from geostep.designer import SimpleRandomizationDesigner
from geostep.analyzer import LiftAnalyzer
from geostep.visualizer import plot_lift_distribution

# Step 1: Design the experiment
# Create a list of geographic units (e.g., cities, DMAs, zip codes)
geo_data = pd.DataFrame({
    'geo_id': [f'geo_{i:03d}' for i in range(50)],
    'population': [100000 + i*5000 for i in range(50)]
})

# Randomly assign geos to treatment and control
designer = SimpleRandomizationDesigner(seed=42)
assigned_df = designer.design(geo_data, geo_col='geo_id')
print(f"Assigned {len(assigned_df[assigned_df['assignment'] == 'Treatment'])} geos to Treatment")
print(f"Assigned {len(assigned_df[assigned_df['assignment'] == 'Control'])} geos to Control")

# Step 2: Simulate experiment data (in practice, this would be your actual data)
# Generate pre-period and test-period data
dates = pd.date_range('2024-09-01', '2024-12-31', freq='D')
experiment_data = []

# Iterate over each geo together with its assignment
for geo_id, assignment in zip(assigned_df['geo_id'], assigned_df['assignment']):
    for date in dates:
        # Base sales with some randomness
        base_sales = 10000 + np.random.normal(0, 500)
        
        # Add treatment effect after test period starts (3% lift)
        if assignment == 'Treatment' and date >= datetime(2024, 10, 1):
            sales = base_sales * 1.03  # 3% lift
        else:
            sales = base_sales
            
        experiment_data.append({
            'geo_id': geo_id,
            'date': date,
            'assignment': assignment,
            'sales': sales
        })

df = pd.DataFrame(experiment_data)

# Step 3: Analyze the results
analyzer = LiftAnalyzer()
results = analyzer.analyze(
    df=df,
    geo_col='geo_id',
    assignment_col='assignment',
    date_col='date',
    kpi_col='sales',
    pre_period_end='2024-09-30',
    test_period_start='2024-10-01',
    test_period_end='2024-12-31'
)

# Step 4: View the results
print(f"Lift Estimate: {results.estimate:.4f} ({results.estimate*100:.2f}%)")
print(f"P-value: {results.p_value:.4e}")
print(f"Confidence Interval: [{results.confidence_interval[0]:.4f}, {results.confidence_interval[1]:.4f}]")

# Access additional metrics
if 'raw_volume_uplift' in results.metadata:
    print(f"Raw Volume Uplift: {results.metadata['raw_volume_uplift']:,.2f} units")
```
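
As a sanity check on any analyzer output, the lift can be approximated by hand from group means. The sketch below rebuilds a dataset like the one above using only NumPy and pandas, then computes a ratio-of-ratios estimate (treatment growth divided by control growth). This is one simple estimator chosen for illustration; it is not necessarily the method `LiftAnalyzer` uses, and the random assignment here is a stand-in for the designer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical setup mirroring the quick start: 50 geos, daily data, 3% true lift.
geo_ids = [f"geo_{i:03d}" for i in range(50)]
assignment = pd.Series(
    rng.permutation(["Treatment"] * 25 + ["Control"] * 25), index=geo_ids
)
dates = pd.date_range("2024-09-01", "2024-12-31", freq="D")

# One row per (geo, date), built without nested Python loops.
df = pd.MultiIndex.from_product(
    [geo_ids, dates], names=["geo_id", "date"]
).to_frame(index=False)
df["assignment"] = df["geo_id"].map(assignment)
df["sales"] = 10000 + rng.normal(0, 500, size=len(df))

# Apply the 3% lift to treated geos during the test period only.
in_test = df["date"] >= "2024-10-01"
treated = df["assignment"] == "Treatment"
df.loc[in_test & treated, "sales"] *= 1.03

# Mean sales per group and period.
df["period"] = np.where(in_test, "test", "pre")
means = df.groupby(["assignment", "period"])["sales"].mean().unstack("period")

# Ratio-of-ratios: treatment growth relative to control growth.
growth = means["test"] / means["pre"]
manual_lift = growth["Treatment"] / growth["Control"] - 1
print(f"Hand-computed lift: {manual_lift:.4f}")
```

With this much data, the hand-computed value should land close to the simulated 3% effect. A large gap between it and the analyzer's estimate is a cue to recheck the period boundaries and column names.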

## 3. Three Ways to Run Experiments

GeoStep provides three convenient ways to execute your analysis pipeline:

### Method 1: Simple Example Script
```bash
python examples/run_example_analysis.py
```

This runs a pre-configured analysis with synthetic data and produces Rich-formatted output:

![Synthetic Market Example](images/synthetic_market_example.png)

### Method 2: Advanced CLI Runner
```bash
# Basic lift analysis
python run_pipeline.py \
  --data examples/synthetic_market_data.csv \
  --analyzer lift \
  --pre-end 2024-09-30 \
  --test-start 2024-10-01 \
  --test-end 2024-12-31

# With bootstrap confidence intervals
python run_pipeline.py \
  --data your_data.csv \
  --analyzer lift \
  --bootstrap-ci \
  --n-bootstrap 2000 \
  --random-seed 42

# Difference-in-Differences analysis
python run_pipeline.py \
  --data your_data.csv \
  --analyzer did \
  --test-start 2024-10-01
```

### Method 3: Makefile Commands
```bash
# Run with default settings
make run-pipeline

# Run specific analysis types
make run-lift     # Lift analysis
make run-did      # Difference-in-Differences
make run-crt      # Cluster Randomized Trial

# Run with bootstrap
make run-bootstrap

# See all available commands
make help
```

## 4. Understanding the Output

When you run an analysis, you'll see Rich-formatted terminal output like this:

```
╭───────────────────────────────────────────────────────────────────────────╮
│ Primary Analysis Results                                                  │
╰───────────────────────────────────────────────────────────────────────────╯
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric              ┃ Value                             ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Lift Estimate       │ 0.0291 (2.91%)                    │
│ P-Value             │ 3.21e-25 *** (highly significant) │
│ Confidence Interval │ [0.0236, 0.0346] ([2.36%, 3.46%]) │
│ Raw Volume Uplift   │ 558.72                            │
└─────────────────────┴───────────────────────────────────┘
```

### Key Metrics Explained:
- **Lift Estimate**: The estimated relative change in your KPI, shown as a proportion and as a percentage (e.g., 0.0291 is a 2.91% sales increase)
- **P-Value**: Statistical significance with indicators:
  - `***` = highly significant (p < 0.001)
  - `**` = very significant (p < 0.01)
  - `*` = significant (p < 0.05)
  - `(not significant)` = p ≥ 0.05
- **Confidence Interval**: Range of plausible values for the true effect at the chosen confidence level (default: 0.95)
- **Raw Volume Uplift**: Absolute increase in units/dollars
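
The significance thresholds above can be expressed as a small function, which is handy when post-processing results in your own reports. This is a sketch of the mapping as described in the list, not GeoStep's own formatting code. The function name `significance_label` is hypothetical, and the label text for the `**` and `*` cases is inferred from the list rather than from actual output.

```python
def significance_label(p_value):
    """Map a p-value to the star notation described above."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.001:
        return "*** (highly significant)"
    if p_value < 0.01:
        return "** (very significant)"
    if p_value < 0.05:
        return "* (significant)"
    return "(not significant)"


for p in (3.21e-25, 0.004, 0.03, 0.2):
    print(f"{p:.2e} {significance_label(p)}")
```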

## 5. Visualizing Results

GeoStep generates visualization plots and saves them to the `results/` directory. The plotting functions can also be called directly:

```python
from geostep.visualizer import plot_lift_distribution, plot_analysis_results

# Plot lift distribution
plot_lift_distribution(
    analysis_df=df,
    assignment_col='assignment',
    lift_col='lift_index',
    save_path='results/lift_distribution.png'
)

# Plot time series results
plot_analysis_results(
    results=results,
    df=df,
    date_col='date',
    kpi_col='sales',
    assignment_col='assignment',
    test_period_start='2024-10-01',
    save_path='results/kpi_trend.png'
)
```
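
The `plot_lift_distribution` call above expects a `lift_index` column, which the quick start data does not contain. How GeoStep defines that column is not stated here, so the sketch below uses one plausible definition as an assumption: each geo's test-period mean divided by its pre-period mean. The tiny dataset is made up so the arithmetic is easy to follow.

```python
import pandas as pd

# Tiny hand-made dataset: two geos, two pre-period days and two test-period days each.
df = pd.DataFrame({
    "geo_id": ["A"] * 4 + ["B"] * 4,
    "date": pd.to_datetime(
        ["2024-09-29", "2024-09-30", "2024-10-01", "2024-10-02"] * 2
    ),
    "assignment": ["Treatment"] * 4 + ["Control"] * 4,
    "sales": [100.0, 100.0, 110.0, 110.0, 200.0, 200.0, 200.0, 200.0],
})

test_start = pd.Timestamp("2024-10-01")
df["period"] = df["date"].ge(test_start).map({True: "test", False: "pre"})

# Mean KPI per geo and period, one row per geo.
per_geo = df.pivot_table(
    index=["geo_id", "assignment"], columns="period", values="sales", aggfunc="mean"
).reset_index()

# Assumed definition: test-period mean divided by pre-period mean.
per_geo["lift_index"] = per_geo["test"] / per_geo["pre"]
print(per_geo[["geo_id", "assignment", "lift_index"]])
```

Under this definition a value of 1.0 means no change between periods, and the treated geo in the example has an index of about 1.1.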

## 6. Advanced Experiment Designs

### Stratified Randomization
For better balance across important covariates:

```python
from geostep.designer import StratifiedRandomizationDesigner

# geo_data must already contain every column listed in stratify_cols
designer = StratifiedRandomizationDesigner(seed=42)
assigned_df = designer.design(
    df=geo_data,
    geo_col='geo_id',
    stratify_cols=['region', 'population_bucket']
)
```

### Staircase Design
For phased rollouts with efficient data collection:

```python
from geostep.designer import StaircaseDesigner

designer = StaircaseDesigner(
    n_steps=4,
    step_duration_periods=2,
    n_pre_periods=2,
    n_post_periods=2,
    seed=42
)
design_df = designer.design(
    df=geo_data,
    geo_col='geo_id'
)
```
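
To build intuition for what these parameters might control, here is a sketch of a staircase layout in plain pandas. The semantics are assumptions, not taken from GeoStep: geos are split randomly into `n_steps` groups, group `k` switches to treatment at period `n_pre_periods + k * step_duration_periods`, and each group is observed only for `n_pre_periods` before its switch and `n_post_periods` from the switch onward. The function `staircase_schedule` is hypothetical.

```python
import numpy as np
import pandas as pd


def staircase_schedule(geo_ids, n_steps, step_duration_periods,
                       n_pre_periods, n_post_periods, seed=None):
    """Return one row per (geo, observed period) under the assumed semantics."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(geo_ids))
    rows = []
    for step, group in enumerate(np.array_split(shuffled, n_steps)):
        switch = n_pre_periods + step * step_duration_periods
        for geo in group:
            # Observe only a short window around this group's switch point.
            for period in range(switch - n_pre_periods, switch + n_post_periods):
                rows.append({
                    "geo_id": str(geo),
                    "step": step,
                    "period": period,
                    "treated": int(period >= switch),
                })
    return pd.DataFrame(rows)


schedule = staircase_schedule(
    [f"geo_{i:03d}" for i in range(8)],
    n_steps=4, step_duration_periods=2, n_pre_periods=2, n_post_periods=2, seed=42,
)

# Rows are steps, columns are periods: NaN = not observed, 0 = control, 1 = treated.
layout = schedule.drop_duplicates(["step", "period"]).pivot(
    index="step", columns="period", values="treated"
)
print(layout)
```

The printed grid shows the diagonal "staircase" of observed cells. Because each geo is measured for only four periods, the data collection effort is much smaller than observing every geo for the full duration.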

## 7. Power Analysis

Before running an experiment, determine the required sample size and duration:

```python
from geostep.power import run_power_analysis
from geostep.visualizer import plot_power_analysis

# Run power analysis
power_results = run_power_analysis(
    n_geos_list=[20, 30, 40, 50],
    effect_sizes=[0.01, 0.02, 0.03, 0.05],
    baseline_mean=10000,
    baseline_std=1000,
    alpha=0.05,
    n_simulations=1000
)

# Visualize power curves
plot_power_analysis(
    power_results,
    save_path='results/power_analysis.png'
)
```
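
A closed-form approximation is a useful cross-check on simulated power. The sketch below uses the normal approximation for a two-sided, two-sample comparison of means, assuming an even Treatment/Control split, one summary value per geo, and equal variance in both groups. It is a textbook formula rather than GeoStep's simulation, and the function `approx_power` is hypothetical. The inputs reuse the values from the call above.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_geos, effect_size, baseline_mean, baseline_std, alpha=0.05):
    """Normal-approximation power for a two-sided test on a difference in means."""
    std_normal = NormalDist()
    n_per_group = n_geos // 2
    delta = effect_size * baseline_mean             # absolute effect
    se = baseline_std * sqrt(2.0 / n_per_group)     # SE of the difference in means
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    z = delta / se
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(z - z_crit) + std_normal.cdf(-z - z_crit)


for n_geos in (20, 30, 40, 50):
    row = ", ".join(
        f"{es:.0%}: {approx_power(n_geos, es, 10000, 1000):.2f}"
        for es in (0.01, 0.02, 0.03, 0.05)
    )
    print(f"{n_geos} geos -> {row}")
```

Simulated power can differ from this approximation, for example when the analysis adjusts for pre-period data, so treat it as a rough plausibility check only.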

## 8. Working with Real Data

Your data should have the following structure:

| Column | Description | Example |
|--------|-------------|---------|
| `geo_id` | Geographic identifier | "DMA_501", "ZIP_10001" |
| `date` | Date of observation | "2024-10-15" |
| `assignment` | Treatment group | "Treatment" or "Control" |
| `kpi` | Metric to measure | 12500.50 (sales, conversions, etc.) |

```python
import pandas as pd
from geostep.analyzer import LiftAnalyzer

# Load your data
df = pd.read_csv('your_experiment_data.csv')

# Ensure proper data types
df['date'] = pd.to_datetime(df['date'])
df['kpi'] = pd.to_numeric(df['kpi'])

# Run analysis
analyzer = LiftAnalyzer()
results = analyzer.analyze(
    df=df,
    geo_col='geo_id',
    assignment_col='assignment',
    date_col='date',
    kpi_col='kpi',
    pre_period_end='2024-09-30',
    test_period_start='2024-10-01',
    test_period_end='2024-12-31'
)
```
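
Before running the analyzer on real data, it can help to check the table against the structure described above. The sketch below is a minimal pre-flight check in plain pandas. The function `validate_experiment_data` is hypothetical, and the rules it enforces (one assignment per geo, no duplicate geo/date rows) are assumptions about what a clean dataset looks like, not documented GeoStep requirements.

```python
import pandas as pd

REQUIRED_COLUMNS = ("geo_id", "date", "assignment", "kpi")


def validate_experiment_data(df):
    """Return a list of human-readable problems; an empty list means none found."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    if not pd.api.types.is_datetime64_any_dtype(df["date"]):
        problems.append("`date` is not a datetime column")
    if not pd.api.types.is_numeric_dtype(df["kpi"]):
        problems.append("`kpi` is not numeric")
    if df["kpi"].isna().any():
        problems.append("`kpi` contains missing values")
    unexpected = set(df["assignment"].dropna().unique()) - {"Treatment", "Control"}
    if unexpected:
        problems.append(f"unexpected assignment labels: {sorted(unexpected)}")
    # A geo should stay in one group for the whole experiment.
    if (df.groupby("geo_id")["assignment"].nunique() > 1).any():
        problems.append("some geos appear in more than one assignment group")
    if df.duplicated(["geo_id", "date"]).any():
        problems.append("duplicate (geo_id, date) rows")
    return problems


# Deliberately flawed sample: DMA_501 appears twice on one date, in both groups.
sample = pd.DataFrame({
    "geo_id": ["DMA_501", "DMA_501", "ZIP_10001"],
    "date": pd.to_datetime(["2024-10-15", "2024-10-15", "2024-10-15"]),
    "assignment": ["Treatment", "Control", "Control"],
    "kpi": [12500.50, 11800.00, 9400.25],
})
for problem in validate_experiment_data(sample):
    print("-", problem)
```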

## 9. Next Steps

- **[Methodology](./methodology.md)**: Understand the statistical theory
- **[Business Guide](./business_guide.md)**: ROI analysis and integration strategies
- **[API Reference](./api_reference.md)**: Detailed function documentation
- **[Advanced Topics](./advanced_topics.md)**: Sophisticated techniques
- **[Troubleshooting](./troubleshooting.md)**: Common issues and solutions

## 10. Quick Reference

### Common CLI Options
```bash
--data              # Path to data file
--analyzer          # Analysis type: lift, did, crt
--geo-col           # Geographic ID column name
--kpi-col           # KPI column name
--assignment-col    # Assignment column name
--date-col          # Date column name
--pre-start         # Pre-period start date
--pre-end           # Pre-period end date
--test-start        # Test period start date
--test-end          # Test period end date
--bootstrap-ci      # Use bootstrap confidence intervals
--n-bootstrap       # Number of bootstrap samples
--confidence-level  # Confidence level (default: 0.95)
--random-seed       # Random seed for reproducibility
--output-dir        # Output directory for results
--save-plots        # Save visualization plots
```

### Makefile Commands
```bash
make install        # Install package
make install-dev    # Install with dev dependencies
make test           # Run tests
make clean          # Clean build artifacts
make run-pipeline   # Run default pipeline
make run-example    # Run example analysis
make help           # Show all commands
```