geostep.designer.StratifiedRandomizationDesigner

class geostep.designer.StratifiedRandomizationDesigner(num_groups: int = 2, n_strata: int = 4, seed: int = 42)

Methods

__init__([num_groups, n_strata, seed])

Initialize designer with configuration.

design(df, geo_col, strat_vars)

Assigns geographic units to groups using stratified randomization.

enable_monitoring([enabled])

Enable or disable performance monitoring.

get_metrics()

Get performance and execution metrics.

monitor_operation(operation_name[, ...])

Context manager for monitoring operations.

post_process_design(design_df)

Post-process design results.

prepare_data([df, geo_col, strat_vars])

Prepare data for stratified randomization.

set_metrics_collector(collector)

Set the metrics collector for this instance.

validate_inputs([df, geo_col, strat_vars])

Validate inputs for stratified randomization design.

__init__(num_groups: int = 2, n_strata: int = 4, seed: int = 42)

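Initialize the designer with its configuration: the number of assignment groups (num_groups), the number of strata (n_strata), and the random seed (seed).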
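The seed parameter suggests that assignments are meant to be reproducible. The following is a minimal standard-library sketch of that idea, assuming (this page does not confirm it) that the seed initializes a private random generator. The helper shuffled_units is hypothetical and not part of geostep.

```python
import random


def shuffled_units(geos, seed=42):
    """Return a shuffled copy of geos; the order depends only on the seed."""
    rng = random.Random(seed)  # private generator, leaves the global random state alone
    out = list(geos)
    rng.shuffle(out)
    return out


geos = ["a", "b", "c", "d", "e", "f"]
first = shuffled_units(geos, seed=42)
second = shuffled_units(geos, seed=42)  # same seed, same order as `first`
```

Using a private generator rather than random.seed() keeps the result independent of any other code that draws random numbers in the same process.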

validate_inputs(df: DataFrame = None, geo_col: str = None, strat_vars: List[str] = None, **kwargs) → None

Validate inputs for stratified randomization design.

Parameters:
  • df (pd.DataFrame) – DataFrame containing geographic units and stratification variables.

  • geo_col (str) – The name of the column with unique geographic identifiers.

  • strat_vars (list of str) – A list of column names to use for stratification.

Raises:
  • ValidationError – If input validation fails.

  • DesignError – If num_groups is not 2 or n_strata is invalid.
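The checks listed above might look roughly like the sketch below. It is written against plain lists of dicts rather than a DataFrame so that it runs with the standard library alone. The exception classes are stand-ins defined here, check_inputs is a hypothetical helper, and the rule used for an "invalid" n_strata is an assumption; none of this is taken from the geostep source.

```python
class ValidationError(ValueError):
    """Stand-in for the library's ValidationError (hypothetical definition)."""


class DesignError(ValueError):
    """Stand-in for the library's DesignError (hypothetical definition)."""


def check_inputs(records, geo_col, strat_vars, num_groups=2, n_strata=4):
    """Raise if the inputs cannot support a two-group stratified design."""
    if not records:
        raise ValidationError("no rows supplied")
    required = [geo_col, *strat_vars]
    for row in records:
        missing = [c for c in required if c not in row]
        if missing:
            raise ValidationError(f"missing columns: {missing}")
    geos = [row[geo_col] for row in records]
    if len(set(geos)) != len(geos):
        raise ValidationError(f"{geo_col!r} contains duplicate identifiers")
    if num_groups != 2:
        raise DesignError("num_groups must be 2")
    # Assumed meaning of "invalid": fewer than one stratum, or more strata than units.
    if n_strata < 1 or n_strata > len(records):
        raise DesignError("n_strata must be between 1 and the number of units")


rows = [{"geo": "a", "pop": 10}, {"geo": "b", "pop": 20}]
check_inputs(rows, "geo", ["pop"], n_strata=2)  # valid input: returns None, raises nothing
```

Data problems (missing columns, duplicate identifiers) and configuration problems (num_groups, n_strata) are reported through different exception types here, mirroring the split between ValidationError and DesignError in the Raises list.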

prepare_data(df: DataFrame = None, geo_col: str = None, strat_vars: List[str] = None, **kwargs) → None

Prepare data for stratified randomization.

Parameters:
  • df (pd.DataFrame) – DataFrame containing geographic units and stratification variables.

  • geo_col (str) – The name of the column with unique geographic identifiers.

  • strat_vars (list of str) – A list of column names to use for stratification.

design(df: DataFrame, geo_col: str, strat_vars: List[str]) → DataFrame

Assigns geographic units to groups using stratified randomization.

Parameters:
  • df (pd.DataFrame) – DataFrame containing geographic units and stratification variables.

  • geo_col (str) – The name of the column with unique geographic identifiers.

  • strat_vars (list of str) – A list of column names to use for stratification.

Returns:

The input DataFrame with ‘stratum’ and ‘assignment’ columns added.

Return type:

pd.DataFrame

Raises:

ValidationError – If input validation fails.
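To make the idea of stratified randomization concrete, here is a minimal standard-library sketch. It is not the geostep implementation. It assumes a single stratification variable (the real method accepts a list), rank-based equal-sized strata, and round-robin dealing of shuffled units within each stratum. The function names and the "geo" and "population" keys are hypothetical.

```python
import random
from collections import defaultdict


def assign_strata(units, strat_var, n_strata):
    """Rank units by one variable and cut the ranking into n_strata near-equal bins."""
    ranked = sorted(units, key=lambda u: u[strat_var])
    n = len(ranked)
    # The unit at rank i goes to stratum floor(i * n_strata / n).
    return {u["geo"]: (i * n_strata) // n for i, u in enumerate(ranked)}


def stratified_assignment(units, strat_var, n_strata=4, num_groups=2, seed=42):
    """Return copies of the unit dicts with 'stratum' and 'assignment' keys added."""
    rng = random.Random(seed)  # private generator, so the result depends only on seed
    strata = assign_strata(units, strat_var, n_strata)
    by_stratum = defaultdict(list)
    for u in units:
        by_stratum[strata[u["geo"]]].append(u["geo"])

    assignment = {}
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)
        # Deal shuffled units round-robin: group sizes differ by at most one per stratum.
        for i, geo in enumerate(members):
            assignment[geo] = i % num_groups

    return [
        dict(u, stratum=strata[u["geo"]], assignment=assignment[u["geo"]])
        for u in units
    ]


units = [{"geo": f"g{i:02d}", "population": 1000 * (i + 1)} for i in range(16)]
design = stratified_assignment(units, "population")
```

With 16 units and 4 strata, each stratum holds 4 units, split 2 and 2 between the groups, so the groups are balanced on the stratification variable by construction. Going by the signatures on this page, the corresponding call on the designer itself would be StratifiedRandomizationDesigner(num_groups=2, n_strata=4, seed=42).design(df, geo_col, strat_vars).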