Causal Inference and Market Experiment Design with MLSYNTH

Jared Greathouse

Introduction

Jared Amani Greathouse
- PhD Candidate, Public Policy, Georgia State University
- Advisor: Jason Coupet
- Specialization: Econometrics and Causal Inference
- Focus: Synthetic Control Methods, High-Dimensional Panel Data, Machine Learning for Treatment Effects

Classic Synthetic Control Model

The SCM estimator constructs a weighted combination of control units that approximates the treated unit's outcomes in the pre-treatment period. The weights solve:

\[ \mathbf{w}^{\mathrm{SCM}} = \underset{\mathbf{w} \in \mathcal{W}_{\mathrm{conv}}}{\operatorname*{argmin}} \; \left\| \mathbf{y}_1 - \mathbf{Y}_0 \mathbf{w} \right\|_2^2, \quad \mathcal{W}_{\mathrm{conv}} = \{\mathbf{w} \ge 0, \; \mathbf{1}^\top \mathbf{w} = 1 \} \]

  • Treated unit: \(\mathbf{y}_1\) (pre-treatment outcomes)
  • Donor/control units: \(\mathbf{Y}_0\) (pre-treatment outcomes, one column per donor)
  • Constraint: weights are non-negative and sum to 1

The synthetic control prediction for any time \(t\) is

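\[ \hat{y}_{1t}^{\mathrm{SCM}} = \sum_{j \in \mathcal{J}_0} w_j^{\mathrm{SCM}} \, y_{jt} \]

where \(\mathcal{J}_0\) is the set of donor units. Stacking over all periods gives \(\hat{\mathbf{y}}_1^{\mathrm{SCM}} = \mathbf{Y}_0 \mathbf{w}^{\mathrm{SCM}}\).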

Intuition: a weighted average of controls can resemble the treated unit better than any single control or the simple mean of all controls. Post-treatment gaps between the treated unit and its synthetic control estimate the causal effect.
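To make the optimization concrete, here is a minimal sketch of the classic SCM weight problem solved with SciPy's general-purpose SLSQP solver. This is not mlsynth's implementation; the function name `scm_weights` and the simulated data are hypothetical, used only to show the simplex-constrained least squares and how post-treatment gaps become effect estimates.

```python
import numpy as np
from scipy.optimize import minimize


def scm_weights(y1_pre: np.ndarray, Y0_pre: np.ndarray) -> np.ndarray:
    """Hypothetical helper: min ||y1 - Y0 w||^2 subject to w >= 0 and sum(w) = 1."""
    n_donors = Y0_pre.shape[1]

    def objective(w):
        r = y1_pre - Y0_pre @ w
        return r @ r

    def gradient(w):
        return -2.0 * Y0_pre.T @ (y1_pre - Y0_pre @ w)

    result = minimize(
        objective,
        x0=np.full(n_donors, 1.0 / n_donors),  # start at the simple donor mean
        jac=gradient,
        method="SLSQP",
        bounds=[(0.0, None)] * n_donors,  # non-negative weights
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # weights sum to 1
        options={"ftol": 1e-10, "maxiter": 500},
    )
    return result.x


# Simulated (not real) panel: 5 donors, 20 pre-treatment and 5 post-treatment periods.
rng = np.random.default_rng(42)
Y0 = rng.normal(size=(25, 5)).cumsum(axis=0)
true_w = np.array([0.6, 0.0, 0.4, 0.0, 0.0])
y1 = Y0 @ true_w
y1[20:] += 2.0  # hypothetical treatment effect of +2 from period 20 on

w = scm_weights(y1[:20], Y0[:20])  # fit on pre-treatment periods only
y1_hat = Y0 @ w  # synthetic control for every period
effect = y1[20:] - y1_hat[20:]  # post-treatment gaps estimate the effect
print(np.round(w, 3), np.round(effect, 2))
```

Because the simulated treated unit is an exact convex combination of donors, the recovered weights should sit close to `true_w`, and the post-treatment gaps should hover around the injected effect of 2.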

My Work

  • Developed mlsynth, the largest Python SCM package to date

What Makes MLSYNTH Special

  • Suite of tools for policy evaluation using panel data

  • Supports dozens of estimators, drawing on techniques ranging from matrix factorization to forward selection and proximal inference

  • Consolidates numerous SCMs previously scattered across software ecosystems (e.g., R or MATLAB) into Python under a single, unified syntax

  • Offers a simple syntax: users supply a pandas DataFrame with a unit column, a time column, a treatment indicator, and a numeric outcome

pip install -U git+https://github.com/jgreathouse9/mlsynth.git
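As a sketch of the input shape described above, the snippet builds a small long-format panel in pandas and runs a few sanity checks. The column names (`unit`, `time`, `treated`, `outcome`) and the toy values are hypothetical, and mlsynth's own configuration argument names are not reproduced here; the balanced-panel check is an assumption that holds for typical SCM workflows.

```python
import pandas as pd

# Toy long-format panel; column names are hypothetical placeholders.
df = pd.DataFrame({
    "unit": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "time": [1, 2, 3, 4] * 3,
    "treated": [0, 0, 1, 1] + [0] * 8,  # unit A is treated from period 3 on
    "outcome": [10.0, 11.0, 15.0, 16.0,
                9.5, 10.5, 11.0, 11.5,
                10.2, 11.1, 11.9, 12.4],
})

# Sanity checks on the panel structure.
assert pd.api.types.is_numeric_dtype(df["outcome"])  # numeric outcome
assert not df.duplicated(["unit", "time"]).any()  # one row per unit-period
assert df.groupby("unit")["time"].count().nunique() == 1  # balanced panel (assumed requirement)
assert set(df["treated"].unique()) <= {0, 1}  # binary treatment indicator

# Wide view (time x unit), the matrix form SCM-style estimators work with.
wide = df.pivot(index="time", columns="unit", values="outcome")
print(wide)
```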

Challenges of Standard Experiments

  • Randomized trials are often impractical at scale for many kinds of interventions
    • High coordination and implementation costs
    • Ethical or logistical constraints
  • Slow to iterate: each new design requires manual planning and coordination

Introducing MAREX in mlsynth

  • Abadie, Alberto, and Jinglong Zhao. (2025). Synthetic Controls for Experimental Design. https://arxiv.org/abs/2108.02196

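\[ \mathcal{L}_{\text{Base}}(\mathbf{w},\mathbf{v}) = \sum_{k=1}^K \Big( \mathbf{f}_{I_k}^\top \mathbf{1} \Big) \Big[ \underbrace{\|\bar{\mathbf{x}}_k - \mathbf{X}_{I_k}^\top \mathbf{w}_{I_k}\|_2^2}_{\text{selecting treated units}} + \underbrace{\|\bar{\mathbf{x}}_k - \mathbf{X}_{I_k}^\top \mathbf{v}_{I_k}\|_2^2}_{\text{selecting control units}} \Big]. \]

where \(K\) is the number of clusters, \(I_k\) indexes the units in cluster \(k\), \(\mathbf{f}_{I_k}\) holds their population weights (so \(\mathbf{f}_{I_k}^\top \mathbf{1}\) is cluster \(k\)'s total weight), \(\mathbf{X}_{I_k}\) stacks their characteristics (e.g., pre-treatment outcomes), \(\bar{\mathbf{x}}_k\) is the cluster mean, and \(\mathbf{w}_{I_k}\) and \(\mathbf{v}_{I_k}\) are the treated and control weights within the cluster.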

  • Selects both treated and control units simultaneously

  • Enforces cluster structure (if specified), budgets, and cardinality constraints

  • Penalizes units far from cluster means to ensure representativeness
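A minimal sketch of evaluating \(\mathcal{L}_{\text{Base}}\) for given candidate weights may help fix the notation. It only scores a design; it does not search for one, and it is not MAREX's internal code. The function name `marex_base_loss` is hypothetical, and it assumes \(\bar{\mathbf{x}}_k\) is the population-weighted mean of the cluster's characteristics and that \(\mathbf{w}\) and \(\mathbf{v}\) each sum to 1 within every cluster.

```python
import numpy as np


def marex_base_loss(X, f, clusters, w, v):
    """Hypothetical helper that evaluates L_Base(w, v).

    X: (n_units, n_features) characteristics; f: (n_units,) population weights;
    clusters: (n_units,) integer cluster labels;
    w, v: (n_units,) treated / control weights, assumed to sum to 1 within each cluster.
    """
    total = 0.0
    for k in np.unique(clusters):
        idx = clusters == k
        f_k = f[idx]
        mass = f_k.sum()  # f_{I_k}^T 1
        x_bar = (f_k @ X[idx]) / mass  # assumed: population-weighted cluster mean
        treat_gap = x_bar - X[idx].T @ w[idx]  # x̄_k - X_{I_k}^T w_{I_k}
        ctrl_gap = x_bar - X[idx].T @ v[idx]  # x̄_k - X_{I_k}^T v_{I_k}
        total += mass * (treat_gap @ treat_gap + ctrl_gap @ ctrl_gap)
    return total


# Four units in two clusters, one feature, equal population weights.
X = np.array([[0.0], [2.0], [10.0], [14.0]])
f = np.full(4, 0.25)
clusters = np.array([0, 0, 1, 1])
w = np.array([1.0, 0.0, 1.0, 0.0])  # treat unit 0 in cluster 0, unit 2 in cluster 1
v = np.array([0.0, 1.0, 0.0, 1.0])  # controls: unit 1 and unit 3
loss = marex_base_loss(X, f, clusters, w, v)
print(loss)  # → 5.0
```

Here cluster 0 contributes 0.5 × (1 + 1) = 1 and cluster 1 contributes 0.5 × (4 + 4) = 4, so the larger, more dispersed cluster dominates the loss.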

MAREX vs. Standard Experiment

| Feature | Standard Experiment | MAREX |
| --- | --- | --- |
| Validity in small/clustered marketing tests | Often compromised | Maintains validity via synthetic control optimization |
| Budget-aware | Sometimes requires ad hoc adjustments | Built into the optimization |
| Cluster-aware | Possible but may reduce power | Enforced and balanced by design |
| Ethical / feasibility constraints | Manual adjustments | Integrated constraints |
| Scenario testing / iteration | Limited by randomization and manual setup | Overnight simulations with multiple configurations |
| Representativeness of treated/control | Needs careful design, can fail | Ensured via cluster-weighted synthetic controls |

An Example

Suppose we have 21 units and wish to roll out an intervention that we cannot randomize (say, changes to closing times or work hours).
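To illustrate the design problem at this scale, here is a deliberately simplified, hypothetical variant of the base objective: a single cluster, equal population weights, and equal weights of 1/3 on each selected unit, solved by exhaustive search rather than by MAREX's optimizer. The outcomes are simulated, not real data. Because the objective is separable across arms, every candidate arm of 3 units is scored once, and then the best pair of non-overlapping arms is found.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Simulated pre-treatment outcomes: 21 units x 12 periods (unit level + noise).
n_units, n_periods, m = 21, 12, 3
X = rng.normal(loc=100.0, scale=10.0, size=(n_units, 1)) + rng.normal(
    scale=2.0, size=(n_units, n_periods)
)

# Equal population weights f_j = 1/21, so the target is the simple mean path.
x_bar = X.mean(axis=0)

# Enumerate every arm of m units; each arm averages its members equally.
subsets = list(itertools.combinations(range(n_units), m))
masks = np.array([sum(1 << j for j in s) for s in subsets], dtype=np.int64)
arm_means = np.array([X[list(s)].mean(axis=0) for s in subsets])
loss = ((x_bar - arm_means) ** 2).sum(axis=1)  # ||x_bar - arm mean||^2 per arm

# Score all (treated, control) pairs; forbid arms that share a unit.
total = loss[:, None] + loss[None, :]
overlap = (masks[:, None] & masks[None, :]) != 0
total[overlap] = np.inf

i, j = np.unravel_index(np.argmin(total), total.shape)
treated, control = subsets[i], subsets[j]
print("treated:", treated, "control:", control, "loss:", round(float(total[i, j]), 3))
```

With equal weights the two arms are interchangeable, so which one is labeled "treated" is arbitrary here; the full method also allows unequal weights within each arm, clusters, and budget constraints, which this brute-force sketch omits.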

Takeaway

  • MAREX is a planning tool for experiments where randomization is expensive, infeasible, or restricted
  • Makes representative, feasible, and comparable treatment assignments
  • Analysts can simulate, optimize, and justify experimental decisions before implementation
  • Bridges econometric theory and practical business experiment design