Causal Inference and Market Experiment Design with MLSYNTH

Jared Greathouse

Introduction

Jared Amani Greathouse
- PhD Candidate, Public Policy, Georgia State University
- Advisor: Jason Coupet
- Specialization: Econometrics and Causal Inference
- Focus: Synthetic Control Methods, High-Dimensional Panel Data, Machine Learning for Treatment Effects

Classic Synthetic Control Model

The SCM estimator constructs a weighted combination of control units to approximate the treated unit pre-treatment. It solves:

\[ \mathbf{w}^{\mathrm{SCM}} = \underset{\mathbf{w} \in \mathcal{W}_{\mathrm{conv}}}{\operatorname*{argmin}} \; \left\| \mathbf{y}_1 - \mathbf{Y}_0 \mathbf{w} \right\|_2^2, \quad \mathcal{W}_{\mathrm{conv}} = \{\mathbf{w} \ge 0, \; \mathbf{1}^\top \mathbf{w} = 1 \} \]

Treated unit: \(\mathbf{y}_1\)
Donor/control units: \(\mathbf{Y}_0\)
Constraint: weights are non-negative and sum to 1

The synthetic control prediction for any time \(t\) is

\[ \hat{\mathbf{y}}_1^{\mathrm{SCM}} = \mathbf{Y}_0 \mathbf{w}^{\mathrm{SCM}} = \sum_{j \in \mathcal{J}_0} w_j^{\mathrm{SCM}} \, y_{jt} \]

Intuition: A weighted average of controls can resemble the treated unit better than any single control or the simple mean of all controls. Deviations post-treatment estimate the causal effect.

My Work

Developed mlsynth, the largest Python SCM package to date

What Makes MLSYNTH Special

Suite of tools for policy evaluation using panel data
Supports dozens of estimators, using techniques from matrix factorization methods to forward selection and proximal inference methods
Consolidates numerous SCMs across multiple software ecosystems into Python with one singular syntax (e.g., R or MATLAB).

pip install -U git+https://github.com/jgreathouse9/mlsynth.git

Offers a simple syntax where uses have a treatment indicator, and unit and time column, and a numerioc outcome in a pandas dataframe.

Challenges of Standard Experiments

Randomized trials are often impractical at scale for many kinds of interventions
- High coordination and costs
- Ethical or logistical constraints
Slow to iterate: each new design requires manual planning and coordination

Introducing MAREX of `mlsynth`

Abadie, Alberto, and Jinglong Zhao. (2025). Synthetic Controls for Experimental Design. https://arxiv.org/abs/2108.02196

\[ \mathcal{L}_{\text{Base}}(\mathbf{w},\mathbf{v}) = \sum_{k=1}^K \Big( \mathbf{f}_{I_k}^\top \mathbf{1} \Big) \Big[ \underbrace{\|\bar{\mathbf{x}}_k - \mathbf{X}_{I_k}^\top \mathbf{w}_{I_k}\|_2^2}_{\text{selecting treated units}} + \underbrace{\|\bar{\mathbf{x}}_k - \mathbf{X}_{I_k}^\top \mathbf{v}_{I_k}\|_2^2}_{\text{selecting control units}} \Big]. \]

Selects both treated and control units simultaneously
Enforces cluster structure (if specified), budgets, and cardinality constraints
Penalizes units far from cluster means to ensure representativeness

MAREX vs. Standard Experiment

Feature	Standard Experiment	MAREX
Validity in small/clustered marketing tests	Often compromised	Maintains validity via synthetic control optimization
Budget-aware	Sometimes requires ad hoc adjustments	Built-in into optimization
Cluster-aware	Possible but may reduce power	Enforced and balanced by design
Ethical / feasibility constraints	Manual adjustments	Integrated constraints
Scenario testing / iteration	Limited by randomization and manual setup	Overnight simulations with multiple configurations
Representativeness of treated/control	Needs careful design, can fail	Ensured via cluster-weighted synthetic controls

An Example

Suppose we have 21 units. We wish to roll out an intervention that we cannot randomize (say, differences in closing times/work hours)

Takeaway

MAREX is a planning tool for experiments where randomization is expensive, infeasible, or restricted
Makes representative, feasible, and comparable treatment assignments
Analysts can simulate, optimize, and justify experimental decisions before implementation
Bridges econometrics theory and practical business experiment design