Shake it to the Max? Using the \(\ell_\infty\) norm for Synthetic Control Methods
Machine Learning
Econometrics
Author
Jared Greathouse
Published
February 13, 2026
Regularization in synthetic control methods has become an important econometric topic in recent years. Let \(\mathbf{y}_1 \in \mathbb{R}^{T_0}\) denote the pre-treatment outcomes for the treated unit and let \(\mathbf{Y}_0 \in \mathbb{R}^{T_0 \times |\mathcal{N}_0|}\) denote the corresponding donor matrix. In the most general terms, an SCM is a form of convex optimization in which we use a set of donor units that were not exposed to a treatment to predict how the outcomes of a single target unit (or set of target units) would have evolved without the treatment. In full generality, a synthetic control estimator solves the following family of programs:

\[
\min_{\mathbf{w} \in \mathcal{C}} \; \mathcal{L}\!\left(\mathbf{y}_1 - \mathbf{Y}_0 \mathbf{w}\right) + \mathcal{P}(\mathbf{w})
\quad \text{subject to} \quad
\mathcal{B}(\mathbf{w}) \le \boldsymbol{\tau}.
\]
Here \(\mathcal{L}(\cdot)\) denotes a data-dependent loss function governing pre-treatment fit, \(\mathcal{P}(\cdot)\) denotes a regularization or geometry-inducing penalty on the weights, and \(\mathcal{C}\) denotes a convex admissible set for the donor weights. The operator \(\mathcal{B}(\cdot)\) encodes balance or moment conditions, and \(\boldsymbol{\tau}\) controls the degree of relaxation. Either \(\mathcal{L}\) or \(\mathcal{B}\) may be identically zero, but not both. The classical synthetic control estimator of Abadie, Diamond, and Hainmueller is obtained by setting

\[
\mathcal{L}(\mathbf{u}) = \lVert \mathbf{u} \rVert_2^2, \qquad
\mathcal{P} \equiv 0, \qquad
\mathcal{B} \equiv 0, \qquad
\mathcal{C} = \left\{ \mathbf{w} \in \mathbb{R}^{|\mathcal{N}_0|} : \mathbf{w} \ge 0,\ \mathbf{1}^{\top} \mathbf{w} = 1 \right\}.
\]
In this case, balance is enforced entirely through the objective function.
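To fix ideas, here is a minimal CVXPY sketch of this classical program on toy data. The data-generating process below is purely illustrative and is not part of the original post:

```python
import cvxpy as cp
import numpy as np

# Purely illustrative toy data: T0 pre-treatment periods, 5 donors
rng = np.random.default_rng(0)
T0, J = 20, 5
Y0 = rng.normal(size=(T0, J))                      # donor outcomes
y1 = Y0 @ np.array([0.6, 0.4, 0.0, 0.0, 0.0]) \
     + rng.normal(scale=0.1, size=T0)              # treated unit's outcomes

# Classical SCM: squared-error loss, no penalty, simplex constraint set
w = cp.Variable(J)
problem = cp.Problem(
    cp.Minimize(cp.sum_squares(y1 - Y0 @ w)),      # balance enforced via the objective
    [w >= 0, cp.sum(w) == 1],                      # convex-hull (simplex) weights
)
problem.solve()
print(np.round(w.value, 3))                        # mass concentrates near (0.6, 0.4)
```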
Regularized Synthetic Control
Analysts have developed formulations of synthetic control that simultaneously account for level differences and control the geometry of the donor weights. To allow for level differences, we augment the donor matrix with an intercept term:

\[
\mathbf{y}_1 \approx b_0 \mathbf{1} + \mathbf{Y}_0 \mathbf{w},
\]

where \(b_0 \in \mathbb{R}\) captures an additive baseline shift. Regularization of the coefficients is also an important issue. Note that all penalties below are applied to \(\mathbf{w}\), while \(b_0\) is left unpenalized. For a graphical comparison of the relevant norm geometries, see the plot produced by the following code:
```python
import numpy as np
import matplotlib.pyplot as plt

# Grid for contour plots
x = np.linspace(-1.2, 1.2, 400)
y = np.linspace(-1.2, 1.2, 400)
X, Y = np.meshgrid(x, y)

# Norms
L1 = np.abs(X) + np.abs(Y)
L2 = np.sqrt(X**2 + Y**2)
Linf = np.maximum(np.abs(X), np.abs(Y))

plt.figure(figsize=(6, 6))

# Plot unit balls
plt.contour(X, Y, L1, levels=[1], colors="k", linestyles="dashed")
plt.contour(X, Y, L2, levels=[1], colors="blue")
plt.contour(X, Y, Linf, levels=[1], colors="red")

# Axes and aesthetics
plt.axhline(0)
plt.axvline(0)
plt.gca().set_aspect("equal", adjustable="box")
plt.xlim(-1.2, 1.2)
plt.ylim(-1.2, 1.2)
plt.title("2D Geometry of $\\ell_1$, $\\ell_2$, and $\\ell_\\infty$ Norms")
plt.xlabel("$w_1$")
plt.ylabel("$w_2$")
plt.show()
```
A common choice is the elastic net penalty, which encompasses a wide class of regularized SC estimators. These fit into the general framework by setting the balance operator \(\mathcal{B} \equiv 0\) (enforcing fit entirely through the objective, as in classical SCM), using a squared-error loss \(\mathcal{L}\) for pre-treatment fit, introducing a non-zero penalty \(\mathcal{P}\) to control weight geometry, and often relaxing the admissible set \(\mathcal{C}\) to allow greater flexibility (e.g., negative weights and no sum-to-one constraint, though variants may retain non-negativity for interpretability).
Concretely, the elastic net SC estimator solves

\[
\min_{\mathbf{w},\, b_0} \; \frac{1}{2} \left\lVert \mathbf{y}_1 - b_0 \mathbf{1} - \mathbf{Y}_0 \mathbf{w} \right\rVert_2^2
+ \lambda \left[ \alpha \lVert \mathbf{w} \rVert_1 + (1 - \alpha) \lVert \mathbf{w} \rVert_2^2 \right],
\]
with \(\mathcal{C} = \left\{ \mathbf{w} \in \mathbb{R}^{|\mathcal{N}_0|}, b_0 \in \mathbb{R} \right\}\) (relaxing non-negativity and sum-to-one). The \(\ell_1\) term encourages sparsity (SC supported on few donors). Intermediate \(\alpha \in (0,1)\) yields the elastic net; special cases are \(\alpha = 1\) (pure LASSO) and \(\alpha = 0\) (pure Ridge).
Alternatively, as studied by Wang, Xing, and Ye (2025), one may replace the \(\ell_2\) term with the \(\ell_\infty\) norm:

\[
\min_{\mathbf{w},\, b_0} \; \frac{1}{2} \left\lVert \mathbf{y}_1 - b_0 \mathbf{1} - \mathbf{Y}_0 \mathbf{w} \right\rVert_2^2
+ \lambda \left[ \alpha \lVert \mathbf{w} \rVert_1 + (1 - \alpha) \lVert \mathbf{w} \rVert_\infty \right],
\]
again with \(\mathcal{C} = \left\{ \mathbf{w} \in \mathbb{R}^{|\mathcal{N}_0|}, b_0 \in \mathbb{R} \right\}\). The \(\ell_\infty\) component caps the maximum absolute weight, producing a “balanced sparsity” effect: a small number of donors may be selected, but none dominates. When using the \(\ell_\infty\) variant of the elastic net, \(\alpha = 0\) corresponds to the pure max-norm penalty.
Geometrically, the \(\ell_1\)–\(\ell_\infty\) penalty replaces the circular ridge ball with a hyper-rectangular region, reflecting the analyst’s preference for bounding donor influence rather than merely smoothing it. These special cases highlight how the elastic net family interpolates continuously between sparsity and smoothness (or maximum weight control). Within the general framework, the optimization problem becomes:

\[
\min_{\mathbf{w},\, b_0} \; \frac{1}{2} \left\lVert \mathbf{y}_1 - b_0 \mathbf{1} - \mathbf{Y}_0 \mathbf{w} \right\rVert_2^2
+ \lambda \left[ \alpha \lVert \mathbf{w} \rVert_1 + (1 - \alpha) \lVert \mathbf{w} \rVert_q \right],
\qquad q \in \{2, \infty\},
\]
where \(q = 2\) recovers the standard elastic net (with \(\lVert \mathbf{w} \rVert_2^2\)) and \(q = \infty\) recovers the max-norm variant. Conceptually, the \(\ell_2\) term spreads weights smoothly while the \(\ell_\infty\) term limits any single donor’s dominance. This is analogous to a portfolio with position limits: no single donor can dominate the synthetic control, while overall weight distribution can be controlled via \(\alpha\) and \(\lambda\).
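A CVXPY sketch of this penalized family follows. The function name and interface are mine for illustration, not mlsynth’s internals:

```python
import cvxpy as cp

def penalized_sc(y1, Y0, lam, alpha, q="inf"):
    """L1 + secondary-norm penalized SC with an unpenalized intercept.

    q="inf" gives the max-norm variant; q=2 gives the standard elastic
    net (with the squared L2 norm). A sketch, not mlsynth's solver.
    """
    T0, J = Y0.shape
    w = cp.Variable(J)
    b0 = cp.Variable()                       # intercept, left unpenalized
    fit = 0.5 * cp.sum_squares(y1 - b0 - Y0 @ w)
    secondary = cp.norm(w, "inf") if q == "inf" else cp.sum_squares(w)
    penalty = lam * (alpha * cp.norm(w, 1) + (1 - alpha) * secondary)
    cp.Problem(cp.Minimize(fit + penalty)).solve()
    return w.value, b0.value
```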
In practice, this can be important: in the original 2003 SCM paper, the donors Catalonia and Madrid received weights of 0.8508 and 0.1492, respectively. While this allocation makes sense economically, there are settings where analysts may wish to reduce the influence of any single donor.
A Relaxed Balanced Approach
Other approaches to mitigating high-dimensionality are possible. Liao, Shi, and Zheng (2025) introduce a relaxation of the fit conditions imposed by the elastic net estimators above. Here, the loss is set to zero, a penalty is placed on the weights, and fit is enforced via a constraint:

\[
\min_{\mathbf{w} \in \mathcal{C},\ \gamma \in \mathbb{R}} \; \lVert \mathbf{w} \rVert_2^2
\quad \text{subject to} \quad
\left\lVert \frac{1}{T_0} \mathbf{Y}_0^{\top} \left( \mathbf{y}_1 - \mathbf{Y}_0 \mathbf{w} \right) - \gamma \mathbf{1} \right\rVert_\infty \le \tau.
\]
The slack variable \(\gamma\) shifts all donor projections uniformly and is estimated jointly with \(\mathbf{w}\), allowing small, evenly distributed violations when exact pre-treatment matching is infeasible. Alternative penalties can also be used: negative entropy,

\[
\mathcal{P}(\mathbf{w}) = \sum_{j \in \mathcal{N}_0} w_j \log w_j,
\]

discourages zero weights. In all cases, the constraint enforces relaxed balance, while the penalty governs the weight structure.
This contrasts with classical SCM and elastic net approaches. There, fit is minimized in the objective and the weights absorb all discrepancies. In the relaxed balance method, fit is a constraint, the objective imposes an \(\ell_2\) (or other) penalty on the weights, and \(\gamma\) allows controlled relaxation—analogous to portfolio optimization with position limits, where constraints cap exposure to any single asset while the objective encourages diversification or avoidance of zero positions. This separation is particularly useful in high-dimensional donor pools or when robustness to over-reliance on any single donor is desired.
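The following CVXPY sketch implements the relaxed-balance idea as described above, with simplex weights; the exact constraint normalization in Liao, Shi, and Zheng (2025) may differ from what I assume here:

```python
import cvxpy as cp

def relaxed_balance_sc(y1, Y0, tau, penalty="l2"):
    """Relaxed-balance SC: penalty in the objective, fit as a constraint.

    Sketch consistent with the description in the text; the precise
    scaling in Liao, Shi, and Zheng (2025) may differ.
    """
    T0, J = Y0.shape
    w = cp.Variable(J)
    gamma = cp.Variable()                    # uniform slack on donor projections
    if penalty == "l2":
        obj = cp.sum_squares(w)              # encourages dispersed weights
    else:
        obj = -cp.sum(cp.entr(w))            # negative entropy: discourages zeros
    cons = [
        w >= 0, cp.sum(w) == 1,              # simplex weights
        cp.norm(Y0.T @ (y1 - Y0 @ w) / T0 - gamma, "inf") <= tau,
    ]
    cp.Problem(cp.Minimize(obj), cons).solve()
    return w.value
```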
An Example in mlsynth
As usual, these methods may be implemented in mlsynth, using the RESCM class. To install mlsynth, you must have Git on your machine and run the following from the command line or within your virtual Python environment:
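```
# Repository path assumed to be the canonical mlsynth repo; adjust if it differs
pip install -U git+https://github.com/jgreathouse9/mlsynth.git
```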
To run the model, users provide a panel data frame along with the outcome variable, a treatment indicator, a unit identifier, and a time variable. Optional configuration includes whether to display plots, whether to save results, and the colors used for the treated and counterfactual series.
Modeling options are controlled via the models_to_run dictionary, where a boolean run key specifies whether a model is estimated. Relaxed SCM estimators are specified by the "RELAXED" key, with the type of relaxation chosen through the relaxation parameter. Options include l2 for standard Euclidean relaxation, entropy for entropy-based relaxation, and el for empirical-likelihood relaxation. The relaxation strength is controlled by the tau parameter, which can be provided explicitly (in which case no cross-validation is performed) or selected via cross-validation over a grid of candidate values. The number of candidate taus is controlled by n_taus, and the number of cross-validation folds by n_splits.
Elastic Net SCM estimators are specified by the "ELASTIC" key and combine an \(\ell_1\) penalty with either an \(\ell_2\) or an \(\ell_\infty\) penalty on the donor weights via enet_type="L1_L2" or enet_type="L1_INF". The alpha parameter controls the mixture between the \(\ell_1\) term and the second norm, while lambda controls the overall penalty strength. If lambda is zero, no cross-validation is performed. If lambda is provided but alpha is not, cross-validation is performed over alpha, and vice versa. An optional intercept can be added via fit_intercept.
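For concreteness, here is a sketch of what such a configuration might look like, using the key names described above; the numeric values are placeholders of mine, not mlsynth’s documented defaults:

```python
# Hypothetical configuration sketch; key names follow the text above
models_to_run = {
    "RELAXED": {
        "run": True,
        "relaxation": "l2",     # "l2", "entropy", or "el"
        "tau": None,            # None -> tau selected by cross-validation
        "n_taus": 50,           # size of the candidate tau grid (placeholder)
        "n_splits": 4,          # number of cross-validation folds
    },
    "ELASTIC": {
        "run": True,
        "enet_type": "L1_INF",  # "L1_L2" for the l1 + squared-l2 variant
        "alpha": None,          # None -> alpha selected by cross-validation
        "lambda": 0.1,          # overall penalty strength (placeholder)
        "fit_intercept": True,
    },
}
```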
The feasible set for the donor weights is specified via the constraint_type parameter. Users may select unit, simplex, affine, nonneg, or unconstrained. These are defined as follows:
unit: restricts each weight to the unit interval independently. This forces the synthetic control to be a strict interpolation with bounded contributions: no single donor can contribute more than 100%. It is the most conservative option, completely ruling out extrapolation while providing highly interpretable, share-like weights.

simplex: the classical constraint requires non-negative weights that sum to one. The synthetic control is therefore a convex combination of the donors, lying inside or on the boundary of their convex hull. This setting offers excellent interpretability (weights resemble portfolio shares or probabilities) and prohibits any extrapolation, which is why it remains the default choice in most applied synthetic control studies.

affine: relaxes non-negativity but retains the sum-to-one requirement. Negative weights are allowed, meaning the synthetic control can lie outside the convex hull while staying within the affine subspace spanned by the donors. This introduces controlled extrapolation (useful when the treated unit has a different level or trend but shares parallel paths), at the cost of reduced interpretability, since negative weights imply “subtracting” the contribution of certain donors.

nonneg: permits any non-negative weights without normalization. The synthetic control can lie far outside the convex hull along positive directions (conic extrapolation), which can substantially improve pre-treatment fit in some cases, though the lack of scaling makes the weights harder to interpret economically.

unconstrained: imposes no restrictions whatsoever, allowing signed weights of any magnitude. This grants maximum flexibility and typically yields the best pre-treatment fit (as in most elastic-net or machine-learning-style SCMs), but it also permits unrestricted extrapolation and offers the lowest interpretability, with weights often lacking direct economic meaning.
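In set notation, these options correspond to:

\[
\begin{aligned}
\texttt{unit}:\ & \mathcal{C} = \left\{ \mathbf{w} : 0 \le w_j \le 1 \ \text{for all } j \right\} \\
\texttt{simplex}:\ & \mathcal{C} = \left\{ \mathbf{w} : w_j \ge 0,\ \textstyle\sum_j w_j = 1 \right\} \\
\texttt{affine}:\ & \mathcal{C} = \left\{ \mathbf{w} : \textstyle\sum_j w_j = 1 \right\} \\
\texttt{nonneg}:\ & \mathcal{C} = \left\{ \mathbf{w} : w_j \ge 0 \ \text{for all } j \right\} \\
\texttt{unconstrained}:\ & \mathcal{C} = \mathbb{R}^{|\mathcal{N}_0|}
\end{aligned}
\]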
Relaxation methods like entropy and el impose simplex weights by definition. Once the configuration is set, calling RESCM(config).fit() estimates the selected models on the pre-treatment period and produces counterfactual predictions for both pre- and post-treatment periods. The output is an EstimatorResults object containing separate results for relaxed and elastic SCMs (depending on what the user specifies), including donor weights, time series of observed and counterfactual outcomes, and fit diagnostics such as pre- and post-treatment RMSE. To date, this is the most flexible class of estimators mlsynth provides.
Application
Now for an empirical application to the Basque Country. First, I load the data and set up the model.
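A sketch of that setup is below. The file path and column names are placeholders standing in for the canonical Abadie and Gardeazabal (2003) Basque panel, and the config keys follow the description in the previous section:

```python
import pandas as pd
from mlsynth import RESCM

# Placeholder path: the Basque Country panel in long format,
# one row per region-year
df = pd.read_csv("basque_panel.csv")

config = {
    "df": df,
    "outcome": "gdpcap",        # per-capita GDP (placeholder column name)
    "treat": "treated",         # 1 for the Basque Country after treatment
    "unitid": "regionname",
    "time": "year",
    "display_graphs": True,
    "models_to_run": {
        "RELAXED": {"run": True, "relaxation": "l2", "n_splits": 4},
        "ELASTIC": {"run": True, "enet_type": "L1_INF",
                    "constraint_type": "affine", "fit_intercept": True},
    },
}

results = RESCM(config).fit()   # EstimatorResults: weights, series, RMSEs
```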
/opt/hostedtoolcache/Python/3.13.12/x64/lib/python3.13/site-packages/mlsynth/utils/optutils.py:174: UserWarning: Solution may be inaccurate. Try another solver, adjusting the solver settings, or solve with verbose=True for more information.
The synthetic control hyper-parameters (tau, alpha, and lambda) were tuned via time-series cross-validation. For the elastic net methods, two variants were considered: \(\alpha \ell_1 + (1-\alpha) \ell_\infty\) and \(\alpha \ell_1 + (1-\alpha) \ell_2\). Both employed 4-fold time-series cross-validation with standardized donor predictors. As discussed above, for the elastic net models the alpha parameter governs the trade-off between sparsity and the secondary norm. For both variants, the selected alpha was approximately 0.554, while the lambda parameter controlling overall regularization strength was around 0.084 for the \(\ell_1+\ell_\infty\) variant and 0.128 for the \(\ell_1+\ell_2\) variant. I used affine weights for both of these. For the relaxed balanced synthetic control variants (\(\ell_2\) relaxation, empirical likelihood, and entropy relaxation), the slack parameter \(\tau\) was set to approximately 0.00262 for all three. This small value enforces tight balance, encouraging the synthetic control to match pre-treatment outcomes closely while allowing minimal flexibility to stabilize the optimization and prevent extreme weight concentration. \(\tau\) was tuned by the same cross-validation method as above, and you can see my Python code for the sklearn details.
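For readers who want the flavor of that tuning loop, here is an illustrative time-series cross-validation sketch using sklearn’s TimeSeriesSplit. The helper fit_fn is a stand-in of mine for any of the estimators above; mlsynth’s internal tuner may differ in its scaling, scoring, and grids:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.preprocessing import StandardScaler

def select_tau(y1, Y0, taus, fit_fn, n_splits=4):
    """Pick tau by time-series CV on the pre-treatment data.

    Illustrative only: fit_fn(y, Y, tau) -> donor weights.
    """
    tscv = TimeSeriesSplit(n_splits=n_splits)
    scores = []
    for tau in taus:
        fold_rmse = []
        for train, val in tscv.split(Y0):
            scaler = StandardScaler().fit(Y0[train])      # standardize donors
            w = fit_fn(y1[train], scaler.transform(Y0[train]), tau)
            resid = y1[val] - scaler.transform(Y0[val]) @ w
            fold_rmse.append(float(np.sqrt(np.mean(resid ** 2))))
        scores.append(np.mean(fold_rmse))
    return taus[int(np.argmin(scores))]                   # best tau on average
```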
Across all model variants (elastic net and relaxed balanced synthetic control alike), the donor weights are highly stable in terms of which regions matter, even though the regularization structure changes how weight mass is distributed. In particular, Catalonia consistently receives the largest, or among the largest, weights, regardless of whether the method emphasizes sparsity (\(\ell_1\)-based elastic nets), smoothness (\(\ell_2\) relaxation), or dispersion (entropy relaxation). This is intuitive: Catalonia shares a closely related political and economic trajectory with the Basque Country, including early industrialization, strong regional identity, and similar exposure to national and international economic forces. In the outcome space, it is therefore unsurprising that Catalonia emerges as the closest synthetic analog.

Beyond Catalonia, a recurring set of regions (La Rioja, Cantabria, and Navarre) also receive nontrivial weights across specifications. These regions are literal geographic neighbors of the Basque Country, sharing labor markets, trade linkages, and historical development patterns. Their repeated appearance across models suggests that geographic proximity aligns well with similarity in pre-treatment outcomes, reinforcing the idea that the optimization is recovering economically meaningful structure rather than exploiting numerical artifacts.

Aragon also consistently enters the synthetic control, despite not bordering the Basque Country directly. Its location in northern Spain and its intermediate economic profile make it a plausible contributor, particularly under methods that allow moderate dispersion of weights. Finally, Madrid receives a positive, though typically smaller, weight across most specifications. While geographically distant, Madrid’s role as the national capital and its high level of wealth and development make it a natural partial comparator, especially in models that permit signed or less restrictive weighting schemes.
In the original SC solution, the donor weights were highly concentrated, with weights of 0.8508 and 0.1492 on Catalonia and Madrid, respectively. This extreme concentration is a well-known feature of canonical SCM: when the feasible set is restricted to the simplex and no explicit regularization is imposed, the optimizer is free to place nearly all mass on the single donor that best matches the treated unit in the pre-treatment outcome space. In this sense, the original weights represent a corner solution driven almost entirely by fit, with little incentive to distribute weight more broadly across similar regions.

Introducing a max-norm-based component (either explicitly via an \(\ell_\infty\) penalty or implicitly through relaxed balance constraints) fundamentally changes this behavior. The \(\ell_\infty\) norm directly penalizes the largest coefficient, which has the effect of tempering the size of dominant weights. Rather than allowing Catalonia to absorb nearly all the mass, the optimization trades a small amount of fit for a substantial reduction in the maximum weight. As a result, weight is redistributed toward nearby or economically similar regions (such as La Rioja, Cantabria, Navarre, and Aragon) while still preserving Catalonia’s central role in the counterfactual. Importantly, this redistribution is not arbitrary: it occurs along economically meaningful dimensions already latent in the data.

This tempering effect operates in both the fit space and the balance space. In the fit space, the max norm discourages over-reliance on a single donor to track pre-treatment outcomes perfectly, leading to a smoother approximation that averages across multiple close matches. In the balance space, the \(\ell_\infty\) constraint limits the extent to which any single donor can shoulder the burden of matching outcomes, thereby encouraging broader participation in satisfying the balance conditions. The result is a synthetic control that is less brittle and less sensitive to idiosyncratic noise in any one donor unit.
Crucially, this moderation does not overturn the substantive story. Catalonia remains the dominant contributor across specifications, reflecting its strong historical, political, and economic similarity to the Basque Country. Madrid continues to receive a positive but secondary weight, consistent with its role as the national capital and a benchmark for economic development. What changes under the max norm is not who matters, but how much any single region is allowed to matter. In this way, the authors frame their respective \(\ell_\infty\)-norm structures as principled mechanisms for controlling the concentration of weights in high-dimensional settings, yielding weights that are more evenly distributed, more stable, and arguably more credible, while still preserving the core economic relationships identified by the original SCM.
Conclusion
As usual, you’re free to inspect my code for this methodology and give any feedback you wish. I have been busy lately, so I have not been able to develop mlsynth as much as I would like. Either way, this is a really interesting way to regularize synthetic control methods, and the RESCM class represents how I plan to work with these methods going forward.