MLSYNTH

A Machine Learning Synthetic Control Library for Python

This is the landing page for the mlsynth mini-ecosystem. Measured by the raw number of synthetic control estimators it implements, it is, to my knowledge, the largest open-source synthetic control library.

Contributors: Jared Greathouse

Status: Beta version released. Documentation and app are in active development.

Philosophy

The mlsynth library implements machine learning–based synthetic control estimators with a focus on simplicity and flexibility. Unlike other libraries that require juggling multiple methods for data prep, fitting, prediction, and plotting, mlsynth streamlines everything into a single .fit() call. All estimator options are passed through a configuration dictionary, allowing you to switch between models or classes with minimal code changes. For example,

# Forward SCM, Augmented Forward SCM, and Forward Difference-in-Differences

import pandas as pd
from mlsynth import FSCM, FDID

url = "https://raw.githubusercontent.com/jgreathouse9/mlsynth/refs/heads/main/basedata/smoking_data.csv"
data = pd.read_csv(url)

# base config
config = {
    "df": data,
    "outcome": data.columns[2],
    "treat": data.columns[-1],
    "unitid": data.columns[0],
    "time": data.columns[1],
    "display_graphs": True,
    "save": False,
    "counterfactual_color": ["red"]
}

# Forward SCM
arco = FSCM(config).fit()

# Augmented Forward SCM
config_aug = dict(config)
config_aug["use_augmented"] = True
arco_aug = FSCM(config_aug).fit()

# Forward Difference-in-Differences
fdid = FDID(config).fit()

runs two flavors of Forward Selected SCM (standard and augmented) as well as Forward DID.
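Because every option lives in one dictionary, variants can be built without mutating the base configuration. The plain-Python sketch below makes no mlsynth calls; placeholder_df is a hypothetical stand-in for the DataFrame, and only keys from the example above are used. It shows that dict(config), as in the example, and {**config, ...} both create a new top-level dictionary, while the copy remains shallow.

```python
# Hypothetical stand-in for the pandas DataFrame; any object shows the copy behavior.
placeholder_df = ["stand-in for a pandas DataFrame"]

# Base options, using only keys that appear in the example above.
config = {"df": placeholder_df, "display_graphs": True, "save": False}

# dict(config) (as in the example) and {**config, ...} each build a new top-level
# dict, so adding or overriding keys in a variant never changes the base config.
config_aug = dict(config)
config_aug["use_augmented"] = True
config_quiet = {**config, "display_graphs": False}

# Both copies are shallow: the DataFrame object itself is shared, not duplicated.
df_is_shared = config_aug["df"] is config["df"] and config_quiet["df"] is config["df"]
```

The practical consequence of the shallow copy is that editing the DataFrame in place through one configuration affects every configuration built from it.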

Furthermore, switching estimators requires essentially no change to the data structure or input requirements beyond editing the configuration dictionary. Each .fit() call returns a comprehensive BaseEstimatorResults object containing treatment effects, fit diagnostics (e.g., RMSE, R-squared), time-series data, and (where applicable) donor weights, ready for analysis or visualization.
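The fit diagnostics listed above have standard definitions. The plain-Python sketch below shows what they typically measure for a single treated unit. It does not call mlsynth, the function fit_diagnostics and its arguments are hypothetical, and mlsynth's own formulas (for example, whether R-squared uses pre-treatment periods only) may differ.

```python
import math


def fit_diagnostics(observed, counterfactual, first_treated):
    """Hypothetical helper: summarize how well a synthetic unit fits a treated unit.

    observed, counterfactual: equal-length sequences ordered by time.
    first_treated: index of the first post-treatment period.
    """
    if len(observed) != len(counterfactual):
        raise ValueError("series must have equal length")
    if not 0 < first_treated < len(observed):
        raise ValueError("need at least one pre- and one post-treatment period")

    # Period-by-period treatment gap: observed minus counterfactual.
    gaps = [y - y_hat for y, y_hat in zip(observed, counterfactual)]
    pre_obs, pre_gaps = observed[:first_treated], gaps[:first_treated]
    post_gaps = gaps[first_treated:]

    # Pre-treatment RMSE: how closely the synthetic unit tracks the treated unit.
    ssr = sum(g * g for g in pre_gaps)
    rmse = math.sqrt(ssr / len(pre_gaps))

    # Pre-treatment R-squared: 1 - SSR / total sum of squares around the pre-period mean.
    mean_pre = sum(pre_obs) / len(pre_obs)
    sst = sum((y - mean_pre) ** 2 for y in pre_obs)
    r_squared = 1 - ssr / sst if sst > 0 else float("nan")

    # ATT: average post-treatment gap.
    att = sum(post_gaps) / len(post_gaps)

    return {"att": att, "pre_rmse": rmse, "pre_r2": r_squared, "gaps": gaps}


example = fit_diagnostics([10, 12, 14, 13, 11], [10, 11, 14, 15, 15], first_treated=3)
```

For this toy input, the sketch returns an ATT of -3.0, a pre-treatment RMSE of about 0.577, and a pre-treatment R-squared of 0.875.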

mlsynth is also customizable. For advanced users, each class lives in its own submodule and has its own configuration dictionary. This means that to add your own proprietary method, you need only add it to the base models, update __init__.py in your local copy or fork, and reuse the existing helpers/utils to build a brand-new method.
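The convention a new method would follow can be illustrated without touching mlsynth's internals. The classes below are hypothetical and invented for illustration: MeanDonorDID and SketchResults are not mlsynth's base models or its BaseEstimatorResults. The sketch only mirrors the pattern described above: all options arrive in one configuration dictionary, using the keys from the earlier example, and a single .fit() returns a results object. It implements plain difference-in-differences against an equal-weighted mean of never-treated donors, not Forward DID. It also assumes a long panel with one treated unit whose treat column is 1 in its post-treatment periods and 0 otherwise.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class SketchResults:
    """Hypothetical results container (not mlsynth's BaseEstimatorResults)."""

    att: float                 # average post-treatment effect
    counterfactual: pd.Series  # estimated untreated path, indexed by time
    donor_weights: dict = field(default_factory=dict)


class MeanDonorDID:
    """Hypothetical estimator: DID against the equal-weighted mean of never-treated units."""

    def __init__(self, config: dict):
        # Same keys as the mlsynth example above; everything comes from one dict.
        self.df = config["df"]
        self.outcome = config["outcome"]
        self.treat = config["treat"]
        self.unitid = config["unitid"]
        self.time = config["time"]

    def fit(self) -> SketchResults:
        # Reshape to wide format: one row per period, one column per unit.
        y_wide = self.df.pivot(index=self.time, columns=self.unitid, values=self.outcome)
        d_wide = self.df.pivot(index=self.time, columns=self.unitid, values=self.treat)

        # Assumption: exactly one unit is ever treated (treat == 1 in its post periods).
        ever_treated = d_wide.max()
        treated_unit = ever_treated[ever_treated == 1].index[0]
        donors = ever_treated[ever_treated == 0].index.tolist()
        post = d_wide[treated_unit] == 1

        y = y_wide[treated_unit]
        donor_mean = y_wide[donors].mean(axis=1)

        # Shift the donor mean by the average pre-treatment gap (parallel trends).
        shift = (y[~post] - donor_mean[~post]).mean()
        counterfactual = donor_mean + shift
        att = float((y[post] - counterfactual[post]).mean())

        weights = {unit: 1 / len(donors) for unit in donors}
        return SketchResults(att=att, counterfactual=counterfactual, donor_weights=weights)


# Usage on a toy panel: unit "A" is treated from year 3 onward.
toy = pd.DataFrame(
    [
        {"unit": u, "year": t, "y": y, "treated": int(u == "A" and t >= 3)}
        for u, ys in {"A": [10, 11, 15, 16], "B": [5, 6, 7, 8], "C": [7, 8, 9, 10]}.items()
        for t, y in zip(range(1, 5), ys)
    ]
)
result = MeanDonorDID(
    {"df": toy, "outcome": "y", "treat": "treated", "unitid": "unit", "time": "year"}
).fit()
print(result.att)  # 3.0
```

Keeping the constructor limited to reading the dictionary and putting all computation in .fit() is what allows one configuration to be reused across estimators. This mirrors how the FSCM and FDID calls above share a single config.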

GitHub Repo

Here is the GitHub repo. You can install mlsynth by running

pip install -U git+https://github.com/jgreathouse9/mlsynth.git

in your terminal.

Documentation

Here is the link to the documentation.

App

Here is the beta version of the app which runs many of the mlsynth estimators.

Projects

Here are the projects which use mlsynth. Please let me know if yours does and I will add it!

Economic Impact of Cameroon’s Anglophone Crisis: A Forward Difference-in-Differences Approach (forthcoming)