Productionizing Synthetic Controls for Tripadvisor Using mlsynth
2025-09-13
The problem: SCMs are essential for geo-testing and policy analysis, but current tools are fragmented and hard to automate.
The solution: mlsynth, a lightweight Python library that unifies 20+ SCMs behind one API and runs cleanly in CI/CD.
The impact:
Tripadvisor benefit: Python-first workflows that match team skills and production needs.
Synthetic Control Methods (SCMs) estimate what would have happened without the intervention by building a counterfactual from similar markets. The counterfactual is a weighted average of untreated units, or units that did not experience the same intervention.
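To make the weighted-average idea concrete, here is a minimal sketch of the classic setup: non-negative donor weights that sum to one, chosen to match the treated unit's pre-treatment outcomes. It uses a plain Frank-Wolfe loop in dependency-free Python. The function name `simplex_weights`, the solver choice, and the toy numbers are illustrative assumptions, not mlsynth's API or real market data.

```python
def simplex_weights(donors, treated, n_iter=500):
    """Find w >= 0 with sum(w) == 1 that minimizes ||treated - donors @ w||^2.

    donors:  list of pre-treatment rows, each holding one outcome per donor
    treated: list of pre-treatment outcomes for the treated unit
    Solver: Frank-Wolfe with exact line search (an illustrative choice).
    """
    n_donors = len(donors[0])
    w = [1.0 / n_donors] * n_donors  # start from equal weights

    def predict(weights):
        return [sum(x * wj for x, wj in zip(row, weights)) for row in donors]

    for _ in range(n_iter):
        resid = [p - y for p, y in zip(predict(w), treated)]
        # Gradient of the squared error with respect to each weight
        grad = [2 * sum(row[j] * r for row, r in zip(donors, resid))
                for j in range(n_donors)]
        best = min(range(n_donors), key=grad.__getitem__)
        # Move toward the simplex vertex with the steepest descent
        direction = [(1.0 if j == best else 0.0) - w[j] for j in range(n_donors)]
        move = predict(direction)  # predict is linear, so this is donors @ direction
        denom = sum(m * m for m in move)
        if denom == 0:
            break
        # Exact line search, clipped to [0, 1] so w stays on the simplex
        step = -sum(r * m for r, m in zip(resid, move)) / denom
        step = max(0.0, min(1.0, step))
        if step == 0.0:
            break
        w = [wj + step * dj for wj, dj in zip(w, direction)]
    return w


# Toy pre-treatment panel (made-up numbers): 4 periods x 3 donor markets
pre_donors = [[10, 20, 5], [12, 18, 7], [11, 22, 6], [13, 19, 9]]
pre_treated = [14.0, 14.4, 15.4, 15.4]

weights = simplex_weights(pre_donors, pre_treated)

# Counterfactual for one post-treatment period: weighted average of donors
post_donors = [14, 21, 8]
counterfactual = sum(x * w for x, w in zip(post_donors, weights))
```

Because the weights form a convex combination, the counterfactual always lies within the range of the donor outcomes. Many of the newer SCM variants relax or replace exactly this constraint.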
SCMs are ideal for:
Geo-testing ad campaigns across markets
Evaluating local/national tourism policies (e.g., pricing, regulations)
The econometrics literature in recent years has seen a variety of SCM advances. Many such advances use machine-learning methods to augment the standard SCM.
Tripadvisor use case: Measure the impact of marketing campaigns or local policy changes.
However, many of the recent SCM advances that Tripadvisor may benefit from have non-trivial startup costs. These costs inhibit their effective use.
The first cost is purely software: no single package offers every SCM, and not all of the software is free.
This makes it difficult to test different SCMs in a pipeline, especially since existing libraries (e.g., pysyncon, tidysynth, augsynth) differ in their expected data inputs, options, and dependencies.
Beyond simple software constraints, SCM tools often fail to:
Support multiple methods in one workflow
Provide automated diagnostics or effect-size statistics
Facilitate easy comparison between methods
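As a rough illustration of what automated diagnostics and method comparison could look like, the sketch below computes a pre-treatment RMSE and an average treatment effect on the treated (ATT) for each fitted counterfactual, then picks the best pre-period fit. The helper `effect_summary`, the method labels, and all numbers are hypothetical and made up for illustration; they are not output from any SCM library.

```python
from math import sqrt


def effect_summary(observed, counterfactual, n_pre):
    """Summarize one fitted model.

    observed, counterfactual: equal-length outcome series for the treated unit
    n_pre: number of pre-treatment periods (the first n_pre entries)
    """
    gaps = [o - c for o, c in zip(observed, counterfactual)]
    pre, post = gaps[:n_pre], gaps[n_pre:]
    pre_rmse = sqrt(sum(g * g for g in pre) / len(pre))  # pre-period fit quality
    att = sum(post) / len(post)  # average post-period gap
    mean_cf = sum(counterfactual[n_pre:]) / len(post)
    return {"pre_rmse": pre_rmse, "att": att, "att_pct": 100 * att / mean_cf}


# Made-up series: 4 pre-treatment periods followed by 2 post-treatment periods
observed = [100, 102, 101, 103, 110, 112]
fits = {
    "method_a": [100, 101, 102, 103, 104, 106],
    "method_b": [99, 103, 100, 104, 105, 107],
}

# One summary per method, so the methods can be compared side by side
report = {name: effect_summary(observed, cf, n_pre=4) for name, cf in fits.items()}
best_fit = min(report, key=lambda name: report[name]["pre_rmse"])
```

Pre-period RMSE is only one possible selection rule; a production pipeline would likely add placebo tests or other inference alongside it.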
Tripadvisor impact: Manual/ad-hoc processes in Python/R slow down the implementation of SCMs for business-critical use cases.
A streamlined SCM workflow for Tripadvisor should include:
A .fit() call
Tripadvisor benefit: Enables rapid geo-testing across markets and policy evaluations with consistent, automated results.
mlsynth
mlsynth is a lightweight Python library that:
Unifies multiple SCMs under one consistent API
Integrates seamlessly into GitHub Actions or CI/CD pipelines for automated analyses
Result: Tripadvisor’s analysts may focus on insights without wrestling with software logistics.
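The practical value of one consistent API is that methods become interchangeable inside a loop. The sketch below illustrates that pattern with two toy estimators that share a config-in, `.fit()`-out interface. The class names, config keys, and numbers are hypothetical stand-ins, not mlsynth classes or its actual config schema.

```python
class DonorMean:
    """Toy estimator: the counterfactual is the simple average of the donors."""

    def __init__(self, config):
        self.config = config

    def fit(self):
        return [sum(row) / len(row) for row in self.config["donors"]]


class DonorMeanWithIntercept(DonorMean):
    """Toy estimator: donor average shifted by the mean pre-period gap."""

    def fit(self):
        base = super().fit()
        n_pre = self.config["n_pre"]
        treated = self.config["treated"]
        shift = sum(t - b for t, b in zip(treated[:n_pre], base[:n_pre])) / n_pre
        return [b + shift for b in base]


# One shared config (made-up numbers): 3 pre-periods, 1 post-period, 2 donors
config = {
    "treated": [12, 13, 14, 20],
    "donors": [[10, 12], [11, 13], [12, 14], [13, 15]],  # one row per period
    "n_pre": 3,
}

# Swapping methods is a one-line change because the interface is identical
estimators = (DonorMean, DonorMeanWithIntercept)
results = {cls.__name__: cls(config).fit() for cls in estimators}
```

With a shared interface like this, adding another estimator to an automated run means adding one entry to the tuple rather than writing new glue code.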
Here are the findings from three models.
mlsynth is different
Next Steps
Partner with Jared Greathouse to:
- Select optimal SCMs for Tripadvisor’s use cases
- Productionize workflows with mlsynth + CI/CD
- Build a custom SCM tool with visualizations
Estimator | Language | Challenges |
---|---|---|
Forward DID | R | Free, but requires extensive user input |
Forward SCM | Stata | Not free, not streamlined |
Two-Step SCM | Matlab | Not free, not streamlined |
Robust PCA Synth | R + Python | Requires two languages |
This is the code used to produce the plot.
#| eval: False
import pandas as pd
from mlsynth import FSCM, CLUSTERSC, PDA
# Load public tourism dataset
url = "https://raw.githubusercontent.com/jgreathouse9/GSUmetricspolicy/refs/heads/main/data/RawData/hotelex.csv"
data = pd.read_csv(url)
# Configure inputs (unit, time, treatment, outcome)
config = {
    "df": data,
    "outcome": "indexed_price",
    "treat": "treat",
    "unitid": "fullname",
    "time": "time",
    "display_graphs": False,
}
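# Fit Forward SCM and pull out the observed and counterfactual series
fscm = FSCM(config).fit()
y_obs = fscm.sub_method_results['FSCM'].time_series.observed_outcome
y_FSCM = fscm.sub_method_results['FSCM'].time_series.counterfactual_outcome.flatten()
# Number of pre-treatment periods (marks where the intervention starts)
T0 = fscm.raw_results['Fit']['Pre-Periods']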
# Fit PDA (L2 method)
config_pda = dict(config)
config_pda["method"] = "l2"
l2pda = PDA(config_pda).fit()
y_l2relaxed = l2pda.time_series.model_extra['synthetic_outcome']
# Fit Robust SCM
config_RSC = dict(config)
config_RSC["method"], config_RSC["cluster"] = "PCR", False
RSC = CLUSTERSC(config_RSC).fit()
y_RSC = RSC.sub_method_results['PCR'].time_series.counterfactual_outcome.flatten()