No Layovers, No Reservations

Productionizing Synthetic Controls for Tripadvisor Using mlsynth

2025-09-13

Executive Summary

  • The problem: Synthetic control methods (SCMs) are essential for geo-testing and policy analysis, but current tools are fragmented and hard to automate.

  • The solution: mlsynth — a lightweight Python library that unifies 20+ SCMs behind one API and runs cleanly in CI/CD.

  • The impact:

    • Multi-method estimates in minutes (minimal learning curve)
    • Automated geo-tests via GitHub Actions
    • Faster, reproducible insights for ad campaigns and policy
  • Tripadvisor benefit: Python-first workflows that match team skills and production needs.

What Are SCMs?

Synthetic Control Methods (SCMs) estimate what would have happened to a treated unit, such as a market, had the intervention not occurred. They build this counterfactual as a weighted average of similar untreated units (the donor pool), that is, units that did not experience the intervention.
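To make the weighted-average idea concrete, here is a minimal sketch of an outcome-only version of the classic SCM weight problem: non-negative donor weights that sum to one, chosen to match the treated unit's pre-treatment outcomes, solved with projected gradient descent in NumPy. This is an illustrative implementation, not mlsynth's internals; `project_to_simplex` and `scm_weights` are hypothetical names, and the classic estimator's covariate matching is omitted. The donor series are made-up toy data.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]                                # entries in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]    # last index meeting the condition
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def scm_weights(y_pre, donors_pre, n_iter=5000):
    """Convex donor weights minimizing 0.5 * ||donors_pre @ w - y_pre||^2 (outcomes only)."""
    X = np.asarray(donors_pre, dtype=float)     # shape (T0, J): pre-period donor outcomes
    y = np.asarray(y_pre, dtype=float)          # shape (T0,): pre-period treated outcomes
    w = np.full(X.shape[1], 1.0 / X.shape[1])   # start from equal weights
    lipschitz = np.linalg.norm(X, 2) ** 2       # largest singular value, squared
    step = 1.0 / lipschitz
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                # gradient of the squared-error loss
        w = project_to_simplex(w - step * grad) # step, then restore the simplex constraint
    return w


# Toy check (made-up series): the treated unit is exactly 0.3 * donor 1 + 0.7 * donor 2
t = np.arange(32)
donors = np.column_stack([
    1 + np.sin(2 * np.pi * t / 24),
    1 + np.cos(2 * np.pi * t / 24),
    1 + np.sin(4 * np.pi * t / 24),
])
treated = donors @ np.array([0.3, 0.7, 0.0])
w = scm_weights(treated[:24], donors[:24])   # fit on the first 24 "pre-treatment" periods
counterfactual = donors @ w                  # synthetic series for all 32 periods
```

The simplex constraint is what keeps the counterfactual an interpolation of real donor markets rather than an unconstrained regression fit.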

SCMs are ideal for:

  • Geo-testing ad campaigns across markets

  • Evaluating local/national tourism policies (e.g., pricing, regulations)

Recent econometrics research has produced a variety of SCM advances, many of which use machine-learning methods to augment the standard SCM.

Tripadvisor use case: Measure the impact of marketing campaigns or local policy changes.

Practical Problems

  • Many recent SCM advances that could benefit Tripadvisor carry non-trivial startup costs, which inhibit their effective use.

  • The first cost is software: no single package offers all SCMs, and not all of the relevant software is free.

  • This makes it difficult to test different SCMs in one pipeline, especially because existing libraries (e.g., pysyncon, tidysynth, augsynth) differ in their expected data inputs, options, and dependencies.

Practical Problems cont.

Beyond simple software constraints, SCM tools often fail to:

  • Support multiple methods in one workflow

  • Provide automated diagnostics or effect-size statistics

  • Facilitate easy comparison between methods

Tripadvisor impact: Manual/ad-hoc processes in Python/R slow down the implementation of SCMs for business-critical use cases.

Practical Solutions

A streamlined SCM workflow for Tripadvisor should include:

  • Input: A simple table with unit, time, treatment, and outcome
  • Method: A single .fit() call
  • Output: Effect estimates + diagnostics (e.g., fit quality, robustness)
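A minimal pandas sketch of that input contract: a long table with one row per unit and period is reshaped into the time-by-unit matrix that SCM estimators work with, and the treated unit and the number of pre-treatment periods are read off the treatment column. The helper `to_wide_panel` and its default column names are hypothetical (the appendix code uses `fullname`, `time`, `treat`, and `indexed_price`), the sketch assumes a single treated unit, and the toy values are made up.

```python
import pandas as pd


def to_wide_panel(df, unit="unit", time="time", treat="treat", outcome="outcome"):
    """Reshape a long (unit, time, treat, outcome) table for SCM estimation.

    Returns (wide, treated_unit, T0): a time-by-unit outcome matrix, the single
    treated unit, and T0, the number of periods before treatment starts.
    """
    # One row per period, one column per unit; pivot raises on duplicate (unit, time) pairs
    wide = df.pivot(index=time, columns=unit, values=outcome).sort_index()

    treated = df.loc[df[treat] == 1, unit].unique()
    if len(treated) != 1:
        raise ValueError(f"expected exactly one treated unit, found {len(treated)}")
    treated_unit = treated[0]

    # T0 counts the periods strictly before the treated unit's first treated period
    first_treated = df.loc[(df[unit] == treated_unit) & (df[treat] == 1), time].min()
    T0 = int((wide.index < first_treated).sum())
    return wide, treated_unit, T0


# Toy long-format panel (made-up values): unit "A" is treated from period 3 onward
long_df = pd.DataFrame({
    "unit": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "time": [1, 2, 3, 4] * 3,
    "treat": [0, 0, 1, 1] + [0] * 8,
    "outcome": [1.00, 1.02, 1.15, 1.18, 1.00, 1.01, 1.03, 1.04, 0.98, 1.00, 1.02, 1.03],
})
wide, treated_unit, T0 = to_wide_panel(long_df)
```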

Tripadvisor benefit: Enables rapid geo-testing across markets and policy evaluations with consistent, automated results.

The Solution is mlsynth

mlsynth is a lightweight Python library that:

  • Unifies multiple SCMs under one consistent API

  • Integrates seamlessly into GitHub Actions or CI/CD pipelines for automated analyses
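As a rough illustration of the CI/CD point, a pipeline step (for example, one run from a GitHub Actions job) could pass each fitted model's observed and counterfactual series to a small gate that writes a JSON summary and returns a non-zero exit code when the pre-treatment fit is too poor to trust. This is a hypothetical sketch: `summarize_fit`, `ci_gate`, the `max_pre_rmse` threshold, and the `scm_summary.json` file name are illustrative, not part of mlsynth.

```python
import json
import math


def summarize_fit(observed, counterfactual, T0):
    """Pre-treatment RMSE and average post-treatment gap (ATT) for one fitted model."""
    gaps = [o - c for o, c in zip(observed, counterfactual)]
    pre, post = gaps[:T0], gaps[T0:]
    return {
        "pre_rmse": math.sqrt(sum(g * g for g in pre) / len(pre)),
        "att": sum(post) / len(post),
    }


def ci_gate(observed, counterfactual, T0, max_pre_rmse, out_path="scm_summary.json"):
    """Write a JSON summary (if out_path is given) and return a process exit code.

    Returns 0 when the pre-treatment fit is within max_pre_rmse, else 1, so a CI
    step can fail the job on a poor fit.
    """
    summary = summarize_fit(observed, counterfactual, T0)
    summary["passed"] = summary["pre_rmse"] <= max_pre_rmse
    if out_path is not None:
        with open(out_path, "w") as fh:
            json.dump(summary, fh, indent=2)
    return 0 if summary["passed"] else 1
```

A workflow step would run a script that fits the chosen estimators, calls `ci_gate` for each, and ends with `sys.exit(...)` on the returned code; the JSON file can then be kept as a build artifact.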

Result: Tripadvisor’s analysts may focus on insights without wrestling with software logistics.

Why use mlsynth?

  • For Tripadvisor’s analysts:
    • Quick onboarding with one syntax
    • Extensible design to add new and bespoke SCMs
    • Transparent results for easy validation
  • For Tripadvisor’s organization:
    • Production-ready workflows for geo-testing
    • Faster analysis, reducing campaign optimization time
    • Lower technical debt by eliminating R dependencies
  • For Tripadvisor’s leadership:
    • Insights are delivered faster
    • Reduced friction from streamlined tools and automation
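The "one syntax, extensible design" idea can be illustrated with a small registry pattern: every estimator takes the same config dictionary, exposes the same `fit()` entry point, and returns the same result shape, so adding a bespoke method means registering one more class. This is a hypothetical sketch of the design pattern, not mlsynth's actual class hierarchy; `ESTIMATORS`, `register`, `EstimatorResult`, `MeanDonorSC`, and `run_all` are made-up names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

ESTIMATORS: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that adds an estimator to the shared registry."""
    def wrap(cls: type) -> type:
        ESTIMATORS[name] = cls
        return cls
    return wrap


@dataclass
class EstimatorResult:
    method: str
    counterfactual: List[float]


@register("mean_donor")
class MeanDonorSC:
    """Toy estimator: the counterfactual is the equal-weight average of donor units."""

    def __init__(self, config: dict):
        self.config = config

    def fit(self) -> EstimatorResult:
        donors = self.config["donors"]  # list of donor series of equal length
        cf = [sum(vals) / len(vals) for vals in zip(*donors)]
        return EstimatorResult(method="mean_donor", counterfactual=cf)


def run_all(config: dict) -> Dict[str, EstimatorResult]:
    """Fit every registered estimator with the same config."""
    return {name: cls(config).fit() for name, cls in ESTIMATORS.items()}
```

Registering a bespoke estimator means decorating a new class with `@register("my_method")`; `run_all` then picks it up without any change to the calling code.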

Empirical Example: Barcelona Hotel Moratorium

  • Intervention: City-wide hotel building moratorium (July 2015)
  • Outcome: Normalized hotel prices (anonymized, from Booking.com)
  • Control Units: 83 cities (16 Mediterranean)
  • Time: Weekly data, January 2011 to late 2017
  • Findings: We find that the moratorium increased hotel room prices by roughly 12 percent.
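A percentage effect like this is commonly computed as the average post-treatment gap between observed and counterfactual prices, divided by the average counterfactual price. The formulation below is an assumed, standard one; the slide does not say exactly how the 12 percent figure was derived.

$$
\widehat{\text{ATT}} = \frac{1}{T - T_0} \sum_{t=T_0+1}^{T} \left( y_{1t} - \hat{y}_{1t} \right),
\qquad
\text{effect (\%)} = 100 \times \frac{\widehat{\text{ATT}}}{\frac{1}{T - T_0} \sum_{t=T_0+1}^{T} \hat{y}_{1t}}
$$

Here $y_{1t}$ is Barcelona's observed normalized price in week $t$, $\hat{y}_{1t}$ is a model's counterfactual price built from the donor cities, $T_0$ is the number of pre-moratorium weeks, and $T$ is the total number of weeks.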

Using mlsynth

Here are the findings from three models: Forward SCM, the L2-relaxed panel data approach (PDA), and Robust SCM.
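Because each model produces an observed and a counterfactual series over the same weeks, comparing methods amounts to applying the same summary to each pair. A minimal NumPy sketch, assuming the series have already been extracted (in the appendix code they are `y_obs`, `y_FSCM`, `y_l2relaxed`, and `y_RSC`, with `T0` pre-treatment periods). `compare_methods` is a hypothetical helper, not part of mlsynth, and the percent effect is computed as the average post-treatment gap over the average counterfactual, an assumed convention.

```python
import numpy as np


def compare_methods(y_obs, counterfactuals, T0):
    """Summarize several counterfactuals against one observed series.

    counterfactuals maps a method name to its counterfactual series; T0 is the
    number of pre-treatment periods. Returns pre-period RMSE, ATT, and percent effect.
    """
    y = np.asarray(y_obs, dtype=float).ravel()
    summary = {}
    for name, cf in counterfactuals.items():
        cf = np.asarray(cf, dtype=float).ravel()
        gap = y - cf                     # observed minus counterfactual
        att = gap[T0:].mean()            # average post-treatment gap
        summary[name] = {
            "pre_rmse": float(np.sqrt(np.mean(gap[:T0] ** 2))),
            "att": float(att),
            "pct_effect": float(100 * att / cf[T0:].mean()),
        }
    return summary


# With the appendix variables, a call would look like:
# compare_methods(y_obs, {"FSCM": y_FSCM, "PDA (L2)": y_l2relaxed, "RSC": y_RSC}, T0)
```

Reporting pre-period RMSE next to each effect makes it easy to see whether methods that disagree on the effect also differ in how well they tracked the treated unit before the intervention.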

Why mlsynth is different

Before (Ad-Hoc R/Python Workflows)

  • ❌ Hard to automate (e.g., in GitHub Actions)
  • ❌ Fragile dependencies and differing syntaxes across libraries
  • ❌ Limited variety of SCMs to use

After (mlsynth in Python)

  • ✅ Easy automation in Python (CI/CD ready)
  • ✅ One consistent syntax
  • ✅ Numerous, reproducible, production-ready methods

Key Takeaways for Tripadvisor

  • Faster: Shorter analysis time and automated geo-testing
  • Scalable: Python-native, CI/CD ready, reproducible across teams
  • Future-proof: Extensible for new SCMs and custom tools

Next Steps

Partner with Jared Greathouse to:
  • Select optimal SCMs for Tripadvisor’s use cases
  • Productionize workflows with mlsynth + CI/CD
  • Build a custom SCM tool with visualizations

📬 Jared Greathouse

Appendix, Page 1

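| Estimator        | Language   | Challenges                               |
|------------------|------------|------------------------------------------|
| Forward DID      | R          | Free, but requires extensive user input  |
| Forward SCM      | Stata      | Not free, not streamlined                |
| Two-Step SCM     | MATLAB     | Not free, not streamlined                |
| Robust PCA Synth | R + Python | Requires two languages                   |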

Appendix, Page 2

This is the code used to fit the three models whose estimates appear in the plot.

```python
#| eval: False

import pandas as pd
from mlsynth import FSCM, CLUSTERSC, PDA

# Load the public hotel-price dataset used in the Barcelona example
url = "https://raw.githubusercontent.com/jgreathouse9/GSUmetricspolicy/refs/heads/main/data/RawData/hotelex.csv"
data = pd.read_csv(url)

# Configure inputs (unit, time, treatment, outcome)
config = {
    "df": data,
    "outcome": "indexed_price",
    "treat": "treat",
    "unitid": "fullname",
    "time": "time",
    "display_graphs": False
}

# Fit Forward SCM; keep the observed series, its counterfactual,
# and T0 (the number of pre-treatment periods)
arco = FSCM(config).fit()
y_obs = arco.sub_method_results['FSCM'].time_series.observed_outcome
y_FSCM = arco.sub_method_results['FSCM'].time_series.counterfactual_outcome.flatten()
T0 = arco.raw_results['Fit']['Pre-Periods']

# Fit the panel data approach (PDA) with the L2-relaxation method
config_pda = dict(config)
config_pda["method"] = "l2"
l2pda = PDA(config_pda).fit()
y_l2relaxed = l2pda.time_series.model_extra['synthetic_outcome']

# Fit Robust SCM: principal component regression ("PCR") without clustering
config_RSC = dict(config)
config_RSC["method"], config_RSC["cluster"] = "PCR", False
RSC = CLUSTERSC(config_RSC).fit()
y_RSC = RSC.sub_method_results['PCR'].time_series.counterfactual_outcome.flatten()
```