Applying Forward DID to Construction and Tourism Policy

Causal Inference
Machine Learning
Econometrics
Author

Jared Greathouse

Published

February 25, 2025

Causal inference is critical to economics, marketing, policy, and other sectors of industry. Policies or natural events frequently occur that may affect metrics we care about. To make better decisions, businesses and governments need to understand the effects of these events, whether to plan future business decisions or to know if a policy intervention achieves its intended aims. In the absence of A/B tests (randomized controlled trials, which are quite popular among marketing firms and other areas of tech), business scientists and policy analysts frequently resort to constructing counterfactuals to infer treatment effects. This is because conducting proper experiments is difficult, costly, and/or unethical, especially for the events we are most concerned with, which impact millions of people.

Difference-in-Differences (DID) is one of the most popular methods for quasi-experimental treatment effect analysis. DID is simple to compute and is valid even in settings with one treated unit and a single control unit. The key identifying assumption of DID is the parallel trends assumption (PTA): that the gap between the treated group and the control group would have remained constant had the intervention or policy never taken place. PTA also posits no anticipation of the intervention. Various restatements of PTA are common in the econometrics literature, especially under staggered adoption, where DID is frequently applied to use cases researchers care about. However, DID is sometimes used even with a single treated unit, and in these settings PTA may be less likely to hold. This blog post goes over the Forward DID method with an application to the construction/tourism industry.
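The mechanics of the basic two-group, two-period DID estimate can be sketched in a few lines. The numbers below are made up for illustration, not taken from the Barcelona data:

```python
# Toy pre/post outcome means for a treated and a control unit.
# These values are purely illustrative.
treated_pre, treated_post = 100.0, 120.0
control_pre, control_post = 90.0, 95.0

# Under parallel trends, absent treatment the treated unit would have
# followed the control unit's change (+5). The DID estimate is the
# treated unit's change minus the control unit's change.
did_att = (treated_post - treated_pre) - (control_post - control_pre)
print(did_att)  # 15.0
```

Here the treated unit changed by 20 while the control changed by 5, so DID attributes 15 of the change to the treatment, under PTA.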

Barcelona’s Ban on Hotel Construction

Overtourism is an important problem for cities the globe over. A common complaint about massive tourism-based economies is that many inhabitants feel they have no neighbors. In response to such complaints, cities such as Venice, Florence, and Rome have enacted or will enact taxes on visitors, while Amsterdam and even places like Miami Beach have enacted or considered a moratorium on new hotel construction. A key question for the hotel industry, then, is how such policies might impact demand or the prices of hotel rooms. For the broader construction industry, depending on how important hotel construction is to the local economy, one may ask how these policies would affect put-in-place value, or the total amount built each month.

In July 2015, Barcelona enacted a moratorium that stopped the building of new hotels. The effect this measure had on the normalized prices of hotel rooms was studied in an academic paper using the synthetic control method, which found a 16-point index increase in the price of hotel rooms. I use their data to demonstrate the Forward DID method, which may serve as a complement to standard methods such as synthetic controls.

Applying Forward DID

Okay, now how can we use Forward DID? First, install the latest version of mlsynth:

pip install -U git+https://github.com/jgreathouse9/mlsynth.git

Now we can fit the FDID model using the entire donor pool. The treatment begins in week 27 of 2015, and the post-period extends to August of the same year.

# FDID is assumed to be imported from mlsynth, and df, treat, time,
# outcome, and unitid were defined when the data were loaded (not shown).
config = {
    "df": df,
    "treat": treat,
    "time": time,
    "outcome": outcome,
    "unitid": unitid,
    "display_graphs": True,
    "counterfactual_color": "#ff7d7d"
}

model = FDID(config)

arco = model.fit()
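The Barcelona dataset itself is not reproduced here, but panel estimators of this style generally expect a long ("tidy") dataframe with unit, time, outcome, and treatment-indicator columns, with the config keys pointing at those column names. A toy sketch, with assumed column names that are not the originals:

```python
import pandas as pd

# Hypothetical long-format panel: one row per unit-period.
# Column names here are illustrative assumptions, not the real data's.
toy = pd.DataFrame({
    "city":    ["Barcelona"] * 4 + ["Donor 1"] * 4,
    "week":    [1, 2, 3, 4] * 2,
    "price":   [100, 102, 110, 118, 90, 91, 92, 93],
    # Treatment dummy: 1 only for the treated unit in post-treatment periods.
    "treated": [0, 0, 1, 1, 0, 0, 0, 0],
})

# The config entries would then name these columns, e.g.:
unitid, time, outcome, treat = "city", "week", "price", "treated"
```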

This plots the observed versus fitted predictions. We can do the same with the Mediterranean-only dataframe:

mediterranean_df = df[df["mediterranean"] == 1]

config = {
    "df": mediterranean_df,  # Use the filtered dataframe
    "treat": treat,
    "time": time,
    "outcome": outcome,
    "unitid": unitid,
    "display_graphs": True,
    "counterfactual_color": "#ff7d7d"
}

modelmed = FDID(config)
arcomed = modelmed.fit()

These results may be summarized in a table:

from IPython.display import Markdown
import pandas as pd

results = {
    "Metric": ["ATT (Original)", "ATT (Mediterranean)", "R-Squared (Original)", "R-Squared (Mediterranean)", "Weights (Original)", "Weights (Mediterranean)"],
    "Value": [
        arco[0]["FDID"]["Effects"]["ATT"],
        arcomed[0]["FDID"]["Effects"]["ATT"],
        arco[0]["FDID"]["Fit"]["R-Squared"],
        arcomed[0]["FDID"]["Fit"]["R-Squared"],
        # Take the selected donor names (the weight dictionary's keys)
        # and join them into a comma-separated string
        ', '.join(set(arco[0]["FDID"]["Weights"].keys())),
        ', '.join(set(arcomed[0]["FDID"]["Weights"].keys()))
    ]
}

results_df = pd.DataFrame(results)

markdown_table = results_df.to_markdown(index=False)

Markdown(markdown_table)
| Metric                    | Value                                            |
|---------------------------|--------------------------------------------------|
| ATT (Original)            | 12.989                                           |
| ATT (Mediterranean)       | 11.643                                           |
| R-Squared (Original)      | 0.853                                            |
| R-Squared (Mediterranean) | 0.859                                            |
| Weights (Original)        | Donor 40, Donor 82, Donor 30, Donor 81, Donor 80 |
| Weights (Mediterranean)   | Donor 40, Donor 82, Donor 6, Donor 81, Donor 30  |

Well, we see that these DID models do pretty well (by comparison, even with the Mediterranean donor pool, DID does not do nearly as well without the regularization of the forward selection algorithm). We also see that regardless of which units I use as controls, the models agree that Donors 30, 40, 81, and 82 were among the most important control units in the entire universe of controls provided to us. We do not know who these donors are, but the point is that Forward DID gives us a principled method for choosing the control group for a treated unit. By comparison, when we use the \(\ell_2\) relaxer, we get an ATT of 10.95 and an \(R^2=0.883\) for the case of all donors; when we use only Mediterranean donors, we get an ATT of 12.75 and an \(R^2=0.911\). The point is that by using advanced quasi-experimental methods, we can uncover causal impacts that we could not by simply doing a \(t\)-test; we can mitigate overfitting and judiciously weight our control group to get a better sense of what might have been absent the treatment.
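The forward-selection idea at the heart of FDID can be sketched as a greedy loop: starting from an empty control set, repeatedly add whichever donor most improves the pre-treatment fit of the equal-weighted DID counterfactual, then keep the candidate set with the best fit. The following is a simplified illustration of that idea under these assumptions, not the mlsynth implementation:

```python
import numpy as np

def forward_select(y_pre, donors_pre, k_max=None):
    """Greedy forward selection of donors by pre-treatment DID fit.

    y_pre: (T0,) treated unit's pre-treatment outcomes.
    donors_pre: dict mapping donor name -> (T0,) pre-treatment outcomes.
    Returns the donor subset with the highest pre-treatment R-squared.
    """
    def r2(selected):
        # Equal-weight donor average plus a DID intercept shift.
        avg = np.mean([donors_pre[n] for n in selected], axis=0)
        fit = avg + (y_pre.mean() - avg.mean())
        sse = np.sum((y_pre - fit) ** 2)
        sst = np.sum((y_pre - y_pre.mean()) ** 2)
        return 1.0 - sse / sst

    remaining, selected, path = set(donors_pre), [], []
    k_max = k_max or len(donors_pre)
    while remaining and len(selected) < k_max:
        # Add the donor that most improves pre-treatment fit.
        best = max(remaining, key=lambda n: r2(selected + [n]))
        selected.append(best)
        remaining.remove(best)
        path.append((list(selected), r2(selected)))
    # Keep the candidate set with the highest pre-treatment R-squared.
    return max(path, key=lambda p: p[1])[0]

# Toy data: donor "good" tracks the treated trend; "noise" does not.
t = np.arange(10.0)
y = 2.0 * t + 5.0
donors = {"good": 2.0 * t, "noise": np.ones(10) * 3.0}
print(forward_select(y, donors))  # ['good']
```

Because the parallel donor is selected first and adding the flat donor only worsens the fit, the procedure returns just the well-matched control, which is the sense in which forward selection acts as regularization.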

The main insight here is that prices rose by around 12 index points compared to what they would have been absent the policy intervention. It would be super cool to see what actually happened to demand for hotel rooms and so on, but the data do not afford us that luxury.

Business Use Cases

When might these methods be useful in business science, such as in construction or tourism? For one, several states have passed preemption laws on worker heat safety, barring local governments from requiring that workers receive water, shade, and rest in the hot summer months; we can use these techniques to see how such laws and policies affect labor or safety outcomes. We can use causal inference to estimate the impact of events meant to affect demand for tourism, or other KPIs the tourism industry cares about. For construction, the physical building of units could be affected by these kinds of policies, impacting metrics like put-in-place value or project stress indices. On the supply side, we may quantify the effects of policies such as tariffs on the cost of materials. With proper causal inference, firms and policymakers may plan more effectively, knowing whether to pursue current policies, and take action grounded in scientific analysis.