5 Asymptotic Theory
Across the Earth, there are around 7.5 quintillion grains of sand across all beaches and deserts. Mathematics, however, isn't bound by Earthly constraints. As I mentioned in the previous chapter, probability and statistics are about quantifying uncertainty in order to draw conclusions. It is now time to understand the very basics under which we can draw those conclusions in the first place. To do this, we investigate basic asymptotic theory. It is the very underpinning of statistics, in particular explaining the circumstances under which we can be confident about our estimates. We defined confidence intervals rather quickly in the previous chapter; here, we will add tools for understanding when they are valid.
5.1 Law of Iterated Expectations
Suppose we commute to and from Georgia State University from Marietta, Georgia. As we discussed in the previous chapter, we can think of some random variable, say $Y$, representing our commute time on a given day.
So suppose we take this same highway for one month, using only I-75 South and North (as opposed to, say, I-285). We record the amount of time it took us to get to school and home each day using either way (that is, we record our commute time to and from school each day for both ways, taking the average separately for each week). Mathematically, we express this as the expected commute time conditional on the chosen route, $E[Y \mid X]$, where $X$ denotes the route taken.
So, if after a month of data collection we find that the average commute time is 17 minutes, this is the unconditional expectation $E[Y] = 17$. The Law of Iterated Expectations (LIE) ties the two together: averaging the conditional expectations over the routes recovers the unconditional one, $E[Y] = E\big[E[Y \mid X]\big]$.
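To make the LIE concrete, here is a worked example with hypothetical numbers (the route shares and conditional means below are invented for illustration, chosen so that they reproduce the 17-minute average):

$$
E[Y] = E\big[E[Y \mid X]\big] = \sum_{x} E[Y \mid X = x]\, P(X = x).
$$

If the route we take 60% of the time averages 15 minutes, and the route we take 40% of the time averages 20 minutes, then

$$
E[Y] = 15(0.6) + 20(0.4) = 9 + 8 = 17 \text{ minutes.}
$$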
5.2 Law of Large Numbers
Why might this be true, though? Why, after taking all of the averages across a whole month, do we arrive at 17, which is a lot closer to 15 than we first thought? Surely this number is much quicker than the 30 minutes we took the first day? The reason we get this value is what we call the Law of Large Numbers, or LLN: as the number of observations $n$ grows, the sample mean converges to the true mean,

$$
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \;\xrightarrow{p}\; \mu \quad \text{as } n \to \infty.
$$
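As a quick illustration, here is a minimal simulation sketch (not from the original notes; the distribution, spread, and seed below are arbitrary choices) showing the running average of simulated commute times settling toward a hypothetical true mean of 17 minutes:

```python
# A minimal sketch of the LLN: the running average of simulated commute
# times settles toward the true mean as the number of days accumulates.
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed for reproducibility

true_mean = 17  # hypothetical true average commute time, in minutes
# Hypothetical daily commute times, normally distributed around the true mean
commutes = rng.normal(loc=true_mean, scale=8, size=1000)

# Running average after each day: cumulative sum divided by number of days
running_avg = np.cumsum(commutes) / np.arange(1, len(commutes) + 1)
for n in [1, 10, 100, 1000]:
    print(f"average after {n:>4} days: {running_avg[n - 1]:.2f} minutes")
```

Early averages bounce around (a single 30-minute day dominates the day-one average), but as days accumulate, the running average hugs 17 ever more tightly.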
As it turns out, that's exactly what we see. For researchers, this means that drawing a large sample tends to be better than a small one for empirical work. For example, if someone has a sample size of 20, we likely would not be comfortable generalizing one particular aspect of this sample (say, weight or political affiliation) to everyone in the same city, as we'd need more data points to average over. Ideally, if we're trying to sample American public opinion, we don't want to use only two states; we'd want data from all fifty.
5.3 Central Limit Theorem
By understanding the Law of Iterated Expectations (LIE) and the Law of Large Numbers (LLN), we can now delve into the Central Limit Theorem (CLT), which helps us characterize the overall distribution of commute times. That way, we can do things like calculate a confidence interval for our commute times that approximates the true one. The CLT says:

$$
\sqrt{n}\left(\bar{X}_n - \mu\right) \;\xrightarrow{d}\; N(0, \sigma^2),
$$

that is, the standardized sample mean converges in distribution to a normal distribution, whatever the distribution of the underlying variable.
This complements the LLN, which tells us that our sample mean will approach the true mean:

$$
\bar{X}_n \;\xrightarrow{p}\; \mu.
$$
Additionally, the CLT lets us visualize how, even if the original distribution of the variable is not normal (say, a coin toss where we only have two outcomes, heads or tails), the distribution of the sample mean converges to a normal one given enough samples. Imagine flipping a coin 1,000 times and plotting the empirical distribution of the sample mean after every 100 flips: despite the variable having only two outcomes, the distribution converges to a normal one. Thus, the CLT allows us to construct confidence intervals and make inferences about a population, given a large enough sample.
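Here is a minimal simulation sketch of this picture (not from the original notes; the sample sizes, number of replications, and seed are arbitrary): histograms of coin-flip sample means tighten into a bell shape as the number of flips per sample grows.

```python
# A minimal sketch of the CLT: for each sample size n, draw many samples of
# n fair coin flips, compute each sample's mean, and histogram those means.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility

sample_sizes = [10, 100, 1000]  # flips per sample; illustrative choices
n_samples = 5000                # number of sample means per histogram

fig, axes = plt.subplots(1, len(sample_sizes), figsize=(12, 3))
for ax, n in zip(axes, sample_sizes):
    # Each row is one sample of n flips (0 = tails, 1 = heads)
    flips = rng.integers(0, 2, size=(n_samples, n))
    means = flips.mean(axis=1)
    ax.hist(means, bins=30, density=True)
    ax.set_title(f"n = {n} flips per sample")
    ax.set_xlabel("sample mean")
plt.tight_layout()
plt.show()
```

With 10 flips per sample the histogram is lumpy and discrete; by 1,000 flips per sample it is a tight, symmetric bell centered on 0.5.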
5.4 Sampling
Before we continue, however, we need to say a few words about sampling, and the idea of collecting a random, probabilistic sample versus a non-random sample. This matters because, no matter how we calculate our statistics, we need to be sure that they actually map onto the constructs we want them to. A random sample, in principle, means that everyone in a given population has the same chance of being included in the sample for some study. For example, if we wanted to survey the public opinion of Georgia State students, one way of doing this would be to get every single active GSU email from the University and put it into a spreadsheet. We could then generate a variable in the spreadsheet called a "Bernoulli" variable, one that takes on the values 0 or 1. After setting the number of observations (the number of emails we have), we define the probability that the variable takes on the value 1 to be 0.5. In other words, everyone has an equal chance of being included in the survey: a 1 means you're included, a 0 means you're not. Note that there are other, much more sophisticated ways to do this, but this is one way of taking a random sample.
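A minimal sketch of this procedure follows (the roster below is fabricated for illustration; a real implementation would start from the actual email list, and in practice the inclusion probability would usually be much smaller than 0.5):

```python
# A minimal sketch of Bernoulli random sampling: each email gets an
# independent coin flip deciding whether it enters the survey sample.
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

# Hypothetical roster standing in for the real list of active GSU emails
emails = [f"student{i}@gsu.edu" for i in range(30_000)]

# Bernoulli(0.5) draw per email: 1 = included in the survey, 0 = not
include = rng.binomial(n=1, p=0.5, size=len(emails))
sample = [e for e, flag in zip(emails, include) if flag == 1]

print(f"{len(sample)} of {len(emails)} students selected")
```

Because every email faces the same inclusion probability, no student is systematically more likely to end up in the sample than any other, which is exactly the property that makes the sample random.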
What would a non-random sample look like, then? In principle, it's one where everyone in the population does not have an equal chance of being included. But this simple definition does not explain why it's harmful to researchers, so let's consider a few kinds of non-random sampling. The most obvious is convenience sampling. As the name suggests, we take a sample of whoever is around us: in our class, on the street, or in our friend group. For example, I have two coworkers, one who went to MIT and one who went to Columbia, both for economics degrees. If I wanted to know how good the average person in the United States is at calculus or math, I could not simply assess their skills and reach a conclusion. Why not? MIT and Columbia are incredibly selective schools that select for math skills. Furthermore, the fact that I know them or work with them is likely correlated with my research interests, which demand a good background in statistics and applied mathematics. Similarly, even if we personally asked random people on the street, this is still not a random sample, because there are likely underlying reasons for people being in certain locations. If you're in the biology department, it's likely that most people you meet will have an uncanny knowledge of germ theory or anatomy. Indeed, we'd say something was wrong if they did not.
At this point, you're likely asking what any of this has to do with the previous discussion of asymptotic theory, or policy analysis for that matter. Well, the sample we collect is directly related to the results we obtain. If our goal is to approximate math knowledge in America, it's inappropriate to only sample students from the engineering department at Georgia Tech or the economics students at Chicago. To say it differently: even if we collected data from every single student at every single engineering department in the United States, this would map onto the population of engineering students at best, not the United States as a whole. In statistics language, our estimate of the "true" mean would be biased, in this case significantly upwards. Even though our confidence intervals would become more precise as the sample size increases (that is, the more engineering students we ask), we would never converge to the true parameter because our sample is so dissimilar to the population we care about, effectively giving us the right answer to the wrong question. When we discuss regression, the importance of sampling will become even more apparent; for now, suffice it to say that the sample should be as representative of the underlying population as possible.
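A minimal simulation sketch of this point (all distributions and numbers below are invented for illustration): as $n$ grows, the convenience sample's estimate becomes more precise, but around the wrong value.

```python
# A minimal sketch of sampling bias: a random sample converges to the true
# mean, while a convenience sample converges, precisely, to the wrong one.
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed for reproducibility

# Hypothetical math-skill scores: the general public versus a selective
# subgroup (think engineering students) with a much higher mean
population = rng.normal(loc=50, scale=15, size=1_000_000)
subgroup = rng.normal(loc=80, scale=10, size=100_000)

for n in [100, 10_000, 100_000]:
    random_est = rng.choice(population, size=n).mean()
    biased_est = rng.choice(subgroup, size=n).mean()
    print(f"n={n:>6}: random sample {random_est:5.1f}, "
          f"convenience sample {biased_est:5.1f} (true mean 50)")
```

No amount of additional data fixes the convenience sample: its estimate settles ever more tightly around 80, the subgroup mean, rather than the population's 50.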
6 Summary
Asymptotic theory simplifies statistical analysis by encouraging us to think about the "true" population of interest. It provides us with tools to derive more accurate and precise estimates. But, as a more general rule for policy analysis (and life), it demands that we think with an infinite mind instead of a limited one. In other words, as researchers we should always ask ourselves, "if we could sample everyone, would this statistic I just calculated be close to reliable?" One practical implication is having a sufficiently large sample size in order to increase the probability of being closer to the true mean. However, as the previous section discusses, these asymptotics are only as justified as the quality of the sample. We need, in other words, our sample to be as representative as possible of the broader population of interest. In the next lecture on correlation, beyond sampling, we will discuss the basics of statistical design needed for valid results from statistical analysis.