Student's Tea Chat
What we learnt from working with students on the job market last year
hello people. Jobs await. ChatGPT’s here. Interviews are on the cards. Choices, disappointment, exhilaration. All around the corner.
Last year we worked a lot with students who reached out for help navigating the market. We learned some stuff from doing that. So, these posts are a kind of summary. Some technical, some process, some random thoughts. Take what you will.
This one is technical: It relates to discussions I have had over the last 4 years on trying to understand the link between how we analyze data and how we do policy. That link is subtle and I am still learning how to best think about it.
In my course on causal inference, I spend 6 lectures on inference and 8 on the Big 4 causal methods. That choice was driven by the fact that (a) most of our students will have a big role as consumers of research and (b) consuming research in a world that produces upwards of 80K papers a day is not trivial. Many of the policy questions the course helps students think through came up again and again in job interviews for students heading into data/analysis/policy fields.
So, here is a summary of big ideas from the inference part of that class. I pulled out multiple discussions over the year and asked ChatGPT to do the summary. If there are parts of that summary you don’t understand, let me know, and I can post some stuff.
If the writing is more formal than usual, blame the GPT, not me. Seriously. In response to my prompt, the GPT had the gall to tell me:
“Got it. Let me strip this down into clean, careful teaching notes, written for students, with zero email drama, no side rants, and no assumed sophistication beyond a standard econometrics course. Think of this as something you could hand out after a lecture and say: “This is the conceptual map.””
I now know what Trurl and Klapaucius felt when they started kicking that infernal machine...
What Do Experiments, Estimates, and Confidence Intervals Actually Tell Us?
A short guide to avoiding common statistical mistakes
1. What Question Are We Answering?
Statistical methods answer different kinds of questions, and confusion happens when we mix them up.
Frequentist inference asks:
If the true effect were X, how likely is it that I would observe data like this?
Policy and decision-making ask:
Given the data I observed, what should I believe about the size of the effect, and what should I do?
These are not the same question.
2. What a Hypothesis Test Does (and Does Not Do)
In a randomized experiment, a hypothesis test evaluates a null hypothesis (often “no effect”).
A p-value tells us how surprising the data would be if the null were true
A small p-value allows us to reject the null
That is all.
A hypothesis test does not tell us:
the probability that the true effect equals the estimate
the probability that the effect lies in any interval
how large the effect “really is”
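To make the “how surprising would the data be under the null” idea concrete, here is a minimal permutation-test sketch for a hypothetical two-arm experiment (all data invented for the example): if treatment did nothing, shuffling the treatment labels should not change the difference in means, so we can ask how often a shuffled difference is as large as the one we observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical experiment: 100 treated, 100 control (made-up outcomes).
n = 100
treated = rng.normal(0.3, 1.0, n)   # suppose these were the treated outcomes
control = rng.normal(0.0, 1.0, n)
observed_diff = treated.mean() - control.mean()

# Permutation test: how surprising is observed_diff if treatment did nothing?
pooled = np.concatenate([treated, control])
reps = 10_000
null_diffs = np.empty(reps)
for i in range(reps):
    perm = rng.permutation(pooled)          # relabel who was "treated"
    null_diffs[i] = perm[:n].mean() - perm[n:].mean()

# The p-value is the share of null-world differences at least as extreme.
p_value = np.mean(np.abs(null_diffs) >= abs(observed_diff))
print(f"observed difference: {observed_diff:.3f}, p-value: {p_value:.4f}")
```

Note what the output is: a statement about the data under the null, not a probability that the effect equals the estimate.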
3. What a Confidence Interval Really Means
A 95% confidence interval is often misinterpreted.
Correct interpretation (before the experiment):
If we were to repeat this experiment many times and compute a confidence interval each time, 95% of those intervals would contain the true effect.
Important implications:
The interval is random before the experiment
After the experiment, the realized interval either contains the true effect or it does not; no probability attaches to that single interval
Incorrect interpretation:
“There is a 95% probability that the true effect lies in this interval.”
That statement is not meaningful in frequentist statistics.
A Note: This is the slide that leads to the *most* confusion. The correct interpretation sounds circular: shouldn’t it be “compute an effect each time”? The answer is “No”: it really is “compute a confidence interval” each time. This is a key idea: the confidence interval is a random variable (an interval) that changes each time a sample is taken. What, then, is “the confidence interval”? Well, there is no single confidence interval, but there is a procedure for generating a confidence interval from every sample. What the interpretation tells us is that if our procedure is correct, then across many samples the intervals it produces will contain the true effect 95% of the time. The Wikipedia article on Confidence Intervals is actually really good if you are still having trouble!
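The repeated-experiment interpretation is easy to check by simulation. Here is a minimal sketch, assuming a true effect we set ourselves and a known variance for simplicity: the interval is recomputed on every sample, and the procedure covers the truth about 95% of the time.

```python
import numpy as np

rng = np.random.default_rng(42)

true_effect = 2.0        # fixed but "unknown" to the analyst (we set it here)
n, sigma, reps = 50, 3.0, 10_000
z = 1.96                 # normal critical value for a 95% interval

covered = 0
for _ in range(reps):
    sample = rng.normal(true_effect, sigma, n)  # one repetition of the experiment
    se = sigma / np.sqrt(n)                     # known-variance case for simplicity
    lo, hi = sample.mean() - z * se, sample.mean() + z * se
    covered += (lo <= true_effect <= hi)        # a *different* interval each time

coverage = covered / reps
print(f"coverage: {coverage:.3f}")   # close to 0.95
```

The guarantee belongs to the procedure across the 10,000 repetitions, not to any one of the intervals it produced.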
A Second Note: You can invoke a Bayesian perspective to justify the ‘bad’ claim that there is a 95% probability that the computed confidence interval contains the true effect from a single sample. This works if the prior is uninformative over its support, in that it attributes equal probability to every possible outcome. Then the 95% frequentist CI and the 95% Bayesian credible interval coincide. That’s fine, but my point is that if we are going to use a Bayesian workaround to justify a frequentist procedure, why not then account for the fact that in many cases I should *not* use an uninformative prior? What you can’t do is claim a Bayesian antecedent for your policy analysis but then cry foul when people don’t agree with the implicit assumption of an uninformative prior.
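A small numerical sketch of this equivalence, assuming a normal-mean model with known sigma (all numbers made up): under a flat prior, the 95% credible interval is numerically identical to the frequentist CI, while an informative prior shifts and shrinks it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Normal-mean model with known sigma; flat (uninformative) prior on mu.
sigma, n = 2.0, 40
sample = rng.normal(1.0, sigma, n)
xbar, se = sample.mean(), sigma / np.sqrt(n)

# Frequentist 95% CI.
freq_ci = (xbar - 1.96 * se, xbar + 1.96 * se)

# With a flat prior, the posterior for mu is Normal(xbar, se^2),
# so the 95% credible interval is numerically the same interval.
bayes_ci = (xbar - 1.96 * se, xbar + 1.96 * se)

# An informative prior Normal(m0, s0^2) changes the answer (conjugate update):
m0, s0 = 0.0, 0.5
post_var = 1 / (1 / s0**2 + n / sigma**2)
post_mean = post_var * (m0 / s0**2 + n * xbar / sigma**2)
informative_ci = (post_mean - 1.96 * np.sqrt(post_var),
                  post_mean + 1.96 * np.sqrt(post_var))

print("frequentist:", freq_ci)
print("flat prior: ", bayes_ci)
print("informative:", informative_ci)
```

The informative-prior interval is pulled toward the prior mean and is narrower, which is exactly the dependence the note is warning you not to sweep under the rug.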
4. Why the Point Estimate Is Not “the True Effect”
The estimated coefficient (β̂) is:
a random variable
one realization from a sampling process
The true effect (β) is:
fixed
unknown
Frequentist theory shows that β̂ has good properties under repeated experiments (e.g., consistency), but:
Once a single experiment is complete, frequentist methods do not assign probabilities to different values of β.
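A quick simulation of that distinction (all values assumed for the example): beta is fixed, while beta-hat bounces around it from sample to sample. Consistency shows up as the average of the beta-hats landing near beta.

```python
import numpy as np

rng = np.random.default_rng(3)

beta = 1.5          # the fixed, unknown true effect (known only in simulation)
n, reps = 30, 5_000

# Each repetition of the "experiment" yields a different beta-hat.
beta_hats = np.array([rng.normal(beta, 2.0, n).mean() for _ in range(reps)])

print(f"true beta:         {beta}")
print(f"mean of beta-hats: {beta_hats.mean():.3f}")   # close to beta
print(f"sd of beta-hats:   {beta_hats.std():.3f}")    # sampling variability
```

In a real study you see exactly one draw from that histogram, which is why the good properties of the procedure do not translate into probability statements about beta itself.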
5. Repetition Is the Key to Frequentist Logic
All frequentist guarantees rely on repetition:
repeated samples
repeated estimates
repeated confidence intervals
Policy decisions are not repeated experiments. You observe:
one study
one estimate
one confidence interval
Frequentist inference alone does not tell you how to convert that single result into a belief about the true effect size.
6. Why This Matters for Policy Analysis (CEA / CBA)
Cost-effectiveness and cost-benefit analysis require:
beliefs about effect sizes
expected outcomes
trade-offs and losses
Frequentist outputs (p-values, confidence intervals) do not provide these directly.
When economists use estimated effects for policy:
they are making additional assumptions
often implicitly
often without acknowledging them
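A toy illustration of the gap (every number here is invented): a plug-in net-benefit calculation uses only the point estimate, while a decision rule that cares about downside risk needs a belief distribution over the effect. One hedged choice, shown below, is a flat-prior posterior centered on the estimate, which makes the implicit assumption explicit.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical CBA: adopt the program if net benefit looks positive.
# Net benefit per unit effect = 100; fixed cost = 150 (made-up numbers).
benefit_per_unit, cost = 100.0, 150.0

point_estimate = 1.4   # beta-hat from a single study (assumed)
se = 0.8               # its standard error (assumed)

# Plugging in the point estimate treats beta-hat as the truth:
plug_in_nb = benefit_per_unit * point_estimate - cost

# A decision rule needs a *belief* over beta. One option (a Bayesian
# assumption, not a frequentist output): posterior Normal(beta-hat, se^2).
draws = rng.normal(point_estimate, se, 100_000)
expected_nb = np.mean(benefit_per_unit * draws - cost)
prob_negative = np.mean(benefit_per_unit * draws - cost < 0)

print(f"plug-in net benefit:  {plug_in_nb:.1f}")
print(f"expected net benefit: {expected_nb:.1f}")
print(f"P(net benefit < 0):   {prob_negative:.2f}")
```

With a linear payoff the expected net benefit matches the plug-in number, but the probability of a loss is only available once you commit to a distribution of beliefs, and that commitment is the implicit assumption the section is pointing at.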
7. The Bayesian Alternative (Briefly)
Bayesian analysis allows statements like:
“There is a 95% probability the effect lies in this range.”
But only because:
effects are treated as random variables
beliefs depend on a prior
different priors lead to different conclusions
There is no “prior-free” way to do this.
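To see the prior dependence in numbers, here is a sketch of a conjugate normal update with one study summary and two hypothetical analysts, a skeptic and an optimist (all values invented): same data, different priors, different conclusions.

```python
import math

# One study's summary: beta-hat and its standard error (assumed values).
xbar, se = 0.5, 0.3

def posterior(prior_mean, prior_sd):
    # Conjugate update for a normal likelihood with known standard error.
    post_var = 1 / (1 / prior_sd**2 + 1 / se**2)
    post_mean = post_var * (prior_mean / prior_sd**2 + xbar / se**2)
    return post_mean, math.sqrt(post_var)

def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

results = {}
for label, (m0, s0) in {"skeptic":  (0.0, 0.2),    # tight prior near zero
                        "optimist": (0.6, 0.5)}.items():
    m, s = posterior(m0, s0)
    results[label] = normal_cdf(m / s)   # posterior P(effect > 0)
    print(f"{label}: P(effect > 0) = {results[label]:.2f}")
```

Both analysts saw the same estimate; the skeptic ends up much less sure the effect is positive. That gap is the prior doing its work, which is the point: there is no prior-free version of the probability statement.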
8. The Bottom Line
Experiments are powerful tools for rejecting hypotheses
Confidence intervals describe procedures, not beliefs
Point estimates are not probabilistic statements about truth
Using evidence for policy requires beliefs about effect sizes, and those beliefs rest on assumptions that go beyond the frequentist outputs
A rule of thumb
Frequentist statistics tells us when data are inconsistent with a hypothesis. It does not tell us what to believe instead.

