I don't think Lindley's paradox supports p-circling
Posted by speckx 18 hours ago
Comments
Comment by CrazyStat 16 hours ago
This is an interesting post but the author’s usage of Lindley’s paradox seems to be unrelated to the Lindley’s paradox I’m familiar with:
> If we raise the power even further, we get to “Lindley’s paradox”, the fact that p-values in this bin can be less likely than they are under the null.
Lindley’s paradox as I know it (and as described by Wikipedia [1]) is about the potential for arbitrarily large disagreements between frequentist and Bayesian analyses of the same data. In particular, you can have an arbitrarily small p-value (p < epsilon) from the frequentist analysis while at the same time having arbitrarily large posterior probabilities for the null hypothesis model (P(M_0|X) > 1-epsilon) from the Bayesian analysis of the same data, without any particularly funky priors or anything like that.
I don’t see any relationship to the phenomenon given the name of Lindley’s paradox in the blog post.
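To make the disagreement concrete, here's a minimal numerical sketch using the birth-ratio numbers from the Wikipedia example (49,581 boys in 98,451 births); scipy and the variable names are my own choices:

```python
# Minimal sketch of Lindley's paradox (data from the Wikipedia example).
from scipy.stats import binom, norm

n, x = 98451, 49581   # births, boys
theta0 = 0.5

# Frequentist side: two-sided p-value for H0: theta = 0.5
z = (x - n * theta0) / (n * theta0 * (1 - theta0)) ** 0.5
p_value = 2 * norm.sf(abs(z))   # ~0.023 -> reject at the 5% level

# Bayesian side: equal prior odds, uniform prior on theta under M1
m0 = binom.pmf(x, n, theta0)    # marginal likelihood under the point null
m1 = 1 / (n + 1)                # integral of binom.pmf(x, n, theta) over theta
posterior_H0 = m0 / (m0 + m1)   # ~0.95 -> the null is strongly favored

print(p_value, posterior_H0)
```

Same data, p ≈ 0.023 and P(M_0|X) ≈ 0.95, and with larger n the two answers can be pushed arbitrarily far apart.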
Comment by jeremysalwen 14 hours ago
The fact remains that for things which are claimed to be true but later turn out not to be true, the p-values provided in the paper are very often near the significance threshold. Not so much for things which are obviously and strongly true. This is direct evidence of something we already know: nobody cares about p-values per se, they only use them to communicate information about something being true or false in the real world, and the technical claim of "well, maybe x or y is true, but when I said p=0.049 I was only talking about a hypothetical world where x is true, and my statement about that world still holds true" is no solace.
Comment by contravariant 16 hours ago
That said, you can give a Bayesian argument for p-circling provided you have a prior on the power of the test. The details are almost impossible to work out except by case-by-case calculation because, unless I'm mistaken, the shape of the p-value distribution when the null hypothesis does not hold is very ill-defined.
However, it's quite possible to give some examples where intuitively a p-value of just below 0.05 would be highly suspicious. You just need to mix high-power tests with unclear results. Say, for example, you're testing the existence of gravity with various objects and you get p = 0.04 for the null hypothesis that objects just stay in the air indefinitely.
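To put numbers on that intuition: for a one-sided z-test, the p-value density under an alternative with noncentrality delta is phi(z_p - delta)/phi(z_p), where z_p = Phi^{-1}(1 - p), while under the null it's flat at 1. A sketch (the helper function is my own):

```python
# Sketch: how likely is a given p-value under a high-powered alternative?
from scipy.stats import norm

def p_density_under_alt(p, delta):
    """Density of a one-sided z-test p-value when the true effect has
    noncentrality delta. Under the null the density is 1 everywhere."""
    z_p = norm.isf(p)   # z threshold corresponding to this p-value
    return norm.pdf(z_p - delta) / norm.pdf(z_p)

alpha, power = 0.05, 0.99
delta = norm.isf(alpha) + norm.isf(1 - power)   # ~3.97 for 99% power at 5%
print(p_density_under_alt(0.049, delta))        # ~0.27
```

At 99% power, p = 0.049 is only about a quarter as likely under the alternative as under the null, which is exactly the "suspicious" regime the blog post is describing.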
Comment by gweinberg 14 hours ago
Comment by CrazyStat 14 hours ago
(1) You can get the same effect with a prior distribution concentrated around a point instead of a point prior (a short sketch follows after this list). The null hypothesis prior being a point prior is not what causes Lindley’s paradox.
(2) Point priors aren’t intrinsically nonsensical. I suspect that you might accept a point prior for an ESP effect, for example (maybe not—I know one prominent statistician who believes ESP is real).
(3) The prior probability assigned to each of the two models also doesn’t really matter, Lindley’s paradox arises from the marginal likelihoods (which depend on the priors for parameters within each model but not the prior probability of each model).
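A quick sketch of point (1), reusing the birth-ratio data from the Wikipedia example above; Beta(5000, 5000) (sd roughly 0.005 around 0.5) is my own arbitrary choice of concentrated prior:

```python
# Point (1): a tightly concentrated prior reproduces the paradox.
from scipy.stats import betabinom

n, x = 98451, 49581
m0 = betabinom.pmf(x, n, 5000, 5000)   # "null": theta ~ Beta(5000, 5000)
m1 = betabinom.pmf(x, n, 1, 1)         # alternative: theta ~ Uniform(0, 1)
print(m0 / (m0 + m1))                  # ~0.98: still strongly favors the null
```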
Comment by gweinberg 5 hours ago
The nonsense isn't just that they're assuming a point prior; it's that, conditional on that point prior not being true, there's only a 2% chance that theta is 0.5 ± 0.01 (a uniform prior puts only 0.02 of its mass on that interval), whereas the actual a priori probability is more like 99.99%.
Comment by CrazyStat 5 hours ago
> The nonsense isn't just that they're assuming a point prior; it's that, conditional on that point prior not being true, there's only a 2% chance that theta is 0.5 ± 0.01, whereas the actual a priori probability is more like 99.99%.
The birth sex ratio in humans is about 51.5% male and 48.5% female, well outside of your 99.99% interval. That’s embarrassing.
You are extremely overconfident in the ratio because you have a lot of prior information (but not enough, clearly, to justify your extreme overconfidence). In many problems you don’t have that much prior information. Vague priors are often reasonable.
Comment by Veedrac 11 hours ago
https://en.wikipedia.org/wiki/Lindley%27s_paradox#The_lack_o...
Indeed, Bayesian approaches need effort to correct bad priors, and indeed the original hypothesis was bad.
That said: first, in defense of the prior, it is infinitely more likely that the probability is exactly 0.5 than that it is any individual uniformly chosen number to either side. There are causal mechanisms that can explain exactly even splits. I agree that it's much safer to use simpler priors that can at least approximate any precise simple prior, and will learn any 'close enough' match, but some privileged probability mass on 0.5 is not crazy, and can even be nice as a reference to help you check the power of your data.
One really should separate out the update part of Bayes from the prior part of Bayes. The data fits differently under a lot of hypotheses. Like, it's good to check expected log odds against actual log odds, but Bayes updates are almost never going to tell you that a hypothesis is "true", because whether your log loss is good is relative to the baselines you're comparing it against. Someone might come up with a prior on the basis that particular ratios are evolutionarily selected for. Someone might come up with a model that predicts births sequentially using a genomics-over-time model and get a loss far better than any of the independent random variable hypotheses. The important part is the log-odds of hypotheses under observations, not the posterior.
Comment by KK7NIL 12 hours ago
This Veritasium video does a great job of explaining the paradox in general, and how such skewed priors can easily arise in our current academic system: https://youtu.be/42QuXLucH3Q?si=c56F7Y3RB5SBeL4m
Comment by senkora 17 hours ago
> One could specify a smallest effect size of interest and compare the plausibility of seeing the reported p-value under that distribution compared to the null distribution. Maier and Lakens (2022) suggest you could do this exercise when planning a test in order to justify your choice of alpha-level
Huh, I’d never thought to do that before. You pretty much have to choose a smallest effect size of interest in order to do a power analysis in the first place, to figure out how many samples to collect, so basing the significance level on it as well is a neat next step.
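A sketch of the first half of that workflow (assuming statsmodels, with an arbitrary smallest effect of interest of d = 0.3):

```python
# Power analysis from a smallest effect size of interest (two-sample t-test).
from statsmodels.stats.power import TTestIndPower

smallest_effect = 0.3   # smallest standardized effect we care about (assumed)
n_per_group = TTestIndPower().solve_power(effect_size=smallest_effect,
                                          alpha=0.05, power=0.8)
print(n_per_group)      # ~175 per group
```

The Maier and Lakens step would then be to compare the plausibility of the reported p-value under that d = 0.3 alternative against the null distribution when justifying the alpha-level, instead of defaulting to 0.05.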
Comment by CrazyStat 15 hours ago
Given rampant incentive misalignments (the goal in academic research is often to publish something as much as—or more than—to discover truth), having fixed significance levels as standards across whole fields may be superior in practice.
Comment by levocardia 13 hours ago
Usually you have to go collect data first, then analyze it, then (in an ideal world where science is well-incentivized) replicate your own analysis in a second wave of data collection doing everything exactly the same. Psychology has actually gotten to a point where this is mostly how it works; many other fields have not.