I’ve been working a lot with Bayes factors lately. I don’t want to sound cultish, but I think part of the issue is this stuff doesn’t work “half way”. As soon as you’re talking about the null hypothesis and Bayes factors, you’re mixing up two schools of thought that don’t play nice.
Bayes factors work with comparing models. There is no null model. What, 0% effect? Ok, there was a non-zero effect. That model loses since it put the probability of 0% at 1 and everything else at 0. And if you do anything else, you’re encoding some amount of belief into the model, some judgment you’ve made.
So, you need to pick two models and compare them. I’m not saying this is right for science. It’s working well for my purposes. One model meaning “as planned”, one model meaning “not as planned”, use the Bayes factor to decide if things are going as planned. But you do need to be explicit about what models you’re comparing. You have to be able to just put some data in and get a probability back, or it’s not going to work.
This is what makes this criticism of Bayes factors so unpersuasive. They’re very easy to calculate, but they’re never calculated here! It’s just the ratio of marginal likelihoods, the probability of the data under the model.
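For what it’s worth, here’s a minimal sketch of that calculation on a single made-up observation. The two models, the noise level, and the Uniform(1, 10) prior are all invented, purely to show what “ratio of marginal likelihoods” cashes out to:

```python
# Minimal sketch of a Bayes factor for one observation, comparing two
# fully specified (hypothetical) models of a measured effect.
from scipy import stats
from scipy.integrate import quad

x_obs = 1.0  # hypothetical observed effect, in percentage points

# Model A ("as planned"): effect is exactly 0, measurement noise N(0, 1)
marginal_A = stats.norm(0, 1).pdf(x_obs)

# Model B ("not as planned"): effect drawn from a Uniform(1, 10) prior,
# same N(effect, 1) measurement noise; marginalize the effect out
marginal_B, _ = quad(lambda e: stats.norm(e, 1).pdf(x_obs) / 9.0, 1.0, 10.0)

print(f"p(x|A) = {marginal_A:.4f}")
print(f"p(x|B) = {marginal_B:.4f}")
print(f"Bayes factor A/B = {marginal_A / marginal_B:.2f}")
```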
So does traditional Neyman–Pearson hypothesis testing.
> There is no null model.
Why can’t there be?
> What, 0% effect? Ok, there was a non-zero effect. That model loses since it put the probability of 0% at 1 and everything else at 0%.
Well, if your null hypothesis is deterministic and says 0% effect, getting anything other than 0% absolutely will make you reject the null hypothesis. But most of the time hypotheses are not deterministic. Usually you sample random variables.
> And if you do anything else, you’re encoding some amount of belief into the model, some judgment you’ve made.
Traditional hypothesis testing is a particular case of minimising risk, ie the expected value of your loss given possible models and your decision rule. You don’t assume any belief on the probability of a specific model to be true, thus I think it is incorrect to claim that you encode a belief. You don’t even claim that it can be measured.
Of course, that makes it impossible to quantify the risk over all possible models. Thus, you only deal with Type I and Type II errors, whose values presume that the null or the alternative hypothesis, respectively, is correct.
If you have a probability measure for models, you can simply average your risks over it and get what is known as Bayes risk. That would be encoding some belief.
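A toy sketch of that distinction (all numbers invented): the frequentist analysis stops at the two conditional error rates, and a Bayes risk only appears once you are willing to weight them by a prior over the hypotheses.

```python
# Toy sketch: per-hypothesis risks vs. Bayes risk under a prior.
# Test H0: X ~ N(0, 1) against H1: X ~ N(2, 1); decide H1 whenever x > 1.5.
from scipy import stats

threshold = 1.5
type_I = 1 - stats.norm(0, 1).cdf(threshold)   # P(choose H1 | H0 true)
type_II = stats.norm(2, 1).cdf(threshold)      # P(choose H0 | H1 true)
print(f"Type I error:  {type_I:.3f}")
print(f"Type II error: {type_II:.3f}")

# Only if you also put a prior on the hypotheses (say P(H0) = 0.7) can you
# average these conditional risks into a single Bayes risk under 0-1 loss.
prior_H0 = 0.7
print(f"Bayes risk: {prior_H0 * type_I + (1 - prior_H0) * type_II:.3f}")
```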
You've sliced up what I've said to the point it doesn't really make sense. This is exactly what I said was confusing. I'm talking about Bayes factors and you're talking about null hypothesis testing.
I'll just answer your question about why there can't be a null model. You can have a hypothesis that says all differences between groups are due to chance. To make this a statistical model, something that can calculate the probability of an event, you have to make assumptions. Maybe it's just about the distribution. Maybe it's independence. But it's always something. You said it yourself: "You don’t assume any belief on the probability of a specific model to be true." To be a statistical model, to calculate the probability of an event, to calculate marginal likelihoods, to calculate Bayes factors, you have to do that.
This is largely a philosophical point. You can have a null model, something you pick to represent "no effect". But there's no such thing as the null, a belief-free model that's categorically different from a model with priors.
If there's a belief-free model that can give a marginal likelihood, then I'm wrong. I'd also very much like to know about it.
I think there is some confusion going on. Nobody claims that there is the null hypothesis. I think you are tilting at windmills.
Let’s say you study P. You know that P belongs to the family 𝒫. For example, 𝒫 = {N(μ,σ²): μ∈ℝ, σ²>0}. To be aware of 𝒫 is a prerequisite to do any sort of testing. For example, to test H0: μ=0 vs H1: μ≠0. After all, a typical hypothesis test is just a likelihood ratio test.
What you don’t have to know or even to assume existence of—if you don’t do Bayesian stuff—is a probability measure Π on 𝒫 (and its appropriate sigma-field). That’s the philosophical difference. But you have to have a well-defined 𝒫 either way.
The prior is Π. Existence of 𝒫 means that you have a family of models, but it doesn’t force you to assume any priors about those models. I don’t see how not having Π makes P∈𝒫 less of a model. You are still allowed to do conditional reasoning, e.g. the aforementioned Type I and Type II errors.
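To make the non-Bayesian side concrete, here’s a sketch of the likelihood ratio test of H0: μ=0 within 𝒫 = {N(μ,1): μ∈ℝ} on simulated data (variance fixed at 1 to keep it short); no Π appears anywhere:

```python
# Sketch: likelihood ratio test of H0: mu = 0 vs H1: mu != 0 for
# X_i ~ N(mu, 1). Data are simulated; no prior over mu is used.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=50)   # hypothetical sample

def loglik(mu):
    return np.sum(stats.norm(mu, 1).logpdf(x))

# Under H0 mu is fixed at 0; under H1 the MLE is the sample mean.
lr_stat = 2 * (loglik(x.mean()) - loglik(0.0))   # -2 log(likelihood ratio)
p_value = stats.chi2(df=1).sf(lr_stat)           # chi-square null distribution
print(f"LR statistic = {lr_stat:.2f}, p-value = {p_value:.4f}")
```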
I think this approach to non-Bayesian hypothesis testing is dangerous or misleading.
> Let’s say you study P. You know that P belongs to the family 𝒫. For example, 𝒫 = {N(μ,σ²): μ∈ℝ, σ²>0}. To be aware of 𝒫 is a prerequisite to do any sort of testing.
Okay, sure: you’ve decided on a family of probability distributions. It’s surely an approximation (very few things you would test are actually Gaussian — for one thing, the negative tail often makes no sense).
> For example, to test H0: μ=0 vs H1: μ≠0. After all, a typical hypothesis test is just a likelihood ratio test.
This is indeed the usual formulation.
> What you don’t have to know or even to assume existence of—if you don’t do Bayesian stuff—is a probability measure Π on 𝒫 (and its appropriate sigma-field). … But you have to have a well-defined 𝒫 either way.
If 𝒫 is well defined, then there is some probability that H0 is true. But H0 occupies a lower-dimensional space than 𝒫 — it’s a measure-zero subset. Most probability measures on 𝒫 (and all measures that are continuous in their parameters) give zero probability to H0. So (in Bayesian terms) H0 is a priori wrong w.p. 1. And in non-Bayesian terms, you’re calculating the likelihood of your measurements under two competing hypotheses, one of which is correct w.p. 0 even conditioned on one of the two hypotheses being correct.
And this results in what I consider to be useless headline results:
“This intervention has an effect with significance 0.02” — great, of course it has an effect. What is the effect? Can you say anything intelligent about effect size? Did you even try?
“We did not find significant evidence that some intervention causes some undesirable effect” — great, but that’s actually a statement about your trial and conveys essentially no information about whether the effect is there. I can do a study with n=1 and fail to find significant evidence of anything! But I also learn nothing! Why didn’t you either (a) come up with an actual reasonable hypothesis and test that or (b) put some confidence bounds on the size of the undesirable effect?
And you can do (b) without a Bayesian prior as long as you choose your hypothesis well. “Our data is inconsistent with the intervention causing the undesired effect in more than 0.001% of cases” with some clarification as to what “inconsistent” means.
What is dangerous or misleading? It is a formulation I learned in my graduate statistics course. It is the one that makes the most sense and requires the least amount of handwaving.
> But H0 occupies a lower-dimensional space than 𝒫 — it’s a measure-zero subset.
In the non-Bayesian framework your hypotheses don't have to be part of a measurable structure at all. Nevertheless, if you have a measure, it doesn't have to be zero at every point. I think it is quite intuitive to see. Let's look at two questions: (1) Does X have an effect? (2) How large is that effect? If your prior puts a non-zero probability that the answer to the first question is "No", then the prior for the second question will have non-zero mass at point 0, even though the probability of any other point may be zero.
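A rough sketch of such a prior (a point mass at zero plus a continuous part), updated with one noisy observation; the mixing weight, slab width, noise level and data point are all made up:

```python
# Sketch of a "spike and slab" prior: nonzero probability that the effect
# is exactly 0, plus a continuous distribution for the effect otherwise.
from scipy import stats
from scipy.integrate import quad

p_zero = 0.3                 # prior P(effect is exactly 0) -- the spike
slab = stats.norm(0, 2)      # prior density for the effect, if there is one
noise_sd = 1.0
x_obs = 1.2                  # hypothetical observation = effect + noise

# Marginal likelihood of the observation under each branch of the prior
lik_zero = stats.norm(0, noise_sd).pdf(x_obs)
lik_slab, _ = quad(lambda e: slab.pdf(e) * stats.norm(e, noise_sd).pdf(x_obs), -20, 20)

post_zero = p_zero * lik_zero / (p_zero * lik_zero + (1 - p_zero) * lik_slab)
print(f"Posterior P(no effect | data) = {post_zero:.3f}")
```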
> And this results in what I consider to be useless headline results
These headlines don't have anything to do with Bayesian vs non-Bayesian as far as I see. People not doing power analysis is a people problem, not an issue with a statistical framework.
I’ve never taken graduate statistics, but I’m generally not impressed with the state of statistics or statistics education. Just because almost every university teaches almost every student this (and most published papers work this way) doesn’t mean it’s good. Intuitive, sure. Wise, not so much.
> Let's look at two questions: (1) Does X have an effect? (2) How large is that effect? If your prior puts a non-zero probability that the answer to the first question is "No", then the prior for the second question will have non-zero mass at point 0, even though the probability of any other point may be zero.
Even ignoring whether it makes sense to have a nonzero prior for X having no effect, it’s generally not useful to learn the answer. An arbitrarily small effect is, for practical purposes, indistinguishable from no effect — an experiment that is intended to be useful should say something about the size of an effect. If a pill helps depression enough to be worth taking the pill, that’s one thing. If it helps in the sense that you could dose literally everyone in the world and one person would feel very slightly better for a day, that’s not helpful. Similarly, the existence of an effect says nothing about the sign of the effect.
So I think hypotheses being tested should be useful. If you want to determine whether something is useful, at least set a threshold for usefulness and test that. Or come up with a quantitative measure. The Bayesian-vs-frequentist debate is IMO somewhat secondary to this except insofar as it seems less common to make worthless but mathematically correct Bayesian tests because thinking about priors at all requires some acknowledgment of whether a prior is remotely plausible.
Also, how exactly can you make a well defined experiment that can confirm the null hypothesis without putting something resembling a prior on the non-null hypothesis if the non-null hypothesis contains distributions that are arbitrarily close to null? I’m sure it’s doable, but it seems likely to be pretty messy if you dig in.
(There are exceptions. For example, the existence of a neutrino mass is very interesting irrespective of what that mass is. But even then, physics results like this generally put bounds on a value that is hypothesized to be zero instead of merely testing for zeroness, because every experiment has finite power to detect small effects, and the readers of an outcome of an experiment should care more about the detection limits than, say, the number of dollars the experiment cost.)
The point of priors is to represent beliefs about reality. It would certainly be surprising if you had a zero prior on the neutrino having no charge. Why can't people have similar beliefs about drugs?
> an experiment that is intended to be useful should say something about the size of an effect
I agree. Nothing stops people who study depression drugs from doing power analysis and reporting those numbers.
> how exactly can you make a well defined experiment that can confirm the null hypothesis
I think you refer to the common idea of rejecting or not rejecting H0. Well, from the point of view of statistical decision theory, the distinction between not rejecting and accepting just doesn’t exist. You either choose H0 or H1. If you had n=1, of course there is a high chance of a Type II error. Do a power analysis and design an appropriate experiment given your desired significance level, statistical power and effect size.
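For example, a back-of-the-envelope sample size calculation for a two-sided one-sample z-test; the significance level, power and standardized effect size below are arbitrary choices:

```python
# Sketch: sample size for a two-sided one-sample z-test, given a chosen
# significance level, power, and standardized effect size (all arbitrary).
import math
from scipy import stats

alpha, power, effect_size = 0.05, 0.80, 0.4   # effect in standard-deviation units

z_alpha = stats.norm.ppf(1 - alpha / 2)
z_beta = stats.norm.ppf(power)
n = math.ceil(((z_alpha + z_beta) / effect_size) ** 2)
print(f"Roughly n = {n} observations needed")   # about 50 with these numbers
```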
> Most probability measures on 𝒫 (and all measures that are continuous in their parameters) give zero probability to H0.
And some don't.
> So (in Bayesian terms) H0 is a priori wrong w.p. 1.
Or not.
> And in non-Bayesian terms, you’re calculating the likelihood of your measurements under two competing hypotheses, one of which is correct w.p. 0 even conditioned on one of the two hypotheses being correct.
If one of the two (H0 or H1) is correct and you don't know which one, then it could be either... Of course, if you knew a priori which one is correct you wouldn't be considering the other at all.
> Nobody claims that there is the null hypothesis.
Google “the null hypothesis”.
If you mean null model, then I’m not fighting against anyone. We all agree which null model to use is a choice to be made.
Otherwise, I’m not even sure what you’re trying to convince me of at this point. I’ll restate the essence of my first comment more concisely.
Bayes factors are a method of model comparison. You take the ratio of marginal likelihoods for two models given the data. Choosing a null model for this purpose requires more assumptions than doing null hypothesis testing with frequentist statistics. Mixing the schools of thought of Bayesian and frequentist makes things more confusing than operating within them individually. Bayes factors have other uses than null hypothesis testing.
Maybe you straight up disagree with one of those sentences to the point you could quote it and say “this is wrong because…”.
I have this feeling you got to the second paragraph of my first comment and started quoting stuff before reading it all the way through. Or maybe you just had some stuff you really wanted to talk about. Because my whole point was how calculating Bayes factors and the normal mentality of null hypothesis testing don’t play nice together, but can have other benefits. So a line-by-line comparison doesn’t really make sense.
> If you mean null model, then I’m not fighting against anyone. We all agree which null model to use is a choice to be made.
I don’t understand what difference you are trying to imply by drawing a distinction between a null model and a null hypothesis.
> Otherwise, I’m not even sure what you’re trying to convince me of at this point. I’ll restate the essence of my first comment more concisely.
I will try to make it as clear as possible.
> Bayes factors are a method of model comparison.
Are you implying that hypothesis testing isn’t? That’s just false. And I’ve explained why.
> You take the ratio of marginal likelihoods for two models given the data. Choosing a null model for this purpose requires more assumptions than doing null hypothesis testing with frequentist statistics.
And in frequentist statistics you just calculate the likelihood; you can’t integrate over your model probabilities to get a marginal likelihood because you don’t assume your models have a probability of being true. That’s the only extra assumption you have in Bayesian statistics. Everything else is the same. If you are saying that there are some other extra assumptions, that’s just false, as I’ve explained in my previous comments. There are no extra assumptions for a “null model” beyond putting a prior on it.
> Mixing the schools of thought of Bayesian and frequentist makes things more confusing than operating within them individually. Bayes factors have other uses than null hypothesis testing.
There is no confusing “mixing”. It’s just statistical decision theory. In the frequentist approach you calculate the risk of your decision rule for each model and call it a day. In the Bayesian approach you go one step further and average your risks using your priors to get the “total” Bayes risk.
Both approaches have uses other than null hypothesis testing. Null hypothesis testing is just a particular case of a decision problem with a 0-1 loss function. The loss is 0 if you have chosen the correct hypothesis and 1 if you have made a Type I or Type II error.
> Bayes factors work with comparing models. There is no null model. What, 0% effect? Ok, there was a non-zero effect. That model loses since it put the probability of 0% at 1 and everything else at 0. And if you do anything else, you’re encoding some amount of belief into the model, some judgment you’ve made.
> So, you need to pick two models and compare them. I’m not saying this is right for science. It’s working well for my purposes. One model meaning “as planned”, one model meaning “not as planned”, use the Bayes factor to decide if things are going as planned. But you do need to be explicit about what models you’re comparing. You have to be able to just put some data in and get a probability back, or it’s not going to work.
It is the same way with traditional hypothesis testing. You take two models and compare their likelihoods.
> It is the same way with traditional hypothesis testing. You take two models and compare their likelihoods.
With a Bayes factor you compare the marginal likelihood. You have to account for the weight of the parameters according to the priors. With a likelihood ratio, you pick the best parameters and take the ratio of those likelihoods.
This means a model used in a Bayes factor must be able to make predictions that follow probability axioms. Models in likelihood ratios don’t have this restriction.
I agree likelihood ratios and Bayes factors are similar. They’re also different.
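A small sketch of that difference on simulated data; the N(0, 1) prior on the mean for the Bayes factor side is a made-up choice:

```python
# Sketch: maximized likelihood ratio vs. Bayes factor on the same data.
# Null: X_i ~ N(0, 1). Alternative: X_i ~ N(mu, 1), mu unknown.
# The prior mu ~ N(0, 1) and the simulated data are made up.
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(1)
x = rng.normal(loc=0.2, scale=1.0, size=30)   # hypothetical sample

def loglik(mu):
    return np.sum(stats.norm(mu, 1).logpdf(x))

# Likelihood ratio: plug in the best-fitting mu (the sample mean)
lr = np.exp(loglik(x.mean()) - loglik(0.0))

# Bayes factor: average the likelihood ratio over the prior on mu
bf, _ = quad(lambda mu: np.exp(loglik(mu) - loglik(0.0)) * stats.norm(0, 1).pdf(mu), -5, 5)

print(f"Likelihood ratio (alt/null): {lr:.2f}")
print(f"Bayes factor     (alt/null): {bf:.2f}")  # never larger: the prior spreads the bet
```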
> With a Bayes factor you compare the marginal likelihood. You have to account for the weight of the parameters according to the priors. With a likelihood ratio, you pick the best parameters and take the ratio of those likelihoods.
Yeah, that's the difference that I mentioned. And seems very different from whatever "it put the probability of 0% at 1 and everything else at 0" is supposed to refer to.
> This means a model used in a Bayes factor must be able to make predictions that follow probability axioms. Models in likelihood ratios don’t have this restriction.
Models in likelihood ratios absolutely have to follow probability axioms, otherwise it would make no sense to apply probability axioms to study them.
> Choosing a null model for this purpose requires more assumptions than doing null hypothesis testing with frequentist statistics.
How so?
I could choose the same null model that predicts that the observation is distributed as, say, a standard Gaussian. What additional assumptions are required?
Mean and standard deviation up front, not from the sample. At least, anything else would go against my understanding of Bayes factors and how I’ve calculated them. You can do other stats, a t-test for example, without declaring that up front.
If my choice of null model is p(x)=exp(-x^2/2) and I get some observation xobs I can do frequentist things with it - like calculating the p-value p(|x|>|xobs|) for example - and I can compare it with some alternative model p’(x) using the Bayes factor p(xobs)/p’(xobs).
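In code that might look something like the following; the alternative model N(3, 1) and the observation are arbitrary placeholders:

```python
# Sketch: the same fully specified null model used two ways -- a frequentist
# two-sided p-value and a Bayes factor against some alternative model p'(x).
from scipy import stats

x_obs = 1.0
null = stats.norm(0, 1)      # p(x): standard Gaussian null
alt = stats.norm(3, 1)       # p'(x): hypothetical fully specified alternative

p_value = 2 * null.sf(abs(x_obs))              # P(|X| > |x_obs|) under the null
bayes_factor = null.pdf(x_obs) / alt.pdf(x_obs)
print(f"p-value = {p_value:.3f}, Bayes factor (null/alt) = {bayes_factor:.1f}")
```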
What does “Mean and standard deviation up front, not from the sample.” mean in this context?
Mean and standard deviation up front would mean that you set your null as a normal distribution with a mean of 0 and a standard deviation of 2%, for example. This is different from saying that you’ll just say it’s normal, and take the mean and standard deviation from the sample.
I mean, I could be wrong on this. You could do that if you want. I just think of Bayes factors as a competition and it needs to be “fair”. So it doesn’t make sense to let the null update as data comes in but not the alternative.
As you said "Bayes factors work with comparing models."
I don't think that there is a concept of "fairness" that prevents us from including in the comparison a very simple model.
"There is no null model. What, 0% effect?"
For example. But even if the true effect is zero you may get a non-zero observation because measurements are not perfect.
If what you measure is precisely what you want to know there is no need for statistical analysis!
Let's say for the sake of this example that you know that your measurement error is distributed normally with unit variance.
"Ok, there was a non-zero effect."
There was a non-zero _observation_.
"That model loses since it put the probability of 0% at 1 and everything else at 0."
It didn't. The prediction of the null model is that observations will be normally distributed around 0.
That's precisely what the author of the blogpost complains about: that if your observation is 1 the prediction of the null model (close to 0) was more accurate than the prediction of the second model (somewhere between 1 and 10) and the observation favours the former over the latter.
Pick a measure. That’s what I mean by “effect is 0%”. It’s a straw man here.
Pick a fully specified model. This is a model that, up front, you could ask what is the probability of event E? For a normal distribution, this would require choosing concrete mean and standard deviation.
Pick an under-specified model. This would be that it’s normal, but you don’t pick the mean and standard deviation. You pull them from the sample. As I’ve described it here, you can’t get P(E) from that.
The expectation from our alternative hypothesis and model is that it’s fully formed before we look at the data. It’s a choice whether you want that to be the case or not with the null model. “Fair” as I’m describing it is that you would pick something.
A> Pick a measure. That’s what I mean by “effect is 0%”. It’s a straw man here.
I don't understand what you mean by "pick a measure" but maybe the "it’s a straw man here" (that I don't really understand either) indicates that looking at the other two options is enough.
B> Pick a fully specified model. This is a model that, up front, you could ask what is the probability of event E? For a normal distribution, this would require choosing concrete mean and standard deviation.
Ok. That seems to describe a simple classical null hypothesis like the example I gave in my previous comment. The underlying thing of interest is zero and the sampling distribution for the data is normally distributed around zero.
C> Pick an under-specified model. This would be that it’s normal, but you don’t pick the mean and standard deviation. You pull them from the sample. As I’ve described it here, you can’t get P(E) from that.
That is not the kind of null hypothesis I gave in my example, I think we can agree on that.
> The expectation from our alternative hypothesis and model is that it’s fully formed before we look at the data.
I don't understand that sentence. What is "it" that is fully formed before we look at the data? The alternative hypothesis and model?
> It’s a choice whether you want that to be the case or not with the null model.
What is "that"? Being formed before we look at the data? (In that case I hope that the null model I described would satisfy that.)
> “Fair” as I’m describing it is that you would pick something.
Pick something of what? I'm completely lost, I'm afraid.
I'm just saying that I can have a null model of the form B like in the example "the underlying thing is zero and the data generated by this model has a probability distribution p(x)=exp(-x^2/2)".
And I can compare that model with any other model described by a probability distribution for the underlying thing which, taking into account the measurement error, results in a probability distribution p’(x) for the data generated.
“Pick a measure” just meant that you’re predicting the difference will be exactly 0%. P(0) = 1.
The difference between a Bayes factor and a likelihood ratio is Bayes factor uses the marginal likelihood. So you need to pick your parameters ahead of time, weighted by priors. With a likelihood ratio, you can use the best parameters given the data.
You can do the likelihood ratio in an objective way, because you’re choosing whatever has the least error given the data. With a Bayes factor you can’t be totally objective; you need to choose ahead of time. The upside is that it reduces overfitting.
> “Pick a measure” just meant that you’re predicting the difference will be exactly 0%.
The difference of what? If you mean for example the difference between the population means of two groups that doesn’t mean that the observed difference between two sample means is zero. A non-zero observation doesn’t mean that “the model loses”. A non-zero observed difference is not just something that can happen, it’s what is expected.
If you mean that the null hypothesis is really “the difference between the observed means is exactly zero” that doesn’t seem very useful and I’ve never seen anyone do that. You don’t need statistics of any kind to reject the model “the observation is zero” when the observation is not zero.
Apart from that I agree that different hypothesis testing procedures do different things and their respective merits are debatable. My point was just that if you have a well-defined “null” model you can do different things with it. Using the same exact model.
> that doesn’t seem very useful and I’ve never seen anyone do that.
Yes. That’s why it’s a straw man. I’m being sort of uncharitable in that description. My point is that’s a starting point. To go from there, you need to choose a model which will have parameters or priors.
That’s where the « null hypothesis » comes in. If it fixes the parameters in the model you get a well-defined « null » model with a well-defined probability distribution for the observation and - just like you can take this null hypothesis model and do frequentist calculations with it - you can take this model and calculate a Bayes factor relative to some other model.
(To be clear, if the null hypothesis doesn’t fully specify the parameters, the preceding paragraph doesn’t apply and the situation is more complex.)
It provides a null hypothesis. You can compare two hypotheses without any of them designating absence of effect. You don't have to have "the null hypothesis" in some philosophical sense to do a t-test and whatnot.
The statistical interpretation of observations is so subtle and complex that it's a good idea to assume that any publication from the empirical sciences is complete garbage, until you know for sure that a qualified statistician has supervised the process. A semester of "introduction to statistical methods" (which is all the background that most scientists have) is NOT enough.
Imagine a mathematician writing a paper on a medical topic, making all kinds of claims on how things work in the human body – and then that mathematician justifies their expertise by saying "I did a two-week first aid course once, and also, I was really good at biology in school". This is pretty much how lots of science operates when it comes to interpreting results mathematically.
I read a survey once that found that a huge number of PhDs/researchers in the studied sample gave an incorrect definition of what a "95% confidence interval" (/p-value, etc) actually means, and that several popular introductory textbooks defined it incorrectly as well. Wish I'd bookmarked it.
At bare minimum, journals need to require that researchers publish all their data alongside every paper, so statistical analyses can be redone and flaws can be spotted.
I think you may be talking about "Mindless statistics" by Gigerenzer. He has some surveys about p-values and how radically wrong they are usually interpreted.
> At bare minimum, journals need to require that researchers publish all their data alongside every paper, so statistical analyses can be redone and flaws can be spotted.
P-values work great when they’re super low, experiments run at a human-scale frequency, and hypotheses are extremely precise in their predictions, e.g. some physics.
If you run an experiment a day and get p < 10^-9, your priors, your multiple hypothesis correction, even your interpretation of p-values approximately don’t matter. Running social sciences experiments with p < 0.05 threshold is where things get weird.
Probably this article:
Hoekstra, R., Morey, R.D., Rouder, J.N. et al. Robust misinterpretation of confidence intervals. Psychon Bull Rev 21, 1157–1164 (2014). https://doi.org/10.3758/s13423-013-0572-3
Another good article on misinterpretation of p-values and confidence intervals is:
Greenland, S., Senn, S.J., Rothman, K.J. et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol 31, 337–350 (2016). https://doi.org/10.1007/s10654-016-0149-3
While I agree on the data point, it would kill so much research. It is bad that a lot of research validation basically comes down to "trust me guys", but with data being both very valuable and oftentimes highly sensitive, it can be really difficult to just publish the data along with the research.
A decent compromise would be to at least require metadata sufficient to exclude some flaws. A different approach could be to have researchers document and publish the process of the research, similar to a git repo with the main branch being completely off limits to history rewriting.
On the other hand it seems there's also a lack of testing subjects. It's already frequently pointed out how medical results might not represent everyone. I would also assume that e.g. pharmaceutical certification processes do apply more sophisticated statistics.
The ugly truth is that lots of "science" being done today isn't actually science – it's a performance art that superficially imitates certain behaviors that are associated with real science.
And how could it be otherwise? There are nearly 10 million scientists in the world right now. And all of them are pushing out papers as fast as humanly possible. There isn't anywhere near enough statistical brainpower available to quality-control all of that. Not to mention that most people with sufficient expertise in statistics have better things to do than micromanaging science grad students who have a hard time comprehending Bayes' theorem.
I screwed around with trying to compute Bayes factors for models of distributions over set partitions, having been led astray by Bayesian phylogenetic inference methods. It was a waste of time--in practice the epistemology was terrible because the choice of prior distributions had such a huge effect on model comparisons. On top of that, the computations were highly unstable so I had to do a lot of fancy multi-temperature MCMC stuff that never quite worked.
Unless your priors are based on actual observations, stick with model selection approaches that are based on measured predictive power, or at least plausible approximations thereof, e.g. Aki Vehtari et al.'s LOO-CV (approximate leave-one-out cross-validation).
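As a rough illustration of what scoring by measured predictive power looks like, here is brute-force exact leave-one-out cross-validation for two toy models of simulated data; a real problem would use the PSIS-LOO approximation from Vehtari et al. instead of refitting n times:

```python
# Toy sketch of model selection by leave-one-out predictive density,
# comparing two simple models of simulated heavy-tailed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = rng.standard_t(df=3, size=100)    # hypothetical data with heavy tails

def loo_logscore(data, fit_and_logpdf):
    """Sum of log predictive densities, each point predicted from the rest."""
    total = 0.0
    for i in range(len(data)):
        train = np.delete(data, i)
        total += fit_and_logpdf(train, data[i])
    return total

# Model 1: Gaussian with mean/sd taken from the training fold
def gaussian(train, held_out):
    return stats.norm(train.mean(), train.std(ddof=1)).logpdf(held_out)

# Model 2: Student-t with fixed df=3, crude location/scale from the training fold
def student_t(train, held_out):
    return stats.t(df=3, loc=train.mean(), scale=train.std(ddof=1)).logpdf(held_out)

print("LOO log score, Gaussian :", round(loo_logscore(y, gaussian), 1))
print("LOO log score, Student-t:", round(loo_logscore(y, student_t), 1))
```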
Bayesian methods are not easy to use, I agree. But that's because they're trying to answer much more meaningful (but harder) questions than frequentist ones, which researchers should be trying to do. You can't ignore Bayesian epistemology just by not using Bayesian methods. The underlying considerations in the Bayesian framework will inevitably become relevant to how you interpret your data, whether or not you use a formal Bayesian method.
The thing is, formally, frequentist methods like Null Hypothesis Significance Testing don't tell you what you really want to know. If you get a significant p-value, that means the data you observed wouldn't often happen by chance (within your model of the null). This doesn't actually tell you if your particular hypothesis should be favored. That requires other considerations, including ones that Simonsohn is negative about in this article.
For example, Simonsohn's conclusion says:
> To use Bayes factors to test hypotheses: you need to be OK with the following two things:
> 1. Accepting the null when “the alternative” you consider, and reject, does not represent the theory of interest.
> 2. Rejecting a theory after observing an outcome that the theory predicts.
He implies that these should be points against Bayes factors. But #2 is something you actually should do sometimes. Demonstrably. If the data suggests a wildly implausible effect size that doesn't show up consistently in other analyses, that should be a point against your theory and in favor of some more mundane explanation, like noisy data from an underpowered study [1].
Not using Bayesian methods is understandable if you don't feel comfortable with the very heavy demands they can make on your statistics acumen. But if you're, say, a social scientist incentivized to get "sexy" results and you refuse to engage with Bayesian epistemology at all, your career will almost certainly just be a contribution of more noise publications to the replication crisis.
> Bayesian methods are not easy to use, I agree. But that's because they're trying to answer much more meaningful (but harder) questions than frequentist ones
I think your view about the difference between Frequentist and Bayesian methods is wrong. There is this rant I like from Larry Wasserman [1] on the subject:
My opinions have shifted a bit. [...] Bayes-Frequentist debate still matters. And people — including many statisticians — are still confused about the distinction. I thought the basic Bayes-Frequentist debate was behind us. A year and a half of blogging (as well as reading other blogs) convinced me I was wrong here too. And this still does matter.
My emphasis on high-dimensional models is germane, however. In our world of high-dimensional, complex models I can’t see how anyone can interpret the output of a Bayesian analysis in any meaningful way.
I wish people were clearer about what Bayes is/is not and what frequentist inference is/is not. Bayes is the analysis of subjective beliefs but provides no frequency guarantees. Frequentist inference is about making procedures that have frequency guarantees but makes no pretense of representing anyone’s beliefs. In the high dimensional world, you have to choose: objective frequency guarantees or subjective beliefs. Choose whichever you prefer, but you can’t have both. I don’t care which one people pick; I just wish they would be clear about what they are giving up when they make their choice.
[...]
Of course, one can embrace objective Bayesian inference. If this means “Bayesian procedures with good frequentist properties” then I am all for it. But this is just frequentist inference in Bayesian clothing.
Having a specific opinion isn't inherently a bias, and it's rather uninspiring to put effort into a substantial comment and then get a reply which is essentially nothing but a baseless accusation of bias followed by a long block quote of unclear relevance (it's extremely far from true that all statistical analyses worth consideration are in high dimensions).
Hm, that line doesn't seem bad to me. But I can certainly edit it if you think it's harsh in some way. Which would you prefer?:
1. "I think your view about the difference between Frequentist and Bayesian methods is biased"
2. "I think your view about the difference between Frequentist and Bayesian methods is wrong".
3. Any other suggestion?
A Bayesian proponent wanting an objective standpoint ... Sorry, could not resist :) But I edited the comment. Sorry if it ruined your mood. I shared it with good intentions, not to pick a fight.
If the minimum wage is increased $4, the competing explanations seem to be:
1. Change in unemployment is normally distributed with mean 0% and standard deviation 0.606%.
2. Change in unemployment is uniformly distributed between 1% and 10%.
I don't really agree that "(1) vs (2)" is a particularly good formulation of the original question ("Would raising the minimum wage by $4 lead to greater unemployment?"). But if it were, how would the math work out?
If we observe that unemployment increases 1%, then yes, that piece of evidence is very slightly in favor of explanation (1). This doesn’t feel weird or paradoxical to me. But surely we wouldn’t want to decide the matter based just on that one inconclusive data point? Instead we would want to look at another instance of the same situation. In that case, an increase of, say, 6% would (almost) conclusively settle the matter in favor of (2), and an increase of, say, 0.8% would (absolutely) conclusively settle the matter in favor of (1).
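Here’s a quick check of that arithmetic, taking the two stated explanations at face value as distributions for the observed change (both are fully specified, so the Bayes factor is just a ratio of densities):

```python
# Quick check of the three scenarios above, treating (1) and (2) as
# distributions for the observed change in unemployment (percentage points).
from scipy import stats

model_1 = stats.norm(0, 0.606)            # (1) N(0%, 0.606%)
model_2 = stats.uniform(loc=1, scale=9)   # (2) Uniform(1%, 10%)

for change in (1.0, 6.0, 0.8):
    p1, p2 = model_1.pdf(change), model_2.pdf(change)
    if p2 == 0:
        print(f"observed {change}%: impossible under (2), so (1) wins outright")
    else:
        print(f"observed {change}%: Bayes factor (1)/(2) = {p1 / p2:.3g}")
# Expect roughly: BF ~ 1.5 at 1%, BF ~ 0 at 6%, and (2) ruled out at 0.8%.
```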
So you have just one data point and you want to do statistics about it? No matter what you do, the results won't be useful.
In the Bayesian approach, you start with some distribution that is a wild guess and doesn't even need to be based on any knowledge beyond the basics of how money works and the fact that unemployment cannot be 0% or 100%. Each data point refines your distribution until, at some dataset size, it converges to something that approximates reality.
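A minimal sketch of that refinement using a conjugate normal model (prior, noise level and data all invented): the wild-guess prior washes out after a handful of points.

```python
# Sketch: a wide, weakly informed prior refined by data via the standard
# conjugate normal-normal update. All numbers are invented.
import numpy as np

prior_mean, prior_var = 0.0, 25.0    # wild guess: effect ~ N(0, 5^2)
noise_var = 1.0                      # assumed known observation variance

rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, scale=1.0, size=20)   # "true" effect of 2.0

mean, var = prior_mean, prior_var
for i, x in enumerate(data, start=1):
    new_var = 1.0 / (1.0 / var + 1.0 / noise_var)   # combine precisions
    mean = new_var * (mean / var + x / noise_var)   # precision-weighted mean
    var = new_var
    if i in (1, 5, 20):
        print(f"after {i:2d} points: posterior mean {mean:5.2f}, sd {var ** 0.5:.2f}")
```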
> Note: By theory I merely mean the rationale for investigating the effect of x on y. A theory can be as simple as “I think people value a mug more once they own it”.
Hoo boy, the [2019] is well deserved on this one -- that’s a Dan Ariely reference from before The 2021 Accusation and before the recent NPR story refuting his excuse[1].
I could swear one of his research projects involved asking people to make a mug and pricing it out after they finished. But I guess it was Kahneman that researched mugs? Whoops
p-values aren't problematic. How people use them is.
Same with Bayes factors. I've seen people claim "anything above 3 is significant".
Incidentally, the theory behind p-values is actually beautiful, and p-values can generalise really well in theory, but in practice most people don't know this.
E.g., did you know that you can have "Bayesian" p-values? (in the sense that the p-value can be designed to take priors and other models into account, without violating its definition in any way)
Milton Friedman was right: because the true minimum wage is $0.00 (unemployment), he was correct to compare a wage increase to the null hypothesis. The potshot in the opening paragraph ("Milton feels bad about the unemployed but good about his theory.") is simultaneously an appeal to emotion and a presumptuous ad hominem.
Potshot? It just seems like a joke, but one that puts this character (a nod to Milton Friedman but not like a serious insert) in a positive light. He’s pleased by being correct but sympathetic since he was right about something bad happening to people (unemployment).
The point isn't that Friedman is right or wrong, but that the statistical model tells him to reject his hypothesis, even though he observed a result consistent with his hypothesis.
>is simultaneously an appeal to emotion and a presumptuous ad hominem.
Because it makes Friedman heartless. He feels bad but he still promulgated theories which wreaked the badness he felt bad about. So it goes to character.
If he _really_ felt bad, he'd have done what Norbert Wiener did and moved out of the field. He stayed an economist. Not so bad a feeling, eh?
Predicting a negative effect is extremely common in science affecting humans. No reason to abandon a field.
>but he still promulgated theories which wreaked the badness he felt bad about
No. His prediction was that increasing the minimum wage leads to increased unemployment. He predicted a negative effect and a negative effect happened.
None of this has to do with article, which makes a very simple point about statistics.
> None of this has to do with article, which makes a very simple point about statistics.
I am addressing a 'how is this ad hom' question, not Friedman's character or the article. I seek to explain what view of him would be a critique of character, not substance.