It is very strange that this is on a main page. The key thing is that the likelihood is the probability density of your data! I.e. if your probability density is a Gaussian N(0, 0.00001), then the likelihoods of data points next to the mean will be very large; if your PDF is N(0, 10000), they'll be very small.
Furthermore, the amount of data matters, since the per-datapoint likelihoods get multiplied together: if they were small to begin with, the product will be even smaller, and if they were large it will be larger.
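A quick numerical sketch of that point (my own illustration, not from the comment; it reads the second parameter of N(·,·) as a variance and uses numpy/scipy):

```python
import numpy as np
from scipy.stats import norm

x = 0.001  # a data point right next to the mean

# Tight Gaussian: the density at x is far above 1 (it's a density, not a probability).
print(norm(loc=0, scale=np.sqrt(0.00001)).pdf(x))   # ~ 120

# Wide Gaussian: the density at the same x is tiny.
print(norm(loc=0, scale=np.sqrt(10000)).pdf(x))      # ~ 0.004

# With more data the per-point values multiply, so one usually adds logs instead.
data = np.random.default_rng(0).normal(0.0, 1.0, size=100)
print(norm(loc=0, scale=1).logpdf(data).sum())        # log-likelihood of the whole sample
```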
I think there is some language misunderstanding going on. The likelihood function is not a probability density. The likelihood function evaluated at D is equal to the probability density of D (by definition). In other words, f(x; theta) as a function of x is a probability density function, while f(x; theta) as a function of theta is a likelihood function. But f(x; theta) for a given x and theta is just a value, which, one can say, is both a likelihood and a probability density.
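To make that concrete, here is a small sketch using an Exponential(rate = theta) model, f(x; theta) = theta·exp(-theta·x) (my own example, not from the comment): read as a function of x it integrates to 1, read as a function of theta it generally does not.

```python
import numpy as np

def f(x, theta):
    # density / likelihood of one observation under Exponential(rate=theta)
    return theta * np.exp(-theta * x)

grid = np.linspace(0.0, 50.0, 200_001)

# As a function of x with theta fixed: a probability density, integrates to 1.
print(np.trapz(f(grid, theta=2.0), grid))        # ~ 1.0

# As a function of theta with x fixed: the likelihood; its integral is 1/x**2 here, not 1.
print(np.trapz(f(x=0.5, theta=grid), grid))      # ~ 4.0

# A single evaluation is just a number; call it a likelihood or a density value, as you like.
print(f(0.5, 2.0))                               # ~ 0.736
```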
Right. But if you make the notation slightly more explicit, then the integral of L(data, params) over data is 1. This follows from the independence assumption.
So we ARE working with a probability function. Its output can be interpreted as probabilities. It's just that we're maximizing L = P(events | params) with respect to params.
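A minimal numeric check of the integral part of that claim (my own sketch, assuming two i.i.d. N(mu, 1) observations with mu held fixed; not from the comment):

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

mu = 0.7  # parameters held fixed

def L(x2, x1):
    # likelihood of the two-point dataset (x1, x2): product of the per-point densities
    return norm.pdf(x1, loc=mu) * norm.pdf(x2, loc=mu)

# Integrate over the data space (x1, x2); the box is wide enough to hold essentially all the mass.
val, _ = integrate.dblquad(L, -12, 12, -12, 12)
print(val)  # ~ 1.0
```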
The likelihood function is a function of params for a fixed value of data and it is not a probability function.
There is another function - a function of data for fixed params - which is a probability density. That doesn’t change the fact that the likelihood function isn’t.
The independence has nothing to do with the integral being 1, to be honest. You could write a model where the observations are not independent, but the (multivariate) integral over their domain would still be 1.
If by “joint probability” you mean a function of (params, data), there is no joint probability here in general.
L(params, data) is constructed from a family of density functions p(data), one for each possible value of params. The integral of L(params, data) over params may be anything, or diverge. You don’t need any extra independence assumption either.
Or maybe you mean “joint probability” as p(data1, data2) when data is composed of two observations, for example. But you don’t need any independence assumption for that probability density to integrate to one! It necessarily does that - whether you can factorize it as p’(data1)p’’(data2) or not.
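For instance (my own sketch, not from the comment), a correlated bivariate normal does not factorize as p’(data1)p’’(data2), yet its joint density still integrates to 1:

```python
import numpy as np
from scipy import integrate
from scipy.stats import multivariate_normal

# Bivariate normal with correlation 0.8: the two observations are clearly not independent.
mvn = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.8], [0.8, 1.0]])

# Integrate the joint density over a box wide enough to hold essentially all the mass.
val, _ = integrate.dblquad(lambda y, x: mvn.pdf([x, y]), -12, 12, -12, 12)
print(val)  # ~ 1.0, with no independence assumption anywhere
```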
That's exactly the reason why the frequentist approach sucks, by the way ;) Parameters are treated specially and there is no internal consistency - to have it you need to introduce priors...
A likelihood could refer to data drawn from a discrete distribution, though. That wouldn't change much about how it's treated, but it would be a proper probability, not a probability density.
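A minimal sketch of the discrete case (my own example with Bernoulli coin flips, not from the comment): every factor in the likelihood is a probability mass, i.e. a genuine probability, and maximization works exactly the same way.

```python
import numpy as np
from scipy.stats import bernoulli

data = np.array([1, 0, 1, 1, 0, 1])  # observed coin flips

def likelihood(p):
    # product of P(x_i | p); each factor is p or (1 - p), a true probability in [0, 1]
    return np.prod(bernoulli.pmf(data, p))

# Grid-maximize over p: the MLE is the sample mean, 4/6.
ps = np.linspace(0.001, 0.999, 999)
print(ps[np.argmax([likelihood(p) for p in ps])])  # ~ 0.667
```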
I'm surprised it's here as well. Of all the interesting questions on CV, I would not consider this one of them. I wonder if this was sent through the second-chance pool.