Kaggle Ensembling Guide (mlwave.com)
51 points by jphilip147 on June 15, 2015 | 5 comments


Couple points:

a) I think one of the biggest challenges in a Kaggle competition is getting away from overfitting to the leaderboard. It's super common... I won a Kaggle competition last year, and was something like 65th place on the public leaderboard at the end: the other teams were overfitting like crazy. As such, one should be super careful when picking models that look 'well-performing' on the public leaderboard to build an ensemble.

b) The point about the ensembling of uncorrelated models is hella important. If you make an ensemble consisting of 20 near-identical predictions from one algorithm, and 10 near-identical predictions from another algorithm, you're in effect taking a vote between the two algorithms and giving the first one a 2/3 weighting.
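
To make the 2/3 arithmetic concrete, here is a minimal sketch in plain Python. The two prediction lists are made-up numbers, and the copies are exactly identical (the limiting case of "near-identical"), so treat it as an illustration rather than anything from a real competition.

```python
from statistics import mean


def ensemble_average(predictions):
    """Average a list of per-model prediction lists, element-wise."""
    return [mean(column) for column in zip(*predictions)]


# Hypothetical probabilities from two algorithms on four test rows.
algo_a = [0.9, 0.2, 0.6, 0.4]
algo_b = [0.3, 0.8, 0.6, 0.1]

# 20 identical copies of A plus 10 identical copies of B in one pool.
pool = [algo_a] * 20 + [algo_b] * 10
blended = ensemble_average(pool)

# The same blend written as an explicit 2/3 : 1/3 weighting of A and B.
weighted = [(2 * a + b) / 3 for a, b in zip(algo_a, algo_b)]

# The two agree up to floating-point rounding.
same = all(abs(x - y) < 1e-9 for x, y in zip(blended, weighted))
```

So the 30-model "ensemble" carries no more information than a two-model blend with fixed weights.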

It might be interesting to think about explicitly de-correlating the model outputs, and finding a nice 'voting' method for combining the results... (And actually, this comes down to Z_2 arithmetic, so we could probably use a Fourier transform for it... think I feel a blog post coming on.)
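
A much cruder starting point than the Fourier idea is to measure how correlated the model outputs are and drop the near-duplicates before voting. The sketch below assumes Pearson correlation and a 0.95 cutoff; the model names, predictions, and threshold are all hypothetical, and this is a greedy filter, not a real de-correlation scheme.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_near_duplicates(models, threshold=0.95):
    """Greedily keep a model only if it correlates below `threshold`
    with every model already kept (insertion order decides ties)."""
    kept = {}
    for name, preds in models.items():
        if all(pearson(preds, other) < threshold for other in kept.values()):
            kept[name] = preds
    return kept


# Hypothetical predictions on four test rows; names and values are made up.
models = {
    "gbm_seed1": [0.90, 0.20, 0.60, 0.40],
    "gbm_seed2": [0.91, 0.19, 0.61, 0.41],  # nearly a copy of gbm_seed1
    "linear": [0.30, 0.80, 0.60, 0.10],
}
kept = drop_near_duplicates(models)
```

With these numbers the second GBM run is filtered out and the vote is between one GBM and the linear model, instead of being 2-to-1 by construction.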


>> a) I think one of the biggest challenges in a Kaggle competition is getting away from overfitting to the leaderboard.

This actually depends on the data. The commenter above won the Social Circles competition. That competition had a very small number of instances - it looks like it was 60 in the training set and 50 in the test set. It had one of the larger shakeups in Kaggle history.


It's basically impossible to overfit to the leaderboard in some Kaggles like Avazu where both the train and test sets are massive in terms of unique observations.
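
A rough way to see why test-set size matters so much: the noise in a leaderboard score shrinks with the square root of the number of rows it is computed on. The sketch below assumes a plain accuracy metric, independent rows, and a made-up true accuracy of 0.8. Neither competition necessarily scored this way, so only the orders of magnitude are meaningful.

```python
from math import sqrt


def accuracy_standard_error(true_accuracy, n_rows):
    """Binomial standard error of an accuracy estimate on n_rows independent rows."""
    return sqrt(true_accuracy * (1 - true_accuracy) / n_rows)


# 50 rows, as in the small test set mentioned above (assumed accuracy of 0.8).
small = accuracy_standard_error(0.8, 50)

# One million rows, standing in for a "massive" test set (hypothetical size).
large = accuracy_standard_error(0.8, 1_000_000)
```

Under these assumptions the 50-row score wobbles by roughly 0.057 per standard error, versus about 0.0004 on a million rows, so a small public leaderboard can easily reward noise. A public leaderboard scored on only a fraction of the test set would be noisier still.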


> (And actually, this comes down to Z_2 arithmetic, so we could probably use a Fourier transform for it... think I feel a blog post coming on.)

Care to elaborate on the teaser? Pretty please.


Surprisingly good, both as a broad overview and in the specifics.



