
Nice write-up! Any info on the circumstances under which gradient boosted trees perform better than traditional random forests?


(This answer is based on limited practical experience from about ten years ago, but at least the theory doesn't go out of date.)

A random forest is less prone to overfitting because each tree in the ensemble is trained independently: if the base tree doesn't overfit, a random forest of them won't either. Trees in a boosted model are not independent; boosting trains a sequence of models where model n is fit to correct the errors of the previous models.
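A toy sketch of what "model n depends on the previous models" means, using plain Python and single-split "stumps" on 1-D data (an illustration of gradient boosting with squared loss, not production code — the function names and toy data are my own):

```python
import random

def fit_stump(xs, ys):
    """Single-split regressor: pick the threshold minimizing squared error."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def boost(xs, ys, n_rounds=50, lr=0.1):
    """Gradient boosting with squared loss: each stump is fit to the
    residuals left by the current ensemble, so stump n depends on
    stumps 1..n-1 -- the sequential dependence described above."""
    f0 = sum(ys) / len(ys)          # initial model: the mean
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: f0 + lr * sum(s(x) for s in stumps)

# toy data: a noisy step function
random.seed(0)
xs = [i / 50 for i in range(100)]
ys = [(1.0 if x > 0.5 else 0.0) + random.gauss(0, 0.1) for x in xs]

model = boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Each round shrinks the remaining residual, which is exactly why boosting can keep driving training error down — and, without a learning rate or early stopping, keep fitting noise.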

This is a double-edged sword: with enough data and controls to prevent overfitting, boosting will usually give you better predictive accuracy. A random forest is much more idiot-proof against overfitting, but on a large dataset it won't perform as well as a boosted model that has been trained without overfitting.
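For contrast, here is the bagging side (a minimal random-forest-style sketch on the same kind of toy data — again my own illustration, omitting the per-split feature subsampling a real random forest adds):

```python
import random

def fit_stump(xs, ys):
    """Single-split regressor: pick the threshold minimizing squared error."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x < t else rm

def bagged(xs, ys, n_trees=30):
    """Bagging: every stump trains independently on its own bootstrap
    sample, and the ensemble is just their average -- adding more trees
    averages out variance instead of chasing residuals, which is why
    a forest is hard to overfit by making it bigger."""
    stumps = []
    for _ in range(n_trees):
        idx = [random.randrange(len(xs)) for _ in xs]   # bootstrap sample
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(s(x) for s in stumps) / len(stumps)

# toy data: a noisy step function
random.seed(1)
xs = [i / 50 for i in range(100)]
ys = [(1.0 if x > 0.5 else 0.0) + random.gauss(0, 0.1) for x in xs]

forest = bagged(xs, ys)
mse = sum((forest(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Note that the trees here could all be fit in parallel — none of them sees another tree's output, which is the independence property the comparison above hinges on.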



