Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Some great advice here on crawling at scale, which has inspired our crawlers a lot : http://news.ycombinator.com/item?id=4367933

Basically it boils down to three things: 1. If the site is slow,crawl slooowly. 2. If you see non-200 http error codes, stop! 3. Obey robots.txt and speed restrictions.



Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: