
Yes!

My Clojure scraping framework [0] facilitates that kind of workflow, and I’ve been using it to scrape and restructure massive sites (millions of pages). I guess I’m going to write a blog post about scraping with it at scale. It doesn’t really scale much beyond that, since it’s meant for single-machine loads at the moment, but it could be extended to support that kind of workflow rather easily.

[0]: https://github.com/nathell/skyscraper
