
https://www.findlectures.com

Right now it has an index of ~70k conference talks / lectures / speeches. I'm working on extracting slide text and measuring audio quality (for ranking), and on adding more historical content.



I didn't know I wanted this.

How do you scrape all that data?


I wrote up some more info on data acquisition here (just notes right now) - https://www.findlectures.com/articles/2017/01/22/Software-Ar...


Thanks for the write up!


I started out scraping sites manually, then started automating more of the pieces (a lot of sites use WordPress, so they are pretty structured). I'm working on a talk on the subject, so I'll have an article soon that explains it better :)
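
For anyone curious what "pretty structured" can look like in practice, here is a minimal Python sketch. It is not the site's actual code (the stack is described elsewhere in the thread as mostly Node), and it assumes the target theme marks post titles with the `entry-title` class, which many WordPress themes do but not all. The `SAMPLE` markup, `EntryTitleParser`, and `extract_titles` are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag; skipped so nesting depth stays correct.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class EntryTitleParser(HTMLParser):
    """Collect the text of elements whose class list includes 'entry-title'."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._depth = 0    # > 0 while inside an entry-title element
        self._buffer = []  # text fragments of the title being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif "entry-title" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.titles.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_titles(html):
    """Return the post titles found in one page of (assumed) WordPress markup."""
    parser = EntryTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical listing page, shaped like a typical WordPress archive.
SAMPLE = """
<article class="post">
  <h2 class="entry-title"><a href="/talks/1">Intro to Search</a></h2>
</article>
<article class="post">
  <h2 class="entry-title"><a href="/talks/2">Scaling Crawlers</a></h2>
</article>
"""

print(extract_titles(SAMPLE))
```

Because the same class tends to show up across many WordPress sites, one small parser like this can often be reused with little per-site tweaking, which is roughly why automating those sites is easier than hand-written scrapers for each one.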


How do you pay for the bandwidth costs for scraping?


I'm doing everything over my home network


And how much does that cost?


What stack did you use?


It's mostly Node right now. I started documenting it in more detail (just notes right now) - https://www.findlectures.com/articles/2017/01/22/Software-Ar...



