Right now it has an index of ~70k conference talks, lectures, and speeches. I'm working on improving it: extracting slide text, measuring audio quality (for ranking), and adding more historical content.
I started out scraping sites manually and have been automating more of it since (a lot of sites use WordPress, so they're pretty structured). I'm working on a talk on the subject, so I'll have an article out soon that explains it better :)