
Please use the API. Do not scrape Wikipedia via the website.

What you're looking for is:

https://en.wikipedia.org/wiki/Special:Export

You can start with the index page, collect the titles of all the pages you're interested in, and then use the Special:Export API to download the XML (probably other formats too) of all those pages.
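
For what it's worth, here's a rough sketch of that second step in Python, assuming you already have your list of titles. It requests https://en.wikipedia.org/wiki/Special:Export/ followed by the page title for each page, one at a time. The function names, the User-Agent string, and the one-second pause between requests are my own placeholders, not anything Wikipedia prescribes, so adjust them to your situation.

```python
import time
import urllib.parse
import urllib.request

EXPORT_BASE = "https://en.wikipedia.org/wiki/Special:Export/"
# Placeholder value: replace with a string identifying your own tool and contact.
USER_AGENT = "example-export-script/0.1 (contact: you@example.com)"


def export_url(title):
    """Build the Special:Export URL for a single page title."""
    # Wikipedia URLs use underscores for spaces; percent-encode everything else.
    return EXPORT_BASE + urllib.parse.quote(title.replace(" ", "_"))


def fetch_xml(title, timeout=30):
    """Download the XML export of one page and return it as text."""
    request = urllib.request.Request(
        export_url(title), headers={"User-Agent": USER_AGENT}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def download_all(titles, delay_seconds=1.0):
    """Fetch each title in turn, pausing between requests (assumed delay)."""
    pages = {}
    for title in titles:
        pages[title] = fetch_xml(title)
        time.sleep(delay_seconds)
    return pages


if __name__ == "__main__":
    # Example titles only; substitute the ones collected from the index page.
    for title, xml in download_all(["Alan Turing", "Ada Lovelace"]).items():
        print(title, len(xml), "characters of XML")
```

Fetching pages sequentially with a pause is deliberately conservative. If you need a very large number of pages, the full database dumps are probably a better fit than many individual requests.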


