Nice, thanks! I wish there were a cleaned-up repository of datasets like these, in a unified format, directly accessible to a public MapReduce engine like Elastic MapReduce on AWS.
BitTorrent, please!
Why does it cost so much? They grabbed our data for free, and they have plenty of free bandwidth. Even assuming they are greedy, they could at least offer it through BitTorrent. Shipping DVDs for that amount of data is ridiculous. I don't even have a DVD reader…
I can't afford to buy all that plus shipping to Europe, but I would like to play with the data for my NLP project.
I agree! I can't afford it either, but I would really love to play around with that data because I'm just beginning to learn about NLP and related topics. I also feel it shouldn't have been priced at all, and certainly not sold on DVDs!
When playing with a new programming language, instead of building a 'todo' list app I always end up building an XKCD password generator. Interestingly enough, I've never found a word frequency/comprehension list worth using to populate it for public consumption.
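For anyone curious, here's a minimal Python sketch of that kind of generator, assuming a frequency list has already been parsed into `(word, count)` pairs. The function names, the length filter, and the 2048-word default are my own hypothetical choices, not taken from any particular dataset. With 2048 words each pick is worth 11 bits, so four words give the roughly 44 bits of entropy from the XKCD comic.

```python
import math
import secrets


def build_wordlist(freq_pairs, top_n=2048, min_len=3, max_len=8):
    """Keep the top_n most frequent alphabetic words within a length range.

    freq_pairs is a hypothetical iterable of (word, count) tuples,
    e.g. parsed from some word-frequency list.
    """
    candidates = [
        (word.lower(), count)
        for word, count in freq_pairs
        if word.isalpha() and min_len <= len(word) <= max_len
    ]
    # Most frequent first: common words tend to be easier to remember.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    words, seen = [], set()
    for word, _ in candidates:
        if word not in seen:  # drop duplicates created by lowercasing
            seen.add(word)
            words.append(word)
        if len(words) == top_n:
            break
    return words


def xkcd_password(words, n_words=4, sep=" "):
    # secrets.choice uses a cryptographically secure RNG, unlike random.choice.
    return sep.join(secrets.choice(words) for _ in range(n_words))


def entropy_bits(list_size, n_words):
    # Each uniformly chosen word contributes log2(list_size) bits.
    return n_words * math.log2(list_size)
```

Usage: `xkcd_password(build_wordlist(pairs))` returns four random words joined by spaces. The quality of the result depends almost entirely on the word list, which is exactly the missing piece.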