We actually do what you describe sometimes as well, in particular when scraping sites with robust bot counter-measures (to save on Crawlera [1] usage), or on crawls that take long enough that there's a genuine possibility the site will change before you're done.
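For what it's worth, the caching side of that is simple. A minimal sketch, assuming a file-based cache keyed by a URL hash (in Ruby, to match the Nokogiri discussion below; the names and cache location are made up, and real code would route the fetch through the proxy):

    require 'digest'
    require 'fileutils'
    require 'open-uri'

    CACHE_DIR = 'page_cache'  # hypothetical location

    # Fetch a URL, serving from the local cache when possible, so each
    # page costs at most one metered proxy request across re-runs.
    def fetch_cached(url)
      FileUtils.mkdir_p(CACHE_DIR)
      path = File.join(CACHE_DIR, Digest::SHA256.hexdigest(url))
      return File.read(path) if File.exist?(path)
      body = URI.open(url).read  # a real crawler would go through the proxy here
      File.write(path, body)
      body
    end

Keying on the full URL is the simplest scheme; anything fancier (normalizing query strings, expiring stale entries) can be layered on later.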
On the one hand, there's no shortage of users who want to crawl popular sites to monitor, e.g., search engine rankings or prices. That's kind of shady in some sense, or maybe not: when there's no API, there's no other way...
On the other hand, there are also areas of the web where crawlers are simply not welcome. For instance, DARPA uses a number of our technologies to monitor the dark web for criminal activity:
As an "early" programmer playing with web scraping with the Nokogiri gem, I've been wondering about this aspect (although haven't encountered it yet).
Are there legal implications to scraping a site that actively tries to prevent bots from scraping it? I mean, if the data is publicly accessible on the web, could they go after you?
I don't plan on doing this for anything malicious, and like I said, I haven't encountered it yet. It's just a "what if" thought: what would my legal risks be if I'm playing around with this, and could a site come after me?
> Are there legal implications to scraping a site that actively tries to prevent bots from scraping it? I mean, if the data is publicly accessible on the web, could they go after you?
When we do projects, the baseline is: if Google can see it, we can too. So from a legal standpoint, if Google is covered, so are we.
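One rough way to apply that baseline in code is to check whether the site's robots.txt allows Googlebot to fetch the path you're after. A simplified sketch (the helper is hypothetical, and a real parser should also handle Allow rules, wildcards, and grouped User-agent lines):

    require 'open-uri'

    # Returns true if robots.txt disallows `path` for `agent` (or for '*').
    def disallowed?(host, path, agent)
      rules = URI.open("https://#{host}/robots.txt").read
      current = nil
      rules.each_line do |line|
        line = line.split('#').first.to_s.strip
        if line =~ /\Auser-agent:\s*(.+)\z/i
          current = Regexp.last_match(1).strip
        elsif line =~ /\Adisallow:\s*(\S+)/i
          return true if ['*', agent].include?(current) &&
                         path.start_with?(Regexp.last_match(1))
        end
      end
      false
    end

    disallowed?('example.com', '/search', 'Googlebot')  # true or false depending on the site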
Firms do go after web scrapers, and they lose more often than not. The exception is when you're logged in while crawling: in that case you've implicitly accepted the terms of use. Some companies sue aggressively over scraping done while logged in, so it's best to stay on the safe side. Further reading on the topic:
[1]: http://crawlera.com