
A few years back we came into work one morning to find that some bot was scanning our site so hard that it seemed the lights nearly dimmed. Some detective work suggested that it was a service run on behalf of a competitor, to get our price list (bear in mind that our catalog has a few hundred thousand products).

We were really annoyed that rather than just asking us, they had launched what amounted to a DDOS attack. So we thought about how we might exact vengeance...

After a few hours we figured out a pattern in the rogue requests that allowed us to filter them, despite their efforts at stealth (for example, they cycled through a list of user agent strings to make it look like there were multiple different users). We toyed with the idea of, rather than outright banning them, making our pages sensitive to their presence, so that when we detected them, we'd display a false price, defeating their whole operation.

We finally just decided to take the high road, temporarily banning any rogue IP addresses we detected (we couldn't make the bans permanent because many of the requests came from the Amazon cloud, from which we also receive some legitimate requests).
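
For anyone wanting to try the temporary-ban approach, here is a minimal Python sketch of a ban list whose entries expire on their own. It is not the poster's actual code; the class name `TemporaryBanList`, the one-hour default, and the injectable `clock` parameter are all assumptions made for illustration.

```python
import time


class TemporaryBanList:
    """Tracks banned IPs; each ban lapses after ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds  # assumed default of one hour
        self.clock = clock              # injectable so expiry can be tested
        self._expires_at = {}           # ip -> time at which the ban lapses

    def ban(self, ip):
        # Banning an already-banned IP simply restarts its timer.
        self._expires_at[ip] = self.clock() + self.ttl_seconds

    def is_banned(self, ip):
        expiry = self._expires_at.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            # The ban has lapsed, so shared cloud addresses get a fresh start.
            del self._expires_at[ip]
            return False
        return True
```

Because bans lapse automatically, a legitimate client that later lands on the same cloud IP is not locked out for good, which is the concern raised above.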

EDIT: you wouldn't think that requests for a few hundred thousand products would amount to a DDOS, but the bot was rather poorly written and grossly inefficient in the way it walked through the list.



I built a system called caltrops that did almost exactly that. As a given session's requests grew more and more suspicious, the data it was served would skew further and further from reality. A real user on the line would notice immediately (and the more real-looking the user interactions, the more it would reduce suspicion), but competitors scraping our data would get pretty deliciously bunk data.
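
A rough Python sketch of the idea, for illustration only: the distortion applied to a price grows with a per-session suspicion score. How caltrops actually scored sessions or skewed data isn't described here, so the function name `skewed_price`, the 0-to-1 `suspicion` scale, and the `max_skew` cap are all assumptions.

```python
import random


def skewed_price(true_price, suspicion, max_skew=0.5, rng=random):
    """Distort true_price by at most max_skew * suspicion, as a fraction.

    suspicion is a hypothetical score clamped to [0, 1]:
    0 means a trusted session, 1 means almost certainly a scraper.
    """
    s = min(max(suspicion, 0.0), 1.0)
    if s == 0.0:
        return true_price  # trusted sessions always see the real price
    # Pick a random offset whose allowed range widens as suspicion grows.
    factor = 1.0 + rng.uniform(-1.0, 1.0) * max_skew * s
    return round(true_price * factor, 2)
```

With these assumed defaults, a session at suspicion 0.1 sees prices within about 5% of the truth, while one at 1.0 can be off by up to 50%.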


To deal with similar problems, Kickstarter built a pretty useful tool called rack-attack: https://github.com/kickstarter/rack-attack


This is a most excellent idea!

Btw, did you actually return incorrect price data, or did you just insert random bytes, etc.?



