
> It'd be a nice gesture to reach out to the creators of the training data, like is usual with web scrapers.

I don’t think this is practical. And who notifies people that they're scraping their content? I would’ve been annoyed if I got spam from sites that scraped my content.



I've contacted websites about scraping when it'd be a repeat thing and they didn't have a robots.txt file available, and also when their stance on enforcing copyright was hazy (e.g. medical coding created by a non-profit). Sometimes, they pointed me toward an API I didn't know about.

> I don’t think this is practical.

I don't like people ignoring things just because they're impractical for ML. That leads to crap like automated account banning without the possibility of talking to a living customer service representative.



