Hacker News
LunaSea
on Jan 12, 2022
on:
The State of Web Scraping 2022
Common Crawl is missing far too many URLs for it to be useful in a real-world scenario.
Chris2048
on Jan 12, 2022
But can't you add to their index?
wumpus
on Jan 12, 2022
No. You can add to the Wayback Machine at web.archive.org via their "save page now" interface... Common Crawl is attempting to be a sample of the web, and doesn't take URL suggestions.