Hacker News

Maybe useful to someone:

https://archive.org/details/20250128-cdc-datasets

"""

An archive of all CDC datasets uploaded to https://data.cdc.gov/browse before January 28th, 2025. Excludes corrupt datasets and data not publicly accessible.

Most datasets are accompanied by an additional file ending in -meta that includes the metadata associated with the data. Attachments referenced in these files can be found in the attachments/ folder.

If you would like to seed this data to improve its redundancy please do not use the auto generated torrent, as it is incomplete. Instead use the torrent file labeled "full-20250128-cdc-datasets-USETHIS.torrent"

"""
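For anyone working with a local copy, here is a rough sketch of how one might pair each dataset with its -meta companion. The naming pattern is an assumption on my part (that a dataset `name.ext` has a metadata file whose stem is `name-meta`); the description above only says the file ends in -meta, so check this against the real archive. The function name `pair_with_metadata` is hypothetical.

```python
from pathlib import Path


def pair_with_metadata(root):
    """Map each dataset filename to its companion '-meta' file, or None.

    Assumption (not confirmed by the archive description): a dataset
    'name.ext' is described by a file whose stem is 'name-meta'.
    """
    root = Path(root)
    # Only look at top-level files; this skips folders such as attachments/.
    files = [p for p in root.iterdir() if p.is_file()]
    # Index metadata files by the dataset stem they appear to describe.
    suffix = "-meta"
    meta_by_stem = {
        p.stem[: -len(suffix)]: p for p in files if p.stem.endswith(suffix)
    }
    pairs = {}
    for p in files:
        if p.stem.endswith(suffix):
            continue  # metadata files are values, not keys
        pairs[p.name] = meta_by_stem.get(p.stem)
    return pairs
```

Datasets that map to `None` have no metadata file under this naming assumption, which matches the note that only "most" datasets come with one.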



This highlights why the Internet Archive is so important.


Thanks, that is useful. Are there any other efforts to archive all the data on government websites? I suppose we could crawl archive.org.


All it would take is an order making the data illegal to host for it to be removed. More copies are critical.


Such an order means nothing if you host the data on non-US servers. No one would take it seriously internationally.



