Ask HN: Who owns archive.is, and why are they trustworthy?
27 points by ev1 on May 26, 2020 | 8 comments
I understand the need for anonymity when you're doing this due to the sheer amount of abuse reports, fake and real DMCAs, etc.

But why do people trust it? How do you know the pages you're archiving haven't been tampered with selectively to change history? This is just out of sheer curiosity, and I am not saying they do this.

This is made further interesting because of the following:

- Analytics from various Russian providers, instead of self-hosted (FYI: I consider GA to be equally privacy-violating as Metrika or Mail.ru)

- Large numbers of reverse proxies on questionable or bulletproof hosting providers

- Doing this indefinitely can't be cheap at scale. Who is paying for it?

- Demanding tracking or else blocking your access to the site: they block any resolver that doesn't send them the first 3 octets of your IP (edns-client-subnet)

- Explicitly tracking you in odd ways: they repeatedly load pixels/do DNS preconnect/preload from wildcard subdomains containing a cookied number, IP, country, tracking IDs. View any archived page and ^F "pixel.archive.is"
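
To make the edns-client-subnet point concrete: "the first 3 octets of your IP" is a /24 prefix. Here is a minimal sketch of that truncation, assuming IPv4 and a 24-bit prefix as described above. The function name `ecs_prefix` and the example address (from a documentation range) are hypothetical, and this only illustrates the arithmetic, not what any particular resolver actually sends.

```python
import ipaddress

def ecs_prefix(client_ip: str, prefix_len: int = 24) -> str:
    """Return the network a resolver would reveal if it forwarded only
    the first `prefix_len` bits of the client's address.
    (Hypothetical helper; prefix_len=24 is an assumption.)"""
    # strict=False masks off the host bits instead of raising an error
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    return str(network)

print(ecs_prefix("203.0.113.57"))  # 203.0.113.0/24
```

So the last octet is dropped, but the remaining prefix still narrows you down to a block of 256 addresses.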



Archive.is isn't very important yet. I don't believe they warrant much concern re such questions. It doesn't yet matter very much if they're super trustworthy or not.

At their present scale, going through and manually changing (tampering with) saved content for propaganda or similar purposes would have very little impact. More realistically, it would probably have close to zero real-world consequence. It'd be quite the chore for very little return.

If they become important some day, with dramatically greater scale of usage, then getting answers to these questions might be important.

If they eventually betray trust, they're trivial to replace. Competing variations of archive.is already exist. It's a relatively easy service to create. Someone should probably challenge them just on the basis of how bad their UI and UX are.

At scale, if they begin abusing their position, it would become well known, they would get a reputation, and it'd kill their service. The barrier to competition is extremely low.


Also, why do people use archive.is over web.archive.org? One (web.archive.org) is an actual library and gets all the legal protections that entails, while the other doesn’t.


Why would you trust them? If you are trying to archive something, make sure to use multiple (separately owned) services so that you don't need to trust them.
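
A rough sketch of how you might cross-check copies from separately owned services, assuming you have already extracted the page text from each one yourself. The helper names (`text_digest`, `copies_agree`) and the service labels are hypothetical. Raw HTML will usually differ between archives because of banners and rewritten links, so this compares whitespace-normalized text only.

```python
import hashlib

def text_digest(text: str) -> str:
    """SHA-256 of the text with runs of whitespace collapsed."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def copies_agree(copies: dict[str, str]) -> bool:
    """True if every archived copy hashes to the same digest."""
    digests = {text_digest(body) for body in copies.values()}
    return len(digests) <= 1

# Placeholder text standing in for what two archives returned
copies = {
    "service_a": "Example   article text.\n",
    "service_b": "Example article text.",
}
print(copies_agree(copies))  # True
```

A mismatch doesn't prove tampering (the page may have changed between captures), but it tells you which snapshots to look at by hand.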


Which services would you recommend? Last time I searched for free general-purpose website archival sites (for personal bitrot prevention), I could find only archive.is and archive.org.


Some alternatives that may be what you are looking for are archive.st, Webrecorder.io, FreezePage, and ArchiveBox. There is also perma.cc, which is a project of the Harvard Library Innovation Lab intended for academic usage.


> This is made further interesting because of the following:

An additional concern: they've shown signs in the past of being capricious, or at least easily annoyed by (subjectively) insignificant slights. They continue to block Cloudflare DNS users, last I checked. The "reason" is that Cloudflare doesn't send along the EDNS client subnet, which Cloudflare omits to protect its users' privacy. [1]

I would argue this means archive.today / archive.is can't be trusted to have the best interests of the community at heart. It's not a public service in the way that archive.org is.

[1] This bad behavior is actually mentioned in their Wikipedia article, along with the additional uncited claim that they throttle users to 20 MB of data per day, after which they apparently ban your IP address. I haven't verified the latter claim. https://en.wikipedia.org/wiki/Archive.today#Worldwide


I didn't know about the Cloudflare DNS users being blocked. I started using Cloudflare's DNS two days ago; I just checked, and it looks like I can't use the site anymore.

I can use it with Tor after more than a dozen different reCaptcha requests.


I share most of these concerns, especially since it’s primarily used in the “alt-right” sphere and could easily become a vector to sow discord, either by tampering with content or simply by mapping the communities of users visiting it.

Worth noting, it’s probably not that expensive to run. Most of the hosting services they use would be offering “unmetered” bandwidth, so the cost is probably fixed per month, likely under $1000.



