I have many bookmarks too (10k+ -- Firefox is not very fast at handling them).
Most of them aren't needed, of course, but there have been several instances where I tried to open something in my bookmarks and the site no longer existed... so obviously bookmarks also need to store the last viewed site content.
> so obviously bookmarks also need to store the last viewed site content
Yeah, just save every page you ever open to disk. Just in case. Right? Also sync all of that data on multiple machines. But what if the syncing service disappears some day? Better ask for a self-hosted solution. But a free one. Which saves everything. Just in case anything ever gets lost. How much better life would be then.
Saving much content to disk makes some degree of sense, and there are tools that already do this, though not at the user level (and increasingly incompletely as SSL/TLS transport becomes near universal): caching proxies.
Your browser also caches aggressively.
If targeted to specific high-value sites, or with retention set based on site / content value (some automatic, some less so, some short-lived, some longer), you'll end up with a useful and usable local archive using what is today a very small amount of storage -- even a few GB of text out of a TB or more isn't much, and that would be a pretty extensive collection.
If the content can be reduced such that it's just necessary text (excluding web crud and more), the end result is likely much smaller still. I've experimented with reducing Washington Post articles and homepage to a simplified view, by selecting specific HTML elements, and the result weighs in at about 3-10% of the source page.
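To make the "select specific HTML elements" idea concrete, here's a rough sketch using only Python's standard library. It is not the actual setup used for the Washington Post experiment; the choice of keeping just `<p>` text inside `<article>`, and the names `ArticleText` and `simplify`, are assumptions for illustration. Real sites will need their own selectors.

```python
from html.parser import HTMLParser


class ArticleText(HTMLParser):
    """Collect the text of <p> elements that sit inside an <article>.

    Assumption: the page wraps its body text in <article> and <p> tags.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_article = 0   # nesting depth of <article>
        self.in_p = False     # currently inside a kept <p>
        self.skip = 0         # nesting depth of <script>/<style>
        self.current = []     # text pieces of the paragraph being read
        self.paragraphs = []  # finished paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.in_article += 1
        elif tag in ("script", "style"):
            self.skip += 1
        elif tag == "p" and self.in_article:
            self.in_p = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "article" and self.in_article:
            self.in_article -= 1
        elif tag in ("script", "style") and self.skip:
            self.skip -= 1
        elif tag == "p" and self.in_p:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self.current).split())
            if text:
                self.paragraphs.append(text)
            self.in_p = False

    def handle_data(self, data):
        if self.in_p and not self.skip:
            self.current.append(data)


def simplify(html: str) -> str:
    """Return only the article paragraphs, separated by blank lines."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)


def size_ratio(html: str) -> float:
    """Simplified size as a fraction of the source size, in UTF-8 bytes."""
    return len(simplify(html).encode("utf-8")) / len(html.encode("utf-8"))
```

Running `size_ratio()` over a saved page gives a quick check of how much of it is actual text; the number will vary a lot by site.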
A typical online article likely runs about 800 words. If you read (or save) 20 articles a day for a year, that's about 300 MB.
You would eventually fill a 1 TB hard drive with text at that point. In about 3,400 years.
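For anyone who wants to play with the numbers, here's the back-of-envelope arithmetic as a small script. The 800 words, 20 articles a day, 300 MB a year and 1 TB figures come from the comment above; the 6 bytes per word figure and the decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) are my assumptions.

```python
WORDS_PER_ARTICLE = 800
ARTICLES_PER_DAY = 20
DAYS_PER_YEAR = 365

BYTES_PER_WORD = 6                  # assumption: ~5 letters plus a space
QUOTED_BYTES_PER_YEAR = 300 * 10**6  # the "about 300 MB" figure
DISK_BYTES = 10**12                  # 1 TB, decimal

articles_per_year = ARTICLES_PER_DAY * DAYS_PER_YEAR

# What the quoted yearly figure implies per saved article.
implied_bytes_per_article = QUOTED_BYTES_PER_YEAR / articles_per_year

# Bare plain text only, under the bytes-per-word assumption.
plain_text_bytes_per_year = articles_per_year * WORDS_PER_ARTICLE * BYTES_PER_WORD

# Years needed to fill the disk at each rate.
years_at_quoted_rate = DISK_BYTES / QUOTED_BYTES_PER_YEAR
years_at_plain_text_rate = DISK_BYTES / plain_text_bytes_per_year

print(f"articles per year:           {articles_per_year}")
print(f"implied bytes per article:   {implied_bytes_per_article:,.0f}")
print(f"plain text per year (bytes): {plain_text_bytes_per_year:,}")
print(f"years to fill, quoted rate:  {years_at_quoted_rate:,.0f}")
print(f"years to fill, plain text:   {years_at_plain_text_rate:,.0f}")
```

Under these assumptions, 300 MB a year works out to roughly 41 KB per saved article, which leaves room for some retained markup; bare text at 6 bytes per word would be closer to 35 MB a year. Either way the disk outlasts you by millennia.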