What open source tools are you referring to? Do you just mean the search component?
There'd be two hard parts to this problem I reckon:
- gathering the data
If you make it too cumbersome, it won't be used. If you make it too easy to dump data, the useful info might get drowned out.
- ensuring search gives you good results
We have open source engines like Lucene that let you search extensively, but what happens when you get 200 results back? How do you know which is the best or most useful one? It's likely most users would get exhausted sifting through everything and just default back to Google.
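For what it's worth, the ranking worry can be made concrete without pulling in Lucene at all. Here's a toy pure-Python TF-IDF ranker; the function name, the smoothing, and the scoring details are illustrative only, not Lucene's actual similarity formula:

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank docs (a dict of id -> text) against a query with a
    bare-bones TF-IDF score. Toy sketch: no stemming, no phrase
    matching, whitespace tokenization only."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(docs)

    def idf(term):
        # Smoothed inverse document frequency: rarer terms score higher.
        df = sum(1 for toks in tokenized.values() if term in toks)
        return math.log((n + 1) / (df + 1)) + 1

    scores = {}
    for d, toks in tokenized.items():
        counts = Counter(toks)
        scores[d] = sum(
            (counts[t] / len(toks)) * idf(t)
            for t in query.lower().split()
        )
    # Best match first.
    return sorted(scores, key=scores.get, reverse=True)
```

Even something this crude orders results instead of dumping 200 of them unsorted, which is arguably the bar a personal-history search has to clear.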
A script running curl over my browsing history would collect the HTML. I’d solve the 200-result problem if and when it became an actual problem, in a way that addressed the actual problem. There’s a lot of success to be had before too many results becomes a problem.
The idea that it might be more friction than it was worth is why I didn’t build it. Probably why nobody has built it, and perhaps why you just listed a bunch of imagined problems as reasons not to build it.
I mean it would probably be shit if I built it and I liked my idea better than the idea of the work. That’s most things.
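The curl-over-history idea above can be sketched in a few lines. This assumes a Firefox-style `places.sqlite` with the standard `moz_places` table; the function names are mine, and Chrome's `History` database uses a different schema, so treat it as a shape rather than a working tool:

```python
import pathlib
import sqlite3
import urllib.parse
import urllib.request

def history_urls(places_db):
    """Pull visited URLs out of a copy of Firefox's places.sqlite.
    Assumes the standard moz_places schema (url, visit_count,
    last_visit_date). Work on a copy: Firefox locks the live file."""
    con = sqlite3.connect(places_db)
    try:
        rows = con.execute(
            "SELECT url FROM moz_places "
            "WHERE visit_count > 0 "
            "ORDER BY last_visit_date DESC"
        ).fetchall()
    finally:
        con.close()
    return [r[0] for r in rows]

def archive(url, out_dir):
    """Fetch one page and write the raw HTML to disk -- the curl step.
    The URL is percent-encoded to make a filesystem-safe filename."""
    out = pathlib.Path(out_dir) / (urllib.parse.quote(url, safe="") + ".html")
    with urllib.request.urlopen(url, timeout=10) as resp:
        out.write_bytes(resp.read())
    return out
```

Point it at the saved HTML with any full-text indexer afterwards; the gathering step really is this small.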
—
For what it is worth, I would default to Google for the things Google does better and use my personalized historic search when I wanted to see what I had seen before. It's both-and, not either-or.
I've been of the opinion that website content monitoring should be implemented with a browser extension (plus possibly a local agent app)[1]. An extension-based approach would work well and be easy to use IMO.
I've been extremely disappointed by how Chrome in particular likes to forget everything about my browsing history (except for tracking cookies) after three months. I don't see why a link I clicked on a year ago shouldn't turn blue on any given page just because computers from 2004 might have had performance problems keeping a full history.
[1]: Enterprises seem to prefer MITM here instead, but I'd argue it's not truly required, given the overwhelming popularity of agent-based EDR solutions.
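A minimal sketch of the local-agent half of that extension-plus-agent architecture, assuming a (hypothetical) extension POSTs `{"url": ..., "html": ...}` captures to localhost. The endpoint, port, and payload shape are all made up for illustration; a real agent would index the HTML rather than hold it in a dict:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class CaptureHandler(BaseHTTPRequestHandler):
    """Accepts JSON page captures from a hypothetical browser extension."""

    # url -> html; stand-in for a real full-text index.
    pages = {}

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        self.pages[payload["url"]] = payload["html"]
        self.send_response(204)  # accepted, nothing to return
        self.end_headers()

    def log_message(self, *args):
        # Keep the console quiet; a real agent would log properly.
        pass

def serve(port=8942):
    """Run the capture endpoint on localhost (port choice is arbitrary)."""
    HTTPServer(("127.0.0.1", port), CaptureHandler).serve_forever()
```

The extension side would just be a content script doing a `fetch` of `document.documentElement.outerHTML` to this endpoint on page load, which keeps the friction near zero once installed.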