Since it wraps existing browsers (chromium/safari/electron/webkit), is it really fair not to say that in the readme? Sure, it mentions that you should install them under "quickstart", but all the other parts compare it directly against other browsers, when this really seems to fill more of a "plugin-like" role than a full browser.
Also, when writing readmes for ambitious projects it's probably best to mark which features are complete and which ones are just planned.
Right now it looks like the whole V-lang debacle IMO, with lots of features promised and no way to know what is implemented and what is not, while wrapping other projects without acknowledging it.
> Also, when writing readmes for ambitious projects it's probably best to mark which features are complete and which ones are just planned.
I think that's what issues and milestones are for. I tried to document as much as possible. Currently, a lot of UI features are not ready, and the HTML and CSS parsers both need major rework.
> Right now it looks like the whole V-lang debacle IMO, with lots of features promised and no way to know what is implemented and what is not, while wrapping other projects without acknowledging it.
Not a single external library is included. Pretty much everything was implemented by myself over the last one and a half years.
Please don't confuse a Browser with a Browser Engine. I never claimed to replace WebKit, Blink, or Gecko.
For now, I, as a single person working on the project, just didn't have the time to fork Servo into a reduced library that can be bundled, because that's a shitload of work. Right now I'm focusing on parsing and filtering, because the underlying network concept was proven to work. I also had to build a test runner to be able to test these behaviours at all.
Currently the prototype of the UI is implemented in HTML5, reusing as much as possible from the nodejs-side ES2018 modules.
> I think that's what issues and milestones are for. I tried to document as much as possible.
I agree, but then those should not be mentioned in the readme as "Features". Are there any things mentioned in the readme in present tense that are not currently implemented?
> Please don't confuse a Browser with a Browser Engine. I never claimed to replace WebKit, Blink, or Gecko.
I'm not saying that you have to fork or implement a browser, I'm just saying that if you wrap an existing browser it's best to be upfront about it and not make it sound like you have implemented a browser. I wouldn't have objected if the readme said something like "It uses chromium/electron or other browsers under the hood" at the beginning.
This is especially critical if you claim improved privacy or security.
> For now, I, as a single person working on the project
I'm not complaining about the quality or the scope of the work. I'm saying that what the readme presents as the current status of the project does not match reality.
I saw that you updated the readme to include that it is based on other browsers, great!
The feature list still seems intact though, so if that is intended to represent the current state, perhaps you can help me understand these points (and please forgive me if you just intended to update it later):
---
> It uses trust-based Peers to share the local cache. Peers can receive, interchange, and synchronize their downloaded media
What is a trusted peer? How is trust established?
> It is peer-to-peer and always uses the most efficient way to share resources and to reduce bandwidth, which means downloaded websites are readable even when being completely offline.
What P2P tech is used here? Again, how is trust handled for resources downloaded over P2P networks?
I'm saying that if you make extraordinary claims you will need extraordinary evidence. Or say that they are ambitions until you can produce that evidence, which is completely fine!
By default, Stealth trusts nothing. If you want to browse a website you have to configure the level of "trust" you expect the website to have. A news website, for example, should serve text and images, not sound or video (unless you expect it to). Trust in websites is based on media types (in the IANA sense) and so-called site modes that can be applied to all URLs.
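To give a rough idea, such a site mode could look something like this (illustrative shape only, not necessarily the actual schema):

```js
// Hypothetical illustration of a media-type based "site mode":
// nothing is loaded unless its category is explicitly allowed.
const mode = {
  domain: 'news.example.com',
  mode:   {
    text:  true,   // HTML and plain text
    image: true,
    audio: false,
    video: false,
    other: false   // fonts, downloads, anything else
  }
};
```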
Trust in peers is established with a (currently manual) handshake, which triggers an availability query to the requested peer, including its available internet connection bandwidth (so that peers on cellular internet aren't drained and bandwidth is saved). You want to use a peer's resources? You have to add it first, in the browser settings. This is the current workflow until peer discovery (and a UI for it) is implemented.
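Adding a peer manually boils down to something like this (field names are illustrative assumptions, not the exact settings format):

```js
// Illustrative only: a manually added, trusted peer in the local network.
// The "connection" field reflects the bandwidth idea described above
// (e.g. don't drain a peer that is on cellular internet).
const peer = {
  domain:     '192.168.0.12',   // the other machine running a Stealth instance
  connection: 'broadband'       // e.g. 'broadband' vs. 'mobile'
};
```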
> The Browser is completely free-of-DOM
Please note that the Browser UI is nothing more than a websocket client, and therefore remote-controls the locally running stealth/source/Browser.mjs instance. This is why the browser/source folder contains so few implementations: it doesn't need more.
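For illustration, such a remote-controlling client could look roughly like this (the message format below is made up for the sketch, not the actual wire protocol):

```js
// Minimal sketch of a WebSocket client remote-controlling a locally running
// Stealth service. Assumes the 'ws' package; the JSON shape is hypothetical.
import WebSocket from 'ws';

const socket = new WebSocket('ws://127.0.0.1:65432');

socket.on('open', () => {
  socket.send(JSON.stringify({
    headers: { service: 'session', method: 'request' },   // hypothetical service call
    payload: { url: 'https://example.com/index.html' }
  }));
});

socket.on('message', (data) => {
  console.log('response from service:', data.toString());
});
```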
> What P2P tech is used here?
Currently, there is no way to discover peers automatically. Peer discovery will be available once radar gets online and is integrated.
But protocol-wise, all protocols (except the DNS tunnel, which isn't ready yet) can be used in a peer-to-peer manner. You can, for example, use a Stealth instance in the same LAN as a socks proxy, as an http/s proxy, or as a ws/s proxy.
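As a sketch of the plain HTTP proxy case (the host and port below are just placeholders), a client in the same LAN would simply send absolute-form requests to the Stealth instance, as with any HTTP proxy:

```js
// Illustrative only: use a Stealth instance in the LAN as a plain HTTP proxy.
// Standard HTTP proxy semantics: send the full target URL as the request path.
import http from 'node:http';

const req = http.request({
  host:    '192.168.0.10',                   // hypothetical Stealth instance
  port:    65432,
  method:  'GET',
  path:    'http://example.com/index.html',  // absolute-form URI for the proxy
  headers: { host: 'example.com' }
}, (res) => res.pipe(process.stdout));

req.end();
```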
Primarily, the officially supported API will probably be the WebSocket API, because it will be easier for potential third parties to implement.
Regarding evidence: take a look at covert, the test runner, which also tests peer-to-peer scenarios end-to-end.
If the network isn't free and data is centralized, one day you think you have it all and the next you could have nothing. Tor pretends to be secure, but is dark and compromised. This project seems to understand that and wants to try again to fix it via P2P, in a way that has promise.
The simple implementation of web forms is broken in today's web: it's an input field, or some other element styled as an input field, that may or may not be grouped or in a form object, possibly generated dynamically. Websockets, timers, validations... it's a huge PITA.
The DOM is a freaking mess. It’s not there until it’s there, it’s detached, it’s shared. It’s been gangbanged so much, there’s no clear parent anymore.
ECMAScript: which version and which interpretation should Babel translate for you, and would you like obfuscation via webpack, and how about some source maps with that, so you can de-obfuscate to debug it? Yarn, requireJS, npm, and you need an end script tag, should it go in the body or the head? You know the page isn't fully loaded yet, and it won't ever be. There, it's done, until that timer goes off. Each element generated was just regenerated and the old ones are hidden, but the new ones have the same script with different references. Sorry, that was the old framework, use this one, it's newer and this blog or survey says more people use it.
For a P2P open data sharing network over https, the proxy could allow a request to get someone else down the path. Not everything is direct.
> Tor pretends to be secure, but is dark and compromised.
Citation needed. Please stop with the "tor is compromised" meme... and what do you even mean by "dark"? What the hell... Tor is by no means a perfect anonymity solution but it's to my knowledge the best we've got. It's certainly way better than a VPN or no anonymization at all.
More specifically, tor anonymity is limited by the fact that it's low-latency. This is a fundamental limitation of any low-latency transport layer and not the fault of the tor developers or any obscure forces. In particular, if your attacker has control of both your entry point (your tor guard node or your ISP) and your exit point (tor exit node, or the tor hidden service or website you are connecting to), it becomes possible to de-anonymize your connection (to the specific exit point in question) through traffic analysis. There's just no way around that for a network meant to transport real-time traffic (as opposed to plain data or email for instance). And yes, it stands to reason that various intelligence agencies will have invested in running exit nodes or entry nodes, but this is just unavoidable. What you can do to counteract this is to run your own nodes or to donate to (presumably) trustworthy node operators.
I think it's also worth noting that although tor can by no means 100% guarantee that you will be free from government surveillance at all times, it does make mass surveillance more difficult and more error-prone, and to me that's the whole point. Furthermore, although government surveillance cannot be thwarted 100%, tor does make corporate surveillance basically impossible (assuming you can avoid browser fingerprinting; this is what the tor browser is for).
All in all, I can't claim tor is perfect (because it can't be!) but the more people use it the better it gets and it's certainly better than anything else, so please stop spreading FUD and encourage people to use it instead.
Also, it's unclear to me how Stealth helps at all with hiding the IP addresses of its participants... It claims to be "private" but the README doesn't say anything about network privacy...
The code doesn’t strike me as concerning itself with protecting privacy so much as changing who will get to log your traffic. Interesting effort though; I’ll hope for more details from them in the future!
Chill, bro. I said “seems hand-wavy” and “I’d love to be wrong”. I was hedging my bets and clearly indicating this was a surface-level read. I shouldn’t have to have a better alternative on deck to point out something in the codebase that didn’t seem to be privacy-friendly. No offense was meant.
Since you asked how I would do things: I would have had a clear and detailed security-specific document or section of the readme to detail in what ways it is peer-to-peer and in what ways it is private. I would have probably gestured towards the threat model I used when designing the protocols, but, let's be honest, I'd probably be too lazy to document it adequately. As far as I can tell, there's one paragraph in its developer guide on security and two paragraphs on peer-to-peer communication, and I wasn't able to get a good read on its concrete design or characteristics.
> Note that the DNS queries are only done when 1) there's no host in the local cache and 2) no trusted peer has resolved it either.
This wasn’t clear to me from my first spelunk through the readme or the docs. Are you affiliated with the project? Is there a good security overview of the project you know of?
> I mean, DNS is how the internet works. Can't do much about it except caching and delegation to avoid traceable specificity.
What I meant to say is, I was not so sure that the google public dns could be considered private. But nevermind on that, I can’t confirm their logging policies. I’m probably just paranoid about how easy google seems to build a profile on me. So yeah, as mentioned, just my initial read.
Hey, my comment wasn't meant in a defensive manner... I'm just curious whether I maybe missed a new approach to gathering DNS data :)
I've seen some new protocols that try to build a trustless, blockchain-inspired system, but they aren't really there yet and sometimes still have recursion problems.
When I was visiting a friend in France, I first realized how much is censored there by ISPs and cloudflare/google and others, so I decided it might be a good approach to have a ronin here.
I totally agree that the threat model isn't documented. Currently the peer-to-peer stuff is mostly manual, as there's no way to discover peers (yet). So you would have to add other local machines yourself in the browser settings.
Security-wise, there are currently a lot of things changing, such as the upcoming DNS tunnel protocol, which can use other dedicated peers that are already connected to the clearnet by encapsulating e.g. https inside DNS via fake TXT queries, etc.
> public dns could be considered private
Totally agree here. I tried to find as many DoT and DoH DNS servers as possible, and the list was actually longer before.
In 2019 a lot of DNS providers either went broke or went commercial (like nextdns, which now requires a unique ID per user, which defeats the purpose completely)... But maybe someone knows a good DoH/DoT directory that's better than the curl wiki on github?
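For reference, a DoH lookup over the JSON API looks roughly like this (Cloudflare's resolver is used purely as an example endpoint; Node 18+ is assumed for the global fetch, in an ES module context):

```js
// Minimal DNS-over-HTTPS lookup via the JSON API. Example endpoint only;
// Stealth's own resolver list and logic may differ.
const res = await fetch(
  'https://cloudflare-dns.com/dns-query?name=example.com&type=A',
  { headers: { accept: 'application/dns-json' } }
);
const json = await res.json();
console.log(json.Answer); // e.g. [{ name: 'example.com', type: 1, data: '93.184.216.34' }]
```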
Thanks for following up with added info! I'll look forward to seeing the project progress; it's an area I'm super interested in. As far as naming systems better at privacy than DNS, I'm not aware of any serious options. Personally, I'm working on implementing something that hopes to improve the verifiability of name resolutions, but that's a long way off: https://tools.ietf.org/html/draft-watson-dinrg-delmap-02
Large-scale actors (read: ISPs and government agencies) have a huge number of entry and exit nodes. They can simply measure timestamps and stream byte sizes, which allows them to trace your IP and geolocation.
They do not have to decrypt HTTPS traffic for that, because the ordering of those streams is pretty unique when it comes to target IPs and timestamps.
Yes, hidden services are safe (well, no system is really safe). But if e.g. a hidden service includes a web resource from the clearnet, it can be traced.
I was talking about the "using tor to anonymize my IP" use case, where exit nodes get a huge amount of traffic per session.
In order to be really anonymous you would need a custom client-side engine that randomizes the order of external resources, pauses/resumes requests (given that 206 or chunked encoding is supported), and/or introduces null bytes to have a different stream byte size after TLS encryption is added.
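As a very rough sketch of that idea (illustrative only; real resistance to traffic analysis would need far more than this), the URL and total size below are assumptions:

```js
// Fetch a resource in randomly ordered byte ranges (requires the server to
// support 206 Partial Content).
const url   = 'https://example.com/video.mp4';
const total = 4 * 1024 * 1024;      // assume the size is known, e.g. via a HEAD request
const chunk = 256 * 1024;

const ranges = [];
for (let offset = 0; offset < total; offset += chunk) {
  ranges.push([ offset, Math.min(offset + chunk, total) - 1 ]);
}
ranges.sort(() => Math.random() - 0.5);   // crudely randomize the request order

const parts = new Map();
for (const [ from, to ] of ranges) {
  const res = await fetch(url, { headers: { range: 'bytes=' + from + '-' + to } });
  parts.set(from, new Uint8Array(await res.arrayBuffer()));
}
// Reassemble in offset order afterwards; padding would additionally vary sizes.
```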
Hidden services are safer in the sense that your connection can't be deanonymized with the help of your third relay (which would have been an exit node in the case of a clearnet connection) but if the hidden service in question were to be a honeypot and your entrypoint (ISP or tor guard node) were to be monitored by the same entity (this second requirement also holds for clearnet connection monitoring BTW), it would be possible to deanonymize your connection to the hidden service.
How easy it is to perform the traffic analysis would have to depend on the amount of data being transferred, if I had to guess, so downloading a video would probably be worse than browsing a plaintext forum like hackernews. But if we're talking about a honeypot, your browser could be easily tricked into downloading large-enough files even from a plaintext website (just add several megabytes of comments in the webpage source for instance).
> In order to be really anonymous you would need a custom client-side engine that randomizes the order of external resources, pauses/resumes requests (given that 206 or chunked encoding is supported), and/or introduces null bytes to have a different stream byte size after TLS encryption is added.
It's unclear to me how any of this helps avoid traffic analysis. I believe tor already pads data into 512-byte cells, which might help a little bit.
> fallsback to http:// only when necessary and only when the website was not MITM-ed
How would you know when an HTTP site is being MITM'd? There are some easy cases, but for everything else, well, ensuring this is half the point and most of the operational complexity of HTTPS!
https is used primarily. If there's only http available, trusted peers are asked for two things: their host caches for that domain and whether or not the data was transferred securely via https (port and protocol).
If either of those isn't statistically confirmed, it is assumed that the targeted website is compromised.
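Roughly sketched, the check could look like this (the message shape and the threshold are just illustrative, not the actual implementation):

```js
// Ask trusted peers what they observed for a host and only accept a plain
// http response if their observations are consistent with that.
function isProbablyCompromised(peerReports) {
  // peerReports: e.g. [ { peer: '192.168.0.12', protocol: 'https', port: 443 }, ... ]
  const sawHttps = peerReports.filter((r) => r.protocol === 'https').length;
  const sawHttp  = peerReports.filter((r) => r.protocol === 'http').length;

  // Hypothetical rule: if most peers received the site via https,
  // getting plain http locally is treated as a sign of tampering.
  return peerReports.length > 0 && sawHttps > sawHttp;
}
```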
Currently I think this approach is as good as it gets, but otherwise I have no idea how to verify that the website is legit without introducing too much traffic overhead for the network.
Personally, I wouldn't trust any http website, or any https website with TLS < 1.2, anyways. But whether or not that assumption can be extrapolated... dunno.
Do you have another way to verify its authenticity in mind?
I don't. Authentication is a pain, and the later in the stack you try to solve it the harder and hackier it gets. We have the gross hack of certificate authorities because we failed to deliver authenticated information via the domain name system.
> their host caches for that domain
Hm, would this trip a MITM flag if someone switched hosting providers? Like, if example.com is http only and is moved to a new datacenter, is there a way to distinguish this from someone MITMing the traffic?
As of now, a change of server IPs would trigger a MITM warning if the new server is still served via http. If it's https and has identical content hash values, it is currently assumed to be the same server.
I basically decided to do this equivalent of statistical certificate pinning because I have no better solution at hand. I mean, you could have a couple of servers hosted in different geolocations and assume that if they crawl it and it's identical, then it must be true... but usually state-level actors block it in the originating country, so it's hard to trace without something like a censorship index by geolocation (which is my plan for now).
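A rough sketch of that decision logic (field names and the use of SHA-256 here are just illustrative assumptions):

```js
// Compare a cached observation of a host against a fresh one.
import crypto from 'node:crypto';

const sha256 = (buffer) => crypto.createHash('sha256').update(buffer).digest('hex');

function looksLikeMITM(cached, fresh) {
  const ipChanged = cached.ip !== fresh.ip;

  if (ipChanged && fresh.protocol === 'http') {
    return true;                                        // new server, still plain http -> warn
  }
  if (ipChanged && fresh.protocol === 'https') {
    return sha256(cached.body) !== sha256(fresh.body);  // identical content -> assume same site
  }
  return false;
}
```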
But honestly, I have no way to know whether this will work out, because every piece of the internet's infrastructure can be potentially infiltrated.
I hear this a lot and hence have the same feeling about node/electron, but is there actually anything of substance when it comes to the lack of security of Node/Electron apps? Honestly curious.
Other posters have already pointed at some examples; it boils down to the fact that you are taking on a relatively large attack surface, and one that is rather difficult to validate effectively and exhaustively.
Not exactly. The chromium/chrome sandbox isn't dependent on how and what code you execute; the electron/node one is, and that is because the latter were designed to execute code across many more privilege levels than what a "dedicated" browser needs.
If I download and build chromium (as long as I don't disable the sandbox altogether), I don't actually need to think about those issues, while I do need to with Electron.
Electron has local file access, etc. In fact, its documentation states: "Under no circumstances should you load and execute remote code with Node.js integration enabled."
So, Stealth should consider forking Electron if better sandboxing is needed.
Between this and the Sciter (edit: added missing r, thanks everyone :-) project the other day and a number of other projects I'm starting to get optimistic that the web might soon be ready for a real overhaul.
I'm not sure if GP meant sciter or scite.ai (both of which had a few posts about them recently), or even SciTE.
However, I don't see how either of those indicates some "real overhaul of the web", as sciter seems to be "just" an embeddable HTML/CSS engine, which doesn't seem like a big change compared to e.g. webview[0].
> but it would indeed not overhaul the web in the slightest.
Kind of agree.
What I am hoping for is
either a new rendering engine that is purposely incompatible with abusive websites and so much faster that people like us will use it anyway everywhere it works, and just keep a mainstream browser as a backup for abusive web pages,
or, more realistically, something like asm.js, where Firefox made an alternative, faster path for JavaScript that adhered to certain rules.
Looked through a bunch of the author's projects; cool/funny stuff, keep it up.
A couple of things:
- The project (and others) needs clearer calls to action or goals. Reading the different pages made me think a bunch, but I had no idea what to do.
- Maybe the Stealth browser is not meant for everyone. Maybe just a community of people will use the browser and contribute to your goal of decentralized semantic data.
And really, your vision is so big, it might be worth doing a video.
Author of the project here. I didn't expect this to be posted on HN because the project is kind of still in its infancy.
The motivation behind the Browser was that I usually "use the Web" as a knowledge resource, reading articles online, on blogs, on news websites, on social media and so on. But there are a couple of problems with what a Browser currently is. A Browser today is made for manual human interaction, not for self-automation of repetitive tasks. Those are only available at the mercy of programming or extensions, which I do not think is a reasonable way to go.
Why block everything except 2 things on a website when you could just grab the information you expect a website to contain?
I'm calling it a Semantic Web Browser because I want to build a p2p network that understands the knowledge websites contain, via site adapters (beacons) and workflows (echoes), while the underlying concept tries to decentralize as many automation aspects as possible.
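To give a rough idea, a beacon could look something like this (purely illustrative; the actual format is not settled):

```js
// Hypothetical "beacon": a declarative mapping from a site's markup to
// semantic fields that the network could extract and share.
const beacon = {
  domain: 'news.example.com',
  select: {
    title:  'article h1',
    author: 'article .byline',
    body:   'article .content p'
  }
};
```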
In the past a lot of websites were taken down (some for BS reasons, some not), but what's more important to me is the knowledge that is lost forever. Even if the knowledge is web-archived, the discovery (and sharing) aspects are gone, too.
My goal with the project is a network that tries to find truth and knowledge in the semantic content of the web; I'm trying to build something that understands bias in articles, the authors of those articles, and the history of articles that were (re-)posted on social media with biased perspectives.
I know that NLP currently isn't that far along, but I think that with swarm intelligence ideas (taken from Honeybee Democracy and similar research on bees) and compositional game theory, it might be possible to have a self-defending system against financial actors.
Currently the Browser doesn't have a fully functioning UI/UX yet, and the parsers are being refactored. So it's still a prototype.
It is a decentralized Browser in the sense that if you have trusted peers in your local (or global) network, you can reuse their caches and share automation aspects with them (your friends' or your family's browser(s)), which allows fully decentralized (static) ways to archive websites and their links to other websites.
I'm not sure where the journey is heading to be honest, but I think the Tholian race and the naming makes it clear: "Be correct; we do not tolerate deceit." pretty much sums up why I built this thing.
Currently I don't have funding, and I'm trying to build a startup around the idea of this "Web Intelligence Network", where I see a potential business model for large-scale actors that want to do web scraping and/or gathering via the "extraction adapters" of websites that are maintained by the network.
I think this project turned out to be very important to me, especially when taking a glimpse at the post-COVID social media that contains so much bullshit that you could easily lose hope for humanity.
This project looks amazingly promising, thank you for creating it and I wish you the best of luck in its success.
One humble suggestion/idea I offer to think about, related to:
> It uses trust-based Peers to share the local cache. Peers can receive, interchange, and synchronize their downloaded media. This is especially helpful in rural areas, where internet bandwidth is sparse; and redundant downloads can be saved. Just bookmark Stealth as a Web App on your Android phone and you have direct access to your downloaded wikis, yay!
Trusted peers with a shared web cache is a good start, but how about _trustless_ peers? Is this possible?
Possibly using something like https://tlsnotary.org - which uses TLS to provide cryptographic proof of the authenticity of saved HTTPS pages (but unfortunately only works with TLS 1.0)
I'm still reading through the code and the paper, but this sounds actually amazing.
I planned on integrating a self-signed intermediary certificate for TLS anyway, so that peer-to-peer communication can be encrypted without a third-party handshake.
It sounds like this would integrate very nicely as a hashing/verification mechanism for shared caches. Thanks much for the hint!
All requests are shareable. Conditions for this are:
1. You have a trusted peer with a local IP configured (Peer A knows Peer B and vice versa)
2. Peer A is currently downloading the URL (stash) or is done downloading it (cache)
3. Peer B can then reuse the same stream or download the file via Peer A
Note that Stealth also has an HTML5 UI for this reason. Download a video on your desktop, leave Stealth running, go to your Android or iOS tablet... connect to desktop-ip:65432, open up the video and get the same stream, too :)
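Roughly sketched, the flow for Peer B would be something like this (the ask() API below is made up for the illustration):

```js
// Illustrative only: Peer B checks whether Peer A already has (or is
// currently fetching) a URL before downloading it itself.
async function fromPeerOrNetwork(peer, url) {
  const info = await peer.ask({ service: 'cache', method: 'info', url });  // hypothetical API

  if (info !== null && (info.state === 'cache' || info.state === 'stash')) {
    return peer.ask({ service: 'cache', method: 'read', url });            // reuse Peer A's copy/stream
  }
  return fetch(url);                                                       // otherwise download directly
}
```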
I was more thinking: could I share media tracks from a video element on a page with my peer? I wasn't sure whether this would be possible because the browser is headless.
Any proxy could act as a MITM, so someone using a malicious fork of Stealth may cause problems.
But the net is like this already. One site may send you to another site that tricks you into giving up your data. And a relatively recent vulnerability prevented any WebKit-based browser from indicating whether the site's URL pointed at the correct server, so you'd have no visible way of knowing whether a site using HTTPS was legitimate.
Using a VPN could be better, but it’s sometimes worse, because you change who is trusted more (the VPN provider), as they know one of the addresses you’re coming from and everything you’re doing, and can record and sell that data.
I mean, technically, Mozilla's ca-certificates tracker is the biggest attack vector on the internet's infrastructure [1],
and TLS transport encryption relies heavily on identification mechanisms which are recorded, verified and stored in a manner in which a lot of third parties have to be trusted, too.
Even when ignoring that salesforce is a private entity with financial motivations, and that the server is hosted on OSes that are 17 years out of date, I wouldn't trust any single entity with a responsibility like this. Maybe the UN, but nothing below that, and I think legislation for this would be the "most correct" approach.
I hope that in future (given tlsnotary works in the peer to peer case) this can be solved with content based signatures instead of per-domain-and-ip based certificates.
I mean, a snake-oil cert has to be assumed to be just as legit as a cross-signed cert these days, given how low the barrier for letsencrypt certs is.
Certificate pinning was a nice approach from the statistical perspective, but with letsencrypt taking over, a pinned cert is only valid for 3 months (max) before it leads to a required re-verification.