Urbit: The Internet failed (slides and video)

mbrubeck · on Sept 27, 2013

The thing that got me interested in Urbit when it was first announced back in 2010 was reading Van Jacobson's "Networking Named Content" paper [1] and related work from PARC, referenced in this blog post [2]. As someone who has spent 15 years now working on web development, browser development, and web standards, I feel the Internet really needs some form of "named data" networking as its next big step. I don't know if Urbit is the way it's going to happen, but it'll happen somehow. Maybe we'll evolve browsers and HTTP into a named data network instead. (On the one hand, it would be orders of magnitude easier than building an entire new model of computing from scratch... on the other hand, you don't get to throw out the whole ball of mud if you just keep building on top of it.)

Anyway, I encourage everyone to read the Van Jacobson paper or watch the tech talk. It points out some assumptions that are so baked into our network architectures that I had never really thought about them. And it gives some context for these slides' claim that "the internet has failed," by pointing to a working alternate system that's a better fit for some common cases. The link from the Urbit blog to the Van Jacobson talk is broken since Google Video was shut down, but you can now find it on YouTube [3].

[1] http://www.parc.com/publication/2318/networking-named-conten...

[2] http://moronlab.blogspot.com/2010/01/urbit-functional-progra...

[3] http://www.youtube.com/watch?v=oCZMoY3q2uM

urbit · on Sept 27, 2013

And of course the urbit frontpage itself: http://urbit.org.

Basically thanks to HN, 8300 people have watched a 10-minute screencast of a command-line session. Totally insane. My lovely and talented wife, who put together this video, was really hoping people would spend more time watching it than it took to edit.

Edit: And Google makes my point and VJ's brilliantly, by breaking the link to his Google TechTalk. Brilliant!

pdonis · on Sept 27, 2013

the Internet really needs some form of "named data" networking

Don't we already have that? URIs are names for data. I had this same reaction when I read about "Content-Centric Networking" back when it first came out.

Maybe we'll evolve browsers and HTTP into a named data network

They already are. We don't need to evolve them so much as use them in the way HTTP was originally designed to be used: you retrieve and (if you have the privileges) act on data by addressing appropriate HTTP verbs and payloads to its URI.

mbrubeck · on Sept 27, 2013

> URIs are names for data.

Yes, they are, essentially. But the difference is that an HTTP URI is not a name for a piece of data -- instead it is a set of instructions that say "connect to this network endpoint and send them this string; in return you will get some data." The data may be different every time someone follows these steps to dereference the URI. A single URI does not name a single piece of data.

In a URI scheme where the URIs really name the data (rather than an address for finding data), you can get away from TCP-style point-to-point networking in cases where it makes sense, like in one-to-many broadcasts. The paper I linked [1] specifically compares PARC's content-centric network protocol against TCP+HTTP in these scenarios. And it's not just one-to-many that can benefit; see also the VoCCN paper [2] for a very different use case.

I agree that you could (and should) still use URIs for data in your content-centric network, which is why I think we could evolve the web platform to use CCN without any major upheaval. You just need to add a new URI scheme besides http and https. For the web, the changes would not be at the application layer but at lower layers like TCP and routing. But more importantly, CCN would help enable new types of platforms (like Urbit?) that do not look like the web.

[1] http://conferences.sigcomm.org/co-next/2009/papers/Jacobson....

[2] http://www.parc.com/publication/2319/voccn.html

pdonis · on Sept 27, 2013

A single URI does not name a single piece of data.

It doesn't have to; but there's no reason why it couldn't. And there's no reason why a standard couldn't be developed that let the owner of a URI designate that it named a single piece of data that would never change.

Thanks for the links, I'll read them to get a better idea of the proposal.

mbrubeck · on Sept 27, 2013

> And there's no reason why a standard couldn't be developed that let the owner of a URI designate that it named a single piece of data that would never change.

Yes, I agree. The "new URI scheme" I suggested in my previous comment is one possible way such a standard might be written.

That's just one part of the solution, though; if you want the benefits of getting off of TCP then you also need standards for the new content-centric network layer protocol(s) that will replace it, and you need to implement that in your networks and routers.

urbit · on Sept 27, 2013

URIs are names for data - but they're not referentially transparent names. Basically, the inspiration for URIs is Unix filesystems, whereas it should have been Git repositories.

It's easy to build mutable resources on top of an immutable namespace. Just put the date, change number, or label, in the name. But once you stop being referentially transparent, you can never recover that virginity. There was sort of an attempt to make Cache-Control headers work in HTTP, but they really don't (I speak as a recovering browser developer), and there's just no substitute for everything having an infinite lifetime.

pdonis · on Sept 27, 2013

It's easy to build mutable resources on top of an immutable namespace. Just put the date, change number, or label, in the name.

But git repositories don't work this way. Every object in a Git repo is immutable; when I pull a new commit from somewhere, it looks to me like the files in my working copy magically change to the new versions (i.e., that they are mutable), but what's actually happening in the repo is that a set of new immutable objects are added; nothing that was there already changes.

This sort of scheme can just as easily be built using URIs (or, for that matter, Unix filenames), so I don't see what the big deal is. You have some URIs (analogous to the filenames in my git working copy) that appear to be mutable resources, but they are implemented on top of other URIs (analogous to the hashes of commits in the git repo) that are immutable resources. All this can be done now, using existing HTTP.

urbit · on Sept 27, 2013

Aren't we agreeing vehemently?

The problem is when you try to build immutable resources on top of a mutable namespace. How exactly do you know they're immutable? By what convention? What happens to your code if they mutate?

Even worse, using a mutable-namespace for immutable bindings, you have two immutable systems talking to each other through a mutable pipe. First the kind of transformation you describe has to be applied, then it has to be reversed. It's a recipe for madness.

pdonis · on Sept 27, 2013

How exactly do you know they're immutable?

You don't, unless the owner of the resource says they are and is trustworthy. No amount of code will change that.

If you mean, how does the owner of the resource signal that the resource is immutable, yes, I can see that establishing a standard for this would be helpful. But that's just a matter of adding an RFC to describe how to mark URIs as naming immutable resources. Why does it require completely reworking the Internet infrastructure from scratch?

using a mutable-namespace for immutable bindings, you have two immutable systems talking to each other through a mutable pipe

No, it's a matter of making sure you don't use a mutable namespace for immutable bindings. See above.

urbit · on Sept 27, 2013

You don't, unless the owner of the resource says they are and is trustworthy. No amount of code will change that.

No, but you have to decide how your code will behave if it detects a change. Should it keep a hash and report an error?

But that's just a matter of adding an RFC to describe how to mark URIs as naming immutable resources.

That's certainly an interesting way to use the word just. You don't strike me as someone who first heard of the IETF yesterday.

And besides, we already have an RFC that does exactly that. It's RFC 2616 - use Cache-Control: max-age. Except that - this channel is polluted with crap information, because lots of sites set their cache-control wrong and don't mean it.

So you need a new channel that isn't polluted with crap information. Maybe we're agreeing after all.

Why does it require completely reworking the Internet infrastructure from scratch?

DNS is also not referentially transparent. Nor are HTTP responses signed by DNS identities.

When all is said and done, you have a new service by any standards. Why layer the new service (which is much simpler than the old service) over the old ball of mud? If you want to talk to it from the ball of mud, write a gateway.

pdonis · on Sept 27, 2013

you have to decide how your code will behave if it detects a change

That decision will depend on the application, and that will be true regardless of how resources are mapped to names.

That's certainly an interesting way to use the word just.

I agree. :-) But note that I'm making a comparison to inventing a whole new operating system, programming language, etc. Compared to that, getting an RFC through the IETF strikes me as fairly simple. :-)

we already have an RFC that does exactly that. It's RFC 2616 - use Cache-Control: max-age. Except that - this channel is polluted with crap information

But that's not for the specific purpose of marking resources as immutable; it's overloaded with a number of purposes, which don't always mesh very well.

DNS is also not referentially transparent.

Maybe this would make more sense to me if I had read more of the papers relating to this, but it seems to me that "not referentially transparent" is a very general term that can mean a number of different things. Do you just mean that the IP address pointed to by an A record can change? Or do you mean that the resource pointed to by, say, "peterdonis.net" can change?

urbit · on Sept 28, 2013

Compared to that, getting an RFC through the IETF strikes me as fairly simple.

If I had to get the whole new operating system, etc, through the IETF, I might agree with you!

You know, all your questions have the same answer. Of course no standard can control the behavior of a remote server. But when you have an environment in which mutability is neither expected nor produced, any violation of immutability is easily recognized as a bug - and (in a highly specified system like Urbit) a very unlikely bug.

Whereas whatever conventions you add to the W3 environment that say "this resource is immutable," pretty much everything in that environment is "overloaded with a number of purposes, which don't always mesh very well" - and if the receiving end of a protocol actually treats your conventions as if they were ironclad rules, it's violating Postel's Law. There's just too much "give" in the whole system, and that's a mistake that's much easier to not make in a new system, than to fix in the old one.

dragonwriter · on Sept 27, 2013

> But that's just a matter of adding an RFC to describe how to mark URIs as naming immutable resources.

The simplest thing would be to just register a URN namespace specifically for immutable resources; the URN scheme already exists for resource naming separate from resolution.

pdonis · on Sept 27, 2013

Aren't we agreeing vehemently?

I don't know, because I don't understand (yet) what problem you think Urbit is solving, that can't be solved using existing tools (but possibly with new standards).

api · on Sept 27, 2013

The real problem is that I can't host a web page. I have to host it in the cloud. That's because there is no (easy) way for you to connect to my computer.

urbit · on Sept 27, 2013

Here, I think, we disagree, because I think your primary computer (ie, your personal state and main applications) should be in the cloud.

(But not for everyone - just for everyone who doesn't have powerful enemies. That's why it's essential to abstract across cloud hosting and personal hosting. If people who have powerful enemies and need to host at home have the same experience on the network as most people (who don't have powerful enemies), except that their enemies can't own them in some data center, they experience a kind of "herd immunity." Everyone doesn't need privacy against powerful enemies - everyone just needs to be able to obtain privacy, without deleting their digital lives, if they find out they have powerful enemies.)

api · on Sept 27, 2013

I've thought for a long time that you could have it both ways. If it's hard to completely decentralize networks (e.g. meshnets), then why not decentralize the computer?

I've got some ideas for an execution environment based on HTML5 and eventually-consistent cryptography-based distributed databases, but they're only ideas. It'd be a much bigger bite to bite off than ZeroTier One, and I don't have that kind of time/money right now.

The overall idea is to have a virtual cryptographically-defined system boundary that could be loaded -- invoked -- on any hardware running the appropriate VM.

urbit · on Sept 27, 2013

I don't know, man. I really try very hard to make sure there's no one between me and outright insanity. But sometimes I wish there was... are you talking about homomorphic encryption? I'm not sure I'd hold my breath waiting for a practical homomorphic VM.

anonymous · on Sept 27, 2013

> That's because there is no (easy) way for you to connect to my computer.

Er.. just give me your IP and we're good to go, no? Even easier if you register a name for free on dyndns (for instance) and then give me something like mshaxxorette.homelinux.org.

dragonwriter · on Sept 27, 2013

> URIs are names for data.

URIs are the result of merging the formerly-separate concepts of names for data (URNs) and identification of mechanisms for locating/accessing data (URLs). A particular URI may or may not be a name for a piece of data, and, except in certain cases, the scheme of a URI won't tell you whether or not it is such a name (one using the urn: scheme unambiguously is such a name, but one using the http: scheme may or may not be.)

platz · on Sept 27, 2013

Around the same time I heard of Urbit, I was listening to Don Stewart talk on a podcast (Haskell Cast episode #2) about using Haskell to avoid running an entire OS in the cloud; only the necessary code to perform the necessary operations to integrate with other systems was necessary.

He was using Xen to run a GHC process which he describes as a "microkernel environment".

Xen + Language Runtime (ghc) + user functions = the whole stack, and this boots in milliseconds, using only 1 or 2 meg of memory.

What's interesting is that this is a technique that he's using in development today.

I didn't read all the Urbit documentation, but the concepts between Don Stewart's approach and Urbit seem to share some similarities with eliminating OS bloat.

Some folks have claimed Urbit is as much art at practical, but Don is already doing practical things like this; unless I'm mistaken about what Urbit is about (I'm sure it does a lot of other things differently as well).

urbit · on Sept 27, 2013

I can argue with Haskell but I would never argue with teh awesome that is Don Stewart.

Urbit runs on Linux right now - but you're right, it would be very cool to run directly on the hypervisor, and it's totally possible for a self-contained system like this one. Personally I think Haskell never should have tried to be a language for producing Unix system calls - the impedance mismatch between FP and imperative OS is just too great.

asciilifeform · on Sept 29, 2013

I'd go one step further and argue that: no sane language should be a language for producing Unix system calls.

The impedance mismatch between sanity and imperative OS (and imperative hardware) is too great.

wmf · on Sept 28, 2013

Or you could skip the hypervisor and use containers.

helloTree · on Sept 28, 2013

You are talking about halvm I think - is it still in use or is it a dead project? The github project page does not seem to have seen commits for a while, but maybe it just ... works.

And can you please point me to the talk?

platz · on Sept 30, 2013

Here's the talk: http://www.haskellcast.com/episode/002-don-stewart-on-real-w...

helloTree · on Oct 1, 2013

thanks!

urbit · on Sept 27, 2013

A direct link to the video: http://www.youtube.com/watch?v=6S8JFoT6BEM

brian_cloutier · on Sept 27, 2013

Near the end of the talk someone asks you how to delete data if the state of the machine is just a function of the network packets it has received.

I have to admit I didn't really understand your answer:

> [When you try to forget data] you're making a subset of this computer that will not compute when it tries to use this information.

Could you elaborate a little on what that means, and how you'd apply it in practice. If I wanted to not just make data inaccessible but actually purge it from my machine, how would I do so?

urbit · on Sept 27, 2013

So, you're doing exactly the same thing when you press ^C on an event - what happens is, the event fails to compute. It doesn't go into your log. It doesn't affect your state. It never happened. Packet caused an infinite loop? You never got that packet. And so on.

If an event depends on data that's been deleted, that event never happened. But, at the Unix level (which can generate any events it wants), we can generate another event which reports the error. So it's not like things fall silently down the hole, either. (For the same reason, you can also get the stack trace of the infinite loop that was interrupted by your ^C.)

api · on Sept 27, 2013

http://blog.zerotier.com/post/58157836374/op-ed-internet-cen...

urbit · on Sept 27, 2013

ZeroTier looks awesome and that's a great post too.

I'd dispute the point about the CAP theorem slightly. Thing is, your data - Little Data - is naturally centralized. It doesn't have to be centralized in the same center as everyone else's data though! Then it becomes Big Data, where it's having this giant privacy-violating orgy with everyone else's data all day long.

The amount of traffic that your own Little Data has to serve, unless you're a celebrity in which case you can pay for serious hosting, is never going to require a giant cluster of replicated servers. So CAP just isn't that much of an issue.

api · on Sept 27, 2013

It's been obvious to me for a long time that the firewall and NAT are the primary causes of Internet centralization. I think they're a far greater factor than the CAP theorem or protocol / programming model difficulties. For whatever reason, I seem unable to get other people to see what I see here. Most people seem to just not get it.

I mean seriously... if nodes cannot easily contact each other horizontally then of course everything evolves toward a super-centralized model with large central groups of nodes acting as intermediaries. What part of that is hard to understand?!?

Thanks for the props. A new alpha release of ZeroTier is coming soon, and then it's going into beta with downloadable installers and other nice things. ZT1 is not going to decentralize the 'net, but it does create a lab where people can play with such things. (And it's an interesting VPN alternative for decentralized orgs too.)

urbit · on Sept 27, 2013

I would argue that spam is an even bigger one.

Basically, the Internet has spam because identity isn't a limited resource. IP addresses are kind of a limited resource, but they're not really the property of the person using/abusing them, so a blacklist doesn't inflict precise targeted damage, can't be made too draconian, and is easy to evade in lots of ways.

And everything above the IP level is unlimited. If there's an unlimited supply of identities, you can't tell the difference between a new customer and an old enemy. You want the first to have positive default reputation and the second to have infinite negative reputation.

People in the personal cloud community often point to email as proof that spam can be solved. Yes - but spam was solved in email because email already existed in an spam-free Internet. On the Internet we have, there's a much easier solution to the fact that any new protocol which is successful starts to attract spammers. The solution is: stop using the protocol. Google turned off XMPP federation for this very reason.

Basically in an orc-infested environment, you can't have your own cute little bungalow in the cloud. You gotta have an apartment in a giant fortified castle in the cloud.

Having a limited supply of identities, in which identities are (a) property and (b) property you control cryptographically (Bitcoin style, "allodial title"), makes it easy to make spamming not pay, once the price of an identity is greater than the profit a spammer can earn by burning it. And it does not require a central governance authority, or even a central reputation authority. (Reputation authorities shouldn't be built into any system, because if they abuse their own reputations the consequences are insanely dire.)

NAT is a problem, but there are lots of ways to tunnel around it. Which all suck, of course, but...

api · on Sept 27, 2013

NAT can be tunneled around, but there's two problems with that:

(1) Every NAT traversal protocol is unique, so there is no interoperability between different apps. The power of IP lies in the fact that it's a lingua franca-- anything can open TCP or send UDP. But anything cannot speak BitTorrent-DHT or Skype or whatever. So there's no potential for exponential growth in capability by tying disparate things together. Firewalls and NAT kill protocol interoperability.

(2) NAT traversal is hard. I know cause I just did it. It's a pain in the rear, and I'm still going to have to build port 80 HTTP tunneling into ZeroTier for that 0.1% of users who cannot use UDP or tunnel through their NAT. So you have to implement NAT-t + a proxy service for everything.

As far as spam goes, you're right. I should have mentioned that. But there are lots of strategies for dealing with spam. Why can't we have a few million small castles instead of one big one, for example? BBSes each had sysops who would kick off abusive users. There are also cryptographic things like hash-cash, Bitcoin economies, trust matrices, etc. These are complex but if we could engineer something good here then it could eventually be packaged into a friendly library that programmers could use without having to understand all the devilish details.

The fact is that we did build a decentralized many-to-many Internet. Then we broke it with firewalls and NAT.

urbit · on Sept 27, 2013

Well, dude, I just did NAT traversal too but I have a funny feeling you did it better.

Prediction: one day my protocol will run on top of yours. Rebuilding teh Internets, one layer at a time...

api · on Sept 27, 2013

Your protocol doesn't have to run on top of mine. Just build it to use IP and it can run over ZT1's virtual networks or plain vanilla IP or cjdns or whatever.

With IPv6 every device could have a first-order IP. We just need to get people to give up middle-box firewalls. That and anyone who attempts to build IPv6 NAT should be placed on trial for crimes against humanity, with the punishment being impalement followed by incineration atop a pyre.

NAT is the devil incarnate. Virtually all evils in the world -- from child abuse to war -- can safely be blamed on NAT.

wmf · on Sept 27, 2013

Yeah, I was kinda wondering why aliens don't use IPv6 and let Teredo or whatever worry about NAT traversal. I heard that IPv6 makes it much harder for Jeff Goldblum to upload the virus.

api · on Sept 28, 2013

joke comprehension error at 0x93: core dumped

unimpressive · on Sept 27, 2013

>I seem unable to get other people to see what I see here. Most people seem to just not get it.

I've thought this for a long time. I was even going to cover it in a "computer issues for regular people" book.

EDIT: Cover it in one, not as one, there's a lot more material than that.

api · on Sept 27, 2013

I think a big barrier is the netsec defense in depth / reduce surface area dogma. There's this massive cargo cult of the firewall in security circles, and if you suggest removing a firewall everyone freaks out and calls you an idiot.

Of course most malware and other attacks today bypass the firewall using "pull" based vectors like HTTP and e-mail, but try telling people that. Remotely exploitable "pushable" vulnerabilities are rare these days on stock OSes too, but again try telling people that.

And by firewall in this context I am referring to middle-box firewalls, not local firewalls. The latter are under the control of a box's user/OS and so can easily be opened to permit lateral communication. Middleboxes are the structural culprit here.

bct · on Sept 27, 2013

Another critical factor is that centralization is the easiest way to turn a profit.

api · on Sept 27, 2013

There are other ways though: sell the software itself, sell services built around something that is itself distributed, etc. But in general you are right. Most Internet business models revolve around monetizing a flow of traffic, and to do that the traffic has to flow through you.

sz4kerto · on Sept 27, 2013

It's a shame that as a developer, I need to see these stuff (like this Urbit fantasy) to remind myself about why I really started to tinker with computers.

urbit · on Sept 27, 2013

Fantasy? Laugh while you can, monkey-boy! :-)

mkhalil · on Sept 27, 2013

Initially by the title, I thought this was going to be another overly-dramatic simplification of problem article. Turns out to be a good one. I was thinking about this a lot lately.

Gravityloss · on Sept 27, 2013

Is the link right?

urbit · on Sept 27, 2013

Yes. It's the second talk on the list.

The other talks are great too! Watch them all!