I've become increasingly convinced that a standardized VM in the browser that other languages could target would be the best idea. And we could forget that the whole JS thing ever happened.
Seriously, asm.js already comes pretty close to a standardized VM at a low level. But we intend to grow it to include integration with structured binary data and GC, to the point where it'll provide very similar functionality to VMs like the JVM or CLR.
The big real-world benefits of asm.js over a boil-the-ocean VM are that (a) the content is already 100% compatible and quite highly optimized in existing browsers and engines, and (b) it's far easier for engines to implement on top of their existing technology.
The biggest downside is that the syntax is weird. But that's just not a big deal for compilers. They can generate whatever weird syntax they want. You could even implement your preferred bytecode format as an IR and compile that to asm.js if you wanted, but I'm not sure compiler-writers would even care that much. They'd do just as well to have IR data structures in the internals of their compiler, without needing a special surface syntax for it.
There was a company in the '90s that was working on a byte code representation of programs which was designed to be “optimally” small by encoding more at the AST level than the ASM level. There were papers, but they were acquired by Microsoft (I think) and vanished. Does anyone remember them? The papers would be interesting in light of this, and conveniently near the end of patent lifetimes if patented and solid prior art if not.
Edit: Maybe I'm thinking of Michael Franz and Thomas Kistler's work on “Slim Binaries”. It matches the content if not the ambiance that I remember. It's been a while. My brain tapes could have print-through.
I really can't tell what you're advocating here. It seems as though you're against JS in general, but I think you'll have a tough time arguing that JS isn't: 1) standardized, 2) a VM, 3) in the browser, or 4) a target for other languages.
JS is standardized. JS VMs are not. Also, I'm not against JS per se; I've grown to appreciate the language for what it is. It's more the JS monopoly that I kind of dislike.
Nobody is going to agree to standardize the internals of their JS implementation. Competition on JS performance remains high, and engines change their implementations all the time. Engine implementors do not want to, and should not, expose their implementation internals.
Instead, what people really want is a standard semantics they can target that isn't tied to any one vendor's implementation details. That's the crux of a VM, and that's what asm.js is providing.
Sorry, I did not explain my position accurately. When I said that JS VMs are not standardized, I was not arguing that they should be (that is in fact a terrible idea), I was responding to the poster above who said that since we (a) already have a JS standard and (b) browsers have JS VMs, that we already pretty much have what I was arguing for.
I also agree that standard semantics is what people really want. Asm.js seems like an interesting project, but one of the issues that (I like to think) the VM approach would solve is that the semantics could be more low-level. Why do I think that is a good idea? Well, ideally one would not have to wait for different browsers to implement, say, websockets; the websocket functionality could simply be implemented by the website as a library that uses (presumably secure) lower-level primitives of the VM. (If that sounds vaguely similar to the microkernel idea, it's not just you :-) ).
What you're asking for is drastically harder to achieve from a security perspective. If we didn't care about security, we could just standardize the Linux syscalls or something and call it a day. :) But what we're doing here is providing the low-level computational model but still only providing attenuated access to system facilities like the network through standardized web APIs.
I would not say 'drastically'. I guess arguments from the monolithic vs microkernel discussion can be recycled. Basically, security is one of the main arguments for the latter [1] and a lot of OSs that require high security are in fact microkernels. But yeah, implementing the Linux API in the browser does in fact sound like a rather sub-par idea but I was not arguing for that.
I did not say that it would be simple, it would be pretty hard actually. The question is, whether it is worthwhile for people to be investing so much time to make something do something it was not intended to do (I'm not talking just about asm.js but about all the other languages and tools that target JS as well. I feel like these don't really add anything new to the table). And the question is also not whether it is hard or not but whether it is hard compared to similar projects. For example, I'm not convinced that it would be any harder than building a new browser (again I realize that this is super hard, but I'm speaking comparatively).
As to arguing, I prefer the word 'debating'. Also I was under the impression that that is what comments were for. Correct me if I'm wrong.
You have to define what you mean by "intent" for that to make any sense. Here is Dave Herman, one of the authors of JavaScript, telling you: here is asm.js, and you can use compilers to compile to it. Is that not intendy enough for you?
> Asm.js seems like an interesting project but one of the issues that (I like to think) the VM approach would solve is that the semantics could be more low-level.
Can you elaborate? asm.js semantics are already low-level, in fact as I said in another comment, they are lower than some VMs (e.g. PNaCl).
Which part of asm.js do you find to be too high-level?
You're too narrow in your definition here. Think of it this way: Dalvik and HotSpot are both implementations of the Java Virtual Machine (JVM). In the same vein, V8 is just an implementation of the Javascript Virtual Machine (JSVM).
"Ahh, I see, Clojure SHOULD HAVE been compiled to Java, not the JVM" said no one ever. This is a blatantly horrible idea. Horrible ideas are great for hacks, but it actually looks like Mozilla is serious about this (which scares me).
I want my bytecode to look like this: `iadd`, NOT `(a+b)|0`. Maybe I'm just old fashioned, but running bytecode through a text interpreter, which is then compiled again to machine code (possibly via another bytecode) seems like a horrific hack.
I would be much happier with a standardized VM which Mozilla could then implement via javascript or whatever if they wanted.
"I want my bytecode to look like this: `iadd`, NOT `(a+b)|0`."
That isn't enough of a reason to forego backwards compatibility. It is a hack, to be sure. But there are all sorts of suboptimal things that we deal with in order to make sure that technologies survive in the real world. Especially when it's really just a question of what the encoding looks like, which is not something that's that important.
(Besides, it turns out that making a multi-language VM that performs as well as a custom VM for dynamic languages is an unsolved problem. Many language implementers, Mike Pall of LuaJIT fame for example, don't believe in it. The closest thing is probably PyPy...)
> I want my bytecode to look like this: `iadd`, NOT `(a+b)|0`.
So build a front-end! I actually thought about doing this myself, because I knew we would get people confused about the difference between syntax and semantics. The syntax is not important to the computer, and it's generally not important to a compiler either (codegen can output whatever format it wants).
Note that the bytecode for the JVM and CLR, for example, don't look like `iadd` either. They're binary encoded formats. The format is just not relevant.
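To make the syntax-versus-semantics point concrete, here is a minimal sketch of what an asm.js-style module for a 32-bit add might look like. The module name `AddModule` and the exported name `iadd` are made up for illustration; the point is only that `(a + b) | 0` behaves like a wrapping 32-bit integer add, whatever it looks like on the page.

```javascript
// Hypothetical module name; the body follows asm.js-style type annotations.
function AddModule(stdlib) {
  "use asm";

  // Parameters are coerced to int32 with |0, and the result is coerced
  // again, so the addition wraps like a 32-bit machine add.
  function iadd(a, b) {
    a = a | 0;
    b = b | 0;
    return (a + b) | 0;
  }

  return { iadd: iadd };
}

const { iadd } = AddModule(globalThis);
console.log(iadd(2, 3));          // 5
console.log(iadd(2147483647, 1)); // -2147483648 (wraps around)
```

An engine that knows nothing about asm.js still runs this as ordinary JavaScript and produces the same results.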
> Maybe I'm just old fashioned, but running bytecode through a text interpreter, which is then compiled again to machine code (possibly via another bytecode) seems like a horrific hack.
It never has to pass through an interpreter at all. It does not pass through an interpreter in our implementation.
You seem to be very hung up on surface syntax. The importance of a VM is not the syntax of its language but the semantics.
> You seem to be very hung up on surface syntax. The importance of a VM is not the syntax of its language but the semantics.
I'm protesting that there is a syntax to this 'VM'. It's rather indicative of Mozilla's approach to life, namely, write everything in JS out of some pseudomasochistic desire for backward compatibility hacks.
I would really, really like to see some benchmarks comparing NaCl vs asm.js, and I won't buy this as a viable compilation target until there's data to back up these (dubious) claims. If they do, then perhaps a frontend would be useful.
> The importance of a VM is not the syntax of its language but the semantics.
> I'm protesting that there is a syntax to this 'VM'.
That isn't a very reasonable thing to protest. Every language has a syntax. That includes every ASM variant. Syntax is an inherent aspect of language. What it sounds like you're actually offended by is that its syntax is very different from most ASM syntaxes.
> The semantics of JS are pretty horrible too.
Are you talking about asm.js, or are you talking about the superset that is not relevant to this discussion?
> I would really, really like to see some benchmarks comparing NaCl vs asm.js, and I won't buy this as a viable compilation target until there's data to back up these (dubious) claims.
The current numbers are that asm.js is around 2x slower than native code. I didn't compare to NaCl (which would be apples-to-oranges since it is non-portable) nor PNaCl (which I am not sure is ready yet for benchmarking? Please correct me if not).
We expect to improve on the 2x later this year, this is just the first iteration. I do think 2x is quite promising already though - it's in the range of Java and C# (on a fast VM for them).
I'll be putting up some slides with more specific numbers tomorrow after I finish giving a talk on it.
> It's rather indicative of Mozilla's approach to life, namely, write everything in JS out of some pseudomasochistic desire for backward compatibility hacks.
I'm just trying to make progress in a messy world. JS is not my ideal programming language (even though I've worked hard to make it better for 7 years now (!) and counting). I simply believe asm.js has a good adoption/evolution story, and I want the web to continue growing and competing.
> The semantics of JS are pretty horrible too.
A cheap shot, and missing the point. The subset of JS, while being completely equivalent to JS semantics, is also equivalent to a low-level machine model. IOW, it gives you the semantics of a low-level but safe VM.
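As a rough illustration of the "low-level but safe" machine model, the sketch below treats a typed array as linear memory, in the style asm.js uses. The names `HeapModule`, `store`, and `load` are hypothetical, and this is a toy rather than anything an engine or compiler actually emits.

```javascript
// Hypothetical module; HEAP32 is a view over a caller-supplied ArrayBuffer.
function HeapModule(stdlib, foreign, buffer) {
  "use asm";
  var HEAP32 = new stdlib.Int32Array(buffer);

  // Store a 32-bit value at byte address p.
  function store(p, v) {
    p = p | 0;
    v = v | 0;
    HEAP32[p >> 2] = v;
  }

  // Load a 32-bit value from byte address p.
  function load(p) {
    p = p | 0;
    return HEAP32[p >> 2] | 0;
  }

  return { store: store, load: load };
}

const heap = new ArrayBuffer(0x10000); // 64 KiB of "memory"
const mem = HeapModule(globalThis, {}, heap);
mem.store(8, 1234);
console.log(mem.load(8));       // 1234
console.log(mem.load(0x20000)); // 0: an out-of-range read cannot reach other memory
```

Addresses are plain integers and loads and stores are plain array accesses, yet nothing outside the buffer is reachable.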
Don't think that I'm not sympathetic here; I agree that readable machine code is generally undervalued. At some point your code generator will bug out, and someone will be forced to sift through generated code and mentally interpret stuff like `(a+b)|0` to figure out where it's gone wrong. I wonder if the eventual asm.js proposal would be regular enough to define a 1:1 mapping between `(a+b)|0` and `iadd` (or some other more-conventional ASM format), to be used for human-reading and debugging rather than machine interpretation. I doubt it's possible in the general case, but I could be wrong.
Yes, absolutely. When we first started talking about the project I seriously considered designing a "disassembly" syntax, to make it easier. We can still do that pretty easily, it's just taking a bit of a back seat to getting v1 of the spec and Odin implementation out the door.
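For a sense of what such a "disassembly" syntax could look like, here is a toy sketch that maps two int32 expression shapes to made-up mnemonics and back. The mnemonics `iadd` and `isub` and the helper names are assumptions for illustration only; a real mapping would need to cover the whole validated grammar, not a regex over two patterns.

```javascript
// Hypothetical mnemonics; only (x+y)|0 and (x-y)|0 are recognized.
const OPS = { "+": "iadd", "-": "isub" };
const SYMBOLS = { iadd: "+", isub: "-" };

// Turn an asm.js-style int32 expression into a conventional-looking line.
function disassemble(expr) {
  const m = /^\(\s*(\w+)\s*([+-])\s*(\w+)\s*\)\s*\|\s*0$/.exec(expr.trim());
  if (!m) return null; // not one of the patterns this sketch knows about
  return `${OPS[m[2]]} ${m[1]}, ${m[3]}`;
}

// Turn the mnemonic form back into the asm.js-style expression.
function assemble(line) {
  const m = /^(iadd|isub)\s+(\w+)\s*,\s*(\w+)$/.exec(line.trim());
  if (!m) return null;
  return `(${m[2]}${SYMBOLS[m[1]]}${m[3]})|0`;
}

console.log(disassemble("(a+b)|0")); // iadd a, b
console.log(assemble("iadd a, b"));  // (a+b)|0
```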
Let's assume, for a moment, that your position is basically right - that we need to come up with a language-independent VM, and that JavaScript isn't going to cut it. We'll even imagine that we have the ideal spec sitting in front of us, which does everything that anyone will ever need (which isn't going to happen).
How can we possibly get there from where we are now? Basically, we can't, for various reasons.
Anything that doesn't work on existing deployed web browsers is dead on arrival. If the VM won't work on the vast majority of deployed browsers, then I can't use it. It would take years for JavaScript-only browsers to die off enough that I can use the VM, especially Internet Explorer and various phone / tablet browsers (Android being an obvious problem, since most manufacturers do not release updates, or release them years late).
So, if I wanted to use the VM before 2020, I'd have to have a JavaScript-only fallback. I just don't have any other option, because I certainly can't use PNaCl (Chrome-only) or Flash (effectively desktop OS only). If I have a fully functional JavaScript-only solution, which works right now, in all existing browsers, what do I need the VM for?
That's assuming that we could possibly get any agreement on a VM. That's never going to happen, for a number of reasons.
The only browser manufacturer actually interested in a VM is Google. Hence, they're working on PNaCl. Microsoft, Apple and Mozilla are all against the idea for various reasons, ranging from pragmatic to idealistic.
There are far, far too many different ways to do a VM, which depends largely on what languages you want it to run. A VM designed for C / C++ code (which asm.js pretty much is, as is LLVM) is very different from something like the JVM or .NET CLR, which is different from the kind of VM a JavaScript implementation has. Different groups of people want completely different things, which are often diametrically opposed to one another. The kind of VM a game developer would want is completely different than the kind of VM you'd want for developing business applications, and either VM would be essentially useless for the other group.
Besides that, what's the point? Modern JavaScript JITs are stupidly fast, even when they have virtually no information about the code they're running (types, in particular). You can already cross-compile a number of languages to JavaScript, which often treat it as a low-level target.
Remember, the web is a huge collection of different systems (browsers, servers, websites, development tools, and so on) which really can't just be upgraded in one pop. It has to evolve, and it has to work at every step of the way. If something doesn't work, or isn't usable, it tends to die, even if it was a really good idea.
There's one possibility, though:
People start using JavaScript as a compilation target. Improvements in JavaScript engines make this approach viable, and reasonably fast. Compilers tend towards using JavaScript in a certain way, which is kind of predictable. Browser manufacturers notice this, examine the subset, and come up with a spec describing it, and a plan for optimizing it. Compilers start producing the standardized subset, and JavaScript VMs gradually start adding optimizations for the subset. Browser vendors start competing with one another for performance, pushing the optimized VM close to native speeds.
Then, eventually, someone might take what those browsers are already doing, write a bytecode implementation of it, and a JavaScript shim that turns the bytecode back into JavaScript to support existing browsers. If there's any advantage to it at all, this bytecode would be trivial to adopt in a browser (it's close to what they're already doing), so there's no reason for browser manufacturers to resist it. If there are no advantages to it, nobody will bother.
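A minimal sketch of the shim idea, under heavy assumptions: the opcodes below are invented for illustration and do not correspond to any real bytecode format, and the generated source is only asm.js-flavored (it calls `Math.imul` directly rather than going through a validated module). It shows a stack bytecode being turned back into JavaScript source that any existing engine can run.

```javascript
// Hypothetical opcodes for illustration; not any real bytecode format.
const OP_ARG = 0x01;   // followed by one byte: argument index
const OP_CONST = 0x02; // followed by one byte: small unsigned constant
const OP_IADD = 0x10;  // pop two values, push their int32 sum
const OP_IMUL = 0x11;  // pop two values, push their int32 product

// Translate stack bytecode into the source text of an int32 expression.
function decodeToJs(bytes) {
  const stack = [];
  for (let i = 0; i < bytes.length; i++) {
    switch (bytes[i]) {
      case OP_ARG:
        stack.push(`a${bytes[++i]}`);
        break;
      case OP_CONST:
        stack.push(String(bytes[++i]));
        break;
      case OP_IADD: {
        const r = stack.pop();
        const l = stack.pop();
        stack.push(`((${l}+${r})|0)`);
        break;
      }
      case OP_IMUL: {
        const r = stack.pop();
        const l = stack.pop();
        stack.push(`(Math.imul(${l},${r})|0)`);
        break;
      }
      default:
        throw new Error(`unknown opcode ${bytes[i]}`);
    }
  }
  return stack.pop();
}

// Wrap the decoded expression in a function whose parameters are coerced to int32.
function shim(bytes, arity) {
  const params = Array.from({ length: arity }, (_, i) => `a${i}`);
  const coercions = params.map((p) => `${p}=${p}|0;`).join("");
  return new Function(...params, `${coercions}return ${decodeToJs(bytes)};`);
}

// Encodes (a0 + a1) * 2
const program = Uint8Array.of(OP_ARG, 0, OP_ARG, 1, OP_IADD, OP_CONST, 2, OP_IMUL);
const f = shim(program, 2);
console.log(f(3, 4)); // 14
```

A browser that understood the bytecode natively could skip the shim entirely, while older browsers would keep working through the generated JavaScript.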
It'll probably end up being a perfectly reasonable feature, based on a design that would be completely insane if you designed it from scratch, but works anyway.
Forcing everybody to use a ubiquitous, homogeneous VM implementation is probably not going to work. Browser makers like being able to improve their implementations on their own at a brisk clip. Also, while standardization is a good thing, competing implementations are also a good thing.
You actually said "standardized VM," which I took to mean "one VM upon which everybody standardizes."
But even now, it's unclear to me what you mean to be standardized about this VM if not its input and output (as we already have that with JavaScript VMs) and not the implementation. What are you asking for?
A VM with standardized input and output is exactly what I meant. The underlying implementations of said VMs could still differ, but they would all implement the same bytecode standard (much as HotSpot and Dalvik both implement the JVM bytecode standard).
But JavaScript VMs already have standardized input and output. It is not a bytecode, but it is standardized input with standardized output. So we already have what you want except that the input format is different than you envisioned. It seems to me that's the whole idea behind asm.js — they're defining a subset of the language that can be implemented very efficiently with precise low-level semantics so that we get most of the benefit of a bytecode without throwing out all that we have now.
I thought we had made a standardized VM language called Javascript so we could forget this whole ActiveX thing ever happened..?
Not that I disagree that a VM would be very awesome, but it should take as many cues from the collected knowledge of Javascript security as possible. Obviously, just hooking your 'sandboxed' VM to the browser doesn't cut it, as demonstrated by Sun's Java applets :)
LLVM byte code in an OS secured jail/sandbox? The JIT is already there and BSD licensed so all the players can use it. You'd really have to trust your sandbox though.
The API for what you can do out of your sandbox would be the hard part. Every capability you add to the API is also a lurking attack vector in each implementation.
Microkernels in the browser to the rescue :-). But yeah, I had something like that in mind. I do realize that getting security right would be tricky, but I'm not sure if it would be that much trickier than say the security of any given JS engine. Since the semantics of said VM would be probably simpler, I would make the argument that getting the security right would be easier to do than the security of said JS engine.
I agree... I don't know who thinks having Javascript as "bytecode for the web" is any good: even other languages (Fay, ClojureScript) need to compile to JS.
Wouldn't it be better if we had a standardized, fast language that other languages like the aforementioned can target?