V8 Sparkplug – A non-optimizing JavaScript compiler (v8.dev)
295 points by tosh on May 27, 2021 | 120 comments


> With our current two-compiler model, we can’t tier up to optimised code much faster; we can (and are) working on making the optimisation faster, but at some point you can only get faster by removing optimisation passes [...]

> Enter Sparkplug: our new non-optimising JavaScript compiler we’re releasing with V8 v9.1, which nestles between the Ignition interpreter and the TurboFan optimising compiler.

This surprised me. Didn't V8 have a 'fast JIT' alongside an optimizing JIT as early as 2013? It was called full-codegen. Did they remove it?

https://wingolog.org/archives/2013/04/18/inside-full-codegen...


Author here: we did, and as other comments say we removed it in favour of Ignition, with the intent of at some point in the future bringing back some form of baseline compiler.

The big difference is that Sparkplug compiles from bytecode, not from source, and thus the bytecode stays the source of truth for the program. Back in the FCG days, the optimising compiler had to re-parse the source code to AST, and compile from there - even worse, to be able to deoptimise back to FCG, it had to kind of "replay" FCG compilation to get the deopted stack frame right.

We did actually have a configuration (never released) that had Ignition, FCG, _and_ TurboFan, and boy was it a mess...


> We did actually have a configuration (never released) that had Ignition, FCG, _and_ TurboFan, and boy was it a mess...

Do you mean it had technical debt that was never resolved, or do you mean 3 tiers is always too many?

I believe WebKit's JavaScript engine currently has 4 tiers, [0] although that includes an interpreter as the first tier.

[0] https://arstechnica.com/information-technology/2014/05/apple...


It had massive technical debt. Both Ignition and FCG had to compile from source, and we had to be able to tier up from both of them to TurboFan, so TurboFan maintained two front ends - one to compile from bytecode, and one to compile from AST. There were all sorts of complexities around deoptimisation point IDs, maintaining consistent suspension points for generators/async functions, and maintaining consistent type feedback across Ignition and FCG. Even better, for a while we had Crankshaft in there too as an optional "alternative" top-tier (while we were working on TurboFan), in a configuration we called the "Frankenpipeline". You can see that wonderful mess on slide 26 of this presentation: https://docs.google.com/presentation/d/1OqjVqRhtwlKeKfvMdX6H... (from this talk: https://youtu.be/r5OWCtuKiAk)

Three tiers is perfectly fine if the complexity is handled well; I could even see us adopting a fourth tier at some point in the future (not dissimilar, indeed, to JSC), most likely between Sparkplug and TurboFan.


More recent source: https://webkit.org/blog/10308/speculation-in-javascriptcore/

The exact tiers have changed, but it’s still 4 of them: an interpreter (LLInt), the Baseline JIT, and two optimizing JITs


Could you please compare it to OpenJDK’s JIT compilers? I’m unfortunately not familiar with V8 - but am I right that the two have sort of converged and become more similar?


> Did they remove it?

They removed it a while back after they introduced the ignition interpreter: v8 used to only have compilers (full-codegen and crankshaft, the latter being the jit optimising compiler, full-codegen was not a jit).

One particularity of the pipeline was that since there was no interpreter both compilers worked from the source, which crankshaft had to keep around and re-parse.

In 2013 they started working on a new optimising jit (turbofan), but for a while it didn’t really improve things (though the failure modes were often better than crankshaft’s).

Then some engineers realised they could use part of turbofan for an interpreter, which would allow load sharing (amongst other things, turbofan would jit ignition’s bytecode instead of having to re-parse from source). That brought the improvements they were looking for, but it left full-codegen sitting awkwardly in the middle with no integration, its low end eaten by ignition and with no real high end. So it was dropped alongside crankshaft in v8 5.9.

Now they’re reintroducing a non-optimising compiler, but one that’s integrated with the ignition-turbofan pipeline, and actually a jit.


> full-codegen was not a jit

By what definition?


The way I understand the comment:

With an interpreter, the JavaScript source is compiled to some bytecode, then the bytecode is executed in a loop, instruction by instruction.

With a compiler (not jit), the whole bytecode would be further fully compiled to some C apis before being executed, making it effectively an "ahead of time" compilation.

The catch is: since the code never was executed, there is very little static information to work with, so the compiled output is often just a giant unrolled interpret loop. The compiled version can be like 10% faster on very hot JavaScript loops compared to the interpreted version.
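
A rough sketch of the difference, with made-up bytecodes and nothing engine-specific (just an illustration of the idea, not how any real engine is written):

    // Interpreter: one dispatch loop, deciding what to do for every instruction.
    enum class Bytecode { kLoadConst, kAdd, kReturn };

    int Interpret(const Bytecode* pc, const int* constants, int* stack, int sp) {
      for (;;) {
        switch (*pc++) {
          case Bytecode::kLoadConst: stack[sp++] = *constants++; break;
          case Bytecode::kAdd:       --sp; stack[sp - 1] += stack[sp]; break;
          case Bytecode::kReturn:    return stack[--sp];
        }
      }
    }

    // Baseline-compiled output: the same bytecode sequence with the dispatch
    // loop "unrolled" at compile time, so each instruction becomes a fixed
    // chunk of code and there is no per-instruction switch left at run time.
    // (Hypothetical output for the sequence LoadConst, LoadConst, Add, Return.)
    int CompiledVersion(int c0, int c1) {
      int a = c0;  // kLoadConst
      int b = c1;  // kLoadConst
      a += b;      // kAdd
      return a;    // kReturn
    }

The compiled version does the same work per instruction; it just skips the per-instruction dispatch.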


> With a compiler (not jit), the whole bytecode would be further fully compiled to some C apis before being executed, making it effectively an "ahead of time" compilation.

This hinges at least on the question of what you mean by "the whole bytecode". If the source is made up of N functions, and you compile all N functions ("the whole bytecode" of the entire program) to machine code before you know if any of them will be executed: Yes, I'd agree that that's a form of ahead of time compilation. Is this what full-codegen used to do?

On the other hand, if the source is made up of N functions, and you only compile "the whole bytecode" of each of them to machine code one by one, only those that will actually be executed, and only when you are preparing to actually execute them: That's JIT compilation. Even if it is the very first execution (i.e., you have never even interpreted), even if the compilation doesn't use fancy profiling or sophisticated code generation or whatnot.

> since the code never was executed, there is very little static information to work with, so the compiled output is often just a giant unrolled interpret loop

I think you mean "dynamic" information. And "the compiled output is often just a giant unrolled interpret loop" is pretty much the definition of what a baseline JIT compiler produces.


By the definition that it would compile the entire thing upfront? Hence the name.


By "the entire thing", do you mean an entire file, without knowing if all functions in it will be used? I'm just baffled because there are several posts here from V8 developers, and quotes from the FAQ, saying things of the form "this is like FullCodeGen, except X", and the "X" is never "FullCodeGen was not a JIT". Which suggests to me that the people closest to development of these things don't think of them in the same terms as you do, which is why I'm trying to understand your terms.


> By "the entire thing", do you mean an entire file, without knowing if all functions in it will be used?

Yes. Afaik full-codegen had no way to discover that.

> and the "X" is never "FullCodeGen was not a JIT". Which suggests to me that the people closest to development of these things don't think of them in the same terms as you do, which is why I'm trying to understand your terms.

If you look at the language they used, they’d always call full-codegen a compiler or baseline compiler. Not a jit. Because there was nothing to gather time-of-use information from. And full-codegen didn’t keep parse information around, which is why crankshaft had to separately re-parse everything; full-codegen could not hand it the source or bytecode of functions to optimise.


> This surprised me. Didn't V8 have a 'fast JIT' alongside an optimizing JIT as early as 2013? It was called full-codegen.

Did you see the image at the bottom of the design document? That explains it.


I'm not sure how you'd expect people to find that since it's not linked from the article.


Where?


In the document here[1], there's a picture of Gandalf the White at the bottom with the caption "Sparkplug: I am FullCodeGen. Or rather, FullCodeGen, as it should have been".

[1]: https://docs.google.com/document/d/13c-xXmFOMcpUQNqo66XWQt3u...



Thanks. For those following, see specifically FAQ > Isn’t this just FullCodeGen?


I guess it's not obvious that answers whose information consists mostly of adjectives or prepositions are poor answers to engineering questions, because engineers are looking for (relatively) precise answers.

For example:

* What is "the bottom"? Does it mean one of the pictures near the bottom of the page, or the absolute bottom?

* What if the picture were not at the bottom, but instead followed by a large chunk of text?

* What design doc? Was it mentioned in the parent post, or is it a "well known" design doc?

* A design doc for which topic? TurboFan/Crankshaft/etc.? (This is nitpicking, but it still happens often, because one might land on this comment first without following the thread. Then again, reading answers as an engineer is also a skill, quite different from reading, say, science fiction - but that's another topic for another day.)

* What does the figure explain? What if the figure shows something that does not match my understanding of the question?

In the end, answering an engineering question should be, and has to be, about stating accurate facts. Whether and how that actually answers the question depends more on the reader's and the questioner's continued communication.

Edit: To make sure I'm not leaving the wrong impression: the questions I listed above are ones I encounter every day in the answers I receive. It's not nitpicking. You can read this thread for an example of how answers without concrete information make for poor engineering answers.


Note how we didn't learn anything (precise or not) about the answer to the question from this comment, but we did learn your preferences about communication styles.


Now that the parent is gone, it seems I am just ranting at random...

BTW, I view this as not about communication style, but communication efficiency.


Very cool, that's a decent performance bump.

It'd be nice to have some simple benchmarks that directly compare ignition, sparkplug and turbofan (and full-codegen too). And also time spent compiling.

It looks somewhat similar to context threading as I remember it:

http://www.cs.toronto.edu/~matz/pubs/demkea_context.pdf

but it's been a while...

Ha-ha, the "stacks grow downwards" button is priceless :-D


If anyone is interested here is the SparkPlug design document I have bookmarked from a few months back: https://docs.google.com/document/d/13c-xXmFOMcpUQNqo66XWQt3u...


One interesting point from there (regarding their old FullCodeGen original JIT compiler):

> Isn’t this just FullCodeGen?

> Kind of! It’s FullCodeGen for modern V8, and that’s not such a bad thing. This brings the performance benefits of FCG back to V8, but


A CPU is effectively an interpreter itself, albeit one for machine code; seen this way, Sparkplug is a "transpiler" from Ignition bytecode to CPU bytecode, moving your functions from running in an "emulator" to running "native".


Yep. Formally in CS, there's no difference between a "compiler" and a "transpiler". They're both names for the process of program translation. All compilation is "transpilation", even when you're compiling things like C++.


To me, in general, compiling seems to imply native output where optimizations are done while transpiling seems to target another high-level language where semantics are kept as close to the original as possible.


You are welcome to use the word "compile" with a non-standard definition, but its standard definition does not imply either that the output is native code or that optimizations are done. There are numerous well-known compilers whose target language is not native code (the original Cfront, javac, C++ on the CIL, most Pascal compilers from the 01970s, Chicken, GHC's old C backend) and numerous well-known compilers whose output is totally unoptimized (tcc, perl -MO=C, META-II, arguably the original Turbo Pascal). So if you want to understand what other people are saying about "compilers" you should keep in mind that the standard definition includes what you are calling "transpilers".


There is no real distinction, it is all just bad usage of the word. There are only compilers. It may compile to a high level language, but “transpiling” adds zero information to the sentence.


I just say "compile", too, because it's strictly correct and I like being correct. But "transpiling adds zero information" seems unfair: when you see that word, you at least know the speaker meant "compiling to something other than machine code". No one has ever called gcc a transpiler, so they definitely aren't talking about gcc.

I like the reasoning in the sibling comment:

> So if you want to understand what other people are saying about "compilers" you should keep in mind that the standard definition includes what you are calling "transpilers".

Really, the main disadvantage of saying "transpile" is that it implies you don't know what "compile" means. In a context where that wasn't an issue -- e.g., if I heard it from one of my coworkers, who certainly know what a compiler is -- I would just think they liked the neologism.


I meant the zero information part like you have to state what is the output. So in a typical sentence containing “transpiler”, you would say something like “Nim transpiles to C” - which has the exact same information content as “Nim compiles to C”.


Compiling between high-level languages is a similarly difficult endeavour to compiling to low-level ones!


Colloquially though, compilation typically implies moving toward a lower-level (less abstract) representation.


"The second trick is that Sparkplug doesn’t generate any intermediate representation (IR) like most compilers do. Instead, Sparkplug compiles directly to machine code in a single linear pass over the bytecode, emitting code that matches the execution of that bytecode."

Isn't bytecode an intermediate representation?

https://en.wikipedia.org/wiki/Intermediate_representation

Am I just being pedantic?


For Sparkplug, bytecode is the input representation. But you’re right that bytecode is an intermediate representation of V8. So to me, whether bytecode is an IR or not depends on whether you’re talking about V8 as a whole or Sparkplug in particular.

To me, an intermediate representation is what the compiler translates the input to other than machine code. Sparkplug goes straight from its input (bytecode) to machine code.

So you’re not being too pedantic. It’s just that V8 has two compilers now, one that has an IR of its own (turbofan), and one that doesn’t (sparkplug). Lack of IR is one of the things (maybe the most important thing) that makes Sparkplug special. If you define IR in a way that means that bytecode is an IR then I don’t know how to easily articulate what it is that makes Sparkplug special.


In the compiler community, people usually use IR to mean something lower than bytecode, usually used as part of the compilation process itself rather than used for storage or when not actively compiling. IR is usually not intended for direct execution.

(LLVM IR is an unusual exception therefore. Please don't 'well actually' me I'm trying to give a useful practical explanation.)


I thought IR was just the internal processing format used by a compiler, eg: MIR or LLVM IR, and so is at a level usually “higher” on the abstraction stack than assembly/bytecode, which can be seen as the primary outputs of a compiler


IR is usually represented as a sort of linked list of instructions where you can easily move around instructions, add/remove them, and replace higher level instructions with lower level instructions as you go through compiler passes. At the end, the IR is a linked list of machine instructions, which gets written out to a code buffer as the last step.

Bytecode is usually a buffer of high level instructions that is compact and fast to read (for the machine). Good for interpreting and holding the semantics of a program in a format that is more efficient than the source code, but it is usually not a good format for running compiler passes.
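
A hand-wavy sketch of that structural difference (names invented, nothing engine-specific):

    #include <cstdint>
    #include <vector>

    // IR: a doubly linked list of instructions with explicit use-def edges,
    // so compiler passes can cheaply insert, delete, and replace instructions.
    struct IRInstr {
      enum Op { kLoad, kAdd, kStore } op;
      IRInstr* inputs[2];  // which earlier instructions produce the operands
      IRInstr* prev;
      IRInstr* next;
    };

    // Bytecode: a dense byte buffer, cheap to store and to decode in order,
    // but rewriting it in place (say, expanding one op into three) would mean
    // re-encoding and re-offsetting everything after it.
    using BytecodeArray = std::vector<uint8_t>;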


We'd say it is 'lower' not 'higher' as it usually decomposes single higher-level compound operations into multiple simpler operations, and it makes things that were implicit explicit so they can be reasoned about. Usually more verbose therefore. Converting into IR is often called 'lowering'.


Even if you do consider bytecode to be an intermediate representation, the statement "Sparkplug doesn’t generate any intermediate representation (IR)" is true. It goes from one existing representation to machine code without creating a new one in the process.
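
Very roughly, and with every name made up (a sketch of the idea, not anything from V8), the shape of such a single-pass compiler is just:

    #include <cstdint>
    #include <vector>

    enum class Bytecode : uint8_t { kLdaConstant, kAdd, kReturn };

    struct Assembler {  // stand-in for a real macro-assembler
      std::vector<uint8_t> code;
      // Placeholder bodies: a real assembler would emit actual machine instructions.
      void EmitLoadConstant(int value) { (void)value; /* emit a "load immediate" */ }
      void EmitAdd()                   { /* emit an "add" */ }
      void EmitReturn()                { /* emit a "ret" */ }
    };

    // One forward walk over the existing bytecode, emitting machine code for
    // each instruction as it is visited; no graph or instruction list is built.
    void CompileBaseline(const std::vector<Bytecode>& bytecode, Assembler& masm) {
      for (Bytecode b : bytecode) {
        switch (b) {
          case Bytecode::kLdaConstant: masm.EmitLoadConstant(/* operand elided */ 0); break;
          case Bytecode::kAdd:         masm.EmitAdd();                                break;
          case Bytecode::kReturn:      masm.EmitReturn();                             break;
        }
      }
    }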


Right, the JavaScript compiler is Ignition + Sparkplug, with the bytecode as IR. Sparkplug doesn't parse or optimize; it relies on Ignition for that, so it's not really a full compiler, it's just the backend of a compiler.

https://en.wikipedia.org/wiki/Compiler#Back_end


Well except ignition is first and foremost an interpreter, so sparkplug is a compiler backend welded onto an interpreter frontend.


Ignition optionally interprets but always parses javascript, optimizes and compiles bytecode. So IMO Ignition is primarily a compiler frontend that has an optional interpreter back end built in.

I could see them splitting Ignition into a bytecode compiler and separate interpreter and give them separate names.


The V8 parser predates Ignition, so most V8 developers will consider it a separate subsystem and Ignition really to be just the bytecode format and interpreter for that bytecode. The parser was used to feed fullcodegen back in the day, and TurboFan also used to have an AST-to-TurboFan phase. Nowadays, I believe the only JavaScript path into TurboFan is actually through Ignition bytecode. TurboFan also has a frontend for Wasm bytecode (or viewed conversely, the Wasm engine has a TurboFan IR generator, since this lives in the wasm directory and is maintained by a subteam).


So does the parser generate bytecode, or does Ignition? Is Ignition just an interpreter, or is it a bytecode generator too? Does Ignition take bytecode, or is it fed the AST from the parser to generate bytecode?


"Ignition" was the codename for the entire interpreter project, including both the Ignition compiler and the Ignition interpreter. So the answer to your question is "yes".


The parser produces an AST, which is an in-memory data structure that is a tree. Ignition (in src/interpreter) has a visitor that walks over the AST and generates bytecode.
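
Conceptually something like this toy sketch (not Ignition's actual classes, bytecodes, or register machine):

    #include <cstdint>
    #include <memory>
    #include <vector>

    struct AstNode {
      enum Kind { kLiteral, kAdd } kind;
      int value = 0;                      // used when kind == kLiteral
      std::unique_ptr<AstNode> lhs, rhs;  // used when kind == kAdd
    };

    enum class Bytecode : uint8_t { kPushConst, kAdd };

    // Walks the AST once and appends bytecodes to a flat buffer.
    class BytecodeGenerator {
     public:
      void Visit(const AstNode& node) {
        switch (node.kind) {
          case AstNode::kLiteral:
            Emit(Bytecode::kPushConst, node.value);
            break;
          case AstNode::kAdd:
            Visit(*node.lhs);  // emit code for the left operand first
            Visit(*node.rhs);  // then the right operand
            Emit(Bytecode::kAdd, 0);
            break;
        }
      }
      std::vector<uint8_t> bytecode;

     private:
      void Emit(Bytecode op, int operand) {
        bytecode.push_back(static_cast<uint8_t>(op));
        bytecode.push_back(static_cast<uint8_t>(operand));
      }
    };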


I guess the difference is that the bytecode is executed by the interpreter and an intermediate representation typically isn’t executable?


I think the most useful definition is that a program representation is an intermediate representation if its primary raison d'etre is as a starting point for building other program representations. (We assume source code has dual purposes of being edited/reviewed and also as the starting point for later program representations, so human-edited source code wouldn't be considered intermediate. On the other hand, JavaScript used as an implementation detail for some high-level language compiled to WASM would be considered an intermediate representation.)

In this case, if they didn't have an interpreter, they'd probably use an abstract syntax tree or control flow graph as their common source of truth for their two compilers. So, the primary reason the bytecode exists is to be interpreted by the interpreter and therefore isn't intermediate.


Well, some processors can execute Java bytecode directly; most of the time it's jitted, though originally it was just interpreted.


Does anyone actually do that other than Transmeta back at the turn of the millennium?


Not really anymore, but many phones had Jazelle-capable processors in the mid-2000s:

https://en.wikipedia.org/wiki/Java_processor


It seems like Sparkplug per se isn't generating the IR. Instead it's the interpreter doing that.


Yes but Sparkplug doesn't generate any bytecode.


It's disappointing that they don't include any performance results from Sparkplug itself, only from V8 using Sparkplug some unspecified and unknowable percentage of the time. While it's true that Octane doesn't provide optimal performance targets for engineering V8, it does provide a legible approximation of the relative speeds of different operations.

From what's in the post, we can't tell if the interpreter is 128× slower than optimized code while Sparkplug code is only 32× slower, or the interpreter is 32× slower and Sparkplug code is 16× slower, or the interpreter is 32× slower and Sparkplug code is 4× slower (which is my guess as to the reality of the situation.) I'm more interested in this is-it-4×-or-32× question than the possibility that microbenchmarks might misrepresent a 32× slowdown as being a 48× slowdown or a 24× slowdown.

https://docs.google.com/document/d/13c-xXmFOMcpUQNqo66XWQt3u... does offer more performance numbers, but they're for "an early prototype", and more importantly they don't break out the compilation speed vs. the execution speed.

The description in the post makes it sound like Sparkplug output is pretty similar to direct-threaded Forth code, just a sequence of subroutine calls to other methods and implementations of primitive operations, interspersed with the occasional open-coded primitive (=== maybe) and control flow. And typically that means an overhead of around 4×, although maybe it's more with JS's aggressive implicit coercions and whatnot. It sure would have been nice to see some disassembly listings, too.


Author here: we do have various numbers collected here, with varying levels of "I'm sure enough about these numbers to share them in a public blog post".

Compile time is on roughly the same order of magnitude as Ignition compilation (just AST to bytecode, so excluding parsing), and roughly two to three orders of magnitude faster than TurboFan. The relative performance to the interpreter, as the sibling comments point out, varies _wildly_ by workload, but around 4x is probably a decent approximation for something not entirely dominated by property loads.


Thank you, that's great! How fast are property loads?


Those vary widely too, depending on what (and more importantly, how many) object shapes they see. They'll be roughly the same cost in Ignition and Sparkplug though, since they effectively call the same handlers for those loads (and they're way faster in TurboFan, since that one can then generate optimised code tailored to that type feedback).


Oh, so code that's entirely dominated by property loads will be much slower in Sparkplug than in TurboFan, by a factor of more than that 4×, perhaps closer to the 32× I was SWAGging for Ignition's overall relative slowdown? I guess at this point I ought to be running microbenchmarks instead of asking you to do it for me... ;)


Sorry, the 4x was my estimate of Ignition vs. Sparkplug (I guess it's closer to Iain's estimate of 2-4x). Sparkplug vs. TurboFan is crazily workload-dependent. For microbenchmark-y tight loops, integer arithmetic, and clean single-shape ("monomorphic") property loads, TurboFan is orders of magnitude faster. For small functions that just have non-inlineable function calls and heavily polymorphic property loads (with lots of shapes, so you can't skip the property lookup), like you see in a lot of web code, they're close to on-par (because there's nothing TF can optimise away). If you put a gun to my head, I'd say it's on "average" maybe 10x? Very much a Fermi approximation though.


Oh, thanks! That's surprising! And precisely the kind of question I was interested in.


As an educated guess: Sparkplug is very similar in design to SpiderMonkey's baseline compiler. I ran a quick test a few weeks ago to see what the performance delta was between SM's baseline compiler and baseline interpreter, and going from the interpreter to the compiler was about a 2x speedup. Based on that, I'd expect that you're looking at a ~2-4x speedup from the interpreter to Sparkplug.

Edited to add: Which is a very good result! Congrats to the V8 team for landing this!


I doubt guesstimating based on interpreter loop overhead is going to get you that far, because Sparkplug looks like it has ICs. I would expect their impact to dominate on many, many workloads.


ICs? Integrated circuits?


Sorry, Inline Caches.

From what I understand, there are basically 3 things that provide the bulk of performance gains in compilers: inline caches, register allocation, and inlining. The latter two don't really apply to interpreters or baseline compilers.

Compilers do many optimizations beyond those, but generally only for incremental improvements.
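
For the inline-cache part, a back-of-the-envelope sketch of a monomorphic property-load cache (all names invented; real engines bake the shape check into generated code and handle many more cases):

    #include <cstdint>
    #include <string>

    struct Shape { /* "hidden class": says which property lives at which offset */ };

    struct JSObject {
      const Shape* shape;
      int64_t slots[8];  // property values stored at shape-determined offsets
    };

    struct LoadIC {  // one cache per "obj.x" site in the source
      const Shape* cached_shape = nullptr;
      int cached_offset = 0;

      int64_t Load(const JSObject& obj, const std::string& name) {
        if (obj.shape == cached_shape) {
          return obj.slots[cached_offset];          // fast path: one compare + one load
        }
        int offset = SlowLookup(*obj.shape, name);  // full lookup: dictionaries, prototype chain, ...
        cached_shape = obj.shape;                   // remember this shape for next time
        cached_offset = offset;
        return obj.slots[offset];
      }

      // Stand-in for the real, much more involved lookup.
      static int SlowLookup(const Shape&, const std::string&) { return 0; }
    };

After the first hit, a monomorphic load is roughly one compare and one memory load, which is why ICs pay off even without an optimising tier.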


Oh, sorry, that should have been obvious. Are you suggesting that the performance improvement from inline caches (whether PIC or by-the-book monomorphic Deutsch & Schiffman) wouldn't show up in microbenchmarks, or would be misleadingly effective in microbenchmarks? IIRC Octane has a benchmark or two specifically for this...

I think constant folding, specialization, and loop hoisting (especially of bounds checks) commonly produce more than merely incremental improvements. If you don't have some kind of inlining you don't get those (usually) but inlining by itself is usually only a minor improvement.


You're right, inlining by itself doesn't do much. I think inlining + register allocation is the big one, but you might be right about the other optimizations that are enabled (or at least massively boosted) by inlining.

ICs steal a lot of specialization's thunder. (Alternatively, since the point of ICs is specialization, you could say it's the other way around!)


Well, just as inlining is necessary for constant propagation or loop hoisting in a lot of cases, specialization is necessary for inlining in a lot of cases. Despite the nominative similarity, inline caches don't enable inlining in the same way specialization does, because they just enact the transfer of control more quickly; the code they transfer control to is just the standard generic version of the code, not an inline copy that can be optimized further according to the arguments at that callsite.


"I think the stack grows downward" was a nice touch, lol.


I love how it doesn't go into any details and just leaves people to wonder why there is a button casually challenging/catering to their assumptions haha.


Just wait until you look at the source code which decides which way round to show you the stacks at first...


Oh, I already enjoy

    #define __ basm_.


Wonder why the M1 is so much worse on certain benchmarks. I wish they’d gone into detail there as it’s not uniform across their tests.


Author here: this is indeed just relative score change, the M1's absolute values are of course impressively high. My personal, unofficial suspicion is that this has something to do with the size of the M1's reorder buffer being so large that it can just predict and speculate straight through all the interpreter overheads; but that's just a guess.


But why is it that one of the benchmarks is so negative then? What is it about that test?


I'm guessing that benchmark is so short-running that the added compile overhead never pays for itself. It's unlikely to be a huge problem on such a short-running benchmark, though. But it's fair to show the weak points as well as the strengths of Sparkplug.


Those benchmarks measure performance increase rather than performance overall. M1 isn't necessarily doing "worse" as much as it just didn't see as much of an improvement as the others.


But there’s one that went way negative, on a relative basis.


Now, the compiler people at Google and other contributors to Chromium could be much smarter about when CSS needs to be recalculated. The behaviour really seems very arbitrary at times, and we have found no high-level overview of how to write a web app so as to avoid expensive recalculations. (See https://orgpad.com/list for examples.)

Btw, the world of browsers is currently totally crazy. You have people dropping down to HTML canvas and doing their own rendering because manipulating the DOM is so slow for anything serious. There are bugs in CSS and SVG all over the place, and you cannot really find in-depth documentation on how to program the web for performance in this space. (If you have great, very in-depth documentation, please link it!) The situation is so bad that Google teams would rather switch to their own rendering in HTML canvas instead of talking to their peers and letting them fix DOM manipulation for everybody. Actually, Chrome/Chromium feels progressively worse with each release. There are always new regressions, and there don't seem to be any real improvements in efficiency. The result is that we are busy working around idiotic APIs and bugs like 90% of the time. Yeah, OrgPad is not a trivial app, but we have something like 1000x better hardware than in the 90s, and some of the interactions feel much slower than the games we had back then.

Does anybody else have a similar experience or some actually helpful resources we possibly overlooked?


I'm writing a book on web browser internals with Chris Harrelson, who heads the Chrome rendering team:

https://browser.engineering/

It's in progress but we cover the core architecture of a modern browser in some depth. It's funny you mention CSS recomputation... I have a student working on CSS and layout recomputations (with the Google Chrome team, who are fantastic) right now. It's a challenging and exciting area.


I like how the "improvements" for Reddit are negative, lol


The Reddit redesign is far and away the most sluggish, unresponsive major site I've ever used. It brings my fancy $3,000 laptop to its knees.

The sad thing is, the front end stack they use (React/Redux) is great. But it's an abysmal example of the tech. I turned on re-renders in the React devtools, and this is what it looks like: https://i.imgur.com/Sm1U24M.gifv

Makes me sad.


Oh no that is bad.


I'm usually very pro-SPA, but Reddit messed up theirs so bad, it's plain unusable.

Most interactions using a 2018 iPad Pro w/ Safari take seconds to execute. This device can handle anything I throw at it except reddit.com.

The new web app is a plain regression.


TBF, the entire concept of an SPA for Reddit is insane to begin with. There are no complex split-second interactions or anything. It's a case study in what does not benefit in any way from SPAs.

I have pretty much the same thoughts about discourse, which I loathe.


I imagine the ads benefit from it.


I feel like they're pretty strongly incentivised to not improve it. They want to push people to the app, and they have enough traction and community lock-in that they're not really worried about losing traffic


This is indeed the correct reason. The app is much more usable.

That’s why I switched to teddit.net. It’s much faster, resembles the old Reddit layout and supports dark mode.


Author here: ah well, you win some, you lose some. This is just v1 that's shipping with M91, we've already made improvements since then and there's plenty more low hanging fruit to pick, heuristics to tweak, etc.


This is an impressive result! Congratulations are in order. :-)


Am I the only one struggling to read the excalidraw'ings?


Yeah, sorry, dark mode support needed CSS changes, so if you have cached CSS then it looks wrong. A hard refresh should fix it.


Seems like the page background and text invert for OS dark mode, but the drawings don't.


Thanks, changing my OS to light mode made the diagrams readable.


This is off topic but:

After clicking the link to the article, I hit the back button back to HN, and a Chrome popup displayed "Click Install V8 to get back to this site easily" or something of that nature. It was definitely a Chrome popup triggered by leaving the site, since the popup was on top of the bookmarks toolbar.

Anyone notice that happening as of the latest chrome release? I've seen this a few times, and it always catches me off guard.


This sounds like it should be so fast I'm surprised they even kept the interpreter. Is there some delay setting JIT up or something?


Memory consumption and cold code. Machine code is larger, and it takes time to generate that only pays off if you execute the code many times.


This feels like a repeat of the C1 and C2 compiler tiers in the Java HotSpot virtual machine. https://openjdk.java.net/groups/hotspot/docs/HotSpotGlossary...


It isn't. C1 and C2 are two competing optimizing compilers in HotSpot, originally developed by different teams. There is a big difference in compilation time vs code quality between those, though that narrowed as C1 evolved from almost a template compiler to an SSA-CFG-based compiler. C2 was and is a graph-based, heavily optimizing compiler with graph-coloring register allocation. It's only been in the past 5-10 years or so that both compilers have been used in concert for tiering. Prior to that, they were selected by the "-client" or "-server" options, once at startup.

Sparkplug is designed specifically as a baseline compiler that does not have an IR, for maximum compile speed. While the idea of two compilers with different compile time/speed tradeoffs is similar, the constants involved are pretty different.

TurboFan is similar to C2, but Sparkplug is not similar to C1. They don't compete, but cooperate.


If you happen to be somewhat familiar with GraalVM, could you please give us a very brief comparison to it as well? I am very interested in the topic, but it seems very hard to get into it.


Probably better that a Graal expert chimes in, but AFAIK GraalVM has a single optimizing compiler, the Graal compiler. I think it reuses HotSpot's bytecode interpreter. I am not sure if it can operate with either C1 or C2 also enabled. With Truffle, you can write AST-walking interpreters that get automagically turned into optimizing compilers via partial evaluation. It takes some care and tuning to make work well.


I wish there was a portable version of V8 similar to OpenJDK Zero which would allow Chrome to run on more platforms.

It's really a pity how Qt applications such as kmail became less portable when the underlying web framework was switched from WebKit to Chromium/Blink.


What tool are they using to generate those "hand drawn" illustrations? I like them.



This is correct, manual calls to roughjs (which is the same library as used by excalidraw afaik)


A sibling poster suggested it is https://excalidraw.com/


The blog illustrations don't display properly in 'dark mode' site theme.


Neat, I would love to understand how this would affect Node.js performance. Tooling like esbuild[0] exists for the very reason that compiled languages are significantly faster, so I wonder how viable an alternative (in terms of performance) it would be to build an esbuild-like tool on `node --sparkplug`.

Or take the existing Webpack CLI which is written in Node, what kind of improvements can we expect there, if any?

[0] https://esbuild.github.io/faq/#why-is-esbuild-fast


It's not really a good blanket statement that compiled languages are significantly faster. There are a lot of problems getting tools written in JavaScript to run fast, and the fact that JavaScript is "not compiled" is a bit farther down the list. There are other pressing issues, like the fact that code written in Go, like esbuild, is free to run multiple threads and freely share data structures between threads.

The distinction between compiled and interpreted languages is also a bit fuzzy to begin with.


Yes, as the link I shared mentions, there's concurrency via actual threads, and proper memory sharing—by no means implying compilation is all there is, but it does seem at the very least to be a contributing factor, and understanding the change to bottom-line performance would still be welcome.


This would mostly benefit performance the first time some code is run, before v8 has a chance to gather runtime data and optimize it. Since all JS execution goes through this phase, this should make everything a little faster, but it will probably benefit short-lived processes (e.g, a website script) more than something like a bundler.


So they went from two tiers to three tiers, similar to what WebKit's JSC is doing, which I think is four tiers.

I wonder how it will compare to JSC.


Technically four tiers: Ignition, Sparkplug, TurboProp, TurboFan. The design document goes into more detail (https://docs.google.com/document/d/13c-xXmFOMcpUQNqo66XWQt3u... ), as does the "The Case For Four Tiers" document (mostly Google-internal ATM: https://docs.google.com/document/d/1VNUPPb2JQh0yg8SR2Y2SAk7h... )


Yeah, it's kind of funny. A couple years back, V8 had a baseline interpreter and optimizing compiler, then added a baseline compiler. Spidermonkey had a baseline compiler and optimizing compiler, then added a baseline interpreter.

My guess is that all of the main JS engines will eventually evolve into crabs.


> A couple years back, V8 had a baseline interpreter and optimizing compiler, then added a baseline compiler.

Initially v8 only had a baseline compiler and an optimising jit (in fact that was very much the originality of v8 at release).

A few years back they switched to an interpreter + optimising jit (ignition / turbofan), then they added the mid-tier turboprop which is turbofan with the most expensive optimisations dropped.


I see you, excalidraw!


Let us all take a moment to acknowledge the author's choice in deliberately titling the post "non-optimizing"


* Figure out which bits of javascript consume the most CPU time globally.

* Rewrite those bits of javascript in C++, and patch them in whenever they are seen byte-for-byte in the wild.

* Have both a fuzzer and a webcrawler verify the two implementations never differ.

Obviously it's vital the C++ and javascript implementations behave identically - so probably best to have a bunch of checks that the input is in the exact form expected, and fall back to the javascript version if the C++ version can't be guaranteed to behave the same.

The C++ versions of, say, the globally most popular 25 MB of javascript source code could be shipped with Chrome, or as a separate download-on-first-use component.


From 2006 to 2010, I worked on the Google indexing system, primarily writing C++ code that executed every single bit of JavaScript in every HTML page Googlebot could find. At any time, there were roughly 1 million threads on roughly 1,000 machines executing some C++ code analyzing web pages, though at any moment, the majority of these threads were waiting on BigTable I/O.

Saving a bit of JavaScript execution time on the most costly JS snippets would have translated into real money savings in terms of both capital and operating expenses.

Especially since there were tons of pages where the most popular JS frameworks were being loaded and executed once, I thought about dumping SpiderMonkey bytecode (or toward the end, v8's first-pass x86 machine code) to disk/BigTable, or perhaps writing C++ rewrites of several of the most common JS frameworks.

I basically wrote a high-performance, low-fidelity browser simulator that simultaneously pretended to be Firefox and IE (so that it would mostly work on IE-only pages and also mostly work on FF-only pages). There was some appeal to the idea of tweaking the JS frameworks to fit my simulator rather than tweaking the simulator for all these little corner cases. (I remember having to improve simulation fidelity for Saab's unified landing page for several Arab markets.)

The thing is, a high-fidelity rewrite of a small number of JavaScript frameworks into C++ is probably roughly equivalent to writing the first version of v8, and you need to throw it all away as the frameworks upgrade versions.

There's still some merit to caching the bytecode for the most popular JS frameworks, and I wouldn't be surprised if Google currently does this for executing JS in the indexing system.



