This is really cool. I understand why all the frontend tooling is written in JS, but in my experience it really is a pain to deal with, especially when it comes to building and deploying. Since they're essentially just scripts, you also have to install the right versions of node and npm and use npm to install the dependencies (and watch out for native extensions!). Whenever I try it, it's fairly tedious to get it all right, since I'm not a pure frontend dev and things have always changed since the last time I did it. But the promise of this is just dropping in a binary and having it all work!
I actually like JS as a programming language for the web, I just don't like scripting languages in general for command line tools.
Funny, I actually view it as the opposite. swc is distributed via npm as a native extension, so you need to have the right rust version installed in order for it to work. I tried `yarn add --dev swc` like in the instructions, but it failed. I already had rust installed but I guess it wasn't the right rust. I'd be hesitant to tell my team to use it because of the extra dev environment setup.
JS is a universal binary format that tends to just work from my experience. You do occasionally run into issues if you're on an old node version, but other than that it has been pretty reliable for me. But it may just be a difference in familiarity.
This is one area where wasm may shine; you can distribute it pre-compiled, nobody using JS needs a rust toolchain installed. This is something the webpack team has indicated interest in, for example.
Of course, wasm will need threading first for this project.
I do it by downloading it and building it myself, or by trusting a large bureaucracy running the project to audit the source and produce signed packages on behalf of the third-party developers. Debian style.
So, is npm getting a wasm build infrastructure to support this?
That makes auditing that the wasm matches the code pretty hard. (It's also a problem with adding minified JS). Auditing just the code is hard enough as is. :(
Typically Rust programs are distributed in pre-compiled form so you don't have to have a Rust compiler installed. I'm not sure why this project ships source code currently, but it could probably ship binaries too.
You mean typically “available” in pre-compiled form, right? Because Rust doesn’t have a binary distribution system per se, it is at the mercy of package managers to build and distribute binaries, or maintainers need to upload build artifacts to, say, GitHub releases (and installation process would be nonstandard as a result).
I get most of my Rust binaries through cargo on Ubuntu, except in a few cases when I really want rg (still not available in Ubuntu 18.04) but don’t want to waste 25% of my storage on the Rust toolchain.
> What do you think is the essential difference between a 'script' and a 'program'?
I never said "program". I contrasted a script vs. a "binary", by which I meant a compiled executable. What I meant to highlight was the difference from the JS toolchain, which is delivered as an artifact that needs an installed JS interpreter to evaluate. I've had issues matching my node/npm version to what the script expects, as well as managing different expectations by different scripts. In contrast, executables which are compiled for your architecture in advance and can be run on their own without further dependencies are much simpler to use: just download and run. Yes, there's static vs. dynamic linking, but I mean to highlight how nice the (typical) Go workflow is, which tends to statically include everything, more so than Rust, but Rust isn't far behind.
This is a difficult question (and I think you're right in a technical sense in your implication that there isn't a true difference). Colloquially, I think of "scripts" as programs distributed as ASCII or UTF-mumble (or other widely-used string formats) source code that have to be evaluated by different programs installed on your computer, e.g. most JS, Ruby, Python programs.
It's a murky distinction, because it's very common to preprocess programs before distribution to the extent that the original code is hardly discernible; for example, using Uglify or transpiling ES2017 to ES3. And you can package programs in both Python and Node such that they don't actually depend on your system having a Node or Python interpreter installed. And I doubt people would refer to C programs as scripts, even if they're distributed in a BSD ports- or Gentoo-style packaging system that defaults to distributing source code rather than precompiled binaries.
But I think a decent rule of thumb is that if you can run the original source code file(s) directly by adding the right shebang to the top, it's a script.
one big difference is that compiled binaries start much faster, but on a modern computer we are talking milliseconds. you have to take all trade-offs into account though: people complain that apps made in JS need a runtime, and that downloading that runtime is a waste of space, but they forget that a statically linked binary can also get rather big.
Scripts are a subset of programs, the primary characteristic being that they rely on another program to interpret the source at run time. Scripts also tend to be a problematic subset, since libraries written in scripting languages tend to have poorly defined interfaces.
the library implements how to do the lower level -- which things you can do, and takes care of doing it
and the script deals with which things you want to do and when
some examples:
I made some opencv bindings for JS; the library did the heavy image processing algorithms, and the JS script chose which ones to use with what parameters
for some video game or another, there was a library for what an NPC can do, and scripts for how they act within the world, including things that are more like movie scripts detailing who says what and where they walk
In this sense, JavaScript has advanced far beyond a "scripting language". TypeScript and Flow both allow plenty of formal verification with a type system that's quite a bit more expressive than many popular compiled languages have. It is indeed distributed as JS source code, but "simple interpretation" is an unfair description of what modern JS engines do. The JS is compiled to machine code, then introspected at runtime to look for optimization opportunities based on the real-world way that each function is used.
In my case, it's still running as JS, but rearchitected to solve the more straightforward problem where you don't need to compile to ES5. I've thought about rewriting it in Rust to see how much faster it gets, though currently I'm trying to get it running in WebAssembly via AssemblyScript ( https://github.com/AssemblyScript/assemblyscript ), which has been promising.
I'm curious about the Babel performance comparison. Benchmarking JS is tricky because, from my observations, the performance improves by a factor of ~20 if you give it 5-10 seconds for the JIT to fully optimize everything. https://github.com/alangpierce/sucrase/issues/216 . Rust and WebAssembly both have a significant advantage in that sense when running on small datasets.
I just did a little benchmarking of my own. Here's how the three perform on 100 compilations of an ES5 version of Parser.js from the webpack source code:
Sucrase had the jsx and imports transforms enabled, swc had all transforms enabled (they're not configurable), and Babel had @babel/preset-env enabled. So I'm a little skeptical of swc's 20x speedup claim, but it certainly depends on a lot of details.
Hey man, I just started using sucrase in production at work. Great work with it; just the right features to let me kill Babel, simplify build docs, and still get to use some modern niceties (I use it alongside gulp/rollup)
Feedback/questions always welcome, of course! Feel free to file an issue or chat in gitter. Fortunately it's a pretty well-scoped project that's mostly done at this point.
Yep, certainly agreed. In my wasm experiments so far, wasm is faster when compiling small codebases and JS is faster when compiling very large codebases, so it may even be best to ship both and either automatically decide which one to use or make it a preference. Hopefully the wasm build can be better-optimized to always be faster than JS, though.
The nice thing about implementing a JS-JS compiler is that there is already a huge number of tests out there, like these, that you can use to TDD your compiler's edge-cases.
SWC's CONTRIBUTING mentions this:
> Include tests that cover all non-trivial code. The existing tests in test/ provide templates on how to test swc's behavior in a sandbox-environment. The internal crate testing provides a vast amount of helpers to minimize boilerplate. See [testing/lib.rs] for an introduction to writing tests.
But I cannot find a `/test` directory, and it does not appear in the .gitignore either.
EDIT: Ah, the tests are in `/tests.rs` with JS embedded inside rust code. This makes it a bit harder to compare directly with babel's suite, but claims to be lifted from test262, which babel also based many of their tests on (albeit recategorized). hzoo's comment here has details: https://news.ycombinator.com/item?id=18746905
Seems like transcompilation is an area where you really DON'T want to be overfocusing on performance. Correctness is king here. I mean, compiler bugs are hard. Really hard. Deep voodoo hard. No one who's ever dealt with a significant code generation bug on a large project ever wants to repeat that experience.
That's not to say that babel isn't too slow or that this project isn't correct, just that it would scare me to try to work with it. Does babel have a regression suite? Does this pass it?
JS packing performance (including compilation) is probably the slowest part of the iteration cycle for frontend devs today. Combined with Babel’s and ECMA’s extensive test suites, I think something like this could be wildly successful.
Can you elaborate more on this? I've worked with very large codebases in React and Angular and have not seen this issue. I'm guessing it's either because all of my applications were able to hot-reload or we're not thinking of the same thing.
I don’t know if we’re doing something wrong but initial startup time on sentry’s webpack setup is something like 30 seconds and iterative changes around 5 seconds. Still beats our rust or python iteration times though :)
a lot of codebases don't support hot reload, but even the ones that do often require multi-second incremental compile times during hot reload if they're big enough [1].
[1] current project is very slow w/ 250 kloc (2.5M loc w/ dependencies). even seen it be slow on my last project, which was only 40 kloc.
Just using tsc to compile a hello world with 7 tokens takes over a second in my experience. Sure, most of that is initialization, but compare that to the C and C++ world, where GCC spends tens of milliseconds at most compiling hello world.
Hot reloading _helps_, but it's definitely an issue that javascript tooling spends seconds on tasks other languages' tooling spends tens or hundreds of milliseconds on. (The issue isn't just limited to tsc either; just starting the build system can take a decent amount of time with webpack, while make or cmake starts in milliseconds.)
If you can use test262's test suite you should be good, it looks like they are referring to Babel's tests for some of the features? example https://github.com/swc-project/swc/pull/86, I just checked really quick though.
In terms of perf, I'm sure Rust will be much faster than JS, and this project's main goal is to be a faster version of Babel. I would suggest the perf test should be different, though, because it's testing code that I wouldn't consider representative of what Babel actually runs on. Example with the transform test: https://github.com/swc-project/swc/blob/222bdc191fcab7714319... it's a very small file where minimal transforms are even necessary at all. And the parser test is all minified JS files, none of which are written in ES6+: https://github.com/swc-project/swc/blob/master/ecmascript/pa.... Currently it's not really testing ES6+, just how, at a baseline, it's much faster to process an ES5 file, so it's possible that it might be even faster.
edit: I will note that Babel does a lot more than what this project is doing currently (plugins, other proposals, ts/flow, sourcemaps, etc) but this is more about what could be, so if people are willing to put more into this effort in the long run (with funding/support/people) it could be super useful! I'm curious about how that will work when our project has such a struggle with contributors and Babel itself is written in JS. (I work on Babel :D)
I'm sorry, but I'm kind of tired of the idea that some problems are "deep voodoo hard". It casts them in an inaccessible light, creating a culture that new folks internalize, guiding them away from ever even trying.
Unless we're talking about "understanding neural network internals", which is a big open problem, there just really aren't "black magic" problems inherent to particular subjects. It's almost always poor spec definition or poor code design, if there are any hand-wavey bits to be found.
If anything, compiler design (along with other subjects like text editor development) is a very deeply studied, very well understood, very well documented field. It should be easier than other fields less central to computer science and software development, which haven't been so thoroughly studied.
Many people working on compilers will have a PhD in the subject. It's an area where it definitely takes many years to get up to speed. As you said yourself it's deeply studied, and studying all that work takes ages.
Compilers are also definitely not well documented. I think they're rather notoriously undocumented with much of our knowledge of the subject passed on orally and existing only in the knowledge of people working in the field.
For example a couple of years ago I went looking for details of a very common compiler technique, called safepoints, used in many major industrial compilers today, and found that there was essentially no written information on them at all. That's pretty extraordinary.
1. Safepoints are kind of off-topic, because they're part of the interplay between a complex optimizing compiler and the garbage collector/runtime. Babel and its ilk are source-source compilers targeting a language that doesn't have manual memory management. To my mind, that makes things much simpler.
2. How long ago were you searching for safepoint info? I've collected several articles and talks relating to safepoints. I don't know of anything that discusses safepoints in a way that you could implement them from scratch just based on the discussion, but they're tied to a complex part of the runtime, so any example/tutorial code would probably be parochial to a particular VM.
> I don't know of anything that discusses safepoints in a way that you could implement them from scratch just based on the discussion
The best discussion of them at the moment is this one - and as you'll notice they're essentially having to do analysis of what is already out there because it just wasn't written down anywhere at the time. They don't have much of a 'related work' section or a bibliography because there wasn't much to refer to.
And implementing them from scratch is what I wanted to know how to do. For a lot of compiler topics the only way to learn is to sit down with an existing expert and physically ask them. Compared to other CS fields that’s really bad.
> Many people working on compilers will have a PhD in the subject.
There is some truth to this, but as someone with a PhD in compilers, let me say that this area is very accessible to people without PhDs. The first time I met @dannybee, I asked him where he went to grad school. Turns out he went to law school. Didn't stop him writing the GCC alias analysis framework though.
> Already in the late 90s, there were several good books about compilers.
There are tons of introductory texts, a few intermediate texts, a couple of very narrow advanced texts, and then it just runs out. A great deal of our knowledge is undocumented.
He didn't say compilers were voodoo. Diagnosing subtle bugs in a complex code generator transformer is voodoo. If you haven't written the compiler, but are just an end user, I'd say that's a fair assessment. It's a lot of work to identify what went wrong.
The slower the build, the more the developer's focus fades, whether they want it to or not. I'd say performance of build tools is at least as important as their correctness :) a compiler bug can be fixed, but developers' lost focus not so much.
Not when compared to mature software it isn't, no. The comparison isn't to "Javascript" as an implementation language for a transcompiler, it's to "Babel" as an actual transcompiler in wildly active use to generate production code everywhere in the world.
This is true of every upstart project, though. LLVM was a risky project at one point. People with high risk tolerance will pave the way, and the rest of us will join in if/when the project seems mature.
What part of Rust prevents the writing of bugs? I understand it helps you avoid unsafe memory operations, but to say it gives you “bug free” code seems like a falsehood.
Rust doesn’t implicitly create bug free code. I’ve created lots of bugs in Rust.
But there are classes of bugs that the language eliminates: data races across threads (less relevant in JS), unhandled error conditions, and, with no GC, lower and more consistent memory usage. The type system allows you to build complex state machines easily and correctly and to express states that should never occur as compilation failures, and the compiler protects against concurrent updates in iterators and other situations.
The list goes on, but these are the things that make it such a powerful language and allow me to focus on logic and not runtime bugs. Oh, and it’s really fast which is a nice bonus, generally as fast as C.
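To make the "invalid states as compilation failures" point concrete, here's a minimal sketch of the typestate pattern. The `Closed`/`Open` connection types are hypothetical, invented for illustration; the idea is that an operation like `send` simply doesn't exist on a state where it would be a bug:

```rust
// Sketch: a toy typestate pattern making "send on a closed connection"
// a compile error rather than a runtime bug.
struct Closed;
struct Open {
    buffer: Vec<u8>,
}

impl Closed {
    // Consuming `self` means the old `Closed` value can't be reused.
    fn open(self) -> Open {
        Open { buffer: Vec::new() }
    }
}

impl Open {
    // `send` only exists on `Open`; `Closed` has no such method.
    fn send(&mut self, byte: u8) {
        self.buffer.push(byte);
    }
    fn close(self) -> (Closed, usize) {
        (Closed, self.buffer.len())
    }
}

fn demo() -> usize {
    let mut conn = Closed.open();
    conn.send(1);
    conn.send(2);
    let (conn, sent) = conn.close();
    // conn.send(3); // would not compile: no `send` on `Closed`
    let _ = conn;
    sent
}

fn main() {
    assert_eq!(demo(), 2);
}
```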
To add: safe Rust prevents all data races, not just thread based ones (this is mostly relevant around iterators and other concurrency-without-parallelism structures, but is relevant more generally as well)
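A small sketch of that non-threaded case: the borrow checker rejects mutating a collection while it's being iterated, the classic iterator-invalidation bug that many languages only catch (if at all) at runtime:

```rust
// Sketch: the compiler rejects mutating a collection while iterating it,
// a single-threaded "data race" that is legal (and buggy) in many languages.
fn sum_then_append(mut v: Vec<i32>) -> Vec<i32> {
    let mut sum = 0;
    for x in &v {
        sum += x;
        // v.push(*x); // would not compile: `v` is already borrowed by the loop
    }
    v.push(sum); // fine: the immutable borrow ended with the loop
    v
}

fn main() {
    assert_eq!(sum_then_append(vec![1, 2, 3]), [1, 2, 3, 6]);
}
```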
Rust can not guarantee correctness, but it is a focus in some sense. Strong static typing, robust error handling... you can encode a lot of invariants in the type system. Especially compared to something dynamically typed, like JavaScript. It’s not a proof assistant, but every bit helps!
Thanks for correcting me there. Rust _does_ have more than the borrow checker helping the programmer avoid common dynamic-language issues (and Rust is awesome).
I just know that even if the compiler of the language I am writing in is very smart and can catch a lot of my silly mistakes, it would be a mistake to think that I could write "bug free" code as a result. I would still write my code defensively, even in Rust.
Should be noted it usually takes a bit longer to get to that first working version in Rust. It's debatable whether in the long run it's faster to develop with.
I would argue it is much faster to get to a final, 'correct' working version because you spend less time debugging. Time is also drastically cut down the more familiar you are with the language, but still, I'd love to see some data on dev time between languages with robust type systems vs dynamic.
I expect my experience is much the same as Steve’s on this point.
Mostly I would say fewer bugs; the bugs that are avoided vary in severity. Many, probably most, are the severe but obvious bugs where you would have hit the first time you tried running the code in another language, but which are instead now caught by the compiler—that’s why “time to first run” is not a reasonable metric for comparing writing code in such languages, but “time to first meaningfully successful run”! But it’s also common for Rust to catch and prevent far more insidious bugs; especially, in my experience, in two areas: firstly, with respect to ownership—things like passing data structures around and improperly mutating them where you should have cloned it instead, which is structurally impossible in Rust (disregarding things like Rc<RwLock<T>>, which make it possible, but make the intent very much clearer); and secondly, where enums (tagged unions) can be used, due to the greater and more precise expressivity.
I have much greater confidence that my first pass at a piece of code will work correctly in Rust, once it compiles, than I do in JavaScript—and I’ve been working in JavaScript a lot more than Rust in the last couple of years, yet it’s still true.
Expressive type system, strong and static - JS's is weak and dynamic. This is not limited to Rust - but if you're aiming for correct code, pretty much anything after C/C++ and maybe PHP is better than JS and its million and one ways to write unexpected bugs and 10,000 ways to do the same basic thing (with an NPM package, no less), of which each dev/library chooses its own path for cool factor.
Who would even argue that JS is better for writing correct code over a language that's obsessive about security and correctness. Last time I checked Rust wouldn't even upcast ints FFS, meanwhile in JS you can't even be sure what "this" refers to in any given context and 0 == '0' == [0].
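For instance, a sketch of the "won't even upcast ints" point: Rust makes even lossless numeric widening explicit, where JS silently coerces across types:

```rust
// Sketch: no implicit numeric conversions in Rust, even lossless ones.
// Compare JS, where 0 == '0' is true via silent coercion.
fn widen(n: i32) -> i64 {
    // let wide: i64 = n; // would not compile: mismatched types
    i64::from(n) // the conversion must be spelled out
}

fn main() {
    assert_eq!(widen(7), 7i64);
}
```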
Tagged union types (called `enum` in Rust), recursion, and pattern matching are great for tree data structures, such as ASTs, and help catch many bugs. This is the reason why so many compilers, typecheckers, and interpreters are written in languages like OCaml, F#, Haskell, and Rust.
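As a sketch of that style, here is a hypothetical toy expression type (not swc's actual AST) evaluated with an exhaustive `match`:

```rust
// Sketch of the tagged-union-plus-pattern-matching style for ASTs.
// `Expr` is a toy expression type invented for illustration.
enum Expr {
    Num(f64),
    Add(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
}

fn eval(e: &Expr) -> f64 {
    // The match must be exhaustive: add a new variant to `Expr` and
    // every `match` that doesn't handle it becomes a compile error.
    match e {
        Expr::Num(n) => *n,
        Expr::Add(l, r) => eval(l) + eval(r),
        Expr::Neg(inner) => -eval(inner),
    }
}

fn main() {
    // -(1 + 2)
    let ast = Expr::Neg(Box::new(Expr::Add(
        Box::new(Expr::Num(1.0)),
        Box::new(Expr::Num(2.0)),
    )));
    assert_eq!(eval(&ast), -3.0);
}
```

That exhaustiveness check is why adding a new syntax form to a compiler written this way tends to surface every place that needs updating, rather than failing at runtime.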
There’s at least a few; cranelift is powering stuff in production at Fastly. Facebook is writing a MIR interpreter to do static analysis, don’t think that’s production yet though.
certainly won't guarantee bug free code, but exhaustive pattern matching really helps when you're building compilers (as it does in other languages that have this feature).
You can mark values (e.g. Error conditions) as Must Use. This, along with structured enumerated types, means that you can't forget to handle error conditions.
Other bugs such as the goto fail bug are eliminated by dead code spotting and consistent formatting.
You can write bugs in Rust but they tend to be business logic bugs, or incorrect explicit decisions.
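A quick sketch of the must-use point: `Result` is marked `#[must_use]` in the standard library, so dropping an error condition on the floor draws a compiler warning, and handling it forces you to name both outcomes:

```rust
use std::num::ParseIntError;

// Sketch: `Result` carries #[must_use], so callers can't silently
// ignore the error case the way an unchecked return code can be.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.parse::<u16>()
}

fn main() {
    // parse_port("80"); // warning: unused `Result` that must be used
    match parse_port("80") {
        Ok(p) => assert_eq!(p, 80),
        Err(e) => panic!("unexpected: {e}"),
    }
    // 70000 doesn't fit in a u16, so this is an Err, not a wrapped value.
    assert!(parse_port("70000").is_err());
}
```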
But you're talking about a step that javascript developers want to inflict on themselves every time they change anything. If the time to do that is not exactly zero, it will kill productivity [1].
[1] Naturally excluding the 18 hours of productivity that will have been killed getting said unnecessary javascript->javascript build system set up and running in the first place.
for me these kinds of tools are only as good as the time spent debugging when something goes awry. as such, the biggest risk i see is that it isn't built in javascript. i simply don't see myself debugging rust while trying to compile javascript. i really don't.
this being said, i do wish this project (and any others like it) great success. we definitely do need something faster when compiling js.
(this tool reminds me of a similar project for a small client. they required a fast js compiler. the project was a failure because no-one wanted to debug non-js code when the build failed)
This is why I'd really like to see some projects that compile to readable js when not in production mode. Throw in some auto-generated comments above every method that tell you the file and line number in the Rust/Go/Crystal/Whatever code that this comes from, and debugging will be fairly easy.
Perhaps the best thing about Babel is its plugin-based architecture. As of Babel 7, even its parser accepts plugins.
This not only makes it easy to keep up with the latest of ECMA's standards and extensions like JSX, Flow, TypeScript – it also makes it surprisingly easy to implement your own syntax extensions to JS (e.g. https://wcjohnson.github.io/lightscript/, which I have worked on).
Curious how this project is thinking about plugins and staying up-to-date sustainably.
Does it really? Last time I looked into this, the "parser plugins" were all fake: single-line modules which simply switched on hidden config flags in Babel to enable functionality already hard-coded into the core parser. Left a bad taste in my mouth that the developers weren't up-front about the true (non-)extensibility of the system.
I don’t know much about how Babel is implemented, but this uses the excellent rayon crate, which makes data parallelism awesome and easy. Rust is also generally fast, especially when you put effort into performance, which (from the creator talking about this on reddit) was a big goal.
This is the kind of thing Rust should be excellent at.
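To illustrate the shape of that data parallelism, here is a sketch using only `std::thread::scope` (so it runs without any dependencies); with rayon, the fan-out below collapses to roughly `files.par_iter().map(compile).sum()` over a work-stealing pool. The `compile` function is a stand-in, not swc's real transform:

```rust
use std::thread;

// Stand-in for an expensive per-file transform.
fn compile(src: &String) -> usize {
    src.len()
}

// Sketch: hand-rolled fan-out over chunks with scoped threads; rayon
// expresses the same idea as `files.par_iter().map(compile).sum()`.
fn compile_all(files: &[String]) -> usize {
    thread::scope(|s| {
        let handles: Vec<_> = files
            .chunks(2)
            .map(|chunk| s.spawn(move || chunk.iter().map(compile).sum::<usize>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let files: Vec<String> = (0..8).map(|i| format!("module{i}.js")).collect();
    // Parallel and sequential results agree; only the wall-clock differs.
    assert_eq!(compile_all(&files), files.iter().map(compile).sum::<usize>());
}
```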
> This is the kind of thing Rust should be excellent at.
I would actually think this (compilers) is a thing where Rust doesn’t have that big of an advantage. Compilers are very special: they mostly deal with some kind of tree transformation, and as soon as those trees use owned nodes (e.g. in the form of Box or Rc) things are not that far from what a speedy managed language would do. Compilers are also not realtime at all. It doesn’t really matter whether there’s a GC pause or not, or whether individual allocations are quick or slow. The only thing that matters is the overall throughput/performance.
Now this statement shouldn’t mean that Rust is bad at doing compilers (enough projects tell otherwise). Only that in other domains (e.g. realtime and low-level code, graphics, audio, etc.) its advantages might be even bigger.
I agree - which is why I’ve kept an eye on Fastpack as a replacement for my biggest bottleneck which is bundling — which is really just a special case of a compiler in a lot of ways. OCaml is brilliant for these tasks.
I bet if you compared single threaded performance you’d still blow Babel out of the water. Even with the best JITs, idiomatic js code is so much harder/impossible to optimize than idiomatic rust or c/c++ code.
The Babel team is great.... and they're also tiny, overworked, and underfunded. (See some of Henry Zhu's recent conference talks on dealing with being an OSS maintainer for background.)
I think Babel 7 had some perf improvements (including tweaks contributed by Benedikt Meurer from the v8 team), but I don't think they've ever had time to focus on just improving perf.
Yep there's been various PRs for perf over the years but it's never been a focus per-se, and we've had help from various people including the v8 team https://twitter.com/left_pad/status/927554660508028929. Haha yeah no one wants to hear about open source sustainability in a thread about asking that tools be correct/fast/easy to use at the same time :D
It's true, speed is not really a concern on my mind at the moment - rather that we have enough people even willing to contribute back at all
Trust me, I am well aware that even big open source projects don’t get the resourcing they deserve. Everyone always has to make do. Babel is a fantastic project.
Yeah, those kind of one-off contributions was what I was meaning. You are 100% right with regards to expectations, and I don’t mean to imply that they should have. Only that it’s a bit more likely than some random projetct that nobody uses.
There's still going to be a finite number of cores on a machine. I can't see gaining parallelism being that big a boost (as opposed to gaining concurrency, if it's not already there).
You'd be surprised, even consumer-grade CPUs are fast approaching manycore territory these days. Just look e.g. at AMD's Threadripper, which BTW is being targeted at hobbyists and enthusiasts, not just professionals. (There are also ARM-based 'desktops'/'workstations' with comparable number of cores.) If we want to make anywhere near full use of these capabilities, we'll need easy parallelization (for applicable problems) to become essentially ubiquitous. Rust and Rayon are perfect for that.
Closure Compiler is a poorly specified pile of edge cases and bugs. (Embarrassingly poor quality for something Google relies on so much.) Is this attempting to support its type system, minification modes, or what? I'd be impressed if it managed a significant subset.
Honestly I don't think Babel is the performance bottleneck in most dev setups. Most slow frontend dev setups are slow because of webpack and its core architecture. Even the ts compiler is quite fast.
I've looked into dev tooling performance quite a bit and I think your comment is half-true. Babel is just part of the performance story, and in some situations, speeding up Babel doesn't help at all.
One complication is the fact that Babel results are often cached, so with a reliable cache, Babel performance barely matters at all. However, every caching approach I've seen has reliability issues in practice. Certainly a tool is more pleasant to use without a cache, since "maybe I have a bad cache" is never a worry when running into strange behavior.
It also depends a lot on your scale, how you compile your code, and what operations you want to be fast. When switching to Sucrase (my own Babel alternative project, mentioned elsewhere) on a large codebase, webpack startup went from 40 seconds to 20 seconds, though incremental builds weren't any different. Running tests (in node with a require hook) became much faster, from minutes to seconds when uncached and about a 2x speedup when Babel pulled from cache. In both of these cases, the remaining time is now in other things: webpack processing the files, node running the imports, etc. So there's a lot more besides Babel, but Babel still is a non-trivial component in the cases I've run into.
Yeah I've had a difficult time figuring out how this was tested:
I would recommend using the benchmark at https://github.com/v8/web-tooling-benchmark in the future if that's possible, since currently it only seems to be testing ES5 (jquery.min.js) for the parser and a short ES6 snippet for the transforms.