Hacker News

Is it a bad thing?

That might make the installed size large, but if it's isolated to Bazel, managed by Bazel, updated by Bazel, etc, then why does it matter whether it's a JVM, a Python binary, a JS VM, or anything else?



> Is it a bad thing?

Yes. It makes verifiability much harder.

E.g. Debian put a lot of work into making a more trustworthy build chain by introducing reproducible builds. There's no Bazel on Debian, nor on many other distros, because building Bazel itself is a complexity nightmare. "Here, download this big blob where you have no idea how it's been built" doesn't exactly help.


I don't think bundling in components inherently changes the reproducible builds equation.

In fact I think that bundling in a JVM that they know works in a particular way will likely improve the reproducibility of Bazel _setups_ and therefore builds that come out of Bazel, which is their aim.


"Every time the NSA's compiler binary builds my code I get the same output" isn't really all that comforting. If you can't build the compiler (and/or build tool), how can you be sure there isn't anything malicious in it? Trusting trust all over again.


What’s so hard about:

env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh

To build the Bazel binary in a reproducible way, also set SOURCE_DATE_EPOCH.
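For context, a sketch of what that bootstrap looks like in full. The flags are the ones from the command above; the SOURCE_DATE_EPOCH value is an arbitrary fixed point I picked for illustration:

```shell
# SOURCE_DATE_EPOCH tells timestamp-embedding tools to use a fixed instant
# instead of "now", so two builds of the same source can be bit-identical.
export SOURCE_DATE_EPOCH=1546300800   # arbitrary: 2019-01-01 00:00:00 UTC
date -u -d "@$SOURCE_DATE_EPOCH"      # inspect the pinned timestamp

# Then bootstrap Bazel against the locally installed JDK (run in a Bazel
# source checkout; commented out here since it needs that checkout):
# env EXTRA_BAZEL_ARGS="--host_javabase=@local_jdk//:jdk" bash ./compile.sh
```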


Could the Bazel folks better define the composition of their blob? That way they could still distribute the blob, but someone could substitute another instance of the JVM. I'm assuming ownership of this lies with Bazel, and that it's more of a political/motivational problem than a technical one. Or is that incorrect?


Sure, this is not impossible.

But there's a simple fact: the more complex your build chain is, the more difficult this becomes. And Java adds a whole bunch of complexity.

Read through the bug report to get an idea of what a nightmarish task this is: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=782654


The blob is just the JDK + Bazel + a wrapper binary. Just use their regular build artifacts instead if you want to bring your own JDK.


Next thing we know, every program is gonna ship with either a slimmed-down (!) 20MB JRE (if you use modules; otherwise it's gonna be worse), a 130MB Electron runtime (which makes bundling the JRE seem like a good option), or what have you...

Wait, I think we're already in that era.

Oh well.

(Cue "but disk space is so cheap nowadays"...)


> (Cue "but disk space is so cheap nowadays"...)

Sure, I'll bite. There are cases where this matters, absolutely, and it would also matter if the difference were between 10MB and 10GB, but here we're talking about a pretty small relative difference.

In return, you get the reliability of not having to worry about how your OS/configuration/other packages may interact with the software. In my opinion that's a huge productivity win, and also fits nicely with the fact that build systems should be thoroughly reproducible and do the same thing everywhere they run.

There are some good, niche reasons for not bundling things like a JVM:

- If you're a strong free software advocate and need total control over the freedom in your dependencies. Debian do a lot of dependency unbundling for this reason. They're good at it, but it's not something most people care about or want to deal with the fallout of.

- Memory usage optimisation by sharing libraries in memory. This has a nice benefit for native code, but it's unlikely to do much for Java processes, from what I understand. It _could_ have a significant benefit for the Electron ecosystem, given that many people now run 2-3 Electron apps at a time.


"Small relative difference" - Most programs, probably including Bazel, have <1MB worth of code. Some programs have additional media files, but not all of them. And even in 2019, small programs do make a difference in convenience. Sometimes when you download things, sometimes when you copy files around, sometimes when you search the files on your computer... A factor of 20, or 130, makes a huge difference if you consider all programs installed on your computer.


Currently Bazel has 48MB of source code in the `src` directory, plus another 125MB of integrations. Some of this will be tests, some will be build config, but that's probably in the ballpark of 100MB of source code. Adding somewhere between 20-100% for reliable running of that in ~all circumstances feels like a pretty solid engineering decision to me?


I haven't looked at the project. In my experience, 100K lines of C source code roughly corresponds to 1MB of native code. That might not be representative, though. But maybe it's just that the rest of this project has some more of these "20-100% expansion will be ok" decisions.


I just took these sizes from the master branch when I wrote the above comment. While that may have been true at one point, or is for C code, I suspect that is no longer the case in general. Between data files, resources, modern compilers optimising for speed over size, and more, I'd expect this to be significantly higher now.


In the era of static linking (golang, rust), sure, why not.

These days, getting dependencies and the right version of the runtime is considered the harder problem.

I personally feel like the world is wasting a huge amount of memory because of this, but have to agree that distributing apps is hard.
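To make the static-linking point concrete: in Go, a pure-Go build with cgo disabled yields a single binary with no libc or runtime dependency (assumes a Go toolchain and a hypothetical hello.go; not part of the original comment):

```shell
# Disable cgo so the result doesn't dynamically link against the system libc.
CGO_ENABLED=0 go build -o hello hello.go

# On Linux, both of these should confirm there are no dynamic dependencies:
file hello   # reports "statically linked"
ldd hello    # reports "not a dynamic executable"
```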


Oh, the irony of seeing such a comment.

I still remember when dynamic linking was being introduced into OSes: Amiga libraries, Windows 3.x, the very first Linux kernel to support dynamic linking...


Can you explain why it's ironic?


Because the large majority of OSes only allowed static linking.

When dynamic linking finally became possible, it was welcomed with pleasure: not only did it save disk and memory space, it also opened the door to extensibility via plugins and to dynamic behavior without restarts, instead of slow IPC with one process per plugin/action.

Now, thanks to missteps in glibc's design and .so hell on Linux (even worse than on Windows, due to the multitude of distributions), many devs embrace static linking as some golden solution.


Which was my point when I said I feel the world is wasting a huge amount of memory. It feels like a step backwards.


Isn’t the whole point of dotnet that it ships with all the executables, libraries and binaries too?

I don’t know how I feel about it myself.


You can choose to do a self-contained deployment, which bundles the runtime with your app, or a framework-dependent deployment, which requires a pre-existing shared install. Pick your poison.
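In `dotnet` CLI terms, the two modes map onto `publish` options (the runtime identifier here is just an example):

```shell
# Framework-dependent: small output; the target machine needs a
# shared .NET runtime already installed.
dotnet publish -c Release

# Self-contained: bundles the runtime with the app; much larger,
# but runs on a machine with nothing preinstalled.
dotnet publish -c Release -r linux-x64 --self-contained true
```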


Does it also ship a java binary? Is that java binary statically linked? If not, how can we build on platforms with a non-mainstream libc (read: Linux with musl libc, like Alpine or Void)?


I don’t know if this has changed, but a couple of years ago you just had to try to build Bazel from source, which was extremely painful.


Sometimes it is. In enterprise environments it's very common to have invasive proxies, and just yesterday I was fighting to make Bazel use something other than the built-in cert store. (It tries to fetch some of its components from GitHub, and that fails if you don't use the proper certs.)
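One common workaround in that situation is to point Bazel's host JVM at a truststore containing the proxy's CA. The flag and JVM property names below are standard, but the file names and password are hypothetical:

```shell
# Import the corporate proxy's CA certificate into a fresh truststore...
keytool -importcert -alias corp-proxy -file corp-ca.pem \
        -keystore corp-cacerts -storepass changeit -noprompt

# ...then tell Bazel's embedded JVM to use it. Note these are *startup*
# options, so they go before the command verb ("build").
bazel --host_jvm_args=-Djavax.net.ssl.trustStore="$PWD/corp-cacerts" \
      --host_jvm_args=-Djavax.net.ssl.trustStorePassword=changeit \
      build //...
```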


If the version shipped in Bazel isn't right for you, you're stuck either customizing it or waiting for it to update.


But it doesn't have to be right for you, it only has to be right for Bazel. You wouldn't use it for anything else except Bazel, so is it a problem if it's out of date or lacking features? I think that's only a problem for the Bazel developers.


It is a problem if it has a security hole. Or if you have reason not to trust the Bazel developers. (Or trust people who might have illegitimate access to said developers' machines - for example the NSA.)

See https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p7... for the canonical paper on why it can be important to be able to recreate your build chain from scratch in a verified way.



