Mid-stack inlining in the Go compiler (docs.google.com)
241 points by dcu on March 6, 2017 | 73 comments


In case someone else is wondering: What is being called "mid-stack inlining" here is what is generally understood by the term "inlining".


The presentation makes a distinction between mid-stack and leaf inlining, and apparently it was only done on leaf calls before because this is less confusing in backtraces.


throwawayish was downvoted, but I appreciated his and CUViper's clarification.

It's not some weird, exotic form of inlining; it's just a more complete version than Go used to have.
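
To make the distinction concrete, here is a minimal sketch. The function names are made up for illustration, and whether a given compiler version actually inlines any of these depends on its heuristics; `//go:noinline` is only used to stand in for a callee that is too complicated to inline.

```go
package main

import "fmt"

// slowPath stands in for a function the compiler will not inline.
// The directive below forces that for the sake of the illustration.
//
//go:noinline
func slowPath(x int) int {
	return x * 2
}

// isSmall is a leaf: its body contains no calls, so even a
// leaf-only inliner can copy it into its callers.
func isSmall(x int) bool {
	return x < 10
}

// process is not a leaf: after isSmall is inlined, a real call to
// slowPath remains in its body. Inlining process into main is the
// "mid-stack" case, because a stack trace taken inside slowPath
// still has to report a frame for process even though process no
// longer has a physical frame of its own.
func process(x int) int {
	if isSmall(x) {
		return x
	}
	return slowPath(x)
}

func main() {
	fmt.Println(process(3), process(20))
}
```

The program behaves the same either way; the difference only shows up in the generated code and in how backtraces have to be reconstructed.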


leaf calls, and calls must not be complicated (no function calls, no loops, no switches in calls).

I for one am very glad that more work is put into inlining


The point is, as usual, Go is trying to catch up to what everyone else has had for years. The use of non-standard terminology here is suspicious, raising the question of whether the Go people are trying to hide the fact that they are playing catch-up.


Comments like these are so depressing. Someone does a bunch of work to improve the Go compiler and writes a presentation to share their approach (primarily so that other people working on Go can understand and extend it), and gives it all away for free. This is textbook open source citizenry, which should be applauded.

But instead, you come along and criticize them for being too specific with their terms (!!) and also accuse them of being deceptive. This is not a marketing exercise. There is no conspiracy here.


> writes a presentation to share their approach (primarily so that other people working on Go can understand and extend it

Perhaps Ericson2314's gripe is that it's being posted (and upvoted) on HN, rather than just a Go-specific forum (e.g. reddit.com/r/golang), and by implication it's intended to be read by a more general audience.


Well, that's just silly. People post random stuff to HN all the time. That something appears here does not mean that HN is the intended audience.


Nothing about giving your work away for free or wanting to do good necessarily puts one above criticism for misuse (or useless invention) of terminology. It's not a marketing exercise, you are correct -- all the more reason to expect a narrow use of terminology.


If you're going to be pedantic, you might at least be strictly correct. There is no existing term that would distinguish it from Go's prior state, so there is no misuse nor "useless invention". Even if there were and you were correct, this would still be pedantry at its finest.


> ...would still be pedantry at its finest.

Why would correct use of terminology be "pedantry at its finest"?

What people are concerned about is intellectual dishonesty. The presentation does not go to much effort to show, for example, why the term "mid-stack inlining" needs to be introduced for Go. Do other languages not have this kind of inlining? Does comparing and contrasting other implementations just not matter?


> Why would correct use of terminology be "pedantry at its finest"?

Because you're debating terminology when the wrong term caused no one confusion, thereby detracting from the actual conversation.

> What people are concerned about is intellectual dishonesty.

There's no cause for this concern.

> The presentation does not go to much effort to show, for example, why the term "mid-stack inlining" needs to be introduced for Go.

Sure it does; see slides 3-6.

> Do other languages not have this kind of inlining? Does comparing and contrasting other implementations just not matter?

Perhaps the author omitted it from his deck because it's impractical to cover the whole breadth of inlining in a deck that's already 35 slides long. Perhaps he simply didn't think to include it. There are a lot of likely explanations for why this wasn't included besides nefarious motives. You sound paranoid.


> There's no cause for this concern.

Well, why do you think we are bringing it up, then? Due to some nefarious motive?


My take: it's because of an unnecessarily cynical worldview coupled with a misunderstanding about the intended audience of the presentation.


No hiding or sneaking is necessary to explain anything here. They needed to come up with a way to describe the difference between how it is currently implemented and how they are changing it. That's all.


> The point is, as usual, Go is trying to catch up to what everyone else has had for years.

As far as programming languages go, Go is very new. Of course, there will be a number of areas that still need to be optimized or fully fleshed out. I think the Go compiler has only been written in Go for about a year. I'm sure you find this pretty lame as well.


I don't think anyone is pretending that Go is a decades-old language. What is the standard term for this kind of inlining which would distinguish it from the inlining already present in the compiler?


> What is the standard term for this kind of inlining which would distinguish it from the inlining already present in the compiler?

There isn't a standard term, because virtually all inliners always did what Go is now doing. It's pretty basic functionality.


Really? I feel like every C++ compiler I've used has given very unreliable stacktraces with optimizations enabled.


Do most compilers preserve enough metadata to report accurate stacks?


So they're not avoiding a standard term to be misleading; there simply isn't a standard term to use. Thank you for clarifying.


The point is "inlining" already refers to this kind of inlining. The already-present form is more limited than the typical use of the term.


Then what word would you have used in this presentation, given no established word existed?


"Fixing Go's inlining"


"Go inlining improvements..." and then "Conventionally, inlining includes ... but Go has been limited to ... until recently. With ... Go now supports inlining in a broader array of circumstances."


"Conventionally, inlining includes ...?"

What does it include? You've conveniently managed to skip over the crucial part with ellipses. Calls in the middle of stacks perhaps?

You people have too much time on your hands to go and pick at the wording people use in their presentations.


> You people have too much time on your hands to go and pick at the wording people use in their presentations.

What kind of argument is that?


I'm saying that this is an incredibly inane, nitpicky, superficial conversation that has nothing to do with the technical content of the presentation.


You are saying it; but you aren't really backing it up with anything.


Shitting on Go for sport: a HN pastime.


This is absolutely fantastic work.

Since I learned about continuation-passing style (which Go channels could probably be formally transformed into), I've been convinced that there's a better way to do codegen. Better calling convention, better stack representation, better instruction architecture; I'm not yet sure which. It nags continuously at the back of my mind, almost as though it's on the tip of my tongue. In this specific case, it must surely be possible to inline a continuation with some foreign architecture. I'd love to see some literature on the more experimental end of this stuff, if anyone has it.


Are you aware of "Compiling with Continuations"?

https://www.amazon.com/Compiling-Continuations-Andrew-W-Appe...


That's an interesting read. One of only a handful of tech books I've read twice. It's thin and doesn't repeat itself all that much so if you read it twice it's still faster than reading most tech books once.

More recently though I heard someone proved mathematically that CPS can be transformed one-for-one into one of the more conventional models. That doesn't mean it might not still be easier for the humans to deal with however.



The JVM/Java has had that for a while in its JIT; it's nice to see it coming to Golang.

A byproduct of that could be little hacky things like the one below, which make code faster by restructuring it a little:

https://techblug.wordpress.com/2013/08/19/java-jit-compiler-...


9% faster, 15% bigger. I'll take that!


FWIW: This is actually not that great, but it's a good start.

You should be able to get about 15-20% with about a 3-5% binary size increase.

In fact, with ThinLTO, we often see that gain with binary size decrease from smart inlining choices.

(The heuristics for inlining take a very very long time to get right and tune)

The issue they will next hit is that inlining is going to make the compiler slower until they tune the heuristics well.


If I'm understanding correctly, the increase is mostly due to the "debugging" info that is added, not necessarily due to more code.


I strongly doubt this. It doesn't say this in the preso, and ...

1. The compiler is a lot slower, which is usually from code growth and not debugging info growth. If the compiler is that much slower from debugging info growth, they have larger issues :)
2. Usually people do not include debug info sizes in binary sizes, because DWARF et al. info can be stripped and put alongside the binary (i.e., it doesn't even have to be part of the binary).


It does say this. One of the last slides says 4% of the additional size came from adding more debugging information, excluding anything to do with the new inlining.


"If I'm understanding the increase is mostly due to the "debugging" info that is added, not necessarily due to more code. " vs " One of the last slides says 4% of the additional size came from adding more debugging information, excluding anything to do with the new inlining"

So no, it doesn't say that it's "mostly due", it says ~25% is due to debugging information.


Increased sizes lead to less effective use of instruction caches. Different use cases will be impacted differently by that, so it's worth profiling your particular application, at least until the size overhead comes down.


What impact would that have on build times? I know a lot of work has gone into getting back to 1.4 build times, but would the added work of inlining prolong builds?


The compiler got a bit slower:

https://github.com/golang/go/issues/19386

Those numbers are just for my CLs that fix stack traces but with mid-stack inlining still off. Turning it on makes builds noticeably slower:

    $ time ./make.bash
    real: 45.32s  user: 118.67s  cpu: 5.85s

    $ time GO_GCFLAGS='-l=4' ./make.bash
    real: 64.51s  user: 167.04s  cpu: 7.12s

We'll need to tweak the inlining heuristic to find a good balance between performance, build times, and binary size.


But then the compiler gets faster once you compile it with these new improvements. Is that accounted for in the issue?


Please have a switch where I can sacrifice build time for maximum possible runtime benefits.


The compile-time/run-time tradeoff is interesting. Getting the "maximum possible runtime benefits" probably calls for https://en.wikipedia.org/wiki/Superoptimization :)


The presentation redacted the stats about how this affects Google performance. I bet it saves enough CPU hours to pay the author's salary many many many times over. Good job!


Maybe? I'm under the impression that the vast majority of Google's software is not written in Golang, though.


But even 1% of their software being Go would still see more use than most software you or I write in our lifetimes.

dl.google.com has been golang since 2013. Imagine the traffic that application gets!

https://talks.golang.org/2013/oscon-dl.slide#1


Agreed.


Whoa, nine percent? That's a lot! Now I wonder if the improvement is better or worse on non-x86 platforms?


We measured 10% improvement on ppc64.


This is interesting work!

That said, it's a little disappointing when runtimes require custom algorithms or metadata to walk the stack and construct a stack trace. It makes it harder to build debuggers that grok the state of multiple runtimes (e.g., the Go code and the C code in the same program). This also affects runtime tracing tools like DTrace, which by construction can't rely on runtime support for help.


We plan to expose all of the inlining information in the DWARF tables so debuggers won't have any problems with this. Internally, the runtime uses a different representation just so we can make it more compact and optimized for the runtime's exact needs. This way, you can also strip the debug info without breaking the runtime's own ability to walk stacks.
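
For code that wants to walk the stack from inside a Go program, a minimal sketch using the standard `runtime.Callers` and `runtime.CallersFrames` functions looks like this. The helper names (`callerNames`, `inner`, `outer`) are made up for the example. `CallersFrames` is the part that expands a program counter into logical frames, so functions that were inlined should still appear by name.

```go
package main

import (
	"fmt"
	"runtime"
)

// callerNames returns the names of the functions on the current
// call stack, starting with the caller of callerNames.
func callerNames() []string {
	pcs := make([]uintptr, 16)
	// skip=2 drops runtime.Callers and callerNames itself.
	n := runtime.Callers(2, pcs)
	if n == 0 {
		return nil
	}
	// CallersFrames interprets the raw PCs and accounts for inlining:
	// one PC may yield several logical frames.
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

// inner and outer are small enough that the compiler may inline them.
func inner() []string { return callerNames() }
func outer() []string { return inner() }

func main() {
	for _, name := range outer() {
		fmt.Println(name)
	}
}
```

Indexing the raw PCs by hand, without going through `CallersFrames`, is the pattern that can lose inlined frames.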


Isn't that what `.eh_frame` is for?


.eh_frame is DWARF with a couple tiny tweaks


Right, but it's an allocated section that doesn't get stripped like debuginfo.


The plan is to add inlining information to the DWARF tables so that debuggers and other tools can take advantage of it.


It looks like most of the improvements are in string formatting problems. I'm curious if better heuristics will help other areas as well.


As someone who now professionally uses Go on a range of MIPS devices, I, for one, do care about any binary size increases!


It doesn't look like this solves inlining library calls?


Go already performs cross-package inlining, so it can already inline library calls. (This is relatively easy to do in Go compared to other languages because packages must form a DAG. Compiling package A writes out enough information in the object file for A that compiling package B that depends on A can inline calls to functions in A.)


> This is relatively easy to do in Go compared to other languages because packages must form a DAG.

That doesn't make sense to me. The complicated part is storing the IR in packages in a form that can be read back into the compiler later. That's needed to do inlining at all. Once you have that done, doing LTO is trivial: you just slurp your IR for all modules linked together into the compiler and emit a single binary. (You can be fancier, like ThinLTO does, but again, the effort needed to do ThinLTO is independent of whether you have cyclic dependencies or not.)


"Compiling package A writes out enough information in the object file for A that compiling package B that depends on A can inline calls to functions in A"

So it records the calling convention, architecture flags, alignment, and other ABI pieces etc? As well as an estimate of instruction-level inlining cost, summary info about arguments, etc, so you effectively decide whether inlining it will help or hurt, without having the IR around to try?

FWIW: Writing out the info is usually not the hard part, actually, and is unrelated to the DAG-ness of the packages.

GCC is just the perennial example here, but they refused to write it out for years for political reasons, not technical ones :)


"So it records the calling convention, architecture flags, alignment, and other ABI pieces etc?"

No. At the moment it records the AST in the object file, because the inliner works at the Go AST level. In the future it may instead record the SSA representation (which would obviously give better cost estimates; the current heuristics are really extremely simple).

"FWIW: Writing out the info is usually not the hard part, actually, and is unrelated to the DAG-ness of the packages."

The DAG-ness means it's always available when compiling the call site, even if it's a cross-package call. It means you don't have to do it at link time.


"In the future it may instead record the SSA representation (which would obviously give better cost estimates; the current heuristics are really extremely simple)."

This would be identical to what others do then :)

"The DAG-ness means it's always available when compiling the call site, even if it's a cross-package call. It means you don't have to do it at link time. "

This is unrelated to DAG-ness. Unless you mean something else by DAG-ness. DAG-ness means it's a directed-acyclic graph. That is, all other things being equal, it has no cycles.

This is unrelated to the problem.

For example, in other languages a function could be weakly defined, or overridable in some other way, or actually multiply defined, regardless of whether the dependency graph has cycles. That requires link-time resolution, because you can't optimize it under an assumption about its callees/callers, or even inline it, and then just hand that version to others: you may have screwed it up in a way that one of those other callers depends on.

That is, the overridability is an attribute of the function, not a problem of how it is used.

Ditto on the ABI, alignment, etc.

None of the interesting problems they have to solve are related to packaging. They occur with DAGs or non-DAGs, and are just related to these languages supporting a richer set of things you can do to functions :)


> The DAG-ness means it's always available when compiling the call site, even if it's a cross-package call. It means you don't have to do it at link time.

Why is it any harder to do at link time?

(I've implemented this in a production compiler, and choosing whether to do it at compile time or link time was a trivial decision.)


Traditionally, this required a linker that understands there is IR in the files. In practice, I don't believe this has been a problem for many years now (and again, it was only a problem in the open source world, so saying it's related to the language is kind of strange).

Every good production C++ compiler has had some form of link time optimization for many years.

IBM's, for example, has been happily cross-optimizing between C++, Java, Fortran, PL/IX, etc. without any issues, going on at least 15, maybe 25+ years now (I know it's 15 for sure, I suspect it's closer to 25).


Go statically links to all Go libraries, and only dynamically links when interfacing with C code. (As of the last time I used it, a couple years ago; this may have changed since then.)


That doesn't really answer whether it can inline library functions. The fact that it statically links means it potentially could inline methods it finds in them but I don't think that it does currently since it appears that this inline functionality works when compiling source.


It can inline library functions, since, at latest, 1.4.2 (this is just what I happened to have handy on this computer). It's pretty easy to test by writing a package containing a simple function like "func Square(x int) int { return x * x }", compiling a main package that imports that package and calls Square, and then disassembling the executable to see whether it calls Square or inlines it.
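
A minimal sketch of that experiment, collapsed into a single file for brevity (the test described above puts `Square` in its own package, which is what actually exercises cross-package inlining):

```go
package main

import "fmt"

// Square is deliberately tiny so that it falls well within the
// compiler's inlining budget.
func Square(x int) int { return x * x }

func main() {
	// If Square is inlined, the disassembly of main should contain
	// a multiply instead of a call to Square.
	fmt.Println(Square(7))
}
```

Building with `go build -gcflags=-m` asks the compiler to print its optimization decisions, including which functions it considers inlinable and which calls it inlined. `go tool objdump` on the resulting binary is one way to do the disassembly step.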


Go nowadays supports dynamic linking, just plugins aren't 100% supported across all targets.



