If the target is "real-time systems with low enough latency bounds to make GC an issue", the question becomes "does it have predictable pause times?" (Rust, afaik, doesn't, due to the mechanisms backing some of the lower-level aspects of its memory management and/or the use of Rc/Arc, though of course someone could improve things there).
This applies more specifically to any reference-counted system, though it also covers unique-owner approaches and others.
Namely, when you free an object, whether through manually written code that recursively releases everything it holds or through some form of reference counting, you do not know the size of the object graph you're freeing. If you add any custom release logic to the mix (destructors, any kind of shutdown logic, etc.), you also don't know how much extra work unrelated to simple memory management will be involved.
This means that naive manual memory management and reference-counting systems usually have unbounded and unpredictable pause times (in addition to the various costs involved in implementation details, whether that's malloc()/free() or refcounting). You do not know whether object X going out of scope is the sole owner of a huge object graph.
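A tiny Rust sketch of the point (purely illustrative, no particular library assumed): a single binding that transitively owns a million heap allocations, where the time spent freeing scales with the size of the owned graph, not with the one value that left scope.

```rust
use std::time::Instant;

fn main() {
    // One value that transitively owns a million heap allocations.
    let graph: Vec<Box<[u8; 64]>> = (0..1_000_000)
        .map(|_| Box::new([0u8; 64]))
        .collect();

    // Dropping `graph` must free every allocation it owns, so the pause
    // grows with the size of the owned graph, not with the single binding
    // that went out of scope.
    let start = Instant::now();
    drop(graph);
    println!("freeing took {:?}", start.elapsed());
}
```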
This is one (though not the only) reason manual memory management and refcounting aren't considered good options in real-time systems compared to static allocation of everything (object pools, static stack analysis & limits) or real-time garbage collector algorithms, though of course it depends on how deep into the rabbit hole you go.
Garbage-collected systems tend to have more control over pauses, but the majority of algorithms in use optimise for throughput, not latency or predictability. Thus you get the "huge GC pause" as the GC uses the fastest space/time algorithm and "pays down the debt" of making allocation stupidly fast (often a single compare-and-swap operation, sometimes not even that if each thread uses a separate nursery and can keep a local heap-end pointer).
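To illustrate why that allocation path is so cheap, here's a toy bump allocator in Rust, assuming a fixed pre-reserved arena (all names are mine): the fast path is one compare-and-swap on a shared offset, and a per-thread nursery could drop even that.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A toy bump allocator over a fixed arena: the common fast path in a GC
// nursery is just "bump a pointer", here done with a compare-and-swap so
// multiple threads can share the arena.
struct BumpArena {
    memory: Box<[u8]>,
    next: AtomicUsize, // offset of the next free byte
}

impl BumpArena {
    fn new(size: usize) -> Self {
        BumpArena {
            memory: vec![0u8; size].into_boxed_slice(),
            next: AtomicUsize::new(0),
        }
    }

    /// Returns the offset of a freshly "allocated" block, or None if the
    /// arena is full (which is where a real GC would trigger a collection).
    fn alloc(&self, size: usize) -> Option<usize> {
        loop {
            let current = self.next.load(Ordering::Relaxed);
            let end = current.checked_add(size)?;
            if end > self.memory.len() {
                return None;
            }
            if self
                .next
                .compare_exchange(current, end, Ordering::Relaxed, Ordering::Relaxed)
                .is_ok()
            {
                return Some(current);
            }
        }
    }
}

fn main() {
    let arena = BumpArena::new(1 << 20);
    let a = arena.alloc(64).unwrap();
    let b = arena.alloc(64).unwrap();
    println!("allocated at offsets {a} and {b}");
}
```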
However, you can also set up the environment for low latency and predictability. Probably the most famous example is IBM Metronome (available in J9 and recently open sourced), which uses static time quanta for GC; the minor variability that remains is the noise of multiple threads of execution synchronizing. It works by statically dividing the available time between the mutator (your "work" code) and the collector (GC), so that, for example, the GC always runs for 30ms, then the mutator for 70ms, then the GC for 30ms, and so on. Similar techniques exist for amortizing the pathological performance of reference counting by pushing the actual deallocation into a separate collector thread or a similarly scheduled slice (depending on whether you're fine with possibly unpredictable contention between two threads or want stricter timing).
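The "separate collector thread" variant can be sketched in a few lines of Rust with a plain channel (names are mine, not from any particular runtime): the working thread hands the last owner of each object off instead of dropping it inline, and a dedicated thread pays the deallocation cost on its own schedule.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (graveyard, corpses) = mpsc::channel::<Vec<u8>>();

    let collector = thread::spawn(move || {
        // Dropping each received value happens here, off the critical path.
        for _object in corpses {}
    });

    // Mutator: allocate and "free" without ever paying deallocation cost inline.
    for i in 0..1_000 {
        let buf = vec![0u8; 4096 + i];
        graveyard.send(buf).unwrap(); // hand off instead of dropping
    }

    drop(graveyard); // closing the channel lets the collector drain and exit
    collector.join().unwrap();
}
```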
A lot of very hard real time is more about predictability in the end.
It's worth mentioning that (1) even ordinary reference counting typically avoids large pauses, because objects that go out of scope usually aren't the sole owners of a huge object graph, and this is often, but not always, a significant improvement over (other simple kinds of) garbage collection; and (2) there's a readily available tradeoff with reference counting known as "deferred decrement" which can meet hard real-time deadlines, but at the cost of sacrificing the relatively predictable and compact memory usage you usually get from reference counting.
Deferred decrement enqueues objects whose reference counts drop to zero onto a "destruction queue" instead of deallocating them immediately. Every time you allocate an object, you work through some fixed amount of the destruction queue, such as two objects, unless it's empty. Destroying an object amounts to dropping each of the references it holds, decrementing the counts of whatever objects those refer to; that may in turn drop some of those counts to zero, in which case those objects are added to the destruction queue. This is very simple; it strictly limits the amount of work needed per allocation or deallocation, thus completely avoiding any large pauses, and it's guaranteed to empty out the queue eventually.
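Here's a small self-contained Rust model of that queueing discipline (my own toy sketch: objects are plain indices, there's no real allocator, and the constant is arbitrary). Decrements only enqueue dead objects, and each allocation retires at most a fixed number of them.

```rust
use std::collections::VecDeque;

const WORK_PER_ALLOC: usize = 2;

struct Heap {
    refcount: Vec<usize>,
    children: Vec<Vec<usize>>, // outgoing references of each object
    destruction_queue: VecDeque<usize>,
}

impl Heap {
    fn dec_ref(&mut self, obj: usize) {
        self.refcount[obj] -= 1;
        if self.refcount[obj] == 0 {
            // Do not free now; just remember that it is dead.
            self.destruction_queue.push_back(obj);
        }
    }

    fn alloc(&mut self, children: Vec<usize>) -> usize {
        // Bounded amount of deferred destruction work per allocation.
        for _ in 0..WORK_PER_ALLOC {
            let Some(dead) = self.destruction_queue.pop_front() else { break };
            // Destroying an object decrements each object it referenced,
            // possibly queueing more dead objects, never freeing recursively.
            for child in std::mem::take(&mut self.children[dead]) {
                self.dec_ref(child);
            }
            // (A real system would return `dead`'s storage to the allocator here.)
        }
        for &c in &children {
            self.refcount[c] += 1;
        }
        self.refcount.push(1);
        self.children.push(children);
        self.refcount.len() - 1
    }
}

fn main() {
    let mut heap = Heap {
        refcount: vec![],
        children: vec![],
        destruction_queue: VecDeque::new(),
    };
    let a = heap.alloc(vec![]);
    let b = heap.alloc(vec![a]); // b references a
    heap.dec_ref(a);             // drop our own handle to a; a stays alive via b
    heap.dec_ref(b);             // b becomes dead and is queued, not freed
    let _c = heap.alloc(vec![]); // this allocation retires b and then a
    println!("queue length now {}", heap.destruction_queue.len());
}
```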
However, it's still reference counting, so it's still inefficient, and it still uses an unpredictable amount of memory, which means that any allocation can fail. Many real-time systems are not amenable to recovering from failures.
The main difference in favour of GC in the first case is that GC gives you a centralised place to control it.
Generally GC offers more explicit, centralised control over allocation/deallocation, and I've seen it favoured over other schemes if dynamic memory management is necessary in critical code.
That's interesting! Is the idea that, for example, your interrupt handler can freely allocate memory and update pointers and stuff, and you make sure to run the GC often enough that there's always enough memory available for the interrupt handler when it fires? Has this been written up anywhere?
I usually think that the main advantage of GC is not a performance thing; it's that your program can pass around arbitrarily large and complex data structures as easily as it can pass around integers, without you having to think about how much memory they use or when it gets freed. Of course, not thinking about that can easily result in programs that use too much memory and die, or run your machine out of RAM; but for very many programs, either "compute a correct answer" or "die" are acceptable outcomes. You wouldn't want to program your antilock braking system that way, but lots of times people are willing to put up with programs crashing occasionally.
For very tight code, like interrupt-handler context, you usually avoid allocation altogether (possibly with some object caches to provide storage), but yes.
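Such an object cache might look something like the following Rust sketch (purely illustrative, not any specific kernel or RTOS API): everything is reserved up front, and "allocation" is just taking a free slot, never calling the system allocator.

```rust
/// A fixed-size pool reserved before the time-critical code runs.
struct Pool<T, const N: usize> {
    slots: [Option<T>; N],
}

impl<T, const N: usize> Pool<T, N> {
    fn new(mut init: impl FnMut() -> T) -> Self {
        Pool { slots: std::array::from_fn(|_| Some(init())) }
    }

    /// Take a pre-allocated object, or None when the pool is exhausted
    /// (instead of falling back to the allocator).
    fn take(&mut self) -> Option<T> {
        self.slots.iter_mut().find_map(|slot| slot.take())
    }

    /// Return an object to the first empty slot.
    fn give_back(&mut self, value: T) {
        if let Some(slot) = self.slots.iter_mut().find(|s| s.is_none()) {
            *slot = Some(value);
        }
    }
}

fn main() {
    // Sixteen 64-byte buffers reserved before the "interrupt handler" runs.
    let mut pool: Pool<[u8; 64], 16> = Pool::new(|| [0u8; 64]);
    let buf = pool.take().expect("pool exhausted");
    // ... use buf in the handler ...
    pool.give_back(buf);
}
```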
Metronome is all about partitioning run time into static GC/mutator slices, ensuring that mutator code has predictable performance. It's slower than if the mutator were running full-tilt, but it means that unless you manage to horribly exceed the GC's collection capacity, you're not going to see a pause.
That said, GCs are usually faster than manual management thanks to those pauses, if you look at total runtime (wall-clock time) and not latency. The most common algorithms trade latency for throughput and total wall-clock time.
Yeah, that's what I thought about interrupt handlers. (Also if you're allocating in your interrupt handler, your background thread can't allocate without either disabling interrupts or using some kind of lock-free synchronization.)
I still haven't read the Metronome paper.
I don't know if I'd go so far as to say GC is usually faster than manual memory management, even if we measure by throughput, but at least "often faster" is justifiable. I think usually you can find a way for manual memory management to be faster.
It uses a "linear" (actually affine) type system for memory management, with borrowed refs similar to Rust, but evidently without tracking mutability: https://github.com/carp-lang/Carp/blob/master/docs/Memory.md
The omission of the kind of mutability tracking that eases multithreading seems like a significant drawback for its target application domain.