Bootstrapping a multiplayer server with Elixir

ravi-delia · on July 29, 2021

This is the kind of thing that I love about Elixir. As great as Phoenix is, realistically you are going to be equally productive in any web framework. Elixir is my favorite language because it gives you primitives that perfectly suit your needs for just about anything running on a socket, especially once you add in Plug.

Throwing together a prototype is instant, but the awesome part is how quickly you can turn that prototype into something useful. More concurrency? Done. Need to sock away some state somewhere you can easily find and deal with it? Easy. It's easy to write, easy to reason about, and gives you fantastic composable tools to put something great together. It doesn't feel like a codebase so much as an incredibly well managed and cohesive OS.

princevegeta89 · on July 30, 2021

The learning curve is a little steep because of the functional aspect. However, the benefit was always worth my time. Our tiny server runs blazing fast in production and picks off background jobs incredibly fast.

ravi-delia · on July 30, 2021

I never really had trouble with that aspect of it, but I also never really mastered OO programming. The paradigm shift is probably tough, but I sucked enough beforehand that I didn't have to deal with it.

toastal · on July 30, 2021

Is the functional paradigm still considered something most people haven't learned in 2021?

masklinn · on July 30, 2021

Yes?

Most people might have experience with functional-ish features like HoFs and lambdas in imperative languages but Elixir, through Erlang, is much more of the package: immutable structures and bindings, expression-oriented, pattern matching, limited support for iteration. It’s not pure or anything, but it’s closer to OCaml than it is to C# or Javascript.

“I’ve used Array#map” doesn’t mean you know functional programming.

princevegeta89 · on July 30, 2021

Yes sir. People have fun doing .entryList() or map.each do {} end, which seems trivial, but with something as functional as Elixir we can only achieve this with the Enum module, for example. There is no iteration, only a bunch of functions from which to choose based on your requirement.
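
A minimal sketch of that point. The `planes` list and its fields are made up for the example; the `Enum` functions are the standard library ones. A filter-and-collect loop becomes `Enum.filter` plus `Enum.map`, an accumulator loop becomes `Enum.reduce`, and underneath it is plain recursion over the list:

```elixir
# Hypothetical data, only for illustration.
planes = [
  %{id: 1, altitude_ft: 3_000},
  %{id: 2, altitude_ft: 35_000},
  %{id: 3, altitude_ft: 12_000}
]

# Instead of a loop that appends to a mutable list: filter, then transform.
high_ids =
  planes
  |> Enum.filter(fn plane -> plane.altitude_ft > 10_000 end)
  |> Enum.map(fn plane -> plane.id end)

# Instead of a loop that updates a running total: reduce with an accumulator.
total_altitude = Enum.reduce(planes, 0, fn plane, acc -> acc + plane.altitude_ft end)

# The same sum written as explicit recursion over the list.
defmodule Sum do
  def altitudes([]), do: 0
  def altitudes([plane | rest]), do: plane.altitude_ft + altitudes(rest)
end

IO.inspect(high_ids)
IO.inspect(total_altitude)
IO.inspect(Sum.altitudes(planes))
```

Elixir also has `for` comprehensions, but they build a new collection rather than mutate anything in place.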

s3cur3 · on July 29, 2021

Hey folks, I’m Tyler! I’m the guy who did the original work on this. I’m happy to answer any questions!

You might also be interested in the discussion happening on the associated Twitter thread: https://twitter.com/elixirlang/status/1420782381825503240

Zababa · on July 29, 2021

Thanks for the article, that was a nice read! How many concurrent players do you think you could handle at most?

s3cur3 · on July 29, 2021

The limitation is more about the number of players we can have in the same area: if you had a bunch of players equally spaced around the globe, I'm pretty sure we'd saturate the Ethernet link before running out of CPU on a big (say, 32-core) VM. The total CPU cost (and outbound network bandwidth) for an area scales with the square of the number of players there who can potentially "see" each other.

In my scalability testing, we could handle upwards of 1000 players in the same 50 nm by 50 nm square on my 8 core dev workstation. That's pretty far beyond the limit of what would actually be _fun_ to fly the sim with. It becomes pandemonium even with 100 planes in the same area. We could easily do a dozen pockets of 100 planes in the same area even on our current 8 core VM.
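
As a back-of-the-envelope illustration of that quadratic scaling, assume every player in a pocket is sent every other player in the same pocket once per tick. That is a simplification for the arithmetic, not the server's actual accounting, and the `Scaling` module is hypothetical:

```elixir
defmodule Scaling do
  # Plane updates sent per tick when each of n mutually visible players
  # receives the other n - 1 planes.
  def updates_per_tick(players_in_pocket) do
    players_in_pocket * (players_in_pocket - 1)
  end
end

one_pocket_of_100 = Scaling.updates_per_tick(100)
one_pocket_of_1000 = Scaling.updates_per_tick(1000)
twelve_pockets_of_100 = 12 * Scaling.updates_per_tick(100)

IO.inspect(one_pocket_of_100)
IO.inspect(one_pocket_of_1000)
IO.inspect(twelve_pockets_of_100)
```

Under that assumption, 1200 players spread over a dozen pockets need 118,800 updates per tick, while 1000 players in a single pocket need 999,000. That is why density matters more than the total head count.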

Bayart · on July 29, 2021

>In my scalability testing, we could handle upwards of 1000 players in the same 50 nm by 50 nm square on my 8 core dev workstation. That's pretty far beyond the limit of what would actually be _fun_ to fly the sim with.

But it would make a great fit for an MMO. AFAIK there aren't any of those now leveraging Erlang / Elixir. If anything they're doing the opposite and trying to contain people in smaller and smaller instances.

dnautics · on July 29, 2021

Didn't square enix do something or another with elixir for an mmo?

bullen · on July 29, 2021

Do those 1000 send to all 1000 or are those 50nm squared partitioned so only players in a certain range see each other?

s3cur3 · on July 29, 2021

In general, we send each player all the planes that their in-game map could possibly show (which is usually something like 140 statute miles by 140 statute miles).

We could one day be smarter about this and only send you everything if your map is actually open, and otherwise limit it based on your field of view. That hasn't been a priority yet, though.
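
A rough sketch of that kind of filter, assuming planes are plain maps with `x`/`y` positions in statute miles on a flat grid. A real server would work from latitude and longitude and probably use a spatial index; the `AreaOfInterest` module and its field names are made up for the example:

```elixir
defmodule AreaOfInterest do
  # Side length of the square the in-game map can show, per the comment above.
  @map_side_miles 140

  # Returns the other planes that fall inside the viewer's map square.
  def visible_to(viewer, planes) do
    half = @map_side_miles / 2

    Enum.filter(planes, fn plane ->
      plane.id != viewer.id and
        abs(plane.x - viewer.x) <= half and
        abs(plane.y - viewer.y) <= half
    end)
  end
end

viewer = %{id: :a, x: 0.0, y: 0.0}
planes = [viewer, %{id: :b, x: 50.0, y: -60.0}, %{id: :c, x: 71.0, y: 0.0}]

visible_ids = viewer |> AreaOfInterest.visible_to(planes) |> Enum.map(& &1.id)
IO.inspect(visible_ids)
```

Plane `:c` is 71 miles east, just outside the 70-mile half-width, so only `:b` is sent to the viewer.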

temp_account001 · on July 29, 2021

Does X-Plane feature projectiles at all (guns, missiles)? If so, do those have to be sent over the network to deal with collisions with laggy hit boxes and such?

bullen · on July 30, 2021

afaik threads in erlang are processes so where do you store and how do you share the players positions between cores?

Zababa · on July 29, 2021

Thank you for the detailed answer. 50 nm here is 50 nautical miles I assume?

s3cur3 · on July 29, 2021

Correct! Sorry about that. :)

jbluepolarbear · on July 29, 2021

This is really cool.

How many players can you handle in close visible proximity?

What’s the bandwidth usage for spread out states?

What’s the bandwidth usage for the max player state in close proximity?

How much is client driven vs server?

Are you doing data compression on the plane states?

Probably easy to make predictions of a plane as it’s not changing speed or direction very fast, which is where I see the low poll rate coming in.

Looks like it was a fun project to work on.

brainless · on July 30, 2021

Hey Tyler, this is really nice to go through. I just heard your podcast (1). I do not know of X-Plane.

I want to ask: do you share the nearest-region data with all planes within a region only for informational purposes, or is there a crash possibility where you take planes out of the simulation, etc.?

I mean like in other MMO games, can players hurt each other and that information actually affects other client apps? That is where, I feel (I am new to this too), the global state mutations become more expensive right?

1. https://thinkingelixir.com/podcast-episodes/035-x-planes-eli...

thibaut_barrere · on July 29, 2021

Favorite quote:

"this article explores why they chose Elixir and how a team of one developer - without prior language experience - learned the language and deployed a well-received multiplayer experience in 6 months."

Thaxll · on July 29, 2021

Building a gameserver in Elixir is not great imo, performance is subpar. I work in online games and no one would want to use that language.

I guess they have low constraints since they only need to support 10000 players at a time and they want a single instance for that.

- Erlang runtime is slow

- Memory consumption is high

- GC is not very good ( vs Java / C# / Go )

- high cost in terms of performance because of immutability

It's not the kind of thing you want for a game.

To give you some perspective on popular FPS games, they run between 30 Hz and 120 Hz with up to 64 players and usually only use a single CPU core. And they're doing gameplay simulation / state replication / physics, sometimes AI, etc.

Also it must be pretty painful to not re-use their C++ client code on the server side, they have to re-implement twice? ( this concern is actually higher than the performance one )

Overall I think it's an impressive achievement for a single person to be able to do that in 6 months!

s3cur3 · on July 29, 2021

YMMV, but:

- Memory consumption in production peaks at like 300 MB for us

- Scalability (i.e., being able to go wide on, say, a 64-core CPU) was more important to us than raw, synchronous speed

- Garbage collection is not something I've ever had to think about with Elixir (which hasn't been my experience with Python and Java)

- Because it's a sim and not something like a FPS, we don't have to worry about cheating (what would cheating even mean?), so we don't have to do any physics on the server side

- Despite being way more familiar with C++, I can't imagine going from zero to production in 6 months (or even 2 years) if we'd tried to do this in C++

andrewmcwatters · on July 29, 2021

> what would cheating even mean?

This is such a rudimentary question in game development that I don’t know how you can’t answer it. Mainstream server authoritative design has been a thing since 1999, and probably even longer when you look beyond Quakeworld.

All it takes is a few malformed packets from a script kiddie and then you realize you have to do distance checking and other anti-cheat functionality.

“Cheating” in gamedev doesn’t specifically mean there is a win scenario and you need to make sure people don’t shortcut that and play unfairly.

It means in this case that someone can’t connect to your server and transport their plane to another nearby player and wave it around in their flight path to annoy others while they spam chat to grief people for fun and to produce a YouTube video.

And those checks are pretty simple.
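
For illustration, a distance check of that sort might look like the sketch below. The speed limit, the flat `{x, y}` coordinates in statute miles, and the `Plausibility` module are all assumptions made for the example, not anything from the X-Plane server:

```elixir
defmodule Plausibility do
  # Hypothetical cap; a real value would depend on the aircraft type.
  @max_speed_mph 2_000

  # True if moving from the old position to the new one within elapsed_s
  # seconds stays under the speed cap. Positions are {x, y} in statute miles.
  def plausible_move?({x0, y0}, {x1, y1}, elapsed_s) when elapsed_s > 0 do
    distance = :math.sqrt(:math.pow(x1 - x0, 2) + :math.pow(y1 - y0, 2))
    max_distance = @max_speed_mph * elapsed_s / 3600
    distance <= max_distance
  end
end

# A tenth of a mile in one second is fine; a 50-mile jump is a teleport.
IO.inspect(Plausibility.plausible_move?({0.0, 0.0}, {0.1, 0.0}, 1))
IO.inspect(Plausibility.plausible_move?({0.0, 0.0}, {50.0, 0.0}, 1))
```

An update that fails the check could be dropped or clamped; which one is a design decision the sketch does not make.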

elteto · on July 29, 2021

No need to get aggressive. It's a high fidelity simulator. There is no goal, no missions, no points, no wins or losses. You just fly.

There is probably very little the server is actually doing. Player state as seen from the server is probably just x,y,z coords, plane type, etc. Just the bare minimum to be able to draw another player in the game.

For that use case Elixir (or Go, or maybe even Python with more computational resources) is a very good fit.

jay_kyburz · on July 29, 2021

Yes but let's acknowledge that it's not really a game, and that nobody should do this for any kind of public game with winners and losers.

A multiplayer server needs to run the simulation for every player and be the authority, the clients should simply collect input from the player and run a "lite" simulation to hide latency.

JoeyJoJoJr · on July 29, 2021

No, not every multiplayer game needs an authoritative server. It depends largely on where your priorities are. For myself, having a base of players with some cheaters would be a great problem to have.

jay_kyburz · on July 30, 2021

Yeah but, the solution to your great problem is to write the authoritative server, so I would recommend you just do that in the first place. It's not really any extra work if you are starting a new project. (An authoritative server is just a headless client after all :) )

setr · on July 30, 2021

RTS games typically run deterministic simulations locally, and only sync inputs, with no real need for an authority. The anti-cheat mechanism is simply the fact that simulation state has to stay in sync, and artificial changes would trivially desync.

Of course, this leads to the standard RTS cheat — map hacking — because the client has to have everything by definition. But you have enough entities in an RTS that syncing full state, or even delta, gets ridiculous quickly, so it’s just something that has to be dealt with. Fighting games also tend to do something similar — deterministic lockstep with rollback — which also sensibly runs with no authority (again, you can’t teleport/wallhack because you’re only passing player inputs, and moving just your character is the same as the sim desyncing)

But my main point is that an authoritative server is not a guaranteed architecture for multiplayer games — the game design should definitely factor in (as it does in OP’s case).
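
A toy sketch of the lockstep idea described above: every peer applies the same inputs to the same state with a deterministic step function, then compares a checksum of the result. The `Lockstep` module, the state shape, and the choice of MD5 over the serialized state are assumptions for illustration only:

```elixir
defmodule Lockstep do
  # Deterministic step: state is a map of unit id => {x, y},
  # inputs are {unit_id, {dx, dy}} move commands for this tick.
  def step(state, inputs) do
    Enum.reduce(inputs, state, fn {unit, {dx, dy}}, acc ->
      Map.update!(acc, unit, fn {x, y} -> {x + dx, y + dy} end)
    end)
  end

  # Digest of the whole state that peers exchange to detect a desync.
  def checksum(state), do: :erlang.md5(:erlang.term_to_binary(state))
end

initial = %{tank: {0, 0}, scout: {5, 5}}
inputs = [{:tank, {1, 0}}, {:scout, {0, -1}}]

# Two honest peers run the same step on the same inputs.
peer_a = Lockstep.step(initial, inputs)
peer_b = Lockstep.step(initial, inputs)

# A cheater edits local state instead of sending an input.
cheater = Map.put(Lockstep.step(initial, inputs), :tank, {100, 100})

in_sync = Lockstep.checksum(peer_a) == Lockstep.checksum(peer_b)
desynced = Lockstep.checksum(peer_a) != Lockstep.checksum(cheater)
IO.inspect({in_sync, desynced})
```

Only the inputs cross the network, so a teleport shows up as a checksum mismatch rather than as a state the other peers have to accept.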

s3cur3 · on July 29, 2021

Sorry, to clarify, our thinking on this point was: our players aren't particularly motivated to lie to the server since there isn't a "goal" to a flight simulator. Given the choice between shipping (much, much) faster and completely locking down the experience against the vague, yet-to-be-realized threat of people sending fake packets to the server, we opted for the former.

networked · on July 29, 2021

Have you had any reports of multiplayer griefing so far? I wonder if griefing happens in serious flight simulators like X-Plane and what form it takes if it does. (If there are griefers, then they might conceivably find a way to exploit this. But I am asking mostly because I am curious about what the player community is like.)

s3cur3 · on July 29, 2021

Griefing, yes. The audience for our mobile app skews young and, shall we say, less "serious aviators" than the desktop sim, so to some degree people like that you can buzz a 747 on short final in an F-22. :D

Long-term we've talked about things like a reputation system, where flying responsibly would earn you points to get into a separate "world" filled with people who also want to do more serious flying.

nerdponx · on July 29, 2021

I would also wonder about vectors for causing vandalism (botting your client to write racist words in the sky with a fleet of stunt planes emitting smoke trails), or other general forms of malice (trying to DoS or otherwise crash the server).

Presumably X-Plane is small enough and "boring" enough that your only malicious interactions would be lazy drive-by attempts, but those kinds of situations would be more concerning to me than someone modding the client to make their plane fly at Mach 7.

andy_ppp · on July 29, 2021

You’re misunderstanding; there isn’t the concept of winning in a simulator like this. It’s not goal-driven. It’s about practicing and modelling the real world, not showing how many points you get when you land a plane.

andrewmcwatters · on July 29, 2021

I’m not misunderstanding, it’s just clear that they have virtually no requirements. As soon as you have any meaningful requirements, this entire architecture probably falls apart, let alone the fact that you can’t hire for it, as no one is going to want to work on your obscure game server.

The criteria are also so brittle at those concurrent numbers that if the game decided to do any more than what they do today, you would face engineering challenges more difficult than other game server software.

As you deal with more and more concurrent clients there are calculable limits to how you can facilitate more features and how bandwidth and server frame time budget limits you.

Zababa · on July 29, 2021

That's just a "No true Scotsman". Your definition of "virtually no requirements" also seems to be circular with this architecture failing or not. People have created a chatroom with 2 million people connected at the same time with Elixir on a single box https://www.phoenixframework.org/blog/the-road-to-2-million-.... Your ideas about performance are not based in reality.

andrewmcwatters · on July 29, 2021

[flagged]

remh · on July 29, 2021

You are being disagreeable. Requirements are requirements, sometimes they are easier, sometimes they are more complex. But you dismissing an engineer who successfully shipped something on production that met their requirements because you think you have "industry advice" is pure arrogance.

Every piece of software is written with the actual requirements in mind, defined (usually) by the people who know the context they are working with the best. Not by armchair specialists.

Zababa · on July 29, 2021

I don't understand your point, you're building a strawman and then attacking it. This architecture works for them, and will probably keep doing so as you can see on the other response.

> Experience shows there’s just no reason to talk to people about these specifics because even though they’ve never implemented it before or even done the math, you definitely know better than I do.

But they DID implement it and it works perfectly well for them. I don't understand why you're splitting hairs about fantasy scenarios. If they had a different problem, they would solve it differently. If you've faced something like that, feel free to share your story, what the constraints were and how you solved them.

andy_ppp · on July 29, 2021

Using the right tool for the right job is important; here they chose Elixir and it’s working well for them. There’s no need to be this disgruntled with your ill-informed opinions about the Erlang VM (gc not very good - no, it’s just optimised for different things, latency being one). The Elixir core team have recently released Nx (Numerical Elixir), which allows much faster maths and allows mutability as the calculations happen outside of the runtime, so you might want to look at this before you make assumptions.

anonymousDan · on July 29, 2021

To claim the GC of the JVM as an advantage vs Elixir is farcical; the lack of shared state and the independent heaps are one of the main advantages of Erlang vs stop-the-world GC with the JVM.

h0l0cube · on July 29, 2021

Another recent advance is the BEAM JIT. And I'm pretty sure you can call native code using 'nifs' for physics calculations etc. I'm wondering if the OP is basing their opinion on recent performance metrics, or just speculation, or something they heard?

masklinn · on July 30, 2021

TBF it depends on the workload and architecture (likewise, say, Go v JVM GC).

If you can split your workload in tons of actors the beam GC works great. If you need very large synchronous actors (even just a few) then it’s excruciating, so is splitting a simple single-threaded sequential workload into a concurrent mess just so the GC does not eat itself.

Of course you probably should not be using erlang/elixir then. But let’s not pretend the beam gc doesn’t have issues and pitfalls.

anonymousDan · on July 30, 2021

Can you give an example of what you mean by 'synchronous actors'? Blocking on what exactly? The point is surely that elixir/Erlang makes it difficult to write concurrent code with lots of global shared state in the first place no?

masklinn · on July 30, 2021

> Can you give an example of what you mean by 'synchronous actors'?

Something like simulation (e.g. physics), a sequential long-running CPU-bound process.

> Blocking on what exactly?

I never used the word "blocking", I'm not sure where you imagined it from.

> The point is surely that elixir/Erlang makes it difficult to write concurrent code with lots of global shared state in the first place no?

Never used the words "global" or "shared" either.

I used the words "sequential" and "single-threaded" though, which you apparently completely swung by. The entire comment was about non-naturally-concurrent workloads which would not trivially map to concurrent actors (or even not be parallelisable at all).

In which case the GC is just a fairly simplistic STW.

I sadly don't remember the name even though I've been scratching my head, but way back when (in the early aughts when I first played with erlang, before SMT and Programming Erlang) one of the major "public" erlang codebases was a GUI-based tool which did basically everything in one enormous process, a complete misuse of erlang and completely unsuitable to it.

anonymousDan · on July 30, 2021

Thanks, yes usually when I hear synchronous I think of some kind of blocking call - as you say not really what you were intending here. I hadn't really thought about GC latency for a single threaded app. Is the erlang GC any worse than e.g. the JVM for this case?

masklinn · on July 31, 2021

> I hadn't really thought about GC latency for a single threaded app. Is the erlang GC any worse than e.g. the JVM for this case?

It is a lot, lot worse yes.

Because of the purpose of Erlang / Elixir, and the way the runtime functions, the GC is rather simple (though not trivial): it's a stop-the-world generational (2) semispace collector. So on a GC run, the execution stops, the GC acquires a new empty heap, scans the stack (the "root-set"), traverses the tree of heap objects, and each heap object it finds is copied to the new heap (the actual process is a bit different but that's the idea).

That works well, and the generational hypothesis applies nicely because Erlang only has immutable data structures so unlike Java and friends an Erlang object can not refer to something younger than it is. However it generates a lot of garbage as an "update" requires creating a new object.

It's quite simple, and has good (though not amazing) throughput, but it has horrible latency.

The trick is, the GC works per process. So the "world" it stops is a single process, and all the stack scanning and faffing about is per-process, meaning on a stop it might have to deal with kilobytes of data, megabytes at the absolute worst. It doesn't need to scan the entire runtime and go through gigabytes of heap as Java commonly does, because Erlang processes are shared-nothing (aside from the global ref-counted "shared heap" but that's a bit of a special case). This means with "normal" usages (normal for BEAM) a given GC run has very little memory to scan and not too much work to do per collection, it's essentially leveraging the other characteristics of the language to create an emergent concurrent low-latency collector, the concurrency and latency are not part of the design of the GC, but instead part of the system the GC is used in.

All of that falls over when you start moving away from lots of small actors, and towards few big actors. Then the size of the stack increases, the amount of garbage explodes, and the concurrency drops precipitously, because the GC's "world" covers a larger and larger amount of the program's surface.
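
The per-process heaps can be observed directly. The sketch below only uses `Process.info/2` and `:erlang.garbage_collect/1`; the two processes and the list size are made up for the demonstration. One process holds a large list, the other holds almost nothing, and collecting the first does not involve the second:

```elixir
parent = self()

# A process that builds a large list and keeps it alive while it waits.
big =
  spawn(fn ->
    data = Enum.to_list(1..100_000)
    send(parent, :ready)

    receive do
      :stop -> length(data)
    end
  end)

# A process that holds almost nothing.
small =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

# Wait until the big process has actually built its list.
receive do
  :ready -> :ok
end

# Heap sizes are reported per process, in machine words.
{:total_heap_size, big_words} = Process.info(big, :total_heap_size)
{:total_heap_size, small_words} = Process.info(small, :total_heap_size)
IO.puts("big: #{big_words} words, small: #{small_words} words")

# Collecting `big` scans only its own heap; `small` is not involved.
true = :erlang.garbage_collect(big)
```

The larger a single process's heap gets, the more that one collection has to copy, which is the "few big actors" failure mode described above.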

Zababa · on July 29, 2021

They did implement their game server in Elixir, and it performs great for them in reality. From what I understand, they are supporting 10 000 players with a low CPU and memory usage, so if this scales linearly they should be able to support 100k players. You're also ignoring the reliability aspect, which is why they chose Elixir (and the BEAM) in the first place. Performance is also less of a problem since they're on the server, and thus can add more hardware as needed.

Could you share a bit more about your experience in the industry? I think you and they have different use cases, which makes you reach different conclusions.

namelosw · on July 30, 2021

> GC is not very good ( vs Java / C# / Go )

IMHO it’s the opposite, at least in the context of games: In languages like Java and C#, the GC is global and freezes the world. The time would be in terms of milliseconds, while in Erlang the GC is per-process, in terms of microseconds.

So in terms of GC and latency, Erlang is not as good as C++ and Rust, but it should be better than Java and C#.

> Also it must be pretty painful to not re-use their C++ client code on the server side, they have to re-implement twice?

I haven’t dived into the problem but my intuition tells me it could be done through a NIF. It’s a pretty common practice to use Rust to complement Elixir in performance hotspots.

yellowapple · on July 30, 2021

Re: performance issues, offloading CPU-intensive operations to NIFs (or ports, if I/O bandwidth needs are low enough and you need to run something long-running) is a pretty common strategy for speeding up Erlang (and therefore Elixir) applications. It doesn't seem like the X-Plane folks have felt the need to do this, but it's definitely a possibility should they run into some computational bottleneck.

Re: GC, the Erlang VM does it per-process ("process" in the EVM's sense, i.e. a preemptively-scheduled userspace/"green" thread rather than an OS process), so not only are GC operations fast, but they also only affect the process being GC'd. The immutability is also a factor here; since data is copied instead of modified in place, it's arguably a lot easier to know when to discard unused objects (namely: for an object to persist within a process, the recursive function(s) backing some process has to explicitly pass it along to the next invocation, so that's a natural place to discard everything else).
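
A minimal sketch of the "recursive function backing a process" pattern mentioned above. The `Counter` module is hypothetical; `spawn`, `send`, and `receive` are the standard primitives. The only state that survives from one message to the next is what `loop/1` passes to its next invocation:

```elixir
defmodule Counter do
  def start(initial), do: spawn(fn -> loop(initial) end)

  # Everything not passed along to the next call of loop/1 becomes garbage.
  defp loop(count) do
    receive do
      :increment ->
        loop(count + 1)

      {:get, caller} ->
        send(caller, {:count, count})
        loop(count)
    end
  end
end

pid = Counter.start(0)
send(pid, :increment)
send(pid, :increment)
send(pid, {:get, self()})

count =
  receive do
    {:count, n} -> n
  after
    1_000 -> :timeout
  end

IO.inspect(count)
```

In practice you would usually reach for `GenServer` or `Agent`, which wrap this same loop.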

tudelo · on July 29, 2021

So, since you seem to have some knowledge here, what would you suggest for a single engineer trying to make a somewhat scalable multiplayer game? Either instanced with <64 players or with a larger user count.

I did some experimenting with rolling my own networking layer in c++ and it was a lot of work, and tried to do it through unity but it seems to require quite a large investment in time to understand how networking fits in the overall picture of the game. But maybe I'm looking for magic where it does not or can not exist :)

Thaxll · on July 29, 2021

The way to go is usually to re-use your game client and strip the rendering part.

ravi-delia · on July 29, 2021

I don't disagree with you on running games on the BEAM (although slow is a little too far imo), but in this case what they're doing is using Elixir to connect clients with the server. The actual physics simulation and such isn't running on Elixir.

edit: I just read through the article, and it doesn't seem to say one way or the other. 99% sure they aren't running any kind of physics engine on the BEAM, but there isn't anything specific

Thaxll · on July 29, 2021

The article is not clear on what the server is actually doing, for example whether it's just replicating state between players through that hub or doing simulation.

Either way they had to re-implement RakNet, which is annoying because from now on every time they add / change something in the network layer they won't be able to re-use the client code in the server.

I'm curious on why they did not go with C++.

Zababa · on July 29, 2021

Answered here: https://news.ycombinator.com/item?id=28000050

"Despite being way more familiar with C++, I can't imagine going from zero to production in 6 months (or even 2 years) if we'd tried to do this in C++"

s3cur3 · on July 29, 2021

This post on our dev blog gives some detail on the language decision here:

https://developer.x-plane.com/2021/01/have-you-heard-the-goo...

jbluepolarbear · on July 30, 2021

What would you choose to build this same feature set? Would it be feasible for 1 person in 6 months? What if C++ isn’t an option?

andrewmcwatters · on July 29, 2021

Only on HN can you provide specific industry advice and be downvoted for your insight because your opinion wasn’t agreeable.

I upvoted your comment to bring it out of 0 or less because these concerns are valid and can make or break a project when building multiplayer servers.

My relevant background here is in building multiplayer server software where a minimum of 1000 concurrent players is a required goal metric.

Zababa · on July 29, 2021

I don't think the downvoting is linked to opinions, it's linked to facts. Why should "specific industry advice" be preferred over working code? That sounds dogmatic to me.

> My relevant background here is in building multiplayer server software where a minimum of 1000 concurrent players is a required goal metric.

They are handling 10k concurrent players with low resources usage.

andrewmcwatters · on July 29, 2021

That’s really great, but as soon as you understand their specific requirements it’s simple to go from 100000 concurrent clients being feasible to being impossible within compute budget constraints.

You can go from a “game” with a million “concurrent” clients that do absolutely nothing to one with thousands of clients where the tick rate makes that impossible.[1]

[1]: https://www.planimeter.org/grid-sdk/api/Tick_rate_and_bandwi...

Zababa · on July 29, 2021

I don't really understand your point. Are you saying that the resources used will scale exponentially with the number of players? If that's the case, I can see how this could make things difficult in the long run but caring about it now sounds a bit like premature optimization to me. Maybe the players don't even want to be all on the same world?

andrewmcwatters · on July 29, 2021

[flagged]

Zababa · on July 29, 2021

Sorry if I offended you.

Vgoose · on July 29, 2021

It used to be that every few weeks or so I would read something cool about Elixir. These past two weeks it's down to every few days.

I think someone's trying to tell me something.

sergiotapia · on July 29, 2021

I've been writing Elixir since 2016 and I don't see myself switching anytime soon. It's a beautiful language to use, just nice and lovely. The functional aspects mean every function has no side effect. It's very very liberating.

thibaut_barrere · on July 29, 2021

It is true that _most_ functions do not have side effects and that this is very liberating (makes it easier to refactor and maintain). Also, most of the time when there is a side effect, it is clearly visible (due to using a stateful abstraction or library).

So all in all while stating "every function has no side effect" is incorrect, I still share your overall feeling.

dudul · on July 29, 2021

> The functional aspects mean every function has no side effect.

This is not accurate. Any function can print, can call a DB or a web service.

"Purity" (as in referentially transparent) is tangential to functional programming. It can be achieved with other paradigms. That being said, functional programming does encourage it.
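
A tiny illustration of the distinction, with a made-up `Purity` module: both functions return the same value, but only the first is referentially transparent, and nothing in the language stops you from writing the second.

```elixir
defmodule Purity do
  # Pure: same input, same output, and nothing else happens.
  def double(x), do: x * 2

  # Not pure: prints to stdout as a side effect before returning.
  def double_and_log(x) do
    IO.puts("doubling #{x}")
    x * 2
  end
end

IO.inspect(Purity.double(2))
IO.inspect(Purity.double_and_log(2))
```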

c4wrd · on July 29, 2021

Are there any open source implementations of game servers/highly concurrent production servers like this for reference (or at least blog posts that contain a good amount of code)? I can find a lot of toy examples, but I'd love to see some insight into an actual implementation of a product that is open source as I'm learning Elixir.

s3cur3 · on July 29, 2021

The core of the X-Plane server is our RakNet UDP protocol, which is open sourced under the MIT license here:

https://github.com/X-Plane/elixir-raknet

It's not a full game server, but the "Usage" section of the README provides a sketch of what the rest of the server (the part that implements the business logic) looks like.

auraham · on July 29, 2021

I am learning Elixir and I'd love to see a tiny game made in Elixir to play with and learn from the code. Any examples?

masklinn · on July 30, 2021

I really don’t think games are the sort of things Elixir would be well suited to, in general.

Games would generally be going against the grain of the language, not easily leveraging its strengths and really needing areas where it’s weak.

A game server, however, could be good. At least if you don’t need very high performance (e.g. if you need fully accurate physics sim on the server then probably not, plus you’d want to run the game’s own sim anyway).

z3t4 · on July 29, 2021

Elixir might be a nice language, but you could probably also implement the server using socat... That said, scaling to tens of thousands of concurrent players is like the holy grail, especially if you want to shoot stuff and there needs to be hit detection, and it has to be safe from client modifications/hacks.

hinkley · on July 30, 2021

I recall when I used to read the Eve Online technical blog, thinking they were reinventing Erlang, badly.

Simonflare · on July 29, 2021

Who else puts Elixir in their Tinder bio?

cpursley · on July 29, 2021

Of course, it's the best way to get lots of concurrent connections!

conradfr · on July 29, 2021

And hopefully a lot of pattern matching?

syspec · on July 30, 2021

HN != Reddit