This is the kind of thing that I love about Elixir. As great as Phoenix is, realistically you are going to be equally productive in any web framework. Elixir is my favorite language because it gives you primitives that perfectly suit your needs for just about anything running on a socket, especially once you add in Plug.
Throwing together a prototype is instant, but the awesome part is how quickly you can turn that prototype into something useful. More concurrency? Done. Need to sock away some state somewhere you can easily find and deal with it? Easy. It's easy to write, easy to reason about, and gives you fantastic composable tools to put something great together. It doesn't feel like a codebase so much as an incredibly well managed and cohesive OS.
The learning curve is a little steep because of the functional aspect. However, the benefit was always worth my time. Our tiny server runs blazing fast in production and picks off background jobs incredibly fast.
I never really had trouble with that aspect of it, but I also never really mastered OO programming. The paradigm shift is probably tough, but I sucked enough beforehand that I didn't have to deal with it.
Most people might have experience with functional-ish features like HoFs and lambdas in imperative languages, but Elixir, through Erlang, is much more of the package: immutable structures and bindings, expression-oriented, patterns, limited support for iteration. It’s not pure or anything, but it’s closer to OCaml than it is to C# or JavaScript.
“I’ve used Array#map” doesn’t mean you know functional programming.
Yes sir. People would have fun doing .entryList() or map.each do {} end, which seems trivial, but with something as functional as Elixir we can only achieve this with the Enum module, for example. There is no iteration, only a bunch of functions from which to choose based on your requirement.
The limitation is more about the number of players we can have in the same area—if you had a bunch of players equally spaced around the globe, I'm pretty sure we'd saturate the Ethernet link before running out of CPU on a big (say, 32-core) VM. The CPU cost (and outbound network bandwidth) of a player scales with the square of the number of other players who can potentially "see" them.
In my scalability testing, we could handle upwards of 1000 players in the same 50 nm by 50 nm square on my 8 core dev workstation. That's pretty far beyond the limit of what would actually be _fun_ to fly the sim with. It becomes pandemonium even with 100 planes in the same area. We could easily do a dozen pockets of 100 planes in the same area even on our current 8 core VM.
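For a rough sense of what "scales with the square" means here, a back-of-the-envelope sketch in Python follows. It assumes every player in a pocket can see every other player in that pocket and that cost is proportional to the number of (observer, observed) pairs. The function name `visibility_pairs` is made up for illustration and is not taken from the actual server.

```python
def visibility_pairs(players_per_pocket: int, pockets: int = 1) -> int:
    """Count directed (observer, observed) pairs.

    Assumption: everyone in a pocket sees everyone else in the same pocket,
    and nobody sees players in other pockets.
    """
    return pockets * players_per_pocket * (players_per_pocket - 1)


one_big_pocket = visibility_pairs(1000)             # 1000 planes in one area
dozen_pockets = visibility_pairs(100, pockets=12)   # 12 pockets of 100 planes

print(one_big_pocket)  # 999000
print(dozen_pockets)   # 118800
```

Under those assumptions, a dozen pockets of 100 planes (1200 players in total) costs roughly an eighth of what a single pocket of 1000 planes does.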
>In my scalability testing, we could handle upwards of 1000 players in the same 50 nm by 50 nm square on my 8 core dev workstation. That's pretty far beyond the limit of what would actually be _fun_ to fly the sim with.
But it would make a great fit for an MMO. AFAIK there aren't any of those now leveraging Erlang / Elixir. If anything they're doing the opposite and trying to contain people in smaller and smaller instances.
In general, we send each player all the planes that their in-game map could possibly show (which is usually something like 140 statute miles by 140 statute miles).
We could one day be smarter about this and only send you everything if your map is actually open, and otherwise limit it based on your field of view. That hasn't been a priority yet, though.
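The "send everything the map could show" rule can be sketched as a simple bounding-box filter. This is only an illustration in Python, not the real server code: it assumes a flat local grid measured in statute miles and a map centered on the viewer, and the names `Plane` and `planes_to_send` are hypothetical.

```python
from dataclasses import dataclass

MAP_SIDE_MILES = 140.0  # approximate map size mentioned above


@dataclass(frozen=True)
class Plane:
    plane_id: str
    x_miles: float  # east-west position on a flat local grid (assumption)
    y_miles: float  # north-south position on the same grid (assumption)


def planes_to_send(viewer, planes, side=MAP_SIDE_MILES):
    """Return the other planes inside a square of `side` miles centered on the viewer."""
    half = side / 2
    return [
        p for p in planes
        if p.plane_id != viewer.plane_id
        and abs(p.x_miles - viewer.x_miles) <= half
        and abs(p.y_miles - viewer.y_miles) <= half
    ]


viewer = Plane("me", 0.0, 0.0)
near = Plane("near", 50.0, -60.0)   # inside the 140 x 140 mile square
far = Plane("far", 80.0, 0.0)       # 80 miles east, outside the square
visible = planes_to_send(viewer, [viewer, near, far])
```

A field-of-view limit like the one floated above would presumably replace the square test with a narrower predicate, leaving the rest of the filter as is.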
Does X-Plane feature projectiles at all (guns, missiles)? If so, do those have to be sent over the network to deal with collisions with laggy hit boxes and such?
Hey Tyler, this is really nice to go through. I just heard your podcast (1). I do not know of X-Plane.
I want to ask: do you share the nearest-region data to all planes within a region only for informational purpose or is there a crash possibility and you take planes out of the simulation, etc.?
I mean like in other MMO games, can players hurt each other and that information actually affects other client apps? That is where, I feel (I am new to this too), the global state mutations become more expensive right?
"this article explores why they chose Elixir and how a team of one developer - without prior language experience - learned the language and deployed a well-received multiplayer experience in 6 months."
Building a gameserver in Elixir is not great imo, performance is subpar. I work in online games and no one would want to use that language.
I guess they have loose constraints since they only need to support 10,000 players at a time and they want a single instance for that.
- Erlang runtime is slow
- Memory consumption is high
- GC is not very good (vs Java / C# / Go)
- High cost in terms of performance because of immutability
It's not the kind of thing you want for a game.
To give you some perspective on popular FPS games, they run between 30 Hz and 120 Hz with up to 64 players and usually only use a single CPU core. And they're doing gameplay simulation, state replication, physics, sometimes AI, etc.
Also it must be pretty painful to not re-use their C++ client code on the server side. Do they have to implement everything twice? (This concern is actually bigger than the performance one.)
Overall I think it's an impressive achievement for a single person to do that in 6 months!
- Memory consumption in production peaks at like 300 MB for us
- Scalability (i.e., being able to go wide on, say, a 64-core CPU) was more important to us than raw, synchronous speed
- Garbage collection is not something I've ever had to think about with Elixir (which hasn't been my experience with Python and Java)
- Because it's a sim and not something like a FPS, we don't have to worry about cheating (what would cheating even mean?), so we don't have to do any physics on the server side
- Despite being way more familiar with C++, I can't imagine going from zero to production in 6 months (or even 2 years) if we'd tried to do this in C++
This is such a rudimentary question in game development that I don’t know how you can’t answer it. Mainstream server authoritative design has been a thing since 1999, and probably even longer when you look beyond Quakeworld.
All it takes is a few malformed packets from a script kiddie and then you realize you have to do distance checking and other anti-cheat functionality.
“Cheating” in gamedev doesn’t specifically mean there is a win scenario and you need to make sure people don’t shortcut that and play unfairly.
It means in this case that someone can’t connect to your server and transport their plane to another nearby player and wave it around in their flight path to annoy others while they spam chat to grief people for fun and to produce a YouTube video.
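To illustrate what "distance checking" could look like, here is a minimal server-side plausibility check sketched in Python. The speed cap and the function name `is_plausible_move` are assumptions picked for the example. A real server would need per-aircraft limits and some tolerance for latency and packet loss.

```python
import math

MAX_SPEED_M_PER_S = 1000.0  # hypothetical cap, roughly Mach 3 at sea level


def is_plausible_move(prev_pos, new_pos, dt_seconds, max_speed=MAX_SPEED_M_PER_S):
    """Reject a position update if it implies travelling faster than max_speed.

    prev_pos and new_pos are (x, y, z) tuples in meters.
    """
    if dt_seconds <= 0:
        return False  # updates must move forward in time
    distance = math.dist(prev_pos, new_pos)
    return distance <= max_speed * dt_seconds


ok = is_plausible_move((0.0, 0.0, 0.0), (100.0, 0.0, 0.0), dt_seconds=1.0)
teleport = is_plausible_move((0.0, 0.0, 0.0), (50_000.0, 0.0, 0.0), dt_seconds=1.0)
```

A check like this would catch the "teleport next to another player" case described above, though not someone who simply flies there and misbehaves.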
No need to get aggressive. It's a high fidelity simulator. There is no goal, no missions, no points, no wins or losses. You just fly.
There is probably very little the server is actually doing. Player state as seen from the server is probably just x,y,z coords, plane type, etc. Just the bare minimum to be able to draw another player in the game.
For that use case Elixir (or Go, or maybe even Python with more computational resources) is a very good fit.
Yes, but let's acknowledge that it's not really a game, and that nobody should do this for any kind of public game with winners and losers.
A multiplayer server needs to run the simulation for every player and be the authority; the clients should simply collect input from the player and run a "lite" simulation to hide latency.
No, not every multiplayer game needs an authoritative server. It depends largely on where your priorities are. For myself, having a base of players with some cheaters would be a great problem to have.
Yeah but, the solution to your great problem is to write the authoritative server, so I would recommend you just do that in the first place. It's not really any extra work if you are starting a new project. (An authoritative server is just a headless client after all :) )
RTS games typically run deterministic simulations locally, and only sync inputs, with no real need for an authority. The anti-cheat mechanism is simply the fact that simulation state has to stay in sync, and artificial changes would trivially desync.
Of course, this leads to the standard RTS cheat — map hacking — because the client has to have everything by definition. But you have enough entities in an RTS that syncing full state, or even delta, gets ridiculous quickly, so it’s just something that has to be dealt with. Fighting games also tend to do something similar — deterministic lockstep with rollback — which also sensibly runs with no authority (again, you can’t teleport/wallhack because you’re only passing player inputs, and moving just your character is the same as the sim desyncing)
But my main point is that an authoritative server is not a guaranteed architecture for multiplayer games — the game design should definitely factor in (as it does in OP’s case).
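A toy Python sketch of the lockstep idea: peers exchange only inputs, run the same deterministic step function, and compare a checksum of their state. Everything here (`step`, `checksum`, the two-unit state) is invented for illustration. Real RTS and fighting-game netcode also has to handle input delay, rollback and floating-point determinism, none of which is modeled.

```python
import hashlib


def step(state: tuple, inputs: tuple) -> tuple:
    """Deterministic toy simulation: each unit's position moves by its input."""
    return tuple(pos + move for pos, move in zip(state, inputs))


def checksum(state: tuple) -> str:
    """Hash of the full simulation state, cheap to send alongside the inputs."""
    return hashlib.sha256(repr(state).encode()).hexdigest()


# Both peers start from the same state and apply the same input stream.
peer_a = peer_b = (0, 0)
for inputs in [(1, 0), (0, 2), (3, 1)]:
    peer_a = step(peer_a, inputs)
    peer_b = step(peer_b, inputs)

in_sync = checksum(peer_a) == checksum(peer_b)

# A peer that edits its local state (e.g. teleports a unit) no longer matches.
tampered = (peer_b[0] + 100, peer_b[1])
desync_detected = checksum(peer_a) != checksum(tampered)
```

Note that the checksum only covers state the simulation tracks. Reading that state without changing it, as a map hack does, never produces a desync, which matches the caveat above.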
Sorry, to clarify, our thinking on this point was: our players aren't particularly motivated to lie to the server since there isn't a "goal" to a flight simulator. Given the choice between shipping (much, much) faster and completely locking down the experience against the vague, yet-to-be-realized threat of people sending fake packets to the server, we opted for the former.
Have you had any reports of multiplayer griefing so far? I wonder if griefing happens in serious flight simulators like X-Plane and what form it takes if it does. (If there are griefers, then they might conceivably find a way to exploit this. But I am asking mostly because I am curious about what the player community is like.)
Griefing, yes. The audience for our mobile app skews young and, shall we say, less "serious aviators" than the desktop sim, so to some degree people like that you can buzz a 747 on short final in an F-22. :D
Long-term we've talked about things like a reputation system, where flying responsibly would earn you points to get into a separate "world" filled with people who also want to do more serious flying.
I would also wonder about vectors for causing vandalism (botting your client to write racist words in the sky with a fleet of stunt planes emitting smoke trails), or other general forms of malice (trying to DoS or otherwise crash the server).
Presumably X-Plane is small enough and "boring" enough that your only malicious interactions would be lazy drive-by attempts, but those kinds of situations would be more concerning to me than someone modding the client to make their plane fly at Mach 7.
You’re misunderstanding: there isn’t a concept of winning in a simulator like this. It’s not goal-driven; it’s about practicing and modelling the real world, not showing how many points you get when you land a plane.
I’m not misunderstanding, it’s just clear that they have virtually no requirements. As soon as you have any meaningful requirements, this entire architecture probably falls apart, let alone the fact that you can’t hire for it, as no one is going to want to work on your obscure game server.
The criteria are also so brittle at those concurrent numbers that if the game decided to do any more than what they do today, you would face engineering challenges more difficult than other game server software.
As you deal with more and more concurrent clients there are calculable limits to how you can facilitate more features and how bandwidth and server frame time budget limits you.
That's just a "No true Scotsman". Your definition of "virtually no requirements" also seems to be circular with this architecture failing or not. People have created a chatroom with 2 million people connected at the same time with Elixir on a single box https://www.phoenixframework.org/blog/the-road-to-2-million-.... Your ideas about performance are not based in reality.
You are being disagreeable.
Requirements are requirements, sometimes they are easier, sometimes they are more complex.
But you dismissing an engineer who successfully shipped something to production that met their requirements, because you think you have "industry advice", is pure arrogance.
Every piece of software is written with the actual requirements in mind, defined (usually) by the people who know the context they are working with the best. Not by armchair specialists.
I don't understand your point, you're building a strawman and then attacking it. This architecture works for them, and will probably keep doing so as you can see on the other response.
> Experience shows there’s just no reason to talk to people about these specifics because even though they’ve never implemented it before or even done the math, you definitely know better than I do.
But they DID implement it and it works perfectly well for them. I don't understand why you're splitting hairs about fantasy scenarios. If they had a different problem, they would solve it differently. If you've faced something like that, feel free to share your story: what were the constraints and how did you solve them?
Using the right tool for the right job is important. Here they chose Elixir and it’s working well for them. There’s no need to be this disgruntled with your ill-informed opinions about the Erlang VM (GC not very good? No, it’s just optimised for different things, latency being one). The Elixir core team have recently released nx-elixir which allows much faster maths and allows mutability as the calculations happen outside of the runtime, so you might want to look at this before you make assumptions.
To claim the GC of the JVM as an advantage vs Elixir is farcical: the lack of shared state (independent heaps) is one of the main advantages of Erlang vs stop-the-world GC with the JVM.
Another recent advance is the BEAM JIT. And I'm pretty sure you can call native code using NIFs for physics calculations etc. I'm wondering if the OP is basing their opinion on recent performance metrics, or just speculation, or something they heard?
TBF it depends on the workload and architecture (likewise, say, Go v JVM GC).
If you can split your workload into tons of actors, the BEAM GC works great. If you need very large synchronous actors (even just a few) then it’s excruciating, and so is splitting a simple single-threaded sequential workload into a concurrent mess just so the GC does not eat itself.
Of course you probably should not be using Erlang/Elixir then. But let’s not pretend the BEAM GC doesn’t have issues and pitfalls.
Can you give an example of what you mean by 'synchronous actors'? Blocking on what exactly? The point is surely that elixir/Erlang makes it difficult to write concurrent code with lots of global shared state in the first place no?
> Can you give an example of what you mean by 'synchronous actors'?
Something like simulation (e.g. physics), a sequential long-running CPU-bound process.
> Blocking on what exactly?
I never used the word "blocking", I'm not sure where you imagined it from.
> The point is surely that elixir/Erlang makes it difficult to write concurrent code with lots of global shared state in the first place no?
Never used the words "global" or "shared" either.
I used the words "sequential" and "single-threaded" though, which you apparently completely swung by. The entire comment was about non-naturally-concurrent workloads which would not trivially map to concurrent actors (or even not be parallelisable at all).
In which case the GC is just a fairly simplistic STW.
I sadly don't remember the name even though I've been scratching my head, but way back when (in the early aughts when I first played with Erlang, before SMT and Programming Erlang) one of the major "public" Erlang codebases was a GUI-based tool which did basically everything in one enormous process: a complete misuse of Erlang, and completely unsuited to it.
Thanks, yes usually when I hear synchronous I think of some kind of blocking call - as you say not really what you were intending here. I hadn't really thought about GC latency for a single threaded app. Is the erlang GC any worse than e.g. the JVM for this case?
> I hadn't really thought about GC latency for a single threaded app. Is the erlang GC any worse than e.g. the JVM for this case?
It is a lot, lot worse yes.
Because of the purpose of Erlang / Elixir, and the way the runtime functions, the GC is rather simple (though not trivial): it's a stop-the-world generational (2) semispace collector. So on a GC run, the execution stops, the GC acquires a new empty heap, scans the stack (the "root-set"), traverses the tree of heap objects, and each heap object it finds is copied to the new heap (the actual process is a bit different but that's the idea).
That works well, and the generational hypothesis applies nicely because Erlang only has immutable data structures, so unlike Java and friends an Erlang object cannot refer to something younger than it is. However it generates a lot of garbage, as an "update" requires creating a new object.
It's quite simple, and has good (though not amazing) throughput, but it has horrible latency.
The trick is, the GC works per process. So the "world" it stops is a single process, and all the stack scanning and faffing about is per-process, meaning on a stop it might have to deal with kilobytes of data, megabytes at the absolute worst. It doesn't need to scan the entire runtime and go through gigabytes of heap as Java commonly does, because Erlang processes are shared-nothing (aside from the global ref-counted "shared heap" but that's a bit of a special case). This means with "normal" usages (normal for BEAM) a given GC run has very little memory to scan and not too much work to do per collection. It's essentially leveraging the other characteristics of the language to create an emergent concurrent low-latency collector: the concurrency and latency are not part of the design of the GC, but instead part of the system the GC is used in.
All of that falls over when you start moving away from lots of small actors, and towards few big actors. Then the size of the stack increases, the amount of garbage explodes, and the concurrency drops precipitously, because the GC's "world" covers a larger and larger amount of the program's surface.
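To make the "copy what's reachable into a fresh heap" idea concrete, here is a deliberately simplified Python sketch. It is not how the BEAM implements its collector: it ignores generations, real memory layout and forwarding pointers, and models one process's heap as a dict from object name to the names it references. The name `copy_collect` is made up for this example.

```python
def copy_collect(heap: dict, roots: list) -> dict:
    """Copy every object reachable from the roots into a fresh heap.

    `heap` maps an object name to the list of names it references.
    Anything not reachable from `roots` is simply never copied.
    """
    to_space = {}
    pending = list(roots)          # the "root set", e.g. what the stack points at
    while pending:
        name = pending.pop()
        if name in to_space:
            continue               # already copied
        refs = heap[name]
        to_space[name] = refs      # "copy" the live object into the new heap
        pending.extend(refs)       # then follow its references
    return to_space


# One small process heap: "old_list" was replaced by an "update" and is garbage.
process_heap = {"state": ["new_list"], "new_list": [], "old_list": []}
after_gc = copy_collect(process_heap, roots=["state"])
```

In this model the work done is proportional to the live data of the one heap being collected, which is why many small per-process heaps give short pauses and one enormous process does not.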
They did implement their game server in Elixir, and it performs great for them in reality. From what I understand, they are supporting 10,000 players with low CPU and memory usage, so if this scales linearly they should be able to support 100k players. You're also ignoring the reliability aspect, which is why they chose Elixir (and the BEAM) in the first place. Performance is also less of a problem since they're on the server, and thus can add more hardware as needed.
Could you share a bit more about your experience in the industry? I think you and they have different use cases, which makes you reach different conclusions.
IMHO it’s the opposite, at least in the context of games: In languages like Java and C#, the GC is global and freezes the world. The time would be in terms of milliseconds, while in Erlang the GC is per-process, in terms of microseconds.
So in terms of GC and latency, Erlang is not as good as C++ and Rust, but it should be better than Java and C#.
> Also it must be pretty painful to not re-use their C++ client code on the server side, they have to re-implement twice?
I haven’t dived into the problem, but my intuition tells me it could be done through NIFs. It’s a pretty common practice to use Rust to complement Elixir in performance hotspots.
Re: performance issues, offloading CPU-intensive operations to NIFs (or ports, if I/O bandwidth needs are low enough and you need to run something long-running) is a pretty common strategy for speeding up Erlang (and therefore Elixir) applications. It doesn't seem like the X-Plane folks have felt the need to do this, but it's definitely a possibility should they run into some computational bottleneck.
Re: GC, the Erlang VM does it per-process ("process" in the EVM's sense, i.e. a preemptively-scheduled userspace/"green" thread rather than an OS process), so not only are GC operations fast, but they also only affect the process being GC'd. The immutability is also a factor here; since data is copied instead of modified in place, it's arguably a lot easier to know when to discard unused objects (namely: for an object to persist within a process, the recursive function(s) backing some process has to explicitly pass it along to the next invocation, so that's a natural place to discard everything else).
So, since you seem to have some knowledge here, what would you suggest for a single engineer trying to make a somewhat scalable multiplayer game? Either instanced with <64 players or with a larger user count.
I did some experimenting with rolling my own networking layer in C++ and it was a lot of work, and I tried to do it through Unity, but it seems to require quite a large investment in time to understand how networking fits into the overall picture of the game. But maybe I'm looking for magic where it does not or cannot exist :)
I don't disagree with you on running games on the BEAM (although slow is a little too far imo), but in this case what they're doing is using Elixir to connect clients with the server. The actual physics simulation and such isn't running on Elixir.
edit: I just read through the article, and it doesn't seem to say one way or the other. 99% sure they aren't running any kind of physics engine on the BEAM, but there isn't anything specific
The article is not clear on what the server is actually doing, for example whether it is just replicating state between players through that hub or doing simulation as well.
Either way they had to re-implement RakNet, which is annoying because from now on every time they add or change something in the network layer they won't be able to re-use the client code in the server.
"Despite being way more familiar with C++, I can't imagine going from zero to production in 6 months (or even 2 years) if we'd tried to do this in C++"
Only on HN can you provide specific industry advice and be downvoted for your insight because your opinion wasn’t agreeable.
I upvoted your comment to bring it out of 0 or less because these concerns are valid and can make or break a multiplayer server.
My relevant background here is in building multiplayer server software where a minimum of 1000 concurrent players is a required goal metric.
I don't think the downvoting is linked to opinions, it's linked to facts. Why should "specific industry advice" be preferred over working code? That sounds dogmatic to me.
> My relevant background here is in building multiplayer server software where a minimum of 1000 concurrent players is a required goal metric.
They are handling 10k concurrent players with low resource usage.
That’s really great, but as soon as you understand their specific requirements, it’s easy to see how 100,000 concurrent clients can go from feasible to impossible within compute budget constraints.
You can go from building a “game” with a million “concurrent” clients that do absolutely nothing to dealing with thousands where the tick rates make that impossible.[1]
I don't really understand your point. Are you saying that the resources used will scale exponentially with the number of players? If that's the case, I can see how this could make things difficult in the long run but caring about it now sounds a bit like premature optimization to me. Maybe the players don't even want to be all on the same world?
I've been writing Elixir since 2016 and I don't see myself switching anytime soon. It's a beautiful language to use, just nice and lovely. The functional aspects mean every function has no side effect. It's very very liberating.
It is true that _most_ functions do not have side effects and that this is very liberating (makes it easier to refactor and maintain). Also, most of the time when there is a side effect, it is clearly visible (due to using a stateful abstraction or library).
So all in all while stating "every function has no side effect" is incorrect, I still share your overall feeling.
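A tiny sketch of the distinction, written in Python rather than Elixir purely for illustration. The function names and the `audit_log` list are hypothetical. The first function only depends on its arguments. The second returns the same value but also touches state outside itself, which is the kind of thing that tends to be visible at the call site.

```python
def total_pure(prices):
    """No side effects: same input always gives the same output, nothing else changes."""
    return sum(prices)


audit_log = []  # stand-in for a stateful abstraction (logger, DB, web service)


def total_logged(prices):
    """Same result, plus a side effect: it appends to state outside the function."""
    result = sum(prices)
    audit_log.append(f"computed total {result}")
    return result


a = total_pure([1, 2, 3])
b = total_logged([1, 2, 3])
```

Both calls return 6, but only the second one leaves a trace behind in `audit_log`.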
> The functional aspects mean every function has no side effect.
This is not accurate. Any function can print, can call a DB or a web service.
"Purity" (as in referentially transparent) is tangential to functional programming. It can be achieved with other paradigms. That being said, functional programming does encourage it.
Are there any open source implementations of game servers/highly concurrent production servers like this for reference (or at least blog posts that contain a good amount of code)? I can find a lot of toy examples, but I'd love to see some insight into an actual implementation of a product that is open-source as I'm learning Elixir.
It's not a full game server, but the "Usage" section of the README provides a sketch of what the rest of the server (the part that implements the business logic) looks like.
I really don’t think games are the sort of things Elixir would be well suited to, in general.
Games would generally be going against the grain of the language, not easily leveraging its strengths and really needing areas where it’s weak.
A game server, however, could be good. At least if you don’t need very high performance (e.g. if you need fully accurate physics sim on the server then probably not, plus you’d want to run the game’s own sim anyway).
Elixir might be a nice language, but you could probably also implement the server using socat... That said, scaling to tens of thousands of concurrent players is like the holy grail, especially if you want to shoot stuff and there needs to be hit detection, and it has to be safe from client modifications/hacks.