But second, I'd love to understand the compute vs storage tradeoff chosen here. Looking at the (pretty!) picture [1], I was shocked to see "Wow, it's mostly storage?". Is that from going all flash?

Heading to https://oxide.computer/product for more details, it lists:

- 2048 cores
- 30 TB of memory
- 1024 TB of flash (1 PiB)
Given how much of the rack is storage, I'm not sure which Milan was chosen (and so whether that's 2048 threads or 4096 [edit: real cores, 4096 threads]), but it seems like visually 4U is compute? [edit: nope] Is that a mistake on my part? Dual-socket Milan at 128 threads per socket is 256 threads per server, so you need at least 8 servers to hit 2048 "somethings". Or do the storage nodes also have Milans [would make sense] and their compute is included [also fine!] -- and is that similarly how you get a funky 30 TiB of memory?
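To make the guesswork concrete, a quick back-of-the-envelope sketch (in Rust, since that seems on brand). The 64-core/128-thread SKU and the dual-socket-per-sled layout are my assumptions, not anything Oxide has stated:

```rust
// Back-of-the-envelope: how many dual-socket Milan sleds to reach the
// advertised 2048 "somethings"? Assumes 64-core / 128-thread SKUs and
// two sockets per sled; the real Oxide config may differ.
fn main() {
    let cores_per_socket = 64;
    let threads_per_socket = 128; // with SMT
    let sockets_per_sled = 2;     // assumption: dual-socket

    let cores_per_sled = cores_per_socket * sockets_per_sled;     // 128
    let threads_per_sled = threads_per_socket * sockets_per_sled; // 256

    let advertised = 2048;
    println!("sleds needed if 2048 means cores:   {}", advertised / cores_per_sled);   // 16
    println!("sleds needed if 2048 means threads: {}", advertised / threads_per_sled); // 8
    println!("threads, if those are 2048 real cores: {}", advertised * 2);             // 4096
}
```

Notably, 2048 real cores at 128 cores per dual-socket sled works out to 16 sleds, which would line up with the 16 nodes counted in a reply below.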
[Top-level edit from below: the green stuff are the nodes, including the compute. The 4U near the middle is the fiber]
P.S.: the "NETWORK SPEED 100 GB/S" in all caps / CSS loses the presumably 100 Gbps (though the value in the HTML is 100 gb/s which is also unclear).
Leaving that RAM for the ZFS ARC, perhaps? I don't think they would use Illumos as the hypervisor OS without also using OpenZFS with it. They also need some for management, the control UI, a DB for metrics, and more.
Btw, if I count correctly, they have 20 SSD slots per node (if a node is full width) and 16 nodes. They would need 2 TB drives to reach 1 PB of "raw" capacity, with the obvious redundancy overhead of ~20%.
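Rough math on that, as a sketch: the 16-node / 20-slot counts come from the eyeballing above, and the drive sizes are purely illustrative since nothing is published:

```rust
// Raw-capacity sketch for the guessed layout: 16 nodes x 20 NVMe slots,
// with drive size as a parameter since the actual SKU isn't stated.
fn raw_capacity_tb(nodes: u32, slots_per_node: u32, drive_tb: f64) -> f64 {
    (nodes * slots_per_node) as f64 * drive_tb
}

fn main() {
    let (nodes, slots) = (16u32, 20u32);
    for drive_tb in [2.0, 3.2, 3.84] {
        println!(
            "{} drives x {:.2} TB = {:.0} TB raw",
            nodes * slots,
            drive_tb,
            raw_capacity_tb(nodes, slots, drive_tb)
        );
    }
}
```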
It is also quite possible they don't use ZFS at all and use e.g. Ceph or something like it, but I don't think that is the case, because that wouldn't be Cantrillian. :-) Using e.g. MinIO, they could provide something S3-like on top of a cluster of ZFS storage nodes too, but they most likely get better latency with local ZFS than with a distributed filesystem. Financial institutions especially seem to be part of the target here, and there latency can be king.
I'm fairly confident the nodes are half width; if you look at the latches, it very much appears you can pull out half of every 2U at once, and if you look at the rear, there are 2 network cables going into each side.
Good observation, it does look like it. That probably makes upgrades/maintenance easier, since the unit of failure is smaller. Of course, you can then also only tackle stuff that demands no more than 64 cores before you have to rearchitect your monolith into a distributed system, which has lots of overhead.
Duh! I got tricked by the things near the PDU into thinking "oh, these must be the pure-compute nodes".
So maybe that's the better question: what is the 4U worth of stuff surrounding the power? More networking stuff? Management stuff? (There was some swivel view to the back of the rack, with the networking, but I can't find it now.)
Edit: Ahh! The rotating view is on /product and so that ~4U is the fiber. (Hat tip to Jon Olson, too)
Control plane, most likely. And a mid-rack PDU probably adds to the heat on the upper stack, which shortens lifespan over time.
As someone who has designed quite a few datacenters, what's more interesting to me in this evolution of computing is the reduction in cabling.
Cabling in a DC is a huge suck on all aspects - plastics, power, blah blah blah - the list is long....
But there are a LOT of cabling companies out there that do LV - so the point is, when these types of systems get more "obelisk"-like, are many of those companies going to die? (I'm looking at you, Cray and SGI.)
When I worked at Intel, I had a friend who was a processor designer at MIPS, and we talked about rack insertion and a global backplane for the rack (which we all know to be common now) - but this was ~1997 or so... and when I built the Brocade HQ, cables were still massive and it was an art to properly dress them.
Lucas was the same - so many human work hours spent on just cable mgmt...
Their diagrams of system resiliency are odd, in my opinion:
That looks like a ton of failures that they can negotiate...
What's weird is the SPOF isn't going to be in your DC/HQ/whatever - it's going to be outside. This is why we have always sought 2+ carrier ISPs or built private infra...
A freaking semi truck crashed into a telephone pole in Sacramento the other day and wiped Comcast off the map for half the region.
That's ONE fiber line that brought down 100K+ connections...
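The "why 2+ carriers" arithmetic is easy to sketch, even if the numbers here are made up and the model naively assumes carrier failures are independent (a shared pole, as above, breaks that assumption):

```rust
// Toy availability model for N uplinks, each with the same per-carrier
// availability, assuming failures are independent. Real outages (poles,
// backhoes, semi trucks) are often correlated, so treat this as an
// upper bound on what redundancy buys you.
fn combined_availability(per_carrier: f64, carriers: i32) -> f64 {
    1.0 - (1.0 - per_carrier).powi(carriers)
}

fn main() {
    let per_carrier = 0.999; // assume "three nines" per carrier
    for n in 1..=3 {
        let downtime_min = (1.0 - combined_availability(per_carrier, n)) * 365.25 * 24.0 * 60.0;
        println!("{} carrier(s): ~{:.3} minutes of expected downtime per year", n, downtime_min);
    }
}
```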
---
EDIT: I guess what I am actually saying is that this entire marketing strat is to convince any companies that *"failure is imminent and please buy things that are going to fail, but don't worry because you bought plenty more things to live beyond the epic failure that these devices will have"*
---
Not to discredit anything this company has going for its product - but their name is literally "RUST" (*oxide*) --- which we all know is what kills metal.
On the topic of naming, there was thought put into it...
> With accelerating conviction that we would build a company to do this, we needed a name — and once we hit on Oxide, we knew it was us: oxides form much of the earth’s crust, giving a connotation of foundation; silicon, the element that is the foundation of all of computing, is found in nature in its oxide; and (yes!) iron oxide is also known as Rust, a programming language we see playing a substantial role for us. Were there any doubt, that Oxide can also be pseudo-written in hexadecimal — as 0x1de — pretty much sealed the deal!
Power footprint also confirms that the compute density is pretty low.
We built a few racks of Supermicro AMD servers (4x compute nodes in 2U), and we load tested it to 23 kVA peak usage (about 1/2 full with that type of node only; our DC would let us go further).
We're also over 1 PB of disks (unclear how much of this is redundancy), also in NVMe (15.36 TB x 24 in 2U is a lot of storage...).
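For scale, the density math on those 2U NVMe boxes (the figures are from this comment, not from Oxide; purely illustrative):

```rust
// Storage density sketch using the Supermicro figures above:
// 24 x 15.36 TB NVMe per 2U chassis, compared against a ~1 PB target.
fn main() {
    let drives_per_2u = 24.0;
    let drive_tb = 15.36;
    let tb_per_2u = drives_per_2u * drive_tb; // ~368.6 TB per 2U

    let target_tb = 1024.0; // roughly the 1 PB Oxide advertises
    let chassis = (target_tb / tb_per_2u).ceil() as u32;

    println!("{:.1} TB raw per 2U chassis", tb_per_2u);
    println!("~{} x 2U ({}U total) to exceed {:.0} TB raw", chassis, chassis * 2, target_tb);
}
```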
Other than that, not a bad concept; not sure what premium they will charge or what will be comparable on price.