
So, we have found that that really IS correct, though perhaps not in the ways you might think. For example, a major problem with servers today is the accepted geometry: 1U or 2U x 19". In order to drive density, you are either doing dual socket in 2U or single socket in 1U -- but to make that geometry work you have small fans that really have to crank to ram air through. (And have we mentioned that the AC power supplies also need fans?) These little screamers are acoustically distasteful, but it's worse than that: they have to do much more work (and draw more power!) to move a fraction of the air because they are so small (air movement is ~cubic with respect to diameter). By changing the enclosure geometry (our sleds are 100mm high), we can use much larger fans (80mm). The upshot? Our fans move SO much more air that we can run them much more slowly (we worked with our fan provider to drop the RPM at 0% PWM from 5000 RPM to 2000 RPM) -- which means our rack is not only silent by comparison, but the power in the rack is going to compute rather than fans.
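To make the scaling argument concrete, here's a sketch using the standard fan affinity laws (airflow ~ RPM x D^3, shaft power ~ RPM^3 x D^5). The specific small-fan figures (a 40mm fan at 15000 RPM) are assumed for illustration; the 80mm/2000 RPM numbers come from the comment above.

```python
# Fan affinity laws for geometrically similar fans:
#   airflow Q  ∝ RPM * D^3
#   shaft power P ∝ RPM^3 * D^5
# Units are arbitrary; only the ratios matter.

def airflow(rpm, diameter_mm):
    """Relative airflow: Q ∝ RPM * D^3."""
    return rpm * diameter_mm**3

def fan_power(rpm, diameter_mm):
    """Relative shaft power: P ∝ RPM^3 * D^5."""
    return rpm**3 * diameter_mm**5

# Assumed figures for a typical screaming 1U fan vs. the 80mm fan
# a 100mm-high sled allows, at the lowered 2000 RPM floor.
small = dict(rpm=15000, diameter_mm=40)
large = dict(rpm=2000, diameter_mm=80)

flow_ratio = airflow(**large) / airflow(**small)
power_ratio = fan_power(**large) / fan_power(**small)
print(f"airflow ratio (large/small): {flow_ratio:.2f}")
print(f"power ratio  (large/small): {power_ratio:.3f}")
```

Under these assumed numbers the larger, slower fan moves slightly more air while drawing well under a tenth of the power, which is the whole point of changing the enclosure geometry.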

So, this isn't really chasing the 0.1% -- it's chasing much bigger wins, and it's doing it by changing the power distribution (a shelf of rectifiers feeding a 54V DC busbar vs. redundant AC power supplies in every 1U/2U), changing the cooling (the larger fans), changing the networking (we have a blindmated cabled backplane, obviating the need for operator cabling), etc. etc. These are not just multipliers on efficiency, but also on manageability: once physically installed, our rack is designed to go from power-on to provisioning VMs orders of magnitude faster than a traditional manual rack/stack/cable/software install.



> So, we have found that that really IS correct, though perhaps not in the ways you might think.

I was replying more to the idea that there's all this costly legacy IO.

I guess you aren't the first to try different geometries or power delivery or cabling either. There have been lots of little opencompute-type efforts and startups come and go. I'm skeptical there's a lot to be gained in any significant niche that doesn't already do these things, but you don't need to convince me. Although if you did want to, you could show some comparative density numbers. What can you fit -- 2048 cores, 32TB -- and push 15kW through a rack?


Yeah: 32 AMD 7713P (64 cores/128 threads), so 2048 cores/4096 threads, 32 TB of DRAM, 1PB of NVMe, with 2x100GbE to dual, redundant 6.4Tb/s switches -- all in 15 kW. In terms of other efforts, there have certainly been some industry initiatives (OCP, Open19), but no startups that I'm aware of (the smaller companies in the space have historically been integrators rather than doing their own de novo designs -- and they don't do/haven't done their own software at all); is there one in particular you're thinking of?
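The per-rack totals quoted above can be sanity-checked with simple arithmetic (a sketch; the sled count of 32 and the per-sled parts are taken from the comment):

```python
# Per-rack totals implied by 32 sleds of AMD 7713P.
sleds = 32
cores_per_sled = 64          # AMD 7713P: 64 cores / 128 threads
threads_per_core = 2         # SMT2
dram_tb_per_sled = 1         # 32 TB rack total / 32 sleds

total_cores = sleds * cores_per_sled            # 2048
total_threads = total_cores * threads_per_core  # 4096
total_dram_tb = sleds * dram_tb_per_sled        # 32

print(f"{total_cores} cores / {total_threads} threads, {total_dram_tb} TB DRAM")
```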


Well, the first OCP specification more than 10 years ago specced 13kW per rack. ORv3 is up to 30kW now, and some vendors are pushing 40 and more. So maybe I'm missing something; it didn't really seem like density was a major point of difference there.

And not one in particular, there's just a bunch that have sprung up around OCP over the past decade. None that I'm aware of that are doing everything that Oxide does, but we were talking more about the mechanical, electrical, and cooling side of it there -- they do seem to do okay with power density.


To be clear, the problem is in how the power budget is being spent (most enterprise DCs don't even have 15 kW going to the rack). The question on density is: how can you get the most compute elements into the space/power that you've got, and cramming towards highest possible density (i.e., 1U) actually decreases density because you spend so much of that power budget cranking fans and ramming air. And the challenge with OCP is: those systems aren't for sale (we tried to buy one!) and even if you got one, it has no software.


> To be clear, the problem is in how the power budget is being spent (most enterprise DCs don't even have 15 kW going to the rack).

I thought you were going for cloud DCs rather than enterprise. Seems like a big uphill battle to get software certified to run on your platform. Are any of the major Linux distros or Windows certified to run on your hypervisor platform? Any ISVs?

> The question on density is: how can you get the most compute elements into the space/power that you've got, and cramming towards highest possible density (i.e., 1U) actually decreases density because you spend so much of that power budget cranking fans and ramming air.

The question really is how much compute power you get, and electrical/thermal power ~= compute power. Sure, you could fit more CPUs and run them at a lower frequency or duty cycle.

> And the challenge with OCP is: those systems aren't for sale (we tried to buy one!) and even if you got one, it has no software.

OCP is a set of standards. There certainly are systems sold. I guess the nature of the beast is that buyers of one probably don't get taken very seriously, particularly not a competitor.


> electrical/thermal/support power ~= compute power.

Yes, and using larger fans decreases the proportion of power lost there. Centralized rectifiers reduce it further. Google can get overhead power all the way down to 10%. That's the point they are making.
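A minimal sketch of the overhead-power accounting in question (every number below is assumed for illustration; these are not Oxide or Google figures):

```python
# Overhead fraction: power spent on fans and rectifier losses
# rather than on compute, as a share of total rack power.
rack_power_kw = 15.0       # total power to the rack (from the thread)
fan_power_kw = 0.6         # assumed: large, low-RPM fans
rectifier_loss_kw = 0.75   # assumed: centralized 54V DC rectifier shelf losses

overhead = (fan_power_kw + rectifier_loss_kw) / rack_power_kw
compute_fraction = 1 - overhead
print(f"overhead: {overhead:.1%}, going to compute: {compute_fraction:.1%}")
```

With these assumed numbers the rack lands around 9% overhead, i.e., in the neighborhood of the 10% figure cited for Google; the argument is about shrinking that fraction, not about raw kW per rack.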


Yes, and the question I am asking is how much better they are doing than the competition there. For compute density, the real cloud racks are pushing 30-40kW per rack, so even if those were at 20% overhead and Oxide at 0%, Oxide doesn't seem to have a compute density advantage there.




