Intel Clear Linux: 16% more Ryzen 9 9950X performance

tiffanyh · on Aug 20, 2024

PHP: 2x perf with Clear Linux vs Ubuntu

That's a staggering improvement.

https://www.phoronix.com/review/linux-os-amd-ryzen9-9950x/6

RachelF · on Aug 20, 2024

That's what comes with actually using the instruction set of modern CPUs instead of the lowest common denominator instruction set.

However, note that some of the modern instructions like AVX-512 use a lot of electrical power on the CPU, for example pushing a 12th-gen Intel chip above 300W.

Lockal · on Aug 21, 2024

I wouldn't write it all off on the AVX-512 in this case. As you can see in https://github.com/clearlinux-pkgs/php and other clearlinux repositories, the difference from other distros is:

1) a number of backported optimization patches

2) a few patches to the Linux kernel and glibc that were not accepted upstream

3) and maybe the most important part: PGO for python, php and other performance-critical components. It is a well-known fact that PGO helps interpreted languages. And it is benchmark-oriented: for example, even though PGO helps bash, they don't enable it (maybe until somebody invents bash-bench).
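As a rough illustration of the PGO point above, here is a minimal sketch in C, not taken from the Clear Linux packaging. The tiny bytecode interpreter, the `PGO_DEMO_MAIN` macro and the file name `interp.c` are all made up for the example; `-fprofile-generate` and `-fprofile-use` are the GCC flags for the instrument/train/rebuild cycle.

```c
/* Build sketch with GCC (interp.c is a placeholder file name):
 *   gcc -O2 -DPGO_DEMO_MAIN -fprofile-generate interp.c -o interp
 *   ./interp      # training run, writes profile data next to the object
 *   gcc -O2 -DPGO_DEMO_MAIN -fprofile-use interp.c -o interp
 */
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy interpreter. */
enum op { OP_INC, OP_DEC, OP_DBL, OP_HALT };

/* Runs a bytecode program on an accumulator that starts at 0.
 * The switch is the kind of hot dispatch whose branch layout a
 * profile can inform. Returns -1 on an unknown opcode (this sketch
 * ignores that -1 is also a legal accumulator value). */
long run(const unsigned char *code, size_t len)
{
    assert(code != NULL);
    long acc = 0;
    for (size_t pc = 0; pc < len; pc++) {
        switch (code[pc]) {
        case OP_INC:  acc += 1; break;
        case OP_DEC:  acc -= 1; break;
        case OP_DBL:  acc *= 2; break;
        case OP_HALT: return acc;
        default:      return -1;
        }
    }
    return acc;
}

#ifdef PGO_DEMO_MAIN
#include <stdio.h>

int main(void)
{
    /* Training workload: nine OP_INC for every OP_DEC, so the
     * recorded profile says OP_INC is the hot case. */
    static unsigned char prog[1000];
    for (size_t i = 0; i < sizeof prog; i++)
        prog[i] = (i % 10 == 9) ? OP_DEC : OP_INC;

    long total = 0;
    for (int rep = 0; rep < 100000; rep++)
        total += run(prog, sizeof prog);
    printf("%ld\n", total);
    return 0;
}
#endif
```

The profile only reflects the training workload. If production programs were dominated by a different opcode, the optimized layout would be tuned for the wrong case, which is the concern with training on the same benchmark that is later used for measurement.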

paulryanrogers · on Aug 21, 2024

Does benchmark-oriented mean it's unlikely to help (or could even hurt) real world applications?

Lockal · on Aug 22, 2024

ClearLinux pulls phpbench-0.8.2 (and no other applications) for PGO[1] and then Phoronix benchmarks with phpbench-0.8.1. In my opinion, that's a valid reason to disqualify the result. It does not mean that PGO makes PHP faster or slower; it means that there are no measurements to support a conclusion. I only have a theory that PGO generally helps to reduce startup time, so when an app starts thousands of times per second, it helps even if the PGO data is irrelevant. But modern PHP does not behave this way, and real-world PHP apps are usually CPU / I/O bound in the database, so one may see no difference...

Some time ago I tried to apply PGO to Blender, which only hurt the performance, but that's another story (I read it as: if an application already has a logical separation of kernels per CPU, an attempt to join and reseparate them in PGO may hurt performance).

[1] https://github.com/clearlinux-pkgs/php/blob/03c0859388a5d640...

lathiat · on Aug 21, 2024

Ubuntu is experimenting with making a more modern CPU build: https://discourse.ubuntu.com/t/trying-out-ubuntu-23-04-on-x8... https://www.phoronix.com/review/ubuntu-x86-64-v3-benchmark

Also -O3: https://www.phoronix.com/news/Ubuntu-Evaluating-O3-Optimized

I am not sure offhand how that CPU level compares to Intel Clear Linux's options. The Phoronix comparisons of Ubuntu with and without v3 mostly showed quite small, single-digit percentage differences though, on an Intel CPU.

Notes: Canonical (Ubuntu) is my employer, I am not working on this project.

justinclift · on Aug 21, 2024

Hmm, -O3 at least used to have a bad reputation for producing potentially buggy and/or slow code.

That was years ago though, and I have no idea if that's changed for the better.

lathiat · on Aug 23, 2024

I do solidly remember this from the "earlier" (200x) Gentoo days.

luyu_wu · on Aug 21, 2024

An interesting part of Zen 5 architecture was related to lowering the absurd instruction passthrough time/power if I remember correctly!

This definitely used to be the case on the older Intel implementations of large SIMD instructions (my Tiger Lake chip being an anecdote)!

CoastalCoder · on Aug 21, 2024

This is what bothers me about the characterizations of CPU architecture in ELF files and (IIRC) Python eggs...

They're so vague about which particular x86 ISA variants are compatible with the binaries.

So, e.g., a program loader doesn't necessarily verify that the binary will work on the CPU.

Or, as we see with Linux distros, they build packages for the lowest common denominator ISA.

(Note: it's been a while since I dealt with this, so I might be remembering a bit wrong.)
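Since the loader may not verify this, a program can check for itself at startup. A minimal sketch, assuming GCC or Clang on x86-64: `__builtin_cpu_init` and `__builtin_cpu_supports` are real compiler builtins, while `widest_simd`, `cpu_meets` and the four level names are made up for the example and are not the official x86-64-v2/v3/v4 definitions.

```c
#include <assert.h>
#include <string.h>

/* Returns a coarse label for the widest SIMD extension the running
 * CPU reports. On non-x86 or non-GNU compilers it falls back to
 * "baseline". */
const char *widest_simd(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return "avx512f";
    if (__builtin_cpu_supports("avx2"))    return "avx2";
    if (__builtin_cpu_supports("sse4.2"))  return "sse4.2";
#endif
    return "baseline";
}

/* Maps a label to a rank; -1 means the label is unknown. */
static int simd_rank(const char *name)
{
    static const char *const levels[] = {
        "baseline", "sse4.2", "avx2", "avx512f"
    };
    for (int i = 0; i < 4; i++)
        if (strcmp(name, levels[i]) == 0)
            return i;
    return -1;
}

/* Returns 1 if the running CPU reports at least the `required`
 * level, 0 otherwise (including for unknown labels). Assumes each
 * level implies the ones below it, which holds in practice but is
 * a simplification. */
int cpu_meets(const char *required)
{
    assert(required != NULL);
    int need = simd_rank(required);
    int have = simd_rank(widest_simd());
    return need >= 0 && have >= need;
}
```

A binary built for AVX2 could call `cpu_meets("avx2")` first thing in `main` and exit with a readable message, provided that check itself is compiled for the baseline ISA.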

Arrath · on Aug 20, 2024

> for example pushing a 12gen Intel chip above 300W

And pushing 13th and 14th gen chips into an early grave! /s

ElectricalUnion · on Aug 20, 2024

Not really, because AVX-512 isn't supported by those platforms; supposedly that is because processes in general are very unhappy if they randomly get scheduled onto E cores that don't handle those instructions.

Havoc · on Aug 20, 2024

The per OS divergence on benchmarks on these chips seems weird.

I've seen multiple of the usual YT suspects comment on it, but haven't seen someone clearly articulate wth is going on.

autoexecbat · on Aug 20, 2024

It's the compilers.

> Clear Linux makes use of compiler function multi versioning, performance-minded defaults, aggressive compiler CFLAGS/CXXFLAGS defaults, optional AVX-512 usage for more libraries, and many other patches and optimizations in the name of delivering the greatest x86_64 Linux performance.

bravetraveler · on Aug 20, 2024

They even go so far as to encourage filesystem journaling to happen less rigorously. A lot of interesting patches in their repo.

Good for fanciful numbers, perhaps not for some workloads or scenarios.

crest · on Aug 20, 2024

Compiling a few dozen important libraries with a higher x86_64 instruction version unlocking AVX2 or even AVX-512 by default can make a large difference. Normally these optimisations aren't used by distro packages because the libraries aren't written in such a way that one binary runs as fast as possible on new CPUs and still runs at all on older CPUs. Instead processes just crash with an illegal instruction exception (aka SIGILL). The tooling exists to allow such libraries and executables to be created and have the runtime linker pick the right implementation at link time, but support for it is spotty.

rightbyte · on Aug 20, 2024

> The tooling exists to allow such libraries and executables to be created and have the runtime linker pick the right implementation at link time, but support for it is spotty.

Attribute target_clones works quite well with gcc in my experience, but it is done all by hand for worthwhile functions and should interfere with inlining and short jumps.
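For readers who have not seen it, a minimal sketch of what that looks like, assuming GCC (or a recent Clang) on x86-64 Linux with glibc, since the dispatch relies on ifunc support. The `MULTIVERSION` macro and `add_arrays` function are made up for the example; on other targets the macro expands to nothing and the function is built once.

```c
#include <assert.h>   /* also pulls in the libc feature macros (__GLIBC__) */
#include <stddef.h>

/* Enable multiversioning only where it is expected to work. */
#if defined(__x86_64__) && defined(__GLIBC__) && defined(__has_attribute)
#  if __has_attribute(target_clones)
#    define MULTIVERSION __attribute__((target_clones("default", "avx2")))
#  endif
#endif
#ifndef MULTIVERSION
#  define MULTIVERSION /* no multiversioning: plain function */
#endif

/* The compiler emits one clone per listed target plus a resolver
 * that picks a clone when the program is loaded, so the same binary
 * still runs on CPUs without AVX2. */
MULTIVERSION
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Callers use `add_arrays` like any other function. Because each call goes through the resolver's choice, the function is not a candidate for ordinary inlining, which matches the cost mentioned above.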

abhinavk · on Aug 21, 2024

In addition to what other commenters said, Clear Linux goes all the way but you can get somewhat there with CachyOS. It's Arch with compiler optimizations and targets CPUs with AVX2 and newer.

jcalvinowens · on Aug 21, 2024

I'd love to see how a Gentoo system built with -march=native looks on those benchmarks. I'd guess it would perform better.

jmakov · on Aug 20, 2024

Any benefits if you compile your own code base?

blibble · on Aug 21, 2024

sounds like -funroll-loops is back