It seems the original sin was having an intmax type in the first place, instead of only fixed-size types. And then of course the stupidity in using it: just because a high-end CPU can do math on 512-bit integers within some 10-20 cycles shouldn't mean that after a recompile all the chrono code is suddenly a huge register-stomping, inefficient mess, for no good fucking reason at all.
And on the other end of the spectrum, someone might well decide 16 bit is the intmax for an AVR8. Now your chrono code doesn't work at all.
intmax_t is as bad as int, for precisely the same reason. The developer has no idea what the capabilities of the type are. "Give me the biggest integer type you have" is not something any developer should say, ever, at least not in code that is intended to be remotely portable across time, space and architecture.
it's really a tie between "int", "long" and DWORD (though thankfully DWORD never made it into a language standard).
I would put "char" in the running now that we (thankfully) live in an increasingly UTF-8 world, but at least it's more or less universally understood to mean an 8 bit entity (even though this is not strictly true). It certainly isn't "a character".
long is definitely worse than int. At least you know int is never 64-bit on desktops, and it comes in handy enough for things like logarithms. long being either 32-bit or 64-bit on the same platform depending on the compiler just feels like insanity.
Even programmers who are roughly aware of integral promotion rules may not have connected all the dots to realize the answer is implementation dependent.
If sizeof(long) == sizeof(unsigned int), 1L * 2U results in an unsigned long, meaning the result of -1 < 1L * 2U is "false".
If sizeof(long) > sizeof(unsigned int), 1L * 2U results in a signed long, meaning the result of -1 < 1L * 2U is "true".
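A tiny test program (purely illustrative) makes it easy to check on your own platform:

#include <stdio.h>

int main(void)
{
    /* On a typical ILP32 target 1L * 2U is an unsigned long, so the -1
       converts to a huge unsigned value and the comparison is false;
       on a typical LP64 target the product is a signed long and the
       comparison is true. */
    printf("sizeof(long) = %zu, sizeof(unsigned int) = %zu\n",
           sizeof(long), sizeof(unsigned int));
    printf("-1 < 1L * 2U  ->  %d\n", -1 < 1L * 2U);
    return 0;
}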
Wow that is cruel! Somehow I just assumed int would get promoted to long regardless of their size difference. That this isn't the case almost seems like a language defect!
I used intmax_t just yesterday in an ODBC type conversion test. I wrote a template for use in a Google Test typed test to find the highest and lowest values supported by the client and server type. Something like:
Yeah, min() returns the most negative int, so it seems like it should do the same for float. Then the float with the least magnitude could have been, say, smallest().
> at least not in code that is intended to be remotely portable across time, space and architecture.
intmax_t exists probably because most integer types in C are not portable. You can't write C code that uses int64_t and expect it to compile; the type doesn't have to exist, and neither do int32_t or int16_t. Say you have a generic int lowestBitSet(intmax_t) function and want to make it available: you don't care how large the input integer is, as long as you do not truncate the largest type available to whoever ends up using it. Without intmax_t, library developers end up having to either set their own upper limit, limiting the portability of their library, or check for the availability of each type themselves.
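A rough sketch of what such a helper might look like (the name lowestBitSet comes from the example above; the body is just illustrative):

#include <stdint.h>

/* Illustrative body only: the point is that the library author never
   needs to know how wide the caller's integer actually is, as long as
   intmax_t can hold it without truncation. */
int lowestBitSet(intmax_t value)
{
    uintmax_t bits = (uintmax_t)value;  /* avoid shifting a negative value */
    if (bits == 0)
        return -1;                      /* no bit set */
    int position = 0;
    while ((bits & 1) == 0) {
        bits >>= 1;
        position++;
    }
    return position;
}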
I don't use int64_t and expect it to exist. I use int64_t and expect compilation to fail if stdint.h doesn't define it. It would be a problem, but much less of a problem than using intWhatever_t, having compilation work, but having no idea what the bounds were.
intmax_t doesn't define an upper limit either, as TFA explains, because you don't know at runtime what the limit really is.
The idea for the library writer is that the bounds are large enough for whatever the user needs. User inputs an int64_t? Great, intmax_t on that system is at least int64_t wide by definition. System doesn't have an int64_t? Great, the user can at best use an int32_t, which intmax_t would once again cover.
> intmax_t doesn't define an upper limit either, as TFA explains
Yeah, it would almost be sad that C++ is getting bitten by C's lack of name mangling. Almost, because it is kinda pointless to define a maximum integer in a language where any programmer can add their own fixed-size integer types, so a generic library supporting arbitrarily sized integers would have to take template parameters anyway.
While that isn't really relevant, as the sizes used by my example could just be larger (if intmax_t worked as intended, which of course it doesn't), the fixed-width types are all optional. For example, on systems that use ones' complement you would have an intmax_t, but no int32_t or any of the others.
I agree that intmax is silly, but I believe there are legitimate uses for variable-size types, like int_fast8_t and uint_fast16_t. These are their nominal size on x86 because the architecture has arithmetic operations for those sizes. But on POWER, they are both 32-bit types, because there are no 8-bit or 16-bit opcodes, and emulating overflow behavior with bitmasks incurs a performance hit.
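You can see the difference for yourself; the printed sizes below depend entirely on the platform and C library, so treat this as illustrative:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The "fast" types are allowed to be wider than their nominal size;
       what gets printed here varies between x86, POWER, AVR, etc. */
    printf("int_fast8_t:   %zu bytes\n", sizeof(int_fast8_t));
    printf("uint_fast16_t: %zu bytes\n", sizeof(uint_fast16_t));
    printf("int_fast32_t:  %zu bytes\n", sizeof(int_fast32_t));
    return 0;
}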
Implementing this at the language level has proven to be a total disaster.
The number of cases where it’s important to use different types on different architectures exists, but is very small.
That’s something that would be much easier to handle at the library-level for a finite set of architectures than at the language level for all archs.
I’d also like to see the elimination of printf specifiers. Another total disaster and source of soooooo many bugs. Especially with all of these variable types.
C99, which introduced `intmax_t`, also requires that a compiler provide a `long long int` of at least 64 bits. So on the bright side, `intmax_t` will be at least 64 bits, and on the down side, you really don't want to use that type on an AVR.
But if you used a fixed size type, you end up doing things like using 64-bit integers on AVR, where you’re going to again have a bad time. Flexible and fixed width types both have their uses.
The pain seems to originate from the wish to keep one ABI for an architecture even though the architecture is mutable and constantly changing. The intmax_t type was made to represent the largest signed integer type on an architecture. If you modify the architecture to support a new, larger, integer type, you have now created a new architecture, with a new ABI. If you do this, there is no problem with intmax_t.
But of course, ABI’s are important, since proprietary binary blobs are what is important.
ABIs are important even in the absence of proprietary binary blobs.
They allow different compiler chains to interoperate, they're the basis of FFIs, there are so many situations where I have the source to both sides of some pipe-fitting operation and having a common ABI is what makes my job possible.
ABIs are important within an architecture. But there is no good reason to stretch an ABI over too many variants of the same architecture, except to preserve compatibility with a single binary.
For C++, you don’t need that fancy struct. You can just use inline namespaces, which were specifically designed as a portable form of symbol versioning.
But making the standard library work with a changed ABI is the easy part. The hard part is every other library. Users expect to be able to compile libraries (both static and dynamic) and then keep using the binary artifacts even after upgrading their operating system. Admittedly, most libraries won’t use intmax_t in their public API, but if one does, changing the definition of intmax_t will cause the same types of ABI breaks as with the standard library. And unlike the standard library developers, those users probably won’t know how to fix them, nor appreciate being made to.
If you change the size or padding in a struct of course a dynamic library is gonna break every program using it, I'm not sure why intmax_t was singled out here.
I don't know how D handles it, but I got an extraordinary "all is right with the world" feeling years ago when I saw Ada with common definitions for bit order, max integers and digits.
Perhaps I’m misunderstanding the problem here, but isn’t the solution here that a libc must carry multiple ABI-compatible functions and pick the right one based on metadata in the binary? Is there anything specific here that applies only to intmax_t or is this just a general issue with ABIs?
I would expect it will work a lot better if you have one libc library per ABI, and link in the matching one to the executable (and the same for all your other libraries).
The only thing that may need to deal with multiple supported ABIs simultaneously is the kernel. But maybe kernels are using more explicit types?
Personally, I don't like to write code with int/long/etc. I want to know how many bits are in the thing, because it usually doesn't make sense for the program to sometimes have one number of bits, and sometimes another (pointer/memory offsets excluded).
> The only thing that may need to deal with multiple supported ABIs simultaneously is the kernel. But maybe kernels are using more explicit types?
Right. Kernel ABIs use explicit-width types, and if an ABI change is needed, a new syscall is added. The old one continues to exist with the old ABI indefinitely.
Yeah, I think you can handle this with symbol versioning. Which is a sort of hand-coded linker magic for symbol mangling that the author complains about. But I don't buy that it's the major problem the author claims.
This article presents as a language flaw what is really an operating system flaw. Exposing system functionality through dynamically-linked C libraries, and using C features that do not guarantee ABI compatibility, results in an OS interface that does not provide backwards compatibility.
When I ran into similar problems trying to distribute a binary for Linux users, I thought "Screw it, I'll just statically link everything." But the older statically linked C libraries would crash on newer kernels, and the newer static C libraries would refuse to run on older kernels. (Or do I have that backwards? Anyway, I ended up concluding that Linux doesn't really have a stable ABI.)
Now, if you think "operating system" == "kernel", then, yes this is a language problem. And yes, people who write in x86 assembly and directly invoke the Linux kernel don't have all of these worries. They have much more manageable backwards compatibility issues. But for most of us, "programming to Linux" means using the APIs defined in man pages.
Perhaps the actual "OS API" should be more completely divorced from C library issues. The kernel should define its own contract in some kind of IDL from which one can generate APIs for any language (including all of the variants of C that differ by implementation choices). Yes, it should map to C without any impedance mismatch, and the C folks can deal with how exactly it maps to different implementations of C, but other language runtimes shouldn't have to go through C and re-live all its problems.
That is a UNIX problem; not every OS uses C as the OS ABI, e.g. Android, Windows, IBM i, z/OS, ...
Also given the target domain of C, the standard also must take into account possible free standing deployments, where an OS doesn't even come into play.
Nope, the calling conventions are selectable; back in the 16-bit days it actually used the Pascal calling convention, and over time several APIs have appeared whose public interface is only available as either COM, .NET or both, regardless of how the interface with the kernel is done.
Interesting read but IMHO fairly irrelevant for language users. I've been coding in C and C++ for more than 25 years and never encountered intmax_t or any problems related to it.
It's better to accept that C (and C++) are only portable in theory, in reality it's nearly impossible to write non-trivial code which compiles without warnings in a different compiler or runs out of the box on a different hardware architecture (still, the language does a pretty good job at it, all things considered). IMHO the standard committees should focus on the handful of platforms that exist today, and not care too much about legacy or future platforms. The C standard shouldn't be some "pure ideal"; it should just harmonize the differences between compilers and platforms, and only so far as it makes sense for real-world code.
Personally I'd rather see extensions from gcc and clang moved into the standard which are already battle proven instead of wasting time fixing 30 year old problems.
> It's better to accept that C (and C++) are only portable in theory, in reality it's nearly impossible to write non-trivial code which compiles without warnings in a different compiler or runs out of the box on a different hardware architecture
It’s not that hard, unless you’re using -Wall or similar. And usually a platform that warns about something is probably giving decent advice anyways, regardless of whether the warning shows up elsewhere.
Of course. I guess what I wanted to say is: there's no way around testing your code on all compilers and platforms where that code is expected to run. There will always be new warnings and unexpected portability problems. Fixing those makes the code more robust. The language standard can only help a limited amount.
> Interesting read but IMHO fairly irrelevant for language users. I've been coding in C and C++ for more than 25 years and never encountered intmax_t or any problems related to it.
15 years and ditto. I've only used it and seen it used for printf with %j, because that sucks less than the C99 PRIxBLAH conversions.
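For comparison, the two styles side by side (a minimal, illustrative sketch):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int64_t x = -42;
    printf("%jd\n", (intmax_t)x);   /* %j length modifier plus a cast */
    printf("%" PRId64 "\n", x);     /* the C99 PRId64 macro spelling */
    return 0;
}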
> It's better to accept that C (and C++) are only portable in theory,
Having maintained a "portable" C code base across HP-UX, AIX, DG/UX, Solaris, Linux, BSD and Windows NT/2000, yes it is portable in theory, with plenty of workarounds in practice.
> Personally I'd rather see real-world extensions from gcc and clang moved into the standard which are already battle proven instead of wasting time fixing 30 year old problems.
there's nothing that prevents you from using clang on all platforms, which will be exactly the same experience that if you used $LANG_WITHOUT_AN_ISO_STANDARD
Oh but there is :) As a library author I really don't want to dictate to users what compiler to use; they should at least be free to choose between the "big 3": gcc, clang and MSVC. Having those 3 compilers agree on more language features would be very helpful (this doesn't even need to happen through the standard, it would be enough if gcc, clang and MSVC would agree on a handful of important non-standard extensions).
I use a clang/libc++ toolchain to target Linux, macOS and Windows - works like a charm. It's also the toolchain used by Android and iOS, and I could also do some WASM with it. So portability here is definitely not an issue to me
c'mon, clang is able to target x86&64, ARM, AArch64, MIPS, AVR, RISC-V, WASM among others - those combined are 99.9% of the market (and 100% of the market for desktop apps).
Putting VLA on the stack is just an implementation choice. A conforming implementation could allocate all VLA on the heap (freeing the memory upon scope exit), or not use a stack/heap system at all. In that sense, VLA are a crucial language feature for allowing a RAII style in C code. Essentially, you could replace this
void f(int n, ...)
{
    float *t = malloc(n * sizeof *t);
    // do stuff with t[i]
    free(t);
}
with the following semantically equivalent code
void f(int n, ...)
{
    float t[n];
    // do stuff with t[i]
}
Today, this is not a good idea for large arrays because, unfortunately, many C implementations put VLA on the stack. I hope that by keeping VLA in the standard language, future C implementations will come to favor this usage.
Notice that VLA are not more dangerous or risky than recursion. The exact same argument that you use against VLA can be used against recursion. Are you against recursive functions in C? Of course, you wouldn't traverse a long array using recursion, but if you are careful that the depth of recursion is logarithmic in the array size (e.g. as in binary search), then it is perfectly safe and reasonable to use recursion. The same thing with VLA. If you are sure that n is 2, 3 or at most 4, there's no reason to avoid VLA due to stack scare, and they may make your code much simpler, clearer and better.
On the other hand, VLA are useful even if you do not use them to allocate memory. Pointers to VLA can be used for many things, for example accessing a one-dimensional array (allocated by a single call to malloc) using two indices, as if it were a 2-D array.
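A minimal sketch of that trick (function and variable names made up for illustration):

#include <stdlib.h>

/* One malloc() call, but element access with two indices as if it
   were a real 2-D array, thanks to a pointer-to-VLA type. */
void demo(size_t rows, size_t cols)
{
    double (*m)[cols] = malloc(rows * sizeof *m);
    if (m == NULL)
        return;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            m[i][j] = (double)(i * cols + j);
    free(m);
}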
Yes, they are still there but optional, which basically means that, other than clang and gcc, no other C compiler is going to bother supporting them anyway.
Google has sponsored a 2-year effort to remove all VLAs from the Linux kernel; that is how much love they get from the security industry.
> other than clang and gcc, no other C compiler is going to bother supporting them anyway.
I understand that you have some sort of personal grudge against VLA, but there's no need to use misinformation to further your agenda. VLA are supported by all C compilers except Microsoft's. More precisely, they are supported by gcc, clang, icc, tcc and oracle developer studio (suncc), as well as several other experimental C compilers that seem to be written for fun as a learning experience.
I understand that it makes sense to avoid some language features on an OS kernel, like VLA or recursion. For example, it would make sense to avoid types of ambiguous size such as "int". This does not mean in any way that using "int" is wrong in all cases. Many algorithms (I think about mathematical and numerical codes) are very naturally expressed using these fancier language features.
You can make a very valid case for language simplification and against VLA and other "modern" features. But your argument is less powerful if you sound ignorant of the widespread support and uses of these features.
Everyone who cares about security hates VLAs; that is why ISO acknowledged the mistake of placing them into C99 and removed them from the standard in C11.
Also why Google's Kernel Self Protection Project for Linux has sponsored the 2-year effort that it took to clean VLA usage out of Linux.
Your arguments for VLA adoption are meaningless when companies like Google and Microsoft are voting with their wallets against them, and WG14 has acknowledged their mistake.
Love and hate are emotional considerations, not really worth much in this context. I would say that everybody who sees C as a valid replacement for Fortran loves and relies on VLAs and complex types. And there are probably many more people in the world interested in numeric codes than in "security" codes.
> that is why ISO acknowledged the mistake of placing them into C99 and removed them from the standard in C11.
This is most certainly not true. First, I'm almost sure that ISO never acknowledged any mistake with VLAs. Can you point me to where ISO acknowledged that purported mistake?
More importantly, your statement is false because ISO never removed VLA from the standard. That would be untoward and very contrary to their traditions. If they really wanted to remove them they would start by first deprecating them, and they are not deprecated. They are merely an optional feature of the language, just like the "main()" function is. The standard defines different kinds of conforming implementations, and some of them include VLA. Thus it is an official part of the language.
Anyone who knows ISO standardese knows that when a feature moves from compulsory to optional, it is a synonym for deprecated, as compilers that haven't yet adopted the specific feature never bother to spend resources implementing it.
You should read the link contents before posting them here.
"In particular, the order of parameters in function declarations should be arranged such that the size of an array appears before the array. The purpose is to allow Variable-Length Array (VLA) notation to be used."
Notation for what?
"Application Programming Interfaces (APIs) should be self-documenting when possible."
And why is that?
"This not only makes the code's purpose clearer to human readers, but also makes static analysis easier. Any new APIs added to the Standard should take this into consideration. "
On top of that, if you bother to actually read the ongoing C2X mailings, there are no plans to bring VLAs back to life, besides adopting the syntax to describe function parameters.
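That parameter syntax, for reference, looks something like this (a minimal sketch, names made up):

#include <stddef.h>

/* Declaring the size before the array lets the prototype itself say
   how large `buf` is expected to be, which is the self-documenting
   property the quoted text is after. */
void zero_fill(size_t n, unsigned char buf[n])
{
    for (size_t i = 0; i < n; i++)
        buf[i] = 0;
}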
Most devs that take security seriously actually take the effort to know the details of the devil that we have to deal with.
You didn't address any of his points and your rebuttals come across as little more than frantic handwaving. Moreover, as has been pointed out in this post, VLAs don't necessarily need to have security implications, since they can be implemented outside the stack. Yet you keep bringing up Google and KSPP as if they are arbiters of truth instead of corporate entities with their own agendas.
You are also spreading misinformation. VLAs being optional does not mean they have been removed from the standard as you keep claiming.
Already done in C11: they were marked optional, so any compiler that jumped from C89 to C11 (we are on the edge of C2x now) isn't required to even bother with VLAs and will still achieve C11 certification (C17, C2X, ...).
it's an answer to the parent's claim that the "big" three compilers are gcc, clang and msvc. I guess they meant that in the sense of "the most relevant", or "the most used". While microsoft's is unarguably an important C++ compiler, it's rarely used for non-legacy C code.
>The glibc maintainers decide they’re going to change from long long as their intmax_t and move to __int256_t for most platforms, because most platforms support it and they have a lot of customers asking for it.
Is this a practical concern? That seems like such a strong calling convention change that it wouldn't really bother me if you had to recompile all your software in this situation.
I mean, let's even imagine that there would be a way to make this change somewhat transparent and automated. You still end up with some types magically now taking 256 bits instead of (probably) 64 bits before. If you use this type a lot, the performance implications are pretty huge, both in terms of actually doing maths with a 256-bit integer, but also for RAM and cache usage.
If you suddenly changed int to be 64 instead of 32 bits you'd have a similar problem for instance. I don't think anybody would expect such a change not to require a full recompile. So what's the difference here?
> If you suddenly changed int to be 64 instead of 32 bits you'd have a similar problem for instance. I don't think anybody would expect such a change not to require a full recompile. So what's the difference here?
intmax_t is supposed to store any sized integer type. If you introduce a int128_t, it doesn't have any impact on 'int.' But it does mean that your intmax_t must be int128_t, and if it was 64-bit before, now your ABI changes (for intmax_t-using ABIs). Ordinarily introducing a new type or function wouldn't cause ABI incompatibility, but introducing a wider integer type that intmax_t cannot represent is not a valid choice.
I'd suggest considering ABIs with intmax_t in them unstable, and probably just avoiding use of intmax_t in ABIs entirely. Integer size-generic routines are best served with wrapper macros and use of the C11 _Generic() construction, or some of the typeof() magic used by, e.g., Linux.
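Something along these lines, as a rough sketch of the wrapper-macro idea (the macro name is made up):

#include <stdio.h>
#include <stdlib.h>

/* Dispatch on the argument's type at compile time instead of funnelling
   everything through intmax_t. */
#define my_abs(x) _Generic((x), \
        int:       abs,         \
        long:      labs,        \
        long long: llabs)(x)

int main(void)
{
    printf("%d %ld %lld\n", my_abs(-1), my_abs(-2L), my_abs(-3LL));
    return 0;
}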
Yeah I think I agree with you and stncls, the practical conclusion is that one probably shouldn't use intmax_t in external ABIs.
Not that it should be much of a concern in practice, as far as I can tell intmax_t seems mostly about writing "generic" integer code without having to specialize for any int type, but that wouldn't make for a very clean or idiomatic public C API IMO. At least I don't think I've ever encountered it in the wild.
> Not that it should be much of a concern in practice, as far as I can tell intmax_t seems mostly about writing "generic" integer code without having to specialize for any int type, but that wouldn't make for a very clean or idiomatic public C API IMO. At least I don't think I've ever encountered it in the wild.
I think you're exactly right. That's what I was trying to suggest and address with: "Integer size-generic routines are best served with wrapper macros and use of the C11 _Generic() construction, or some of the typeof() magic used by, e.g., Linux." It predates the C11 _Generic() stuff, and the standards ignore the GCC extensions like typeof().
> Is this a practical concern?
> That seems like such a strong calling convention change
I think you are spot on here. The OP's concern may be due to things like this:
> For example, let’s take a C Standard function that uses intmax_t, imaxabs
It looks like an unusual departure from typical libc functions like strtol(), strtoll(), etc. which have a primary type right in the name. To the untrained eye, intmax_t may look like some promise of portability, when it is actually no more portable than specifying int or long.
I’m confused by one line from the first code block in the “A Demonstration” section:
intmax_t val = imaxabs(val);
This is the first time `val` is defined in the block, so wouldn’t this raise a C compiler compilation error? If not, then I don’t understand the effect of this line.
I do not know who was against man pages, but God, man pages are a God-send. They must stay. I never liked info though.
---
In any case, I write in Forth, Ada, Go, and JavaScript (Vue.js). Not much C these days. If someone asks me to write a simple program to extract data from a format specified by him, C is what I would choose, mainly because of structs. I could do it in Forth, but it's easier in C, he has GCC, I can create a binary easily, etc.
I don't think the linux kernel is a good example. They define their own internal ABI. There are no libraries. They don't try to be portable to exotic compilers, in fact only GCC is officially supported. They even get to opt out of things like signed overflow being undefined behavior.
Signed overflow should be defined, and a platform that provides a normal int which has different overflow behavior should have its int named differently. Maybe noverflowed_int or ...
Defining signed overflow requires sticking more instructions for every integer addition, since the platform is not guaranteed to actually overflow when you want it to so the program will have to detect this and force the overflow to happen. Major slowdowns.
Lots of loop stuff also depends on non-overflowing signed integers. If integers can shrink on addition then you cannot safely do a lot of unrolling or fancier stuff based on affine loop transformations.
> Defining signed overflow requires sticking more instructions for every integer addition, since the platform is not guaranteed to actually overflow when you want it to so the program will have to detect this and force the overflow to happen. Major slowdowns.
That is most definitely not true. All modern hardware is 2's complement arithmetic, and the corresponding signed integer arithmetic operation will silently overflow without complaint.
> Lots of loop stuff also depends on non-overflowing signed integers. If integers can shrink on addition then you cannot safely do a lot of unrolling or fancier stuff based on affine loop transformations.
Not really. Only certain particular patterns force weirdness on loops were overflow well-defined, and these aren't the most common pattern (a basic for (int i = 0; i < N; i++) loop doesn't need overflow to be undefined to have loop optimizations kick in). Even if you hit a case where it can be annoying (such as the poor man's multidimensional array access, because C lacks such niceties), you can multiversion the loop to fall back to an unoptimized loop in cases where overflow would have kicked in and otherwise use the optimized loop.
Where I've seen the most issues with signed versus unsigned is when LLVM decides to take a loop with a signed 32-bit index, promote the index variable to 64-bit (using zero-extension for all the related values), and then retain a signed 32-bit comparison for the loop bound. The unsigned version of the code had fewer problems. So much for signed overflow being good for optimizations.
> All modern hardware is 2's complement arithmetic, and the corresponding signed integer arithmetic operation will silently overflow without complaint.
But not the same width. Does it overflow at 32 bits or 64? If you define it as one or the other, that requires a runtime check.
Exactly, if the coder thinks those optimizations will buy him anything then let him enable them for that one case. And stop pretending that micro benchmarks matter for the rest of the 99.9% of code out there.
Last time I looked at the speedup you got from loop optimization, it was at most a few percent for a very simple loop. For a loop that has some meat, it'd be noise.
> Last time I looked at the speedup you got from loop optimization, it was at most a few percent for a very simple loop.
Wow, no.
Most loop optimizations make loops significantly faster, even an order of magnitude is not uncommon.
Write a few loops and compare the code generated by any compiler with -O0 vs -O3.
No it wouldn't. It would mean loop optimizations shouldn't use `int` as it's defined. It means they should use whatever other integer type _does_ do what the optimization wants. And if that means creating a new integer type then so be it.
First, you are not forced to use intmax_t (unless of course you inherit it from someone else).
Second, in my personal opinion, it's really much safer to use C89/C90, unless you have very good reasons to use the extensions present in the newer versions of the standard. Even large projects like SQLite appreciate this approach. And if you are developing for an environment supporting C99 and newer standards, you could as well use C++ (or a newer language).