
I'm no expert on C, but I was under the impression that type aliases like off_t are introduced to have the possibility to change them later. This clearly doesn't work. Am I wrong?


Source vs binary compatibility. Using typedefs like off_t means you usually don't have to rewrite code, but you do have to re-compile everything that uses that type.


But isn't the point of a source-only distribution like Gentoo to build everything yourself? Who is running Gentoo but also lugging around old precompiled stuff they don't have the source for?


As the post describes, the problem is that on Gentoo you can’t really build everything then switch binaries for everything, or at least that’s not what happens when you update things the usual way.

Instead, dependencies are built and then installed, then dependents are built against the installed versions and then installed, etc. In the middle of this process, the system can technically be half-broken, because it could be attempting to run older dependents against newer dependencies. Usually this is not a real problem. Because of how pervasive time_t is, though, the half-broken state here is potentially very broken to the point that you may be unable to resume the update process if anything breaks for any reason.


It kinda works, but not in a source-based distro. If you can atomically rebuild @world with the changed definition of off_t, then there will be no problem. But a source-based distro doesn't rebuild @world atomically. It rebuilds one package at a time, so there would be inconveniences like libc.so having 64-bit off_t while gcc was built for 32-bit off_t, so gcc stops working. Or maybe bash, coreutils, make, binutils or any other package that is needed for rebuilding @world. At this point you are stuck.

So such upgrade needs care.


As someone whose nearest exposure to a "source based distro" is FreeBSD, this sounds like madness, as it means a broken build could not only impair attempts to repair the build, but render the system unbootable and/or unusable.

And as someone who regularly uses traditional Linux distros like Debian and Fedora, the idea that a package management system would allow a set of packages known to be incompatible with one another to be installed without force, or that core package maintainers would knowingly specify incorrect requirements, is terrifying.

While I'm not familiar with Gentoo, my reading of this article suggests that its maintainers are well aware of these sorts of problems and that Gentoo does not, in fact, suffer from them (intentionally; inevitably mistakes happen just as they occasionally do on the bleeding edge of FreeBSD and other Linux distros).


FreeBSD does not use package management for the core system (i.e. FreeBSD itself, not ports). Gentoo does.


Why not essentially treat it as a cross compilation scenario? NixOS is also source based, but I don't think such a migration would be particularly difficult. You'd use the 32-bit off_t gcc to compile a glibc with 64-bit off_t, then compile a 64-bit off_t gcc linked against that new glibc, and so on. The host compiler shouldn't matter.

I always understood the challenge as binary compatibility, when you can't just switch the entire world at once.


NixOS has it easier here because they don't require packages to be "installed" before building code against them. For Gentoo, none of their build scripts (ebuilds) are written to support that. It's plausible that they might change the ebuild machinery so that this kind of build (against non-installed packages) could work, but it would need investigation and might be a difficult lift to get it working for all packages.

"treat it as a cross compilation scenario" is essentially what the post discusses when they mention "use a different CHOST". A CHOST is a unique name identifying a system configuration, like "x86_64-unknown-linux-gnu" (etc). Gentoo treats building for different CHOSTs as cross compiling.


NixOS isn't the same kind of source-based. At some level, even Debian could be said to be source based: there's nothing stopping you from deciding to build every package from source before installing it, and obviously the packages are themselves built from source at some point.

NixOS sits between Debian and Gentoo, as it maintains an output that's capable of existing independently of the rest of the system (like Debian) but is designed to use the current host as a builder (like Gentoo). Gentoo doesn't have any way to keep individual builds separate from the system as a whole, as intimated in the article, so you need to work out how to keep the two worlds separate while you do the build.

I think what they're suggesting winds up being pretty similar to what you suggest, just with the right plumbing to make it work in a Gentoo system. NixOS would need different plumbing; I'm not sure whether they've done it yet or how, but I can easily imagine it being more straightforward than what Gentoo needs to do.


There absolutely will be problems with different Nix profiles that aren't updated together; for example, if you update some packages installed in your user's profile but not the running system profile. But this is common enough with other glibc ABI breakage that folks tend to update home and system profiles together, or know that they need to reboot.

Where it will be hell is running Nix-built packages on a non-NixOS system with a non-ABI-compatible glibc. That is something that desperately needs fixing on the glibc side, mostly the design of nss and networking, which prevents linking against glibc statically.


> Where it will be hell is running Nix-built packages on a non-NixOS system with non-ABI-compatible glibc.

This isn't a thing. Nix-built binaries each use the hardcoded glibc they were built with. You can have any number of glibcs simultaneously in use.


> there would be inconveniences like libc.so has 64-bit off_t

glibc specifically has support for the 32-bit and 64-bit time_t ABIs simultaneously.

From the post:

> What’s important here is that a single glibc build remains compatible with all three variants. However, libraries that use these types in their API are not.
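
Concretely, on a 32-bit target one installed glibc serves both ABIs and the consumer picks at compile time. A minimal test using glibc's real feature macros (the program itself is made up):

    /* Build with and without -D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64
     * (glibc requires LFS to be enabled for 64-bit time_t): the same
     * installed glibc serves both builds, but sizeof(time_t) differs. */
    #include <stdio.h>
    #include <time.h>

    int main(void) {
        printf("sizeof(time_t) = %zu\n", sizeof(time_t));
        return 0;
    }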


Yeah, glibc was a bad example. Probably libz.so or something would be better.


Yep, agreed. Though it does expose an option here, which would be "have glibc provide a mechanism for other libraries (likely more core/widely used libs) to support both ABIs simultaneously".

Presumably if that had been done in glibc when 64-bit time_t support was added, we could have had multi-size-time ABI support in things like zlib by now. Seems like a mistake on glibc's part not to create that initially (years ago).

Though if other distros have already switched, I'd posit that perhaps Gentoo needs to rethink its design a bit so it doesn't run into this issue instead.


> have glibc provide a mechanism for other libraries (likely more core/widely used libs) to support both ABIs simultaneously

The linked article is wrong to imply this isn't possible - and it really doesn't depend on "glibc provide a mechanism". All you have to do is (see the sketch after this list):

* Compile the library itself with traditional time_t (32-bit or 64-bit depending on platform), but convert and call time64 APIs internally.

* Make a copy of every public structure that embeds a time_t (directly or indirectly), using time64_t (or whatever contains it) instead.

* Make a copy of every function that takes any time-using type, to use the time64 types.

* In the public headers, check whether the consumer is being compiled with 64-bit time_t, and if so make all the traditional types/functions aliases for the time64 versions.

* Disable most of this on platforms that already used 64-bit time_t. Instead (for convenience of external callers) make all the time64 names aliases for the traditional names.
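
A minimal header sketch of that recipe, for a hypothetical library "libfoo" (every name below is made up; the last bullet's natively-64-bit case is omitted for brevity):

    /* foo.h -- hypothetical public header sketching the dual-ABI
     * pattern from the list above. */
    #ifndef FOO_H
    #define FOO_H

    #include <stdint.h>
    #include <time.h>

    /* A 64-bit time type that exists no matter how big time_t is. */
    typedef int64_t foo_time64_t;

    /* time64 variants of every public struct/function: always present.
     * Internally the library is built with traditional time_t and
     * converts before calling the system's time64 APIs. */
    struct foo_event64 { foo_time64_t when; };
    int foo_wait64(struct foo_event64 *ev);

    #if defined(_TIME_BITS) && _TIME_BITS == 64
    /* Caller compiled with 64-bit time_t: layouts now match the time64
     * variants exactly, so alias the traditional names to them. */
    # define foo_event foo_event64
    # define foo_wait  foo_wait64
    #else
    /* Caller compiled with traditional time_t. */
    struct foo_event { time_t when; };
    int foo_wait(struct foo_event *ev);
    #endif

    #endif /* FOO_H */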

It's just that this is a lot of work, and of little benefit to any particular library. The GCC 5 std::string transition is probably a better story than LFS, in particular the compiler-supported `abi_tag` to help detect errors (but I think that's only for C++, ugh - languages without room for automatic mangling suck).

(minor note: using typedefs rather than struct tags for your API makes this easier)


This is a nightmare to implement (unless you go library by library and do it by hand, or you have significant compiler help), as e.g. functions may not directly use a struct with time_t but indirectly assume something about the size of the struct.

Note that for example the std::string transition did not do this.


This sounds like something similar to NeXTSTEP/macOS fat binaries, only with the possibility of code sharing between "architectures".

I like it, though it sounds like something that'd be unlikely to see adoption in the Linux world without the endorsement of multiple major distros.


Fat binaries (packing together 2 entirely different objects and picking one of them) could potentially work here, though I suspect we'd run into issues of mixed 32-bit/64-bit time_t things.

Another option that's closer to how things worked with LFS (large file support) (mostly in the past now) is to create different interfaces to support 64-bit time_t, and pick defaults for time_t at compile time with a macro that picks the right impl.

Also possible could be having something like a version script (https://sourceware.org/binutils/docs/ld/VERSION.html) to tell the linker which symbols to use when 64-bit time_t is enabled. While this one might have some benefits, folks generally avoid using version scripts when possible, and it would require changes in glibc, making it unlikely.

Both of those options (the version script extension, and the LFS-like pattern) could allow re-using the same binary (i.e. smaller file size, no need to build code twice in general), and potentially enable mixing 32-bit time_t and 64-bit time_t code together in a single executable (not desirable, but it does remove weird link issues).
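
As a point of reference, the existing symbol-versioning machinery can already put two variants of a symbol behind one name in a single .so. A sketch with made-up names (the LIBFOO_* version nodes would have to be declared in a version script passed to the linker):

    /* libfoo.c -- build as a shared object.  .symver is a real GNU
     * assembler directive; everything named foo_* or LIBFOO_* here is
     * hypothetical. */
    #include <stdint.h>

    int foo_wait_time32(int32_t *when) { (void)when; return 0; }
    int foo_wait_time64(int64_t *when) { (void)when; return 0; }

    /* Old binaries resolve foo_wait to the 32-bit implementation... */
    __asm__(".symver foo_wait_time32, foo_wait@LIBFOO_1.0");
    /* ...newly linked ones (@@ marks the default) get the 64-bit one. */
    __asm__(".symver foo_wait_time64, foo_wait@@LIBFOO_2.0");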


I was thinking along the lines of a simple extension to the Mach-O fat binary concept to allow each member of the fat binary archive to be associated with one or more ABIs rather than just a single architecture.

Then all the time_t size-independent code would go into a member associated with both 32-bit and 64-bit time_t ABIs, and all the time_t size-dependent code would be separately compiled into single-ABI members, one for each ABI.

For both executables and libraries, name conflicts between members associated with the same ABI would be prohibited at compile/link time.

The effective symbol table for both compile time and runtime linking would then be the union of symbol tables defined by all members associated with the active ABI.

Language-level mechanisms that allow compilers to recognize and separately compile time_t-dependent symbols would be required, ideally implemented in ways that allow such dependence to be inferred in most if not all cases.

While compilers would be free to size-optimize code by splitting functions into dependent and independent subparts, I see no immediate reason why this needs to be explicitly supported at link level.

Finally, as I imagine it, mixing 32- and 64-bit time_t ABIs would be prohibited — implicitly, as 64-bit time_t symbols aren't ever visible to 32-bit time_t code and vice versa — with code that needs to support non-native time_t values (for I/O, IPC, etc.) left to its own devices, just like code dealing with, e.g., non-native integer and floating-point formats today.

Admittedly this sounds like a lot of unnecessary work vs. similar alternatives built on existing multi-arch mechanisms, but it's still an interesting idea to contemplate.


> Finally, as I imagine it, mixing 32- and 64-bit time_t ABIs would be prohibited — implicitly, as 64-bit time_t symbols aren't ever visible to 32-bit time_t code and vice versa — with code that needs to support non-native time_t values (for I/O, IPC, etc.) left to its own devices, just like code dealing with, e.g., non-native integer and floating-point formats today.

It's pretty easy to imagine having both 32-bit time_t and 64-bit time_t in a single "executable" as long as the interfaces actually in use between 32-bit-time and 64-bit-time components don't use `time_t` (or derived types).

In other words: if the fact that `time_t` is 32 bits is kept entirely internal to some library A used by some library B (by virtue of library A not having any exposed types with time_t that are used by library B, and library A not having any functions that accept time_t or derived types), there's nothing preventing mixing 32-bit-time_t code and 64-bit-time_t code in a single executable/process (in this theoretical case where we use an LFS-style scheme (a la _FILE_OFFSET_BITS, etc.) for time_t).

LFS had/has the same capability for mixing (with off_t being the type in question there).
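
A tiny illustration of which interfaces are safe to mix (hypothetical declarations):

    #include <stdint.h>
    #include <time.h>

    /* Safe to mix: fixed-width parameter, same layout under both ABIs. */
    int a_sleep_ms(int64_t ms);

    /* NOT safe to mix: the parameter's size depends on how the caller
     * was compiled, so 32- and 64-bit time_t builds disagree. */
    int a_sleep_until(time_t when);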


NixOS is an example of a distro that is source-based and can do the atomic rebuild here, because it has a way to build packages against other packages that aren't "installed". In NixOS, this is because very few things get "installed" as the single global thing to use. But one could imagine Gentoo building something that would allow them to at least build up one set of new packages without installing them and then install the files all at once.


This is just as much of a problem on binary distros, mind you. Just because there might be a smaller time delta does not mean that the problem is solved.


It's only the first step of the puzzle. And arguably only a half step. As the article points out, any time off_t is stuffed into a struct, or used in a function call, or integrated into a protocol, the abstraction is lost and the actual size matters. Mixing old and new code, either by loading a library or communicating over a protocol, means you get offsets wrong and things start crashing. Ultimately the changeover requires everybody to segregate their programs between "legacy" and "has been ported or at least looked over", which is incredibly painful.
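
A quick illustration of the struct case (hypothetical layout, typical x86 alignment rules):

    /* With 32-bit off_t this struct is 8 bytes; with 64-bit off_t it
     * is 16, and the offset of 'start' moves from 4 to 8 because of
     * alignment padding.  A library and a caller built with different
     * off_t sizes silently disagree about all of that. */
    #include <sys/types.h>

    struct chunk {
        int   id;
        off_t start;
    };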


They make it easier, but just at a source-code level. They're not a real (and certainly not a full) abstraction. An example that makes it obvious: if you replaced the underlying type with a floating-point type, the semantics would change dramatically, fully visible to the user code.

With larger types that otherwise have similar semantics, you can still have breakage. A straightforward one would be padding in structs. Another one is that a lot of use cases convert pointers to integers and back, so if you change the underlying representation, that's guaranteed to break. Whether that's a good idea or not is another question, but it's certainly not uncommon.

(Edit: sibling comments make the same point much more succinctly: ABI compatibility!)


It does work, the problem with ABI changes is that when you change it, you have to change it everywhere at the same time. By default there's nothing stopping you from linking one library built using 32-bit off_t with another library built using 64-bit off_t, and the resulting behaviour can be incredibly unpredictable.


The C library uses macros so that the referenced symbols are different depending on ABI (e.g. open becomes open64, etc.), but most 3rd party libraries don't bother with that, so they break if their API uses time_t/off_t.
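
A two-line way to watch that happen, using glibc's real feature macro (run on a 32-bit glibc target, where off_t defaults to 32 bits):

    /* Compile once plainly and once with -D_FILE_OFFSET_BITS=64: the
     * source is identical, but off_t changes size and calls like
     * open()/lseek() are redirected to their 64-bit variants. */
    #include <stdio.h>
    #include <sys/types.h>

    int main(void) {
        printf("sizeof(off_t) = %zu\n", sizeof(off_t));
        return 0;
    }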


I think the main reason they’re introduced is to provide a hint to the programmer and maybe some type safety against accidentally passing, say, a file descriptor.


They are introduced to abstract ABI differences into a common API. They change when you move to a different system, but changing them on the same system was never the intent.


That makes sense, thank you.


Edit: nevermind others raise better points.


That's not the problem. If time_t was a struct with a single int32_t, you'd be in the same situation when you changed it to int64_t (ABI incompatibility: you need more space to store the value now).

In order not to have this problem you'd need the API to use opaque structs, for which only pointers would be exchanged.
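
A sketch of that opaque-struct pattern (hypothetical API): callers only ever hold a pointer, so the layout can change without an ABI break:

    /* timestamp.h -- the struct body lives only in the .c file, so
     * its size and layout never become part of the ABI. */
    #include <stdint.h>

    typedef struct timestamp timestamp;   /* incomplete (opaque) type */

    timestamp *timestamp_now(void);
    int64_t    timestamp_seconds(const timestamp *t);
    void       timestamp_free(timestamp *t);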


At the source level it does, but when you have compiled libraries it breaks.


Yeah, the problem in general with type aliases is that they're just that, aliases, not proper types. This means that they leak all the details of the underlying type, and don't work for proper encapsulation, which is what's needed for being able to change what should be implementation details.


The problem here would be exactly the same regardless of what form this type took. The issue is fundamental to any low level language: the size of a type is a part of its public API (well, ABI), by definition. This is true in C++ with private fields as well, for example: if you add a private field to a class, or change the type of a private field with one of a different size, all code using that class has to be re-built. The only way to abstract this is to use types strictly through pointers, the way Java does.



