Sometimes you want the struct to be defined in a header so it can be passed and returned by value rather than pointer.
A technique I use is to leverage GCC's `poison` pragma to cause an error if attempting to access the struct's fields directly. I give the fields names that won't collide with anything, use macros to access them within the header and then `#undef` the macros at the end of the header.
Example - an immutable, pass-by-value string which couples the `char*` with the length of the string:
It just wraps `<string.h>` functions in a way that is slightly less error prone to use, and adds zero cost. We can pass the string everywhere by value rather than needing an opaque pointer. It's equivalent on SYSV (64-bit) to passing them as two separate arguments:
These DO NOT have the same calling convention. The latter is less efficient because it needs to dereference a pointer to return the out parameter. The former just returns length in `rax` and chars in `rdx` (`r0:r1`).
So returning a fat pointer is actually more efficient than returning a size and passing an out parameter on SYSV! (Though only marginally because in the latter case the pointer will be in cache).
Perhaps it's unfair to say "zero-cost" - it's slightly less than zero - cheaper than the conventional idiom of using an out parameter.
But it only works if the struct is <= 16-bytes and contains only INTEGER types. Any larger and the whole struct gets put on the stack for both arguments and returns. In that case it's probably better to use an opaque pointer.
That aside, when we define the struct in the header we can also `inline` most functions, so that avoids unnecessary branching overhead that we might have when using opaque pointers.
`#pragma GCC poison` is not portable, but it will be ignored wherever it isn't supported, so this won't prevent the code being compiled for other platforms - it just won't get the benefits we get from GCC & SYSV.
The biggest downside to this approach is we can't prevent the library user from using a struct initializer and creating an invalid structure (eg, length and actual string length not matching). It would be nice if there were some similar to trick to prevent using compound initializers with the type, then we could have full encapsulation without resorting to opaque pointers.
> The biggest downside to this approach is we can't prevent the library user from using a struct initializer and creating an invalid structure (eg, length and actual string length not matching). It would be nice if there were some similar to trick to prevent using compound initializers with the type, then we could have full encapsulation without resorting to opaque pointers.
Hmm, I found a solution and it was easier than expected. GCC has `__attribute__((designated_init))` we can stick on the struct which prevents positional initializers and requires the field names to be used (assuming -Werror). Since those names are poisoned, we won't be able to initialize except through functions defined in our library. We can similarly use a macro and #undef it.
Full encapsulation of a struct defined in a header:
Aside from horrible pointer aliasing tricks, the only way to create a `string_t` is via `string_alloc_from_chars` or other functions defined in the library which return `string_t`.
#include <stdio.h>
int main() {
string_t s = string_alloc_from_chars("Hello World!");
if (string_is_valid(s))
puts(string_to_chars(s));
string_free(s);
return 0;
}
A technique I use is to leverage GCC's `poison` pragma to cause an error if attempting to access the struct's fields directly. I give the fields names that won't collide with anything, use macros to access them within the header and then `#undef` the macros at the end of the header.
Example - an immutable, pass-by-value string which couples the `char*` with the length of the string:
It just wraps `<string.h>` functions in a way that is slightly less error prone to use, and adds zero cost. We can pass the string everywhere by value rather than needing an opaque pointer. It's equivalent on SYSV (64-bit) to passing them as two separate arguments: These have the exact same calling convention: length passed in `rdi` and `chars` passed in `rsi`. (Or equivalently, `r0:r1` on other architectures).The main advantage is that we can also return by value without an "out parameter".
These DO NOT have the same calling convention. The latter is less efficient because it needs to dereference a pointer to return the out parameter. The former just returns length in `rax` and chars in `rdx` (`r0:r1`).So returning a fat pointer is actually more efficient than returning a size and passing an out parameter on SYSV! (Though only marginally because in the latter case the pointer will be in cache).
Perhaps it's unfair to say "zero-cost" - it's slightly less than zero - cheaper than the conventional idiom of using an out parameter.
But it only works if the struct is <= 16-bytes and contains only INTEGER types. Any larger and the whole struct gets put on the stack for both arguments and returns. In that case it's probably better to use an opaque pointer.
That aside, when we define the struct in the header we can also `inline` most functions, so that avoids unnecessary branching overhead that we might have when using opaque pointers.
`#pragma GCC poison` is not portable, but it will be ignored wherever it isn't supported, so this won't prevent the code being compiled for other platforms - it just won't get the benefits we get from GCC & SYSV.
The biggest downside to this approach is we can't prevent the library user from using a struct initializer and creating an invalid structure (eg, length and actual string length not matching). It would be nice if there were some similar to trick to prevent using compound initializers with the type, then we could have full encapsulation without resorting to opaque pointers.