In Rust? The two I'm a big fan of, CompactString and ColdString, do not use unions, although historically CompactString did, and it still has a dependency on smallvec's union feature.
ColdString is easier to explain. The whole trick here is the "Maybe this isn't a pointer?" trick. A ColdString might be a single raw pointer into your heap, with the rest of the data structure at the far end of the pointer. This case is expensive because nothing about the text lives inline. But the other case is that your entire text is hidden in the pointer itself: on modern hardware that's 8 bytes of text at no overhead. Awesome.
CompactString is more like a drop-in replacement. It's much bigger, the same size as String, so 24 bytes on modern hardware, but all of that is usable for SSO, so text like "This will all fit nicely" fits inline, yet the out-of-line case has the usual affordances such as capacity and length in the data structure. This isn't doing the "Maybe this isn't a pointer?" trick; instead it relies on knowing that the last byte of a UTF-8 string can't have certain values by definition.
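To make the last-byte idea concrete, here is a minimal sketch of it, not CompactString's actual code. The final byte of valid UTF-8 is always below 0xC0 (either ASCII or a continuation byte), so values from 0xC0 up are free to carry a length. The names `encode_inline`, `inline_len` and `LEN_TAG`, and the exact tag values, are my own assumptions for illustration.

```rust
const MAX_INLINE: usize = 24;
/// Assumed tag: any last byte >= 0xC0 cannot end valid UTF-8 text.
const LEN_TAG: u8 = 0xC0;

/// Hypothetical helper: store up to 24 bytes of text inline.
fn encode_inline(text: &str) -> Option<[u8; MAX_INLINE]> {
    let bytes = text.as_bytes();
    if bytes.len() > MAX_INLINE {
        return None; // would need the out-of-line (heap) representation
    }
    let mut buf = [0u8; MAX_INLINE];
    buf[..bytes.len()].copy_from_slice(bytes);
    if bytes.len() < MAX_INLINE {
        // Shorter text leaves the last byte unused, so it holds the length.
        buf[MAX_INLINE - 1] = LEN_TAG | bytes.len() as u8;
    }
    // At exactly 24 bytes the last byte is real text and stays below 0xC0.
    Some(buf)
}

/// Recover the inline length by looking only at the last byte.
fn inline_len(buf: &[u8; MAX_INLINE]) -> usize {
    let last = buf[MAX_INLINE - 1];
    if last >= LEN_TAG {
        (last - LEN_TAG) as usize
    } else {
        MAX_INLINE
    }
}
```

With this layout `encode_inline("This will all fit nicely")` uses all 24 bytes for text, while a shorter string such as "hello" keeps its length in the final byte.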
I realise that I don't do the best job of explaining ColdString here. After all, most 8-byte strings of UTF-8 text could equally be a pointer, so why can this work?
All ColdStrings which look like 8 bytes of UTF-8 text really are 8 bytes of UTF-8 text; it's just that the type label on those 8 bytes isn't "[u8; 8]", an array of 8 bytes, but instead "*mut u8", a raw pointer. "Validate", for example, is 8 bytes of ASCII, thus UTF-8, and Rust is OK with us just saying we want a pointer on a 64-bit machine with those bytes. It's not a valid pointer, but it is a pointer and Rust is OK with that; we just need to be careful never to [unsafely] dereference the pointer, because it's invalid.
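As a rough sketch of that relabelling (not ColdString's actual code), assuming a 64-bit target where `usize` is 8 bytes: the helper names `text_to_ptr` and `ptr_to_bytes` are hypothetical, and I use plain `as` casts here rather than the strict provenance APIs so it builds on older toolchains.

```rust
/// Hypothetical helper: relabel exactly 8 bytes of text as a raw pointer.
/// Assumes a 64-bit target, where `usize` is 8 bytes wide.
fn text_to_ptr(text: &str) -> Option<*mut u8> {
    let bytes = <[u8; 8]>::try_from(text.as_bytes()).ok()?;
    // A pointer in type only: it points nowhere useful and must never be
    // dereferenced. Creating it is safe; dereferencing would need `unsafe`.
    Some(usize::from_ne_bytes(bytes) as *mut u8)
}

/// Recover the 8 bytes from the pointer's address, without dereferencing it.
fn ptr_to_bytes(ptr: *mut u8) -> [u8; 8] {
    (ptr as usize).to_ne_bytes()
}
```

Round-tripping "Validate" through these two functions gives back the same 8 bytes, and no memory is ever read through the pointer.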
OK, so there are two cases left: First, what if there are fewer bytes of text? Zero even?
Since there are fewer than 8 bytes of text, we can use the whole first byte to signal how many of the remaining bytes are text. For this we use the UTF-8 over-long prefix indicator, in which the top five bits of the byte are all set: bytes 0xF8 through 0xFF. There are eight of these bytes, corresponding to our 8 lengths 0 through 7 inclusive. Because it's over-long, this indicator isn't itself a valid UTF-8 prefix. Again we can pretend this is a pointer while knowing it's invalid.
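A small sketch of the short case, working on the 8 raw bytes rather than on a pointer. The function names are hypothetical, and I'm assuming the length sits in the low three bits of the marker byte, which is the obvious way to map 0xF8 through 0xFF onto lengths 0 through 7.

```rust
/// Hypothetical encoder for text shorter than 8 bytes.
fn encode_short(text: &str) -> Option<[u8; 8]> {
    let bytes = text.as_bytes();
    if bytes.len() > 7 {
        return None; // 8 bytes is the "whole word is text" case, more is heap
    }
    let mut word = [0u8; 8];
    // 0xF8 sets the top five bits; the low three bits hold the length 0..=7.
    word[0] = 0xF8 | bytes.len() as u8;
    word[1..1 + bytes.len()].copy_from_slice(bytes);
    Some(word)
}

/// Hypothetical decoder: returns the text if the word uses the short encoding.
fn decode_short(word: &[u8; 8]) -> Option<&str> {
    if (word[0] & 0xF8) != 0xF8 {
        return None; // first byte is not one of 0xF8..=0xFF
    }
    let len = (word[0] & 0x07) as usize;
    std::str::from_utf8(&word[1..1 + len]).ok()
}
```

Because no valid UTF-8 text can start with a byte in 0xF8..=0xFF, this encoding can never be confused with the case where all 8 bytes are text.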
Lastly, the seemingly trickiest problem: what if the string didn't fit inline? We use a heap allocation to store the text, prefixed by a variable-size integer length, and we insist this allocation is aligned to 4 bytes. This means a valid pointer to our allocation has zeroes for the bottom two bits. We then rotate that pointer so those bottom two bits are at the top of the first byte position (depending on machine word layout) and we set the top bit. This is now always invalid UTF-8 because it has the continuation marker: the top bit is set but the next is not, which cannot happen in the first byte of any UTF-8 text. So our code can detect this marker and, if it is present, reverse the transformation to get back a valid pointer using the strict provenance APIs.
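Here is a sketch of the rotate-and-tag step on plain addresses, assuming a little-endian machine, where the first byte in memory is the low byte of the word, so rotating left by 6 moves the two spare bits to the top of that byte. The names and the rotation amount are my assumptions; real code would apply the same arithmetic to an actual pointer via the strict provenance APIs instead of working on a bare `usize`.

```rust
/// Top bit of the first (lowest) byte on a little-endian machine.
const HEAP_TAG: usize = 0x80;

/// Hypothetical: turn a 4-byte-aligned address into a tagged word whose
/// first byte looks like a UTF-8 continuation byte (bit pattern 10xxxxxx).
fn tag_addr(addr: usize) -> usize {
    debug_assert!(addr % 4 == 0, "allocation must be 4-byte aligned");
    // Bits 0 and 1 (both zero) move to bits 6 and 7; then bit 7 is set.
    addr.rotate_left(6) | HEAP_TAG
}

/// True if the first byte has the top bit set and the next bit clear.
fn is_heap(word: usize) -> bool {
    (word & 0xC0) == 0x80
}

/// Reverse the transformation to recover the original address.
fn untag_addr(word: usize) -> usize {
    (word & !HEAP_TAG).rotate_right(6)
}
```

Neither inline case passes `is_heap`: text stored whole starts with a valid UTF-8 first byte, and the short encoding starts with 0xF8 or above, where both top bits are set.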
> Localization files for every language on Earth - [...] - Samsung really wanted to make sure everyone on the planet could experience this suffering equally
Why are you considering localization as bloat? I bet your reaction wouldn't be positive if your native language(s) were missing instead.
The alternative would be the installer only installing the languages that match the system settings. Which yes is imperfect, but not nearly as bad as separate downloads or god forbid the two tier base language and modification pack system Microsoft came up with.
I don't think that's standardized. It probably only has some heuristic to detect a subscription's associated payments and reject them. It will not integrate in any way with merchants to cancel the subscription on their side, and in fact they suggest first trying to cancel the subscription on the merchant side.
To be honest, the limited popularity of F-Droid also helps it be less targeted by bad actors. If it were more popular, I would bet the situation would be different.
This argument can be refuted by considering Debian repositories. No malware exists there despite it being a good target. It's the FLOSS that solves the malware problem, with a bit of moderation.
I'd argue OSS isn't sufficient on its own, and I suspect moderation only plays a small role. I think it's primarily the separation of roles. For a complete outsider whose only interest is exploiting users, publishing a sufficiently popular piece of software and also gaining the ability to add things to the Debian repos is a huge barrier. You'd have to invest years of work to do both of those things and then hope that no one happened to notice anything before it was too late.
Of course the FLOSS aspect adds an additional hurdle that this popular piece of software will have to somehow avoid having much of a contributor community around it since that would greatly increase the risks of your malicious changeset being reviewed. I guess what happened with XZ was about the best case scenario that an attacker could realistically hope for.
There were a few mishaps with PyPI and npm, including in the past week and even today. Not sure if those meet your criteria for FLOSS, but if they do I wouldn't call it solved.
Yeah but supply chain attacks like that can hit literally anything. Debian repos, Play store, an individual publishing on his own website, it's all vulnerable.