I remember being very disappointed when Apple went with the NeXT tech instead of the Be tech. I was in undergrad when that happened.
In retrospect though, the company wasn't making a technology decision. They were making a decision between Jobs and Gassee. Jobs came with NeXT and Gassee came with Be.
I don't think the technology mattered that much in the large scale of things. Jobs brought with him a strategy for moving personal computing from a technical market category to a fashion market category - either to make technology fashionable or to make fashion technical (however you want to look at it). It's a strategy that started with candy-coloured iMacs and ended with iPhones.
> In retrospect though, the company wasn't making a technology decision. They were making a decision between Jobs and Gassee. Jobs came with NeXT and Gassee came with Be.
> I don't think the technology mattered that much in the large scale of things.
Yes and no. The core of the purchase decision was really based on the technology. Ellen Hancock (Apple's CTO at the time) actually did a decent analysis of BeOS and NeXTStep. She was actually against some aspects of the purchase, and was not in favor of Be. She was also not in favor of the NeXT kernel. It is painful to say as a Be employee at the time, but Be internals were fragile, some technologies were very shallow, the kernel was brittle and under constant churn, and we had big problems with our decision to have a C++ API. Gil Amelio liked Steve and Steve did a good job selling both a vision and the NeXT technology. BeOS was a really cool demo that was getting pulled into the direction of a real OS but had a long, long way to go. There actually was a possibility that Apple could have also gotten the Be code, but the board didn't go for it. As it turned out, most of the primary BeOS developers ended up at Apple via Eazel. The ones that didn't ended up at Google via Danger Research/Android.
Thank you for the Be-related posts. Maybe, one day, you could write a more detailed report of it in a format made for longer articles. I would read it.
I assure you that many Canadians who are making these moves are emitting very little signal outside of their purchasing decisions.
This is not some end state of success, but a process. It's people sharing their ideas, thoughts, and strategies on how to accomplish a relatively challenging economic shift.
What you are witnessing and commenting on is quite literally the messy business of a market organically evolving and developing. "Not American" is now a selling point for services.
That seems like a simplistic take, given that slavery in practice still exists and we just decide not to call it slavery due to technical loopholes. The countries most closely associated with the global oil supply, for example, are largely run on slave labour.
"The west" is no longer a well defined thing. America is its own thing now, and I don't think it fits in with any traditional notion of "The West" anymore, outside of historical inclusion. And without America the term just means Europe, so you might as well just refer to things directly instead of coming up with a new term: America, Europe, Canada, etc.
It provides no analytical value anymore to talk about "the west" as a shared family of identities or cultures. That concept was more an ephemeral artifact of some colonial history combined with the post WW2 global landscape and the fact that the US was the last industrialized country remaining that didn't have its industrial base bombed to smithereens.
I mentioned Canada in my comments, but only out of vanity as I'm a Canadian. Really when most people talk about "the west" what they have in their mind's eye is US and Europe. The other countries are largely considered lesser auxiliaries, including mine (although Canada has had a higher prominence in recent years).
What I don't understand is what analytical value the term "the west" holds anymore, OUTSIDE of that historical artifact. What meaningful statement can you make about "the west" as you define it these days?
I think it's wrong to characterize this in terms of belief. This is the behavioural outcome of the influential pressure of a systemic structure.
Infinite exponential growth is something we ALL "believe" in when we put a dollar into savings and expect to get a dollar and 5c out the next year.
The problem to me seems more that we tie all sorts of OTHER structural societal constructs to this one. To the degree that if we want to feed ourselves, clothe ourselves, and ensure shelter and security for ourselves and our loved ones - those basic _biological_ needs shared by most moderately sophisticated mammals - we are forced to plug into this system and ensure it delivers on its promise.
I've incorporated that infinite growth expectation into my kid's education plans, into our family retirement plans.
This is not a they issue, this is a we issue. The systemic structure is some parts organic but many parts choice and belief driven by general people on the street.
No, because you're not expecting to get 5c every year regardless of your investment. In this example, they want 5% of their initial investment. So, $100 becomes $105 the next year, the $105 becomes $110.25 the year after that, and so on. 1.05^years. The fact that economic growth is measured as a percentage implies an exponential.
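To make the arithmetic concrete, here is a minimal sketch comparing the two readings of "5c on the dollar": a fixed amount added each year versus 5% of the current balance. The function names are made up for illustration; the 5% rate and $100 starting point are the ones from the comment above.

```python
def compound(principal: float, rate: float, years: int) -> float:
    """Balance after `years` when each year's interest is a percentage
    of the current balance: principal * (1 + rate) ** years."""
    return principal * (1 + rate) ** years


def fixed_increment(principal: float, rate: float, years: int) -> float:
    """Balance after `years` when the same flat amount (rate * principal)
    is added every year, i.e. linear growth."""
    return principal * (1 + rate * years)


if __name__ == "__main__":
    # Print both curves side by side; they start together and then diverge.
    for years in (1, 2, 10, 50, 100):
        c = compound(100.0, 0.05, years)
        f = fixed_increment(100.0, 0.05, years)
        print(f"{years:>3} years: compound={c:,.2f}  fixed={f:,.2f}")
```

The two agree after one year and differ by 25c after two ($110.25 versus $110.00); the gap is the interest earned on interest, which is what makes the percentage framing exponential.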
We used cloth diapers for our son for about 8 months, and then it just got to be too much of a hectic nightmare washing the poop cloths between work and other issues. So we did disposables for about two years. We were having a hard time getting him trained off of the diapers, until one day we just decided to follow some advice we'd heard and took them off and let him go diaperless on the floors (thankfully wooden).. and he trained in a couple days.
Just thought I'd pass along the one training suggestion I have. Cloth or disposable.. when you're ready for them to move off it - it really helps if they're able to see and associate their bowel and bladder movements with the physical artifacts.
I suspect it helps it click faster that yes, "this is the stuff that needs to go into the potty and not pooling around my legs in a clammy cold puddle".
>Just thought I'd pass along the one training suggestion I have. Cloth or disposable.. when you're ready for them to move off it - it really helps if they're able to see and associate their bowel and bladder movements with the physical artifacts.
Cloth diapers help a lot with this. It's one of the reasons kids on cloth diapers are usually much earlier trained.
Modern diapers are so good there's essentially no feedback.
That's why we started with cloth. We just didn't have the tenacity to pull through and gave in within a year.
I agree that the modern diaper is so good that it effectively disconnects the feeling of evacuation from the consequence of it.
I think the other thing kids pick up on when you're mopping up their floor leavings is the grossness aspect, which is a bit more learned. They see you grimacing every time you touch it - they see you taking care to ensure that it doesn't get on other parts of your body. Toddlers watch body language and reactions a lot to understand how they should relate to things.
> let him go diaperless on the floors (thankfully wooden).. and he trained in a couple days.
This terrifies me! We have carpets. Some tile, but also some carpets. I guess it's not too different from when the cats have an "accident" though, just bigger messes.
I'm sure there are cognitive declines as you age, but even discounting those there's some fundamental change happening to the opportunity space.
I'm in my mid 40s, I've had a really fulfilling career working on interesting things and making decent money, and over that time have accumulated a few passion projects that I knew were always out of my reach.
Well, technically within my reach but I'd need to somehow find someone to pay for me and a team for some period of time to work on stuff.
When I started playing around with these tools, it started feeling like maybe some of my ideas were within reach. Some time after, it felt plausible enough that I've decided to go for it. I'm actively in the middle of some deep performance research that I simply would not have the bandwidth or capacity for without these tools.
I've also managed to acquire enough confidence in the likelihood of some degree of success that I'm investing in starting a company (self-funded) to develop and release and license the stuff I'm building.
I don't know exactly how my ideas will turn out, but that's part of the excitement and anticipation. Point is I never felt I had enough breathing room to really go for it (between normal life obligations like mortgage, feeding kids, etc.)
These tools have changed the equation enough that it's made it more feasible for me to pursue some of these ideas on my own. Things I would have shelved for the rest of my life, probably.. or maybe tried to encourage and interest others into doing.
I've encountered this failure mode, and the opposite of it: thinking too much. A behaviour I've come to see as some sort of pseudo-neuroticism.
Lazy thinking makes LLMs do surface analysis and then produce things that are wrong. Neurotic thinking will see them over-analyze, and then repeatedly second-guess themselves, repeatedly re-derive conclusions.
Something very similar to an anxiety loop in humans, where problems without solutions are obsessed about in circles.
yeah i experienced this the other day when asking claude code to build an http proxy using afsk modem software to communicate over the computer's sound card. it had an absolute fit tuning the system and would loop for hours trying and doubling back. eventually after some change in prompt direction to think more deeply and test more comprehensively it figured it out. i certainly had no idea how to build an afsk modem.
Knowledge is being aware of the analogy of tomatoes not being treated like fruits even though they technically are.
Wisdom is understanding that if there was legislation on the matter, and people who ate, produced, or sold non-tomato fruits were hunted and deprived of their freedoms by the state on the basis that fruits are bad for society, then you would likely see similar frustrations expressed about an article title that includes the phrase "tomatoes and fruits" to distinguish them.
Alcohol in moderation is relaxing. Most drugs, OTOH, when used at the doses that make them attractive to recreational drug users, impair reason, and impairing reason is not just stupid, but immoral. We can debate the particular methods by which the state regulates or otherwise deals with drug use, but there is nothing intrinsically wrong with the criminalization of such drugs as such. No one has a right to take drugs (there is no right to immorality). This may seem alien to a culture whose emaciated understanding of morality is exhausted by the concept of consent. The law is a teacher, and it is good to teach people that recreational drug use (and drunkenness) is a bad thing. Like all immorality, it is an insult to one's dignity and humanity.
We can tolerate the impairment of reason as a proportionate side effect [0] (for instance, high doses of morphine given to terminally ill patients in extreme pain), but this is not recreational use.
I suspect it's just circumstantial - two different design approaches. Both of the approaches have their advantages and disadvantages.
IMHO the bigger issue with NaN-boxing is that on 64-bit systems it relies on the address space only needing <50 bits or so, as the discriminator is stored on the high bits. It's ok for now when virtual address spaces typically only need 48 bits of representation, but that's already starting to slip with newer systems.
On the other hand, I love the fact that NaN-boxing basically lets you eliminate all heap allocations for doubles.
I actually wrote a small article a while back on a hybrid approach called Ex-boxing (exponent boxing), which tries to get at the best of both worlds: decouple the boxing representation from virtual address significant bits, and also represent most (almost all) doubles that show up at runtime as immediates.
> IMHO the bigger issue with NaN-boxing is that on 64-bit systems it relies on the address space only needing <50 bits or so, as the discriminator is stored on the high bits.
Is this right? You get 51 tag bits, of which you must use one to distinguish pointer-to-object from other uses of the tag bits (assuming Huffman-ish coding of tags). But objects are presumably a minimum of 8-byte sized and aligned, and on most platforms I assume they'd be 16-byte sized and aligned, which means the low three (four) bits of the address are implicit, giving 53 (54) bit object addresses. This is quite a few years of runway...
There's a bit of time yes, but for an engine that relies on this format (e.g. spidermonkey), the assumptions associated with the value boxing format would have leaked into the codebase all over the place. It's the kind of thing that's far less painful to take care of when you don't need to do it than when you need to do it.
But fair point on the aligned pointers - that would give you some free bits to keep using, but it gets ugly.
You're right about the 51 bits - I always get mixed up about whether it's 12 bits of exponent, or the 12 includes the sign. Point is it puts some hard constraints on a pretty large number of high bits of a pointer being free, as opposed to an alignment requirement for low-bit tagging which will never run out of bits.
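For anyone following along, here is a rough sketch of the NaN-boxing idea discussed above, written in Python purely to show the bit layout. The layout is an assumption chosen for illustration (sign, exponent and quiet bit all set as the "boxed" prefix, 3 tag bits, 48-bit payload), not the format of SpiderMonkey or any other engine, and the tag names are hypothetical.

```python
import math
import struct

# Assumed 64-bit layout for boxed (non-double) values:
#   bits 63..51 : all ones (sign + 11 exponent bits + quiet-NaN bit)
#   bits 50..48 : 3-bit type tag
#   bits 47..0  : 48-bit payload (e.g. a pointer)
BOX_PREFIX = 0xFFF8 << 48
TAG_SHIFT = 48
TAG_MASK = 0x7
PAYLOAD_MASK = (1 << 48) - 1
CANONICAL_NAN = 0x7FF8 << 48  # the only NaN bit pattern stored as a double

TAG_POINTER = 1  # hypothetical tag values
TAG_INT32 = 2


def box_double(x: float) -> int:
    """Store a double as an immediate: its raw bits, no heap allocation."""
    if math.isnan(x):
        return CANONICAL_NAN  # canonicalise so NaNs never collide with boxes
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def box_tagged(tag: int, payload: int) -> int:
    """Hide a tag and a payload inside the unused quiet-NaN space."""
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag does not fit in 3 bits")
    if payload < 0 or payload & ~PAYLOAD_MASK:
        # The constraint from the thread: pointers wider than the payload
        # field cannot be represented in this scheme.
        raise ValueError("payload does not fit in 48 bits")
    return BOX_PREFIX | (tag << TAG_SHIFT) | payload


def is_double(bits: int) -> bool:
    return (bits & BOX_PREFIX) != BOX_PREFIX


def unbox_double(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def unbox_tagged(bits: int) -> tuple[int, int]:
    return (bits >> TAG_SHIFT) & TAG_MASK, bits & PAYLOAD_MASK
```

Usage would look like `unbox_tagged(box_tagged(TAG_POINTER, addr))`, with `box_tagged` raising once `addr` needs more than 48 bits. Dropping the low alignment bits of the pointer before boxing, as suggested above, would buy a few more bits at the cost of a shift on every unbox.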
I think this is an attempt to try to enrich the locality model in transformers.
One of the weird things you do in transformers is add a position vector which captures the distance between the token being attended to and some other token.
This is obviously not powerful enough to express non-linear relationships - like graph relationships.
This person seems to be experimenting with doing pre-processing of the input token set, to linearly reorder it by some other heuristic that might map more closely to the actual underlying relationship between each token.
Once upon a time, while I was a language modeling researcher, I built and fine-tuned a big (at the time: about 5 billion parameters) Sparse Non-Negative Matrix Language Model [1].
As this model allows for mix-and-match of various contexts, one thing that I did is to have a word-sorted context. This effectively transforms position-based context into a word-set based context, where "you and me", "me and you" and "and me you" are the same.
This allowed for longer contexts and better prediction.
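A tiny sketch of what a word-sorted context key might look like, as I understand the description above. This only shows the key construction; it is not the Sparse Non-Negative Matrix model itself, and the function names are made up for illustration.

```python
def positional_context(tokens: list[str], i: int, n: int) -> tuple[str, ...]:
    """The usual n-gram style context: the n tokens before position i, in order."""
    return tuple(tokens[max(0, i - n):i])


def sorted_context(tokens: list[str], i: int, n: int) -> tuple[str, ...]:
    """Word-set style context: the same n tokens, sorted so order is discarded."""
    return tuple(sorted(tokens[max(0, i - n):i]))


if __name__ == "__main__":
    # Three orderings of the same words, each followed by a token to predict.
    for phrase in ("you and me", "me and you", "and me you"):
        tokens = phrase.split() + ["<next>"]
        i = len(tokens) - 1
        print(positional_context(tokens, i, 3), "->", sorted_context(tokens, i, 3))
```

All three orderings collapse to one key under `sorted_context`, so their counts are pooled, whereas the positional version keeps three separate, sparser contexts.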
I've saved it to look at it in the future. I also remembered Kristina Toutanova's name (your editor). Looking up recent publications, she's done interesting work on analyzing pretraining mixtures.
Well, in your work, what benefit did you get from it? And do you think it would be beneficial today combined with modern techniques? Or has it been obsoleted by other techniques?
(I ask because I'm finding many old techniques are still good or could be mixed with deep learning.)
It was not bad, but I had trouble scaling it to the 1B set. Mostly because I did not have enough time.
I do hold the same mindset as you, that many old techniques are misunderstood or underapplied. For example, decision trees, in my experiments, allow for bit-length-per-byte comparable to LSTM (lstm-compress or LSTM in nncp experiments): https://github.com/thesz/codeta
Adding the position vector is basic sure, but it's naive to think the model doesn't develop its own positional system bootstrapping on top of the barebones one.
> This is obviously not powerful enough to express non-linear relationships - like graph relationships.
the distance metric used is based on energy-informed graphs that encode energy relations in a distribution called taumode; see my previous paper on spectral indexing for vector databases for a complete roll-out
In retrospect though, the company wasn't making a technology decision. They were making a decision between Jobs and Gassee. Jobs came with NeXT and Gassee came with Be.
I don't think the technology mattered that much in the large scale of things. Jobs brought with him a strategy for moving personal computing from a technical market category to a fashion market category - either to make technology fashionable or to make fashion technical (however you want to look at it). It's a strategy that started with candy-coloured iMacs and ended with iPhones.
Gassee brought a really cool OS.
Apple made the right choice.