
kannanvijayan · 2026-03-25T12:38:36 1774442316

I remember being very disappointed when Apple went with the NeXT tech instead of the Be tech. I was in undergrad when that happened.

In retrospect though, the company wasn't making a technology decision. They were making a decision between Jobs and Gassee. Jobs came with NeXT and Gassee came with Be.

I don't think the technology mattered that much in the large scale of things. Jobs brought with him a strategy for moving personal computing from a technical market category to a fashion market category - either to make technology fashionable or to make fashion technical (however you want to look at it). It's a strategy that started with candy-coloured iMacs and ended with iPhones.

Gassee brought a really cool OS.

Apple made the right choice.

diskzero · 2026-03-25T14:48:21 1774450101

> In retrospect though, the company wasn't making a technology decision. They were making a decision between Jobs and Gassee. Jobs came with NeXT and Gassee came with Be. I don't think the technology mattered that much in the large scale of things.

Yes and no. The core of the purchase decision was really based on the technology. Ellen Hancock (Apple's CTO at the time) did a decent analysis of BeOS and NeXTStep. She was against some aspects of the purchase and was not in favor of Be, but she was also not in favor of the NeXT kernel. It is painful to say as a Be employee at the time, but Be internals were fragile, some technologies were very shallow, the kernel was brittle and under constant churn, and we had big problems with our decision to have a C++ API. Gil Amelio liked Steve, and Steve did a good job selling both a vision and the NeXT technology. BeOS was a really cool demo that was getting pulled in the direction of a real OS but had a long, long way to go. There actually was a possibility that Apple could have also gotten the Be code, but the board didn't go for it. As it turned out, most of the primary BeOS developers ended up at Apple via Eazel. The ones that didn't ended up at Google via Danger Research/Android.

gond · 2026-03-25T18:38:02 1774463882

Thank you for the Be-related posts. Maybe, one day, you could write a more detailed report of it in a format made for longer articles. I would read it.

kannanvijayan · 2026-03-26T04:02:34 1774497754

Always interesting to get an insider's take! I really appreciate the insight.

chuckadams · 2026-03-25T13:04:21 1774443861

I believe the saying goes that NeXT acquired Apple for -$427 million.

kannanvijayan · 2026-03-23T13:44:05 1774273445

I assure you that many Canadians who are making these moves are emitting very little signal outside of their purchasing decisions.

This is not some end state of success, but a process. It's people sharing their ideas, thoughts, and strategies on how to accomplish a relatively challenging economic shift.

What you are witnessing and commenting on is quite literally the messy business of a market organically evolving and developing. "Not American" is now a selling point for services.

kannanvijayan · 2026-03-19T13:12:10 1773925930

That seems like a simplistic take, given that slavery in practice still exists and we just decide not to call it slavery due to technical loopholes. The countries most closely associated with the global economic oil supply, for example, are largely run on slave labour.

"The west" is no longer a well defined thing. America is its own thing now, and I don't think it fits in with any traditional notion of "The West" anymore, outside of historical inclusion. And without America the term just means Europe, so you might as well just refer to things directly instead of coming up with a new term: America, Europe, Canada, etc.

It provides no analytical value anymore to talk about "the west" as a shared family of identities or cultures. That concept was more an ephemeral artifact of some colonial history combined with the post-WW2 global landscape and the fact that the US was the last industrialized country remaining that didn't have its industrial base bombed to smithereens.

philipallstar · 2026-03-20T09:10:15 1773997815

Sorry, I don't know what you're talking about. The west includes Canada, Australia, New Zealand. Not just Europe.

But what I'm talking about is hundreds of years ago. History goes back more than a few months.

kannanvijayan · 2026-03-20T13:45:47 1774014347

I mentioned Canada in my comments, but only out of vanity as I'm a Canadian. Really when most people talk about "the west" what they have in their mind's eye is US and Europe. The other countries are largely considered lesser auxiliaries, including mine (although Canada has had a higher prominence in recent years).

What I don't understand is what analytical value the term "the west" holds anymore, OUTSIDE of that historical artifact. What meaningful statement can you make about "the west" as you define it these days?

kannanvijayan · 2026-03-08T17:11:20 1772989880

I think it's wrong to characterize this in terms of belief. This is the behavioural outcome of the influential pressure of a systemic structure.

Infinite exponential growth is something we ALL "believe" in when we put a dollar into savings and expect to get a dollar and 5c out the next year.

The problem to me seems more that we tie all sorts of OTHER structural societal constructs to this one. To the degree that if we want to feed ourselves, clothe ourselves, and ensure shelter and security for ourselves and our loved ones - those basic _biological_ needs shared by most moderately sophisticated mammals - we are forced to plug into this system and ensure it delivers on its promise.

I've incorporated that infinite growth expectation into my kid's education plans, into our family retirement plans.

This is not a they issue, this is a we issue. The systemic structure is some parts organic but many parts choice and belief driven by general people on the street.

mejutoco · 2026-03-08T17:45:45 1772991945

> Infinite exponential growth is something we ALL "believe" in when we put a dollar into savings and expect to get a dollar and 5c out the next year.

Isn't that linear growth?

tavavex · 2026-03-08T22:15:41 1773008141

No, because you're not expecting to get 5c every year regardless of your balance. In this example, they want 5% of whatever the balance is at the time. So $100 becomes $105 the next year, the $105 becomes $110.25 the year after that, and so on: the balance is multiplied by 1.05^years. The fact that economic growth is measured as a percentage implies an exponential.
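A quick way to see the difference is to compute both schedules side by side. This is a minimal sketch assuming a $100 principal and a flat 5% annual rate; the function names are made up for illustration.

```python
def simple_interest_balance(principal: float, rate: float, years: int) -> float:
    """Linear growth: the same absolute amount (principal * rate) is added each year."""
    return principal * (1 + rate * years)


def compound_interest_balance(principal: float, rate: float, years: int) -> float:
    """Exponential growth: each year adds rate * (current balance)."""
    return principal * (1 + rate) ** years


for years in (1, 2, 10, 50):
    linear = simple_interest_balance(100.0, 0.05, years)
    compound = compound_interest_balance(100.0, 0.05, years)
    print(f"year {years:>2}: linear ${linear:,.2f}  compound ${compound:,.2f}")
```

The two agree after one year ($105) and split from year two onward ($110 vs $110.25). The gap keeps widening because the compound version grows by a fixed ratio rather than a fixed amount.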

kannanvijayan · 2026-03-08T15:03:18 1772982198

We used cloth diapers for our son for about 8 months, and then it just got to be too much of a hectic nightmare washing the poop cloths between work and other issues. So we did disposables for about two years. We were having a hard time getting him trained off of the diapers, until one day we just decided to follow some advice we'd heard and took them off and let him go diaperless on the floors (thankfully wooden).. and he trained in a couple days.

Just thought I'd pass along the one training suggestion I have. Cloth or disposable.. when you're ready for them to move off it - it really helps if they're able to see and associate their bowel and bladder movements with the physical artifacts.

I suspect it helps it click faster that yes, "this is the stuff that needs to go into the potty and not pooling around my legs in a clammy cold puddle".

apexalpha · 2026-03-08T17:17:00 1772990220

>Just thought I'd pass along the one training suggestion I have. Cloth or disposable.. when you're ready for them to move off it - it really helps if they're able to see and associate their bowel and bladder movements with the physical artifacts.

Cloth diapers help a lot with this. It's one of the reasons kids on cloth diapers are usually much earlier trained.

Modern diapers are so good there's essentially no feedback.

kannanvijayan · 2026-03-09T00:32:06 1773016326

That's why we started with cloth. We just didn't have the tenacity to pull through and gave in within a year.

I agree that the modern diaper is so good that it effectively disconnects the feeling of evacuation from the consequence of it.

I think the other thing kids pick up on when you're mopping up their floor leavings is the grossness aspect, which is a bit more learned. They see you grimacing every time you touch it - they see you taking care to ensure that it doesn't get on other parts of your body. Toddlers watch body language and reactions a lot to understand how they should relate to things.

abustamam · 2026-03-08T18:22:20 1772994140

> let him go diaperless on the floors (thankfully wooden).. and he trained in a couple days.

This terrifies me! We have carpets. Some tile, but also some carpets. I guess it's not too different from when the cats have an "accident" though, just bigger messes.

beAbU · 2026-03-08T22:15:50 1773008150

Just get a carpet washer. Your kids will make a mess on the carpet sooner or later. Regardless of your potty training regime.

abustamam · 2026-03-09T05:01:55 1773032515

That sounds like a good investment.

kannanvijayan · 2026-03-09T00:38:48 1773016728

If you're considering it for real and the carpet is a real concern, the solution I've seen is one of those large plastic/rubber/foam playmats.

kannanvijayan · 2026-03-07T15:57:10 1772899030

I'm sure there are cognitive declines as you age, but even discounting those there's some fundamental change happening to the opportunity space.

I'm in my mid 40s, I've had a really fulfilling career working on interesting things and making decent money, and over that time have accumulated a few passion projects that I knew were always out of my reach.

Well, technically within my reach but I'd need to somehow find someone to pay for me and a team for some period of time to work on stuff.

When I started playing around with these tools, it started feeling like maybe some of my ideas were within reach. Some time after, it felt plausible enough that I've decided to go for it. I'm actively in the middle of some deep performance research that I simply would not have the bandwidth or capacity for without these tools.

I've also managed to acquire enough confidence in the likelihood of some degree of success that I'm investing in starting a company (self-funded) to develop, release, and license the stuff I'm building.

I don't know exactly how my ideas will turn out, but that's part of the excitement and anticipation. Point is I never felt I had enough breathing room to really go for it (between normal life obligations like mortgage, feeding kids, etc.)

These tools have changed the equation enough that it's made it more feasible for me to pursue some of these ideas on my own. Things I would have shelved for the rest of my life, probably.. or maybe tried to encourage and interest others into doing.

kannanvijayan · 2026-02-22T02:41:28 1771728088

I've encountered this failure mode, and the opposite of it: thinking too much. A behaviour I've come to see as some sort of pseudo-neuroticism.

Lazy thinking makes LLMs do surface analysis and then produce things that are wrong. Neurotic thinking will see them over-analyze, and then repeatedly second-guess themselves, repeatedly re-derive conclusions.

Something very similar to an anxiety loop in humans, where problems without solutions are obsessed about in circles.

denimnerd42 · 2026-02-22T03:07:57 1771729677

yeah i experienced this the other day when asking claude code to build an http proxy using afsk modem software to communicate over the computer's sound card. it had an absolute fit tuning the system and would loop for hours trying things and doubling back. eventually, after some change in prompt direction to think more deeply and test more comprehensively, it figured it out. i certainly had no idea how to build an afsk modem.

kannanvijayan · 2026-02-04T15:10:32 1770217832

Knowledge is being aware of the analogy of tomatoes not being treated like fruits even though they technically are.

Wisdom is understanding that if there was legislation on the matter, and people who ate, produced, or sold non-tomato fruits were hunted and deprived of their freedoms by the state on the basis that fruits are bad for society, then you would likely see similar frustrations expressed about an article title that includes the phrase "tomatoes and fruits" to distinguish them.

lo_zamoyski · 2026-02-04T15:40:16 1770219616

This is such a terrible analogy. Hunted?

Alcohol in moderation is relaxing. Most drugs, OTOH, when used at the doses that make them attractive to recreational drug users, impair reason, and impairing reason is not just stupid, but immoral. We can debate the particular methods by which the state regulates or otherwise deals with drug use, but there is nothing intrinsically wrong with the criminalization of such drugs as such. No one has a right to take drugs (there is no right to immorality). This may seem alien to a culture whose emaciated understanding of morality is exhausted by the concept of consent. The law is a teacher, and it is good to teach people that recreational drug use (and drunkenness) is a bad thing. Like all immorality, it is an insult to one's dignity and humanity.

We can tolerate the impairment of reason as a proportionate side effect [0] (for instance, high doses of morphine given to terminally ill patients in extreme pain), but this is not recreational use.

[0] https://plato.stanford.edu/entries/double-effect/

master-lincoln · 2026-02-04T16:31:36 1770222696

Alcohol impairs cognitive functions more than marijuana when heavily under the influence.

kannanvijayan · 2026-01-18T13:40:27 1768743627

I suspect it's just circumstantial - two different design approaches. Both of the approaches have their advantages and disadvantages.

IMHO the bigger issue with NaN-boxing is that on 64-bit systems it relies on the address space only needing <50 bits or so, as the discriminator is stored in the high bits. It's OK for now, while virtual address spaces typically only need 48 bits of representation, but that's already starting to slip with newer systems.

On the other hand, I love the fact that NaN-boxing basically lets you eliminate all heap allocations for doubles.
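For readers who haven't seen the trick, here is a minimal NaN-boxing sketch in Python, simulating 64-bit words with ints. The layout (canonical NaN 0x7FF8..., pointers tagged with 0xFFFC in the top 16 bits, 48-bit addresses) is an assumption chosen for illustration, not SpiderMonkey's or any other engine's actual encoding. It shows both points above: doubles are stored inline as raw bits with no allocation, and an address that needs more than 48 bits simply can't be boxed.

```python
import math
import struct

# Illustrative layout (hypothetical, not any particular engine's):
CANONICAL_NAN = 0x7FF8_0000_0000_0000  # every real NaN is collapsed to this pattern
POINTER_TAG = 0xFFFC_0000_0000_0000    # a negative quiet-NaN pattern no boxed double uses
TOP16_MASK = 0xFFFF_0000_0000_0000
ADDRESS_BITS = 48
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def box_double(x: float) -> int:
    """Store a double as its raw IEEE 754 bits; no heap allocation needed."""
    if math.isnan(x):
        return CANONICAL_NAN  # keeps NaN payloads from colliding with tagged values
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def box_pointer(address: int) -> int:
    """Hide a pointer in a NaN payload; only possible if it fits in 48 bits."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError(f"address {address:#x} needs more than {ADDRESS_BITS} bits")
    return POINTER_TAG | address


def is_pointer(value: int) -> bool:
    return (value & TOP16_MASK) == POINTER_TAG


def unbox_double(value: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", value))[0]


def unbox_pointer(value: int) -> int:
    return value & ADDRESS_MASK
```

Every non-NaN double, including -inf (0xFFF0...), has a top-16-bit pattern below 0xFFFC, which is why canonicalizing NaNs is enough to keep the two spaces disjoint in this layout.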

I actually wrote a small article a while back on a hybrid approach called Ex-boxing (exponent boxing), which tries to get at the best of both worlds: decouple the boxing representation from virtual address significant bits, and also represent most (almost all) doubles that show up at runtime as immediates.

https://medium.com/@kannanvijayan/exboxing-bridging-the-divi...

addaon · 2026-01-18T19:12:29 1768763549

> IMHO the bigger issue with NaN-boxing is that on 64-bit systems it relies on the address space only needing <50 bits or so, as the discriminator is stored on the high bits.

Is this right? You get 51 tag bits, of which you must use one to distinguish pointer-to-object from other uses of the tag bits (assuming Huffman-ish coding of tags). But objects are presumably a minimum of 8-byte sized and aligned, and on most platforms I assume they'd be 16-byte sized and aligned, which means the low three (four) bits of the address are implicit, giving 53 (54) bit object addresses. This is quite a few years of runway...
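The arithmetic in the question can be written out directly. A small sketch using the figures from the comment (51 payload bits, one bit spent to distinguish object pointers, 8- or 16-byte alignment); the function name is hypothetical and the inputs are the commenter's assumptions, not any engine's documented layout.

```python
def usable_address_bits(payload_bits: int, tag_bits: int, alignment_bytes: int) -> int:
    """Address bits a NaN-boxed pointer can cover when aligned low bits are implied."""
    if alignment_bytes <= 0 or alignment_bytes & (alignment_bytes - 1):
        raise ValueError("alignment must be a positive power of two")
    # An N-byte-aligned address always ends in log2(N) zero bits, so they need not be stored.
    implicit_low_bits = alignment_bytes.bit_length() - 1
    return payload_bits - tag_bits + implicit_low_bits


print(usable_address_bits(51, 1, 8))   # -> 53
print(usable_address_bits(51, 1, 16))  # -> 54
```

The cost of those extra bits is that the pointer has to be stored shifted right by log2(alignment) and shifted back on every unbox.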

kannanvijayan · 2026-01-18T22:48:37 1768776517

There's a bit of time, yes, but for an engine that relies on this format (e.g. SpiderMonkey), the assumptions associated with the value boxing format would have leaked into the codebase all over the place. It's the kind of thing that's far less painful to take care of when you don't need to do it than when you need to do it.

But fair point on the aligned pointers - that would give you some free bits to keep using, but it gets ugly.

You're right about the 51 bits - I always get mixed up about whether it's 12 bits of exponent, or the 12 includes the sign. Point is it puts some hard constraints on a pretty large number of high bits of a pointer being free, as opposed to an alignment requirement for low-bit tagging which will never run out of bits.
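For contrast, a minimal sketch of low-bit tagging, again simulating machine words with Python ints. The 3-bit tag, the tag values, and 8-byte alignment are illustrative assumptions. Because the tag lives in bits that alignment forces to zero, the pointer's high bits are untouched, so growing the virtual address space never eats into the tag.

```python
WORD_BITS = 64
TAG_BITS = 3                      # room provided by 8-byte alignment (assumed)
TAG_MASK = (1 << TAG_BITS) - 1
TAG_POINTER = 0b000               # aligned pointers are stored unmodified
TAG_SMALL_INT = 0b001             # hypothetical tag for inline integers

INT_MIN = -(1 << (WORD_BITS - TAG_BITS - 1))
INT_MAX = (1 << (WORD_BITS - TAG_BITS - 1)) - 1


def box_pointer(address: int) -> int:
    if not 0 <= address < (1 << WORD_BITS):
        raise ValueError("address does not fit in a machine word")
    if address & TAG_MASK:
        raise ValueError("pointer must be 8-byte aligned")
    return address | TAG_POINTER  # all high bits remain available to the address


def box_small_int(n: int) -> int:
    if not INT_MIN <= n <= INT_MAX:
        raise OverflowError("integer does not fit in the tagged word")
    return (n << TAG_BITS) | TAG_SMALL_INT


def tag_of(value: int) -> int:
    return value & TAG_MASK


def unbox_pointer(value: int) -> int:
    return value & ~TAG_MASK


def unbox_small_int(value: int) -> int:
    return value >> TAG_BITS  # arithmetic shift preserves the sign
```

A pointer such as 1 << 56, far beyond a 48-bit address space, boxes and unboxes without any change to the scheme.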

kannanvijayan · 2026-01-18T13:17:22 1768742242

I think this is an attempt to try to enrich the locality model in transformers.

One of the weird things you do in transformers is add a position vector which captures the distance between the token being attended to and some other token.
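For concreteness, here is the sinusoidal encoding from the original Transformer paper ("Attention Is All You Need"), one common way of building that position vector. The comment doesn't say which scheme it has in mind, and many models use learned or rotary encodings instead. It's written in plain Python to stay dependency-free. Note that the only input is an integer index, so all it can express is order along a single line.

```python
import math


def sinusoidal_position_encoding(position: int, d_model: int) -> list[float]:
    """PE[2k] = sin(pos / 10000^(2k/d)), PE[2k+1] = cos(pos / 10000^(2k/d))."""
    encoding = []
    for i in range(d_model):
        k = i // 2  # each sin/cos pair shares one frequency
        angle = position / (10000 ** (2 * k / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def add_position_encodings(token_embeddings: list[list[float]]) -> list[list[float]]:
    """Add each token's position vector to its embedding, element-wise."""
    d_model = len(token_embeddings[0])
    return [
        [e + p for e, p in zip(embedding, sinusoidal_position_encoding(pos, d_model))]
        for pos, embedding in enumerate(token_embeddings)
    ]
```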

This is obviously not powerful enough to express non-linear relationships - like graph relationships.

This person seems to be experimenting with doing pre-processing of the input token set, to linearly reorder it by some other heuristic that might map more closely to the actual underlying relationship between each token.
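A toy version of that idea, under heavy assumptions: suppose we already have a graph saying which tokens are related (building it is the hard part and isn't shown), and we derive position ids from a breadth-first traversal so linked tokens land near each other in the linear order. BFS is a stand-in heuristic picked for brevity; it is not the linked project's method, and the function names are hypothetical.

```python
from collections import deque


def graph_order(num_tokens: int, edges: list[tuple[int, int]], start: int = 0) -> list[int]:
    """Breadth-first order over an undirected token graph, covering every component."""
    adjacency = {i: [] for i in range(num_tokens)}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    order, seen = [], set()
    for root in [start] + list(range(num_tokens)):
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return order


def position_ids(num_tokens: int, edges: list[tuple[int, int]]) -> list[int]:
    """Position id of each token = its rank in the graph order, not its input index."""
    positions = [0] * num_tokens
    for rank, token in enumerate(graph_order(num_tokens, edges)):
        positions[token] = rank
    return positions
```

With five tokens and edges (0, 4), (4, 2), (1, 3), the ids come out as [0, 3, 2, 4, 1]: tokens 0 and 4, four positions apart in the input, become neighbours.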

thesz · 2026-01-18T22:54:25 1768776865

> like graph relationships

Once upon a time, back when I was a language modeling researcher, I built and fine-tuned a big (for the time: about 5 billion parameters) Sparse Non-Negative Matrix Language Model [1].

[1] https://aclanthology.org/Q16-1024/

As this model allows for mix-and-match of various contexts, one thing I did was use a word-sorted context. This effectively transforms a position-based context into a word-set-based context, where "you and me", "me and you" and "and me you" are the same.

This allowed for longer contexts and better prediction.
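A minimal sketch of what a word-sorted context key does, feeding a simple next-word count table. This illustrates the idea described above; it is not SNMLM's actual feature extraction, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def context_key(history: list[str], n: int, word_sorted: bool) -> tuple[str, ...]:
    """Key for the last n words: positional (ordered) or word-set-like (sorted)."""
    window = history[-n:]
    return tuple(sorted(window)) if word_sorted else tuple(window)


def count_next_words(sentences: list[str], n: int, word_sorted: bool) -> dict:
    """Map each context key to a Counter of the words observed right after it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(n, len(words)):
            table[context_key(words[:i], n, word_sorted)][words[i]] += 1
    return table


sentences = ["you and me agree", "me and you agree", "and me you agree"]
print(dict(count_next_words(sentences, 3, word_sorted=True)))
print(len(count_next_words(sentences, 3, word_sorted=False)))
```

With the sorted key, the three word orders collapse into one context and their counts pool together; the positional key keeps them as three separate, sparser contexts.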

nickpsecurity · 2026-01-19T01:42:25 1768786945

I've saved it to look at in the future. I also remembered Kristina Toutanova's name (your editor). Looking up recent publications, she's done interesting work on analyzing pretraining mixtures.

https://aclanthology.org/2025.acl-long.1564/

Thanks to you both for two interesting papers tonight. :)

thesz · 2026-01-19T17:34:26 1768844066

I am not an author of SNMLM paper. ;)

I was using their model in my work.

nickpsecurity · 2026-01-19T23:53:18 1768866798

I misunderstood what you said.

Well, in your work, what benefit did you get from it? And do you think it would be beneficial today combined with modern techniques? Or has it been made obsolete by other techniques?

(I ask because I'm finding many old techniques are still good or could be mixed with deep learning.)

thesz · 2026-01-20T01:03:58 1768871038

At the time (2018), it had perplexity close to that of an LSTM, with more coefficients but much shorter training time (hours vs days).

I tried to apply SNMLM's ideas to the byte-level prediction modeling here: https://github.com/thesz/snmlm-per-byte

It was not bad, but I had trouble scaling it to the 1B set, mostly because I didn't have enough time.

I hold the same mindset as you: many old techniques are misunderstood or underapplied. For example, in my experiments decision trees reach bits-per-byte comparable to LSTMs (lstm-compress, or the LSTM in the nncp experiments): https://github.com/thesz/codeta

adroniser · 2026-01-18T14:16:40 1768745800

Adding the position vector is basic sure, but it's naive to think the model doesn't develop its own positional system bootstrapping on top of the barebones one.

thesz · 2026-01-18T22:56:31 1768776991

For some reason people are still adding position encodings into embeddings.

As if they are not relying on the model's ability to develop its own "positional system bootstrapping on top of the barebones one."

tuned · 2026-01-19T07:08:22 1768806502

> This is obviously not powerful enough to express non-linear relationships - like graph relationships.

the distance metric used is based on energy-informed graphs that encode energy relations in a distribution called taumode; see my previous paper on spectral indexing for vector databases for a complete roll-out