Suppose we have two models with similar parameter counts, trained the same way, one on 1800-1875 data and one on 1800-2025 data. Running both models, we get probability distributions across tokens; call the distributions 1875' and 2025'. We also get a probability distribution finite difference (2025' - 1875'). What would we get if we sampled from 1.1*(2025' - 1875') + 1875'? I don't think this would actually be a decent approximation of 2040', but it would be a fun experiment to see. (Interpolation rather than extrapolation seems just as unlikely to be useful and less likely to be amusing, but what do I know.)
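To make the arithmetic concrete, here's a toy sketch of that sampling rule (vocabulary size and distributions are placeholders, not real model outputs); one immediate wrinkle is that the extrapolated "distribution" can go negative, so it needs clipping and renormalizing before you can sample from it:

```python
import numpy as np

# Stand-ins for the two models' next-token distributions over a shared vocabulary.
rng = np.random.default_rng(0)
p_1875 = rng.dirichlet(np.ones(50))   # pretend output of the 1800-1875 model
p_2025 = rng.dirichlet(np.ones(50))   # pretend output of the 1800-2025 model

alpha = 1.1
p_2040 = p_1875 + alpha * (p_2025 - p_1875)

# Extrapolating past alpha = 1 can push some probabilities below zero,
# so clip and renormalize before sampling.
p_2040 = np.clip(p_2040, 0.0, None)
p_2040 /= p_2040.sum()

token = rng.choice(len(p_2040), p=p_2040)
```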
These probability shifts would only account for the final output layer (which may also have some shift), but I expect the largest shift to be in the activations in the intermediate latent space. There are a bunch of papers out there that try to get some offset vector using PCA or similar to tune certain model behaviours like vulgarity or friendliness. You don't even need much data for this as long as your examples capture the essence of the difference well. I'm pretty certain you could do this with "historicalness" too, but projecting it into the future by turning the "contemporariness" knob way up probably won't yield an accurate result. There are too many outside influences on language that won't be captured in historical trends.
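For what it's worth, here's a hedged sketch of the simplest version of that offset-vector idea (difference of means rather than PCA; the activation matrices and hidden size are placeholders, and actually collecting activations from a model is not shown):

```python
import numpy as np

# Placeholder activations from the same model on "historical" vs "contemporary"
# text: shape (num_examples, hidden_dim). In practice you'd hook a transformer
# layer and record its residual-stream activations.
acts_historical = np.random.randn(200, 768)
acts_contemporary = np.random.randn(200, 768)

# Difference of means as the steering direction (PCA on paired differences is
# the other common variant).
steer = acts_contemporary.mean(axis=0) - acts_historical.mean(axis=0)
steer /= np.linalg.norm(steer)

def nudge(hidden_state, knob=1.0):
    """Add the direction back in at inference; knob > 1 is the extrapolation case."""
    return hidden_state + knob * steer
```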
On whether this accounts only for the final output layer -- once the first token is generated (i.e. selected according to the modified sampling procedure), and assuming a different token is selected than under standard sampling, all layers of the model would be affected during generation of subsequent tokens.
Done that way, it wouldn't be much better than instructing the model to exhibit a particular behaviour via the system prompt. Limiting tokens to a subset of outputs is already common (and mathematically equivalent to a large shift in the output vector), e.g. for structured outputs, but it doesn't change the actual world representation inside the model. Doing it at the output would also be very sensitive to your input prompt.
> No, LIDAR is relatively trivial to render immune to interference from other LIDARs.
For rotating pulsed lidar, this really isn't the case. It's possible, but certainly not trivial. The challenge is that eye safety is determined by the energy in a pulse, but detection range is determined by the power of a pulse, driving towards minimum pulse width for a given lens size. This width is under 10 ns, and leaning closer to 2-4 ns for more modern systems. With laser diode currents in the tens of amps range, producing a Gaussian pulse this width is already a challenging inductance-minimization problem -- think GaN, thin PCBs, wire-bonded LDs, etc., to get loop area down. And an inductance-limited pulse is inherently Gaussian. To play any anti-interference games means being able to modulate the pulse more finely than that, without increasing the effective pulse width enough to make you uncompetitive on range. This is hard.
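The energy-vs-power tradeoff in rough numbers (a toy illustration with a made-up energy figure, assuming an approximately rectangular pulse):

```python
# Eye safety caps the energy per pulse; detection range scales with peak power;
# peak power ~ energy / width. Illustrative numbers only, not from any datasheet.
energy_j = 1e-6                      # fixed, eye-safety-limited pulse energy
for width_ns in (10, 4, 2):
    peak_w = energy_j / (width_ns * 1e-9)
    print(f"{width_ns} ns pulse -> ~{peak_w:.0f} W peak")
# 10 ns -> ~100 W, 4 ns -> ~250 W, 2 ns -> ~500 W: same energy, 5x the peak power.
```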
I think we may have had this discussion before, but from an engineering perspective, I don't buy it. For coding, the number of pulses per second is what matters, not power.
Large numbers of bits per unit of time are what it takes to make two sequences correlate (or not), and large numbers of bits per unit of time are not a problem in this business. Signal power limits imposed by eye safety requirements will kick in long after noise limits imposed by Shannon-Hartley.
> For coding, the number of pulses per second is what matters, not power.
I haven't seen a system that does anti-interference across multiple pulses, as opposed to by shaping individual pulses. (I've seen systems that introduce random jitter across multiple pulses to de-correlate interference, but that's a bit different.) The issue is you really do get a hell of a lot of data out of a single pulse, and for interesting objects (thin poles, power lines) there's not a lot of correlation between adjacent pulses -- you can't always assume properties across multiple pulses without having to throw away data from single data-carrying pulses.
Edit: Another way of saying this -- your revisit rate to a specific point of interference is around 20 Hz. That's just not a lot of bits per unit time.
> Signal power limits imposed by eye safety requirements will kick in long after noise limits imposed by Shannon-Hartley.
I can believe this is true for FMCW lidar, but I know it to be untrue for pulsed lidar. Perhaps we're discussing different systems?
> I haven't seen a system that does anti-interference across multiple pulses...
My naive assumption would be that they would do exactly that. In fact, offhand, I don't know how else I'd go about it. When emitting pulses every X ns, I might envision using a long LFSR whose low-order bit specifies whether to skip the next X-ns time slot or not. Every car gets its own lidar seed, just like it gets its own key fob seed now.
Then, when listening for returned pulses, the receiver would correlate against the same sequence. Echoes from fixed objects would be represented by a constant lag, while those from moving ones would be "Doppler-shifted" in time and show up at varying lags.
So yes, you'd lose some energy due to dead time that you'd otherwise fill with a constant pulse train, but the processing gain from the correlator would presumably make up for that and then some. Why wouldn't existing systems do something like this?
I've never designed a lidar, but I can't believe there's anything to the multiple-access problem that wasn't already well-known in the 1970s. What else needs to be invented, other than implementation and integration details?
Edit re: the 20 Hz constraint, that's one area where our assumptions probably diverge. The output might be 20 Hz but internally, why wouldn't you be working with millions of individual pulses per frame? Lasers are freaking fast and so are photodiodes, given synchronous detection.
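To make the LFSR-gating idea above concrete, here's a back-of-the-envelope sketch (all numbers, seeds, and the toy channel model are made up; this ignores the pulse-rate and geometry constraints discussed downthread):

```python
import numpy as np

def lfsr_bits(seed, n, taps=(16, 14, 13, 11)):
    """16-bit maximal-length Fibonacci LFSR; each output bit gates one pulse slot."""
    state, out = seed & 0xFFFF, []
    for _ in range(n):
        out.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        state = (state >> 1) | (fb << 15)
    return np.array(out, dtype=float)

# Each lidar gets its own seed; a time slot is either fired or skipped.
tx = lfsr_bits(seed=0xACE1, n=4096)

# Toy received signal: our own echo delayed by 37 slots, plus an interfering
# lidar running a different seed, plus noise.
echo_lag = 37
rx = 0.4 * np.roll(tx, echo_lag)
rx += 0.6 * lfsr_bits(seed=0xBEEF, n=4096)
rx += np.random.default_rng(1).normal(0, 0.3, rx.size)

# Correlate against our own transmit pattern: the true echo shows up as a peak
# at its lag, while the other lidar's pattern mostly averages out.
corr = [np.dot(rx, np.roll(tx, k)) for k in range(200)]
print(int(np.argmax(corr)))  # expect 37
```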
I suggest looking at a rotating lidar with an infrared scope... it's super, super informative and a lot of fun. Worth just camping out in SF or Mountain View and looking at all the different patterns on the wall as different lidar-equipped cars drive by.
A typical long range rotating pulsed lidar rotates at ~20 Hz, has 32 - 64 vertical channels (with spacing not necessarily uniform), and fires each channel's laser at around 20 kHz. This gives vertical channel spacing on the order of 1°, and horizontal channel spacing on the order of 0.3°. The perception folks assure me that having horizontal data orders of magnitude denser than vertical data doesn't really add value to them; and going to a higher pulse rate runs into the issue of self-interference between channels, which is much more annoying to deal with than interference from other lidars.
If you want to take that 20 kHz to 200 kHz, you first run into the fact that there can now be 10 pulses in flight at the same time... and that you're trying to detect low-photon-count events with an APD or SPAD outputting nanoamps within a few inches of a laser driver generating nanosecond pulses at tens of amps. That's a lot of additional noise! And even then, you have a 0.03° spacing between pulses, which means that successive pulses don't even overlap at max range with a typical spot diameter of 1" - 2" -- so depending on the surfaces you're hitting and their continuity as seen by you, you still can't really say anything about the expected time alignment of adjacent pulses. Taking this to 2 MHz would let you guarantee some overlap for a handful of pulses, but only some... and that's still not a lot of samples to correlate. And of course your laser power usage and thermal challenges just went up two orders of magnitude...
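Rough arithmetic behind those spacings (the 200 m max range and the conversion to inches are my assumptions; the 1" - 2" spot size is from the comment above):

```python
import math

rev_hz = 20            # rotation rate
for fire_hz in (20_000, 200_000, 2_000_000):
    spacing_deg = 360 / (fire_hz / rev_hz)             # angle between successive pulses
    gap_in = math.radians(spacing_deg) * 200 / 0.0254  # spot-to-spot gap at 200 m
    print(f"{fire_hz/1e3:>6.0f} kHz: {spacing_deg:.4f} deg, ~{gap_in:.1f} in between spots at 200 m")
# 20 kHz  -> ~0.36 deg
# 200 kHz -> ~0.036 deg, ~5 in apart: 1-2 in spots don't overlap
# 2 MHz   -> ~0.0036 deg, ~0.5 in: adjacent spots overlap somewhat
```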
Finally writing up the documentation (architecture and safety concept) for the fly-by-wire system for the homebuilt airplane I'm 15+ years into designing. Got to OML lock about a year ago, and the aerodynamics are checking out, so really hoping that I can get a subscale flying in 2026 (although I've said that before). On full scale, major remaining design task is structures, but there's plenty of other stuff (propulsion integration) as well.
> This seems backwards to me. Colleges should be prioritizing strong students for admission and nothing else. Our country needs the best engineers and doctors. Colleges are a scarce, valuable resource and should be reserved for the best students, regardless of why they are the best students.
It seems unlikely that Americans would be so massively overrepresented in American colleges under this policy...
You only need the piece type for pawns (which can be promoted), and a bit on the king to track if castling is possible; otherwise a single bit for on-board/captured is sufficient, since the types of the other pieces are implicit in the array index. (You can shave single bits in a few places -- if the state represents a game in progress the king-captured bit isn't needed; natural bishops only need 5 bits for position since they never leave their starting square colour, etc. This doesn't really add up to much, though.)
On the other hand, there are 32 pieces (max) on a chess board, not 16, so grandparent is off by a factor of more than two.
You can only castle if neither the king nor the rook have been moved (and none of the three squares the king uses may be under attack, and all the squares between the rook and the king must be empty).
Since you could move either rook somewhere and then back to their starting squares, you have to track their eligibility separately. If the king moves, both rooks lose eligibility.
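A rough sketch of that packing, folding in the rook-eligibility point (field widths and flag layout here are illustrative, not a claim about the minimum encoding):

```python
# Piece identity is implicit in the array index, so per piece we store only a
# 6-bit square, a captured bit, and a couple of special-purpose flags:
#   every piece : 6 bits square + 1 bit captured
#   pawns       : + 3 bits promoted-to type (none/knight/bishop/rook/queen)
#   king, rooks : + 1 bit "has moved" (castling eligibility)

PROMOTIONS = ("none", "knight", "bishop", "rook", "queen")

def pack_pawn(square, captured, promotion="none"):
    return (PROMOTIONS.index(promotion) << 7) | (int(captured) << 6) | square

def pack_king_or_rook(square, captured, has_moved):
    return (int(has_moved) << 7) | (int(captured) << 6) | square

def pack_other(square, captured):
    return (int(captured) << 6) | square

# Example: white king-side rook still on h1 (square 7) and eligible to castle.
h1_rook = pack_king_or_rook(square=7, captured=False, has_moved=False)
```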
> You become accustomed to blindly hitting "Yes" every time you've accidentally typed something into the text box, and then that time when you actually put a lot of effort into something... Boom. It's gone.
Wouldn't you just hit undo? Yeah, it's a bit obnoxious that Chrome for example uses cmd-shift-T to undo in this case instead of the application-wide undo stack, but I feel like the focus for improving software resilience to user error should continue to be on increasing the power of the undo stack (like it's been for more than 30 years so far), not trying to optimize what gets put in the undo stack in the first place.
1. Undo is usually treated as an application-level concern, meaning that once the application has exited there is no undo function available, at least as undo is normally thought of. The 'desktop environment' integration necessary for this isn't commonly found.
2. Even if the application is still running, it only helps if the browser has implemented it. You mention Chrome has it, which is good, but Chrome is pretty lousy about just about everything else, so... Pick your poison, I guess.
3. This was already mentioned as the better user experience anyway, albeit left open-ended for designers, so it is not exactly clear what you are trying to add. Did you randomly stop reading in the middle?
Now y'all are just analysing the UX of YouTube and Chrome.
The problem is that by agreeing to close the tab, you're agreeing to discard the comment. There's currently no way to bring it back. There's no way to undo.
AI can't fix that. There is Microsoft's "snapshot" thing but it's really just a waste of storage space.
I mean, it can. But so can a task runner that periodically saves writing to a clipboard history. The value is questionable, but throwing an LLM at it does feel like overkill in terms of overhead.
Most people drive the same car most days. Many or even most people (I don’t have stats) drive a different car some days. There are entire companies — Hertz, Avis, etc. — with business models based around this observation.
Omni Group. Wolfram. Parts of Apple. Rhino3D. Parts of Breville. Prusa (on device, not on desktop). Speed Queen (dial-based). Just from applications I currently have open and devices I can see from where I'm sitting.
I mean something that has a clear Google analog/equivalent, so we can compare. I personally think Wolfram Alpha (assuming that's what you're talking about) isn't any better than Google.
Never really used Alpha, was talking about Mathematica.
I don’t think the web is compatible with good UX, but that doesn’t mean good UX isn’t possible — it just means that the companies that are successful at UX build native applications, or physical objects, or both.