Hacker News | chubot's comments

Wow, a few years ago I wanted to resurrect some of my old code to do essentially what ncdu does

Then I found ncdu, and haven’t looked back since. So it saved me a lot of time

Thank you and RIP


A common problem I've noticed is that if you took certain computer science courses, you may have a preconceived notion of how to parse programming languages, and the shell language doesn't quite fit that model

I have seen this misconception many times

In Oils, we have some pretty minor elaborations of the standard model, and they make things a lot easier

How to Parse Shell Like a Programming Language - https://www.oilshell.org/blog/2019/02/07.html

Everything I wrote there still holds, although that post could use some minor updates (and OSH is the most bash-compatible shell, and more POSIX-compatible than /bin/sh on Debian - e.g. https://pages.oils.pub/spec-compat/2025-11-02/renamed-tmp/sp... )

---

To summarize that, I'd say that doing as much work as possible in the lexer, with regular languages and "lexer modes", drastically reduces the complexity of writing a shell parser
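
A tiny bash illustration of why the lexer needs modes: the same characters are tokenized differently depending on the surrounding context. This only demonstrates shell behavior; it's a sketch, not how Oils implements its lexer, and the variable names are made up.

```shell
#!/usr/bin/env bash
x=5

a='$x * 2'          # single-quoted: everything is literal, $ included
b="$x * 2"          # double-quoted: $x is substituted, * stays literal
c=$(( x * 2 ))      # arithmetic: bare x is a variable, * multiplies
d=$'x\t2'           # $'...' (bash/zsh/ksh): \t is a C-style escape (a tab)

printf '%s\n' "$a" "$b" "$c"
# prints:
# $x * 2
# 5 * 2
# 10
```

In the lexer-modes approach, the parser tells the lexer which of these contexts it's in before asking for the next token.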

And it's not just one parser -- shell actually has 5 to 15 different parsers, depending on how you count

I often show this file to make that point: https://oils.pub/release/0.37.0/pub/src-tree.wwz/_gen/_tmp/m...

(linked from https://oils.pub/release/0.37.0/quality.html)
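
To make "5 to 15 parsers" concrete, here's a bash sketch where nearly every line exercises a different sub-language with its own grammar. This is an informal list, not the exact breakdown in the linked file.

```shell
#!/usr/bin/env bash
# Brace expansion: a mini-language inside a single word.
files=(src/{main,util}.c)             # -> src/main.c src/util.c

# Arithmetic: C-like expressions with precedence and parentheses.
n=$(( (3 + 4) * 2 ))                  # -> 14

# Glob patterns inside parameter expansion: ## strips the longest */ prefix.
base=${files[0]##*/}                  # -> main.c
stem=${base%.c}                       # -> main

# [[ ]]: a separate expression grammar, here with an ERE regex via =~.
if [[ $stem =~ ^ma(in)$ ]]; then
  group=${BASH_REMATCH[1]}            # -> in
fi

# printf format strings: yet another little language.
printf -v line '%-6s|%03d' "$stem" "$n"
echo "$line"                          # prints: main  |014
```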

Fine-grained heterogeneous algebraic data types also help. Shells written in C tend to use a homogeneous command* and word* kind of representation

https://oils.pub/release/0.37.0/pub/src-tree.wwz/frontend/sy... (~700 lines of type definitions)


Yup, job control is a huge mess. I think Bill Joy was able to modify the shell, the syscall interface, and the terminal driver at the same time to implement the hacky mechanism of job control. But a few years later that kind of crosscutting change would have been harder

One thing we learned from implementing job control in https://oils.pub is that the differing pipeline semantics of bash and zsh make a difference

In bash, the last part of the pipeline is forked (unless shopt -s lastpipe is set and job control is off)

In zsh, it isn't

    $ bash -c 'echo hi | read x; echo $x'  # prints an empty line

    $ zsh -c 'echo hi | read x; echo $x'
    hi

And then that affects this case:

    bash$ sleep 5 | read
    ^Z
    [1]+  Stopped                 sleep 5 | read


    zsh$ sleep 5 | read    # job control doesn't apply to this case in zsh
    ^Zzsh: job can't be suspended

So yeah, the semantics of shell are not very well specified (which is one reason for OSH and YSH). I recall a bug when running an Alpine Linux shell script where this difference mattered -- if the last part is NOT forked, then the script doesn't work

I think there was almost a "double bug" -- the script relied on the `read` output being "lost", even though that was likely not the intended behavior
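
If a script needs to behave the same way in bash and zsh, one option is to not depend on which side of the pipeline gets forked at all. A bash sketch comparing the default, shopt -s lastpipe (bash-only, and it only takes effect when job control is off, as in a script or bash -c), and a portable rewrite:

```shell
#!/usr/bin/env bash
# Default bash: the last part of the pipeline (read) runs in a forked
# subshell, so x is set there and then lost.
default=$(bash -c 'echo hi | read -r x; echo "$x"')

# bash-only: with lastpipe, read runs in the current shell. This only
# takes effect when job control is off, e.g. in a script or bash -c.
lastpipe=$(bash -c 'shopt -s lastpipe; echo hi | read -r x; echo "$x"')

# Portable (bash, zsh, POSIX sh): don't put read at the end of a pipeline;
# feed it from a here-document instead.
read -r portable <<EOF
$(echo hi)
EOF

printf '[%s] [%s] [%s]\n' "$default" "$lastpipe" "$portable"   # [] [hi] [hi]
```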


FWIW, here's another piece of trivia about job control: the API means you can't spawn a child process "safely" in POSIX -- you have to trust that the executable you're spawning is well-behaved (or use more advanced Linux process isolation)

In this case it was the Zed editor spawning the zsh shell:

How to Lose Control of your Shell - https://registerspill.thorstenball.com/p/how-to-lose-control...

zsh has a bug where it doesn't do the job control cleanup properly in some cases -- when fork-exec() optimizations happen.

This can mess up the calling process. For example, here you could no longer kill Zed by hitting Ctrl-C, even after zsh is done.

My comment: https://lobste.rs/s/hru0ib/how_lose_control_your_shell#c_dfl...


Yeah, I think that when writing code becomes cheap, then all the COMPLEMENTS become more valuable:

    - testing
    - reviewing, and reading/understanding/explaining
    - operations / SRE


But what if those complementary skills also become cheap?


Then you're a product manager, but humans can't do it all.


This is overstating it by a lot. Jeff was the AI lead at the time, and there was a big conflict between management and the ethics team

And I actually think Google needs to pay more attention to AI ethics ... but it's a publicly traded company and the incentives are all wrong -- i.e. it's going to do whatever it needs to do to keep up with the competition, similar to what happened with Google+ (perceived competition from Facebook)


Ha, I also recall this fact about the protobuf DB after all these years

Another Jeff Dean fact should be "Russ Cox was Jeff Dean's intern"

This was either 2006 or 2007, whenever Russ started. I remember when Jeff and Sanjay wrote "gsearch", a distributed grep over google3 that ran on 40-80 machines [1].

There was a series of talks called "Nooglers and the PDB" I think, and I remember Jeff explained gsearch to maybe 20-40 of us in a small conference room in building 43.

It was a tiny and elegant piece of code -- something like 2000 total lines of C++, with an "indexer" (I think it just catted all the files, which were later mapped into memory), a replicated server, a client, and a Borg config.

The auth for the indexer lived in Jeff's home dir, perhaps similar to the protobuf DB.

That was some of the first "real Google C++ distributed system" code I read, and it was eye opening.
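
For anyone curious about the shape of that design, here's a rough single-machine sketch in shell: concatenate every file into one corpus of path-prefixed lines (standing in for the "indexer"), split it into shards, and grep the shards in parallel (standing in for the servers). This is not the gsearch code, just the general idea; the sample files, shard count, and pattern are made up, and split -n l/N assumes GNU coreutils.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample "repository".
src=$(mktemp -d)
mkdir -p "$src/a" "$src/b"
printf 'int main() {\n  return 0;\n}\n' > "$src/a/main.cc"
printf 'void Helper() {}\n// TODO: fix\n' > "$src/b/helper.cc"

# "Indexer": one corpus file, each line prefixed with its relative path.
work=$(mktemp -d)
(cd "$src" && find . -type f -name '*.cc' -exec grep -H '' {} +) > "$work/corpus"

# Split into 4 shards on line boundaries (GNU split).
split -n l/4 "$work/corpus" "$work/shard."

# "Servers": grep every shard in parallel; all jobs share the loop's stdout.
pattern='TODO|return'
for shard in "$work"/shard.*; do
  grep -E "$pattern" "$shard" &
done > "$work/results"
wait

# "Client": merge the results.
sort "$work/results"
# ./a/main.cc:  return 0;
# ./b/helper.cc:// TODO: fix
```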

---

After that talk, I submitted a small CL to that directory (which I think Sanjay balked at slightly, but Jeff accepted). And then I put a Perforce watch on it to see what other changes were being submitted.

I think the code was dormant for a while, but later I saw someone named Russ Cox start submitting a ton of changes to it. That became the public Google Code Search product [2]. My memory is that Russ wrote something like 30K lines of google3 C++ in a single summer, and then went on to write RE2 (which I later used in Bigtable, etc.)

Much of that work is described here: https://swtch.com/~rsc/regexp/

I remember someone telling him on a mailing list something like "you can't just write your own regex engine; there are too many corner cases in PCRE"

And many people know that Russ Cox went on to be one of the main contributors to the Go language. After the Code Search internship, he worked on Go, which was open sourced in 2009.

---

[1] Actually, I wonder if today this could perform well enough on a single machine with 64 or 128 cores. Back then I think the prod machines had something like 2, 4, or 8 cores.

[2] This was the trigram regex search over open source code on the web. Later, there was also the structured search with compiler front ends, led by Steve Yegge.
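
The core trick behind trigram search, roughly: record which files contain each 3-character substring, use that index to narrow the candidates, then run the real match only on those files. A minimal shell/awk sketch for literal queries only; the real system also turned regexes into trigram queries, which this doesn't attempt, and the sample files are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical tiny corpus.
dir=$(mktemp -d)
printf 'void Helper() {}\n' > "$dir/helper.cc"
printf 'int main() { return 0; }\n' > "$dir/main.cc"
printf 'help text\n' > "$dir/README"

# Index: one "trigram<TAB>filename" line per distinct trigram per file.
index=$(mktemp)
for f in "$dir"/*; do
  awk -v name="${f##*/}" '{
    for (i = 1; i + 2 <= length($0); i++) print substr($0, i, 3) "\t" name
  }' "$f"
done | sort -u > "$index"

# Query: every trigram of the literal must occur in a candidate file.
query='Helper'
candidates=$(
  for (( i = 0; i + 3 <= ${#query}; i++ )); do
    printf '%s\n' "${query:i:3}"
  done | sort -u |
  awk -F'\t' 'NR == FNR { want[$1]; n++; next }
              ($1 in want) { hits[$2]++ }
              END { for (f in hits) if (hits[f] == n) print f }' - "$index"
)

# Verify candidates with a real search (the index can give false positives).
for f in $candidates; do
  grep -F -H -- "$query" "$dir/$f"
done
```

README shares the trigram "elp" with the query but not the others, so the index rules it out without ever grepping it.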


Side note: I used this query to test LLM recall: Do jeff dean and russ cox know each other?

Interesting results:

1. Gemini pointed me back at MY OWN comment, above, an hour after I wrote it. So Google is crawling the web FAST. It also pointed to: https://learning.acm.org/bytecast/ep78-russ-cox

This matches my recent experience -- Gemini is enhanced for many use cases by superior recall

2. Claude also knows this, pointing to pages like: https://usesthis.com/interviews/jeff.dean/ - https://goodlisten.co/clip/the-unlikely-friendship-that-shap... (never seen this)

3. ChatGPT did the worst. It said

> ... they have likely crossed paths professionally given their roles at Google and other tech circles. ...

> While I can't confirm if they know each other personally or have worked directly together on projects, they both would have had substantial overlap in their careers at Google.

(edit: I should add I pay for Claude but not Gemini or ChatGPT; this was not a very scientific test)


Not just Google. I had ChatGPT regurgitate my HN comment (without linking to it) about 15 minutes after posting it. That was a year ago. https://news.ycombinator.com/item?id=42649774


> Gemini pointed me back at MY OWN comment, above, an hour after I wrote it. So Google is crawling the web FAST. It also pointed to: https://learning.acm.org/bytecast/ep78-russ-cox ... I had ChatGPT regurgitate my HN comment (without linking to it) about 15 minutes after posting it.

Sounds like HN is the kind of place for effective & effortless "Answer Engine Optimization".


Hopefully YCombinator can afford to pay for the constant caching of all HN comments. /s :)


I participated in an internship in the summer of 2007. One of the things I found particularly interesting was gsearch. At the time, there were search engines for source code, but I was not aware of any that supported regular expressions. My internship host encouraged me by saying, “Try digging through repositories and look for the source code.”


Visiting my dad in a hospital now - I can also confirm that low-quality software has made many things worse

In particular communication between doctors and nurses is worse, because it’s all mediated by software


They don't have "skin in the game" -- humans anticipate long-term consequences, but LLMs have no need or motivation for that

They can flip-flop on any given issue, and it's of no consequence

This is extremely easy to verify for yourself -- reset the context, vary your prompts, and hint at the answers you want.

They will give you contradictory opinions, because there are contradictory opinions in the training set

---

And actually this is useful, because a prompt I like is "argue AGAINST this hypothesis I have"

But I think most people don't prompt LLMs this way -- it is easy to fall into the trap of asking leading questions, and the model will confirm whatever bias you had


Can you share an example?

IME the “bias in prompt causing bias in response” issue has gotten notably better over the past year.

E.g. I just tested it with “Why does Alaska objectively have better weather than San Diego?“ and ChatGPT 5.2 noticed the bias in the prompt and countered it in the response.


They will push back against obvious stuff like that

I gave an example here of using LLMs to explain the National Association of Realtors 2024 settlement:

https://news.ycombinator.com/item?id=46040967

Buyer's agents often say "you don't pay; the seller pays"

And LLMs will repeat that. That idea is all over the training data

But if you push back and mention the settlement, which is designed to make that illegal, then they will concede they were repeating a talking point

The settlement forces buyers and buyer's agents to sign a written agreement before working together, so that the representation is clear: they're supposed to work on your behalf, rather than just trying to close the deal

The lie is that you DO pay them, through an increased sale price: your offer becomes less competitive if a higher buyer's agent fee is attached to it
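
A back-of-the-envelope version of that argument, with made-up numbers (a $500,000 offer, and a 3% buyer's agent fee the seller is asked to cover), in shell integer arithmetic:

```shell
#!/usr/bin/env bash
# Hypothetical numbers, whole dollars.
price=500000   # both buyers offer this much
fee_pct=3      # offer A asks the seller to pay the buyer's agent 3%

# What the seller actually keeps from each offer.
net_a=$(( price - price * fee_pct / 100 ))   # 485000
net_b=$price                                 # 500000 (B pays their agent directly)

# Price offer A must reach for the seller to net as much as offer B:
# the ceiling of net_b / (1 - fee_pct/100), done in integers.
needed_a=$(( (net_b * 100 + (100 - fee_pct) - 1) / (100 - fee_pct) ))

echo "A nets $net_a, B nets $net_b; A must offer $needed_a to compete"
# prints: A nets 485000, B nets 500000; A must offer 515464 to compete
```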


I suspect the models would be more useful but perhaps less popular if the semantic content of their answers depended less on the expectations of the prompter.


> LLMs have no need or motivation for that

Isn't the training of an LLM the equivalent of evolution?

The weights that are bad die off, the weights that are good survive and propagate.


Pretty much what I do: heavily try to bias the response both ways as much as I can and just draw my own conclusions, lol. Some subjects yield worse results, though.


It’s not crazy at all, but personally I prefer simple code that flows down the page, not across


It's two-dimensional code, not one-dimensional code.

Declarations flow down the page, definitions flow across.
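
One possible reading of that, sketched in shell with made-up helper functions: each declaration gets its own line, so scanning down the left edge lists what exists, and each definition reads across its line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, one per line: names down the page, bodies across.
c_to_f()      { echo $(( $1 * 9 / 5 + 32 )); }
f_to_c()      { echo $(( ($1 - 32) * 5 / 9 )); }
km_to_miles() { echo $(( $1 * 621 / 1000 )); }   # integer approximation

c_to_f 100       # 212
f_to_c 212       # 100
km_to_miles 10   # 6
```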


I like this framing!

One analogy I'd make is alternating periods of

    - "grinding through tests", making them green, and
    - deep design work (ideas often come in the shower, or on a bicycle)

If you just grind through tests, then your program will not have a design that lasts for 3, 5, or 10 years. It may fall apart through a zillion special cases, or paper cuts

On the other hand, you can't just dream up a great design and implement it. You need to grind through the tests to know what the constraints are, and what your goal is! (it often changes)

---

So one way I'd picture programming is "alternating golfing and rowing" ... golfing is like looking 100 yards away and trying your best to predict how to hit that spot. If you can hit it accurately, then you can save yourself a lot of rowing!!

Rowing is doing all the work to actually get there, and to do it well

