
I always enjoy reading these kinds of write-ups digging into why something is as fast as it is and how it interacts with a wider system. However, I do think that one should not arrive at the belief that this kind of optimisation is warranted everywhere, and that code simplicity can also be a goal. The classic argument is to compare OpenBSD’s yes [1] to GNU coreutils’ yes [2] and contemplate under which circumstances those additional MB/s of “y” will be critical enough to warrant the maintenance of more than one hundred additional lines.

[1]: https://cvsweb.openbsd.org/src/usr.bin/yes/yes.c?rev=1.9&con...

[2]: https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/yes...
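
For a rough sense of what those extra lines buy, here is a minimal, hypothetical C sketch of the two strategies. It is not the code of either project; the real GNU implementation additionally joins multiple arguments into one line, handles partial writes and reports errors.

  /* Hypothetical sketch of the two strategies; not the actual code of
     either project. */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  /* Naive shape (roughly what the simple implementations do): one
     puts(3) call per line, relying on stdio buffering. */
  static void yes_simple(const char *s)
  {
      for (;;)
          puts(s);
  }

  /* Buffered shape (the idea behind the GNU optimization): pack a large
     buffer with copies of the line once, then emit one write(2) per
     ~8 KiB instead of one library call per line. */
  static void yes_buffered(const char *s)
  {
      static char buf[8192];
      size_t len = strlen(s) + 1;        /* line plus trailing newline */
      size_t used = 0;

      if (len > sizeof buf) {            /* line too long to pack */
          yes_simple(s);
          return;
      }
      while (used + len <= sizeof buf) {
          memcpy(buf + used, s, len - 1);
          buf[used + len - 1] = '\n';
          used += len;
      }
      for (;;)
          if (write(STDOUT_FILENO, buf, used) < 0)
              return;                    /* stop on a write error */
  }

  int main(int argc, char **argv)
  {
      yes_buffered(argc > 1 ? argv[1] : "y");
      return 1;                          /* only reached on a write error */
  }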

Similarly, there is the NSFW comparison between Plan 9’s cat and GNU coreutils’ cat [3].

[3]: http://9front.org/img/longcat.png

Just to reiterate at the end, though: this is not an argument against optimisation and learning how to make something blazingly fast. But there is such a thing as optimising the wrong thing, and using speed as the only justification for merging a patch is probably not the right thing to do for a bigger project.



I carefully considered this before I optimized GNU yes.

The reason it is useful is that yes can output anything, and so can be used to produce arbitrary repeated data for test files etc.

You can see the justification detailed in the original optimization commit: https://github.com/coreutils/coreutils/commit/35217221


I have a history question. I've seen this link a few times: https://www.gnu.org/prep/standards/html_node/Reading-Non_002...

Did this advice from GNU inspire you to optimise `yes`, did your optimised `yes` inspire GNU to write this, or is there no historical connection between your optimised `yes` and this advice?


Trying to differentiate GNU implementations had nothing to do with it. This is never a consideration for me. It was worth the slight increase in complexity for the reasons stated in the commit message. Also, an unmentioned point is that the coreutils code is very often referenced, so it should be as robust and performant as possible, and those properties may percolate elsewhere.


Thanks, that makes a lot of sense.


So you actually wrote a new program. Yes was made for those pesky installers, not for producing large amounts of data. This would be my approach: keep yes simple and create a new program that does the data-spilling well.


And then you have a possibly confusing situation where you have two programs that do essentially the same thing, but one is faster, and the other is possibly not provided by default. As a user, in cases where it matters, you'd have to know about the issue and bother installing the new program. This is worse. As developers, I think it's our duty to make users' lives simpler, even if it makes our lives a bit more complicated. I'd argue that's what we are here for.

I guess there's no ideal solution. But I think the "new" program does what the first one did better, and does not do anything worse.

We are talking about a program that is still under 1,000 lines of code and that's not getting new features every month, or at all anyway, so maintainability does not seem to be a big issue?

I see the reasoning, but I don't see any actual practical drawback to having improved the original program directly in this specific case. I don't see any advantage to keeping "yes" dead simple either. The new version is still pretty readable, and the extra time it takes to read and modify it without making mistakes seems worth the advantages.


> And then you have a possibly confusing situation where you have two programs that do essentially the same thing

My point is, I'd never have thought of using yes for this purpose. So in this case, you could make a command called 'outputsomethingfast' and a command called yes that internally calls 'outputsomethingfast --output=yes' or something like that.

To me this is way more logical, and more in line with the Linux philosophy, right?


I guess. I don't like the name "yes" and I think we would have been better off with a more general name since the command is general, but now it's there, so…

However, this is independent of this optimization; "yes" already had this feature of outputting anything, I think?

But I expect this kind of accident to happen in any working system that has been around long enough. This seems unavoidable, so we'd probably better put up with this kind of mess.


Touché, and there is already a program for the exact purpose of generating large amounts of data: jot(1)

https://manpage.me/?q=jot


Or if homogeneous data is fine, just cat or dd from a source like /dev/zero.


I think I've seen /dev/random used for generating large amounts of garbage data as well. (Though usually my problem is too much garbage data, not too little.)


And here is the problem. Now we have 2 programs that do essentially the same thing.


I really don't want to have to learn 12342384 programs. It's much less discoverable than having a few programs with a --help (and, more generally, a tree-based organization of functionality on your computer).

Also, if there's a new program, say "fastrepeat", wouldn't that be a duplication of functionality between "yes", which just outputs "y", and "fastrepeat 'y'"? That's, like, even more bloat, since now you need both.


I would much rather have a very easy to remember command that does one thing and one thing only (namely, what it says on the tin) than to have to remember or dig through a whole slew of command line options in order to get 'yes' to become the equivalent to 'no' or 'cat'.


> I would much rather have a very easy to remember command that does one thing and one thing only

Well, I definitely don't. I don't want to encumber my mind with a name for every single one of the 25000 "one thing" things I have to do.


Really you want one program that outputs data, with yes as an alias for outputdatafast --data="y".


  > However, I do think that one should not arrive at the belief that this kind of
  > optimisation is warranted everywhere and that code simplicity can also be a goal
Definitely not warranted everywhere, especially if premature and at that level of detail, but core utils like `cat` and `yes` are IMO prime examples of where such optimizations are warranted:

- They're in use daily on a huge number of setups, so even small benefits add up far more than in some niche tool.

- They have a clear and small feature set that won't change anytime soon, so there won't be much code churn, and thus maintenance effort will stay relatively low.

  > Similarly, there is the NSFW comparison between Plan 9’s cat and GNU coreutils’
  > cat [3].
  >
  > [3]: http://9front.org/img/longcat.png
IMO that isn't an entirely fair comparison though, as the difference is not only in optimizations but also to a large extent in boilerplate license/copyright comments and in features like line numbering, modes for showing (non-printable) special characters or whitespace, and option parsing for said features.

Strip all that out and you're left with (eyeballed) about 1/3 of that, and 200 lines for an extremely fast core util is really not much, nor hard to maintain, as it won't get any new features soon anyway.
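
To put the remaining third in perspective, the core copy loop of a cat really is tiny. Here is a minimal, hypothetical sketch (not the Plan 9 or GNU source); roughly everything GNU cat has beyond this is option parsing, line numbering, display of non-printable characters, error reporting and I/O-size tuning:

  /* Hypothetical minimal cat: copy stdin, or each named file, to stdout.
     No options; a partial write is treated as an error for simplicity. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  static int copyfd(int fd)
  {
      char buf[65536];
      ssize_t n;

      while ((n = read(fd, buf, sizeof buf)) > 0)
          if (write(STDOUT_FILENO, buf, n) != n)
              return 1;
      return n < 0;                 /* 1 on read error, 0 on EOF */
  }

  int main(int argc, char **argv)
  {
      int i, fd, status = 0;

      if (argc < 2)
          return copyfd(STDIN_FILENO);
      for (i = 1; i < argc; i++) {
          if ((fd = open(argv[i], O_RDONLY)) < 0) {
              perror(argv[i]);
              status = 1;
              continue;
          }
          status |= copyfd(fd);
          close(fd);
      }
      return status;
  }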


Right. Faster generally means less CPU used for a particular task. For example, I made ls(1) a bit faster recently. Even though it's not really noticeable per run, given how often ls is run I estimated this to save 9 tonnes of carbon emissions per year. https://twitter.com/pixelbeat_/status/1511030095011581953


This comment rubs me all the wrong ways possible.

Is the posted link supposed to be a reference? That is your own tweet, i.e. a self-reference boasting about your own contribution.

In the tweet itself you say "estimate"! How do you arrive at such a grandiose estimate?

How do you attribute a saved carbon footprint to an optimization in a command line tool? You cannot even approximate that. I would argue that such tiny optimizations make 0, nil difference in overall energy consumption on my local machine, all my laptops and all the servers in this building.

I'm not saying that we shouldn't run optimized code, but anyone can throw around random numbers.


> and in features like line numbering, modes for showing (non-printable) special characters or whitespace and option parsing for said features.

I remember a page somewhere saying the Plan 9 people were very much against making cat a generic tool for all those use cases; it should do what it says in the name: concatenate files.


I agree with your premise, that additional complexity is not always worthwhile. But I don't think this "classic argument" is very strong. Hardly even an argument at all.

You compare two things and see one is longer and more complex than the other. How much cost is that really? 10x more code sounds bad, but 100 more lines might put it in perspective. And how complex is the code really? And what is the benefit? A common complaint seen on OpenBSD lists is that performance is behind competitors, so you could take a bunch of those complaints and make an equally sound argument the other way.

I will say that a lot of the tools and libraries I have seen the hell optimized out of (and functionality added to) allow solutions to be put together which would be infeasible or impossible with simple/naive implementations. More layers, custom code, or extra complexity can be avoided. Say a database layer could be avoided if filesystem operations are fast enough. Or a shell script plus command line tools can be used instead of writing a new program if fork+exec+exit, context switching, pipe IO, and these kinds of tools (yes and cat) are fast. If malloc+free are fast, then you don't need to write your own caching allocator in front of them. Etc. So you might end up with an end-to-end solution that meets your requirements and actually has less code, or at least less bespoke complexity and more that is long maintained and used by many.


I think the main reason that GNU tools are so heavily optimized is copyright. [0] They just try to avoid any copyright claims from the old proprietary UNIX tools, like the copyright on an empty file. [1]

[0]: https://www.gnu.org/prep/standards/html_node/Reading-Non_002...

[1]: http://trillian.mit.edu/~jc/humor/ATT_Copyright_true.html


Even if those additional MB/s are not critical for any particular application, tools like these are run a humongous number of times every day throughout the world, so while I have no data, I suppose the total CPU usage could be high enough that the global energy/carbon savings from this kind of optimization would be relevant.


I have never heard of anyone using yes to generate large data files/streams. Does anyone actually use yes in a way where its daily CPU usage is bigger than a rounding error?

Also, the very simple FreeBSD implementation [1] is not too slow on my 10-year-old notebook:

  > time yes test_string | dd of=/dev/null bs=1M count=65536
  ...
  2646329200 bytes transferred in 1.709196 secs (1548288918 bytes/sec)
  0.023u 0.850s 0:01.71 50.8% 5+166k 0+0io 0pf+0w
Firefox has probably used more CPU time while I was composing this comment, thanks to JS (in other tabs; HN is a rare example of a site which doesn't abuse my CPU). FF is almost always on the first line in top.

If you check top/powertop on a typical desktop or server, you'll likely find better targets than 'yes' for reducing energy use.

[1] https://github.com/freebsd/freebsd-src/blob/main/usr.bin/yes...


I don't think it works that way.

While in general efficient programs can save energy, the "saved" MB/s do not necessarily correlate with saved joules. There is no direct cost per instruction; there is a large overhead from the machine simply being switched on. And it's not like you will always be able to "use" those saved MB/s for something else.

And you entirely neglect the "cost" of optimizing. The time spent looking at inefficient code alone probably costs more energy than all the energy actually saved by a single change.

Consider the time someone could have spent on something else, with significantly more impact.


I'm not an expert on hardware, but I do know that power consumption depends on CPU load. A laptop battery drains faster and fans run faster when the CPU is working than when it is idle. More MB/s mean fewer seconds and therefore more idle time. So how could running a program that executes more instructions and takes longer to do the same work not have a cost?

About the second part of your comment, it's true that optimizing has a cost, and if I were creating a "yes" or "cat" program for personal use it would obviously be pointless to optimize. But if it's a program run often by millions of people (probably more true of "cat" than "yes"), it's not that obvious to me that millions of little savings cannot offset the time of one person optimizing.


So much agreed. The computing operations where saving instructions equals saving energy are really few and far between. Or non-existent.


Once the good-enough version has already been written and has been sitting around for a few decades, you need an excuse not to get around to optimizing it sooner or later.

The maintenance argument isn't a good enough one. It's a factor, just not a strong one.

There is no reason for every tiny bit of something as foundational as the OS not to just get better and better forever.

The only reason we write things down in the first place is so that we can do the work once and then refer to it many times without having to re-create it each time. So there is very little argument for keeping a program small and simple like the first version.

Some, just not much, because black-boxifying that complexity is what writing (be it a legal document or a program) is for in the first place. Making a more sophisticated, better-performing version 2 of something is simply using the tools of writing, which ultimately exist for no other purpose.

It does go the other way too. Version 3 could be to invest yet more brainpower into figuring out how to get the same performance in fewer operations. And on that day someone will wonder whether it's worth optimizing when it already works fine and compute resources are infinite. The answer then, as now, will be the same: "Yes. Of course."


> Similarly, there is the NSFW comparison between Plan 9’s cat and GNU coreutils’ cat [3].

You know, if there were an SFW version, it would nicely illuminate the complexity of getting similar stuff done and handling edge cases and whatnot.

That said, I think the Plan 9 version could use a few comments to decrease the cognitive load, since individual bits of code felt more approachable to me in the GNU version.

Though with the code itself being shorter, one-liners or even just a few lines at the top of the function definition could be sufficient.


When to optimise [1], and "premature optimisation", have been an ongoing concern in computer science ever since performance constraints and limits were first identified (i.e. since forever).

[1]: https://en.wikipedia.org/wiki/Program_optimization


Re [3]: yeah, so you mean a program which is faster, better documented, and has more functionality has longer source code? Hm, yeah?

> using speed as the only justification for merging a patch is probably not the right thing to do for a bigger project

I don't get this. Speed is important. Energy efficiency is important. Have we gotten so used to the bloat that performance and energy savings must be disregarded?



