
Awk is cool. It's a full-fledged programming language that's there in anything remotely unix-flavored, but I mostly see it used in one-liners to grab bits of text from piped stdout.

But you can use awk as a general-purpose scripting language [1], in many ways it's nicer than bash for this purpose. I wonder why you don't see more awk scripts in the wild. I suppose perl came along and tried to combine the good features of shell, awk, and sed into one language, and then people decided perl was bad and moved on from that.

[1] Random excerpt from NetBSD's source code https://github.com/NetBSD/src/blob/trunk/sys/dev/eisa/devlis...
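For a flavor of what "awk as a scripting language" looks like beyond one-liners (a toy word-frequency sketch of my own, not taken from the NetBSD file above):

```shell
# Hypothetical example: a standalone awk program rather than a one-liner.
# It counts word frequencies across all input lines, then reports them
# sorted by count; the sample input here is made up.
printf 'the quick fox\nthe lazy dog\n' | awk '
{
    # NF and $i come for free: awk splits each line into fields for you
    for (i = 1; i <= NF; i++)
        count[$i]++
}
END {
    for (w in count)
        printf "%d %s\n", count[w], w
}' | sort -rn
```

The pattern/action structure plus associative arrays is most of what you need for small utilities like this.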



You nailed it. Perl replaced awk and then turned out to be counterproductive in a lot of cases because there was no simple and broadly understood way for people to write Perl code that was 1) readable for other programmers and 2) scalable to medium-to-large programs.

Which is not to say that nobody ever figured out those things and did them well, just that the success rate was low enough across the industry to earn Perl a really bad reputation.

I'd like to see a revival of awk. It's less easy to scale up, so there's very little risk that starting a project with a little bit of awk results in the next person inheriting a multi-thousand line awk codebase. Instead, you get an early-ish rewrite into a more scalable and maintainable language.


> I'd like to see a revival of awk. It's less easy to scale up, so there's very little risk that starting a project with a little bit of awk results in the next person inheriting a multi-thousand line awk codebase. Instead, you get an early-ish rewrite into a more scalable and maintainable language.

Taco Bell programming is the way to go.

This is the thinking I use when putting together prototypes. You can do a lot with awk, sed, join, xargs, parallel (GNU), etc. It's really not a lot of effort to abstract in a bash script, so the code stays compact. I've built many data engineering/ML systems with this technique. Those command line tools are SO WELL debugged and have such reasonable error behavior that you don't have to worry about the complexities of exception handling, etc.
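A minimal sketch of that prototyping style, with made-up filenames and fields: plain files on disk, standard tools fanned out with xargs, and awk doing the aggregation.

```shell
# Invented demo data: two CSV shards of key,value pairs.
tmp=$(mktemp -d)
printf 'a,1\nb,2\n' > "$tmp/part1.csv"
printf 'a,3\nc,4\n' > "$tmp/part2.csv"

# Feed the shard list through xargs, sum values per key with awk.
printf '%s\n' "$tmp"/part*.csv |
xargs cat |
awk -F, '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' |
sort

rm -r "$tmp"
```

Swap `xargs` for GNU `parallel` when the per-shard step is expensive; the aggregation stays the same.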


The problem Perl and the like have to contend with is that they have to compete with Python. If a dependency needs to be installed to do something, you have to convince me that whatever language and script is worthwhile to maintain over Python, which is the de facto thing people reach for after bash. The nice thing about awk is that it’s baked in, so it has an advantage. You can convince me awk is better because I don’t have to deal with dependency issues, but it’s a harder sell for anything I have to install before I can use it.

And it’s not even that Python is a great language. Or has a great package manager or install situation. It doesn’t have any of those things. It does, however, have the likelihood of the next monkey after me understanding it. Which is unfortunately more than can be said about Perl


> The problem Perl and the like have to contend with is that they have to compete with Python. If a dependency needs to be installed to do something, you have to convince me that whatever language and script is worthwhile to maintain over Python, which is the de facto thing people reach for after bash

A historical note: Perl was that language before Python was, and it lost that status to Python through direct competition. For a while, if you had to do anything larger than a shell script but not big enough to need a "serious" C++ or Java codebase, Perl was the natural choice, and nobody would argue with it (unless they were arguing for shell or C.) That's why Perl 5 is installed on so many systems by default.

When I first started using Python, I felt a little scared for liking it too much. I thought I should be smart enough to prefer Perl. Then Eric Raymond's article about Python[1] came out in Linux Journal in 2000, and I felt massive relief that a smart person (or someone accomplished enough that their opinions got published in Linux Journal) felt the same way I did. But I still made a couple more serious attempts to force Perl into my brain because I thought Perl was going to be the big dog forever and every professional would need to know it.

But Perl was doomed: if Python didn't exist, it would have lost to Ruby, and if Ruby didn't exist, it would eventually have lost to virtually any language that popped up in the same niche.

[1] https://www.linuxjournal.com/article/3882


Perl is installed by default on most Unix systems. FreeBSD being the exception. Python isn’t. Python is popular, but if we’re comparing the probability of someone having the interpreter installed already, it’s greater for Perl, even if people aren’t aware they already have it.


> Perl is installed by default on most Unix systems. FreeBSD being the exception. Python isn’t.

Most “Linux” systems cannot even boot without Python, and it is quite easy to find a minimal Linux distribution that does not have a dependency on Perl.


Care to give any examples of Linux distros without Perl?


Though one would probably never be able to rely on an assumed install of Python anyway, because one would not be able to assume a specific version. I am guessing this is a lesser problem for Perl, since it’s been frozen at some version of 5 for the past 25-30 years, correct?


Incorrect. Perl (5.x) has seen 10 stable releases in as many years, IOW since 2014.


Bingo. I would argue Ruby has the quality of being a great language the next person can understand, but I think Rails has prejudiced people.


Agree, Ruby is a wonderful language that is unfortunately dominated by an opinionated Web framework.

Job descriptions tend to look for Rails developers, forgetting that it's actually Ruby developers they are looking for.


May I introduce you to https://bashsta.cc/


What Perl nailed was being useful for writing cross-platform shell scripts. Agree that it didn’t scale up, but you had a chance of delivering on n platforms with minimal pain.

awk v gawk doesn’t make me want to relive those days.


> awk v gawk doesn’t make me want to relive those days

That's a fair point. I always explicitly write my scripts to invoke gawk so that I don't accidentally invoke a different version.


I’ve seen enough awk behemoths in my time, no thanks.


I’m curious: what are some problems where awk was (presumably) a reasonable choice at first but then the implementation grew into a behemoth? Did the solution need to grow as the problem grew? Or was awk just the wrong choice from the beginning?


One case where I saw it used was for processing genomics data. It was kinda ok at first, but when we needed to add a new sequencing type it was laborious.

Personally I don’t think awk is a good choice for anything beyond one liners and personal scripts. Here it was fine because it was (initially) some write-once academic code that needed to not be insanely slow.


> then people decided perl was bad and moved on from that.

Screw what people think. I found out I like Perl. The last thing I wrote is a programmatic partition editor [1]: like using sfdisk to zero out the partitions, except I wanted to do more than zap, like combining the MBR and GPT partition tables to make hybrids.

It was fun, and I will use Perl again (I may also use awk at some point now that I see how cool it is).

[1] https://github.com/csdvrx/hdisk/


Perl is a great programming language, unless true.


Valid comment, unless $is_bad_example. I genuinely like the use of unless in Perl. There are lots of times when it's nicer to express inverse logic. You could rename a variable to have the inverse truthiness and avoid writing "if not" everywhere, or you could accept that you often need to deal with inverse logic on something and use the right construct for it.


Awk is incredibly useful. I wrote a script this week to parse Postgres logs (many, many GB) to answer the question, "what were the top users making queries in the first few minutes at the top of every hour?" [0] Took a couple of functions, maybe 20 LOC in total, plus some pipes through sort and uniq [1]. Also quite fast, especially if you prefix it with LC_ALL=C.

[0]: If you're wondering why there wasn't adequate observability in place to not have to do this, you're not wrong, but sometimes you must live with the reality you have.

[1]: Yes, I know gawk can do a unique sort while building the list. It was late into an incident and I was tired, and | sort | uniq -c | sort -rn is a lot easier to remember.

[1].a: Yes, I know sort has a -u arg. It doesn't provide a count, and its unique function is also quite a bit slower than uniq's implementation.
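A hedged sketch of that kind of script, against an invented log layout: real Postgres lines depend on your log_line_prefix, so the field positions, user names, and timestamps below are all assumptions for illustration.

```shell
# Made-up log lines standing in for many GB of Postgres output.
printf '%s\n' \
  '2024-01-01 14:00:05 UTC user=alice LOG: statement: SELECT 1' \
  '2024-01-01 14:01:10 UTC user=bob LOG: statement: SELECT 2' \
  '2024-01-01 14:30:00 UTC user=alice LOG: statement: SELECT 3' \
  '2024-01-01 15:02:59 UTC user=alice LOG: statement: SELECT 4' |
LC_ALL=C awk '
{
    split($2, t, ":")            # HH:MM:SS from the time field
    if (t[2] + 0 < 3) {          # keep the first few minutes of each hour
        sub(/^user=/, "", $4)    # strip the assumed user= prefix
        print $4
    }
}' | sort | uniq -c | sort -rn
```

Same shape as described above: a little awk to filter and extract, then `sort | uniq -c | sort -rn` for the ranking.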


You can do that in far fewer lines of code in Python, with much better performance.


I suspect the performance part is only true if you're familiar with Python's stdlib performance quirks, like how `datetime.fromisoformat()` is orders of magnitude faster than `datetime.strptime()` (which would likely be the function you'd reach for if not familiar), or at the very least, that directly casting slices of the string into ints falls in between the two. This example is parsing a file of 10,000,000 ISO8601 datetimes, then counting those between `HH:00:SS – HH:02:SS` inclusive. The count method is the same, and likely benefiting from some OS caching, but the parse times remained constant even with repeated runs.

    $ python3 times_python.py strptime_listcomp
    parse time: 45.96 seconds
    count time: 0.54 seconds
    count: 498621

    $ python3 times_python.py slices
    parse time: 9.96 seconds
    count time: 0.40 seconds
    count: 498621

    $ python3 times_python.py isofmt
    parse time: 0.80 seconds
    count time: 0.38 seconds
    count: 498621


> then people decided perl was bad and moved on from that.

That's a large part of what's driving Awk's renaissance: devs who never learned Perl to begin with want something to fill the gap between shell and Python, plus other devs like me who (reluctantly) abandoned Perl because it was deemed "uncool" by HN types, which means Perl and all code written in it now has an expiration date on it. But since Awk is a POSIX standard, HN types can't get rid of it.


"HN types" can't get rid of perl either. So just use perl if you want to. Personally I think perl is a terrible language and that anything which is too complex for a shell script (which is most things) should just be done in python. But if you disagree, it's not like anyone can stop you. If your issue is "my teammates hate it and want me to use something else", I promise you they will be just as annoyed if you use awk.


I'm pretty sure people have read more perl code than awk code, so they'll roll their eyes but will be able to cover for perl-required tasks, but won't build up the courage to touch the awk.

To me, hell is having to debug perl scripts other people wrote. Based on experiences in the 1998 time frame.


Hell is having to debug perl scripts you wrote yourself long enough ago that you've forgotten and the developers of the packages you depend on have literally died of old age.

Because you have only yourself to blame.


Took a contract job where I was writing a thing in Perl to connect a billion-dollar collection of ecommerce sites to Facebook ads in 2019.


OMG. I hope the pay was excellent.


What is an "HN type"? HN has smart, dumb, and in-between people of every variety. That's the byproduct of encouraging curiosity. I don't think HN was even around in any sort of prominence when Perl died.

I stopped using Perl because my Eggdrop bots got laborious, along with my expect scripts. They were novel early on, but maintaining them became something of a chore I wasn't inclined to do anymore. Other things started to do it better.

Personally, I think Python won because its syntax was much more readable. It had nothing to do with technical merits. DX is a strong subliminal motivator.


Can't wait for the day I will be able to compile Linux without Perl.


You can do that right now. Sabotage Linux rewrote the one and only perl dependency in awk years ago.

Grab https://raw.githubusercontent.com/sabotage-linux/sabotage/ma... and clobber the perl script in the Linux source.

Then: sed -i -e 's@perl $(srctree)/$(src)/build_OID_registry@awk -f $(srctree)/$(src)/build_OID_registry@' ./lib/Makefile

They also removed the perl dependency for ca-certificates since one of the goals was to remove perl dependencies from the core system including its toolchain and kernel. It's not needed at all now.

This Aho project is neat because it has the potential to remove the perl dependency for having a git client, which was a problem before.


that's the day I'll eat my red hat


I love AWK. Its stringly-typedness would make a Javascript programmer blush: 0, “”, and actually-nothing are identically falsey. Numbers are no different from strings representing numbers, like Lua. Somehow, I don’t mind — if you really need to keep your numbers numbers and strings strings, sigilate the value (prepend with #/$) and peel it off with a substr() later.
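A quick demonstration of that looseness, runnable as-is: 0, the empty string, and a never-assigned variable all test false, and forcing a numeric string to a number makes it compare equal to the number.

```shell
awk 'BEGIN {
    # 0, "", and an unset variable are all falsey in awk.
    if (!0 && !"" && !unset)
        print "all falsey"

    # Adding 0 coerces the string "3" to a number.
    if ("3" + 0 == 3)
        print "string 3 is number 3"
}'
```

Note the explicit `+ 0`: comparing a string constant against a number without it falls back to string comparison under the POSIX rules, which is one of the places the loose typing can surprise you.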


I like the way Lua handles strings and numbers quite a bit. They're different types, but arithmetic will convert a string to a number when it can.

This is without footguns, because concatenation is a different operator entirely.

    2 + "3"  -- 5 
    2 .. "3" -- "23"
It's rather convenient, especially when you can combine it with LPeg to parse files with numbers in them, then do the maths directly.


Lua does not represent numbers with strings. It does have a number type, which is always floating-point.


Integers were introduced in 5.3.


I have written scripts in awk (what seems a lifetime ago!), bash, then perl, ruby and python, in that chronological order. I think awk scripting didn't take off for the masses because, while it was good at its goal, (1) it was a bit niche; the common knowledge people brought to working on unix systems was bash and awk/grep/sed one-liners, and learning awk properly was seen as work with only specialized gains, and (2) yes, Perl sort of provided a sane alternative to the mix of shell scripts, magic one-liners and awk scripts. Of course, later it was supplanted by Python (transitioning through a brief period of Ruby).

Reading legacy scripts was wild back then - you had to be somewhat good at bash, unix tools like awk, C, make, Perl:-)


Awk isn't readable. Modern programming languages can do what awk does in far fewer lines of code, and all of them have better facilities for working with text and strings.


> far fewer lines of code

Doubtful. awk has a lot of implicit behaviour which allows the programmer to write very terse scripts. An equivalent Python program is usually several times longer.
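For instance, filtering and aggregating happen in one expression because the read loop and field splitting are implicit (sample data invented):

```shell
# No open(), no readline loop, no split(): awk supplies all of that.
printf 'widget 3\ngadget 5\nwidget 4\n' |
awk '/^widget/ { total += $2 } END { print total }'
```

The equivalent Python needs the file iteration, the `str.split()`, the int conversion, and the accumulator spelled out by hand.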


Disagree on both counts.

For the problem domain that Awk targets, it's close to as good as it gets. A lot of the line reading, delimiting, chunking etc. is already done for you. You, the programmer, don't have to re-implement that same old ceremony. You get straight to the point of writing the transformations that you want.

If you break out into Python for that, it's a bit like being at your best and most formal self, with your best table manners, the first time you meet your fiancé's parents. Awk, on the other hand, is like being with your old childhood pal, your partner in crime from your old street, where the need for such ceremony does not exist.

With modern awks you can write extensions for them. Mawk is plenty fast too.

One thing that still grates though is that the 'split' function does not come with a corresponding 'join' and one has to iterate through the array explicitly to join.
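A hand-rolled join along those lines (the function name and extra-parameter locals are my own convention, not anything built in):

```shell
awk '
# awk scopes "locals" by declaring them as extra, unpassed parameters.
function join(arr, n, sep,    i, s) {
    s = arr[1]
    for (i = 2; i <= n; i++)
        s = s sep arr[i]
    return s
}
BEGIN {
    n = split("a:b:c", parts, ":")
    print join(parts, n, "-")
}'
```

(gawk users can reach for the `join()` shipped in gawk's bundled awklib instead, but POSIX awk has nothing.)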


enough talk, show me the code


seek and ye shall find. Enough open source awk out there


> awk isn't readable.

I mean, Russian is also unreadable, until you know how to read it.

awk's power for me isn't the LOC needed to accomplish a task, its power is that I can express the business logic I need very easily and very quickly, and the resulting code is really fast. I am by no means great with awk, but I can go months without touching awk, encounter some problem where awk shines, and in a few minutes or less I have exactly what I need.

you can learn to be extremely productive with awk in a few hours and it's very comforting to have this in your toolset moving forward. Essential? Probably not, but I like that I don't need to break my thought process when working with awk because it's just so natural to express what I want awk to do that I don't really "think" about writing awk, I just write it.


> people decided perl was bad

Like any programming language, you have to get good at Perl to write good Perl.

Python has clean notation. It's a juggernaut that has changed people's expectations of language design.

But even so, Python has not decreased the amount of bad code in the world. Not even within PyPI.


Does that excerpt start with an if-else sequence?




