Unix pipes lack any ability to say "this is the kind of data that I'm pushing you." They push raw, unidentified bytes. If the downstream program had some idea what format the data was in, it would know how to present it.
Imagine an alternate universe where a pipe was a stream of bytes and a MIME type. If a terminal gets data written to it from a program in application/terminal.vt100, then it knows it should process those escape sequences. If it gets text/plain; charset=utf-8, it knows it should not, and can take a different action.
I think such a universe would make for more pleasant piping, too. Imagine that every interactive command on a terminal ended with an imaginary |show command; its job is to read the MIME type and display that data on the terminal. A command that emitted CSV data could then automatically render in the terminal as an actual table. A program that emitted a PNG could render an ASCII art version of that image (or, since we're in an imaginary universe, imagine our terminal has escape sequences for images – which actually exist in some terminals today – it could emit the actual image!). Essentially, the program emits the data in whatever form it natively speaks, and that implicit |show parses that into something appropriate for the terminal. (That way, the program can still be used in pipelines, such as make_a_png | resize_it.) If you want your raw bytes, then you can imagine
and then the final, implicit |show would perhaps output a nice hexdump for you. (Since emitting raw bytes to a terminal never makes sense.)
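The imaginary |show could be sketched as a dispatcher on the declared type. Everything here is hypothetical — no such command exists — but a minimal Python version might look like:

```python
import binascii
import csv
import io

def show(data: bytes, mimetype: str) -> str:
    """Render piped bytes for a terminal based on their declared MIME type."""
    if mimetype.startswith("text/csv"):
        # Render CSV as a simple column-aligned table.
        rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
        widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
        return "\n".join(
            "  ".join(cell.ljust(w) for cell, w in zip(row, widths))
            for row in rows
        )
    if mimetype.startswith("text/plain"):
        return data.decode("utf-8")
    # Unknown/binary types: raw bytes never make sense on a terminal,
    # so fall back to a hexdump.
    return "\n".join(
        f"{i:08x}  {binascii.hexlify(data[i:i + 16], ' ').decode()}"
        for i in range(0, len(data), 16)
    )
```

The point is that the dispatch happens on the type the upstream program declared, not on guessing at the bytes.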
Now, I'm a bit fuzzy on exactly how the pipeline being executed, the shell, and the terminal all interact, and I'm nearly certain such a change would require low-level, POSIX-breaking changes. But it was a dream I had, and I think it might be a better model than what we work with presently.
(In the article's case, though, git log/diff both emit terminal sequences, so even in my imaginary world, they'd need to know that they should escape them. It mostly works for simpler types, but git rm could conceptually emit just text, since I don't think it ever colors.)
> and I'm nearly certain such a change would require low-level, POSIX-breaking changes.
Well, as long as you were happy with it being opt-in (i.e., every currently existing program and terminal defaulting to just ignoring the content type as it does now), I imagine you could get 99% of the way there by adding two new command types to the fcntl(2) syscall:
F_SETCTYPE
      Set the content type for the data written to this
      descriptor to the value pointed to by the third
      argument, which must be a pointer to struct fctype.

F_GETCTYPE
      Get the content type for the data read from this
      descriptor and store it in the structure pointed to
      by the third argument, which must be a pointer to
      struct fctype.
struct fctype would just contain a character array.
You'd implement it so that the value set by F_SETCTYPE on the writing end of a pipe or named pipe is returned by F_GETCTYPE on the reading end; for pseudo-terminals, the value set by F_SETCTYPE on the slave side is returned by F_GETCTYPE on the master side.
You probably wouldn't bother supporting it for sockets or disk files (though files could store it in a POSIX extended attribute if you really wanted to).
The writing side would have to set the content type before calling write(), and the reading side would have to get the content type after read() returns.
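Those fcntl commands don't exist today, but the write-then-read ordering can be emulated in userspace by framing the content type as a small header on the pipe itself. A Python sketch (the length-prefixed header framing is my invention, not part of the proposal, which would keep the type out-of-band in the kernel):

```python
import os
import struct

def write_typed(fd: int, ctype: str, payload: bytes) -> None:
    """Emulate F_SETCTYPE: send a length-prefixed content type before the data."""
    encoded = ctype.encode("utf-8")
    os.write(fd, struct.pack("!H", len(encoded)) + encoded + payload)

def read_typed(fd: int) -> tuple[str, bytes]:
    """Emulate F_GETCTYPE: read the content-type header, then the data."""
    (n,) = struct.unpack("!H", os.read(fd, 2))
    ctype = os.read(fd, n).decode("utf-8")
    chunks = []
    while chunk := os.read(fd, 4096):  # until EOF
        chunks.append(chunk)
    return ctype, b"".join(chunks)

r, w = os.pipe()
write_typed(w, "text/csv", b"a,b\n1,2\n")
os.close(w)  # close the write end so the reader sees EOF
ctype, data = read_typed(r)
os.close(r)
```

The in-band header is exactly the kind of hack the kernel-side proposal avoids: with real F_SETCTYPE/F_GETCTYPE, programs that ignore the type would still see only the raw bytes.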
This is one of the problems that I've been attempting to solve with Strukt[1]. Its operations work by processing a stream of objects (which have named keys, and values with real types), rather than bytes.
You're right that shoving raw bytes to a terminal doesn't make sense. I display binary data as a picture of the bits themselves. After a little while, it becomes surprisingly informative. You can spot patterns like "ASCII(ish) text" or "all 0's".
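Strukt's actual rendering is graphical, but the idea can be sketched in a few lines of Python — each byte becomes a row of eight filled/empty cells, which is what makes patterns like runs of zeros or ASCII(ish) text jump out. (The glyph choice here is mine, not Strukt's.)

```python
def bit_picture(data: bytes, width: int = 8) -> str:
    """Render bytes as a picture of their bits: '#' for 1, '.' for 0."""
    lines = []
    for i in range(0, len(data), width):
        row = data[i:i + width]
        lines.append(" ".join(
            f"{b:08b}".replace("0", ".").replace("1", "#") for b in row
        ))
    return "\n".join(lines)
```

ASCII text shows up as a column of mostly-dark low bits with the high bit always clear; all-zero regions are solid dots.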
You're right, too, that once you start tugging on one corner, the whole thing starts to come apart. That's why I didn't even bother trying to make it a "Unix shell". There are just too many issues with trying to remain backwards compatible that far back. When the operations aren't programs, for example, I can optimize between them.
Unfortunately, perhaps, "people who spend money on software" and "people who are looking for a Unix shell" are pretty much disjoint sets, so I'm initially going after markets like EDA and ETL.
Your website, BTW, serves itself as ISO-8859-1¹, but the actual data is UTF-8. The result is that any non-ASCII text is mojibake (such as the degree/minute/second symbols in the location example, or in the second text example).
¹you serve a Content-Type of text/html, with no charset, and you have no <meta> charset in the HTML itself, so this is the default.
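The failure mode is easy to reproduce: UTF-8 bytes reinterpreted as ISO-8859-1 turn each multi-byte character into two or more Latin-1 characters. (The coordinate string below is illustrative, not the site's actual content.)

```python
text = "48°51′24″"                  # degree/minute/second symbols
raw = text.encode("utf-8")          # what the server actually sends
mangled = raw.decode("iso-8859-1")  # what a browser defaulting to Latin-1 shows
# The two-byte UTF-8 sequence for "°" (0xC2 0xB0) decodes as "Â°".
print(mangled)
```

This is the same "bytes without a declared type" problem as the pipes discussion, one layer up the stack.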
> I display binary data as a picture of the bits themselves.
And after looking at that on the website, that is a very interesting approach.
Thanks! I didn't realize that, since it worked fine in all of my browsers, and nobody has said anything about it.
It's a simple HTML page (from some Jekyll templates), served up through an AWS S3 bucket, and I just learned that while it auto-detects MIME types, it doesn't do this for encodings. Fortunately, there is a way to specify it by hand [1].
> And after looking at that on the website, that is a very interesting approach.
I've been surprised by the response to this. It's the 5th or 6th design I tried, and the first one I didn't totally hate, but I've had a user tell me it's super cool and I should feature it more prominently.
One of my philosophies is "When in doubt, show the user their data".
PowerShell does something similar to what you're describing, but arguably better.
With PowerShell, instead of passing text strings between programs, you actually pass CLR objects between programs.
PS does a lot of things very nicely; it's a great tool for sysadmins, with more safety than Bash. The verbose syntax (e.g. Get-ChildItem vs ls) has been a big stumbling block the few times I've used PS, but it makes sense from the point of view of writing a script to be executed more than once; it's a much more readable language than Bash.
I'm a Linux user (my Windows days predated PowerShell), so I'm unfamiliar here.
I think the biggest concerns such a setup gives me would be:
1. Is whatever encoding the CLR uses for IPC efficient enough for all purposes?
2. Can you stream objects / push multiple objects? (Some programs, such as tar, generate potentially massive amounts of output that you cannot buffer.)
Otherwise, that actually sounds a good deal stronger than my initial suggestion, and potentially a lot more powerful. (I think I just wonder about its generality & performance.)
The problem largely still is the effort of rewriting all (or enough) userspace tools to actually use the type system to make it worthwhile, or the even less appealing approach of maintaining parsers/shims/wrappers that allow existing tools to work.