Unix pipes lack any ability to say "this is the kind of data that I'm pushing you." They push raw, unidentified bytes. If the downstream program had some idea what format the data was in, it would know how to present it.
Imagine an alternate universe where a pipe was a stream of bytes and a MIME type. If a terminal gets data written to it from a program in application/terminal.vt100, then it knows it should process those escape sequences. If it gets text/plain; charset=utf-8, it knows it should not, and can take a different action.
I think such a universe would make for more pleasant piping, too. Imagine that every interactive command on a terminal ended with an imaginary |show command; its job is to read the MIME type and display that data on the terminal. A command that emitted CSV data could then automatically render in the terminal as an actual table. A program that emitted a PNG could render an ASCII art version of that image (or, since we're in an imaginary universe, imagine our terminal has escape sequences for images – which actually exist in some terminals today – it could emit the actual image!). Essentially, the program emits the data in whatever form it natively speaks, and that implicit |show parses that into something appropriate for the terminal. (That way, the program can still be used in pipelines, such as make_a_png | resize_it.) If you want your raw bytes, then you can imagine
and then the final, implicit |show would perhaps output a nice hexdump for you. (Since emitting raw bytes to a terminal never makes sense.)
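The imaginary |show could be sketched as a dispatcher on the declared type. Everything here is hypothetical — no such command exists — but a minimal Python version might look like:

```python
import binascii
import csv
import io

def show(data: bytes, mimetype: str) -> str:
    """Render piped bytes for a terminal based on their declared MIME type."""
    if mimetype.startswith("text/csv"):
        # Render CSV as a simple column-aligned table.
        rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
        widths = [max(len(r[i]) for r in rows) for i in range(len(rows[0]))]
        return "\n".join(
            "  ".join(cell.ljust(w) for cell, w in zip(row, widths))
            for row in rows
        )
    if mimetype.startswith("text/plain"):
        return data.decode("utf-8")
    # Unknown/binary types: raw bytes never make sense on a terminal,
    # so fall back to a hexdump.
    return "\n".join(
        f"{i:08x}  {binascii.hexlify(data[i:i + 16], ' ').decode()}"
        for i in range(0, len(data), 16)
    )
```

The point is that the dispatch happens on the type the upstream program declared, not on guessing at the bytes.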
Now, I'm a bit fuzzy on exactly how the pipeline being executed, the shell, and the terminal all interact, and I'm nearly certain such a change would require low-level, POSIX-breaking changes. But it was a dream I had, and I think it might be a better model than what we work with presently.
(In the article's case, though, git log/diff both emit terminal sequences, so even in my imaginary world, they'd need to know that they should escape them. It mostly works for simpler types, but git rm could conceptually emit just text, since I don't think it ever colors.)
> and I'm nearly certain such a change would require low-level, POSIX-breaking changes.
Well, as long as you were happy with it being opt-in (i.e., every currently existing program and terminal defaulting to just ignoring the content type as it does now), I imagine you could get 99% of the way there by adding two new command types to the fcntl(2) syscall:
F_SETCTYPE
      Set the content type for the data written to this
      descriptor to the value pointed to by the third
      argument, which must be a pointer to struct fctype.

F_GETCTYPE
      Get the content type for the data read from this
      descriptor and store it in the structure pointed to
      by the third argument, which must be a pointer to
      struct fctype.
struct fctype would just contain a character array.
You'd implement it so that the value set by F_SETCTYPE on the writing end of a pipe or named pipe is returned by F_GETCTYPE on the reading end; for pseudo-terminals, the value set by F_SETCTYPE on the slave side is returned by F_GETCTYPE on the master side.
You probably wouldn't bother supporting it for sockets or disk files (though files could store it in a POSIX extended attribute if you really wanted to).
The writing side would have to set the content type before calling write(), and the reading side would have to get the content type after read() returns.
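Those fcntl commands don't exist today, but the write-then-read ordering can be emulated in userspace by framing the content type as a small header on the pipe itself. A Python sketch (the length-prefixed header framing is my invention, not part of the proposal, which would keep the type out-of-band in the kernel):

```python
import os
import struct

def write_typed(fd: int, ctype: str, payload: bytes) -> None:
    """Emulate F_SETCTYPE: send a length-prefixed content type before the data."""
    encoded = ctype.encode("utf-8")
    os.write(fd, struct.pack("!H", len(encoded)) + encoded + payload)

def read_typed(fd: int) -> tuple[str, bytes]:
    """Emulate F_GETCTYPE: read the content-type header, then the data."""
    (n,) = struct.unpack("!H", os.read(fd, 2))
    ctype = os.read(fd, n).decode("utf-8")
    chunks = []
    while chunk := os.read(fd, 4096):  # until EOF
        chunks.append(chunk)
    return ctype, b"".join(chunks)

r, w = os.pipe()
write_typed(w, "text/csv", b"a,b\n1,2\n")
os.close(w)  # close the write end so the reader sees EOF
ctype, data = read_typed(r)
os.close(r)
```

The in-band header is exactly the kind of hack the kernel-side proposal avoids: with real F_SETCTYPE/F_GETCTYPE, programs that ignore the type would still see only the raw bytes.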
This is one of the problems that I've been attempting to solve with Strukt[1]. Its operations work by processing a stream of objects (which have named keys, and values with real types), rather than bytes.
You're right that shoving raw bytes to a terminal doesn't make sense. I display binary data as a picture of the bits themselves. After a little while, it becomes surprisingly informative. You can spot patterns like "ASCII(ish) text" or "all 0's".
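Strukt's actual rendering is graphical, but the idea can be sketched in a few lines of Python — each byte becomes a row of eight filled/empty cells, which is what makes patterns like runs of zeros or ASCII(ish) text jump out. (The glyph choice here is mine, not Strukt's.)

```python
def bit_picture(data: bytes, width: int = 8) -> str:
    """Render bytes as a picture of their bits: '#' for 1, '.' for 0."""
    lines = []
    for i in range(0, len(data), width):
        row = data[i:i + width]
        lines.append(" ".join(
            f"{b:08b}".replace("0", ".").replace("1", "#") for b in row
        ))
    return "\n".join(lines)
```

ASCII text shows up as a column of mostly-dark low bits with the high bit always clear; all-zero regions are solid dots.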
You're right, too, that once you start tugging on one corner, the whole thing starts to come apart. That's why I didn't even bother trying to make it a "Unix shell". There are just too many issues with trying to remain backwards compatible that far back. When the operations aren't programs, for example, I can optimize between them.
Unfortunately, perhaps, "people who spend money on software" and "people who are looking for a Unix shell" are pretty much disjoint sets, so I'm initially going after markets like EDA and ETL.
Your website, BTW, serves itself as ISO-8859-1¹, but the actual data is UTF-8. The result is that any non-ASCII text is mojibake (such as the degree/minute/second symbols in the location example, or in the second text example).
¹you serve a Content-Type of text/html, with no charset, and you have no <meta> charset in the HTML itself, so this is the default.
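The failure mode is easy to reproduce: UTF-8 bytes reinterpreted as ISO-8859-1 turn each multi-byte character into two or more Latin-1 characters. (The coordinate string below is illustrative, not the site's actual content.)

```python
text = "48°51′24″"                  # degree/minute/second symbols
raw = text.encode("utf-8")          # what the server actually sends
mangled = raw.decode("iso-8859-1")  # what a browser defaulting to Latin-1 shows
# The two-byte UTF-8 sequence for "°" (0xC2 0xB0) decodes as "Â°".
print(mangled)
```

This is the same "bytes without a declared type" problem as the pipes discussion, one layer up the stack.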
> I display binary data as a picture of the bits themselves.
And after looking at that on the website, that is a very interesting approach.
Thanks! I didn't realize that, since it worked fine in all of my browsers, and nobody has said anything about it.
It's a simple HTML page (from some Jekyll templates), served up through an AWS S3 bucket, and I just learned that while it auto-detects MIME types, it doesn't do this for encodings. Fortunately, there is a way to specify it by hand [1].
> And after looking at that on the website, that is a very interesting approach.
I've been surprised by the response to this. It's the 5th or 6th design I tried, and the first one I didn't totally hate, but I've had a user tell me it's super cool and I should feature it more prominently.
One of my philosophies is "When in doubt, show the user their data".
PowerShell does something similar to what you're describing, but arguably better.
With PowerShell, instead of passing text strings between programs, you actually pass CLR objects between programs.
PS does a lot of things very nicely; it's a great tool for sysadmins, with more safety than Bash. The verbose syntax (e.g. Get-ChildItem vs ls) has been a big stumbling block the few times I've used PS, but it makes sense from the point of view of writing a script to be executed more than once; it's a much more readable language than Bash.
I'm a Linux user (my Windows days predated PowerShell), so I'm unfamiliar here.
I think the biggest concerns such a setup gives me would be:
1. Is whatever encoding the CLR uses for IPC efficient enough for all purposes?
2. Can you stream objects / push multiple objects? (Some programs, such as tar, generate potentially massive amounts of output that you cannot buffer.)
Otherwise, that actually sounds a good deal stronger than my initial suggestion, and potentially a lot more powerful. (I think I just wonder about its generality & performance.)
The problem largely still is the effort of rewriting all (or enough) userspace tools to actually use the type system to make it worthwhile, or the even less appealing approach of maintaining parsers/shims/wrappers that allow existing tools to work.