
The problem with typed streams is that they couple the implementations on both ends together and tend to de-genericise them. You get a "grep" that only works on strings when actually you'd be quite happy grepping numbers 99% of the time. Also, in order to serialise them you'd need fully typed files. Notice that you can't serialise a Powershell stream.

The world tried "everything is XML" and more recently "everything is JSON" as a solution to this. The latter is almost there.



What CLI userlands (POSIX, Powershell, Plan9) are missing, IMHO, is a self-describing serialization "stream container" format, like Avro (https://avro.apache.org), the interchange format used in Hadoop, Kafka, and several other systems of the "gluing IO-streaming components of different languages together into an ETL pipeline" variety. (Which is, of course, exactly what you're doing when you write a Unix pipeline, just in-the-small.)

In self-describing data formats like JSON or XML (or even the more efficient encodings like ASN.1), every term is "described in place", which costs a lot of encoding overhead and bandwidth. In a self-describing stream container format, by contrast, each stream first encodes a schema (or the ID of one) for what it's about to transmit, and then transmits the terms encoded using that schema.
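To make the "schema first, then terms" idea concrete, here is a minimal sketch. The wire format is made up for illustration and is not Avro's actual container layout: the header is a length-prefixed JSON schema, and `write_stream` / `read_stream` are hypothetical names.

```python
import io
import json
import struct


def write_stream(out, schema, rows):
    """Write the schema once, then each row as fixed-width binary."""
    # Toy schema (an assumption): a list of [field_name, struct_format_char] pairs.
    header = json.dumps(schema, separators=(",", ":")).encode("utf-8")
    out.write(struct.pack("<I", len(header)))  # 4-byte little-endian header length
    out.write(header)
    packer = struct.Struct("<" + "".join(fmt for _, fmt in schema))
    for row in rows:
        # No field names or type tags per row; the header already said it all.
        out.write(packer.pack(*row))


def read_stream(src):
    """Read the schema header, then yield each row as a dict."""
    (header_len,) = struct.unpack("<I", src.read(4))
    schema = json.loads(src.read(header_len).decode("utf-8"))
    packer = struct.Struct("<" + "".join(fmt for _, fmt in schema))
    names = [name for name, _ in schema]
    while True:
        chunk = src.read(packer.size)
        if len(chunk) < packer.size:
            break  # end of stream
        yield dict(zip(names, packer.unpack(chunk)))


schema = [["pid", "I"], ["cpu", "d"]]  # unsigned 32-bit int, 64-bit float
buf = io.BytesIO()
write_stream(buf, schema, [(1, 0.5), (42, 12.25)])
buf.seek(0)
rows = list(read_stream(buf))
```

The reader knows nothing about the fields in advance; it learns them from the header, and each row after that costs only its raw 12 bytes.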

Because the schema is embedded in the stream in a normalized form, a client can handle each stream it receives with a hybrid approach: recognize known schemas (e.g. by hashing the representation of the schema, which works since it's in a normal form) and use a baked-in decoder for that specific schema if one is available; otherwise, build a just-in-time dynamic parser from the schema.

This would work pretty well as an enhancement to standard POSIX CLI IO-streaming tools; they'd be able to use specific logic to optimize the parsing of a few schema-encoded types they "expect", while also faithfully (but less efficiently) handling data of any type by falling back to a generic parser routine. (By which I don't mean "falling back to treating the stream as text", but rather "falling back to treating the stream as the custom type that the sender specified." Sort of like how, in most languages, you can compile regexes that are available at compile time, but also have a regex interpreter for regexes received at runtime.)
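A rough sketch of that hybrid dispatch, under stated assumptions: sorted-key JSON stands in for a real canonical form (Avro's spec defines its own), and `fingerprint`, `decoder_for`, `PROCESS_SCHEMA`, and the schema layout are all hypothetical names invented for this example.

```python
import hashlib
import json


def fingerprint(schema):
    """Hash a canonical (sorted-key, no-whitespace) JSON form of the schema."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def generic_decoder(schema):
    """Fallback: build a decoder at runtime from whatever schema arrived."""
    names = [field["name"] for field in schema["fields"]]

    def decode(values):
        return dict(zip(names, values))

    return decode


# A schema this tool "expects", with a hand-written decoder baked in.
PROCESS_SCHEMA = {"name": "process", "fields": [{"name": "pid"}, {"name": "cmd"}]}


def decode_process(values):
    pid, cmd = values
    return {"pid": pid, "cmd": cmd}


KNOWN_DECODERS = {fingerprint(PROCESS_SCHEMA): decode_process}


def decoder_for(schema):
    """Use the baked-in decoder if the fingerprint is known, else build one."""
    return KNOWN_DECODERS.get(fingerprint(schema)) or generic_decoder(schema)
```

Because the hash is taken over the normalized form, a sender that writes the same schema with its keys in a different order still hits the fast path; anything unrecognized still decodes as the type the sender described, just through the slower generic route.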


"Notice that you can't serialise a Powershell stream."

Not sure what you mean by this. This code works fine:

    $credential = Get-Credential -Message "Enter Credential"
    $credential | Export-Clixml "credential.xml"
    $credential = Import-Clixml "credential.xml"
You can also

    $variable | Out-File ./file.txt
if you just want to save the text output.


Yea, but Powershell is nearly unusable in many use cases. It is soooo slow for some things. I really like bits and pieces of it though.



