In the case of Maya (and its cousins like Houdini, [ex]Shake...), every GUI action is a word in an embedded DSL/bindings API (MEL script, HScript, Python). Maybe Adobe products too, but the JavaScript parts aren't a cultural feature of the application. Maya is as macro-capable as Emacs: you operate buttons, open the console log, copy the last instructions and drag them into a toolbar to have a new operation.
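To make the "log lines become a toolbar button" idea concrete, here is a toy sketch in Python. It is not Maya's API: `App`, `run`, `make_macro` and the `move`/`scale` commands are all made-up names, just to show a GUI where every action goes through one scriptable entry point and gets logged.

```python
# Toy model of an app whose every GUI action is also a scriptable command.
# All names here are hypothetical; this is not MEL or the Maya Python API.

class App:
    def __init__(self):
        self.position = [0.0, 0.0, 0.0]
        self.size = 1.0
        self.log = []  # console log: one entry per executed command
        self._commands = {"move": self._move, "scale": self._scale}

    def run(self, name, *args):
        """Single entry point used by buttons and scripts alike."""
        self._commands[name](*args)
        self.log.append((name, args))

    def _move(self, dx, dy, dz):
        self.position = [p + d for p, d in zip(self.position, (dx, dy, dz))]

    def _scale(self, factor):
        self.size *= factor

    def make_macro(self, last_n):
        """Turn the last n logged commands into a new one-click operation."""
        steps = list(self.log[-last_n:])  # snapshot, so later actions don't leak in

        def macro():
            for name, args in steps:
                self.run(name, *args)

        return macro


app = App()
app.run("move", 1, 0, 0)             # user drags an object
app.run("scale", 2)                  # user clicks a scale button
nudge_and_grow = app.make_macro(2)   # "drag the last log lines onto a toolbar"
nudge_and_grow()                     # replays both steps
print(app.position, app.size)        # [2.0, 0.0, 0.0] 4.0
```

The point is only that once buttons and scripts share one command vocabulary, a macro is just a slice of the log.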
Plus the underlying reactive dataflow model is like a limited mathematical language. The gap between GUI and CLI is short here. So even without relying on text, you can still build operators (Houdini does this to a great extent; even the skeletal system is built out of user-level geometric primitives).
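A rough illustration of why a dataflow graph reads like a small expression language, in Python. This is a deliberately naive pull-based sketch (no caching, no change notification), and `Param` and `Node` are invented names, not anything from Houdini or Maya.

```python
# Minimal pull-based dataflow graph: nodes are expressions over other nodes.
# Hypothetical names; real packages add caching, dirty flags, typed ports, etc.

class Param:
    """A leaf node holding a user-editable value."""
    def __init__(self, v):
        self.v = v

    def value(self):
        return self.v


class Node:
    """An operator node: applies fn to the current values of its inputs."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs

    def value(self):
        return self.fn(*(n.value() for n in self.inputs))


width, height, depth = Param(2.0), Param(3.0), Param(4.0)
area = Node(lambda w, h: w * h, width, height)   # wired up like boxes in a GUI
volume = Node(lambda a, d: a * d, area, depth)   # an operator built from another

before = volume.value()   # 24.0
width.v = 5.0             # tweak one upstream parameter
after = volume.value()    # 60.0, downstream follows the change
print(before, after)
```

Wiring `area` into `volume` is the same move as writing `volume = area * depth`, just done with boxes and wires instead of text.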
Some software that took the growable GUI a step further is Luxology Modo: here GUI tools are combinators that you sequence in a `toolpipe` to suit your needs, kinda like building a function out of lower-level ones.
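The "function out of lower-level ones" analogy can be sketched as plain function composition in Python. This is not Modo's actual API; `toolpipe`, `constrain_axis`, `scale_by` and `snap_to_grid` are hypothetical stages acting on a 3D offset, picked only to show the shape of the idea.

```python
from functools import reduce

# Each stage is a small function from point to point; a pipe is their composition.
# All names are illustrative assumptions, not Modo's real tool pipe.

def toolpipe(*stages):
    """Compose stages left to right into a single tool."""
    def tool(point):
        return reduce(lambda p, stage: stage(p), stages, point)
    return tool

def constrain_axis(axis):
    # Keep the component on one axis, zero out the others.
    return lambda p: tuple(c if i == axis else 0.0 for i, c in enumerate(p))

def scale_by(factor):
    return lambda p: tuple(c * factor for c in p)

def snap_to_grid(step):
    # Round each component to the nearest multiple of step.
    return lambda p: tuple(round(c / step) * step for c in p)

# "Sequence the combinators to suit your needs":
tool = toolpipe(constrain_axis(0), scale_by(2.0), snap_to_grid(0.5))
result = tool((1.3, 0.7, 0.2))
print(result)  # (2.5, 0.0, 0.0)
```

Swapping or reordering stages gives a different tool without writing a new one, which is the growable part.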
To finish on a blurry note: text doesn't exist, bytes lead to interpretation/action, aka a function of the application model. I see no real reason that visual interactions can't be made 'linguistic'.
You are talking about possibility. I have no doubt that it’s possible to manipulate images, sound and videos with CLIs. The question is whether that would be the optimal way.
When I worked with Photoshop, after a while most of the time was spent in a shitty point-and-click batch instruction editor; it should have been a text file in Vim.
I agree with jpalomaki (http://news.ycombinator.com/item?id=3154076), there's room for a different and more linguistic GUI.