Hacker News

Sounds like the same argument people have against Wix and Squarespace - "you can't make a website in a GUI".

Yes, you can - but you'll be pretty limited. If you're a brick and mortar or service focused business, a website builder is great. If you rely deeply on a customized web experience, you need to do something custom.

Same with data science. You can get pretty far with some simple data analysis tool. If you need to go farther, then you need to build custom solutions.



I thought the argument that a command line gives you reproducibility in a way that a GUI doesn't was good.

Most of the things someone does in Photoshop don't have to be redone repeatedly. For system administration or, I'd guess, data science, a lot of things need to be redone regularly. Using the command line is good both for doing that and for getting into the mindset of doing it.


It needn't be this way: we /could/ have a well-defined set of actions encompassed by an API, where the GUI just lets you tinker with that API, and we could have a detailed audit history of every API call that was made, and with which parameters, regardless of whether it originated from the GUI or from a script. This also means scripting support, of course. Finally, you should be able to replay a portion of the history. This is approximately how Photoshop works, as far as I remember. Perhaps it's slightly worse.
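A minimal sketch of that design in Python (all names here are hypothetical, not any real application's API): every state change, whether triggered by a GUI handler or a script, goes through a single audited entry point, so the history is complete and any slice of it can be replayed.

```python
class AuditedAPI:
    """Every mutation goes through call(), whether it originated in a
    GUI handler or in a script, so the audit history is complete."""

    def __init__(self):
        self.state = {}
        self.history = []  # (action, params) tuples, in call order

    def call(self, action, **params):
        self.history.append((action, params))
        getattr(self, "_do_" + action)(**params)

    # Example actions (hypothetical; a real app would have many more).
    def _do_set(self, key, value):
        self.state[key] = value

    def _do_delete(self, key):
        self.state.pop(key, None)

    def replay(self, start=0, end=None):
        """Re-run a slice of the history against a fresh instance."""
        fresh = AuditedAPI()
        for action, params in self.history[start:end]:
            fresh.call(action, **params)
        return fresh
```

Because scripts and GUI handlers share `call()`, the audit log doubles as a macro format for free.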


Recently I made a Photoshop macro to crop and border some documents in a standard way for our graphic designers.

It took me 5 minutes to produce a working proof of concept with an approximate workflow...

But it took me 2 more hours to figure out how to change numerical parameters to precisely set the initial selection box instead of defining it imprecisely by hand. The most frustrating part was that the GUI displayed these parameters perfectly but offered no f..ing way to edit them. Exporting and re-importing the macro as plain text for editing was not an easy option because of the proprietary binary format.

GUIs are nice, but when they obfuscate scripting capabilities they're just another way to bind you to a platform by making you learn platform-specific skills to work around limitations instead of learning universal coding principles.

There are good GUIs around, however. Just look at the QGIS project, for instance: it can be used purely as a GUI but offers plenty of opportunities to input custom formulas when needed for small adjustments. Heavy scripting extensibility is also possible but is more hidden from the basic user.

(NB: for Photoshop macro, I wasn’t using the latest CC release so I don’t know if it’s still the case)


Sure, but I think your example shows that even a GUI that produces the nicest possible text file for an exported macro is going to leave one with two activities: using the GUI to accomplish the task, and editing the output macro to create a batch for a repeated activity.

The command line allows these two activities to be closer to one activity, so when you base your skill set on the command line, you get both things and can switch easily between them. It seems clear that for automating little tasks, this is kind of necessary.


Have a look at QGIS if you can. For me they really achieved the perfect balance. Some tasks, like refining a layout and trying out graphical styles, are a hundred times easier with a GUI. But you can use variables and formulas pretty much everywhere to override manual control using data-defined attributes.

And when you need to, you can run the application without the GUI for heavy scripting. But the two approaches are still fully compatible, so you can easily define a layout with the GUI and reuse it via the CLI, for instance.

Maybe it’s common and I’m just a goof, but this software workflow really impresses me.


It doesn't have to actually be CLI. It does need to be repeatable (by you) and reproducible (by others), and it does need to go through a known API.

Those requirements will tend to drive you towards a CLI solution as the easiest/best available, but if you could get those requirements satisfied in any other programmatic way, then I think you'd probably be okay.

But then the problem becomes, can you do programming from the GUI? I think that's actually a lot harder to do.


> Those requirements will tend to drive you towards a CLI solution as the easiest/best available, but if you could get those requirements satisfied in any other programmatic way, then I think you'd probably be okay.

Theoretically you can do these some other way. In practice, the CLI is the only way that has survived. The thing with the CLI is that it is quite easy to add a new component to it, whereas someone creating a little GUI app for a little task tends to create a "cul-de-sac", a stovepipe, a program with no relation to any other program.

So a third approach would be great, but it doesn't seem to be getting any closer.


Programs like Nuke, Houdini and Touch Designer have been doing this for years specific to graphics - this is trying to be more generalized:

https://github.com/LiveAsynchronousVisualizedArchitecture/la...



This is basically what I kinda like about STATA. It's a Data Science tool for researchers.

The thing is that all its menu commands are stored in history, in a ".do" file that you can send to anyone to reproduce your steps.

You can also just use the command-line "shortcut" to run these same commands. It's pretty nifty.


There is also a package for R called rcmdr that provides a GUI for basic statistical analysis. All the actions you perform in the GUI are stored as an actual .R script. However, I don't personally use it, so I can't elaborate further.


> I thought the argument that a command line gives you reproducibility in a way that a GUI doesn't was good.

I fundamentally disagree with that. I work mainly in geo-data analysis and use programming, command-line tools, and GUI tools. And honestly, setting up a data-processing pipeline in a GUI like FME is much easier and more reproducible out of the box than whatever happens to be left over after I've been screwing around with a bunch of random command-line tools for a couple of hours. The main thing you lose is some flexibility.


I did some remote desktop support work in the early 90's, around the time that everybody started shifting from command-prompt DOS to GUI windows. We went from saying "type in ipconfig and tell me what you see" to "ok, do you see a start button? Click on the start button. Do you see where it says 'control panel'? It's about two-thirds of the way up. Yes, click on that. With the left button. No, the left button. Ok, do you see something that says 'network'? Click on that." It became obvious that it was impossible to do this over the phone, so we ended up installing what would probably be called "spyware" today so that we could remote-access everybody's machines (I can't remember what the software was called, but it rarely worked as it was supposed to). I see the same problem surrounding documentation around GUIs vs. command-line/text-oriented tools; the documentation spends pages to describe what a simple command would spend one line to cover. Although there are some things that do make some sense in a graphical interface, I still believe that _most_ things are orders of magnitude more efficient if you strip away the graphics and boil them down to some minimal commands.


Whenever I start a new software project, one of the first things I insist on is that the design include a command line tool with the same functionality as the GUI. This usually begets a sensible API between the business logic and the shiny parts, which ends up allowing for a great deal of flexibility for the GUI.

I’ve been burned too often by the GUI-first, top-down approach, where the software architecture evolves from the GUI. V1 gets shipped, then the UX designer gets bored, v2 has to have a totally different look and feel, and you’re screwed because your software design is permanently tied to the original UI.
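A sketch of that discipline in Python (the function name and flags are hypothetical): the business logic is a plain function with no UI dependencies, and the CLI is a thin shell over it - a GUI would call the same function.

```python
import argparse

def top_n(values, n=3):
    """Business logic: the n largest values, sorted descending.
    No UI dependencies, so a CLI, a GUI, or a test can all call it."""
    return sorted(values, reverse=True)[:n]

def main(argv=None):
    """Thin CLI shell over the core function."""
    parser = argparse.ArgumentParser(description="Top-N summary")
    parser.add_argument("values", nargs="+", type=float)
    parser.add_argument("--n", type=int, default=3)
    args = parser.parse_args(argv)
    for v in top_n(args.values, args.n):
        print(v)

if __name__ == "__main__":
    main()
```

The sensible API the comment mentions is exactly the `top_n` boundary: v2's GUI can change its look and feel entirely without touching it.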


Text is a way to build an AST.

GUIs are a way to build an AST.

You can add, remove, transform those changes, and track that history. Music software, Unreal Engine's blueprints, and other node-based programming environments have been doing this, in some cases, for decades.

Saying that "you can't do data science in a GUI" says more about the speaker's lack of understanding of GUIs than about their knowledge of data science.


Depending on the tools, it also gives you composability that you don't get from GUI programs. GUIs don't have a pipe equivalent, so they aim to let the user do as much as possible within a single tool.


Depends on your definition of "Gui". See my other comment for an example of "science" in Smalltalk.


Doesn't Photoshop have a macro system for reproducible commands, though? I'm not an expert user, but I've definitely seen they have some form of automation available.


And a history API: with a few settings, and by keeping the PSD file (not a flattened JPEG or other export), Photoshop gives you information similar to a series of high-frequency VCS commits.

Granted, since Photoshop itself is closed source (and on a subscription model), there are some very strong limits to scientific replication of a process.

But one could do something similar with GIMP, additionally aided by Python scripts.

So yeah, "Photoshop bad; CLI good" isn't as clever a blanket statement as all that (not implying anyone said exactly that; just making an observation).

I see the talk/article is about "data" science; but the headline reminded me of an Alan Kay talk about teaching, where there's a clip of kids filming a falling object and then juxtaposing the video with a rendered sequence based on v=at etc. The whole video is worth watching, but see the "16:17" link in the transcript ("Now, what if we want to look at this more closely?"):

https://www.ted.com/talks/alan_kay_shares_a_powerful_idea_ab...


Yes, Photoshop has macros (they call them “Actions”).

These days, Photoshop even has (multiple) JavaScript interpreters (including an instance of node.js they call “Generator”) built in to enable automation using an API that provides access to most (but not all) actions that can be executed via the GUI.

I have been working with these for a side project and the documentation isn’t great, but I have been able to get up and running relatively quickly.


Yes.


That's definitely a problem for science in general (not necessarily "data science", although all science uses data). There are people who analyze biomedical data with R and those who use GUI tools like Prism and Partek. Not only are the GUI tools limited, but the results are basically unreproducible: if somebody clicked a button incorrectly or set some setting differently, it would be impossible to tell.


I had this discussion today, and that was my conclusion: repeatability is the great advantage of CLIs and text files over GUIs.


What nonsense is this? Loads of GUI tools have it built in

'save report settings'


This is a benefit of using Vim (versus GUI editors): you have a full history of all the commands you ran.


It seems like about every five years or so, some higher-up in my organization pushes down some "enterprise" graphical logic builder program that's supposed to simplify the creation of "complex business rules" so that even non-programmers can maintain them. Inevitably they end up being some variation of a drag-and-drop logic builder where something that resembles a flowchart is created by dragging if statements and loop constructs out of a tool palette onto a blank canvas. Of course, this is presented as "revolutionary" every time (even though I've seen the same thing at least five times already), costs a fortune, and turns out to be worse than useless - what the "non-programmers" are able to produce using this thing is limited to what can fit on a single screen (the thing slows down so much that it's unusable if you go bigger than a screen), and impossible to debug in any way. Yet in spite of failure after failure, I have every confidence that I'll end up dealing with more than one other graphical "business logic" tool in my career.


> some higher-up in my organization pushes down some "enterprise" graphical logic builder program... where something that resembles a flowchart is created

It should be said that the people who want to put this tool in place are the least likely to use any sort of flowchart when writing requirements. In fact, I would say that the people most likely to buy this are also the most likely to use pantomime and post-it notes to convey requirements.


As someone considering building a graphical builder of the kind you described - "dragging if statements and loop constructs out of a tool palette onto a blank canvas" - would you have any advice or suggestions on how to make it actually useful?

It's so true what you say about how this category of application is often presented as revolutionary and proves to be limited, bloated, and difficult to debug. I'm thinking of web page builders especially, but also various attempts at graphical programming. At the same time, history offers some (more or less) successful examples, like HyperCard.


That is what happens when the logicbuildergui hits the commandlinefan


Is Excel not a GUI? Photoshop? Unity? You can get pretty damn far with those tools (if not all the way).


Can you? I have long been under the impression that Excel is to be avoided for all but the lightest, most cursory analyses.

From 2007:

http://people.umass.edu/evagold/excel.html

From 2013:

http://biostat.mc.vanderbilt.edu/wiki/Main/ExcelProblems

I could post more, but I would have to fire up my old spreadsheet ;)


I opened your second link. The first issue is the classic floating-point rounding-error problem, which, as far as I know, every system suffers from. That isn't just an Excel problem.


The problem is often related to floating-point representation, that is true, but it's not correct to conclude "oh well, everything gets this wrong, so I might as well use Excel".

One issue with Excel is that many of the built-in functions and statistical measures are implemented in numerically naive ways (and presumably remain so for reasons of computation speed and backwards compatibility), so if you want to do robust analysis you have to avoid them entirely - at which point you are far better off with a language designed for this. This is particularly an issue with larger data sets, where accumulated error can become acute. Excel also introduces additional error terms due to binary encoding.

By the way: it is misleading to think of "rounding error problems". It is far better to think in terms of "rounding properties", "truncation properties", and the like, and then to realize that you can't (in general) write floating-point operations as if they operated on real numbers and expect correct behavior. That doesn't mean correct behavior is not achievable.
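The point about numerically naive implementations is easy to demonstrate without Excel at all. The textbook one-pass variance formula E[x^2] - E[x]^2 falls apart in double precision once the data carry a large offset - the kind of failure older spreadsheet statistics functions were criticized for - while a numerically careful implementation (here Python's statistics module) is unaffected:

```python
import statistics

# Four values whose population variance is exactly 22.5, offset far from zero.
data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]

# Numerically naive one-pass formula: E[x^2] - E[x]^2.
# Both terms are huge and nearly equal, so the subtraction cancels
# away essentially all of the significant digits.
mean = sum(data) / len(data)
naive = sum(x * x for x in data) / len(data) - mean * mean

# A robust implementation recovers the true value.
stable = statistics.pvariance(data)  # 22.5
```

The naive result here is off by orders of magnitude relative to the true variance; with floating point you have to reason about the properties of each operation, not assume real-number behavior.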


I think Excel is a bit of both. It is pretty amazing what people do with Excel. It's probably one of the most effective software tools ever created.

It straddles this incredible balance between completely free-form input and structured data enabling very powerful functionality.


There is a reason VisiCalc was such a big deal when it first came out. This is a UI that is both intuitive to regular people and also incredibly powerful -- and one that takes little effort to learn. It's a sign of the kinds of things that are possible with computing (reducing the gap between what today we call "coders" and "users"). There are few really large efforts towards research in this area these days and we are poorer for it.

Here is a good paper from Alan Kay that relates: https://frameworker.files.wordpress.com/2008/05/alan-kay-com...


But there could be errors hiding anywhere. All you know is it looks correct.


Sure, but the cost to fix that issue, and the flexibility you have to give up to do so, is apparently not worth it most of the time, if we look at how people use the software.

Perhaps there is a systematic undervaluing of more polished, robust custom solutions by excel users across the world. That could be the case.

But there is also probably an under-supply of adequate custom solutions. I have seen comments over the years on HN from people who have had much success in consulting gigs where they simply built custom tools to replace ad-hoc workflows and processes living in places like Excel.


All existing business processes tolerate high error rates. They have to, because anything that people do by hand will necessarily have a high error rate. So when programmers come to automate an existing business process, they often vastly overestimate the value of correctness at least in the short term: if the program does the wrong thing even 1% of the time, it's really not a problem, because the processes around this process will be built to tolerate that.

In the long term, correctness may become more valuable. An analogy: when factories first switched to electric power, they simply connected electric motors to the existing driveshafts used to distribute steam power around the factory, and only realised small improvements in productivity that way. But once factories were more fully converted to electric power, it became possible to rearrange machines to suit workflows (rather than being arranged around the driveshafts) and this led to much bigger productivity gains.


> Sure, but the cost to fix that issue, and the flexibility you have to give up to do so, is apparently not worth it most of the time, if we look at how people use the software.

That's not true. People just don't know any better. Look around you and you'll see many people using the wrong tools. It doesn't mean they've made a rational decision to use those, it usually means they're not aware of a better alternative.


You don't get any better assurances on the CLI.


What do you mean by "the CLI"? Garbage in, garbage out always applies; that's not what I'm talking about. The point is errors in the code doing the processing. If it's written in a language, you can read and understand the entire thing. You cannot read and understand an entire spreadsheet.


>You cannot read and understand an entire spreadsheet.

Why wouldn't you?

Except if you mean the internal implementation (by MS). But nobody reads the implementation of Pandas or R either...


I'm not talking about reading the underlying implementation, although that is, of course, another advantage of Python and R. I'm talking about reading the project. What's your process for reading a spreadsheet? Reading every single cell?


That's a good point. In a block of 10,000 cells that should all have the same formula (offset by 1, say) how do you know that they all actually have that same formula? I'm not aware of a way to check that that doesn't involve writing custom VBA.
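It needn't be VBA: if the formulas can be extracted (for example with a library such as openpyxl, where a cell's value holds the formula string when the workbook is loaded with data_only=False), the check itself is a few lines. A sketch, assuming the formulas are already in a list; note it only compares the shape of each formula, not whether the row offsets actually line up:

```python
import re

def normalize(formula):
    """Blank out relative row numbers so '=A2*B2' and '=A3*B3' compare
    equal. Absolute references like '$A$2' are left untouched."""
    return re.sub(r"(?<![$\d])\d+", "<row>", formula)

def inconsistent_rows(formulas):
    """Indices of formulas whose shape differs from the first row's."""
    template = normalize(formulas[0])
    return [i for i, f in enumerate(formulas) if normalize(f) != template]
```

For example, `inconsistent_rows(["=A2*B2", "=A3*B3", "=A4+B4"])` flags index 2. The deeper point stands, though: in a script this check is readable and versionable, whereas in the spreadsheet itself the discrepancy is invisible.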


A spreadsheet doesn't offer you a linear story that you can read. It's a strange amalgamation of computation and program.


I like iTunes 10 as a GUI.

In fact, I like it so much that I'm hoping to rewrite it in JavaScript so it can run in the browser. I also don't like what Apple did with iTunes 11 and 12.

1. SQL queries are Smart Playlists.

2. Summary statistics are shown on the bottom, and apply to the selection (if there is one), or the playlist (if nothing is selected).

3. It's object-oriented data. This is a valid AppleScript command:

tell application "iTunes" to return name of first track whose artist contains "Blink 182"

4. A browser view along the top to quickly see Genre, Artist, Album.

5. Nested playlist folders, including smart playlists.

6. Support for other data types. I wrote a script to generate m3u files to add virtual "Radio stations" to iTunes. Clicking those triggers a PHP script that usually opens my browser with a URL, but could technically do anything else. (OK, this is a hack - I'll document it if anyone wants to know).

Excel is fine, but it doesn't have the hierarchical structure that iTunes is so good at. It also mixes the data and the instructions.
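Point 1 - smart playlists as stored queries - can be sketched with sqlite3 and a made-up tracks table (the schema and data here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (name TEXT, artist TEXT, rating INTEGER)")
conn.executemany("INSERT INTO tracks VALUES (?, ?, ?)", [
    ("Dammit", "Blink 182", 5),
    ("Song 2", "Blur", 4),
    ("All the Small Things", "Blink 182", 3),
])

# "Smart playlist": artist contains "Blink 182" AND rating is at least 4.
playlist = conn.execute(
    "SELECT name FROM tracks WHERE artist LIKE ? AND rating >= ?",
    ("%Blink 182%", 4),
).fetchall()
```

The GUI rule editor is just a friendlier way of composing that WHERE clause; the library stays a database either way.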


And let us not forget that PowerPoint is Turing-complete!

http://www.andrew.cmu.edu/user/twildenh/PowerPointTM/Paper.p...


> Is Excel not a GUI?

Yes, and Excel is horrible for data science. You have hidden data, and, to be honest, almost every complex spreadsheet has at least two errors in it. https://www.forbes.com/sites/salesforce/2014/09/13/sorry-spr...


Excel? Not in the traditional sense. It's more of a... spatial(?) programming language.


The problem is that those limitations prevent you from being able to do any real "data science".

The GUI comes with a whole host of built-in assumptions about the data and what it could tell you, and you're probably not smart enough to know what those are and how to correct for them.

You might as well just search for your keys at the base of a streetlight.


The amount of money flowing through the system that's being optimized might be less than the cost of a completely general data scientist trained to work with completely generalizable tooling. Many businesses are happy to be able to spend $30/month and a small slice of the time of a resident spreadsheet master in order to find an incomplete but useful list of optimizations.


Try KNIME. You can do data science with a GUI and not be limited at all, especially if the GUI also lets you execute Python and R.

Also, KNIME and similar apps present a workflow that is way easier to understand at first sight, especially for people with low programming skills.

Not to mention that you don't have to care about the environment and the quirks of the packages and so on - although I recognize this is not really a problem in R, it was for me with Python, from my noob programming perspective.

I do pretty complicated stuff with KNIME, PostgreSQL, and Apache NiFi. I make workflows almost on the fly, snap many operations onto them, etc. I'd love to see these guys do something similar in the same timeframe I do.

Well-made GUIs like KNIME's make your life easier; this kind of article sounds a bit snobbish, tbh. These statements are true for developing websites and many other fields, but not really for working with data. It doesn't make much difference given the current tools.

Edit: I see people arguing that Excel is fantastic. It's true, it's amazing, but it doesn't keep track of what you do with the data. It's difficult to understand what's going on if there's anything even a bit complicated, not to mention when people just copy and paste stuff everywhere, which is common.

Sorry for my poor English.


I have used KNIME for a few months, enough to get comfortable with it, and while some of what you say is true, I have to disagree about it standing up to command-line tools - especially after trying to go through some code others have written in it.

The main problem I have with it is that in many cases the upstream data has to be available to a node before you're allowed to open the node and see its configuration. This has resulted in my spending a day or two updating and tweaking database queries on an old connection and fixing the resulting configuration errors just to get to one node and read its configuration. In this case I didn't care about the data itself; I just needed the config of that one node.

This would not have been necessary with a text-based tool, where I could just scroll down and read. KNIME can be a powerful tool, but for sharing work I think it's unnecessarily painful.


You are right about this. I suggest you report it in the repos or forum; they are somewhat responsive, and it would definitely be a plus for everyone.

You also have Orange, from Biolab. It is far less powerful, but it has some cool nodes, and AFAIK it is possible to open up a node's config without connecting it to anything.


It's just more gatekeeping BS. I'm sure they said the same thing about software in general, i.e. "you have to code at the command line to be a real programmer".


It's turtles all the way down. All computer users must deal with the abstraction handles and layers they are comfortable with - and which produce results. Not just data science people. It's pretty much a user maxim.


No, you've missed two of the major points: reproducibility and scrutability.



