
I think what you are doing is fine. SQL is strongest at answering questions, not processing data. I think Metabase has the right approach: what question do you want to ask of your data? If you want to process and transform your data, I think the tools you are using are great for that.


Not trying to play word games, but what is the difference between answering a question and processing data? Aren't they effectively the same?

Using another tool for processing data often results in recreating SQL mechanics at the application level, e.g. select this data, retrieve it, loop, and if this, then set that, etc. SQL does it way better, guaranteed.

Of course that's often required for technical reasons (scalability etc.), for processing that's too complex to implement at the data layer, or just for cleaner design.

But SQL is amazing at processing data!


«what is the difference between answering a question and processing data? Aren't they effectively the same?»

I think it influences the mindset of the developer. As you say, “retrieve ... if this, then ... loop”. If you're in a “data processing” mindset, then you'll think of a problem like “Get the total number of car widgets in the warehouse” as fetch a widget row; if it's of type car, add number to total; loop until you've processed every row; there you have your total. If, OTOH, you're in an “asking questions” mindset, you'll go: What was the question again, exactly? Oh yes, get the sum of the number for all the widgets which are of type car widgets. Which is almost exactly the same as SELECT SUM(NUMBER) FROM WIDGET WHERE TYPE = 'CAR';.
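To make the two mindsets concrete, here is a minimal sketch using Python's built-in sqlite3 module. The widget table layout and the sample rows are made up for illustration; only the SUM query comes from the comment above.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE widget (id INTEGER PRIMARY KEY, type TEXT, number INTEGER)"
)
conn.executemany(
    "INSERT INTO widget (type, number) VALUES (?, ?)",
    [("CAR", 5), ("BIKE", 3), ("CAR", 7), ("BOAT", 2)],
)

# "Data processing" mindset: fetch every row, then filter and add in app code.
total_by_loop = 0
for widget_type, number in conn.execute("SELECT type, number FROM widget"):
    if widget_type == "CAR":
        total_by_loop += number

# "Asking questions" mindset: state the question and let the RDBMS answer it.
(total_by_query,) = conn.execute(
    "SELECT SUM(number) FROM widget WHERE type = 'CAR'"
).fetchone()

print(total_by_loop, total_by_query)  # both are 12 for the sample rows
```

Both give the same total, but the loop version pulls every row out of the database first, while the query version only returns the one number that was asked for.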

Processing data is when you do it (in code); answering questions is when the RDBMS (i.e., its code) does it for you. :-)

(At least that's what I think the difference is _in terms of vvkumar's original question._)


Agreed that SQL is amazing at processing data! I would argue that a lot of people are trying to both process their data _and_ ask a question of it in the same statement. Separating those out is really important to make analytics more scalable.

We do >95% of our transformations with pure SQL and the queries are primarily in Looker.
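One way to picture that separation, as a rough sketch only: keep the transformation in a view and ask the question against the view. The table, view, and column names below are hypothetical, and SQLite stands in for whatever warehouse is actually behind Looker.

```python
import sqlite3

# Hypothetical raw table with messy status values, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_orders (id INTEGER PRIMARY KEY, status TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO raw_orders (status, amount_cents) VALUES (?, ?)",
    [(" Paid", 1250), ("paid ", 800), ("REFUNDED", 500), ("paid", 450)],
)

# Step 1, the transformation: normalize the data once, in pure SQL.
conn.execute(
    """
    CREATE VIEW orders_clean AS
    SELECT id,
           LOWER(TRIM(status)) AS status,
           amount_cents / 100.0 AS amount
    FROM raw_orders
    """
)

# Step 2, the question: a short query that does no cleanup of its own.
(paid_total,) = conn.execute(
    "SELECT SUM(amount) FROM orders_clean WHERE status = 'paid'"
).fetchone()

print(paid_total)  # 25.0 for the sample rows
```

The question in step 2 stays short because all the cleanup lives in the view, so other questions can reuse the same cleaned data.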


Processing, to me, means e.g. running the data through some sort of algorithm or complex logic, not just a transformation.



