Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I've used all of these so I might be able to offer some perspective here

In an ELT/ETL pipeline:

Airflow is similar to the "extract" portion of the pipeline and is great for scheduling tasks and provides the high-level view for understanding state changes and status of a given system. I'll typically use airflow to schedule a job that will get raw data from xyz source(s), do something else with it, then drop it into S3. This can then trigger other tasks/workflows/slack notifications as necessary.

You can think of dbt as the "transform" part. It really shines with how it enables data teams to write modular, testable, and version controlled SQL - similar to how a more traditional type developer writes code. For example, when modeling a schema in a data warehouse all of the various source tables, transformation and aggregation logic, as well as materialization methods are able to to live in the their own files and be referenced elsewhere through templating. All of the table/view dependencies are handled under the hood by dbt. For my organization, it helped untangle the web of views building views building views and made it simpler to grok exactly what and where might be changing and how something may affect something else downstream. Airflow could do this too in theory, but given you write SQL to interface with dbt, it makes it far more accessible for a wider audience to contribute.

Fivetran/Stitch/Singer can serve as both the "extract" and "load" parts of the equation. Fivetran "does it for you" more or less with their range of connectors for various sources and destinations. Singer simply defines a spec for sources (taps) and destinations (targets) to be used as a standard when writing a pipeline. I think the way Singer drew a line in the sand and approached defining a way of doing things is pretty cool - however active development on it really took a hit when the company was acquired. Stitch came up with the singer spec and their offered service is through managing the and scheduling various taps and targets for you.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: