
If you believe in ELT over ETL, you want both.

Schema on read is good for raw data stages. You want to get the data from external systems into the database no matter what. You don't have control over it, and it can change at any time. Just get it into a blob, variant, or text column in a raw table and transform it to your schema later. This way you can implement the validation logic for the raw data within your database instead of outside it, in your ETL system. So if you believe in ELT, just get the data in first.
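To make that concrete, here is a rough sketch using Python's built-in sqlite3 as a stand-in for the warehouse. The table name raw_events, the payloads, and the user_id check are all made up for illustration, and it assumes your SQLite build ships the JSON functions (most recent ones do). A real warehouse would use its own variant or JSON type instead.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw landing table: a single untyped text column, no schema enforced on load.
conn.execute("CREATE TABLE raw_events (id INTEGER PRIMARY KEY, payload TEXT)")

# Hypothetical payloads from an external system we do not control.
incoming = [
    json.dumps({"user_id": 1, "amount": "19.99"}),
    json.dumps({"user": {"id": 2}, "amount": "5"}),  # upstream changed the shape
    "{not valid json",                               # broken export
]

# Load step: everything gets in, good or bad.
conn.executemany(
    "INSERT INTO raw_events (payload) VALUES (?)", [(p,) for p in incoming]
)

# Validation step: runs inside the database, after the load.
rows = conn.execute(
    """
    SELECT id,
           CASE
               WHEN NOT json_valid(payload) THEN 'malformed'
               WHEN json_extract(payload, '$.user_id') IS NULL THEN 'missing user_id'
               ELSE 'ok'
           END AS status
    FROM raw_events
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Nothing is rejected at load time. The bad rows are still sitting in the raw table, so you can inspect them and fix the validation query without re-extracting anything.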

Of course you don't want to use stringly typed tables for any serious data analysis. You need typed columns so that calculations perform well and the schema stays stable across the data's history. Instead of putting your typing logic, which converts string and variant columns to native database types, in an ETL script, you do it in the data warehouse.
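A minimal sketch of that typing step, again with sqlite3 standing in for the warehouse. The raw_orders and orders tables and their columns are hypothetical, and the GLOB filter is only a crude numeric check, not a recommendation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stringly typed raw stage: every column is text, as loaded.
    CREATE TABLE raw_orders (order_id TEXT, amount TEXT, ordered_at TEXT);
    INSERT INTO raw_orders VALUES
        ('1001', '19.99', '2024-01-05'),
        ('1002', '5',     '2024-01-06'),
        ('1003', 'n/a',   '2024-01-07');

    -- Typed table used for analysis.
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        amount     REAL NOT NULL,
        ordered_at TEXT NOT NULL
    );

    -- The "T" of ELT: casting happens in SQL, inside the database.
    INSERT INTO orders (order_id, amount, ordered_at)
    SELECT CAST(order_id AS INTEGER),
           CAST(amount AS REAL),
           date(ordered_at)
    FROM raw_orders
    WHERE amount GLOB '[0-9]*'            -- starts with a digit
      AND amount NOT GLOB '*[^0-9.]*'     -- only digits and dots
      AND date(ordered_at) IS NOT NULL;   -- parseable date
    """
)

typed = conn.execute(
    "SELECT order_id, amount, typeof(amount) FROM orders ORDER BY order_id"
).fetchall()
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(typed, total)
```

The row with 'n/a' stays in raw_orders and simply does not make it into orders, so the typed table is safe to aggregate while the raw history is kept.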

The transformation into types has to happen somewhere. If you skip schema on read, that only means you never adopted ELT and are stuck with your database AND an ETL system. Adopting ELT means you use only one system, the data warehouse, to do both.


