So having benchmark tests is great as a general guideline for what works under different architectures/schema designs. Unfortunately, benchmarking is highly sensitive to the initial choices. I am a big fan of BigQuery (enough to go through Google's vetting process), but there are plenty of performance issues I've run into that would have been easily resolved in Redshift. Here are some concrete examples:
1) Running a query across several very small tables. It turns out that querying small tables can cause heavy network traffic within Google's distributed system. The solution on Redshift would be to adjust the distribution style. On Google, however, you don't have any control over this. You just have to hope that Google's algorithms pick up the issue based on usage (they don't).
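The Redshift-side fix I'm referring to is roughly this (table and column names here are hypothetical, just to illustrate): for a small lookup table, you can replicate the whole table to every node so joins against it never need to move data across the network.

```sql
-- Hypothetical sketch: replicate a small lookup table to every node.
-- DISTSTYLE ALL keeps a full copy on each node, so joins against it
-- require no network redistribution. Only sensible for small tables.
CREATE TABLE dim_country (
    country_id   INTEGER,
    country_name VARCHAR(64)
)
DISTSTYLE ALL;
```

There is no equivalent knob in BigQuery; placement is entirely up to Google's internal algorithms.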
2) Joining large tables. Avoid joining large tables in BigQuery. In Redshift, the join would be handled by making sure the sortkey is set on the join column of the (typically) right table, and then giving the two tables a common distkey (this way the relevant data from both tables lives on the same node). BigQuery just throws resources at the problem. Well, it turns out that throwing resources at the problem is super slow (think 5-15 Redshift seconds vs. 200 BQ seconds).
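Concretely, the Redshift setup I'm describing looks something like this (again, the table and column names are made up for illustration): both tables are distributed and sorted on the join column, so the join is node-local and can run as a merge join.

```sql
-- Hypothetical sketch: co-locate two large tables for a fast join.
-- A common DISTKEY puts rows with the same user_id on the same node;
-- SORTKEY on the join column lets Redshift use a merge join
-- without an extra sort or redistribution step.
CREATE TABLE events (
    user_id    BIGINT,
    event_time TIMESTAMP,
    event_type VARCHAR(32)
)
DISTKEY (user_id)
SORTKEY (user_id);

CREATE TABLE users (
    user_id   BIGINT,
    signup_at TIMESTAMP
)
DISTKEY (user_id)
SORTKEY (user_id);
```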
Re: Snowflake. Can't speak to it, as I haven't had personal experience with it. I have worked with data people whose opinions fell on both the favorable and negative ends of the spectrum. That just suggests to me that, like Redshift and BigQuery, Snowflake is not a universal solution. You really need to understand:
1) what your goals are for the usage among varying consumers
2) what skill sets the various users of the database have