If you haven't already, I strongly suggest reading the original Dremel paper [0]. It's no doubt somewhat out of date, but I believe BQ is based on Dremel.
tl;dr for the underlying storage model: it's a distributed column store that pushes computation down a tree to the leaf nodes to parallelize disk I/O. Parent nodes aggregate the partial results before returning them to the client.
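To make the tree idea concrete, here's a toy sketch in Python. It is not how Dremel or BQ is actually implemented; the shard layout, the function names (leaf_scan, merge, query) and the fanout parameter are all made up for illustration. Each "leaf" scans its own slice of a single column and returns a partial (sum, count), and "parents" merge those partials level by level until one result is left.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical column shards: each leaf holds a slice of one column.
SHARDS = [[3, 5, 8], [1, 9], [4, 4, 6, 2]]

def leaf_scan(shard, predicate):
    """Leaf node: scan the local column shard, return a partial (sum, count)."""
    matched = [v for v in shard if predicate(v)]
    return sum(matched), len(matched)

def merge(partials):
    """Parent node: combine the partial aggregates from its children."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total, count

def query(shards, predicate, fanout=2):
    """Run leaf scans in parallel, then merge partials up a tree."""
    if not shards:
        return 0, 0
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: leaf_scan(s, predicate), shards))
    # Each pass models one level of parent nodes with `fanout` children each.
    while len(partials) > 1:
        partials = [merge(partials[i:i + fanout])
                    for i in range(0, len(partials), fanout)]
    return partials[0]

total, count = query(SHARDS, lambda v: v >= 4)
print(total, count)
```

The point of returning (sum, count) rather than an average from each leaf is that partial results have to be mergeable: sums and counts can be added together at every level of the tree, while averages cannot.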
- multidimensional / hierarchy modeling for analytical purposes
- permissioning model (roles with element-level granularity, i.e. User A - allow Country=USA, User B - allow Product=Bike)
- historical modeling (built-in slowly-changing dimensions support)
What is the underlying storage model? Is it a column store? Or is it closer to traditional row-based stores?
Are you taking advantage of GPUs or other dedicated hardware to accelerate BQ?