If you're looking to do something similar at your company: I work at Interana[1], and we're aiming to provide near-real-time analytics insight, both slice-and-dice real-time queries and session-stitched funnels.
The founders are the people who built Scuba, and we're taking a somewhat different approach (mostly driven by differences in scale). We're not quite at second-scale delivery times; we rely on more classical logfile rotation and aggregation mechanisms to collect our raw data, then an efficient sharding layer to load it into our columnstore.
How many applications are written embedding or extending your tool? I understand the Scuba comparison and the very valid points you make, but the core of the paper seems to be about writing applications that can make real-time decisions: fraud detection, ad analytics, sensor alerts, etc.
AFAIK your tool cannot identify trending events as they are streamed in (using a moving standard deviation, for example) and feed them downstream to a pipeline, unless I am mistaken.
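For what it's worth, the kind of streaming statistic mentioned above is straightforward to maintain incrementally. A minimal sketch of a sliding-window moving standard deviation (the class name and window size are illustrative, not anything from either product):

```python
from collections import deque
import math

class MovingStd:
    """Sliding-window standard deviation over a stream of values.

    Maintains running sums so each update is O(1); no rescan of
    the window is needed when a new event arrives.
    """

    def __init__(self, window):
        self.window = window
        self.values = deque()
        self.total = 0.0      # running sum of values in the window
        self.total_sq = 0.0   # running sum of squared values

    def add(self, x):
        self.values.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.values) > self.window:
            old = self.values.popleft()
            self.total -= old
            self.total_sq -= old * old

    def std(self):
        n = len(self.values)
        if n < 2:
            return 0.0
        mean = self.total / n
        # Population variance; clamp at 0 to absorb float error.
        var = max(self.total_sq / n - mean * mean, 0.0)
        return math.sqrt(var)
```

A downstream alerting stage could then flag any incoming value more than, say, three of these deviations from the window mean.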
JSON happens to be the self-describing interchange format that's well known and generally accepted. It's the payload of most tracking cookies (the primary type of data we ingest), and you don't need any transformation on the ingest tier: just POST, validate, and dump to logfile.
We also support CSV and apache logs, but JSON is what works for customers.
[1] http://www.interana.com/