I’m somewhat convinced that the “difference” between OLAP and “data warehouses” is shady advertising.
Structurally they’re really similar. I suspect some vendors couldn’t match the outright performance of existing OLAP databases, so they added extra features to differentiate their products enough to justify a new product category, and now talk endlessly about how OLAP databases aren’t capable of handling this brave new future, even though for the majority of workloads people would be better off just going with a “boring” OLAP database.
Large parts of this comment are directed pointedly at Snowflake.
I think it's more a matter of comparing minivans (cloud "DWH" engines) to sports cars (Clickhouse et al) here.
Snowflake's performance characteristics & ops paradigm have always been more consistent with managed Spark than anything else, hence the competition with Databricks. They have only recently started pretending to be anything other than a low-maintenance batch processor with a nice managed storage abstraction, and their pricing model reinforces this.
That being said, it's currently pretty hard to find something that gives you:
- Bottomless storage
- Always "OK" performance
- Complete consistency without surprises (synchronous updates, cross table transactions, snapshot isolation)
- The ability to happily chew through any size join and always return results
- Complete workload isolation
...all in one place, so people will probably be buying Snowflake credits for a few years yet.
I'm excited about the coming generation (cf. StarRocks and the Clickhouse roadmap), but the workloads and query patterns for OLAP and DWH only overlap due to marketing and the "I have a hammer" effect.
I don't think the slight misuse of either type of engine is bad at small-to-medium scale, either. It's healthy to make "get it done" stacks with fewer query engines, fewer integration points, and already-known system limitations.