
> I guess my strong opinion is that Snowflake is massively overrated and it will perform worse and cost you more than you expect.

Are there specific use cases or experiences that prompt you to say this? I've seen a lot of examples (such as web analytics or SIEM) where teams have built very capable stacks on ClickHouse or similar analytic databases. Basically, if you have a focused use case, it's often possible to build a custom stack with open source on Kubernetes that outperforms Snowflake along axes like p95 response, cost-efficiency at scale, and data ownership. It would be interesting to hear more about your experience.



It's trivially easy to beat Snowflake on latency because its latency is truly awful. It often takes 1-4 SECONDS end-to-end to run a query that touches just a few thousand (not million!) rows. In theory this is fine for OLAP, but when you have a Looker dashboard with 20+ tiles, it becomes a serious problem. ClickHouse absolutely thrashes Snowflake at this, routinely running millions-of-rows queries in hundreds of milliseconds.
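To make the dashboard math concrete, here's a back-of-envelope sketch in Python. The per-query latencies are the rough figures above, and the concurrency cap is my assumption (I believe Snowflake's default MAX_CONCURRENCY_LEVEL per warehouse is 8), so treat the outputs as illustrative only:

    import math

    def dashboard_wall_clock(tiles: int, per_query_s: float, concurrency: int) -> float:
        # Tiles run in waves of `concurrency`; each wave takes roughly one query's latency.
        return math.ceil(tiles / concurrency) * per_query_s

    print(dashboard_wall_clock(tiles=20, per_query_s=2.5, concurrency=8))  # ~7.5 s (Snowflake-ish)
    print(dashboard_wall_clock(tiles=20, per_query_s=0.2, concurrency=8))  # ~0.6 s (ClickHouse-ish)

Seven-plus seconds to paint a dashboard feels broken to users; sub-second feels instant.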

Anyway, the specific thing I'm remembering about cost is a case where a data team I joined had built a (dumb) CI process that ran the whole dbt pipeline every time a PR was opened. After a month or so we got a bill for something like $50k.
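For a sense of how that happens, here's a toy estimator with completely made-up inputs (the PR volume, rebuild time, and warehouse size are all hypothetical; the $2/credit rack rate and credits-per-hour figures come from the pricing discussion below):

    def monthly_ci_cost(prs_per_day: float, pipeline_hours: float,
                        credits_per_hour: float, dollars_per_credit: float = 2.0) -> float:
        # Every PR triggers a full pipeline rebuild on a dedicated warehouse.
        return prs_per_day * 30 * pipeline_hours * credits_per_hour * dollars_per_credit

    # e.g. 35 PRs/day, 3 h full rebuild, L warehouse at 8 credits/hr:
    print(f"${monthly_ci_cost(35, 3, 8):,.0f}/month")  # -> $50,400/month

Nothing in there is exotic; it's just multiplication, which is exactly why the bill sneaks up on you.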

Snowflake's rack-rate pricing is $2/credit, and an XS warehouse burns 1 credit/hr. That XS warehouse is, allegedly, an 8-core/16GB(?) instance with about a hundred gigs of SSD cache, from a "c" family if you're on AWS. Of course, since your data is in S3 (cache notwithstanding), you're likely to be network-constrained for many query patterns. BigQuery, which is unquestionably faster than Snowflake, proves that this can be done efficiently. But compared to Redshift (non-RA3) or ClickHouse, where data lives on locally attached disks, Snowflake just gets smoked. The only lever they give you to get more performance is to spend more money, which is great for their bottom line but bad for you.
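The "spend more money" lever is very legible if you write out the rack-rate math. Warehouse credit rates double at each size step (XS=1, S=2, M=4 credits/hr, and so on), so here's a quick sketch of the never-suspended monthly cost per size:

    DOLLARS_PER_CREDIT = 2.0  # rack rate

    for i, size in enumerate(["XS", "S", "M", "L", "XL"]):
        credits_per_hour = 2 ** i  # XS=1, S=2, M=4, L=8, XL=16
        monthly = credits_per_hour * DOLLARS_PER_CREDIT * 24 * 30
        print(f"{size}: {credits_per_hour} cr/hr -> ${monthly:,.0f}/month if never suspended")

    # XS: 1 cr/hr -> $1,440/month ... XL: 16 cr/hr -> $23,040/month

Each notch on the only performance knob they give you doubles your spend.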

The pitch is that because you can turn it all off when you're not using it (which in fairness they make very easy!), the overall costs end up low. Ehhhhhhhhhhh, maybe. It only takes one person leaving a Looker dashboard open with auto-refresh enabled to keep a warehouse constantly online, and that adds up fast. Plus, if you are being silly and building DW data hourly, as is popular, it's going to need to be on anyway. (Do daily builds! You don't need more than that!) Point being, the cost model you will get from sales reps makes very optimistic assumptions about utilization, and it is very likely you will be hit with a bill larger than expected. In practice, while it is technically easy to control utilization, it is not actually easy, because there are humans in the loop.
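If you want to see how sensitive the pitch is to that utilization assumption, here's a sketch with invented utilization percentages (only the $2/credit and 1 credit/hr figures come from above; the percentages are hypothetical):

    def monthly_cost(utilization: float, credits_per_hour: float = 1.0,
                     dollars_per_credit: float = 2.0, hours: float = 24 * 30) -> float:
        # Effective utilization = fraction of the month the warehouse is resumed.
        return utilization * hours * credits_per_hour * dollars_per_credit

    print(f"optimistic sales-deck usage (10%): ${monthly_cost(0.10):,.0f}")  # $144
    print(f"business hours (~25%):             ${monthly_cost(0.25):,.0f}")  # $360
    print(f"auto-refresh left on (100%):       ${monthly_cost(1.00):,.0f}")  # $1,440

One forgotten dashboard turns a $144 line item into a $1,440 one, and that's on the smallest warehouse.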


Agree with this. Snowflake has best-in-class dev experience and performance for Spark-like workloads (so ETL or unconstrained analytics queries).

It has close to worst-in-class performance as a serving layer.

If you're creating an environment to serve analysts and cached BI tools, you'll have a great time.

If you're trying to drive anything from Snowflake where you care about operations measured in milliseconds or single-digit seconds, you'll have a bad time and probably set a lot of money on fire in the process.


Thank you. Much appreciated! I work on ClickHouse, and it's definitely the cat's meow for user-facing analytics.



