ozgune's comments | Hacker News

Also, do you know if their benchmarks are available?

On their website, the benchmarks say “Multilingual (Chinese), Multilingual (East-asian), Multilingual (Eastern europe), Multilingual (English), Multilingual (Western europe), Forms, Handwritten, etc.” However, there’s no reference to the underlying benchmark data.


This is huge!

When people asked me what was missing in the Postgres market, I used to tell them “an open source Snowflake.”

Crunchy’s Postgres extension is by far the most advanced solution on the market.

Huge congrats to Snowflake and the Crunchy team on open sourcing this.


Honestly, just pay Snowflake for the amazing DB and ecosystem it is, and then go build cool stuff. Unless your value-add to customers is infra, let them handle all that.


Sounds great until you're locked into Snowflake. So glad Iceberg is becoming the standard; anything open is an improvement.

The trap you end up in is having to pay Snowflake to access your own data; Iceberg and other technologies help break down the walled garden.

Not just Snowflake, any pay-per-use provider.

(Context: I've spent 5+ years working with Snowflake, it's great, I've built drivers for it in various languages, etc.)


Locked in? I mean, they’re your partner. As long as you’re deriving value from them, the partnership is still valuable, no?


Every time you want to query your data, you need to pay the compute cost.

If instead you can write to something like Parquet/Iceberg, you're not paying just to access your data.

Snowflake is great at aggregations and other stuff (seriously, huge fan of Snowflake's SQL capabilities), but say you have a visualisation tool: you're paying every time it pulls data out.

Instead, if you write the data to something like S3, you can hook your tools up to that directly.

It's expensive to pull data out of Snowflake otherwise.
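
To make that concrete, here's a minimal sketch of the pattern, using DuckDB purely as an example engine (the bucket path, region, and credentials are placeholders, not from the thread): a dashboard or ad-hoc query reads the Parquet files on S3 directly, with no warehouse compute in the loop.

    # Sketch: query Parquet files on S3 directly with DuckDB instead of paying
    # warehouse compute for every read. Bucket path, region, and credentials
    # below are placeholders.
    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    con.execute("SET s3_region = 'us-east-1'")          # placeholder region
    con.execute("SET s3_access_key_id = '...'")         # placeholder credentials
    con.execute("SET s3_secret_access_key = '...'")

    # Aggregate straight off the Parquet files exported from the warehouse.
    rows = con.execute("""
        SELECT date_trunc('day', event_time) AS day, count(*) AS events
        FROM read_parquet('s3://my-lake/events/*.parquet')
        GROUP BY 1
        ORDER BY 1
    """).fetchall()
    print(rows[:5])

The same files stay queryable from Snowflake as external or Iceberg tables whenever you do want its engine.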


You people can’t be serious, right?

OK, so I build my data lake on S3 using all open tech. I'm still paying S3 for puts and reads and lists.

OK, I put it on my own hardware, in my own colo. You're still paying for electricity and other things. Everything is lock-in.

On top of that, you're beholden to an entire community of people and volunteers to make your tech work. Need a feature? Sponsor it, or write it and fight to upstream it. And if you do this at scale at a company, what about the highly paid team of engineers you have to have to maintain all this?

With Snowflake, I alone could provide an entire production-ready BI stack to a company. And I can do so and sleep well at night knowing it's managed and taken care of, and that if it fails, entire teams of people are working to fix it.

Are you going to build your own roads, your own power grid, your own police force?

Again, my point remains: the vast majority of the time, people build on a vendor as a partner and then go on to build useful things.

Take Apple using cloud vendors for iCloud storage. You think they couldn't do it themselves? That they couldn't find and pay for and support all the tech on their own? Of course they could. But they have better things to do than reinvent the wheel, i.e. building value on top of dumb compute, and that's iCloud.


After running Snowflake in production for 5+ years, I would rather have my data in something like Parquet/Iceberg (which Snowflake fully supports...) than in Snowflake's native table format.

It's not that deep


OK. And this flexibility is only really possible because they did a lot of work to make external and internal tables roughly equivalent in performance.


Yeah, performance depends.

I think a hybrid approach works best (store in Snowflake native tables, and in Iceberg tables where needed); it gives you the benefits of Snowflake without paying its cost for certain workloads (which really adds up).

We're going to see more of this (either open or closed source) now that Snowflake has acquired Crunchy Data; the last major bastion is the "traditional" database <> Snowflake divide.
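
For what it's worth, a minimal sketch of what that hybrid setup can look like from Python (the account, credentials, table names, and the external volume 'lake_vol' are all hypothetical, and the volume would need to exist already):

    # Sketch: hot data stays in native Snowflake tables; colder data goes into a
    # Snowflake-managed Iceberg table whose Parquet files live in object storage
    # that other engines can also read. Connection details and names are made up.
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="me", password="...",
        warehouse="xs_wh", database="analytics", schema="public",
    )
    cur = conn.cursor()

    # Iceberg table backed by a pre-created external volume.
    cur.execute("""
        CREATE ICEBERG TABLE IF NOT EXISTS events_archive (
            event_id NUMBER, event_time TIMESTAMP_NTZ, payload STRING
        )
        CATALOG = 'SNOWFLAKE'
        EXTERNAL_VOLUME = 'lake_vol'
        BASE_LOCATION = 'events_archive/'
    """)

    # Archive older rows out of the native table into the Iceberg table.
    cur.execute("""
        INSERT INTO events_archive
        SELECT event_id, event_time, payload
        FROM events_native
        WHERE event_time < DATEADD(month, -6, CURRENT_TIMESTAMP())
    """)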


I had no idea they did. This pg_lake announcement dropped that nugget, and I was surprised.


Agreed btw.


They didn't do it out of goodwill. They realized that's where the market was going, and that if their query engine didn't perform as well as others on top of Iceberg, they'd be another Oracle in the long term.


Yes, don’t be obtuse. “Vendor lock-in” is not some foreign, unheard-of concept.


Teams of the smartest people on earth make these kinds of big vendor decisions, and vendor lock-in is top of mind. I tell anyone who will listen to avoid Databricks Delta Live Tables and the sleazy sales reps pushing them over cheaper, less locked-in solutions.


Not all vendors are the same. Snowflake charges an arm and a leg for compute.

It’s 36x more expensive than equivalent EC2 compute.


yeah, this exchange reads like a sales ad


Snowflake is expensive, even compared to Databricks, and you pay the pre-discount AWS storage price while they get the discount and pocket the difference as profit.


If the benchmark doesn’t use AIO, why is there a performance difference between PG 17 and 18 in the blog post (sync, worker, and io_uring)?

Is it because remote storage in the cloud always introduces some variance, and the benchmark just picks that up?

For reference, anarazel gave a presentation at pgconf.eu yesterday about AIO. He mentioned that remote cloud storage always introduces variance, making benchmark results hard to interpret. His solution was to introduce synthetic latency on local NVMes for benchmarks.
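
If you want to double-check which AIO mode your own PG 18 instance is running, here's a small sketch (assuming a PostgreSQL 18 server, psycopg 3, and a pgbench-initialized database called "bench"; io_method itself is set in postgresql.conf and requires a restart to change):

    # Sketch: confirm the server version and AIO mode, then time a simple scan.
    # Assumes PostgreSQL 18, psycopg 3, and a pgbench-initialized "bench" database.
    import time
    import psycopg

    with psycopg.connect("dbname=bench") as conn:
        with conn.cursor() as cur:
            cur.execute("SHOW server_version")
            print("server:", cur.fetchone()[0])

            cur.execute("SHOW io_method")   # 'sync', 'worker', or 'io_uring' on PG 18
            print("io_method:", cur.fetchone()[0])

            start = time.monotonic()
            cur.execute("SELECT count(*) FROM pgbench_accounts")
            cur.fetchone()
            print(f"seq scan took {time.monotonic() - start:.2f}s")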


OmniAI has a benchmark that compares LLMs to cloud OCR services.

https://getomni.ai/blog/ocr-benchmark (Feb 2025)

Please note that LLMs have progressed at a rapid pace since February. We see much better results with the Qwen3-VL family, particularly Qwen3-VL-235B-A22B-Instruct for our use case.
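
If you want to try the same kind of extraction yourself, here's a minimal sketch against an OpenAI-compatible endpoint (the base URL, model name, and image path are assumptions, e.g. a vLLM server hosting a Qwen3-VL model):

    # Sketch: OCR-style extraction with a vision-language model over an
    # OpenAI-compatible chat API (e.g. vLLM serving Qwen3-VL). Endpoint,
    # API key, model name, and file path are placeholders.
    import base64
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    with open("invoice.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="Qwen/Qwen3-VL-235B-A22B-Instruct",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all text from this document as markdown."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
        temperature=0,
    )
    print(resp.choices[0].message.content)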


The Omni OCR team says that, according to their own benchmark, the best OCR is Omni OCR. I am quite surprised.


Magistral-Small-2509 is pretty neat as well for its size; it has reasoning + multimodality, which helps in cases where the context isn't immediately clear or there are a few missing spots.


(Disclaimer: Ozgun from Ubicloud)

I agree with you. I feel the challenge is that using AI coding tools is still an art, and not a science. That's why we see many qualitative studies that sometimes conflict with each other.

In this case, we found the following interesting. That's why we nudged Shikhar to blog about his experience and put a disclaimer at the top.

* Our codebase is in Ruby and follows a design pattern that is uncommon in the industry.
* We don't have a horse in this game.
* I haven't seen an evaluation that assesses coding tools along the (a) coding, (b) testing, and (c) debugging dimensions.


The SGLang Team has a follow-up blog post that talks about DeepSeek inference performance on GB200 NVL72: https://lmsys.org/blog/2025-06-16-gb200-part-1/

Just in case you have $3-4M lying around somewhere for some high quality inference. :)

SGLang quotes a 2.5-3.4x speedup compared to H100s. They also note that more optimizations are coming, but they haven't yet published part 2 of the blog post.


Isn't Blackwell optimized for FP4? This blog post runs DeepSeek at FP8, which is probably the sweet spot, but new models with native FP4 training and inference would be drastically faster than FP8 on Blackwell.


I agree that you could get to high margins, but I think the modeling holds only if you're an AI lab operating at scale with a setup tuned for your model(s). I think the most open study on this one is from the DeepSeek team: https://github.com/deepseek-ai/open-infra-index/blob/main/20...

For others, I think the picture is different. When we ran benchmarks on DeepSeek-R1 on 8x H200 SXM using vLLM, we got up to 12K total tok/s (concurrency 200, input:output ratio of 6:1). If you're spiking up to 100-200K tok/s, you need a lot of GPUs for that, and then those GPUs sit idle most of the time.
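
As a back-of-the-envelope sketch of what that throughput means for cost (the H200 hourly price here is an assumption, not a figure from our benchmark):

    # Rough cost sketch for the 8x H200 + vLLM throughput quoted above.
    # Assumption: ~$3.50/hour per H200 rental price; actual prices vary widely.
    GPUS = 8
    GPU_HOURLY_USD = 3.50            # assumed per-GPU rental price
    TOKENS_PER_SECOND = 12_000       # total throughput measured at concurrency 200

    tokens_per_hour = TOKENS_PER_SECOND * 3600
    cluster_hourly_cost = GPUS * GPU_HOURLY_USD
    cost_per_million = cluster_hourly_cost / (tokens_per_hour / 1_000_000)
    print(f"~${cost_per_million:.2f} per 1M tokens at 100% utilization")
    # At 25% average utilization the effective cost is ~4x higher, which is
    # where the margin math starts to break down outside of AI labs.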

I'll read the blog post in more detail, but I don't think the following assumptions hold outside of AI labs.

* 100% utilization (no spikes, balanced usage between day/night and across weekdays)
* Input processing is free (~$0.001 per million tokens)
* DeepSeek fits into H100 cards in a way where the network isn't the bottleneck


I was modeling configurations purpose-built for running specific models on specific workloads. I was trying to figure out how much of a gross-margin drag some software companies could have if they hosted their own models and served them up as APIs, or as copilots integrated with their other offerings.


Previously discussed here: https://news.ycombinator.com/item?id=44941118

It's also disappointing that MIT requires you to fill out a form (and wait) for access to the report. I read four separate stories based on the report, and they each provide a different perspective.

Here's the original pdf before MIT started gating it: https://web.archive.org/web/20250818145714/https://nanda.med...


This is a very impressive general-purpose LLM (in the GPT-4o / DeepSeek-V3 family). It’s also open source.

I think it hasn’t received much attention because the frontier shifted to reasoning and multi-modal AI models. In accuracy benchmarks, all the top models are reasoning ones:

https://artificialanalysis.ai/

If someone took Kimi K2 and trained a reasoning model on top of it, I’d be curious how that model would perform.


>If someone took Kimi k2 and trained a reasoning model with it

I imagine that's what they're doing at MoonshotAI right now.


Why haven’t Kimi’s current and older models been benchmarked and added to Artificial Analysis yet?


I think that's unlikely.

DeepSeek-R1 0528 performs almost as well as o3 in AI quality benchmarks. So, either OpenAI didn't restrict access, DeepSeek wasn't using OpenAI's output, or using OpenAI's output doesn't have a material impact on DeepSeek's performance.

https://artificialanalysis.ai/?models=gpt-4-1%2Co4-mini%2Co3...


Almost as well as o3? Kind of like Gemini 2.5? I dug deeper, and surprise, surprise: https://techcrunch.com/2025/06/03/deepseek-may-have-used-goo...

I am not at all surprised; the CCP views the AI race as absolutely critical to its own survival...


Not everything that's written is worth reading, let alone drawing conclusions from. That benchmark shows different trees each time the author runs it, which should tell you something about it. It also stacks grok-3-beta together with gpt-4.5-preview in the GPT family, making the former appear to have been trained on the latter, which doesn't make sense if you check the release dates. And previously it classified gpt-4.5-preview into a completely different branch from 4o (which does make some sense, but now it's different).

EQBench, another "slop benchmark" from the same author, is equally dubious, as is most of his work, e.g. the antislop sampler, which tries to solve an NLP task in a programmatic manner.


The benchmarks are not reflective of real-world use cases. This is why OpenAI dominates B2B. As a business, it's in your best interest to save money without sacrificing quality.

"Follow the money."

Businesses are pouring money into the OpenAI API. This is your biggest clue.

