When people asked me what was missing in the Postgres market, I used to tell them “open source Snowflake.”
Crunchy’s Postgres extension is by far the most advanced solution in the market.
Huge congrats to Snowflake and the Crunchy team on open sourcing this.
It's also worth noting that the default for data checksums has changed, which adds some overhead.
Is it because remote storage in the cloud always introduces some variance & the benchmark just picks that up?
For reference, anarazel gave a presentation at pgconf.eu yesterday about AIO. He mentioned that remote cloud storage always introduces variance, making benchmark results hard to interpret. His solution was to introduce synthetic latency on local NVMes for benchmarks.
https://getomni.ai/blog/ocr-benchmark (Feb 2025)
Please note that LLMs have progressed at a rapid pace since Feb. We see much better results with the Qwen3-VL family, particularly Qwen3-VL-235B-A22B-Instruct, for our use case.
I agree with you. I feel the challenge is that using AI coding tools is still an art, and not a science. That's why we see many qualitative studies that sometimes conflict with each other.
In this case, we found the following interesting. That's why we nudged Shikhar to blog about his experience and put a disclaimer at the top.
* Our codebase is in Ruby and follows a design pattern that is uncommon in the industry
* We don't have a horse in this race
* I haven't seen an evaluation that covers coding tools along the (a) coding, (b) testing, and (c) debugging dimensions
Just in case you have $3-4M lying around somewhere for some high quality inference. :)
SGLang quotes a 2.5-3.4x speedup compared to the H100s. They also note that more optimizations are coming, but they haven't yet published a part 2 of the blog post.
> can someone help folks at Mistral find more weak baselines to add here? since they can't stomach comparing with SoTA....
> (in case y'all wanna fix it: Chandra, dots.ocr, olmOCR, MinerU, Monkey OCR, and PaddleOCR are a good start)
On their website, the benchmarks say “Multilingual (Chinese), Multilingual (East-asian), Multilingual (Eastern europe), Multilingual (English), Multilingual (Western europe), Forms, Handwritten, etc.” However, there’s no reference to the benchmark data.