https://genai-showdown.specr.net
This model gets 8 of the 12 prompts correct, which easily puts it within striking distance of the best-in-class models, Imagen and gpt-image-1, and makes it a significant upgrade over the old Gemini Flash 2.0 model. The reigning champ, gpt-image-1, only manages to edge out Flash 2.5 on the maze and 9-pointed star prompts.
What's honestly most astonishing to me is how long gpt-image-1 has remained at the top of the class - closing in on half a year, which is basically a lifetime in this field. Fair warning, though: gpt-image-1 is borderline useless as an "editor", since it almost always changes the whole image instead of doing localized inpainting-style edits like Kontext, Qwen, or Nano-Banana.
Comparison of gpt-image-1, Flash, and Imagen:
https://genai-showdown.specr.net?models=OPENAI_4O%2CIMAGEN_4...
Honestly, this seems like a dubious idea. The C++ community that remains is even more "just get good" than before. They still think UB all over the place is fine.
Amazon switched the Prime Video app to WebAssembly and doubled its performance. They support 8,000 device types: https://www.amazon.science/blog/how-prime-video-updates-its-...
A recent talk on it with transcript: https://www.infoq.com/presentations/prime-video-rust/
Presumably a next-gen PNG will still require a new decoder, so they could just call it PNG2.
JPEG-XL already provides everything most people asked for in a lossless codec. If it has any problems, they are its encoding and decoding speed and resource usage.
The current champion among lossless image codecs is HALIC: https://news.ycombinator.com/item?id=38990568
I am so glad that Java decided to go down the path of virtual threads (JEP 444, JDK 21, Sep 2023). They decided to put some complexity into the JVM in order to spare application developers, library writers, and human debuggers from even more complexity.
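To illustrate what that buys application code, here is a minimal sketch, assuming JDK 21 or later. The class name, `blockingTask`, and the 50 ms sleep are hypothetical stand-ins for real blocking I/O and are not from JEP 444 itself. The point is that the code stays in plain blocking style, with no callbacks or async/await, while each task gets its own cheap virtual thread.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualThreadsSketch {

    // Hypothetical stand-in for a blocking call (HTTP request, DB query, ...).
    public static int blockingTask(int id) throws InterruptedException {
        // Sleeping parks the virtual thread; the carrier OS thread is freed
        // to run other virtual threads in the meantime.
        Thread.sleep(Duration.ofMillis(50));
        return id * 2;
    }

    // Runs `count` blocking tasks, one virtual thread per task, and sums the results.
    public static long runTasks(int count) throws Exception {
        // try-with-resources: close() waits for all submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                futures.add(executor.submit(() -> blockingTask(id)));
            }
            long sum = 0;
            for (Future<Integer> future : futures) {
                sum += future.get(); // ordinary blocking get()
            }
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        // 10,000 concurrent blocking tasks; a platform-thread-per-task design
        // would need 10,000 OS threads for the same code shape.
        System.out.println(runTasks(10_000));
    }
}
```

Stack traces and debugger stepping work on these tasks the same way they do for ordinary threads, which is the "spare human debuggers" part of the trade-off.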
I have been extremely impressed with o1, o3, o4-mini and Gemini 2.5 as debugging aids. The combination of long context input and their chain-of-thought means they can frequently help me figure out bugs that span several different layers of code.
I wrote about an early experiment with that here: https://simonwillison.net/2024/Sep/25/o1-preview-llm/
Here's a Gemini 2.5 Pro transcript from this afternoon where I'm trying to figure out a very tricky bug: https://gist.github.com/simonw/4e208ab9edb5e6a814d3d23d7570d...