Readit News
bryanh commented on Andrej Karpathy: Software in the era of AI [video]   youtube.com/watch?v=LCEmi... · Posted by u/sandslash
greybox · 9 months ago
He's talking about "LLM Utility companies going down and the world becoming dumber" as a sign of humanity's progress.

This, if anything, should be a huge red flag.

bryanh · 9 months ago
Replace with "Water Utility going down and the world becoming less sanitary", etc. Still a red flag?
bryanh commented on GPT-4o mini: advancing cost-efficient intelligence   openai.com/index/gpt-4o-m... · Posted by u/bryanh
minimaxir · 2 years ago
Good catch: the calculators here are bizarre. For GPT-4o, a 512x512 image uses 170 tile tokens. For GPT-4o mini, a 512x512 image uses 5,667 tile tokens. How does that even work in the context of a ViT? The patches and the image encoder should be the same size/output.

Since the base token counts increase proportionally (which makes even less sense), I have a hunch there's a JavaScript bug instead.

bryanh · 2 years ago
Confirmed that mini uses ~30x more prompt tokens than base gpt-4o with the same image and same prompt: { completionTokens: 46, promptTokens: 14207, totalTokens: 14253 } (mini) vs. { completionTokens: 82, promptTokens: 465, totalTokens: 547 } (gpt-4o).
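A quick check of the "~30x" figure, using only the usage numbers quoted above: the ratio holds for prompt tokens, while total tokens come out closer to 26x. The calculator figures from the parent comment (5,667 vs. 170 tile tokens for one 512x512 image) give a similar ratio. The dictionary layout just mirrors the quoted objects; it is not meant to reproduce any particular SDK's response type.

```python
# Usage objects as quoted in the comment; same image and prompt for both runs.
# The keys mirror the quoted text, not any specific SDK's response type.
mini = {"completionTokens": 46, "promptTokens": 14207, "totalTokens": 14253}
gpt4o = {"completionTokens": 82, "promptTokens": 465, "totalTokens": 547}


def ratio(a: dict, b: dict, key: str) -> float:
    """Return how many times larger a[key] is than b[key]."""
    return a[key] / b[key]


prompt_ratio = ratio(mini, gpt4o, "promptTokens")
total_ratio = ratio(mini, gpt4o, "totalTokens")
# Calculator figures from the parent comment: tile tokens for one 512x512 image.
calculator_ratio = 5667 / 170

print(f"prompt: {prompt_ratio:.1f}x, total: {total_ratio:.1f}x, calculator: {calculator_ratio:.1f}x")
# prompt: 30.6x, total: 26.1x, calculator: 33.3x
```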
bryanh commented on Gemini AI   deepmind.google/technolog... · Posted by u/dmotz
polygamous_bat · 2 years ago
I assume these landing pages are made for Wall St. analysts rather than people who understand LLM eval methods.
bryanh · 2 years ago
True, but even some of the apples-to-apples comparisons favor Gemini Ultra: 90.04% CoT@32 vs. GPT-4's 87.29% CoT@32 (via API).
bryanh commented on Gemini AI   deepmind.google/technolog... · Posted by u/dmotz
rolisz · 2 years ago
What is up with that eval @32? Am I reading it correctly that they are generating 32 responses and taking the majority? Who will use the API like that? That feels like such a fake way to improve metrics.
bryanh · 2 years ago
Page 7 of their technical report [0] has a better apples-to-apples comparison. Why they chose to show an apples-to-oranges comparison on their landing page is odd to me.

[0] https://storage.googleapis.com/deepmind-media/gemini/gemini_...
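The reading of "@32" in the question above is: sample 32 chain-of-thought responses and keep the most common final answer. A minimal sketch of that plain majority vote follows, assuming the final answers have already been extracted from each response as strings. The `samples` list is hypothetical, and the Gemini report may use a variant of this procedure, so treat it as an illustration of the idea rather than their exact method.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent final answer among sampled responses.

    Ties go to the answer seen first, because Counter.most_common orders
    elements with equal counts by first occurrence.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    answer, _count = Counter(answers).most_common(1)[0]
    return answer


# Hypothetical final answers extracted from k=5 sampled chain-of-thought responses.
samples = ["B", "C", "B", "A", "B"]
print(majority_vote(samples))  # B
```

Each extra sample is a full API call, which is why an @32 number costs roughly 32 times as much to produce as a single-sample score.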

bryanh commented on Show HN: AI Proxy with Support for OpenAI, Anthropic, LLaMa2, Mistral   braintrustdata.com/blog/a... · Posted by u/ankrgyl
bryanh · 2 years ago
We've been using Braintrust for evals at Zapier and it's been really great -- pumped to try out this proxy (which should be able to replace some custom code we've written internally for the same purpose!).

u/bryanh

Karma: 5854 · Cake day: March 7, 2010
About
Co-founder/CTO of zapier.com (YC S12); writes at bryanhelmig.com

I like to hack and play the jazz.

bryanATzapierDOTcom
