Good catch: the calculators here are bizarre. For GPT-4o, a 512x512 image uses 170 tile tokens. For GPT-4o mini, the same 512x512 image uses 5,667 tile tokens. How does that even work in the context of a ViT? The patch size and image-encoder output should be the same for both models.
Since the base token counts increase proportionally (which makes even less sense), I have a hunch this is a JavaScript bug in the calculator rather than a real difference in the encoder.
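For reference, OpenAI's vision docs describe high-detail image cost as a fixed base plus a per-tile cost. A minimal sketch of that formula: the gpt-4o constants (85 base, 170 per 512x512 tile) are documented; the mini constants in the comment are just what the calculator reports, not independently verified.

    // Sketch of the documented high-detail image token formula.
    // gpt-4o: 85 base + 170 per 512x512 tile. The calculator's
    // gpt-4o-mini per-tile figure (5,667) is ~33.3x that, which is
    // why the proportional base count above is suspicious.
    function imageTokens(width: number, height: number, base = 85, perTile = 170): number {
      // Scale to fit within 2048x2048, then scale the short side down to 768px.
      const fit = Math.min(1, 2048 / Math.max(width, height));
      const short = Math.min(1, 768 / (Math.min(width, height) * fit));
      const w = Math.floor(width * fit * short);
      const h = Math.floor(height * fit * short);
      // Count 512x512 tiles over the resized image.
      const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
      return base + tiles * perTile;
    }

    // A 512x512 image is a single tile: 85 + 170 = 255 tokens on gpt-4o.
    console.log(imageTokens(512, 512));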
Confirmed that mini uses ~30x more prompt tokens than base gpt-4o with the same image and the same prompt: gpt-4o-mini reports { completionTokens: 46, promptTokens: 14207, totalTokens: 14253 } vs. gpt-4o's { completionTokens: 82, promptTokens: 465, totalTokens: 547 }.
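The camelCase usage object above looks like it came from a wrapper library; here's a minimal sketch of the same comparison using the plain OpenAI Node SDK (which reports snake_case usage — the image URL is a placeholder):

    // Compare prompt token counts for the same image + prompt across models.
    import OpenAI from "openai";

    const client = new OpenAI();

    async function promptTokens(model: string, imageUrl: string): Promise<number> {
      const res = await client.chat.completions.create({
        model,
        messages: [{
          role: "user",
          content: [
            { type: "text", text: "Describe this image." },
            { type: "image_url", image_url: { url: imageUrl } },
          ],
        }],
      });
      return res.usage?.prompt_tokens ?? 0;
    }

    const url = "https://example.com/test.png"; // placeholder image
    const mini = await promptTokens("gpt-4o-mini", url);
    const base = await promptTokens("gpt-4o", url);
    console.log({ mini, base, ratio: mini / base }); // ~30x per the counts above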
What is up with that eval@32? Am I reading it correctly that they generate 32 responses and take a majority vote? Who would use the API like that? It feels like such a fake way to improve metrics.
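For anyone unfamiliar, "maj@32" is majority voting (self-consistency): sample N completions and score only the most common final answer. A minimal sketch, with extractAnswer as a hypothetical stand-in for however the eval actually parses answers:

    // maj@N sketch: request n samples in one call, tally the parsed
    // answers, and return whichever answer appears most often.
    import OpenAI from "openai";

    const client = new OpenAI();

    // Hypothetical answer parser: here, just the last line of the reply.
    function extractAnswer(text: string): string {
      const lines = text.trim().split("\n");
      return lines[lines.length - 1].trim();
    }

    async function majorityVote(prompt: string, n = 32): Promise<string> {
      const res = await client.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: prompt }],
        n, // the API's built-in way to get multiple samples per call
        temperature: 0.7, // diversity across samples is what makes voting help
      });
      const tally = new Map<string, number>();
      for (const choice of res.choices) {
        const a = extractAnswer(choice.message.content ?? "");
        tally.set(a, (tally.get(a) ?? 0) + 1);
      }
      return [...tally.entries()].sort((x, y) => y[1] - x[1])[0][0];
    }

It does improve benchmark numbers, but at 32x the cost per question — which is the point of the complaint: almost nobody calls the API that way in production.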
Page 7 of their technical report [0] has a better apples-to-apples comparison. Why they chose to show apples-to-oranges numbers on their landing page is odd to me.
We've been using Braintrust for evals at Zapier and it's been really great -- pumped to try out this proxy (which should be able to replace some custom code we've written internally for the same purpose!).
This, if anything, should be a huge red flag.