fertrevino (u/fertrevino)

fertrevino commented on Ask HN: GitHub Copilot down? · Posted by u/thecopy

fertrevino · 2 days ago

'There seems to be a problem with your account. Please contact Github support' is what I get in my VS Code extension.

fertrevino commented on From GPT-4 to GPT-5: Measuring progress through MedHELM [pdf] fertrevino.com/docs/gpt5_... · Posted by u/fertrevino

degamad · 8 days ago

Obligatory xkcd: https://xkcd.com/1838/

fertrevino · 8 days ago

Loved the cartoon :)

fertrevino commented on From GPT-4 to GPT-5: Measuring progress through MedHELM [pdf] fertrevino.com/docs/gpt5_... · Posted by u/fertrevino

xnx · 8 days ago

Have you looked at comparing to Google's foundation models or specialty medical models like MedGemma (https://developers.google.com/health-ai-developer-foundation...)?

fertrevino · 8 days ago

That would be an interesting extension. MedGemma isn't part of the original benchmark either [1]. Since Gemini 2.0 Flash is in 6th place, I'd expect MedGemma to rank higher than that :)

[1] https://crfm.stanford.edu/helm/medhelm/latest/#/leaderboard

fertrevino commented on From GPT-4 to GPT-5: Measuring progress through MedHELM [pdf] fertrevino.com/docs/gpt5_... · Posted by u/fertrevino

aresant · 8 days ago

Feels like a mixed bag vs regression?

eg - GPT-5 beats GPT-4 on factual recall + reasoning (HeadQA, Medbullets, MedCalc).

But then slips on structured queries (EHRSQL), fairness (RaceBias), evidence QA (PubMedQA).

Hallucination resistance better but only modestly.

Latency seems uneven (maybe more testing?): faster on long tasks, slower on short ones.

fertrevino · 8 days ago

Mixed results indeed. While it leads the benchmark in two question types, it falls short in others, which results in a slight overall regression.

fertrevino commented on From GPT-4 to GPT-5: Measuring progress through MedHELM [pdf] fertrevino.com/docs/gpt5_... · Posted by u/fertrevino

narrator · 8 days ago

I agree. I have found GPT-5 significantly worse on medical queries. It feels like it skips important details and is much worse than o3, IMHO. I have heard good things about GPT-5 Pro, but that's not cheap.

I wonder if part of the degraded performance comes from cases where they think you're heading into a dangerous area and the answers get more and more vague, like they demoed on launch day with the fireworks example. For example, it gets very vague when talking about non-abusable prescription drugs. I wonder if that sort of nerfing gradient is affecting medical queries.

After seeing some painfully bad results, I'm currently using Grok 4 for medical queries with a lot of success.

fertrevino · 8 days ago

Interesting, it seems the anecdotal experience agrees with the benchmark results.

fertrevino commented on The EU AI Act – consultation on general-purpose AI artificialintelligenceact... · Posted by u/voytec

fertrevino · 4 months ago

I find this topic more relevant as time goes by. I consider compliance and assessment of AI systems necessary for their own success. I am currently conducting a field study on the topic: https://www.ai-compliance.cloud/

fertrevino commented on Ask HN: What are you working on? (April 2025) · Posted by u/david927

fertrevino · 4 months ago

I am extending my pet project menuop.com, a digital menu maker for restaurants. I plan to integrate AI to enhance visual impact, analytics, and recommendations for restaurant owners.

fertrevino commented on I'm Peter Roberts, immigration attorney, who does work for YC and startups. AMA · Posted by u/proberts

fertrevino · 7 months ago

I heard Russian citizens were having trouble getting any kind of U.S. visa due to the political situation in the world. Does this situation still persist? Asking for a friend who still hasn't heard back about his tourist visa.

u/fertrevino

Karma: 86 · Cake day: December 10, 2020