ozgune commented on 95% of Companies See 'Zero Return' on $30B Generative AI Spend   thedailyadda.com/95-of-co... · Posted by u/speckx
jawns · 6 days ago
Full disclosure: I'm currently in a leadership role on an AI engineering team, so it's in my best interest for AI to be perceived as driving value.

Here's a relatively straightforward application of AI that is set to save my company millions of dollars annually.

We operate large call centers, and agents were previously spending 3-5 minutes after each call writing manual summaries of the calls.

We recently switched to using AI to transcribe and write these summaries. Not only are the summaries better than those produced by our human agents, they also free up the human agents to do higher-value work.

It's not sexy. It's not going to replace anyone's job. But it's a huge, measurable efficiency gain.
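The savings math above is easy to sketch. A minimal back-of-envelope estimate, using the 3-5 minutes per call from the comment; the call volume, hourly cost, and workday count are hypothetical placeholders, not figures from the comment:

```python
# Rough estimate of dollars saved by auto-summarizing calls.
# minutes_per_summary (3-5 min) comes from the comment above;
# calls_per_day, hourly_cost, and workdays are made-up inputs.

def annual_savings(calls_per_day, minutes_per_summary, hourly_cost, workdays=260):
    """Dollars saved per year if agents stop writing manual summaries."""
    hours_saved_per_day = calls_per_day * minutes_per_summary / 60
    return hours_saved_per_day * hourly_cost * workdays

# e.g. 50,000 calls/day, 4 min each, $25/hour fully loaded
print(f"${annual_savings(50_000, 4, 25):,.0f} per year")  # → $21,666,667 per year
```

At that (assumed) scale the saved agent time alone lands in the "millions annually" range the comment describes, before counting any quality gains from better summaries.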

ozgune · 6 days ago
Previously discussed here: https://news.ycombinator.com/item?id=44941118

It's also disappointing that MIT requires you to fill out a form (and wait) for access to the report. I read four separate stories based on the report, and each provides a different perspective.

Here's the original pdf before MIT started gating it: https://web.archive.org/web/20250818145714/https://nanda.med...


ozgune commented on Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model   twitter.com/Kimi_Moonshot... · Posted by u/c4pt0r
ozgune · a month ago
This is a very impressive general-purpose LLM (in the GPT-4o / DeepSeek-V3 family). It's also open source.

I think it hasn’t received much attention because the frontier shifted to reasoning and multi-modal AI models. In accuracy benchmarks, all the top models are reasoning ones:

https://artificialanalysis.ai/

If someone took Kimi k2 and trained a reasoning model with it, I’d be curious how that model performs.

ozgune commented on DeepSeek R2 launch stalled as CEO balks at progress   reuters.com/world/china/d... · Posted by u/nsoonhui
astar1 · 2 months ago
This. My guess is OpenAI wised up after R1 and put safeguards in place for o3 that it didn't have for o1, hence the delay.
ozgune · 2 months ago
I think that's unlikely.

DeepSeek-R1-0528 performs almost as well as o3 in AI quality benchmarks. So either OpenAI didn't restrict access, DeepSeek wasn't using OpenAI's output, or using OpenAI's output doesn't have a material impact on DeepSeek's performance.

https://artificialanalysis.ai/?models=gpt-4-1%2Co4-mini%2Co3...

ozgune commented on Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Task   brainonllm.com/... · Posted by u/bayindirh
ozgune · 2 months ago
This was on HN's front page yesterday. Here's the discussion:

https://news.ycombinator.com/item?id=44286277

ozgune commented on Magistral — the first reasoning model by Mistral AI   mistral.ai/news/magistral... · Posted by u/meetpateltech
danielhanchen · 3 months ago
I made some GGUFs for those interested in running them at https://huggingface.co/unsloth/Magistral-Small-2506-GGUF

ollama run hf.co/unsloth/Magistral-Small-2506-GGUF:UD-Q4_K_XL

or

./llama.cpp/llama-cli -hf unsloth/Magistral-Small-2506-GGUF:UD-Q4_K_XL --jinja --temp 0.7 --top-k -1 --top-p 0.95 -ngl 99

Please use --jinja for llama.cpp and use temperature = 0.7, top-p 0.95!

Also best to increase Ollama's context length to say 8K at least: OLLAMA_CONTEXT_LENGTH=8192 ollama serve &. Some other details in https://docs.unsloth.ai/basics/magistral

ozgune · 3 months ago
Their benchmarks are interesting. They are comparing to DeepSeek-V3's (non-reasoning) December and DeepSeek-R1's January releases. I feel that comparing to DeepSeek-R1-0528 would be more fair.

For example, R1 scores 79.8 on AIME 2024, while R1-0528 scores 91.4.

R1 scores 70 on AIME 2025; R1-0528 scores 87.5. R1-0528 does similarly better on GPQA Diamond, LiveCodeBench, and Aider (about 10-15 points higher).

https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
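The gap between the two R1 releases can be made concrete from the figures quoted above (the AIME numbers are from the comment; the delta arithmetic is all this block does):

```python
# Benchmark scores quoted above: (R1 January release, R1-0528 refresh).
scores = {
    "AIME 2024": (79.8, 91.4),
    "AIME 2025": (70.0, 87.5),
}

# Points gained by the May refresh on each benchmark.
deltas = {bench: round(after - before, 1) for bench, (before, after) in scores.items()}
print(deltas)  # → {'AIME 2024': 11.6, 'AIME 2025': 17.5}
```

A 12-18 point swing is large enough that comparing against the January release rather than 0528 materially changes the picture.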

ozgune commented on Rethinking PostgreSQL Storage   ubicloud.com/blog/time-to... · Posted by u/furkansahin
ozgune · 3 months ago
Question to the author:

Are you planning to publish CH benchmarks (TPC-C and TPC-H combined)? I'd expect Aurora to perform much worse on CH than on TPC-C or TPC-H alone. That's because Aurora pushes WAL records to replicated shared storage: since a write only needs a quorum, you get a fast ack (good for TPC-C). And the way you've run TPC-H doesn't modify the data much, so you also get baseline Postgres read performance.

However, when you're pushing writes while running a sequential scan over the data, Aurora needs to reconcile the WAL writes, manage locks, etc. The CH benchmark exercises exactly that path, and I'd expect it to slow Aurora down notably.
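The "fast ack on a quorum" point can be sketched as an order-statistic toy model. Aurora's documented design keeps six copies of each log record and acks a write once four of the six storage nodes persist it, so write latency is the 4th-fastest node's latency; the latency numbers below are made up for illustration:

```python
# Toy model of quorum-acked WAL writes (Aurora-style: 6 copies, 4/6
# write quorum). The write acks as soon as the 4th-fastest replica
# persists the record, so a couple of stragglers don't hurt write
# latency -- which is why TPC-C-style workloads look good.

def quorum_ack_latency(replica_latencies_ms, quorum=4):
    """Latency until `quorum` replicas have persisted the WAL record."""
    return sorted(replica_latencies_ms)[quorum - 1]

latencies = [1.2, 1.5, 1.8, 2.1, 9.0, 40.0]  # two slow replicas
print(quorum_ack_latency(latencies))  # → 2.1 (stragglers ignored)
```

Reads that must see reconciled, up-to-date pages don't get this shortcut, which is the asymmetry the CH benchmark would expose.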

(Disclaimer: ex-Citus and current Ubicloud founder)

u/ozgune

Karma: 1584 · Cake day: February 21, 2011
About
Ubicloud, Microsoft, Citus Data, Amazon