Readit News
wesleyyue commented on Computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku   anthropic.com/news/3-5-mo... · Posted by u/weirdcat
wesleyyue · 10 months ago
If anyone would like to try the new Sonnet in VSCode: I just updated https://double.bot to the new Sonnet. (Disclaimer: I am the cofounder/creator.)

---

Some thoughts:

* Will be interesting to see what we can build in terms of automatic development loops with the new computer use capabilities.

* I wonder if they are not releasing Opus because it's not done or because they don't have enough inference compute to go around, and Sonnet is close enough to state of the art?

wesleyyue commented on Llama 3.2 released: Multimodal, 1B to 90B sizes   llama.com/... · Posted by u/modeless
daemonologist · a year ago
On the second point, you're comparing MMMU-Pro (multimodal) to MMLU-Pro (text only). I don't think they published scores on MMLU-Pro for 3.2.

(Edit: parent comment was corrected, thanks!)

wesleyyue · a year ago
Yep you're right, thanks for catching (sorry for the ninja edit!)
wesleyyue commented on Llama 3.2 released: Multimodal, 1B to 90B sizes   llama.com/... · Posted by u/modeless
idiliv · a year ago
Where do you see the MMLU-Pro evaluation for Llama 3.2 90B? On the link I only see Llama 3.2 90B evaluated against multimodal benchmarks.
wesleyyue · a year ago
Ah you're right I totally misread that!
wesleyyue commented on Llama 3.2 released: Multimodal, 1B to 90B sizes   llama.com/... · Posted by u/modeless
wesleyyue · a year ago
Interesting observations:

* Llama 3.2 multimodal actually still ranks below Molmo from ai2 released this morning.

* AI2D: 92.3 (3.2 90B) vs 96.3 (Molmo 72B)

* Llama 3.2 1B and 3B are pruned from 3.1 8B, so no leapfrogging, unlike 3 -> 3.1.

* Notably no code benchmarks. Deliberate exclusion of code data in distillation to maximize mobile on-device use cases?

Was hoping there would be some interesting models I could add to https://double.bot, but there don't seem to be any improvements to frontier coding performance.

wesleyyue commented on Show HN: Void, an open-source Cursor/GitHub Copilot alternative   github.com/voideditor/voi... · Posted by u/andrewpareles
pzo · a year ago
There is also PearAI - "The Open Source AI-Powered Code Editor. A fork of VSCode and Continue." [0]. It's getting very crowded in this space: cursor.sh, continue.dev, double.bot, supermaven, codium.ai, PearAI, and now Void.

[0] - https://github.com/trypear/pearai-app

wesleyyue · a year ago
If you've tried others, would love to understand if there's anything you didn't like specifically (I'm one of the creators for https://double.bot)
wesleyyue commented on Show HN: Void, an open-source Cursor/GitHub Copilot alternative   github.com/voideditor/voi... · Posted by u/andrewpareles
tristan957 · a year ago
You can publish your extensions on OpenVSX fyi. A lot of projects have started doing that now. Not all, but a good amount. Glad you found Theia though.
wesleyyue · a year ago
Ah interesting! I'm building https://double.bot (an AI assistant VSCode extension) and someone asked about VSCodium, but I didn't realize there's an open marketplace for that specifically.
wesleyyue commented on Learning to Reason with LLMs   openai.com/index/learning... · Posted by u/fofoz
TheMiddleMan · a year ago
Trying out Double now.

o1 did a significantly better job converting a JavaScript file to TypeScript than Llama 3.1 405B, GitHub Copilot, and Claude 3.5. It even simplified my code a bit while retaining the same functionality. Very impressive.

It was able to refactor a ~160 line file but I'm getting an infinite "thinking bubble" on a ~420 line file. Maybe something's timing out with the longer o1 response times?

wesleyyue · a year ago
> Maybe something's timing out with the longer o1 response times?

Let me look into this – one issue is that OpenAI doesn't expose a streaming endpoint via the API for o1 models. It's possible there's an HTTP timeout occurring somewhere in the stack. Thanks for the report.
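Since o1 returns nothing until the full answer is ready, any fixed read timeout in the request stack can fire during a long "thinking" phase. A minimal sketch of one workaround, a model-dependent timeout (the endpoint and values here are illustrative, not Double's actual implementation):

```python
import json
import urllib.request


def request_timeout(model: str) -> float:
    # Non-streaming reasoning models emit nothing until the full answer
    # is ready, so give them a much longer timeout than chat models.
    return 600.0 if model.startswith("o1") else 60.0


def chat_completion(api_key: str, messages: list[dict], model: str = "o1-preview") -> str:
    # Plain (non-streaming) Chat Completions call with the longer timeout.
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=request_timeout(model)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Note that an intermediate proxy or load balancer can still time out independently of the client, so the fix may need to happen at more than one layer.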

wesleyyue commented on Learning to Reason with LLMs   openai.com/index/learning... · Posted by u/fofoz
wesleyyue · a year ago
Just added o1 to https://double.bot if anyone would like to try it for coding.

---

Some thoughts:

* The performance is really good. I have a private set of questions I note down whenever gpt-4o/sonnet fails. o1 solved everything so far.

* It really is quite slow

* It's interesting that the chain of thought is hidden. I think this is the first time OpenAI can improve their models without the gains being immediately distilled into open models. It'll be interesting to see how quickly the OSS field catches up technique-wise, as there have already been a lot of inference-time compute papers recently [1, 2].

* Notably, it's not clear whether o1-preview, as it's available now, is doing tree search or just single-shotting a CoT distilled from better/more detailed trajectories in the training distribution.

[1](https://arxiv.org/abs/2407.21787)

[2](https://arxiv.org/abs/2408.03314)
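The simplest inference-time compute technique in those papers is repeated sampling with a verifier: draw several candidates and keep the one that scores best. A toy sketch (the `generate` and `score` callables are placeholders for a model call and a checker; this is not a claim about what o1 actually does):

```python
import random


def best_of_n(generate, score, prompt: str, n: int = 8, seed: int = 0):
    # Draw n independent samples from `generate` and return the one the
    # verifier `score` rates highest. Spending more compute (larger n)
    # raises the chance that at least one sample is correct.
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)
```

For example, with a generator that guesses answers and a verifier that can check them (e.g. running unit tests on generated code), accuracy grows with n even though the underlying model is unchanged.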

wesleyyue commented on Sourcegraph went dark   eric-fritz.com/articles/s... · Posted by u/kaycebasques
BaculumMeumEst · a year ago
This thread reminded me to finally try Cody, I've been bouncing on and off Copilot for a few months. I wish I knew how good this was sooner, and I had no idea there was a generous free tier.
wesleyyue · a year ago
If you're open to trying new AI coding assistants, would love it if you gave https://double.bot a try! (Note: I'm one of the creators.) The main philosophical difference is that we are more expensive: we're trying to build the best copilot possible with the technology available at any given time. For example, we serve a larger, more accurate, and more modern autocomplete model, but it costs more to serve. We also do a lot of somewhat novel work getting the details right, like ensuring the autocomplete model never screws up closing brackets and always auto-closes them as if you had typed them.
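The bracket detail above can be sketched as a post-processing pass over the model's completion (an illustrative sketch, not Double's actual implementation; it ignores brackets inside string literals):

```python
def close_open_brackets(completion: str) -> str:
    # Track brackets the completion opens and append any left unclosed,
    # so the inserted text never dangles an open bracket.
    pairs = {"(": ")", "[": "]", "{": "}"}
    closers = set(pairs.values())
    stack = []
    for ch in completion:
        if ch in pairs:
            stack.append(pairs[ch])
        elif ch in closers and stack and stack[-1] == ch:
            stack.pop()
    # Close innermost-first, mirroring how a human would finish typing.
    return completion + "".join(reversed(stack))
```

A production version would also have to account for brackets already present to the right of the cursor, so the completion doesn't double-close them.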
wesleyyue commented on Large Enough   mistral.ai/news/mistral-l... · Posted by u/davidbarker
nabakin · a year ago
Are you sure the chat history is being passed when the second message is sent? That looks like the kind of response you'd expect if it only received the prompt "in python" with no chat history at all.
wesleyyue · a year ago
Yes, I built the extension. I also just sent another message asking what the first message was, to double-check I didn't have a bug, and it does know what the first message was.
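For context, a chat extension typically keeps context by resending the full message history on every turn, so the model sees "write fizzbuzz" alongside the follow-up "in python". A minimal sketch (`call_model` is a placeholder for the actual API call, not Double's code):

```python
class ChatSession:
    def __init__(self, call_model):
        self.call_model = call_model
        self.history = [
            {"role": "system", "content": "You are a coding assistant."}
        ]

    def send(self, user_message: str) -> str:
        # Append the new turn, then send the *entire* history, so earlier
        # turns like the original prompt stay visible to the model.
        self.history.append({"role": "user", "content": user_message})
        reply = self.call_model(self.history)
        self.history.append({"role": "assistant", "content": reply})
        return reply
```

If the history were dropped, the second call would contain only "in python" with no referent, which is exactly the failure mode being ruled out above.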

u/wesleyyue · Karma: 148 · Cake day: September 14, 2022