Developing aider, I've seen this problem with gpt-4o, Sonnet, DeepSeek, etc. Many aider users report this too. It's perhaps the #1 problem users have, so I created a dedicated help page [0].
Very large context may be useful for certain tasks with lots of "low value" context. But for coding, it seems to lure users into a problematic regime.
[0] https://aider.chat/docs/troubleshooting/edit-errors.html#don...
Many conflicting ideas are harder for models to follow than one large unified idea.
[0]: https://jeremyberman.substack.com/p/how-i-got-a-record-536-o...
His current work with "the Ruliad" and the hypergraph model of all possible rules is actually interesting. Whether it will yield results as a framework for finding a TOE, who knows? (It helps that the hypergraph edges have no physical length, which means Lorentz contraction and continuous space can still be modeled. It does seem to require discrete time, relative to some starting node, though. Could just be my limited understanding.)
Also, his derisive term for people who think in terms of computation is likely a backhanded reference to Wolfram.
For more up-to-date thoughts on thermodynamics I'd start here: https://writings.stephenwolfram.com/2023/02/computational-fo...
https://x.com/cognition_labs/status/1834292718174077014
I'd expect a very different experience with Devin vs the IDE-forks -- it provides status updates in Slack, runs CI, and when it's done it puts up a pull request in GitHub.
Slack integration, automatically pushing to CI, etc., are relatively low value compared to the questions that matter: “does it write better code than alternatives?”, “can I depend on it to solve hard problems?”, “will I still need a Cursor and/or ChatGPT Pro subscription to debug Devin’s mistakes?”
In my own experience using Cursor with Claude 3.5 Sonnet (new) and o1-preview, Claude is sufficient for most things, but there are times when Claude gets stumped. Invariably that means I asked it to do too much. But sometimes, maybe 10-20% of the time, o1-preview is able to do what Claude couldn’t.
I haven’t signed up for o1 Pro because going from Cursor to copy/pasting from ChatGPT is a big DevX downgrade. But from what I’ve heard o1 Pro can solve harder coding problems that would stump Claude or o1-preview.
My solution is just to split the problem into smaller chunks that make it tractable for Claude. I assume this is what Devin’s doing. Or is Devin using custom models or an early version of the o1 (full or pro) API?
After about an hour with Windsurf, I find myself frustrated with how it deals with context. If you add a directory to your Cascade, it's reluctant to actually read all the files in the directory.
I understand that they don't want to pay for a ton of long-context queries, but please, let users control the context, and pass the costs to the user.
It's very annoying to have the LLM try to create a file that already exists simply because it didn't know the file was there.
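My workaround is to build the file inventory myself and paste it into the prompt, so the model can't "miss" files it was never shown. A minimal sketch of that (the `demo_src` directory and `context.txt` file are hypothetical names for illustration):

```shell
# Set up a tiny hypothetical project so the example is self-contained.
mkdir -p demo_src
touch demo_src/app.py demo_src/utils.py

# Enumerate every file and save the sorted listing; prepend this to the
# prompt as explicit context so the model knows what already exists.
find demo_src -type f | sort > context.txt
cat context.txt
```

This is the control I wish the tool exposed directly: let me decide exactly which files go into context, even if the query gets long.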
Also, the comments about terminal management reflect a real issue. One solution is to expose the Cascade terminal to the user, who can then get it into a working state, with access to the correct dependencies and a properly sourced PATH.
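The idea can be sketched as a small wrapper the agent's terminal runs commands through, so everything executes with the user's environment sourced first. All file names here (`agent_env.sh`, `agent_shell.sh`, the `DEMO_DEP_READY` variable) are hypothetical, just to illustrate the pattern:

```shell
# Hypothetical user-configured environment: the PATH entries and
# dependency setup the agent's terminal should inherit.
cat > agent_env.sh <<'EOF'
export PATH="$HOME/.local/bin:$PATH"
export DEMO_DEP_READY=1
EOF

# Hypothetical wrapper shell: sources the environment, then runs
# whatever command the agent asked for inside it.
cat > agent_shell.sh <<'EOF'
#!/bin/sh
. ./agent_env.sh   # load the user-configured environment
exec "$@"          # then run the agent's command inside it
EOF
chmod +x agent_shell.sh

# Any command the agent runs through the wrapper now sees the setup.
./agent_shell.sh sh -c 'echo "DEMO_DEP_READY=$DEMO_DEP_READY"'
```

If the tool let users point Cascade's terminal at a wrapper like this, the "wrong PATH, missing dependencies" class of failures would mostly disappear.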
Getting the UX to work well enough is a major challenge. I’m currently redesigning it after negative feedback from early testers on my initial experimental UX. There’s a balance to be struck between giving users a low-latency response, giving the models time to work together and call tools, and not overloading the user with too much information.