I really like a lot of what Google produces, but they can't seem to keep a product that they don't shut down and they can be pretty ham-fisted, both with corporate control (Chrome and corrupt practices) and censorship
I think Claude is much more predictable and follows instructions better- the todo list it manages seems very helpful in this respect.
Play board games, go on a walk, play on the floor with my kid, play a sport.
"Doing" doesn't have to be creative or productive!
* Way way way more code in the training set.
* Code is almost always a more concise representation.
There has been work in the past training graph neural networks or transformers that get AST edge information. It seems like some sort of breakthrough (and tons of $) would be needed for those approaches to have any chance of surpassing leading LLMs.
Experimentally having agents use ast-grep seems to work pretty well. So, still representing a everything as code, but using a syntax aware search replace tool.
Why not convert the training code to AST?
In this situation, I would have a tool called "request ability to edit GLTF". I This would trigger an addition to the tool list specifically for your desired GLTF. The model would send the "tool list changed' notification and now the LLM would have access.
If you want to do it without the tool list changed notification ability, I'd have two tools, get schema for GLTF, and edit GLTF with schema. If you note that the get schema is a dependency for edit, the LLM could probably plumb that together on it's own fairly well
You could probably also support this workflow using sampling.
But alignment work has steadily improved role adherence; a tonne of RLHF work has gone into making sure roles are respected, like kernel vs. user space.
If role separation were treated seriously -- and seen as a vital and winnable benchmark (thus motivate AI labs to make it even tighter) many prompt injection vectors would collapse...
I don't know why these articles don't communicate this as a kind of central pillar.
Fwiw I wrote a while back about the “ROLP” — Role of Least Privilege — as a way to think about this, but the idea doesn't invigorate the senses I guess. So, even with better role adherence in newer models, entrenched developer patterns keep the door open. If they cared tho, the attack vectors would collapse.
I think it will get harder and harder to do prompt injection over time as techniques to seperate user from system input mature and as models are trained on this strategy.
That being said, prompt injection attacks will also mature, and I don't think that the architecture of an LLM will allow us to eliminate the category of attack. All that we can do is mitigate
You're absolutely right! This can actually extend even to things like safety guardrails. If you tell or even train an AI to not be Mecha-Hitler, you're indirectly raising the probability that it might sometimes go Mecha-Hitler. It's one of many reasons why genuine "alignment" is considered a very hard problem.
If you are looking at something, you are more likely to steer towards it. So it's a bad idea to focus on things you don't want to hit. The best approach is to pick a target line and keep the target line in focus at all times.
I had never realized that AIs tend to have this same problem, but I can see it now that it's been mentioned! I have in the past had to open new context windows to break out of these cycles.
I don't see that- cursor will die by losing market share, not by the death of the market. Agentic coding as a market will continue to grow and if Claude remains competitive then Anthropic will do just fine.
If you're a large org with an API that an ecosystem of other partners use then you should host a remote MCP and then people should connect LLMs to it.
The current model of someone bundling tools into an MCP and then you download and run that MCP locally feels a bit like the wrong path. Tool definitions for LLMs are already pretty standardized if things are just running locally why am I not just importing a package of tools, I'm not sure what the MCP server is adding.
I think it provides the similar benefits of decoupling the front and back end of a standard app.
I can pick my favorite AI "front end"- whether that's in my IDE as a dev, a desktop app as a business user, or on a server if I'm running an agentic workflow.
MCP allows you to package tools, prompts, etc. in a way that works across any of those front ends.
Even if you don't plan on leveraging the MCP across multiple tools in that way- I do think it has some benefits in de-coupling the lifecycle of the tool development from the model/ UI.
This is one example of the "horseless carriage" AI solutions. I've begun questioning further that actually we're going into a generation where a lot of the things we are doing now are not even necessary.
I'll give you one more example. The whole "Office" stack of ["Word", "Excel", "Powerpoint"] can also go away. But we still use it because change is hard.
Answer me this question. In the near future if we could have LLMs that can traverse to massive amount of data why do we need to make excel sheets anymore? Will we as a society continue to make excel spreadsheets because we want the insights the sheet provides or do we make excel sheets to make excel sheets.
The current generation of LLM products I find are horseless carriages. Why would you need agents to make spreadsheets when you should just be able to ask the agent to give you answers you are looking for from the spreadsheet.
A couple of related questions- if airplanes can fly themselves with auto-pilot, why do we need steering yolks? If I have a dishwasher- why do I still keep sponges and dish soap next to my sink?
The technology is nowhere near being reliable enough that we can eschew traditional means of interacting with data. That doesn't prevent the technology from being massively useful.