I've been thinking about this because it would be nice to have a fuzzier search.
But the cool kids? They'd do something worse:
They'd define some complicated agentic setup that cloned your code base into containers firewalled off from the world, and give prompts like:
"You're an expert software dev in MY_FAVE_LANG. Here's a bug description: 'LONG BUG DESCRIPTION'. Explore the code and write a solution. Here are some tools (read_file, write_file, ETC)."
You'd then spawn as many of these as you can per task and have them all generate pull requests. Review them with an LLM, then manually, and accept the PRs you want. Now you're in the ultra money.
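Roughly the shape I'm picturing; to be clear, `spawn_sandbox`, `run_agent`, and `open_pr` below are made-up stand-ins, not any real framework's API:

```python
# Hypothetical fan-out sketch: the helper names are stand-ins, not a real API.
from concurrent.futures import ThreadPoolExecutor

PROMPT = (
    "You're an expert software dev in {lang}. Here's a bug description: "
    "'{bug}'. Explore the code and write a solution. "
    "Tools: read_file, write_file."
)

def attempt_fix(task: dict, attempt: int) -> str:
    # Clone the repo into a container firewalled off from the world.
    sandbox = spawn_sandbox(repo=task["repo"], network="none")
    run_agent(
        prompt=PROMPT.format(lang=task["lang"], bug=task["bug"]),
        tools=["read_file", "write_file"],
        cwd=sandbox,
    )
    return open_pr(sandbox, branch=f"{task['id']}-attempt-{attempt}")

def fan_out(task: dict, n: int = 8) -> list[str]:
    # Spawn as many attempts as you can afford, collect the PRs,
    # then review with an LLM and finally by hand.
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(lambda i: attempt_fix(task, i), range(n)))
```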
You'd use RAG to guide an untuned LLM on your code base for style and how to write code. You'd write docs like "how to write an API, how to write a DB migration, ETC" and give those as a tool to the agents writing the code.
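Something like this for the style docs (the doc contents and `lookup_style_doc` are invented for illustration; a real version would retrieve with embeddings rather than keyword overlap):

```python
# Toy sketch: house-style docs exposed as a retrieval tool for the coding agent.
# Doc contents are placeholders; swap keyword overlap for embeddings in practice.
STYLE_DOCS = {
    "how to write an API": "Use the existing router layout, one module per resource, ...",
    "how to write a DB migration": "One migration per PR, always reversible, ...",
}

def lookup_style_doc(query: str) -> str:
    """Tool the agent calls before writing code: return the most relevant doc."""
    def overlap(title: str) -> int:
        return len(set(title.lower().split()) & set(query.lower().split()))
    best = max(STYLE_DOCS, key=overlap)
    return f"{best}:\n{STYLE_DOCS[best]}"

# e.g. lookup_style_doc("adding a new endpoint") -> the API style doc
```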
With time and effort, you could make the agents specific to your code base through fine-tuning, but who's got that kind of money?
I assume the NER model is small enough to run on CPU at under ~1s per pass, at the trade-off of storage per instance (1s is fast enough in dev; in prod with long convos that's a lot of inference time). Generally a neat idea though.
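For what it's worth, that assumption is easy to sanity-check; a rough timing sketch (the model choice here is mine, not necessarily what you're running):

```python
# Rough check of the "<1s per pass on CPU" assumption with a small NER model.
import time
import spacy

nlp = spacy.load("en_core_web_sm")  # small, CPU-friendly pipeline with NER

def avg_pass_ms(text: str, runs: int = 20) -> float:
    start = time.perf_counter()
    for _ in range(runs):
        _ = [(ent.text, ent.label_) for ent in nlp(text).ents]
    return (time.perf_counter() - start) / runs * 1000

print(f"avg NER pass: {avg_pass_ms('Alice deployed the billing service to eu-west-1.'):.1f} ms")
```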
Couple of questions:
- NER often doesn't perform well across domains; how accurate is the model?
- How do you actually allocate compute/storage for inference on the NER model?
- Are you batching these `filter` calls, or are they just sequential one-by-one calls? (rough sketch of what I mean below)
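By batching I mean something like this (spaCy just as an example, since I don't know what the actual model is):

```python
# Sequential vs batched NER over the messages a `filter` call would see.
import spacy

nlp = spacy.load("en_core_web_sm")
messages = ["Ping Bob about the outage", "Invoice #42 is overdue", "Deploy to staging"]

# One-by-one calls:
sequential = [[(e.text, e.label_) for e in nlp(m).ents] for m in messages]

# One batched pass over all messages:
batched = [[(e.text, e.label_) for e in doc.ents] for doc in nlp.pipe(messages, batch_size=64)]
```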