We could have LLMs ingest all these historical sessions, and use them as context for the current session. Basically treat the current session as an extension of a much, much longer previous session.
Plus, future models might be able to "understand" the limitations of current models, and use the historical session info to identify where the generated code could have deviated from user intention. That might be useful for generating code, or just for more efficient analysis by focusing on likely "hotspots", etc.
Basically, it's high time we start capturing any and all human input for future models, especially for open source model development, because I'm sure the big companies already have a bunch of this kind of data.
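The "treat the current session as an extension of prior sessions" idea could be sketched roughly like this: fold old transcripts into the new prompt, newest first, under a token budget. Everything here is an assumption for illustration (the `*.json` session files, the `transcript` field, the 4-chars-per-token heuristic), not any real tool's format.

```python
import json
from pathlib import Path

TOKEN_BUDGET = 50_000
CHARS_PER_TOKEN = 4  # crude heuristic, stands in for a real tokenizer


def build_context(session_dir: str, current_prompt: str) -> str:
    """Prepend as many historical session transcripts as fit in the budget."""
    budget = TOKEN_BUDGET * CHARS_PER_TOKEN - len(current_prompt)
    chunks = []
    # Walk sessions newest-first so the most recent history survives truncation
    # (assumes filenames sort chronologically, e.g. timestamped names).
    for path in sorted(Path(session_dir).glob("*.json"), reverse=True):
        text = json.loads(path.read_text())["transcript"]
        if len(text) > budget:
            break
        chunks.append(text)
        budget -= len(text)
    # Re-reverse so the model reads history in chronological order.
    return "\n\n".join(reversed(chunks)) + "\n\n" + current_prompt
```

Note the budget cap is doing real work here: it is exactly the guard against the "context rot" objection below, since you can't just dump every past session in wholesale.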
Context rot is very much a thing, and it may still be for future agents. Dumping tens or hundreds of thousands of trash tokens into context very much worsens the agent's performance.
Replication crisis[1].
Given the same initial conditions, and even accounting for "noise", would an LLM arrive at the same output? It should, for the same reason math problems require one to show one's working. Scientific papers require methods and pseudocode, while also requiring limitations to be stated.
Without similar guardrails, maintenance and extension of future code becomes a choose-your-own-adventure, where you have to guess at the intent and conditions of the LLM used.
[1] https://www.ipr.northwestern.edu/news/2024/an-existential-cr...
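The "guardrails" asked for above could start as something as simple as a methods-section-style provenance record committed alongside generated code. A hypothetical sketch (the field names and schema are invented for illustration; note that even temperature 0 does not guarantee bit-identical outputs on real serving stacks, so recording conditions matters more than assuming replay):

```python
import hashlib
import time


def provenance_record(model: str, temperature: float, seed: int,
                      prompt: str, output: str) -> dict:
    """Capture the conditions an LLM-generated change was produced under."""
    return {
        "model": model,
        "temperature": temperature,  # 0.0 for near-deterministic decoding
        "seed": seed,                # only meaningful if the backend honors it
        "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode()).hexdigest(),
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
```

Hashing rather than storing the raw prompt keeps the record small and avoids leaking sensitive context, at the cost of only being able to verify a replay, not reconstruct one.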
We don't put our transitional proofs in papers, only the final, best one we have. So that analogy doesn't work.
For every proof in a paper there are probably 100 non-working or ugly sketches, or snippets of proofs, that exist somewhere in a notebook or were erased from a blackboard.