It feels like finding the 2% that's off (or dealing with the 2% error) will be the time-consuming part in a lot of cases. This is nothing new with LLMs, but as these use cases encourage users to hand over more complex tasks that are more integrated with our personal data (and at times money, as hinted at by all the "do task X and buy me Y" examples), "almost right" has the potential to cause a lot of headaches. Especially when the 2% error is subtle and buried in step 3 of 46 of some complex agentic flow.
After all, since the NYT has a very limited corpus of information, and people are supposedly generating infringing content through the APIs, said hashes [1][2] can be used to check whether such content has actually been generated.
I'd rather have them store nothing, but given the overly broad court order I think this may be the best middle ground. Of course, I haven't read the lawsuit documents and don't know if NYT is requesting far more, or alleging some indirect form of infringement which would invalidate my proposal.
[1] https://ssdeep-project.github.io/ssdeep/index.html
[2] https://joshleeb.com/posts/content-defined-chunking.html
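To make this concrete, here's a rough sketch (my own, not anything from the filings) of how ssdeep-style fuzzy hashes could be checked against a small reference corpus using the Python ssdeep bindings; only the hashes of outputs would need to be retained, not the conversations themselves:

    # Toy sketch: compare a generated output against a small reference corpus
    # via ssdeep fuzzy hashes [1], without retaining the output text itself.
    # Needs the third-party bindings: pip install ssdeep
    import ssdeep

    # Hashes of the reference corpus, computed once up front. (ssdeep is most
    # reliable on inputs of at least a few KB; these strings are placeholders.)
    corpus_hashes = {
        "article-001": ssdeep.hash("Full text of reference article one ..."),
        "article-002": ssdeep.hash("Full text of reference article two ..."),
    }

    def similar_articles(generated_text, threshold=60):
        # ssdeep.compare returns a 0-100 similarity score between two hashes;
        # only out_hash would need to be stored for any later audit.
        out_hash = ssdeep.hash(generated_text)
        return [aid for aid, ref in corpus_hashes.items()
                if ssdeep.compare(out_hash, ref) >= threshold]

Presumably the role of content-defined chunking [2] in such a scheme would be to hash chunks rather than whole articles, so that partial reproductions still produce matches.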
However many modalities do end up being incorporated, that does not change the horizon of this technology, which has progressed only by increasing data volume and variety -- widening the solution class (per problem) rather than the problem class itself.
There is still no mechanism in GenAI that enforces deductive constraints (and compositionality), i.e., situations where, when one output (or input) is obtained, the search space for future outputs is necessarily constrained (and where such constraints compose). Yet all the sales pitches about the future of AI require not merely encoding reliable logical relationships of this kind, but causal and intentional ones: ones where hypothetical necessary relationships can be imposed and then suspended; ones where such hypotheticals are given an ordering based on preferences/desires; ones where the actions available to the machine, in conjunction with the state of its environment, lead to such hypothetical evaluations.
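To make "deductive constraints that compose" concrete, here's a toy sketch (my own illustration, not a description of any existing system): once one value is fixed, the admissible values for later variables shrink as a matter of logic, and the prunings from separate constraints stack:

    # Toy example of deductive constraints that compose: x < y and y < z
    # over the domain 1..5. Fixing x deductively prunes what y can be, and
    # that pruning composes with the second constraint to prune z as well.
    from itertools import product

    domains = {"x": range(1, 6), "y": range(1, 6), "z": range(1, 6)}
    constraints = [
        lambda a: a["x"] < a["y"],
        lambda a: a["y"] < a["z"],
    ]

    def admissible(fixed):
        # Enumerate assignments consistent with the fixed choices and all constraints.
        free = [v for v in domains if v not in fixed]
        out = []
        for values in product(*(domains[v] for v in free)):
            assignment = {**fixed, **dict(zip(free, values))}
            if all(c(assignment) for c in constraints):
                out.append(assignment)
        return out

    print(len(admissible({})))        # 10 assignments satisfy x < y < z
    print(len(admissible({"x": 3})))  # fixing x=3 leaves exactly one: y=4, z=5

Nothing in next-token sampling gives you that "leaves exactly one" step as a guarantee; at best it gets reproduced statistically.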
An "AI Agent" replacing an employee requires intentional behaviour: the AI must act according to business goals, act reliably using causal knowledge of the environment, reason deductively over such knowledge, and formulate provisional beliefs probabilistically. However there has been no progress on these fronts.
I am still unclear on what the sales pitch for stochastic AI is supposed to be, as far as big business or the kind of mass investment we're seeing goes. I buy a 70s-style pitch for the word processor ("edit without scissors and glue"), but not a 60s-style pitch for the elimination of any particular job.
The spend on the field at the moment seems predicated on "better generated images" and "better generated text" somehow leading to "an agent which reasons from goals to actions, simulates hypothetical consequences, acts according to causal and environmental constraints..." and so on. With relatively weak assumptions one can show that the latter class of problems is not contained in the former, and that no amount of data solving the former counts as a solution to the latter.
The vast majority of our work is already automated to the point where most non-manual workers are paid for the formulation of problems (with people), social alignment in their solutions, ownership of decision-making / risk, action under risk, and so on.
https://www.autoharnesshouse.com/69018.html
> Note for customers retaining OEM headunit: This adapter can also be used for those wishing to remove/disable the OEM Subaru Telematics functions. This is done to eliminate the tracking capability that Subaru has built into these vehicles. If this is you, we will need to add an additional part to this adapter to re-enable the bluetooth microphone. Please purchase the option 2 adapter near the bottom of this page for this situation.