I wonder if the following things make the C-driven version slower...
- prepare the send buffers (SQLite side)
- prepare the receive buffers (Go side)
- do the call
- get the received data into Go buffers of some kind
- free up the send buffers (happens automatically)
- free up the receive buffers (semi-automatically in Go).
When using stdin/stdout, the system looks after the send/receive buffers. It's simply reading/writing them. No allocation is needed. The stream can be as big or as small as wanted/needed. The OS looks after the integrity of the streams, and these are probably fairly well-tested subsystems on most operating systems.
stdin/stdout becomes a "library" for "fast data transfer".
Pretty neat.
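To make that concrete, here's a minimal Go sketch of the stdin/stdout approach: spawn the sqlite3 CLI and scan its output line by line. The kernel pipe plus one reused bufio buffer stand in for all the hand-managed send/receive buffers in the list above. The binary name, database file, and query are illustrative assumptions.

```go
// Stream query results from the sqlite3 CLI over a pipe. No per-row
// allocation dance: the kernel buffers the pipe, and bufio.Scanner
// reuses a single scratch buffer on the Go side.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os/exec"
)

func main() {
	// Illustrative: any db file and query work the same way.
	cmd := exec.Command("sqlite3", "test.db", "SELECT * FROM t;")
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		log.Fatal(err)
	}
	if err := cmd.Start(); err != nil {
		log.Fatal(err)
	}
	sc := bufio.NewScanner(stdout)
	for sc.Scan() {
		// One row per line, '|'-separated by default in the sqlite3 CLI.
		fmt.Println(sc.Text())
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}
	if err := cmd.Wait(); err != nil {
		log.Fatal(err)
	}
}
```

The same loop handles a few rows or a multi-gigabyte result: nothing is accumulated, so memory stays flat regardless of stream size.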
Excerpt from the user agreement:
When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. For example, this license includes the right to use Your Content to train AI and machine learning models, as further described in our Public Content Policy. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.
People put their heads in the sand over reddit for some reason, but it's worse than FAANG.

"Novel" here depends on what you mean. Could an LLM produce output that is unique, that both it and no one else has seen before? Possibly yes. Could that output have perceived or emotional value to people? Sure. A related challenge: is a random encryption key generated by a CSPRNG novel?
In the case of the US Copyright Office, if there wasn't sufficient human involvement in the production, then the output is not copyrightable and how "novel" it is does not matter - though that doesn't necessarily affect a prior production by a human (whether the output is a copy of it or not). Novelty also only matters in a subset of the many fractured areas of copyright law affecting this form of digital replication. The Copyright Office wrote: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell....
Where I imagine this approximately ends up is some set of tests oriented around how relevant the "copy" is to the whole. That is, it may not matter whether the method of production involved "copying"; what may matter more is whether the whole work in which it is included is at large a copy. And if the contested portion could be replaced with something novel, and it is a small enough piece of the whole, then it may fail to meet some bar of material value to the whole - there is no harmful infringement, or it similarly crosses into some notion of fair use.
I don't see much sanity in a world where small snippets become an issue. I think if models were regularly producing thousands of tokens of exactly duplicated content, that's probably an issue.
I've not seen evidence of the latter outside of research that very deliberately performs an active search for high-probability cases (such as building suffix-tree indices over training sets and then searching for outputs with guidance from the index). That's very different from arbitrary work prompts doing the same, and models carry various defensive training and wrapping attempting to further minimize verbatim reproduction. On the one hand you have research metrics like 3.6 bits per parameter of recoverable input; on the other hand that represents a very small slice of the training set (at 3.6 bits per parameter, an 8-billion-parameter model tops out around 3.6 GB of recoverable text, against training corpora measured in terabytes), and many such reproductions require long, carefully crafted prompts - meaning that for arbitrary real-world interaction, the chance of large-scale overlap is small.
Well, AI can perhaps solve the problem it created here: generating IP with AI is much cheaper than with humans, so it will remain viable even at lower payoffs.
Less cynically: you can use trade secrets to protect your IP. You can host your software and only let customers interact with it remotely, like what Google (mostly) does.
Of course, this is a very software-centric view. You can't 'protect' e.g. books or music in this way.
https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
We don't yet know how courts will rule on cases like Does v. GitHub (https://githubcopilotlitigation.com/case-updates.html). LLM-based systems are not even capable of practicing clean-room design (https://en.wikipedia.org/wiki/Clean_room_design). For a maintainer to accept code generated by an LLM is to put the entire community at risk, as well as to endorse a power structure that mocks consent.
This is similar to Alsup's ruling in the Anthropic books case that the training is “exceedingly transformative”. I would expect a reinterpretation or disagreement on this front from another case to be both problematic and likely to be eventually overturned.
I don’t actually think provenance is a problem on the axis you suggest, if Alsup’s ruling holds. That said, I don’t think that’s the only copyright issue afoot - the Copyright Office’s writing on the copyrightability of machine outputs essentially requires that the output fail the Feist tests for human copyrightability.
More interesting to me is how this might further realign the notion of copyrightability of human works as time goes on, moving from every trivial derivative bit of trash potentially being copyrightable to some stronger notion of, following the Feist test, independence and creativity. It also raises a fairly immediate question in an open source setting: do many individual small patch contributions even pass those tests themselves? They may well not, although the general guidance is to set the bar low - but does a typo fix pass it either? There is so far to go down this rabbit hole.
Imagine the following case: an attacker pretending to represent a company sends you a repository as a test task before an interview. You run something like "npm install" or the Rust compiler, and your computer is now controlled by the attacker.
Or imagine how one coworker's machine gets hacked, malicious code is written into a repository, and the whole of G, F, or A is now owned by foreign hackers. All thanks to npm and the Rust compiler.
Maybe those tools should explicitly confirm before executing every external command (caching an allow-list of commands so they don't ask again). And maybe Linux should provide an easy-to-use, safe sandbox for developers. Currently I have to build sandboxes from scratch myself.
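On the from-scratch sandbox point, here is a hedged, Linux-only sketch in Go: clone the untrusted step into fresh user and network namespaces so it runs unprivileged with no usable network interfaces. The `cargo build --offline` command is just an example of a step that shouldn't need the network; a real sandbox would also confine the filesystem.

```go
// Run an untrusted build step with no network access (Linux only).
// CLONE_NEWUSER avoids needing root; CLONE_NEWNET gives the child a
// network namespace containing only a downed loopback interface.
package main

import (
	"log"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	cmd := exec.Command("cargo", "build", "--offline") // illustrative step
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUSER | syscall.CLONE_NEWNET,
		// Map the current user to root inside the namespace.
		UidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: os.Getuid(), Size: 1},
		},
		GidMappings: []syscall.SysProcIDMap{
			{ContainerID: 0, HostID: os.Getgid(), Size: 1},
		},
	}
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```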
Also, in many cases you don't need the ability to run external code at all - for example, to install a JS package, all you really need to do is download files.
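As a sketch of that download-only idea (npm itself also ships a real `--ignore-scripts` flag): fetch a package's metadata from the public registry, follow its `dist.tarball` URL, and unpack the files without executing anything. The package name is illustrative and error handling is abbreviated.

```go
// Install a JS package by downloading and unpacking files only -
// no install scripts, no code execution from the package.
package main

import (
	"archive/tar"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	pkg := "left-pad" // illustrative
	meta, err := http.Get("https://registry.npmjs.org/" + pkg + "/latest")
	if err != nil {
		log.Fatal(err)
	}
	defer meta.Body.Close()
	var doc struct {
		Dist struct {
			Tarball string `json:"tarball"`
		} `json:"dist"`
	}
	if err := json.NewDecoder(meta.Body).Decode(&doc); err != nil {
		log.Fatal(err)
	}
	resp, err := http.Get(doc.Dist.Tarball)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	gz, err := gzip.NewReader(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		if hdr.Typeflag != tar.TypeReg {
			continue
		}
		// Registry tarballs root files under "package/"; reject traversal.
		name := filepath.Clean(strings.TrimPrefix(hdr.Name, "package/"))
		if strings.HasPrefix(name, "..") {
			continue
		}
		dst := filepath.Join("node_modules", pkg, name)
		if err := os.MkdirAll(filepath.Dir(dst), 0o755); err != nil {
			log.Fatal(err)
		}
		f, err := os.Create(dst)
		if err != nil {
			log.Fatal(err)
		}
		if _, err := io.Copy(f, tr); err != nil {
			log.Fatal(err)
		}
		f.Close()
	}
	fmt.Println("unpacked", pkg, "without running any scripts")
}
```

Dependency resolution and integrity checking are the hard parts a real tool adds; the point is that none of it requires executing package code.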
Also, this is an indication of why it is a bad idea to use environment variables for secrets and configuration: the environment is inherited by every child process, including whatever a malicious install script spawns. Whoever wrote the "12-factor app" doesn't seem to know that there are command-line switches and configuration files for this.
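A minimal sketch of that alternative in Go: take the secret from a file named by a command-line switch, so it never sits in the environment that every child process inherits. The flag name and default path are made-up examples.

```go
// Read a secret from a file given by a flag instead of the environment.
// Unlike os.Getenv("API_KEY"), this isn't silently inherited by every
// subprocess a build tool or install script spawns.
package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	secretFile := flag.String("secret-file", "/etc/myapp/api_key", "path to API key file") // illustrative
	flag.Parse()
	data, err := os.ReadFile(*secretFile)
	if err != nil {
		log.Fatal(err)
	}
	apiKey := strings.TrimSpace(string(data))
	fmt.Println("loaded key of length", len(apiKey))
}
```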
Fractional CTO @ A Consumer Healthtech Marketplace
20 - 40 hrs | $175 - $200 / hr | Remote (USA only)
https://www.fractionaljobs.io/jobs/chief-technology-officer-...

Senior AI Engineer @ A European Insurtech Startup
20 - 40 hrs / week | €85 - €100 / hr | Remote (CET +/- 6hrs)
https://www.fractionaljobs.io/jobs/senior-ai-engineer-at-a-e...

Senior Full-stack Engineer @ A Consumer Social Startup
20 - 40 hrs / week | $125 - $150 / hr | Remote (EST +/- 5 hrs)
https://www.fractionaljobs.io/jobs/senior-full-stack-enginee...

Staff Frontend Engineer @ An HR-tech Analytics Platform
20 - 40 hrs / week | $120 - $180 / hr | Remote (USA / Canada only)
https://www.fractionaljobs.io/jobs/staff-frontend-engineer-a...

AI Engineer @ A Creator-focused AI Startup
10 - 15 hrs / week | $100 - $125 / hr | Remote (USA / Canada / Europe only)
https://www.fractionaljobs.io/jobs/ai-engineer-at-a-creator-...