Readit News
raggi commented on Show HN: JavaScript-free (X)HTML Includes   github.com/Evidlo/xsl-web... · Posted by u/Evidlo
raggi · a day ago
I used to do this in the 2000s era, and there was a lot to love about it. At the time, though, the IE engines were far more complete and less buggy than the others when it came to various XML features.
raggi commented on Benchmarks for Golang SQLite Drivers   github.com/cvilsmeier/go-... · Posted by u/cvilsmeier
kreelman · 2 days ago
That's an interesting thought.

I wonder if the following things make the C-driven version slower:

- prepare the send buffers (SQLite side)

- prepare the receive buffers (Go side)

- do the call

- get the received data into Go buffers of some kind

- free up the send buffers (happens automatically)

- free up the receive buffers (semi-automatically in Go).

When using stdin/stdout, the system looks after the send/receive buffers; the program simply reads and writes them. No allocation is needed. The stream can be as big or as small as wanted/needed. The OS looks after the integrity of the streams, and these are probably fairly well-tested subsystems on most operating systems.

stdin/stdout becomes a "library" for "fast data transfer".

Pretty neat.

raggi · 2 days ago
fwiw, the tailscale fork of Crawshaw’s library has a good number of allocation removals and other optimizations, but cgo is still expensive.
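For a rough sense of that fixed cost, here's a toy benchmark sketch (hypothetical, not taken from either driver) that isolates the cgo crossing from all the buffer work listed above. cgo can't live in a _test.go file, so the C shim goes in the package proper:

    // noop.go - toy sketch, not from either driver
    package bench

    /*
    static int c_noop(int x) { return x + 1; }
    */
    import "C"

    // CNoop crosses the cgo boundary once per call.
    func CNoop(x int) int { return int(C.c_noop(C.int(x))) }

    //go:noinline
    func GoNoop(x int) int { return x + 1 }

    // noop_test.go
    package bench

    import "testing"

    func BenchmarkCgoNoop(b *testing.B) {
        for i := 0; i < b.N; i++ {
            _ = CNoop(i)
        }
    }

    func BenchmarkGoNoop(b *testing.B) {
        for i := 0; i < b.N; i++ {
            _ = GoNoop(i)
        }
    }

Run it with "go test -bench=."; whatever gap shows up between the two is pure boundary cost, before a single row or buffer is copied.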
raggi commented on AI tooling must be disclosed for contributions   github.com/ghostty-org/gh... · Posted by u/freetonik
j4coh · 2 days ago
By novel, I mean: if I ask a model to write some lyrics or code and it produces pre-existing code or lyrics, is that novel and legally safe to use because the pre-existing code or lyrics aren't precisely encoded in a large enough model, and therefore legally not a reproduction, just coincidentally identical?
raggi · 2 days ago
No. I don't think "novelty" would be relevant in such a case. How much risk you have depends on many factors, including what you mean by "use". If you mean sell, and you're successful, you're at risk. That would be true even if it's not the same as other content but just similar. Copyright provides little to no protection from legal costs if someone is motivated to bring a case against you.
raggi commented on AI tooling must be disclosed for contributions   github.com/ghostty-org/gh... · Posted by u/freetonik
aspenmayer · 2 days ago
Tell that to Reddit. They're AI-translating user posts and serving them up as separate Google search results. I don't remember if Reddit claims copyright on user-submitted content, or on its AI translations, but I don't think Reddit is paying ad share like X is, either, so it kind of doesn't matter to the user, as they're (still) not getting paid, even as Reddit collects money for every ad shown/clicked. Even if OP did write it, an AI translated the version shown.

https://news.ycombinator.com/context?id=44972296

raggi · 2 days ago
reddit is a user-hostile company, has been forever, always will be. they take rights over your content, farm things about you, sell data, do invasive things in the mobile apps, use creepware cookies, etc.

Excerpt from the user agreement:

    When Your Content is created with or submitted to the Services, you grant us a worldwide, royalty-free, perpetual, irrevocable, non-exclusive, transferable, and sublicensable license to use, copy, modify, adapt, prepare derivative works of, distribute, store, perform, and display Your Content and any name, username, voice, or likeness provided in connection with Your Content in all media formats and channels now known or later developed anywhere in the world. This license includes the right for us to make Your Content available for syndication, broadcast, distribution, or publication by other companies, organizations, or individuals who partner with Reddit. For example, this license includes the right to use Your Content to train AI and machine learning models, as further described in our Public Content Policy. You also agree that we may remove metadata associated with Your Content, and you irrevocably waive any claims and assertions of moral rights or attribution with respect to Your Content.

People put their heads in the sand over reddit for some reason, but it's worse than FAANG.

raggi commented on AI tooling must be disclosed for contributions   github.com/ghostty-org/gh... · Posted by u/freetonik
j4coh · 2 days ago
So if you can get an LLM to produce music lyrics, for example, or sections from a book, those would be considered novel works given the encoding as well?
raggi · 2 days ago
"an LLM" could imply an LLM of any size, for sufficiently small or focused training sets an LLM may not be transformative. There is some scale at which the volume and diversity of training data and intricacy of abstraction moves away from something you could reasonably consider solely memorization - there's a separate issue of reproduction though.

"novel" here depends on what you mean. Could an LLM produce output that is unique that both it and no one else has seen before, possibly yes. Could that output have perceived or emotional value to people, sure. Related challenge: Is a random encryption key generated by a csprng novel?

In the case of the US Copyright Office, if there wasn't sufficient human involvement in the production, then the output is not copyrightable and how "novel" it is does not matter - but that doesn't necessarily impact a prior production by a human that is (whether a copy or not). Novelty also only matters in a subset of the many fractured areas of copyright law affecting this form of digital replication. The Copyright Office wrote: https://www.copyright.gov/ai/Copyright-and-Artificial-Intell....

Where I imagine this approximately ends up is some set of tests oriented around how relevant the "copy" is to the whole. That is, it may not matter whether the method of production involved "copying"; it may matter more whether the whole work it's included in is at large a copy, or, for the area contested as a copy, whether it could be replaced with something novel. If it's a small enough piece of the whole, it may not meet some bar of material value to the whole to be relevant - that there is no harmful infringement - or it could similarly cross into some notion of fair use.

I don't see much sanity in a world where small snippets become an issue. I think if models were regularly producing thousands of tokens of exactly duplicated content, that's probably an issue.

I've not seen evidence of the latter outside of research that very deliberately performs an active search for high-probability cases (such as building suffix tree indices over training sets and then searching for outputs guided by the index). That's very different from arbitrary work prompts doing the same, and the models have various defensive trainings and wrappings attempting to further minimize reproductive behavior. On the one hand you have research metrics like 3.6 bits per parameter of recoverable input; on the other hand, that represents a very small slice of the training set, and many such reproductions require strongly crafted and long prompts - meaning that for arbitrary real-world interaction the chance of large-scale overlap is small.
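As a toy illustration of what that kind of research automates (hypothetical code, not from any particular study), the sketch below indexes every n-token window of a small corpus and flags verbatim spans in a model's output. Actual studies do this with suffix arrays or trees over entire training sets and much longer windows, but the shape of the check is the same:

    // Toy sketch: index n-token windows of a corpus, then flag spans of
    // model output that reproduce the corpus verbatim.
    package main

    import (
        "fmt"
        "strings"
    )

    const n = 8 // window length in tokens; real checks use far longer windows

    func ngrams(tokens []string) map[string]bool {
        idx := make(map[string]bool)
        for i := 0; i+n <= len(tokens); i++ {
            idx[strings.Join(tokens[i:i+n], " ")] = true
        }
        return idx
    }

    func main() {
        corpus := strings.Fields("the quick brown fox jumps over the lazy dog and runs away fast")
        output := strings.Fields("it said the quick brown fox jumps over the lazy dog loudly")

        idx := ngrams(corpus)
        for i := 0; i+n <= len(output); i++ {
            if span := strings.Join(output[i:i+n], " "); idx[span] {
                fmt.Println("verbatim span:", span)
            }
        }
    }

The point being: surfacing these spans takes a deliberate search against the corpus, which is not something ordinary prompting does.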

raggi commented on AI tooling must be disclosed for contributions   github.com/ghostty-org/gh... · Posted by u/freetonik
eru · 2 days ago
> Sure it’s a big hill to climb in rethinking IP laws to align with a societal desire that generating IP continue to be a viable economic work product, but that is what’s necessary.

Well, AI can perhaps solve the problem it created here: generating IP with AI is much cheaper than with humans, so it will be viable even at lower payoffs.

Less cynical: you can use trade secrets to protect your IP. You can host your software and only let customers interact with it remotely, like what Google (mostly) does.

Of course, this is a very software-centric view. You can't 'protect' e.g. books or music in this way.

raggi · 2 days ago
In the US you cannot generate copyrightable IP without substantial human contribution to the process.

https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...

raggi commented on AI tooling must be disclosed for contributions   github.com/ghostty-org/gh... · Posted by u/freetonik
jedbrown · 2 days ago
Provenance matters. An LLM cannot certify a Developer Certificate of Origin (https://en.wikipedia.org/wiki/Developer_Certificate_of_Origi...) and a developer of integrity cannot certify the DCO for code emitted by an LLM, certainly not an LLM trained on code of unknown provenance. It is well-known that LLMs sometimes produce verbatim or near-verbatim copies of their training data, most of which cannot be used without attribution (and may have more onerous license requirements). It is also well-known that they don't "understand" semantics: they never make changes for the right reason.

We don't yet know how courts will rule on cases like Does v Github (https://githubcopilotlitigation.com/case-updates.html). LLM-based systems are not even capable of practicing clean-room design (https://en.wikipedia.org/wiki/Clean_room_design). For a maintainer to accept code generated by an LLM is to put the entire community at risk, as well as to endorse a power structure that mocks consent.

raggi · 2 days ago
For a large LLM I think the science in the end will demonstrate that verbatim reproduction is not coming from verbatim recording, as the structure really isn’t set up that way in the models in question here.

This is similar to the ruling by Alsup in the Anthropic books case that the training is “exceedingly transformative”. I would expect a reinterpretation or disagreement on this front from another case to be both problematic and likely eventually overturned.

I don’t actually think provenance is a problem on the axis you suggest if Alsup’s ruling holds. That said, I don’t think that’s the only copyright issue afoot - the Copyright Office’s writing on the copyrightability of outputs from the machine essentially requires that the output fail the Feist tests for human copyrightability.

More interesting to me is how this might further realign the notion of copyrightability of human works as time goes on, moving from every trivial derivative bit of trash potentially being copyrightable to some stronger notion of - to follow the Feist test - independence and creativity. Further, it raises a fairly immediate question in an open source setting: do many individual small patch contributions themselves even pass those tests? They may well not, although the general guidance is to set the bar low - but is a typo fix copyrightable either? There is so far to go down this rabbit hole.

raggi commented on How we exploited CodeRabbit: From simple PR to RCE and write access on 1M repos   research.kudelskisecurity... · Posted by u/spiridow
jeremyjh · 4 days ago
They are talking about executing code at compile time (macros and such). With modern IDEs/editors, just opening the folder may trigger such behavior (when the LSP boots and compiles), though some environments warn you.
raggi · 3 days ago
I know, but the _implication_ is that it's extremely unsafe. I don't buy the implication - code gets executed.
raggi commented on How we exploited CodeRabbit: From simple PR to RCE and write access on 1M repos   research.kudelskisecurity... · Posted by u/spiridow
codedokode · 5 days ago
One of the problems is that code analyzers, bundlers, and compilers (like the Rust compiler) allow running arbitrary code without any warning.

Imagine the following case: an attacker pretending to represent a company sends you a repository as a test task before an interview. You run something like "npm install" or run the Rust compiler, and your computer is now controlled by the attacker.

Or imagine that one coworker's machine gets hacked, malicious code is written into a repository, and the whole of G, F, or A is now owned by foreign hackers. All thanks to npm and the Rust compiler.

Maybe those tools should explicitly confirm execution of every external command (caching an allow-list so they don't ask again). And maybe Linux should provide an easy-to-use, safe sandbox for developers; currently I have to build sandboxes from scratch myself (a rough sketch of the idea is at the end of this comment).

Also, in many cases you don't need the ability to run external code; for example, to install a JS package all you need to do is download files.

Also, this is an indication of why it is a bad idea to use environment variables for secrets and configuration. Whoever wrote the "12-factor app" doesn't know that there are command-line switches and configuration files for this.
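To make the sandbox point concrete, here's a rough Linux-only sketch (hypothetical; it assumes unprivileged user namespaces are enabled, and the command and repo path are placeholders): run the untrusted build step in fresh user, mount, and network namespaces with a stripped environment, so build scripts can't phone home and secrets in env vars never reach them. It's not a complete sandbox - the host filesystem is still readable - but it's the general shape of the idea:

    // sandbox.go - rough sketch only; Linux-specific
    package main

    import (
        "log"
        "os"
        "os/exec"
        "syscall"
    )

    func main() {
        // --offline assumes dependencies were fetched beforehand on a trusted path.
        cmd := exec.Command("cargo", "build", "--offline")
        cmd.Dir = "./untrusted-repo" // placeholder path to the repo under test
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        // Strip the environment so secrets in env vars never reach build scripts.
        cmd.Env = []string{"PATH=/usr/bin:/bin", "HOME=/tmp"}
        cmd.SysProcAttr = &syscall.SysProcAttr{
            // Fresh user, mount, and network namespaces: no network access, and
            // mounts made by the child are invisible to the host. The host
            // filesystem is still readable; a real sandbox would also pivot_root
            // into a minimal root.
            Cloneflags: syscall.CLONE_NEWUSER | syscall.CLONE_NEWNS | syscall.CLONE_NEWNET,
            UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getuid(), Size: 1}},
            GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: os.Getgid(), Size: 1}},
        }
        if err := cmd.Run(); err != nil {
            log.Fatal(err)
        }
    }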

raggi · 4 days ago
I love this implication that there's some valuable body of code out there that gets reviewed, compiled and never executed.
raggi commented on Show HN: Fractional jobs – part-time roles for engineers   fractionaljobs.io... · Posted by u/tbird24
tbird24 · 5 days ago
I realize I should probably comment links to some of the better engineering roles we're currently featuring. BTW, I should also note we don't take a % commission like Upwork, Toptal, etc., so if you get hired you'd work with the company directly and get paid by them directly.

Fractional CTO @ A Consumer Healthtech Marketplace | 20 - 40 hrs | $175 - $200 / hr | Remote (USA only) https://www.fractionaljobs.io/jobs/chief-technology-officer-...

Senior AI Engineer @ A European Insurtech Startup | 20 - 40 hrs / week | €85 - €100 / hr | Remote (CET +/- 6hrs) https://www.fractionaljobs.io/jobs/senior-ai-engineer-at-a-e...

Senior Full-stack Engineer @ A Consumer Social Startup | 20 - 40 hrs / week | $125 - $150 / hr | Remote (EST +/- 5 hrs) https://www.fractionaljobs.io/jobs/senior-full-stack-enginee...

Staff Frontend Engineer @ An HR-tech Analytics Platform | 20 - 40 hrs / week | $120 - $180 / hr | Remote (USA / Canada only) https://www.fractionaljobs.io/jobs/staff-frontend-engineer-a...

AI Engineer @ A Creator-focused AI Startup | 10 - 15 hrs / week | $100 - $125 / hr | Remote (USA / Canada / Europe only) https://www.fractionaljobs.io/jobs/ai-engineer-at-a-creator-...

raggi · 5 days ago
These prices seem very low.

u/raggi

Karma: 2377 · Cake day: December 24, 2009