Yeah, we're in weird territory, because you can use an LLM as a Bitcoin mixer for intellectual property. That's the entire point behind https://ghuntley.com/z80.
You can take something that exists, distill it back to specs, and then you've got your own IP. Throw away the tainted IP and just run Ralph in a loop. You can clone things (not 100%, but it's better than hiring humans).
Basically, to avoid the ambiguity of LLMs trained on unlicensed code, I use one to generate a description of the code for another LLM trained only on permissively licensed code. (I haven't found any usable public-domain models.)
I've used this in the real world, and the codegen model works maybe 10-20% of the time (the description isn't detailed enough, which is good for a "clean room" process, but a base model can't follow it). Any model can review the code, retry, and write its own implementation based on the codegen result, though.
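As a very rough sketch of that pipeline (assuming an OpenAI-compatible API; the model names and prompts here are placeholders, not real endpoints):

    # Clean-room pipeline sketch: model A only describes, model B only implements.
    from openai import OpenAI

    client = OpenAI()

    def describe(tainted_code: str) -> str:
        """Model A (possibly trained on unlicensed code) writes a spec, never code."""
        resp = client.chat.completions.create(
            model="commercial-model",  # placeholder name
            messages=[
                {"role": "system", "content": "Describe what this code does as a "
                 "detailed functional spec. Do not quote or paraphrase the code."},
                {"role": "user", "content": tainted_code},
            ],
        )
        return resp.choices[0].message.content

    def regenerate(spec: str) -> str:
        """Model B (trained only on permissively licensed code) implements the spec."""
        resp = client.chat.completions.create(
            model="permissive-model",  # placeholder name
            messages=[{"role": "user", "content": "Implement this spec:\n" + spec}],
        )
        return resp.choices[0].message.content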
However, it is quite fun to remove the boring parts of programming with AI, so any hobby code I write this way I won't be making public.
Currently I'm working on a way to use models trained only on permissively licensed code (e.g. Comma) with a normal commercial model supervising them. I believe this makes the output code tainted only by permissively licensed code, so I can slowly start using AI to write open source code again.
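Roughly, the supervision loop I have in mind looks like this (hypothetical sketch building on the functions above; the commercial model only emits prose feedback, never code, so its training data shouldn't leak into the output):

    def supervised_codegen(spec: str, max_rounds: int = 5) -> str:
        code = regenerate(spec)  # the permissive model writes all code
        for _ in range(max_rounds):
            review = client.chat.completions.create(
                model="commercial-model",  # placeholder; reviewer only
                messages=[
                    {"role": "system", "content": "Review this code against the "
                     "spec. Reply APPROVED, or list defects in prose. Never write code."},
                    {"role": "user", "content": "Spec:\n" + spec + "\n\nCode:\n" + code},
                ],
            ).choices[0].message.content
            if review.strip().startswith("APPROVED"):
                break
            code = regenerate(spec + "\n\nFix these review findings:\n" + review)
        return code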
I use PetrosStav/gemma3-tools and it seems to work only about half the time; the rest of the time, the model calls the tool but the call doesn't get properly parsed by Ollama.
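When the parse fails, the raw tool-call JSON seems to land in the message content as plain text, so a fallback parser recovers some of those cases (sketch assuming current ollama-python typed responses; the get_weather tool is made up for illustration):

    import json, re
    import ollama

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Get the weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = ollama.chat(model="PetrosStav/gemma3-tools",
                       messages=[{"role": "user", "content": "Weather in Tokyo?"}],
                       tools=tools)

    if resp.message.tool_calls:  # happy path: Ollama parsed the call
        calls = [(c.function.name, c.function.arguments)
                 for c in resp.message.tool_calls]
    else:  # fallback: fish the JSON out of the raw text
        calls = []
        m = re.search(r"\{.*\}", resp.message.content or "", re.DOTALL)
        if m:
            try:
                raw = json.loads(m.group(0))
                calls = [(raw.get("name"), raw.get("arguments", {}))]
            except json.JSONDecodeError:
                pass  # genuinely malformed; retry or give up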
For code it hasn't been challenged yet, but I find it doubtful they'd decide differently there.
So far, the judge believes that training models on open source code is not a license violation, since the code is public for anyone to read; but whether "distribution or redistribution" (of the model's outputs, I assume) violates the terms of the license, among other laws, is still up to the court to decide.
The case has been moved to the Ninth Circuit without a decision in the district court, since there are other similar cases (such as the Authors Guild's) and they wanted the courts to offer consistent rules. I believe one of the big delays in the case is over damages: I think the plaintiffs asked for details of Microsoft's valuation of GitHub when it was acquired, since GitHub's biggest asset is its Git repositories, which could put a monetary value on how much each project is worth. Microsoft is trying to stall and not reveal this.
...as long as the images are in the Red Hat family (Fedora, CentOS Stream, RHEL).
I remember this being an anti-MITM measure for u2f
Most large websites are hosted behind a CDN or a load balancer, which terminates the TLS session and is effectively a MITM between the customer and the actual backend server. The problem is similar to TLS client certificates: you can't forward them to the backend, and the load balancer isn't smart enough to validate the data, so it's impossible to use.
In the last ~5 years, AWS ALB and its competitors gained client certificate support that passes the certificate information to your application in HTTP headers; instead of a standardized way of reading the client certificate, the server has to read non-standard headers.
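For example, behind an ALB with mTLS in verify mode, the app ends up reading headers like these (header names as I understand them from the AWS docs; the PEM arrives URL-encoded, and you have to trust the LB to strip these headers from outside traffic):

    from urllib.parse import unquote
    from flask import Flask, request

    app = Flask(__name__)

    @app.get("/whoami")
    def whoami():
        # ALB forwards the already-validated client certificate in
        # non-standard headers instead of at the TLS layer.
        subject = request.headers.get("X-Amzn-Mtls-Clientcert-Subject", "")
        pem = unquote(request.headers.get("X-Amzn-Mtls-Clientcert", ""))
        if not pem:
            return "no client certificate presented", 401
        return {"subject": subject, "pem_bytes": len(pem)}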
If passkeys are likewise passed in the HTTP payload, I don't believe the LBs will start reading the payload anytime soon. It might become a selling point for IdP-as-a-service providers like Auth0 that you can't replicate with IaaS.
- The account manager and the enterprise support TAM can view a list of all resources on the account, including metadata like resource names, instance types, and Cost Explorer tags. Enterprise support routinely presents a monthly cost review to us, so it's clear that they can always access this information without our explicit consent. They don't have the ability to view detailed internal information about a resource, though, such as internal logs.
- When opening a support case, the ticketing system asks for a resource ARN, which may contain the name. It seems the support team can view some data about that object, including monitoring data and internal logs, but accessing potential "customer data" (such as SSHing into an RDS instance) requires explicit, one-off consent.
- I've never opened any cases about IAM policy, so I don't know whether they can see IAM role policy documents.
- It seems the account ID and account name are also often used by both AWS's sales side and the reseller's side. I think I read somewhere that it's possible to retrieve an AWS account ID if you know an S3 bucket name or something similar, and when exchanging data with an external partner via AWS (e.g. S3, VPC peering) you're required to share your account ID with the partner (see the sketch below).
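For example, initiating VPC peering takes the partner's 12-digit account ID directly (boto3 sketch; all identifiers below are placeholders):

    import boto3

    ec2 = boto3.client("ec2")
    ec2.create_vpc_peering_connection(
        VpcId="vpc-11111111",        # our VPC (placeholder)
        PeerVpcId="vpc-22222222",    # partner's VPC (placeholder)
        PeerOwnerId="123456789012",  # partner's account ID, exchanged out of band
        PeerRegion="us-east-1",
    )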
Turns out the internal data store is pretty much required for a distributed system. A common question in a microservice architecture is whether to validate permissions only at the API gateway layer or at every point of use. If you want to validate everywhere, what happens when you're running an async job and the user's access gets revoked mid-run? In Zanzibar you attach a zookie (consistency token) to the user's context, and Zanzibar will always return the same answer for it. (This isn't meant for cron jobs where the user sets something once and it repeats daily, but for quick, one-off background jobs like generating a report to the user's email.) If you remove the internal store, the application's API must provide point-in-time queries, which I've never seen a single application do, let alone a whole microservice environment.
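The pattern looks roughly like this (hypothetical client API for illustration; the paper calls the token a zookie, SpiceDB calls it a ZedToken):

    from dataclasses import dataclass

    @dataclass
    class ReportJob:
        user: str
        resource: str
        zookie: str  # consistency token captured when the user clicked "export"

    def enqueue_report(zanzibar, user, resource):
        allowed, zookie = zanzibar.check(user=user, relation="viewer",
                                         object=resource)  # hypothetical API
        if not allowed:
            raise PermissionError(user)
        return ReportJob(user=user, resource=resource, zookie=zookie)

    def run_report(zanzibar, job):
        # Every check in the job carries the same zookie, so Zanzibar
        # evaluates it against the same snapshot that authorized the job.
        allowed, _ = zanzibar.check(user=job.user, relation="viewer",
                                    object=job.resource, zookie=job.zookie)
        if allowed:
            ...  # generate and email the report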
Another problem is cache invalidation: when permissions get added or removed, users want that reflected quickly. I can't remember how the paper handles this, but in any case, since the permissions are stored in Zanzibar, every change goes through Zanzibar. If you remove the internal data store, you lose the change notifications.
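A rough sketch of what consuming a Zanzibar-style change feed (the paper does describe a Watch API) to evict cached check results might look like, again with a hypothetical client:

    def invalidate_from_watch(zanzibar, cache):
        # Stream relation-tuple writes/deletes and drop any cached check
        # result that could depend on the touched object.
        for change in zanzibar.watch(namespaces=["document"]):  # hypothetical API
            cache.evict(prefix="check:" + change.object)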
The pseudo-Zanzibar lives in production today, but I feel like it's one of the mistakes of my career.