Yeah, we're in weird territory, because you can use an LLM as a Bitcoin mixer for intellectual property. That's the entire point behind https://ghuntley.com/z80.
You can take something that exists, distill it back to specs, and then you've got your own IP. Throw away the tainted IP and just run Ralph in a loop. You can clone things (not 100%, but it's better than hiring humans).
Basically, to avoid the ambiguity of LLMs trained on unlicensed code, I use one to generate a description of the code for another LLM trained only on permissively licensed code. (I haven't found any usable public-domain models.)
I've used this in the real world, and the codegen model works maybe 10-20% of the time (the description isn't detailed enough, which is good for a "clean room" process, but a base model can't follow it). Any model can review the code, retry, and write its own implementation based on the codegen result, though.
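As a very rough sketch of that pipeline (assuming an OpenAI-compatible API; the model names and prompts here are placeholders, not real endpoints):

    # Clean-room pipeline sketch: model A only describes, model B only implements.
    from openai import OpenAI

    client = OpenAI()

    def describe(tainted_code: str) -> str:
        """Model A (possibly trained on unlicensed code) writes a spec, never code."""
        resp = client.chat.completions.create(
            model="commercial-model",  # placeholder name
            messages=[
                {"role": "system", "content": "Describe what this code does as a "
                 "detailed functional spec. Do not quote or paraphrase the code."},
                {"role": "user", "content": tainted_code},
            ],
        )
        return resp.choices[0].message.content

    def regenerate(spec: str) -> str:
        """Model B (trained only on permissively licensed code) implements the spec."""
        resp = client.chat.completions.create(
            model="permissive-model",  # placeholder name
            messages=[{"role": "user", "content": "Implement this spec:\n" + spec}],
        )
        return resp.choices[0].message.content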
However, it is quite fun to remove the boring parts of programming with AI, so any hobby code I write this way I won't be making public.
Currently I'm working on a way to use models trained only on permissively licensed code (e.g. Comma) with a normal commercial model supervising them. I believe this makes the output code tainted only by permissively licensed code, so I can slowly start using AI to write open source code again.
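Roughly, the supervision loop I have in mind looks like this (hypothetical sketch building on the functions above; the commercial model only emits prose feedback, never code, so its training data shouldn't leak into the output):

    def supervised_codegen(spec: str, max_rounds: int = 5) -> str:
        code = regenerate(spec)  # the permissive model writes all code
        for _ in range(max_rounds):
            review = client.chat.completions.create(
                model="commercial-model",  # placeholder; reviewer only
                messages=[
                    {"role": "system", "content": "Review this code against the "
                     "spec. Reply APPROVED, or list defects in prose. Never write code."},
                    {"role": "user", "content": "Spec:\n" + spec + "\n\nCode:\n" + code},
                ],
            ).choices[0].message.content
            if review.strip().startswith("APPROVED"):
                break
            code = regenerate(spec + "\n\nFix these review findings:\n" + review)
        return code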
I use PetrosStav/gemma3-tools and it seems to work only about half the time; the rest of the time, the model calls the tool but the call doesn't get properly parsed by Ollama.
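When the parse fails, the raw tool-call JSON seems to land in the message content as plain text, so a fallback parser recovers some of those cases (sketch assuming current ollama-python typed responses; the get_weather tool is made up for illustration):

    import json, re
    import ollama

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Get the weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = ollama.chat(model="PetrosStav/gemma3-tools",
                       messages=[{"role": "user", "content": "Weather in Tokyo?"}],
                       tools=tools)

    if resp.message.tool_calls:  # happy path: Ollama parsed the call
        calls = [(c.function.name, c.function.arguments)
                 for c in resp.message.tool_calls]
    else:  # fallback: fish the JSON out of the raw text
        calls = []
        m = re.search(r"\{.*\}", resp.message.content or "", re.DOTALL)
        if m:
            try:
                raw = json.loads(m.group(0))
                calls = [(raw.get("name"), raw.get("arguments", {}))]
            except json.JSONDecodeError:
                pass  # genuinely malformed; retry or give up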
For code it hasn't been challenged yet, but I find it doubtful they'd decide differently there.
So far, the judge believes that training models on open source code is not a license violation, since the code is public for anyone to read; but whether "distribution or redistribution" (of the model's outputs, I assume) violates the terms of the license, among other laws, is still up to the court to decide.
The case has been moved to the Ninth Circuit without a decision in the district court, since there are other similar cases (such as the Authors Guild's) and they wanted the courts to offer consistent rules. I believe one of the big delays in the case is over damages: I think the plaintiffs asked for details of Microsoft's valuation of GitHub when it was acquired, since GitHub's biggest asset is its Git repositories, which could put a monetary value on how much each project is worth. Microsoft is trying to stall and not reveal this.
...as long as the images are in the Red Hat family (Fedora, CentOS Stream, RHEL).
I remember this being an anti-MITM measure for u2f
Most large websites are hosted behind a CDN or a load balancer, which terminates the TLS session and is effectively a MITM between the customer and the actual backend server. The problem is similar to TLS client certificates: you can't forward them to the backend, and the load balancer isn't smart enough to validate the data, so it's impossible to use.
In the last ~5 years, AWS ALB and its competitors gained client certificate support that passes the certificate information to your application in HTTP headers; instead of a standardized way of reading the client certificate, the server has to read non-standard headers.
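For example, behind an ALB with mTLS in verify mode, the app ends up reading headers like these (header names as I understand them from the AWS docs; the PEM arrives URL-encoded, and you have to trust the LB to strip these headers from outside traffic):

    from urllib.parse import unquote
    from flask import Flask, request

    app = Flask(__name__)

    @app.get("/whoami")
    def whoami():
        # ALB forwards the already-validated client certificate in
        # non-standard headers instead of at the TLS layer.
        subject = request.headers.get("X-Amzn-Mtls-Clientcert-Subject", "")
        pem = unquote(request.headers.get("X-Amzn-Mtls-Clientcert", ""))
        if not pem:
            return "no client certificate presented", 401
        return {"subject": subject, "pem_bytes": len(pem)}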
If passkeys are likewise passed in the HTTP payload, I don't believe the LBs will start reading the payload anytime soon. It might become a selling point for IdP-as-a-service providers like Auth0 that you can't replicate with IaaS.
- The account manager and the enterprise support TAM can view a list of all resources on the account, including metadata like resource names, instance types, and Cost Explorer tags. Enterprise support routinely presents a monthly cost review to us, so it's clear that they can always access this information without our explicit consent. They don't have the ability to view detailed internal information about a resource, though, such as internal logs.
- When opening a support case, the ticketing system asks for a resource ARN, which may contain the name. It seems the support team can view some data about that object, including monitoring data and internal logs, but accessing potential "customer data" (such as SSHing into an RDS instance) requires explicit, one-off consent.
- I've never opened any cases about IAM policy, so I don't know whether they can see IAM role policy documents.
- It seems the account ID and account name are also often used by both AWS's sales side and the reseller's side. I think I read somewhere that it's possible to retrieve an AWS account ID if you know an S3 bucket name or something similar, and when exchanging data with an external partner via AWS (e.g. S3, VPC peering) you're required to share your account ID with the partner (see the sketch below).
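For example, initiating VPC peering takes the partner's 12-digit account ID directly (boto3 sketch; all identifiers below are placeholders):

    import boto3

    ec2 = boto3.client("ec2")
    ec2.create_vpc_peering_connection(
        VpcId="vpc-11111111",        # our VPC (placeholder)
        PeerVpcId="vpc-22222222",    # partner's VPC (placeholder)
        PeerOwnerId="123456789012",  # partner's account ID, exchanged out of band
        PeerRegion="us-east-1",
    )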
Turns out the internal data store is pretty much required for a distributed system. A common question in a microservice architecture is whether to validate permissions only at the API gateway layer or at every point of use. If you want to validate everywhere, what happens when you're running an async job and the user's access gets revoked mid-run? In Zanzibar you attach a zookie (consistency token) to the user's context, and Zanzibar will always return the same answer for it. (This isn't meant for cron jobs where the user sets something once and it repeats daily, but for quick, one-off background jobs like generating a report to the user's email.) If you remove the internal store, the application's API must provide point-in-time queries, which I've never seen a single application do, let alone a whole microservice environment.
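The pattern looks roughly like this (hypothetical client API for illustration; the paper calls the token a zookie, SpiceDB calls it a ZedToken):

    from dataclasses import dataclass

    @dataclass
    class ReportJob:
        user: str
        resource: str
        zookie: str  # consistency token captured when the user clicked "export"

    def enqueue_report(zanzibar, user, resource):
        allowed, zookie = zanzibar.check(user=user, relation="viewer",
                                         object=resource)  # hypothetical API
        if not allowed:
            raise PermissionError(user)
        return ReportJob(user=user, resource=resource, zookie=zookie)

    def run_report(zanzibar, job):
        # Every check in the job carries the same zookie, so Zanzibar
        # evaluates it against the same snapshot that authorized the job.
        allowed, _ = zanzibar.check(user=job.user, relation="viewer",
                                    object=job.resource, zookie=job.zookie)
        if allowed:
            ...  # generate and email the report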
Another problem is cache invalidation: when permissions get added or removed, users want that reflected quickly. I can't remember how the paper handles this, but in any case, since the permissions are stored in Zanzibar, every change goes through Zanzibar. If you remove the internal data store, you lose the change notifications.
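A rough sketch of what consuming a Zanzibar-style change feed (the paper does describe a Watch API) to evict cached check results might look like, again with a hypothetical client:

    def invalidate_from_watch(zanzibar, cache):
        # Stream relation-tuple writes/deletes and drop any cached check
        # result that could depend on the touched object.
        for change in zanzibar.watch(namespaces=["document"]):  # hypothetical API
            cache.evict(prefix="check:" + change.object)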
The pseudo-Zanzibar lives in production today, but I feel like it's one of the mistakes of my career.