lt (u/lt) - Readit News

lt commented on Judge pares down artists' AI copyright lawsuit against Midjourney, Stability AI reuters.com/legal/litigat... · Posted by u/starshadowx2

kranke155 · 2 years ago

If the model didn't learn anything important from Picasso, it wouldn't be in the training data.

This whole argument of "ah but it doesnt really need it" doesn't hold up. If the model didn't need it, it wouldn't have used it in the first place.

Same thing in Artstation. It was of course propitious for AI scientists to find such a lovely database of high quality imagery, and all so helpfully tagged into categories.

All they had to do was take it.

lt · 2 years ago

Of course it learned, that's the point of training.

You claimed the model can reproduce an image from that training data. That's false, and what the judge dismissed.

  “none of the Stable Diffusion output images provided in response to a 
  particular Text Prompt is likely to be a close match for any specific image in  
  the training data.”

  “I am not convinced that copyright claims based a derivative theory can 
  survive absent ‘substantial similarity’ type allegations,” the ruling stated.

Whether using copyrighted data to train a model is fair use or not is a different discussion.

lt commented on Judge pares down artists' AI copyright lawsuit against Midjourney, Stability AI reuters.com/legal/litigat... · Posted by u/starshadowx2

kranke155 · 2 years ago

I find the whole comparison "it´s just like a person learning" to be a tiring trope. It's demonstrably not.

Like I said to another poster - you've probably seen a Picasso. Can you make me a copy?

Because a Diffusion model can. But you can't. Why not?

Your denial that there is a demonstrable difference between human and machine attention is part of the core obfuscation these companies are using to win this battle, so I reject it entirely. That difference creates the whole issue. If you don't recognise it, then answer me - Why can't you paint me a Picasso? You're saying the Machine is just like a human, yet a simple question of reproduction tells you it's not like a human in any way. It's a machine, and it produces machine reproductions. It learns faster and more accurately than any human, and its purpose is to produce derivative works. If the machine didn't need human data to do this, this discussion would be academic. But it does.

So the whole future of the Arts will be decided by investigating what the machine actually does, not the simplistic idea of it´s just like a human.

You have to evaluate the machine's abilities and impact onto the world. And that's the tough part. But just saying "hihih it's just a person" while it produces superhuman output is not a solution, it's just a lie that was invented by the people profiting from these models.

>Is your argument that this AI would be legally prohibited from viewing any images it doesn't have a specific license for?

Yes. You pay for access.

lt · 2 years ago

A diffusion model can't make a copy. That's the whole point. The original Picasso isn't in the model weights.

It has learned to make pixels a particular color to mimic that style, but that's it.

lt commented on Electronic Structure of LK-99 arxiv.org/abs/2308.00676... · Posted by u/spekcular

dralley · 2 years ago

My understanding is that the "99" in LK-99 is the year it they first synthesized the material, i.e. 1999.

Assuming this is all true, why is it just now coming to light? Did they just not know what they had? (I have not been following this closely, maybe this has already been explained)

lt · 2 years ago

As I've read it, the first lead to the material was 99.

2018 They got funding to research it further,

2020 was a first attempt of publication at Nature that was retracted, further improvements were made until 22/23 were two patents were filled, then suddenly 10 days ago Kwon, one of the co-researchers jumped the gun publishing a paper with the details, on one hand fearing a leak of someone else publishing first as that was too simple to replicate, on the other hand excluding everyone else from the paper and only listing him and Lee/Kim (LK) as authors as a Nobel prize can only be shared by three people. 2.5hrs later LK published again listing other 5 authors but him.

lt commented on LLM Powered Autonomous Agents lilianweng.github.io/post... · Posted by u/DanielKehoe

novaRom · 2 years ago

How small can be a LLM transformer in order to be able to understand basic human language and search for answers on the internet? It should not contain all the facts and knowledge, but must be quick (so, it's a small model), understand at least one language, and know how and where to look for answers.

Would it be sufficient to have 1B, 3B or 7B parameters to achieve this? Or is it doable with 100M or even fewer parameters? I mean vocabulary size might be quite small, max context size could also be limited to 256 or 512 tokens. Is there any paper on that maybe?

lt · 2 years ago

The contents of whatever it found online is fed back in the context, plus the response it generates based on that also counts to the limit.

So 512 is really inadequate unless you just want to make search queries using natural language.

lt commented on Gorilla: Large Language Model Connected with APIs shishirpatil.github.io/go... · Posted by u/throwaway888abc

arbuge · 2 years ago

In the colab example it appears you are using the openai python library but with the gorilla model instead of openai's models. That works? How do you set that up?

  # Query Gorilla server 
  
  def get_gorilla_response(prompt="I would like to translate from English to French.", model="gorilla-7b-hf-v0"):
  try:
    completion = openai.ChatCompletion.create(
      model=model,
      messages=[{"role": "user", "content": prompt}]
    )
    return completion.choices[0].message.content
  except Exception as e:
    raise_issue(e, model, prompt)

lt · 2 years ago

they point openai.api_base to their server that implements the same API

lt commented on Reddit goes down fully as thousands of subreddits protest API changes 9to5mac.com/2023/06/12/re... · Posted by u/mikece

lt · 2 years ago

I see three perspectives on the whole thing:

First, Reddit's monetization is broken by design. It never made any sense to me why they would charge for reddit gold for an ad-free experience on their website and own mobile app but not on the API. Why would they let third party apps serve their own ads and let them charge to remove them? This would be simple to fix, both technically and in the API's ToS, just serve the same ads regardless of the client. People would be upset, but ultimately I feel it would be entirely fair. But no, it doesn't seem to be a solution considered.

Second, the LLM dataset issue is also attributed to the price hike. Again, I think it's fair if unpopular to charge premium for bulk data. Again, there are technical and ToS solutions for this. They could introduce exponential tiers for bulk data, restrictions on allowed usage, other things that make user-facing usage reasonable but bulk processing expensive, but then again, starting measuring api usage per client id and not per user goes against this point, just making the API extremely expensive for everyone anywhere to the point of being unusable.

Third, all points seem to lead to the fact that what they really want is to kill third party apps and hope a large part of those users move to their app, for what? More tracking, tighter grip, better engagement metrics? Not sure. Even the changes to the extremely hostile mobile site now forcing some users to download the app. Really, I'd figure they'd understand their userbase better than that and how a small fraction on content producers and a even smaller fraction of power users and moderators carry the site, and pissing them off is a really bad idea. But what do I know.

lt commented on Fark redesign is now live (2007) fark.com/comments/2762299... · Posted by u/MrThoughtful

onion2k · 2 years ago

The problem with this approach (not just for Reddit, but for all internet content sites) is that $9.99 is a significant investment when there are other sites providing content for free. The choice the user has to make isn't "Is Reddit worth $9.99?" but more like "Is Reddit better value at $9.99 than some other website at $0.00?", and Reddit is probably never going to win in that process for the majority of people.

It's the same reason why so few people sign up to pay for YouTube or Twitter. Video content is available elsewhere. Whatever the hell Twitter content is can be found just by talking to people.

On the other side so many do pay for Spotify and Netflix because that content is locked down pretty well. You can pirate some things, but it's a pain. Paying is easier.

Reddit seems to think it has a product that has value. Like so many web content hosts before it, it's probably wrong. It probably can't work as a business unless it's showing people adverts. And the problem with that model is that people hate adverts.

Honestly, until users realise that they can't have a content host unless they pay, no sites like Reddit will survive long term.

lt · 2 years ago

Not to mention it’s actually the users generating the content.

lt commented on Fark redesign is now live (2007) fark.com/comments/2762299... · Posted by u/MrThoughtful

alexruf · 2 years ago

I guess you brought it right to the point: Reddit has never grown to a company it’s „dorm room shitshow“

lt · 2 years ago

Recently I read Reddit has over 2000 employees. I can’t begin to imagine what they possibly do.

lt commented on Drag Your GAN: Interactive Point-Based Manipulation of Images vcai.mpi-inf.mpg.de/proje... · Posted by u/waqasy

ortusdux · 2 years ago

Demo video: https://twitter.com/_akhaliq/status/1659424744490377217

lt · 2 years ago

Longer video with more examples from one of the paper authors:

https://twitter.com/XingangP/status/1659483374174584832

lt commented on Jsonformer: Generate structured output from LLMs github.com/1rgs/jsonforme... · Posted by u/yunyu

andrewcamel · 2 years ago

Seen a lot of things trying to do this by pressure testing the outputs, but all feel like anti-patterns. This is the first that seems like the "right" way to do it. Better to manage how the model is generating vs creating one more potentially faulty "glue" layer.

lt · 2 years ago

Can you elaborate about what you mean by pressure testing? Haven't heard this term yet.