It looks like a lot of this stuff is already in the public domain and so they are just letting people use the scans/photos of these works that they took and are hosting. It's a shame that we don't already have this being done by public libraries.
> It's a shame that we don't already have this being done by public libraries.
What makes you say that? It's already done by so many institutions within the GLAM sector. They usually don't have the same marketing budget as Getty though. Here's[1] a glimpse into the digital heritage of the EU. That's a good starting point for some exploration.
> Unfortunately, the item media as provided to Europeana can not be displayed at the moment. Please try to download the media or view the item on the providing institution's website.
Download link is a dead dropbox account. And this is the first thing I tried.
Adding on to this: The National Diet Library of Japan has a lot of very well digitised resources. They can be a bit annoying to get to but they are there.
If you view items in the image bank, they show you a preview. You'll then have to click the link that takes you to the digital collections page for that item (which shows the full uncropped image). From there you will want to scroll down to the download panel and make sure you select "high resolution". Those images are generally at least around 2k x 2k or better depending on when they were captured.
You should also be able to get the unconverted image from their api (as many of them are in varying, less common formats like JP2/JPEG2000) but I haven't been able to figure out how. If you sent someone at the library an email you could probably figure it out though.
Some do. A lot depends on the library’s resources, but the Library of Congress has a lot of free imagery available, and I think a number of other large public libraries with art collections also do the same.
I'm surprised to find myself defending Getty Images at all, but here I am. Before I dealt with huge collections as a developer, I didn't consider how much intellectual work went into managing collections as entities. Things like ontology, or even keeping terminology consistent across that much metadata, are not trivial, and older materials cause the most problems. Even if these were exclusively public domain images, it would still be a praiseworthy effort. I'm sure some Getty archivist or the like fought hard for this.
This is not true in the US. The most relevant case is Bridgeman v. Corel [1], ruling that photographic reproductions of public domain paintings could not be copyrighted.
Museums like to pretend that they hold copyrights on these photos. They do not.
When I was publishing a typography magazine in the 90s, most of my covers were non-typographic images. I used an image of a statue of Venus from the Getty for one cover. They not only provided the transparency free of charge, but offered to take a picture from a different angle if I needed a different image of the statue (also for no charge). The Getty does a lot to share their collections. (In contrast, I paid a couple hundred dollars to LACMA for the use of a transparency of a 17th century painting in their collection.)
If you're ever in LA, I recommend checking out the Getty. I'm by no means an art buff in any capacity, and honestly I thought going might be boring because I did the Getty Villa in Malibu when I was in elementary school and wasn't appreciative, but it completely changed my mind. Even if you don't like the art or the exhibits, the views of Los Angeles are amazing, probably better than Griffith in my personal opinion.
It's an incredible facility indoors and out, with some really great works (especially given the age of the collection - it's relatively new compared to some larger institutes)
When I was there they also had an awesome special exhibit (cave temples of dunhuang if my memory and Google skills serve), which makes me think their average special exhibits are decent.
This is great, and I will repeat something here I have mentioned in the past about AI image generator datasets.
Instead of building image generators off of images scraped from people's art without their consent, we could use openly licensed images, and intentionally push the public towards licensing more of their images for open datasets by encouraging twitter and instagram/meta to add image license options to image uploads, and running some public service campaigns on these platforms to encourage use of open licenses to help build better datasets.
At the same time, the smaller image sets that would be available would encourage additional research into sample efficiency, which I regularly hear would be a generally useful area for further research.
This approach would ensure several things: Public datasets would be available to all, not just the major institutions (assuming enough people cared about open licensed models to push the major players to use them), sample efficiency would be improved, more people would get used to licensing their images for public use, and artists who did not want their style copied by these technologies would have their rights respected.
That last point would have a HUGE effect on public opinion about AI, building trust between the public and the AI research community.
Ultimately I am in favor of a world where intellectual property restrictions are sharply curtailed, but to do that ethically we would need to put in place other systems to provide for those whose livelihood depends upon their intellectual works. And if someone created a work before AI existed, and they reserved their legal rights for their work at the time of publication, I think that should be respected such that the work would be left out of machine learning datasets.
This approach would take more work, but setting this expectation would encourage the tech industry to clarify image licenses on their platforms, ultimately promoting open culture and open image licenses.
There simply aren't enough of them. Stable Diffusion is trained on 5 billion images. That kind of scale doesn't exist in public domain artwork. This dataset of 88k images is 0.0017% of that.
Also, it's worth noting that intellectual property is a weird construct. Human artists are trained from looking at copyrighted works their entire lifetime. If you asked me to draw a cartoonish bear I cannot guarantee you that it doesn't vaguely look like Winnie the Pooh or Baloo. I've seen those things and can't erase them from my head. And if you prevented me from ever seeing copyrighted works for my whole life, I might not be able to draw anything.
What's the benefit to artists of licensing works in a way that's favorable for AI models? Most artists hate the concept of AI art regardless of whether or not it's a threat to their livelihoods.
I could see it slowly approaching the open source model - you do it because you care, and maybe some people will donate to you because they want to see your work continue. You can also say you've done it, so if you're a prolific artist it would probably look good.
Also, I feel like the main reason artists have a beef with AI right now is mostly because lots of their published works were used without permission to train models. I think if instead SD/Midjourney et al had used open datasets curated in the way TaylorAlexander described, there would be a lot less pushback, because everyone would know the models were trained with consent of the underlying artists responsible for the training data's existence.
There is still the concern of automation eliminating demand for work done by humans, but I have a hunch that in the long term, artists will embrace these tools in the same way that's been done with Photoshop and every other digital tool. It still might be very different, i.e. AI is much more powerful/enabling than Photoshop, but I'm not sure that'll change the outcome.
So an image uploaded to twitter by a random person (not necessarily the copyright owner) is automatically added to this public set? Not sure it would be any different legally.
The understanding (or level of giving a crap) the average social media user has of copyright is pathetic. How often do you see "DM for credit" or "no copyright intended"?
I was thinking about this the other day; it would be really interesting to have big corporations whose profits depend on public-domain data. We might actually see lobbying to decrease copyright terms, to counter companies like Disney trying to extend copyright until the end of time.
Nice! I love using high quality images of art for some of my personal projects. Might build something similar with these, like I did for the Rijksmuseum in Amsterdam[1].
Edit: Upon trying to get some others, the error "Read error at byte 7127040" shows up. It seems they are probably either rate limiting, dealing with overloaded servers, or having some more serious issues.
It seems like these are all already in the public domain. They were already free to use however we liked and neither the Getty nor anyone else could use copyright to prevent that. It is nice that the Getty is providing some scanning and hosting to make them accessible, though.
The article specifically mentions Irises by Van Gogh, but the link (https://www.getty.edu/art/collection/object/103JNH) takes you to a page where it seems like you have to ask nicely and agree to terms to use this public domain image.
The artwork is in the public domain. The photo of it is not. You can take your own photo, but if you want to use a photo someone else took, then you need their permission.
This was a nice idea that was tested and failed in court: https://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel.... Making a reproduction with a camera does not create a new copyright because copyright protects creative expression, not skill with a camera.
Getty-Images* will barefacedly "license" you public domain images at significant cost (i.e. thousands per image.)
Worse, they've attempted to extract licensing fees from people who use such public domain images, in one unfortunate case: Carol M. Highsmith, a photographer who had donated her works to the public domain received a letter of demand from Getty Images for using her own public domain images.
[1] https://www.europeana.eu/en
https://www.europeana.eu/en/item/08547/Museu_ProvidedCHO_Sta...
The image is only 600x768 pixels. Way too small to be useful. You can't even read the text in the image. The original work is 30x42 cm.
Except we do.
https://digitalcollections.nypl.org/
https://www.flickr.com/photos/britishlibrary
NDL Digital Collections: https://dl.ndl.go.jp/
NDL Image Bank: https://ndlsearch.ndl.go.jp/en/imagebank
don't dismiss the value in simply making them accessible. they're providing a platform to access these works, which is great.
Museums like to pretend that they hold copyrights on these photos. They do not.
(In the US).
[1] https://en.wikipedia.org/wiki/Bridgeman_Art_Library_v._Corel....
So why are we holding AI to a different standard?
Right, which is why work on sample-efficiency would be so valuable.
> So why are we holding AI to a different standard?
Because it is an automated computer system, not a human being. It can be held to a different standard because it is an entirely different system.
The AI art models are still going to be trained regardless. One artist opting out individually doesn't stop any of that.
Allowing people to remix your stuff can lead to awesome outcomes, just not personally financially beneficial
1. https://randomrijks.com
For a different project I’m looking into using k-means to determine the dominant colors.
1. https://botsin.space/@ExhibitExplorer
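For anyone curious what the k-means idea for dominant colors mentioned above might look like, here is a minimal sketch in plain Python. It is not the commenter's code. The function names (`dominant_colors`, `squared_distance`), the parameters, and the choice to seed the clusters from randomly picked distinct colors are all assumptions made for illustration. Loading pixels from an actual image file is left out; the function just takes a list of (R, G, B) tuples.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two RGB tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dominant_colors(pixels, k=3, iterations=20, seed=0):
    """Cluster RGB pixels with k-means; return k colors, most common first.

    Hypothetical helper: `pixels` is a list of (r, g, b) tuples.
    """
    unique = sorted(set(pixels))
    if len(unique) < k:
        raise ValueError("need at least k distinct colors")

    # Assumption: start from k distinct colors picked at random.
    rng = random.Random(seed)
    centers = [tuple(float(c) for c in p) for p in rng.sample(unique, k)]

    def nearest(pixel):
        return min(range(k), key=lambda i: squared_distance(pixel, centers[i]))

    for _ in range(iterations):
        # Assignment step: group each pixel with its closest center.
        clusters = [[] for _ in range(k)]
        for p in pixels:
            clusters[nearest(p)].append(p)

        # Update step: move each center to the mean of its members.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                n = len(members)
                new_centers.append(tuple(sum(ch) / n for ch in zip(*members)))
            else:
                new_centers.append(center)  # keep the center of an empty cluster

        if new_centers == centers:
            break  # converged
        centers = new_centers

    # Count members under the final centers and rank by cluster size.
    counts = [0] * k
    for p in pixels:
        counts[nearest(p)] += 1
    ranked = sorted(zip(counts, centers), key=lambda cc: cc[0], reverse=True)
    return [tuple(round(c) for c in center) for _, center in ranked]


if __name__ == "__main__":
    sample = [(250, 0, 0)] * 3 + [(254, 0, 0)] * 3 + [(0, 0, 250)] * 2 + [(0, 0, 254)] * 2
    print(dominant_colors(sample, k=2))
```

On a real scan you would probably downsample the image first, since this loops over every pixel in pure Python, and plain k-means in RGB space can settle on a local optimum depending on the starting colors.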
For example I can't download the 11k version of this [1].
Is anyone else experiencing this?
[1] https://www.getty.edu/art/collection/object/103RHN
https://petapixel.com/2016/11/22/1-billion-getty-images-laws...
* While Getty Images and Getty Museum are born of the same "Getty" family, they aren't connected entities, and one does not reflect upon the other.