> User Management
> Account Roles: Super Admin, Admin
So unless I switch to the >=EUR 6.00/month Plus plan I cannot add a non-admin user? So my grandma gets admin privileges? Is that not a blocker for a lot of families?
Edit: From https://www.photoprism.app/plus/kb/multi-user
> PhotoPrism® Plus includes advanced multi-user functionality and additional account roles. These roles are intended for situations where you want other people to have access to your library, such as giving family members access to your pictures without granting write permissions or exposing private content.
> It is recommended to set up additional instances if you have multiple users in a family, so that everyone can manage their own files independently. This way you can avoid problems with conflicting library settings, file permissions, and dealing with duplicates.
So it seems you're actually supposed to run an instance per person. But then my grandma would still be an admin. I don't think I'd like that.
> So unless I switch to the >=EUR 6.00/month Plus plan I cannot add a non-admin user? So my grandma gets admin privileges? Is that not a blocker for a lot of families?
Maybe, but at least it's not per-user, per-month. I'm much more likely to set up something like this if I can add / remove family members as needed without thinking about the cost beyond the annual fee I pay to run my own instance.
The blocker for me is typically ease of use. They recommend PhotoSync for transferring files to the instance. I currently use that for my family, but sync to a Nextcloud install (which I desperately want to replace). The part where it fails for me is that my family members don't understand how to verify everything is running as intended. For example, I checked one of their phones earlier this year and they had 5k un-synced photos.
What this really needs to be useful is a native app that does sync from the phone in a way that makes it relatively foolproof.
Beyond the ease of use, every link in the chain of the workflow is a point of risk IMO. For example, what if PhotoSync rug pulls everyone (I have no reason to believe they would) and starts charging a subscription? The app that runs on your phone and does the syncing is the more valuable half of the workflow IMO.
After thinking about it some more, I now think you're right: at least it's a flat fee that doesn't go up as you add more users. And it's not that high a cost. I'll give it a try.
And I'll pay close attention to the reliability of the sync. I've been using Syncthing for a while now and still don't trust it completely. Quite regularly I have to restart the mobile app for the files to get synced, and I just can't figure out what's wrong. I don't want that to happen with photos. Maybe I'll monitor the date of the latest photo per user and alert if it's too old.
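A minimal sketch of that kind of freshness check, assuming photos land in one directory per user on the server; the directory layout, age threshold, and alerting mechanism here are all placeholders:

```python
import time
from pathlib import Path

PHOTO_ROOT = Path("/srv/photos")   # hypothetical layout: /srv/photos/<user>/...
MAX_AGE_DAYS = 3                   # alert if a user's newest photo is older than this

def newest_mtime(user_dir: Path) -> float:
    """Return the most recent modification time of any file under user_dir."""
    times = [p.stat().st_mtime for p in user_dir.rglob("*") if p.is_file()]
    return max(times, default=0.0)

def stale_users(root: Path, max_age_days: int):
    """Yield user directory names whose newest photo is older than the cutoff."""
    cutoff = time.time() - max_age_days * 86400
    for user_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if newest_mtime(user_dir) < cutoff:
            yield user_dir.name

if __name__ == "__main__":
    for user in stale_users(PHOTO_ROOT, MAX_AGE_DAYS):
        # Replace with email/ntfy/whatever alerting you prefer.
        print(f"WARNING: no new photos from {user} in over {MAX_AGE_DAYS} days")
```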
Freemium pricing is always a tricky balance. Your free tier should provide enough value to entice people into installing your software, but ideally it will also nudge the users who are getting real value from your efforts to pay you an amount that feels reasonable and comfortable to them.
If you're using it for more than personal use, like family use, then a heavier set of administrative options becomes necessary, and pricing therefore becomes part of the equation.
Can it remove duplicates? That's the holy grail, along with storing my images. I've got so many damn photos, and I want to reduce the total number I have, but going through them is so daunting that I'll never do it without a computer-assisted organizer.
Unfortunately, if you've used Google Takeout or other systems that can both downsample your photos and videos and actually delete _or change_ metadata, deduplicating becomes a big wad of heuristics.
My first approach was to build a UID based on a series of metadata fields (like captured-at time, the shutter count of the camera body, and the geohash), but that breaks when metadata is missing or has been edited so that it no longer matches the sibling variants.
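As a rough illustration of that kind of metadata UID (not the commenter's actual code; the EXIF/MakerNotes field names and the coarse location bucket standing in for a real geohash are assumptions):

```python
import hashlib

def metadata_uid(exif: dict) -> str | None:
    """Build a stable ID from capture metadata; returns None if key fields are missing.

    Field names are illustrative -- real EXIF/MakerNotes keys vary by camera vendor.
    """
    captured_at = exif.get("DateTimeOriginal")   # e.g. "2023:07:04 13:22:05"
    shutter_count = exif.get("ShutterCount")     # vendor-specific MakerNotes field
    lat, lon = exif.get("GPSLatitude"), exif.get("GPSLongitude")
    if captured_at is None or shutter_count is None:
        return None  # exactly the failure mode described above
    # Coarse location bucket standing in for a real geohash.
    loc = f"{round(lat, 3)},{round(lon, 3)}" if lat is not None and lon is not None else "?"
    key = f"{captured_at}|{shutter_count}|{loc}"
    return hashlib.sha1(key.encode()).hexdigest()
```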
Just finding which assets to compare against by time proved problematic in some edge cases. Storing captured-at in UTC (which is how I've built systems for the last thirty years!) made any file that didn't encode a time zone wrong by whatever the correct offset was--almost all videos, and many RAW file formats, don't carry time zones. The solution I came up with was to encode captured-at time _in local time_ rather than UTC. I also infer or extract the milliseconds of precision of the date stamp, so files with more or less precision will still overlap during the initial captured-at query.
If you're interested, more details are here: https://photostructure.com/faq/what-do-you-mean-by-deduplica...
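A sketch of that captured-at query idea, under these assumptions: the timestamp is parsed as naive local time, sub-second precision is inferred from however many digits are present, and the window size is made up for illustration:

```python
import re
from datetime import datetime, timedelta

def parse_captured_at(raw: str) -> tuple[datetime, int]:
    """Parse an EXIF-style 'YYYY:MM:DD HH:MM:SS[.fff]' stamp as *local* time (no tz).

    Returns the timestamp plus the number of sub-second digits it carried, so
    assets recorded with less precision still overlap in the query window.
    """
    m = re.match(r"(\d{4}):(\d{2}):(\d{2}) (\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?", raw)
    if not m:
        raise ValueError(f"unparseable capture time: {raw!r}")
    y, mo, d, h, mi, s, frac = m.groups()
    millis = int((frac or "0").ljust(3, "0")[:3])
    precision_digits = len(frac or "")
    stamp = datetime(int(y), int(mo), int(d), int(h), int(mi), int(s), millis * 1000)
    return stamp, precision_digits

def capture_window(raw: str, slop_seconds: float = 1.0) -> tuple[datetime, datetime]:
    """Candidate-duplicate search window around the (local) captured-at time."""
    t, _ = parse_captured_at(raw)
    return t - timedelta(seconds=slop_seconds), t + timedelta(seconds=slop_seconds)
```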
My main point here is to expose how deep this rabbit hole goes—I'd suggest you think about these different scenarios, and how aggressive or conservative you want to be in duplicate aggregation, and what you want to do with inexact duplicates.
Most importantly: have full offline backups before you use any deduping or asset management system. When (not if) the tool you're using behaves differently from how you expected, you won't have lost any data.
Do this anyway. Always.
And if you just want to go by the pixel data, look into "perceptual hashing". https://github.com/rivo/duplo works quite well for me, even when dealing with watermarks or slight colour correction / sharpening. You could even go further and improve your success rate with Neural Hash or something similar.
If software is deduplicating images by SHA including metadata, it's missing a lot about photography workflows :) So, huge props (and I mean that - so many tools get that wrong!) on your approach of trying to identify asset groups and trying to ID members via a number of heuristics.
If you want to go deeper, I suggest grouping everything by image content - that means, at the very least, comparing images via a resolution-independent hash (e.g. average everything down to an 8x8 greyscale picture for the simplest approach). undouble does that nicely and offers a number of different approaches: https://erdogant.github.io/undouble/pages/html/index.html
Heck, we could even brute force this by resizing images via different methods and comparing checksums. But _surely_ AI can help us here.
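For reference, the simplest version of that resolution-independent hash (an 8x8 greyscale average hash, here written with Pillow; the duplicate threshold is a tunable guess, not anything undouble or duplo prescribe):

```python
from PIL import Image  # pip install Pillow

def average_hash(path: str, size: int = 8) -> int:
    """Downscale to size x size greyscale and set a bit for each pixel above the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for i, p in enumerate(pixels):
        if p > mean:
            bits |= 1 << i
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def looks_like_duplicate(path_a: str, path_b: str, threshold: int = 5) -> bool:
    """Two images are likely near-duplicates if only a few of the 64 bits differ."""
    return hamming(average_hash(path_a), average_hash(path_b)) <= threshold
```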
I used libraw to read the actual raw data from my images, ignoring any metadata that can get changed/updated by Capture One, for example. The raw data is just fed into a hashing function to get an exact content hash. That doesn't work if your image has been down-sampled, of course, but that was actually my goal - I want to know if the raw content has changed or bit-flipped, but I don't care if the metadata is altered or missing.
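A minimal Python equivalent of that approach, using rawpy (Python bindings to LibRaw) rather than calling libraw directly; it hashes only the raw sensor data and assumes the file is an unmodified RAW, not a downsampled export:

```python
import hashlib
import rawpy  # pip install rawpy (LibRaw bindings)

def raw_content_hash(path: str) -> str:
    """SHA-256 over the raw sensor data only; edits to EXIF/XMP don't change it."""
    with rawpy.imread(path) as raw:
        return hashlib.sha256(raw.raw_image.tobytes()).hexdigest()

# Example: detect bit rot or silent changes to the image content itself.
# print(raw_content_hash("IMG_0001.CR2"))
```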
Deduplicating by SHA, or by exactly identical metadata, sounds so rudimentary in 2023... Ten years ago, VisiPics was already doing an excellent job of finding similar or identical images in a collection based on visual similarity!
http://www.visipics.info/
So much time later, I would hope we expect more sophisticated and flexible tools for that task, not less.
> Unfortunately, if you've used Google Takeout or other systems that can both downsample your photos and videos and actually delete _or change_ metadata, deduplicating becomes a big wad of heuristics.
I thought Takeout exported the original files? (Alongside the metadata entered into the application.)
Why not calculate hashes just for the image data and then do metadata conflict resolution separately?
I used to use DupeGuru which has some photo-specific dupe detection where you can fuzzy match image dupes based on content: https://dupeguru.voltaicideas.net/
But I switched over to czkawka, which has a better interface for comparing files, and seems to be a bit faster: https://github.com/qarmin/czkawka
Unfortunately, neither of these are integrated into Photoprism, so you still have to do some file management outside the database before importing.
I also haven't used Photoprism extensively yet (I think it's running on one of my boxes, but I haven't gotten around to setting it up), but I did find that it wasn't really built for file-based libraries. It's a little more heavyweight, but my research shows that Nextcloud Memories might be a better choice for me (it's not the first-party Nextcloud photos app, but another one put together by the community): https://apps.nextcloud.com/apps/memories
https://github.com/jonathankoren/photo-autorganize
Full agreement; Immich has a similar problem. I don't know at what point basic systemd services stopped being enough, but Docker is usually a non-starter for me.
I dunno, I enjoy not having to install 500 libraries on my system just to test an app. Upgrading those libraries without butchering my system is also nice, not to mention that rebuilding a system is really fast. The pros outweigh the cons for me.
You can always build and tag your own docker image using the docker files included from source. Or simply follow the docker files as install instructions:
[0] https://github.com/photoprism/photoprism/blob/develop/docker...
I'm not sure if you perused the docker files for this repo, but IMHO there is nothing simple about them.
Edit:
I was curious, so I dug into them a bit and found that the Dockerfile references a develop Docker image. There is a second Dockerfile that shows how that base image is built. In the steps to build that base image, we grab some .zip files from the web for the AI models. We install Go, Node.js, MariaDB, etc. for quite a few deps, and then there is also a substantial list of packages installed. One step also does:
apt-get update && apt-get -qq dist-upgrade
Which seems a bit iffy to me. Each step calls a script which in turn has its own steps. Overall, I'd say the unfurled install script would be quite long and difficult to grok. Also, I'm not saying any of this is "bad," but it is complex.
Docker is the one thing that works on MANY flavors of Linux.
If I want to provide a tool I want to spend my time on building the tool, not building an rpm, a snap, a deb, ...
The Docker build process is significantly easier. For example, I can just pull in NodeJS 20. I can't do that on Ubuntu. It's not available on packages.ubuntu.com.
Building a deb/snap/rpm is a whole other language to understand how dependencies are set up.
And then I need to test those. I've never even run CentOS.
I generally don't either, but I wonder: are you more comfortable with running a docker image without internet access? You can firewall your host so the container can't access it and assign an internal network to the container.
Yup! Worked great for the few months I used it. I think it's kinda funny how much simpler the Nix package is when compared to upstream's dockerfiles lol
But yeah, it's even fully integrated into NixOS options now. You can set up a default install with one line: https://search.nixos.org/options?query=photoprism
Some features are only in the paid version which is fair enough, but when I tried it a few months ago there was a small but permanently visible message on the app reminding me this was the free version. That was annoying.
If I am going to host something myself and not pay, I'd like to do so without a "this is the free version" reminder all the time. At that point I'll just keep paying for my Apple One subscription.
They have a contributor licence agreement that allows them to relicense incoming contributions, so they can release the paid version under a proprietary licence, if they so wish.
https://www.photoprism.app/cla
That doesn't mean the code can't get distributed more widely. I guess it's a bet based on the honesty of users.
I deployed this recently and added a bunch of pictures to it. PhotoPrism prominently features an AI-generated (?) 'description' on each photo, but for 98% of my photos it was unable to come up with any description. The install procedure is needlessly complicated; there's no good reason for an app like this to require Docker.
Overall, I was underwhelmed.
Well, there is a good reason: to make the install simpler. IMHO containerization makes self-hosting way easier.
I disagree with the contention that Docker makes installs simpler. I would argue that it only makes installs simpler if you are already set up with Docker, and only when things go 100% correctly.
When I see a software application that recommends Docker for deployment, I always assume that I'm going to need to do extra work to make it behave correctly. Whether that is figuring out how to make it auto-start, or how to forward network ports, or how to share a data partition with the host OS.
Non-docker installs are simpler. At least, for my skill set.
It is fully open source: the open-source code also contains the logic to verify a serial key, which you are given if you purchase a license; if verification succeeds, the code enables the extra features.
https://github.com/photoprism/photoprism/blob/develop/LICENS...
https://opensource.org/license/agpl-v3/
As such you could fork it, tweak the code around the license verification to always return TRUE or whatever, and you would be running a "pro version" of the software.
The developer simply trusts that people who like the software will purchase a license, or counts on the fact that the majority of people out there are not programmers and would not be able to rebuild the software for themselves. Besides, the product is also offered as a SaaS.
Immich is very much alpha; that's why there are warning banners all over their docs. But at least it's in development - the single PhotoPrism dev rarely accepts PRs, so it hasn't had significant new features in ages.
That demo site looks very promising; the feature set is already quite exhaustive at such an early development stage. It comes closest to a full-blown, self-hosted Google Photos replacement.
It would be cool if it also had dynamic shared albums with facial recognition, as Google Photos has.
I looked at a bunch of these 2 years ago and ended up using PhotoView for a private gallery. It had the right mix of simplicity and features, and I was actually able to get it running.
https://github.com/photoview/photoview
This seems to be light on the details for the p2p / decentralized part. Anyone have more details? Does this use DHT or a blockchain or how are they doing that?
General question: does this have import from Google Photos? Please don't make me go through the Google Takeout pain.