I'm the main developer of this extension. Happy to answer any questions you have about this project and anonymization in general!
All that said, I wouldn't rely on this extension as a way to deliver anonymized data to downstream consumers outside of our software team. As others have pointed out, this is really more of a pseudonymization technique. It's great for removing phone numbers, emails, etc. from your data set, but it's not going to eradicate PII. Pretty much all anonymized records can be traced back to their source data through PKs or FKs.
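To make the PK/FK point concrete, here is a minimal sketch in plain Python. The table, column names, and values are all hypothetical and this is not the extension's API. It only shows that when contact fields are masked but the primary key is kept, anyone holding the source data (or another table keyed on the same id) can join straight back.

```python
# Hypothetical source table; every name and value here is made up.
customers = [
    {"id": 1, "email": "alice@example.com", "phone": "555-0100"},
    {"id": 2, "email": "bob@example.com", "phone": "555-0101"},
]


def mask_contact_fields(row):
    """Replace direct identifiers but leave the primary key untouched."""
    return {
        "id": row["id"],
        "email": "masked@example.invalid",
        "phone": "000-0000",
    }


masked = [mask_contact_fields(row) for row in customers]

# Re-linking: the preserved primary key is enough to recover the original email.
source_by_id = {row["id"]: row for row in customers}
relinked = [(m["id"], source_by_id[m["id"]]["email"]) for m in masked]
```

Breaking that link means replacing or shuffling the keys as well, which in turn has to be done consistently across every table that references them.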
There are many other masking functions that will actually anonymize the data.
And the extension does not force you to respect the foreign keys.
It's really up to you to decide how you want to implement your masking policy.
https://dba.stackexchange.com/questions/306661/how-to-instal...
* Static Masking will destroy the authentic data once and for all.
* Dynamic Masking will only alter the data for the "masked users". Regular users will continue to view the real data.
Both techniques have their own advantages depending on your context.
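The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the concept under assumed names (`static_mask`, `dynamic_view`, the `analyst` and `admin` users); it is not how the extension is implemented or invoked.

```python
import copy


def mask_email(_value):
    """Stand-in masking function: discards the input entirely."""
    return "masked@example.invalid"


def static_mask(table):
    """Static masking: overwrite the stored values, for every reader."""
    for row in table:
        row["email"] = mask_email(row["email"])


def dynamic_view(table, user, masked_users):
    """Dynamic masking: storage is untouched, masking happens on read."""
    if user in masked_users:
        return [{**row, "email": mask_email(row["email"])} for row in table]
    return [dict(row) for row in table]


people = [{"id": 1, "email": "alice@example.com"}]
masked_users = {"analyst"}  # hypothetical role marked as masked

analyst_rows = dynamic_view(people, "analyst", masked_users)
admin_rows = dynamic_view(people, "admin", masked_users)

# Static masking is applied to a copy here so the demo data stays usable.
static_copy = copy.deepcopy(people)
static_mask(static_copy)
```

In the dynamic case the real email is still stored and visible to `admin`; in the static case it no longer exists anywhere in `static_copy`.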
I guess you can add some CI steps when modifying the db to ensure a given column is either allowed or masked, but still, it would be nice if this defaulted the other way around.
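A CI step along those lines could look roughly like this. It is a sketch under assumptions: the `schema` and `policy` structures are hypothetical stand-ins for whatever your migration tooling and masking rules actually expose, and the "allow"/"mask" labels are invented for the example.

```python
def unclassified_columns(schema, policy):
    """Return columns that are neither explicitly allowed nor masked."""
    missing = []
    for table, columns in sorted(schema.items()):
        for column in columns:
            if policy.get((table, column)) not in {"allow", "mask"}:
                missing.append(f"{table}.{column}")
    return missing


# Hypothetical schema, e.g. extracted from the migrations under review.
schema = {
    "users": ["id", "email", "created_at"],
    "orders": ["id", "user_id", "note"],
}

# Hypothetical policy: every column must appear here with a decision.
policy = {
    ("users", "id"): "allow",
    ("users", "email"): "mask",
    ("users", "created_at"): "allow",
    ("orders", "id"): "allow",
    ("orders", "user_id"): "allow",
}

missing = unclassified_columns(schema, policy)
```

Here `missing` contains `orders.note`, so the build would fail until someone decides how that column should be treated. This approximates a deny-by-default policy even though the tool itself does not enforce one.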
https://postgresql-anonymizer.readthedocs.io/en/stable/priva...
As a computer scientist and academic researcher who has worked on this topic for more than a decade now (some of my work, if you are interested: [1, 2]), I can say that re-identification is often possible from just a few pieces of information. Masking or replacing a few values or columns will often not provide sufficient guarantees, especially when a lot of information is being released.
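As a toy illustration of why masking a name column is not enough, the sketch below counts how many records are unique on a handful of remaining attributes. The records and column names are invented, and this is a simple uniqueness count, not the methodology of the cited papers.

```python
from collections import Counter

# Invented records: names are already masked, other attributes are intact.
records = [
    {"name": "MASKED", "zip": "02139", "birth_year": 1985, "sex": "F"},
    {"name": "MASKED", "zip": "02139", "birth_year": 1985, "sex": "M"},
    {"name": "MASKED", "zip": "02139", "birth_year": 1990, "sex": "F"},
    {"name": "MASKED", "zip": "02139", "birth_year": 1990, "sex": "F"},
]


def unique_fraction(rows, quasi_identifiers):
    """Fraction of rows whose quasi-identifier combination appears only once."""
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


by_zip = unique_fraction(records, ["zip"])
by_zip_year = unique_fraction(records, ["zip", "birth_year"])
by_all = unique_fraction(records, ["zip", "birth_year", "sex"])
```

With this data nobody is unique on zip code alone or on zip plus birth year, but half of the rows become unique once sex is added. Each extra released column tends to push that fraction up.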
What this tool does is called ‘pseudonymization’ and maybe, if done very carefully, ‘de-identification’ in some cases. With colleagues, I reviewed all the literature and industry practices a few months ago [3], and our conclusion was:
> We find that, although no perfect solution exists, applying modern techniques while auditing their guarantees against attacks is the best approach to safely use and share data today.
This is clearly not what this tool is doing.
[1] https://www.nature.com/articles/s41467-019-10933-3
[2] https://www.nature.com/articles/s41467-024-55296-6
[3] https://www.science.org/doi/10.1126/sciadv.adn7053
The extension offers a wide range of masking functions: some are pseudonymizing functions, but others are more destructive. For instance, there's a large collection of fake data generators (names, addresses, phone numbers, etc.).
It's up to the database administrator or the application developer to decide which columns need to be masked and how they should be masked.
In some use cases pseudonymization is enough, and in others anonymization is required.
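The distinction between the two families of functions can be sketched as follows. This is plain Python with hypothetical helper names (`pseudonymize`, `fake_name`), not the extension's own functions: a keyed hash maps the same input to the same token, so records stay linkable, while a random fake value has no relationship to the original.

```python
import hashlib
import hmac
import random

FAKE_NAMES = ["Avery", "Jordan", "Morgan", "Riley"]  # made-up sample pool


def pseudonymize(value, secret):
    """Deterministic keyed hash: same input and key give the same token."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def fake_name(rng):
    """Destructive replacement: the output does not depend on the input."""
    return rng.choice(FAKE_NAMES)


secret = b"demo-key-do-not-reuse"  # assumption: a key held by the data owner
token_a = pseudonymize("alice@example.com", secret)
token_b = pseudonymize("alice@example.com", secret)
token_c = pseudonymize("bob@example.com", secret)

rng = random.Random(0)
replacement = fake_name(rng)
```

The pseudonym keeps joins and per-user analysis working, which is exactly why it remains re-linkable by anyone who holds the key. The fake value does not, but other columns may still identify the row.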
In my experience PG anonymizer has performance issues when it comes to large queries.
Performance should be better than with v1.x
https://postgresql-anonymizer.readthedocs.io/en/stable/maski...