daamien (u/daamien) - Readit News

daamien commented on How about trailing commas in SQL? peter.eisentraut.org/blog... · Posted by u/ingve

orf · a year ago

OP is massively overthinking it. Add them to CREATE TABLE and SELECT queries would remove 99.9% of annoyances.

daamien · a year ago

OP is one of 7 PostgreSQL core team members. That's kind of his job to massively overthink this :)

daamien commented on PostgreSQL Anonymizer postgresql-anonymizer.rea... · Posted by u/chynkm

daamien · a year ago

Hi !

I'm the main developer of this extension. Happy to answer any question you have about this project and anonymization in general!

daamien commented on PostgreSQL Anonymizer postgresql-anonymizer.rea... · Posted by u/chynkm

gmassman · a year ago

This is a very handy postgres extension! We've been using it at my job for a couple years now to generate test datasets for developers. We have a weekly job that restores a prod backup to a temporary DB, installs the `anon` extension, and runs pg_dump with the masking rules. Overall we've been very happy with this workflow since it gives us a very good idea of how new features will work with our production data. The masking rules do need maintenance as our DB schema changes, but that's par for the course with these kinds of dev tools.

All that said, I wouldn't rely on this extension as a way to deliver anonymized data to downstream consumers outside of our software team. As others have pointed out, this is really more of a pseudonymization technique. It's great for removing phone numbers, emails, etc. from your data set, but it's not going to eradicate PII. Pretty much all anonymized records can be traced back to their source data through PKs or FKs.

daamien · a year ago

Pseudonymizing functions are just one way to mask the data.

There are many other masking functions that will actually anonymize the data.

And the extension does not force you to respect the foreign keys.

It's really up to you to decide how you want to implement your masking policy

daamien commented on PostgreSQL Anonymizer postgresql-anonymizer.rea... · Posted by u/chynkm

hylaride · a year ago

It can be done manually:

https://dba.stackexchange.com/questions/306661/how-to-instal...

reply

daamien · a year ago

PostgreSQL Anonymizer developer here : this was possible with version 0.8 which is now completely unmaintained. This extension is currently not available on RDS but it is available on many others DBaaS providers : Azure SQL, Google Cloud SQL, Crunchy Bridge, ....

daamien commented on PostgreSQL Anonymizer postgresql-anonymizer.rea... · Posted by u/chynkm

sgt · a year ago

So basically running SELECT anon.anonymize_database(); will do it.

daamien · a year ago

Both approach are possible

* Static Masking will destroy the authentic data once for all

* Dynamic Masking will only alter the data the "masked users". Regular users will continue to view the real data.

Both techniques have their own advantage depending on your context.

daamien commented on PostgreSQL Anonymizer postgresql-anonymizer.rea... · Posted by u/chynkm

riffraff · a year ago

this seems great. I wonder tho, how do you ensure new columns are masked by default? It seems a safer alternative would be to start with all columns being statically masked and only unveil them selectively.

I guess you can add some CI steps when modifying the db to ensure a give column is allowed or masked, but still, would be nice if this was defaulted the other way around.

daamien · a year ago

This is called "Privacy By Default"

https://postgresql-anonymizer.readthedocs.io/en/stable/priva...

daamien commented on PostgreSQL Anonymizer postgresql-anonymizer.rea... · Posted by u/chynkm

Cynddl · a year ago

I'm going to repeat myself as I do everytime I encounter such tools. These tools DO NOT provide anonymization, and especially not at the level required by the EU's GDPR (where the notion of PII does not exist).

As a computer scientist and academic researcher having worked on this topic for now more than a decade (some of my work if you are interested: [1, 2]), re-identification is often possible from few pieces of information. Masking or replacing a few values or columns will often not provide sufficient guarantees—especially when a lot of information is being released.

What this tool does is called ‘pseudonymization’ and maybe, if not very carefully, ‘de-identification’ in some case. With colleagues, reviewed all the literature and industry practices a few months ago [3], and our conclusion was:

> We find that, although no perfect solution exists, applying modern techniques while auditing their guarantees against attacks is the best approach to safely use and share data today.

This is clearly not what this tool is doing.

[1] https://www.nature.com/articles/s41467-019-10933-3 [2] https://www.nature.com/articles/s41467-024-55296-6 [3] https://www.science.org/doi/10.1126/sciadv.adn7053

daamien · a year ago

Of course there's no perfect solution for anonymizing a dataset...

The extension offers a large panel of masking functions : some are pseudonymizing functions but others are more destructive. For instance there's large collection of fake data generators ( names, address, phones, etc. )

It's up to the database administrator or the application developer to decide which columns need to be masked and how it should be masked.

In some use cases, pseudonymization is enough and others anonymization is required....

daamien commented on PostgreSQL Anonymizer postgresql-anonymizer.rea... · Posted by u/chynkm

debarshri · a year ago

In RDS, if you cannot use this, you can create masked view and use query rewrite to make it work.

In my experience PG anonymizer has performance issues when it comes to large queries.

daamien · a year ago

Version 2.0 was released a few days ago. It's a complete rewrite in Rust.

Performance should be better than with v1.x

daamien commented on PostgreSQL Anonymizer postgresql-anonymizer.rea... · Posted by u/chynkm

zdc1 · a year ago

Assuming if it's for a support team or internal users with a lower SLA, I wonder if it's possible to have a small self-hosted PostgreSQL server that basically acts as a shim by holding a foreign-data wrapper connection to the actual RDS instance

daamien · a year ago

This called "Masking Data Wrappers"

https://postgresql-anonymizer.readthedocs.io/en/stable/maski...