One year ago, I started a side-project called PostgreSQL Anonymizer to study and learn various ways to protect privacy using the power of PostgreSQL. The project is now part of the Dalibo Labs initiative and we published a new version last week…
This seems like a nice moment to look at the progress we’ve made, how the GDPR is changing the game and where we’re going.
GDPR: Sanctions are coming
While I was working on this, the landscape changed… When the GDPR came into force in May 2018, one of the biggest questions was whether the fines would be significant enough to force a real change in corporate data policies.
From what we can see, the GDPR penalties are starting to fall. In July 2019 alone, British Airways got 204 M€ and Marriott Hotels got 110 M€.
There are also smaller fines for smaller companies. What’s interesting is that the biggest fines are related to Article 32 and “insufficient technical and organisational measures to ensure information security”.
In other words: data leaks.
Here’s where anonymization can help! Based on my experience, we can reduce the risk of leaking personal information by limiting the number of environments where the data is hosted. In many non-production setups (such as pre-production, training, development, CI, analytics, etc.) the real data is not absolutely required. With a strong anonymization policy, we can keep real data only where it is needed and work on fake or random data everywhere else. When anonymization is done the right way, the anonymized datasets fall outside the scope of the GDPR.
In a nutshell, anonymization is a powerful method to reduce your attack surface, and it’s a key way to limit the risk of GDPR penalties related to data leaks.
This is why we’re investing a lot of effort in developing masking tools directly inside PostgreSQL!
Major Improvements
Over the last month, I’ve worked on different aspects of the extension, especially:
- Adding more tests and protections against SQL injection
- Enabling users to export Anonymous Dumps using the wonderful ddlx extension
- Adding more Masking Functions, in particular shuffling and noise insertion
- Rebuilding large parts of the Dynamic Masking engine
- Clarifying the concept of In-Place Anonymization
- Writing better documentation :-)
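As a rough illustration of what shuffling and noise insertion mean, here is a sketch in plain PostgreSQL. It does not use the extension’s own functions and is not how the extension implements them; the employee table, its columns and the 10 % ratio are all hypothetical choices made for the example.

```sql
-- Hypothetical table, used only for this illustration
CREATE TABLE employee (
  id      integer PRIMARY KEY,
  zipcode text,
  salary  numeric
);

-- Shuffling: redistribute the existing zipcode values among the rows.
-- Every row gets a random rank, every value gets another random rank,
-- and rows are paired with values that share the same rank.
WITH row_rank AS (
  SELECT id, row_number() OVER (ORDER BY random()) AS rn
  FROM employee
),
value_rank AS (
  SELECT zipcode, row_number() OVER (ORDER BY random()) AS rn
  FROM employee
)
UPDATE employee e
SET zipcode = v.zipcode
FROM row_rank r
JOIN value_rank v USING (rn)
WHERE e.id = r.id;

-- Noise insertion: shift each salary by a random ratio within +/- 10 %.
-- The product is double precision, so it is cast back before rounding.
UPDATE employee
SET salary = round((salary * (1 + (random() * 2 - 1) * 0.10))::numeric, 2);
```

Shuffling keeps the set of values in the column intact while breaking the link between a value and its row. Noise keeps each value close to the original, so aggregates stay meaningful but exact figures are no longer exposed.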
Security Labels
One of the main drawbacks of the current implementation of PostgreSQL Anonymizer is that Masking Rules are declared using the COMMENT syntax, which can be annoying if your database already has comments.
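To make the problem concrete, here is a small sketch of the COMMENT-based declaration. It reuses the people.zipcode column and the anon.fake_zipcode() function from the Security Label example in this post; the descriptive comment text is made up for the illustration, and the exact rule format should be checked against the extension’s documentation.

```sql
-- An ordinary, pre-existing comment on the column (hypothetical text)
COMMENT ON COLUMN people.zipcode IS 'Postal code of the home address';

-- Declaring the masking rule with the same command:
-- a column holds a single comment, so this replaces the previous one
COMMENT ON COLUMN people.zipcode
  IS 'MASKED WITH FUNCTION anon.fake_zipcode()';
```

Since PostgreSQL stores only one comment per object, the masking rule and the human-readable description end up competing for the same slot.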
Thanks to an idea from Alvaro Herrera, I’m currently working on a new declaration syntax based on Security Labels, a little-known feature also used by the sepgsql extension:
SECURITY LABEL FOR anon ON COLUMN people.zipcode
  IS 'MASKED WITH FUNCTION anon.fake_zipcode()';
This should be available in a few weeks. Of course, the former syntax will still be supported for backward compatibility.
Let’s talk!
GDPR and data privacy are two very hot topics! I will be talking about them at various events in the coming weeks:
- on October 14th at PostgreSQL Conference Europe in Milan, Italy
- on November 14th at Libday / Devops D-Day in Marseille, France
If you have any ideas or comments on PostgreSQL Anonymizer, or more generally about protecting data privacy with Postgres, please send me a message at damien@dalibo.com.