[Pearg] new reidentification result using GANs...

Joseph Lorenzo Hall <joe@cdt.org> Wed, 24 July 2019 14:38 UTC

From: Joseph Lorenzo Hall <joe@cdt.org>
Date: Wed, 24 Jul 2019 10:38:28 -0400
Message-ID: <CABtrr-UNtcCsXar_+Zpc7T11xd_scBh1r9n55kKMLMCdh5OqCw@mail.gmail.com>
To: pearg@irtf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/pearg/eOl06edINFDAD7KsgrUKnbhFOoo>
Subject: [Pearg] new reidentification result using GANs...
Precedence: list

https://www.nature.com/articles/s41467-019-10933-3
(PDF: https://www.nature.com/articles/s41467-019-10933-3.pdf )

# Estimating the success of re-identifications in incomplete datasets using generative models

Luc Rocher, Julien M. Hendrickx & Yves-Alexandre de Montjoye

Abstract: While rich medical, behavioral, and socio-demographic data are
key to modern data-driven research, their collection and use raise
legitimate privacy concerns. Anonymizing datasets through de-identification
and sampling before sharing them has been the main tool used to address
those concerns. We here propose a generative copula-based method that can
accurately estimate the likelihood of a specific person to be correctly
re-identified, even in a heavily incomplete dataset. On 210 populations,
our method obtains AUC scores for predicting individual uniqueness ranging
from 0.84 to 0.97, with low false-discovery rate. Using our model, we find
that 99.98% of Americans would be correctly re-identified in any dataset
using 15 demographic attributes. Our results suggest that even heavily
sampled anonymized datasets are unlikely to satisfy the modern standards
for anonymization set forth by GDPR and seriously challenge the technical
and legal adequacy of the de-identification release-and-forget model.
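
For intuition about what "uniqueness" means here, a minimal sketch follows.
It is not the paper's copula-based estimator: it only counts, in a fully
observed toy table, the fraction of records whose attribute combination
appears exactly once. The function name, field names, and data are all
hypothetical.

```python
from collections import Counter


def uniqueness(records, attributes):
    """Fraction of records whose combination of `attributes` is unique.

    Empirical count on the records given; it does not extrapolate from a
    sample to a full population as the paper's generative model does.
    """
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


# Hypothetical toy records; real datasets would have many more rows.
records = [
    {"zip": "20005", "birth_year": 1977, "gender": "M"},
    {"zip": "20005", "birth_year": 1977, "gender": "F"},
    {"zip": "20005", "birth_year": 1980, "gender": "M"},
    {"zip": "20006", "birth_year": 1980, "gender": "M"},
]

# Uniqueness grows as more attributes are combined.
for attrs in (["gender"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
    print(attrs, uniqueness(records, attrs))
```

Even in this tiny table, each added attribute makes more records unique,
which is the effect the abstract describes at population scale.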

-- 
Joseph Lorenzo Hall
Chief Technologist, Center for Democracy & Technology [https://www.cdt.org]
1401 K ST NW STE 200, Washington DC 20005-3497
e: joe@cdt.org, p: 202.407.8825, pgp: https://josephhall.org/gpg-key
Fingerprint: 3CA2 8D7B 9F6D DBD3 4B10  1607 5F86 6987 40A9 A871
