[Pearg] new reidentification result using GANs...
Joseph Lorenzo Hall <joe@cdt.org> Wed, 24 July 2019 14:38 UTC
From: Joseph Lorenzo Hall <joe@cdt.org>
Date: Wed, 24 Jul 2019 10:38:28 -0400
Message-ID: <CABtrr-UNtcCsXar_+Zpc7T11xd_scBh1r9n55kKMLMCdh5OqCw@mail.gmail.com>
To: pearg@irtf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/pearg/eOl06edINFDAD7KsgrUKnbhFOoo>
Subject: [Pearg] new reidentification result using GANs...

https://www.nature.com/articles/s41467-019-10933-3
(PDF: https://www.nature.com/articles/s41467-019-10933-3.pdf )

# Estimating the success of re-identifications in incomplete datasets using generative models

Luc Rocher, Julien M. Hendrickx & Yves-Alexandre de Montjoye

Abstract:

While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.

--
Joseph Lorenzo Hall
Chief Technologist, Center for Democracy & Technology [https://www.cdt.org]
1401 K ST NW STE 200, Washington DC 20005-3497
e: joe@cdt.org, p: 202.407.8825, pgp: https://josephhall.org/gpg-key
Fingerprint: 3CA2 8D7B 9F6D DBD3 4B10 1607 5F86 6987 40A9 A871