Re: [scim] Discussion Item: Personally Identifiable Information in SCIM

Radovan Semancik <radovan.semancik@evolveum.com> Tue, 16 November 2021 07:36 UTC

To: scim@ietf.org
References: <CO1PR11MB48024D5296FAF8B347454D1ACD949@CO1PR11MB4802.namprd11.prod.outlook.com> <ed126b67-aff7-0867-2e4b-ec07aed8d366@pdmconsulting.net> <CB1CBE7E-7D17-42E7-AD56-F95F925C6BA0@independentid.com> <5b794493-7fca-3098-65bc-c7ae91ab81f8@pdmconsulting.net> <MW3PR11MB4730EC0A50D149D94FD83D67CD989@MW3PR11MB4730.namprd11.prod.outlook.com> <0330afe6-2273-73e7-912e-0dc569be04b0@pdmconsulting.net>
From: Radovan Semancik <radovan.semancik@evolveum.com>
Message-ID: <ca91ec7e-9b24-5ee8-e4de-6e011f40e8bd@evolveum.com>
Date: Tue, 16 Nov 2021 08:36:11 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0
MIME-Version: 1.0
In-Reply-To: <0330afe6-2273-73e7-912e-0dc569be04b0@pdmconsulting.net>
Content-Type: multipart/alternative; boundary="------------4BA55CD0E586A39CD62BA818"
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/scim/NjZlVq9wI77zj74Wq9-bwZCmGnw>
Subject: Re: [scim] Discussion Item: Personally Identifiable Information in SCIM

Hello,

The situation is much more complicated than it may seem.

First of all, this issue is not limited to personally-identifiable
information (PII); it also applies to personal data (please see our
glossary [1] for clarification). In fact, GDPR and similar regulations
deal with personal data, not just PII. Therefore it is better to
consider personal data instead of PII.

Secondly, it is not very practical to deal with the notion of personal
data in the schema, especially in a system with a fairly static schema
definition such as SCIM. Whether a particular piece of information is
considered personal data depends heavily on context. An e-mail address
may be considered personal data (even if it is "corporate"); however, if
it is an alias, a distribution list or an e-mail address used by an
automated processing system, it probably won't be personal data, as
there is no person behind it. Therefore the schema can only specify that
a particular data item may potentially contain personal data. But that
is true for almost any data item, as the user may enter his SSN into a
description. Obviously, having a probability that a particular item
contains personal data is not very helpful either. Marking personal data
in the schema is not the way to go. Maybe marking it in metadata for
every value? Well, that is quite complicated (e.g. see our data
provenance project [2]), and it is even more complicated than it seems,
because of the next point.
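
To make the schema-versus-value point concrete, here is a minimal
sketch. All names in it are hypothetical: SCHEMA_FLAGS stands in for a
schema-level marker along the lines of the proposed "containsPII"
characteristic (which is not part of RFC 7643), and the "personal" field
stands in for some per-value metadata. It only illustrates that a flag on
the attribute cannot tell a person's mailbox from a distribution list,
and cannot see an SSN typed into a free-text field.

```python
from dataclasses import dataclass

# Hypothetical schema-level flag per attribute (an assumption for
# illustration; SCIM defines no such characteristic).
SCHEMA_FLAGS = {"emails": True, "description": False}


@dataclass
class Value:
    attribute: str
    value: str
    # Hypothetical per-value metadata: is this concrete value personal data?
    personal: bool


def schema_says_personal(v: Value) -> bool:
    """What a client would conclude from the schema flag alone."""
    return SCHEMA_FLAGS[v.attribute]


values = [
    Value("emails", "jane.doe@example.com", personal=True),
    Value("emails", "all-staff@example.com", personal=False),  # distribution list
    Value("description", "my SSN is 000-00-0000", personal=True),
]

# Values where the schema-level flag disagrees with the per-value fact.
mismatches = [v.value for v in values if schema_says_personal(v) != v.personal]
print(mismatches)
```

The schema flag is wrong in both directions here: it over-reports for the
distribution list and under-reports for the description.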

Thirdly, knowing that some piece of data is "personal data" does not
give much information regarding its use - and it is the use of personal
data that matters. Storage and transfer of personal data are important,
but ultimately it is the use of that data that will get you in trouble.
How should the client behave if something is marked as "personal data"?
Should consent be requested from the user? Or was the consent already
given? For what purpose was the consent given? Oh, pardon my French,
"consent" is really a dirty word ... so, is there any legal basis for
use of the data? What legal basis? What are the constraints on data use?
How to deal with data erasure? Those are the questions that matter, and
they cannot be answered by a simple "PII flag" in the schema.
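
As a rough illustration of why a boolean flag falls short, the sketch
below attaches a processing basis to a value and asks whether a given use
is covered. Everything in it is an assumption made for the example: the
ProcessingBasis structure, the may_use function, and the basis and
purpose labels are hypothetical and are not taken from SCIM, from GDPR
text, or from the prototype in [2].

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ProcessingBasis:
    """Hypothetical per-value metadata describing why data may be used."""
    legal_basis: str                    # illustrative label, e.g. "contract"
    purposes: FrozenSet[str]            # uses covered by this basis
    valid_until: Optional[date] = None  # None means no expiry recorded


def may_use(basis: Optional[ProcessingBasis], purpose: str, today: date) -> bool:
    """Return True only if a basis exists, has not expired, and covers the purpose."""
    if basis is None:
        return False
    if basis.valid_until is not None and today > basis.valid_until:
        return False
    return purpose in basis.purposes


email_basis = ProcessingBasis(
    legal_basis="contract",
    purposes=frozenset({"account-provisioning"}),
    valid_until=date(2022, 12, 31),
)

print(may_use(email_basis, "account-provisioning", date(2022, 1, 1)))
print(may_use(email_basis, "marketing", date(2022, 1, 1)))
```

Even this toy version needs a basis, a set of purposes and a validity
period for each value, and it still says nothing about erasure.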

This problem is not fully understood, not even by identity management
professionals. The technology is not mature enough, e.g. there is
obviously a need for rich value metadata; however, this is not really
supported (or even envisioned) by schema languages (including SCIM
schema). We have been aware of this problem for many years, and last
year we tried to prototype a solution (see [2]). The prototype was quite
successful, but it also uncovered hidden complexity ([3]).

SCIM is simply not prepared for any of this. Even if a series of
precisely-timed miracles happens and this can somehow be handled in the
SCIM specs, the vast majority of SCIM clients will not be prepared for
it anyway. And those that are won't agree on a data protection metadata
schema. This is still a very unstable area, one for experiments and
prototypes. It is way too early for standardization.

[1]
https://docs.evolveum.com/glossary/#personally-identifiable-information
https://docs.evolveum.com/glossary/#personal-data

[2]
https://docs.evolveum.com/midpoint/projects/midprivacy/phases/01-data-provenance-prototype/

[3]
https://docs.evolveum.com/midpoint/projects/midprivacy/phases/01-data-provenance-prototype/provenance-origin-basis/

-- 
Radovan Semancik
Software Architect
evolveum.com



On 16. 11. 2021 0:53, Danny Mayer wrote:
>
> Paulo,
>
> Are those Personal information or Corporate information? If they are 
> personal email addresses, for example, then I would agree, but what 
> about your business email address? The URL you referenced did not 
> differentiate between the two. Most corporate systems only want the 
> business email address and only HR will have a personal email address. 
> If need be we can specify that a personal email address not be part of 
> the SCIM schema while a business email address can. If we can 
> understand what GDPR requires for PII for email addresses the rest of 
> what you referenced will likely be the same. We can have a further 
> discussion after that on the security requirements for any PII. Note 
> that GDPR is not the only requirement for PII. California in the US 
> also has requirements and I know that other non-EU countries also have 
> requirements.
>
> Danny
>
> On 11/15/21 5:24 AM, Paulo Jorge N. Correia (paucorre) wrote:
>>
>> Danny,
>>
>> Email addresses, phone numbers, locations: most of it is considered
>> PII, and what is even more problematic is when that information
>> travels across clouds.
>>
>> Many regulations like the European GDPR
>> (https://gdpr.eu/eu-gdpr-personal-data/), and not only in the EU
>> (most geos, like Canada, Singapore, etc., are creating similar
>> legislation), control and monitor what you do with PII.
>>
>> So I would say it is very relevant that SCIM has the right
>> mechanisms so that each geo's regulation can be enforced or not.
>>
>> Of course this will require some kind of privacy expert
>> (normally a lawyer) to have a look at the RFC schemas and make a
>> recommendation on whether each attribute is considered PII or not.
>>
>> Thanks,
>>
>> Paulo
>>
>> *From:* scim <scim-bounces@ietf.org> *On Behalf Of *Danny Mayer
>> *Sent:* Sunday, November 14, 2021 17:07
>> *To:* Phillip Hunt <phil.hunt@independentid.com>
>> *Cc:* scim@ietf.org; Janelle Allen (janelall) 
>> <janelall=40cisco.com@dmarc.ietf.org>
>> *Subject:* Re: [scim] Discussion Item: Personally Identifiable 
>> Information in SCIM
>>
>> None of this answers my basic question of why PII would be a part of 
>> SCIM. HR systems (with the exception of a few properties) and Finance 
>> systems should not be sharing PII with other systems and a management 
>> system (a SCIM client) should not be aware of that information. I can 
>> imagine that an expense system, for example, might need some 
>> additional information from an HR system (like a physical address) 
>> but is that what is needed? The other need might be a payroll system 
>> needing Bank information for direct deposit and physical address, but 
>> you want that system to act as a direct SCIM client to HR and not go 
>> through any other servers for that information.
>>
>> Does this make sense? Can someone come up with actual use cases to 
>> justify PII in SCIM?
>>
>> Thanks,
>>
>> Danny
>>
>> On 11/13/21 2:41 PM, Phillip Hunt wrote:
>>
>>     Just for the group's information, the current SCIM specs do have
>>     privacy considerations sections. The confusion may be that back
>>     then, privacy considerations were not a top-level table of
>>     contents item.
>>
>>     Relevant sections in existing drafts are:
>>
>>     RFC7644 Section 7.5 -
>>     https://datatracker.ietf.org/doc/html/rfc7644#section-7.5
>>
>>     RFC7643 Section 9 -
>>     https://datatracker.ietf.org/doc/html/rfc7643#section-9. This
>>     covers both sensitive data (e.g. passwords) as well as discussion
>>     on privacy.
>>
>>     RFC7644 Section 7.5.2 refers to the case I pointed out in the WG
>>     session.  The HTTP POST .search method was designed to avoid
>>     passing information in request URIs that may appear in other
>>     systems such as access logs which may be seen as inappropriate.
>>
>>     A compliant service provider implementation that allows searching
>>     of PII and sensitive data via GET should normally be returning
>>     HTTP status 403 (Forbidden) in response.  While one might argue
>>     that information has already been exposed by the client, it
>>     doesn’t help to compound the problem by confirming that the
>>     information requested is correct.
>>
>>     The SCIM POST Search solution I raised was the result of a
>>     “compromise” the SCIM WG had to make for PII. The SCIM WG
>>     informally raised the concerns with the HTTP WG.
>>
>>     The HTTPbis WG has discussed the problems of searching with HTTP
>>     GET many times before.
>>
>>
>>
>>     Julian Reschke presented on the issue in IETF93 (giving a good
>>     explanation of the privacy issues):
>>
>>     https://httpwg.org/wg-materials/ietf93/ietf-93-httpbis-search.pdf
>>
>>     Going forwards….
>>
>>     The issue of searching using HTTP GET has re-surfaced again with
>>     a proposal for HTTP QUERY:
>>
>>     https://datatracker.ietf.org/doc/draft-ietf-httpbis-safe-method-w-body/
>>
>>     If we end up talking about a SCIMbis effort, we may want to
>>     include support for safe query.  This would be fairly
>>     straightforward as we can take the body defined in our POST search
>>     method and simply use the proposed HTTP QUERY method.
>>
>>     Phillip Hunt
>>
>>     @independentid
>>
>>     phil.hunt@independentid.com
>>
>>
>>
>>         On Nov 13, 2021, at 7:36 AM, Danny Mayer
>>         <mayer@pdmconsulting.net> wrote:
>>
>>         On 11/11/21 10:13 AM, Janelle Allen (janelall) wrote:
>>
>>             Hi there,
>>
>>             In the IETF session today, Phil mentioned privacy and the
>>             handling of PII.  A lot of legislation has occurred since
>>             SCIM 2.0. A question to this WG, should we be revisiting
>>             the core schema and marking some attributes as
>>             potentially containing PII?
>>
>>             This caused me to ponder should we be thinking of
>>             modifying the core schema to identify which attributes
>>             may carry PII eg: the complex name attribute has the
>>             potential to carry PII, should we consider adding a new
>>             item as a peer to “mutability” such as “containsPII:
>>             true/false”?. Or expand on the returned element such as
>>             returned: “restrictedPII”? or any other unmentioned
>>             method of addressing PII?
>>
>>         I'd like to understand the use case for even providing PII
>>         data in SCIM. Most of the data that the SCIM Schemas
>>         currently are offering (see RFC7643) are not PII (though
>>         maybe ims and photos might be considered PII - Section
>>         4.1.2). Having dealt with HR systems and their API's I know
>>         that there is only an extremely limited subset of data that
>>         should ever be made available to any outside system and you
>>         don't generally want to host it on a management platform if
>>         it is PII. I didn't attend the meeting so I don't know what
>>         the discussion was about. I personally feel that PII should
>>         NOT be made available through SCIM, but I'm willing to be
>>         persuaded otherwise as long as PII protections can be defined
>>         and required in any resulting document.
>>
>>         Danny
>>
>>         _______________________________________________
>>         scim mailing list
>>         scim@ietf.org <mailto:scim@ietf.org>
>>         https://www.ietf.org/mailman/listinfo/scim
>>         <https://www.ietf.org/mailman/listinfo/scim>