Re: [scim] SCIM Protocol - 3 suggestions for improvement

Ganesh and Sashi Prasad <g.c.prasad@gmail.com> Fri, 10 August 2012 22:39 UTC

From: Ganesh and Sashi Prasad <g.c.prasad@gmail.com>
Date: Sat, 11 Aug 2012 08:38:36 +1000
Message-ID: <CAOEeopgTjSKqk_+MpveC_rE_m-jibLLaRRYJDdOi+g+pT6Y6xQ@mail.gmail.com>
To: Trey Drake <trey.drake@unboundid.com>
Content-Type: multipart/alternative; boundary="0015175cd0de72f0fb04c6f102a6"
Cc: "scim@ietf.org" <scim@ietf.org>, Emmanuel Dreux <edreux@cloudiway.com>, Kelly Grizzle <kelly.grizzle@sailpoint.com>, Phil Hunt <phil.hunt@oracle.com>
Subject: Re: [scim] SCIM Protocol - 3 suggestions for improvement

Trey,

> The externalID is not part of the protocol.  It is an *optional*
> attribute within the *schema* specification.

I didn't realise SCIM was also specifying a User schema. Does this mean the
User resource can only hold certain attributes as defined by this schema?
If so, that's going to be a severe constraint, because client organisations
are likely to require many specialised User attributes depending on the
nature of their business, and they will expect to be able to store all of
them, not just the subset defined by the SCIM spec. I hope the SCIM User
schema is not trying to be just a representation of inetOrgPerson (the
standard LDAP schema), because that could be severely limiting. Most
organisations with their own LDAP end up extending inetOrgPerson, and they
will expect that they can do the same in the cloud.

> As for #2, the protocol spec works as you describe if "arbitrary URI
> parameters" equate to resource attributes

Yes, that's what I meant, but in implementation, it may work out to be
looser than that. The client should be able to specify any arbitrary
attribute in the search parameters, even those that aren't attributes of
the resource. If an attribute is not defined, or if the attribute exists
but the value doesn't match any stored record, the SP will return no
results. In the former case, it's a "400 Bad Request" and in the latter,
it's a "404 Not Found".

By "candidate key" (a data modelling term), I meant another attribute that
also uniquely identifies a single record, but is not the official "primary
key", i.e., the main ID. In this case, a client's internal identifier also
uniquely identifies a record, but is not the ID used in the URI of RESTful
operations.
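
To make the search behaviour described above concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory store; the names `USERS`, `BASE_URI` and `search_users` are illustrative only and appear in no SCIM draft. It also assumes that an attribute counts as "defined" if at least one stored resource carries it. The 400 and 404 codes follow the suggestion in this message, not the current spec.

```python
BASE_URI = "https://example.com/v1/Users"  # base URI taken from the example in this thread

# Hypothetical in-memory store: server-assigned id -> resource attributes.
# 'myID' is a client-defined candidate key, not an attribute from the spec.
USERS = {
    "2819c223-7f76-453a-919d-413861904646": {"userName": "bjensen", "myID": "bjensen"},
}


def search_users(params):
    """Return (status, uris) for a 'GET /Users?attr=value' style search."""
    # Assumption: an attribute is "defined" if any stored resource has it.
    defined = {name for attrs in USERS.values() for name in attrs}
    if any(name not in defined for name in params):
        return 400, []  # unknown attribute: Bad Request
    uris = [
        f"{BASE_URI}/{user_id}"
        for user_id, attrs in USERS.items()
        if all(attrs.get(name) == value for name, value in params.items())
    ]
    if not uris:
        return 404, []  # attribute exists, but no record matches: Not Found
    return 200, uris
```

Because `myID` is a candidate key, a search on it yields at most one URI, and the client uses that canonical URI for every later operation.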

Look, this may seem to be splitting hairs since a determined client may be
able to store and search by their own internal ID in either case (the
current SCIM spec or my suggestion). The difference is philosophical. If
the spec is showing how clients can store their internal IDs in the cloud
by explicitly providing for such an attribute, that constitutes bad advice,
IMO. If it turns a blind eye to what they store and they store internal IDs
anyway, they're constraining themselves but the spec comes out smelling of
roses because that's not "recommended practice".

Ganesh

On 11 August 2012 00:50, Trey Drake <trey.drake@unboundid.com> wrote:

> Ganesh,
>
> I'll base my comments on your latest reply (below).
>
> The externalID is not part of the protocol.  It is an *optional* attribute
> within the *schema* specification.  As for #2, the protocol spec works as
> you describe if "arbitrary URI parameters" equate to resource attributes
> (Allow generic search using 'GET /Users' and arbitrary URI parameters).
>  Please clarify your suggestion.
>
> I'm not tracking your coupling concern.  The client can search and hence
> retrieve resources on any attribute it chooses, externalId or otherwise.
>  Nothing mandates use of externalId.
>
>
> What do you mean by "candidate key"?  Given
>
> On Fri, Aug 10, 2012 at 5:49 AM, Ganesh and Sashi Prasad <
> g.c.prasad@gmail.com> wrote:
>
>> > I think scim gets its current simplicity from its single owner hub
>> > spoke model implementing tight coupling. [...] IMHO loose coupling is a
>> > much more complex solution.
>>
>> Phil,
>>
>> I'm a bit surprised that you're implying "tight coupling == simple" and
>> "loose coupling == complex". That's contrary to my experience.
>>
>> When I say "loose coupling", I mean "no unnecessary dependencies".
>> Invariably, a reduction in dependencies leads to greater simplicity.
>>
>> Let's not confuse reduction of dependencies in the data model with a
>> hub-and-spokes architecture. They're entirely orthogonal aspects of the
>> solution.
>>
>> All that my suggestion involves is,
>>
>> 1. Take 'external ID' out of the protocol.
>> 2. Allow generic search using 'GET /Users' and arbitrary URI parameters
>>
>> No planned functionality is lost by this.
>>
>> 1. The client enterprise can still send its internal ID as part of the
>> resource body, inside some attribute defined by them (but not defined by
>> the protocol). Let's say they call it 'myID'.
>> 2. The client enterprise can search for resource URIs using any
>> attribute, including this internal ID
>> 'GET /Users?myID=bjensen'
>> Since myID is a candidate key, the server will return exactly one URI,
>> which is the canonical URI for the resource
>>
>> https://example.com/v1/Users/2819c223-7f76-453a-919d-413861904646
>>
>> 3. The client can use this URI to perform all other operations as usual.
>>
>>
>> So taking 'externalID' out of the protocol spec only does this:
>>
>> 1. It avoids enshrining tight coupling in the protocol (If clients want to tightly couple themselves to the cloud provider by sending their internal IDs, they can do so. Suicide is OK, but the protocol should not be guilty of assisted suicide. ;-)
>>
>> 2. It encourages loose coupling by nudging clients towards maintaining their own internal-to-external identifier mappings.
>>
>>
>> That's what I'd like to see. I don't believe this complicates the protocol. It simplifies it and it also lends itself to a loosely-coupled approach.
>>
>>
>> I'll address the multi-valued attribute suggestion separately.
>>
>>
>> Regards,
>>
>> Ganesh
>>
>>
>>
>>
>> On 10 August 2012 07:53, Phil Hunt <phil.hunt@oracle.com> wrote:
>>
>>>
>>>
>>> Phil
>>>
>>> On 2012-08-09, at 14:14, Ganesh and Sashi Prasad <g.c.prasad@gmail.com>
>>> wrote:
>>>
>>> > storing this information in a mapping table outside of the SCIM spec
>>> > is a great way to enable this solution.  Part of the key here is that SCIM
>>> > is just a piece of the architecture for this solution, and is only
>>> > responsible for the transport layer between domains.
>>>
>>> I wasn't suggesting that the mapping table be part of the SCIM spec. I
>>> provided that example to illustrate that splitting and merging identities
>>> is a common requirement, and that decoupling local identifiers within a
>>> domain from shared identifiers between domains was the best way to
>>> facilitate it.
>>>
>>> I'm suggesting that the spec do less, not more.
>>>
>>> What the SCIM spec needs to do there is just refrain from introducing
>>> tight coupling. I would like to see a single identifier exposed through the
>>> API, with the implication (and perhaps the recommendation) that it be the
>>> shared one. Allowing one domain to expose its internal identifier to the
>>> other creates tight coupling and ensures that both domains need to split
>>> or merge identities simultaneously, which is not desirable. So I
>>> recommend _taking out_ the "external id" field from the API. The spec
>>> shouldn't encourage tight coupling. If clients want to pass in their
>>> internal ids as part of the resource body, no one can stop them, and they
>>> can always do a search on that attribute to retrieve the URI exactly as you
>>> visualise they will with the "external id", but let's not elevate an
>>> anti-pattern to a recommendation by enshrining the "external id" as an
>>> acceptable attribute.
>>>
>>> Am I making sense?
>>>
>>>
>>> I see what you are saying. I think scim gets its current simplicity from
>>> its single owner hub spoke model implementing tight coupling.
>>>
>>> IMHO loose coupling is a much more complex solution. The reality is that
>>> each end-point has value to contribute and thus the single-owner model will
>>> eventually need to become multi-owner or multi-hub.
>>>
>>> That said i think the current model provides a practical starting point.
>>>
>>>
>>> > Regarding unique identifiers for multi-valued attributes there is a
>>> > trade-off involved.  On one hand this makes PATCH semantics easier.  On the
>>> > other hand it puts extra burden on service providers.
>>>
>>> Precisely. The spec has to strike the right balance. It would be
>>> interesting to hear from the other members of the spec mailing list. You
>>> know where I stand on this. It would be good to hear the spectrum of
>>> opinions.
>>>
>>> Regards,
>>> Ganesh
>>>
>>> On 10 August 2012 00:28, Kelly Grizzle <kelly.grizzle@sailpoint.com> wrote:
>>>
>>>>  Thanks Emmanuel.  I had started writing up a similar response.  As
>>>> you suggest, storing this information in a mapping table outside of the
>>>> SCIM spec is a great way to enable this solution.  Part of the key here is
>>>> that SCIM is just a piece of the architecture for this solution, and is
>>>> only responsible for the transport layer between domains.
>>>>
>>>> You could also model these ID mappings in the SCIM user as an extension
>>>> but would probably not want to expose these externally.  Here is an example
>>>> of how to model the end state of the false positive scenario (splitting a
>>>> user):
>>>>
>>>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>>>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>>>> | a99a5feba839       | D2                 | 7a87f27c1dd8       | true         |
>>>>
>>>> This could be represented as two SCIM users that contain information
>>>> about the entities on other domains.
>>>>
>>>> {
>>>>   "schemas": ["urn:scim:schemas:core:1.0",
>>>>               "urn:scim:schemas:extension:federation:1.0"],
>>>>   "id": "9caf78aac3d6",
>>>>   "userName": "John Smith",
>>>>   "urn:scim:schemas:extension:federation:1.0": {
>>>>     "linkedUsers": [
>>>>       {
>>>>         "domain": "D2",
>>>>         "externalEntityId": "ff487230b3a0"
>>>>       }
>>>>     ]
>>>>   }
>>>> }
>>>>
>>>> {
>>>>   "schemas": ["urn:scim:schemas:core:1.0",
>>>>               "urn:scim:schemas:extension:federation:1.0"],
>>>>   "id": "a99a5feba839",
>>>>   "userName": "John Smith",
>>>>   "urn:scim:schemas:extension:federation:1.0": {
>>>>     "linkedUsers": [
>>>>       {
>>>>         "domain": "D2",
>>>>         "externalEntityId": "7a87f27c1dd8"
>>>>       }
>>>>     ]
>>>>   }
>>>> }
>>>>
>>>> In the second user, the linkedUsers attribute would be empty until the
>>>> split user was synced to domain 2.
>>>>
>>>> Similarly, the false negative use case (merging two users) looked like
>>>> this at the end:
>>>>
>>>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>>>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>>>> | 9caf78aac3d6       | D2                 | 41206cc97c8b       | false        |
>>>>
>>>> This could be represented with the following SCIM user:
>>>>
>>>> {
>>>>   "schemas": ["urn:scim:schemas:core:1.0",
>>>>               "urn:scim:schemas:extension:federation:1.0"],
>>>>   "id": "9caf78aac3d6",
>>>>   "userName": "John Smith",
>>>>   "urn:scim:schemas:extension:federation:1.0": {
>>>>     "linkedUsers": [
>>>>       {
>>>>         "domain": "D2",
>>>>         "externalEntityId": "ff487230b3a0"
>>>>       },
>>>>       {
>>>>         "domain": "D2",
>>>>         "externalEntityId": "41206cc97c8b",
>>>>         "deletionRequired": true
>>>>       }
>>>>     ]
>>>>   }
>>>> }
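
The relationship between the mapping-table rows and the SCIM users shown above can be sketched in Python as follows. This is an illustration only: the function name `to_scim_user` and the row layout are hypothetical, and the rule that a non-primary row is rendered with "deletionRequired": true is an inference from the merged-user example, not something the thread states as a rule.

```python
# Extension URN as used in the example users above.
FEDERATION_URN = "urn:scim:schemas:extension:federation:1.0"


def to_scim_user(internal_id, user_name, mapping_rows):
    """Build a SCIM user dict from (internal_id, domain, external_id, primary) rows."""
    linked = []
    for row_internal_id, domain, external_id, primary in mapping_rows:
        if row_internal_id != internal_id:
            continue  # row belongs to a different internal entity
        entry = {"domain": domain, "externalEntityId": external_id}
        if not primary:
            # Assumption: a non-primary mapping is one awaiting deletion.
            entry["deletionRequired"] = True
        linked.append(entry)
    return {
        "schemas": ["urn:scim:schemas:core:1.0", FEDERATION_URN],
        "id": internal_id,
        "userName": user_name,
        FEDERATION_URN: {"linkedUsers": linked},
    }
```

Passing the two rows of the false negative table for "9caf78aac3d6" produces the same structure as the merged user above.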
>>>>
>>>> Regarding unique identifiers for multi-valued attributes there is a
>>>> trade-off involved.  On one hand this makes PATCH semantics easier.  On the
>>>> other hand it puts extra burden on service providers.  Since the inception
>>>> of SCIM, a key goal has been to foster adoption by service providers by
>>>> making things fit easily onto existing systems.  IMO the value gained by
>>>> unique identifiers for multi-valued attributes is not worth the demands put
>>>> on a service provider.  I also think that vendors that have a
>>>> non-SCIM-compliant API will choose to keep things that way if the spec is
>>>> too hard for them to implement.  In a green field environment we do have
>>>> the luxury of mandating a model to make certain operations more elegant.
>>>> However, we can’t ignore legacy systems.
>>>>
>>>> --Kelly
>>>>
>>>> *From:* scim-bounces@ietf.org [mailto:scim-bounces@ietf.org] *On
>>>> Behalf Of *Emmanuel Dreux
>>>> *Sent:* Thursday, August 09, 2012 3:18 AM
>>>> *To:* Ganesh and Sashi Prasad; Kelly Grizzle
>>>> *Cc:* scim@ietf.org
>>>> *Subject:* Re: [scim] SCIM Protocol - 3 suggestions for improvement
>>>>
>>>> Hi Ganesh,
>>>>
>>>> Nothing prevents you, in your SCIM implementation (client or server), from
>>>> generating a unique identifier for each synchronized object and maintaining
>>>> an internal mapping table (you would have to map group membership as well).
>>>>
>>>> This is what we are doing with Active Directory sources or targets:
>>>> As we didn’t find an immutable unique ID in AD systems (DN, samAccountName
>>>> and UPN are subject to change; even objectGuid can change if an AD domain
>>>> is migrated), we decided to generate and maintain an internal table of IDs.
>>>> This fits your requirements as it hides internal IDs.
>>>>
>>>> This was written in .NET, and we have started a project to rewrite our
>>>> SCIM stack in PHP and will give it to the Open Source community. This
>>>> implementation will have a parameter: AllocateIds versus UseExistingIDs.
>>>> This will give the choice of “hiding” internal IDs or using them as the
>>>> unique ID.
>>>>
>>>> You can also implement such a feature without violating the SCIM specs,
>>>> or without asking to include it in the specs.
>>>>
>>>> --
>>>> Regards,
>>>> Emmanuel Dreux
>>>> http://www.cloudiway.com
>>>> Tel: +33 4 26 78 17 58
>>>> Mobile: +33 6 47 81 26 70
>>>> skype: Emmanuel.Dreux
>>>>
>>>> *From:* Ganesh and Sashi Prasad [mailto:g.c.prasad@gmail.com]
>>>> *Sent:* Thursday, 9 August 2012 03:35
>>>> *To:* Kelly Grizzle
>>>> *Cc:* scim@ietf.org
>>>> *Subject:* Re: [scim] SCIM Protocol - 3 suggestions for improvement
>>>>
>>>> Hi Kelly,
>>>>
>>>> Thanks for your response. Let me first respond in brief to the two main
>>>> points you have made, and then elaborate on the first.
>>>>
>>>> 1. Why should domains not expose their internal identifiers to other
>>>> domains?
>>>>
>>>> a. We are designing a protocol for a federated system of domains, where
>>>> all domains are co-equal peers. (In physics too, N-body problems are much
>>>> harder than 2-body problems. :-) Therefore, assuming that there are only
>>>> two players in the interaction makes this tightly coupled in a number of
>>>> ways. We should rely on messaging and notification, with encapsulation of
>>>> domain-specific data.
>>>>
>>>> b. In any non-trivial data store, there will always be the ongoing need to
>>>> merge and split identities as and when “false negatives” and “false
>>>> positives” are discovered. A domain should be able to handle this internal
>>>> housekeeping freely, only notifying other domains when convenient. Mapping
>>>> of internal identifiers to external ones and maintaining this mapping
>>>> internally allows this loosely-coupled housekeeping to take place. Sharing
>>>> internal identifiers (or otherwise outsourcing the mapping of internal to
>>>> external identifiers) forces housekeeping activities to be done in
>>>> lock-step across domains.
>>>>
>>>> c. Asynchronous interaction is not just a matter of a suitable wire
>>>> protocol which can be designed later. The data model plays a crucial role
>>>> in enabling or constraining such interaction. A tightly-coupled data model
>>>> will force the use of synchronous interactions, and the exposure of
>>>> internal identifiers is a key part of this tight coupling.
>>>>
>>>> 2. The difficulty of assigning unique identifiers to the individual
>>>> values of multi-valued attributes:
>>>>
>>>> a. I'm not belittling the effort involved in migrating legacy data stores
>>>> to such a model. However, in the larger historical context of cross-domain
>>>> identity management, we are really at the very early stages. If a
>>>> relatively new discipline and a brand new spec are held captive to legacy
>>>> considerations, we are losing an opportunity to provide a clean and elegant
>>>> model to subsequent users of the spec, and this will have repercussions
>>>> over many years or even decades.
>>>>
>>>> b. If incumbent cloud providers find it hard to immediately adopt the
>>>> dictionary model for existing multi-valued attributes, they can transition
>>>> to this model by offering both “SCIM-compliant” and “non-SCIM-compliant”
>>>> APIs to their customers and encouraging new customers to adopt the
>>>> “SCIM-compliant” API. Legacy customers can be supported using a
>>>> “non-SCIM-compliant” API for an arbitrarily long period and gradually
>>>> migrated to the SCIM-compliant API. The logistics are not insurmountable,
>>>> and shouldn't prevent the adoption of a dictionary model for multi-valued
>>>> attributes.
>>>>
>>>> Elaboration of Point 1:
>>>>
>>>> When we consider federated identity across more than one domain, we
>>>> have to assume that domains are not necessarily master-slave in their
>>>> interaction. The most generic interaction model is peer-to-peer, where
>>>> entity lifecycle events within a domain are notified to other domains (when
>>>> necessary) in an asynchronous manner (i.e., through messaging) and the
>>>> other domains are free to respond to these events in an appropriate manner
>>>> and at a time of their convenience.
>>>>
>>>> A key set of lifecycle events for an entity is the merging and
>>>> splitting of identity that is often required.
>>>>
>>>> The question “Is this one entity?” can be answered either yes
>>>> (positive) or no (negative). But sometimes, we can discover false positives
>>>> and false negatives in our data stores.
>>>>
>>>> Consider a case where customers sign up online, and two customers who
>>>> are privacy-conscious enter fake IDs such as “John Smith”, and also use the
>>>> same date of birth (say, 1 Jan 1970) or similar attributes. The front-end
>>>> application may make an intelligent (but incorrect) guess that these two
>>>> persons are the same, and re-assign the same identifier to the second
>>>> person. This is a false positive. They appear to be the same entity, but
>>>> they're actually different. When the error is discovered, the identities
>>>> will need to be split, with a new identifier generated for one of them.
>>>>
>>>> Consider the opposite case where a customer signs up through two
>>>> different portals or in two different sessions, using the names “JSmith”
>>>> and “JohnS”. It is very likely that they will be treated as two different
>>>> customers and assigned two unique identifiers. This is a false negative.
>>>> They appear to be two entities, but are actually the same. At a later
>>>> stage, when the error is discovered, the identities will have to be merged,
>>>> and one of the identifiers will have to be dropped.
>>>>
>>>> These are not theoretical use cases. They form a significant proportion
>>>> of the user base in most large Web-facing applications. Let's see how these
>>>> can be managed in a federated way by mapping internal identifiers to
>>>> external ones and only exposing external identifiers to other domains.
>>>>
>>>> a. False positives:
>>>>
>>>> Domain 1 has the following information about a customer in its data
>>>> store:
>>>>
>>>> Internal ID: 9caf78aac3d6
>>>> Attributes: {name: “John Smith”, dob: “01-Jan-1970”}
>>>>
>>>> When requesting the provisioning of this entity in Domain 2, the
>>>> following ID is returned by Domain 2: ff487230b3a0.
>>>>
>>>> Domain 1 then maintains the following in a mapping table and uses it
>>>> for translation when talking to Domain 2, taking care never to expose its
>>>> internal identifier:
>>>>
>>>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>>>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>>>>
>>>> When the false positive is discovered and the entity is split, Domain 1
>>>> creates a new internal identifier and now has the following entity
>>>> information.
>>>>
>>>> Internal ID: 9caf78aac3d6
>>>> Attributes: {name: “John Smith”, dob: “01-Jan-1970”}
>>>>
>>>> Internal ID: a99a5feba839
>>>> Attributes: {name: “John Smith”, dob: “01-Jan-1970”}
>>>>
>>>> This second entity with its own internal identifier is invisible to
>>>> Domain 2, and this is by design. Communication about the original entity
>>>> takes place as before by mapping “9caf78aac3d6” to “ff487230b3a0” and
>>>> vice-versa. At some convenient time (importantly, this doesn't have to be
>>>> at the time the split happens), Domain 2 can be requested to provision a
>>>> second entity, and when it responds with an identifier of “7a87f27c1dd8”,
>>>> this can go into the mapping table as a new record associated with the
>>>> second entity's internal identifier.
>>>>
>>>> The mapping table now contains the following entries:
>>>>
>>>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>>>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>>>> | a99a5feba839       | D2                 | 7a87f27c1dd8       | true         |
>>>>
>>>> Domain 2 is not even aware that a split has happened, and the
>>>> provisioning that it does is not in lockstep with the split in identity
>>>> that occurred in Domain 1.
>>>>
>>>> (What is the “Primary flag” used for? We'll see when we cover the
>>>> treatment of false negatives.)
>>>>
>>>> b. False negatives:
>>>>
>>>> Domain 1 has the following information about what it thinks are two
>>>> distinct customers in its data store:
>>>>
>>>> Internal ID: 9caf78aac3d6
>>>> Attributes: {name: “JSmith”, dob: “01-Jan-1970”}
>>>>
>>>> Internal ID: 273d36e30d09
>>>> Attributes: {name: “JohnS”, dob: “01-Jan-1970”}
>>>>
>>>> When requesting the provisioning of these entities in Domain 2, the
>>>> following IDs are returned by Domain 2: ff487230b3a0 and 41206cc97c8b.
>>>>
>>>> Domain 1 then maintains the following in a mapping table and uses it
>>>> for translation when talking to Domain 2, taking care never to expose its
>>>> internal identifiers:
>>>>
>>>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>>>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>>>> | 273d36e30d09       | D2                 | 41206cc97c8b       | true         |
>>>>
>>>> When the false negative is discovered and the two entities are merged,
>>>> Domain 1 drops one of the internal identifiers and rationalises the name of
>>>> the customer (say, to “John Smith”). Let's say it retains the first ID
>>>> “9caf78aac3d6” and drops the second “273d36e30d09”.
>>>>
>>>> The mapping table now looks like this:
>>>>
>>>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>>>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>>>> | 9caf78aac3d6       | D2                 | 41206cc97c8b       | false        |
>>>>
>>>> Now two external identifiers map to the same internal one, so inbound
>>>> communication from Domain 2 can be unambiguously translated to the same
>>>> entity internally. However, when going outwards, Domain 1 will have to look
>>>> up the translation table to determine the “primary” external ID for this
>>>> entity in Domain 2, which was decided to be “ff487230b3a0”. That's where
>>>> the “Primary flag” comes in. The second external ID “41206cc97c8b” is never
>>>> used thereafter in outbound communication.
>>>>
>>>> At some stage (importantly, not in lockstep with the identity merge),
>>>> Domain 2 can be requested to delete the customer record identified by
>>>> “41206cc97c8b”, and the second entry in the mapping table can be removed
>>>> once this is acknowledged.
>>>>
>>>> This scheme will scale up to multiple domains, because the “External
>>>> Domain ID” column helps to keep track of which external ID is shared with
>>>> which Domain. (Why don't we use just one external ID for an entity and
>>>> share it with all external domains? Tight coupling again. Just as OAuth
>>>> allows an access token given to a third party to be invalidated without
>>>> affecting the access of other third parties, the use of separate external
>>>> identifiers for different domains allows fine-grained control of identity
>>>> federation.)
>>>>
>>>> The scheme also allows the splitting of an entity into more than two
>>>> entities, and the merging of more than two entities into a single one. (Any
>>>> organisation with a web-facing application will tell you how many John
>>>> Smiths there are who were born on 1 Jan 1970!)
>>>>
>>>> This is a fairly long-winded explanation, but this is why we need to
>>>> hide internal identifiers from other domains, and why mappings need to be
>>>> managed internally in each domain. Such a data model also allows us to
>>>> choose asynchronous protocols for propagation of identity events, since
>>>> there is no consistency requirement to update multiple domains concurrently.
>>>>
>>>> Regards,
>>>> Ganesh Prasad
>>>>
>>>> On 9 August 2012 04:55, Kelly Grizzle <kelly.grizzle@sailpoint.com>
>>>> wrote:
>>>>
>>>> Thanks for the feedback, Ganesh.  I read through this and your InfoQ
>>>> article (http://www.infoq.com/articles/scim-data-model-limitations)
>>>> and have some thoughts.
>>>>
>>>> > The rest of the protocol does not meaningfully use the enterprise
>>>> > client’s identifier, the "external ID" at all, even though it was
>>>> > ostensibly introduced to make things friendlier for the client.
>>>>
>>>> The usage pattern for an external ID would be to search for a user by
>>>> externalId and use the ID of the returned user in any desired operation.
>>>> For example:
>>>>
>>>> GET /Users?filter=externalId eq "bjensen"&attributes=id
>>>>
>>>> {
>>>>   "totalResults": 1,
>>>>   "Resources": [
>>>>     {
>>>>       "id": "2819c223-7f76-453a-919d-413861904646"
>>>>     }
>>>>   ]
>>>> }
>>>>
>>>> Retrieve the ID from the response and use it.
>>>>
>>>> DELETE /Users/2819c223-7f76-453a-919d-413861904646
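
The two-step pattern above (search by externalId, then operate on the returned id) can be sketched in Python with the standard library. The function names `find_id_path` and `delete_path` are hypothetical. The sketch only builds request paths and does no HTTP; it percent-encodes the filter, which the example above shows unencoded for readability, and it does not escape quotes inside the externalId value.

```python
from urllib.parse import quote, urlencode


def find_id_path(external_id):
    """Build the search path that looks a user up by externalId."""
    query = urlencode(
        {"filter": f'externalId eq "{external_id}"', "attributes": "id"},
        quote_via=quote,  # encode spaces as %20 rather than '+'
    )
    return f"/Users?{query}"


def delete_path(search_response):
    """Take the server id from a search response and build the DELETE path."""
    if search_response["totalResults"] != 1:
        raise LookupError("externalId did not match exactly one user")
    return "/Users/" + search_response["Resources"][0]["id"]
```

The extra round trip is visible here: `find_id_path` has to be requested and its response parsed before `delete_path` can be computed, unless the client stores the server's id.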
>>>>
>>>> This does introduce an additional HTTP request if the client chooses
>>>> not to store the server’s id.  An issue was created to consider allowing
>>>> operations to use the externalId
>>>> (http://code.google.com/p/scim/issues/detail?id=35), but I believe the
>>>> general consensus has been to not include this in the spec.  One main point
>>>> of contention is that much of the rest of the spec (e.g. group membership
>>>> references, manager references, etc.) requires knowledge of the server’s
>>>> identifier.  Continuing this discussion on the IETF list would be a good
>>>> thing, though.
>>>>
>>>> > the cloud provider's ID and the enterprise client's ID are both
>>>> > "Internal IDs" with respect to their domains
>>>>
>>>> I think this comes down to a nomenclature problem.  The server’s ID
>>>> does not necessarily have to be the unique identifier that the underlying
>>>> identity store uses, it just has to be stable and unique.  In many cases,
>>>> the underlying identity store will provide identifiers with these
>>>> properties already (e.g. a uuid) and it can be used by the SCIM interface.
>>>> The “externalId” is referring to the fact that the id is maintained
>>>> external to the SCIM server.  As long as the server’s identifiers are
>>>> stable and unique (which is mandated by the spec), I don’t see a problem.
>>>>
>>>> > The secret is that *every value needs a key*, and multi-valued
>>>> > attributes lack that. So our solution is quite
>>>> > simple - turn every list or array (of values) into a dictionary (of
>>>> > key-value pairs) by providing each value
>>>> > with a unique and meaning-free identifier.
>>>>
>>>> I agree that this would be useful, especially in the PATCH operation.
>>>> One reason that this wasn’t included in the spec originally is that it can
>>>> put an undue burden on the service provider.  Many service providers are
>>>> putting SCIM interfaces in front of their existing identity stores (e.g.,
>>>> directory servers, SaaS application databases, etc.).  Many of these do not
>>>> have a unique identifier for each value of a multi-valued attribute.  By
>>>> requiring this, a majority of the service providers would have to start
>>>> maintaining a unique key for each value of every multi-valued attribute.
>>>> I believe this would be a roadblock for many implementers.
>>>>
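To make the trade-off above concrete, here is a minimal sketch of the keyed representation, assuming the keys are UUID strings generated on the provider side. The function name `key_values` and the replacement address are made up for illustration and are not taken from the spec.

```python
import uuid


def key_values(values):
    """Turn a list of attribute values into a dict keyed by opaque UUID strings."""
    return {str(uuid.uuid4()): value for value in values}


addresses = ["john_smith@yahoo.com", "john.smith@gmail.com"]
emails = key_values(addresses)

# A client that knows a key can target exactly one value without
# resending or matching against the whole list.
some_key = next(iter(emails))
emails[some_key] = "john.smith@example.com"  # hypothetical new value
print(emails)

# The burden on the provider: keys are only useful if they are stable.
# Regenerating them on every read yields different keys each time, so a
# facade over a store without per-value identifiers has to persist them.
print(key_values(addresses).keys() == key_values(addresses).keys())
```

The last line shows why a provider fronting an existing directory cannot simply compute keys on the fly: it would have to store one key per value somewhere.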
>>>> > When the SCIM protocol uses PATCH, there are areas where it seems a
>>>> > bit clumsy.
>>>>
>>>> I like the thoughts here.  Your example reminds me of unified diffs (
>>>> http://en.wikipedia.org/wiki/Diff#Unified_format), which are commonly
>>>> used with a patch program (pretty much the equivalent of the PATCH verb).
>>>> However, the three proposals seem to largely hinge on being able to
>>>> uniquely address each element within an object.  Without such addressing,
>>>> it is not easy to target each patch sub-operation (REPLACE, INCLUDE, etc.)
>>>> or to provide a multi-status response.
>>>>
>>>> The 207 response would be interesting to consider for the bulk endpoint
>>>> (http://www.simplecloud.info/specs/draft-scim-api-00.html#bulk-resources),
>>>> however.
>>>>
>>>> > There are other, non-data aspects of SCIM which may require review,
>>>> > such as its synchronous request-response
>>>> > interaction model, which is a form of tight coupling and could prove
>>>> > to be a source of brittleness.
>>>>
>>>> I agree that we should explore optional asynchronous requests in 2.0.
>>>>
>>>> Thanks again for your thoughts.  I hope you stay involved in the
>>>> discussion as work on SCIM 2.0 goes forward.
>>>>
>>>> --Kelly
>>>>
>>>> *From:* scim-bounces@ietf.org [mailto:scim-bounces@ietf.org] *On
>>>> Behalf Of *Ganesh and Sashi Prasad
>>>> *Sent:* Wednesday, August 01, 2012 4:24 PM
>>>> *To:* scim@ietf.org
>>>> *Subject:* [scim] SCIM Protocol - 3 suggestions for improvement
>>>>
>>>> (I posted this on the SCIM Google Group, and I was advised to subscribe
>>>> to the mailing list and post it here instead, so here goes.)
>>>>
>>>> Hi,
>>>>
>>>> My name is Ganesh Prasad, and my experience in Identity and Access
>>>> Management is mainly through a 3-year project at an Australian insurance
>>>> company, an experience I have written about as an eBook on InfoQ (
>>>> http://www.infoq.com/minibooks/Identity-Management-Shoestring).
>>>>
>>>> I have been following the SCIM spec off and on, and based on my
>>>> experience with a loosely-coupled architecture that I found to be
>>>> successful, I have the following 3 suggestions to make.
>>>>
>>>> 1. The enterprise client and the cloud provider should maintain their
>>>> own internal IDs for a resource, which they should not reveal to each
>>>> other. Both of them should map their internal IDs to a shared External ID,
>>>> and this is the only ID that should be exposed through the API. The current
>>>> specification's provision of an id (which is the external ID and the only
>>>> one to be transferred through the API) and an "external ID" (which is the
>>>> client's internal ID and should be hidden) is diametrically opposite to
>>>> this.
>>>>
>>>> 2. When dealing with multi-valued attributes of a resource (expressed
>>>> as arrays in JSON), they must be converted from an array into a dictionary
>>>> with unique keys (UUIDs generated by the cloud provider when the attribute
>>>> is created). Without unique keys for every attribute value of a resource,
>>>> manipulating it will be clumsy and inelegant.
>>>>
>>>> 3. The PATCH command can be improved in 3 significant ways:
>>>>
>>>> 3a. Leverage the fact (from 2 above) that every value has a key, to
>>>> greatly simplify the API
>>>>
>>>> 3b. Use special verbs as nested operations of the PATCH command to add,
>>>> modify and delete attributes at any level
>>>>
>>>> 3c. Use the WebDAV status code of "207 Multi-Status" instead of "200
>>>> OK" as the response to a PATCH (or BULK) command.
>>>>
>>>> To elaborate,
>>>>
>>>> 1. Revealing private IDs externally is a form of tight coupling. A
>>>> major requirement with Identity Management is to split (or merge)
>>>> identities when false positives (or false negatives) are detected, i.e.,
>>>> when a resource is discovered to be more than one, or when multiple
>>>> resources are detected to be the same. If internal identifiers are revealed
>>>> to external domains, such clean-ups become difficult. Hence every domain
>>>> that wants to expose references to a resource must map its internal ID to
>>>> an external one created for this explicit purpose, and reveal only that.
>>>>
>>>> In the SCIM case, when an enterprise client POSTs a resource creation
>>>> request, the cloud provider must generate its own internal UUID as well as
>>>> an external UUID, map them together, and only return the external UUID in
>>>> the "Location:" header. The enterprise client should map this external UUID
>>>> to a newly-generated internal ID of its own. In case the resource already
>>>> has an identifier within the enterprise client's domain, then this is the
>>>> internal ID that must be mapped to the external UUID returned through the
>>>> POST response.
>>>>
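As a rough sketch of the mapping described above, each side could keep its own two-way table between its private ID and the shared external UUID. Everything here is an assumption made for illustration: the `IdMap` class, the function names, the "/Users/" path in the Location value and the "emp-0042" identifier are hypothetical, and no real SCIM request is made.

```python
import uuid


class IdMap:
    """Two-way map between one domain's private ID and the shared external UUID."""

    def __init__(self):
        self._to_external = {}
        self._to_internal = {}

    def link(self, internal_id, external_id):
        self._to_external[internal_id] = external_id
        self._to_internal[external_id] = internal_id

    def external(self, internal_id):
        return self._to_external[internal_id]

    def internal(self, external_id):
        return self._to_internal[external_id]


def provider_create(provider_map):
    """Provider side of a POST: mint both IDs, expose only the external one."""
    internal_id = str(uuid.uuid4())
    external_id = str(uuid.uuid4())
    provider_map.link(internal_id, external_id)
    return f"/Users/{external_id}"  # value of the Location header (assumed path)


def client_record(client_map, location, existing_internal_id=None):
    """Client side: map the returned external UUID to its own private ID."""
    external_id = location.rsplit("/", 1)[-1]
    # Reuse the client's existing identifier if it has one, else mint a new one.
    internal_id = existing_internal_id or str(uuid.uuid4())
    client_map.link(internal_id, external_id)
    return internal_id


provider, client = IdMap(), IdMap()
location = provider_create(provider)
employee = client_record(client, location, existing_internal_id="emp-0042")
shared = client.external(employee)
print(shared, provider.internal(shared) != employee)
```

Neither side ever sees the other's private ID, so a split or merge inside one domain only requires re-pointing entries in that domain's own table.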
>>>> 2. If a resource is to be created, and one of its attributes is
>>>> multi-valued, e.g.,
>>>>
>>>>     "email-addrs" :
>>>>     [
>>>>         "john_smith@yahoo.com",
>>>>         "john.smith@gmail.com",
>>>>         "jsmith1970@hotmail.com"
>>>>  <
>>>>
>>> _______________________________________________
>>> scim mailing list
>>> scim@ietf.org
>>> https://www.ietf.org/mailman/listinfo/scim
>>>
>>>
>