Re: [scim] SCIM Protocol - 3 suggestions for improvement

Phil Hunt <phil.hunt@oracle.com> Sat, 11 August 2012 17:06 UTC

From: Phil Hunt <phil.hunt@oracle.com>
Date: Sat, 11 Aug 2012 11:06:38 -0600
To: Ganesh and Sashi Prasad <g.c.prasad@gmail.com>
Cc: "scim@ietf.org" <scim@ietf.org>, Kelly Grizzle <kelly.grizzle@sailpoint.com>, Trey Drake <trey.drake@unboundid.com>, Emmanuel Dreux <edreux@cloudiway.com>
Subject: Re: [scim] SCIM Protocol - 3 suggestions for improvement

Yes. I object. ExternalId is optional. There is no need for this major breaking change. IMHO. 

Phil

On 2012-08-11, at 10:18, Ganesh and Sashi Prasad <g.c.prasad@gmail.com> wrote:

> > And that's because?  I don't see any value to the behavior (enabling attribute searches on non-existent schema) you suggest. 
> 
> I wasn't suggesting there was value in it. I was covering the possibility that the client could erroneously send URI parameters that are not valid attributes of the resource, and the only response to such requests is a "400 Bad Request".
> 
> >  My question is why do you believe an SP ought to care about "candidate keys"?   Resources are uniquely identified exactly 1 way by the SP and that is by the SP minted id.  The Consumer can retrieve a resource by any attribute though only the id is guaranteed to be unique from the POV of the SP.  The SCIM model is a *mapping* between the consumer and SP hence it is not a prescription for how the Resource data is actually stored on either side. 
> 
> I think we may actually be in violent agreement here! Your description exactly matches my recommendation of a data model that is loosely-coupled between client and SP. The SP should not care about "candidate keys", of course. But if one of the attributes of the resource happens to be a client-internal key, then the _client_ knows that it is a candidate key, and that any search based on the candidate key will return at most one URI with the "SP minted ID".
> 
> This is what I'd like to see in the SCIM spec. There is only one key exposed _as a key_ between client and SP, and that is the "SP minted ID". There is no _special_ attribute in the API called an "external ID". If the client wants to embed its internal identifier as an attribute within the resource, it is free to do so. The SP will treat it as just another attribute and will not even bother to impose a uniqueness constraint on it.
> 
> > If externalId were dropped it wouldn't bother me a bit.  
> 
> Good. Does anyone have an actual _objection_ to dropping the "externalID" field? As I said above, it doesn't prevent a client from sending such an attribute as an ordinary attribute and doing searches on it. Keeping the SP ignorant of its status as a key within the client decouples the two parties to some extent, so dropping the specially-named "externalID" is desirable.
> 
> Regards,
> Ganesh
> 
> On 12 August 2012 01:59, Trey Drake <trey.drake@unboundid.com> wrote:
> On Aug 10, 2012, at 5:38 PM, Ganesh and Sashi Prasad <g.c.prasad@gmail.com> wrote:
> 
>> Trey,
>> 
>> > The externalID is not part of the protocol.  It is an *optional* attribute within the *schema* specification.
>> 
>> I didn't realise SCIM was also specifying a User schema. Does this mean the User resource can only hold certain attributes as defined by this schema? If so, that's going to be a severe constraint, because client organisations are likely to require many specialised User attributes depending on the nature of their business, and they will expect to be able to store all of them, not just the subset defined by the SCIM spec. I hope the SCIM User schema is not trying to be just a representation of inetOrgPerson (the standard LDAP schema), because that could be severely limiting. Most organisations with their own LDAP end up extending inetOrgPerson, and they will expect that they can do the same in the cloud.
>> 
> I recommend you read the relevant specifications. The working group has accepted both a schema and protocol draft as input to the effort. 
> 
>> > As for #2, the protocol spec works as you describe if "arbitrary URI parameters" equate to resource attributes 
>> 
>> Yes, that's what I meant, but in implementation, it may work out to be looser than that. The client should be able to specify any arbitrary attribute in the search parameters, even those that aren't attributes of the resource. If an attribute is not defined, or if the attribute exists but the value doesn't match any stored record, the SP will return no results. In the former case, it's a "400 Bad Request" and in the latter, it's a "404 Not Found".
> 
> And that's because?  I don't see any value to the behavior (enabling attribute searches on non-existent schema) you suggest. 
> 
>> 
>> By "candidate key" (a data modelling term), I meant another attribute that also uniquely identifies a single record, but is not the official "primary key", i.e., the main ID. In this case, a client's internal identifier also uniquely identifies a record, but is not the ID used in the URI of RESTful operations.
> 
> I know what a candidate key is.  My question is why do you believe an SP ought to care about "candidate keys"?   Resources are uniquely identified exactly 1 way by the SP and that is by the SP minted id.  The Consumer can retrieve a resource by any attribute though only the id is guaranteed to be unique from the POV of the SP.  The SCIM model is a *mapping* between the consumer and SP hence it is not a prescription for how the Resource data is actually stored on either side.
>   
>> 
>> Look, this may seem to be splitting hairs since a determined client may be able to store and search by their own internal ID in either case (the current SCIM spec or my suggestion). The difference is philosophical. If the spec is showing how clients can store their internal IDs in the cloud by explicitly providing for such an attribute, that constitutes bad advice, IMO. If it turns a blind eye to what they store and they store internal IDs anyway, they're constraining themselves but the spec comes out smelling of roses because that's not "recommended practice".
>> 
> 
> If externalId were dropped it wouldn't bother me a bit.  
> 
>> Ganesh
>> 
>> On 11 August 2012 00:50, Trey Drake <trey.drake@unboundid.com> wrote:
>> Ganesh,
>> 
>> I'll base my comments on your latest reply (below).  
>> 
>> The externalID is not part of the protocol.  It is an *optional* attribute within the *schema* specification.  As for #2, the protocol spec works as you describe if "arbitrary URI parameters" equate to resource attributes (Allow generic search using 'GET /Users' and arbitrary URI parameters).  Please clarify your suggestion.
>> 
>> I'm not tracking your coupling concern.  The client can search and hence retrieve resources on any attribute it chooses, externalId or otherwise.  Nothing mandates use of externalId.   
>> 
>> 
>> What do you mean by "candidate key"?  Given 
>> 
>> On Fri, Aug 10, 2012 at 5:49 AM, Ganesh and Sashi Prasad <g.c.prasad@gmail.com> wrote:
>> >  I think SCIM gets its current simplicity from its single-owner hub-and-spoke model implementing tight coupling. [...] IMHO loose coupling is a much more complex solution.
>> 
>> Phil,
>> 
>> I'm a bit surprised that you're implying "tight coupling == simple" and "loose coupling == complex". That's contrary to my experience.
>> 
>> When I say "loose coupling", I mean "no unnecessary dependencies". Invariably, a reduction in dependencies leads to greater simplicity.
>> 
>> Let's not confuse reduction of dependencies in the data model with a hub-and-spokes architecture. They're entirely orthogonal aspects of the solution.
>> 
>> All that my suggestion involves is,
>> 
>> 1. Take 'external ID' out of the protocol.
>> 2. Allow generic search using 'GET /Users' and arbitrary URI parameters
>> 
>> No planned functionality is lost by this.
>> 
>> 1. The client enterprise can still send its internal ID as part of the resource body, inside some attribute defined by them (but not defined by the protocol). Let's say they call it 'myID'.
>> 2. The client enterprise can search for resource URIs using any attribute, including this internal ID
>> 'GET /Users?myID=bjensen'
>> Since myID is a candidate key, the server will return exactly one URI, which is the canonical URI for the resource
>> https://example.com/v1/Users/2819c223-7f76-453a-919d-413861904646
>> 3. The client can use this URI to perform all other operations as usual.
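
The lookup in steps 1 to 3 above can be sketched roughly as follows. This is a minimal illustration in Python, not anything taken from the SCIM drafts: the in-memory `USERS` store, the `search_users` helper, the `BASE_URL` constant, and the sample records are all hypothetical, and `myID` is simply the client-defined attribute from the example. Note that the server side applies no uniqueness rule to `myID`; only the client knows that at most one URI will come back.

```python
from typing import Dict, List

# Hypothetical base URL and in-memory store standing in for a service provider.
BASE_URL = "https://example.com/v1/Users"
USERS: Dict[str, Dict[str, str]] = {
    # SP-minted id -> resource attributes (made-up sample data)
    "2819c223-7f76-453a-919d-413861904646": {"userName": "Barbara Jensen", "myID": "bjensen"},
    "00000000-0000-4000-8000-000000000001": {"userName": "John Smith", "myID": "jsmith"},
}


def search_users(params: Dict[str, str]) -> List[str]:
    """Return the canonical URI of every user matching all query parameters."""
    return [
        f"{BASE_URL}/{user_id}"
        for user_id, attrs in USERS.items()
        if all(attrs.get(name) == value for name, value in params.items())
    ]


# Rough equivalent of 'GET /Users?myID=bjensen'
uris = search_users({"myID": "bjensen"})
print(uris)
```

The client would then use the single returned URI for all further operations on the resource.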
>> 
>> So taking 'externalID' out of the protocol spec only does this:
>> 1. It avoids enshrining tight coupling in the protocol (If clients want to tightly couple themselves to the cloud provider by sending their internal IDs, they can do so. Suicide is OK, but the protocol should not be guilty of assisted suicide. ;-)
>> 2. It encourages loose coupling by nudging clients towards maintaining their own internal-to-external identifier mappings.
>> 
>> That's what I'd like to see. I don't believe this complicates the protocol. It simplifies it and it also lends itself to a loosely-coupled approach.
>> 
>> I'll address the multi-valued attribute suggestion separately.
>> 
>> Regards,
>> Ganesh
>> 
>> 
>> 
>> 
>> On 10 August 2012 07:53, Phil Hunt <phil.hunt@oracle.com> wrote:
>> 
>> 
>> Phil
>> 
>> On 2012-08-09, at 14:14, Ganesh and Sashi Prasad <g.c.prasad@gmail.com> wrote:
>> 
>>> >  storing this information in a mapping table outside of the SCIM spec is a great way to enable this solution.  Part of the key here is that SCIM is just a piece of the architecture for this solution, and is only responsible for the transport layer between domains. 
>>> 
>>> I wasn't suggesting that the mapping table be part of the SCIM spec. I provided that example to illustrate that splitting and merging identities is a common requirement, and that decoupling local identifiers within a domain from shared identifiers between domains was the best way to facilitate it.
>>> 
>>> I'm suggesting that the spec do less, not more.
>>> 
>>> What the SCIM spec needs to do there is just refrain from introducing tight coupling. I would like to see a single identifier exposed through the API, with the implication (and perhaps the recommendation) that it be the shared one. Allowing one domain to expose its internal identifier to the other creates tight coupling and ensures that both domains need to split or merge identities simultaneously, which is not desirable. So I recommend _taking out_ the "external id" field from the API. The spec shouldn't encourage tight coupling. If clients want to pass in their internal ids as part of the resource body, no one can stop them, and they can always do a search on that attribute to retrieve the URI exactly as you visualise they will with the "external id", but let's not elevate an anti-pattern to a recommendation by enshrining the "external id" as an acceptable attribute.
>>> 
>>> Am I making sense?
>> 
>> I see what you are saying. I think SCIM gets its current simplicity from its single-owner hub-and-spoke model implementing tight coupling. 
>> 
>> IMHO loose coupling is a much more complex solution. The reality is that each end-point has value to contribute and thus the single-owner model will eventually need to become multi-owner or multi-hub. 
>> 
>> That said, I think the current model provides a practical starting point. 
>>> 
>>> >  Regarding unique identifiers for multi-valued attributes there is a trade-off involved.  On one hand this makes PATCH semantics easier.  On the other hand it puts extra burden on service providers. 
>>> 
>>> Precisely. The spec has to strike the right balance. It would be interesting to hear from the other members of the spec mailing list. You know where I stand on this. It would be good to hear the spectrum of opinions.
>>> 
>>> Regards,
>>> Ganesh
>>> 
>>> On 10 August 2012 00:28, Kelly Grizzle <kelly.grizzle@sailpoint.com> wrote:
>>> Thanks Emmanuel.  I had started writing up a similar response.  As you suggest, storing this information in a mapping table outside of the SCIM spec is a great way to enable this solution.  Part of the key here is that SCIM is just a piece of the architecture for this solution, and is only responsible for the transport layer between domains.
>>> 
>>>  
>>> 
>>> You could also model these ID mappings in the SCIM user as an extension but would probably not want to expose these externally.  Here is an example of how to model the end state of the false positive scenario (splitting a user):
>>> 
>>>  
>>> 
>>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>>> | a99a5feba839       | D2                 | 7a87f27c1dd8       | true         |
>>> 
>>>  
>>> 
>>> This could be represented as two SCIM users that contain information about the entities on other domains.
>>> 
>>>  
>>> 
>>> {
>>>   "schemas": ["urn:scim:schemas:core:1.0", "urn:scim:schemas:extension:federation:1.0"],
>>>   "id": "9caf78aac3d6",
>>>   "userName": "John Smith",
>>>   "urn:scim:schemas:extension:federation:1.0": {
>>>     "linkedUsers": [
>>>       {
>>>         "domain": "D2",
>>>         "externalEntityId": "ff487230b3a0"
>>>       }
>>>     ]
>>>   }
>>> }
>>> 
>>>  
>>> 
>>> {
>>>   "schemas": ["urn:scim:schemas:core:1.0", "urn:scim:schemas:extension:federation:1.0"],
>>>   "id": "a99a5feba839",
>>>   "userName": "John Smith",
>>>   "urn:scim:schemas:extension:federation:1.0": {
>>>     "linkedUsers": [
>>>       {
>>>         "domain": "D2",
>>>         "externalEntityId": "7a87f27c1dd8"
>>>       }
>>>     ]
>>>   }
>>> }
>>> 
>>>  
>>> 
>>> In the second user, the linkedUsers attribute would be empty until the split user was synced to domain 2.
>>> 
>>>  
>>> 
>>>  
>>> 
>>> Similarly, the false negative use case (merging two users) looked like this at the end:
>>> 
>>>  
>>> 
>>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>>> | 9caf78aac3d6       | D2                 | 41206cc97c8b       | false        |
>>> 
>>>  
>>> 
>>> This could be represented with the following SCIM user:
>>> 
>>>  
>>> 
>>> {
>>>   "schemas": ["urn:scim:schemas:core:1.0", "urn:scim:schemas:extension:federation:1.0"],
>>>   "id": "9caf78aac3d6",
>>>   "userName": "John Smith",
>>>   "urn:scim:schemas:extension:federation:1.0": {
>>>     "linkedUsers": [
>>>       {
>>>         "domain": "D2",
>>>         "externalEntityId": "ff487230b3a0"
>>>       },
>>>       {
>>>         "domain": "D2",
>>>         "externalEntityId": "41206cc97c8b",
>>>         "deletionRequired": true
>>>       }
>>>     ]
>>>   }
>>> }
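
One way to derive such a SCIM user from the mapping-table rows is sketched below in Python. It is only an illustration under stated assumptions: the `Mapping` tuple and `to_scim_user` helper are hypothetical names, the federation extension URN is the one used in the examples above, and the rule that a non-primary row gets `"deletionRequired": true` is inferred from the merge example rather than defined anywhere in the thread.

```python
import json
from typing import Dict, List, NamedTuple

CORE = "urn:scim:schemas:core:1.0"
FEDERATION = "urn:scim:schemas:extension:federation:1.0"


class Mapping(NamedTuple):
    """One row of the mapping table (hypothetical in-memory form)."""
    internal_id: str
    external_domain: str
    external_id: str
    primary: bool


def to_scim_user(internal_id: str, user_name: str, rows: List[Mapping]) -> Dict:
    """Build a SCIM user whose extension lists the linked external entities."""
    linked = []
    for row in rows:
        if row.internal_id != internal_id:
            continue  # row belongs to a different internal entity
        entry = {"domain": row.external_domain, "externalEntityId": row.external_id}
        if not row.primary:
            # Assumption: a non-primary link is the one to be dropped after a merge.
            entry["deletionRequired"] = True
        linked.append(entry)
    return {
        "schemas": [CORE, FEDERATION],
        "id": internal_id,
        "userName": user_name,
        FEDERATION: {"linkedUsers": linked},
    }


# Rows from the false-negative (merge) table above.
rows = [
    Mapping("9caf78aac3d6", "D2", "ff487230b3a0", True),
    Mapping("9caf78aac3d6", "D2", "41206cc97c8b", False),
]
user = to_scim_user("9caf78aac3d6", "John Smith", rows)
print(json.dumps(user, indent=2))
```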
>>> 
>>>  
>>> 
>>>  
>>> 
>>> Regarding unique identifiers for multi-valued attributes there is a trade-off involved.  On one hand this makes PATCH semantics easier.  On the other hand it puts extra burden on service providers.  Since the inception of SCIM, a key goal has been to foster adoption by service providers by making things fit easily onto existing systems.  IMO the value gained by unique identifiers for multi-valued attributes is not worth the demands put on a service provider.  I also think that vendors that have a non-SCIM-compliant API will choose to keep things that way if the spec is too hard for them to implement.  In a green field environment we do have the luxury of mandating a model to make certain operations more elegant.  However, we can’t ignore legacy systems.
>>> 
>>>  
>>> 
>>> --Kelly
>>> 
>>>  
>>> 
>>> From: scim-bounces@ietf.org [mailto:scim-bounces@ietf.org] On Behalf Of Emmanuel Dreux
>>> Sent: Thursday, August 09, 2012 3:18 AM
>>> To: Ganesh and Sashi Prasad; Kelly Grizzle
>>> Cc: scim@ietf.org
>>> Subject: Re: [scim] SCIM Protocol - 3 suggestions for improvement
>>> 
>>>  
>>> 
>>> Hi Ganesh,
>>> 
>>>  
>>> 
>>> Nothing prevents your SCIM implementation (client or server) from generating a unique identifier for each synchronized object and maintaining an internal mapping table (you would have to map group membership as well).
>>> 
>>>  
>>> 
>>> This is what we are doing with Active Directory sources or targets:
>>> 
>>> We didn’t find an immutable unique ID in AD systems: DN, samAccountName, and UPN are all subject to change, and even objectGuid can change if an AD domain is migrated. So we decided to generate and maintain an internal table of IDs. This fits your requirements, as it hides internal IDs.
>>> 
>>>  
>>> 
>>> This was written in .NET, and we have started a project to rewrite our SCIM stack in PHP, which we will give to the open source community. This implementation will have a parameter: AllocateIds versus UseExistingIDs.
>>> 
>>> This will give the choice of “hiding” internal IDs or using them as the unique ID.
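
The two modes can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the thread does not describe how the real implementation works, the `IdMapper` class and its methods are hypothetical, and only the mode names AllocateIds and UseExistingIDs come from the message above.

```python
import uuid
from typing import Dict


class IdMapper:
    """Maps source-system identifiers to the IDs exposed over SCIM."""

    def __init__(self, allocate_ids: bool = True) -> None:
        # True  ~ "AllocateIds": generate and remember an opaque ID per object.
        # False ~ "UseExistingIDs": expose the source identifier unchanged.
        self.allocate_ids = allocate_ids
        self._by_source: Dict[str, str] = {}

    def exposed_id(self, source_id: str) -> str:
        if not self.allocate_ids:
            return source_id
        if source_id not in self._by_source:
            self._by_source[source_id] = uuid.uuid4().hex
        return self._by_source[source_id]

    def rekey(self, old_source_id: str, new_source_id: str) -> None:
        """Record that the source identifier changed, keeping the exposed ID stable."""
        if old_source_id in self._by_source:
            self._by_source[new_source_id] = self._by_source.pop(old_source_id)


mapper = IdMapper(allocate_ids=True)
first = mapper.exposed_id("objectGuid-1")    # made-up source identifier
mapper.rekey("objectGuid-1", "objectGuid-2")  # e.g. after a domain migration
second = mapper.exposed_id("objectGuid-2")
```

With allocation enabled, the exposed ID survives a change of the source identifier; with it disabled, the source identifier itself is what other domains see.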
>>> 
>>>  
>>> 
>>> You can also implement such a feature without violating the SCIM specs and without asking for it to be included in the specs.
>>> 
>>>  
>>> 
>>> --
>>> 
>>> Regards,
>>> 
>>> Emmanuel Dreux
>>> 
>>> http://www.cloudiway.com
>>> 
>>> Tel: +33 4 26 78 17 58
>>> 
>>> Mobile: +33 6 47 81 26 70
>>> 
>>> skype: Emmanuel.Dreux
>>> 
>>>  
>>> 
>>> From: Ganesh and Sashi Prasad [mailto:g.c.prasad@gmail.com] 
>>> Sent: Thursday, 9 August 2012 03:35
>>> To: Kelly Grizzle
>>> Cc: scim@ietf.org
>>> Subject: Re: [scim] SCIM Protocol - 3 suggestions for improvement
>>> 
>>>  
>>> 
>>> Hi Kelly,
>>> Thanks for your response. Let me first respond in brief to the two main points you have made, and then elaborate on the first.
>>> 1. Why should domains not expose their internal identifiers to other domains?
>>> a. We are designing a protocol for a federated system of domains, where all domains are co-equal peers. (In physics too, N-body problems are much harder than 2-body problems. :-) Therefore, assuming that there are only two players in the interaction makes this tightly coupled in a number of ways. We should rely on messaging and notification, with encapsulation of domain-specific data.
>>> b. In any non-trivial data store, there will always be the ongoing need to merge and split identities as and when “false negatives” and “false positives” are discovered. A domain should be able to handle this internal housekeeping freely, only notifying other domains when convenient. Mapping of internal identifiers to external ones and maintaining this mapping internally allows this loosely-coupled housekeeping to take place. Sharing internal identifiers (or otherwise outsourcing the mapping of internal to external identifiers) forces housekeeping activities to be done in lock-step across domains.
>>> c. Asynchronous interaction is not just a matter of a suitable wire protocol which can be designed later. The data model plays a crucial role in enabling or constraining such interaction. A tightly-coupled data model will force the use of synchronous interactions, and the exposure of internal identifiers is a key part of this tight coupling.
>>> 2. The difficulty of assigning unique identifiers to the individual values of multi-valued attributes:
>>> a. I'm not belittling the effort involved in migrating legacy data stores to such a model. However, in the larger historical context of cross-domain identity management, we are really at the very early stages. If a relatively new discipline and a brand new spec are held captive to legacy considerations, we are losing an opportunity to provide a clean and elegant model to subsequent users of the spec, and this will have repercussions over many years or even decades.
>>> b. If incumbent cloud providers find it hard to immediately adopt the dictionary model for existing multi-valued attributes, they can transition to this model by offering both “SCIM-compliant” and “non-SCIM-compliant” APIs to their customers and encouraging new customers to adopt the “SCIM-compliant” API. Legacy customers can be supported using a “non-SCIM-compliant” API for an arbitrarily long period and gradually migrated to the SCIM-compliant API. The logistics are not insurmountable, and shouldn't prevent the adoption of a dictionary model for multi-valued attributes.
>>> Elaboration of Point 1:
>>> When we consider federated identity across more than one domain, we have to assume that domains are not necessarily master-slave in their interaction. The most generic interaction model is peer-to-peer, where entity lifecycle events within a domain are notified to other domains (when necessary) in an asynchronous manner (i.e., through messaging) and the other domains are free to respond to these events in an appropriate manner and at a time of their convenience.
>>> A key set of lifecycle events for an entity is the merging and splitting of identity that is often required.
>>> The question “Is this one entity?” can be answered either yes (positive) or no (negative). But sometimes, we can discover false positives and false negatives in our data stores.
>>> Consider a case where customers sign up online, and two customers who are privacy-conscious enter fake IDs such as “John Smith”, and also use the same date of birth (say, 1 Jan 1970) or similar attributes. The front-end application may make an intelligent (but incorrect) guess that these two persons are the same, and re-assign the same identifier to the second person. This is a false positive. They appear to be the same entity, but they're actually different. When the error is discovered, the identities will need to be split, with a new identifier generated for one of them.
>>> Consider the opposite case where a customer signs up through two different portals or in two different sessions, using the names “JSmith” and “JohnS”. It is very likely that they will be treated as two different customers and assigned two unique identifiers. This is a false negative. They appear to be two entities, but are actually the same. At a later stage, when the error is discovered, the identities will have to be merged, and one of the identifiers will have to be dropped.
>>> These are not theoretical use cases. They form a significant proportion of the user base in most large Web-facing applications. Let's see how these can be managed in a federated way by mapping internal identifiers to external ones and only exposing external identifiers to other domains.
>>> a. False positives:
>>> Domain 1 has the following information about a customer in its data store:
>>> Internal ID: 9caf78aac3d6
>>> Attributes: {name: “John Smith”, dob: “01-Jan-1970”}
>>> When requesting the provisioning of this entity in Domain 2, the following ID is returned by Domain 2: ff487230b3a0.
>>> Domain 1 then maintains the following in a mapping table and uses it for translation when talking to Domain 2, taking care never to expose its internal identifier:
>>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>>> | 9caf78aac3d6 | D2 | ff487230b3a0 | true |
>>> When the false positive is discovered and the entity is split, Domain 1 creates a new internal identifier and now has the following entity information.
>>> Internal ID: 9caf78aac3d6
>>> Attributes: {name: “John Smith”, dob: “01-Jan-1970”}
>>> Internal ID: a99a5feba839