Re: [scim] scim Digest, Vol 8, Issue 24

> In your example you are conflating value with an attribute id.

I don't believe so.

I'm adopting a model where every attribute of the resource is a key-value
pair. The key is a name or ID.

For non-repeating attributes (both simple and composite), the key is the
attribute name itself.

Simple attribute:

Key: "dob"
Value: "01 Jan 1970"

For composite attributes, the key employs dot notation to specify the
fully-qualified attribute name, e.g., "address.postcode".

Composite attribute:

Key: "address.street-number"
Value: "10"

Key: "address.suburb"
Value: "East Camden"

For repeating (multi-valued) attributes, I'm suggesting that there be new
keys for each individual value, otherwise they are impossible to
distinguish, and a positional index is inadequate. So we convert the array
into a dictionary and this then becomes a composite attribute using dot
notation for the key.

Multi-valued attribute:

Key: "emails.7dfcb444-74d8-4f17-aa66-daf9ea3bd902"
Value: "john_smith@yahoo.com"

So this allows us to apply uniform treatment to any arbitrarily deep
resource structure. We can refer to every leaf value with a key that is the
fully-qualified name using dot notation.
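
To make the model concrete, here is a minimal sketch in Python of how a nested resource could be flattened into fully-qualified keys. The function name flatten and the sample resource are my own illustration and are not part of any spec; generating a UUID per value of a multi-valued attribute is the assumption this proposal rests on.

```python
import uuid

def flatten(resource, prefix=""):
    """Flatten a nested resource into {fully-qualified key: leaf value}."""
    flat = {}
    for name, value in resource.items():
        key = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            # Composite attribute: recurse, extending the dotted key.
            flat.update(flatten(value, key))
        elif isinstance(value, list):
            # Multi-valued attribute: each value gets its own generated key
            # (a UUID here, standing in for a provider-generated ID).
            for item in value:
                item_key = f"{key}.{uuid.uuid4()}"
                if isinstance(item, dict):
                    flat.update(flatten(item, item_key))
                else:
                    flat[item_key] = item
        else:
            # Simple attribute: the key is the attribute name itself.
            flat[key] = value
    return flat

# Hypothetical resource mirroring the examples in this message.
user = {
    "dob": "01 Jan 1970",
    "address": {"street-number": "10", "suburb": "East Camden"},
    "emails": ["john_smith@yahoo.com"],
}
flat = flatten(user)
```

Every leaf in flat is now addressable by one dotted key, whatever the depth of the original structure.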

The verbs are just unambiguous operations on these (now) explicitly
addressable attributes.

INCLUDE to a collection and specify only the value. The key is generated
and returned. The fully-qualified key is
<collection-name>.<newly-generated-ID> and the value is what was specified
in the INCLUDE.

REPLACE a fully-qualified key with a new value. If the key doesn't exist,
return a "404 Not Found".

PLACE a value at the logical location implied by the fully-qualified key.
If there is already a key with that name, return a "409 Conflict".

FORCE the fully-qualified key to hold the given value, regardless of
whether it existed before or not. The only possible errors are "400 Bad
Request" and "500 Internal Server Error".

RETIRE an attribute or a collection given its fully-qualified key. The
implementation will determine whether the attribute will disappear entirely
or will exist holding a null value (the blank string "" or the empty object
{} ).
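
The five verbs can be sketched against a flat key-value store as below. This is an illustration under assumptions, not a reference implementation: the class name AttributeStore, the success codes (200 and 201), the 404 from RETIRE on a missing key, and the choice to make a retired attribute disappear entirely are all mine. Only the 404 for REPLACE and the 409 for PLACE come from the descriptions above.

```python
import uuid

class AttributeStore:
    """Flat key-value view of one resource; keys are fully-qualified names."""

    def __init__(self):
        self.data = {}

    def _exists(self, key):
        # A key exists if it is a leaf or the name of a collection/composite.
        return key in self.data or any(k.startswith(key + ".") for k in self.data)

    def include(self, collection, value):
        # INCLUDE: the caller supplies only the value; the key is generated.
        key = f"{collection}.{uuid.uuid4()}"
        self.data[key] = value
        return 201, key

    def replace(self, key, value):
        # REPLACE: the key must already exist.
        if key not in self.data:
            return 404, None
        self.data[key] = value
        return 200, key

    def place(self, key, value):
        # PLACE: the key must not exist yet.
        if self._exists(key):
            return 409, None
        self.data[key] = value
        return 201, key

    def force(self, key, value):
        # FORCE: set the value whether or not the key existed.
        self.data[key] = value
        return 200, key

    def retire(self, key):
        # RETIRE: remove a single attribute or a whole collection.
        doomed = [k for k in self.data if k == key or k.startswith(key + ".")]
        if not doomed:
            return 404, None
        for k in doomed:
            del self.data[k]
        return 200, key
```

Note that retiring "emails" and retiring "emails.<id>" go through the same code path, which is the uniformity I'm arguing for.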

I'll explain in a separate post why we need operation verbs like these that
are independent of the HTTP verbs.

Regards,
Ganesh

On 11 August 2012 10:38, <scim-request@ietf.org> wrote:

> Today's Topics:
>
>    1. Re: SCIM Protocol - 3 suggestions for improvement (Phil Hunt)
>
>
> ---------- Forwarded message ----------
> From: Phil Hunt <phil.hunt@oracle.com>
> To: Ganesh and Sashi Prasad <g.c.prasad@gmail.com>
> Cc: "Diodati, Mark" <Mark.Diodati@gartner.com>, Emmanuel Dreux <
> edreux@cloudiway.com>, Trey Drake <trey.drake@unboundid.com>, Kelly
> Grizzle <kelly.grizzle@sailpoint.com>, "scim@ietf.org" <scim@ietf.org>
> Date: Fri, 10 Aug 2012 17:36:54 -0700
> Subject: Re: [scim] SCIM Protocol - 3 suggestions for improvement
> Ganesh,
>
> In your example you are conflating value with an attribute id. I find that
> confusing.
>
> I agree though that operations in patch could be a lot more explicit.
>
> Eg explicitly deleting a value by saying delete or retire.
>
> Phil
>
> On 2012-08-10, at 16:19, Ganesh and Sashi Prasad <g.c.prasad@gmail.com>
> wrote:
>
>  >  I am concerned about your second suggestion:
>
> Let's discuss that now.
>
> The trade-offs are very clear here.
>
> Pros:
>
> Pro 1. The API to manipulate resources becomes so much cleaner, consistent
> and intuitive when every individual attribute value gets its own ID.
>
> Here's how to delete a single member from a Group, as per the current spec:
>
>    PATCH /Groups/acbf3ae7-8463-4692-b4fd-9b4da3f908ce
>    Host: example.com
>    Accept: application/json
>    Authorization: Bearer h480djs93hd8
>    ETag: W/"a330bc54f0671c9"
>
>    {
>      "schemas": ["urn:scim:schemas:core:1.0"],
>      "members": [
>        {
>          "display": "Babs Jensen",
>          "value": "2819c223-7f76-453a-919d-413861904646",
>          "operation": "delete"
>        }
>      ]
>    }
>
>
> Here's how to delete ALL members from a group according to the current
> spec:
>
>    PATCH /Groups/acbf3ae7-8463-4692-b4fd-9b4da3f908ce
>    Host: example.com
>    Accept: application/json
>    Authorization: Bearer h480djs93hd8
>    ETag: W/"a330bc54f0671c9"
>
>    {
>      "schemas": ["urn:scim:schemas:core:1.0"],
>      "meta": {
>        "attributes": [
>          "members"
>        ]
>      }
>    }
>
>
> The two operations differ significantly, and it's not very intuitive.
> With my suggestion, here's how to delete a single member from a group:
>
>    PATCH /Groups/acbf3ae7-8463-4692-b4fd-9b4da3f908ce
>    Host: example.com
>    Accept: application/json
>    Authorization: Bearer h480djs93hd8
>    ETag: W/"a330bc54f0671c9"
>
>    {
>      "operations": [
>        {
>          "RETIRE": {
>            "key": "members.2819c223-7f76-453a-919d-413861904646"
>          }
>        }
>      ]
>    }
>
> Here's how I suggest deleting ALL members from a group:
>
>    PATCH /Groups/acbf3ae7-8463-4692-b4fd-9b4da3f908ce
>    Host: example.com
>    Accept: application/json
>    Authorization: Bearer h480djs93hd8
>    ETag: W/"a330bc54f0671c9"
>
>    {
>      "operations": [
>        {
>          "RETIRE": {
>            "key": "members"
>          }
>        }
>      ]
>    }
>
> I'm sure you'll agree that this is simpler, more consistent and more
> intuitive to a reader.
>
> Pro 2: We can apply this mechanism consistently to three areas:
> (a) Manipulating multi-valued attributes of a resource
> (b) Manipulating members of a group
> (c) Performing bulk operations, where we simply use HTTP verbs instead of
> the specialised (and semantically less ambiguous) verbs I suggested for
> attributes, the "key" becomes the URI, and the "value" becomes the
> corresponding JSON object.
>
> All of them return "207 Multi-Status" with the "results" array holding
> individual status codes.
>
> In the current spec, (a) and (b) are done similarly but (c) is very
> different.
>
> Pro 3: Adoption of the standard by clients is likely to be higher because
> it's simpler for them.
>
> Pro 4: New (not incumbent) cloud providers will probably find this easier
> to implement because they have no legacy. They will probably use some form
> of NoSQL database and won't be constrained by the limitations of LDAP
> directories.
>
> Cons:
>
> Con 1: Incumbent cloud providers with existing data stores in a directory
> format (where multi-valued attributes are stored as comma-separated values
> under a single attribute node) will find it difficult to migrate to this
> model and store each attribute value as a sub-node with its own key. This
> will "hinder adoption of the spec", which is what you fear.
>
> Have I summed up the Pros and Cons correctly? I'm biased of course, so I
> could have missed a Con or hyped a Pro :-).
>
> In other words, we're debating interface complexity (current spec) versus
> implementation complexity (my suggestion). Both can hinder adoption of the
> spec by different parties.
>
> Here's what we need to discuss - Do the Pros make the suggestion worth
> adopting in spite of the Cons, or are the Cons so great that it's best to
> leave the spec as it is?
>
> Keep in mind that a complex spec that only favours incumbent cloud
> providers can cut both ways. It opens the door to a simpler interface
> offered by a new generation of nimbler SPs that don't have the same legacy
> issues, and there could be an exodus of clients to these new SPs. SCIM
> could end up being obsoleted very soon, because the API interface is very
> complex and clumsy, as any new reader can attest. I was taken aback by the
> complexity when I saw it, which is why I was prompted to suggest something
> simpler.
>
> This is an issue where we need the opinions of many people, and they need
> to state their affiliations. If most people weighing in belong to incumbent
> SPs and they vote in favour of interface complexity to avoid implementation
> complexity, then it means the spec is not doing a good job of balancing the
> interests of various groups. I think we should also poll client
> organisations to see what they would want.
>
> (Gartner is trusted by enterprise clients to evaluate the capabilities of
> vendors (SPs). I believe Gartner should take the lead in representing
> client interests in this working group rather than those of incumbent
> vendors, which is what it seems like to me. Correct me if I'm being unfair.)
>
> Regards,
> Ganesh
>
>
>
> On 11 August 2012 01:35, Diodati,Mark <Mark.Diodati@gartner.com> wrote:
>
>>  Hi Ganesh,
>>
>>
>>
>> I am concerned about your second suggestion:
>>
>> “2. When dealing with multi-valued attributes of a resource (expressed as
>> arrays in JSON), they must be converted from an array into a dictionary
>> with unique keys (UUIDs generated by the cloud provider when the attribute
>> is created). Without unique keys for every attribute value of a resource,
>> manipulating it will be clumsy and inelegant.”
>>
>>
>>
>> One of the primary reasons that SPML failed was lack of adoption by
>> service providers due to its complexity. Very few target applications
>> implemented SPML. Most of the commercial provisioning systems had an SPML
>> interface (either v1 or v2), but not one of them was conformant to the SPML
>> standard because of complexity. If you are interested, I will forward you
>> the research documents that discuss these problems in detail. For SCIM to
>> be successful, it must be adopted by commercial target applications (i.e.,
>> service providers). I am confident that a requirement for unique
>> identifiers with multi-valued attributes will preclude its adoption,
>> because it requires major changes to the service provider’s existing
>> identity storage mechanisms.
>>
>> Mark
>>
>>
>>
>>
>>
>>
>>
>> *From:* Trey Drake [mailto:trey.drake@unboundid.com]
>> *Sent:* Friday, August 10, 2012 9:51 AM
>>
>> *To:* Ganesh and Sashi Prasad
>> *Cc:* scim@ietf.org; Emmanuel Dreux; Kelly Grizzle; Phil Hunt
>>
>> *Subject:* Re: [scim] SCIM Protocol - 3 suggestions for improvement
>>
>>
>>
>> Ganesh,
>>
>>
>>
>> I'll base my comments on your latest reply (below).
>>
>>
>>
>> The externalID is not part of the protocol.  It is an *optional*
>> attribute within the *schema* specification.  As for #2, the protocol spec
>> works as you describe if "arbitrary URI parameters" equate to resource
>> attributes (Allow generic search using 'GET /Users' and arbitrary URI
>> parameters).  Please clarify your suggestion.
>>
>>
>>
>> I'm not tracking your coupling concern.  The client can search and hence
>> retrieve resources on any attribute it chooses, externalId or otherwise.
>>  Nothing mandates use of externalId.
>>
>>
>>
>>
>>
>> What do you mean by "candidate key"?  Given
>>
>>
>>
>> On Fri, Aug 10, 2012 at 5:49 AM, Ganesh and Sashi Prasad <
>> g.c.prasad@gmail.com> wrote:
>>
>> >  I think scim gets its current simplicity from its single owner hub
>> spoke model implementing tight coupling. [...] IMHO loose coupling is a
>> much more complex solution.
>>
>>
>>
>> Phil,
>>
>>
>>
>> I'm a bit surprised that you're implying "tight coupling == simple" and
>> "loose coupling == complex". That's contrary to my experience.
>>
>>
>>
>> When I say "loose coupling", I mean "no unnecessary dependencies".
>> Invariably, a reduction in dependencies leads to greater simplicity.
>>
>>
>>
>> Let's not confuse reduction of dependencies in the data model with a
>> hub-and-spokes architecture. They're entirely orthogonal aspects of the
>> solution.
>>
>>
>>
>> All that my suggestion involves is,
>>
>>
>>
>> 1. Take 'external ID' out of the protocol.
>>
>> 2. Allow generic search using 'GET /Users' and arbitrary URI parameters
>>
>>
>>
>> No planned functionality is lost by this.
>>
>>
>>
>> 1. The client enterprise can still send its internal ID as part of the
>> resource body, inside some attribute defined by them (but not defined by
>> the protocol). Let's say they call it 'myID'.
>>
>> 2. The client enterprise can search for resource URIs using any
>> attribute, including this internal ID
>>
>> 'GET /Users?myID=bjensen'
>>
>> Since myID is a candidate key, the server will return exactly one URI,
>> which is the canonical URI for the resource
>>
>> https://example.com/v1/Users/2819c223-7f76-453a-919d-413861904646
>>
>> 3. The client can use this URI to perform all other operations as usual.
>>
>>
>>
>> So taking 'externalID' out of the protocol spec only does this:
>>
>> 1. It avoids enshrining tight coupling in the protocol (If clients want to tightly couple themselves to the cloud provider by sending their internal IDs, they can do so. Suicide is OK, but the protocol should not be guilty of assisted suicide. ;-)
>>
>> 2. It encourages loose coupling by nudging clients towards maintaining their own internal-to-external identifier mappings.
>>
>>
>>
>> That's what I'd like to see. I don't believe this complicates the protocol. It simplifies it and it also lends itself to a loosely-coupled approach.
>>
>>
>>
>> I'll address the multi-valued attribute suggestion separately.
>>
>>
>>
>> Regards,
>>
>> Ganesh
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On 10 August 2012 07:53, Phil Hunt <phil.hunt@oracle.com> wrote:
>>
>>
>>
>> Phil
>>
>>
>> On 2012-08-09, at 14:14, Ganesh and Sashi Prasad <g.c.prasad@gmail.com>
>> wrote:
>>
>>  >  storing this information in a mapping table outside of the SCIM spec
>> is a great way to enable this solution.  Part of the key here is that SCIM
>> is just a piece of the architecture for this solution, and is only
>> responsible for the transport layer between domains.
>>
>> I wasn't suggesting that the mapping table be part of the SCIM spec. I
>> provided that example to illustrate that splitting and merging identities
>> is a common requirement, and that decoupling local identifiers within a
>> domain from shared identifiers between domains was the best way to
>> facilitate it.
>>
>>
>>
>> I'm suggesting that the spec do less, not more.
>>
>>
>>
>> What the SCIM spec needs to do there is just refrain from introducing
>> tight coupling. I would like to see a single identifier exposed through the
>> API, with the implication (and perhaps the recommendation) that it be the
>> shared one. Allowing one domain to expose its internal identifier to the
>> other creates tight coupling and ensures that both domains need to
>> simultaneously split or merge identities, which is not desirable. So I
>> recommend _taking out_ the "external id" field from the API. The spec
>> shouldn't encourage tight coupling. If clients want to pass in their
>> internal ids as part of the resource body, no one can stop them, and they
>> can always do a search on that attribute to retrieve the URI exactly as you
>> visualise they will with the "external id", but let's not elevate an
>> anti-pattern to a recommendation by enshrining the "external id" as an
>> acceptable attribute.
>>
>>
>>
>> Am I making sense?
>>
>>
>>
>> I see what you are saying. I think scim gets its current simplicity from
>> its single owner hub spoke model implementing tight coupling.
>>
>>
>>
>> IMHO loose coupling is a much more complex solution. The reality is that
>> each end-point has value to contribute and thus the single-owner model will
>> eventually need to become multi-owner or multi-hub.
>>
>>
>>
>> That said, I think the current model provides a practical starting point.
>>
>>
>>
>> >  Regarding unique identifiers for multi-valued attributes there is a
>> trade-off involved.  On one hand this makes PATCH semantics easier.  On the
>> other hand it puts extra burden on service providers.
>>
>>
>> Precisely. The spec has to strike the right balance. It would be
>> interesting to hear from the other members of the spec mailing list. You
>> know where I stand on this. It would be good to hear the spectrum of
>> opinions.
>>
>>
>>
>> Regards,
>>
>> Ganesh
>>
>> On 10 August 2012 00:28, Kelly Grizzle <kelly.grizzle@sailpoint.com>
>> wrote:
>>
>> Thanks Emmanuel.  I had started writing up a similar response.  As you
>> suggest, storing this information in a mapping table outside of the SCIM
>> spec is a great way to enable this solution.  Part of the key here is that
>> SCIM is just a piece of the architecture for this solution, and is only
>> responsible for the transport layer between domains.
>>
>>
>>
>> You could also model these ID mappings in the SCIM user as an extension
>> but would probably not want to expose these externally.  Here is an example
>> of how to model the end state of the false positive scenario (splitting a
>> user):
>>
>>
>>
>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>> | a99a5feba839       | D2                 | 7a87f27c1dd8       | true         |
>>
>>
>>
>> This could be represented as two SCIM users that contain information
>> about the entities on other domains.
>>
>>
>>
>> {
>>
>>   "schemas": ["urn:scim:schemas:core:1.0",
>> "urn:scim:schemas:extension:federation:1.0"],
>>
>>   "id": "9caf78aac3d6",
>>
>>   "userName": "John Smith",
>>
>>   "urn:scim:schemas:extension:federation:1.0": {
>>
>>     "linkedUsers": [
>>
>>       {
>>
>>         "domain": "D2",
>>
>>         "externalEntityId": "ff487230b3a0"
>>
>>       }
>>
>>     ]
>>
>>   }
>>
>> }
>>
>>
>>
>> {
>>
>>   "schemas": ["urn:scim:schemas:core:1.0",
>> "urn:scim:schemas:extension:federation:1.0"],
>>
>>   "id": "a99a5feba839",
>>
>>   "userName": "John Smith",
>>
>>   "urn:scim:schemas:extension:federation:1.0": {
>>
>>     "linkedUsers": [
>>
>>       {
>>
>>         "domain": "D2",
>>
>>         "externalEntityId": "7a87f27c1dd8"
>>
>>       }
>>
>>     ]
>>
>>   }
>>
>> }
>>
>>
>>
>> In the second user, the linkedUsers attribute would be empty until the
>> split user was synced to domain 2.
>>
>>
>>
>>
>>
>> Similarly, the false negative use case (merging two users) looked like
>> this at the end:
>>
>>
>>
>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>> | 9caf78aac3d6       | D2                 | 41206cc97c8b       | false        |
>>
>>
>>
>> This could be represented with the following SCIM user:
>>
>>
>>
>> {
>>
>>   "schemas": ["urn:scim:schemas:core:1.0",
>> "urn:scim:schemas:extension:federation:1.0"],
>>
>>   "id": "9caf78aac3d6",
>>
>>   "userName": "John Smith",
>>
>>   "urn:scim:schemas:extension:federation:1.0": {
>>
>>     "linkedUsers": [
>>
>>       {
>>
>>         "domain": "D2",
>>
>>         "externalEntityId": "ff487230b3a0"
>>
>>       },
>>
>>       {
>>
>>         "domain": "D2",
>>
>>         "externalEntityId": "41206cc97c8b",
>>
>>         "deletionRequired": true
>>
>>       }
>>
>>     ]
>>
>>   }
>>
>> }
>>
>>
>>
>>
>>
>> Regarding unique identifiers for multi-valued attributes there is a
>> trade-off involved.  On one hand this makes PATCH semantics easier.  On the
>> other hand it puts extra burden on service providers.  Since the inception
>> of SCIM, a key goal has been to foster adoption by service providers by
>> making things fit easily onto existing systems.  IMO the value gained by
>> unique identifiers for multi-valued attributes is not worth the demands put
>> on a service provider.  I also think that vendors that have a
>> non-SCIM-compliant API will choose to keep things that way if the spec is
>> too hard for them to implement.  In a green field environment we do have
>> the luxury of mandating a model to make certain operations more elegant.
>> However, we can’t ignore legacy systems.
>>
>>
>>
>> --Kelly
>>
>>
>>
>> *From:* scim-bounces@ietf.org [mailto:scim-bounces@ietf.org] *On Behalf
>> Of *Emmanuel Dreux
>> *Sent:* Thursday, August 09, 2012 3:18 AM
>> *To:* Ganesh and Sashi Prasad; Kelly Grizzle
>> *Cc:* scim@ietf.org
>> *Subject:* Re: [scim] SCIM Protocol - 3 suggestions for improvement
>>
>>
>>
>> Hi Ganesh,
>>
>>
>>
>> Nothing prevents you, in your SCIM implementation (client or server), from
>> generating a unique identifier for each synchronized object and maintaining
>> an internal mapping table (you would have to map group membership as well).
>>
>>
>>
>> This is what we are doing with Active Directory sources or targets:
>>
>> As we didn’t find an immutable unique ID in AD systems (DN, samAccountName
>> and UPN are subject to change, and even objectGuid can change if an AD
>> domain is migrated), we decided to generate and maintain an internal table
>> of IDs. This fits your requirements as it hides internal IDs.
>>
>>
>>
>> This was written in .NET, and we have started a project to rewrite our
>> SCIM stack in PHP and will give it to the Open Source community. This
>> implementation will have a parameter: AllocateIds versus UseExistingIDs.
>>
>> This will give the choice of “hiding” internal IDs or using them as unique
>> IDs.
>>
>>
>>
>> You can also implement such a feature without violating the SCIM specs, or
>> without asking to include it in the specs.
>>
>>
>>
>> --
>>
>> Regards,
>>
>> Emmanuel Dreux
>>
>> http://www.cloudiway.com
>>
>> Tel: +33 4 26 78 17 58
>>
>> Mobile: +33 6 47 81 26 70
>>
>> skype: Emmanuel.Dreux
>>
>>
>>
>> *From:* Ganesh and Sashi Prasad [mailto:g.c.prasad@gmail.com]
>>
>> *Sent:* Thursday, 9 August 2012 03:35
>> *To:* Kelly Grizzle
>> *Cc:* scim@ietf.org
>> *Subject:* Re: [scim] SCIM Protocol - 3 suggestions for improvement
>>
>>
>>
>> Hi Kelly,
>>
>> Thanks for your response. Let me first respond in brief to the two main
>> points you have made, and then elaborate on the first.
>>
>> 1.      Why should domains not expose their internal identifiers to
>> other domains?
>>
>> a. We are designing a protocol for a federated system of domains, where all
>> domains are co-equal peers. (In physics too, N-body problems are much
>> harder than 2-body problems. :-) Therefore, assuming that there are only
>> two players in the interaction makes this tightly coupled in a number of
>> ways. We should rely on messaging and notification, with encapsulation of
>> domain-specific data.
>>
>> b. In any non-trivial data store, there will always be the ongoing need to
>> merge and split identities as and when “false negatives” and “false
>> positives” are discovered. A domain should be able to handle this internal
>> housekeeping freely, only notifying other domains when convenient. Mapping
>> of internal identifiers to external ones and maintaining this mapping
>> internally allows this loosely-coupled housekeeping to take place. Sharing
>> internal identifiers (or otherwise outsourcing the mapping of internal to
>> external identifiers) forces housekeeping activities to be done in
>> lock-step across domains.
>>
>> c. Asynchronous interaction is not just a matter of a suitable wire protocol
>> which can be designed later. The data model plays a crucial role in
>> enabling or constraining such interaction. A tightly-coupled data model
>> will force the use of synchronous interactions, and the exposure of
>> internal identifiers is a key part of this tight coupling.
>>
>> 2. The difficulty of assigning unique identifiers to the individual
>> values of multi-valued attributes:
>>
>> a. I'm not belittling the effort involved in migrating legacy data stores to
>> such a model. However, in the larger historical context of cross-domain
>> identity management, we are really at the very early stages. If a
>> relatively new discipline and a brand new spec are held captive to legacy
>> considerations, we are losing an opportunity to provide a clean and elegant
>> model to subsequent users of the spec, and this will have repercussions
>> over many years or even decades.
>>
>> b. If incumbent cloud providers find it hard to immediately adopt the
>> dictionary model for existing multi-valued attributes, they can transition
>> to this model by offering both “SCIM-compliant” and “non-SCIM-compliant”
>> APIs to their customers and encouraging new customers to adopt the
>> “SCIM-compliant” API. Legacy customers can be supported using a
>> “non-SCIM-compliant” API for an arbitrarily long period and gradually
>> migrated to the SCIM-compliant API. The logistics are not insurmountable,
>> and shouldn't prevent the adoption of a dictionary model for multi-valued
>> attributes.
>>
>> Elaboration of Point 1:
>>
>> When we consider federated identity across more than one domain, we have
>> to assume that domains are not necessarily master-slave in their
>> interaction. The most generic interaction model is peer-to-peer, where
>> entity lifecycle events within a domain are notified to other domains (when
>> necessary) in an asynchronous manner (i.e., through messaging) and the
>> other domains are free to respond to these events in an appropriate manner
>> and at a time of their convenience.
>>
>> A key set of lifecycle events for an entity is the merging and splitting
>> of identity that is often required.
>>
>> The question “Is this one entity?” can be answered either yes (positive)
>> or no (negative). But sometimes, we can discover false positives and false
>> negatives in our data stores.
>>
>> Consider a case where customers sign up online, and two customers who are
>> privacy-conscious enter fake IDs such as “John Smith”, and also use the
>> same date of birth (say, 1 Jan 1970) or similar attributes. The front-end
>> application may make an intelligent (but incorrect) guess that these two
>> persons are the same, and re-assign the same identifier to the second
>> person. This is a false positive. They appear to be the same entity, but
>> they're actually different. When the error is discovered, the identities
>> will need to be split, with a new identifier generated for one of them.
>>
>> Consider the opposite case where a customer signs up through two
>> different portals or in two different sessions, using the names “JSmith”
>> and “JohnS”. It is very likely that they will be treated as two different
>> customers and assigned two unique identifiers. This is a false negative.
>> They appear to be two entities, but are actually the same. At a later
>> stage, when the error is discovered, the identities will have to be merged,
>> and one of the identifiers will have to be dropped.
>>
>> These are not theoretical use cases. They form a significant proportion
>> of the user base in most large Web-facing applications. Let's see how these
>> can be managed in a federated way by mapping internal identifiers to
>> external ones and only exposing external identifiers to other domains.
>>
>> a. False positives:
>>
>> Domain 1 has the following information about a customer in its data store:
>>
>> Internal ID: 9caf78aac3d6
>>
>> Attributes: {name: “John Smith”, dob: “01-Jan-1970”}
>>
>> When requesting the provisioning of this entity in Domain 2, the
>> following ID is returned by Domain 2: ff487230b3a0.
>>
>> Domain 1 then maintains the following in a mapping table and uses it for
>> translation when talking to Domain 2, taking care never to expose its
>> internal identifier:
>>
>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>>
>> When the false positive is discovered and the entity is split, Domain 1
>> creates a new internal identifier and now has the following entity
>> information.
>>
>> Internal ID: 9caf78aac3d6
>>
>> Attributes: {name: “John Smith”, dob: “01-Jan-1970”}
>>
>> Internal ID: a99a5feba839
>>
>> Attributes: {name: “John Smith”, dob: “01-Jan-1970”}
>>
>> This second entity with its own internal identifier is invisible to
>> Domain 2, and this is by design. Communication about the original entity
>> takes place as before by mapping “9caf78aac3d6” to “ff487230b3a0” and
>> vice-versa. At some convenient time (importantly, this doesn't have to be
>> at the time the split happens), Domain 2 can be requested to provision a
>> second entity, and when it responds with an identifier of “7a87f27c1dd8”,
>> this can go into the mapping table as a new record associated with the
>> second entity's internal identifier.
>>
>> The mapping table now contains the following entries:
>>
>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>> | a99a5feba839       | D2                 | 7a87f27c1dd8       | true         |
>>
>> Domain 2 is not even aware that a split has happened, and the
>> provisioning that it does is not in lockstep with the split in identity
>> that occurred in Domain 1.
>>
>> (What is the “Primary flag” used for? We'll see when we cover the
>> treatment of false negatives.)
>>
>> b. False negatives:
>>
>> Domain 1 has the following information about what it thinks are two
>> distinct customers in its data store:
>>
>> Internal ID: 9caf78aac3d6
>>
>> Attributes: {name: “JSmith”, dob: “01-Jan-1970”}
>>
>> Internal ID: 273d36e30d09
>>
>> Attributes: {name: “JohnS”, dob: “01-Jan-1970”}
>>
>> When requesting the provisioning of these entities in Domain 2, the
>> following IDs are returned by Domain 2: ff487230b3a0 and 41206cc97c8b.
>>
>> Domain 1 then maintains the following in a mapping table and uses it for
>> translation when talking to Domain 2, taking care never to expose its
>> internal identifiers:
>>
>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>> | 273d36e30d09       | D2                 | 41206cc97c8b       | true         |
>>
>> When the false negative is discovered and the two entities are merged,
>> Domain 1 drops one of the internal identifiers and rationalises the name of
>> the customer (say, to “John Smith”). Let's say it retains the first ID
>> “9caf78aac3d6” and drops the second “273d36e30d09”.
>>
>> The mapping table now looks like this:
>>
>> | Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
>> | 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
>> | 9caf78aac3d6       | D2                 | 41206cc97c8b       | false        |
>>
>> Now two external identifiers map to the same internal one, so inbound
>> communication from Domain 2 can be unambiguously translated to the same
>> entity internally. However, when going outwards, Domain 1 will have to look
>> up the translation table to determine the “primary” external ID for this
>> entity in Domain 2, which was decided to be “ff487230b3a0”. That's where
>> the “Primary flag” comes in. The second external ID “41206cc97c8b” is never
>> used thereafter in outbound communication.
>>
>> At some stage (importantly, not in lockstep with the identity merge),
>> Domain 2 can be requested to delete the customer record identified by
>> “41206cc97c8b”, and the second entry in the mapping table can be removed
>> once this is acknowledged.
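
The merge scenario above can be sketched as a small in-memory mapping table in Python. The names Mapping, MappingTable, merge and retire_secondary are hypothetical and chosen only for illustration; a real domain would presumably keep this in a database. The rule that a merged row is demoted only when the surviving entity already has a primary row for that domain is also an assumption.

```python
from dataclasses import dataclass

@dataclass
class Mapping:
    internal_id: str
    domain: str
    external_id: str
    primary: bool = True

class MappingTable:
    def __init__(self):
        self.rows = []

    def add(self, internal_id, domain, external_id):
        self.rows.append(Mapping(internal_id, domain, external_id))

    def to_internal(self, domain, external_id):
        # Inbound: any external ID (primary or not) resolves to one entity.
        for r in self.rows:
            if r.domain == domain and r.external_id == external_id:
                return r.internal_id
        raise KeyError((domain, external_id))

    def to_external(self, internal_id, domain):
        # Outbound: only the primary external ID is ever used.
        for r in self.rows:
            if r.internal_id == internal_id and r.domain == domain and r.primary:
                return r.external_id
        raise KeyError((internal_id, domain))

    def merge(self, keep_id, drop_id):
        # Re-point the dropped entity's rows at the surviving entity and
        # demote them where the survivor already has a primary row.
        covered = {r.domain for r in self.rows
                   if r.internal_id == keep_id and r.primary}
        for r in self.rows:
            if r.internal_id == drop_id:
                r.internal_id = keep_id
                r.primary = r.primary and r.domain not in covered

    def retire_secondary(self, domain, external_id):
        # Called once the other domain acknowledges deleting the record.
        self.rows = [r for r in self.rows
                     if not (r.domain == domain
                             and r.external_id == external_id
                             and not r.primary)]

table = MappingTable()
table.add("9caf78aac3d6", "D2", "ff487230b3a0")
table.add("273d36e30d09", "D2", "41206cc97c8b")
table.merge(keep_id="9caf78aac3d6", drop_id="273d36e30d09")
```

After the merge, both external IDs translate inbound to "9caf78aac3d6", while outbound traffic to D2 uses only "ff487230b3a0".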
>>
>> This scheme will scale up to multiple domains, because the “External
>> Domain ID” column helps to keep track of which external ID is shared with
>> which Domain. (Why don't we use just one external ID for an entity and
>> share it with all external domains? Tight coupling again. Just as OAuth
>> allows an access token given to a third party to be invalidated without
>> affecting the access of other third parties, the use of separate external
>> identifiers for different domains allows fine-grained control of identity
>> federation.)
>>
>> The scheme also allows the splitting of an entity into more than two
>> entities, and the merging of more than two entities into a single one. (Any
>> organisation with a web-facing application will tell you how many John
>> Smiths there are who were born on 1 Jan 1970!)
>>
>> This is a fairly long-winded explanation, but this is why we need to hide
>> internal identifiers from other domains, and why mappings need to be
>> managed internally in each domain. Such a data model also allows us to
>> choose asynchronous protocols for propagation of identity events, since
>> there is no consistency requirement to update multiple domains concurrently.
>>
>> Regards,
>>
>> Ganesh Prasad
>>
>>
>>
>> On 9 August 2012 04:55, Kelly Grizzle <kelly.grizzle@sailpoint.com>
>> wrote:
>>
>> Thanks for the feedback, Ganesh.  I read through this and your InfoQ
>> article (http://www.infoq.com/articles/scim-data-model-limitations) and
>> have some thoughts.
>>
>>
>>
>> > The rest of the protocol does not meaningfully use the enterprise
>> > client’s identifier, the "external ID" at all, even though it was
>> > ostensibly introduced to make things friendlier for the client.
>>
>>
>>
>> The usage pattern for an external ID would be to search for a user by
>> externalId and use the ID of the returned user in any desired operation.
>> For example:
>>
>>
>>
>> GET /Users?filter=externalId eq "bjensen"&attributes=id
>>
>>
>>
>> {
>>   "totalResults": 1,
>>   "Resources": [
>>     {
>>       "id": "2819c223-7f76-453a-919d-413861904646"
>>     }
>>   ]
>> }
>>
>>
>>
>> Retrieve the ID from the response and use it.
>>
>>
>>
>> DELETE /Users/2819c223-7f76-453a-919d-413861904646
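
The two-step pattern above (search by externalId, then act on the returned id) might look like this on the client side. It is a rough sketch that only builds URLs and parses a response body; it makes no network calls. The helper names and the base URL `https://example.com/v1` are hypothetical.

```python
import json
from urllib.parse import quote


def search_url(base, external_id):
    # Percent-encode the filter expression so the spaces and quotes
    # survive in the query string.
    filter_expr = f'externalId eq "{external_id}"'
    return f"{base}/Users?filter={quote(filter_expr)}&attributes=id"


def extract_id(response_body):
    # Expect exactly one match; anything else is treated as an error.
    data = json.loads(response_body)
    if data["totalResults"] != 1:
        raise LookupError("expected exactly one matching user")
    return data["Resources"][0]["id"]


def delete_url(base, server_id):
    # The server-assigned id is what the resource URL is built from.
    return f"{base}/Users/{quote(server_id, safe='')}"


BASE = "https://example.com/v1"  # hypothetical service provider
body = ('{"totalResults": 1, "Resources": '
        '[{"id": "2819c223-7f76-453a-919d-413861904646"}]}')
user_id = extract_id(body)
print("GET", search_url(BASE, "bjensen"))
print("DELETE", delete_url(BASE, user_id))
```
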
>>
>>
>>
>> This does introduce an additional HTTP request if the client chooses not
>> to store the server’s id.  An issue was created to consider allowing
>> operations to use the externalId (
>> http://code.google.com/p/scim/issues/detail?id=35), but I believe the
>> general consensus has been not to include this in the spec.  One main point
>> of contention is that much of the rest of the spec (e.g., group membership
>> references, manager references, etc.) requires knowledge of the server’s
>> identifier.  Continuing this discussion on the IETF list would be a good
>> thing, though.
>>
>>
>>
>>
>>
>> > the cloud provider's ID and the enterprise client's ID are both
>> > "Internal IDs" with respect to their domains
>>
>>
>>
>> I think this comes down to a nomenclature problem.  The server’s ID does
>> not necessarily have to be the unique identifier that the underlying
>> identity store uses; it just has to be stable and unique.  In many cases,
>> the underlying identity store will already provide identifiers with these
>> properties (e.g., a UUID), and these can be used by the SCIM interface.
>> The name “externalId” refers to the fact that the id is maintained
>> externally to the SCIM server.  As long as the server’s identifiers are
>> stable and unique (which is mandated by the spec), I don’t see a problem.
>>
>>
>>
>>
>>
>> > The secret is that *every value needs a key*, and multi-valued
>> > attributes lack that. So our solution is quite simple - turn every
>> > list or array (of values) into a dictionary (of key-value pairs) by
>> > providing each value with a unique and meaning-free identifier.
>>
>>
>>
>> I agree that this would be useful, especially in the PATCH operation.
>> One reason that this wasn’t included in the spec originally is that it can
>> put undue burden on the service provider.  Many service providers are
>> putting SCIM interfaces in front of their existing identity stores (e.g.,
>> directory servers, SaaS application databases, etc.).  Many of these do not
>> have a unique identifier for each value of a multi-valued attribute.  By
>> requiring this, a majority of the service providers would have to start
>> maintaining a unique key for each such value.  I believe this would be a
>> roadblock for many implementers.
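
For reference, the list-to-dictionary conversion under discussion can be sketched as follows. This is a minimal illustration, assuming the service provider generates a random UUID per value; the function names `to_keyed` and `flatten` are hypothetical, and the sample addresses are the ones from the original proposal.

```python
import uuid


def to_keyed(values):
    # Give each value of a multi-valued attribute its own
    # meaning-free key (here a random UUID).
    return {str(uuid.uuid4()): value for value in values}


def flatten(attribute, keyed):
    # Fully-qualified keys in dot notation: <attribute>.<generated-id>
    return {f"{attribute}.{key}": value for key, value in keyed.items()}


emails = [
    "john_smith@yahoo.com",
    "john.smith@gmail.com",
    "jsmith1970@hotmail.com",
]
keyed = to_keyed(emails)
flat = flatten("emails", keyed)
for key, value in flat.items():
    print(key, "=>", value)
```

A store that has no native per-value identifier would have to persist the generated keys somewhere, which is the burden raised in the paragraph above.
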
>>
>>
>>
>>
>>
>> > When the SCIM protocol uses PATCH, there are areas where it seems a
>> > bit clumsy.
>>
>>
>>
>> I like the thoughts here.  Your example reminds me of unified diffs (
>> http://en.wikipedia.org/wiki/Diff#Unified_format), which are commonly
>> used with a patch program (pretty much the equivalent of the PATCH verb).
>> However, the three proposals seem to hinge largely on being able to
>> uniquely address each element within an object.  Without such unique
>> addresses, it is not so easy to address each patch sub-operation (REPLACE,
>> INCLUDE, etc.) or provide a multi-status.
>>
>>
>> The 207 response would be interesting to consider for the bulk endpoint (
>> http://www.simplecloud.info/specs/draft-scim-api-00.html#bulk-resources),
>> however.
>>
>>
>>
>>
>>
>> > There are other, non-data aspects of SCIM which may require review,
>> > such as its synchronous request-response interaction model, which is
>> > a form of tight coupling and could prove to be a source of brittleness.
>>
>>
>>
>> I agree that we should explore optional asynchronous requests in 2.0.
>>
>>
>>
>> Thanks again for your thoughts.  I hope you stay involved in the
>> discussion as work on SCIM 2.0 goes forward.
>>
>>
>>
>> --Kelly
>>
>>
>>
>> *From:* scim-bounces@ietf.org [mailto:scim-bounces@ietf.org] *On Behalf
>> Of *Ganesh and Sashi Prasad
>> *Sent:* Wednesday, August 01, 2012 4:24 PM
>> *To:* scim@ietf.org
>> *Subject:* [scim] SCIM Protocol - 3 suggestions for improvement
>>
>>
>>
>> (I posted this on the SCIM Google Group, and I was advised to subscribe
>> to the mailing list and post it here instead, so here goes.)
>>
>>
>>
>> Hi,
>>
>>
>>
>> My name is Ganesh Prasad, and my experience in Identity and Access
>> Management is mainly through a 3-year project at an Australian insurance
>> company, an experience I have written about as an eBook on InfoQ (
>> http://www.infoq.com/minibooks/Identity-Management-Shoestring).
>>
>>
>>
>> I have been following the SCIM spec off and on, and based on my
>> experience with a loosely-coupled architecture that I found to be
>> successful, I have the following 3 suggestions to make.
>>
>>
>>
>> 1. The enterprise client and the cloud provider should maintain their own
>> internal IDs for a resource, which they should not reveal to each other.
>> Both of them should map their internal IDs to a shared External ID, and
>> this is the only ID that should be exposed through the API. The current
>> specification's provision of an id (which is the external ID and the only
>> one to be transferred through the API) and an "external ID" (which is the
>> client's internal ID and should be hidden) is diametrically opposite to
>> this.
>>
>>
>>
>> 2. When dealing with multi-valued attributes of a resource (expressed as
>> arrays in JSON), they must be converted from an array into a dictionary
>> with unique keys (UUIDs generated by the cloud provider when the attribute
>> is created). Without unique keys for every attribute value of a resource,
>> manipulating it will be clumsy and inelegant.
>>
>>
>>
>> 3. The PATCH command can be improved in 3 significant ways:
>>
>> 3a. Leverage the fact (from 2 above) that every value has a key, to
>> greatly simplify the API
>>
>> 3b. Use special verbs as nested operations of the PATCH command to add,
>> modify and delete attributes at any level
>>
>> 3c. Use the WebDAV status code of "207 Multi-Status" instead of "200 OK"
>> as the response to a PATCH (or BULK) command.
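
Suggestion 3 can be sketched as a function that applies nested verbs to a flat dictionary of fully-qualified keys and reports one status per sub-operation. The error codes (404 for REPLACE on a missing key, 409 for PLACE on an existing key) follow the verb descriptions elsewhere in this thread. The success codes (200 and 201), the shape of the operation and result dictionaries, and the name `apply_patch` are assumptions made purely for illustration.

```python
import uuid


def apply_patch(resource, operations):
    """Apply each nested verb in turn; collect a per-operation status."""
    results = []
    for op in operations:
        verb, key, value = op["verb"], op["key"], op.get("value")
        if verb == "INCLUDE":
            # Add a value to a collection; the key is generated and returned.
            key = f"{key}.{uuid.uuid4()}"
            resource[key] = value
            status = 201  # assumed success code
        elif verb == "REPLACE":
            if key in resource:
                resource[key] = value
                status = 200  # assumed success code
            else:
                status = 404
        elif verb == "PLACE":
            if key in resource:
                status = 409
            else:
                resource[key] = value
                status = 201  # assumed success code
        elif verb == "FORCE":
            # Set the value whether or not the key existed before.
            resource[key] = value
            status = 200  # assumed success code
        else:
            status = 400
        results.append({"verb": verb, "key": key, "status": status})
    # The overall response is 207; the real outcomes are inside it.
    return {"status": 207, "results": results}


resource = {"dob": "01 Jan 1970"}
response = apply_patch(resource, [
    {"verb": "REPLACE", "key": "dob", "value": "02 Jan 1970"},
    {"verb": "REPLACE", "key": "address.suburb", "value": "East Camden"},
    {"verb": "PLACE", "key": "dob", "value": "03 Jan 1970"},
    {"verb": "INCLUDE", "key": "emails", "value": "john_smith@yahoo.com"},
])
for result in response["results"]:
    print(result["verb"], result["key"], result["status"])
```

A single "200 OK" could not express this mixed outcome, where two sub-operations succeed and two fail, which is the motivation for 3c.
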
>>
>>
>>
>> To elaborate,
>>
>>
>>
>> 1. Revealing private IDs externally is a form of tight coupling. A major
>> requirement with Identity Management is to split (or merge) identities when
>> false positives (or false negatives) are detected, i.e., when a resource is
>> discovered to be more than one, or when multiple resources are detected to
>> be the same. If internal identifiers are revealed to external domains, such
>> clean-ups become difficult, hence every domain that wants to expose
>> references to a resource must map its internal ID to an external one
>> created for this explicit purpose, and only reveal this.
>>
>>
>>
>> In the SCIM case, when an enterprise client POSTs a resource creation
>> request, the cloud provider must generate its own internal UUID as well as
>> an external UUID, map them together, and only return the external UUID in
>> the "Location:" header. The enterprise client should map this external UUID
>> to a newly-generated internal ID of its own. In case the resource already
>> has an identifier within the enterprise client's domain, then this is the
>> internal ID that must be mapped to the external UUID returned through the
>> POST response.
>>
>>
>>
>> 2. If a resource is to be created, and one of its attributes is
>> multi-valued, e.g.,
>>
>>
>>
>>     "email-addrs" :
>>
>>     [
>>
>>         "john_smith@yahoo.com",
>>
>>         "john.smith@gmail.com",
>>
>>         "jsmith1970@hotmail.com"
>>
>> <
>>
>> _______________________________________________
>> scim mailing list
>> scim@ietf.org
>> https://www.ietf.org/mailman/listinfo/scim
>>
>>
>>
>>
>>
>>
>
>
> _______________________________________________
> scim mailing list
> scim@ietf.org
> https://www.ietf.org/mailman/listinfo/scim
>
>