Re: [scim] SCIM Protocol - 3 suggestions for improvement

Kelly Grizzle <kelly.grizzle@sailpoint.com> Thu, 09 August 2012 14:28 UTC

From: Kelly Grizzle <kelly.grizzle@sailpoint.com>
To: Emmanuel Dreux <edreux@cloudiway.com>, Ganesh and Sashi Prasad <g.c.prasad@gmail.com>
Date: Thu, 09 Aug 2012 14:28:11 +0000
Cc: "scim@ietf.org" <scim@ietf.org>
Subject: Re: [scim] SCIM Protocol - 3 suggestions for improvement

Thanks Emmanuel.  I had started writing up a similar response.  As you suggest, storing this information in a mapping table outside of the SCIM spec is a great way to enable this solution.  Part of the key here is that SCIM is just a piece of the architecture for this solution, and is only responsible for the transport layer between domains.

You could also model these ID mappings in the SCIM user as an extension but would probably not want to expose these externally.  Here is an example of how to model the end state of the false positive scenario (splitting a user):

| Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
| 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
| a99a5feba839       | D2                 | 7a87f27c1dd8       | true         |

This could be represented as two SCIM users that contain information about the entities on other domains.

{
  "schemas": ["urn:scim:schemas:core:1.0", "urn:scim:schemas:extension:federation:1.0"],
  "id": "9caf78aac3d6",
  "userName": "John Smith",
  "urn:scim:schemas:extension:federation:1.0": {
    "linkedUsers": [
      {
        "domain": "D2",
        "externalEntityId": "ff487230b3a0"
      }
    ]
  }
}

{
  "schemas": ["urn:scim:schemas:core:1.0", "urn:scim:schemas:extension:federation:1.0"],
  "id": "a99a5feba839",
  "userName": "John Smith",
  "urn:scim:schemas:extension:federation:1.0": {
    "linkedUsers": [
      {
        "domain": "D2",
        "externalEntityId": "7a87f27c1dd8"
      }
    ]
  }
}

In the second user, the linkedUsers attribute would be empty until the split user was synced to domain 2.
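
As a rough sketch of how the mapping rows could be folded into such an extension, assuming the rows are held as plain tuples: the function name `scim_user` and the row layout are illustrative only, and the federation URN is the hypothetical one used in the examples above, not a registered schema.

```python
import json

# Hypothetical extension URN, as used in the example users above.
FEDERATION_URN = "urn:scim:schemas:extension:federation:1.0"

# Mapping rows: (internal entity ID, external domain ID, external entity ID, primary flag)
MAPPING = [
    ("9caf78aac3d6", "D2", "ff487230b3a0", True),
    ("a99a5feba839", "D2", "7a87f27c1dd8", True),
]

def scim_user(internal_id, user_name, mapping):
    """Build a SCIM user dict whose extension lists the linked external entities."""
    linked = [
        {"domain": domain, "externalEntityId": ext_id}
        for (int_id, domain, ext_id, _primary) in mapping
        if int_id == internal_id
    ]
    return {
        "schemas": ["urn:scim:schemas:core:1.0", FEDERATION_URN],
        "id": internal_id,
        "userName": user_name,
        FEDERATION_URN: {"linkedUsers": linked},
    }

print(json.dumps(scim_user("a99a5feba839", "John Smith", MAPPING), indent=2))
```

An internal ID with no mapping rows simply yields an empty linkedUsers list, which matches the state of the split user before it is synced.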


Similarly, the false negative use case (merging two users) looked like this at the end:

| Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
| 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
| 9caf78aac3d6       | D2                 | 41206cc97c8b       | false        |

This could be represented with the following SCIM user:

{
  "schemas": ["urn:scim:schemas:core:1.0", "urn:scim:schemas:extension:federation:1.0"],
  "id": "9caf78aac3d6",
  "userName": "John Smith",
  "urn:scim:schemas:extension:federation:1.0": {
    "linkedUsers": [
      {
        "domain": "D2",
        "externalEntityId": "ff487230b3a0"
      },
      {
        "domain": "D2",
        "externalEntityId": "41206cc97c8b",
        "deletionRequired": true
      }
    ]
  }
}


Regarding unique identifiers for multi-valued attributes, there is a trade-off involved.  On one hand, they make PATCH semantics easier.  On the other hand, they put extra burden on service providers.  Since the inception of SCIM, a key goal has been to foster adoption by service providers by making things fit easily onto existing systems.  IMO the value gained by unique identifiers for multi-valued attributes is not worth the demands put on a service provider.  I also think that vendors that have a non-SCIM-compliant API will choose to keep things that way if the spec is too hard for them to implement.  In a green field environment we would have the luxury of mandating a model to make certain operations more elegant.  However, we can't ignore legacy systems.

--Kelly

From: scim-bounces@ietf.org [mailto:scim-bounces@ietf.org] On Behalf Of Emmanuel Dreux
Sent: Thursday, August 09, 2012 3:18 AM
To: Ganesh and Sashi Prasad; Kelly Grizzle
Cc: scim@ietf.org
Subject: Re: [scim] SCIM Protocol - 3 suggestions for improvement

Hi Ganesh,

Nothing prevents you, in your SCIM implementation (client or server), from generating a unique identifier for each synchronized object and maintaining an internal mapping table (you would have to map group membership as well).

This is what we are doing with Active Directory sources or targets:
Because we could not find an immutable unique ID in AD systems (DN, samAccountName, and UPN are all subject to change, and even objectGuid can change if an AD domain is migrated), we decided to generate and maintain an internal table of IDs. This fits your requirements, as it hides internal IDs.

This was written in .NET, and we have started a project to rewrite our SCIM stack in PHP, which we will give to the Open Source community. This implementation will have a parameter: AllocateIds versus UseExistingIDs.
This will give the choice of "hiding" internal IDs or using them as the unique ID.

You can also implement such a feature without violating the SCIM specs, and without asking for it to be included in the specs.

--
Regards,
Emmanuel Dreux
http://www.cloudiway.com
Tel: +33 4 26 78 17 58
Mobile: +33 6 47 81 26 70
skype: Emmanuel.Dreux

From: Ganesh and Sashi Prasad [mailto:g.c.prasad@gmail.com]
Sent: Thursday, 9 August 2012 03:35
To: Kelly Grizzle
Cc: scim@ietf.org
Subject: Re: [scim] SCIM Protocol - 3 suggestions for improvement


Hi Kelly,

Thanks for your response. Let me first respond in brief to the two main points you have made, and then elaborate on the first.

1. Why should domains not expose their internal identifiers to other domains?

a. We are designing a protocol for a federated system of domains, where all domains are co-equal peers. (In physics too, N-body problems are much harder than 2-body problems. :-) Therefore, assuming that there are only two players in the interaction makes this tightly coupled in a number of ways. We should rely on messaging and notification, with encapsulation of domain-specific data.

b. In any non-trivial data store, there will always be the ongoing need to merge and split identities as and when "false negatives" and "false positives" are discovered. A domain should be able to handle this internal housekeeping freely, only notifying other domains when convenient. Mapping of internal identifiers to external ones and maintaining this mapping internally allows this loosely-coupled housekeeping to take place. Sharing internal identifiers (or otherwise outsourcing the mapping of internal to external identifiers) forces housekeeping activities to be done in lock-step across domains.

c. Asynchronous interaction is not just a matter of a suitable wire protocol which can be designed later. The data model plays a crucial role in enabling or constraining such interaction. A tightly-coupled data model will force the use of synchronous interactions, and the exposure of internal identifiers is a key part of this tight coupling.

2. The difficulty of assigning unique identifiers to the individual values of multi-valued attributes:

a. I'm not belittling the effort involved in migrating legacy data stores to such a model. However, in the larger historical context of cross-domain identity management, we are really at the very early stages. If a relatively new discipline and a brand new spec are held captive to legacy considerations, we are losing an opportunity to provide a clean and elegant model to subsequent users of the spec, and this will have repercussions over many years or even decades.

b. If incumbent cloud providers find it hard to immediately adopt the dictionary model for existing multi-valued attributes, they can transition to this model by offering both "SCIM-compliant" and "non-SCIM-compliant" APIs to their customers and encouraging new customers to adopt the "SCIM-compliant" API. Legacy customers can be supported using a "non-SCIM-compliant" API for an arbitrarily long period and gradually migrated to the SCIM-compliant API. The logistics are not insurmountable, and shouldn't prevent the adoption of a dictionary model for multi-valued attributes.

Elaboration of Point 1:

When we consider federated identity across more than one domain, we have to assume that domains are not necessarily master-slave in their interaction. The most generic interaction model is peer-to-peer, where entity lifecycle events within a domain are notified to other domains (when necessary) in an asynchronous manner (i.e., through messaging) and the other domains are free to respond to these events in an appropriate manner and at a time of their convenience.

A key set of lifecycle events for an entity is the merging and splitting of identity that is often required.

The question "Is this one entity?" can be answered either yes (positive) or no (negative). But sometimes, we can discover false positives and false negatives in our data stores.

Consider a case where customers sign up online, and two customers who are privacy-conscious enter fake IDs such as "John Smith", and also use the same date of birth (say, 1 Jan 1970) or similar attributes. The front-end application may make an intelligent (but incorrect) guess that these two persons are the same, and re-assign the same identifier to the second person. This is a false positive. They appear to be the same entity, but they're actually different. When the error is discovered, the identities will need to be split, with a new identifier generated for one of them.

Consider the opposite case where a customer signs up through two different portals or in two different sessions, using the names "JSmith" and "JohnS". It is very likely that they will be treated as two different customers and assigned two unique identifiers. This is a false negative. They appear to be two entities, but are actually the same. At a later stage, when the error is discovered, the identities will have to be merged, and one of the identifiers will have to be dropped.

These are not theoretical use cases. They form a significant proportion of the user base in most large Web-facing applications. Let's see how these can be managed in a federated way by mapping internal identifiers to external ones and only exposing external identifiers to other domains.

a. False positives:

Domain 1 has the following information about a customer in its data store:

Internal ID: 9caf78aac3d6

Attributes: {name: "John Smith", dob: "01-Jan-1970"}

When requesting the provisioning of this entity in Domain 2, the following ID is returned by Domain 2: ff487230b3a0.

Domain 1 then maintains the following in a mapping table and uses it for translation when talking to Domain 2, taking care never to expose its internal identifier:

| Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
| 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |

When the false positive is discovered and the entity is split, Domain 1 creates a new internal identifier and now has the following entity information.

Internal ID: 9caf78aac3d6

Attributes: {name: "John Smith", dob: "01-Jan-1970"}

Internal ID: a99a5feba839

Attributes: {name: "John Smith", dob: "01-Jan-1970"}

This second entity with its own internal identifier is invisible to Domain 2, and this is by design. Communication about the original entity takes place as before by mapping "9caf78aac3d6" to "ff487230b3a0" and vice-versa. At some convenient time (importantly, this doesn't have to be at the time the split happens), Domain 2 can be requested to provision a second entity, and when it responds with an identifier of "7a87f27c1dd8", this can go into the mapping table as a new record associated with the second entity's internal identifier.

The mapping table now contains the following entries:

| Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
| 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
| a99a5feba839       | D2                 | 7a87f27c1dd8       | true         |

Domain 2 is not even aware that a split has happened, and the provisioning that it does is not in lockstep with the split in identity that occurred in Domain 1.
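
A minimal sketch of this decoupled split, assuming the entity store and the mapping table are plain in-memory dicts: the function names `split_entity` and `provision_later` are hypothetical, and the `provision` callable merely stands in for whatever request Domain 1 would actually send to Domain 2.

```python
import uuid

def split_entity(entities, internal_id):
    """Clone an entity under a fresh internal ID; no external mapping is created yet."""
    new_id = uuid.uuid4().hex[:12]
    entities[new_id] = dict(entities[internal_id])
    return new_id

def provision_later(entities, mapping, internal_id, domain, provision):
    """Ask the other domain to create the entity, then record the ID it returns.

    `provision` is a hypothetical callable standing in for the remote request.
    It receives only the attributes, never the internal ID.
    """
    external_id = provision(entities[internal_id])
    mapping[(internal_id, domain)] = external_id
    return external_id

# State before the split: one entity, one mapping row for Domain 2.
entities = {"9caf78aac3d6": {"name": "John Smith", "dob": "01-Jan-1970"}}
mapping = {("9caf78aac3d6", "D2"): "ff487230b3a0"}

# The split happens locally; Domain 2 is not contacted.
new_id = split_entity(entities, "9caf78aac3d6")

# Later, at a convenient time, the second entity is provisioned in Domain 2.
provision_later(entities, mapping, new_id, "D2", lambda attrs: "7a87f27c1dd8")
print(mapping)
```

Between the two calls the new entity has no row in the mapping, so nothing about it can leak to Domain 2 by accident.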

(What is the "Primary flag" used for? We'll see when we cover the treatment of false negatives.)

b. False negatives:

Domain 1 has the following information about what it thinks are two distinct customers in its data store:

Internal ID: 9caf78aac3d6

Attributes: {name: "JSmith", dob: "01-Jan-1970"}

Internal ID: 273d36e30d09

Attributes: {name: "JohnS", dob: "01-Jan-1970"}

When requesting the provisioning of these entities in Domain 2, the following IDs are returned by Domain 2: ff487230b3a0 and 41206cc97c8b.

Domain 1 then maintains the following in a mapping table and uses it for translation when talking to Domain 2, taking care never to expose its internal identifiers:

| Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
| 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
| 273d36e30d09       | D2                 | 41206cc97c8b       | true         |

When the false negative is discovered and the two entities are merged, Domain 1 drops one of the internal identifiers and rationalises the name of the customer (say, to "John Smith"). Let's say it retains the first ID "9caf78aac3d6" and drops the second "273d36e30d09".

The mapping table now looks like this:

| Internal Entity ID | External Domain ID | External Entity ID | Primary flag |
| 9caf78aac3d6       | D2                 | ff487230b3a0       | true         |
| 9caf78aac3d6       | D2                 | 41206cc97c8b       | false        |

Now two external identifiers map to the same internal one, so inbound communication from Domain 2 can be unambiguously translated to the same entity internally. However, when going outwards, Domain 1 will have to look up the translation table to determine the "primary" external ID for this entity in Domain 2, which was decided to be "ff487230b3a0". That's where the "Primary flag" comes in. The second external ID "41206cc97c8b" is never used thereafter in outbound communication.

At some stage (importantly, not in lockstep with the identity merge), Domain 2 can be requested to delete the customer record identified by "41206cc97c8b", and the second entry in the mapping table can be removed once this is acknowledged.
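
The merge handling described above could be sketched roughly as follows. This assumes an in-memory list of rows; the class and method names (`MappingTable`, `to_internal`, `to_external`, `merge`, `acknowledge_delete`) are illustrative and not part of any SCIM specification.

```python
from dataclasses import dataclass

@dataclass
class Row:
    internal_id: str
    domain: str
    external_id: str
    primary: bool

class MappingTable:
    def __init__(self):
        self.rows = []

    def add(self, internal_id, domain, external_id):
        self.rows.append(Row(internal_id, domain, external_id, True))

    def to_internal(self, domain, external_id):
        """Inbound translation: any external ID resolves to one internal ID."""
        for r in self.rows:
            if r.domain == domain and r.external_id == external_id:
                return r.internal_id
        raise KeyError((domain, external_id))

    def to_external(self, internal_id, domain):
        """Outbound translation: only the primary external ID is used."""
        for r in self.rows:
            if r.internal_id == internal_id and r.domain == domain and r.primary:
                return r.external_id
        raise KeyError((internal_id, domain))

    def merge(self, keep_id, drop_id):
        """Repoint rows of drop_id to keep_id, demoting them where keep_id already has a primary."""
        has_primary = {r.domain for r in self.rows
                       if r.internal_id == keep_id and r.primary}
        for r in self.rows:
            if r.internal_id == drop_id:
                r.internal_id = keep_id
                if r.domain in has_primary:
                    r.primary = False
                elif r.primary:
                    has_primary.add(r.domain)

    def acknowledge_delete(self, domain, external_id):
        """Drop a non-primary row once the other domain confirms the deletion."""
        self.rows = [r for r in self.rows
                     if not (r.domain == domain
                             and r.external_id == external_id
                             and not r.primary)]

table = MappingTable()
table.add("9caf78aac3d6", "D2", "ff487230b3a0")
table.add("273d36e30d09", "D2", "41206cc97c8b")
table.merge(keep_id="9caf78aac3d6", drop_id="273d36e30d09")
print(table.to_external("9caf78aac3d6", "D2"))
print(table.to_internal("D2", "41206cc97c8b"))
```

Because the demoted row stays in the table until `acknowledge_delete` is called, inbound messages that still carry the old external ID keep resolving correctly in the meantime.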

This scheme will scale up to multiple domains, because the "External Domain ID" column helps to keep track of which external ID is shared with which Domain. (Why don't we use just one external ID for an entity and share it with all external domains? Tight coupling again. Just as OAuth allows an access token given to a third party to be invalidated without affecting the access of other third parties, the use of separate external identifiers for different domains allows fine-grained control of identity federation.)

The scheme also allows the splitting of an entity into more than two entities, and the merging of more than two entities into a single one. (Any organisation with a web-facing application will tell you how many John Smiths there are who were born on 1 Jan 1970!)

This is a fairly long-winded explanation, but this is why we need to hide internal identifiers from other domains, and why mappings need to be managed internally in each domain. Such a data model also allows us to choose asynchronous protocols for propagation of identity events, since there is no consistency requirement to update multiple domains concurrently.

Regards,

Ganesh Prasad

On 9 August 2012 04:55, Kelly Grizzle <kelly.grizzle@sailpoint.com> wrote:
Thanks for the feedback, Ganesh.  I read through this and your InfoQ article (http://www.infoq.com/articles/scim-data-model-limitations) and have some thoughts.

> The rest of the protocol does not meaningfully use the enterprise client's identifier, the "external ID"
> at all, even though it was ostensibly introduced to make things friendlier for the client.

The usage pattern for an external ID would be to search for a user by externalId and use the ID of the returned user in any desired operation.  For example:

GET /Users?filter=externalId eq "bjensen"&attributes=id

{
  "totalResults": 1,
  "Resources": [
    {
      "id": "2819c223-7f76-453a-919d-413861904646"
    }
  ]
}

Retrieve the ID from the response and use it.

DELETE /Users/2819c223-7f76-453a-919d-413861904646

This does introduce an additional HTTP request if the client chooses not to store the server's id.  An issue was created to consider allowing operations to use the externalId (http://code.google.com/p/scim/issues/detail?id=35), but I believe the general consensus has been not to include this in the spec.  One main point of contention is that much of the rest of the spec (e.g., group membership references and manager references) requires knowledge of the server's identifier.  Continuing this discussion on the IETF list would be a good thing, though.
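
The lookup-then-operate pattern above can be sketched on the client side as two small helpers. This is only an illustration: the helper names are hypothetical, no HTTP request is actually sent, and the filter string is built naively, so an externalId that itself contains a double quote would need escaping that is not shown here.

```python
import json
from urllib.parse import quote

def lookup_url(external_id):
    """Build the search URL that returns only the server-assigned id."""
    filter_expr = f'externalId eq "{external_id}"'
    return "/Users?filter=" + quote(filter_expr, safe="") + "&attributes=id"

def resource_path_from_response(body):
    """Extract the server's id from the search response and build the resource path."""
    data = json.loads(body)
    if data["totalResults"] != 1:
        raise LookupError("expected exactly one user for this externalId")
    return "/Users/" + data["Resources"][0]["id"]

response_body = """
{
  "totalResults": 1,
  "Resources": [
    {"id": "2819c223-7f76-453a-919d-413861904646"}
  ]
}
"""

print("GET", lookup_url("bjensen"))
print("DELETE", resource_path_from_response(response_body))
```

A client that stores the server's id after the first lookup can skip the search on later operations.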


> the cloud provider's ID and the enterprise client's ID are both "Internal IDs" with respect to their domains

I think this comes down to a nomenclature problem.  The server's ID does not necessarily have to be the unique identifier that the underlying identity store uses, it just has to be stable and unique.  In many cases, the underlying identity store will provide identifiers with these properties already (eg - a uuid) and it can be used by the SCIM interface.  The "externalId" is referring to the fact that the id is maintained external to the SCIM server.  As long as the server's identifiers are stable and unique (which is mandated by the spec), I don't see a problem.


> The secret is that every value needs a key, and multi-valued attributes lack that. So our solution is quite
> simple - turn every list or array (of values) into a dictionary (of key-value pairs) by providing each value
> with a unique and meaning-free identifier.

I agree that this would be useful, especially in the PATCH operation.  One reason that this wasn't included in the spec originally is that it can put undue burden on the service provider.  Many service providers are putting SCIM interfaces in front of their existing identity stores (e.g., directory servers, SaaS application databases, etc.).  Many of these do not have a unique identifier for multi-valued attributes.  By requiring this, a majority of the service providers would have to start maintaining a unique key for each multi-valued attribute.  I believe this would be a roadblock for many implementers.


> When the SCIM protocol uses PATCH, there are areas where it seems a bit clumsy.

I like the thoughts here.  Your example reminds me of unified diffs (http://en.wikipedia.org/wiki/Diff#Unified_format), which are commonly used with a patch program (pretty much the equivalent of the PATCH verb).  However, the three proposals seem to largely hinge on being able to uniquely address each element within an object.  Without these it is not so easy to address each patch sub-operation (REPLACE, INCLUDE, etc...) or provide a multi-status.

The 207 response would be interesting to consider for the bulk endpoint (http://www.simplecloud.info/specs/draft-scim-api-00.html#bulk-resources), however.


> There are other, non-data aspects of SCIM which may require review, such as its synchronous request-response
> interaction model, which is a form of tight coupling and could prove to be a source of brittleness.

I agree that we should explore optional asynchronous requests in 2.0.

Thanks again for your thoughts.  I hope you stay involved in the discussion as work on SCIM 2.0 goes forward.

--Kelly

From: scim-bounces@ietf.org [mailto:scim-bounces@ietf.org] On Behalf Of Ganesh and Sashi Prasad
Sent: Wednesday, August 01, 2012 4:24 PM
To: scim@ietf.org
Subject: [scim] SCIM Protocol - 3 suggestions for improvement

(I posted this on the SCIM Google Group, and I was advised to subscribe to the mailing list and post it here instead, so here goes.)

Hi,

My name is Ganesh Prasad, and my experience in Identity and Access Management is mainly through a 3-year project at an Australian insurance company, an experience I have written about as an eBook on InfoQ (http://www.infoq.com/minibooks/Identity-Management-Shoestring).

I have been following the SCIM spec off and on, and based on my experience with a loosely-coupled architecture that I found to be successful, I have the following 3 suggestions to make.

1. The enterprise client and the cloud provider should maintain their own internal IDs for a resource, which they should not reveal to each other. Both of them should map their internal IDs to a shared External ID, and this is the only ID that should be exposed through the API. The current specification's provision of an id (which is the external ID and the only one to be transferred through the API) and an "external ID" (which is the client's internal ID and should be hidden) is diametrically opposite to this.

2. When dealing with multi-valued attributes of a resource (expressed as arrays in JSON), they must be converted from an array into a dictionary with unique keys (UUIDs generated by the cloud provider when the attribute is created). Without unique keys for every attribute value of a resource, manipulating it will be clumsy and inelegant.

3. The PATCH command can be improved in 3 significant ways:
3a. Leverage the fact (from 2 above) that every value has a key, to greatly simplify the API
3b. Use special verbs as nested operations of the PATCH command to add, modify and delete attributes at any level
3c. Use the WebDAV status code of "207 Multi-Status" instead of "200 OK" as the response to a PATCH (or BULK) command.

To elaborate,

1. Revealing private IDs externally is a form of tight coupling. A major requirement with Identity Management is to split (or merge) identities when false positives (or false negatives) are detected, i.e., when a resource is discovered to be more than one, or when multiple resources are detected to be the same. If internal identifiers are revealed to external domains, such clean-ups become difficult, hence every domain that wants to expose references to a resource must map its internal ID to an external one created for this explicit purpose, and only reveal this.

In the SCIM case, when an enterprise client POSTs a resource creation request, the cloud provider must generate its own internal UUID as well as an external UUID, map them together, and only return the external UUID in the "Location:" header. The enterprise client should map this external UUID to a newly-generated internal ID of its own. In case the resource already has an identifier within the enterprise client's domain, then this is the internal ID that must be mapped to the external UUID returned through the POST response.

2. If a resource is to be created, and one of its attributes is multi-valued, e.g.,

    "email-addrs" :
    [
        "john_smith@yahoo.com",
        "john.smith@gmail.com",
        "jsmith1970@hotmail.com"
    ]

then on successful creation, the server response should include the representation of the resource, and this attribute should look like this:

    "email-addrs" :
    [
        { "7dfcb444-74d8-4f17-aa66-daf9ea3bd902" : "john_smith@yahoo.com" },
        { "3bd10085-c474-43b9-9cda-8646c3085bbf" : "john.smith@gmail.com" },
        { "581da5c7-c6e1-4cca-9db7-7a6d1de664e1" : "jsmith1970@hotmail.com" }
    ]

The client now knows what each value is labelled. This now provides an unambiguous way to reference a value to add, modify and delete it:

Add:

POST /Users/2819c223-7f76-453a-919d-413861904646/email-addrs
value="js70@easy.com.au"

Modify:

PUT /Users/2819c223-7f76-453a-919d-413861904646/email-addrs/3bd10085-c474-43b9-9cda-8646c3085bbf
value="john.r.smith@gmail.com"

Delete:
DELETE /Users/2819c223-7f76-453a-919d-413861904646/email-addrs/581da5c7-c6e1-4cca-9db7-7a6d1de664e1

One can even delete all email addresses like this:
DELETE /Users/2819c223-7f76-453a-919d-413861904646/email-addrs

I believe this is more elegant than what the spec recommends.
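
To make the keyed-value idea concrete, here is a small in-memory sketch of a multi-valued attribute stored as a dictionary with server-generated keys. The class name `KeyedValues` and its methods are purely illustrative; they mirror the add, modify, delete and delete-all requests above but are not part of SCIM.

```python
import uuid

class KeyedValues:
    """A multi-valued attribute held as {generated key: value}."""

    def __init__(self, values=()):
        self.items = {}
        for value in values:
            self.add(value)

    def add(self, value):
        """Like POST on the collection: store the value under a new meaning-free key."""
        key = str(uuid.uuid4())
        self.items[key] = value
        return key

    def modify(self, key, value):
        """Like PUT on one value: the key must already exist."""
        if key not in self.items:
            raise KeyError(key)
        self.items[key] = value

    def delete(self, key):
        """Like DELETE on one value."""
        del self.items[key]

    def clear(self):
        """Like DELETE on the whole collection."""
        self.items.clear()

emails = KeyedValues(["john_smith@yahoo.com", "john.smith@gmail.com"])
new_key = emails.add("js70@easy.com.au")
emails.modify(new_key, "js1970@easy.com.au")
emails.delete(new_key)
print(len(emails.items))
```

Each operation names exactly one value by its key, so the client never has to resend or match on the value itself.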

3. It's possible to think of the operations POST, PUT and DELETE as nested operations inside a PATCH. PATCH itself need not be nested because its semantics apply throughout the "tree" of a resource.

However, the semantics of PUT are a little messy. Also, the use of HTTP verbs at a different level could be confusing. That's why I would recommend 6 separate verbs that are a little more unambiguous in their meaning:

1. INCLUDE (equivalent to POST): Add this resource to a collection and return a generated URI
2. PLACE (equivalent to one form of PUT): Add this resource at the location specified by the accompanying URI. (If there's already a value at that location, return an error status.)
3. REPLACE (equivalent to another form of PUT): Replace the value at the location specified by the accompanying URI with this value. (If there's no such URI, return an error status.)
4. FORCE (equivalent to a third form of PUT): This means PLACE or REPLACE. (At the end of this operation, we want the specified URI to hold the accompanying value whether the URI already existed or not.)
5. RETIRE (equivalent to DELETE): Delete, deactivate or otherwise render inaccessible the resource at the specified URI.
6. AMEND (equivalent to PATCH): (This verb is just listed for completeness. We probably don't need a nested PATCH since PATCH cascades to every level of the tree.)
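
The differences between PLACE, REPLACE, FORCE and RETIRE can be sketched against a flat dictionary of attributes. This is an assumption-laden illustration: the function name `apply_verb` is made up, the specific error codes (409 and 404) are my choice since the list above only says "return an error status", and INCLUDE and AMEND are left out because they need a collection and a nested patch respectively.

```python
def apply_verb(resource, verb, key, value=None):
    """Apply one nested verb to a dict of attributes and return a status string."""
    exists = key in resource
    if verb == "PLACE":
        # Only valid when nothing is stored at the key yet.
        if exists:
            return "409 Conflict"      # assumed error code
        resource[key] = value
        return "200 OK"
    if verb == "REPLACE":
        # Only valid when something is already stored at the key.
        if not exists:
            return "404 Not Found"     # assumed error code
        resource[key] = value
        return "200 OK"
    if verb == "FORCE":
        # PLACE or REPLACE, whichever applies.
        resource[key] = value
        return "200 OK"
    if verb == "RETIRE":
        if not exists:
            return "404 Not Found"     # assumed error code
        del resource[key]
        return "200 OK"
    raise ValueError(f"unsupported verb: {verb}")

user = {"first-name": "John"}
print(apply_verb(user, "REPLACE", "first-name", "Jack"))
print(apply_verb(user, "PLACE", "first-name", "Jim"))
print(apply_verb(user, "FORCE", "dob", "01-Jan-1971"))
print(user)
```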

A PATCH request could therefore look like this:

PATCH /Users/2819c223-7f76-453a-919d-413861904646 HTTP/1.1
Host: example.com
Accept: application/json
Authorization: Bearer h480djs93hd8
Content-length: ...

{
    REPLACE: {
        "key" : "first-name",
        "value" : "Jack"
    },
    PLACE : {
        "key" : "middle-name",
        "value" : "Richard"
    },
    FORCE : {
        "key" : "dob",
        "value" : "01-Jan-1971"
    },
    REPLACE : {
        "key" : "address.unit-number",
        "value" : "12"
    },
    PLACE : {
        "key" : "address.state",
        "value" : "SA"
    },
    FORCE : {
        "key" : "address.country",
        "value" : "Australia"
    },
    INCLUDE : {
        "key" : "email-addrs",
        "value" : "js70@easy.com.au"
    },
    REPLACE : {
        "key" : "email-addrs/3bd10085-c474-43b9-9cda-8646c3085bbf",
        "value" : "john.r.smith@gmail.com"
    },
    RETIRE : {
        "key" : "email-addrs/581da5c7-c6e1-4cca-9db7-7a6d1de664e1"
    }
}

The PATCH response should utilise the status code "207 Multi-Status" because the nested operations could have varying status codes. A sample response is below:

HTTP/1.1 207 Multi-Status
Content-Type: application/json
ETag: W/"b431af54f0671a2"
Location:"https://example.com/v1/Users/2819c223-7f76-453a-919d-413861904646"

{
    "schemas":["urn:scim:schemas:core:1.0"],
    "external-id":"2819c223-7f76-453a-919d-413861904646",
    REPLACE: {
        "status" : "200 OK",
        "key" : "first-name",
        "value" : "Jack"
    },
    PLACE : {
        "status" : "200 OK",
        "key" : "middle-name",
        "value" : "Richard"
    },
    FORCE : {
        "status" : "200 OK",
        "key" : "dob",
        "value" : "01-Jan-1971"
    },
    REPLACE : {
        "status" : "200 OK",
        "key" : "address.unit-number",
        "value" : "12"
    },
    PLACE : {
        "status" : "200 OK",
        "key" : "address.state",
        "value" : "SA"
    },
    FORCE : {
        "status" : "200 OK",
        "key" : "address.country",
        "value" : "Australia"
    },
    INCLUDE : {
        "status" : "201 Created",
        "key" : "email-addrs/11f664ec-898b-4f6f-8948-ecfda74deff0",
        "value" : "js70@easy.com.au"
    },
    REPLACE : {
        "status" : "200 OK",
        "key" : "email-addrs/3bd10085-c474-43b9-9cda-8646c3085bbf",
        "value" : "john.r.smith@gmail.com"
    },
    RETIRE : {
        "status" : "200 OK",
        "key" : "email-addrs/581da5c7-c6e1-4cca-9db7-7a6d1de664e1"
    },
    "meta": {
        "created":"2011-08-08T04:56:22Z",
        "lastModified":"2011-08-08T08:00:12Z",
        "location":"https://example.com/v1/Users/2819c223-7f76-453a-919d-413861904646",
        "version":"W\/\"b431af54f0671a2\""
    }
}

If there are errors, they will take the place of the "200 OK" or "201 Created" status codes in the above successful case. But the outer status will remain "207 Multi-Status".
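
A rough sketch of how a server might assemble such a response: every nested operation is attempted and reported individually, and the outer status stays 207 regardless. The names `run_patch` and `failed` are hypothetical, and `apply_one` is a stand-in for whatever actually executes a single verb. An ordered list of operations is used here instead of repeated object keys, which is my own simplification.

```python
def run_patch(operations, apply_one):
    """Apply each (verb, key, value) in order without aborting on a failed sub-operation.

    `apply_one` is a hypothetical callable that executes one operation and
    returns a status string such as "200 OK".
    """
    results = []
    for verb, key, value in operations:
        status = apply_one(verb, key, value)
        results.append({"verb": verb, "key": key, "status": status})
    return 207, results

def failed(results):
    """Pick out the sub-operations whose status is not in the 2xx range."""
    return [r for r in results if not r["status"].startswith("2")]

def demo_apply(verb, key, value):
    # Stand-in handler: pretend that retiring an unknown key fails.
    if verb == "RETIRE" and key == "email-addrs/unknown":
        return "404 Not Found"
    return "200 OK"

outer_status, results = run_patch(
    [
        ("REPLACE", "first-name", "Jack"),
        ("RETIRE", "email-addrs/unknown", None),
    ],
    demo_apply,
)
print(outer_status, failed(results))
```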

The same scheme can be used to deal with operations on members of a group, and for bulk operations.

I hope you find these suggestions useful.

I read the SCIM spec afresh last week and these ideas came flooding into my head because I have been working at another organisation (a telco) for the last 5 months, also in Identity and Access Management, and my thoughts have moved further along the direction of evolving a specialised data model based on specific principles, especially for IAM.

I am planning to write about this and also the data-related principles soon and am in negotiations with InfoQ regarding publication.

Regards,
Ganesh Prasad