Re: [scim] SCIM Synchronization Problem

Phil Hunt <phil.hunt@independentid.com> Tue, 24 August 2021 01:03 UTC

From: Phil Hunt <phil.hunt@independentid.com>
Message-Id: <439E06DD-213B-4C53-8BB7-EFF91ACB2FE9@independentid.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_07A371A5-0D65-4142-818A-A90183AD8BE8"
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Date: Mon, 23 Aug 2021 18:03:11 -0700
In-Reply-To: <MWHPR19MB0957723BC8FE476F68EBDEC6E1C49@MWHPR19MB0957.namprd19.prod.outlook.com>
Cc: Danny Mayer <mayer@pdmconsulting.net>, SCIM WG <scim@ietf.org>
To: "Matt Peterson (mpeterso)" <Matt.Peterson=40oneidentity.com@dmarc.ietf.org>
References: <e8f9d66c-f356-61b8-d38a-b5288fb9c518@pdmconsulting.net> <MWHPR19MB0957823E52255520BD4DB305E1C09@MWHPR19MB0957.namprd19.prod.outlook.com> <0f21255e-4ea9-b42d-353d-5c0661e1be5e@pdmconsulting.net> <MWHPR19MB0957723BC8FE476F68EBDEC6E1C49@MWHPR19MB0957.namprd19.prod.outlook.com>
X-Mailer: Apple Mail (2.3654.120.0.1.13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/scim/3Q-Wm0ib0DRRD2OpR8VxaehwOe0>
Subject: Re: [scim] SCIM Synchronization Problem

+1 Matt

Ironically, when developing the Security Event Token (SET) spec, we listed a hypothetical SCIM example in RFC8417. For example, Section 2.1.1 (https://datatracker.ietf.org/doc/html/rfc8417#section-2.1.1) shows a password reset event:

   {
     "iss": "https://scim.example.com",
     "iat": 1458496025,
     "jti": "3d0c3cf797584bd193bd0fb1bd4e7d30",
     "aud": [
       "https://jhub.example.com/Feeds/98d52461fa5bbc879593b7754",
       "https://jhub.example.com/Feeds/5d7604516b1d08641d7676ee7"
     ],
     "sub": "https://scim.example.com/Users/44f6142df96bd6ab61e7521d9",
     "events": {
       "urn:ietf:params:scim:event:passwordReset": {
         "id": "44f6142df96bd6ab61e7521d9"
       },
       "https://example.com/scim/event/passwordResetExt": {
         "resetAttempts": 5
       }
     }
   }

Notice that under the “events” attribute is a somewhat SCIM-like structure consisting of an event ID (urn:ietf:params:scim:event:passwordReset) and a payload (the id). The example also has an extension that provides additional data, which in this case is “resetAttempts”.
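
As a minimal sketch of how a receiver might walk that structure (Python, assuming the SET has already been validated and decoded into a dict): the function name split_events and the idea of telling core SCIM events from extensions by the URN prefix are illustrative assumptions, not something RFC8417 defines.

```python
import json

# Payload copied from the RFC8417 example above (signature handling omitted).
SET_PAYLOAD = json.loads("""
{
  "iss": "https://scim.example.com",
  "iat": 1458496025,
  "jti": "3d0c3cf797584bd193bd0fb1bd4e7d30",
  "aud": [
    "https://jhub.example.com/Feeds/98d52461fa5bbc879593b7754",
    "https://jhub.example.com/Feeds/5d7604516b1d08641d7676ee7"
  ],
  "sub": "https://scim.example.com/Users/44f6142df96bd6ab61e7521d9",
  "events": {
    "urn:ietf:params:scim:event:passwordReset": {
      "id": "44f6142df96bd6ab61e7521d9"
    },
    "https://example.com/scim/event/passwordResetExt": {
      "resetAttempts": 5
    }
  }
}
""")

# Prefix taken from the example; treating it as the "core SCIM event"
# marker is an assumption made for this sketch.
SCIM_EVENT_PREFIX = "urn:ietf:params:scim:event:"


def split_events(payload):
    """Return (scim_events, extensions) from the "events" claim.

    scim_events is keyed by the short event name, extensions by full URI.
    """
    scim_events, extensions = {}, {}
    for uri, body in payload.get("events", {}).items():
        if uri.startswith(SCIM_EVENT_PREFIX):
            scim_events[uri[len(SCIM_EVENT_PREFIX):]] = body
        else:
            extensions[uri] = body
    return scim_events, extensions


scim_events, extensions = split_events(SET_PAYLOAD)
print(scim_events)
print(extensions)
```

Each key under “events” is an event URI and each value is that event's payload, so a receiver can dispatch on the URI and ignore extensions it does not understand.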

We could also have a set of *simple* IDM events indicating CREATE, MODIFY, or DELETE, plus an id. The events don’t even have to contain data. The receiver could then simply do a GET on the resource, allowing it to reconcile against the resource’s current state. This lets the server’s access control restrictions apply to that read and keeps event payloads simple.
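
The notify-then-GET idea could be sketched roughly as follows (Python). Everything here is hypothetical: the event names are just the three listed above, apply_event is an invented helper, and the fetch callable stands in for an authenticated HTTP GET on the SCIM resource.

```python
from typing import Callable, Dict, Optional

Resource = Dict[str, object]


def apply_event(cache: Dict[str, Resource],
                event_type: str,
                resource_id: str,
                fetch: Callable[[str], Optional[Resource]]) -> None:
    """Update the local cache for one data-free notification.

    For CREATE/MODIFY the resource is re-read through `fetch`, so the
    server's access control decides what this receiver actually sees.
    """
    if event_type == "DELETE":
        cache.pop(resource_id, None)
        return
    if event_type in ("CREATE", "MODIFY"):
        current = fetch(resource_id)
        if current is None:
            # Already deleted, or not visible to this receiver.
            cache.pop(resource_id, None)
        else:
            cache[resource_id] = current
        return
    raise ValueError(f"unknown event type: {event_type}")


# Stand-in for the SCIM server; a real client would issue GET /Users/{id}.
server = {"u1": {"id": "u1", "userName": "alice"}}
cache: Dict[str, Resource] = {}

apply_event(cache, "CREATE", "u1", server.get)
server["u1"] = {"id": "u1", "userName": "alice2"}
apply_event(cache, "MODIFY", "u1", server.get)
after_modify = dict(cache)

del server["u1"]
apply_event(cache, "DELETE", "u1", server.get)
```

Because the receiver always re-reads the current state, a duplicated or out-of-order notification still converges on what the server holds.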

SET has two approved RFCs for exchanging events, namely SET PUSH (RFC8935) and POLLING (RFC8936).  

Polling mode is great when a management server is behind a firewall. It can use HTTP long polling in order to receive events in real time. Polling is also good because you can theoretically re-use the same credential to pick up events. Push, on the other hand, requires that a credential be placed on the publishing side to POST to the event receiver.
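
A rough sketch of one long-poll round trip (Python). The JSON field names (ack, returnImmediately, sets, moreAvailable) follow RFC8936; everything else is an assumption for illustration: poll_once and decode_unverified are invented names, the post callable stands in for an authenticated HTTP POST to the polling endpoint, and the SET signature is deliberately NOT verified here, which a real receiver must do.

```python
import base64
import json
from typing import Callable, List


def decode_unverified(set_jwt: str) -> dict:
    """Decode a SET's payload WITHOUT checking its signature (demo only)."""
    payload_b64 = set_jwt.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def poll_once(post: Callable[[dict], dict], pending_acks: List[str]) -> List[dict]:
    """Acknowledge previously received SETs and wait for new ones."""
    request = {
        "ack": list(pending_acks),
        # False asks for a long poll: the transmitter may hold the request
        # open until events are available.
        "returnImmediately": False,
    }
    response = post(request)
    pending_acks.clear()
    events = []
    for jti, set_jwt in response.get("sets", {}).items():
        events.append(decode_unverified(set_jwt))
        pending_acks.append(jti)  # acknowledged on the next poll
    return events


# --- Fake transmitter so the sketch runs without a network ---
def fake_jwt(payload: dict) -> str:
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(payload)}."


seen_requests: List[dict] = []


def fake_post(request: dict) -> dict:
    seen_requests.append(request)
    if len(seen_requests) == 1:
        token = fake_jwt({"jti": "jti-1", "events": {}})
        return {"sets": {"jti-1": token}, "moreAvailable": False}
    return {"sets": {}}


acks: List[str] = []
first = poll_once(fake_post, acks)
second = poll_once(fake_post, acks)
```

The receiver initiates every request, so nothing inbound has to be opened on the firewall, and the acknowledgement of one batch rides along with the poll for the next.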

Push ends up being preferred in situations where a single event publisher has many thousands of subscribers (this was the downfall of PubSubHubbub). I think for most SCIM cases, we are looking at more equal ratios of publishers to subscribers.

Phil Hunt
@independentid
phil.hunt@independentid.com




> On Aug 23, 2021, at 4:49 PM, Matt Peterson (mpeterso) <Matt.Peterson=40oneidentity.com@dmarc.ietf.org> wrote:
> 
> The management server in your scenario is a SCIM client.  
> 
> If an application administrator goes into the Application Server and adds, removes, or modifies an account, the Management Server has one option when using SCIM: query the Application Server (the SCIM server) for all accounts and then resolve the differences between this result and a previous one where all accounts were fetched. This is very inefficient. Especially when it comes to detecting deleted accounts, there is no other way in standard SCIM except to have your Management Server reload all the accounts from your Application Server.
> 
> What you've implemented in your Management Server *is* a pure client-side cache scenario as I described it. Except that you've recognized that the SCIM spec makes it very inefficient to keep your Management Server up to date with changes happening "out-of-band" on the Application Servers.
> 
> So... you implemented a notification system whereby your Management Server (SCIM Client) can be notified of changes that just happened on the Application Server (the SCIM server).
> 
> Your implementation, where the Application Server (SCIM server) notifies your Management Server (SCIM client) that a change has occurred *is* one of the proposed technical approaches identified by @Phil in a previous post to this list and was also mentioned in the BOF.  
> 
> I believe that an approach (similar to your implementation) whereby a SCIM client can subscribe to be notified of changes on the SCIM server is an intuitive approach for solving the "Synchronization problem".
> 
> --
> Matt
> 
> 
> -----Original Message-----
> From: Danny Mayer <mayer@pdmconsulting.net>
> Sent: Friday, August 20, 2021 8:46 AM
> To: Matt Peterson (mpeterso) <Matt.Peterson@oneidentity.com>; SCIM WG <scim@ietf.org>
> Subject: Re: [scim] SCIM Synchronization Problem
> 
> 
> It's not that simple. If an administrator goes in and adds a new account, or updates an existing account and adds permissions to it, how will the management side know? The way I did it was to have the application monitor for those changes and notify the management server of the changes. I did this via a message queue for our internal applications.
> The management server needs to decide how to handle these violations.
> 
> Danny
> 
> On 8/19/21 1:33 PM, Matt Peterson (mpeterso) wrote:
>> Thank you, Danny, for creating a new thread.  The following are my comments from meeting minutes (some edits):
>> 
>> There appear to be two use cases for "synchronization":
>> 
>>   1.  Constructing and enforcing Application authorization models -  An application (acting as a SCIM client) caches users/groups data from the IdP in order to present "user/group pickers" when presenting screens used to configure the application  authorization rules (RBAC, Policy, or ACL).  Also, when the application enforces authorization, user and group data needs to be immediately available so that authorization decisions can be made quickly.
>> 
>>   2.  Identity Management and Governance systems - implement a "canonical identity model" where all accounts and groups are represented.  This model is used to build provisioning rules and calculate separation of duty violations, attestations, approvals, etc.   Management-time evaluation of the model needs to be done efficiently without external calls to the SCIM service provider.
>> 
>> Both of the above use cases are "client-cache" use cases that need only a "one-way" sync (from the SCIM server to the SCIM client).   To accomplish this there are two distinct steps:  a) download the initial results (users/groups) and, b) keep the cached copy of these initial results (users/groups) up to date with changes that are being made on the SCIM service provider.
>> 
>> I think it could help us narrow down the list of suitable approaches if we could agree that "one-way" sync (keeping a client-side cache up to date) is our target.
>> 
>> --
>> Matt Peterson
>> 
>> -----Original Message-----
>> From: scim <scim-bounces@ietf.org> On Behalf Of Danny Mayer
>> Sent: Wednesday, August 18, 2021 8:34 AM
>> To: SCIM WG <scim@ietf.org>
>> Subject: [scim] SCIM Synchronization Problem
>> 
>> 
>> I decided that this needs its own thread and should not be part of the meeting minutes.
>> 
>> I have had a great deal of experience dealing with the user account synchronization problem. Here's my view of the problems.
>> 
>> I will be calling one system the Management Server and the other system the Application Server. I found client/server labels confusing. The Management Server is what I am defining to be the server that sends updates to add/update/remove users and groups to the Application Server, whose accounts, groups and access permissions are being managed.
>> 
>> First some definitions of user accounts. There is usually more than one of each of these:
>> 1. Builtin accounts
>> 2. Special-purpose accounts
>> 3. Employee
>> 4. Contractor
>> 5. Agent
>> 6. Customer
>> 
>> There may be more.
>> 
>> 1. Builtin accounts: These are accounts that applications have and there may be more than one. There is always an admin account which can do anything, for example the administrator account in Active Directory or a database admin account. The application may have more accounts for other purposes.
>> 
>> 2. Special-purpose accounts: These may be set up to provide access to other applications. For example, a SCIM request to a SCIM REST API should be handled by a special account which cannot be used to log in via a UI and is only able to perform certain functions. In addition there may be accounts set up to listen for topics or queues on a message queue, among other possibilities. Keeping separate accounts like this is important for tracking in logs and applications.
>> 
>> 3. Employee: These are accounts that employees may use to log in to the application.
>> 
>> 4. Contractors: These are accounts that a contractor performing work for the company may use to log in to an application. Unlike Employee accounts these would have an expiration date.
>> 
>> 5. Agent: Accounts like this are for external users who may need to manage information for their own customers. An example of this is an insurance agent logging in to handle an insurance policy for their clients.
>> 
>> 6. Customers: These are accounts for customers who use the application directly. For a bank it's likely to be millions of customers. The management platform should not be involved in managing these accounts.
>> 
>> Let's now look at a few example applications.
>> 
>> 1. Helpdesk
>> All employees and contractors will need to be able to log into a helpdesk application and enter tickets. This means loading information about all employees and contractors. For a company with only 1000 employees that's manageable. For a company with 100K employees, it's a bigger challenge.
>> 
>> 2. Customer Support
>> Only employees or contractors in the department providing customer support need access plus a few other employees. In addition identified customers may need accounts.
>> 
>> 3. Expenses
>> Not all employees or contractors will be submitting expenses so it may not be necessary to have accounts for all possible users. This is something that the application owner needs to decide.
>> 
>> Now let's look at logistics.
>> 
>> Bulk load:
>> Each application will need an initial set of accounts set up, and for something like a helpdesk this could involve loading 1000-100,000 accounts.
>> The information needed could come from either the management server or separately, say from an HR system. Many servers that I have encountered limit the number of records returned to something like 1000, so pagination is required for this. Even when dealing with a limited subset of employees or contractors you can run into this need.
>> 
>> Synchronization
>> An application that is bulk-loaded above may need to be synchronized to the management server if the data did not come from the management server.
>> 
>> Change Management
>> This is really a synchronization issue as well. Changes happen all the time: new employees/contractors need to be added, terminated ones removed, and existing ones updated. The best way of dealing with this may be to set up a message queue that each application can subscribe to, so they can take the needed action when it's convenient for that application. It's not the only method but it's the one I found to be the most helpful. There are two ways of doing that. The first is to send the complete user information for new accounts, just the change for updated accounts, and the ID for terminated accounts, along with some meta information. The other method, which I have used, is just to send the ID and whether it's new, updated or terminated.
>> 
>> I hope this is helpful to the discussion.
>> 
>> Danny
>> 
>> 
>> _______________________________________________
>> scim mailing list
>> scim@ietf.org
>> https://www.ietf.org/mailman/listinfo/scim
>> 
> 