Re: [scim] SCIM Synchronization Problem

Phillip Hunt <phil.hunt@independentid.com> Tue, 24 August 2021 16:10 UTC

From: Phillip Hunt <phil.hunt@independentid.com>
Mime-Version: 1.0 (1.0)
Date: Tue, 24 Aug 2021 09:10:19 -0700
Message-Id: <77332A5A-F740-4191-BCFF-137CA2B378BD@independentid.com>
References: <6062e0f5-736f-b3e6-79b9-6002c72a06fa@pdmconsulting.net>
Cc: craigmcc@gmail.com, "Matt Peterson (mpeterso)" <Matt.Peterson=40oneidentity.com@dmarc.ietf.org>, SCIM WG <scim@ietf.org>
In-Reply-To: <6062e0f5-736f-b3e6-79b9-6002c72a06fa@pdmconsulting.net>
To: Danny Mayer <mayer@pdmconsulting.net>
X-Mailer: iPhone Mail (18G82)
Archived-At: <https://mailarchive.ietf.org/arch/msg/scim/bsFI9W0EmFtBozbKf2zEVvj6T9Q>
Subject: Re: [scim] SCIM Synchronization Problem

So a requirement here is to indicate which schema is of interest for sync. Any notification method needs to be filterable on at least schema ID and potentially on specific attributes. E.g., a receiver may only be interested in events related to roles.
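
A minimal sketch of that filtering requirement, under stated assumptions: `ChangeEvent` and `Subscription` are hypothetical names (no SCIM RFC defines a notification format), and `lastLogin` is a made-up attribute used only for contrast. The User schema URN and its `roles` attribute are from RFC 7643.

```python
from dataclasses import dataclass

# Real SCIM core schema URN (RFC 7643); "roles" is one of its attributes.
USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change notification; not defined by any SCIM RFC."""
    resource_id: str
    schema_id: str
    changed_attributes: frozenset


@dataclass(frozen=True)
class Subscription:
    """Hypothetical receiver-side filter on schema ID and attributes."""
    schema_ids: frozenset
    attributes: frozenset = frozenset()  # empty means "any attribute"

    def matches(self, event: ChangeEvent) -> bool:
        # First filter on schema ID, then (optionally) on attribute names.
        if event.schema_id not in self.schema_ids:
            return False
        if not self.attributes:
            return True
        return bool(self.attributes & event.changed_attributes)


roles_only = Subscription(frozenset({USER_SCHEMA}), frozenset({"roles"}))
role_change = ChangeEvent("2819c223", USER_SCHEMA, frozenset({"roles"}))
# "lastLogin" is a hypothetical attribute, not part of the core schema.
login_change = ChangeEvent("2819c223", USER_SCHEMA, frozenset({"lastLogin"}))

print(roles_only.matches(role_change))   # True
print(roles_only.matches(login_change))  # False
```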

This also shows why preconditioned HEAD queries couldn’t be used. These types of queries can ask: does this URL exist? Has this resource changed since this ETag or date? But any change to content changes the hash and modification dates, so a login-timestamp update looks the same as a change the receiver actually cares about.
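
To illustrate the ETag point, here is a small simulation with no network calls. It assumes a server that derives the ETag from a hash of the whole resource body (one common choice, not something SCIM mandates), and `lastLogin` is a hypothetical attribute standing in for session data stored on the user.

```python
import hashlib
import json


def etag(resource: dict) -> str:
    """Assumed server behavior: ETag is a hash of the full resource body."""
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def head_status(resource: dict, if_none_match: str) -> int:
    """Status for a HEAD request carrying If-None-Match: 304 if unchanged."""
    return 304 if etag(resource) == if_none_match else 200


user = {
    "id": "2819c223",
    "userName": "bjensen",
    "roles": ["admin"],
    "lastLogin": "2021-08-23T21:00:00Z",  # hypothetical attribute
}
cached = etag(user)
unchanged_status = head_status(user, cached)

# Only the login timestamp changes; nothing the receiver cares about.
user["lastLogin"] = "2021-08-24T09:10:00Z"
after_login_status = head_status(user, cached)

print(unchanged_status, after_login_status)  # 304 200
```

The receiver sees 200 and has to fetch the resource, even though the roles it is interested in did not change.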

Obviously one of the downsides to attaching session information to the user resource is that it drives up the user's change rate. Why not have separate session resources point to users? Their change rate and lifecycle are much different from those of a user. A best practice would be to keep this data separate from the user.
IOW -> how much do we need to accommodate in sync?  Should we be having a separate session schema discussion?
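
A rough sketch of what separate session resources could look like. Everything here is an assumption for discussion: there is no session schema in SCIM today, and the `Session` class, its fields, and the integer `version` counter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: str
    user_name: str
    version: int = 1  # stands in for meta.version / ETag


@dataclass
class Session:
    """Hypothetical session resource that points at a user."""
    id: str
    user_ref: str  # reference to the user, e.g. "/Users/<id>"
    active: bool = True
    version: int = 1


def log_out(session: Session) -> None:
    # Only the session resource changes; the user is untouched.
    session.active = False
    session.version += 1


user = User(id="2819c223", user_name="bjensen")
session = Session(id="s-1", user_ref=f"/Users/{user.id}")
log_out(session)

print(user.version, session.version)  # 1 2
```

With this split, a receiver syncing users sees no change on logon or logoff, and anyone who needs session state can subscribe to sessions separately.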

Phil

> On Aug 24, 2021, at 8:49 AM, Danny Mayer <mayer@pdmconsulting.net> wrote:
> 
> 
> I've been there and done that. We were lucky enough to have a way to differentiate between someone logging in and out and when an update is made to something that the SCIM client cares about. We preferred to send a notification to a queue that something has changed for the specific ID and application and leave it up to the "SCIM client" to query the application for the given ID. This had to be a manual operation as the SCIM client had no way of reading the queue. The queue was necessary for compliance and audit purposes so it wasn't a waste.
> 
> Danny
> 
> On 8/23/21 9:43 PM, Craig McClanahan wrote:
>> Several years ago (early in the lifetime of the SCIM specification), I was involved in exactly this kind of situation.  I was trying to extract the Identity portion of a monolith out into a separate service.  Naturally, as you'll see in lots of monoliths, the User database information was used for *lots* of things, not just authentication and authorization, so we couldn't just remove it from the monolith's database.
>> 
>> From a SCIM terminology perspective, life was actually pretty clear -- the monolith was the SCIM server (the source of truth), and the new Identity Service was the SCIM client.  Creating the initial download mechanism was no big deal -- one humongous batch transfer (or done in pieces if need be) and the Identity Service is up to date with the initial snapshot.  But what happens when:
>> - User changes their name (or a bunch of other profile fields)?  Not security sensitive, but definitely UI sensitive.  If they are part of the data transferred from the SCIM server to the SCIM client, it's definitely relevant.
>> - User logs on or logs off?  (Groan ... the monolith cared a lot about this, because it affects UI like "is this person currently logged in" ... even worse, the monolith updated a field in the actual user database row, which triggered a gazillion "user change" events (originally internal to the monolith) documenting a change that had no other impact than to change the last logged in timestamp ... sigh ... performance ... sigh.)
>> - User changes their password?  We probably could have absorbed that function, but would have required modifications to a bunch of apps out of my team's purview, so we didn't.
>> - For that matter, *any* change happens between when the snapshot was taken, and the Identity Service caught up.
>> Anyone who has ever tried this trick has most likely run into the same kinds of issues -- how do you deal with incremental changes that need to be communicated from the server to the client?  Some of those changes (passwords, ACLs, account deactivations, etc) are very much time critical.
>> 
>> Our team chose to try to use webhooks from the monolith (SCIM server) back to the Identity Service (SCIM client), totally out-of-band to anything defined by SCIM.  This would have worked OK if the webhook technology was actually reliable and incorporated things like guaranteed forwarding of the change event messages.  It doesn't work very well, of course, when the Identity Service hasn't fully processed the initial snapshot yet but receives a realtime update from the monolith.  In retrospect, this approach was probably a mistake -- a different messaging technology would have been better for a narrow point-to-point requirement like this, but would not have addressed all of the issues.  And what about a more broad-based notification requirement?
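
One possible way to handle updates that arrive before the snapshot is processed, sketched under assumptions and not what the team above did: buffer events until the snapshot is loaded, then replay them, keeping whichever copy has the higher version. It assumes each event carries the full resource plus a per-resource, monotonically increasing `version`; `IdentityCache` is a hypothetical name.

```python
class IdentityCache:
    """Hypothetical client-side cache that tolerates early events."""

    def __init__(self):
        self.users = {}
        self.pending = []
        self.snapshot_loaded = False

    def on_event(self, event: dict) -> None:
        # Events that arrive during the initial load are held back.
        if not self.snapshot_loaded:
            self.pending.append(event)
        else:
            self._apply(event)

    def load_snapshot(self, snapshot: list) -> None:
        for user in snapshot:
            self.users[user["id"]] = user
        self.snapshot_loaded = True
        # Replay buffered events; stale ones lose the version comparison.
        for event in self.pending:
            self._apply(event)
        self.pending.clear()

    def _apply(self, event: dict) -> None:
        current = self.users.get(event["id"])
        if current is None or event["version"] > current["version"]:
            self.users[event["id"]] = event


cache = IdentityCache()
cache.on_event({"id": "u1", "version": 2, "displayName": "New Name"})
cache.load_snapshot([{"id": "u1", "version": 1, "displayName": "Old Name"}])
print(cache.users["u1"]["displayName"])  # New Name
```

This does nothing about lost webhook deliveries, which still need a reliable transport or periodic reconciliation.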
>> 
>> Could we have turned the whole thing around, and made the new Identity Service the SCIM server, and the monolith the SCIM client?  I suppose, but it would have required the Identity Service to be involved in a gazillion things that were not authentication or authorization related, and the need for some sort of out-of-band "incremental change" event notifications would have remained, just going in the other direction.
>> 
>> For the SCIM specification, it's the "incremental change" thing that is the hard nut to crack, IMHO.  But it's actually a bigger problem than that, endemic to any scenario where you are trying to tear apart a monolith.
>> 
>> Craig McClanahan
>> 
>> 
>> 
>> On Mon, Aug 23, 2021 at 4:30 PM Matt Peterson (mpeterso) <Matt.Peterson=40oneidentity.com@dmarc.ietf.org> wrote:
>>> Thanks for the clarification.  
>>> 
>>> "Client" and "Server" are used dozens of times in the existing RFCs, which makes them as useful in understanding the SCIM protocol as they are for understanding the HTTP protocol.  In a SCIM protocol exchange there is a thing that provides a resource (the server) and a thing that makes an HTTP request (GET, POST, PUT, PATCH, DELETE) on the resource (the client).
>>> 
>>> I think it would be helpful to try and use this generally accepted HTTP terminology if we can -- especially when talking about this synchronization topic.  
>>> 
>>> Management server <-- SCIM Client
>>> Application Server <-- SCIM Server
>>> 
>>> With this clarification, seems to me like your Management Server is using SCIM to manage accounts on the Application Server, and that the Management Server would benefit from having an up-to-date cache of accounts that are on the Application Server(s) even for cases where account changes/additions/deletions are made "out of band"  (e.g. directly on the Application Server, not initiated by the Management server.)
>>> 
>>> Did I understand correctly?
>>> 
>>> 
>>> 
>>> -----Original Message-----
>>> From: Danny Mayer <mayer@pdmconsulting.net> 
>>> Sent: Friday, August 20, 2021 8:54 AM
>>> To: Matt Peterson (mpeterso) <Matt.Peterson@oneidentity.com>; SCIM WG <scim@ietf.org>
>>> Subject: Re: [scim] SCIM Synchronization Problem
>>> 
>>> 
>>> I decided to avoid the client/server naming convention because I find it confusing. The system responsible for managing the user accounts and groups I have declared to be the Management Server. I believe you describe this as the "SCIM Client"; it issues GET/POST/PATCH, etc. to the Application Server to maintain the user accounts and groups. The Application Server is what I believe you call the "SCIM Server". It provides the SCIM APIs needed.
>>> 
>>> I find the use of client/server not helpful to understand their role in the protocol.
>>> 
>>> Danny
>>> 
>>> On 8/19/21 1:42 PM, Matt Peterson (mpeterso) wrote:
>>> > Danny,
>>> >
>>> > To help me understand which component in your post is the "SCIM Client" and which is the "SCIM Server", can you tell me which of your components (the Management Server or the Application Server) implements SCIM endpoints?
>>> >
>>> > The component that acts as a "SCIM Server" is the component that provides the Web API endpoints that conform to the SCIM spec, for example the /Users and /Groups SCIM endpoints.
>>> >
>>> > The component(s) that use these endpoints to query (GET) users/groups or to create (POST) users/groups is the "SCIM client".
>>> >
>>> > --
>>> > Matt
>>> >
>>> > -----Original Message-----
>>> > From: scim <scim-bounces@ietf.org> On Behalf Of Danny Mayer
>>> > Sent: Wednesday, August 18, 2021 8:34 AM
>>> > To: SCIM WG <scim@ietf.org>
>>> > Subject: [scim] SCIM Synchronization Problem
>>> >
>>> >
>>> > I decided that this needs its own thread and should not be part of the meeting minutes.
>>> >
>>> > I have had a great deal of experience dealing with the user account synchronization problem. Here's my view of the problems.
>>> >
>>> > I will be calling one system Management Server and the other system Application Server. I found client/server labels confusing. The Management Server is what I am defining to be the server that sends updates to add/update/remove users and groups to the Application Server, whose accounts, groups and access permissions are being managed.
>>> >
>>> > First some definitions of user accounts. There is usually more than one of each of these:
>>> > 1. Builtin accounts
>>> > 2. Special-purpose accounts
>>> > 3. Employee
>>> > 4. Contractor
>>> > 5. Agent
>>> > 6. Customer
>>> >
>>> > There may be more.
>>> >
>>> > 1. Builtin accounts: These are accounts that applications have and there may be more than one. There is always an admin account which can do anything, for example the administrator account in Active Directory or a database admin account. The application may have more accounts for other purposes.
>>> >
>>> > 2. Special-purpose accounts: These may be set up to provide access to other applications. For example, a SCIM request to a SCIM REST API should be handled by a special account which cannot be used to log in via a UI and can only perform certain functions. In addition there may be accounts set up to listen for topics or queues on a message queue, among other possibilities. Keeping separate accounts like this is important for tracking in logs and applications.
>>> >
>>> > 3. Employee: These are accounts that employees use to log in to the application.
>>> >
>>> > 4. Contractors: These are accounts that a contractor performing work for the company may use to log in to an application. Unlike Employee accounts these would have an expiration date.
>>> >
>>> > 5. Agent: Accounts like this are for external users who may need to manage information for their own customers. An example of this is an insurance agent logging in to handle an insurance policy for their clients.
>>> >
>>> > 6. Customers: These are where the customers are using the application directly. For a bank it's likely to be millions of customers. The management platform should not be involved in managing these accounts.
>>> >
>>> > Let's now look at a few example applications.
>>> >
>>> > 1. Helpdesk
>>> > All employees and contractors will need to be able to log into a helpdesk application and enter tickets. This means loading information about all employees and contractors. For a company with only 1000 employees that's manageable. For a company with 100K employees, it's a bigger challenge.
>>> >
>>> > 2. Customer Support
>>> > Only employees or contractors in the department providing customer support need access plus a few other employees. In addition identified customers may need accounts.
>>> >
>>> > 3. Expenses
>>> > Not all employees or contractors will be submitting expenses so it may not be necessary to have accounts for all possible users. This is something that the application owner needs to decide.
>>> >
>>> > Now let's look at logistics.
>>> >
>>> > Bulk load:
>>> > Each application will need an initial set of accounts set up, and for something like a helpdesk this could involve loading 1,000-100,000 accounts.
>>> > The information needed could come from either the management server or separately, say from an HR system. Many servers that I have encountered limit the number of records per request to something like 1000, so pagination is required for this. Even when dealing with a limited subset of employees or contractors you can run into this need.
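
A sketch of paging through a large account set using the SCIM list parameters `startIndex` and `count` and the `totalResults` / `Resources` response fields (RFC 7644). The `fetch_page` callable and the in-memory fake server are assumptions standing in for a real HTTP call.

```python
def fetch_all(fetch_page, page_size=1000):
    """Yield every resource, requesting one page at a time.

    fetch_page(start_index, count) is a hypothetical callable returning a
    SCIM ListResponse-shaped dict. startIndex is 1-based in SCIM.
    """
    start = 1
    while True:
        page = fetch_page(start, page_size)
        resources = page.get("Resources", [])
        yield from resources
        # Advance by what was actually returned; the server may cap count.
        start += len(resources)
        if not resources or start > page["totalResults"]:
            break


def make_fake_server(total, max_page=1000):
    """In-memory stand-in for a server that caps results per request."""
    users = [{"id": str(i)} for i in range(total)]

    def fetch_page(start_index, count):
        count = min(count, max_page)
        chunk = users[start_index - 1 : start_index - 1 + count]
        return {
            "totalResults": total,
            "startIndex": start_index,
            "itemsPerPage": len(chunk),
            "Resources": chunk,
        }

    return fetch_page


loaded = list(fetch_all(make_fake_server(2500), page_size=5000))
print(len(loaded))  # 2500
```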
>>> >
>>> > Synchronization
>>> > An application that is bulk-loaded above may need to be synchronized to the management server if the data did not come from the management server.
>>> >
>>> > Change Management
>>> > This is really a synchronization issue as well. Changes happen all the time: new employees/contractors need to be added, terminated ones removed, and existing accounts updated. The best way of dealing with this may be to set up a message queue that each application can subscribe to, so that each can take the needed action when it's convenient for that application. It's not the only method but it's the one I found to be the most helpful. There are two ways of doing that. The first is to send the complete user information for new accounts, just the change for updated accounts, and the ID for terminated accounts, along with some meta information. The other method, which I have used, is just to send the ID and whether it's new, updated or terminated.
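
The two message styles can be sketched as follows. The field names (`op`, `id`, `user`, `changes`) and the `lookup` callable are hypothetical; the original describes the content of the messages but not a wire format.

```python
def detailed_message(op, user_id, data=None):
    """Style 1: the message carries the data the receiver needs."""
    if op == "new":
        return {"op": op, "id": user_id, "user": data}      # complete user
    if op == "updated":
        return {"op": op, "id": user_id, "changes": data}   # only the delta
    if op == "terminated":
        return {"op": op, "id": user_id}                    # ID is enough
    raise ValueError(f"unknown operation: {op}")


def id_only_message(op, user_id):
    """Style 2: only the ID and what kind of change happened."""
    return {"op": op, "id": user_id}


def handle_id_only(message, lookup, store):
    """Receiver for style 2: fetch current state by ID when convenient."""
    if message["op"] == "terminated":
        store.pop(message["id"], None)
    else:
        store[message["id"]] = lookup(message["id"])


# Usage with an in-memory stand-in for the source of truth.
source = {"42": {"id": "42", "userName": "dmayer"}}
local = {}
handle_id_only(id_only_message("new", "42"), source.get, local)
print(local["42"]["userName"])  # dmayer
handle_id_only(id_only_message("terminated", "42"), source.get, local)
print(local)  # {}
```

The ID-only style keeps user data out of the queue and always yields current state, at the cost of one extra lookup per message.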
>>> >
>>> > I hope this is helpful to the discussion.
>>> >
>>> > Danny
>>> >
>>> >
>>> > _______________________________________________
>>> > scim mailing list
>>> > scim@ietf.org
>>> > https://www.ietf.org/mailman/listinfo/scim
>>> >
>>> 