Re: [scim] SCIM Synchronization Problem

Danny Mayer <mayer@pdmconsulting.net> Tue, 24 August 2021 15:48 UTC

To: craigmcc@gmail.com, "Matt Peterson (mpeterso)" <Matt.Peterson=40oneidentity.com@dmarc.ietf.org>
Cc: SCIM WG <scim@ietf.org>
References: <e8f9d66c-f356-61b8-d38a-b5288fb9c518@pdmconsulting.net> <MWHPR19MB095771672B28345FA22D6399E1C09@MWHPR19MB0957.namprd19.prod.outlook.com> <10cb88b4-b115-8927-921c-1bde137940fa@pdmconsulting.net> <MWHPR19MB0957D8533A439A99C4E52C97E1C49@MWHPR19MB0957.namprd19.prod.outlook.com> <CANgkmLBneyCYFK4Hn0CG4pXDKW5Nfm+Z77dbnhqrGpotaM8EkQ@mail.gmail.com>
From: Danny Mayer <mayer@pdmconsulting.net>
Message-ID: <6062e0f5-736f-b3e6-79b9-6002c72a06fa@pdmconsulting.net>
Date: Tue, 24 Aug 2021 11:48:12 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/scim/H7vMOurfr2lyXxxK5ns9Rw7l3EA>
Subject: Re: [scim] SCIM Synchronization Problem

I've been there and done that. We were lucky enough to have a way to 
differentiate between someone logging in and out and when an update is 
made to something that the SCIM client cares about. We preferred to send 
a notification to a queue that something has changed for the specific ID 
and application and leave it up to the "SCIM client" to query the 
application for the given ID. This had to be a manual operation as the 
SCIM client had no way of reading the queue. The queue was necessary for 
compliance and audit purposes so it wasn't a waste.
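
A rough Python sketch of the arrangement described above, offered as an illustration only. The names (`RELEVANT_FIELDS`, `ChangeNotice`, `notices_for_change`, `drain`) and the attribute list are hypothetical. `drain` stands in for the lookup step, which the message above describes as manual because the SCIM client could not read the queue.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical set of attributes the SCIM client cares about.
RELEVANT_FIELDS = {"userName", "displayName", "emails", "active"}


@dataclass(frozen=True)
class ChangeNotice:
    """ID-only notification: says that a record changed, not what changed."""
    application: str
    user_id: str


def notices_for_change(application, user_id, changed_fields):
    """Return a notice only if a field the SCIM client cares about changed."""
    if RELEVANT_FIELDS & set(changed_fields):
        return [ChangeNotice(application, user_id)]
    return []  # e.g. a login/logout that only touched lastLogin


def drain(queue, fetch_user):
    """Consumer side: look up the current record for each notified ID."""
    refreshed = {}
    while queue:
        notice = queue.popleft()
        key = (notice.application, notice.user_id)
        refreshed[key] = fetch_user(notice.application, notice.user_id)
    return refreshed


# Stand-in for the application's user store (assumed data).
users = {
    ("helpdesk", "u1"): {
        "displayName": "Pat Example",
        "lastLogin": "2021-08-24T15:00:00Z",
    }
}

queue = deque()
queue.extend(notices_for_change("helpdesk", "u1", ["lastLogin"]))    # ignored
queue.extend(notices_for_change("helpdesk", "u1", ["displayName"]))  # queued
result = drain(queue, lambda app, uid: users[(app, uid)])
```

Because the notice carries only the ID and the application, the consumer always reads the current state from the application rather than trusting a payload that may already be stale.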

Danny

On 8/23/21 9:43 PM, Craig McClanahan wrote:
> Several years ago (early in the lifetime of the SCIM specification), I 
> was involved in exactly this kind of situation.  I was trying to 
> extract the Identity portion of a monolith out into a separate 
> service.  Naturally, as you'll see in lots of monoliths, the User 
> database information was used for *lots* of things, not just 
> authentication and authorization, so we couldn't just remove it from 
> the monolith's database.
>
> From a SCIM terminology perspective, life was actually pretty clear -- 
> the monolith was the SCIM server (the source of truth), and the new 
> Identity Service was the SCIM client.  Creating the initial download 
> mechanism was no big deal -- one humongous batch transfer (or done in 
> pieces if need be) and the Identity Service is up to date with the 
> initial snapshot.  But what happens when:
>
>   * User changes their name (or a bunch of other profile fields)?  Not
>     security sensitive, but definitely UI sensitive.  If they are part
>     of the data transferred from the SCIM server to the SCIM client,
>     it's definitely relevant.
>   * User logs on or logs off?  (Groan ... the monolith cared a lot
>     about this, because it affects UI like "is this person currently
>     logged in" ... even worse, the monolith updated a field in the
>     actual user database row, which triggered a gazillion "user
>     change" events (originally internal to the monolith) documenting a
>     change that had no other impact than to change the last logged in
>     timestamp ... sigh ... performance ... sigh.)
>   * User changes their password?  We probably could have absorbed that
>     function, but would have required modifications to a bunch of apps
>     out of my team's purview, so we didn't.
>   * For that matter, *any* change happens between when the snapshot
>     was taken and when the Identity Service caught up.
>
> Anyone who has ever tried this trick has most likely run into the same 
> kinds of issues -- how do you deal with incremental changes that need 
> to be communicated from the server to the client?  Some of those 
> changes (passwords, ACLs, account deactivations, etc) are very much 
> time critical.
>
> Our team chose to try to use webhooks from the monolith (SCIM server) 
> back to the Identity Service (SCIM client), totally out-of-band to 
> anything defined by SCIM.  This would have worked OK if the webhook 
> technology was actually reliable and incorporated things like 
> guaranteed forwarding of the change event messages.  It doesn't work 
> very well, of course, when the Identity Service hasn't fully processed 
> the initial snapshot yet but receives a realtime update from the 
> monolith.  In retrospect, this approach was probably a mistake -- a 
> different messaging technology would have been better for a narrow 
> point-to-point requirement like this, but would not have addressed all 
> of the issues.  And what about a more broad-based notification 
> requirement?
>
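
To make the snapshot/update race above concrete, here is a minimal Python sketch of one common mitigation: hold back change events that arrive while the snapshot is still loading, then replay them in arrival order. This is not something the thread says was implemented. The class `IdentityCache`, the event shape (`op`, `id`, `attrs`), and the record layout are assumptions made for the example. It also assumes events arrive in order and none are lost, which is exactly the guarantee the webhooks in this account lacked.

```python
class IdentityCache:
    """Applies a snapshot, holding back change events that arrive early."""

    def __init__(self):
        self.users = {}
        self.snapshot_done = False
        self.pending = []  # events received before the snapshot finished

    def on_event(self, event):
        """Handle one change event from the source of truth."""
        if not self.snapshot_done:
            self.pending.append(event)  # too early: keep it for later
            return
        self._apply(event)

    def load_snapshot(self, records):
        """Load the initial bulk transfer, then replay buffered events."""
        for record in records:
            self.users[record["id"]] = record
        self.snapshot_done = True
        pending, self.pending = self.pending, []
        for event in pending:  # replay in arrival order
            self._apply(event)

    def _apply(self, event):
        if event["op"] == "delete":
            self.users.pop(event["id"], None)
        else:  # "upsert": merge changed attributes into the stored record
            current = self.users.get(event["id"], {"id": event["id"]})
            self.users[event["id"]] = {**current, **event["attrs"]}


cache = IdentityCache()
# A realtime update arrives before the snapshot has been processed.
cache.on_event({"op": "upsert", "id": "u1", "attrs": {"displayName": "New Name"}})
cache.load_snapshot([
    {"id": "u1", "displayName": "Old Name"},
    {"id": "u2", "displayName": "Other User"},
])
```

After the replay, `u1` carries the newer display name instead of the snapshot value. A more robust version would compare a version number or timestamp per record rather than rely on arrival order.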
> Could we have turned the whole thing around, and made the new Identity 
> Service the SCIM server, and the monolith the SCIM client?  I suppose, 
> but it would have required the Identity Service to be involved in a 
> gazillion things that were not authentication or authorization 
> related, and the need for some sort of out-of-band "incremental 
> change" event notifications would have remained, just going in the 
> other direction.
>
> For the SCIM specification, it's the "incremental change" thing that
> is the hard nut to crack, IMHO.  But it's actually a bigger problem 
> than that, endemic to any scenario where you are trying to tear apart 
> a monolith.
>
> Craig McClanahan
>
>
>
> On Mon, Aug 23, 2021 at 4:30 PM Matt Peterson (mpeterso)
> <Matt.Peterson=40oneidentity.com@dmarc.ietf.org> wrote:
>
>     Thanks for the clarification.
>
>     "Client" and "Server" are used dozens of times in the existing
>     RFCs, which makes them as useful for understanding the SCIM
>     protocol as they are for understanding the HTTP protocol.  In a
>     SCIM protocol exchange there is a thing that provides a resource
>     (the server) and a thing that makes an HTTP request
>     (GET, POST, PUT, PATCH, DELETE) on the resource (the client).
>
>     I think it would be helpful to try and use this generally accepted
>     HTTP terminology if we can -- especially when talking about this
>     synchronization topic.
>
>     Management server <-- SCIM Client
>     Application Server <-- SCIM Server
>
>     With this clarification, it seems to me that your Management Server
>     is using SCIM to manage accounts on the Application Server, and
>     that the Management Server would benefit from having an up-to-date
>     cache of accounts that are on the Application Server(s) even for
>     cases where account changes/additions/deletions are made "out of
>     band"  (e.g. directly on the Application Server, not initiated by
>     the Management server.)
>
>     Did I understand correctly?
>
>
>
>     -----Original Message-----
>     From: Danny Mayer <mayer@pdmconsulting.net>
>     Sent: Friday, August 20, 2021 8:54 AM
>     To: Matt Peterson (mpeterso) <Matt.Peterson@oneidentity.com>;
>     SCIM WG <scim@ietf.org>
>     Subject: Re: [scim] SCIM Synchronization Problem
>
>
>
>     I decided to avoid the client/server naming convention because I
>     find it confusing. The system responsible for managing the user
>     accounts and groups I have declared to be the Management server. I
>     believe you describe this as the "SCIM Client"; it is used to
>     GET/POST/PATCH, etc. to the Application server to maintain the
>     user accounts and groups. The Application server is what I believe
>     you call the "SCIM Server". This provides the SCIM APIs needed.
>
>     I do not find the client/server labels helpful for understanding
>     the roles of the two systems in the protocol.
>
>     Danny
>
>     On 8/19/21 1:42 PM, Matt Peterson (mpeterso) wrote:
>     > Danny,
>     >
>     > To help me understand which component in your post is the "SCIM
>     > Client" and which is the "SCIM Server", can you tell me which of
>     > your components (the Management Server or the Application Server)
>     > implements SCIM endpoints?
>     >
>     > The component that acts as a "SCIM Server" is the component that
>     > provides the Web API endpoints that conform to the SCIM spec, for
>     > example the /Users and /Groups SCIM endpoints.
>     >
>     > The component(s) using these endpoints to query (GET)
>     > users/groups or to create (POST) users/groups is the "SCIM client".
>     >
>     > --
>     > Matt
>     >
>     > -----Original Message-----
>     > From: scim <scim-bounces@ietf.org> On Behalf Of Danny Mayer
>     > Sent: Wednesday, August 18, 2021 8:34 AM
>     > To: SCIM WG <scim@ietf.org>
>     > Subject: [scim] SCIM Synchronization Problem
>     >
>     >
>     >
>     > I decided that this needs its own thread and should not be part
>     > of the meeting minutes.
>     >
>     > I have had a great deal of experience dealing with the user
>     account synchronization problem. Here's my view of the problems.
>     >
>     > I will be calling one system the Management Server and the other
>     > the Application Server, because I found the client/server labels
>     > confusing. The Management Server is the server that sends requests
>     > to add, update, or remove users and groups. The Application Server
>     > is the server that receives them and whose accounts, groups, and
>     > access permissions are being managed.
>     >
>     > First some definitions of user accounts. There is usually more
>     > than one of each of these:
>     > 1. Builtin accounts
>     > 2. Special-purpose accounts
>     > 3. Employee
>     > 4. Contractor
>     > 5. Agent
>     > 6. Customer
>     >
>     > There may be more.
>     >
>     > 1. Builtin accounts: These are accounts that applications have
>     and there may be more than one. There is always an admin account
>     which can do anything, for example the administrator account in
>     Active Directory or a database admin account. The application may
>     have more accounts for other purposes.
>     >
>     > 2. Special-purpose accounts: These may be set up to provide
>     > access to other applications. For example, a SCIM request to a SCIM
>     > REST API should be handled by a special account that cannot be
>     > used to log in via a UI and can only perform certain functions. In
>     > addition, there may be accounts set up to listen for topics or
>     > queues on a message queue, among other possibilities. Keeping
>     > separate accounts like this is important for tracking in logs and
>     > applications.
>     >
>     > 3. Employee: These are accounts that employees use to log in to
>     > the application.
>     >
>     > 4. Contractor: These are accounts that a contractor performing
>     > work for the company may use to log in to an application. Unlike
>     > Employee accounts, these would have an expiration date.
>     >
>     > 5. Agent: Accounts like this are for external users who may need
>     to manage information for their own customers. An example of this
>     is an insurance agent logging in to handle an insurance policy for
>     their clients.
>     >
>     > 6. Customer: These are accounts for customers who use the
>     > application directly. For a bank there are likely to be millions
>     > of them. The management platform should not be involved in
>     > managing these accounts.
>     >
>     > Let's now look at a few example applications.
>     >
>     > 1. Helpdesk
>     > All employees and contractors will need to be able to log into a
>     helpdesk application and enter tickets. This means loading
>     information about all employees and contractors. For a company
>     with only 1000 employees that's manageable. For a company with
>     100K employees, it's a bigger challenge.
>     >
>     > 2. Customer Support
>     > Only employees or contractors in the department providing
>     customer support need access plus a few other employees. In
>     addition identified customers may need accounts.
>     >
>     > 3. Expenses
>     > Not all employees or contractors will be submitting expenses so
>     it may not be necessary to have accounts for all possible users.
>     This is something that the application owner needs to decide.
>     >
>     > Now let's look at logistics.
>     >
>     > Bulk load:
>     > Each application will need an initial set of accounts set up, and
>     > for something like a helpdesk this could involve loading 1,000 to
>     > 100,000 accounts.
>     > The information needed could come either from the management
>     > server or from a separate source, say an HR system. Many servers
>     > that I have encountered limit the number of records returned per
>     > request to something like 1000, so pagination is needed for this.
>     > Even when dealing with a limited subset of employees or
>     > contractors you can run into this need.
>     >
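
As a sketch of the pagination point above: SCIM list queries take `startIndex` (1-based) and `count` parameters, and the list response reports `totalResults` alongside `Resources`. The Python below walks pages until everything has been collected. The helper names `fetch_all` and `make_fake_server` are hypothetical, and the fake server is an in-memory stand-in for a real `GET /Users?startIndex=..&count=..` call, with an assumed per-request cap of 1000 records.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect every resource by walking startIndex/count pages."""
    resources = []
    start_index = 1  # SCIM pagination is 1-based
    while True:
        page = fetch_page(start_index, page_size)
        batch = page.get("Resources", [])
        resources.extend(batch)
        # Advance by what the server actually returned, not what was asked.
        start_index += len(batch)
        if not batch or start_index > page["totalResults"]:
            return resources


def make_fake_server(total, server_limit):
    """In-memory stand-in for a SCIM server that caps the page size."""
    users = [{"id": str(n)} for n in range(1, total + 1)]

    def fetch_page(start_index, count):
        count = min(count, server_limit)  # server may return fewer than asked
        chunk = users[start_index - 1 : start_index - 1 + count]
        return {
            "totalResults": total,
            "startIndex": start_index,
            "itemsPerPage": len(chunk),
            "Resources": chunk,
        }

    return fetch_page


# The client asks for 5000 per page, but the server only hands out 1000.
all_users = fetch_all(make_fake_server(total=2500, server_limit=1000), page_size=5000)
```

Advancing by the number of records actually returned, rather than by the requested `count`, is what lets the loop cope with a server-side limit the client does not know in advance.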
>     > Synchronization
>     > An application that is bulk-loaded above may need to be
>     synchronized to the management server if the data did not come
>     from the management server.
>     >
>     > Change Management
>     > This is really a synchronization issue as well. Changes happen
>     > all the time: new employees/contractors need to be added,
>     > terminated ones removed, and existing accounts updated. The best
>     > way of dealing with this may be to set up a message queue that
>     > each application can subscribe to, so that it can take the needed
>     > action when it's convenient for that application. It's not the
>     > only method but it's the one I found to be the most helpful. There
>     > are two ways of doing that. The first is to send the complete user
>     > information for new accounts, just the change for updated
>     > accounts, and the ID for terminated accounts, along with some meta
>     > information. The other method, which I have used, is to send just
>     > the ID and whether it's new, updated or terminated.
>     >
>     > I hope this is helpful to the discussion.
>     >
>     > Danny
>     >
>     >
>     > _______________________________________________
>     > scim mailing list
>     > scim@ietf.org
>     > https://www.ietf.org/mailman/listinfo/scim
>     >
>
>     _______________________________________________
>     scim mailing list
>     scim@ietf.org
>     https://www.ietf.org/mailman/listinfo/scim
>