Re: [scim] Charter discussion item: What are the use cases for having a SCIM cursors

Phillip Hunt <phil.hunt@independentid.com> Wed, 07 July 2021 19:38 UTC

From: Phillip Hunt <phil.hunt@independentid.com>
Message-Id: <AA5823E7-E936-4152-8D0C-852CDC8A55D6@independentid.com>
Date: Wed, 07 Jul 2021 12:38:11 -0700
In-Reply-To: <MWHPR19MB095795E416B23D8A6486F494E11A9@MWHPR19MB0957.namprd19.prod.outlook.com>
Cc: SCIM WG <scim@ietf.org>
To: "Matt Peterson (mpeterso)" <Matt.Peterson@oneidentity.com>
References: <MWHPR19MB095795E416B23D8A6486F494E11A9@MWHPR19MB0957.namprd19.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/scim/gw0oNQYmbyr_HyAJRPSz57Ju6nw>
Subject: Re: [scim] Charter discussion item: What are the use cases for having a SCIM cursors

Matt

Thanks for the info. I think there are a number of parts to this.

1.  Client requirements

Why do SCIM clients want/need paging? What are the cases that drive the standard spec design (as opposed to how to implement it)? Do clients want change detection? Data set reconciliation? Metadirectory functions? UI rendering? What are clients trying to accomplish?

2. Flow-through cursors

In some cases SCIM servers act as gateways and may be constrained by sources that won’t release the same data sets. I think there has been an argument that if SCIM gateways can pass a cursor through, then there is no performance cost to the SCIM server, even though there may be a large cost to the database.

Questions:
* Is the SCIM result set equal to the DB result set? To pass the cursor through, the mapping has to be 1:1 or you will get varying result sets. Things that might cause differences: SCIM filter processing differences (the final set is a subset of that returned from the DB), and access control differences, because SCIM enforcement is different from that of the data source.
* Given the varying underlying cursor designs, is it realistic to try to create a standardized layer? For example, some cursors are an index used to retrieve the next page. Others are identifiers that remain the same from call to call, while the client changes the page number forwards and backwards. Is it realistic to standardize this? Can we realistically virtualize it in the SCIM GW layer so it always works the same for a prospective SCIM client?
* In a cluster, what happens when one node receives a cursor issued by a different SCIM cluster node? Will the underlying database allow the cursor to be passed from a different node?
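
To make the 1:1 concern concrete, here is a minimal Python sketch, assuming a hypothetical backend that serves fixed-size pages with an opaque "next" token, and a gateway that passes that token straight through while applying its own filter (standing in for SCIM filter or access-control differences). The names RECORDS, backend_page and gateway_page are illustrative only.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical backend: 10 records served 4 per page. Its opaque "next"
# token is just the offset as a string; records with i % 3 == 0 are inactive.
RECORDS = [{"id": str(i), "active": i % 3 != 0} for i in range(10)]
PAGE_SIZE = 4


def backend_page(token: Optional[str]) -> tuple[list[dict], Optional[str]]:
    start = int(token) if token else 0
    page = RECORDS[start:start + PAGE_SIZE]
    nxt = start + PAGE_SIZE
    return page, (str(nxt) if nxt < len(RECORDS) else None)


def gateway_page(
    token: Optional[str], scim_filter: Callable[[dict], bool]
) -> tuple[list[dict], Optional[str]]:
    # Flow-through: the backend token goes back to the SCIM client unchanged,
    # but the gateway drops records that its own filter or ACL rejects.
    page, nxt = backend_page(token)
    return [r for r in page if scim_filter(r)], nxt


if __name__ == "__main__":
    token = None
    while True:
        page, token = gateway_page(token, lambda r: r["active"])
        print(len(page), [r["id"] for r in page])
        if token is None:
            break
```

Run as a script, this prints pages of 2, 3 and 1 records even though the backend page size is 4. A client cannot rely on the page size, and if the gateway tried to refill short pages it would no longer be handing back the backend's own cursor.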

3. SCIM server generated cursors

The concern in the prior WG was that most underlying APIs did not support paging at all. Paging would require a SCIM gateway to parse all results and *hold* state (a copy of the entire result), creating its own cursor. For large result sets, holding them in memory for a long period of time does not scale beyond a couple of clients. In a cluster, does this create additional problems if subsequent requests don’t come back to the same server instance?
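
A minimal Python sketch of what a gateway-generated cursor implies, assuming a hypothetical GatewayNode class (not from any spec or product) that copies the full backend result, keys it by a random cursor id, and keeps it in node-local memory until it is read out or expires:

```python
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    items: list[dict]        # full copy of the result set held by this node
    position: int = 0
    expires_at: float = 0.0


@dataclass
class GatewayNode:
    """Hypothetical SCIM gateway node that mints its own cursors."""
    ttl_seconds: float = 300.0
    snapshots: dict[str, Snapshot] = field(default_factory=dict)

    def start_query(self, all_results: list[dict], count: int) -> tuple[list[dict], str | None]:
        # The backend has no paging, so the whole result set is copied and
        # held until the client finishes reading it or the cursor expires.
        cursor = uuid.uuid4().hex
        self.snapshots[cursor] = Snapshot(
            list(all_results), 0, time.monotonic() + self.ttl_seconds
        )
        return self.next_page(cursor, count)

    def next_page(self, cursor: str, count: int) -> tuple[list[dict], str | None]:
        snap = self.snapshots.get(cursor)
        if snap is None or time.monotonic() > snap.expires_at:
            # Unknown on this node: expired, or minted by another cluster node.
            self.snapshots.pop(cursor, None)
            raise KeyError("invalid or expired cursor")
        page = snap.items[snap.position:snap.position + count]
        snap.position += len(page)
        if snap.position >= len(snap.items):
            del self.snapshots[cursor]  # release memory once fully read
            return page, None
        return page, cursor
```

Every open cursor pins a full copy of its result set, and a follow-up request routed to a different node (a second GatewayNode instance) raises KeyError unless the snapshots are moved into shared storage or requests are pinned to one node.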

I think we need to understand:
* the reasons why clients want/use paging (cursors or not). If clients are polling for changes, are ETags being used properly?
* as alluded to above, what the opportunity is for universality in a single standard cursor model (flow-through or virtualized in the SCIM gateway). There are many good solutions, but can SCIM standardize something that isn’t standard underneath?
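
For the ETag question, a minimal Python sketch of RFC 7232 conditional revalidation of a single resource, simulated in-process rather than over HTTP. The hash-derived ETag and the ClientCache class are illustrative assumptions; a real SCIM server reports its entity tag in meta.version (RFC 7643) and the client would send it in an If-None-Match header on GET /Users/{id}.

```python
from __future__ import annotations

import hashlib
import json
from typing import Dict, Optional, Tuple


def etag_for(resource: dict) -> str:
    # Illustrative only: a weak ETag derived from a hash of the JSON content.
    body = json.dumps(resource, sort_keys=True).encode()
    return 'W/"%s"' % hashlib.sha256(body).hexdigest()[:16]


def _opaque(tag: str) -> str:
    # Weak comparison (required for If-None-Match) ignores the W/ prefix.
    return tag[2:] if tag.startswith("W/") else tag


def conditional_get(
    resource: dict, if_none_match: Optional[str]
) -> Tuple[int, Optional[dict], str]:
    """Server side of a GET on one resource honouring If-None-Match."""
    tag = etag_for(resource)
    if if_none_match is not None:
        candidates = [t.strip() for t in if_none_match.split(",")]
        if "*" in candidates or _opaque(tag) in {_opaque(t) for t in candidates}:
            return 304, None, tag  # Not Modified: no body is sent
    return 200, resource, tag


class ClientCache:
    """Hypothetical client-side cache that revalidates with ETags."""

    def __init__(self) -> None:
        self.entries: Dict[str, Tuple[str, dict]] = {}  # id -> (etag, resource)

    def refresh(self, server_resource: dict) -> int:
        # server_resource stands in for the server's current copy of the resource.
        rid = server_resource["id"]
        cached = self.entries.get(rid)
        status, body, tag = conditional_get(server_resource, cached[0] if cached else None)
        if status == 200 and body is not None:
            self.entries[rid] = (tag, body)
        return status
```

This revalidates resources the client already knows about, one request per resource; on its own it does not reveal resources that were added to or deleted from a collection.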

Phil

> On Jul 7, 2021, at 11:47 AM, Matt Peterson (mpeterso) <Matt.Peterson@oneidentity.com> wrote:
> 
> 
> Phil,
>  
> In order to implement index/offset pagination in SCIM on top of an API or database that uses cursor pagination, the SCIM Service Provider must fetch the entire result set (i.e. iterate on the cursor to completion), then store the full result set in order to serve (via SCIM) a specific indexed page. This resource-intensive translation between cursor pagination (the backend protocol/database) and SCIM index pagination was a showstopper for our implementation, which uses a low-cost “serverless” (AWS Lambda, Azure Functions) architecture.
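
A minimal Python sketch of that translation, assuming a hypothetical backend call fetch(cursor) -> (items, next_cursor). It follows the RFC 7644 index-paging rules (startIndex is 1-based, values below 1 are treated as 1, a negative count as 0) and shows why the whole backend result must be drained before any single page can be served:

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical backend call: takes the previous cursor (None for the first
# page) and returns (items, next_cursor); next_cursor is None at the end.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def drain(fetch: FetchPage) -> Iterator[dict]:
    """Iterate the backend cursor to completion."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch(cursor)
        yield from items
        if cursor is None:
            return


def scim_list_response(fetch: FetchPage, start_index: int, count: int) -> dict:
    # To serve any single indexed page, the whole backend result set has to
    # be fetched and held first, since the backend cannot seek by offset.
    all_items = list(drain(fetch))
    start = max(start_index, 1) - 1
    page = all_items[start:start + max(count, 0)]
    return {
        "schemas": ["urn:ietf:params:scim:api:messages:2.0:ListResponse"],
        "totalResults": len(all_items),
        "startIndex": start + 1,
        "itemsPerPage": len(page),
        "Resources": page,
    }
```

Even a request for one small page early in the list costs a full pass over the backend cursor, which is the per-request expense that makes this approach a poor fit for short-lived serverless functions.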
>  
> We have implemented a “front end” SCIM Service Provider for all of the “back-end” services in this list: https://www.cloud.oneidentity.com/products/connect/connectors and we found that the majority of the web protocols that our SCIM Service Provider calls on the “back end” use cursor pagination. We would only have been able to support a fraction of these applications without implementing the scim-cursor-pagination draft.
>  
> I recognize that my use case of a SCIM Service provider with dozens of “backends” is not common.  However, the frequency with which we encounter cursor pagination in existing APIs and protocols seems like an important consideration for the SCIM workgroup.  For example, one does not even need to venture farther than adding a SCIM interface on top of LDAP to encounter the pagination translation problem described above.
>  
> As for why SCIM clients need to query the SCIM Service Provider to obtain large result sets, here is what we have encountered:
>  
> 1. Constructing and enforcing Application authorization models - An application (acting as a SCIM client) downloads users/groups from an IdP in order to present views to application administrators that are used to create RBAC, Policy and ACL authorization rules. Enforcement-time authorization decisions need to be made quickly, so authorization-time calls to the SCIM service provider are not viable, which is why many applications maintain a “sync’d” application-side cache of users and groups.
> 2. Identity Management and Governance systems – These use a “canonical” identity model where all accounts and groups are represented. This model is used to create provisioning rules and calculate separation of duty violations, attestations, approvals, etc. Management-time evaluation of the model needs to be done efficiently without blocking calls to connected systems.
>  
> In both 1 and 2 (above) there is an *initial load* of objects into the SCIM client’s data model, followed by ongoing use of SCIM to keep the client’s data model in sync with changes on the SCIM Service Provider.
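
A minimal Python sketch of such an initial load using RFC 7644 index paging (startIndex/count, reading totalResults from each ListResponse). The get_page transport is a hypothetical stand-in for GET /Users?startIndex=...&count=...; index-based pages are not a stable snapshot, so keying the cache by id absorbs duplicates if the set shifts during the load but cannot recover a resource that was skipped.

```python
from typing import Callable, Dict

# Hypothetical transport: (startIndex, count) -> parsed ListResponse JSON,
# e.g. the body of GET /Users?startIndex=1&count=100.
GetPage = Callable[[int, int], dict]


def initial_load(get_page: GetPage, count: int = 100) -> Dict[str, dict]:
    """Load every resource into a client cache keyed by id."""
    cache: Dict[str, dict] = {}
    start_index = 1  # SCIM startIndex is 1-based
    while True:
        resp = get_page(start_index, count)
        resources = resp.get("Resources", [])
        for r in resources:
            # Keying by id absorbs duplicates if the set shifts between requests.
            cache[r["id"]] = r
        start_index += len(resources)
        if not resources or start_index > resp.get("totalResults", 0):
            return cache
```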
>  
> Pagination is desirable for the initial load of all objects.  However, the subsequent up-to-date maintenance of a client’s copy (cache) of the results is not necessarily a good use case for pagination.  For example, the need to re-request all objects in order to detect a deleted object is something that could be made much more efficient and should be addressed separately from the pagination topic.  
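
To illustrate the deletion-detection cost, a minimal Python sketch that compares a client cache with a full re-listing of the collection. The reconcile function is hypothetical; it assumes each resource carries meta.version, as SCIM resources normally do. Additions and changes could in principle be found incrementally, but the deleted set is only known after every current id has been seen:

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def _version(resource: dict) -> Optional[str]:
    return resource.get("meta", {}).get("version")


def reconcile(
    cache: Dict[str, dict], current: Iterable[dict]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Compare a cached copy with a full re-listing of the collection.

    Returns (added, changed, deleted) ids. "deleted" can only be computed
    once every current id has been seen, i.e. after re-reading everything.
    """
    latest = {r["id"]: r for r in current}
    added = set(latest) - set(cache)
    deleted = set(cache) - set(latest)
    changed = {
        rid
        for rid in set(latest) & set(cache)
        if _version(latest[rid]) != _version(cache[rid])
    }
    return added, changed, deleted
```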
>  
> I have posted to this mailing list about “Maintaining SCIM client-side cache consistency with SCIMv2”.  See post with this same subject: (https://mailarchive.ietf.org/arch/msg/scim/P5DVTqWLmVKqyvD0dgUszADQZrE/).  There were no public responses to this thread, but I still believe that this is a VERY common pattern.   I did receive a good private response from Phil Hunt who suggested use of “ETags and RFC7232 Http Conditional requests” – an approach definitely worth discussing. 
>  
> As feedback to our charter draft, I believe that there is enough interest in two distinct areas to merit mention in the charter:
>  
> 1. “Pagination of large data sets” - including interest in pagination of multi-valued attributes. Yes, I think that object pagination and attribute pagination could be considered in the same topic. Group membership is the primary use case for large multi-valued attributes. I believe the group membership use cases can be addressed with object pagination and best practice filters (see https://mailarchive.ietf.org/arch/msg/scim/oZ3DcI15AOvA519rH5sRtJOJV2A). Another alternative is Dale Old’s suggestion this morning of a “GroupMembers collection” (which would allow use of object pagination for group members and, probably, easier detection of group membership changes).
> 2. “Maintaining SCIM client-side cache consistency with SCIMv2” (https://mailarchive.ietf.org/arch/msg/scim/P5DVTqWLmVKqyvD0dgUszADQZrE/).
>  
> --
> Matt Peterson
> matt.peterson@oneidentity.com
>  
> From: scim <scim-bounces@ietf.org> On Behalf Of Phil Hunt
> Sent: Wednesday, July 7, 2021 10:17 AM
> To: SCIM WG <scim@ietf.org>
> Subject: [scim] Charter discussion item: What are the use cases for having a SCIM cursors
>  
> As promised on the call today, this is to follow up and ask for use cases for people who want to use cursors. I think this would be helpful to understand in order to drive to a common purpose and solution.
>  
> For example, today I think I heard:
> * Some clients want to confirm resources have been deleted. Why does this come about? SCIM already returns a definitive success/fail upon HTTP DELETE. Is it a case of coordination between multiple clients?
>  
> * Is it used to reconcile (e.g. meta-directory style) between disparately managed systems periodically?
>  
> * Other reasons?
>  
> Can the use of cursors be confined to “specialized” clients, where a cursor might be considered a special “privilege”? IOW, would you allow JavaScript UI components to use cursors against your SCIM server?
>  
> Phil Hunt
> @independentid
> phil.hunt@independentid.com
