Re: [GNAP] DID as sub_id or assertion?

Justin Richer <jricher@mit.edu> Wed, 10 March 2021 19:18 UTC

From: Justin Richer <jricher@mit.edu>
Message-Id: <B3A02C1B-5DF6-46AE-B806-8DBBF5F6B701@mit.edu>
Content-Type: multipart/alternative; boundary="Apple-Mail=_7062E67D-0005-480C-8BDD-4E3B8DC62412"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.4\))
Date: Wed, 10 Mar 2021 14:18:13 -0500
In-Reply-To: <CAM8feuQ5Q1LrGtniCH3WN5gyf6QhBa-9e+2kzaV0fxzA5D5m7w@mail.gmail.com>
Cc: GNAP Mailing List <txauth@ietf.org>
To: Fabien Imbault <fabien.imbault@gmail.com>
References: <CAM8feuQ5Q1LrGtniCH3WN5gyf6QhBa-9e+2kzaV0fxzA5D5m7w@mail.gmail.com>
X-Mailer: Apple Mail (2.3608.120.23.2.4)
Archived-At: <https://mailarchive.ietf.org/arch/msg/txauth/j-kLkYkcXuW71kA3kjkAF9hS7-E>
Subject: Re: [GNAP] DID as sub_id or assertion?

Hi Fabien,

First, thanks for doing the heavy lifting of writing this topic up. Of these options, I would strongly prefer option (1). The difference between assertions and identifiers is that an assertion is a fully packaged set of attributes, one of which can be an identifier. Assertions are, generally speaking, cryptographically protected bundles. They’re designed to carry a bunch of different information about a user and the authentication event in a form that’s independently verifiable and auditable. SAML assertions (XML documents) and ID Tokens (signed JWTs) are the most common ones we see today, but other formats exist and will keep being invented.
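
To make that distinction concrete, here is a minimal sketch in Python. The class names, field names, and sample values are all assumptions made up for illustration; none of them come from GNAP, SAML, or OIDC, and the `signature` field is only a stand-in for real cryptographic protection.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Identifier:
    """One statement naming a party; only meaningful in the issuer's context."""
    issuer: str  # who is doing the identifying (hypothetical field)
    value: str   # e.g. an email address or an opaque account key


@dataclass(frozen=True)
class Assertion:
    """A packaged bundle of attributes about a user and an authentication event."""
    issuer: str
    attributes: Dict[str, str]  # may include an identifier among other claims
    signature: str              # stand-in only; no real signature is computed here

    def identifier(self, claim: str = "sub") -> Optional[Identifier]:
        """Pull the identifier out of the bundle, if the bundle carries one."""
        value = self.attributes.get(claim)
        return Identifier(self.issuer, value) if value is not None else None


# Hypothetical ID-Token-like bundle: the identifier is just one attribute of it.
id_token_like = Assertion(
    issuer="https://as.example",
    attributes={"sub": "XUT2MFM1XBIKJKSDU8QM", "auth_time": "1615403893"},
    signature="placeholder-not-a-real-signature",
)
extracted = id_token_like.identifier()
```

The point of the sketch is the direction of containment: an assertion can carry an identifier, but an identifier on its own carries nothing to verify.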

An identifier is a different kind of creature. It is a single statement that identifies the party in question in the context of the party doing the identification. The receiver of the identifier needs that full context to interpret it. When using a federated identity protocol, each RP is going to have :some: kind of local user information store that they’re going to tie into, some field in a database they’ll key off of. It’s just never going to be the case that all RPs will be able to use server-provided opaque identifiers for all cases. Account recovery and binding is one such use case where a user-facing identifier like an email address makes a LOT of sense, for one small example. The subject identifiers field is an array in the request and response precisely because a user could be known by several different identifiers to a party. An RP will have a better idea about what kinds of things it wants to correlate against, and it’s up to the AS to provide that mapping. I think that if we don’t define a way to describe the format, the “as_id” field is going to get structure back-patched onto it, as is being proposed here:

https://mattrglobal.github.io/oidc-portable-identities/

I’m personally not a fan of that approach as it mixes the data layering too much, and you end up making guesses about whether a field is meaningful or a pointer. I think GNAP will be much better served by using a data structure for this from the start. DIDs should be another “format” of the subject identifier. I don’t think GNAP should define this format, and I’ve dropped a note to the SECEVENT mailing list asking about adding DIDs to the subject identifiers draft before it gets published. DIDs are not assertions, even in the cases where they point to fully functional DID documents with user information in them. (And for what it’s worth, I think the OIDC world should also use the SECEVENT draft inside ID Tokens to solve the use case of the draft above.)
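
As a rough illustration of why a tagged data structure helps, the Python sketch below has an RP walk an array of subject identifiers and key off an explicit `format` member, skipping anything it does not understand instead of guessing. The member names, the `did`/`url` pairing, the account store, and the sample values are all assumptions for the example; the actual names are up to the SECEVENT draft.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical local user store at the RP, keyed by (format, value).
LOCAL_ACCOUNTS: Dict[Tuple[str, str], str] = {
    ("email", "user@example.com"): "account-17",
    ("did", "did:example:123456"): "account-17",
}

# Which member carries the value for each format this RP understands.
# These pairings are assumptions for illustration, not quoted from any draft.
VALUE_MEMBER: Dict[str, str] = {"email": "email", "opaque": "id", "did": "url"}


def resolve(sub_ids: List[dict]) -> Optional[str]:
    """Return the first local account matching an identifier the RP understands."""
    for sub_id in sub_ids:
        fmt = sub_id.get("format")
        member = VALUE_MEMBER.get(fmt)
        if member is None or member not in sub_id:
            continue  # unknown or malformed format: skip it rather than guess
        account = LOCAL_ACCOUNTS.get((fmt, sub_id[member]))
        if account is not None:
            return account
    return None


# The AS may return several identifiers; the RP picks what it can correlate.
response_sub_ids = [
    {"format": "made-up-format", "blob": "???"},
    {"format": "did", "url": "did:example:123456"},
]
matched = resolve(response_sub_ids)
```

Because every entry declares its own format, the RP never has to inspect a bare string to work out whether it is a meaningful value or a pointer to something else.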

For the user != RO cases, I think that potentially having a way to signal those two items separately in the request and response could be useful. Right now all the user/subject stuff is about identifying “the user that’s currently here at the client instance”, in a variety of forms that the client instance and AS can negotiate. If the client or AS has some other notion about who the RO is and how that could be separate from the current user, we could provide a way to signal that like you’re proposing below. In my mental model, there are going to be a lot of use cases where the “RO” isn’t known to the client at all, but rather to the AS and is based on what the client’s asking for. So I could identify the current user and ask for a specific medical record, and the AS realizes it has to go bug a particular doctor to approve things directly. This is instead of interacting with the current user and failing in an awkward way when the current user doesn’t have the rights to do things, but someone else would. But we’d want the system to be able to detect if the current user is the doctor, so we can just interact with them directly. Does that all track?
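
Here is one way that decision could look at the AS, as a small Python sketch. Everything in it is hypothetical: the policy table, the party names, and the "direct"/"async" labels are only there to illustrate the medical record example and are not proposed protocol fields.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Decision:
    interact_with: str  # the party the AS should ask for approval
    mode: str           # "direct" if that party is the current user, else "async"


# Hypothetical AS policy: which party can approve access to each resource.
RESOURCE_OWNERS: Dict[str, str] = {"medical-record-42": "dr-alice"}


def choose_interaction(current_user: str, resource: str) -> Decision:
    """Decide whom the AS should interact with for the requested resource."""
    owner = RESOURCE_OWNERS[resource]
    if owner == current_user:
        # The current user is the RO, so interact with them right away.
        return Decision(owner, "direct")
    # Someone else holds the rights: reach out to them instead of failing
    # awkwardly in front of the current user.
    return Decision(owner, "async")


as_patient = choose_interaction("patient-bob", "medical-record-42")
as_doctor = choose_interaction("dr-alice", "medical-record-42")
```

Note that the client never names the RO in this sketch; the AS works it out from what is being asked for, which matches the mental model above.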

 — Justin

> On Mar 10, 2021, at 4:10 AM, Fabien Imbault <fabien.imbault@gmail.com> wrote:
> 
> Hello,
> 
> Notice the question mark in the title. Not to be considered as an editor's comment.
> This thread intends to provide a detailed comment on the interesting feedback by Kristina: "I would catch up on the thread to understand why DIDs are thought to be treated as assertions".
> In the chat, I answered: "Happy to get the discussion on DIDs. Actually, could be either or, depending on what we intend to do." Let me expand a bit on that, to make it understandable (didn't have the time during the meeting, unfortunately).
> 
> This relates to the items described on slides 20 and 34, which I'll explain further. I'll highlight as transparently as possible what I think would be the benefits and downsides of each. Both are probably viable options, but they correspond to different scopes and objectives. I'll spend a bit more time on option 2, since it is a different approach compared to the current draft.
> 
> 1. The first possibility is to use SECEVENT throughout. 
> In that case, DIDs should be a part of sub_ids (in the core or in the extension registry, depending on what Annabelle's WG decides). More generally, the WG would have to decide which additional sub_ids it needs (as per Yaron's comment).
> 
> That's obviously the closest to what is described in draft-04. It allows a communication of subject information between the client and the AS, by using global identifiers (mail, phone, etc.), or using a local opaque reference - i.e. governed by the AS (that's probably what we would put in the examples, unless someone has a better idea - cf issues #16 and #42). 
> 
> Assertions relate for instance to the proof of presence of the RO. The current draft-04 therefore plans for id_tokens and saml2, maybe there could be other needs later (extension registry).
> 
> - Benefits: identifiers are a key component of GNAP, which makes integration easier. Possibly one could choose to use DIDs throughout.
> - Downsides: email/phone identifiers will probably be used throughout by most devs (since that's the correlation information they have in their user DB). That means we should limit what assertions contain, to avoid creating links between weak global identifiers and hopefully stronger assertions. That's not really a problem if it's mostly limited to authentication events (from that perspective, samlv2 could make sense in the core). An assertion is probably a single value then (not an array).
> 
> Option 1 can be summarized as: 
> 
> "subject": {
>    "sub_ids": { },       // request and response (SECEVENT -> including DID possibly)
>    "assertions": [ ]     // request and response (id_token or samlv2)
> }
> ("hints" do not exist currently, but are partially covered separately via "request.user" / the relevance of "principal", discussed in the slide, is a question mark too - which is a different discussion on end-user != RO).
> 
> 
> 2. An alternate possibility is to use SECEVENT only as input/hint
> In that case, DIDs should be a part of assertions (which have a different meaning compared to option 1).
> 
> The AS only returns a local opaque reference as_ref (while in option 1, it was only one of many possibilities). Its only job is to help the client differentiate the response subject. The AS policy might further define the scope of that reference, as suggested in https://github.com/ietf-wg-gnap/gnap-core-protocol/issues/210, although I personally viewed that by default as (2) "as_id" an identifier locally unique to that AS for all the RSs.
> 
> SECEVENT subject identifiers are still useful as an input; that's what I presented in request.hints.self.sub_ids (but it's the responsibility of the AS to match it with its own reference system, for instance from the email to the AS's local record XUT2MFM1XBIKJKSDU8Q). Note: since the slides, a new draft-ietf-secevent-subject-identifiers-07 has been published. Notably this includes an opaque identifier, which the client can use to pass the as_ref hint to the AS. Thus we wouldn't need any further addition to SECEVENT, although hints would benefit from further additions (like DIDs for instance). Notice also that subject_types_supported becomes less important, because in the worst-case scenario, it's simply a hint that wouldn't be understood and therefore wouldn't be taken into account by the AS.
> 
> The assertions array has a more important role than in option 1 (with the explicit aim of separating client hints from validated info asserted by the AS). It serves as a generic/extensible mapping structure, e.g. "the AS asserts that opaque reference XUT2MFM1XBIKJKSDU8Q matches these (more or less verifiable) statements." The name assertion therefore corresponds to the responsibility of the AS in delivering those statements (cf should/must, issue #49). Some of these statements relate to identity or auth events (ex: DID, id_token), some might directly be useful for authorization decisions (like the ZKP in the parental control example). Possibly some of these validated assertions could be reused in later interactions.
> 
> Another question that popped up is the scope of sub_ids. Should they be about the same user? (I think so). In the case of remote ROs especially, it might be important that a DID be considered as an assertion (about someone else) and not as sub_ids, because the AS needs to be careful about who the client wants to reach (avoid spam, etc.). This is consistent with the rest of the discussion on what to do when end-user != RO, although using DIDComm might seem a bit early (the paint is very fresh).
> 
> Option 2 was summarized as: 
> 
> // (suggestion only, not as an editor)
> "subject": {
>    "as_ref": {                          // response only (wouldn’t require SECEVENT)
>       "as": "https://ex1.as.com",
>       "ref": "XUT2MFM1XBIKJKSDU8QM"
>    },
>    "assertions": [ ],                   // request and response (id_token/jwkthumb/DID/VC/etc. -> whatever info can be validated by the AS)
>
>    "hints": {                           // request only (optional)
>       "self": {                         // replaces request.user (support SECEVENT here)
>          "sub_ids": { },                // SECEVENT (including opaque in draft-07 - that could contain the ref)
>          "assertions": [ ]              // see examples: VC on DOB or ZKP on age (wider scope, possibly through extensions)
>       },
>       "principal": {                    // new proposal presented at IETF110 (just an idea)
>          "automated": true,             // rule engine
>          "async": { }                   // remote ROs
>       }
>    }
> }
> 
> - Benefits: as_ref is self supporting (no strong coupling with SECEVENT), but can still take SECEVENT as an input hint for the AS. It avoids the risk of making an official association between a weak global identifier (e.g. email, used elsewhere) and hopefully stronger local assertions. GNAP would be fully agnostic to whatever identity system is used, only facilitating interoperability through assertions (second only in importance compared to the AS opaque reference). Some assertions (possibly extensions) could also be useful for the AS decision (e.g. VC or ZKP).
> - Downsides: opaqueness of as_ref is a feature, not a bug. The opaque reference is fully managed by the AS, therefore not inherently portable (cf portability discussions in OIDC currently). There's no explicit binding to an official identity, only a contextual mapping to AS-validated assertions. It makes it more difficult to match the identifiers from one AS to another or to correlate with the client user DB (still possible via assertions, if the AS allows it). Assertions are much more generic (possibly via extensions), but might require a more advanced mechanism to request and return only the most relevant information (which makes the AS policy critical here).
> 
> I hope that clarifies the reasoning for both options. Again I'm not saying option 2 is better, just saying the trade-offs are different (+ knowing that we need to keep it simple).
> - option 1 puts more in sub_ids and less in assertions
> - option 2 puts less in the reference and more in assertions 
> All of this is open to discussion. More generally speaking, on every part in orange and question mark (?) I would love your ideas and criticisms. What I intended as an editor is only to highlight where the WG has important choices to make, before we can make the related PRs.
> 
> A further side comment: IETF EAT RATS was suggested for assertions, but relates more to the client attestations. It was added as a comment in https://github.com/ietf-wg-gnap/gnap-core-protocol/issues/44, which is a distinct topic we'll work on.
> 
> Fabien (editor's hat off)
> 