Re: [Id-event] I-D Action: draft-ietf-secevent-subject-identifiers-11.txt

Annabelle and I have had a chance to discuss this directly, but I wanted to take a moment to record my response here for the group as well. I believe we now agree that the `did` format should be restored, alongside a generic `uri` format, with overall guidance on where and how to use each: namely, use the most specific semantically appropriate format that you can. The reasons for my stance, and what I believe are the conclusions we agreed to, are discussed inline below:

> On May 18, 2022, at 6:29 PM, Backman, Annabelle <richanna@amazon.com> wrote:
> 
> There appear to be some issues with -11:
> The definition for `did` was removed, but not the `did` entry in the format registry
> No replacement `url` format was added.
> 
> Justin, my understanding is that your concerns are directed at the proposal to replace `did` with `url`, and thus would not be addressed by adding the missing `url` format. Is that correct? Assuming that is the case...

My concerns with the removal of `did` would NOT be addressed by the addition of a generic `url` or `uri` format. The primary reason for this, and to me a primary driver for the subject identifiers work, is that the subject identifier format defines not only the syntax of the identifier but also its semantic content. I do not believe that it is appropriate to remove the semantic information from the format and push it all down into the lower layer.

> 
> Replacing `did` with `url` doesn't push the semantic information anywhere; the semantic information is there in the lower layer already. Having a separate `did` format pulls that information up into the subject identifier format layer, encoding the same information twice. That significantly complicates processing and could hurt interoperability.

In fact, it does the opposite. One could make the argument that because we have “mailto:” URLs (rfc2368) and “tel:” URLs (rfc3966) then we don’t actually need the `email_address` or `phone_number` formats either, since we could just encode all that in the URL itself. And then there’s no need for an `opaque` because you could easily use a `urn` to solve that problem. Even the issuer/subject pair COULD be formatted as a single URL, if someone just sat down and made a syntax for it (and people argued for exactly that in OIDC, but it didn’t get anywhere).

So, in that world, why even bother with the subject identifiers? Let me tell you why:

When I’m creating a subject identifier block in my application, I know what kind of identifier it is. I want to tell the receiver that I specifically know what kind of identifier it is. The syntax for formatting the identifier itself is incidental to this — particularly if that syntax is itself a URL.
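
To make that concrete, here is a rough sketch of the issuer side. It is mine, not text from the draft: the member names (`format`, `url`) are my assumption of how the `did` format and a generic `url` format would be laid out, and the helper names are purely illustrative.

```python
# Sketch only: member names ("format", "url") are assumed, not quoted
# from the draft, and both helper functions are hypothetical.

def did_subject(did_url: str) -> dict:
    """Issuer knows the value is a DID and says so in the format."""
    if not did_url.startswith("did:"):
        raise ValueError("expected a DID URL")
    return {"format": "did", "url": did_url}


def url_subject(url: str) -> dict:
    """Issuer only asserts that the value is some URL, nothing more."""
    return {"format": "url", "url": url}


specific = did_subject("did:example:123456")
generic = url_subject("did:example:123456")

# Same string, but two different statements about what the identifier is.
print(specific["format"], generic["format"])
```

The value is identical in both objects; only the first one tells the receiver what I, the issuer, actually know about it.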

> 
> Consider the scenario where we have both `url` and `did` format types. An issuer might encode a DID using either format type; do processors that expect DIDs need to support both? If so then we've just made their lives harder. More likely, some would support both and some wouldn't, leading to unnecessary pain for parties that have to interoperate across processors and/or issuers.

We’d expect to use `did` here. I would not expect a processor to support both formats if they’re specifically looking for DIDs. 
> 
> Now consider the scenario where we just have `url`. A processor that accepts DID URLs (possibly alongside other non-URL identifier formats) and no other URL types will see the `url` format, assume the value is a DID, and attempt to validate it or otherwise process it as a DID. Note that this step is necessary even if we have a `did` format, as it's always possible that the issuer provided a malformed subject identifier. Likewise, a processor that expects some other type of URL (e.g., an https URL) will have to parse the URL and confirm it has the expected scheme, and depending on the use case may also need to apply other security checks (e.g., matching against allowed origins, ensuring that the URL doesn't contain a username or password, etc.).

This is exactly why we shouldn’t have just `url` without other layers. If I’m processing a URL as an identifier, I may or may not want to do specific things with that URL. Or it might simply be an identifier string, like someone’s homepage. I would be much more comfortable if the `url` format implied no additional processing, while more specific formats could require such processing, as you’d expect a DID to do in most cases.

I think the malformed subject identifier example is a strawman: any identifier could be “malformed”. But instead of allowing the processor a much more limited check of “is this a DID?”, we now need a wider check of “is this a URL, is it a kind I know how to process, and is there more processing that I need to do with it?”, and that’s where all of the problems in the above example come into play.
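
A sketch of the two checks, side by side, might look like the following. The member names, the allowed-scheme list, and the function names are all assumptions of mine for illustration; real DID validation would go well beyond a prefix test.

```python
from urllib.parse import urlsplit


def accept_did(sub_id: dict) -> str:
    """Narrow check: the format already tells us to expect a DID."""
    if sub_id.get("format") != "did":
        raise ValueError("unsupported format")
    value = sub_id.get("url", "")
    if not value.startswith("did:"):
        raise ValueError("malformed DID")
    return value


def accept_url(sub_id: dict, allowed_schemes=("did", "https")) -> tuple:
    """Wider check: parse the URL, branch on scheme, apply per-scheme rules."""
    if sub_id.get("format") != "url":
        raise ValueError("unsupported format")
    value = sub_id.get("url", "")
    parts = urlsplit(value)
    if parts.scheme not in allowed_schemes:
        raise ValueError("URL scheme not handled by this processor")
    # Example of a scheme-specific security check for https identifiers.
    if parts.scheme == "https" and (parts.username or parts.password):
        raise ValueError("credentials in URL are not allowed")
    return parts.scheme, value
```

With `did`, the processor branches on the format and is done; with only `url`, every processor has to carry the scheme dispatch and the per-scheme rules itself.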

> 
> In the case where a processor accepts both DIDs and some other type of URL, they have to parse and validate the URL and then branch based on the scheme, instead of just branching based on the identifier format.

Could a processor figure out that there was a DID URL inside of a `url` block? Sure, but those are semantically different identifiers. Likewise, if I had put a `mailto:` URL inside of a `url` block, I would not expect it to be treated as equivalent to the same email address in an `email_address` block. And I think the draft can actually be explicit about that distinction:

 - there’s no guarantee of equivalence between the information in different formats
 - you should use the most specific format for the information you’re trying to convey
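
As a minimal sketch of the first point, a processor following that guidance would compare identifiers only within a single format. The format and member names below are the ones used loosely in this thread plus my own guesses, and `same_subject` is a hypothetical helper, not anything the draft defines.

```python
def same_subject(a: dict, b: dict) -> bool:
    """Treat two subject identifiers as the same only within one format."""
    if a.get("format") != b.get("format"):
        return False  # no cross-format equivalence is inferred
    return a == b


# Assumed shapes, for illustration only.
as_email = {"format": "email_address", "email": "user@example.com"}
as_url = {"format": "url", "url": "mailto:user@example.com"}

print(same_subject(as_email, as_url))          # different formats
print(same_subject(as_email, dict(as_email)))  # same format, same members
```

Even though both objects plainly refer to the same mailbox, the comparison deliberately does not try to normalize one into the other.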

> 
> Are there other scenarios where the issuer or processor encounters more significant pain if we just have `url` versus if we have `url` and `did`?

Yes, I think the entire act of punting everything to the lower layer causes nothing BUT pain. The confusion stems from the fact that URIs and the subject identifier formats both specify some level of semantic and syntactic constraint. However, mixing them in the way proposed is deeply problematic and would be disastrous in practice.

As such, the subject identifiers format should continue to provide semantic information about its contents, as it did before draft -10, and not simply turn into a meaningless way to put URLs into a JSON object.

 — Justin

> 
> —
> Annabelle Backman (she/her)
> richanna@amazon.com
> 
> 
> 
> 
>> On Apr 26, 2022, at 5:36 PM, Justin Richer <jricher@mit.edu> wrote:
>> 
>> I strongly disagree with the editor's removal of "did" from the spec and the reasons for doing so. Pushing the semantic information off into a lower layer is not helpful in terms of complexity or application. Now an application will need to parse the various URLs to know what they are instead of being told in the data structure what's in there.
>> 
>> -Justin
>> ________________________________________
>> From: Id-event [id-event-bounces@ietf.org] on behalf of internet-drafts@ietf.org [internet-drafts@ietf.org]
>> Sent: Thursday, April 21, 2022 3:56 PM
>> To: i-d-announce@ietf.org
>> Cc: id-event@ietf.org
>> Subject: [Id-event] I-D Action: draft-ietf-secevent-subject-identifiers-11.txt
>> 
>> A New Internet-Draft is available from the on-line Internet-Drafts directories.
>> This draft is a work item of the Security Events WG of the IETF.
>> 
>>        Title           : Subject Identifiers for Security Event Tokens
>>        Authors         : Annabelle Backman
>>                          Marius Scurtescu
>>                          Prachi Jain
>>        Filename        : draft-ietf-secevent-subject-identifiers-11.txt
>>        Pages           : 22
>>        Date            : 2022-04-21
>> 
>> Abstract:
>>   Security events communicated within Security Event Tokens may support
>>   a variety of identifiers to identify subjects related to the event.
>>   This specification formalizes the notion of subject identifiers as
>>   structured information that describe a subject, and named formats
>>   that define the syntax and semantics for encoding subject identifiers
>>   as JSON objects.  It also defines a registry for defining and
>>   allocating names for such formats, as well as the sub_id JSON Web
>>   Token (JWT) claim.
>> 
>> 
>> The IETF datatracker status page for this draft is:
>> https://datatracker.ietf.org/doc/draft-ietf-secevent-subject-identifiers/
>> 
>> There is also an htmlized version available at:
>> https://datatracker.ietf.org/doc/html/draft-ietf-secevent-subject-identifiers-11
>> 
>> A diff from the previous version is available at:
>> https://www.ietf.org/rfcdiff?url2=draft-ietf-secevent-subject-identifiers-11
>> 
>> 
>> Internet-Drafts are also available by rsync at rsync.ietf.org::internet-drafts
>> 
>> 
>> _______________________________________________
>> Id-event mailing list
>> Id-event@ietf.org
>> https://www.ietf.org/mailman/listinfo/id-event