Re: [Id-event] Subject Identifiers - Working Group Last Call

Aaron Parecki <aaron@parecki.com> Tue, 15 March 2022 15:11 UTC

References: <CAD9ie-uSbNHq=Mt3ohA=URf5rv2hz7YUdUMhOf80C_f=XBrGLA@mail.gmail.com> <36D66A89-D178-6047-B270-73AD540E7FAD@hxcore.ol> <81b58b05-97b6-d910-6b58-4b565ae6ea57@free.fr> <8330bc8a-d0f5-686e-1073-84e8bf83a294@free.fr> <BD4BD998-171C-49C8-B495-E5A7B3CE2448@amazon.com> <99121270-5eac-fab6-9aa6-d4e0ed0b734d@free.fr> <57676B78-2301-431A-A068-C60CCBED2338@amazon.com> <82673afd-ee5d-50b9-48d7-496b072e7927@free.fr> <D0596A7A-C838-40E6-93EF-6D4A1CB46F05@amazon.com> <56dac4b6-75b9-08fe-46d1-bd8a7e883c76@free.fr>
In-Reply-To: <56dac4b6-75b9-08fe-46d1-bd8a7e883c76@free.fr>
From: Aaron Parecki <aaron@parecki.com>
Date: Tue, 15 Mar 2022 15:10:50 +0000
Message-ID: <CAGBSGjoph8dvdd+2zSAGKrnoP4VLYZStAUNvEza5BZ7x9_Y-=Q@mail.gmail.com>
To: Denis <denis.ietf@free.fr>
Cc: "Backman, Annabelle" <richanna@amazon.com>, Dick Hardt <dick.hardt@gmail.com>, Marius Scurtescu <marius.scurtescu@coinbase.com>, "Richard Backman, Annabelle" <richanna=40amazon.com@dmarc.ietf.org>, Roman Danyliw <rdd@cert.org>, SecEvent <id-event@ietf.org>, Yaron Sheffer <yaronf.ietf@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/id-event/x0i5DprG5O7xfAG9nCTTJo_fbNY>
Subject: Re: [Id-event] Subject Identifiers - Working Group Last Call

The abstract states

> and named formats that define the syntax and semantics *for encoding*
> subject identifiers as JSON objects

(emphasis mine)

"semantics for encoding a subject identifier" is very different from
"semantics of a subject identifier", which is where your confusion is coming
from. There is no contradiction.

Aaron



On Tue, Mar 15, 2022 at 2:56 PM Denis <denis.ietf@free.fr> wrote:

> Hello Annabelle,
>
> Hi Denis,
>
> We may be talking past one another a bit here, so let me step back and try
> to state some things clearly. Within this draft, the word "subject"
> includes many, many different kinds of things. It is not a synonym for
> "user". Subjects may not have any clear relationship to an individual
> person or a group of people. Of course, a subject may be a user, or person,
> or group, or some other thing related to people.
>
> Agreed.
>
> But as stated in the draft, the *subject type* is out of scope.
>
> The text states in section 3.1:
>
>    Identifier Formats define how to encode identifying information for a
> subject.
>    They do not define the type or nature of the subject itself.
>
> However, the abstract states:
>
>          This specification formalizes the notion of subject identifiers
> as structured information that describe a subject, and named formats
>          that define the *syntax* and *semantics* for encoding subject
> identifiers as JSON objects.
>
> Two words are important in that sentence: "*semantics*" and "*syntax*".
>
> This means that what is called a "format" should define, on one side:
>
>    - the semantics of the subject identifier and, on the other side,
>    - the syntax of the subject identifier.
>
> The text states later on within section 3:
>
> A Subject Identifier MUST conform to a specific Identifier Format, (...).
>
> Section 3.1 states:
>
>    Identifier Formats define how to encode identifying information for a
> subject.
>    They do not define the type or nature of the subject itself.
>
> We are now in the core of the debate.
>
> The abstract states that subject identifiers define both *the semantics*
> and the syntax of a subject identifier
> while section 3.1 states that they do not define the semantics of a
> subject identifier, i.e. the type or nature
> of the subject identifier itself. This is contradictory.
>
> Some subject identifiers (and subject identifier formats) may be
> appropriate for some types of subjects but not others.
> For example, while I agree that a latitude and longitude coordinate
> doesn't make sense as an identifier for a user,
> it does make sense as an identifier for a plot of land. Likewise, an IP
> address is a perfectly reasonable identifier to use
> when the subject is the IP address itself (e.g., in an audit log of DHCP
> lease changes), or for a node on an IP-based network.
>
> If subject identifiers were fully opaque, they would be of little use.
> Hence they need to be structured to be able to make a difference
> at a first level of granularity between:
>
>    - those associated with a single individual, and
>    - those that relate to a group (i.e. to one or more individuals).
>
> Whether or not a subject represents an individual or a group is a property
> of the *subject itself*, and is not generally required in order to
> identify the subject.
> Since your example use case uses JWTs, this information can and should be
> included as a separate claim within the JWT. There are several benefits to
> this:
>
>
>    1. It is semantically correct, as JWTs are intended to encapsulate
>    claims about a subject.
>    2. It provides privacy advantages, as protocols like OIDC provide
>    mechanisms for clients to request access to different claims.
>    Embedding this information within the subject identifier implies it is
>    necessary to understanding the identifier.
>    3. It avoids adding something to subject identifiers that only really
>    applies to a subset of subject types.
>    4. It avoids the complicated question of how to interpret these values
>    in the context of different subject types.
>    For example, one implementer might consider a subject identifier that
>    identifies a single POSIX group itself
>    (i.e., the *group*, not the *members of* the group) to be "individual"
>    because it represents one single thing,
>    while another implementer might consider it to be "group" because the
>    one single thing it represents is literally a group.
>
> Section 2.4 of the GNAP draft (Identifying the User) states:
>
>    If the client instance knows the identity of the end user through one
>    or more identifiers or assertions, the client instance MAY send that
>    information to the AS in the "user" field.  The client instance MAY
>    pass this information by value or by reference.
>
>    sub_ids (array of objects):  An array of subject identifiers for the
>       end user, as defined by [I-D.ietf-secevent-subject-identifiers].
>       OPTIONAL.
>
>    assertions (array of objects)  An array containing assertions as
>       objects each containing the assertion format and the assertion
>       value as the JSON string serialization of the assertion.
>       OPTIONAL.
>
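To make the quoted GNAP text concrete, here is a minimal Python sketch of a client instance assembling the "user" field with one subject identifier passed by value. The helper name and the identifier value are hypothetical, and the layout is inferred from the quoted text rather than copied from a particular GNAP draft revision.

```python
import json


def build_user_field(sub_ids, assertions=None):
    """Assemble a GNAP-style "user" request field (hypothetical helper).

    sub_ids: list of subject identifier objects, as defined by the
             secevent subject identifiers draft.
    assertions: optional list of {"format": ..., "value": ...} objects.
    """
    user = {}
    if sub_ids:
        user["sub_ids"] = list(sub_ids)
    if assertions:
        user["assertions"] = list(assertions)
    return {"user": user}


# An "opaque" subject identifier carries its value in the "id" member.
request = build_user_field([{"format": "opaque", "id": "J2G8G8O4AZ"}])
print(json.dumps(request, indent=2))
```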
> Assertions could certainly be used, but at the moment I am not aware of an
> IETF RFC that would support some form of interoperability
> for subject identifiers or for group memberships.
>
> The SEC-EVENT draft is an opportunity to standardize some of them so that
> they can be used in an interoperable fashion.
>
> One sentence in that section currently reads:
>
>        For example, the entity to which the identifiers are presented now
> knows that both identifiers relate to the same subject,
>       and may be able to correlate additional data based on that.
>
> This sentence is inaccurate because it depends upon the type of subject
> identifier that is being received.
>
> Section 6.1 addresses correlation risks specifically arising from
> including multiple subject identifiers within the same context, as this is
> something the draft enables
> (via the `aliases` format, and by introducing the `sub_id` JWT claim and
> permitting its use alongside `sub`). While including a subject identifier
> with other information
> more generally may introduce correlation risks, those risks are highly
> context-dependent, and I am not sure that there is any sensible advice to
> be given in this draft.
> However, I can add something to the effect of "implementers must consider
> such risks, and specs that use subject identifiers must provide appropriate
> privacy considerations of their own."
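The correlation risk described for the `aliases` format can be illustrated with a small Python sketch: a claims set carrying both `sub` and an `aliases`-format `sub_id` tells the recipient that several identifiers belong to one subject. The claim values are made up for illustration, and the function name is hypothetical.

```python
def linked_identifiers(claims):
    """List every identifier that a single claims set ties to one subject."""
    linked = []
    if "sub" in claims:
        # The JWT "sub" claim is scoped by "iss".
        linked.append(("jwt_sub", (("iss", claims.get("iss")), ("sub", claims["sub"]))))

    def walk(sub_id):
        if sub_id["format"] == "aliases":
            # An aliases identifier nests other subject identifiers.
            for inner in sub_id["identifiers"]:
                walk(inner)
        else:
            members = {k: v for k, v in sub_id.items() if k != "format"}
            linked.append((sub_id["format"], tuple(sorted(members.items()))))

    if "sub_id" in claims:
        walk(claims["sub_id"])
    return linked


claims = {
    "iss": "http://issuer.example.com/",
    "sub": "145234573",
    "sub_id": {
        "format": "aliases",
        "identifiers": [
            {"format": "email", "email": "user@example.com"},
            {"format": "phone_number", "phone_number": "+12065550100"},
        ],
    },
}
for entry in linked_identifiers(claims):
    print(entry)
```

Every printed entry is now known to refer to the same subject, which is exactly what section 6.1 warns about.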
>
> Your argumentation would be valid if the semantics of the subject
> identifier were out of the scope of the draft, but unfortunately this
> is not the case.
>
> Another valuable feature would be the ability to know, simply by looking
> at the structure of a subject identifier, whether correlation of user
> accounts will or will not be possible. An auditor of an audit trail would
> then be able to determine this easily and hence assess under which
> conditions an RS will or will not be able to correlate user accounts with
> another RS. At the moment, this is impossible.
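The auditor scenario described above could look roughly like the following Python sketch. It uses the hypothetical format names from the examples further down in this message ("UID-type 1" to "UID-type 4", "GUID-email"). The mapping from format to "linkable across RSs" is one reading of the proposal, not anything the draft defines, and treating short-term identifiers as not linkable is an assumption.

```python
from typing import Optional

# Hypothetical format names proposed in this thread, mapped to whether two
# RSs could link user accounts from the identifier value alone.
_LINKABLE_ACROSS_RS = {
    "UID-type 1": False,  # unique for each user / RS pair
    "UID-type 2": False,  # unique for each AS / RS pair
    "UID-type 3": True,   # same value whatever RS is involved
    "UID-type 4": False,  # short-term identifier (assumed not linkable)
    "GUID-email": True,   # globally unique, independent of the AS
}


def linkable_across_rs(sub_id: dict) -> Optional[bool]:
    """Return True/False for the proposed formats, None when unknown.

    None covers formats such as the draft's "iss_sub", whose structure
    says nothing about correlation.
    """
    return _LINKABLE_ACROSS_RS.get(sub_id.get("format"))


audit_entry = {"format": "UID-type 2", "iss": "http://issuer.example.com/", "sub": "145234573"}
print(linkable_across_rs(audit_entry))  # False
```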
>
> This assumes that the issuer of the subject identifier is willing to
> reveal that information to the consumers of that subject identifier.
>
> This assumes that the client is willing to ask the AS to use or
> generate such a subject identifier (which the AS may or may not be able to
> support).
>
> Further, since subjects may be correlated via information that is not part
> of the subject identifier, any "non-correlatable" flag within the subject
> identifier
> would be insufficient to answer the question of whether the subject can be
> correlated.
>
> The question is not whether other information can allow such correlation,
> but whether such correlation will or will not be possible
> only by taking advantage of the content of a SINGLE subject identifier.
>
> I have identified two classes of user accounts: long term and temporary.
>
> The processing made by the RS will be different whether the subject
> identifier is a long term or a temporary subject identifier.
> Hence a distinction first needs to be made in the structure of the subject
> identifier to distinguish between these two classes.
>
> My individual draft provides an example for that processing.
>
> If the intent is to indicate whether or not the *user account* is long
> term or temporary, then we are again talking about a property of the
> *subject*,
> which is better represented as a separate JWT claim, for all the reasons
> stated above.
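Annabelle's alternative, keeping the subject identifier unchanged and stating the account lifetime as a separate claim, might be sketched in Python as follows. The claim name `account_term` and its values are hypothetical, chosen only for illustration, and defaulting to long-term when the claim is absent is an assumption.

```python
# Hypothetical claim name; no specification defines "account_term".
ACCOUNT_TERM_CLAIM = "account_term"


def is_temporary_account(claims: dict) -> bool:
    """True only when the token explicitly says the account is temporary."""
    return claims.get(ACCOUNT_TERM_CLAIM) == "temporary"


claims = {
    "iss": "http://issuer.example.com/",
    # The subject identifier itself carries no lifetime information.
    "sub_id": {"format": "opaque", "id": "145234573"},
    ACCOUNT_TERM_CLAIM: "temporary",
}

if is_temporary_account(claims):
    print("handle as a short-term account")
else:
    print("handle as a long-term account")
```

Because the lifetime sits in its own claim, protocols such as OIDC could let the client request or withhold it independently of the identifier.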
>
> In the JWT profile (RFC 9068), there is a claim called "sub".
>
> The "Privacy considerations" section states:
>
>    This profile mandates the presence of the "sub" claim in every JWT
> access token, making it possible for resource servers to rely on that
>    information for correlating incoming requests with data stored locally
> for the authenticated principal.
>
> We can no longer change the semantics of the "sub" claim, which is very
> general and which, by itself, does not indicate whether or not
> some correlation will be possible.
>
> On the contrary, the "sub_id" claim would be able to let the client
> control whether or not some correlation will be possible by the RSs.
>
> I was not able to find any specific examples of how a processor might
> change its behavior based on this information.
>
> If you take a look at my draft, the processing by an RS of a claim
> which contains a long-term or a temporary subject identifier will be rather
> different.
>
> I think in most cases, they would operate the same way – operationally
> speaking, a short-term account is little different from a dormant long-term
> account.
>
> From what you've written, it seems your goal is to allow users to control
> whether or not an RS can correlate the end user's activity across multiple
> sessions.
>
> This is one of the goals, but not the single goal. Introducing the support
> of group memberships is another goal.
>
> That problem is much larger than subject identifiers and cannot be solved
> at that level.
>
> It could be solved in this draft.
>
> Many different kinds of claims may be used to correlate activity beyond
> those used to identify the subject.
>
> As said earlier, the question is not whether other information can allow
> such correlation, but whether such correlation will or will not be possible
> simply by looking at the content of a SINGLE subject identifier.
>
>
> The RS needs to be denied access to these claims or provided with masked
> or surrogate values.
>
> ?!?
>
> It is also likely that the end user does not want the RS to know that they
> are providing non-correlated values,
> as that would allow the RS to modify its behavior to attempt to block
> access or force the user to provide legitimate values.
> In such a case, including a flag in the subject identifier would undermine
> the user's control.
>
> If the subject identifier contains an email address, the RS will indeed
> know that correlation is likely to be possible.
> When the subject identifier contains a Type 1 identifier, the RS will need
> to recognize it; otherwise, it cannot process the request correctly.
>
> At present, I don't see any justification for including individual/group
> or correlatability information as part of the subject identifier structure.
>
> The following "Formats" are currently being defined in the draft:
>
>        3.2.1.  Account Identifier Format
>        3.2.2.  Aliases Identifier Format
>        3.2.3.  Decentralized Identifier (DID) Format
>        3.2.4.  Email Identifier Format
>        3.2.5.  Issuer and Subject Identifier Format
>        3.2.6.  Opaque Identifier Format
>        3.2.7.  Phone Number Identifier Format
>
>
> Some other useful "formats" should be added, like functional group
> memberships, roles and hierarchical group memberships.
> Is there a rationale for not adding these?
>
> Let us now focus on section 3.2.5 about the "Issuer and Subject Identifier
> Format". The text states:
>
>       The Issuer and Subject Identifier Format identifies a subject using
> a pair of "iss" and "sub" members, analogous to how subjects
>       are identified using the "iss" and "sub" claims in OpenID Connect
> [OpenID.Core] ID Tokens.
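As a sketch of the syntax under discussion, the Python check below accepts an object in the draft's "iss_sub" format, identified by its "format" member and carrying string "iss" and "sub" members. It also illustrates the argument that the same pair of members could appear under one of the hypothetical format names proposed in this thread, distinguished only by the "format" value. The function name is hypothetical.

```python
def is_iss_sub(sub_id: dict) -> bool:
    """Check the required members of an "iss_sub" subject identifier."""
    return (
        sub_id.get("format") == "iss_sub"
        and isinstance(sub_id.get("iss"), str)
        and isinstance(sub_id.get("sub"), str)
    )


draft_example = {"format": "iss_sub", "iss": "http://issuer.example.com/", "sub": "145234573"}
# Same "iss"/"sub" syntax, different (hypothetical) format name.
proposed_example = {"format": "UID-type 1", "iss": "http://issuer.example.com/", "sub": "145234573"}

print(is_iss_sub(draft_example))     # True
print(is_iss_sub(proposed_example))  # False
```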
>
> The *syntax* is currently: a pair of "iss" and "sub" members. *Such
> syntax may be associated with different semantics*.
>
> A client should be able to ask an AS to deliver (if it can) in a JWT a
> subject identifier associated with a user, chosen among five possibilities:
>
>      (1) a user identifier unique for each user / RS pair,
>      (2) a user identifier unique for each AS / RS pair,
>      (3) a user identifier unique for the AS, whatever RS is involved,
>      (4) a short-term user identifier unique for the AS,
>      (5) a globally unique user identifier (where the uniqueness is
> independent of the AS).
>
> Let me give an example for each of them:
>
> (1) a user identifier unique for each user / RS pair
>
> {
>   "format": "UID-type 1",
>   "iss": "http://issuer.example.com/",
>   "sub": "145234573"
> }
>
> (2) a user identifier unique for each AS / RS pair
>
> {
>   "format": "UID-type 2",
>   "iss": "http://issuer.example.com/",
>   "sub": "145234573"
> }
>
>
> (3) a user identifier unique for an AS, whatever RS is involved,
>
> {
>   "format": "UID-type 3",
>   "iss": "http://issuer.example.com/",
>   "sub": "145234573"
> }
>
>
> (4) a short-term user unique identifier (dependent on the AS)
>
> {
>   "format": "UID-type 4",
>   "iss": "http://issuer.example.com/",
>   "sub": "145234573"
> }
>
>
> (5) a globally unique user identifier (independent of the AS),
>
> {
>   "format": "GUID-email",
>   "syntax": "email",
>   "email": "user@example.com"
> }
>
>
> I will however publish an update with the additional privacy language I
> mentioned above.
>
> Before doing so, please take a look at the "Privacy Considerations"
> section of RFC 9068 (JWT Profile for OAuth 2.0 Access Tokens), on page 11.
>
> Denis
>
>
> —
> Annabelle Backman (she/her)
> richanna@amazon.com
>
>
>
>
> On Mar 12, 2022, at 5:52 AM, Denis <denis.ietf@free.fr> wrote:
>
>
> Annabelle,
>
> Thank you for your second email.
>
> Rather than responding between the lines, I have constructed a global
> reply that takes your arguments into consideration.
>
> You wrote:
>
>         Can you provide examples where it is critical to have this
> information encoded within the identifier data structure itself?
>         Under what circumstances would a consumer of a subject identifier
> change their behavior based on this information?
>
> The rationale for the proposal relates to the case where JWTs are
> exchanged between a client and an RS (i.e. not between an AS and an RS).
> This is certainly not the only case to be considered, but it is an
> important one.
>
> When an access token is received by an RS, it may contain one or more
> subject identifiers.
>
> When necessary, they allow tracing the actions that have been performed
> using a JWT that contained these subject identifiers.
> These subject identifiers may be placed into an audit trail, for example,
> in order to be associated with an action that has taken place.
> If subject identifiers were fully opaque, they would be of little use.
> Hence they need to be structured to be able to make a difference
> at a first level of granularity between:
>
>    - those associated with a single individual, and
>    - those that relate to a group (i.e. to one or more individuals).
>
> Such a difference relates to what I wrote in my original email from last
> year:
>
>        In order to be able to make the difference, an *optional class* attribute
> should be defined which may take one out of two values:
>
>
>    - "*ind*" to indicate an individual identifier or
>    - "*grp*" to indicate a group identifier.
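A consumer honoring the proposed optional class attribute could read it as in this Python sketch. The member name "class" and its two values come from the proposal quoted above, not from the draft; rejecting unknown values is a design choice of the sketch.

```python
from typing import Optional

# Values proposed for the optional "class" member (not part of the draft).
SUBJECT_CLASSES = {"ind", "grp"}


def subject_class(sub_id: dict) -> Optional[str]:
    """Return "ind" or "grp", or None when the optional member is absent."""
    value = sub_id.get("class")
    if value is None:
        return None
    if value not in SUBJECT_CLASSES:
        raise ValueError(f"unknown subject class: {value!r}")
    return value


print(subject_class({"format": "opaque", "id": "145234573", "class": "ind"}))  # ind
print(subject_class({"format": "opaque", "id": "145234573"}))  # None
```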
>
>
>
> *Subject identifiers associated with a single individual*
> As explained in my individual draft, a subject identifier that relates to
> an individual may disclose more or less information
> that allows RSs to link their user accounts. The choice between these
> various types of identifiers may be made by the end user
> and/or by the client.
>
> Depending upon the level of concern (or knowledge) of the individual with
> regard to his/her privacy, and on what is supported
> by the underlying technology, I have identified up to five possible
> choices for the individual.
>
> Now, let us come to your question:
>
>        Under what circumstances would a consumer of a subject identifier
> change their behavior based on this information?
>
> I have identified two classes of user accounts: long term and temporary.
>
> The processing made by the RS will be different whether the subject
> identifier is a long term or a temporary subject identifier.
> Hence a distinction first needs to be made in the structure of the subject
> identifier to distinguish between these two classes.
>
> My individual draft provides an example for that processing.
>
> Secondly, when an RS needs to match subject identifiers received in
> different JWTs, the match must be made between subject identifiers of the
> same type; hence including that type in the structure is necessary.
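The matching rule just described can be sketched in Python: two identifiers are only compared when their format (type) is the same. The function name is hypothetical, the format names are the ones proposed in this thread, and exact member-by-member comparison is a simplification, since some formats may call for normalization first.

```python
def same_subject(a: dict, b: dict) -> bool:
    """Match two subject identifiers only when they share the same format."""
    if a.get("format") != b.get("format"):
        # Different types are never matched, even if the values coincide.
        return False
    # Same type: compare every member exactly (a simplification).
    return a == b


first = {"format": "UID-type 2", "iss": "http://issuer.example.com/", "sub": "145234573"}
second = {"format": "UID-type 3", "iss": "http://issuer.example.com/", "sub": "145234573"}
print(same_subject(first, dict(first)))  # True
print(same_subject(first, second))       # False
```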
>
> Let us now consider the point of view of the client and raise the
> following question:
>
>         Under what circumstances would a client change its behavior based
> on this information?
>
> If a client has been asking for a type X of subject identifier in a JWT
> and then discovers that the JWT instead contains
> a type Y of subject identifier, then the client SHALL NOT transmit the JWT
> to the RS, because the privacy of the end user might be impacted.
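The client-side rule above might be sketched in Python as follows. It assumes the client can read the claims of the JWT (for example, an unencrypted token whose signature it has already verified) and uses the hypothetical format names from this thread; the exception and function names are also hypothetical.

```python
class UnexpectedIdentifierType(Exception):
    """The token carries a different subject identifier type than requested."""


def check_before_forwarding(requested_format: str, claims: dict) -> None:
    """Raise instead of letting the client forward an unexpected token."""
    received = claims.get("sub_id", {}).get("format")
    if received != requested_format:
        raise UnexpectedIdentifierType(
            f"requested {requested_format!r}, token carries {received!r}"
        )


token_claims = {"sub_id": {"format": "UID-type 3", "iss": "http://issuer.example.com/", "sub": "145234573"}}
try:
    check_before_forwarding("UID-type 2", token_claims)
except UnexpectedIdentifierType as err:
    print("not forwarding to the RS:", err)
```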
>
> You wrote:
>
>          The Privacy Considerations section is intended to address the
> correlation risk generally.
>          The JWT case is mentioned only as an example. Any suggestions on
> how to make that more clear?
>
> It is not a matter of making it clearer based on the current content of
> the draft.
> One sentence in that section currently reads:
>
>        For example, the entity to which the identifiers are presented now
> knows that both identifiers relate to the same subject,
>       and may be able to correlate additional data based on that.
>
> This sentence is inaccurate because it depends upon the type of subject
> identifier that is being received.
>
> It is a matter of exposing the privacy concerns of the end-user and how
> they may be addressed using one of the five types of subject identifiers.
>
> I mean that it will be possible to revise this section once the five types
> of subject identifiers have been incorporated into the document.
>
> Another valuable feature would be the ability to know, simply by looking
> at the structure of a subject identifier, whether correlation of user
> accounts will or will not be possible. An auditor of an audit trail would
> then be able to determine this easily and hence assess under which
> conditions an RS will or will not be able to correlate user accounts with
> another RS. At the moment, this is impossible.
>
>
>
>
> *Subject identifiers that relate to a group*
> It will be valuable to know, simply by looking at the structure of a subject
> identifier, that it is related to a group (and not to a single
> individual)
> and to which kind of group (e.g. hierarchical, functional, or a role).
>
> This relates to my original email from last year where I wrote:
>
>            It would be useful to define one format for these common
> groups: one or more character strings separated by the character slash
>            for both hierarchical (*hgrp*) and functional group
> memberships (*fgrp*) and roles (*role*):
>
>
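The slash-separated group format proposed above lends itself to simple prefix checks for hierarchical groups. The Python sketch below assumes the group class ("hgrp", "fgrp" or "role") is carried alongside a slash-separated value; the type and function names, and the prefix rule for "hgrp", are hypothetical illustrations rather than part of any draft.

```python
from typing import NamedTuple, Tuple

GROUP_CLASSES = {"hgrp", "fgrp", "role"}


class Group(NamedTuple):
    kind: str
    path: Tuple[str, ...]


def parse_group(kind: str, value: str) -> Group:
    """Split a slash-separated group value into its components."""
    if kind not in GROUP_CLASSES:
        raise ValueError(f"unknown group class: {kind!r}")
    parts = tuple(value.split("/"))
    if not all(parts):
        raise ValueError(f"empty component in {value!r}")
    return Group(kind, parts)


def within(member: Group, ancestor: Group) -> bool:
    """For hierarchical groups, a path lies within any of its prefixes."""
    if member.kind != "hgrp" or ancestor.kind != "hgrp":
        return False
    return member.path[: len(ancestor.path)] == ancestor.path


team = parse_group("hgrp", "example.com/sales/emea")
dept = parse_group("hgrp", "example.com/sales")
print(within(team, dept))  # True
print(within(dept, team))  # False
```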
>
> *Other replies related to your original email*
> You wrote:
>
>       From what you've shared so far, a few things jump out that make me
> think it would not be appropriate to include this as a core property within
> subject identifier formats:
>       The description assumes the subjects being identified are
> users/accounts and that the subject identifier is being exchanged between
> an AS and RS.
>       This is by far not the only use case for subject identifiers.
>
> As said earlier, this is certainly not the only case to be considered,
> but it is an important one.
>
> You wrote:
>
>      Non-correlation only really makes sense for opaque, surrogate
> identifiers like UUIDs.
>
> Non-correlation does not only apply to fully opaque identifiers.
>
> You wrote:
>
>      How do you prevent correlation if your identifier format is an IP
> address, phone number, government-issued ID number, domain name,
>      latitude/longitude, street address, etc.? (Note that if the local
>      latitude/longitude, street address, etc.? (Note that the local
> identifier
>      if the issuer of the subject identifier controls the email address
> domain, for example, emails generated by Apple's Hide my Email feature).
>
> The standardization community works roughly according to the ISO model,
> where the application layer is addressed independently of the transport
> or network layer. Hiding an IP address is not a concern
> for the application layer and hence for the content of a JWT.
> This concern can be addressed using specific techniques.
>
> There exist use cases where an individual can use his user account on an RS
> without disclosing a phone number, a government-issued ID number,
> a domain name, a latitude/longitude (!) or a street address. The list you
> provide is an example of such "end user attributes".
>
> Note that an email address may be used as a subject identifier if the AS
> incorporates it in a "globally unique user identifier", i.e. a type 4
> subject identifier
> in my contribution ... and if the end user is indeed willing to accept
> such a "globally unique user identifier".
>
> Let us now finish addressing the arguments raised in your first email.
>
>       Certainly there are systems out there that issue identifiers for
> individuals (i.e., users) and groups that may collide with one another
>       (e.g., any system that uses 0-based SQL auto-incrementing integers
> for its identifiers). However, I think it is rare for such identifiers
>       to be used in cases where interoperability is important, and *where
> the type of the subject is not made clear from context*
>       (e.g., a "GetGroupMembers" API would expect an identifier for a
> group, not a user). Further, I suspect any such system that did need
>       to disambiguate between individuals and groups within the identifier
> itself would likely need to disambiguate between other types of subjects
>       as well, e.g., hosts, documents, various resources provided by the
> service, etc. As such, this proposal would not solve their problem, and
>       they would be better off defining their own subject identifier
> format for their use case.
>
> You say that "there are systems out there that issue identifiers for
> individuals (i.e., users) and groups that may collide with one another".
>
> The fact that some badly designed systems may exist is no reason not to
> encourage building systems on good foundations.
> The type of the subject identifier cannot always be made clear from context,
> but it can be made clear from the content of the JWT.
>
> Note also that, when an auditor takes a look at the content of an audit
> trail, the "context" has been lost and hence he/she may only understand
> the semantics of a subject identifier by looking at its internal structure.
>
> You also wrote "they would be better off defining their own subject
> identifier format for their use case."
>
> Within the IETF, one of the objectives is interoperability, and as such it
> can only be achieved using Standards Track RFCs
> rather than by defining subject identifier formats for
> application-specific use cases, in a non-interoperable way.
>
> Denis
>
> It would be valuable to be able to make a difference between these five
> types of user identifiers.
>
> Can you provide examples where it is critical to have this information
> encoded within the identifier data structure itself? Under what
> circumstances would a consumer of a subject identifier change their
> behavior based on this information?
>
> From what you've shared so far, a few things jump out that make me think
> it would not be appropriate to include this as a core property within
> subject identifier formats:
>
>
>    1. The description assumes the subjects being identified are
>    users/accounts and that the subject identifier is being exchanged between
>    an AS and RS. This is by far not the only use case for subject identifiers.
>
>    2. Non-correlation only really makes sense for opaque, surrogate
>    identifiers like UUIDs. How do you prevent correlation if your identifier
>    format is an IP address, phone number, government-issued ID number, domain
>    name, latitude/longitude, street address, etc.? (Note that the local
>    part of an email address can be understood as an opaque, surrogate
>    identifier if the issuer of the subject identifier controls the email
>    address domain, for example, emails generated by Apple's Hide my Email
>    feature).
>
>
> —
> Annabelle Backman (she/her)
> richanna@amazon.com
>
>
>
>
> On Mar 10, 2022, at 3:27 AM, Denis <denis.ietf@free.fr> wrote:
>
>
> Hello Annabelle,
>
> I am glad to be able to exchange with you for the very first time.
>
> I am currently rather busy and I don't have the time available for a
> detailed response.
>
> Nevertheless, I have browsed through your comments and picked one of them:
>
> > A subject identifier *type* attribute would be able to support four
> values: *guid*, *shared*, *unique* and *tmp*.
>
> I'm not sure I'm following what you're intending to represent with this,
> and what problem you're trying to solve.
>
> Last August, I posted a draft which has since expired, but which you can
> still find at:
>
>
> https://datatracker.ietf.org/doc/html/draft-pinkas-gnap-core-protocol-00.html
>
> If you have some time available, please take a look at section 1.7,
> "Short term and long term user accounts", in particular the following
> text:
>
> The four types used in the context of long-term user accounts managed by a RS are:
>
>          (1) a unique user identifier used to identify a user for each User/ RS pair, or
>
>              Note: this option cannot be implemented in the context of a "software-only" solution.
>                    It requires the use, by the end-user, of a secure element with specific security
>                    properties.  [This option is not detailed any further at the moment].
>
>          (2) a unique user identifier used to identify a user for each AS / RS pair, or
>
>          (3) a locally unique user identifier used to identify a user whatever RS is being involved, or
>
>          (4) a globally unique user identifier.
>
>    The last type used in the context of short-term user accounts managed by a RS is:
>
>          (5) a short-term user unique identifier.
>
> It would be valuable to be able to distinguish between these five
> types of user identifiers.
>
> *Note*: the draft was posted a few months after my original comment, so
> at the time I made my original post my ideas were not yet fully
> stabilized. Now they are!
>
> Denis
>
> On Mar 9, 2022, at 9:31 AM, Denis <denis.ietf@free.fr> wrote:
>
> ...
>
> While this statement is correct, it should be remembered that the title of
> this document is:
>
> "Subject Identifiers for Security Event Tokens"
>
> and is not:
>
> "Subject Identifiers *Formats* for Security Event Tokens".
>
>
> The draft formalizes the concept of a "Subject Identifier", and provides a
> standard way to represent those as structured data. Therefore I think the
> current name remains appropriate.
>
> *1. Granularity of the identification*
>
> A subject identifier may be able to identify an entity either individually
> or as a member of a group.
>
> To distinguish between the two, an *optional class* attribute should be
> defined which may take one of two values:
>
>    - "*ind*" to indicate an individual identifier or
>    - "*grp*" to indicate a group identifier.
>
> Whether or not a subject is an individual or group is a property of the
> subject itself, not (generally speaking) a property of the subject
> identifier. Subject identifiers are not general purpose containers for
> claims about a subject – we already have JWTs for that. 😀
>
> Certainly there are systems out there that issue identifiers for
> individuals (i.e., users) and groups that may collide with one another
> (e.g., any system that uses 0-based SQL auto-incrementing integers for its
> identifiers). However, I think it is rare for such identifiers to be used
> in cases where interoperability is important, and where the type of the
> subject is not made clear from context (e.g., a "GetGroupMembers" API would
> expect an identifier for a group, not a user). Further, I suspect any such
> system that did need to disambiguate between individuals and groups within
> the identifier itself would likely need to disambiguate between other types
> of subjects as well, e.g., hosts, documents, various resources provided by
> the service, etc. As such, this proposal would not solve their problem, and
> they would be better off defining their own subject identifier format for
> their use case.
>
>
> *2. Correlation operations that may be either performed or prevented*
>
> Currently, the Privacy Considerations section only addresses one
> correlation case: a JWT that contains both "sub" and "sub_id" claims.
> While it is appropriate to mention such a case, other correlation cases
> exist.
>
>
> The Privacy Considerations section is intended to address the correlation
> risk generally. The JWT case is mentioned only as an example. Any
> suggestions on how to make that more clear?
>
> These cases become visible if the "sub_id" contains an *optional* subject
> identifier *type* attribute.
>
> A subject identifier *type* attribute would be able to support four
> values: *guid*, *shared*, *unique* and *tmp*.
>
>
> I'm not sure I'm following what you're intending to represent with this,
> and what problem you're trying to solve. Any subject identifier transmitted
> from one party to another is by definition "shared". Once the recipient
> receives that identifier, the transmitter has no programmatic control over
> how it is used – that's the realm of *legal* contracts, not API
> contracts.
>
> A transmitter that wishes to prevent Recipient A from using the
> transmitter's subject identifiers to correlate records with Recipient B may
> be able to do so by issuing directed identifiers that are unique per
> subject+recipient pair. A system that does so is essentially immune to this
> correlation risk; it is not clear to me what value there is in advertising
> this fact within the subject identifier.
>
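One way a transmitter might mint the directed identifiers described above is a keyed hash over the recipient and the transmitter's local subject ID. The Python sketch below assumes HMAC-SHA256 with a transmitter-held secret; `directed_identifier` and `TRANSMITTER_SECRET` are hypothetical names, and the draft does not prescribe any particular derivation. The result is wrapped in the draft's `opaque` identifier format.

```python
import base64
import hashlib
import hmac

# Hypothetical transmitter-held secret; a real deployment would load this from a key store.
TRANSMITTER_SECRET = b"example-secret-do-not-use"


def directed_identifier(local_subject_id: str, recipient_id: str,
                        secret: bytes = TRANSMITTER_SECRET) -> dict:
    """Return an opaque subject identifier unique to this subject+recipient pair.

    The same subject always maps to the same value for a given recipient, but to
    unrelated values for different recipients, as long as the secret stays private.
    """
    parts = (recipient_id.encode("utf-8"), local_subject_id.encode("utf-8"))
    # Length-prefix each component so ("ab", "c") and ("a", "bc") cannot collide.
    message = b"".join(len(part).to_bytes(4, "big") + part for part in parts)
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    # URL-safe base64 without padding keeps the value compact and JSON-friendly.
    opaque = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return {"format": "opaque", "id": opaque}


a = directed_identifier("user-42", "https://recipient-a.example")
b = directed_identifier("user-42", "https://recipient-b.example")
print(a["id"] != b["id"])  # the two recipients see unrelated identifiers
```

Because the value depends on the recipient, two recipients comparing notes see unrelated strings for the same user, while the transmitter can still recompute the mapping. Rotating the secret would change every issued identifier, so that has to be planned for.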
> *3. Hierarchical group memberships, functional group memberships and
> roles.*
>
> Examples of group identifiers are hierarchical group memberships,
> functional group memberships and roles.
> It would be useful to define one format for these common groups, one or
> more character strings separated by the slash character, for
> hierarchical group memberships (*hgrp*), functional group memberships
> (*fgrp*) and roles (*role*):
>
>
> Hierarchical groups and functional groups are both subject types, not
> subject identifier types. I can imagine something like a `path` subject
> identifier format that contains an ordered list of scalar values describing
> a path within a graph. That graph could be a filesystem directory tree, an
> org chart, a computer network, etc.
>
> It might look something like:
>
> {
>   "format": "path",
>   "path": ["usr", "local", "bin", "sha512sum"]
> }
>
>
> or
>
>
> {
>   "format": "path",
>   "path": ["Example University", "Faculty", "Computer Science", "Ada Lovelace"]
> }
>
>
> Note that the nature of the graph, and whether or not the subject
> identifier identifies the entire path itself or just the final node would
> depend on the context in which the identifier appears.
>
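To make the hypothetical `path` format above concrete, here is a minimal Python sketch of a shape check plus a converter from the slash-separated strings proposed for *hgrp*, *fgrp* and *role*. The validation rules (a non-empty list of strings or integers) are assumptions, since no such format is defined in the draft, and both function names are hypothetical.

```python
from typing import Any


def is_valid_path_identifier(sub_id: Any) -> bool:
    """Check the shape of a hypothetical "path" subject identifier.

    Assumed rules (not from any spec): "format" must be "path", and "path"
    must be a non-empty list whose elements are strings or integers.
    """
    if not isinstance(sub_id, dict) or sub_id.get("format") != "path":
        return False
    path = sub_id.get("path")
    if not isinstance(path, list) or not path:
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(
        isinstance(node, (str, int)) and not isinstance(node, bool) for node in path
    )


def from_slash_string(value: str) -> dict:
    """Convert a slash-separated string into a hypothetical path identifier."""
    # Empty segments (from leading, trailing or doubled slashes) are dropped.
    return {"format": "path", "path": [seg for seg in value.split("/") if seg]}


print(from_slash_string("/usr/local/bin/sha512sum"))
```

One argument for the list form: a node name containing "/" survives intact in `path`, whereas splitting a slash-separated string would break it into two nodes.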
> While it is an interesting thought exercise, unless someone has a use case
> for this kind of subject identifier format I don't think it should go in
> this draft. It can always be defined by someone later, if needed.
>
> —
> Annabelle Backman (she/her)
> richanna@amazon.com
>
>
>
>
> On Mar 9, 2022, at 9:31 AM, Denis <denis.ietf@free.fr> wrote:
>
> Dick,
>
> In response to your inquiry, I am reposting the original email I sent on
> 27/05/2021 at 19:42 (Paris local time) to the same recipients.
>
> Denis
>
>
> I believe that this document should be enhanced in three areas that are
> currently not addressed.
>
> Section 3.1 (Identifier Formats versus Principal Types) states:
>
> Identifier Formats define how to encode identifying information for a
> subject.  They do not define the type or nature of the subject itself.
>
> While this statement is correct, it should be remembered that the title
> of this document is:
>
> "Subject Identifiers for Security Event Tokens"
>
> and is not:
>
> "Subject Identifiers *Formats* for Security Event Tokens".
>
> Therefore it would be possible to add *optional* attributes/characteristics
> to Subject Identifiers which relate to two different topics:
>
> -   the granularity of the identification and
> -   the correlation operations that may be either performed or
>     prevented using some values contained in a specific format.
>
> *1. Granularity of the identification*
>
> A subject identifier may be able to identify an entity either individually
> or as a member of a group.
>
> To distinguish between the two, an *optional class* attribute should be
> defined which may take one of two values:
>
>    - "*ind*" to indicate an individual identifier or
>    - "*grp*" to indicate a group identifier.
>
> Examples:
>
> {
>   "format": "email",
>   "class": "ind",
>   "email": "tom.jones@example.com"
> }
>
> {
>   "format": "email",
>   "class": "grp",
>   "email": "marketing@example.com"
> }
>
>
> *2. Correlation operations that may be either performed or prevented*
>
> Currently, the Privacy Considerations section only addresses one
> correlation case: a JWT that contains both "sub" and "sub_id" claims.
>
> While it is appropriate to mention such a case, other correlation cases
> exist.
>
> These cases become visible if the "sub_id" contains an *optional* subject
> identifier *type* attribute.
>
> A subject identifier *type* attribute would be able to support four
> values: *guid*, *shared*, *unique* and *tmp*.
>
> --
---
Aaron Parecki
https://aaronparecki.com