Re: [Id-event] Subject Identifiers

Hello Annabelle,

> Hi Denis,
>
> We may be talking past one another a bit here, so let me step back and 
> try to state some things clearly. Within this draft, the word 
> "subject" includes many, many different kinds of things. It is not a 
> synonym for "user". Subjects may not have any clear relationship to an 
> individual person or a group of people. Of course, a subject may be a 
> user, or person, or group, or some other thing related to people.

Agreed.

> But as stated in the draft, the /subject type/ is out of scope.

The text states in section 3.1:

    Identifier Formats define how to encode identifying information for
    a subject.
    They do not define the type or nature of the subject itself.

However, the abstract states:

    This specification formalizes the notion of subject identifiers as
    structured information that describe a subject, and named formats
    that define the *syntax* and *semantics* for encoding subject
    identifiers as JSON objects.

Two words are important in that sentence: "*semantics*" and "*syntax*".

This means that what is called a "format" should define two things:

  * the semantics of the subject identifier, and
  * the syntax of the subject identifier.

The text states later on within section 3:

A Subject Identifier MUST conform to a specific Identifier Format, (...).

Section 3.1 states:

Identifier Formats define how to encode identifying information for a 
subject.
They do not define the type or nature of the subject itself.

We are now at the core of the debate.

The abstract states that subject identifiers define both *the semantics*
and *the syntax* of a subject identifier,
while section 3.1 states that they do not define the semantics of a
subject identifier, i.e. the type or nature
of the subject identifier itself. This is contradictory.

> Some subject identifiers (and subject identifier formats) may be 
> appropriate for some types of subjects but not others.
> For example, while I agree that a latitude and longitude coordinate 
> doesn't make sense as an identifier for a user,
> it does make sense as an identifier for a plot of land. Likewise, an 
> IP address is a perfectly reasonable identifier to use
> when the subject is the IP address itself (e.g., in an audit log of 
> DHCP lease changes), or for a node on an IP-based network.
>
>> If subject identifiers were fully opaque, they would be of little 
>> use. Hence they need to be structured to be able to make a difference
>> at a first level of granularity between:
>>
>>   * those associated with a single individual, and
>>   * those that relate to a group (i.e. to one or more individuals).
>>
> Whether or not a subject represents an individual or a group is a 
> property of the /subject itself/, and is not generally required in 
> order to identify the subject.
> Since your example use case uses JWTs, this information can and should 
> be included as a separate claim within the JWT. There are several 
> benefits to this:
>
>  1. It is semantically correct, as JWTs are intended to encapsulate
>     claims about a subject.
>  2. It provides privacy advantages, as protocols like OIDC provide
>     mechanisms for clients to request access to different claims.
>     Embedding this information within the subject identifier implies
>     it is necessary to understanding the identifier.
>  3. It avoids adding something to subject identifiers that only really
>     applies to a subset of subject types.
>  4. It avoids the complicated question of how to interpret these
>     values in the context of different subject types.
>     For example, one implementer might consider a subject identifier
>     that identifies a single POSIX group itself
>     (i.e., the /group/, not the /members of/ the group) to be
>     "individual" because it represents one single thing,
>     while another implementer might consider it to be "group" because
>     the one single thing it represents is literally a group.
>
Section 2.4 of the GNAP draft (Identifying the User) states:

If the client instance knows the identity of the end user through one
or more identifiers or assertions, the client instance MAY send that
information to the AS in the "user" field. The client instance MAY
pass this information by value or by reference.

sub_ids (array of objects): An array of subject identifiers for the
end user, as defined by [I-D.ietf-secevent-subject-identifiers].
OPTIONAL.

assertions (array of objects): An array containing assertions as
objects each containing the assertion format and the assertion
value as the JSON string serialization of the assertion.
OPTIONAL.
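
As a rough illustration of the "user" field quoted above, a client instance could assemble it along the following lines. This is only a sketch: the helper name build_user_field is hypothetical, the example value is made up, and the only subject identifier shown uses the Email Identifier Format from section 3.2.4 of the SEC-EVENT draft.

```python
import json


def build_user_field(sub_ids, assertions=None):
    """Assemble a "user" object carrying subject identifiers (hypothetical helper)."""
    user = {"sub_ids": list(sub_ids)}
    # "assertions" is OPTIONAL in the quoted text, so it is only added when given.
    if assertions:
        user["assertions"] = list(assertions)
    return user


user = build_user_field(
    sub_ids=[{"format": "email", "email": "user@example.com"}],
)
print(json.dumps(user, indent=2))
```

Nothing in such a structure tells the AS or the RS which kind of subject identifier is being carried beyond its format name.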

Assertions could certainly be used, but at the moment I am not aware of 
an IETF RFC that would support some form of interoperability
for subject identifiers or for group memberships.

The SEC-EVENT draft is an opportunity to standardize some of them so 
that they can be used in an interoperable fashion.

>> One sentence in that section is currently:
>>
>> For example, the entity to which the identifiers are presented now 
>> knows that both identifiers relate to the same subject,
>>       and may be able to correlate additional data based on that.
>>
>> Such a sentence is inaccurate because it depends upon the type of 
>> subject identifier that is being received.
>>
> Section 6.1 addresses correlation risks specifically arising from 
> including multiple subject identifiers within the same context, as 
> this is something the draft enables
> (via the `aliases` format, and by introducing the `sub_id` JWT claim 
> and permitting its use alongside `sub`). While including a subject 
> identifier with other information
> more generally may introduce correlation risks, those risks are highly 
> context-dependent, and I am not sure that there is any sensible advice 
> to be given in this draft.
> However, I can add something to the effect of "implementers must 
> consider such risks, and specs that use subject identifiers must 
> provide appropriate privacy considerations of their own."

Your argument would be valid if the semantics of the subject 
identifier were out of the scope of the draft, but unfortunately 
this is not the case.

>> Another valuable feature will be, simply, by looking at the structure 
>> of a subject identifier to know whether correlation of user accounts 
>> will
>> or will not be possible. An auditor of an audit trail would be in a 
>> position to know it easily and hence to assess under which conditions 
>> a RS
>> will or will not be in a position to correlate user accounts with 
>> another RS. At the moment, it is impossible.
>
> This assumes that the issuer of the subject identifier is willing to 
> reveal that information to the consumers of that subject identifier.
>
This assumes that the client is willing to ask the AS to use or 
generate such a subject identifier (which the AS may or may not be able 
to support).
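
On the client side, the check I described in my previous message (requested type X, received type Y) could be sketched as follows. This is a minimal sketch under assumptions: the function name may_forward is hypothetical, the JWT is assumed to be already decoded and verified into a dictionary of claims, the format names are the ones proposed in this thread rather than anything defined by the draft, and treating a missing "sub_id" claim as a mismatch is my own conservative choice.

```python
def may_forward(requested_format, jwt_claims):
    """Return True only if the JWT carries the subject identifier format that was requested."""
    sub_id = jwt_claims.get("sub_id")
    # Assumption: a missing or malformed "sub_id" claim is treated as a mismatch.
    if not isinstance(sub_id, dict):
        return False
    return sub_id.get("format") == requested_format


claims = {
    "iss": "http://issuer.example.com/",
    "sub_id": {
        "format": "UID-type 3",
        "iss": "http://issuer.example.com/",
        "sub": "145234573",
    },
}
print(may_forward("UID-type 2", claims))  # False
```

When the function returns False, the client would not transmit the JWT to the RS.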

> Further, since subjects may be correlated via information that is not 
> part of the subject identifier, any "non-correlatable" flag within the 
> subject identifier
> would be insufficient to answer the question of whether the subject 
> can be correlated.
>
The question is not whether other information can allow such 
correlation, but whether such correlation will or will not be possible
only by taking advantage of the content of a SINGLE subject identifier.

>> I have identified two classes of user accounts: long term and temporary.
>>
>> The processing made by the RS will be different whether the subject 
>> identifier is a long term or a temporary subject identifier.
>> Hence a distinction first needs to be made in the structure of the 
>> subject identifier to distinguish between these two classes.
>>
>> My individual draft provides an example for that processing.
>>
> If the intent is to indicate whether or not the /user account/ is long 
> term or temporary, then we are again talking about a property of the 
> /subject/,
> which is better represented as a separate JWT claim, for all the 
> reasons stated above.

In the JWT profile (RFC 9068), there is a claim called "sub".

The "Privacy considerations" section states:

    This profile mandates the presence of the "sub" claim in every JWT
    access token, making it possible for resource servers to rely on that
    information for correlating incoming requests with data stored
    locally for the authenticated principal.

We can no longer change the semantics of the "sub" claim, which is very 
general and which, by itself, does not indicate whether or not
some correlation will be possible.

In contrast, the "sub_id" claim would let the client 
control whether or not some correlation will be possible by the RSs.

> I was not able to find any specific examples of how a processor might 
> change its behavior based on this information.

If you take a look at my draft, the processing by a RS of a sub-claim 
which contains a long term or a temporary subject identifier will be 
rather different.

> I think in most cases, they would operate the same way – operationally 
> speaking, a short-term account is little different from a dormant 
> long-term account.
>
> From what you've written, it seems your goal is to allow users to 
> control whether or not an RS can correlate the end user's activity 
> across multiple sessions.

This is one of the goals, but not the only goal. Introducing 
support for group memberships is another goal.

> That problem is much larger than subject identifiers and cannot be 
> solved at that level.

It could be solved in this draft.

> Many different kinds of claims may be used to correlate activity 
> beyond those used to identify the subject.
As said earlier, the question is not whether other information can allow 
such correlation, but whether such correlation will or will not be possible
simply by looking at the content of a SINGLE subject identifier.
>
> The RS needs to be denied access to these claims or provided with 
> masked or surrogate values.

?!?

> It is also likely that the end user does not want the RS to know that 
> they are providing non-correlated values,
> as that would allow the RS to modify its behavior to attempt to block 
> access or force the user to provide legitimate values.
> In such a case, including a flag in the subject identifier would 
> undermine the user's control.
>
If the subject identifier contains an email address, the RS will indeed 
know that correlation is likely to be possible.
When the subject identifier contains a Type 1 identifier, the RS will 
need to recognize it; otherwise, it cannot process the request correctly.

> At present, I don't see any justification for including 
> individual/group or correlatability information as part of the subject 
> identifier structure.

The following "Formats" are currently being defined in the draft:

3.2.1. Account Identifier Format
3.2.2. Aliases Identifier Format
3.2.3. Decentralized Identifier (DID) Format
3.2.4. Email Identifier Format
3.2.5. Issuer and Subject Identifier Format
3.2.6. Opaque Identifier Format
3.2.7. Phone Number Identifier Format

Some other useful "formats" should be added, like functional group 
memberships, roles and hierarchical group memberships.
Is there a rationale for not adding these?

Let us now focus on section 3.2.5 about the "Issuer and Subject Identifier 
Format". The text states:

The Issuer and Subject Identifier Format identifies a subject using a 
pair of "iss" and "sub" members, analogous to how subjects
are identified using the "iss" and "sub" claims in OpenID Connect 
[OpenID.Core] ID Tokens.

The *syntax* is currently: a pair of "iss" and "sub" members. _Such 
syntax may be associated with different semantics_.

A client should be able to ask an AS to deliver (if it can) 
a JWT containing a subject identifier associated with a user, chosen 
among five possibilities:

(1) a user identifier unique for each user / RS pair,
(2) a user identifier unique for each AS / RS pair,
(3) a user identifier unique for the AS, whichever RS is involved,
(4) a short-term user identifier unique for the AS,
(5) a globally unique user identifier (where the uniqueness is 
independent of the AS).

Let me give an example for each of them:

(1) a user identifier unique for each user / RS pair

{
"format": "UID-type 1",
"iss": "http://issuer.example.com/",
"sub": "145234573"
}

(2) a user identifier unique for each AS / RS pair

{
"format": "UID-type 2",
"iss": "http://issuer.example.com/",
"sub": "145234573"
}

(3) a user identifier unique for the AS, whichever RS is involved

{
"format": "UID-type 3",
"iss": "http://issuer.example.com/",
"sub": "145234573"
}

(4) a short-term user identifier unique for the AS

{
"format": "UID-type 4",
"iss": "http://issuer.example.com/",
"sub": "145234573"
}

(5) a globally unique user identifier (independent of the AS)

{
"format": "GUID-email",
"syntax": "email",
"email": "user@example.com"
}
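
To illustrate how an RS could change its processing based on these five examples, here is a minimal sketch that separates long-term from temporary identifiers by looking only at the "format" member. The function name account_class and the two sets are hypothetical, and the format names are the ones proposed in this message, not formats defined by the draft.

```python
# Format names copied from the five examples above. They are proposals
# made in this message, not formats defined by the draft.
LONG_TERM_FORMATS = {"UID-type 1", "UID-type 2", "UID-type 3", "GUID-email"}
SHORT_TERM_FORMATS = {"UID-type 4"}


def account_class(sub_id):
    """Return "long-term" or "temporary" for a subject identifier object.

    Raises ValueError when the format is missing or not recognized, since
    the RS could not process the request correctly in that case.
    """
    fmt = sub_id.get("format")
    if fmt in LONG_TERM_FORMATS:
        return "long-term"
    if fmt in SHORT_TERM_FORMATS:
        return "temporary"
    raise ValueError(f"unrecognized subject identifier format: {fmt!r}")


example = {
    "format": "UID-type 4",
    "iss": "http://issuer.example.com/",
    "sub": "145234573",
}
print(account_class(example))  # temporary
```

Note that the four "iss"/"sub" examples are byte-for-byte identical apart from the "format" member, so the decision cannot be made from the syntax alone.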

> I will however publish an update with the additional privacy language 
> I mentioned above.

Before doing so, please take a look at the "Privacy Considerations" 
section on page 11 of RFC 9068 (JWT Profile for OAuth 2.0 Access Tokens).

Denis

>
> —
> Annabelle Backman (she/her)
> richanna@amazon.com
>
>
>
>
>> On Mar 12, 2022, at 5:52 AM, Denis <denis.ietf@free.fr> wrote:
>>
>>
>>
>> Annabelle,
>>
>> Thank you for your second email.
>>
>> Rather than responding between the lines, I have constructed a global 
>> reply taking into consideration your argumentation.
>>
>> You wrote:
>>
>>         Can you provide examples where it is critical to have this 
>> information encoded within the identifier data structure itself?
>>         Under what circumstances would a consumer of a subject 
>> identifier change their behavior based on this information?
>>
>> The rationale of the proposal is related to the case where JWTs are 
>> exchanged between a client and a RS (i.e. not between an AS and a RS).
>> This is certainly not the single case to be considered, but it is an 
>> important case.
>>
>> When an access token is received by a RS, it may contain one or more 
>> subject identifiers.
>>
>> They allow, when necessary, to trace the actions that have been 
>> performed while using a JWT that contained these subject identifiers.
>> These subject identifiers may be placed into an audit trail, for 
>> example, in order to be associated with an action that has taken place.
>> If subject identifiers were fully opaque, they would be of little 
>> use. Hence they need to be structured to be able to make a difference
>> at a first level of granularity between:
>>
>>   * those associated with a single individual, and
>>   * those that relate to a group (i.e. to one or more individuals).
>>
>> Such a difference relates to what I wrote in my original email from 
>> last year:
>>
>>        In order to be able to make the difference, an *optional*
>> *class* attribute should be defined which may take one out of two 
>> values:
>>
>>       * "*ind*" to indicate an individual identifier or
>>       * "*grp*" to indicate a group identifier.
>>
>>
>> *Subject identifiers associated with a single individual*
>> As explained in my individual draft, a subject identifier that 
>> relates to an individual may disclose more or less information
>> that allows RSs to link their user accounts. The choice between these 
>> various types of identifiers may be done by the end user
>> or/and by the client.
>>
>> Depending upon the level of concern (or knowledge) of the individual 
>> as regard to his/her privacy and what is supported
>> by the underlying technology, I have identified up to five possible 
>> choices for the individual.
>>
>> Now, let us come to your question:
>>
>>        Under what circumstances would a consumer of a subject 
>> identifier change their behavior based on this information?
>>
>> I have identified two classes of user accounts: long term and temporary.
>>
>> The processing made by the RS will be different whether the subject 
>> identifier is a long term or a temporary subject identifier.
>> Hence a distinction first needs to be made in the structure of the 
>> subject identifier to distinguish between these two classes.
>>
>> My individual draft provides an example for that processing.
>>
>> Secondly, when a match needs to be done by a RS between subject 
>> identifiers received in different JWTs, it needs to be done
>> using the same type of subject identifiers, hence including that type 
>> in the structure is necessary.
>>
>> Let us now consider the point of view of the client and raise the 
>> following question:
>>
>>         Under what circumstances would a client change its behavior 
>> based on this information?
>>
>> If a client has been asking for a type X of subject identifier in a 
>> JWT and is then able to discover that the JWT contains instead
>> a type Y of subject identifier, then the client SHALL NOT transmit 
>> the JWT to the RS, because the privacy of the end user might be impacted.
>>
>> You wrote:
>>
>>          The Privacy Considerations section is intended to address 
>> the correlation risk generally.
>>          The JWT case is mentioned only as an example. Any 
>> suggestions on how to make that more clear?
>>
>> It is not a matter to make it more clear based on the current content 
>> of the current draft.
>>
>> One sentence in that section is currently:
>>
>>        For example, the entity to which the identifiers are presented 
>> now knows that both identifiers relate to the same subject,
>>       and may be able to correlate additional data based on that.
>>
>> Such a sentence is inaccurate because it depends upon the type of 
>> subject identifier that is being received.
>>
>> It is a matter of exposing the privacy concerns of the end-user and 
>> how they may be addressed using one of the five types of subject 
>> identifiers.
>>
>> I mean that it will be possible to revise this section once the five 
>> types of subject identifiers will have been incorporated into the 
>> document.
>>
>>
>> Another valuable feature will be, simply, by looking at the structure 
>> of a subject identifier to know whether correlation of user accounts 
>> will
>> or will not be possible. An auditor of an audit trail would be in a 
>> position to know it easily and hence to assess under which conditions 
>> a RS
>> will or will not be in a position to correlate user accounts with 
>> another RS. At the moment, it is impossible.
>>
>>
>>
>> *Subject identifiers that relate to a group*
>> It will be valuable, simply by looking at the structure of a subject 
>> identifier, to know that it is related to a group (and not to a single 
>> individual)
>> and to which kind of group (e.g. hierarchical, functional, or a role).
>>
>> This relates to my original email from last year where I wrote:
>>
>>            It would be useful to define one format for these common 
>> groups: one or more character strings separated by the character slash
>>            for both hierarchical (*hgrp*) and functional group 
>> memberships (*fgrp*) and roles (*role*):
>>
>>
>> *Other replies related to your original email*
>> You wrote:
>>
>>       From what you've shared so far, a few things jump out that make 
>> me think it would not be appropriate to include this as a core 
>> property within subject identifier formats:
>> The description assumes the subjects being identified are 
>> users/accounts and that the subject identifier is being exchanged 
>> between an AS and RS.
>>       This is by far not the only use case for subject identifiers.
>>
>> As said earlier, this is certainly not the single case to be 
>> considered, but it is an important case.
>>
>> You wrote:
>>
>> Non-correlation only really makes sense for opaque, surrogate 
>> identifiers like UUIDs.
>>
>> Non-correlation does not only apply to fully opaque identifiers.
>>
>> You wrote:
>>
>>      How do you prevent correlation if your identifier format is an 
>> IP address, phone number, government-issued ID number, domain name,
>>      latitude/longitude, street address, etc.? (Note that if the 
>> local part of an email address can be understood as an opaque, 
>> surrogate identifier
>>      if the issuer of the subject identifier controls the email 
>> address domain, for example, emails generated by Apple's Hide my 
>> Email feature).
>>
>> The standardization community is working taking into consideration 
>> roughly the ISO model, where the application layer is addressed 
>> independently
>> from the transport or network layer. Hiding an IP address is not a 
>> concern for the application layer and hence for the content of a JWT.
>> This concern can be addressed using specific techniques.
>>
>> There exist use cases where an individual can use his user account on 
>> a RS without disclosing a phone number, a government-issued ID number,
>> a domain name, a latitude/longitude (!) or a street address. The list 
>> you provide is an example of such "end user attributes".
>>
>> Note that an email address may be used as a subject identifier if the 
>> AS incorporates it in a "globally unique user identifier", i.e. a 
>> type 4 subject identifier
>> in my contribution ... and if the end-user is indeed accepting or 
>> willing to use such "globally unique user identifier".
>>
>> Let us now finish to address the arguments raised in your first email.
>>
>>       Certainly there are systems out there that issue identifiers 
>> for individuals (i.e., users) and groups that may collide with one 
>> another
>>       (e.g., any system that uses 0-based SQL auto-incrementing 
>> integers for its identifiers). However, I think it is rare for such 
>> identifiers
>>       to be used in cases where interoperability is important, and 
>> _where the type of the subject is not made clear from context_
>>       (e.g., a "GetGroupMembers" API would expect an identifier for a 
>> group, not a user). Further, I suspect any such system that did need
>>       to disambiguate between individuals and groups within the 
>> identifier itself would likely need to disambiguate between other 
>> types of subjects
>>       as well, e.g., hosts, documents, various resources provided by 
>> the service, etc. As such, this proposal would not solve their 
>> problem, and
>>       they would be better off defining their own subject identifier 
>> format for their use case.
>>
>> You say that "there are systems out there that issue identifiers for 
>> individuals (i.e., users) and groups that may collide with one another".
>>
>> The fact that some badly designed systems may exist is no reason not 
>> to encourage building systems on good foundations.
>> The type of the subject identifier cannot always be made clear from 
>> context, but it can be clear from the content of the JWT.
>>
>> Note also that, when an auditor takes a look at the content of an 
>> audit trail, the "context" has been lost and hence he/she may only 
>> understand
>> the semantics of a subject identifier by looking at its internal 
>> structure.
>>
>> You also wrote "they would be better off defining their own subject 
>> identifier format for their use case."
>>
>> Within the IETF, one of the objectives is interoperability and as 
>> such it can only be achieved using standard track RFCs
>> rather than by defining subject identifier formats for 
>> application-specific use cases, in a non-interoperable way.
>>
>> Denis
>>
>>>> It would be valuable to be able to make a difference between these 
>>>> five types of user identifiers.
>>>>
>>> Can you provide examples where it is critical to have this 
>>> information encoded within the identifier data structure itself? 
>>> Under what circumstances would a consumer of a subject identifier 
>>> change their behavior based on this information?
>>>
>>> From what you've shared so far, a few things jump out that make me 
>>> think it would not be appropriate to include this as a core property 
>>> within subject identifier formats:
>>>
>>>  1. The description assumes the subjects being identified are
>>>     users/accounts and that the subject identifier is being
>>>     exchanged between an AS and RS. This is by far not the only use
>>>     case for subject identifiers.
>>>
>>>  2. Non-correlation only really makes sense for opaque, surrogate
>>>     identifiers like UUIDs. How do you prevent correlation if your
>>>     identifier format is an IP address, phone number,
>>>     government-issued ID number, domain name, latitude/longitude,
>>>     street address, etc.? (Note that if the local part of an email
>>>     address can be understood as an opaque, surrogate identifier if
>>>     the issuer of the subject identifier controls the email address
>>>     domain, for example, emails generated by Apple's Hide my Email
>>>     feature).
>>>
>>>
>>> —
>>> Annabelle Backman (she/her)
>>> richanna@amazon.com
>>>
>>>
>>>
>>>
>>>> On Mar 10, 2022, at 3:27 AM, Denis <denis.ietf@free.fr> wrote:
>>>>
>>>>
>>>>
>>>> Hello Annabelle,
>>>>
>>>> I am glad to be able to exchange with you for the very first time.
>>>>
>>>> I am currently rather busy and I don't have the time available for 
>>>> a detailed response.
>>>>
>>>> Nevertheless, I browsed through your comments and I picked one of them:
>>>>
>>>> > A subject identifier *type* attribute would be able to support four 
>>>> values: *guid*, *shared*, *unique* and *tmp*.
>>>>
>>>>     I'm not sure I'm following what you're intending to represent
>>>>     with this, and what problem you're trying to solve.
>>>>
>>>> Last August, I have posted a draft, that has expired, but that you 
>>>> can still find at:
>>>>
>>>>     https://datatracker.ietf.org/doc/html/draft-pinkas-gnap-core-protocol-00.html
>>>>
>>>> If you have some time available, please take a look at section 1.7. 
>>>> called "Short term and long term user accounts",
>>>> where you will get some information. In particular, the following text.
>>>> The four types used in the context of long-term user accounts managed by a RS are:
>>>>
>>>>           (1) a unique user identifier used to identify a user for each User/ RS pair, or
>>>>
>>>>               Note: this option cannot be implemented in the context of a "software-only" solution.
>>>>                     It requires the use, by the end-user, of a secure element with specific security
>>>>                     properties.  [This option is not detailed any further at the moment].
>>>>
>>>>           (2) a unique user identifier used to identify a user for each AS / RS pair, or
>>>>
>>>>           (3) a locally unique user identifier used to identify a user whatever RS is being involved, or
>>>>
>>>>           (4) a globally unique user identifier.
>>>>
>>>>     The last type used in the context of short-term user accounts managed by a RS is:
>>>>
>>>>           (5) a short-term user unique identifier.
>>>>
>>>> It would be valuable to be able to make a difference between these 
>>>> five types of user identifiers.
>>>>
>>>> _Note_: the draft has been posted a few months after my original 
>>>> comment, hence at the time I made my original post
>>>>            my ideas were not yet fully stabilized. Now, they are!
>>>>
>>>> Denis
>>>>
>>>>
>>>>>> On Mar 9, 2022, at 9:31 AM, Denis <denis.ietf@free.fr> wrote:
>>>>>> ...
>>>>>>> While this statement is correct, it should be remembered that 
>>>>>>> the title of this document is:
>>>>>>>
>>>>>>> " Subject Identifiers for Security Event Tokens"
>>>>>>>
>>>>>>> and is not:
>>>>>>>
>>>>>>> " Subject Identifiers *Formats* for Security Event Tokens".
>>>>>
>>>>> The draft formalizes the concept of a "Subject Identifier", and 
>>>>> provides a standard way to represent those as structured data. 
>>>>> Therefore I think the current name remains appropriate.
>>>>>
>>>>>>> *1. Granularity of the identification*
>>>>>>>
>>>>>>> A subject identifier may be able to identify an entity either 
>>>>>>> individually or as a member of a group.
>>>>>>>
>>>>>>> In order to be able to make the difference, an *optional*
>>>>>>> *class* attribute should be defined which may take one out of 
>>>>>>> two values:
>>>>>>>
>>>>>>>   * "*ind*" to indicate an individual identifier or
>>>>>>>   * "*grp*" to indicate a group identifier.
>>>>>>>
>>>>> Whether or not a subject is an individual or group is a property 
>>>>> of the subject itself, not (generally speaking) a property of the 
>>>>> subject identifier. Subject identifiers are not general purpose 
>>>>> containers for claims about a subject – we already have JWTs for 
>>>>> that. 😀
>>>>>
>>>>> Certainly there are systems out there that issue identifiers for 
>>>>> individuals (i.e., users) and groups that may collide with one 
>>>>> another (e.g., any system that uses 0-based SQL auto-incrementing 
>>>>> integers for its identifiers). However, I think it is rare for 
>>>>> such identifiers to be used in cases where interoperability is 
>>>>> important, and where the type of the subject is not made clear 
>>>>> from context (e.g., a "GetGroupMembers" API would expect an 
>>>>> identifier for a group, not a user). Further, I suspect any such 
>>>>> system that did need to disambiguate between individuals and 
>>>>> groups within the identifier itself would likely need to 
>>>>> disambiguate between other types of subjects as well, e.g., hosts, 
>>>>> documents, various resources provided by the service, etc. As 
>>>>> such, this proposal would not solve their problem, and they would 
>>>>> be better off defining their own subject identifier format for 
>>>>> their use case.
>>>>>
>>>>>
>>>>>>> *2. Correlation operations that may be either performed or 
>>>>>>> prevented*
>>>>>>>
>>>>>>> Currently, the Privacy Considerations section only addresses one 
>>>>>>> correlation case where a JWT would have both "sub" and "sub_id" 
>>>>>>> JWT claims.
>>>>>>> While it is appropriate to mention such a case, other 
>>>>>>> correlation cases exist.
>>>>>
>>>>> The Privacy Considerations section is intended to address the 
>>>>> correlation risk generally. The JWT case is mentioned only as an 
>>>>> example. Any suggestions on how to make that more clear?
>>>>>
>>>>>>> These cases become visible if the "sub_id" contains an 
>>>>>>> */optional/* subject identifier *type* attribute.
>>>>>>>
>>>>>>> A subject identifier *type* attribute would be able to support 
>>>>>>> four values: *guid*, *shared*, *unique* and *tmp*.
>>>>>
>>>>> I'm not sure I'm following what you're intending to represent with 
>>>>> this, and what problem you're trying to solve. Any subject 
>>>>> identifier transmitted from one party to another is by definition 
>>>>> "shared". Once the recipient receives that identifier, the 
>>>>> transmitter has no programmatic control over how it is used – 
>>>>> that's the realm of /legal/ contracts, not API contracts.
>>>>>
>>>>> A transmitter that wishes to prevent Recipient A from using the 
>>>>> transmitter's subject identifiers to correlate records with 
>>>>> Recipient B may be able to do so by issuing directed identifiers 
>>>>> that are unique per subject+recipient pair. A system that does so 
>>>>> is essentially immune to this correlation risk; it is not clear to 
>>>>> me what value there is in advertising this fact within the subject 
>>>>> identifier.
>>>>>
>>>>>>> *3. Hierarchical group memberships, functional group memberships 
>>>>>>> and roles.*
>>>>>>>
>>>>>>> Examples of group identifiers are: hierarchical group 
>>>>>>> memberships, functional group memberships and roles.
>>>>>>> It would be useful to define one format for these common groups, 
>>>>>>> for both hierarchical (*hgrp*) and functional group memberships 
>>>>>>> (*fgrp*) and roles (*role*): one or more character strings 
>>>>>>> separated by the slash character.
>>>>>
>>>>> Hierarchical groups and functional groups are both subject types, 
>>>>> not subject identifier types. I can imagine something like a 
>>>>> `path` subject identifier format that contains an ordered list of 
>>>>> scalar values describing a path within a graph. That graph could 
>>>>> be a filesystem directory tree, an org chart, a computer network, etc.
>>>>>
>>>>> It might look something like:
>>>>>
>>>>>     {
>>>>>       "format": "path",
>>>>>       "path": ["usr", "local", "bin", "sha512sum"]
>>>>>     }
>>>>>
>>>>>     or
>>>>>
>>>>>     {
>>>>>       "format": "path",
>>>>>       "path": ["Example University", "Faculty", "Computer Science", "Ada Lovelace"]
>>>>>     }
>>>>>
>>>>>
>>>>> Note that the nature of the graph, and whether the subject 
>>>>> identifier identifies the entire path itself or just the final 
>>>>> node, would depend on the context in which the identifier appears.
>>>>>
>>>>> While it is an interesting thought exercise, unless someone has a 
>>>>> use case for this kind of subject identifier format I don't think 
>>>>> it should go in this draft. It can always be defined by someone 
>>>>> later, if needed.
>>>>>
>>>>> —
>>>>> Annabelle Backman (she/her)
>>>>> richanna@amazon.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> On Mar 9, 2022, at 9:31 AM, Denis <denis.ietf@free.fr> wrote:
>>>>>>
>>>>>> Dick,
>>>>>>
>>>>>> As a response to your inquiry, I repost the original email sent 
>>>>>> on 27/05/2021 at 19:42 (Paris local time) to the same recipients.
>>>>>>
>>>>>> Denis
>>>>>>
>>>>>>
>>>>>>> I believe that this document should be enhanced on three aspects 
>>>>>>> that are currently not addressed.
>>>>>>>
>>>>>>> Section 3.1 (Identifier Formats versus Principal Types) states:
>>>>>>>
>>>>>>>     Identifier Formats define how to encode identifying
>>>>>>>     information for a subject. They do not define the type or
>>>>>>>     nature of the subject itself.
>>>>>>>
>>>>>>> While this statement is correct, it should be remembered that 
>>>>>>> the title of this document is:
>>>>>>>
>>>>>>> "Subject Identifiers for Security Event Tokens"
>>>>>>>
>>>>>>> and is not:
>>>>>>>
>>>>>>> "Subject Identifiers *Formats* for Security Event Tokens".
>>>>>>>
>>>>>>> Therefore it would be possible to add */optional/* 
>>>>>>> attributes/characteristics to Subject Identifiers 
>>>>>>> which relate to two different topics:
>>>>>>>
>>>>>>> - the granularity of the identification and
>>>>>>> - the correlation operations that may be either performed or 
>>>>>>> prevented using some values contained in a specific format.
>>>>>>>
>>>>>>> *1. Granularity of the identification*
>>>>>>>
>>>>>>> A subject identifier may be able to identify an entity either 
>>>>>>> individually or as a member of a group.
>>>>>>>
>>>>>>> To distinguish between the two, an */optional/* *class* attribute 
>>>>>>> should be defined, which may take one of two values:
>>>>>>>
>>>>>>>   * "*ind*" to indicate an individual identifier or
>>>>>>>   * "*grp*" to indicate a group identifier.
>>>>>>>
>>>>>>> Examples:
>>>>>>>
>>>>>>> "format": "email",
>>>>>>> "class": "ind",
>>>>>>> "email": "tom.jones@example.com"
>>>>>>>
>>>>>>> "format": "email",
>>>>>>> "class": "grp",
>>>>>>> "email": "marketing@example.com"
>>>>>>>
>>>>>>> *2. Correlation operations that may be either performed or 
>>>>>>> prevented*
>>>>>>>
>>>>>>> Currently, the Privacy Considerations section only addresses one 
>>>>>>> correlation case where a JWT would have both "sub" and "sub_id" 
>>>>>>> JWT claims.
>>>>>>> While it is appropriate to mention such a case, other 
>>>>>>> correlation cases exist.
>>>>>>>
>>>>>>> These cases become visible if the "sub_id" contains 
>>>>>>> an */optional/* subject identifier *type* attribute.
>>>>>>>
>>>>>>> A subject identifier *type* attribute would be able to support 
>>>>>>> four values: *guid*, *shared*, *unique* and *tmp*.
>>>>>>>
>>>>>>> - a *guid* type indicates a globally unique identifier. Such a 
>>>>>>> subject identifier type allows service providers or other servers 
>>>>>>> to link their accounts as soon as they use the same guid.
>>>>>>>
>>>>>>> - an issuer and subject pair (i.e. a subject identifier with 
>>>>>>> the *iss_sub* format) where the subject identifier format would 
>>>>>>> include:
>>>>>>>
>>>>>>>         - a shared identifier type (*shared*) to indicate a subject
>>>>>>>         identifier shared by several service providers (such a
>>>>>>>         subject identifier type allows service providers to link
>>>>>>>         their accounts), or
>>>>>>>
>>>>>>>         - a unique identifier type (*unique*) to indicate a subject
>>>>>>>         identifier specific to a single service provider (such a
>>>>>>>         subject identifier type does not allow service providers
>>>>>>>         or servers to link their accounts), or
>>>>>>>
>>>>>>> - a temporary identifier type (*tmp*) to indicate a subject 
>>>>>>> identifier issued only once by the issuer (sometimes called a 
>>>>>>> session identifier). Such a subject identifier type does not 
>>>>>>> allow any service provider to perform any linkage between 
>>>>>>> accounts, even on a single service provider.
>>>>>>>
>>>>>>> _Examples_:
>>>>>>>
>>>>>>> "format": "opaque",
>>>>>>> "type": "guid",
>>>>>>> "id": "11112222333344445555"
>>>>>>>
>>>>>>> "format": "iss_sub",
>>>>>>> "type": "shared",
>>>>>>> "iss": "http://issuer.example.com/",
>>>>>>> "sub": "145234573"
>>>>>>>
>>>>>>> "format": "iss_sub",
>>>>>>> "type": "unique",
>>>>>>> "iss": "http://issuer.example.com/",
>>>>>>> "sub": "145234573"
>>>>>>>
>>>>>>> "format": "opaque",
>>>>>>> "type": "tmp",
>>>>>>> "id": "11112222333344445555"
>>>>>>>
>>>>>>> *3. Hierarchical group memberships, functional group memberships 
>>>>>>> and roles.*
>>>>>>>
>>>>>>> Examples of group identifiers are: hierarchical group 
>>>>>>> memberships, functional group memberships and roles.
>>>>>>> It would be useful to define one format for these common groups, 
>>>>>>> for both hierarchical (*hgrp*) and functional group memberships 
>>>>>>> (*fgrp*) and roles (*role*): one or more character strings 
>>>>>>> separated by the slash character.
>>>>>>>
>>>>>>> _Examples_:
>>>>>>>
>>>>>>> "format": "hgrp",
>>>>>>> "hgrp": "marketing/customer relationships"
>>>>>>>
>>>>>>> "format": "fgrp",
>>>>>>> "fgrp": "university/science/teacher"
>>>>>>>
>>>>>>> "format": "role",
>>>>>>> "role": "auditor"
>>>>>>>
>>>>>>>
>>>>>>> *4. Two nits:*
>>>>>>>
>>>>>>>     (...) ways. (e.g., a host might be identified by an IP or
>>>>>>>     MAC address,
>>>>>>>     while a user might be identified by an email address)
>>>>>>>     Furthermore,
>>>>>>>
>>>>>>> The period should be moved after the closing 
>>>>>>> parenthesis.
>>>>>>>
>>>>>>>     7.1. Confidentiality and Integrity
>>>>>>>
>>>>>>>     This specification does not define any mechanism for
>>>>>>>     ensuring the
>>>>>>>     confidentiality or integrityi of a Subject Identifier.
>>>>>>>
>>>>>>> Change "integrityi" into "integrity".
>>>>>>>
>>>>>>> Denis
>>>>>>>
>>>>>>>> Thank you Dick and the authors.
>>>>>>>> With my co-chair hat off, I support progressing this document. 
>>>>>>>> I also have a couple comments:
>>>>>>>> 3.2.2: The text refers twice to "alias" subject IDs, but the 
>>>>>>>> format is now named "aliases".
>>>>>>>> Fig. 14 seems to be in conflict with the requirement to have a 
>>>>>>>> single subject for the JWT ("a JWT has one and only one JWT 
>>>>>>>> Subject"). Yes, maybe Elizabeth has a second email address, but 
>>>>>>>> we cannot assume that applications have this kind of logic. 
>>>>>>>> Similarly, the subject-related discussion in Sec. 4.2 (which is 
>>>>>>>> arguably a bit vague) as well as Fig. 18 seems to allow two 
>>>>>>>> different subjects within the JWT.
>>>>>>>> Thanks,
>>>>>>>> Yaron
>>>>>>>>
>>>>>>>> *From:* Dick Hardt <dick.hardt@gmail.com>
>>>>>>>> *Date:* Wednesday, May 26, 2021 at 23:22
>>>>>>>> *To:* SecEvent <id-event@ietf.org>
>>>>>>>> *Cc:* Yaron Sheffer <yaronf.ietf@gmail.com>, Richard Backman, 
>>>>>>>> Annabelle <richanna=40amazon.com@dmarc.ietf.org>, Roman 
>>>>>>>> Danyliw <rdd@cert.org>, Marius 
>>>>>>>> Scurtescu <marius.scurtescu@coinbase.com>
>>>>>>>> *Subject:* Subject Identifiers - Working Group Last Call
>>>>>>>>
>>>>>>>> Hello WG
>>>>>>>> Thanks to Annabelle (and Marius) for the latest update:
>>>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-secevent-subject-identifiers-08
>>>>>>>> Yaron and I would like to make another working group last call 
>>>>>>>> on this draft. We are hopeful there will be enough feedback on 
>>>>>>>> this draft from people that have reviewed it for us to 
>>>>>>>> recommend the draft progressing to the next step.
>>>>>>>> Please review and respond if you are supportive of this draft, 
>>>>>>>> and if you are not supportive, please clarify your concerns.
>>>>>>>> Dick and Yaron
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Id-event mailing list
>>>>>>>> Id-event@ietf.org
>>>>>>>> https://www.ietf.org/mailman/listinfo/id-event
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: [Id-event] Subject Identifiers - Working Group Last Call