Re: [GNAP] Review of draft-ietf-gnap-core-protocol-00 (first half)

Justin Richer <jricher@mit.edu> Mon, 19 October 2020 20:42 UTC

From: Justin Richer <jricher@mit.edu>
Date: Mon, 19 Oct 2020 16:42:16 -0400
In-Reply-To: <F41309A5-2443-40F0-B36C-A8A042CBB163@gmail.com>
Cc: GNAP Mailing List <txauth@ietf.org>
To: Yaron Sheffer <yaronf.ietf@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/txauth/PNyXdQKuaGZx--1IfvtLdv5AzTM>

Hi Yaron,

Thanks for the thorough reading. I agree with the other commenters that GitHub issues are a better way to track many of these items. GitHub gives us natural threading and separation of discussions, which can get really unwieldy in a mailing list thread, and you can also explicitly cross-link things. Also, when changes are made to the document in response to a specific issue, the change set can be linked to the issue that caused it. It also lets people engage on the specific topics that they want.

With that in mind, now is the best time for the group to decide on best practices for dispositioning and addressing feedback, and to agree on when and how to bring discussion to the list. We have a lot of work ahead of us, and now is when we need to decide how we’re going to do that work.

In the meantime, I’ve commented on each of these inline below. It’s a lot of writing, and I apologize ahead of time to everyone for that. I also apologize in advance if there are cut-off sentences or things that don’t make sense — there was a lot to discuss here in one go, and I’ve tried not to leave anything out. I’m hoping that the responses below will offer some background and clarity into how we got to what’s there now, so that we can find the right way forward, even (or especially) where things need to change. Absolutely everything in the 00 draft is up for discussion to drive consensus within the working group going forward. And there is a LOT of discussion for this working group still to have, and I look forward to continuing that discussion in a productive manner.

Thanks,
 — Justin

> On Oct 18, 2020, at 3:44 PM, Yaron Sheffer <yaronf.ietf@gmail.com> wrote:
> 
> Hi Justin, all,
>  
> The draft has undergone major changes since we started the design team, so I spent some time reviewing it. I only got as far as Sec. 3, and I fully intend to review the remainder of the document. But I figured since I have so many comments, I might as well send out what I have.
>  
> * General comment: we have dozens, maybe hundreds of variants/options here. I think we need to define a must-implement subset, or we will never reach interoperability.

Completely agree, but please understand that many of these options are presented by the design team as explicit discussion points. In particular, there are MANY things that are in “Editor’s Note” blocks throughout the document, where the design team saw multiple ways to approach something that we believed the wider working group should weigh in on. In other words, many of what are presented as “options” aren’t integrated as options within the protocol itself, but are rather options FOR the protocol, to be considered for integration and decision.

I’ll also note that the back-and-forth “negotiation” nature of the protocol allows the different parties to present options to each other at runtime (or before) and settle on an overlapping set of functions to enable interop. So the RC can ask for a feature, and the AS can reject it because it isn’t allowed for a specific request or because the AS doesn’t support it at all — in either case, the client doesn’t get what it wants. Even so, I agree that we will need to have a notion of mandatory to implement within the protocol in order to facilitate a base level of interoperability, and this is currently lacking in the document because I don’t have a good idea of what the MTI portions really need to be. As you know, a security protocol needs to be especially careful not to mandate implementation of an insecure option for MTI purposes, and that’s a delicate balance the WG is going to need to settle on. Would you suggest an MTI section be added? OIDC has this, with several profiles within that section.

> * Abstract: the boilerplate text (MUST etc.) usually goes to the Introduction, not the Abstract.

Thanks, I was led astray by the template I used when I converted to Markdown from XML. :) We should move it into a proper “Introduction” section, which will encompass what’s now labeled “protocol” in section 1.

> * 1. "The requesting party operating the software" - I think these actions are typically performed by the resource owner, who is not necessarily the requesting party.

You’re right: that is not at all clear from the text as written, and we can fix that.

The funny thing is, this is exactly why we separated the requesting party from the resource owner — they are defined by the actions they take. The requesting party interacts with the client while the resource owner does the approval. You’re right that in many cases the RO and RQ are the same person, as this is the core OAuth 2 use case set — and this is what the text in question was actually trying to point out. 

> * RS, aka API: we are using the word "API" several times in a more generic sense (management API, identity API), so maybe replace with "RS, typically a protected API".

Good catch, this is an artifact from a previous version where non-RS items weren’t described as an API, so it was a safer comparison.

(Even so, things like the token management API and request continuation/management now have parity with RS APIs by using access tokens for protection, so we do need to be more clear/precise about what an “RS” exactly is.)

> * 1.2, "key" is a too generic term in the context of a security protocol. Can we call it something more specific, maybe "holder identity key"?

This isn’t just about a user’s key — this is primarily a key for the requesting client software, but it gets used in a number of ways. Keys can be bound to client instances and to access tokens, and they can be used to identify various parties in the system (including users). 

It was brought up in the design team meetings that we might not even need to define this term, since it’s understood in the wider security space, and so I’m happy to either take it out or expand it as required.

> * 1.3.1. "augmented with a hash of the security information" - this is too implementation-specific, suggest "is cryptographically bound to the security information". In general, step (7) is way too detailed, and verges on the normative.

I’m fine with this suggested rewording. Note though: this is not a generic flow, but this sequence is intended to be specific to use of the “redirect” and “callback (with redirect)” interaction modes in combination with each other, and thus is specific to the functioning of those elements.

> * 1.3.2: it is unclear to me what the goal of the user-code interaction is, in other words what is the threat model. If it's just "second factor, something you have", then an OTP MFA device would be simpler and offer the same security.

This is the equivalent of the OAuth Device Flow: https://tools.ietf.org/html/rfc8628

It’s not about a second factor for authenticating the user, really, it’s about facilitating authorization on a secondary device through the use of a user-typed code to tie the back end and front end together.

> * 1.3.5 (4): Typically the access token would not be sent once it is expired, right? Let's make the example more realistic.

The intent is that time has passed between (2) and (4) to allow the token to expire — not that the token is already expired. I initially had a “successful request” in the middle, but I cut it out as it wasn’t interesting to the topic of refreshing a token. I’ll add it back in for clarity. 

> * 2: "OpenID Connect claims request" - why not "identity claims request" (which just happens to be similar to the OpenID request)?

Because this is specific to the OpenID Connect Claims query language, which has inherent limitations on its use. More comments in response to your note on section 2.8, below.

> * Never liked the "dolphin" here... If this is listed under "actions", it should be a verb, e.g. "swim".

So there’s a (fairly silly) story on why “dolphin” is an example here and has been an example in several RFCs I’ve edited, but it’s more appropriately shared over beers when that’s an option again. :) I agree that “swim” is probably less confusing in this specific context though.

> * 2.1: an array for a single AT vs. an object for multiple ATs? This is extremely non-intuitive and probably ugly to code. Also, why should (multiple) tokens be named if a single token doesn't have to, why not list them in an array instead?

I agree that it looks a little odd at first glance, but I’ve coded this in a few languages now and on the request side (this section) it is really clear what you’re doing. I think it might be helpful to build up from the parts of this section of the request, which will show the history and decision process that got us to the syntax that’s there today. Obviously, this isn’t a statement that it needs to stay this way, but I hope that this will help bring to light the reasoning behind what’s there now so that it can be explained, refined, or updated as needed.

At the lowest level, we’ve got the dimensions for access. OAuth 2 showed us that “scope” was a useful way for a client to indicate what it wants, but we’ve found that API designers have used scope to talk about many different dimensions of access, all rolled into different scope values. We are now being explicit about those dimensions, and trying to make those dimensions orthogonal to each other. There is some small amount of fuzziness and potential overlap of interpretation, and ultimately these are all up to the AS/RS (and the API designer, really) to define:

 - type: “the kind of API I want to call”, like OIDC UserInfo or FHIR Medical Record
 - datatypes: “the kind of data I want to use at the API”, like email addresses or office visits
 - actions: “the kind of actions I want to do on the API”, like read, or write, or make the robot move
 - locations: “the RSs I want to call the API on”, like the URL of the UserInfo Endpoint or a URI for a specific document server cluster
 - identifier: “the entity I want to do these actions on”, like the medical record ID or bank account number

These are all strings or arrays of strings, and they’re meant to be very specific to the requested access to the API. Specific APIs are likely to add their own dimensions as well, which can be pretty much anything so long as the RS can interpret the token that it represents. This is why we have the “type” field, which gives us a way to namespace the dimensions. 

But these dimensions don’t live alone: I’m not going to just ask for “read” or just “medical record”, I probably want to ask for “read the medical record”. With scopes, we were forced to just put them all in a bucket and define awkward semantics for combining them, and then watch things go awry as an AS protected different APIs with different interpretations of, for example, “email” as a scope. So with GNAP, we’ve got a way to combine them into single objects. That lets us create combinations like “read the medical record” by combining the “type” and “actions” fields into a single object. The “pass by reference” feature allows us to send a simple string value that is interpreted as if it were one of the objects. So maybe there’s a “fhir-read” value that means “read all available sections of the fhir medical record of the current user”. You can think of it like a pre-computed object where the client doesn’t have to specify all the details — and in that way it works almost just like a scope value would. 

But what if I want read access to the medical history record but ALSO want update access to the demographic information? We can’t just throw it all onto the same object because there are two very distinct requests, but one access token can reasonably encompass both. So in GNAP we’ve got an array of objects that allows us to combine these multidimensional requests and have them all apply to a single access token. Since any of these objects could potentially be replaced by a string, this array could have a combination of strings or objects, but each item in the array represents the same kind of multi-dimensional request. The “scope” field in OAuth 2 is semantically an array (even though it’s formatted as a space-separated list of space-free ascii strings) for exactly this reason: I should be able to ask for multiple scopes at the same time. 
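
To make that shape concrete, here’s a quick sketch (the type/action/datatype values are invented for illustration, not defined anywhere) of a single access token request that mixes fully specified objects with a by-reference string:

```python
import json

# Hypothetical single-access-token request: an array where each element is
# either one fully realized combination of dimensions (an object) or a
# string reference that the AS expands as if it were such an object.
resources = [
    {
        "type": "fhir-medical-record",      # the kind of API
        "actions": ["read"],                # what we want to do there
        "datatypes": ["medical-history"]    # the kind of data involved
    },
    {
        "type": "fhir-medical-record",
        "actions": ["update"],
        "datatypes": ["demographics"]
    },
    "fhir-read"  # pass by reference: a pre-computed object known to the AS
]

request = {"resources": resources}
print(json.dumps(request, indent=2))
```

The resulting token covers the total of everything granted from that array.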

The resulting access token is for the total of everything in the array (that’s granted — of course any of this can be changed by the AS and RO). So an access token always represents an array of requested access rights; each access right is an object that describes what’s being requested across several dimensions, which are properties of that object in their own syntax. An access token is a combined collection of access rights, so requesting an access token requires the structure of an array. And so far, this is in line (by design) with what’s in OAuth 2 RAR: https://tools.ietf.org/id/draft-ietf-oauth-rar-03.html

And that brings us to multiple access tokens: Multiple access tokens are requested by sending an object which uses labels to mark individual single access token requests, which as we just saw is an array — so it’s an object whose values are all arrays, but specifically arrays that are defined to ask for a single access token.
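
As a sketch (labels and values invented for illustration), the multiple-token form is then just an object whose values each have the single-token array shape:

```python
import json

# Hypothetical multiple-access-token request: each label names one
# single-access-token request, and each value is the array form used
# for a single token.
request = {
    "resources": {
        "read-token": [
            {"type": "photo-api", "actions": ["read"],
             "datatypes": ["metadata", "images"]}
        ],
        "write-token": [
            {"type": "photo-api", "actions": ["write"],
             "datatypes": ["metadata"]}
        ]
    }
}
print(json.dumps(request, indent=2))
```

The response can then hand back one token per label, and the client knows which request each token answers.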

Why put the labels on these requests? Multiple access tokens need to have names (or labels) so that the client can tell *which* of its access requests each token maps to. Otherwise, how can a client know which set of resources it’s supposed to use each token with? The “locations” field (discussed more below) is for identifying sets of resources, and so that field is not likely to be the exact set of URLs that the RC will use the token at. And it’s also not guaranteed that all APIs will use “locations” in a way that’s meaningful to a client — what if the client wants “read” and “write” access at the same API with different tokens? The “read” token could be long-lived but the “write” token short-lived, for security purposes, but the “locations” would be the same. Additionally, the “read” could be limited to GET requests and the “write” limited to POST requests, neither of which is really expressed in “locations”. If the RC is going to ask for different tokens in a single request, it needs to have a way to differentiate the portions of the response from each other.

There are some different syntaxes we could use to make parts of this work, but these choices tend to have other follow-on effects. Here are a few that I’ve cycled through in the design of the current proposal:

We could ask for multiple access tokens by having a “label” field inside the request for a single access token and then sending an array of single access token requests (so an array of arrays), but then you need to answer the question of where in the request to put the label. Should it go in the array like the proposed “token flags”? Should it go inside one of the objects in the array? But if it does, what would it mean for multiple objects in the array to have a “label” value? Additionally, we’d need to detect at the top level whether we had an array of arrays or an array of objects and strings — doable, but more prone to weird error cases. (What if an RC sends an array with one array and one object? We need to detect that now, and might end up down the wrong code path before we do, with a naive parser anyway.)

We could simply require all tokens to be labeled at all times, but I would argue that the overwhelmingly common use case of a single access token doesn’t need that kind of overhead because there’s only one set of requested items that the RC is asking for. 

We could define single access tokens as an object, with multiple aspects. This would potentially let us hang additional fields off of the access token request alongside the resource list, but it would also mean that we’d have a degenerate case where you’re defining an object with one member, whose value is always an array. In other words, this is exactly what the JSON Web Key Set format does, which I’ve always found clunky. The top level object doesn’t add anything in the JWKS case, and I’m not seeing how it would add anything here. On the other hand, this would let us use an object for a single access token and potentially an array for multiple access tokens, but at some cost to the complexity at the layers below that.

None of those ideas leads to a better syntax across the different use cases, or optimizes for simplicity in the common ones.

All that is to say that I currently think that what we have is the simplest and cleanest approach that I’ve seen so far, and I would really love to see if there are any alternatives that offer this same kind of power and clarity when exercised the same. I would love to see some fresh ideas on different ways to approach this.

> * 2.1.1, "locations": are these really URIs or just URLs? On the other hand, if these are URIs, why do we need a separate "identifier"? There's too much semantic overlap between "type", "location" and "identifier" - specifying just "type" may be sufficient.

In my implementation, “locations” are actually just strings that the AS and RS can dereference to figure out the intended RS target. In practice I think these will be URIs that identify the RS (or really, the APIs hosted at each RS) in a way that the RC can communicate to the AS and back. It’s not meant to be an exhaustive list of URLs that the RC will use the token at, since an “RS” could have a nearly infinite set of URLs that it protects, depending on the nature of the API. GNAP shouldn’t take a stance on the structure of the APIs it protects, much like OAuth didn’t.

“identifier" is for an object identifier of some type within the API — so while “location” is more to identify the RS itself, “identifier” is to indicate the item at the RS which might not be exposed through any kind of URL. 

“type” tells you what’s in the rest of the object. The semantics are shared with OAuth RAR: https://tools.ietf.org/id/draft-ietf-oauth-rar-03.html Though, as mentioned in the note, we can’t cleanly normatively reference RAR directly, as RAR needs to exist in an OAuth 2 world, which has “scope” and “resource” parameters, as well as some extensions/profiles that use an “aud” parameter and other items. We don’t have those restrictions in GNAP, and so the semantics are slightly different here. In an ideal world, we’d define these in GNAP and RAR would be a clean back-port of this function to OAuth 2.

For a concrete strawman example using OpenID Connect’s userinfo endpoint with the subject ID 248289761001, we might have something like this:

{
   "type": "http://openid.net/specs/connect/1.0/userinfo",
   "locations": [ "https://server.example.com/userinfo" ],
   "identifier": "248289761001"
}

A more realistic example would be asking for a specific bank account (identifier) at a specific bank (locations) for a specific accounting API (type). 

Any advice to clarify this intent would be greatly appreciated!

> * 2.1.1: it's implied but should be spelled out that the semantics of this example is the cross product of permissions: I am requesting to read and write and "dolphin" both "metadata" and "images", and that applies to both "locations" - 12 different possible actions.

OK, that can be expounded upon here, as that is the intent. Each object is meant to encompass one fully realized combination of options, which is why they’re presented in an array overall (as talked about above).
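
A quick sketch of that cross-product semantics (the values here echo the example in 2.1.1: three actions, two datatypes, two locations):

```python
from itertools import product

# The dimensions multiply out: every action applies to every datatype
# at every location within the same object.
actions   = ["read", "write", "dolphin"]
datatypes = ["metadata", "images"]
locations = ["https://server.example.net/", "https://resource.local/other"]

combos = list(product(actions, datatypes, locations))
print(len(combos))  # 3 * 2 * 2 = 12 distinct permissions
```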

> * 2.1.2: I don't see the value of the notion of "expansion". It is sufficient that the string is understood by the AS.

While I personally agree with your stance, this was from a discussion on the list about the nature of the string value that we wanted to capture here. 

> * 2.1.2, "in some situations the value is intended to be seen and understood be the RC developer" - shouldn't we require this value to be always fully documented by the AS owner, so that the RC knows what it is requesting?

For cases when the developer sees it and configures it into the RC code, I agree. But there are cases (like section 10.3) where things happen more dynamically that are also enabled by this functionality. 

> * 2.1.4: why are flags represented as resources? The answer, "this is how OAuth 2 does it" is not great.

The idea is that these are items that alter the access token being requested by the “resources” section. It is a sensible way to hang things together, since everything in the array ties to a specific access token. This was a proposed design compromise, and if these flags can fit elsewhere then we can do that.

> * "the each access tokens" -> each access token.

Thanks, got it!

> * I suggest to extend the editor's note to clarify what is the use case, so that the group can decide whether split tokens are indeed necessary.

I think we should have a more detailed conversation about this potential feature than an editor’s note could reasonably get into. There are a number of large items (splitting access tokens, directing access tokens to specific portions of an API or RS, among others) that I think will be big conversations for the WG, and all the context for these doesn’t necessarily need to be documented inside the notes in the document itself.

But for discussion and context, this feature was proposed to deal with cases where the AS might decide to issue several access tokens in any case where the client requested one, because the AS wants to have restricted audiences for individual access tokens but the client doesn’t know about the audiences ahead of time. I agree that it’s a compelling use case, and it puts security of the protected domains in the hands of the AS where it belongs, but it would require a client to be smart enough to know what to do with these tokens downstream, and if OAuth has taught us anything it’s that we can’t rely on client software to be smart about most anything. :)

> * Why is key binding of the access token even optional?

Because people still want to use bearer tokens as an option. I think we can make bound tokens first class citizens in GNAP, and that’s a large reason behind having consistent key binding approaches instead of having them separated out (section 8). I think we can even make the “default” having the token bound to the RC’s presented key (and therefore make “unbound token” a flag instead), but I don’t think there would be support in the community for removing bearer tokens entirely.

> * 2.2, "Subject identifiers requested by the RC serve only to identify the RO in the context of the AS and can't be used as communication channels by the RC" - this sounds a bit naive, people will do that anyway. So why is this a useful statement? (And similarly in 3.4)

This inclusion is based roughly off of a similar constraint in OIDC around subject identifiers: https://openid.net/specs/openid-connect-core-1_0.html#ClaimStability

I agree that people will do dumb things anyway, but that’s why we need to spell out here why it’s dumb.

> * 2.3: what's the benefit of including a URI (URL really) in "display"? Do we really want the user (RO) to click links provided by the RC?

I think we want to give them the option of doing so, yes. It’s up to the AS to help protect the RO, and that will need to be written up in the security considerations. 

This is analogous to “client_uri” in OAuth Dynamic Registration: https://tools.ietf.org/html/rfc7591#section-2

> * 2.3: why are we sending a JWK *as well as* a cert? Are we checking that the cert contains the same public key as the cert?

I would be ok with the different formats being mutually exclusive somehow. The AS would need to check that they’re the same and throw an error if not. But this would also let the RC send over whatever it’s got for its own keys every time.

> * 2.3, " the AS MUST ensure that the key used to sign the request... is associated with the instance identifier" - can we be much more specific here, e.g. the instance_id MUST be byte-identical to the Common Name of the cert? This "association" is never clarified, either here or in Sec. 8.

No, I don’t think we should be so prescriptive — for one, it won’t always be a certificate or a JWK (where you might use the Key ID). And for another, you might want to rotate the key without changing the instance identifier. And finally, I don’t think the CN would even be enough since there will be plenty of cases where you’d be accepting client-supplied keys. 

The “instance_id” is an AS-assigned value that it can use to look up information about the instance of RC software making the call. Part of that includes whatever key information is associated with that instance. The “association” for this kind of thing is up to the AS, and in OAuth we called this “static client registration” or “dynamic client registration”, depending on how it happened. Here, we’re taking a step back from saying that “a client must be registered” and instead simply saying that “if you see an instance_id, you need to know what key goes along with that and prove that that key is used”. For many clients, it’s the same net effect as having a client_id and registering their public key against that client_id in OAuth 2, but for large swaths of clients that currently fall outside of the OAuth world, this is a more flexible overall model.
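
A minimal sketch of that lookup, with all names invented (this is AS-internal bookkeeping, not something the protocol specifies):

```python
# The AS keeps a record per instance_id; on each request it checks that
# the key proving possession is the one associated with that instance.
instances = {
    "client-instance-77": {"key_thumbprint": "abc123"},
}

def key_matches_instance(instance_id: str, presented_thumbprint: str) -> bool:
    record = instances.get(instance_id)
    if record is None:
        # Unknown instance: treat as a new, as-yet-unregistered client.
        return False
    return record["key_thumbprint"] == presented_thumbprint

print(key_matches_instance("client-instance-77", "abc123"))  # True
```

How the association got there in the first place (static registration, dynamic registration, or something else) is entirely up to the AS.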

I’d like to understand more about what you’re seeing here with regard to the keys, though, especially since this section was restructured several times during the design team discussions and it could be more consistent and clear. 

> * 2.3.2: allowing to send the key in multiple formats virtually invites security vulnerabilities. Why is this a good thing?

See above, I think this is something we need to talk about more in the group, and I’m very open to having better ways to present this information. I didn’t see a harm in having the same key in multiple formats, and JWK itself already allows that with the “x5c” field and related.

> * 2.3.2: cert#256 is not a key, it's an identifier. Why do we include it here? Note that in Sec. 8.3 you are using the full certificate rather than the fingerprint.

It’s more than an identifier, since an identifier could be an arbitrary value — this is cryptographically derived from a specific certificate. If you’re doing MTLS, the client is still sending the full certificate in the TLS layer, but this would be a way to call out that certificate in a way that’s tied to the request, without replicating it entirely within the request body. So it’s a way to send the certificate in a more compact fashion that you can verify, more than it is a way to identify a certificate whose value you’d need to go look up.
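
As a sketch of what “cryptographically derived” means here, assuming the value is the unpadded base64url-encoded SHA-256 digest of the DER-encoded certificate (the usual x5t#S256-style construction; the exact encoding is whatever the spec ends up pinning down):

```python
import base64
import hashlib

def cert_s256_thumbprint(der_bytes: bytes) -> str:
    # SHA-256 over the DER certificate, base64url-encoded without padding.
    digest = hashlib.sha256(der_bytes).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

# Placeholder bytes standing in for a real DER-encoded certificate:
print(cert_s256_thumbprint(b"example-der-certificate-bytes"))
```

The AS recomputes the same digest over the certificate it saw in the TLS layer and compares, rather than looking a stored value up.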

> * 2.3.2: why are we using raw PEM certificates instead of the JOSE representation?

The idea was to make it dead simple for developers. Not everything runs on JOSE, but JWK is there if you want to use it (including JWK certificate representations). 

> * 2.3.3: For obvious security reasons, I suggest adding: The AS MAY choose to avoid displaying an arbitrary URI. RC developers must only assume that the "name" field will be displayed.

I would actually go farther in the security considerations and say that not even the name is necessarily trusted for display. An RC could claim “Google Docs” as its display name, and without other information being displayed (like callback URIs or home pages), the RO’s got nothing to go on to figure out if it’s really Google Docs or not. 

> * 2.3.4, "created just for this request" -> created just for this series of requests.

Got it, thanks. 

> * 2.3.4: I suggest referring to RCs that present unknown keys as "anonymous RC", and then in 2.4, say that the AS MUST NOT accept any assertions from an anonymous RC.

I disagree with this — there are a lot of cases out there where the RC isn’t “anonymous” at all but is still presenting a key that the AS doesn’t know. And in fact, the RC might be presenting some assertions as to its own identity that the AS can verify.

> * 2.4: IMO assertion validation is a MUST, and we should specify what we require for every assertion type we allow.

I hear that point, but I’m just not sure we can or even should get into that level of detail. 

> * 2.4: moreover, when possible, the AS MUST match the assertions to the presented sub_id. It may be a hint, but you're not allowed to lie.

The reason that they’re a hint is that the client doesn’t really know who the user is yet. So I agree the AS needs to process them, but that doesn’t mean it has to trust them. 

> * 2.4, "If the identified RQ does not match the RO present at the AS" - I thought there are cases where the RQ is different from the RO. Also, I am wondering why this is a SHOULD, not a MUST.

The SHOULD is exactly for that use case where they aren’t the same person. :) The idea is that the AS would at least be able to detect that the user that the RC thinks is there is different from the one the AS sees. This might be fine for the AS for multi-person delegation use cases, and the AS can prompt the RO during authorization if that’s the case.

I can see that this needs to be a lot clearer, and we can work on that text.

> * 2.4.1 (but probably applicable to any references): what is the lifetime of the reference? As an RC, how long can I assume the user reference is still valid?

That’s up to the AS ultimately. Previous editions of the XYZ Protocol had lifetimes and other things tied to the reference handles (defined in section 3.5), but it didn’t seem like anyone actually cared about those parts in practice. A reference is either going to work, or it won’t, and it could fail for reasons that have nothing to do with a timeout that the RC can’t control or have input into.

> * 2.5: "preferred_locales" is obviously not a mode of interaction. Why is it bundled here?

It controls an aspect of how interaction could happen, which is what all of the fields in this section do. 

Even though it might look similar, these interaction modes aren’t setting up things like the OAuth 2 “grant type” where the combinations of parts are pre-defined and you’re signaling which combination you’re after right at the start. Instead, the RC is saying what it can handle throughout the interaction process. Some of these are things that get the user over to the AS (redirect, user_code, app), some are for getting the user back after interaction (callback), and some are about how interaction can be presented before the RO is known (preferred_locales). All of these are aspects of the interaction process, and it’s the combination of all of these together that says what the client can do. In this model, each of these elements is independently defined, but they are all taken together as a single set of parameters that can control the means of interaction with the user. So in one way of looking at it, you’re creating the “interaction mode” by defining all of the aspects of that mode at once, including any potential parallel alternatives (i.e. redirect and app together).

The discussion on bundles of interaction elements in the editor’s comment at the end of 2.5.3 provides an alternative model that would allow the RC to define multiple sets of these interactions, so that you could say, for instance, “I can get you to an arbitrary URL or type a user code but I can’t do a callback on those — OR I could get you to an arbitrary URL and also do a callback on that one”. This kind of thing would let the AS decide to give out two different interaction URLs with different expectations on what to do afterward, as opposed to right now where there is a single expectation of what to do afterward no matter which URL the user followed to get there. 
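As a sketch of what that alternative bundled model could look like on the wire (the field names follow the draft’s style, but this array-of-bundles shape is hypothetical, not something the current draft defines):

```json
{
  "interact": [
    {
      "redirect": true,
      "user_code": true
    },
    {
      "redirect": true,
      "callback": {
        "uri": "https://client.example.net/return/123455",
        "nonce": "LKLTI25DK82FX4T4QFZC"
      }
    }
  ]
}
```

Here the first element says “redirect or user code, but no callback afterward,” and the second says “redirect, with a callback afterward” — two distinct sets of expectations the AS could serve with two different interaction URLs.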

I’ll also note that we removed the term “capability” from this section as members of the design team found it confusing as used here since it’s also used elsewhere in the document. I replaced it with “mode”, but that might have the wrong connotations. I’m happy for suggestions of a better name than “mode”, but I’m starting to wonder if everything should just be called “a thing” everywhere. :)

> * 2.5: the way "callback" is described here is as a capability, not an interaction mode. The example only strengthens this view, "callback" is a modifier of the "redirect" mode.

The “callback” field defines “what to do after interaction is done”. Therefore, “callback” does not just modify “redirect”, it also potentially modifies any other method of getting the user in front of the AS. When the AS figures out that it’s done with the RO, it follows whatever steps the “callback” defines, regardless of how the user got there. So for example, you could have the user present a user code, and then follow a callback to push information back into the RC. Or you could have some kind of in-app interaction and still have a callback.

Inversely, you could have a redirect :without: a callback, like if the user scans a QR code on a secondary device but doesn’t come back. 
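To illustrate that independence concretely (field names per the draft’s style; values are made up), a request pairing “user_code” with “callback” and no redirect at all would be perfectly sensible:

```json
{
  "interact": {
    "user_code": true,
    "callback": {
      "uri": "https://client.example.net/return/123455",
      "nonce": "LKLTI25DK82FX4T4QFZC"
    }
  }
}
```

And a request with “redirect” alone, omitting “callback”, covers the QR-code-on-a-secondary-device case where the user never comes back to the RC in the front channel.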

> * 2.5: "use code" -> user code.

Got it, thanks.

> * 2.5.1: this unnecessary polymorphism IMO just complicates implementations and prevents extensibility. Instead, I would suggest:
> redirect: {}
> redirect:{max_length: 255}

I find the pattern of sending an empty object problematic, and we need to ask if there are other parameters that would really be sent here. 

> redirect:{max_length: 255, callback: {...}}

Again, the “callback” could be applied to multiple outgoing options, but this would be an alternative to including it as noted in the editor’s note in 2.5.3 about re-using the “callback” field. 

> * 2.5.2: the "app" option is problematic, as you already note. The RC may not even know if a given URL will result in an application launching.

It depends on the nature of the app — it could be a single URL like it is in many mobile platforms today, but it could also be a bundle of other information that can be used to launch the app. I don’t think we’ll be doing any good by assuming things will continue to be single URLs.

> * 2.5.2: and shouldn't the RC include a list of supported applications? (Which would have lovely privacy implications.)

I think that’s a reasonable dimension and one of the reasons “app” launch should be separate from the “normal” redirect. I brought this up with some implementors in the financial sector a while ago, and with that group at least there wasn’t a lot of desire for that kind of functionality. 

I agree that there are some significant privacy implications driving this as well. 

> * 2.5.3: there's a parenthetical discussion on whether the URL is unique. But the nonce parameter implies that the "hash" parameter is unique per request, making this discussion moot.

I agree with you, if we have a “nonce” then you don’t necessarily need a unique URL, at least not for security purposes. The editor’s note is to point out the inverse that came up in design team discussions, that the “nonce” is not needed to protect against some of the attacks if you require a unique URL. I would like to see some formal analysis against this feature set as we build this out. 
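To make the “nonce makes the hash unique per request” point concrete, here is a sketch of the kind of nonce-derived hash under discussion. The exact construction (hash algorithm, concatenation order, encoding) is an assumption here — whatever the draft ultimately specifies governs — but the mechanism is the same: the RC’s fresh nonce feeds the hash, so the hash differs per request even over a shared interaction URL.

```python
import base64
import hashlib

def interaction_hash(client_nonce: str, as_nonce: str, interact_ref: str) -> str:
    """Illustrative interaction hash: assumed to be SHA-512 over the
    newline-joined client nonce, AS nonce, and interaction reference,
    base64url-encoded without padding. The draft's actual hash method
    and input ordering may differ."""
    payload = "\n".join([client_nonce, as_nonce, interact_ref]).encode("ascii")
    digest = hashlib.sha512(payload).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

Because the client nonce is fresh per request, two requests through the same interaction URL still produce different hashes, which is why the unique-URL requirement becomes redundant for that particular protection.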

> * 2.5.3: even if we need a different kind of post-interaction "redirect", where the RC is not involved, there must be a way to ensure that the RC receives positive (cryptographically protected) confirmation that the authorization was successful. In other words, what we're discussing here is not an RC interaction mode, it is something else, and should probably have a separate protocol element to describe it.

I’m on the fence about it being something outside the interaction element, since it’s a kind of post-interaction method just like “callback” already defines. I don’t think it makes sense to allow a client to ask for more than one of these at a time, and it could lead to some weird race conditions (do I poll if I might get a callback?) and potential security issues (I polled but then got a callback, is that an injection attack?). 

> * 2.5.3: In the example for "interact" as an array (which I support), the second "redirect" and "user_code" should be independent elements of the array, not members of an object. This assumes they are truly independent interaction modes.

No, they are bundled together as the same element in the array on purpose. This is the client saying “I can send the user to a short URI or I can display a user code, but I can’t get a callback”. This is equivalent to the OAuth 2 device mode of displaying a QR code for the user to scan and giving them a code to type into a static URL. The QR code is already an arbitrary URL and so it can use the same mechanism as “redirect”. To do what you’re asking, you’d want a three-element array. 
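A hypothetical sketch of the bundled form (values illustrative; this array syntax is the editor’s-note proposal, not current draft syntax):

```json
{
  "interact": [
    {
      "redirect": true,
      "callback": {
        "uri": "https://client.example.net/return/123455",
        "nonce": "LKLTI25DK82FX4T4QFZC"
      }
    },
    {
      "redirect": true,
      "user_code": true
    }
  ]
}
```

The fully independent reading you describe would instead split “redirect”, “user_code”, and “callback” into three separate array elements — a different statement of the client’s capabilities.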

> * 2.5.3.1: what does "RQ is present on the request" mean?

This is most clear for web-server-based RCs: the session that was used to send the user outbound needs to be the same one that is used when the return comes in. It makes less sense with other interaction methods and other client types, but wherever possible the RC needs to make sure that the user who started the request is “still there” when the response comes back in the front channel. 

Is there a better way to say this without making it overly prescriptive?

> * 2.5.3.2: typo: HTTP the request.

Got it, thanks!

> * 2.7: this seems to assume a short access token (which implies a bearer token). What if we want to use longer self-contained access tokens containing full keys? Should we have a separate "grant_id" value instead?

I don’t see how this implies a short access token, can you expand on what you mean by that? The access token value will always be a string, opaque to the client, so if it’s self contained and has its own keys wrapped inside of it in some way, that’s fine and would still be enough for the AS to figure out what it means. 

There previously was the equivalent of a “grant_id” for exactly this purpose — it was the “transaction handle” in previous revisions, and it was also used for continuation and a number of other functions. That feature has been phased out in favor of using either unique continuation URIs, access tokens, or a combination of both for managing the ongoing grant request process.  Having an explicit identifier would be helpful for this specific use case, but it’s not needed for the more common use cases anymore. Would it be worth bringing it back in?

> * 2.8: another problem with this stanza is that the RC needs to know a priori that the AS supports it. What happens if the AS doesn't?

If the AS doesn’t support this, then it gets ignored and has no effect on the resulting response. At least, that’s how I think it should go — we could also allow the AS to return an error for fields it doesn’t understand, but that’s something that needs to be consistent across the protocol in my opinion.

> * 2.8: I suggest to move this section into a separate I-D, or at least a separate appendix.

As stated in the last note in the section, I agree, and in fact don’t think we should be defining any use of OIDC within the GNAP working group. I think a good exercise would be to build out what an actual OIDC style protocol would be on top of GNAP, and it’s something I’d like to see. The design team included this section as a way to show how this type of externally-defined query language could be incorporated, with all of its limitations and assumptions that come with it. 

> * 2.9: it makes sense to bundle extensions and the "capabilities" section into one.

While I see the point, the “capabilities” field is really meant to signal specific extensions. It’s really loosely defined right now though, and I think that we need a lot more concrete applications of the mechanism if we’re going to keep it.

> * 2.9: The AS MUST ignore any unknown extension.

I’m OK with this stance, but it needs to be consistent throughout the protocol, not just at this level. 

> * 3: it's confusing that we're using access_token for two different things. Maybe we could change the RC/AS one to "as_access_token"?

I disagree — I think we’re using them for the same kind of thing. They’re both access tokens, and they’re used by the client in exactly the same way. The resource being accessed is part of the AS in one case and the RS in the other, but you use the same kind of artifact in both cases to do the same kind of thing. I think there’s a lot of power in re-using components like this instead of inventing something special here.

> * 3.1: MAY vary *by* request.

Got it, thanks!

> * 3.1: how is the client behavior "deterministic in all cases" - how does the RC know whether the AS allows continuation or only allows one request at a time?

If the AS does not include a “continue” field in any of its replies, then the RC can’t continue it again. 
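For instance, a response that permits continuation would carry something like this (shape approximating the draft’s examples; values are illustrative):

```json
{
  "continue": {
    "uri": "https://server.example.com/continue",
    "access_token": {
      "value": "80UPRY5NM33OMUKMKSKU",
      "key": true
    },
    "wait": 30
  }
}
```

A response without a “continue” field is the AS’s deterministic signal that this grant request cannot be continued further.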

> * 3.2.1: the "value" of the access token is opaque in some cases (bearer token) but not in others.

In what cases would it not be opaque to the client? In all the current binding methods, this value would still be opaque and the RC still has to send the token value.

> * 3.2.1: The ASCII restriction on "value" is insufficient. Needs to be printable ASCII, 0x20<v<=0x7f. And I'm not sure if this doesn't leave some characters that still need to be escaped.

That’s fair, and it’s currently a SHOULD anyway so I’m fine with making it more restrictive in that case. OAuth 2 currently has the range “%x20-7E”, but I’m not even sure how strictly that’s enforced in the wild.
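If we do tighten it, the check is trivial to express. A sketch following OAuth 2’s %x20-7E range (printable ASCII including space, excluding DEL); the constant name is just illustrative:

```python
import re

# Printable ASCII per OAuth 2's access-token-value range %x20-7E:
# space (0x20) through tilde (0x7E), excluding control chars and DEL (0x7F).
TOKEN_VALUE = re.compile(r"^[\x20-\x7E]+$")

def is_valid_token_value(value: str) -> bool:
    """Return True if the string falls entirely within the allowed range."""
    return TOKEN_VALUE.fullmatch(value) is not None
```

Whether characters like `"` or `\` then need escaping is a question for the serialization layer (JSON handles both), not the token alphabet itself.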

> * 3.2.1: why is "expires_in" optional? Typically access tokens are not revoked, and the RC needs a way to manage their lifetime.

Access tokens are revoked outside of the RC’s control in a lot of different ways, not just timeouts. And there are non-expiring access tokens out there as well. It’s possible, though in my experience exceedingly rare, for an RC to preemptively get a new access token before expiry.

> * 3.2.1: "resources" refers to this same section. Did you mean 2.1.2, or are we missing a subsection describing resources/rights?

Good catch, this was in fact meant to point to 2.1.1 (the anchors in markdown are similarly named since this is the corresponding response to that request section).

> * 3.2.1: "key": another polymorphism alert!

Yes, this is intentional use of polymorphism to allow one field to represent related things in different ways instead of having multiple mutually-exclusive fields saying the same thing. Right now it’s got four possible values, and each of them answers the question “which key is this token bound to”:

- Boolean “true”: use the RC’s presented key (ie, “the key you sent” — the client doesn’t need any additional identifying information to determine which key or method)
- Boolean “false”: bearer token (ie, “no key”)
- Object: A specific key, not the one the client sent, either a public key for something the RC knows or possibly a full key pair (with private key) or a symmetric key provided by the AS; I think we need more on this use case
- String: A specific key, not the one the client sent, indicated by this reference that the RC can figure out

An RC looking at this field is always looking to answer the question of how the token is bound, and this allows us to answer that definitively in a single space.
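Sketched side by side (the object and reference values here are illustrative; the JWK values are taken from a published RFC example key, and the exact object structure is still under discussion):

```json
{
  "key_as_presented": true,
  "key_as_bearer": false,
  "key_as_object": {
    "proof": "jwsd",
    "jwk": {
      "kty": "EC",
      "crv": "P-256",
      "x": "f83OJ3D2xF1Bg8vub9tLe1gHMzV76e8Tus9uPHvRVEU",
      "y": "x_FEzRu9m36HLN_tue659LNpXW6pCyStikYjKIWI5a0"
    }
  },
  "key_as_reference": "rs-specific-key-1"
}
```

In the actual response only one of these forms appears, directly as the value of the single “key” field; the wrapper names above exist only to show the four shapes in one place.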

> * 3.2.1: the example at the bottom of the section talks about a way to present the token, I believe this discussion is out of context here.

I’m not sure which example you are referring to here, since I don’t see anything about a way to present a token. If you mean the editor’s note at the bottom about directed access tokens, this is relevant here because the information needed to direct the RC where it should use the tokens would be included in this response.

> * 3.3: the AS knows now which interaction modes are available. Why not pick just one for its response, to simplify the protocol? As one example, the AS can then more easily decide on timeout behavior.

The AS can do exactly that with the existing construction — it can respond to only the parts that it wants to do for that specific request. 

For example, if the RC doesn’t include the “callback” in its request but the AS decides, for security reasons, that it won’t serve any requests without a way to return control directly to the RC post-interaction, then the AS can reject a request that doesn’t have that. 

For another example, if an AS decides that a specific resource can only be authorized through a user code interaction (maybe it’s for a resource/client format that the AS knows to be a limited device and so the AS has a specific usage pattern it will allow), then the AS can respond only to the “user_code” portion of the client’s request, even if the client includes a “callback” and “redirect” parameter as well. 

Combining these interaction modes together is key to moving GNAP beyond what OAuth 2’s extremely limited “grant_type” model allows.
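For the second example, the AS’s response would carry only the user-code portion, something like this (shape approximating the draft’s examples; values are illustrative):

```json
{
  "interact": {
    "user_code": {
      "code": "A1BC-3DFF",
      "url": "https://server.example.com/device"
    }
  }
}
```

The RC asked for “callback” and “redirect” too, but their absence from the response tells it those modes aren’t on offer for this grant.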

> * 3.3.3: Suggest to add to the last paragraph: If the RC does not receive a callback until an RC-determined timeout occurs, the RC MUST act as if the entire request failed.

I’m fine with adding that kind of guidance. It gets fuzzy in terms of security considerations though because “an RC-determined timeout” without other guidance is so loose as to be undefined.

> * 3.4: "updated_at": Unix time is nice, but for absolute time it's common to use ISO 8601 format.

I stole this field from OIDC, but I’m fine with any format here, so if there’s no outcry we should change it in the next revision. 

> * 3.6: YES, the "error" element is exclusive!

I think it should be, but I’d like to see if there’s any kind of counter argument. For example, returning an error BUT also allowing continuation to address the error. Is that allowed? If so we’d need to return the “continue” field as well, which is already defined. But if “error” means “this is final, this request is done now and forever”, then yes it should be completely exclusive of all others. 

> * 3.6: in SecEvent we tried to use RFC 7807 "problem details", I think it could work nicely here.

I’m unfamiliar with the details of that spec, but after a quick skim I think that’s got some possibility. It seems to incorporate a lot of information from the HTTP message in the response body though which isn’t really that applicable.

> * 10.4: "indicating that GNAP" - sentence is broken.

Thanks, that was meant to say “indicating that GNAP needs to be used to access the resource”, fixed.

> * This "speculative" access is not useful if the response is only a MAY. Why would the RC try it if it is unlikely to provide useful information?

I think the MAY is not quite the right normative application here. It’s a MUST that, if you’re doing this at all, you do it in this way, but it’s not a MUST for all RSs everywhere to respond in this fashion. If the RC tries calling an RS that doesn’t respond with this, the RC simply has no programmatic or automated way to know what to do to make the request actually work.

> * 15: the BCP195 ref is broken - the author names are missing (yes I am one of them...).

Yes, all the references need to be cleaned up! RFCs and IDs get picked up automatically by the tools, but other documents (including, it seems, BCPs) don’t have that luxury. I focused on getting a basic reference in place so that it would compile, and now I need to go back and make it more correct. No offense intended to any of the BCP or other document authors. :)

> * Appendix D: can we name the first scenario, maybe "Simple API Access"?

I don’t think that is an accurate name for what’s being shown here, and especially for what’s being highlighted. We are specifically showing how to get API access through the use of fully redirect-based interaction — redirect the user there, and have them redirected back. It’s analogous to the OAuth 2 Authorization Code Grant Type. It’s very specific to that aspect, and not really specific to “API access”. In fact, nothing in the scenario really needs API access for it to work — the RC could be asking for subject information instead and the rest of the example would remain the same. This is the same scenario as discussed and diagrammed in 1.3.1, but with more extensive examples of protocol messages in use. 

 — Justin

> -- 
> TXAuth mailing list
> TXAuth@ietf.org
> https://www.ietf.org/mailman/listinfo/txauth