Re: [OAUTH-WG] SD-JWT explicit guidance on parsing json strings

Inline (and sorry for repeating points / rambling) :

On Fri, Oct 13, 2023 at 1:25 PM Brian Campbell <bcampbell@pingidentity.com>
wrote:

> That makes sense in principle but is maybe not particularly actionable or
> helpful guidance. The need to do some JSON parsing/processing prior to
> signature verification is kinda inherent to JWS itself.
>

It's not when you have the verification keys, and you are still looking at
base64url encoded strings.
You can try all keys and trade off multiple verification attempts for
exposing the parser to an attacker directly.

But not all protocols work like this, sometimes you need to decode the JWS
to discover the verification keys.

In those cases, parsing is basically unavoidable... but you still have the
choice of "parse as an object" vs "find me a string I need, from another
string"... both approaches can have vulnerable implementations.

Given the existing security considerations that say that discovery is out
of scope,
it's reasonable to assert that implementers SHOULD NOT "parse as json
objects" the protected header or payload, prior to signature verification.

Noting that the JWT RFC contains normative statements in security
considerations that are already targeted at addressing security issues with
JSON: https://datatracker.ietf.org/doc/html/rfc7515#section-10.12

> At a minimum the algorithm is in the header.
>

Sorta, "parsing" can mean many things... it's reasonable to assume the
folks who worked on the original RFC had good reasons for
distinguishing JWS parsers from JSON parsers.

See : https://datatracker.ietf.org/doc/html/rfc7515#section-5.2
<https://datatracker.ietf.org/doc/html/rfc7515#section-5.2>
The specific RFC text does not require or forbid JSON parsing prior to a
signature check... meaning no guidance is provided.

But then later in security considerations, we see:
https://datatracker.ietf.org/doc/html/rfc7515#section-10.6

It does not say that you need to "parse JSON" in order to "match alg", I
can imagine processing the header as a string to achieve this, and
potentially avoiding exposing a generic (and possibly vulnerable) parser
directly to a tampered message.... I can also imagine doing this: const {
alg } = JSON.parse(protectedHeader).

In the case that you have a public key that is already restricted to a
single fully specified algorithm,
I'd argue it's safer to just use the algorithm specified for the public
key,
than it is to decode and parse the header, then match the alg to the public
key,
and then attempt to verify a potentially tampered / maliciously crafted
message.

SD-JWT doesn't give guidance like this on restricting keys, it could say
issuer and holder keys SHOULD be restricted to fully specified alg values,
and still defer key discovery to other specs.

Perhaps we really can't provide better guidance on parsing protected header
and payload as JSON objects, but I'm willing to try : )

> And as you note, a key id or similar might also be in the header and with
> JWT the issuer is in the payload.
>

This is required in some protocols, which to me implies that you can't
mandate a different behavior in SD-JWT, and support those protocols.
I don't think the trade off in breaking behavior here is worth it, but I do
think some security considerations are worth adding, and see below for why.

> In the absence of some out-of-band information or context known, which
> isn't always the case, some processing has to happen in order to identify
> the algorithm and key to use to check the signature.
>

Exactly.

We had a similar discussion on the COSE list a while back about `alg` and
its use in envelopes like  cose sign 1, and keys like cose keys... and how
to build safe protocols, with the existing RFC text.

The RFCs provided a lot of flexibility in this case for COSE, they allow
alg to even be omitted from headers, since both "alg" and "kid" can be
known in other contexts.

It's too late for that in JOSE, but I believe it would have been a better
path.

For better or worse that's how JWS and JWT work (and I know many have
> argued it's for the worse but I want to avoid that rabbit hole for now) and
> SD-JWT builds directly on JWS/JWT so inherits its benefits and flaws.
>

Yep, but in this case, we are inheriting the specific text in
https://datatracker.ietf.org/doc/html/rfc7515#section-5.2

Which does not forbid or require JSON parsing... in other words, it does
not provide clear guidance on if parsing the header as a JSON object is
required... and even if it did... parsing is hard to define as I said above.

If we assume that the section in question requires parsing of the header
(to match alg), then what I am saying is still a concern,
and what we say in security considerations should be informed by what we
have seen in the wild since the RFC was written.

Developers should be encouraged not to parse messages that have not been
verified, when it's possible to do that.

Occam's razor, least privilege, minimize attack surface, etc...

>
> Do you think there is some useful guidance that could be given at the
> SD-JWT layer that is still acknowledging of that reality?
>

Yes, I think the best outcome would be to state the general security
principle, so that implementers can be aware of it.

The principle is: don't use parsers when you don't need to, minimize
the code associated with processing untrusted data,

and treat all unverified messages as RCE payloads that are crafted to
exploit a vulnerable parser if you are paranoid.

It's still true that an RCE in a parser that is only exploitable in a
library after verification is a show stopper, but defense in depth requires
us to address what we can address.

An exploitable RCE in a parser on messages that have not yet been verified
is categorically worse, and when it's possible to avoid this, implementers
should avoid it.

Implementers might not be able to avoid it, especially when supporting
specific protocols, but that doesn't mean we should not warn them to use
caution.

When we say the discovery of issuer keys is out of scope, we punt the
likelihood of this happening to the spec that defines how to discover
issuer keys.

I think it is better we take more responsibility for guidance here.

> The SD-JWT verification steps do strongly suggest/imply that the hash
> check be done before decoding or parsing a Disclosure. Perhaps that could
> be made more explicit?
>

Yep, I think it's important to align the developers' mental model of what
matching hashes means here.

I do think the current text is ok on this front, but it could be more
explicit, and it should be compared to "calling json parse" on an
unverified message
... if it's not forbidden (and even if it is), the developer might decide
to parse and display unverified disclosures, prior to matching the hashes
to the expected values in the payload.

The order of operations here matters:
https://datatracker.ietf.org/doc/html/draft-ietf-oauth-selective-disclosure-jwt-05#name-verification-of-the-sd-jwt

> But that's just one area that JSON parsing happens.
>
>
The text is good... but consider the following:

Imagine you know the issuer has N keys, and you get an SD-JWT from a holder.

You try to verify the JWT with all known issuer keys and the second issuer
key verifies the JWT.

Alternatively you "parse" the protected head as a string, and discover the
alg and kid, and select the correct issuer key (without parsing the header
as an object). (parsing untrusted data)

The issuer's key verifies the JWT. (trust verification check)

Then you parse the JWT claim set, and grab all the hashes.  (parsing
trusted data)

Then you hash all the disclosures and reject if the holder disclosed hashes
is a superset of the issuer committed hashes. (trust verification check)

Then you parse all the disclosures and map them into their correct spots.
(parsing trusted data)

...

Compare this to the following approach:

The "holder" aka the attacker knows the verifier has an unpatched CVE....
for example: https://nvd.nist.gov/vuln/detail/CVE-2022-25845

The verifier (perhaps a government agency, or a non profit, or whatever...)
gets an SD-JWT from the holder and parses the protected header to obtain a
kid.  (parsing untrusted data)

Although the payload remains unparsed, the RCE runs when the protected
header is parsed... and the verifier is compromised... more bad stuff
happens here...

...

A developer who reads security guidance has the opportunity to try to build
a library that protects against this hypothetical attack... even with
our guidance they may not be successful.

It's not a silver bullet, nothing in security is, but it is an attack that
can be mitigated because its known to be a weak point that's been
repeatedly broken:

"Deserialization of Untrusted Data" -
https://cwe.mitre.org/data/definitions/502.html

I'm happy to propose text in a PR if that helps.

OS

>
>
>
>
>
>
>
> On Thu, Oct 12, 2023 at 5:01 PM Orie Steele <orie@transmute.industries>
> wrote:
>
>> hello,
>>
>> A recent thread on the JOSE mailing list reminded me of how parsing JSON
>> strings when not required can lead to scenarios where an attacker can
>> exploit a vulnerable json parser, possibly before verification occurs... as
>> tokens can sometimes be the first untrusted user input processed by
>> systems, this can be very bad.
>>
>> The word "parse" does not occur once in
>> https://datatracker.ietf.org/doc/html/draft-ietf-oauth-selective-disclosure-jwt-05
>>
>> And yet, there are instances of "JSON-encoded", which presumes "JSON
>> decoding", and through implementation experience, I am aware that it is
>> possible to decode and parse SD-JWTs and their disclosures without
>> performing any verification of hashes or digital signatures.
>>
>> I think it would be wise to add a comment to security considerations
>> stating something to the effect of:
>>
>> In general UTF-8 json strings should not be *parsed* until after a
>> signature verification or a hash match has been confirmed, in the case that
>> implementations parse strings that have not yet been authenticated either
>> by public key signature verification or comparison of a hashed value to
>> strings that have been verified by a public key, it's possible that a
>> malicious issuer, or holder might exploit a verifier that relies on a
>> vulnerable parser.
>>
>> There is this section of discovering issuer keys:
>>
>>
>> https://datatracker.ietf.org/doc/html/draft-ietf-oauth-selective-disclosure-jwt-05#name-issuer-signature-key-distri
>>
>> My experience tells me that if the issuer has multiple keys, the verifier
>> is often likely to employ one of these 2 strategies to find the correct key:
>>
>> 1. try all the keys they know for the issuer
>> 2. try a specific key with a specific key id, possibly after parsing the
>> protected header to obtain the key id.
>>
>> In the case that the parsing library is vulnerable, the attacker can
>> craft a protected header that exploits the verifier prior to signature
>> verification.
>>
>> Apologies if this has already been discussed.
>>
>> Regards,
>>
>> OS
>>
>> --
>>
>>
>> ORIE STEELE
>> Chief Technology Officer
>> www.transmute.industries
>>
>> <https://transmute.industries>
>> _______________________________________________
>> OAuth mailing list
>> OAuth@ietf.org
>> https://www.ietf.org/mailman/listinfo/oauth
>>
>
> *CONFIDENTIALITY NOTICE: This email may contain confidential and
> privileged material for the sole use of the intended recipient(s). Any
> review, use, distribution or disclosure by others is strictly prohibited.
> If you have received this communication in error, please notify the sender
> immediately by e-mail and delete the message and any file attachments from
> your computer. Thank you.*

-- 

ORIE STEELE
Chief Technology Officer
www.transmute.industries

<https://transmute.industries>