Re: [OAUTH-WG] SD-JWT explicit guidance on parsing json strings

Orie Steele <orie@transmute.industries> Sat, 14 October 2023 00:57 UTC

MIME-Version: 1.0
References: <CAN8C-_JcYztaVzRBZbiSs7-cXcf1K4AHK9d_Sd47QCaua9oi_A@mail.gmail.com> <CA+k3eCSb3xRrmKaPE4Np4VwdRiZK9DQ9hh8Q8HbNj4hhGxuJbQ@mail.gmail.com> <CAN8C-_Ke-Qo9QBds18OG3810AMomQLAm5t_Ve9asynEnAjQumA@mail.gmail.com> <34E969D2-0F66-46EF-95CB-ABCF5DCA77BE@alkaline-solutions.com>
In-Reply-To: <34E969D2-0F66-46EF-95CB-ABCF5DCA77BE@alkaline-solutions.com>
From: Orie Steele <orie@transmute.industries>
Date: Fri, 13 Oct 2023 19:57:27 -0500
Message-ID: <CAN8C-_KBVVvz=dmeZKJW=6Wk6mG+x4Rbk5t6d4EVRSXV-4hw7A@mail.gmail.com>
To: David Waite <david@alkaline-solutions.com>
Cc: Brian Campbell <bcampbell@pingidentity.com>, oauth <oauth@ietf.org>
Content-Type: multipart/alternative; boundary="000000000000cfed9a0607a2a8cd"
Archived-At: <https://mailarchive.ietf.org/arch/msg/oauth/u2BSU6tc0jy-OZxBAlfyd_QSNWA>
Subject: Re: [OAUTH-WG] SD-JWT explicit guidance on parsing json strings
Precedence: list

Thanks David, responses inline:

On Fri, Oct 13, 2023 at 6:35 PM David Waite <david@alkaline-solutions.com>
wrote:

>
>
> On Oct 13, 2023, at 3:52 PM, Orie Steele <orie@transmute.industries>
> wrote:
>
> Inline (and sorry for repeating points / rambling) :
>
> On Fri, Oct 13, 2023 at 1:25 PM Brian Campbell <bcampbell@pingidentity.com>
> wrote:
>
>> That makes sense in principle but is maybe not particularly actionable or
>> helpful guidance. The need to do some JSON parsing/processing prior to
>> signature verification is kinda inherent to JWS itself.
>>
>
> It's not when you have the verification keys, and you are still looking at
> base64url encoded strings.
> You can try all keys and trade off multiple verification attempts for
> exposing the parser to an attacker directly.
>
> But not all protocols work like this, sometimes you need to decode the JWS
> to discover the verification keys.
>
> In those cases, parsing is basically unavoidable... but you still have the
> choice of "parse as an object" vs "find me a string I need, from another
> string"... both approaches can have vulnerable implementations.
>
> Given the existing security considerations that say that discovery is out
> of scope,
> it's reasonable to assert that implementers SHOULD NOT "parse as json
> objects" the protected header or payload, prior to signature verification.
>
>
> All of JOSE, as well as JWTs, define algorithms and representations for
> use in higher level interoperable applications within protocols.
>
> The “iss” claim is optional, as is its connotations around discovery (say,
> via an OAuth Server Metadata endpoint).
>
> I would tend to expect that the "JSON Object Signing and Encryption"
> specifications expect the ability to parse and serialize JSON objects as
> part of the security layer, such as what would be needed for creating and
> consuming JSON serialization of signed/encrypted messages.
>
> That doesn’t mean it has to be a fully formed JSON library;
>

Yes, I tried to say this.


> just as you don’t need some generalized CBOR library with an object model
> to handle verification of COSE signatures. However, there is a difference
> in established JSON parsers, as well as JSON being used in less constrained
> environments and JSON Text having complete/non-extensible set of data
> types. It is far more likely for generalized JSON parsers and generators to
> be used.
>

Yes, and... those are the kinds of parsers that a developer is likely to
take a software supply chain dependency on.


> An application can constrain things down to convey any requirements
> elsewhere in the protocol (such as conveying a reference to the keys as a
> HTTP header or query parameter), such that integrity of compact
> serialization could be verified before JSON parsing of the protected header.
>
I suppose they could also constrain identifiers and JSON string escaping
> such that they could pull them out via regular expressions.
>

(in Rick and Morty voice) Adding regular expressions is just adding parsing
vulnerability with extra steps

But yes, that might count as a "constrained" / "not a full json parser".


> An application might also require a limited character set (latin-1 code
> point range only with no unicode escaping in strings) to allow consumers to
> reduce the risk of unicode-level vulnerabilities.
>

This is definitely an app or protocol specific thing, I don't think SD-JWT
needs to give guidance like this.


>
> An example of text I would expect from a specification constraining
> JSON-formatted input to be consumed without JSON parsing would be
> https://www.w3.org/TR/webauthn-3/#clientdatajson-serialization.
>

yep, this is great.

> The serialization of the CollectedClientData is a subset of the algorithm
for JSON-serializing to bytes. I.e. it produces a valid JSON encoding of
the CollectedClientData but also provides additional structure that may be
exploited by verifiers to avoid integrating a full JSON parser.

riffing on this, if it's possible to use a "simple and safe parser on the
untrusted data", before using a "maybe safe / probably fast " parser on the
trusted data, maybe do that : )

This is also not dissimilar to additional effort CBOR specs make to reduce
> optionality/variability in formats. That said, the vast majority of relying
> parties (e.g. all but one implementation) will likely use off-the-shelf
> JSON parsers for this data, which could very well be coming from a
> malicious client.
>

So bad guys are going to get their malicious JSON to relying parties, and
relying parties are going to process the JSON (with the finest of JSON
parsers)... no matter what we say in SD-JWT ? : )


>
> Noting that the JWT RFC contains normative statements in security
> considerations that are already targeted at addressing security issues with
> JSON: https://datatracker.ietf.org/doc/html/rfc7515#section-10.12
>
>
>> At a minimum the algorithm is in the header.
>>
>
> Sorta, "parsing" can mean many things... it's reasonable to assume the
> folks who worked on the original RFC had good reasons for
> distinguishing JWS parsers from JSON parsers.
>
> See : https://datatracker.ietf.org/doc/html/rfc7515#section-5.2
> <https://datatracker.ietf.org/doc/html/rfc7515#section-5.2>
>
>
> A JWS parser converts compact and JSON serializations of a JWS into the
> representative parts (protected header, signature, payload, unprotected
> header). JSON parsers neither handle compact serialization input nor have
> octet-string output (sans extensions like replacers).
>
>
Agree with the first sentence, the second sentence answers something I
didn't ask...

https://datatracker.ietf.org/doc/html/rfc7515#section-4

   JWS parsers MUST either reject JWSs with
   duplicate Header Parameter names or use a JSON parser that returns
   only the lexically last duplicate member name, as specified in
   Section 15.12 ("The JSON Object") of ECMAScript 5.1 [ECMAScript].

Must be nice to have a JSON parser that returns the last duplicate
member... but also doesn't have any active CVEs.

I concede most JWT implementations will use a general purpose JSON parser.


<snip>
>
> In the case that you have a public key that is already restricted to a
> single fully specified algorithm,
> I'd argue it's safer to just use the algorithm specified for the public
> key,
> than it is to decode and parse the header, then match the alg to the
> public key,
> and then attempt to verify a potentially tampered / maliciously crafted
> message.
>
>
> Generally, you want the algorithm chosen by the issuer to be externally
> negotiated and authoritative. Whether you verify alg matches alg before or
> after could then be a function of how plausible you think it is that there
> is indeed a remote code execution vulnerability or DOS in your JSON parser.
>
>
Sorry for the confusion here, my comment is that I don't need to
deserialize your untrusted data if I have a key to try to verify it first.
Protocols requiring deserialization of untrusted data are probably not
going away, that doesn't mean we need to encourage more of them.

> SD-JWT doesn't give guidance like this on restricting keys, it could say
> issuer and holder keys SHOULD be restricted to fully specified alg values,
> and still defer key discovery to other specs.
>
>
> Wouldn’t signature and issuance claims verification be typically a
> function of the JWT libraries and not the wrapping SD-JWT libraries?
>
>
Yes, and this is basically what Brian said, and I basically agree... but
SD-JWT inherits this and then adds disclosure deserialization...
At a minimum we need to make sure that the disclosure deserialization never
happens on untrusted data, the current text accomplishes this, but could be
clearer about this pitfall / risk.


> Is this guidance specific for SD-JWT, or is it meant to provide JWT
> guidance in general?
>
>
Guidance for SD-JWT.


> Yep, but in this case, we are inheriting the specific text in
> https://datatracker.ietf.org/doc/html/rfc7515#section-5.2
>
> Which does not forbid or require JSON parsing... in other words, it does
> not provide clear guidance on if parsing the header as a JSON object is
> required... and even if it did... parsing is hard to define as I said above.
>
>
> I don’t understand how step 3 (which takes in a protected header octet
> string, validate it is proper JSON, and return the representative JSON
> object) would not constitute a JSON parser,
>

It does, that was my point... "parser is vague terminology"... any attempt
to deserialize untrusted data can be a weak point, see
https://cwe.mitre.org/data/definitions/502.html

" The product deserializes untrusted data without sufficiently verifying
that the resulting data will be valid. "

In this case, the challenge is that we sometimes need to deserialize before
we can verify... but that's because of the protocols, not the data format /
JOSE...
JOSE doesn't force the behavior, the convenience and power of general
purpose deserializers combined with protocol design constraints creates
this issue.

If it helps I concede that JWS requires the protected header to be parsed
as a JSON object... the same guidance from WebAuthN is therefore relevant.


> even if such a parser was not a generalized implementation. Step 5
> requires processing of the individual properties of this JSON object which
> are required by the application space for support, including `crit`
> validation, pre-signature validation (step 8).
>

Yes, and again, a protocol might require or forbid crit... I'd be cautious
doing anything with decoded protected headers... if you have a choice to
handle verified protected headers instead.


>
> <snip>
>
>
> The principle is: don't use parsers when you don't need to, minimize
> the code associated with processing untrusted data,
>
>
> For SD-JWT, I would suspect this would mean that validation of issuance
> claims MUST happen based on the JWT, e.g. not based on reconstitution of
> selectively disclosed information. E.g., an “exp” claim in a disclosure is
> an invalid SD-JWT.
>

Yes, and the draft does have some text on this, however AFAIK, you can
selectively disclose all registered claim names that the issuer allows ...
including `iss`, `sub`, `cnf` etc.... see:
https://datatracker.ietf.org/doc/html/draft-ietf-oauth-selective-disclosure-jwt-05#section-5.1

"""
It is the Issuer who decides which claims are selectively disclosable and
which are not.
However, claims controlling the validity of the SD-JWT, such as iss, exp,
or nbf are usually included in plaintext.
End-User claims MAY be included as plaintext as well, e.g., if hiding the
particular claims from the Verifier is not required in the intended use
case.
"""


>
> and treat all unverified messages as RCE payloads that are crafted to
> exploit a vulnerable parser if you are paranoid.
>
>
> Once you have established the necessary parsers, this is an audit
> requirement on paranoid implementations. For example, the certificates in
> protected headers could be malformed ASN.1 and could contain invalid public
> keys, such as points that are not in the subgroup.
>

Thanks for mentioning ASN.1... in general any parser might run on data that
has not yet been verified, if deserialization is required, and the data is
untrusted its deserialization of untrusted data,  common weakness
enumeration... aka a security consideration.


> This might be a good reason to push back on using a x509 certificate chain
> in a protected header to dynamically establish the issuer and issuer trust.
>

This is probably protocol or app specific guidance, I think it's safe to
omit from SD-JWT.


> However, there will be applications which require this, and if you need to
> interoperate your only choice will be to harden/isolate that functionality.
>

Yep


> It's still true that an RCE in a parser that is only exploitable in a
> library after verification is a show stopper, but defense in depth requires
> us to address what we can address.
>
>
> Worrying about malicious issuers is another application level concern. If
> your authenticated messages could be coming from malicious parties, you
> have a much harder time.
>

Yes, I would focus on worrying about malicious holders.

Holder's of legitimate credentials and false identities of attackers will
sit in the same lines and present credentials to the same verifiers.

SD-JWT can't fix the deserialization issues that are already out there in
the wild, but we don't need to encourage more of them,
and when shopping for a JWT library to implement SD-JWT, maybe this is a
thing you should consider.

We can say:

When you can, verify data before attempting to deserialize it... If you
can't ... yolo.


>
> <snip>
>
> -DW
>


-- 


ORIE STEELE
Chief Technology Officer
www.transmute.industries

<https://transmute.industries>

[OAUTH-WG] SD-JWT explicit guidance on parsing js… Orie Steele
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… Brian Campbell
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… Orie Steele
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… David Waite
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… Orie Steele
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… Denis
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… Carsten Bormann
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… David Waite
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… Brian Campbell
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… Carsten Bormann
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… Denis
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… Orie Steele
Re: [OAUTH-WG] SD-JWT explicit guidance on parsin… Carsten Bormann