[secdir] Secdir last call review of draft-ietf-jsonpath-iregexp-06

Mike Ounsworth via Datatracker <noreply@ietf.org> Mon, 15 May 2023 15:17 UTC

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
From: Mike Ounsworth via Datatracker <noreply@ietf.org>
To: secdir@ietf.org
Cc: draft-ietf-jsonpath-iregexp.all@ietf.org, jsonpath@ietf.org, last-call@ietf.org
Auto-Submitted: auto-generated
Precedence: bulk
Message-ID: <168416383998.50512.953102690552943438@ietfa.amsl.com>
Reply-To: Mike Ounsworth <mike.ounsworth@entrust.com>
Date: Mon, 15 May 2023 08:17:19 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/secdir/ht1wIFYitMr9icXveyt0mMcbaAw>
Subject: [secdir] Secdir last call review of draft-ietf-jsonpath-iregexp-06

Reviewer: Mike Ounsworth
Review result: Has Nits

I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG. These comments were written primarily for the benefit of the
security area directors. Document editors and WG chairs should treat
these comments just like any other last call comments.

Summary:
I think the normative technical content is fine, but in my opinion the Security
Considerations do not fully address the issue of Regular expression Denial of
Service (ReDoS). The authors could consider expanding the Security
Considerations to provide clearer usage guidance and security expectations. At
the minimum, I think this document should include a security consideration
about the types of "evil regexps" that are still expressible in I-Regexp and
that even with this reduced syntax it is still not advisable to run arbitrary
user-provided regular expressions on your hardware.

Longer review:

The primary security concern with a regular expression specification is
"regular expression denial of service (ReDoS)" attacks, for example as
described in the following OWASP article:
https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS

Poor implementations become vulnerable to ReDoS attacks either when a
poorly-optimized regexp pattern is run on a CPU or memory constrained device
over attacker-supplied input, or when an attacker is allowed to submit a regexp
pattern to be run on victim hardware (regardless of amount of available
resources).

draft-ietf-jsonpath-iregexp does address this in both sections 6 and 8, and
while I agree that not including features such as capture groups, lookahead, or
backreferences does help mitigate against poor implementations containing ReDoS
issues, I do not think this is a sufficient coverage of the potential security
issues related to ReDoS. For example, the following "evil regexes" from the
above OWASP article are all expressible in I-Regexp: "(a+)+", "([a-zA-Z]+)*",
"(a|aa)+", "(a|a?)+", all of which will lead to large resource consumption when
processing the input string " aaaaaaaaaaaaaaaaaaaaaaaa!". I am not sufficiently
expert in this domain to know if a well-implemented regexp library would avoid
being vulnerable, or if maliciously-crafted regexp patterns will always be able
to cause large resource consumption regardless of the cleverness of the regexp
engine (I suspect the later). The problem with these "evil regexps" is that
they yield an exponential number of valid ways to parse a single input string
and then provide a worst-case input string which forces it to check every
possible parsing. I-Regexp is simply providing a "translation layer", so
something which is evil to the underlying regexp lib will still be evil at the
I-Regexp level. I think this document should at least provide a security
consideration about the types of non-optimized or evil regexp patterns that are
expressible in the I-Regexp syntax.

Finally, I am aware that this document specifies only the regexp syntax that is
meant to be an easily-convertible subset of the regexp syntaxes of existing
libraries, and is not proposing a processing algorithm. So maybe my further
paragraph is out-of-scope for this document.

I am not sufficiently expert in regular expressions or current best-practice
for writing regular expression libraries to suggest a fix, but I wonder if this
document could recommend that implementations include some sort of configurable
limit on nesting level or on recursion / backtracking depth. For example a
limit whereby an implementation will fail with an error if a given input
requires it to evaluate more than a certain number of nested clauses. Or a
"backtracking limit" whereby an implementation uses the heuristic of following
the longest possible expansion of a + or *, and the implementation will fail
with an error if it is required to backtrack more than a given number of times
to try a different decomposition path through the state machine.

[secdir] Secdir last call review of draft-ietf-js… Mike Ounsworth via Datatracker
Re: [secdir] Secdir last call review of draft-iet… Tim Bray
Re: [secdir] [EXTERNAL] Re: Secdir last call revi… Mike Ounsworth
Re: [secdir] [Jsonpath] Secdir last call review o… Greg Dennis
Re: [secdir] [Jsonpath] Secdir last call review o… Martin J. Dürst
Re: [secdir] [Last-Call] [Jsonpath] Secdir last c… Rob Sayre
Re: [secdir] [Last-Call] [Jsonpath] Secdir last c… Rob Sayre
Re: [secdir] [EXTERNAL] Secdir last call review o… Carsten Bormann
Re: [secdir] [EXTERNAL] Secdir last call review o… Mike Ounsworth