Re: [secdir] [Last-Call] [Jsonpath] Secdir last call review of draft-ietf-jsonpath-iregexp-06
Rob Sayre <sayrer@gmail.com> Thu, 18 May 2023 20:31 UTC
Return-Path: <sayrer@gmail.com>
X-Original-To: secdir@ietfa.amsl.com
Delivered-To: secdir@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 5C6BAC14CE4F; Thu, 18 May 2023 13:31:30 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.094
X-Spam-Level:
X-Spam-Status: No, score=-2.094 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Yo1LW9huAfYP; Thu, 18 May 2023 13:31:26 -0700 (PDT)
Received: from mail-ed1-x52e.google.com (mail-ed1-x52e.google.com [IPv6:2a00:1450:4864:20::52e]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 0BEC1C14F74E; Thu, 18 May 2023 13:31:26 -0700 (PDT)
Received: by mail-ed1-x52e.google.com with SMTP id 4fb4d7f45d1cf-510ea8d0bb5so1396548a12.0; Thu, 18 May 2023 13:31:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1684441884; x=1687033884; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=vtIWOjDk4Tzk/Wj/rE1qH7G8DNw575Qe78HCHlDh0uY=; b=XMK+8AarBxtUwVGeEX72jjrK25VyUzHNpUhtTkzQRV0X7cYaURcDdxGRDWZjwl48ql LZ1JHdNpkzWsBDSCrqA1HP+VJjtXsENpYKx/9Ni1IhM7J7lzkiBBxEncR4paXf2apMu2 BbNfuw0SYoJ/gnJwWgVO9RpV7qfckE5JQjeRF3eWoHEZMsjkjtQu1wogdZ1TMJdV1e/6 j39BFT6RuP/dl+LjZ+LYWZo43/rqSzppq9lwxNPi1rk2xjoYocIyeJAviv4WrDG5YBAn AO8yrOO8mys60055W21Xgz+ZVErSCEV13Lj3XnnXCWDy5KZ1kGcMS0KGGaxhGxga0Myn dc+A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684441884; x=1687033884; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=vtIWOjDk4Tzk/Wj/rE1qH7G8DNw575Qe78HCHlDh0uY=; b=JNIzJRtK7g2zryDIeTKQ7gc9s40/uTy3u4L57+uBMGc/TgRyKAH2u/cx4u+XCireRf XH08hxeJuGqCSZ1YOlqjtVMszzRFEdzH+XAWSV7ghvS79fFOLmuiHBD2xcwC1srQm/xD v9kwB3aGqi10if7+Ycefw7YSt0PqQe06Fc3YN9DVcaAOeiSs0UArfT6msLRrFRTDnSxt fbwqA9Sb7L6ZJpoigrOBDejS7fd2K5nPlCSSypuISCh6ybjh1EKiLLnAfgf2iqblA3u2 m4nD07t9vTh0LArOP1+ac2qrKZI1CH6VxnZuAEpMEnsTLRK7iw2yUVV5gBIqWNGy7q+Y 33nw==
X-Gm-Message-State: AC+VfDySqh/OpAhCyopwu9iXaGVn3WTNjszNAOv0GvQNWPDRacu+v0q6 lXxJuJVasE8j50w7s9nKI/zDyMK2C5cEVOKizvfO3ietYWjGUA==
X-Google-Smtp-Source: ACHHUZ4HHMcfPB0XlhqQbr68GL5Nr3fVtV55DrANSDSNPSbXgGzeRTn8I8BH8o2XL4VTv+vaO9S9i5heKMa+uvNzSas=
X-Received: by 2002:aa7:c048:0:b0:510:f3a6:d50 with SMTP id k8-20020aa7c048000000b00510f3a60d50mr676364edo.36.1684441884049; Thu, 18 May 2023 13:31:24 -0700 (PDT)
MIME-Version: 1.0
References: <168416383998.50512.953102690552943438@ietfa.amsl.com> <CAHBU6iuKKp3g_HbhgaZT8CcStQBKoaHOcdf9ogku=bftYt5wgA@mail.gmail.com> <f69ab710-34eb-1697-be29-cc0298c88da8@it.aoyama.ac.jp> <CAChr6SwqnKoG+MkBkCupKme1Y8aMxosBwHYkVWYy=yfAiwMCZw@mail.gmail.com>
In-Reply-To: <CAChr6SwqnKoG+MkBkCupKme1Y8aMxosBwHYkVWYy=yfAiwMCZw@mail.gmail.com>
From: Rob Sayre <sayrer@gmail.com>
Date: Thu, 18 May 2023 13:31:12 -0700
Message-ID: <CAChr6SyZfLNDXNDkWMS5xu=YvxcPEr=bvCKLVcubs_-BWxn8KA@mail.gmail.com>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Cc: Tim Bray <tbray@textuality.com>, Mike Ounsworth <mike.ounsworth@entrust.com>, secdir@ietf.org, draft-ietf-jsonpath-iregexp.all@ietf.org, jsonpath@ietf.org, last-call@ietf.org
Content-Type: multipart/alternative; boundary="00000000000024ca1705fbfdb01d"
Archived-At: <https://mailarchive.ietf.org/arch/msg/secdir/gHiFQWTdchk3ahh3670cU_CciI8>
Subject: Re: [secdir] [Last-Call] [Jsonpath] Secdir last call review of draft-ietf-jsonpath-iregexp-06
X-BeenThere: secdir@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: Security Area Directorate <secdir.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/secdir>, <mailto:secdir-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/secdir/>
List-Post: <mailto:secdir@ietf.org>
List-Help: <mailto:secdir-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/secdir>, <mailto:secdir-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 18 May 2023 20:31:30 -0000
On Wed, May 17, 2023 at 1:48 PM Rob Sayre <sayrer@gmail.com> wrote: > There's one more thing I think you should add to the security > considerations: handling of invalid Unicode in the input. Here is an > example: > > https://github.com/nst/JSONTestSuite/blob/master/test_parsing/i_string_1st_valid_surrogate_2nd_invalid.json > > That string is allowed in ECMAScript and Python, but not Ruby or Rust. > I went back and read: https://github.com/ietf-wg-jsonpath/iregexp/issues/22#issuecomment-1504004958 I think the issue is that JSON, .js, or .py source files are almost always valid UTF-8, so if you just interpret them as a text file, it's fine. But I-Regexp operates on the text produced after the escape sequence processing phase, which happens after UTF decoding. Some languages enforce valid Unicode at this point, but some don't. So it's a bit separate from the encoding of the source file. I think that's what I-JSON was going after, too. thanks, Rob > * https://bjoern.hoehrmann.de/utf-8/decoder/dfa/ > > On Tue, May 16, 2023 at 1:20 AM Martin J. Dürst <duerst@it.aoyama.ac.jp> > wrote: > >> Hello Tim, Mike, others, >> >> Many thanks to Mike for bringing up this issue. I have the following >> comments: >> >> 1) The syntax (and what's more important, but completely missing in the >> text, the semantics) is equivalent to theoretical regular expressions, >> i.e. regular expressions that define regular languages (Chomsky Type 3). >> They can therefore be implemented to run in time linear to the length of >> the string input, and if so implemented, they don't offer any >> possibilities for ReDoS (but see (4) below). >> >> 2) I-Regexps will be executed on all kinds of different regexp >> implementations, as explicitly envisioned by the draft. Some of these >> implementations use backtracking because they need it for more complex >> features, and may (or let's say will) exhibit quadratic, cubic, or >> exponential behavior on some I-Regexp inputs. The examples of nested >> quantifiers given by Mike are a typical case. >> >> 3) Even for the same programming language/regular expression engine, >> different versions may perform differently. As an example, Ruby just >> recently implemented some changes to its regexp engine, implementing a >> cache that makes a lot of cases linear in time and uses space >> proportional to the product of the length of the regexp and the input. >> (see https://bugs.ruby-lang.org/issues/19104). In addition, we >> introduced a predicate to check whether a regexp can be run in linear >> time (https://bugs.ruby-lang.org/issues/19194) and a feature to set a >> timeout (https://bugs.ruby-lang.org/issues/17837). So a regexp that >> might blow up on Ruby versions of 3.1 and lower may behave nicely on >> Ruby 3.2 and higher. >> >> 4) Repetitions with lower and upper bounds are a bit of a special case. >> In theory, α{m,n} (where α is any suitable subregexp) can always be >> expanded to m times α followed by (n-m) times α?. As an example, a{3,7} >> would expand to aaaa?a?a?a?, which can be implemented efficiently (e.g. >> using conversion from a nondeterministic finite automaton (NFA) to a >> deterministic finite automaton (DFA)). However, it's easy to spoof this >> with e.g. a regular expression of the form a{10,1000000}. And if >> 1,000,000 isn't enough, it's easy for the attacker to add a few zeros, >> they may get a 10-fold memory increase for each zero they add. >> It is possible to avoid this memory increase by using some kind of lazy >> construction, which would only be triggered on lengthy inputs, and be >> linear in the length of the input. Nested quantifier ranges will need >> special care. >> [It could be that "Programming Techniques: Regular expression search >> algorithm" (Ken Thomson, https://dl.acm.org/doi/10.1145/363347.363387) >> includes such a technique; unfortunately, I have not yet been able to >> understand that paper because I don't understand the assembly code it >> uses :-(.] >> Ruby produces a "too big number for repeat range" error for ranges >> greater than 100000. I have no idea how the number 100000 was found. >> Ruby 3.2 also returns false for >> Regexp.linear_time?(/(a?){20,200}{20,200}/) (and runs a veeery long time >> on some specific input), which I think is a limitation in the >> implementation, not in principle.) Many engines simply treat stacked >> range quantifiers as errors (see https://regex101.com). >> >> On 2023-05-16 00:52, Tim Bray wrote: >> > I was reading your note and agreeing that yes, it remains possible to >> > devise regexps that are going to cause combinatorial nasties in almost >> any >> > conceivable implementation, but unconvinced about the conclusion that >> it is >> > "still not advisable to run arbitrary user-provided regular expressions >> on >> > your hardware", because it seems to me that the only way to find out if >> the >> > regexp is evil is to run it. >> >> Not exactly. For I-Regexps, it's only the implementation that is >> relevant; all I-Regexps *can* be implemented safely. >> >> Even for much wider classes of regexps, most of them can be very quickly >> classified into safe or potentially dangerous, with rather simple >> heuristics. >> >> So in conclusion, please make sure that the security considerations >> mention all of the following: >> >> - I-Regexps may be implemented to run in linear time in the input, >> but this requires really great care. >> - Existing regexp engines may be used to run I-Regexps, but may run in >> quadratic, cubic, or exponential time of the input length for some >> I-Regexps. >> - Existing regexp engines should be able to easily handle most >> I-Regexps (after the adjustments discussed in Section 5), but >> may reject some types of syntactically legal I-Regexps because they >> cannot guarantee execution in reasonable time (a typical example >> might be a{2,4}{2,4}) >> - Range quantifiers (example: {2,4}) provide specific challenges >> for implementation and may therefore be limited in composability >> (see previous example) or range (e.g. disallowing very large ranges >> such as {20,200000}). Such large ranges may also allow specific >> attacks on some implementations. >> - Different versions of the same regexp library may be more or less >> vulnerable to some attacks. In particular, newer libraries may >> behave better or may offer additional features to check regular >> expressions or guard against high resource consumption. >> >> Regards, Martin. >> >> P.S.: The syntax currently has >> >>>> >> quantifier = ( %x2A-2B ; '*'-'+' >> / "?" ) / ( "{" quantity "}" ) >> >>>> >> The doubly nested alternatives and the extremely short range provide >> more confusion than clarity to the reader. Also, there's absolutely no >> need to use hex for * and + (see the char-val production in RFC 5234). >> So please rewrite this as: >> >>>> >> quantifier = "*" / "+" / "?" / ( "{" quantity "}" ) >> >>>> >> >> Because we want to refer to it in the security discussion, it may also >> be a good idea to introduce a rule name for "{" quantity "}": >> >>>> >> quantifier = "*" / "+" / "?" / range-quantifier >> range-quantifier = "{" quantity "}" >> >>>> >> >> P.P.S.: For additional reading, I suggest starting at >> >> https://levelup.gitconnected.com/the-regular-expression-denial-of-service-redos-cheat-sheet-a78d0ed7d865. >> >> All of James Davis' work on ReDoS is well worth reading. >> >> > But I think your closing paragraph provides a solution. >> > >> > On Mon, May 15, 2023 at 8:17 AM Mike Ounsworth via Datatracker < >> > noreply@ietf.org> wrote: >> > … >> > >> >> I wonder if this >> >> document could recommend that implementations include some sort of >> >> configurable >> >> limit on nesting level or on recursion / backtracking depth. >> > >> > >> > That sounds like a good direction, but pretty complex. A simpler option >> > would be that implementations impose a limit on time and/or memory costs >> > and error out when those are breached. Do you think that a >> recommendation >> > along those lines would address your concerns? >> > >> > >> >> -- >> last-call mailing list >> last-call@ietf.org >> https://www.ietf.org/mailman/listinfo/last-call >> >
- [secdir] Secdir last call review of draft-ietf-js… Mike Ounsworth via Datatracker
- Re: [secdir] Secdir last call review of draft-iet… Tim Bray
- Re: [secdir] [EXTERNAL] Re: Secdir last call revi… Mike Ounsworth
- Re: [secdir] [Jsonpath] Secdir last call review o… Greg Dennis
- Re: [secdir] [Jsonpath] Secdir last call review o… Martin J. Dürst
- Re: [secdir] [Last-Call] [Jsonpath] Secdir last c… Rob Sayre
- Re: [secdir] [Last-Call] [Jsonpath] Secdir last c… Rob Sayre
- Re: [secdir] [EXTERNAL] Secdir last call review o… Carsten Bormann
- Re: [secdir] [EXTERNAL] Secdir last call review o… Mike Ounsworth