Re: [openpgp] Possible ambiguity in description of regular expressions: [^][]
Ángel <angel@16bits.net> Fri, 08 January 2021 00:29 UTC
Message-ID: <a061d617a22416638bf1fb0a1f7d66b7495f9b82.camel@16bits.net>
From: Ángel <angel@16bits.net>
To: openpgp@ietf.org
Date: Fri, 08 Jan 2021 01:29:36 +0100
In-Reply-To: <87456fad-06cd-6605-b5d1-ea5ac49c9ee4@andrewg.com>
References: <87r1nguquq.wl-neal@walfield.org> <87tusbuwzp.fsf@fifthhorseman.net> <87mtxzv7mr.wl-neal@walfield.org> <877dor8kl1.fsf@fifthhorseman.net> <87456fad-06cd-6605-b5d1-ea5ac49c9ee4@andrewg.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/openpgp/GD5uM-udvBzuQNMymqgvdA6WOXQ>
On 2021-01-05 at 17:11 +0000, Andrew Gallagher wrote:
> Is there anything to be said for referring out to an external regex
> definition instead of reinventing the wheel? :-)

The problem is that there is not a single regex specification.
Although it would be beneficial to be able to change the regex
definition in some minor ways.

On this same topic, I had started the following reply last weekend:

On 2020-12-23 at 22:58 +0100, Neal H. Walfield wrote:
> Wait... that can also be parsed as match anything (2) followed by
> match nothing (1)!
>
>
> Perhaps I'm misreading the standard. I'd appreciate confirmation or
> any help clarifying my mistake.
>
> Thanks,
>
> :) Neal

I think it's kinda implicit in the
> To include a literal ']' in the sequence, make it the first character
> (following a possible '^').
that a range cannot be "[]" or "[^]", since it is specified that in
such a case the "]" will be a literal character.

The place-]-at-the-start trick (along with make-'-'-first-or-last) is
well known from old regex flavors which don't support escapes inside
ranges.

I would say the whole section makes sense. But there's room for
improvement.

A trickier case would be a regular expression such as:
> Werner Koch (dist.*

This could be taken as a valid regular expression, with the "(" being
«a single character with no other significance (matching that
character)», or as a syntax error, since parentheses are 'special' per
«An atom is a regular expression in parentheses».

Exactly the same case applies to "[foo".

Although RFC 4880 makes no reference to invalid regular expressions, I
think that's how these should be categorised (another example would be
a regular expression beginning with a quantifier).

And since regular expressions are used for trust signature packets,
5.2.3.15 should probably state that if a regular expression is invalid,
or the implementation cannot support it for whatever reason (e.g.
implementations _will_ place a recursion limit), then trust MUST NOT be
extended.

There's a second definition of the Regular Expressions, which is
> The regular expression uses the same syntax as the Henry Spencer's
> "almost public domain" regular expression [REGEX] package.
with
> [REGEX] Jeffrey Friedl, "Mastering Regular Expressions,"
> O'Reilly, ISBN 0-596-00289-0.

However, someone who turned to that book will find that the latest
edition (the 3rd edition, from August 2006, now 14.5 years old), which
is the one readily available, does not describe the Henry Spencer regex
flavor. It mentions it as historically relevant, and says that Perl 2
used an enhanced version of it, but the flavor is not described by
itself nor included in the tables comparing different flavors (I guess
more details about it might have been removed in the rewrite that went
into the second edition).

It is possible to dig out the original code[1] and actually test how it
performs (spoiler: it does reject the above constructs), but one should
not need to rely on how that code works.

If I had to define it anew now, I would probably define it as being a
POSIX Extended Regular Expression (ERE)[2] (or a subset of that). EREs
are relatively similar to the existing definition, are well-known and
well-defined, and such a definition would allow implementations to
simply use existing libraries conforming to it (including regexec on a
POSIX libc).

An openpgp client shouldn't really need to care much about creating a
regular expression engine. It is a complex part for the tiny usage it
would get. In fact, the easiest way to implement it would probably be
to parse the 4880 regex just enough to convert it into an ERE, and then
use an existing facility to execute that.

The main differences are:

- Curly brackets { } are special for EREs (used for the range
  quantifier) but not for 4880 regex, where they would be literals.

- An empty regex alternation (a | at the beginning or end of an ERE, or
  of a group inside parentheses) is undefined in an ERE. A 4880 regex
  supports it with the expected meaning. An equivalent regex using ?
  can be used instead.

- 4880 regex doesn't support collating expressions inside equivalence
  sets.

- 4880 regex allows escaping any character with a backslash. In an ERE
  you can only escape special characters; an ordinary character
  preceded by a backslash is undefined (and often used for extensions,
  e.g. \w).

Regular expressions are a little-used feature, and the "natural" way to
write them would conform to both of those specifications. It is
unlikely that someone would have restricted a trust value based on the
presence of curly brackets in a User ID (they are legal in the local
part of email addresses, even unquoted, but it would be very rare to
find one). Equally, it would be strange to needlessly escape
characters. So it _may_ be possible to change the definition without
adversely affecting existing usage.

For full compatibility, changing the regex would need to wait for V5
signatures or, preferably, use a new subpacket type.

Happy New Year to all!

Ángel González

1- there is a nice copy preserved at
   https://github.com/garyhouston/regexp.old, see
   https://garyhouston.github.io/regex/
2- https://pubs.opengroup.org/onlinepubs/007908799/xbd/re.html#tag_007_004
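
To illustrate the "parse the 4880 regex just enough to convert it into an ERE" idea from the message, here is a minimal Python sketch. It is not taken from any OpenPGP implementation: the names `rfc4880_to_ere`, `UnsupportedRegex` and `_literal` are hypothetical, and the policy of rejecting (rather than rewriting) empty alternatives, unbalanced parentheses, unterminated ranges and leading quantifiers is an assumption that follows the suggestion that trust not be extended for regexes an implementation cannot handle. It only produces the ERE text; executing it would be left to an existing ERE engine.

```python
class UnsupportedRegex(ValueError):
    """The sketch cannot translate this regex; a caller would not extend trust."""


# Characters that POSIX ERE allows after a backslash.
ERE_QUOTABLE = set("^.[$()|*+?{\\")


def _literal(ch: str) -> str:
    """Return ERE text that matches the single character `ch` literally."""
    if ch in ERE_QUOTABLE:
        return "\\" + ch
    if ch == "}":
        # Cautious choice (assumption): a one-character bracket expression
        # avoids relying on how an engine treats a lone '}'.
        return "[}]"
    return ch


def rfc4880_to_ere(pattern: str) -> str:
    """Translate an RFC 4880 style regex to ERE text, or raise UnsupportedRegex."""
    out = []
    depth = 0       # number of currently open parentheses
    last = "start"  # kind of the previous token: start, open, bar, atom
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\":
            # 4880 lets a backslash escape anything; ERE only special chars.
            if i + 1 >= n:
                raise UnsupportedRegex("trailing backslash")
            out.append(_literal(pattern[i + 1]))
            i, last = i + 2, "atom"
            continue
        if c == "[":
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":  # a leading ']' is a literal
                j += 1
            end = pattern.find("]", j)
            if end == -1:
                raise UnsupportedRegex("unterminated range")
            body = pattern[i:end + 1]
            # An ERE engine would read these as class/collating syntax.
            if any(m in body[1:] for m in ("[:", "[.", "[=")):
                raise UnsupportedRegex("range would change meaning as an ERE")
            out.append(body)
            i, last = end + 1, "atom"
            continue
        if c in "{}":
            out.append(_literal(c))  # literals in 4880, special in ERE
            last = "atom"
        elif c == "(":
            depth += 1
            out.append(c)
            last = "open"
        elif c == ")":
            if depth == 0 or last in ("open", "bar"):
                raise UnsupportedRegex("unbalanced ')' or empty branch")
            depth -= 1
            out.append(c)
            last = "atom"
        elif c == "|":
            if last != "atom":
                raise UnsupportedRegex("empty alternative")
            out.append(c)
            last = "bar"
        elif c in "*+?":
            if last != "atom":
                raise UnsupportedRegex("quantifier with nothing to repeat")
            out.append(c)
            last = "atom"
        else:
            out.append(c)
            last = "atom"
        i += 1
    if depth != 0 or last == "bar":
        raise UnsupportedRegex("unclosed group or trailing '|'")
    return "".join(out)
```

With this sketch, a "natural" regex such as `<[^>]+[@.]example\.com>$` comes out unchanged, `a{2}` becomes `a\{2[}]`, and `\w\.` becomes `w\.`, while `Werner Koch (dist.*`, `[foo` and `(a|)` raise `UnsupportedRegex`. A fuller converter could rewrite an empty alternative using `?` instead of rejecting it.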