Re: [Jsonpath] Comments on I-regexp

Greg Dennis <gregsdennis@yahoo.com> Sat, 14 October 2023 02:46 UTC

Date: Sat, 14 Oct 2023 02:45:52 +0000
From: Greg Dennis <gregsdennis@yahoo.com>
Reply-To: Greg Dennis <gregsdennis@yahoo.com>
To: mike=40saxonica.com@dmarc.ietf.org, jsonpath@ietf.org
Message-ID: <2017239622.8220919.1697251552941@mail.yahoo.com>
In-Reply-To: <0194D248-0148-445A-A80B-6D34206B84C8@saxonica.com>
References: <0194D248-0148-445A-A80B-6D34206B84C8@saxonica.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/jsonpath/OB0uX4FN_QFbVG_61jPjTnextbA>
Subject: Re: [Jsonpath] Comments on I-regexp

Regarding the anchoring comment: we had an extensive discussion about this in https://github.com/ietf-wg-jsonpath/iregexp/issues/15, when JSON Schema (which primarily uses unanchored expressions) was considering I-Regexp as its normative regex reference.
It was ultimately decided that this specification should use anchored matching. I'd be surprised if that isn't stated somewhere in the document.
Greg

On Sat, 14 Oct 2023 at 2:04 pm, Michael Kay <mike=40saxonica.com@dmarc.ietf.org> wrote:

Excellent work.

I find it disappointing that the normative reference is to XSD 1.0 rather than 1.1, since the 1.0 spec contains some quite serious bugs that were fixed in 1.1. The fact that 1.0 is more widely implemented seems irrelevant; indeed, it seems harmful, since I-Regexp implementors may turn to XSD 1.0 implementations for guidance, and XSD 1.0 implementors are known to have fixed the bugs in the spec in different ways.

I note that there is no mechanism for identifying Unicode characters by their numeric code point. This is not needed in XSD, because the XML escape convention (e.g. `&#x10000;`) is available there. In other contexts, I can imagine this making it quite difficult to write readable regexps unless some host-language escape mechanism is available. Perhaps the RFC should suggest a convention for denoting "visually indistinctive" characters such as NBSP when I-Regexps are used in IETF specifications?
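For illustration only, a short Python sketch of the two escape routes mentioned above: an XML character reference resolved by the XML parser (as happens for a pattern facet inside an XSD schema document) and a host-language string escape. In both cases the regex engine receives the literal U+00A0 character, since the regexp syntax itself has no code-point escape. Python's `re` module stands in as the matcher here and is not an I-Regexp implementation; the `xs:pattern` fragment is a made-up example.

```python
import re
import xml.etree.ElementTree as ET

# In an XSD schema the pattern sits in an XML attribute, so the XML parser
# resolves the character reference &#xA0; (NBSP) before any regex engine sees it.
schema_fragment = (
    '<xs:pattern xmlns:xs="http://www.w3.org/2001/XMLSchema" value="a&#xA0;b"/>'
)
pattern_from_xml = ET.fromstring(schema_fragment).get("value")

# Outside XML, a host-language escape does the same job: "\u00a0" is a Python
# string escape, not regex syntax, so the engine again receives a literal NBSP.
pattern_from_host = "a\u00a0b"

assert pattern_from_xml == pattern_from_host  # both are 'a', U+00A0, 'b'

# The pattern matches NBSP but not an ordinary space, even though the two
# usually look identical when printed.
print(bool(re.fullmatch(pattern_from_host, "a\u00a0b")))  # True
print(bool(re.fullmatch(pattern_from_host, "a b")))       # False
```

The printed pattern is visually indistinguishable from `a b`, which is the readability problem raised above.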

It is stated that the only functionality supported is string matching; it may be worth mentioning, for those unfamiliar with XSD, that this means anchored string matching.
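A minimal Python sketch of the difference between anchored and unanchored matching, assuming a simple pattern (`[a-z]+`) whose syntax means the same in Python's `re` and in I-Regexp. The function names are hypothetical and exist only for this illustration.

```python
import re

# Anchored matching (as in XSD pattern facets): the whole input must match.
def iregexp_style_match(pattern: str, text: str) -> bool:
    return re.fullmatch(pattern, text) is not None

# Many other regex APIs instead search for the pattern anywhere in the input.
def unanchored_match(pattern: str, text: str) -> bool:
    return re.search(pattern, text) is not None

print(iregexp_style_match("[a-z]+", "abc"))   # True
print(iregexp_style_match("[a-z]+", "abc1"))  # False: the trailing "1" is not matched
print(unanchored_match("[a-z]+", "abc1"))     # True: "abc" is found inside the input
```

Emulating anchored matching by wrapping the pattern in `^...$` is a common alternative, but in Python `$` also matches just before a trailing newline, so `re.fullmatch` is the safer choice there.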

My reading of the ABNF is that standalone hyphens have been restricted to appear at the start or end of a character group. This is a good solution to a problem that has been very troublesome in XSD; it is worth highlighting this as one of the differences from XSD.
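A rough Python sketch of the hyphen rule as read above, checking only the text between `[` and `]`. It is a simplification, not the RFC's ABNF: escapes such as `\p{L}` or `\-` are not handled, and the names `class_body_ok`, `_CC`, `_ITEM` and `_CLASS_BODY` are hypothetical.

```python
import re

# A class character in this sketch: anything except '-', '[', ']' and '\'.
# Escapes (e.g. \p{L}, \-) are deliberately not modelled.
_CC = r"[^\-\[\]\\]"
# One item: a single character or a range such as a-z.
_ITEM = rf"{_CC}(?:-{_CC})?"
# Optional '^', then a standalone '-' or an item, more items, optional final '-'.
_CLASS_BODY = re.compile(rf"\^?(?:-|{_ITEM})(?:{_ITEM})*-?")

def class_body_ok(body: str) -> bool:
    """True if every standalone hyphen in body sits at the start
    (after an optional '^') or at the end of the character group."""
    return _CLASS_BODY.fullmatch(body) is not None

print(class_body_ok("a-z-"))     # True: trailing standalone hyphen
print(class_body_ok("-a-z"))     # True: leading standalone hyphen
print(class_body_ok("a-z-0-9"))  # False: hyphen between two ranges
```

Forms like `a-z-0-9` are accepted by some XSD 1.0 implementations and rejected by others, which is why restricting standalone hyphens to the edges removes the ambiguity.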

Specifying the syntax directly in the RFC, and the semantics by reference to a different specification that uses a different grammar, creates something of a disconnect. Essentially, I think you're expected first to check that the I-Regexp parses according to the RFC grammar, and then to reparse it according to the XSD grammar, which yields the constructs referenced in the XSD semantics. The XSD semantics uses terms like CharGroupPart that appear in the XSD grammar but not in the RFC grammar; it also uses terms like quantifier that appear in both grammars but with different definitions. I don't think there are any insuperable problems here, but it does make the detail quite hard to follow. Would it be better to lift the semantics out of XSD and into the RFC?

Michael Kay
Saxonica
-- 
JSONpath mailing list
JSONpath@ietf.org
https://www.ietf.org/mailman/listinfo/jsonpath