Re: [xml2rfc] Unable to run xml2rfc --help

Carsten Bormann <cabo@tzi.org> Tue, 29 December 2020 15:45 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <alpine.DEB.2.20.2012290940050.26613@maria.rogerprice.org>
Date: Tue, 29 Dec 2020 16:44:54 +0100
Cc: xml2rfc Mailing List <xml2rfc@ietf.org>
Message-Id: <E6768027-2AA0-46E2-8F24-61A799F7B963@tzi.org>
References: <alpine.DEB.2.20.2012290940050.26613@maria.rogerprice.org>
To: Roger Price <roger@rogerprice.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/wURuXVCySphLaK-mOsryP18BaWw>
Subject: Re: [xml2rfc] Unable to run xml2rfc --help

That appears to be a rather funny bug.
Apparently, the ASCII hyphen-minus ('-') is in the list of keys. Depending on how Python’s hash randomization is seeded, the keys come out in a different order, and that hyphen lands either in a position where the regexp compiler rejects it (it forms a reversed range with its neighbours) or in one where it is accepted. A true Heisenbug.
(The hyphen needs to be escaped, or placed first or last, before it goes into a character class; otherwise it is read as a range operator, even when the resulting character range happens to be syntactically acceptable.)
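
A minimal sketch of the effect, using three-key excerpts from the orders quoted below (one crashing run, one passing run). The helper names naive_class and safe_class are made up for illustration, and re.escape is only one possible way to protect the hyphen; this is not necessarily how xml2rfc should be or was fixed.

```python
import re

# Three-key excerpts around the '-' from the key orders quoted below.
crashing_keys = ['\u201d', '-', '\u201c']   # ” - “  (from a crashing run)
passing_keys = ['\u201a', '-', '\u201d']    # ‚ - ”  (from a passing run)


def naive_class(keys):
    """Build the character class the way unicode.py line 260 does."""
    return r'[%s]' % ''.join(keys)


def safe_class(keys):
    """Hypothetical variant: escape each key so '-' stays a literal."""
    return r'[%s]' % ''.join(re.escape(k) for k in keys)


# Crashing order: '”-“' is a reversed range, so compilation fails.
try:
    re.compile(naive_class(crashing_keys))
    crashed = False
except re.error:
    crashed = True

# Passing order: '‚-”' compiles, but as a range U+201A..U+201D.
naive_re = re.compile(naive_class(passing_keys))
naive_matches_hyphen = naive_re.search('-') is not None
naive_matches_stray = naive_re.search('\u201b') is not None  # not a key

# Escaped version: compiles in either order and matches the hyphen itself.
safe_re = re.compile(safe_class(crashing_keys))
safe_matches_hyphen = safe_re.search('-') is not None
safe_matches_stray = safe_re.search('\u201b') is not None

print(crashed, naive_matches_hyphen, naive_matches_stray)
print(safe_matches_hyphen, safe_matches_stray)
```

So even the runs that "succeed" are not quite right: the unescaped hyphen is swallowed as a range operator, and the class matches characters that were never in the key list.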

Sent from mobile, sorry for terse

> On 29. Dec 2020, at 11:22, Roger Price <roger@rogerprice.org> wrote:
> 
> On Mon, 28 Dec 2020, Carsten Bormann wrote:
> 
>>> On 2020-12-25, at 15:24, Roger Price <roger@rogerprice.org> wrote:
>>> python3.5
>> but this got my attention: Python 3.5 is end-of-life; did you try a newer Python?
> 
> I changed the xml2rfc shebang to #!/usr/bin/python3.8 and ran Python 3.8.1 but it made no difference.
> 
>> I’m out of educated guesses (at least until you are showing more of the traceback),
> 
> The full Python error message is:
> 
> Traceback (most recent call last):
>   File "/usr/bin/xml2rfc", line 7, in <module>
>     from xml2rfc.run import main
>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/__init__.py", line 14, in <module>
>     from xml2rfc.parser import  XmlRfcError, CachingResolver, XmlRfcParser, XmlRfc
>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/parser.py", line 20, in <module>
>     from xml2rfc.writers import base
>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/__init__.py", line 2, in <module>
>     from xml2rfc.writers.base import RfcWriterError
>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/base.py", line 30, in <module>
>     from xml2rfc.util.unicode import ( punctuation, unicode_replacements,
>     unicode_content_tags, bare_unicode_tags,
>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/util/unicode.py", line 260, in <module>
>     punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))
>   File "/usr/lib/python3.5/re.py", line 224, in compile
>     return _compile(pattern, flags)
>   File "/usr/lib/python3.5/re.py", line 293, in _compile
>     p = sre_compile.compile(pattern, flags)
>   File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
>     p = sre_parse.parse(p, flags)
>   File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
>     p = _parse_sub(source, pattern, 0)
>   File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
>     itemsappend(_parse(source, state))
>   File "/usr/lib/python3.5/sre_parse.py", line 575, in _parse
>     raise source.error(msg, len(this) + 1 + len(that))
> sre_constants.error: bad character range …-‚ at position 3
> 
> In file unicode.py at line 260
> 
> punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))
> 
> I added the two lines
> 
> import sys
> print("unicode.py: list(punctuation.keys()) = {}".format(list(punctuation.keys())))
> 
> On runs when Python crashes I see outputs:
> 
> unicode.py: list(punctuation.keys()) = ['−', '‐', '′', '\u2002', '’', '´',
>                 '\u2003', '‚', '–', '\u2009', '”', '-', '“', '‘', '…', '„', '—']
> unicode.py: list(punctuation.keys()) = ['„', '”', '—', '–', '…', '-', '\u2009',
>                 '‘', '’', '\u2003', '−', '´', '\u2002', '‚', '′', '‐', '“']
> 
> When the runs succeed I see outputs:
> 
> unicode.py: list(punctuation.keys()) = ['−', '\u2003', '\u2009', '‘', '“', '‚',
>                 '-', '”', '—', '’', '′', '\u2002', '‐', '–', '…', '´', '„']
> unicode.py: list(punctuation.keys()) = ['\u2002', '‐', '′', '\u2003', '−', '´',
>                 '“', '”', '„', '–', '-', '‘', '’', '…', '‚', '\u2009', '—']
> 
> I do not understand why such a list of constants has to be so random. I'm not sure where to look next, but if you would like to see further traces of other functions, just ask.
> 
> Roger