[xml2rfc] Non-determinism (Re: Unable to run xml2rfc --help)

Carsten Bormann <cabo@tzi.org> Mon, 04 January 2021 12:54 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <E6768027-2AA0-46E2-8F24-61A799F7B963@tzi.org>
Date: Mon, 04 Jan 2021 13:54:48 +0100
Message-Id: <E97EC791-6E3C-40A3-A4C1-E044DEE9E582@tzi.org>
References: <alpine.DEB.2.20.2012290940050.26613@maria.rogerprice.org> <E6768027-2AA0-46E2-8F24-61A799F7B963@tzi.org>
To: xml2rfc Mailing List <xml2rfc@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/Wwpqb9fg_RopzEcCs61T7ShvRg0>
Subject: [xml2rfc] Non-determinism (Re: Unable to run xml2rfc --help)

It might be a special need to actually compare generated HTML between different runs, but when testing an entire authoring chain, it is a bit unnerving to see changes like

-          <li class="toc ulEmpty compact" id="section-toc.1-1.12">
+          <li class="ulEmpty toc compact" id="section-toc.1-1.12">

between runs that are meant to produce identical output.
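When the only goal is to compare output across runs, one test-side workaround is to canonicalize the order of the tokens inside every class attribute before diffing. The order of class names has no effect on which CSS selectors match, so sorting them loses nothing for comparison purposes. This is a minimal sketch that assumes the generated HTML always writes the attribute as class="..." with double quotes. `normalize_classes` is a hypothetical helper and not part of xml2rfc:

```python
import re

# Matches class="..." attributes; assumes double-quoted values as in the diff above.
CLASS_ATTR = re.compile(r'class="([^"]*)"')


def normalize_classes(html: str) -> str:
    """Sort the whitespace-separated tokens of every class attribute."""
    return CLASS_ATTR.sub(
        lambda m: 'class="%s"' % " ".join(sorted(m.group(1).split())),
        html,
    )


if __name__ == "__main__":
    run1 = '<li class="toc ulEmpty compact" id="section-toc.1-1.12">'
    run2 = '<li class="ulEmpty toc compact" id="section-toc.1-1.12">'
    assert normalize_classes(run1) == normalize_classes(run2)
    print(normalize_classes(run1))  # <li class="compact toc ulEmpty" id="section-toc.1-1.12">
```

This only hides the symptom inside a test harness. The HTML that xml2rfc writes still varies from run to run.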

My Python-fu is not sufficient to suggest a way to keep the randomized Python behavior out of this output (replace dict by OrderedDict? But I thought insertion order had been the default for dict since Python 3.7). Could somebody else please step in?
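Some background that may help whoever picks this up. Since Python 3.3, hashes of str objects are randomized per interpreter process, controlled by the PYTHONHASHSEED environment variable, so iterating over a set of strings can yield a different order on every run. A plain dict iterates in insertion order, which the language guarantees since Python 3.7, so switching to OrderedDict should not change anything there. The sketch below assumes, without having checked the xml2rfc source, that the class list is assembled from a set somewhere. `SNIPPET`, `set_order` and `stable_classes` are hypothetical names:

```python
import os
import subprocess
import sys

# Hypothetical stand-in for however the class list is assembled; the token
# names are taken from the diff above.
SNIPPET = "print(' '.join({'toc', 'ulEmpty', 'compact'}))"


def set_order(seed: str) -> str:
    """Run SNIPPET in a fresh interpreter with a fixed PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def stable_classes(classes):
    """Deduplicate while keeping first-seen order, independent of the hash seed."""
    # dict.fromkeys keeps insertion order (guaranteed since Python 3.7).
    return " ".join(dict.fromkeys(classes))


if __name__ == "__main__":
    # Different seeds may print different orders for the same set.
    for seed in ("1", "2", "3", "4"):
        print(seed, set_order(seed))
    print(stable_classes(["toc", "ulEmpty", "compact", "toc"]))
```

Pinning PYTHONHASHSEED (for example to 0) in the test environment is a quick stopgap for reproducible runs. Deduplicating with dict.fromkeys, or sorting the tokens, removes the dependence on the seed altogether.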

Greetings, Carsten



> On 2020-12-29, at 16:44, Carsten Bormann <cabo@tzi.org> wrote:
> 
> That appears to be a rather funny bug.
> Apparently, the ASCII hyphen-minus is in the list of keys. Depending on how Python's string hash randomization is seeded, that hyphen ends up either in a position where it breaks the regexp or in one where it does not. True Heisenbug.
> (The hyphen needs to be escaped before it goes into a character class, even when the resulting character range happens to be syntactically valid: in that case the class silently matches the wrong characters.)
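The failure mode described above can be reproduced in isolation. This is a minimal sketch that assumes a three-key excerpt in the order implied by the error message (…, -, ‚) is representative. `naive_class` mirrors the construction at line 260 of unicode.py quoted in this thread. `escaped_class` is a hypothetical fix, not xml2rfc code: it passes every key through `re.escape()` and sorts the keys so the pattern no longer depends on dict order.

```python
import re


def naive_class(keys):
    # Same construction as unicode.py line 260: raw concatenation, so an
    # ASCII "-" between two other keys is parsed as a range operator.
    return re.compile(r"[%s]" % "".join(keys))


def escaped_class(keys):
    # re.escape() backslash-escapes "-" (as well as "]", "\\" and "^"),
    # so no accidental range can form; sorting makes the pattern identical
    # from run to run.
    return re.compile("[%s]" % "".join(re.escape(k) for k in sorted(keys)))


if __name__ == "__main__":
    keys = ["…", "-", "‚"]  # "…" (U+2026) is above "‚" (U+201A): invalid range
    try:
        naive_class(keys)
    except re.error as exc:
        print("naive:", exc)
    pattern = escaped_class(keys)
    print("escaped:", pattern.pattern, bool(pattern.search("-")))

    # A syntactically valid range is still wrong: it drops "-" itself and
    # pulls in U+201B, which is not one of the keys.
    valid = naive_class(["‚", "-", "”"])
    print(bool(valid.search("-")), bool(valid.search("\u201b")))
```

Placing the hyphen first or last in the class would also avoid the range, but escaping every key also guards against other metacharacters that might be added to the table later.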
> 
> Sent from mobile, sorry for being terse
> 
>> On 29. Dec 2020, at 11:22, Roger Price <roger@rogerprice.org> wrote:
>> 
>> On Mon, 28 Dec 2020, Carsten Bormann wrote:
>> 
>>> On 2020-12-25, at 15:24, Roger Price <roger@rogerprice.org> wrote:
>>>> python3.5
>>> but this got my attention: Python 3.5 is end-of-life; did you try a newer Python?
>> 
>> I changed the xml2rfc shebang to #!/usr/bin/python3.8 and ran Python 3.8.1 but it made no difference.
>> 
>>> I’m out of educated guesses (at least until you show more of the traceback),
>> 
>> The full Python error message is:
>> 
>> Traceback (most recent call last):
>>   File "/usr/bin/xml2rfc", line 7, in <module>
>>     from xml2rfc.run import main
>>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/__init__.py", line 14, in <module>
>>     from xml2rfc.parser import  XmlRfcError, CachingResolver, XmlRfcParser, XmlRfc
>>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/parser.py", line 20, in <module>
>>     from xml2rfc.writers import base
>>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/__init__.py", line 2, in <module>
>>     from xml2rfc.writers.base import RfcWriterError
>>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/base.py", line 30, in <module>
>>     from xml2rfc.util.unicode import ( punctuation, unicode_replacements, unicode_content_tags, bare_unicode_tags,
>>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/util/unicode.py", line 260, in <module>
>>     punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))
>>   File "/usr/lib/python3.5/re.py", line 224, in compile
>>     return _compile(pattern, flags)
>>   File "/usr/lib/python3.5/re.py", line 293, in _compile
>>     p = sre_compile.compile(pattern, flags)
>>   File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
>>     p = sre_parse.parse(p, flags)
>>   File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
>>     p = _parse_sub(source, pattern, 0)
>>   File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
>>     itemsappend(_parse(source, state))
>>   File "/usr/lib/python3.5/sre_parse.py", line 575, in _parse
>>     raise source.error(msg, len(this) + 1 + len(that))
>> sre_constants.error: bad character range …-‚ at position 3
>> 
>> In file unicode.py at line 260
>> 
>> punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))
>> 
>> I added the two lines
>> 
>> import sys
>> print("unicode.py: list(punctuation.keys()) = {}".format(list(punctuation.keys())))
>> 
>> On runs where Python crashes, I see outputs such as:
>> 
>> unicode.py: list(punctuation.keys()) = ['−', '‐', '′', '\u2002', '’', '´',
>>                 '\u2003', '‚', '–', '\u2009', '”', '-', '“', '‘', '…', '„', '—']
>> unicode.py: list(punctuation.keys()) = ['„', '”', '—', '–', '…', '-', '\u2009',
>>                 '‘', '’', '\u2003', '−', '´', '\u2002', '‚', '′', '‐', '“']
>> 
>> When the runs succeed, I see outputs such as:
>> 
>> unicode.py: list(punctuation.keys()) = ['−', '\u2003', '\u2009', '‘', '“', '‚',
>>                 '-', '”', '—', '’', '′', '\u2002', '‐', '–', '…', '´', '„']
>> unicode.py: list(punctuation.keys()) = ['\u2002', '‐', '′', '\u2003', '−', '´',
>>                 '“', '”', '„', '–', '-', '‘', '’', '…', '‚', '\u2009', '—']
>> 
>> I do not understand why the order of such a list of constants has to be so random. I'm not sure where to look next, but if you would like to see further traces of other functions, just ask.
>>
>> Roger
>> _______________________________________________
>> xml2rfc mailing list
>> xml2rfc@ietf.org
>> https://www.ietf.org/mailman/listinfo/xml2rfc
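Roger's traceback shows Python 3.5, where dict iteration order depends on the string hashes, which are randomized per process. That would explain why the key order, and with it the outcome, changes between runs. The minimal sketch below rebuilds the pattern the way unicode.py line 260 does for each of the four orders Roger reported. For each order it checks whether the pattern compiles and whether it matches the ASCII hyphen. `classify` is a hypothetical helper:

```python
import re

# Key orders copied from Roger's output above.
CRASHED = [
    ['−', '‐', '′', '\u2002', '’', '´', '\u2003', '‚', '–', '\u2009',
     '”', '-', '“', '‘', '…', '„', '—'],
    ['„', '”', '—', '–', '…', '-', '\u2009', '‘', '’', '\u2003', '−',
     '´', '\u2002', '‚', '′', '‐', '“'],
]
SUCCEEDED = [
    ['−', '\u2003', '\u2009', '‘', '“', '‚', '-', '”', '—', '’', '′',
     '\u2002', '‐', '–', '…', '´', '„'],
    ['\u2002', '‐', '′', '\u2003', '−', '´', '“', '”', '„', '–', '-',
     '‘', '’', '…', '‚', '\u2009', '—'],
]


def classify(keys):
    """Return (compiles, matches_ascii_hyphen) for the unicode.py-style pattern."""
    try:
        pattern = re.compile(r"[%s]" % "".join(keys))
    except re.error:
        return (False, False)
    return (True, pattern.search("-") is not None)


if __name__ == "__main__":
    for keys in CRASHED + SUCCEEDED:
        i = keys.index("-")
        print(classify(keys), "hyphen between %r and %r" % (keys[i - 1], keys[i + 1]))
```

In the two crashing orders, the hyphen sits between a higher and a lower code point (” then “, and … then U+2009), which is an invalid range. The two "successful" orders form valid ranges (‚ to ” and – to ‘). Those patterns compile, but they still miss the ASCII hyphen and match characters that are not keys, such as U+201B.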