[xml2rfc] Non-determinism (Re: Unable to run xml2rfc --help)
Carsten Bormann <cabo@tzi.org> Mon, 04 January 2021 12:54 UTC
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <E6768027-2AA0-46E2-8F24-61A799F7B963@tzi.org>
Date: Mon, 04 Jan 2021 13:54:48 +0100
Message-Id: <E97EC791-6E3C-40A3-A4C1-E044DEE9E582@tzi.org>
References: <alpine.DEB.2.20.2012290940050.26613@maria.rogerprice.org> <E6768027-2AA0-46E2-8F24-61A799F7B963@tzi.org>
To: xml2rfc Mailing List <xml2rfc@ietf.org>
X-Mailer: Apple Mail (2.3608.120.23.2.4)
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/Wwpqb9fg_RopzEcCs61T7ShvRg0>
Subject: [xml2rfc] Non-determinism (Re: Unable to run xml2rfc --help)
Comparing generated HTML between different runs may be a special need, but when testing an entire authoring chain, it is a bit unnerving to see changes like

- <li class="toc ulEmpty compact" id="section-toc.1-1.12">
+ <li class="ulEmpty toc compact" id="section-toc.1-1.12">

between runs that are meant to produce identical output.

My python-fu is not sufficient to suggest a way to hide the randomized Python behavior from this output (replace Dictionary by OrderedDict? But I thought that was the default now since Python 3.7); could somebody else please step in?

Grüße, Carsten

> On 2020-12-29, at 16:44, Carsten Bormann <cabo@tzi.org> wrote:
>
> That appears to be a rather funny bug.
> Apparently, the ASCII neutral minus hyphen is in the list of keys. Depending on how the Python hashing random number generator is seeded, that hyphen is going to a position where it can’t be in the regexp or where it can. True Heisenbug.
> (The hyphen needs to be protected to go into a character class, even when the resulting character range seems to be syntactically acceptable.)
>
> Sent from mobile, sorry for terse
>
>> On 29. Dec 2020, at 11:22, Roger Price <roger@rogerprice.org> wrote:
>>
>> On Mon, 28 Dec 2020, Carsten Bormann wrote:
>>
>>> On 2020-12-25, at 15:24, Roger Price <roger@rogerprice.org> wrote:
>>>> python3.5
>>> but this got my attention: Python 3.5 is end-of-life; did you try a newer Python?
>>
>> I changed the xml2rfc shebang to #!/usr/bin/python3.8 and ran Python 3.8.1 but it made no difference.
>>
>>> I’m out of educated guesses (at least until you are showing more of the traceback),
>>
>> The full Python error message is:
>>
>> Traceback (most recent call last):
>>   File "/usr/bin/xml2rfc", line 7, in <module>
>>     from xml2rfc.run import main
>>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/__init__.py", line 14, in <module>
>>     from xml2rfc.parser import XmlRfcError, CachingResolver, XmlRfcParser, XmlRfc
>>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/parser.py", line 20, in <module>
>>     from xml2rfc.writers import base
>>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/__init__.py", line 2, in <module>
>>     from xml2rfc.writers.base import RfcWriterError
>>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/base.py", line 30, in <module>
>>     from xml2rfc.util.unicode import ( punctuation, unicode_replacements, unicode_content_tags, bare_unicode_tags,
>>   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/util/unicode.py", line 260, in <module>
>>     punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))
>>   File "/usr/lib/python3.5/re.py", line 224, in compile
>>     return _compile(pattern, flags)
>>   File "/usr/lib/python3.5/re.py", line 293, in _compile
>>     p = sre_compile.compile(pattern, flags)
>>   File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
>>     p = sre_parse.parse(p, flags)
>>   File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
>>     p = _parse_sub(source, pattern, 0)
>>   File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
>>     itemsappend(_parse(source, state))
>>   File "/usr/lib/python3.5/sre_parse.py", line 575, in _parse
>>     raise source.error(msg, len(this) + 1 + len(that))
>> sre_constants.error: bad character range …-‚ at position 3
>>
>> In file unicode.py at line 260
>>
>>   punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))
>>
>> I added the two lines
>>
>>   import sys
>>   print("unicode.py: list(punctuation.keys())) = {}".format(list(punctuation.keys())))
>>
>> On runs when Python crashes I see outputs:
>>
>> unicode.py: list(punctuation.keys()) = ['−', '‐', '′', '\u2002', '’', '´',
>> '\u2003', '‚', '–', '\u2009', '”', '-', '“', '‘', '…', '„', '—']
>> unicode.py: list(punctuation.keys()) = ['„', '”', '—', '–', '…', '-', '\u2009',
>> '‘', '’', '\u2003', '−', '´', '\u2002', '‚', '′', '‐', '“']
>>
>> When the runs succeed I see outputs:
>>
>> unicode.py: list(punctuation.keys()) = ['−', '\u2003', '\u2009', '‘', '“', '‚',
>> '-', '”', '—', '’', '′', '\u2002', '‐', '–', '…', '´', '„']
>> unicode.py: list(punctuation.keys()) = ['\u2002', '‐', '′', '\u2003', '−', '´',
>> '“', '”', '„', '–', '-', '‘', '’', '…', '‚', '\u2009', '—']
>>
>> I do not understand why such a list of constants has to be so random. I'm not sure where to look next, but if you would like to see further traces of other functions, just ask.
>>
>> Roger
>> _______________________________________________
>> xml2rfc mailing list
>> xml2rfc@ietf.org
>> https://www.ietf.org/mailman/listinfo/xml2rfc
> _______________________________________________
> xml2rfc mailing list
> xml2rfc@ietf.org
> https://www.ietf.org/mailman/listinfo/xml2rfc