[OPSEC] [IETF Successes and Failures] #3 (component1): xml2rfc: hyphen not escaped in unicode.py

opsec issue tracker <trac@tools.ietf.org> Mon, 04 January 2021 12:00 UTC


#3: xml2rfc: hyphen not escaped in unicode.py

 Debian Stretch, uname -a reports:
 Linux maria 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23)
 x86_64 GNU/Linux
 Command python3 reports:
 Python 3.8.1 (default, Feb 22 2020, 11:56:23) [GCC 6.3.0 20170516] on
 linux

 When I run the command "xml2rfc -h" repeatedly, it fails about half the
 time with this error message:

 Traceback (most recent call last):
   File "/usr/bin/xml2rfc", line 7, in <module>
     from xml2rfc.run import main
   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/__init__.py", line 14, in <module>
     from xml2rfc.parser import  XmlRfcError, CachingResolver, XmlRfcParser, XmlRfc
   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/parser.py", line 20, in <module>
     from xml2rfc.writers import base
   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/__init__.py", line 2, in <module>
     from xml2rfc.writers.base import RfcWriterError
   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/base.py", line 30, in <module>
     from xml2rfc.util.unicode import ( punctuation, unicode_replacements, unicode_content_tags, bare_unicode_tags,
   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/util/unicode.py", line 260, in <module>
     punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))
   File "/usr/lib/python3.5/re.py", line 224, in compile
     return _compile(pattern, flags)
   File "/usr/lib/python3.5/re.py", line 293, in _compile
     p = sre_compile.compile(pattern, flags)
   File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
     p = sre_parse.parse(p, flags)
   File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
     p = _parse_sub(source, pattern, 0)
   File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
     itemsappend(_parse(source, state))
   File "/usr/lib/python3.5/sre_parse.py", line 575, in _parse
     raise source.error(msg, len(this) + 1 + len(that))
 sre_constants.error: bad character range −-“ at position 3

 At line 260 in .../xml2rfc/util/unicode.py I inserted two lines to display
 the value of punctuation.keys():

 259-punctuation.update(unicode_quote_replacements)
 260-import sys
 261-print("unicode.py: list(punctuation.keys()) {}".format(list(punctuation.keys())),file=sys.stderr)
 262-punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))

 When xml2rfc succeeded, I saw:

 unicode.py: list(punctuation.keys()) = ['\u2002', '-', '‐', '′', '–', '´',
 '…', '’',
 '−', '„', '—', '\u2009', '‚', '‘', '”', '“', '\u2003']
 unicode.py: list(punctuation.keys()) = ['´', '„', '\u2003', '‚', '−', '“',
 '’', '‘', '-', '…',
 '\u2009', '—', '–', '”', '′', '‐', '\u2002']

 When xml2rfc failed, I saw:

 unicode.py: list(punctuation.keys()) = ['´', '\u2002', '−', '-', '“', '„',
 '‘', '′',
 '‚', '–', '…', '’', '‐', '”', '—', '\u2009', '\u2003']
 unicode.py: list(punctuation.keys()) = ['‐', '\u2003', '„', '\u2002',
 '\u2009', '‚',
 '—', '’', '−', '…', '‘', '′', '-', '“', '”', '´', '–']
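
 The difference between the runs can be reproduced in isolation. In both
 failing lists the "-" sits between two keys whose code points are in
 descending order (U+2212 before U+201C, and U+2032 before U+201C), which
 is an invalid range. In both succeeding lists the neighbours happen to be
 in ascending order, so the pattern compiles but silently covers a whole
 range of characters. A likely cause of the run-to-run variation is that
 Python 3.5 dicts are unordered and string hashing is randomised per
 process. The sketch below uses only a few keys copied from the lists
 above; try_compile is a hypothetical helper, not part of xml2rfc.

```python
import re

def try_compile(keys):
    """Join keys into a character class the way line 260 does; report the outcome."""
    pattern = r'[%s]' % ''.join(keys)
    try:
        re.compile(pattern)
        return "ok"
    except re.error as exc:
        return "error: %s" % exc

# Start of the first failing list: U+2212, '-', U+201C (descending range).
failing = ['\u00b4', '\u2002', '\u2212', '-', '\u201c']
# Start of the first succeeding list: U+2002, '-', U+2010 (ascending range).
passing = ['\u2002', '-', '\u2010', '\u2032']

# ascii() keeps the output printable on any terminal encoding.
print(ascii(try_compile(failing)))
print(ascii(try_compile(passing)))

# The "successful" pattern contains an accidental range U+2002..U+2010, so it
# also matches U+2005, which is not one of the keys.
accidental = re.compile(r'[%s]' % ''.join(passing))
print(accidental.match('\u2005') is not None)
```

 So even the runs that appear to succeed are probably matching more
 characters than intended.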

 It looks as if the character "-" is being interpreted by re as a range
 indicator inside the [...] character class whenever it does not happen to
 come first or last among the keys.
 Perhaps it should be escaped.
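
 A minimal sketch of the suggested fix, assuming it is acceptable to pass
 every key through re.escape before joining. build_punctuation_re is a
 hypothetical name used for illustration, and the key list is only the
 start of the first failing list, not the full dict from unicode.py.

```python
import re

def build_punctuation_re(keys):
    """Build the character class with every key escaped, so '-' is always literal."""
    return re.compile(r'[%s]' % ''.join(re.escape(k) for k in keys))

# Same key order that raised "bad character range" in the failing run.
failing_order = ['\u00b4', '\u2002', '\u2212', '-', '\u201c']

fixed = build_punctuation_re(failing_order)

# Both the MINUS SIGN (U+2212) and the plain hyphen are matched as literals.
print(fixed.sub('?', 'a\u2212b-c'))  # prints a?b?c

# No accidental range: U+2005 is not a key and is no longer matched.
print(fixed.match('\u2005') is None)
```

 Escaping makes the result independent of key order. Placing "-" first or
 last in the class would also work, but only if the ordering were under the
 code's control.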

 My apologies for the wretched formatting of this message.
 Roger

-- 
----------------------------------+----------------------
 Reporter:  roger@rogerprice.org  |      Owner:  somebody
     Type:  defect                |     Status:  new
 Priority:  major                 |  Milestone:
Component:  component1            |    Version:
 Keywords:  re escape hyphen      |
----------------------------------+----------------------

Ticket URL: <https://trac.tools.ietf.org/misc/outcomes/ticket/3>
IETF Successes and Failures <http://tools.ietf.org/misc/outcomes/>