[xml2rfc] xml2rfc --v3 --text fails to wrap table of contents for sections with long-enough titles

Daniel Kahn Gillmor <dkg@fifthhorseman.net> Fri, 16 April 2021 16:47 UTC

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: xml2rfc@ietf.org
Date: Fri, 16 Apr 2021 12:46:53 -0400
Message-ID: <87sg3q5g8y.fsf@fifthhorseman.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/aTFKSsaIwTEE8b8Rs2Egvzwm1aQ>

This is a bug report for xml2rfc's text output.  I ran into it with a
real draft (which had a legitimately long section title) but i've
narrowed it down to a simple reproducer here.

Attached is a sample foo.xml draft.

When i ask xml2rfc to generate a text version, it builds a corrupted
table of contents:

$ xml2rfc --v3 --text foo.xml --out foo.txt
(No source line available): Warning: Too long line found (L64), 80 characters longer than 72 characters: 
           Ipsum Lorem Ipsum Lorem Ipsum Lorem Ipsum Lorem Ipsum Lo  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   2
 Created file foo.txt
$ cat foo.txt
[…]
Table of Contents

   1.  Lorem Ipsum Lorem Ipsum Lorem Ipsum Lorem Ipsum Lorem Lorem
           Ipsum Lorem Ipsum Lorem Ipsum Lorem Ipsum Lorem Ipsum Lo  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .   2
   Author's Address  . . . . . . . . . . . . . . . . . . . . . . . .   2

1.  Lorem Ipsum Lorem Ipsum Lorem Ipsum Lorem Ipsum Lorem Lorem Ipsum
    Lorem Ipsum Lorem Ipsum Lorem Ipsum Lorem Ipsum Lo

   Nothing to see here.
$
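A quick way to confirm which lines of the generated file overflow is to
measure them directly. This is only a sketch: `long_lines` is a
hypothetical helper, not part of xml2rfc, and the 72-column limit is
taken from the warning message above.

```python
def long_lines(text, limit=72):
    """Return (line_number, length) for every line longer than `limit`.

    Hypothetical helper for inspecting generated text output;
    line numbers are 1-based.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


if __name__ == "__main__":
    # Assumes foo.txt is the file produced by the command above.
    with open("foo.txt", encoding="utf-8") as handle:
        for number, length in long_lines(handle.read()):
            print(f"line {number}: {length} characters")
```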

it looks like some sort of line-wrapping miscalculation when
generating the text output for section titles that are particularly
long (if i add a few extra characters to the section title, or remove a
few, the table of contents wraps correctly).
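For comparison, here is a minimal sketch of the behaviour i would
expect: wrap the title so that every line leaves room for the dot
leader and the page number, then pad only the last line out to 72
columns. This is not xml2rfc's actual code; `toc_entry`, its
parameters, and the exact leader layout are assumptions made for
illustration.

```python
import textwrap

WIDTH = 72  # assumed maximum line width for text output


def toc_entry(number, title, page, indent=3, width=WIDTH):
    """Format one table-of-contents entry, wrapping long titles.

    Hypothetical helper, not taken from xml2rfc.
    """
    prefix = " " * indent + number + "  "
    hang = " " * len(prefix)          # continuation lines align with the title
    tail = str(page).rjust(4)         # page number, right-aligned
    # Reserve space on every line for: two spaces, a leader of at
    # least three columns, and the page number.
    avail = width - len(tail) - 2 - 3
    lines = textwrap.wrap(
        title, width=avail, initial_indent=prefix, subsequent_indent=hang
    ) or [prefix.rstrip()]
    last = lines[-1]
    start = len(last) + 2
    end = width - len(tail)
    # Put dots on odd columns so leaders line up across entries.
    leader = "".join("." if col % 2 else " " for col in range(start, end))
    lines[-1] = last + "  " + leader + tail
    return "\n".join(lines)


if __name__ == "__main__":
    title = " ".join(["Lorem Ipsum"] * 4 + ["Lorem"] + ["Lorem Ipsum"] * 5 + ["Lo"])
    print(toc_entry("1.", title, 2))
```

Because the space for the leader is reserved before wrapping, the last
line can never exceed the width, whatever the title length. The cost is
that earlier lines of a wrapped title are a little shorter than they
strictly need to be.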

   --dkg