Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

Carsten Bormann <cabo@tzi.org> Mon, 01 March 2021 10:58 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <45ca32a4-65df-7eea-84f0-b5451698a27b@gmx.de>
Date: Mon, 01 Mar 2021 11:58:01 +0100
Cc: xml2rfc@ietf.org
Message-Id: <D3D8A513-87A6-4A74-97CE-C3FA8DC36318@tzi.org>
References: <20210227191644.165F76F105E2@ary.qy> <28B528D6-7CBA-4735-A5EE-C7061D1C1D0C@tzi.org> <3dc1abe5-24bf-3b12-7b58-d06c7cde428e@taugh.com> <BBA9B16E-5B06-419D-9ABE-BFB7E69B54C9@tzi.org> <6603926-561f-c9b8-2612-2afb9847b71@taugh.com> <20210228173825.GE30153@localhost> <14ad2b3e-852a-28b1-27ae-5e25ec7823bc@taugh.com> <a7734631-a4f3-cee1-1ee7-e9e0bd3d534a@gmail.com> <d96fc964-f367-dc8f-bdf3-a76b90abd042@alum.mit.edu> <26DCBA0D-AA14-461F-9992-CC631774877E@tzi.org> <45ca32a4-65df-7eea-84f0-b5451698a27b@gmx.de>
To: Julian Reschke <julian.reschke@gmx.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/8dM-GCR5dgTzqR4xfy3U5RMris4>
Subject: Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

On 2021-03-01, at 09:33, Julian Reschke <julian.reschke@gmx.de> wrote:
> 
>> Double spaces in XML input are copied verbatim into the HTML (where they then are swallowed by the HTML processor), so it is not like the processor is not seeing them.
> 
> But do they survive the preptool step?

Sure.

A quick check shows that https://www.rfc-editor.org/rfc/rfc8949.xml has about 318 in-line sentence end marks, and 101 end-of-line ones (this may or may not count ones at the end of paragraphs).
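
For anyone who wants to repeat this kind of count on their own XML, here is a minimal sketch. It is not the check that produced the numbers above. It assumes that an "in-line sentence end mark" means `.`, `!` or `?` followed by two or more spaces and then more text, and that an "end-of-line" one means the same punctuation followed only by optional whitespace and a newline. The function name and the sample string are made up for illustration.

```python
import re

# Assumption: in-line sentence end = punctuation, two or more spaces, then more text.
INLINE_END = re.compile(r"[.!?] {2,}(?=\S)")
# Assumption: end-of-line sentence end = punctuation, optional trailing blanks, newline.
EOL_END = re.compile(r"[.!?][ \t]*\r?\n")


def count_sentence_ends(text):
    """Return (in-line count, end-of-line count) for the given XML source text."""
    return len(INLINE_END.findall(text)), len(EOL_END.findall(text))


# Hypothetical sample; "e.g. this" has a single space and is not counted.
sample = "<t>First sentence.  Second sentence.\nThird one, e.g. this.  Fourth.</t>\n"
inline_count, eol_count = count_sentence_ends(sample)
print(inline_count, eol_count)
```

A period directly before a closing tag (as in `Fourth.</t>`) is counted by neither pattern, which is one reason such a count may or may not include sentence ends at the end of paragraphs.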

> If so, that would IMHO be a bug.

Can’t speak to that.
I don’t think there is a canonical answer to that question in the XML community.
(This has been technical debt for a third of a century.)

Grüße, Carsten