Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

Julian Reschke <> Sat, 27 February 2021 16:35 UTC

On 27.02.2021 at 17:09, Nico Williams wrote:
> On Fri, Feb 26, 2021 at 09:44:09PM -0500, Daniel Kahn Gillmor wrote:
>> The toolchain to build draft-ietf-openpgp-crypto-refresh produces XML
>> that contains:
> There are only two possible correct answers, but historically it is
> impossible to get agreement on adopting either:

At least three :-).

>   - mark-up sentences, e.g.,
>       <sentence>D. K. G. wrote a post.</sentence>
>       <sentence>This follow-up might be controversial.</sentence>

I'd be surprised if people would be willing to do this.

>   - follow sentence-ending periods with two spaces (which does not mean
>     that the rendered output must also do the same, as it could use a wide
>     space instead), e.g.,
>       D. K. G. wrote a post.  This follow-up might be controversial.
>     i.e., `.  ` as a sort of mark-up

Tricky, because it would change the whitespace handling inside <t>
(where currently multiple white space characters are always equivalent
to a single space).
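
A minimal sketch of why this is tricky, assuming (as described above) that
any run of whitespace inside <t> is treated as a single space. The helper
name `normalize_t_text` is hypothetical and is not xml2rfc's actual code:

```python
import re

def normalize_t_text(text: str) -> str:
    """Collapse every run of whitespace to one space (hypothetical helper)."""
    return re.sub(r"\s+", " ", text)

# Input using the proposed ".  " (period + two spaces) sentence marker.
marked = "D. K. G. wrote a post.  This follow-up might be controversial."
collapsed = normalize_t_text(marked)

# After normalization the sentence-ending period looks exactly like the
# periods after the initials: each is followed by a single space.
print(collapsed)
print("  " in marked, "  " in collapsed)
```

Under that assumption the two-space marker does not survive normalization,
so honoring it would mean changing how whitespace in <t> is processed.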

> Instead many developers prefer to code up imperfect heuristics for
> sentence ending periods.  If you search relevant archives (e.g., this
> list's) you'll find that this is a periodic discussion.
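
For illustration, here is a sketch of one such imperfect heuristic: treat a
period followed by whitespace and an uppercase letter as a sentence end. The
function `naive_sentences` is a made-up example, not what xml2rfc or any
other tool actually does:

```python
import re

def naive_sentences(text: str) -> list[str]:
    """Split after a period that is followed by whitespace and a capital.

    Hypothetical heuristic for illustration only.
    """
    return re.split(r"(?<=\.)\s+(?=[A-Z])", text)

text = "D. K. G. wrote a post. This follow-up might be controversial."
parts = naive_sentences(text)

# The text has two sentences, but the initials are also split apart.
for part in parts:
    print(repr(part))
```

The heuristic finds four "sentences" here instead of two, because the
periods after the initials match the same pattern as a real sentence end.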


> I myself am quite used to always following every sentence period with
> two spaces.  (In smart phone text input boxes that's a huge pain
> because, at least on mine, they turn two spaces typed in quick succession
> into a period and space.)
> Don your asbestos suits now.  Flame war incoming.

The third answer is: stop trying. Optimizing the plain-text output
format really is not important.

Best regards, Julian