Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

Julian Reschke <> Sat, 27 February 2021 17:41 UTC

To: Nico Williams <>
From: Julian Reschke <>
Date: Sat, 27 Feb 2021 18:40:53 +0100

Am 27.02.2021 um 18:30 schrieb Nico Williams:
> On Sat, Feb 27, 2021 at 05:35:02PM +0100, Julian Reschke wrote:
>> Am 27.02.2021 um 17:09 schrieb Nico Williams:
>>> On Fri, Feb 26, 2021 at 09:44:09PM -0500, Daniel Kahn Gillmor wrote:
>>>> The toolchain to build draft-ietf-openpgp-crypto-refresh produces XML
>>>> that contains:
>>> There are only two possible correct answers, but historically it is
>>> impossible to get agreement on adopting either:
>> At least three :-).
> Your third solution is not a solution.  It says stop trying to have
> rendered text distinguish between sentence-ending periods and not.  That
> is just unacceptable.
> Ah, there is a third solution, duh: use a Unicode wide space after
> sentence-ending periods (&emsp;).  It... might be just slightly less
> unpleasant than <sentence>:
>    Here's an example.&emsp; And here's the next sentence.&emsp; I had to
>    follow &amp;emsp; with a space because running on two sentences is
>    annoying.
>>>    - mark-up sentences, e.g.,
>>>        <sentence>D. K. G. wrote a post.</sentence>
>>>        <sentence>This follow-up might be controversial.</sentence>
>> I'd be surprised if people would be willing to do this.
> Right, no one would or should want that.
>>>    - follow sentence-ending periods with two spaces (which does not mean
>>>      the rendered output must also do the same, as it could use a wide
>>>      space instead), e.g.,
>>>        D. K. G. wrote a post.  This follow-up might be controversial.
>>>      i.e., `.  ` as a sort of mark-up
>> Tricky, because it would change the whitespace handling inside <t>
>> (where currently multiple white space characters are always equivalent
>> to a single space).
> Is that XML or xml2rfc forcing that?

People rely on whitespace being insignificant inside <t> and similar
elements, for instance for indentation.

So we would need an algorithm that can distinguish between what's
intentional and what is not.
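
For illustration only, here is a minimal Python sketch of the problem (it is not xml2rfc's actual code). The first function collapses every run of XML whitespace to a single space, which is how <t> content is treated today. The second is a hypothetical variant that treats "period followed by two or more spaces on the same line" as a sentence boundary; both function names and the NUL placeholder are assumptions invented for this example.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Collapse every run of XML whitespace (space, tab, CR, LF) to one space."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip()

def collapse_keeping_double_space(text: str) -> str:
    """Hypothetical rule: a period followed by two or more spaces/tabs on the
    same line marks a sentence end; all other whitespace collapses as usual."""
    # Protect ".  " with a NUL placeholder (assumed never to occur in input).
    marked = re.sub(r"\.[ \t]{2,}", ".\x00", text)
    collapsed = collapse_whitespace(marked)
    # If the line ended right after the marker, one collapsed space follows it;
    # fold that space into the two-space sentence separator.
    return re.sub(r"\x00 ?", "  ", collapsed).rstrip()

text = "D. K. G. wrote a post.  This follow-up\n   might be controversial. See\n   Section 3."
print(collapse_whitespace(text))
# -> D. K. G. wrote a post. This follow-up might be controversial. See Section 3.
print(collapse_keeping_double_space(text))
# -> D. K. G. wrote a post.  This follow-up might be controversial. See Section 3.
```

Note that a period followed by a line break and indentation still collapses to one space, so the sketch cannot tell whether such a period ends a sentence, and text written with single spaces after sentences ("controversial. See") gets no boundary at all.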

>>> Instead many developers prefer to code up imperfect heuristics for
>>> sentence ending periods.  If you search relevant archives (e.g., this
>>> list's) you'll find that this is a periodic discussion.
>> Yes.
> Yuck.
>>> Don your asbestos suits now.  Flame war incoming.
>> The third answer is: stop trying to. Optimizing the plain text output
>> format really really is not important.
> Not acceptable.
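
The "imperfect heuristics" mentioned above tend to look something like the following minimal Python sketch. It is a hypothetical example, not code from xml2rfc or any other tool; the abbreviation list and the "single capital letter is an initial" rule are assumptions chosen to show where such heuristics break.

```python
import re

# Hypothetical, deliberately short abbreviation list; real tools need far more.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "cf.", "vs.", "Dr.", "Mr.", "Ms.", "Fig.", "Sec."}

def is_sentence_end(words: list, i: int) -> bool:
    """Guess whether the period ending words[i] ends a sentence."""
    word = words[i]
    if not word.endswith(".") or word in ABBREVIATIONS:
        return False
    # A single capital letter plus period is assumed to be an initial ("D.").
    if re.fullmatch(r"[A-Z]\.", word):
        return False
    # Otherwise, guess "sentence end" if the next word starts with a capital.
    return i + 1 < len(words) and words[i + 1][:1].isupper()

def mark_sentence_ends(text: str) -> str:
    """Re-join words, using two spaces after guessed sentence ends, one elsewhere."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        out.append(word)
        if i + 1 < len(words):
            out.append("  " if is_sentence_end(words, i) else " ")
    return "".join(out)

print(mark_sentence_ends("D. K. G. wrote a post. This follow-up might be controversial."))
# -> D. K. G. wrote a post.  This follow-up might be controversial.
# Missed sentence end ("B." looks like an initial):
print(mark_sentence_ends("We picked plan B. It works."))
# -> We picked plan B. It works.
# False sentence end ("U.S." is not in the list):
print(mark_sentence_ends("Ask the U.S. Government."))
# -> Ask the U.S.  Government.
```

Any fixed list of abbreviations and patterns will have misses and false positives like the last two.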

I just opened a random ancient RFC I'm familiar with (2068), and it
doesn't have two spaces after sentence-ending periods. At least not consistently.
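
A rough count is enough to check an RFC's plain text for this yourself. This minimal Python sketch is only an approximation: it looks at periods followed by spaces and then an uppercase letter on the same line, so initials and abbreviations are counted too, and periods at the end of a line are ignored.

```python
import re
import sys

def count_sentence_spacing(text: str) -> dict:
    """Rough count of periods followed by exactly one space vs. two or more
    spaces, then an uppercase letter, on the same line."""
    single = len(re.findall(r"\. (?=[A-Z])", text))
    double = len(re.findall(r"\. {2,}(?=[A-Z])", text))
    return {"single": single, "double": double}

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_spacing.py rfc2068.txt  (script and file names are examples)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(count_sentence_spacing(f.read()))
```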

So when exactly did this become a "requirement"? And who determines what is
"acceptable" for plain text output?

Best regards, Julian