Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

Julian Reschke <julian.reschke@gmx.de> Sat, 27 February 2021 17:41 UTC

To: Nico Williams <nico@cryptonector.com>
Cc: xml2rfc@ietf.org
References: <87wnuucjra.fsf@fifthhorseman.net> <20210227160926.GA30153@localhost> <3494e8d8-e9bd-6c38-61f3-0c31d066a61f@gmx.de> <20210227173003.GB30153@localhost>
From: Julian Reschke <julian.reschke@gmx.de>
Message-ID: <11d84e59-6dd9-ade8-7b07-afb34fae27e5@gmx.de>
Date: Sat, 27 Feb 2021 18:40:53 +0100
In-Reply-To: <20210227173003.GB30153@localhost>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/nzp1vlF0E40MMmUrrc52MAlXQIg>

On 27.02.2021 at 18:30, Nico Williams wrote:
> On Sat, Feb 27, 2021 at 05:35:02PM +0100, Julian Reschke wrote:
>> On 27.02.2021 at 17:09, Nico Williams wrote:
>>> On Fri, Feb 26, 2021 at 09:44:09PM -0500, Daniel Kahn Gillmor wrote:
>>>> The toolchain to build draft-ietf-openpgp-crypto-refresh produces XML
>>>> that contains:
>>>
>>> There are only two possible correct answers, but historically it is
>>> impossible to get agreement on adopting either:
>>
>> At least three :-).
>
> Your third solution is not a solution.  It says stop trying to have
> rendered text distinguish between sentence-ending periods and not.  That
> is just unacceptable.
>
> Ah, there is a third solution, duh: use a Unicode wide space after
> sentence-ending periods (&emsp;).  It... might be just slightly less
> unpleasant than <sentence>:
>
>    Here's an example.&emsp; And here's the next sentence.&emsp; I had to
>    follow &amp;emsp; with a space because running on two sentences is
>    annoying.
>
>>>    - mark-up sentences, e.g.,
>>>
>>>        <sentence>D. K. G. wrote a post.</sentence>
>>>        <sentence>This follow-up might be controversial.</sentence>
>>
>> I'd be surprised if people would be willing to do this.
>
> Right, no one would or should want that.
>
>>>    - follow sentence-ending periods with two spaces (which does not mean
>>>      the rendered output must also do the same, as it could use a wide
>>>      space instead), e.g.,
>>>
>>>        D. K. G. wrote a post.  This follow-up might be controversial.
>>>
>>>      i.e., `.  ` as a sort of mark-up
>>
>> Tricky, because it would change the whitespace handling inside <t>
>> (where currently multiple white space characters are always equivalent
>> to a single space).
>
> Is that XML or xml2rfc forcing that?

People rely on whitespace being insignificant inside <t> and similar
elements, for instance when indenting the XML source.

So we would need an algorithm that can distinguish between what's
intentional and what is not.
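
To make this concrete, here is a minimal Python sketch of one possible rule. The rule is hypothetical and is not what xml2rfc does today: a period followed by two or more plain spaces on the same source line counts as an intentional sentence break, while every other whitespace run, including a line break plus indentation, collapses to a single space as it does in <t> now.

```python
import re

# XML whitespace characters only (space, tab, CR, LF), so characters such
# as NO-BREAK SPACE or EM SPACE are left untouched.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_t_text(text: str) -> str:
    """Collapse whitespace runs to one space, except that a period followed
    by two or more plain spaces on the same line keeps two spaces.

    Hypothetical rule for illustration; not xml2rfc's actual behaviour.
    """
    def replace(m: "re.Match") -> str:
        run = m.group(0)
        prev = text[m.start() - 1] if m.start() > 0 else ""
        only_spaces = run.strip(" ") == ""  # no newline or tab in the run
        if prev == "." and only_spaces and len(run) >= 2:
            return "  "  # treated as an intentional sentence break
        return " "       # ordinary whitespace, incl. newline + indentation

    return _WS_RUN.sub(replace, text).strip(" ")
```

With this rule, "post.  This" keeps its two spaces, but "One.\n   Two." comes out as "One. Two.": a sentence that happens to end at a source line break looks exactly like ordinary indentation, which is the part that would need a smarter algorithm.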

>>> Instead many developers prefer to code up imperfect heuristics for
>>> sentence ending periods.  If you search relevant archives (e.g., this
>>> list's) you'll find that this is a periodic discussion.
>>
>> Yes.
>
> Yuck.
>
>>> Don your asbestos suits now.  Flame war incoming.
>>
>> The third answer is: stop trying to. Optimizing the plain text output
>> format really really is not important.
>
> Not acceptable.
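
For illustration, a heuristic of the kind mentioned above might look like the following minimal Python sketch. The abbreviation list and the rules are assumptions made up for this example, not taken from xml2rfc or any other tool: a period ends a sentence unless the word is a known abbreviation, a single initial, or is followed by a lowercase word.

```python
import re
from typing import List

# Hypothetical, deliberately short abbreviation list; a real one would be
# longer and would still be incomplete.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "cf.", "vs."}


def sentence_ending_periods(text: str) -> List[int]:
    """Return offsets of the periods this imperfect heuristic treats as
    sentence ends."""
    ends = []
    # A word ending in "." followed by whitespace or the end of the text.
    for m in re.finditer(r"\S+\.(?=\s|$)", text):
        word = m.group(0)
        if word in ABBREVIATIONS:
            continue
        if re.fullmatch(r"[A-Z]\.", word):
            continue  # single initial, such as "D." in "D. K. G."
        rest = text[m.end():].lstrip()
        if rest and not rest[0].isupper():
            continue  # next word starts lowercase: probably mid-sentence
        ends.append(m.end() - 1)
    return ends
```

It handles "D. K. G. wrote a post." correctly, but on "Ask Mr. Smith." it reports two sentence ends because "Mr." is not in the list, which is the kind of failure that keeps this topic coming back.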

I just opened a random ancient RFC I'm familiar with (RFC 2068), and it
doesn't use two spaces after sentence-ending periods, at least not
consistently.
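
A rough way to check this against a local plain-text copy of an RFC is the minimal Python sketch below. It only counts how many spaces follow a period when the next character on the same line is a capital letter, so it is a proxy: initials and abbreviations are counted too, and sentences that end at a line break are skipped.

```python
import re
from collections import Counter


def period_spacing(text: str) -> Counter:
    """Map "number of spaces after a period" to how often it occurs, for
    periods followed on the same line by spaces and a capital letter.

    Periods at the end of a line are not counted.
    """
    counts: Counter = Counter()
    for m in re.finditer(r"\.( +)(?=[A-Z])", text):
        counts[len(m.group(1))] += 1
    return counts
```

For example, period_spacing(open("rfc2068.txt", encoding="latin-1").read()), where rfc2068.txt is an assumed local filename; a mix of 1 and 2 in the result would indicate inconsistent spacing.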

So when exactly did this become a "requirement"? And who determines what
is "acceptable" for plain-text output?

Best regards, Julian