Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

Julian Reschke <julian.reschke@gmx.de> Mon, 01 March 2021 11:12 UTC

To: Carsten Bormann <cabo@tzi.org>
Cc: xml2rfc@ietf.org
References: <20210227191644.165F76F105E2@ary.qy> <28B528D6-7CBA-4735-A5EE-C7061D1C1D0C@tzi.org> <3dc1abe5-24bf-3b12-7b58-d06c7cde428e@taugh.com> <BBA9B16E-5B06-419D-9ABE-BFB7E69B54C9@tzi.org> <6603926-561f-c9b8-2612-2afb9847b71@taugh.com> <20210228173825.GE30153@localhost> <14ad2b3e-852a-28b1-27ae-5e25ec7823bc@taugh.com> <a7734631-a4f3-cee1-1ee7-e9e0bd3d534a@gmail.com> <d96fc964-f367-dc8f-bdf3-a76b90abd042@alum.mit.edu> <26DCBA0D-AA14-461F-9992-CC631774877E@tzi.org> <45ca32a4-65df-7eea-84f0-b5451698a27b@gmx.de> <D3D8A513-87A6-4A74-97CE-C3FA8DC36318@tzi.org>
From: Julian Reschke <julian.reschke@gmx.de>
Message-ID: <ec03aa52-6aa1-0bd0-3638-c11bfc9d64dd@gmx.de>
Date: Mon, 1 Mar 2021 12:12:38 +0100
In-Reply-To: <D3D8A513-87A6-4A74-97CE-C3FA8DC36318@tzi.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/Yo-qwneu_MVrmtXJKb3oJfxvCaQ>

On 01.03.2021 at 11:58, Carsten Bormann wrote:
> On 2021-03-01, at 09:33, Julian Reschke <julian.reschke@gmx.de> wrote:
>>
>>> Double spaces in XML input are copied verbatim into the HTML (where they then are swallowed by the HTML processor), so it is not like the processor is not seeing them.
>>
>> But do they survive the preptool step?
>
> Sure.
>
> A quick check shows that https://www.rfc-editor.org/rfc/rfc8949.xml has about 318 in-line sentence end marks, and 101 end-of-line ones (this may or may not count ones at the end of paragraphs).
>
>> If so, that would IMHO be a bug.
>
> Can’t speak to that.
> ...

I can only talk about what we *intended* to specify.

The only elements in the V3 grammar that have an implied xml:space
value of "preserve" are <artwork> and <sourcecode>. Thus, whitespace in
any other element is not significant. That also means that the
preptool, when pretty-printing, should remove the insignificant
whitespace (I might open a ticket for that).
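
To illustrate the intended behaviour, here is a minimal Python sketch.
It is not the xml2rfc preptool code; the names PRESERVE, collapse and
normalize are hypothetical, and it assumes that "removing" insignificant
whitespace means collapsing each whitespace run to a single space in
every element except <artwork> and <sourcecode> (or any element that
carries an explicit xml:space="preserve").

```python
import re
import xml.etree.ElementTree as ET

# Assumption: only these elements have an implied xml:space="preserve".
PRESERVE = {"artwork", "sourcecode"}
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse(text):
    """Collapse each whitespace run to one space; pass None through."""
    return _WS_RUN.sub(" ", text) if text is not None else None


def normalize(elem):
    """Recursively collapse whitespace outside preserved elements."""
    if elem.tag in PRESERVE or elem.get(XML_SPACE) == "preserve":
        return  # leave the whole subtree untouched
    elem.text = collapse(elem.text)
    for child in elem:
        normalize(child)
        # The tail is text of the parent, so it is collapsed even when
        # the child itself is a preserved element.
        child.tail = collapse(child.tail)


doc = ET.fromstring(
    "<section><t>One sentence.  Another  <em>one</em>.</t>"
    "<artwork>a  b\n  c</artwork></section>"
)
normalize(doc)
print(ET.tostring(doc, encoding="unicode"))
```

With this approach the double space after "One sentence." in <t> is
reduced to a single space, while the content of <artwork> is passed
through unchanged.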

Changing the defaults for <t> (and similar elements) *would* be a
change. I'm not saying it can't be done, but given that the Style Guide
has removed the requirement for two spaces after a sentence (2SP), and
furthermore given
<https://www.chicagomanualofstyle.org/qanda/data/faq/topics/OneSpaceorTwo.html>
(the CMOS is frequently referenced for RFC style questions), I would be
surprised if there were willingness to do this.

In any case: *if* the outcome were that we want to handle the space
after sentence ends differently, we absolutely should do so in
all output formats, not just plain text.

Best regards, Julian