Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

Nico Williams <nico@cryptonector.com> Mon, 01 March 2021 15:49 UTC

Date: Mon, 01 Mar 2021 09:48:59 -0600
From: Nico Williams <nico@cryptonector.com>
To: Carsten Bormann <cabo@tzi.org>
Cc: Julian Reschke <julian.reschke@gmx.de>, xml2rfc@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/-y4e9M1nfMYzjFEpot8yPBoIwYc>

On Mon, Mar 01, 2021 at 02:55:24PM +0100, Carsten Bormann wrote:
> On 2021-03-01, at 14:35, Julian Reschke <julian.reschke@gmx.de> wrote:
> > 
> > Also, AFAICT, when M. T. Rose developed this, he used the same
> > processing model as HTML.
> 
> Right, which is fine with me.  My copy of xml2rfcv1 also does sentence
> spacing, pretty much on the model that I described.
> 
> My point is that nothing in XML or existing specifications speaks
> against sentence detection and sentence spacing.  The setting of
> xml:space is completely irrelevant, as the handling of whitespace is
> left to the application in either case.

Thanks for clarifying that for everyone.

> (This does not prejudice whether we want to do sentence spacing in the
> future, of course.)

Agreed.

IMO it would be good to have at least some way to handle the cases where
the lack of extra spacing hurts readability, and to assume that such
cases are rare.
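
To make the discussion concrete, here is a minimal sketch of sentence
spacing with an escape hatch for periods that do not end a sentence.  It
is not what xml2rfc does; the function name, the exception list, and the
"next word is capitalized" heuristic are all assumptions made for
illustration only.

```python
import re

# Hypothetical exception list: tokens whose trailing period does not
# end a sentence.  A real tool would need a way for authors to extend it.
NON_TERMINAL = {"e.g.", "i.e.", "cf.", "vs.", "Dr.", "Mr."}


def space_sentences(text, exceptions=NON_TERMINAL):
    """Re-join whitespace-separated words, putting two spaces after a
    word that appears to end a sentence and one space otherwise."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        out.append(word)
        if i == len(words) - 1:
            break  # no separator after the last word
        nxt = words[i + 1]
        ends_sentence = (
            word[-1] in ".?!"
            and word not in exceptions
            and not re.fullmatch(r"[A-Z]\.", word)  # initials such as "M."
            and nxt[0].isupper()  # assumed heuristic: sentences start capitalized
        )
        out.append("  " if ends_sentence else " ")
    return "".join(out)


print(space_sentences("See M. T. Rose, e.g. Section 2. It works."))
```

The heuristic will still guess wrong sometimes (an abbreviation that
really does end a sentence, for instance), which is why some explicit
override would still be needed for the rare cases.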

Nico
--