Re: [I18ndir] [art] Modern Network Unicode

Carsten Bormann <cabo@tzi.org> Thu, 11 July 2019 23:47 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <CAN40gStf08EwxiZ0+JUa02MLykQPEaL52quK-t9qc-Q8ALxT5A@mail.gmail.com>
Date: Fri, 12 Jul 2019 01:46:41 +0200
Cc: John C Klensin <john-ietf@jck.com>, art@ietf.org, "Asmus Freytag (c)" <asmusf@ix.netcom.com>, i18ndir@ietf.org
Message-Id: <0C841343-CD67-40A2-9C37-F5EB5B9DFF8C@tzi.org>
References: <0A5251342D480BA6437F7549@PSB> <B243365E-F7C5-4C53-A64F-2E3E87C4CD66@tzi.org> <248A8DD5DA0D3D34D6B6EFC9@PSB> <213ae024-b819-4f56-6e37-0cd53eb566c9@ix.netcom.com> <D921117F-BA9E-430B-8287-06D15248E1B7@tzi.org> <90f8f2b5-ff3d-f9f1-860c-ae4d43f92c81@ix.netcom.com> <7F1F41C25D0AC5960D95A67E@PSB> <C7BBF677-E752-4258-A357-AE56338F6326@tzi.org> <DFB116527FF004C961182B15@PSB> <CAN40gStf08EwxiZ0+JUa02MLykQPEaL52quK-t9qc-Q8ALxT5A@mail.gmail.com>
To: Ira McDonald <blueroofmusic@gmail.com>
X-Mailer: Apple Mail (2.3445.9.1)
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/772xYD48kluPxULhk4WhnDSzCok>
Subject: Re: [I18ndir] [art] Modern Network Unicode

This is a great discussion.

To me, it seems to converge on the following.

(1) Sending sane data is the job of the data originator.  

(2) Do not include gratuitous normalization steps in your processing, once the data have been originated in a sane form.

(2a) If you broke it, you fix it (as far as possible): If your processing steps did involve gratuitous normalization, you have to renormalize to NFC before sending.

Here, “sane” is defined as:

(0) Data SHOULD be originated in NFC, unless that would be inappropriate for the specific script, in which case the community consensus rules for the script govern.

For Latin script, this happens to collapse to what RFC 5198 says.
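
To make rule (0) concrete for the default case, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `is_nfc` and `originate` are hypothetical, and the sketch deliberately ignores the script-specific exceptions, which no generic library call can decide for you.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def originate(text: str) -> str:
    """Hypothetical originator step: emit NFC (default case of rule 0 only)."""
    return unicodedata.normalize("NFC", text)


# The same greeting, once with a combining diaeresis and once precomposed.
decomposed = "Gru\u0308\u00dfe"  # 'u' followed by U+0308 COMBINING DIAERESIS
composed = "Gr\u00fc\u00dfe"     # U+00FC LATIN SMALL LETTER U WITH DIAERESIS

print(is_nfc(decomposed), is_nfc(composed))
print(originate(decomposed) == composed)
```

The point of doing this at origination is that this is the one place that can still decide whether NFC is actually appropriate for the text at hand.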

This set of rules places the onus on the point where the data are generated, which is usually the party that knows most about the specific script and about the intent of the originator.  If you know that party isn’t doing its job, add the rule:

(1a) If the data originator does not do (0), the software placing the data on the network may need to sanitize (normalize towards sane).

1a is similar to 2a in that neither yields perfect results, so both SHOULD be avoided: there is no way to perfectly sanitize, after the fact, data that weren’t originated sane or that were gratuitously normalized on the way.
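
A small illustration of why after-the-fact repair is lossy, again as a sketch with Python's `unicodedata`. U+212B ANGSTROM SIGN is used here only as a convenient stand-in for text whose originator deliberately chose a non-NFC form; the variable names are made up for the example.

```python
import unicodedata

original = "\u212b"  # ANGSTROM SIGN, as chosen by the (hypothetical) originator

# A processing step gratuitously decomposes the text ...
intermediate = unicodedata.normalize("NFD", original)

# ... and then "fixes" it by renormalizing to NFC, as rule 2a asks.
repaired = unicodedata.normalize("NFC", intermediate)

print([f"U+{ord(c):04X}" for c in intermediate])
print([f"U+{ord(c):04X}" for c in repaired])
print(repaired == original)
```

The repaired text is U+00C5, not the U+212B that was sent: normalization maps several inputs to one output, so it has no inverse.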

With these definitions, MNU can direct towards: 
(A) Senders: send sane data
(B) Recipients: break as little as reasonable when received data aren’t sane
(C) B is not a valid excuse not to do A, and specifically: recipients are not expected to clean up after senders (because there is no correct way to do that).
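
One way to read A to C in code, as a rough sketch only. The function names `send`, `receive` and `relay` and the `sanitize` flag are assumptions made for this example, not anything MNU defines, and only the default NFC case of "sane" is covered.

```python
import unicodedata


def send(text: str, *, sanitize: bool = False) -> bytes:
    """(A) Put only sane data on the wire (default NFC case only)."""
    if unicodedata.normalize("NFC", text) != text:
        if not sanitize:
            # The originator should fix this; do not silently paper over it.
            raise ValueError("text is not in NFC")
        # Rule 1a: imperfect last-resort sanitizing by the sending software.
        text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")


def receive(wire: bytes) -> str:
    """(B) Accept what arrives; do not reject or rewrite non-NFC text."""
    return wire.decode("utf-8")


def relay(wire: bytes) -> bytes:
    """Forward received data untouched: no gratuitous normalization."""
    return wire


wire = send("Gr\u00fc\u00dfe")
print(receive(relay(wire)))
```

Note that `receive` and `relay` never call `normalize`: cleaning up is not their job (C), and anything they did could not be undone downstream.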

(Rule C is the often forgotten third rule of the Postel principle.
It also means that an entity that is a recipient of MNU and then sends the data on as MNU has no need to gratuitously normalize, but it does not entirely get rid of rule 1a for recipients of data from places known not to be sane.)

Grüße, Carsten