Re: [art] [I18ndir] Modern Network Unicode

Ira McDonald <blueroofmusic@gmail.com> Thu, 11 July 2019 21:40 UTC

MIME-Version: 1.0
References: <0A5251342D480BA6437F7549@PSB> <B243365E-F7C5-4C53-A64F-2E3E87C4CD66@tzi.org> <248A8DD5DA0D3D34D6B6EFC9@PSB> <213ae024-b819-4f56-6e37-0cd53eb566c9@ix.netcom.com> <D921117F-BA9E-430B-8287-06D15248E1B7@tzi.org> <90f8f2b5-ff3d-f9f1-860c-ae4d43f92c81@ix.netcom.com> <7F1F41C25D0AC5960D95A67E@PSB> <C7BBF677-E752-4258-A357-AE56338F6326@tzi.org> <DFB116527FF004C961182B15@PSB>
In-Reply-To: <DFB116527FF004C961182B15@PSB>
From: Ira McDonald <blueroofmusic@gmail.com>
Date: Thu, 11 Jul 2019 17:41:58 -0400
Message-ID: <CAN40gStf08EwxiZ0+JUa02MLykQPEaL52quK-t9qc-Q8ALxT5A@mail.gmail.com>
To: John C Klensin <john-ietf@jck.com>, Ira McDonald <blueroofmusic@gmail.com>
Cc: Carsten Bormann <cabo@tzi.org>, art@ietf.org, "Asmus Freytag (c)" <asmusf@ix.netcom.com>, i18ndir@ietf.org
Content-Type: multipart/alternative; boundary="000000000000383f5b058d6ea3c8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/art/oOA79nkvmB8RMKUJUJroRhcKwa4>
Subject: Re: [art] [I18ndir] Modern Network Unicode

Hi,

Carsten - thanks for working on this MNU draft.

John - thanks for your excellent background comments.

FWIW - IEEE-ISTO PWG 5100.3 IPP Everywhere (a very widely implemented
driverless printing profile of IETF STD92 Internet Printing Protocol 1.1
and a few other PWG IPP extensions) makes RFC 5198 and NFC normalization
mandatory for IPP Server generated names, keywords, and text strings and
mandatory for IPP Client submitted names and keywords.  But IPP's purpose
is reliable document printing (the document content may, of course, be in
some legacy charset, but must be tagged) and most string attributes are
names (like job names) or keywords (fonts, media, etc.), so pure NFC w/
restricted C0 characters makes sense.
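As a rough illustration of what "pure NFC w/ restricted C0 characters"
could mean for a name or keyword, here is a minimal Python sketch.  The
function names are hypothetical, and the rule is an assumption made for
the example: it rejects every C0 control, which is simpler and stricter
than the actual text in RFC 5198 or IPP Everywhere.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if the string is already in Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def has_c0_control(text: str) -> bool:
    """True if any character is in the C0 range U+0000..U+001F.

    Assumption: names and keywords never need CR, LF, or TAB, so this
    sketch rejects all C0 controls rather than a selected subset.
    """
    return any(ord(ch) < 0x20 for ch in text)


def acceptable_name(text: str) -> bool:
    """Hypothetical acceptance check: NFC and free of C0 controls."""
    return is_nfc(text) and not has_c0_control(text)


composed = "R\u00e9sum\u00e9"        # precomposed e-acute
decomposed = "Re\u0301sume\u0301"    # e followed by combining acute

print(acceptable_name(composed))       # already NFC, no controls
print(acceptable_name(decomposed))     # not NFC
print(acceptable_name("job\x07name"))  # contains BEL, a C0 control
```

Note that the check only reports; it does not rewrite the string.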

IPP Everywhere does have this warning in Internationalization Considerations:

"WARNING – Performing normalization on UTF-8 strings received from IPP
Clients and subsequently storing the results (e.g., in IPP Job objects)
could cause false negatives in IPP Client searches and failed access
(e.g., to IPP Printers with percent-encoded UTF-8 URIs now 'hidden')."

Which speaks to John's point about the disadvantages of automatic NFC
(or any other normalization).
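
The failure mode in that warning can be reproduced in a few lines of
Python.  This is only a sketch: the printer name is made up, and the
"server" is just a variable holding the normalized copy.

```python
import unicodedata
from urllib.parse import quote

# Hypothetical name as submitted by a client, in decomposed form
# (e followed by U+0301 COMBINING ACUTE ACCENT).
submitted = "Cafe\u0301 Printer"

# The server normalizes to NFC before storing.
stored = unicodedata.normalize("NFC", submitted)

# The client later searches with the exact string it sent.
print(submitted == stored)   # False: a false negative on exact match

# The percent-encoded UTF-8 forms differ too, so a URI built from the
# original string no longer identifies the stored resource.
print(quote(submitted))      # Cafe%CC%81%20Printer
print(quote(stored))         # Caf%C3%A9%20Printer
```

Both strings render identically, which is what makes the mismatch hard
to diagnose from the client side.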

Cheers,
- Ira

Ira McDonald (Musician / Software Architect)
Co-Chair - TCG Trusted Mobility Solutions WG
Co-Chair - TCG Metadata Access Protocol SG
Chair - Linux Foundation Open Printing WG
Secretary - IEEE-ISTO Printer Working Group
Co-Chair - IEEE-ISTO PWG Internet Printing Protocol WG
IETF Designated Expert - IPP & Printer MIB
Blue Roof Music / High North Inc
http://sites.google.com/site/blueroofmusic
http://sites.google.com/site/highnorthinc
mailto: blueroofmusic@gmail.com
PO Box 221  Grand Marais, MI 49839  906-494-2434



On Thu, Jul 11, 2019 at 1:25 PM John C Klensin <john-ietf@jck.com> wrote:

>
>
> --On Thursday, July 11, 2019 18:44 +0200 Carsten Bormann
> <cabo@tzi.org> wrote:
>
> > On Jul 11, 2019, at 18:31, John C Klensin <john-ietf@jck.com>
> > wrote:
> >>
> >> (top post and adding the ART list back in)
> >>
> >> Carsten,
> >> I think Asmus and I are in complete agreement, but you might
> >> find an attempt to summarize the state of work for relevant
> >> characteristics and behaviors on the web, some of which
> >> interact with this discussion, illuminating.  It is a chart
> >> with explanation, not a long document.  See
> >> https://w3c.github.io/typography/gap-analysis/language-matrix.html
> >
> > Thank you for this interesting link.
> > This is very condensed information; what column should I look
> > for specifically on the issue of normalization and
> > applicability of NFC? E.g., in the linked page
> > https://w3c.github.io/iip/gap-analysis/beng-gap I find
> > something about grapheme clusters but nothing about
> > normalization — normalization of course is not an immediate
> > need for Web interoperability.
>
> I'll try to hunt for it when I have time unless someone else
> gets to it first, but there are W3C recommendations that
> specifically recommend against inserting a normalization step
> before transmission or immediately before storage, and that
> instead treat normalization as a comparison-time operation.  That
> is consistent with what Asmus and I have been trying to tell you.
>
> Let me try to summarize those comments in a different way:
>
> (1) For most language-script combinations, the form in which a
> network application gets a string is usually going to be the
> right form for that string.  If there is a strong preference
> that the string be normalized, it will probably come to the
> network application (and probably when it leaves something
> rather close to the keyboard) normalized.  If there is a strong
> preference that it not be normalized, it will probably come to
> you in _that_ form.  If there is no preference at all,
> normalizing the string (as long as you stay away from NFKC/NFKD)
> will probably not cause harm, but it won't buy much either.
>
> (2) Checking as to whether a string is normalized before sending
> it is unlikely to produce any safely actionable information
> unless you know a lot about the language, script, and
> circumstances.
>
> (3) If a string is unnormalized for what whoever created it
> thinks is a good reason, then normalizing on general principles
> is going to lose whatever information that decision was intended
> to convey.
>
> The above may be different for strings that are specifically
> intended as identifiers, but I don't think that is what you are
> talking about.    I used to believe that "normalize everywhere"
> and "store strings only in normalized form" was a good idea, but
> I've been gradually (and sometimes quite unpleasantly) educated
> by a long series of cases and counterexamples.   If I were
> updating 5198 today, that recommendation is the thing I would be
> most likely to either change or at least moderate and qualify
> considerably.
>
>     john
>
>
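
The comparison-time approach described in the quoted message can be
sketched in Python as follows.  The helper names and sample strings are
hypothetical; the point is only that stored strings stay exactly as
received and normalization happens inside the comparison.  The last
lines show why the compatibility forms are riskier than NFC.

```python
import unicodedata


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings under a normalization form.

    Neither input is modified; only temporary copies are normalized.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Strings kept exactly as received (one decomposed, one with U+00B2).
stored = ["Cafe\u0301", "x\u00b2"]


def find(query: str) -> list:
    """Return stored entries that match the query at comparison time."""
    return [s for s in stored if same_text(s, query)]


# A precomposed query still finds the decomposed entry.
print(find("Caf\u00e9") == ["Cafe\u0301"])         # True

# NFC leaves SUPERSCRIPT TWO alone; NFKC folds it to a plain digit,
# discarding the distinction the author of the string chose to make.
print(unicodedata.normalize("NFC", "x\u00b2") == "x\u00b2")   # True
print(unicodedata.normalize("NFKC", "x\u00b2"))               # x2
```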