Re: [art] Modern Network Unicode — –02 submitted

Peter Occil <poccil14@gmail.com> Tue, 09 July 2019 01:14 UTC

To: Carsten Bormann <cabo@tzi.org>, "Manger, James" <James.H.Manger@team.telstra.com>
Cc: "art@ietf.org" <art@ietf.org>
From: Peter Occil <poccil14@gmail.com>
Date: Mon, 08 Jul 2019 21:14:44 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/art/xiaRKaguHbTCWv7MaczBxRxe5gU>

A statement that U+FFFE and U+FFFF (or any other noncharacters) “MUST NOT be used” merely “as per the Unicode specification” should be checked against what the Unicode Standard actually intends.

See corrigendum 9 of the Unicode Standard for that intent: http://www.unicode.org/versions/corrigendum9.html

“Noncharacters in the Unicode Standard are intended for internal use and have no standard interpretation when exchanged outside the context of internal use. However, they are not illegal in interchange nor do they cause ill-formed Unicode text. This has always been the intent of the standard, as expressed by the Unicode Technical Committee.”

That said, because noncharacters have no standard meaning outside of internal use, they may be even more problematic in interchange than the Unicode line and paragraph separators (U+2028 and U+2029), which do have a standard meaning but are nonetheless forbidden in the Modern Network Unicode draft.  Existing practice varies: many kinds of protocol strings, such as URIs, IRIs, and strings complying with the PRECIS framework, do not allow noncharacter code points, while other kinds of UTF-8 text, such as JSON, do allow them.  XML allows all noncharacters except U+FFFE and U+FFFF.
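To make the distinction concrete, here is a rough Python sketch of how a receiver might detect noncharacters and compare that with the XML 1.0 "Char" production. The function names (is_noncharacter, is_xml10_char, find_noncharacters) are my own illustrations and do not come from the draft or from any library; the code point ranges are the 66 noncharacters defined by the Unicode Standard and the Char production from the XML 1.0 specification.

```python
import json


def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters.

    These are U+FDD0..U+FDEF plus the last two code points of each of
    the 17 planes (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF).
    """
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_xml10_char(cp: int) -> bool:
    """True if cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_noncharacters(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each noncharacter found in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if is_noncharacter(ord(ch))
    ]


if __name__ == "__main__":
    sample = "a\ufffeb\ufdd0c"
    print(find_noncharacters(sample))
    # JSON carries noncharacters through an encode/decode round trip.
    print(json.loads(json.dumps(sample)) == sample)
    # XML 1.0 rejects only U+FFFE and U+FFFF among the noncharacters.
    rejected = [
        f"U+{cp:04X}"
        for cp in range(0x110000)
        if is_noncharacter(cp) and not is_xml10_char(cp)
    ]
    print(rejected)
```

Whether a profile such as Modern Network Unicode should reject text for which find_noncharacters returns anything, or merely discourage it, is exactly the policy question above; the sketch only shows that the check itself is cheap.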

--Peter

From: Carsten Bormann
Sent: Monday, July 8, 2019 4:24 PM
To: Manger, James
Cc: art@ietf.org
Subject: Re: [art] Modern Network Unicode — –02 submitted

Hi James, Tim, Martin,

thank you for the quick feedback!  As suggested, I have started turning this into a standalone document (of course, still requiring RFC 3629, the UTF-8 definition):

Status:         https://datatracker.ietf.org/doc/draft-bormann-dispatch-modern-network-unicode/
Htmlized:       https://tools.ietf.org/html/draft-bormann-dispatch-modern-network-unicode-02
Diff:           https://tools.ietf.org/rfcdiff?url2=draft-bormann-dispatch-modern-network-unicode-02

I hope I haven’t missed anything important from RFC 5198 that I wanted to keep.
Maybe it is time to involve the authors of that RFC…

Grüße, Carsten


> On Jul 8, 2019, at 04:18, Manger, James <James.H.Manger@team.telstra.com> wrote:
> 
> Would be nicer if it wasn't written as a diff from RFC5198 (Network Unicode). That is, if you could get all the rules directly from this doc. For instance, I assume Clean Modern Network Unicode must/should be NFC. Keep the RFC5198 comparisons for an informative annex.
