Re: [precis] I-D Action: draft-ietf-precis-7700bis-01.txt

Peter Saint-Andre <stpeter@stpeter.im> Mon, 05 September 2016 21:47 UTC

To: Erin Millard <ezzatron@gmail.com>
From: Peter Saint-Andre <stpeter@stpeter.im>
Date: Mon, 05 Sep 2016 15:46:44 -0600
Archived-At: <https://mailarchive.ietf.org/arch/msg/precis/vIS6705DRCStvdRdWZcH9ibfHPc>
Cc: precis@ietf.org

On 9/4/16 6:34 PM, Peter Saint-Andre wrote:
> On 9/4/16 5:30 PM, Erin Millard wrote:
>>     >>> * §2.2 Specifies that UTF-8 MUST be used as the encoding; do we
>>     >>> really want to limit this to UTF-8 only? Is this for comparison
>>     >>> purposes? Then again, 99.99% of the time UTF-8 is what you should
>>     >>> be using anyways, so I'm not sure that it matters.
>>     >>
>>     >> UTF-8 is your friend, and everything in PRECIS is UTF-8.
>>     >
>>     > PRECIS is mostly encoding agnostic; implementations might favor a
>>     > specific encoding, but I don't think anything in the spec
>>     > specifically *needs* UTF-8. That being said, there are so few
>>     > reasons to use anything other than UTF-8 that I don't think it
>>     > really matters, it was just curious to me that some of the PRECIS
>>     > related specs called out UTF-8 and some didn't.
>>
>>     I thought they all did, but will double-check.
>>
>>
>> This actually became a bigger issue when attempting to implement PRECIS
>> prepare in JavaScript for the browser. JavaScript doesn't have native
>> UTF-8 support, so this meant the extra bloat of bringing in a UTF-8
>> library.
>>
>> It didn't make a lot of sense to me either, since all the encoding
>> affects is how you go from string to code points, and vice versa. It had
>> no effect on the rest of my implementation. I could absolutely be
>> missing something, but compared to how focused the rest of the spec is,
>> the UTF-8 requirement seemed like an afterthought.
>>
>> Can anyone explain which parts of PRECIS are actually predicated on the
>> original string being encoded in UTF-8?
>
> Are we perhaps getting confused between the encoding that is sent over
> the wire and the encoding that is used within the processing application?
>
> In general, we in the IETF prefer to send UTF-8 over the wire. However,
> it's true that this is a matter for the "using protocol" (e.g., I
> distinctly recall an extremely long thread in the XMPP WG years ago
> about whether to support only UTF-8 or to give clients and servers the
> ability to also use UTF-16 - and "UTF-8 only" won that debate). Given
> that some protocols or other technologies that use PRECIS might use
> UTF-16 or give applications the ability to choose an encoding, you're
> probably right that it makes sense to relax the rule for PRECIS itself.
>
> I'll think about this some more and propose some text.

As promised, I've thought about it further and I agree that specifying 
an encoding of UTF-8 is not really appropriate in 7613bis and 7700bis. 
In fact, RFC 7564 (the PRECIS framework) states the following in §13.1:

    Although strings that are consumed in PRECIS-based application
    protocols are often encoded using UTF-8 [RFC3629], the exact encoding
    is a matter for the application protocol that uses PRECIS, not for
    the PRECIS framework.

Thus, for instance, it's fine for RFC 7622, which defines the address 
format in XMPP, to specify an encoding of UTF-8, but not for 7613bis or 
7700bis to do so.
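To illustrate that split, here is a minimal Python sketch (standard library only) in which a hypothetical using protocol decides the wire encoding and the PRECIS-style processing sees only code points. The function name `prepare_from_wire` is invented for illustration, and the "PRECIS" step is reduced to NFC normalization, which is only one of the rules a real profile applies.

```python
import unicodedata


def prepare_from_wire(data: bytes, wire_encoding: str) -> str:
    """Hypothetical helper: decode bytes in the encoding chosen by the
    using protocol, then apply code-point-level processing.
    Only NFC normalization is shown; a real PRECIS profile does more."""
    text = data.decode(wire_encoding)  # the only encoding-dependent step
    return unicodedata.normalize("NFC", text)


# "Cafe" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
decomposed = "Cafe\u0301"

from_utf8 = prepare_from_wire(decomposed.encode("utf-8"), "utf-8")
from_utf16 = prepare_from_wire(decomposed.encode("utf-16"), "utf-16")

# Same code points either way: the precomposed U+00E9 after NFC
assert from_utf8 == from_utf16 == "Caf\u00e9"
print([f"U+{ord(c):04X}" for c in from_utf8])
# ['U+0043', 'U+0061', 'U+0066', 'U+00E9']
```

Only the `decode()` call depends on the encoding; everything after it would be the same whether the using protocol mandates UTF-8 or allows UTF-16.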

I notice that RFC 5890 (for IDNA) has text like this:

    o  A "U-label" is an IDNA-valid string of Unicode characters, in
       Normalization Form C (NFC) and including at least one non-ASCII
       character, expressed in a standard Unicode Encoding Form (such as
       UTF-8).

Text similar to that might be best for 7613bis and 7700bis.
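As a rough sketch of why that wording works, the two conditions RFC 5890 places on the characters themselves (NFC, and at least one non-ASCII character) can be tested on a sequence of code points without naming any encoding form. The helper `has_u_label_shape` is hypothetical, needs Python 3.8+ for `unicodedata.is_normalized`, and does not check IDNA validity, which depends on the full IDNA2008 rules.

```python
import unicodedata


def has_u_label_shape(label: str) -> bool:
    """Hypothetical helper: tests only two conditions from the RFC 5890
    U-label definition (NFC, at least one non-ASCII code point).
    It does NOT test IDNA validity."""
    is_nfc = unicodedata.is_normalized("NFC", label)
    has_non_ascii = any(ord(ch) > 0x7F for ch in label)
    return is_nfc and has_non_ascii


print(has_u_label_shape("b\u00fccher"))   # precomposed "ü": True
print(has_u_label_shape("bu\u0308cher"))  # "u" + combining diaeresis, not NFC: False
print(has_u_label_shape("example"))       # ASCII only: False
```

Whatever Unicode encoding form the using protocol chooses, the caller decodes first and passes the resulting `str`, so the check itself never depends on UTF-8.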

Peter