Re: [precis] I-D Action: draft-ietf-precis-7700bis-01.txt

Peter Saint-Andre <> Mon, 05 September 2016 00:34 UTC

To: Erin Millard <>
List-Id: Preparation and Comparison of Internationalized Strings <>

On 9/4/16 5:30 PM, Erin Millard wrote:
>     >>> * §2.2 Specifies that UTF-8 MUST be used as the encoding; do we really
>     >>> want to limit this to UTF-8 only? Is this for comparison purposes?
>     >>> Then again, 99.99% of the time UTF-8 is what you should be using
>     >>> anyways, so I'm not sure that it matters.
>     >>
>     >> UTF-8 is your friend, and everything in PRECIS is UTF-8.
>     >
>     > PRECIS is mostly encoding agnostic; implementations might favor a
>     > specific encoding, but I don't think anything in the spec specifically
>     > *needs* UTF-8. That being said, there are so few reasons to use
>     > anything other than UTF-8 that I don't think it really matters, it was
>     > just curious to me that some of the PRECIS related specs called out
>     > UTF-8 and some didn't.
>     I thought they all did, but will double-check.
> This actually became a bigger issue when attempting to implement PRECIS
> prepare in JavaScript for the browser. JavaScript doesn't have native
> UTF-8 support, so this meant the extra bloat of bringing in a UTF-8 library.
> It didn't make a lot of sense to me either, since all the encoding
> affects is how you go from string to code points, and vice versa. It had
> no effect on the rest of my implementation. I could absolutely be
> missing something, but compared to how focused the rest of the spec is,
> the UTF-8 requirement seemed like an afterthought.
> Can anyone explain which parts of PRECIS are actually predicated on the
> original string being encoded in UTF-8?

Are we perhaps getting confused between the encoding that is sent over 
the wire and the encoding that is used within the processing application?

In general, we in the IETF prefer to send UTF-8 over the wire. However, 
it's true that the wire encoding is a matter for the "using protocol". 
(I distinctly recall an extremely long thread in the XMPP WG years ago 
about whether to support only UTF-8 or to give clients and servers the 
ability to also use UTF-16; "UTF-8 only" won that debate.) Given 
that some protocols or other technologies that use PRECIS might use 
UTF-16 or give applications the ability to choose an encoding, you're 
probably right that it makes sense to relax the rule for PRECIS itself.
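
As a rough illustration of that split between wire encoding and 
in-application processing, here is a minimal TypeScript sketch. It is 
not text from the draft: the helper names (toCodePoints, hasControl, 
encodeUtf8) are made up for this example, and hasControl is only a 
stand-in for whatever code-point-level rule a profile applies. The 
string is decoded from JavaScript's native UTF-16 into code points, 
checked at the code point level, and encoded as UTF-8 only at the 
point where a using protocol would put it on the wire.

```typescript
// Hypothetical helpers for illustration; none of these names come from PRECIS.

// Decode a JavaScript (UTF-16) string into Unicode code points.
// Surrogate pairs are combined; unpaired surrogates become U+FFFD.
function toCodePoints(input: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < input.length) {
      const next = input.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        out.push(0x10000 + ((unit - 0xd800) << 10) + (next - 0xdc00));
        i++; // skip the low surrogate we just consumed
        continue;
      }
    }
    const isSurrogate = unit >= 0xd800 && unit <= 0xdfff;
    out.push(isSurrogate ? 0xfffd : unit);
  }
  return out;
}

// Stand-in for a code-point-level rule (assumption: reject C0, DEL, and C1
// controls). It never looks at bytes, so the source encoding is irrelevant.
function hasControl(codePoints: number[]): boolean {
  return codePoints.some(
    (cp) => cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f)
  );
}

// Encode code points as UTF-8 bytes, as a using protocol might require
// on the wire. Assumes every input is a valid scalar value.
function encodeUtf8(codePoints: number[]): number[] {
  const bytes: number[] = [];
  for (const cp of codePoints) {
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return bytes;
}

// Usage: process as code points, encode only at the wire boundary.
const points = toCodePoints("caf\u00e9");
const wireBytes = hasControl(points) ? [] : encodeUtf8(points);
console.log(points.length, wireBytes.length); // 4 code points, 5 bytes
```

Only encodeUtf8 knows anything about UTF-8 here; swapping it for a 
UTF-16 serializer would leave the code point check untouched, which is 
the sense in which the encoding looks like a concern of the using 
protocol rather than of the string processing itself.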

I'll think about this some more and propose some text.