Re: draft-ietf-httpbis-header-structure-00, unicode range

Ilari Liusvaara <ilariliusvaara@welho.com> Tue, 13 December 2016 21:46 UTC

Date: Tue, 13 Dec 2016 23:42:23 +0200
From: Ilari Liusvaara <ilariliusvaara@welho.com>
To: Poul-Henning Kamp <phk@phk.freebsd.dk>
Cc: Kari Hurtta <hurtta-ietf@elmme-mailer.org>, HTTP working group mailing list <ietf-http-wg@w3.org>, Poul-Henning Kamp <phk@varnish-cache.org>
Message-ID: <20161213214223.GA8638@LK-Perkele-V2.elisa-laajakaista.fi>
References: <20161213173327.C1F7D1714B@welho-filter2.welho.com> <25384.1481664527@critter.freebsd.dk>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
In-Reply-To: <25384.1481664527@critter.freebsd.dk>
User-Agent: Mutt/1.5.23 (2014-03-12)
Subject: Re: draft-ietf-httpbis-header-structure-00, unicode range
Archived-At: <http://www.w3.org/mid/20161213214223.GA8638@LK-Perkele-V2.elisa-laajakaista.fi>

On Tue, Dec 13, 2016 at 09:28:47PM +0000, Poul-Henning Kamp wrote:
> --------
> In message <20161213173327.C1F7D1714B@welho-filter2.welho.com>, Kari Hurtta writes:
> 
> >2.  Definition of HTTP Header Common Structure
> >https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-00#section-2
> >
> >|     unicode_string = * unicode_codepoint
> >|             # XXX: Is there a place to import this from ?
> >|             # Unrestricted unicode, because there is no sane
> >|             # way to restrict or otherwise make unicode "safe".
> >
> >What is range of unicode_codepoint ?
> 
> As far as I know, UNICODE does not have a firm upper end, but
> everybody _expects_ 32 bits to be enough for everybody.

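Actually, it does: U+10FFFF is the last code point in Unicode. The last
one usable for characters is U+10FFFD (it is actually allocated as part
of the PUA); U+10FFFE and U+10FFFF are noncharacters.

IIRC, Unicode has exactly 1,111,998 code points available for characters
(1,114,112 in total, minus 2,048 surrogates and 66 noncharacters); most
of those are unallocated.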
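
The 1,111,998 figure can be checked from the layout of the code space.
A minimal sketch, assuming the figure counts every code point in the
17 planes except the surrogates and the noncharacters (the constant and
function names here are illustrative only):

```python
PLANES = 17
TOTAL = PLANES * 0x10000              # U+0000 .. U+10FFFF
SURROGATES = 0xDFFF - 0xD800 + 1      # reserved for UTF-16 pairs

# Noncharacters: U+FDD0..U+FDEF, plus the last two code points of
# every plane (U+xxFFFE and U+xxFFFF).
NONCHARACTERS = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES

SCALAR_VALUES = TOTAL - SURROGATES
ASSIGNABLE = SCALAR_VALUES - NONCHARACTERS


def is_noncharacter(cp):
    """Return True if cp is one of the 66 Unicode noncharacters."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


print(TOTAL, SCALAR_VALUES, ASSIGNABLE)
```

Under these assumptions U+10FFFD is not a noncharacter, while U+10FFFE
and U+10FFFF are.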
 
> Since section two is the abstract datamodel, that's the best we can
> do there.
> 
> >3.  HTTP/1 Serialization of HTTP Header Common Structure
> >https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-00#section-3
> >[...]
> >Or are unicode values > 0xFFFF
> >encoded with surrogates (values 0xD800 - 0xDFFF)?
> >(UCS-2 or UTF-16 is used)
> 
> That was the plan.
> 
> Not a particularly good plan, as evidenced by the fact that I forgot
> to write that, and that JSON has seen interop issues with parsers
> missing that detail.

Also, note that the surrogate mechanism can only encode up to plane 16
(that's the reason why unicode only has 17 planes!)
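
To see where the plane 16 limit comes from: a surrogate pair carries
10 + 10 = 20 bits on top of an offset of 0x10000, so the highest value
it can reach is 0x10000 + 0xFFFFF = 0x10FFFF. A rough sketch of the
UTF-16 arithmetic (the function names are made up for illustration):

```python
def to_surrogates(cp):
    """Split a supplementary code point into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not encodable as a surrogate pair: %#x" % cp)
    v = cp - 0x10000                  # 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The largest possible pair lands exactly on the end of plane 16.
print(hex(from_surrogates(0xDBFF, 0xDFFF)))
```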

And I suppose that the surrogates MUST be paired properly (JSON actually
does not require this).
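
As an illustration of what "paired properly" would mean for a parser,
here is a small sketch that checks a sequence of 16-bit code units. It
also shows that Python's standard json module accepts a lone surrogate
escape. The helper name is hypothetical and this is not taken from the
draft:

```python
import json


def surrogates_well_paired(units):
    """Return True if every surrogate in a list of 16-bit units is paired."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        else:
            i += 1
    return True


# The json module decodes an unpaired \ud800 escape without complaint.
lone = json.loads('"\\ud800"')
print(surrogates_well_paired([ord(c) for c in lone]))
```

The resulting string cannot be encoded as UTF-8, which is one way such
input causes trouble further down the line.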


-Ilari