Re: Unicode escape sequence | Re: draft-ietf-httpbis-header-structure-00, unicode range
Matthew Kerwin <matthew@kerwin.net.au> Wed, 14 December 2016 11:56 UTC
From: Matthew Kerwin <matthew@kerwin.net.au>
Date: Wed, 14 Dec 2016 21:53:45 +1000
Message-ID: <CACweHNBYf-UuxsKNxYakt22rgku9xEP4YK4yL2R+=vMf_uB2Vg@mail.gmail.com>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Alexey Melnikov <alexey.melnikov@isode.com>, Poul-Henning Kamp <phk@phk.freebsd.dk>, Kari Hurtta <hurtta-ietf@elmme-mailer.org>, Ilari Liusvaara <ilariliusvaara@welho.com>, HTTP working group mailing list <ietf-http-wg@w3.org>, Poul-Henning Kamp <phk@varnish-cache.org>
Archived-At: <http://www.w3.org/mid/CACweHNBYf-UuxsKNxYakt22rgku9xEP4YK4yL2R+=vMf_uB2Vg@mail.gmail.com>

On 14 December 2016 at 20:46, Julian Reschke <julian.reschke@gmx.de> wrote:

> On 2016-12-14 11:38, Alexey Melnikov wrote:
>
>> ...
>>
>>> Has this ever been used in a protocol?
>>>
>> Some:
>> https://datatracker.ietf.org/doc/rfc5137/referencedby/
>>
>
> Actually, one.
>
>> This was also extensively used in other RFCs without referencing the BCP.
>>
>
> Example?
>
> The reason why I'm asking is because the notation
>
>   \u'HHHH' or \u'HHHHHH'
>
> strikes me as:
>
> 1) verbose
>
> 2) potentially problematic because of the use of the single quote (which
> might require extra escaping in some contexts)
>

Yes. It says that "forms that use explicit string delimiters are
generally preferred over other alternatives. In many contexts,
symmetric paired delimiters are easier to recognize and understand
than visually unrelated ones." So brackets are good. And while it
advises against using Perl's \x{NNNN...} syntax (because of potential
ambiguities with two-digit hex codes), it doesn't say anything at all
about \u{N...}

Curly braces cost 14+15 bits in HPACK, parentheses 10+10 (incidentally
cheaper than single quotes, which are 11+11). It's also convenient that
little 'u' is one bit cheaper than little 'x'.

I don't think parentheses are at too much risk of needing escaping, so
it seems like the solution that goes with BCP 137, and compresses
alright with HPACK, is:

    %x5c.75.28 1*6HEXDIGIT %x29

It's still a little bit clunky for things like "Stra\u(df)e", but not so
bad for emoji "\u(1f602)" and somewhere in between for Hiragana
"\u(3053)\u(3093)\u(306b)\u(3064)".

Cheers

> Best regards, Julian
>
> PS: and, as a nit, it's strange that the syntax uses delimiters but
> doesn't allow sequences of 1 to 3 HEXDIGs...
>

Having just written "\u(df)" I kind of understand; it really feels like
I'm describing an octet rather than a codepoint. I don't think there's a
*technical* reason, though. Is it alright to see "\u(9)" or an
equivalent in text?

-- 
Matthew Kerwin
http://matthew.kerwin.net.au/
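
To make the proposed notation concrete, here is a minimal Python sketch of a decoder and encoder for the "\u(" 1*6 hex digits ")" form suggested above. The function names decode_escapes and encode_escapes are hypothetical, not from any draft. Rejecting surrogates and values above U+10FFFF is an assumption of this sketch; the thread does not settle it. Short forms such as "\u(9)" are accepted here only because the proposed grammar permits one to six digits. How a literal backslash would itself be escaped is not discussed in the message, so the sketch does not handle it.

```python
import re

# Backslash, 'u', '(', one to six hex digits, ')': the form proposed above.
_ESCAPE = re.compile(r"\\u\(([0-9A-Fa-f]{1,6})\)")


def decode_escapes(text: str) -> str:
    """Replace every \\u(H...) escape in text with the code point it names."""

    def _replace(match: "re.Match[str]") -> str:
        codepoint = int(match.group(1), 16)
        # Assumption: values outside Unicode's range and surrogates are errors.
        if codepoint > 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {match.group(0)}")
        return chr(codepoint)

    return _ESCAPE.sub(_replace, text)


def encode_escapes(text: str) -> str:
    """Escape every non-ASCII character using the shortest lowercase hex form."""
    return "".join(
        ch if ord(ch) < 0x80 else f"\\u({ord(ch):x})" for ch in text
    )


if __name__ == "__main__":
    # Examples taken from the message above.
    print(decode_escapes(r"Stra\u(df)e"))
    print(decode_escapes(r"\u(3053)\u(3093)\u(306b)\u(3064)"))
    print(encode_escapes("Straße"))
```

Because the closing parenthesis terminates the digits explicitly, the decoder needs no lookahead to decide where an escape ends, which is the ambiguity BCP 137 warns about for undelimited forms.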