Re: #227: Encoding advice for new headers and parameters

Julian Reschke <> Thu, 29 September 2016 13:53 UTC

To: Mark Nottingham <>, HTTP Working Group <>
From: Julian Reschke <>
Date: Thu, 29 Sep 2016 15:48:29 +0200

On 2016-09-28 04:29, Mark Nottingham wrote:
> [ "just me" hat on ]
> <>
> After some discussion in Berlin and Stockholm, as well as experience with dealing with i18n in parameters for the Link header (see <>), I think we should give more definite advice about when RFC5987(bis) encoding should and should not be used.
> In particular, flagging encoding by using a parameter name complicates extension processing (see the issue referenced above), and causes a lot of uncertainty about precedence, etc.

It complicates processing *slightly*.

The issue of parameters potentially repeating, and the fact that you 
need to define what to do in that case, exists either way. It is 
inherent in any format that supports name/value pairs.
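
To illustrate how small such a rule can be, here is a minimal sketch in 
Python. The policy it applies (the extended "name*" form beats the plain 
form, and within one form the first occurrence wins) is an assumption 
made for this example; each field definition has to state its own rule. 
The function name select_param is hypothetical.

```python
def select_param(pairs, name):
    """Pick one raw value for `name` from already-parsed (name, value) pairs.

    Assumed policy for this sketch: the extended form `name*` takes
    precedence over the plain form; within one form, the first
    occurrence wins and later repeats are ignored.
    """
    plain = None
    extended = None
    for key, value in pairs:
        key = key.lower()  # parameter names are compared case-insensitively
        if key == name + "*" and extended is None:
            extended = value
        elif key == name and plain is None:
            plain = value
    return extended if extended is not None else plain


pairs = [
    ("title", "EURO rates"),
    ("title*", "UTF-8''%e2%82%ac%20rates"),
    ("title", "ignored repeat"),
]
print(select_param(pairs, "title"))
```

The returned value is still in its wire form; decoding the extended 
value is a separate step.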

> I think it would be much simpler and more reliable to advise people minting new HTTP headers to *not* use RFC5987(bis) encoding, but instead advise that they mandate use of an encoding on the field (or a specified portion thereof).

RFC 5987 defines a way to deal with non-ASCII. It's not pretty, has a 
slightly bizarre syntax, but at least it's there, and it has been 
implemented successfully in all widely deployed user agents.
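
For readers who have not seen the syntax, the following Python sketch 
shows roughly what an RFC 5987 extended value looks like on the wire: 
charset, an optional language tag, and the percent-encoded text, 
separated by single quotes. The helper names are hypothetical, and the 
sketch only covers the value, not the parsing of a whole header field.

```python
from urllib.parse import quote, unquote

# Characters RFC 5987 allows unescaped (attr-char) beyond the letters,
# digits, "-", ".", "_" and "~" that quote() already leaves alone.
ATTR_CHAR_EXTRAS = "!#$&+-.^_`|~"


def encode_ext_value(text, language=""):
    """Build charset'language'value using UTF-8 (hypothetical helper)."""
    encoded = quote(text, safe=ATTR_CHAR_EXTRAS, encoding="utf-8")
    return "UTF-8'" + language + "'" + encoded


def decode_ext_value(raw):
    """Decode charset'language'value; only the two charsets every
    RFC 5987 recipient must support are accepted in this sketch."""
    charset, _language, encoded = raw.split("'", 2)
    if charset.lower() not in ("utf-8", "iso-8859-1"):
        raise ValueError("unsupported charset: " + charset)
    return unquote(encoded, encoding=charset, errors="strict")


print(encode_ext_value("\u20ac rates"))
print(decode_ext_value("iso-8859-1'en'%A3%20rates"))
```

A parameter carrying such a value is written with a trailing asterisk 
on its name, for instance title*=UTF-8''%e2%82%ac%20rates.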

Defining *another* way to achieve this seems like a bad idea to me 
(insert XKCD reference here...).

(And yes, I'm all for working on a new common field syntax which, as a 
side effect, addresses non-ASCII, but that's a separate discussion.)

> E.g., if the "foo" parameter on the "bar" header field might need to accept non-ascii content, it MUST be generated with those characters encoded, and MUST be parsed by first decoding that portion of the header.

...which essentially *is* the format used by RFC 5987, minus parameter 
naming and preamble.

Requiring its use sounds attractive, but I have my doubts that the 
typical "producer" of field values will get it right; thus we might see 
"%" characters in the wild which are not meant to be percent-escapes.
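
A small Python sketch of that failure mode, assuming a field whose value 
is always percent-decoded by the recipient. The producer and consumer 
functions are hypothetical stand-ins, not taken from any implementation.

```python
from urllib.parse import quote, unquote


def careful_producer(text):
    """Escapes everything outside the unreserved set, including '%'."""
    return quote(text, safe="")


def careless_producer(text):
    """Forgets the escaping step and sends the text as-is."""
    return text


def consumer(raw):
    """Always percent-decodes, as a mandated encoding would require."""
    return unquote(raw)


original = "discount%20code"  # the literal text happens to contain "%20"

print(consumer(careful_producer(original)))   # round-trips unchanged
print(consumer(careless_producer(original)))  # silently altered
```

The recipient cannot tell the two cases apart: both inputs are 
syntactically valid, and only one of them decodes to what the sender 
meant.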

The RFC 5987 format, as ugly as it might be, at least has the property 
that the producer needs to make a conscious decision to choose the 
format, and thus hopefully will get it right according to spec.

> The actual encoding to be used need not be specified, ...

Here I disagree even more. Telling people not to use a standard format, 
while *not* telling them what to use instead, is strange.

> ...but the simplest approach would probably be to use RFC3986 %-encoding over a UTF-8 string.

> A more aggressive approach would be to also recommend that new parameters on existing fields (even if they specify use of 5987) SHOULD use such encoding.

-1 to mixing different escaping rules in the same field.

> Thoughts? I'm not going to lie down in the road for this, in that I suspect that most people will gravitate towards this kind of solution naturally, rather than use 5987, but it'd be nice to put clear advice out there.

I'm opposed to discouraging use of RFC 5987 encoding until we have 
something better to recommend (and that includes a specification for it).

I'll also point out that in the meantime, at least one more 
specification uses this format 
so if you are serious about discouraging its use, you really should 
comment on that spec right now.

Best regards, Julian