Re: New Version Notification for draft-kamp-httpbis-structure-01.txt (fwd)

"Poul-Henning Kamp" <> Fri, 11 November 2016 13:47 UTC

To: Julian Reschke <>
cc: HTTP Working Group <>

In message <>, Julian Reschke writes:

>        ascii_string = * %x20-7e
>                # This is a "safe" string in the sense that it
>                # contains no control characters or multi-byte
>                # sequences.  If that is not fancy enough, use
>                # unicode_string.
>        unicode_string = * unicode_codepoint
>                # XXX: Is there a place to import this from ?
>                # Unrestricted unicode, because there is no sane
>                # way to restrict or otherwise make unicode "safe".
>It's not clear why there's even a distinction...

To give designers of HTTP headers a trivial way to define strings
which are "safe" and free from needless complexity vs. strings where
the recipient should be prepared to deal with BOM, RLM and "MAN

>Also, it needs to be stated whether the grammar is octet or character 
>based. For an abstract datamodel, the latter probably makes more sense.

The abstract data model is abstract, so it is obviously neither.

For the h1 serialization, I don't see how it makes a difference, unless
somebody is running HTTP/1 in EBCDIC or Morse-code ?

>        h1_common-structure-header =
>                ( field-name ":" OWS ">" h1_common_structure "<" )
>                        # Self-identifying HTTP headers
>                ( field-name ":" OWS h1_common_structure ) /
>                        # legacy HTTP headers on white-list, see {{iana}}
>Do not mix message block ABNF with field value ABNF. Just define what's 
>inside the field value.
>        h1_element = identifier * (";" identifier ["=" h1_value])
>Shouldn't the second "identifier" be "token"?

yes, probably.

>How would a generic recipient decide whether it needs to handle "\u"? 
>What's the point of having different ABNF productions?

Based on the definition/data-dictionary of the header in question.

Remember:  This is only the data-model/h1-serialization, for each
HTTP header, it will (still) be necessary to define what the data
is/can be.
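
As a rough illustration of that point, a recipient could consult a
per-header data dictionary to decide which string type to accept.
This is only a sketch: the header names and the HEADER_STRING_TYPE
table are hypothetical and not from the draft; only the %x20-7e
range is taken from the quoted ascii_string rule.

```python
# Hypothetical data dictionary: which string type each header's
# definition allows. The header names are invented for illustration.
HEADER_STRING_TYPE = {
    "example-safe": "ascii_string",
    "example-fancy": "unicode_string",
}


def is_ascii_string(value: str) -> bool:
    """True if every character is in %x20-7e, per the quoted ascii_string rule."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in value)


def accept_string(header: str, value: str) -> bool:
    """Decide acceptance from the header's definition, not from the value itself."""
    kind = HEADER_STRING_TYPE.get(header.lower())
    if kind == "ascii_string":
        return is_ascii_string(value)
    if kind == "unicode_string":
        return True  # unrestricted, as in the quoted grammar
    return False  # unknown header: no definition to go by
```

The generic code never inspects the value to guess its type; the
header's definition alone selects the rule.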

>Also: this puts raw non-ASCII UTF-8 in the string value. It's not clear 
>that this is a good idea for HTTP/1, 

Neither is it obvious that it is going to cause any problems.

For reasons of transmission efficiency, I'm not keen on mandating
\uXXXX for all non-ASCII Unicode unless experimentation on the live
Internet indicates that we have to, or if we decide for reasons of purity
that HTTP/1 can never have the high bit set.

Either way, another good reason to keep the "safe" string type.
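
To put a number on the efficiency concern, here is a small sketch
that compares the octet count of raw UTF-8 against the same string
with every non-ASCII code point written as \uXXXX. The function
names are made up for this example, and handling of code points
outside the BMP is deliberately left out.

```python
def escape_non_ascii(text: str) -> str:
    """Replace each non-ASCII code point with \\uXXXX (BMP only in this sketch)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)
        else:
            raise ValueError("code point outside the BMP not handled in this sketch")
    return "".join(out)


def wire_sizes(text: str) -> tuple[int, int]:
    """Return (raw UTF-8 octets, octets when escaped with \\uXXXX)."""
    return len(text.encode("utf-8")), len(escape_non_ascii(text).encode("ascii"))
```

For instance, wire_sizes("blåbær") gives (8, 16): each of the two
non-ASCII letters costs 2 octets raw but 6 octets escaped.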

>        h1_common_structure = ">" h1_common_structure "<"
>That's a bit too recursive

No, that is deliberately making recursion possible, as the simplest
possible way to define complex data structures.
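
A minimal sketch of why the recursive rule is cheap to implement:
one function that calls itself on ">" and returns on "<". The
grammar here is a simplified assumption (comma-separated items,
each either a bare token or a nested ">...<"), not the draft's
actual h1_common_structure production.

```python
def parse_structure(text: str, pos: int = 0):
    """Parse items separated by ","; an item is a token or a nested ">...<".

    Returns (items, next_position). Simplified, hypothetical grammar.
    """
    items = []
    while True:
        if pos < len(text) and text[pos] == ">":
            nested, pos = parse_structure(text, pos + 1)  # recurse on ">"
            if pos >= len(text) or text[pos] != "<":
                raise ValueError("missing closing '<'")
            pos += 1
            items.append(nested)
        else:
            start = pos
            while pos < len(text) and text[pos] not in ",<>":
                pos += 1
            if pos == start:
                raise ValueError(f"expected token at offset {start}")
            items.append(text[start:pos])
        if pos < len(text) and text[pos] == ",":
            pos += 1
            continue
        return items, pos


def parse(text: str):
    """Parse a whole value and reject trailing garbage."""
    items, pos = parse_structure(text)
    if pos != len(text):
        raise ValueError(f"unexpected character at offset {pos}")
    return items
```

With this sketch, parse("a,>b,>c<<,d") yields
['a', ['b', ['c']], 'd'], so arbitrarily nested lists fall out of
a single rule.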

>(speaking of which: "_" isn't allowed in ABNF names)


Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.