Re: SPDY Header Frames

Mike Belshe <mike@belshe.com> Tue, 17 July 2012 07:19 UTC

Date: Tue, 17 Jul 2012 00:18:03 -0700
From: Mike Belshe <mike@belshe.com>
To: James M Snell <jasnell@gmail.com>
Cc: ietf-http-wg@w3.org

i like the direction of this.  a good blend of a registry and extension
headers.

i wasn't quite sure how you got to the 6 byte compressed header, tho.

mike


On Mon, Jul 16, 2012 at 11:51 PM, James M Snell <jasnell@gmail.com> wrote:

> Ok... spent some time this evening playing around with the header frame
> syntax a bit more to see what further optimizations could be made and to
> see if the binary encoded header id's made any noticeable difference in
> size and ease of processing.
>
> Here's the revised structure I played around with...
>
> 1. Within a HEADER block, I assume two possible types of headers,
> REGISTERED and EXTENSION. A REGISTERED header is one that would be known to
> the registrar and assigned a numeric id and a codepage. If the codepage is
> 0, the implication is that the header is MUST UNDERSTAND and is considered
> one of the core headers for the basic operation of the protocol. Codepages
> 1-9 are MUST-IGNORE... that is, if a user-agent or server comes across a
> header on these code pages that is not understood, the header can simply be
> ignored. Codepages 10-14 are PRIVATE USE, with Codepage 10 being reserved
> for MUST UNDERSTAND PRIVATE USE headers. EXTENSION headers are simple name
> value pairs essentially as they exist today.
>
> 2. Within extension headers, the name portion MUST be ASCII and MUST NOT
> be longer than 255 bytes (quite generous really).
>
> 3. Values may be binary or character based, as indicated by a flags field.
> Values may be up to max(int32) in length.
>
> 4. Registered HTTP Methods can be identified by numeric value. Extension
> Methods can be identified by character value.
>
> 5. The structure for REGISTERED HEADERS is...
>
>   +------------------------------+
>   |0| id (15-bit)| flags(8-bit)  |
>   +------------------------------+
>   | len (32-bit) |     value     |
>   +------------------------------+
>
> 6. The structure for EXTENSION HEADERS is...
>
>   +------------------------------+
>   |1| flags(7-bit) | namelen (8) |
>   +------------------------------+
>   | name | val len (32) | value  |
>   +------------------------------+
>
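The codepage rules in item 1 can be sketched as a small lookup. This is only an illustration: the class and enum names are made up here, the codepage is taken as a plain integer because the note does not say how it is packed into the 15-bit id, and anything above 14 is reported as UNASSIGNED since the text does not cover it.

```java
/** Hypothetical helper: maps a codepage number to the handling described in item 1. */
final class CodepagePolicy {
    enum Handling {
        MUST_UNDERSTAND,          // codepage 0: core protocol headers
        MUST_IGNORE,              // codepages 1-9: unknown headers can be skipped
        PRIVATE_MUST_UNDERSTAND,  // codepage 10: private use, must understand
        PRIVATE_USE,              // codepages 11-14: private use
        UNASSIGNED                // assumption: not described in the note
    }

    static Handling classify(int codepage) {
        if (codepage < 0) {
            throw new IllegalArgumentException("codepage must not be negative");
        }
        if (codepage == 0) return Handling.MUST_UNDERSTAND;
        if (codepage <= 9) return Handling.MUST_IGNORE;
        if (codepage == 10) return Handling.PRIVATE_MUST_UNDERSTAND;
        if (codepage <= 14) return Handling.PRIVATE_USE;
        return Handling.UNASSIGNED;
    }
}
```

A receiver could call `CodepagePolicy.classify(cp)` before deciding whether an unrecognized header is safe to drop.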
> Assuming the following registered headers...
>
>   public static final short VERSION = 1;
>   public static final short METHOD = 2;
>   public static final short HOST = 3;
>   public static final short SESSION = 4;
>   public static final short CHARSET = 5;
>   public static final short REQUEST_URI = 6;
>   public static final short ACCEPT_LANG = 4097;
>
> And the following registered methods...
>
>   public static final byte GET = 1;
>   public static final byte POST = 2;
>   public static final byte PUT = 3;
>   public static final byte DELETE = 4;
>   public static final byte PATCH = 5;
>   public static final byte HEAD = 6;
>   public static final byte OPTIONS = 7;
>
> Let's assume that what we want to encode is an HTTP GET for the resource:
>   http://www.example.org/this/is/the/request?is=it&not=beautiful
>
> With a session identifier of "session_key", ACCEPT_LANG = en-US and a
> default charset encoding of "US-ASCII" for all character based header
> values... Let's also add an extension header "ext" with value "foo"...
>
> The Version header can be encoded as:
>   {0, 1, 0, 0, 0, 0, 2, 2, 0}
>
> The GET Method header can be encoded as:
>   {0, 2, 0, 0, 0, 0, 1, 1}
>
> The Host header would be encoded as:
>   {  0,   3,   1,  0,   0,   0,  15, 119, 119, 119,
>     46, 101, 120, 97, 109, 112, 108, 101,  46, 111,
>    114, 103}
>
> The Accept-Lang header would be encoded as:
>   {16, 1, 1, 0, 0, 0, 5, 'e', 'n', '-', 'U', 'S'}
>
> The Extension header ext: foo would be encoded as:
>   {-128, 1, 3, 101, 120, 116, 0, 0, 0, 3, 102, 111, 111}
>
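The REGISTERED layout from item 5 (15-bit id, 8-bit flags, 32-bit length, value) can be sketched in a few lines. The class and method names are hypothetical, and the sketch assumes big-endian integers and a flags value of 1 for character data, which is what the sample bytes above show.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical encoder for the REGISTERED header layout in item 5. */
final class RegisteredHeaderSketch {
    static final int FLAG_CHAR_DATA = 0x1; // assumption: matches the flags byte in the samples

    static byte[] encode(int id, int flags, byte[] value) {
        if (id < 0 || id > 0x7FFF) {
            throw new IllegalArgumentException("id must fit in 15 bits");
        }
        if (flags < 0 || flags > 0xFF) {
            throw new IllegalArgumentException("flags must fit in 8 bits");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(7 + value.length);
        out.write((id >>> 8) & 0x7F);   // top bit stays 0: REGISTERED
        out.write(id & 0xFF);
        out.write(flags);
        int len = value.length;         // 32-bit big-endian value length
        out.write(len >>> 24);
        out.write(len >>> 16);
        out.write(len >>> 8);
        out.write(len);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }
}
```

Calling `encode(3, FLAG_CHAR_DATA, hostBytes)` with the US-ASCII bytes of the host name gives the same 22 bytes as the Host example.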
> The entire header block is encoded into a structure of 145 bytes in length:
>
> [8, 0, 1, 0, 0, 0, 0, 2, 2, 0, 0, 2, 0, 0, 0, 0, 1, 1, 0, 3, 1, 0, 0, 0,
> 15, 119, 119, 119, 46, 101, 120, 97, 109, 112, 108, 101, 46, 111, 114, 103,
> 0, 6, 1, 0, 0, 0, 40, 47, 116, 104, 105, 115, 47, 105, 115, 47, 116, 104,
> 101, 47, 114, 101, 113, 117, 101, 115, 116, 63, 105, 115, 61, 105, 116, 38,
> 110, 111, 116, 61, 98, 101, 97, 117, 116, 105, 102, 117, 108, 0, 5, 1, 0,
> 0, 0, 8, 117, 115, 45, 97, 115, 99, 105, 105, 0, 4, 1, 0, 0, 0, 11, 115,
> 101, 115, 115, 105, 111, 110, 95, 107, 101, 121, 16, 1, 1, 0, 0, 0, 5, 101,
> 110, 45, 85, 83, -128, 1, 3, 101, 120, 116, 0, 0, 0, 3, 102, 111, 111]
>
> By comparison, the same structure encoded using the existing SPDY HEADER
> block would require 208 bytes sans compression.
>
> After applying compression of the block using the SPDY dictionary, the
> block compresses into 6 compact bytes.
>
> [120, 63, -29, -58, -89, -62]
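One way to sanity-check a short compressed result like this is to read it as an RFC 1950 zlib stream header: a CMF byte, a FLG byte, and, when the FDICT bit (0x20) is set in FLG, a four-byte dictionary id. The sketch below only decodes those fields; the class name is hypothetical, and it makes no claim about how the bytes above were produced.

```java
/** Hypothetical reader for the fixed zlib stream header defined in RFC 1950. */
final class ZlibHeaderSketch {
    /** Compression method from the low nibble of CMF (8 means deflate). */
    static int method(byte[] s) {
        return s[0] & 0x0F;
    }

    /** RFC 1950 requires CMF*256 + FLG to be a multiple of 31. */
    static boolean headerCheckOk(byte[] s) {
        return (((s[0] & 0xFF) << 8) | (s[1] & 0xFF)) % 31 == 0;
    }

    /** FDICT: a preset dictionary id follows the two header bytes. */
    static boolean hasPresetDictionary(byte[] s) {
        return (s[1] & 0x20) != 0;
    }

    /** Big-endian dictionary id in bytes 2..5 (only valid when FDICT is set). */
    static long dictionaryId(byte[] s) {
        return ((s[2] & 0xFFL) << 24) | ((s[3] & 0xFFL) << 16)
             | ((s[4] & 0xFFL) << 8) | (s[5] & 0xFFL);
    }

    /** Bytes left for deflate data after the header and optional dictionary id. */
    static int payloadBytes(byte[] s) {
        return s.length - (hasPresetDictionary(s) ? 6 : 2);
    }
}
```

Read this way, the six bytes decode as a deflate header with FDICT set and a dictionary id, and `payloadBytes` comes out as 0, so there would be no deflate data after the header.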
>
> Assuming this structure was used within a SYN_STREAM message,
> unencrypted, a proxy/router that is scanning the headers to determine where
> to route the SYN_STREAM to would need only to look at the first two bytes
> of each header to determine if the header is either the HOST, METHOD,
> REQUEST_URI, VERSION or SESSION identifier. This scheme should prove to be
> significantly faster to scan and perform operations on than the current
> all-text-key-pair model. As always, tho, your mileage may vary.
>
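The scan described above can be sketched against the 145-byte sample block. The names are hypothetical, and the layout is inferred from the sample bytes rather than from the diagrams: one count byte up front, registered headers as id(2) flags(1) len(4) value, and extension headers as a 0x80 marker byte, a flags byte, a name-length byte, the name, a 4-byte value length and the value.

```java
import java.util.Arrays;

/** Hypothetical scanner that walks the sample block looking for one registered id. */
final class HeaderScanSketch {
    static final byte[] SAMPLE = {
        8, 0, 1, 0, 0, 0, 0, 2, 2, 0, 0, 2, 0, 0, 0, 0, 1, 1, 0, 3, 1, 0, 0, 0,
        15, 119, 119, 119, 46, 101, 120, 97, 109, 112, 108, 101, 46, 111, 114, 103,
        0, 6, 1, 0, 0, 0, 40, 47, 116, 104, 105, 115, 47, 105, 115, 47, 116, 104,
        101, 47, 114, 101, 113, 117, 101, 115, 116, 63, 105, 115, 61, 105, 116, 38,
        110, 111, 116, 61, 98, 101, 97, 117, 116, 105, 102, 117, 108, 0, 5, 1, 0,
        0, 0, 8, 117, 115, 45, 97, 115, 99, 105, 105, 0, 4, 1, 0, 0, 0, 11, 115,
        101, 115, 115, 105, 111, 110, 95, 107, 101, 121, 16, 1, 1, 0, 0, 0, 5, 101,
        110, 45, 85, 83, -128, 1, 3, 101, 120, 116, 0, 0, 0, 3, 102, 111, 111
    };

    /** Returns the value of the first registered header with the given id, or null. */
    static byte[] findRegistered(byte[] block, int wantedId) {
        int count = block[0] & 0xFF;   // assumption: single count byte, as in the sample
        int pos = 1;
        for (int i = 0; i < count; i++) {
            if ((block[pos] & 0x80) == 0) {
                // REGISTERED: the first two bytes are enough to identify the header
                int id = ((block[pos] & 0x7F) << 8) | (block[pos + 1] & 0xFF);
                int len = readInt(block, pos + 3);
                if (id == wantedId) {
                    return Arrays.copyOfRange(block, pos + 7, pos + 7 + len);
                }
                pos += 7 + len;
            } else {
                // EXTENSION: skip marker, flags, name length, name, value length, value
                int nameLen = block[pos + 2] & 0xFF;
                int valLen = readInt(block, pos + 3 + nameLen);
                pos += 3 + nameLen + 4 + valLen;
            }
        }
        return null;
    }

    private static int readInt(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
             | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
    }
}
```

`findRegistered(SAMPLE, 3)` returns the Host value without comparing any header names; headers that are not wanted are skipped using only their length fields.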
> /end-experiment
>
> - James
>
> On Fri, Jul 13, 2012 at 3:16 PM, James M Snell <jasnell@gmail.com> wrote:
>
>> This note is intended to provide some additional thoughts for discussion
>> around the design and use of SPDY as the possible basis for HTTP/2.0. The
>> intent is to provide fuel for discussion... comments are definitely welcome.
>>
>> As discussed within draft-tarreau-httpbis-network-friendly-00, and as has
>> been mentioned several times in discussion on list, handling of headers
>> within the current SPDY framing, and in particular the layering of HTTP/1.1
>> messages into SPDY frames is less than optimal. There is significant wasted
>> space, duplication, etc that -- strictly speaking -- really isn't
>> necessary. While I recognize that the following increases the basic
>> complexity of the protocol, it allows fairly significant optimization
>> following the same basic lines of reasoning expressed in
>> draft-tarreau-httpbis-network-friendly-00.
>>
>> Section 2.6.1 of the SPDY draft defines header blocks using the following
>> format:
>>
>>    +------------------------------------+
>>    | Number of Name/Value pairs (int32) |
>>    +------------------------------------+
>>    |     Length of name (int32)         |
>>    +------------------------------------+
>>    |           Name (string)            |
>>    +------------------------------------+
>>    |     Length of value  (int32)       |
>>    +------------------------------------+
>>    |          Value   (string)          |
>>    +------------------------------------+
>>    |           (repeats)                |
>>
>> This structure is used within SYN_STREAM and HEADERS frames.
>>
>> What I propose is the following revised structure:
>>
>>    +------------------------------------+
>>    |     Number of Headers (int32)      |
>>    +------------------------------------+
>>    |T| Flags (7) |     Length (24)      |
>>    +------------------------------------+
>>    |              Data                  |
>>    +------------------------------------+
>>    |T| Flags (7) |     Length (24)      |
>>    +------------------------------------+
>>    |              Data                  |
>>    +------------------------------------+
>>    |             (repeats)              |
>>
>> T is a single bit identifying the Header Type. There are two types:
>> REGISTERED (0) and EXTENSION (1).
>>
>> Flags provides flags for the specific header field. The flag 0x1
>> indicates that the header value contains Character Data. If not set, the
>> value is assumed to consist of raw octets. 0x2 indicates that the value is
>> compressed.
>>
>> Length is an unsigned 24-bit value specifying the number of octets after
>> the length field.
>>
>> When the T bit is NOT set, the Header field is a REGISTERED Header, the
>> structure of which is:
>>
>>    +------------------------------------+
>>    |0| Flags (7) |     Length (24)      |
>>    +------------------------------------+
>>    | ID | Value Length (int32) |Value...|
>>    +------------------------------------+
>>
>> The ID is a 32-bit number uniquely identifying the registered field. Each
>> is assigned by the registrar. For instance, the "Host" field could have a
>> registered value of "1", the "Accept-Lang" field could have a registered
>> value of "6", and so forth.
>>
>> The Value Length is a 32-bit value indicating the length of the value.
>>
>> If Flag 0x1 is set, the value is assumed to contain character data. When
>> set, the value MUST be preceded by a single unsigned 8-bit integer
>> identifying the character encoding utilized. The values are assigned by the
>> registrar. For instance, US-ASCII could have a registered value of "1",
>> while "UTF-8" could have a registered value of "2".
>>
>> For example:
>>
>>    +------------------------------------+
>>    |0| 0000001 |     24                 |
>>    +------------------------------------+
>>    | 1 | 16 | 1 |    www.example.org    |
>>    +------------------------------------+
>>
>> This Header record indicates a REGISTERED header containing character
>> content, the header ID = 1, the charset used is US-ASCII and the value is
>> "www.example.org". The header is expressed with a total of 28 bytes.
>>
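The 28-byte figure can be reproduced with a short sketch of the REGISTERED layout from this note: one byte holding T and the 7 flag bits, a 24-bit Length, a 32-bit ID, a 32-bit Value Length, then the value with its leading charset octet. Names are hypothetical and big-endian byte order is assumed.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical encoder for the REGISTERED header with character data. */
final class OriginalRegisteredSketch {
    static final int FLAG_CHAR_DATA = 0x1;
    static final int CHARSET_US_ASCII = 1; // example registry value used in the note

    static byte[] encodeText(int id, int charsetId, byte[] text) {
        int valueLength = 1 + text.length;   // charset octet + character data
        int length = 4 + 4 + valueLength;    // octets after the Length field
        if (length > 0xFFFFFF) {
            throw new IllegalArgumentException("header too long for a 24-bit length");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(4 + length);
        out.write(FLAG_CHAR_DATA);           // T = 0 (REGISTERED), flags = 0000001
        out.write(length >>> 16);            // 24-bit Length
        out.write(length >>> 8);
        out.write(length);
        writeInt32(out, id);
        writeInt32(out, valueLength);
        out.write(charsetId);
        out.write(text, 0, text.length);
        return out.toByteArray();
    }

    private static void writeInt32(ByteArrayOutputStream out, int v) {
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }
}
```

For ID 1, charset 1 and the 15 bytes of the host name, this yields Length = 24, Value Length = 16 and 28 bytes overall, matching the example.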
>> When the T bit IS set, the Header field is an EXTENSION Header, the
>> structure of which is:
>>
>>    +------------------------------------+
>>    |1| Flags (7) |     Length (24)      |
>>    +------------------------------------+
>>    |      Length of name (int32)        |
>>    +------------------------------------+
>>    |           Name (string)            |
>>    +------------------------------------+
>>    |      Length of value (int32)       |
>>    +------------------------------------+
>>    |              Value                 |
>>    +------------------------------------+
>>
>> For example, an extension header that contains raw binary data:
>>
>>    +------------------------------------+
>>    |1| 0000000 |       Length (24)      |
>>    +------------------------------------+
>>    |                5                   |
>>    +------------------------------------+
>>    |              x-foo                 |
>>    +------------------------------------+
>>    |                4                   |
>>    +------------------------------------+
>>    |           {raw bytes}              |
>>    +------------------------------------+
>>
>> The header is expressed with a total of 21 bytes.
>>
>> The same flags apply. 0x1 indicates that the value is character data. If
>> 0x1 is not set, the value contains raw octets. The key difference is that
>> there is a 32-bit name length and variable length name field in place of
>> the 32-bit ID field in the REGISTERED header. All other details remain the
>> same.
>>
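The 21-byte total for the "x-foo" example can be checked with a similar sketch of the EXTENSION layout, with the T bit set as the text describes. The class name is hypothetical, big-endian integers are assumed, and the four value bytes in the usage below are arbitrary stand-ins for the raw bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical encoder for the EXTENSION header layout in this note. */
final class OriginalExtensionSketch {
    static byte[] encode(int flags, String name, byte[] value) {
        if (flags < 0 || flags > 0x7F) {
            throw new IllegalArgumentException("flags must fit in 7 bits");
        }
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII);
        int length = 4 + nameBytes.length + 4 + value.length; // octets after Length
        if (length > 0xFFFFFF) {
            throw new IllegalArgumentException("header too long for a 24-bit length");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(4 + length);
        out.write(0x80 | flags);             // T = 1 (EXTENSION) plus 7 flag bits
        out.write(length >>> 16);            // 24-bit Length
        out.write(length >>> 8);
        out.write(length);
        writeInt32(out, nameBytes.length);
        out.write(nameBytes, 0, nameBytes.length);
        writeInt32(out, value.length);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    private static void writeInt32(ByteArrayOutputStream out, int v) {
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }
}
```

`encode(0, "x-foo", new byte[] {1, 2, 3, 4})` produces 4 + 4 + 5 + 4 + 4 = 21 bytes, with the Length field holding 17.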
>> As is currently the case in SPDY, if a single header value contains
>> multiple values, each can be separated using a single NUL (0) byte.
>>
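Splitting a multi-valued header on NUL could look like the sketch below. The class name is hypothetical. One caveat, stated as an assumption: a raw-octet value may itself contain zero bytes, so this split is probably only safe for character data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: splits a header value into parts at each NUL (0) byte. */
final class MultiValueSketch {
    static List<byte[]> split(byte[] value) {
        List<byte[]> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= value.length; i++) {
            // Close the current part at a NUL byte or at the end of the value
            if (i == value.length || value[i] == 0) {
                parts.add(Arrays.copyOfRange(value, start, i));
                start = i + 1;
            }
        }
        return parts;
    }
}
```

A value with no NUL byte comes back as a single part, so callers can treat single-valued and multi-valued headers the same way.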
>> There are several advantages to this approach:
>>
>> 1. Commonly used header names are omitted in favor of registered, known
>> numeric IDs, saving space and making it more efficient to scan over
>> commonly used headers. For instance, intermediaries that route requests
>> based on common headers such as Host etc could choose to ignore EXTENSION
>> header fields entirely, and scan only for the ID's of the fields they are
>> interested in, rather than having to parse the entire bag of header names.
>>
>> 2. Header values can be expressed as raw octets or character data.
>> Currently, mechanisms within HTTP require developers to muck around with
>> Base64 encoding or other encodings when including detail within a header.
>> This approach would eliminate that extra step. For instance, if I wanted to
>> have a Content-Integrity header whose value is an hmac digest, I would be
>> able to drop the raw bytes of the digest into the header value rather than
>> base64 or hex encoding it into an ASCII string, saving CPU cycles and
>> reducing the amount of data that must be transmitted.
>>
>> 3. Header values that contain character data would not be limited to
>> US-ASCII. Multiple charset encodings would be allowed... obviously this has
>> a whole slew of issues associated with it that need to be carefully
>> considered. The charset encoding flag could be dropped, if necessary, from
>> this proposal.
>>
>> For HTTP/1.1 Compatibility, each REGISTERED Header would be mapped to a
>> known, registered HTTP/1.1 header, allowing one to one translation from the
>> optimized form to the HTTP/1.1 form. Binary values would be base64-encoded.
>> If a particular header does not allow for Base64 encoded values under
>> HTTP/1.1, the down-level recipient would have the option of responding with
>> an appropriate 404 response.
>>
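The down-level mapping could be sketched as follows, assuming the registered ID has already been resolved to its HTTP/1.1 field name and that character data is US-ASCII. The class and method names are hypothetical; `java.util.Base64` is the standard JDK encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical translation of one header into an HTTP/1.1 header line. */
final class DownlevelSketch {
    static String toHttp11Line(String name, byte[] value, boolean characterData) {
        String text = characterData
                ? new String(value, StandardCharsets.US_ASCII)   // pass text through
                : Base64.getEncoder().encodeToString(value);     // binary becomes base64
        return name + ": " + text;
    }
}
```

For instance, a binary Content-Integrity value of the three bytes 1, 2, 3 would be sent down-level as `Content-Integrity: AQID`.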
>> That's it for now. There are additional considerations to be given to the
>> specific selection of header fields to include within the SYN_STREAM vs.
>> follow-on HEADERS frames but that's a separate conversation. As always,
>> feedback is welcome...
>>
>> - James
>>
>>
>