Re: Misc review notes for draft-18 p1
Mark Nottingham <mnot@mnot.net> Tue, 07 February 2012 03:25 UTC
Return-Path: <ietf-http-wg-request@listhub.w3.org>
X-Original-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Delivered-To: ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 4A01D11E80B0 for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Mon, 6 Feb 2012 19:25:10 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -8.852
X-Spam-Level:
X-Spam-Status: No, score=-8.852 tagged_above=-999 required=5 tests=[AWL=1.747, BAYES_00=-2.599, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id XmQA8wj0Y+rD for <ietfarch-httpbisa-archive-bis2Juki@ietfa.amsl.com>; Mon, 6 Feb 2012 19:25:08 -0800 (PST)
Received: from frink.w3.org (frink.w3.org [128.30.52.56]) by ietfa.amsl.com (Postfix) with ESMTP id B4CC211E80BB for <httpbisa-archive-bis2Juki@lists.ietf.org>; Mon, 6 Feb 2012 19:25:08 -0800 (PST)
Received: from lists by frink.w3.org with local (Exim 4.69) (envelope-from <ietf-http-wg-request@listhub.w3.org>) id 1RubfF-000861-Mr for ietf-http-wg-dist@listhub.w3.org; Tue, 07 Feb 2012 03:24:17 +0000
Received: from maggie.w3.org ([128.30.52.39]) by frink.w3.org with esmtp (Exim 4.69) (envelope-from <mnot@mnot.net>) id 1Rubf2-000849-Q1 for ietf-http-wg@listhub.w3.org; Tue, 07 Feb 2012 03:24:04 +0000
Received: from fallback-out2.mxes.net ([216.86.168.191] helo=fallback-in2.mxes.net) by maggie.w3.org with esmtp (Exim 4.72) (envelope-from <mnot@mnot.net>) id 1Rubdy-000620-2O for ietf-http-wg@w3.org; Tue, 07 Feb 2012 03:23:29 +0000
Received: from mxout-07.mxes.net (mxout-07.mxes.net [216.86.168.182]) by fallback-in1.mxes.net (Postfix) with ESMTP id A9CA52FD7C0 for <ietf-http-wg@w3.org>; Mon, 6 Feb 2012 22:21:07 -0500 (EST)
Received: from [192.168.1.74] (unknown [118.209.240.235]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by smtp.mxes.net (Postfix) with ESMTPSA id 8705C22E247; Mon, 6 Feb 2012 22:18:34 -0500 (EST)
Mime-Version: 1.0 (Apple Message framework v1257)
Content-Type: text/plain; charset="us-ascii"
From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <20120126155637.GA11227@1wt.eu>
Date: Tue, 07 Feb 2012 14:18:31 +1100
Cc: "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, "Julian F. Reschke" <julian.reschke@gmx.de>
Content-Transfer-Encoding: quoted-printable
Message-Id: <7F83A41A-2E06-4743-969C-E27967A1ACD5@mnot.net>
References: <20120126155637.GA11227@1wt.eu>
To: Willy Tarreau <w@1wt.eu>
X-Mailer: Apple Mail (2.1257)
Received-SPF: pass client-ip=216.86.168.191; envelope-from=mnot@mnot.net; helo=fallback-in2.mxes.net
X-W3C-Hub-Spam-Status: No, score=-1.9
X-W3C-Hub-Spam-Report: BAYES_00=-1.9, TIME_LIMIT_EXCEEDED=0.001
X-W3C-Scan-Sig: maggie.w3.org 1Rubdy-000620-2O 6ce2c80183e92479eb80bddc588f1b9e
X-Original-To: ietf-http-wg@w3.org
Subject: Re: Misc review notes for draft-18 p1
Archived-At: <http://www.w3.org/mid/7F83A41A-2E06-4743-969C-E27967A1ACD5@mnot.net>
Resent-From: ietf-http-wg@w3.org
X-Mailing-List: <ietf-http-wg@w3.org> archive/latest/12357
X-Loop: ietf-http-wg@w3.org
Resent-Sender: ietf-http-wg-request@w3.org
Precedence: list
List-Id: <ietf-http-wg.w3.org>
List-Help: <http://www.w3.org/Mail/>
List-Post: <mailto:ietf-http-wg@w3.org>
List-Unsubscribe: <mailto:ietf-http-wg-request@w3.org?subject=unsubscribe>
Resent-Message-Id: <E1RubfF-000861-Mr@frink.w3.org>
Resent-Date: Tue, 07 Feb 2012 03:24:17 +0000
On 27/01/2012, at 2:56 AM, Willy Tarreau wrote: > Hi, > > I haven't finished reading p1 but I already have some comments, so > I'm sending them here and will proceed with what remains. Hi Willy, Thanks for that. I'll add a few comments below. > 2.1. Client/Server Messaging, page 11 > >> Note that 1xx responses (Section 7.1 of [Part2]) are not final; >> therefore, a server can send zero or more 1xx responses, followed by >> exactly one final response (with any other status code). > > This parts falls here quite out of context in my opinion. Neither > responses nor status core nor messaging has been defined yet and all > of a sudden we get this. I suggest we move this to P2 7.1 and replace > it with a small note such as : > > Note that sometimes a server may send multiple responses, see Section > 7.1 of [Part2] for more details about interim responses. I'll leave it to the editors to take this as input. > 2.4. Intermediaries, page 13 > > Context : >>>>>> >> UA =========== A =========== B =========== C =========== O >> < < < < > ... > >> For example, B might be receiving >> requests from many clients other than A, and/or forwarding requests >> to servers other than C, at the same time that it is handling A's >> request. > > I'd underline that there is no single path between a UA and an intermediary, > and that sometimes direct and indirect communications are possible. It helps > remind people that rewriting URLs along the path is not always a good idea. > I'd suggest this then : > > For example, B might be receiving requests from many clients other than A > including UA/C/O, and/or forwarding requests to servers other than C, at > the same time that it is handling A's request. To the editors (generally, I agree). > Later : > >> An HTTP-to-HTTP proxy is called a "transforming proxy" if it is >> designed or configured to modify request or response messages in a >> semantically meaningful way (i.e., modifications, beyond those >> required by normal HTTP processing, that change the message in a way >> that would be significant to the original sender or potentially >> significant to downstream recipients). > > It is not totally clear to me if a compressing proxy is a transforming > proxy, nor if one that rewrites Location headers to normalize them is > a transforming proxy. There's always going to be a fuzzy line here, I think. > 2.7.1. http URI scheme > >> If the host identifier is provided as an IP literal or IPv4 address, > > I did not find a clear definition of the term "IP literal". Also, does it > cover the bracketed format of IPv6 ? > > > 3.3. Message Body > >> The length of the message-body is determined by one of the following >> (in order of precedence): >> >> 1. Any response to a HEAD request and any response with a status >> code of 100-199, 204, or 304 is always terminated by the first >> empty line after the header fields, regardless of the header >> fields present in the message, and thus cannot contain a message- >> body. > > Now that we've included the CONNECT method in the spec, I think it makes > sense to define whether it has a body or not in case of success. I've > found myself sometimes adding "Content-length: 0" as well as huge values > in the past on some CONNECT requests to help interoperability with broken > proxies, as well as "Connection: close" on these similar requests. Obviously > the implementations were faulty but a faulty implementation often results > from ambiguous specs. > > Could we suggest that as a first rule, a 200 response to a CONNECT request > implies an infinite content length (I don't like that very much since it's > false), or that it has no message body and that the connection is immediately > switched to a tunnel ? > > 0. Any response with a status code of 200 to a CONNECT request does not > contain any message-body and immediately switches to a tunnel (Section > 6.9 of [Part2]). > > Also, since I've seen some implementations send "Content-length: 0" on > CONNECT requests (which I happened to mimmick once), I'm realizing that > it's not always obvious what to send on responses where no content is > expected. Would it make sense to insist on the fact that it is not > necessary to send "Content-length: 0" on messages which do not have a > body by the rules above ? This should be covered by <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/250>. > 3.5. Message Parsing Robustness > >> Likewise, although the line terminator for the start-line and header >> fields is the sequence CRLF, we recommend that recipients recognize a >> single LF as a line terminator and ignore any CR. > > Does this mean that CR CR CR CR CR CR LF should be interpreted as a single > LF ? It kinds of scares me on the risk of smuggling attacks. I'd rather > suggest : > > ... we recommend that recipients recognize a single LF as a line > terminator and ignore the optional preceeding CR. Messages containing > a CR not followed by an LF MUST be rejected. I've created <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/340>. >> When a server listening only for HTTP request messages, or processing >> what appears from the start-line to be an HTTP request message, >> receives a sequence of octets that does not match the HTTP-message > > Wouldn't "does not *exactly* match" be better ? I'm used to find > crappy requests in my logs which are blocked but which some not-so-lazy > implementations would let pass (eg: multiple SP). I don't think that adding a word here is going to change those implementations behaviour, unfortunately. >> grammar aside from the robustness exceptions listed above, the server >> MUST respond with an HTTP/1.1 400 (Bad Request) response. > > I would also suggest that clients and proxies protect themselves against > malformed response messages, which are problematic in shared hosting > environments. This could be summarized like this : > > In general, any agent which receives a malformed message MUST NOT try > to fix it if there is any possibility that any other implementation > along the chain understands it differently. In such conditions, the > message MUST be rejected. This will make too many implementations non-conformant. > 4.1. Types of Request Target > >> Note: The "no rewrite" rule prevents the proxy from changing the > > I did not find reference to this "no rewrite" rule. > > > 4.2. The Resource Identified by a Request > >> 1. If request-target is an absolute-URI, the host is part of the >> request-target. Any Host header field value in the request MUST >> be ignored. >> >> 2. If the request-target is not an absolute-URI, and the request >> includes a Host header field, the host is determined by the Host >> header field value. >> >> 3. If the host as determined by rule 1 or 2 is not a valid host on >> the server, the response MUST be a 400 (Bad Request) error >> message. > > Rule 3 might be difficult to apply in massively hosted environments, as > I easily imagine that there could be a large "vhosts" directory with > all the hosts roots presented by their names there. The server would > then simply try to "cd $host" to check for the host's validity, which > might seem appropriate at first. But using a host of ".." or a host > containing a slash would have dramatic effects. > > I don't know what recommendation we could add here because we can't > add boring long sentences, but avoiding such simple traps would be > nice. Maybe we should just add : > > For instance, a host should never be ".." nor contain a slash. That's an implementation-specific security concern, not a general HTTP one. > 7.4. Use of HTTP by other protocols > > It would make sense to list WebSocket here too since it's the first large > scale user of the Upgrade mechanism. > > > 8.4. TE > >> The presence of the keyword "trailers" indicates that the client is >> willing to accept trailer fields in a chunked transfer-coding, as > > Is it only limited to the client ? Nowhere it's said that a server cannot > advertise "TE: trailers" in responses so that a client knows it can emit > chunked-encoded messages with trailers in further requests (eg: backups > with SHA1 at the end). Replace "client" with "sender" maybe ? The current wording doesn't prohibit other uses (or users) in the future... That said, Julian is right, we need to clarify this. > 8.5. Trailer > >> If no Trailer header field is present, the trailer SHOULD NOT include >> any header fields. See Section 5.1.1 for restrictions on the use of >> trailer fields in a "chunked" transfer-coding. > > in 5.1.1 p37, it's said : > >> A server using chunked transfer-coding in a response MUST NOT use the >> trailer for any header fields unless at least one of the following is >> true: > > Is the SHOULD NOT vs MUST NOT difference on purpose ? It seems to indicate > that there should be a tolerance when parsing unadvertised trailers (I'm > fine with this, just checking whether the wording is expected). I think so. > A.1.2 Keep-Alive Connections > >> Clients are also encouraged to consider the use of Connection: keep- >> alive in requests carefully; while they can enable persistent >> connections with HTTP/1.0 servers, clients using them need will need >> to monitor the connection for "hung" requests (which indicate that >> the client ought stop sending the header), > > I know a number of people who use the term "the header" to designate all > the headers section. I must say that when I read this sentence, it was > unclear to me upon first reading that the intent was in fact to stop > sending "Connection: keep-alive" in subsequent requests, as it can also > be understood as "stop sending the headers as long as the connection > hangs" (which does not make sense). > > I'd suggest the following change : > > - the client ought stop sending the header), > + the client ought stop using this header in further communications with > + the server), .. or just "in future requests." To the editors. > At a number of places it is suggested to "close the connection". I > think we could add an annex such as the following one, with references > everywhere we suggest closing the connection, as well as one pointer > in "6.1.2.2 Pipelining" : > > A.x.x Closing a Connection > > When a server needs to close a connection, it must ensure that doing so > will not risk prematurely terminate any previous response. When TCP > segments are still in flight during a socket close, operating systems > generally turn the socket to orphaned state, during which lingering data > will still be emitted but any received data would cause an immediate > connection abort. The connection may also be aborted when the system > is getting low on orphaned sockets. This means that a close before all > lingering data are acknowledged by the client might result in a loss of > unacknowledged data. This is a very common issue when performing a > redirect upon a POST request before all the client's body has been read. > While this is not always an issue when a server wants to abort a current > request, it becomes a real issue when the client tries to pipeline requests, > because aborting the current request may also result in destroying previous > unacknowledged response too, possibly causing a client to retry already > processed requests that it believes were ignored. > > The proper way for a server to close a connection without risking issues > described above is the following : > > 1) shutdown the transmit channel, usually using the shutdown() system > call. > 2) drain any incoming data and (if possible) check for any lingering > data in the transmit queue. > 3) when the receive channel reports a shutdown, or when all transmitted > data have been acknowledged, or when enough time has elapsed, perform > the close() on the socket. > > Operating systems do not always easily report the amount of lingering data > and will not always wake up when the queue is empty. A tradeoff has to be > found between keeping connections alive for too long a time and risking > closing too early and having some clients get truncated or empty responses. My .02 - this seems more like implementation-specific advice; there are cases where this will not be the case. What do others think? -- Mark Nottingham http://www.mnot.net/
- Misc review notes for draft-18 p1 Willy Tarreau
- Re: Misc review notes for draft-18 p1 Julian Reschke
- Re: Misc review notes for draft-18 p1 Zhong Yu
- RE: Misc review notes for draft-18 p1 Jeroen de Borst
- Re: Misc review notes for draft-18 p1 Zhong Yu
- Re: Misc review notes for draft-18 p1 Willy Tarreau
- Re: Misc review notes for draft-18 p1 Willy Tarreau
- Re: Misc review notes for draft-18 p1 Amos Jeffries
- Re: Misc review notes for draft-18 p1 Amos Jeffries
- Re: Misc review notes for draft-18 p1 Henrik Nordström
- Re: Misc review notes for draft-18 p1 Mark Nottingham
- #341: whitespace in request-lines and status-lines Mark Nottingham
- #340: CR CR LF Mark Nottingham
- Re: Misc review notes for draft-18 p1 Amos Jeffries
- Re: #340: CR CR LF Amos Jeffries
- Re: #341: whitespace in request-lines and status-… Mark Nottingham
- Re: #341: whitespace in request-lines and status-… Willy Tarreau
- Re: #341: whitespace in request-lines and status-… Amos Jeffries
- Re: #341: whitespace in request-lines and status-… Willy Tarreau
- Re: #341: whitespace in request-lines and status-… Zhong Yu
- Re: #341: whitespace in request-lines and status-… Julian Reschke
- Re: #341: whitespace in request-lines and status-… Mark Nottingham
- Re: #341: whitespace in request-lines and status-… Roy T. Fielding
- Re: #341: whitespace in request-lines and status-… Willy Tarreau
- Re: #341: whitespace in request-lines and status-… Mark Nottingham
- Re: #341: whitespace in request-lines and status-… Willy Tarreau
- Re: #341: whitespace in request-lines and status-… Mark Nottingham
- Re: #340: CR CR LF Mark Nottingham