Re: Misc review notes for draft-18 p1

Julian Reschke <julian.reschke@gmx.de> Thu, 26 January 2012 18:16 UTC

Message-ID: <4F219829.4000704@gmx.de>
Date: Thu, 26 Jan 2012 19:15:05 +0100
From: Julian Reschke <julian.reschke@gmx.de>
To: Willy Tarreau <w@1wt.eu>
CC: ietf-http-wg@w3.org
References: <20120126155637.GA11227@1wt.eu>
In-Reply-To: <20120126155637.GA11227@1wt.eu>
Subject: Re: Misc review notes for draft-18 p1
Archived-At: <http://www.w3.org/mid/4F219829.4000704@gmx.de>

On 2012-01-26 16:56, Willy Tarreau wrote:
> Hi,
>
> I haven't finished reading p1 but I already have some comments, so
> I'm sending them here and will proceed with what remains.
>
>
> 2.1. Client/Server Messaging, page 11
>
>>    Note that 1xx responses (Section 7.1 of [Part2]) are not final;
>>    therefore, a server can send zero or more 1xx responses, followed by
>>    exactly one final response (with any other status code).
>
> This part falls quite out of context here, in my opinion. Neither
> responses nor status codes nor messaging have been defined yet, and all
> of a sudden we get this. I suggest we move this to P2 7.1 and replace
> it with a small note such as:
>
>    Note that sometimes a server may send multiple responses, see Section
>    7.1 of [Part2] for more details about interim responses.

We did that totally on purpose, see 
<http://trac.tools.ietf.org/wg/httpbis/trac/ticket/300>.
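For illustration, here is a minimal client-side sketch of the rule quoted above (zero or more 1xx responses, then exactly one final response). The function name split_responses is hypothetical. It works on already-parsed status codes only, and it ignores the special case of 101 (Switching Protocols), after which the connection no longer carries HTTP/1.1 messages.

```python
def split_responses(status_codes):
    """Hypothetical helper: split the status codes read for one request into
    the interim (1xx) codes and the single final status code."""
    interim = []
    for code in status_codes:
        if 100 <= code <= 199:
            # 1xx responses are not final; keep reading.
            interim.append(code)
            continue
        # The first non-1xx status code is the one final response;
        # anything after it belongs to the next request on the connection.
        return interim, code
    raise ValueError("connection ended before a final response")


print(split_responses([100, 102, 200]))
```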

> 2.4. Intermediaries, page 13
>
> Context:
>>             >              >              >              >
>>        UA =========== A =========== B =========== C =========== O
>>                   <              <              <              <
> ...
>
>>    For example, B might be receiving
>>    requests from many clients other than A, and/or forwarding requests
>>    to servers other than C, at the same time that it is handling A's
>>    request.
>
> I'd underline that there is no single path between a UA and an intermediary,
> and that sometimes direct and indirect communications are possible. It helps
> remind people that rewriting URLs along the path is not always a good idea.
> I'd suggest this, then:
>
>      For example, B might be receiving requests from many clients other than A
>      including UA/C/O, and/or forwarding requests to servers other than C, at
>      the same time that it is handling A's request.

UA I see, but C and O?

> ...
> 2.7.1. http URI scheme
>
>>     If the host identifier is provided as an IP literal or IPv4 address,
>
> I did not find a clear definition of the term "IP literal". Also, does it
> cover the bracketed format of IPv6?

I think we need to ref 
<http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.3.2.2> here.
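For reference, the host grammar in RFC 3986, Section 3.2.2 makes the bracketed IPv6 form part of IP-literal: IP-literal = "[" ( IPv6address / IPvFuture ) "]". Below is a rough Python sketch of that classification. classify_host is a hypothetical name, zone identifiers are rejected (RFC 3986 has no syntax for them), and reg-name characters are not validated.

```python
import ipaddress
import re

# IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
_IPVFUTURE = re.compile(r"[vV][0-9A-Fa-f]+\.[A-Za-z0-9\-._~!$&'()*+,;=:]+")


def classify_host(host):
    """Classify a URI host component following RFC 3986, Section 3.2.2."""
    if host.startswith("[") and host.endswith("]"):
        inner = host[1:-1]
        if _IPVFUTURE.fullmatch(inner):
            return "IPvFuture"
        # Recent versions of Python's ipaddress accept "%zone" suffixes;
        # RFC 3986 IP-literals do not, so reject them explicitly.
        if "%" in inner:
            raise ValueError("zone identifier not allowed: %r" % host)
        try:
            ipaddress.IPv6Address(inner)
        except ValueError:
            raise ValueError("malformed IP-literal: %r" % host) from None
        return "IPv6address"
    try:
        ipaddress.IPv4Address(host)
        return "IPv4address"
    except ValueError:
        # Anything else is treated as a reg-name (characters not checked).
        return "reg-name"
```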

> ...
> 3.5. Message Parsing Robustness
>
>>    Likewise, although the line terminator for the start-line and header
>>    fields is the sequence CRLF, we recommend that recipients recognize a
>>    single LF as a line terminator and ignore any CR.
>
> Does this mean that CR CR CR CR CR CR LF should be interpreted as a single
> LF? It kind of scares me given the risk of smuggling attacks. I'd rather
> suggest:
>
>      ... we recommend that recipients recognize a single LF as a line
>      terminator and ignore the optional preceding CR. Messages containing
>      a CR not followed by an LF MUST be rejected.

Sounds good to me.
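Here is a minimal sketch of the line splitting Willy proposes: CRLF or a bare LF is accepted as the terminator, and any CR not immediately followed by LF is rejected. It assumes the complete header block is already buffered. split_header_lines is a hypothetical name and does not come from the draft or any library.

```python
def split_header_lines(data):
    """Split a buffered header block (bytes) into lines, accepting CRLF or a
    bare LF as terminator and rejecting any CR not followed by LF."""
    lines = []
    start = 0
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x0D:  # CR: only valid as the first half of CRLF
            if i + 1 >= len(data) or data[i + 1] != 0x0A:
                raise ValueError("bare CR at offset %d" % i)
            lines.append(data[start:i])
            i += 2
            start = i
        elif byte == 0x0A:  # bare LF accepted as a line terminator
            lines.append(data[start:i])
            i += 1
            start = i
        else:
            i += 1
    if start != len(data):
        raise ValueError("unterminated line")
    return lines
```

With this rule, the CR CR ... CR LF sequence from the question above is an error rather than a single line terminator.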

>>    When a server listening only for HTTP request messages, or processing
>>    what appears from the start-line to be an HTTP request message,
>>    receives a sequence of octets that does not match the HTTP-message
>
> Wouldn't "does not *exactly* match" be better? I'm used to finding
> crappy requests in my logs which are blocked but which some not-so-lazy
> implementations would let pass (e.g., multiple SP).

"match" means "match"; I don't think there's any ambiguity here...
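As a rough illustration of a literal "match" for the start-line, here is a strict request-line check in Python. It assumes that method is a token, that the request-target is simplified to one or more visible ASCII characters, and that the version is "HTTP/" DIGIT "." DIGIT. The function name is hypothetical. Anything else, including the multiple-SP case, fails to match, and the server would then respond with 400.

```python
import re

# method = token; request-target simplified to visible ASCII (an assumption);
# exactly one SP between the three parts, no trailing whitespace.
_REQUEST_LINE = re.compile(
    rb"([!#$%&'*+\-.^_`|~0-9A-Za-z]+)"
    rb" ([\x21-\x7e]+)"
    rb" HTTP/([0-9])\.([0-9])"
)


def parse_request_line(line):
    """Return (method, target, (major, minor)), or None if the line does not
    match exactly (the caller would then send 400 Bad Request)."""
    m = _REQUEST_LINE.fullmatch(line)
    if m is None:
        return None
    method, target, major, minor = m.groups()
    return method, target, (int(major), int(minor))
```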

>>    grammar aside from the robustness exceptions listed above, the server
>>    MUST respond with an HTTP/1.1 400 (Bad Request) response.
>
> I would also suggest that clients and proxies protect themselves against
> malformed response messages, which are problematic in shared hosting
> environments. This could be summarized as follows:
>
>      In general, any agent which receives a malformed message MUST NOT try
>      to fix it if there is any possibility that any other implementation
>      along the chain understands it differently. In such conditions, the
>      message MUST be rejected.

-0.5.

- it's a requirement hard to test for, and

- it's not going to be implemented by browsers.

> 4.1. Types of Request Target
>
>> Note: The "no rewrite" rule prevents the proxy from changing the
>
> I did not find reference to this "no rewrite" rule.

It's the rule above the note.

-> <http://trac.tools.ietf.org/wg/httpbis/trac/changeset/1517>

> 4.2. The Resource Identified by a Request
>
>>    1.  If request-target is an absolute-URI, the host is part of the
>>        request-target.  Any Host header field value in the request MUST
>>        be ignored.
>>
>>    2.  If the request-target is not an absolute-URI, and the request
>>        includes a Host header field, the host is determined by the Host
>>        header field value.
>>
>>    3.  If the host as determined by rule 1 or 2 is not a valid host on
>>        the server, the response MUST be a 400 (Bad Request) error
>>        message.
>
> Rule 3 might be difficult to apply in massively hosted environments, as
> I can easily imagine a large "vhosts" directory containing each
> host's root directory, named after the host. The server would
> then simply try to "cd $host" to check the host's validity, which
> might seem appropriate at first. But using a host of ".." or a host
> containing a slash would have dramatic effects.
>
> I don't know what recommendation we could add here because we can't
> add boring long sentences, but avoiding such simple traps would be
> nice. Maybe we should just add:
>
>      For instance, a host should never be ".." nor contain a slash.

Are those allowed in a host name anyway?
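On that question: in RFC 3986, reg-name = *( unreserved / pct-encoded / sub-delims ), and "." is unreserved, so ".." is syntactically a valid reg-name. A literal "/" is not allowed, but "%2F" is. Below is a sketch of rules 1 to 3 that avoids the "cd $host" trap by checking the host against a fixed table of configured hosts instead of using it as a path. CONFIGURED_HOSTS and resolve_host are hypothetical. Treating a missing Host field as a 400 is a simplification, and CONNECT's authority-form is not handled.

```python
from urllib.parse import urlsplit

# Hypothetical set of virtual hosts this server is configured to serve.
CONFIGURED_HOSTS = {"example.com", "www.example.com"}


def resolve_host(request_target, host_header):
    """Apply rules 1-3: return the host to serve, or None for a 400."""
    parts = urlsplit(request_target)
    if parts.scheme:
        # Rule 1: absolute-URI; any Host header field is ignored.
        host = parts.hostname
    elif host_header is not None:
        # Rule 2: take the host from the Host header field (drop any port).
        host = urlsplit("//" + host_header).hostname
    else:
        return None
    # Rule 3: only hosts the server actually knows are valid. Looking the
    # name up in a fixed table means values such as ".." never reach the
    # filesystem.
    if host is None or host not in CONFIGURED_HOSTS:
        return None
    return host
```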

> ...
> 8.4. TE
>
>>    The presence of the keyword "trailers" indicates that the client is
>>    willing to accept trailer fields in a chunked transfer-coding, as
>
> Is it limited to the client? Nowhere is it said that a server cannot
> advertise "TE: trailers" in responses so that a client knows it can emit
> chunked-encoded messages with trailers in further requests (e.g., backups
> with a SHA1 at the end). Maybe replace "client" with "sender"?

We seem to be confused about who can set TE anyway:

"The "TE" header field indicates what extension transfer-codings it is 
willing to accept in the response, and whether or not it is willing to 
accept trailer fields in a chunked transfer-coding."

We need to state who "it" is...
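Whichever party ends up being allowed to send TE, the recipient's check for the "trailers" keyword is the same. Here is a rough Python sketch. te_accepts_trailers is a hypothetical name. It splits the list on commas naively, so it assumes that no parameter value is a quoted string containing a comma.

```python
def te_accepts_trailers(te_value):
    """Return True if a TE header field value lists the "trailers" keyword."""
    for member in te_value.split(","):
        # Members may carry parameters such as ";q=0.5"; only the name
        # before the first ";" matters here. Names are case-insensitive.
        name = member.split(";", 1)[0].strip().lower()
        if name == "trailers":
            return True
    return False
```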

> ...
> A.1.2 Keep-Alive Connections
>
>>    Clients are also encouraged to consider the use of Connection: keep-
>>    alive in requests carefully; while they can enable persistent
>>    connections with HTTP/1.0 servers, clients using them need will need
>>    to monitor the connection for "hung" requests (which indicate that
>>    the client ought stop sending the header),
>
> I know a number of people who use the term "the header" to designate the
> whole header section. I must say that when I read this sentence, it was
> unclear to me upon first reading that the intent was in fact to stop
> sending "Connection: keep-alive" in subsequent requests, as it can also
> be understood as "stop sending the headers as long as the connection
> hangs" (which does not make sense).
>
> I'd suggest the following change:
>
> -   the client ought stop sending the header),
> +   the client ought stop using this header in further communications with
> +   the server),

"...ought to stop using this header field in further ..."?

> ...
> That's all for me now, I'll probably have other comments later.
> ...

Thanks a lot for that; I tried to comment where I had some confidence in 
the resolution.

We probably need to figure out a way to manage the feedback better; 
maybe recommend sending smaller chunks with meaningful subject lines, so 
threading works properly?

Best regards, Julian
