PP4: The Three Laws of HTTP Usage
Lisa Dusseault <ldusseault@commerce.net> Fri, 11 January 2008 01:19 UTC
To: Apps Discuss <discuss@apps.ietf.org>
The Three Laws of HTTP Usage
Lisa Dusseault
Jan 10, 2008

This is an attempt to rationalize and justify the requirements we make for IETF standards that use HTTP. The rationalization is inspired by the Three Laws of Robotics described by Isaac Asimov. This categorization may help us figure out who is protected by HTTP reuse guidelines:

 - deployed systems,
 - implementors of the new application,
 - protocol designers (especially future application extensions), or
 - nobody.

These guidelines and requirements may be less applicable to HTTP extensions, such as WebDAV and Atom, where the application makes use of many HTTP resource features and not just its transport characteristics. Those kinds of extensions are even harder to do and require HTTP experts with deep familiarity with RFC2616.

A META-NOTE on the state of this document: the reminders and requirements in section 2 are intended to be complete in this draft, not reasonable. I look forward to very interesting discussions about which requirements are not reasonable, particularly if such discussions can happen before HTTP is revised by the HTTPBIS WG, in time to influence that revision.

1. First Law: Do no harm

These are requirements on the design of the application, particularly whether it uses GET or POST and whether it registers new MIME types or ports. These questions should be dealt with early in protocol design and specification.

1.1 Allow filtering

Some organizations need to filter or otherwise identify traffic from various HTTP applications. Here are the most reasonable ways to filter HTTP traffic:

 - a well-known site
 - new methods
 - a new HTTP version
 - a new port
 - a new MIME type

Defining a single site is not usually appropriate for IETF standards. Setting a new version for a protocol extending HTTP would be pretty nuts. New methods are more often used, and more appropriate, in HTTP extensions than in applications that just want to transmit application requests to HTTP servers for handling by a lightweight extension module.
Ruling those three out for most cases leaves us with new port numbers and new MIME types. Both of these use a registry, which helps people monitoring traffic figure out what is going on. BCP 56 recommends new ports, but new MIME types are becoming more common and seem acceptable under these "moral" principles. To make filtering with MIME types reasonable, the MIME type must be unique to the application and registered, and both requests and responses that have bodies MUST use filterable MIME types. It's a little hard to imagine an application using HTTP as a transport that doesn't have any message bodies, but that would definitely be a special case to consider if it were proposed.

1.2 Support intermediaries

A few features must be supported correctly, or else intermediaries may do the wrong thing. Since intermediaries can cache responses and retransmit requests, the application needs to support caching and retransmission correctly. Intermediaries are allowed to retransmit any request using a method that is defined as idempotent; the most important such method is GET. Thus, an application can only tunnel over GET requests if no harm results from intermediaries retransmitting GET requests. An example of an idempotent application GET request is querying for a certificate from a repository. The POST method should be used to tunnel non-idempotent requests. Since GET responses can be cached, application responses to GET requests need to be cacheable or carry the correct cache-prevention headers.

2. Second Law: Obey orders

This section is more about helping implementations get HTTP right than about designing the new application. A specification using HTTP can achieve theoretical compliance with RFC2616 just by making a normative reference. However, implementations of such specifications typically do not comply properly with RFC2616. Perhaps implementors need a couple of reminders?
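One such oft-missed requirement is proper HEAD handling. Here is a minimal sketch using Python's standard http.server (the handler, payload, and port choice are illustrative, not from the original post) of a server that returns the same headers for HEAD as for GET but no body, and lets unrecognized methods fail with an error:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import http.client
import threading

BODY = b'{"status": "ok"}'  # illustrative payload

class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)   # GET: same headers, plus the body

    def do_HEAD(self):
        self._send_headers()     # HEAD: identical headers, no body

    # Any method without a do_X() handler gets 501 Not Implemented from
    # BaseHTTPRequestHandler, covering the unrecognized-methods reminder.

    def log_message(self, *args):
        pass                     # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("HEAD", "/")
resp = conn.getresponse()
body = resp.read()
server.shutdown()
print(resp.status, len(body))    # 200 0 -- headers arrive, body is empty
```

A client library that issues HEAD to probe a resource relies on exactly this behaviour; a server that sends the body anyway, or hangs, breaks it.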
Specifications should provide a list of oft-violated requirements and remind implementors that they really are requirements. (Alternatively, a protocol specification could overrule HTTP requirements, often by requiring the opposite: instead of requiring servers to handle HEAD requests, the specification could require clients not to send HEAD requests.)

This extra work to ensure that implementors follow RFC2616 is not just pedantic perfectionism. There are two main reasons for it, both of which protect the application protocol and its implementors.

First, proper support for all features protects the full feature set in case it is needed later. If a few early clients don't support a feature that isn't used by early server implementations, this seems innocuous. But later, one finds that the application servers cannot be updated to make use of the full feature set even when it would be very useful. There is no way to advertise or negotiate for these features, because they are REQUIRED in HTTP. When required and un-advertised features are poorly implemented, they become practically impossible to use later.

Second, proper support for all features makes general-purpose client libraries and general-purpose server libraries work better. Again, this protects the new application by making it easier to implement with standard libraries -- but this only works if early implementations really do the right thing. Designers of a new protocol might instead decide that the new protocol cannot use existing HTTP client or server libraries, in which case that choice should be explicitly stated, and version-number or protocol-name changes should be considered.

These features are typically very easy to support properly. In some cases it's just a matter of returning the right error -- e.g. servers can fail a request containing unrecognized Content-* headers.

Requirements reminders for server implementors:

 - MUST handle the HEAD request properly, returning no body in the response.
 - MUST be prepared to handle OPTIONS * requests.
 - MUST return an error in response to unrecognized methods.
 - MUST examine conditional headers on requests and, if necessary, fail the request (If-* and ).
 - MUST honour Content-* headers on requests. Any Content-* headers that are not recognized or cannot be parsed should cause a ??? error.
 - MUST handle the Range header or fail the request.
 - MUST look for the Expect header and be able to send 100 Continue (without waiting for the request body) or fail the request.
 - MUST either support persistent connections or include the "close" connection option in every response. If the server allows persistent connections, it MUST also implement pipelining: not dropping pipelined requests, and handling responses in order.

Requirements reminders for client implementors:

 - MUST include a Host header on requests.
 - MUST support the several ways response endings may be signalled: chunked transfer-coding, connection closing, and Content-Length.
 - MUST either support persistent connections or include the "close" connection option in every request.
 - If the client supports HTTP caching, it MUST examine the Vary, Cache-Control and Expires headers.
 - MUST NOT automatically follow redirects for methods other than GET and HEAD.
 - MUST handle the variety of success responses as successes (202, 203, 205).
 - MUST handle the 407 Proxy Authentication Required response and be able to use the Proxy-Authenticate response header to authenticate.

Requirements on protocol design:

 - HTTP status codes MUST preserve their standard meanings to interoperate well with HTTP client libraries. For example, the 401 Unauthorized code triggers a login request using HTTP authentication. A tricky one is 412 Precondition Failed, which can only be used when the client put a precondition on the request.

3. Third Law: Protect yourself

These are considerations that can protect the security or extensibility of an application using HTTP.
Being considerations, these might not be enforced in outside review of the specification, but they are well worth considering. Thus, this section does not have normative requirements.

3.1 Security

Protocol designers may consider whether redirects may safely be followed in all cases or only in limited cases. The application could require clients to support redirects, which gives servers more deployment flexibility. On the other hand, the application could limit redirects (e.g. only within the local site) or forbid their use entirely.

Recall that HTTP supports TLS proxies, and these are used at some corporate sites. What this means is that rather than having the client connect directly to the target site, the client connects to the proxy and the proxy initiates its own TLS connection to the target site. The proxy thus gains full access to the content of the application protocol. If this is not an acceptable situation, the application cannot use HTTP as-is.

HTTP authentication is not as secure as many other, more modern IETF authentication technologies. An application that requires better-than-standard authentication over HTTP may find that default client and server libraries cannot be used.

3.2 Discoverability and interoperability: URL considerations

Decide how URLs are known or discovered. Do application requests go straight to http://app.example.org? Can there be a path part?

An extended example in this section is the Noogie application, where users have personal Noogie URLs. These URLs identify resources that can receive Noogie application requests, which may trigger virtual noogies given to the receiving user and result in success or failure responses to the requestor.

3.2.1 Scheme

Applications that use HTTP can of course use the HTTP URL scheme. However, protocol designers should consider defining a new URL scheme anyway. In cases where the URL can be found alongside other HTTP URLs, this allows clients to select a URL that does what they want.
Example: Noogie application URLs are intended to appear in vCards, presence information documents and on Web sites. To allow clients to immediately detect such a URL and know what it's for, the Noogie standard designers can register the "noogie" scheme and explain how this scheme maps to the "http" scheme.

3.2.2 Path part

Does the URL path part have any structure? If so, make sure that the path part still allows application servers to be deployed in places where development frameworks and site policies dictate prefixes to the path. E.g. legal Noogie URLs must also be able to contain prefixes like "servlets/noogie/" at sites that have policies about service-framework usage. One resulting URL would be http://www.example.org/servlets/noogie/John/Doe, even though another site might prefer http://noogie.example.org/John/Doe.

3.2.3 Query part

Most applications that use HTTP do not extend or formalize the query part of an HTTP URL. In this case, the protocol specification might forbid query parts or require that they be stripped from URLs. Query strings have been used to exploit security holes in many HTTP servers.
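The URL handling sections 3.2.1 through 3.2.3 describe can be sketched with Python's urllib.parse. The "noogie" scheme and the URLs below are the text's hypothetical example, not a registered scheme, and the mapping rule (swap the scheme, keep everything else) is one plausible reading of "explain how this scheme maps to the http scheme":

```python
from urllib.parse import urlsplit, urlunsplit

def noogie_to_http(url):
    """Map a noogie: URL to the http: URL it identifies (3.2.1)."""
    parts = urlsplit(url)
    assert parts.scheme == "noogie"
    # Keep netloc, path, query and fragment; replace only the scheme.
    return urlunsplit(("http",) + tuple(parts[1:]))

def strip_query(url):
    """Drop the query (and fragment), as a spec forbidding query parts
    might require clients to do (3.2.3)."""
    parts = urlsplit(url)
    return urlunsplit(tuple(parts[:3]) + ("", ""))

# A site-imposed path prefix like "servlets/noogie/" (3.2.2) is simply
# part of the path and survives both transformations unchanged.
url = "noogie://www.example.org/servlets/noogie/John/Doe?x=1"
print(noogie_to_http(url))
# http://www.example.org/servlets/noogie/John/Doe?x=1
print(strip_query(noogie_to_http(url)))
# http://www.example.org/servlets/noogie/John/Doe
```

A client that finds a "noogie" URL in a vCard would apply the first mapping before opening an HTTP connection; a server enforcing a no-query-part rule could apply the second before dispatching.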