PP4: The Three Laws of HTTP Usage

Lisa Dusseault <ldusseault@commerce.net> Fri, 11 January 2008 01:19 UTC



The Three Laws of Robotics HTTP Usage

Lisa Dusseault
Jan 10, 2008


This is an attempt to rationalize and justify the requirements we make
for IETF standards that use HTTP.  The rationalization is inspired by
the three laws of robotics described by Isaac Asimov.  This
categorization may help us figure out who is protected by HTTP reuse
guidelines:
- deployed systems,
- implementors of the new application,
- protocol designers (especially future application extensions), or
- nobody.

These guidelines and requirements may be less applicable to HTTP
extensions, such as WebDAV and Atom, where the application
makes use of many HTTP resource features and not just its transport
characteristics.  Those kinds of extensions are even harder to do and
require HTTP experts and deep familiarity with RFC2616.

A META-NOTE on the state of this document: The reminders and
requirements in section 2 are intended to be complete in this draft,
not reasonable.
I look forward to very interesting discussions about which requirements
are not reasonable, particularly if such discussions can happen before
HTTP is revised by HTTPBIS WG, in time to influence that revision.



1.  First law: Do no harm

These are requirements on the design of the application, particularly
whether it uses GET or POST and registers new MIME types or ports.
These questions should be dealt with early in protocol design and
specification.

1.1 Allow filtering

Some organizations need to filter or otherwise identify traffic from
various HTTP applications.  Here are the most reasonable ways to
filter HTTP traffic:
	- Well-known site
	- New methods
	- New HTTP version
	- New port
	- New MIME Type
	
Defining a single site is not usually appropriate to IETF standards.
Setting a new version for a protocol extending HTTP would be pretty
nuts.
New methods are more often used and appropriate in HTTP extensions than
for applications that just want to transmit application requests to HTTP
servers for handling by a lightweight extension module.

Ruling those three out for most cases leaves us with new port numbers
and new MIME types.  Both of these use a registry, which helps people
monitoring traffic to figure out what is going on.  BCP 56 recommends
new ports, but new MIME types are becoming more common and seem
acceptable under these "moral" principles.

To make filtering with MIME types reasonable, the MIME type must be
unique to the application and registered, and both requests and
responses that have bodies MUST use filterable MIME types.  It's a
little hard to
imagine an application using HTTP as a transport that doesn't have any
message bodies, but that would definitely be a special case to consider
if it were proposed.
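As a sketch of the MIME-type approach, the following Python fragment checks whether a message's Content-Type identifies it as this application's traffic.  The type "application/noogie+xml" is invented for illustration and is not a registered type:

```python
# Sketch: tagging application traffic with a unique MIME type so that
# intermediaries can identify and filter it.  "application/noogie+xml"
# is hypothetical; a real application would register its type.
from email.message import Message

def is_app_traffic(content_type_header: str) -> bool:
    """Return True if an HTTP message carries the application's MIME type."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    # get_content_type() normalizes case and drops parameters like charset.
    return msg.get_content_type() == "application/noogie+xml"

print(is_app_traffic("application/noogie+xml; charset=utf-8"))  # True
print(is_app_traffic("text/html"))                              # False
```

A filtering intermediary would apply the same test to the Content-Type of both requests and responses.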

1.2 Support intermediaries

A few features must be supported correctly, or else intermediaries
may do the wrong thing.  Since intermediaries can cache responses and
retransmit requests, the application needs to support caching and
retransmission correctly.

Intermediaries are allowed to retransmit any request using a method
that is defined as idempotent; the most important such method
is GET.  Thus, an application can only tunnel over GET requests if no
harm results from intermediaries retransmitting GET requests.

An example of an idempotent application GET request is querying for a
certificate from a repository.  The POST method should be used to
tunnel non-idempotent requests.

Since GET responses can be cached, application responses to GET requests
need to be cacheable or have the correct cache prevention headers.
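A minimal sketch of the method-selection rule above, with invented operation names; the cache-prevention choice assumes these particular GET responses must not be reused:

```python
# Sketch: map application operations onto GET or POST depending on
# idempotency, and mark non-reusable GET responses.  Operation names
# are hypothetical.

IDEMPOTENT_OPS = {"fetch_certificate", "query_status"}

def choose_method(operation: str) -> str:
    """Idempotent operations may ride on GET; everything else uses POST."""
    return "GET" if operation in IDEMPOTENT_OPS else "POST"

def response_headers(operation: str) -> dict:
    """GET responses are cacheable by default, so a GET response whose
    body must not be reused needs explicit cache-prevention headers."""
    headers = {"Content-Type": "application/noogie+xml"}
    if choose_method(operation) == "GET":
        headers["Cache-Control"] = "no-store"  # assumes freshness matters
    return headers

print(choose_method("fetch_certificate"))  # GET
print(choose_method("give_noogie"))        # POST
```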
	
2.  Second Law: Obey orders

This section is more about helping implementations get HTTP right
than about designing the new application.  A specification using
HTTP can get theoretical compliance with RFC2616 just by making a
normative reference. However, implementations of such specifications
typically do not comply properly with RFC2616.  Perhaps implementors
need a couple of reminders?  Specifications should provide a list of
oft-violated requirements and remind implementors that they really
are requirements.  (Alternatively, a protocol specification could
overrule HTTP requirements, often by requiring the opposite:  instead
of requiring servers to handle HEAD requests, the specification could
require clients not to send HEAD requests).

This extra work to ensure that implementors follow RFC2616 is not
just for pedantic perfection.  There are two main reasons, both of
which protect the application protocol and its implementors.

First, proper support for all features protects the full feature set in
case it is needed later. If a few early clients don't support a feature
that isn't used by early server implementations, this seems innocuous.
But later, one finds that the application servers cannot be updated
to make use of the full feature set even if it would be very useful.
There's no way to advertise or negotiate for these features because
they are REQUIRED in HTTP.  When required and un-advertised features
are poorly implemented, they become practically impossible to use later.

Second, proper support for all features makes general-purpose client
libraries and general-purpose server libraries work better.  Again,
this protects the new application by making it easier to implement
with standard libraries -- but this only works if early implementations
really do the right thing.  Designers of a new protocol might instead
make the choice that the new protocol cannot use existing HTTP client or
server libraries, in which case the choice should be explicitly stated
and version number or protocol name changes should be considered.

These features are typically very easy to support properly.  In some
cases it's just returning the right error -- e.g. servers can fail a
request containing unrecognized Content-* headers.

Requirements reminders for server implementors:
		
	- MUST handle the HEAD request properly, returning no body in the
response.
	- MUST be prepared to handle OPTIONS * requests.
	- MUST use an error responding to unrecognized methods.
	- MUST examine conditional headers (If-*) on requests and, if
necessary, fail the request.
	- MUST honour Content-* headers on requests.  Any Content-* headers
that are not recognized or cannot be parsed should cause a ??? error.
	- MUST handle the Range header or fail the request.
	- MUST look for the Expect header and be able to do 100 Continue
(without waiting for request body) or fail.
	- MUST either support persistent connections or include the "close"
connection option in every response.   If the server allows persistent
connections it MUST also implement pipelining, not dropping pipelined
requests and handling responses in order.
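Two of these reminders (empty bodies on HEAD, erroring on unrecognized methods) can be sketched as follows; the request model and resource body are simplified for illustration:

```python
# Sketch of two server reminders: answer HEAD with the same headers as
# GET but no body, and fail unrecognized methods rather than guessing.
# Requests are modeled as a (method, path) pair for brevity.

KNOWN_METHODS = {"GET", "HEAD", "OPTIONS", "POST"}

def handle(method: str, path: str):
    """Return (status, headers, body) for a toy resource."""
    if method not in KNOWN_METHODS:
        return 501, {}, b""                # MUST error on unknown methods
    body = b"<noogie/>"
    headers = {"Content-Type": "application/xml",
               "Content-Length": str(len(body))}
    if method == "HEAD":
        return 200, headers, b""           # same headers, empty body
    if method == "OPTIONS":
        return 200, {"Allow": ", ".join(sorted(KNOWN_METHODS))}, b""
    return 200, headers, body

print(handle("HEAD", "/")[2])   # b'' -- no body on HEAD
print(handle("BREW", "/")[0])   # 501
```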

Requirements reminders for client implementors:

	- MUST include a Host header on requests
	- MUST support the several ways a response ending may be signalled:
chunked transfer-coding, connection close, and Content-Length.
	- MUST either support persistent connections or include the "close"
connection option in every request.
	- If the client supports HTTP caching, it MUST examine the Vary,
Cache-Control and Expires headers.
	- MUST NOT automatically follow redirects for methods other than GET
	and HEAD.
	- MUST handle a variety of success responses as successes (202, 203,
205)
	- MUST handle the 407 Proxy Authentication Required response and be
able to use the Proxy-Authenticate response-header to authenticate.
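The second client reminder -- deciding how a response body ends -- can be sketched like this.  The dict-based header model is a simplification; real header field names are case-insensitive:

```python
# Sketch: pick the body-delimitation strategy for a response.  Per HTTP,
# a chunked Transfer-Encoding takes precedence over Content-Length, and
# absent both, the body runs until the connection closes.  This sketch
# assumes canonically capitalized header names for brevity.

def body_delimiter(headers: dict) -> str:
    if "chunked" in headers.get("Transfer-Encoding", "").lower():
        return "chunked"
    if "Content-Length" in headers:
        return "content-length"
    return "connection-close"

print(body_delimiter({"Transfer-Encoding": "chunked"}))   # chunked
print(body_delimiter({"Content-Length": "42"}))           # content-length
print(body_delimiter({}))                                 # connection-close
```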

Requirements on protocol design:

	- HTTP status codes MUST preserve the same meaning to interoperate
well with HTTP client libraries.  For example, the 401 Unauthorized code
triggers a login request using HTTP authentication.  A tricky one is
412 Precondition failed, which can only be used when the client put a
precondition on the request.
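The 412 rule can be sketched as a consistency check; this is an illustrative helper, not part of any specification:

```python
# Sketch: 412 Precondition Failed is only legal when the request
# actually carried a precondition (an If-* header).  Other status
# codes are assumed fine for the purposes of this check.

def status_is_consistent(status: int, request_headers: dict) -> bool:
    if status == 412:
        return any(h.lower().startswith("if-") for h in request_headers)
    return True

print(status_is_consistent(412, {"If-Match": '"abc"'}))  # True
print(status_is_consistent(412, {}))                     # False
```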

3.  Third Law: Protect yourself

These are considerations that can protect the security or
extensibility of an application using HTTP.  Being considerations,
these might not be enforced in outside review of the specification,
but they are well worth considering.  Thus, this section does not
have normative requirements.

3.1 Security

Protocol designers should consider whether redirects can safely be
followed in all cases or only in limited cases.  The application could
require clients to support redirects, which gives servers more
deployment flexibility.  On the other hand, the application could
limit redirects (e.g. only within the local site) or forbid their use
entirely.

Recall that HTTP supports TLS proxies, and these are used in some
corporate sites.  What this means is that rather than have the client
connect directly to the target site, the client connects to the proxy
and the proxy initiates its own TLS connection to the target site.
The proxy thus gains full access to the content of the application
protocol. If this is not an acceptable situation, the application
cannot use HTTP as-is.

HTTP authentication is not as secure as many other more modern
IETF authentication technologies.  An application that requires
better-than-standard authentication over HTTP may find that default
client and server libraries cannot be used.	

3.2 Discoverability and interoperability: URL considerations

Decide how URLs are known or discovered.  Do application requests go
straight to http://app.example.org?  Can there be a path part?

An extended example in this section is the Noogie application, where
users have personal Noogie URLs.  These URLs identify resources that
can receive Noogie application requests, which may trigger virtual
noogies given to the receiving user, and result in success or failure
responses to the requestor.

3.2.1 Scheme

Applications that use HTTP can of course use the HTTP URL scheme.
However, protocol designers should consider defining a new URL scheme
anyway.  In cases where the URL can be found along with other HTTP
URLs, this allows clients to select a URL that does what they want.

	Example: the Noogie application URLs are intended to appear in
VCards, presence information documents and on Web sites.  In order to
allow clients to immediately detect such a URL and know what it's
for, the Noogie standard designers can register the "noogie" scheme
and explain how this scheme maps to the "http" scheme.
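Assuming the hypothetical "noogie" scheme maps to "http" by simple scheme substitution (one possible design, not a normative mapping), a client could translate URLs like this:

```python
# Sketch: rewrite a hypothetical noogie: URL to the http: URL it
# resolves to, assuming the mapping is plain scheme substitution.
from urllib.parse import urlsplit, urlunsplit

def noogie_to_http(url: str) -> str:
    parts = urlsplit(url)
    if parts.scheme != "noogie":
        raise ValueError("not a noogie URL")
    # Keep netloc, path, query, and fragment; swap only the scheme.
    return urlunsplit(("http",) + tuple(parts)[1:])

print(noogie_to_http("noogie://noogie.example.org/John/Doe"))
# http://noogie.example.org/John/Doe
```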
	
3.2.2 Path part

Does the URL path part have any structure?  If so, make sure that the
path part still allows application servers to be deployed in places
where development frameworks and site policies dictate prefixes to
this path part.

E.g. legal Noogie URLs must also be able to contain prefixes like
"servlets/noogie/" at sites that have policies about service
framework usage.  One resulting URL would be
http://www.example.org/servlets/noogie/John/Doe even though another
site might prefer http://noogie.example.org/John/Doe.
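One way to keep clients indifferent to site-imposed prefixes is to build URLs from a configurable base; a sketch, with the helper name invented:

```python
# Sketch: construct a personal Noogie URL from a site-configured base,
# so the same client code works whether the deployment is
# http://noogie.example.org/ or http://www.example.org/servlets/noogie/.
from urllib.parse import quote

def noogie_url(base: str, first: str, last: str) -> str:
    return base.rstrip("/") + "/" + quote(first) + "/" + quote(last)

print(noogie_url("http://www.example.org/servlets/noogie/", "John", "Doe"))
# http://www.example.org/servlets/noogie/John/Doe
```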


3.2.3 Query part

Most applications that use HTTP do not extend or formalize the query
part of an HTTP URL.  In this case, the protocol specification might
forbid query parts or require that they be stripped from URLs.
Query strings have been used to exploit security holes in many HTTP
servers.
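A specification that requires query parts to be stripped could be implemented with a helper like this sketch:

```python
# Sketch: drop the query (and fragment) from a URL, per a policy that
# forbids query parts in application URLs.
from urllib.parse import urlsplit, urlunsplit

def strip_query(url: str) -> str:
    s = urlsplit(url)
    return urlunsplit((s.scheme, s.netloc, s.path, "", ""))

print(strip_query("http://app.example.org/John/Doe?x=1#f"))
# http://app.example.org/John/Doe
```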