[http-state] Missing specification in RFC 2617, cannot use a user name nor a password in encoding different from ISO-8859-1

Honza Bambas <hbambas@mozilla.com> Mon, 03 May 2010 18:54 UTC

It has been observed that many server appliances allow user names and 
passwords to be set up with characters that cannot be represented in 
ISO-8859-1.  Client implementations that obey the RFCs have problems 
communicating with such servers properly, because the character encoding 
of the user name and the password is not specified for either the 
'basic' or the 'digest' authentication scheme.  This affects in 
particular the building of the Authorization header with its username= 
directive value, and the building of the A1 string.

As for the username= directive value: it is by definition a 
'quoted-string', which cannot carry any information about its 
character encoding.  I have not found any explicit statement in RFC 2617 
about a required character encoding for it.  RFC 2047 encoding cannot be 
used because "an 'encoded-word' MUST NOT appear within a 
'quoted-string'" per RFC 2047, while on the other hand, per RFC 2616, 
"words of *TEXT (which 'quoted-string' consists of) MAY contain 
characters from character sets other than ISO-8859-1 only when encoded 
according to the rules of RFC 2047".  A dead end.
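
To illustrate the quoted-string problem, here is a minimal Python 
sketch.  The user name is a made-up example, and the receiver that 
"assumes ISO-8859-1" is hypothetical; the point is only that the octets 
on the wire do not say how they should be read.

```python
# Hypothetical user name containing U+017E (z with caron),
# a character that does not exist in ISO-8859-1.
username = "\u017eofie"

# ISO-8859-1 cannot represent the name at all.
try:
    username.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# A client that falls back to UTF-8 puts these octets into the quoted-string.
wire_octets = username.encode("utf-8")
header_param = b'username="' + wire_octets + b'"'

# The same octets decode without error under both assumptions, so the
# receiver cannot tell from the header alone which one was intended.
seen_as_latin1 = wire_octets.decode("iso-8859-1")
seen_as_utf8 = wire_octets.decode("utf-8")

print("representable in ISO-8859-1:", latin1_ok)
print("header parameter octets:", header_param)
print("ISO-8859-1 reading matches:", seen_as_latin1 == username)
print("UTF-8 reading matches:", seen_as_utf8 == username)
```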

As for the A1 value: it is not stated anywhere which byte representation 
of the user name, password and realm should be used when building the A1 
octet array.  The same problem applies to basic authentication, where a 
base64 string is built to carry the username:password pair, but it is 
not stated anywhere which character encoding (or, more generally, which 
byte encoding) the input to base64 has to use.
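
A small Python sketch of the ambiguity follows.  The credentials, the 
realm, and the helper names basic_credentials and ha1 are made up for 
illustration; the charset parameter stands for the choice that RFC 2617 
leaves open.  The user name here is representable in both encodings, 
yet the resulting values differ.

```python
import base64
import hashlib


def basic_credentials(user, password, charset):
    """base64 of "user:password" for the basic scheme, in the given charset."""
    raw = f"{user}:{password}".encode(charset)
    return base64.b64encode(raw).decode("ascii")


def ha1(user, realm, password, charset):
    """MD5 hex digest of A1 = user:realm:password, in the given charset."""
    a1 = f"{user}:{realm}:{password}".encode(charset)
    return hashlib.md5(a1).hexdigest()


# Hypothetical credentials; U+00E9 exists in both ISO-8859-1 and UTF-8,
# but is one octet in the former and two octets in the latter.
user, realm, password = "jos\u00e9", "example", "secret"

for charset in ("iso-8859-1", "utf-8"):
    print(charset)
    print("  basic :", basic_credentials(user, password, charset))
    print("  H(A1) :", ha1(user, realm, password, charset))
```

For a pure ASCII user name and password both encodings produce the same 
octets, which is probably why the gap goes unnoticed so often.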

As an example of a violation: the Apache utility program for 
configuring the digest authentication database takes the user name and 
the password directly as an argument from a terminal, often but not 
always in UTF-8 encoding, and pushes them to a hash function directly 
without any further translation.  The client then has to build the A1 
string on its side from UTF-8 encoded octet arrays.  The authentication 
module seems to take the username= directive value, which is used to 
create the A1 string, "as is", in the byte representation sent by the 
client.  But there is no way for the client to know which encoding 
should be used when generating the headers.
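
The failure can be sketched end to end in Python.  This is not Apache's 
code; it only assumes a provisioning step that hashes the terminal 
octets unchanged, as described above, and the digest response 
computation of RFC 2617 without qop, that is 
MD5(H(A1) ":" nonce ":" H(A2)) with A2 = method ":" uri.  The 
credentials, realm, nonce, and URI are made up.

```python
import hashlib


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


def hashed_a1(user, realm, password, charset):
    # H(A1) over the octets of user:realm:password in the given charset.
    return md5_hex(f"{user}:{realm}:{password}".encode(charset))


def digest_response(ha1_hex, nonce, method, uri):
    # RFC 2617 response without qop: MD5(H(A1):nonce:H(A2)).
    ha2 = md5_hex(f"{method}:{uri}".encode("ascii"))
    return md5_hex(f"{ha1_hex}:{nonce}:{ha2}".encode("ascii"))


# Hypothetical values for illustration only.
user, realm, password = "jos\u00e9", "example", "secret"
nonce, method, uri = "abc123", "GET", "/private/"

# Assumption: the entry was created on a UTF-8 terminal, hashed as typed.
stored = hashed_a1(user, realm, password, "utf-8")
expected = digest_response(stored, nonce, method, uri)

# A client that sticks to ISO-8859-1 versus one that guesses UTF-8.
latin1_client = digest_response(
    hashed_a1(user, realm, password, "iso-8859-1"), nonce, method, uri)
utf8_client = digest_response(
    hashed_a1(user, realm, password, "utf-8"), nonce, method, uri)

print("ISO-8859-1 client accepted:", latin1_client == expected)
print("UTF-8 client accepted:", utf8_client == expected)
```

Only the client that happens to guess the encoding used at setup time 
is accepted, and nothing in the challenge tells it which one that was.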


My question is: should we disallow accepting user name or password 
input in an encoding other than ISO-8859-1 on the client side 
(regardless of how the server is set up), or should an extension to 
RFC 2617 be defined that allows the encoding to be communicated 
between the client and the server?

For reference there are Mozilla platform bugs 
https://bugzilla.mozilla.org/show_bug.cgi?id=546330 and 
https://bugzilla.mozilla.org/show_bug.cgi?id=41489.

Sincerely,
Honza Bambas