Charles Lindsey <chl@clw.cs.man.ac.uk> Thu, 10 October 2002 16:14 UTC
Received: (from majordomo@localhost) by above.proper.com (8.11.6/8.11.3) id g9AGEos22949 for ietf-822-bks; Thu, 10 Oct 2002 09:14:50 -0700 (PDT)
Received: from curlew.cs.man.ac.uk (noplay@curlew.cs.man.ac.uk [130.88.13.7]) by above.proper.com (8.11.6/8.11.3) with ESMTP id g9AGEnv22943 for <ietf-822@imc.org>; Thu, 10 Oct 2002 09:14:49 -0700 (PDT)
Received: from clerew.man.ac.uk ([194.66.22.208] helo=clw.cs.man.ac.uk) by curlew.cs.man.ac.uk with esmtp (Exim 2.05 #6) id 17zfxY-000E20-00; Thu, 10 Oct 2002 17:14:48 +0100
Received: from localhost (localhost [127.0.0.1]) by clw.cs.man.ac.uk (8.9.1b+Sun/8.9.1) with SMTP id PAA10259; Thu, 10 Oct 2002 15:58:08 +0100 (BST)
Message-Id: <200210101458.PAA10259@clw.cs.man.ac.uk>
Date: Thu, 10 Oct 2002 15:58:08 +0100
From: Charles Lindsey <chl@clw.cs.man.ac.uk>
Reply-To: ietf-822@imc.org
To: ietf-822@imc.org
Cc: usenet-format@landfield.com
MIME-Version: 1.0
Content-Type: TEXT/plain; charset="ISO-8859-1"
Content-MD5: PXgnG6vPMrveJAehiGLh5Q==
X-Mailer: dtmail 1.3.0 CDE Version 1.3 SunOS 5.7 sun4m sparc
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from QUOTED-PRINTABLE to 8bit by above.proper.com id g9AGEnv22946
Sender: owner-ietf-822@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-822/mail-archive/>
List-ID: <ietf-822.imc.org>
List-Unsubscribe: <mailto:ietf-822-request@imc.org?body=unsubscribe>
{This message is sent primarily to the ietf-822 mailing list, because I
want to hear the views of the mail experts. But it is Cced also to the
usefor list. Reply-To is set to ietf-822.}

I am having problems understanding RFC 2047. In particular, I want to
establish what a user agent would be REQUIRED to do in order to claim
minimal compliance with RFC 2047 (yes, I know actual implementations are
likely to be more 'liberal' than that, but I am after a fundamental
baseline which other documents can *rely* on).

The problem splits into two parts:

1. Where can encoded-words legitimately appear (RFC 2047 section 5).
2. Where are encoded-words required to be recognized (RFC 2047 section 6).

1. Where can encoded-words legitimately appear (RFC 2047 section 5).
-------------------------------------------------------------------

In 5 (1) I find:

    An 'encoded-word' may replace a 'text' token (as defined by RFC 822)
    in any Subject or Comments header field, any extension message
    header field, or any MIME body part field for which the field body
    is defined as '*text'. An 'encoded-word' may also appear in any
    user-defined ("X-") message or body part header field.

That is ambiguous, depending on how you interpret the commas in the first
sentence:

Interpretation A: It means you can use an encoded-word in
    any Subject
    any Comments
    any extension message header field
    any MIME body part field for which the field body is defined as '*text'
    any X-header

Interpretation B: It means you can use an encoded-word in
    any Subject                        )
    any Comments                       ) for which the field body is
    any extension message header field ) defined as '*text'
    any MIME body part field           )
    any X-header

I am inclined to believe Interpretation A, because if I had wanted
Interpretation B, I would have written "... any extension message header
field or any MIME body part field, for which the field body is defined as
'*text'".

Of course, the difference only arises in the case of extension message
header fields.
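To make the two readings concrete, here is a minimal sketch that encodes each interpretation of Rule 5(1) as a predicate and lists the cases where they disagree. The function name, the field-kind labels and the `body_is_text` flag are illustrative assumptions, not anything defined by RFC 2047 or RFC 822.

```python
# Hypothetical labels for the kinds of field mentioned in Rule 5(1).
FIELD_KINDS = ("Subject", "Comments", "extension", "MIME body part", "X-")


def allowed_by_rule_5_1(field_kind: str, body_is_text: bool,
                        interpretation: str) -> bool:
    """Return True if an encoded-word may replace a 'text' token here.

    body_is_text says whether the field body is defined as '*text'.
    interpretation is "A" or "B", as described in the message above.
    """
    if interpretation not in ("A", "B"):
        raise ValueError("interpretation must be 'A' or 'B'")
    if field_kind in ("Subject", "Comments"):
        # RFC 822 defines both of these as '*text', so A and B agree.
        return True
    if field_kind == "X-":
        # Covered by the second sentence of Rule 5(1) in both readings.
        return True
    if field_kind == "MIME body part":
        # The '*text' qualifier applies to this item in both readings.
        return body_is_text
    if field_kind == "extension":
        # The only item on which the two readings differ.
        return True if interpretation == "A" else body_is_text
    raise ValueError(f"unknown field kind: {field_kind!r}")


# Collect every (kind, body_is_text) pair where A and B give different answers.
differences = [
    (kind, is_text)
    for kind in FIELD_KINDS
    for is_text in (True, False)
    if allowed_by_rule_5_1(kind, is_text, "A")
    != allowed_by_rule_5_1(kind, is_text, "B")
]
print(differences)
```

Under these assumptions the only disagreement is an extension field whose body is not defined as '*text', which is exactly the case examined next.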
Now suppose I am writing a standards-track document and I want to
introduce a new header-field. Under RFC 822, it would be regarded as an
"extension-field" (under 2822 it would be an "optional-field"). Let us
take a specific example from Usefor (note that any news article is
potentially also an email message, because it may have been
posted-and-mailed, or it may be en route to a moderator). So I can write:

    Mail-Copies-To: Claus Färber <claus@faerber.muc.de>

In Usefor, that is a structured header with a pretty obvious syntax in
which "Claus Färber" is clearly a 'phrase'. In the email version (even if
not in the news version also) that has to be encoded as:

    Mail-Copies-To: =?ISO-8859-1?Q?Claus_F=E4rber?= <claus@faerber.muc.de>

Note that a news user agent has some semantic duties to perform when it
sees that header, but all an email user agent is expected to do is not to
munge or delete it, and to enable it to be displayed to the user (at least
if the user asks to see it).

Q: Is an email message containing that header-field (or should I say the
   user agent which permitted it to be sent as an email) RFC
   2047-compliant?

A: Under Interpretation A, Yes. Because it is an extension-field which
   satisfies the requirements of Rule 5(1).

   Under Interpretation B, No. Because the field body is not defined as
   '*text'. However, even with Interpretation B, it might get by under
   Rule 5(3) because, under the Usefor syntax, it is within a 'phrase'.

OTOH, both those views of Interpretation B seem to presuppose that the
user agent was familiar with the syntax of Usefor. But maybe it was just a
simple (non-Usefor-aware) MUA and the user had inserted that header
manually (as is sometimes done by users emailing directly to a moderator).
So we would have the seemingly absurd situation that one user agent would
be compliant when sending that message, but another user agent which sent
that same message would be non-compliant.
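For readers who want to reproduce the encoding of the 'phrase' in the Mail-Copies-To example, here is a minimal sketch using Python's standard `email.header` module. It only illustrates the mechanics of producing and decoding an encoded-word; it says nothing about which interpretation of Rule 5(1) is right. Note that Python emits the charset and encoding letter in lower case, which RFC 2047 treats as equivalent to the upper-case form shown above.

```python
from email.header import Header, decode_header, make_header

# The display name from the example; the address itself stays unencoded.
display_name = "Claus Färber"
address = "<claus@faerber.muc.de>"

# Encode only the 'phrase' as an RFC 2047 encoded-word (Q encoding is the
# module's default header encoding for ISO-8859-1).
encoded_phrase = Header(display_name, charset="iso-8859-1").encode()
field_body = f"{encoded_phrase} {address}"
print("Mail-Copies-To:", field_body)

# Decode the whole field body back into readable text.
decoded_body = str(make_header(decode_header(field_body)))
print("Decoded:", decoded_body)
```

The decoder here is applied without any knowledge of the Mail-Copies-To syntax, which is the position a non-Usefor-aware mail reader would be in.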
Here is another example from Usefor:

    Organization: Färber Fabrik

That is, of course, an unstructured header, and would be encoded as

    Organization: =?ISO-8859-1?Q?F=E4rber_Fabrik?=

Q: Is that one RFC 2047-compliant?

A: Yes, under both Interpretations A and B (though under B one might
   wonder how the user agent was supposed to know that it was
   unstructured).

It should be noted in passing that Rule 5(1) contains no requirement for
an encoded-word to be preceded (or followed) by 'linear white space',
although section 7 does seem to enforce such a requirement.

2. Where are encoded-words required to be recognized (RFC 2047 section 6).
--------------------------------------------------------------------------

One would expect section 6 to require the recognition of anything that was
allowed to appear under section 5, but that seems not to be the case
because there is no mention of "extension message header fields".

In 6.1 I find:

    A mail reader must parse the message and body part headers according
    to the rules in RFC 822 to correctly recognize 'encoded-word's.

Again, I see two interpretations:

Interpretation C: The wording "rules in RFC 822" means that only the
    headers explicitly defined in RFC 822 are required to be examined for
    the presence of 'encoded-word's.

Interpretation D: The wording "rules in RFC 822" includes the rules for
    'extension-field' and 'user-defined-field'. The RFC 822 syntax for
    'extension-field' is

        extension-field = <Any field which is defined in a document
                           published as a formal extension to this
                           specification; none will have names beginning
                           with the string "X-">

    Hence "must parse the message" means that the rules in the document
    defining the extension are to be applied.

Ad Interpretation C:-

The text I quoted above refers to "message and body part headers".
But since RFC 822 does not define any body part headers, Interpretation C
would not permit any body part header field to be examined (body part
header fields are introduced in section 5.1 of RFC 2046, and are all
supposed to be of the form "Content-*"). For example, you could not
recognize an encoded-word in a "Content-Description", let alone in
"Organization" or "Mail-Copies-To". So I cannot see how Interpretation C
could have been the intended one.

Ad Interpretation D:-

OTOH, Interpretation D seems to require that all user agents be magically
aware of all new extension headers as soon as their defining documents are
published. Moreover, you cannot have an agent which attempts to recognize
anything that just happens to look like an encoded-word because it needs
to know, at the very least, whether some unknown header is "unstructured"
or not (i.e. is defined as '*text'). For example,
"(=?ISO-8859-1?Q?Claus_F=E4rber?=)" can occur and should be recognised in
a structured field (if it is in a context where a comment would be
allowed), but it cannot occur in an unstructured field and should
therefore be displayed in its un-decoded form (as is explained in the
examples in section 8 of RFC 2047).

Thus the best interpretation I can place on section 6 is that a compliant
mail reader MUST recognize and decode 'encoded-word's that occur in the
headers explicitly defined in RFC 2822, and that it
MAY/SHOULD/MUST/SOMETHING-ELSE recognize all 'encoded-word's produced by a
compliant agent (as in section 5).

3. And finally ...........
--------------------------

The questions I am unsure about:

1. Which of my "Interpretations" is correct, or are there other possible
Interpretations that I have missed?

2. If someone is writing a standards-track document (whether for news or
email) and wishes to introduce some new header-fields that can make use of
RFC 2047, what does he have to say?
Clearly, he defines syntax that shows whether those fields are
unstructured or not, and that introduces 'phrase's and 'comment's in the
proper manner. But does he have to include a remark to the effect that
"RFC 822 (or RFC 2822) is hereby augmented to include these new
header-fields, and RFC 2047 is to be construed accordingly"?

3. Is it possible to go further and introduce header-fields with explicit
'encoded-word's in them, for example:

    user-agent-header = "User-Agent" ":" 1*( product ["/" token] )
                        ; OK, it also needs to show where WS goes
    product           = token / quoted-string / encoded-word

Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl@clw.cs.man.ac.uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5