Re: Return of the Son of Beneath the Planet of RFC-XXXX
Nathaniel Borenstein <nsb@thumper.bellcore.com> Tue, 14 May 1991 17:50 UTC
Received: from thumper.bellcore.com by NRI.NRI.Reston.VA.US id aa09336; 14 May 91 13:50 EDT
Received: from greenbush.bellcore.com by thumper.bellcore.com (4.1/4.7) id <AA22687> for gvaudre@NRI.Reston.VA.US; Tue, 14 May 91 13:51:54 EDT
Received: by greenbush.bellcore.com (4.12/4.7) id <AA03248> for gvaudre@NRI.Reston.VA.US; Tue, 14 May 91 13:55:31 edt
Received: from Messages.7.14.N.CUILIB.3.45.SNAP.NOT.LINKED.greenbush.mouseclub.sun4.40 via MS.5.6.greenbush.mouseclub.sun4_40; Tue, 14 May 1991 13:55:26 -0400 (EDT)
Message-Id: <scA2GCu0M2Yt0_1_wo@thumper.bellcore.com>
Date: Tue, 14 May 1991 13:55:26 -0400
From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
To: Greg Vaudreuil <gvaudre@NRI.Reston.VA.US>
Subject: Re: Return of the Son of Beneath the Planet of RFC-XXXX
In-Reply-To: <9105141332.aa09046@NRI.NRI.Reston.VA.US>
References: <9105141332.aa09046@NRI.NRI.Reston.VA.US>
Your wish is my command...

How to Read the May Draft of RFC-XXXX

This is the fifth major draft, at least, of RFC-XXXX. Those of you who have been following along are, no doubt, heartily sick of the process by now, as am I. I'm trying to make it easier for us all in the following ways:

1. I've compiled a list of major changes from the April draft. I'm not trying to pull any fast ones on anybody, but it is possible that the list is incomplete. It is, however, my best attempt to provide a simple list of what has changed.

2. With previous drafts, I think that comments mostly came in three flavors:

NITS: Minor points of clarification, typographical or technical correction, etc. These were uncontroversial and I tried to adopt them all.

SHOW-STOPPERS: These were major disagreements, where people indicated unhappiness so great that they might be unable to live with the draft as written. Obviously I've tried VERY hard to deal with these, but sometimes people have SHOW-STOPPER comments that are pretty nearly in direct conflict with each other.

ARGUMENTS: These are sincere disagreements where the person disagreeing could still live with the draft if he lost the argument.

I would like to STRONGLY URGE the readers of this draft to self-classify their comments into the above three categories, and to treat them in the following ways:

NITS: Send them directly to me; no need to bother the whole list.

SHOW-STOPPERS: Sigh... I'm hoping there aren't any left, but if you have them, please send them to the whole list.

ARGUMENTS: If you can live with losing the argument, and if the argument has already been well-argued in the past on the list, ask yourself: is it worth re-arguing? I'm not trying to prevent debate, merely encouraging you to reflect before reopening old arguments.
I still need help on a number of things, particularly fleshing out some of the references and sanity-checking some of the areas in which I'm not an expert, notably character sets, audio, and privacy-enhanced messages (PEM). If you know something about one of these, please read that part of the draft extra carefully.

That's all. Enjoy. I look forward to your comments. Well, sort of.... :-) -- Nathaniel

Major Changes From April Draft

There is a lot of new prose, and the document has been reorganized substantially, to clarify intent and to discuss rejected alternatives.

Content-type syntax: There is now a distinguished place for character sets, which are no longer content-types. The rest of the syntax has been generalized to a set of semicolon-separated parameters.

Content-types: Several content-types have been consolidated into "image" and "audio". The Scribe and SGML content-types have been eliminated. DES-MESSAGE has been replaced by PEM-MESSAGE.

New content-types: binary, digest, message, partial-message, external-reference.

The scheme for officially defining new content-types has been changed to require an RFC.

The "Encoded-Variable" stuff has been eliminated, in favor of Content-type: Message/charset.

Content-Encoding has been changed to Content-TransferEncoding.

The hexadecimal encoding has been eliminated, and some prose about the need for a compressed encoding has been added.

The base64 encoding has added "," as a way to specify portable end-of-lines.

The quoted-printable encoding has changed "&" and "\" to "=" and ":" for portability, and has added some rules (and clarified others) regarding CRLF and trailing white space.

Two new optional header fields, Content-ID and Content-Description, have been defined.

Multipart messages: The definition has changed so that body-parts are no longer messages, though the syntax is the same. A new distinguished closing delimiter is now required.
The content-type for multipart can now specify a character set, which made it seem reasonable to reinstate the notion of a text prefix & postfix in the specified character set. (US-centrism was a major criticism of earlier proposals to allow text in the prefix & postfix.)

Added a new notion of "RFC-XXXX-compliant" implementations, defining a minimal subset to be implemented to earn such a label.

Network Working Group -- Request for Comments: XXXX

Mechanisms for Specifying and Describing the Format of Internet Message Bodies

Nathaniel Borenstein, Bellcore
Ned Freed, Innosoft
May 1991

Status of This Memo

This document suggests extensions to the RFC 822 message representation protocol to allow multi-part textual and non-textual messages to be represented and exchanged without loss of information. Discussion and suggestions for improvements are welcome. This memo does not specify an Internet standard, but it is intended to be a step towards a standard. This draft document will be submitted to the RFC editor as a protocol specification. Distribution of this memo is unlimited.
Please send comments to Nathaniel Borenstein <nsb@thumper.bellcore.com>

Table of Contents

   Introduction
   The Content-Type Header Field
   The Content-TransferEncoding Header Field
      Quoted-Printable Content-TransferEncoding
      Base64 Content-TransferEncoding
   Additional Optional Content- Header Fields
      Optional Content-ID Header Field
      Optional Content-Description Header Field
      Optional Content-Size Header Field
   RFC-XXXX Compliance
   Summary
   Acknowledgements
   References
   Appendix [APP-CONTENTTYPES] -- Partial List of Predefined Content-Type Values
   Appendix [APP-TEXT] -- The TEXT Content-type and the MAILASCII Character Set
   Appendix [APP-MULTIPART] -- The "Multipart" Content-Type
   Appendix [APP-SIMPLE] -- Simple Non-ASCII Text Example
   Appendix [APP-COMPLEX] -- A Complex Multipart Example

Introduction

Since its publication in 1982, RFC 822 [RFC-822] has defined the standard format of textual mail messages on the Internet. Its success has been such that the RFC 822 format has been adopted, wholly or partially, well beyond the confines of the Internet and of SMTP transport, as defined by RFC 821 [RFC-821]. As the format has seen wider use, a number of limitations have become increasingly problematic for the user community.

RFC 822 was intended to specify a format for text messages. As such, non-text messages, such as multimedia messages that might include audio or images, are simply not mentioned. Even in the case of text, however, RFC 822 is inadequate for the needs of email users whose languages require the use of character sets richer than US ASCII [REF-ANSI]. For mail containing audio, video, Japanese text, or even text in most European languages, RFC 822 does not specify enough to permit interoperability.

One of the notable limitations of RFC 821/822 based mail systems is the fact that they limit the contents of electronic mail messages to relatively short lines of seven-bit ASCII.
This forces a user to convert any non-textual data that she may wish to send into a seven-bit ASCII representation before invoking her local mail UA (User Agent program). Examples of such encodings currently used in the Internet include pure hexadecimal, uuencode, the 3-in-4 base 64 scheme specified in RFC 1113, the Andrew Toolkit Representation [REF-ATK], and many others.

These limitations become even more apparent as gateways are designed to allow for the exchange of mail messages between RFC 822 hosts and X.400 hosts. X.400 [REF-X400] specifies mechanisms for the inclusion of non-textual body parts within electronic mail messages. The current standards for the mapping of X.400 messages to RFC 822 messages specify that either X.400 non-textual body parts should be converted to (not encoded in) an ASCII format, or that they should be discarded, notifying the RFC 822 user that discarding has occurred. This is clearly undesirable, as information that a user may wish to receive is lost. Even though a user's UA may not have the capability of dealing with the non-textual body part, the user might have some mechanism external to the UA that can extract useful information from the body part. Moreover, it does not allow for the fact that the message may eventually be gatewayed back into an X.400 MHS, where the non-textual information would definitely become useful again.

This memo describes several mechanisms that combine to solve these problems. In particular, it describes:

1. A Content-type header field, generalized from RFC 1049 [RFC-1049], which can be used to describe the type of data in the body of a message and to fully specify the representation (encoding) of such data.

2. A Content-TransferEncoding header field, which can be used to describe an auxiliary encoding that was applied to the data in order to allow it to pass through the mail transport layer.

3. A "text" content-type value, which can be used to represent text information in a number of character sets in a standardized manner.

4. A "multipart" content-type value, which can be used to combine several separate body-parts, which may be made of different types of data, into a single message.

5. A "binary" content-type value, which can be used to transmit uninterpreted or partially-interpreted binary data, and hence to implement an email file transfer service.

6. "Message" and "Digest" content-type values for encapsulating one or more mail messages.

7. Several additional content-type values, which can be used by consenting User Agents to interoperate with additional message types such as audio, images, and more.

8. Several optional header fields that can be used to further describe the data in a message body or body-part, in particular the Content-Size, Content-ID, and Content-Description header fields.

Finally, to specify and promote a minimal level of interoperability, this memo describes a subset of the above mechanisms that defines "compliance" with this memo. That is, it specifies the minimal subset required for an implementation to be called "RFC-XXXX-compliant."

The Content-Type Header Field

The Content-Type header field was previously defined in RFC 1049, and is reaffirmed and generalized here. The remainder of this section is derived from RFC 1049, and, where different, is intended to supersede it.

The Content-Type header field is used to specify the type of data in a message, by giving a type name, and to provide auxiliary information that may be required for certain types. In addition, a distinguished syntax is defined for specifying character set information. After the type name and the optional character set, the remainder of the header field is simply a set of parameter specifications, as defined for each named type, and an optional comment.
(DISCUSSION: It has been suggested that character sets can be specified in the same way as any other auxiliary information, and that character set specification is meaningless for content-types such as "audio" and therefore should not be broadly defined as part of the top-level syntax. However, character sets have been given a distinguished syntax in order to aid gateways that need to do character set translation without necessarily understanding all possible content-types. Such translation should not, however, be undertaken lightly, as the complexities involved are formidable and easily underestimated.)

(COMPATIBILITY NOTE: Readers familiar with RFC 1049 Content-types will notice that the syntax has been generalized substantially. However, RFC 1049 content-types are all compliant with the new syntax. In particular, RFC 1049 content-types omitted the character-set specification, and always had at most two of the parts now called "parameters", which were distinguished by their position as indicating a version number and a resource reference.)

In the Extended BNF notation of RFC-822, we define a Content-type header field value as follows:

   Content-Type := type ["/" char-set] *[";" parameter] [comment]

   parameter := local-part

   char-set := "MAILASCII" / "ISO-10646" / "ISO-8859-" *DIGIT / "ISO-2022"

   type := "TEXT" / "MESSAGE" / "MULTIPART" / "DIGEST" / "BINARY"
           / "AUDIO" / "IMAGE" / "PEM-MESSAGE" / "PARTIAL-MESSAGE"
           / "EXTERNAL-REFERENCE" / "POSTSCRIPT" / "TeX" / "TROFF"
           / "DVI" / "ODA" / "X-BE2" / "X-" atom

These values are not case sensitive. POSTSCRIPT, Postscript, and POStscriPT are all equivalent.

This set of type names is not intended to be exhaustive. More may be defined later. The only constraint on the definition of such names is the desire that their uses not conflict. That is, it would be undesirable to have two different communities using "Content-type: foo" to mean two different things.
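As an illustration of the grammar above, here is a minimal Python sketch of how a user agent might split a Content-Type field value into its type name, optional character set, and parameters. The names ContentType and parse_content_type are hypothetical and not part of this memo. The sketch assumes that comments are not nested (RFC 822 permits nesting) and that parameters contain no quoted semicolons; it also does not check the type or character set against the enumerated values.

```python
import re
from typing import List, NamedTuple, Optional


class ContentType(NamedTuple):
    """Hypothetical container for a parsed Content-Type value."""
    type: str
    charset: Optional[str]
    parameters: List[str]


def parse_content_type(value: str) -> ContentType:
    # Drop parenthesized comments (assumption: comments are not nested).
    value = re.sub(r"\([^()]*\)", "", value)
    # The first ";"-separated piece is type["/" char-set]; the rest are parameters.
    head, *params = [piece.strip() for piece in value.split(";")]
    type_name, _, charset = head.partition("/")
    # Type names are not case sensitive, so normalize them to upper case.
    return ContentType(
        type_name.strip().upper(),
        charset.strip().upper() or None,
        [p for p in params if p],
    )
```

For example, parse_content_type("text/ISO-8859-1; 1; ref (a comment)") yields the type "TEXT", the character set "ISO-8859-1", and the parameters ["1", "ref"].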
The process of defining new content-types, then, is not intended to be a mechanism for imposing restrictions, but simply a mechanism for publicizing content-type usages. There are, therefore, only two acceptable mechanisms for defining new content-type values: 1. Private values (starting with "X-") may be defined unilaterally. 2. "Standard" values may be defined by the publication of an Internet RFC. The RFC need not be very long, but must define the content-type, its associated parameter syntax, and the format of the body of a message so marked. Several specific predefined "type" fields are explained in the appendices of this memo. If no Content-type header field is present, "text" is assumed, with the default character set (MAILASCII) as specified in Appendix [APP-TEXT]. This is consistent with the default message body type as defined by RFC 822. It should be noted that the list of Content-type values given in the appendices is expected to be augmented in time, via the mechanisms described above. We have simply attempted, in this RFC, to give as many standard Content-type definitions as was possible given the current state of our knowledge. The Content-type values defined here are compatible with the values defined by RFC 1049. The Content-TransferEncoding Header Field Many content-types are represented, in their "natural" format, as 8-bit or binary data. Such data can not be transmitted over existing Internet mail mechanisms because both RFC 821 and RFC 822 restrict mail messages to 7 bit data with reasonably short lines. It is necessary, therefore, to define a standard mechanism for encoding such data in an acceptable manner. This RFC specifies that such encodings will be indicated by a new "Content-TransferEncoding" header field. The Content-TransferEncoding field is used to indicate the type of transformation that has been used to represent the message body in an acceptable manner. 
It should be noted, also, that there is considerable interest and effort being expended on extending mail transport to permit 8-bit or binary data. If such extensions ever become commonplace, the Content-TransferEncoding mechanism will quickly become irrelevant, and it is therefore desirable not to "overload" Content-TransferEncoding with additional mechanisms that might still be useful in such a future. For this reason, Content-TransferEncoding is restricted in its scope to refer to nothing but the 7-bit encoding question. Matters such as the basic format in which information is "encoded" are to be handled by another mechanism. Unlike Content-types, which are expected to proliferate, it is expected that there will never be more than a few different Content-TransferEncoding values, both because there is less need for variation and because the effect of variation in Content-TransferEncoding would be more problematic. However, establishing only a single Content-TransferEncoding mechanism does not seem possible. In particular, there is a tradeoff between the desire for a compact and efficient encoding of binary data and the desire for a readable encoding of data that is mostly, but not entirely, 7-bit data. For this reason, at least two encoding mechanisms are necessary, a "readable" encoding and a "dense" encoding. A third encoding, for compressed ("super-dense") data, is also strongly desirable. This RFC does not specify a "compressed" encoding, due to the uncertain legal state of the UNIX "compress" command and a lack of certainty, during the drafting of this RFC, regarding the right way to define a standard compression algorithm. It is hoped that a compressed Content-TransferEncoding will be defined in a future RFC. Any compression algorithm for such a use should be unambiguously defined and without legal encumbrances. 
The Content-TransferEncoding field is designed to specify a two-way mapping between the "native" representation of a type of data and a representation that can be readily exchanged using 7 bit mail transport protocols as defined by RFC 821 (SMTP). This field has not been defined by any previous RFC. The field's value is a single atom specifying the type of encoding, as enumerated below. Formally:

   Content-TransferEncoding := "BASE64" / "QUOTED-PRINTABLE" / "8BIT"
                               / "BINARY" / "7BIT" / "X-" atom

These values are not case sensitive. That is, Base64 and BASE64 and bAsE64 are all equivalent. An encoding type of 7BIT implies that the message is already in a seven-bit mail-ready representation. This value is assumed if the Content-TransferEncoding header field is not present. If the message is stored or transported via a mechanism that permits 8-bit data, a Content-TransferEncoding of "8bit" should be used. If the message is stored or transported via a mechanism that permits arbitrary binary data, a Content-TransferEncoding of "binary" should nonetheless be used.

(DISCUSSION: The distinction between the Content-TransferEncoding values of "binary," "8bit," and "7bit" may seem unimportant in an 8-bit binary environment, but clear labeling will be of enormous value to gateways between 8-bit and 7-bit systems. The difference between "8bit" and "binary" is that "8bit" implies adherence to SMTP limits on line length and CR/LF semantics, whereas "binary" does not.)

Implementors may define new content encoding values, but should prefix them with "x-" to indicate their non-standard status, e.g. "Content-TransferEncoding: x-my-new-encoding". However, unlike Content-types, the creation of new Content-TransferEncoding values is explicitly discouraged, as it seems likely to hinder interoperability with little potential benefit.

If a Content-TransferEncoding header field appears as part of a message header, it applies to the entire message body, whether or not that body is of type "multipart."
If it is of type multipart, the encoding applies recursively to all of the encapsulated parts, including their encapsulated headers. If a Content-TransferEncoding header field appears as part of an encapsulation's headers, it applies only to the body of the encapsulated part. If the encapsulated part is itself of type "multipart", the encoding applies recursively to all of the encapsulated parts within that encapsulated part.

It should be noted that, because email is character-oriented, the mechanisms described here are mechanisms for encoding arbitrary byte streams, not bit streams. If a bit stream is to be encoded via one of these mechanisms, it should first be converted to a byte stream using the "big-endian" bit order, in which the earlier bits in a stream become the higher-order bits in a byte. A bit stream not ending at an 8-bit boundary should be padded with zeroes. If the precise bit count is needed, it can be given in the Content-Size header field, described later in this document.

The following sections will define the two standard encoding mechanisms.

Quoted-Printable Content-TransferEncoding

The Quoted-Printable encoding is intended to represent data that is largely, but not entirely, 7 bit ASCII. Printable ASCII portions of body parts encoded in this way should be recognizable by humans, if necessary, without translation. In this encoding, ASCII characters 9 (tab), 10 (nl), 12 (np), 13 (cr), 32 through 57, inclusive, 59, 60, and 62 through 126, inclusive, are unchanged. All other characters, including characters 58 (:), 61 (=), and 127 (DEL), are to be represented as determined by the following rules:

Rule #1: Any 8 bit value may be represented by a ":" followed by a two digit hexadecimal representation of the character's 8-bit value. Thus, for example, character 12 (control-L, or formfeed) can be represented by ":0C", the equal-sign character (61) can be represented by ":3D", and the colon character (58) itself can be represented by ":3A".
Rule #1 is the REQUIRED representation for characters 127 through 160 and for character 255.

Rule #2: An 8 bit value from 161 through 254 may, alternately, be represented by an equal-sign character followed by the single character obtained by the removal of the high order bit, i.e. by subtracting 128 from the value. Thus the 8 bit value 193 may be represented as "=A". Rule #2 is completely optional, given rule #1, but is provided for improved readability of some 8-bit character sets in which turning on the 8th bit produces a character similar to the corresponding 7 bit character, e.g. the 8th bit simply adds an umlaut.

Rule #3: The literal equal-sign and colon characters must themselves be quoted by colons. Thus, the colon may be represented as "::" and the equal-sign as ":=". Note that this is not ambiguous with regard to the first clause, because neither ":" nor "=" is part of the hexadecimal alphabet.

Rule #4: A colon at the end of a line may be used to indicate a non-significant line break. That is, if one needs to include a long line without line breaks, a message encoded with the quoted-printable encoding should include "soft" line breaks in which the line break is preceded by a colon. Thus if the "raw" form of the line is a single line that says:

   Now's the time for all men to come to the aid of their country. Now's the time for all men to come to the aid of their country. Now's the time for all men to come to the aid of their country.

This could be represented, in the quoted-printable encoding, as

   Now's the time for all men to come to the aid of their country. :
   Now's the time for all men to come to the aid of their country. :
   Now's the time for all men to come to the aid of their country.

This provides a mechanism with which long lines are encoded in such a way as to be restored by the user agent. The quoted-printable encoding REQUIRES that lines be broken so that they are no more than 79 characters long, using soft line breaks when necessary.
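The rules so far can be sketched in Python as follows. This is a non-normative illustration of the colon-based scheme in this draft, not of any other quoted-printable variant. The function names qp_encode_line and qp_decode are hypothetical. The encoder handles one logical line at a time, uses only Rule #1 for quoting and Rule #4 for soft line breaks, and quotes a final space or tab because trailing white space may be stripped in transit. The decoder accepts Rules #1 through #4, assumes "\n" stands for the local end-of-line, and does no error handling for malformed input.

```python
# Byte values that this draft allows to appear unchanged within a line
# (CR and LF are excluded because the encoder works on one logical line).
SAFE = {9, 12} | set(range(32, 58)) | {59, 60} | set(range(62, 127))


def qp_encode_line(raw: bytes, max_len: int = 79) -> str:
    """Encode one logical line, inserting soft breaks (Rule #4)."""
    tokens = [chr(b) if b in SAFE else ":%02X" % b for b in raw]
    if raw and raw[-1] in (9, 32):
        tokens[-1] = ":%02X" % raw[-1]  # protect trailing white space
    lines, cur = [], ""
    for tok in tokens:
        # Keep one column free for the ":" that marks a soft line break.
        if len(cur) + len(tok) > max_len - 1:
            lines.append(cur + ":")
            cur = ""
        cur += tok
    lines.append(cur)
    return "\n".join(lines)


def qp_decode(text: str) -> bytes:
    """Decode text written according to Rules #1 through #4."""
    out = bytearray()
    lines = text.split("\n")
    for idx, line in enumerate(lines):
        soft, i = False, 0
        while i < len(line):
            ch = line[i]
            if ch == ":" and i + 1 == len(line):
                soft, i = True, i + 1              # Rule #4: soft line break
            elif ch == ":" and line[i + 1] in ":=":
                out.append(ord(line[i + 1]))       # Rule #3: literal ":" or "="
                i += 2
            elif ch == ":":
                out.append(int(line[i + 1:i + 3], 16))  # Rule #1: hex pair
                i += 3
            elif ch == "=":
                out.append(ord(line[i + 1]) + 128)  # Rule #2: restore high bit
                i += 2
            else:
                out.append(ord(ch))
                i += 1
        if not soft and idx < len(lines) - 1:
            out += b"\n"                            # a real line break
    return bytes(out)
```

For instance, qp_encode_line(b"a=b:c") produces "a:3Db:3Ac", and qp_decode("=A") produces the single byte 193, matching the example in Rule #2.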
Rule #5: Although the SPACE (32) and TAB (9) characters may generally be represented as themselves, they should NOT be so represented at the end of a line, because some MTA's are known to remove "white space" from the end of a line. In such cases, the characters MUST be represented as in rule #1 (as ":20" and ":09" respectively) or as themselves, followed by a soft line break followed by a real line break. IMPORTANT NOTE: In decoding a quoted-printable message, any trailing white space on a line should be deleted, as it will have been added by intermediate transport agents. It is also recommended that the persistence of character codes less than 32 should not be relied on, particularly the TAB, CR, and LF characters. Even though TAB and form-feed are permitted in this encoding, they can be quoted, and this is wise if their precise persistence is critical. NOTE ABOUT CR AND LF in quoted-printable encoded messages: The use of CR or LF characters that are not part of a CR/LF sequence must be encoded as :0D and :0A, respectively, in messages that use the Quoted-Printable encoding. Sequences such as CR LF LF are also invalid; the only correct unencoded sequence is CR LF CR LF. Although RFC-822 defines CR and LF as ordinary characters when used outside of the CR/LF sequence, some implementations treat one (or both) as equivalent to end-of-line or as error characters that are discarded. Messages which contain embedded bare CR or LF characters should use encoding style #1 to encode these characters "safely". (Discussion: Some environments use a bare CR or bare LF as the local end-of-line convention. If a message contains embedded bare CR or LF characters, it is impossible to transform it from Internet to local conventions without interfering with this local convention.) The presence of a CR LF sequence in a quoted-printable-encoded message is to be interpreted as an end-of-line marker, to be represented as such according to local convention by the decoding agent. 
If it is necessary to send a binary sequence of CR LF in such a message, these characters should be represented as :0D:0A in order to prevent them from being re-converted into the local end-of-line convention. Since the hyphen character ("-") is represented as itself in the Quoted-Printable encoding, the usual care must be taken, when encapsulating a quoted-printable encoded message or body part in a multipart message, to ensure that the encapsulation boundary does not appear anywhere in the message. See the definition of multipart messages, in Appendix [APP-MULTIPART]. Base64 Content-TransferEncoding The Base64 Content-TransferEncoding is designed to represent arbitrary 8 bit data in a form that is not humanly readable. The encoding and decoding algorithms are simple, but the encoded data is only about 33 percent larger than the unencoded data. This encoding is also used in Privacy Enhanced Mail applications; it is described in RFC 1113. The ability in RFC1113 to embed clear text within such an encoding is not allowed in this context, however. The following description of the encoding is adapted from RFC 1113; apart from the exclusion of the "*" mechanism for embedded clear text and the definition of a portable newline syntax, using the comma character, there are no significant technical changes from RFC 1113. A 64-character subset of International Alphabet IA5 is used, enabling 6 bits to be represented per printable character. (The proposed subset of characters is represented identically in IA5 and ASCII.) One additional character, "=", is used to signify special processing functions. The character "=" is used for padding within the printable encoding procedure. The encoding function's output is delimited into text lines (using local conventions), with each line except the last containing exactly 64 printable characters and the final line containing 64 or fewer printable characters. 
(This line length is easily printable and is guaranteed to satisfy SMTP's 1000 character transmitted line length limit.) Although implementations are encouraged to be liberal in accepting lines of different lengths if they are received, they should only compose lines of the specified lengths.

The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups; this is then treated as 4 concatenated 6-bit groups. When encoding a bit stream via the base64 encoding, the bit stream should be presumed to be ordered with the most-significant-bit first. That is, the first bit in the stream will be the high-order bit in the first byte, and the eighth bit will be the low-order bit in the first byte, and so on.

Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is placed in the output string. These characters, identified in Table 1 below, are selected so as to be universally representable, and the set excludes characters with particular significance to SMTP (e.g., ".", "<CR>", "<LF>").

                              Table 1

   Value Encoding   Value Encoding   Value Encoding   Value Encoding
       0 A             17 R             34 i             51 z
       1 B             18 S             35 j             52 0
       2 C             19 T             36 k             53 1
       3 D             20 U             37 l             54 2
       4 E             21 V             38 m             55 3
       5 F             22 W             39 n             56 4
       6 G             23 X             40 o             57 5
       7 H             24 Y             41 p             58 6
       8 I             25 Z             42 q             59 7
       9 J             26 a             43 r             60 8
      10 K             27 b             44 s             61 9
      11 L             28 c             45 t             62 +
      12 M             29 d             46 u             63 /
      13 N             30 e             47 v
      14 O             31 f             48 w          (pad) =
      15 P             32 g             49 x
      16 Q             33 h             50 y

Special processing is performed if fewer than 24 bits are available at the end of a message or encapsulated part of a message. A full encoding quantum is always completed at the end of a message. When fewer than 24 input bits are available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups.
Output character positions which are not required to represent actual input data are set to the character "=". Since all canonically encoded output is an integral number of octets, only the following cases can arise:

(1) the final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of encoded output will be an integral multiple of 4 characters with no "=" padding,

(2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by two "=" padding characters, or

(3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be three characters followed by one "=" padding character.

One addition is made to the RFC 1113 specification of this encoding: The comma character (",", ASCII 44) may be used to represent an "end-of-line" or "end-of-record" marker. If line-oriented data are encoded using base64, it is desirable to restore end-of-line markers according to the local convention. The RFC 1113 specification, as given above, offers no way to differentiate between a binary file including a CRLF sequence and a portable end-of-line marker. This memo augments that mechanism to permit such differentiation, as follows. To represent an end-of-line marker:

1. Treat the byte stream preceding the end-of-line as terminating at the end of the line -- that is, pad with "=" characters as appropriate to complete the representation of the line.

2. Insert a comma character.

3. Resume the encoding, starting a new 24-bit input group with the first character on the next line.

Thus, while encoding the binary sequence "a-b-c-CR-LF-a-b-c" yields "YWJjDQphYmM=", encoding "a-b-c" followed by an end-of-line followed by "a-b-c" yields "YWJj,YWJj". They will be translated back into the same thing if the local end-of-line convention is CRLF, but they will be translated back differently if the end-of-line convention is anything other than CRLF.
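The comma convention can be illustrated with a short Python sketch built on the standard base64 module. The helper names b64_encode_records and b64_decode_records are hypothetical, and the sketch assumes the input has already been split into lines (records) with their end-of-line markers removed. It leaves out the wrapping of the encoded output into 64-character text lines.

```python
import base64
import os
from typing import Iterable


def b64_encode_records(records: Iterable[bytes]) -> str:
    """Encode each record separately and join them with the portable
    end-of-line marker ","; each record is padded with "=" as needed."""
    return ",".join(base64.b64encode(r).decode("ascii") for r in records)


def b64_decode_records(encoded: str, eol: bytes = os.linesep.encode()) -> bytes:
    """Decode comma-separated groups, restoring each "," as the local
    end-of-line convention given by eol."""
    return eol.join(base64.b64decode(chunk) for chunk in encoded.split(","))
```

With these helpers, b64_encode_records([b"abc", b"abc"]) gives "YWJj,YWJj", while base64.b64encode(b"abc\r\nabc") gives b"YWJjDQphYmM=", as in the example above. Decoding "YWJj,YWJj" with eol=b"\n" restores a bare LF between the two records, which is exactly the case in which the two forms translate back differently.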
Since the hyphen character ("-") is not used, there is no need to worry about quoting apparent encapsulation boundaries within base64-encoded body parts.

Additional Optional Content- Header Fields

Optional Content-ID Header Field

In constructing a high-level user agent, it may be desirable to allow one message body-part to make reference to another. This may be done using the "Content-ID" header field, which is syntactically identical to the "Message-ID" header field:

   Content-ID := "<" msg-id ">"

Optional Content-Description Header Field

It may be desirable to associate some descriptive information with a given body-part. For example, it may be useful to mark an "image" body-part as "a picture of the Space Shuttle Endeavour." Such text may be placed in the Content-Description header field. The text will be assumed to be in the same character set as the multipart message of which it is a part.

   Content-Description := *text

Optional Content-Size Header Field

In the discussions of earlier drafts of this memo, some people indicated a strong preference for using a size-counting scheme to delimit the boundaries between encapsulated parts of multipart messages. This was rejected because such schemes are not, in general, sufficiently robust across the SMTP transport layer. For example, line counts can be altered by line-wrapping MTA's, and byte counts can be altered in any number of ways, and may be confused by crossing boundaries in which the size of an end-of-line marker changes.

However, there are restricted environments in which either or both of these counts can be relied upon, and in such environments it may be desirable to implement a count-based approach to delimiters. Therefore this memo specifies a conventional way to do this, in order to promote interoperability among systems that are able to take this approach. In such cases, boundary delimiters, as defined above, are still required.
However, the header area of an encapsulated part may include an optional Content-Size header which indicates where the encapsulated part ends, if its size has not been altered. The size may be measured in either bytes or lines. Those who use the Content-Size header field should still preserve the encapsulation boundaries, and should recognize that other agents are free to ignore it in favor of complete reliance on encapsulation boundaries. The Content-Size header field is defined as follows: Content-Size = 1*DIGIT "lines" / 1*DIGIT "bytes" / 1*DIGIT "bits" Note that each encapsulated part should still end with an end-of-line followed by an encapsulation boundary. However, a message store that wishes, for example, to use a storage format that is largely RFC 822-compliant, but includes binary storage of binary objects, can use the Content-Size header field to indicate whether or not the final end-of-line is to be interpreted as part of the binary object. If the end-of-line follows the number of bytes specified for the encapsulation, then it is not part of the encapsulation. The size given by the Content-Size header field is the size of the encapsulation's body only, not counting the blank line that separates the header from the body. In other words, the four bytes CRLF CRLF, which separate header from body, are NOT counted as part of the content-size. RFC-XXXX Compliance The mechanisms described in this memo are open-ended. It is definitely not expected that all implementations will implement all of the content-types described, nor that they will all share the same extensions. In order to promote interoperability, however, it is useful to define the concept of "RFC-XXXX-Compliance" to define a certain level of implementation that allows the useful interworking of messages with content that differs from US ASCII text. In this section, we specify the requirements for such compliance. An RFC-XXXX-Compliant mail user agent must: 1. 
Recognize the Content-TransferEncoding header field, and un-encode data encoded with either the quoted-printable or the base64 encoding. (If a compressed encoding is ever agreed to, it should also become part of all compliant user agents.)

2. Recognize and interpret the Content-type header field, and avoid showing an unsuspecting user raw data that has a content-type field other than text.

3. Explicitly handle the following content-type values, as defined in the appendices:

-- text, with at least the MAILASCII character set.
-- message, with at least the MAILASCII character set.
-- multipart, although parallel parts may be serialized.
-- digest, with at least the MAILASCII character set.
-- binary, although no particular subtype recognition is required.

4. Upon encountering an unrecognized content-type, an implementation should treat it as if it had a content-type of "binary" with no parameter sub-arguments.

A user agent that meets the above conditions is said to be RFC-XXXX compliant. The meaning of this phrase is that it is assumed to be "safe" to send virtually any kind of properly-marked data to users of such mail systems, because they will at least be able to treat the data as undifferentiated binary, and will not simply splash it onto the screen of unsuspecting users.

Summary

Using the Content-Type and Content-TransferEncoding header fields, it is possible to include, in a standardized way, arbitrary types of data objects in RFC 822 mail messages, without breaking any of the existing restrictions imposed by RFC 821 and RFC 822. Using the "multipart" content-type, it is possible to mix multiple objects of different types in a single message. Additional optional header fields provide conventional mechanisms for certain extensions deemed desirable by many implementors. Finally, a number of useful content-types are defined for general use by consenting user agents.
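As an illustration of compliance item 4 (the fallback to "binary"), a user agent might reduce a Content-type value to the type it will actually act on. The sketch below is hypothetical: the names REQUIRED_TYPES and effective_type are not from the memo, and the parsing assumes only what the examples in this draft show, namely that the type name ends at the first ";" or "/".

```python
# The five content-types a compliant agent must handle, per the list above.
REQUIRED_TYPES = frozenset({"text", "message", "multipart", "digest", "binary"})

def effective_type(content_type, handled=REQUIRED_TYPES):
    """Return the content-type a user agent should act on.

    `content_type` is the raw header value, or None when the header is
    absent (the default content-type is then "text"). Anything not in
    `handled` is treated as "binary" with no parameters.
    """
    if content_type is None:
        return "text"
    # Assumption of this sketch: the type name is everything before the
    # first ';' (parameters) or '/' (character set), compared without case.
    main = content_type.split(";", 1)[0].split("/", 1)[0].strip().lower()
    return main if main in handled else "binary"
```

An agent that also understands, say, "image" would pass a larger `handled` set; the fallback only applies to what it does not recognize.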
For more information, the authors of this document may be contacted via Internet mail: Nathaniel Borenstein <nsb@thumper.bellcore.com> Ned Freed <ned@hmcvax.claremont.edu> Acknowledgements This RFC is the result of the collective effort of a large number of people, at several IETF meetings and on the IETF-SMTP and IETF-822 mailing lists. Although any enumeration seems doomed to suffer from egregious omissions, the following are among the many contributors to this effort: Harald Alvestrand, Randall Atkinson, Kevin Carosso, Mark Crispin, Dave Crocker, Walt Daniels, Frank Dawson, Hitoshi Doi, Kevin Donnelly, Johnny Eriksson, Craig Everhart, Roger Fajman, Alain Fontaine, David Herron, Bruce Howard, Bill Janssen, Risto Kankkunen, Phil Karn, Tim Kehres, Neil Katin, Steve Kille, Anders Klemets, John Klensin, Vincent Lau, Timo Lehtinen, John MacMillan, Rick McGowan, Leo Mclaughlin, Goli Montaser-Kohsari, Keith Moore, Mark Needleman, John Noerenberg, David J. Pepper, Jonathan Rosenberg, Jan Rynning, Mark Sherman, Keld Simonsen, Bob Smart, Einar Stefferud, Michael Stein, Taro Suzuki, Steve Uhler, Stuart Vance, Erik van der Poel, Peter Vanderbilt, Greg Vaudreuil, Brian Wideen, Glenn Wright, and David Zimmerman. The authors apologize for any omissions from this list, which were certainly unintentional. References [REF-PS] Adobe Systems, Inc. Postscript Language Reference Manual. Addison-Wesley, Reading, Mass., 1985. [REF-SGML] ISO TC97/SC18. Standard Generalized Markup Language. Tech. Rept. DIS 8879, ISO, 1986. [REF-TEX] Knuth, Donald E. The TEXbook. Addison-Wesley, Reading, Mass., 1984. [REF-TROFF] Ossanna, Joseph F. NROFF/TROFF User's Manual. Bell Laboratories, Murray Hill, New Jersey, 1976. Computing Science Technical Report No.54. [REF-SCRIBE] Scribe Systems. SCRIBE Document Production Software. Scribe Systems, 1985. Fourth Edition. [REF-ISO646] International Standard--Information Processing--ISO 7-bit coded character set for information interchange, ISO 646:1983. 
[REF-ISO-2022] International Standard--Information Processing--ISO 7-bit and 8-bit coded character sets--Code extension techniques, ISO 2022:1986. [REF-ANSI] Coded Character Set--7-Bit American Standard Code for Information Interchange, ANSI X3.4-1986. [REF-X400] Schicker, Pietro, "Message Handling Systems, X.400", Message Handling Systems and Distributed Applications, E. Stefferud, O-j. Jacobsen, and P. Schicker, eds., North-Holland, 1989, pp. 3-41. [RFC-821] Postel, J.B. Simple Mail Transfer Protocol. August, 1982, Network Information Center, RFC-821. [RFC-822] Crocker, D. Standard for the format of ARPA Internet text messages. August, 1982, Network Information Center, RFC-822. [RFC-934] Rose, M.T.; Stefferud, E.A. Proposed standard for message encapsulation. January, 1985, Network Information Center, RFC-934. [RFC-1049] Sirbu, M.A. Content-type header field for Internet messages. March, 1988, Network Information Center, RFC-1049. [RFC-1113] Linn, J. Privacy enhancement for Internet electronic mail: Part I - message encipherment and authentication procedures [Draft]. August, 1989, Network Information Center, RFC-1113. [RFC-1148] Kille, S.E. Mapping between X.400(1988) / ISO 10021 and RFC 822. March, 1990, Network Information Center, RFC-1148. [RFC-1154] Robinson, D.; Ullmann, R. Encoding header field for internet messages. April, 1990, Network Information Center, RFC-1154. [REF-ATK] Borenstein, Nathaniel S., Multimedia Applications Development with the Andrew Toolkit, Prentice Hall, 1990. [REF-CCITT84c] CCITT SG 5/VII, "Recommendations X.420," Message Handling Systems: Interpersonal Messaging User Agent Layer, October 1984. [REF-CCITT/ISO88b] CCITT/ISO, "CCITT Recommendations X.420/ ISO IS 10021-7", Message Handling Systems: Interpersonal Messaging System, [REF-ODA] ISO 8613; Information Processing: Text and Office System; Office Document Architecture (ODA) and Interchange Format (ODIF), Part 1-8 1989. 
[REF-ULAW] ***************
[REF-ALAW] ***************
[REF-DES] ****************
[REF-PBM] ****************
[REF-G3FAX] ************
[REF-ISO-10646] ************
[REF-ISO-8859] **********
[REF-AFS] ************

Appendix [APP-CONTENTTYPES] -- Partial List of Predefined Content-Type Values

TEXT -- Indicates the body or body part contains textual information. The precise meaning of text body parts is given in Appendix [APP-TEXT].

MESSAGE -- Indicates that the body or body part is an encapsulated message, with the syntax of an RFC 822 message. If a character set specification is given, it applies to the uninterpreted textual fields in the RFC 822 message header area. Thus it can be used to represent address and subject information in non-ASCII character sets. The character set specification in the "Content-type: message" field does NOT apply to the body of the encapsulated message. Thus, to encapsulate a message with non-ASCII characters in both the header fields and in the body, you would need something like the following:

From: <ASCII form>
Subject: <ASCII form>
Content-type: message/iso-10646

From: <iso-10646-form>
Subject: <iso-10646-form>
Content-type: text/iso-10646

Message body in iso-10646 character set.

MULTIPART -- Indicates the body or body part contains multiple encapsulated body parts, each of which may be of a different content-type. The precise syntax of a "multipart" message is defined in Appendix [APP-MULTIPART].

DIGEST -- Indicates that the body or body part is a digest of encapsulated messages. The digest content-type is syntactically identical to the multipart content-type, but the parts are to be interpreted as encapsulated messages rather than as simple body parts. The digest content-type is also suitable for encapsulation of a single message with a prefix, e.g. for a rejection message when mail cannot be delivered.

BINARY -- Indicates that the body or body part is binary data.
A character set may be specified, but its automatic interpretation is unlikely to be meaningful. The parameters for type binary are a set of attribute/value pairs, of the form "NAME=VALUE", separated by the usual semicolons. The set of possible attributes to be defined includes, but is not limited to: NAME -- a suggested name for the binary data as a file. TYPE -- the type of binary data CONVERSIONS -- the set of operations that have been performed on the data before putting it in the mail (and before any Content-TransferEncoding that might have been applied). If multiple conversions have occurred, they should be specified in the order they were applied, and separated by commas. The values for these attributes are left undefined at present, but may require specification in the future. An example of a common (though discouraged) usage might be: Content-type: binary; name=foo.tar; type=tar; \ conversions=compress,uuencode However, the use of such mechanisms as uuencode and compress is explicitly discouraged, in favor of the more standardized Content-TransferEncoding mechanism. In particular, uuencode is not well-suited for mail transport because it is ill-defined, it comes in several incompatible versions, many of which do not work in a pipe, and which use characters that do not translate well into certain representations (e.g. EBCDIC) and are not transmitted reliably over certain connections (e.g. those that remove trailing white space from a line). The recommended action for an implementation that receives binary mail of an unrecognized type is to simply offer to put the data in a file, with any Content-TransferEncoding undone, or perhaps to use it as input to a user-specified process. Implementations are warned NOT to implement a path-search mechanism whereby an arbitrary program named in the Content-type header (e.g. the "type=" subfield) is found and executed using the binary data as input. 
Such an implementation could open up a significant security problem, the elucidation of which is left as an exercise for the reader. AUDIO -- Indicates that the body or body part contains audio data. The first parameter specifies the audio representation format; predefined case-insensitive values are "U-law" [REF-ULAW] and "A-law" [REF-ALAW]. (U-law and A-law are the American and European audio telephony standards.) The second parameter may be used to name a header format (e.g. "Sun"). The third parameter may be used to give the size, in bytes, of the header that precedes the actual audio data. This byte count applies to the raw audio data, not to the size of the data as represented in, for example, the base64 encoding. IMAGE --Indicates that the body or body part contains an image. The first parameter specializes the image format; predefined case insensitive values include "G3Fax" for Group Three Fax [REF-G3FAX] and "pbm", "pgm", and "ppm" for the "portable bitmap" formats [REF-PBM] for black and white, grey scale, or color images. PEM-MESSAGE -- Indicates that the body or body part is an encapsulated message encrypted with DES encryption [REF-DES] and formatted as the encapsulated portion of Privacy Enhanced Mail according to RFC 1113 [RFC-1113]. In this case, the body-part is ONLY the privacy-enhanced encapsulated part that, according to RFC 1113, occurs between the encapsulation boundaries. The boundaries themselves (lines of the form " -----PRIVACY-ENHANCED MESSAGE BOUNDARY-----") are not included. PARTIAL-MESSAGE -- Indicates that the body or body part is a fragment of a larger message. Three subfields must be specified in the content-type field: The first is a unique identifier, to be used to match the parts together. The second, an integer, is the part number. The third, another integer, is the total number of parts. 
Thus, part 2 of a 3-part message might have the following header field: Content-type: Partial-Message; oc=jpbe0M2Yt4s; 2; 3 When the parts of a message broken up in this manner are put together, the result is a complete RFC-822 format message, which may have its own Content-type header field, and thus may contain any other data type. EXTERNAL-REFERENCE -- Indicates that the body or body part is primarily a placeholder for the data that are intended to be conveyed, presumably because too much data is involved for the underlying mail transport mechanism to handle. The subfields are, as in the case of the "binary" content-type, attribute-value pairs. In this case, the subfields describe a mechanism for accessing the binary data. The set of possible attributes includes, but is not limited to: FILENAME -- The name of a file that contains the external data. SITE -- one or more domain names, comma separated, of machines that are known to have access to the data file. REAL-TYPE -- The real content-type of the data, once retrieved. EXPIRATION -- The date (in the format "month day, year") after which the existence of the external data is not guaranteed. With the emerging possibility of very wide-area file systems [REF-AFS], it becomes very hard to know in advance the set of machines where a file will and will not be accessible directly from the file system. Therefore it makes sense to provide both a file name, to be tried directly, and the name of one or more sites from which the file is known to be accessible. An implementation can try to retrieve remote files using FTP or any other protocol, using anonymous file retrieval or prompting the user for the necessary name and password. However, the external-reference mechanism is not intended to be limited to file retrieval. One can imagine, for example, using unique identifiers and a video server for external references to video clips. 
However, this memo explicitly defines only the FILENAME and SITE attributes for retrieval purposes, as this is the only retrieval method that is currently widely applicable. Other attributes may be defined as needed. The "REAL-TYPE" attribute may be used to specify a new content-type header field to be applied to the data once retrieved, as the data are assumed to be only the body of a message, not including any header information. Note that semicolons may be quoted within subfields. Thus an external reference to an image in G3FAX format might have the following content-type header field:

Content-Type: external-reference; \
name=/usr/local/images/contact.g3; \
site=thumper.bellcore.com; \
real-type="image; g3fax"; \
expiration = "September 23, 1997"

If a message is of content-type "external-reference", then the actual body of the message is ignored.

POSTSCRIPT -- Indicates the body or body part consists of information encoded using the Postscript Page Definition Language developed by Adobe Systems, Inc. [REF-PS]. For type "postscript" the first parameter is a version-number field ("1.0", "2.0", or "null"), and the second field is a comma-separated list of resource references, including, but not limited to, "laserprep2.9", "laserprep3.0", "laserprep3.1", and "laserprep4.0".

TeX -- Indicates the body or body part contains embedded formatting information according to the syntax of the TeX document production language [REF-TEX].

TROFF -- Indicates the body or body part contains embedded formatting information according to the syntax specified for the TROFF formatting package developed by AT&T Bell Laboratories [REF-TROFF]. For type "troff" the parameters include, but are not limited to, "eqn", "tbl", "me", and the names of other troff macro packages. Alternate character set specifications are acceptable.

ODA -- Indicates that the body or body part is an ODA document, containing a whole document encoded according to the Office Document Architecture [REF-ODA].
The single parameter following "Content-type: ODA" should be either "; ODIF" or "; ODL/SDIF" to indicate the ODA encoding type. Any additional information needed to process the document must be included in the document profile which is included in the document.

DVI -- Indicates the body or body part is information in the device independent file format produced by TROFF or TeX.

X-BE2 -- Indicates the body or body part is Andrew-format information [REF-ATK]. The first parameter is the Andrew datastream version number, and the second

"X-"atom -- Any type value beginning with the characters "X-" and not defined here or in another RFC is a private value, to be used by consenting mail systems by mutual agreement. Any format without a rigorous and public definition should be named with an "X-" prefix.

Appendix [APP-TEXT] -- The TEXT Content-type and the MAILASCII Character Set

In keeping with historical practice and expectations, the default content-type for internet mail is "text", and the default character set is the one specified by RFC 822. This content-type can be explicitly specified as "text", and the character set as "mailascii". Alternately, a different character set may be specified, in which case the body text is in the specified character set. A recommended list of predefined character sets can be found at the end of this appendix. Note that if the specified character set includes 8-bit data, the Content-TransferEncoding header field is required in order to transmit the message via SMTP.

The default character set has been the subject of some confusion and ambiguity in the past. Its definition is spelled out here to reduce such ambiguity in the future. The MAILASCII character set is based on a series of standards and on the historical standard practice in the Internet mail community. However, the precise meaning of this character set has been the subject of some debate. In this appendix, therefore, we define the MAILASCII character set.
It is our belief that this definition corresponds with the default assumptions made for messages without Content-type headers, as defined by RFC 822. The message body is coded in the character set of American Standard Code for Information Interchange, sometimes known as "7-bit ASCII". This is not an arbitrary seven-bit character code, but indicates that the message body uses character coding that uses the exact correspondence of codes to characters specified in ASCII. National use variations of ISO646 [REF-ISO646] are not ASCII, and neither an explicit "ASCII" character set, nor "MAILASCII", nor the default (omission of a character set) should be used when characters are coded using them. (Discussion: RFC821 very explicitly specifies "ASCII", and references an earlier version of the American Standard cited in [REF-ANSI]. Whether that specification, rather than a reference to an International Standard, was done deliberately or out of convenience or ignorance, is no longer interesting: insofar as one of the purposes of specifying a content-type and character set is to permit the receiver to unambiguously determine how the sender intended the coded message to be interpreted, assuming anything other than "strict ASCII" as the default would risk unintentional and incompatible changes to the semantics of messages now being transmitted. This also implies that messages containing characters coded according to national variations on ISO646, or using code-switching procedures (e.g., those of ISO2022), as well as 8-bit or multiple octet character encodings MUST use an appropriate character set specification to be consistent with this specification.) 
Because of the restriction imposed on message bodies by RFC 822 and, in practice, by Message Transport Agents that are more-or-less compliant with RFC 821, implementors should be careful in several ways regarding MAILASCII text: (1) Delimiters other than CR-LF pairs may be used in the local representation of a message on some systems. The persistence of CR-LF pairs should not be relied on. (2) Isolated CR and LF characters are not well tolerated in general; they may be lost or converted to delimiters on some systems, and hence should not be relied on. (3) TAB characters may be misinterpreted or may be automatically converted to variable numbers of spaces. This is unavoidable in some environments, notably those not based on the ASCII character set. Such conversion is STRONGLY DISCOURAGED, but it may occur, and users of MAILASCII format should not rely on the persistence of TAB characters. (4) Lines longer than 78 characters may be wrapped or truncated in some environments. Line wrapping and line truncation are STRONGLY DISCOURAGED, but unavoidable in some cases. Applications which depend on lines not being wrapped should use mechanisms other than unencoded MAILASCII bodyparts to transmit messages. (5) Trailing "white space" characters (SPACE, TAB, etc.) on a line may be discarded by some transport agents, and hence should not be relied on. Please note that the above list is NOT a list of recommended practices -- we do not recommend that MTA's alter the character of white space, or wrap long lines. These are known BAD practices on established networks, and implementors must guard against the bad effects they can cause. See RFC 821, RFC 822, and RFC1113 for additional information about canonical SMTP formats. Authors of software which composes "MAILASCII" in compliance with this RFC should be well-acquainted with SMTP formats. The complete MAILASCII character set is listed below: ***** SHOULD WE KEEP IN THE CONTROL CHARS???? 
  0 nul   16 dle   32 sp    48 0     64 @     80 P     96 `    112 p
  1 soh   17 dc1   33 !     49 1     65 A     81 Q     97 a    113 q
  2 stx   18 dc2   34 "     50 2     66 B     82 R     98 b    114 r
  3 etx   19 dc3   35 #     51 3     67 C     83 S     99 c    115 s
  4 eot   20 dc4   36 $     52 4     68 D     84 T    100 d    116 t
  5 enq   21 nak   37 %     53 5     69 E     85 U    101 e    117 u
  6 ack   22 syn   38 &     54 6     70 F     86 V    102 f    118 v
  7 bel   23 etb   39 '     55 7     71 G     87 W    103 g    119 w
  8 bs    24 can   40 (     56 8     72 H     88 X    104 h    120 x
  9 ht    25 em    41 )     57 9     73 I     89 Y    105 i    121 y
 10 nl    26 sub   42 *     58 :     74 J     90 Z    106 j    122 z
 11 vt    27 esc   43 +     59 ;     75 K     91 [    107 k    123 {
 12 np    28 fs    44 ,     60 <     76 L     92 \    108 l    124 |
 13 cr    29 gs    45 -     61 =     77 M     93 ]    109 m    125 }
 14 so    30 rs    46 .     62 >     78 N     94 ^    110 n    126 ~
 15 si    31 us    47 /     63 ?     79 O     95 _    111 o    127 del

Beyond MAILASCII, one can imagine an enormous proliferation of character sets. It is the opinion of the authors of this memo that a large number of character sets is NOT a good thing. We would prefer to specify a single character set that can be used universally for representing all of the world's languages in electronic mail. Unfortunately, there is no clear choice for such a universal representation, and existing practice in several communities seems to point to the continuing use of multiple character sets in the near future. For this reason, we define names for a small number of character sets for which a strong constituent base exists. We recommend the use of ISO-10646 wherever possible. The defined character set names are:

MAILASCII -- as defined above.
ISO-10646 -- as defined in [REF-ISO-10646]
ISO-8859-X -- where "X" is to be replaced, as necessary, for the national use variants of ISO-8859 [REF-ISO-8859]
ISO-2022 -- as defined in [REF-ISO-2022]

In the opinion of the authors, this is already far more character sets than are really desirable, and implementors are discouraged from defining new ones unless absolutely necessary.
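The five transport hazards listed earlier in this appendix for MAILASCII text (changed delimiters, isolated CR or LF, TAB conversion, long lines, trailing white space) lend themselves to a simple pre-send check. The following is a rough sketch only; the function name mailascii_hazards and the wording of its messages are hypothetical, and the body is assumed to be in canonical form with CR-LF line delimiters.

```python
def mailascii_hazards(body, max_len=78):
    """Return warnings for constructs that may not survive mail transport.

    `body` is a bytes object using CR-LF as the line delimiter (an
    assumption of this sketch). An empty list means no hazard was found.
    """
    warnings = []
    # MAILASCII is strict 7-bit ASCII: codes 0 through 127 only.
    if any(octet > 127 for octet in body):
        warnings.append("non-ASCII octet: not MAILASCII")
    # After removing CR-LF pairs, any CR or LF left over is isolated.
    remainder = body.replace(b"\r\n", b"")
    if b"\r" in remainder or b"\n" in remainder:
        warnings.append("isolated CR or LF")
    if b"\t" in body:
        warnings.append("TAB character")
    for number, line in enumerate(body.split(b"\r\n"), start=1):
        if len(line) > max_len:
            warnings.append(f"line {number} longer than {max_len} characters")
        if line.endswith((b" ", b"\t")):
            warnings.append(f"line {number} has trailing white space")
    return warnings
```

A composing agent could use a non-empty result as a hint to apply a Content-TransferEncoding instead of sending the body unencoded.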
***** I AM SURE THAT I NEED SOME FLESHING OUT OF THE ABOVE DEFINITIONS & REFERENCES Appendix [APP-MULTIPART] -- The "Multipart" Content-Type In the case of multiple part messages, a "multipart" Content-type field should appear in the RFC 822 message header. The message body is then assumed to contain multiple parts separated by encapsulation boundaries. Each of the parts is defined, syntactically, as a complete RFC 822 message in miniature. That is, what is found between the encapsulation boundaries is a header area, a blank line, and a body area, in accordance with the RFC 822 syntax for a message. However body parts are NOT to be interpreted as actually being RFC 822 messages. To begin with, NO header fields are actually required in body parts. A body part that starts with a blank line, therefore, is a body part for which all default values are to be assumed. In such a case, of course, the absence of a Content-type header field implies that the encapsulation is MAILASCII text. The only header fields that have defined meaning for body-parts are those the names of which begin with "Content-". All other header fields are to be ignored in body-parts, and may be discarded by gateways. They are permitted to appear in body parts only for ease of conversion between messages and body parts. It must be understood that body parts are NOT messages. For example, a gateway between Internet and X.400 mail must be able to tell the difference between a body part that consists of an image and a bodypart that consists of an encapsulated message, the body of which is an image. In order to represent the latter, the body part should have "Content-type: message", and its body (after the blank line) should be the encapsulated message, with its own "Content-type: image" header field. Body parts use the same syntax as messages because there are many legitimate cases in which a body part might be converted into a message, or vice versa. 
The identical syntax makes such conversions easy, but the distinction between the two must be understood by implementors. (For the special case in which all parts are actually messages, a "digest" content-type is also defined.)

As stated previously, each pair of consecutive body parts is separated by an encapsulation boundary. The encapsulation boundary MUST NOT appear inside any of the encapsulated parts. Thus, it is crucial that the composing agent be able to choose and specify the boundary that will separate the parts.

The Content-type field for multipart messages requires two supplementary fields. The first is used to specify a version number and should be either "1-S" or "1-P". The two versions have identical syntax, but the "-P" is intended as a hint, to receivers, that the parts are intended to be viewed in parallel rather than sequentially. Implementations that cannot show the parts in parallel, or that choose not to do so, are free to treat all multipart messages of version "1-P" as if they were version "1-S". However, all implementations should check the version number, to ensure graceful behavior in the event that an incompatible future version of multipart messages is defined later.

The second supplementary field, which is always required for multipart messages, is used to specify the format of the encapsulation boundary. The encapsulation boundary is defined as a line consisting entirely of two hyphen characters ("-", decimal code 45) followed by the second parameter of the Content-type header field with any leading or trailing white space removed.

(DISCUSSION: The specification that white space be removed is intended to eliminate the possible introduction of ambiguity caused by the addition or deletion of white space by message transport agents. The hyphens are for rough compatibility with the earlier RFC 934 method of message encapsulation, and for ease of searching for the boundaries in some implementations.
However, it should be noted that multipart messages are NOT completely compatible with RFC 934 encapsulations; in particular, they do not obey RFC 934 quoting conventions for embedded lines that begin with hyphens.)

Thus, a typical multipart content-type header field might look like this:

Content-type: multipart; 1-S; gc0p4Jq0M2Yt08jU534c0p

This indicates that the message consists of several parts, each itself structured as an RFC 822 message, which are intended to be viewed one-at-a-time, and that the parts are separated by the line

--gc0p4Jq0M2Yt08jU534c0p

The encapsulation boundaries must not appear within the encapsulations, and should be no longer than 70 characters, not counting the two leading hyphens.

The encapsulation boundary following the last body-part should be a distinguished delimiter that indicates that no further body-parts will follow. Such a delimiter is identical to the previous delimiters, with the addition of two more hyphens at the end of the line:

--gc0p4Jq0M2Yt08jU534c0p--

It should be noted that there is room for additional information prior to the first encapsulation boundary and following the final such boundary. In these "prefix" and "postfix" areas, arbitrary text may be included.

It is legitimate for a multipart message to specify an alternate character set. In such cases, the specified character set applies to the prefix area, the postfix area, and the textual portions of the body-part headers. Distinguished portions of the body-part headers, such as the words "Content-type:", are to retain their interpretation in US ASCII.

The use of "Content-Type: Multipart" as a message part within another "Content-Type: Multipart" is explicitly allowed. In such cases, for obvious reasons, care must be taken to ensure that each nested multipart message uses a different boundary delimiter. See Appendix [APP-COMPLEX] for an example of nested multipart messages.
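The boundary rules above (a delimiter line of two hyphens plus the boundary parameter, a closing delimiter with two more hyphens, and free-form prefix and postfix areas) can be sketched as a small splitter. This is an illustration, not a reference implementation: the names parse_multipart_type and split_multipart are hypothetical, the body is assumed to be text with "\n" line ends, and the version field is assumed to be case-insensitive.

```python
def parse_multipart_type(value):
    """Split a value such as 'multipart; 1-S; boundary' into (version, boundary)."""
    fields = [field.strip() for field in value.split(";")]
    if len(fields) != 3 or fields[0].lower() != "multipart":
        raise ValueError("not a draft-style multipart content-type")
    # Assumption of this sketch: the version is compared without regard to case.
    version, boundary = fields[1].upper(), fields[2]
    if version not in ("1-S", "1-P"):
        raise ValueError("unknown multipart version: " + version)
    return version, boundary

def split_multipart(body, boundary):
    """Return (prefix, parts, postfix) for a multipart body given as text."""
    delimiter = "--" + boundary
    closer = delimiter + "--"
    prefix, parts, postfix = [], [], []
    current = prefix
    for line in body.split("\n"):
        if current is postfix:
            postfix.append(line)           # everything after the closer
        elif line.rstrip() == closer:
            current = postfix              # no further body parts follow
        elif line.rstrip() == delimiter:
            parts.append([])               # start a new body part
            current = parts[-1]
        else:
            current.append(line)
    return "\n".join(prefix), ["\n".join(p) for p in parts], "\n".join(postfix)
```

Because delimiter lines are matched as whole lines, a nested multipart part with a different boundary passes through untouched and can be split again with its own boundary.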
The use of content-type "Multipart" with only a single included part may be useful in certain contexts, and is explicitly permitted.

Overall, the body of a multipart message may be specified as follows:

body := prefix 1*encapsulation close-delimiter postfix
encapsulation := delimiter CRLF message
delimiter := "--" <delimiter from Content-type resource>
close-delimiter := delimiter "--"
prefix := *text
postfix := *text
message := <as defined in RFC 822, with all header fields optional, containing no lines matching "delimiter">

Appendix [APP-SIMPLE] -- Simple Non-ASCII Text Example

***** FILL IN HERE WITH AN EXAMPLE OF NON-ASCII TEXT. Can someone provide me with a cute example from a non-ASCII character set?

Appendix [APP-COMPLEX] -- A Complex Multipart Example

What follows is the outline of a complex multipart message. This message has three parts to be displayed serially: an introductory plain text part, an embedded multipart message, and a closing encapsulated text message in a non-ASCII character set. The embedded multipart message has two parts to be displayed in parallel, a picture and an audio fragment.

From: ...
Subject: ...
Content-type: multipart; 1-s; tweedledum

This is a multipart message. Since I've not specified another character set, this "prefix" area is in US ASCII.
--tweedledum

...Some more text appears here...
[Note that the preceding blank line means no header fields were given and this is text, with charset US ASCII.]
--tweedledum
Content-type: multipart; 1-p; tweedledee

This is a multipart message. If you are reading this text, you might want to consider changing to a user agent that understands how to properly display multipart messages.
--tweedledee
Content-type: u-law; 8000 HZ; X-NEXT
Content-TransferEncoding: base64

... base64-encoded NeXT-format audio data goes here....
--tweedledee
Content-type: image; G3FAX
Content-TransferEncoding: Base64

... base64-encoded FAX data goes here....
--tweedledee--
--tweedledum
Content-type: message/ISO-8859-1

From: Keld J|rn Simonsen (name can be non-ASCII)
Subject: whatever
Content-type: Text/ISO-8859-1
Content-TransferEncoding: Quoted-printable

... Closing text goes here ...
--tweedledum--