Re: Return of the Son of Beneath the Planet of RFC-XXXX

Nathaniel Borenstein <nsb@thumper.bellcore.com> Tue, 14 May 1991 17:50 UTC
Message-Id: <scA2GCu0M2Yt0_1_wo@thumper.bellcore.com>
Date: Tue, 14 May 1991 13:55:26 -0400
From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
To: Greg Vaudreuil <gvaudre@NRI.Reston.VA.US>
Subject: Re: Return of the Son of Beneath the Planet of RFC-XXXX
In-Reply-To: <9105141332.aa09046@NRI.NRI.Reston.VA.US>
References: <9105141332.aa09046@NRI.NRI.Reston.VA.US>

Your wish is my command...

                  How to Read the May Draft of RFC-XXXX

This is the fifth major draft, at least, of RFC-XXXX.  Those of you who
have been following along are, no doubt, heartily sick of the process by
now, as am I.  I'm trying to make it easier for us all in the following
ways:

1.  I've compiled a list of major changes from the April draft.  I'm not
trying to pull any fast ones on anybody, but it is possible that the
list is incomplete.  It is, however, my best attempt to provide a simple
list of what has changed.

2.  With previous drafts, I think that comments mostly came in three flavors:

    NITS:  Minor points of clarification, typographical or
        technical correction, etc.  These were uncontroversial and I
        tried to adopt them all.

    SHOW-STOPPERS:  These were major disagreements, where people
        indicated unhappiness so great that they might be unable to
        live with the draft as written.  Obviously I've tried VERY
        hard to deal with these, but sometimes people have
        SHOW-STOPPER comments that are pretty nearly in direct
        conflict with each other.

    ARGUMENTS:  These are sincere disagreements where the person
        disagreeing could still live with the draft if he lost the
        argument.

I would like to STRONGLY URGE the readers of this draft to self-classify
their comments into the above three categories, and to treat them in the
following ways:

    NITS:  Send them directly to me; no need to bother the whole list.

    SHOW-STOPPERS:  Sigh...  I'm hoping there aren't any left,
        but if you have them, please send them to the whole list.

    ARGUMENTS:  If you can live with losing the argument, and if
        the argument has already been well-argued in the past on the
        list, ask yourself: is it worth re-arguing?  I'm not trying
        to prevent debate, merely encouraging you to reflect before
        reopening old arguments.

I still need help on a number of things, particularly fleshing out some
of the references and sanity-checking some of the areas in which I'm not
an expert, notably character sets, audio, and privacy-enhanced messages
(PEM).  If you know something about one of these, please read that part
of the draft extra carefully.

That's all.  Enjoy.  I look forward to your comments.  Well, sort of....
 :-)   -- Nathaniel

Major Changes From April Draft

There is a lot of new prose, and the document has been reorganized
substantially, to clarify intent and to discuss rejected alternatives. 

Content-type syntax:  There is now a distinguished place for character
sets, which are no longer content-types.  The rest of the syntax has
been generalized to a set of semicolon-separated parameters.

Content-types:  Several content-types have been consolidated into
"image" and "audio".  The Scribe and SGML content-types have been
eliminated.  DES-MESSAGE has been replaced by PEM-MESSAGE.  New
content-types: binary, digest, message, partial-message,
external-reference.  The scheme for officially defining new
content-types has been changed to require an RFC.

The "Encoded-Variable" stuff has been eliminated, in favor of
Content-type: Message/charset

Content-Encoding has been changed to Content-TransferEncoding.  The
hexadecimal encoding has been eliminated, and some prose about the need
for a compressed encoding has been added.

The base64 encoding has added "," as a way to specify portable end-of-lines.

The quoted-printable encoding has changed "&" and "\" to "=" and ":" for
portability, and has added some rules (and clarified others) regarding
CRLF and trailing white space.

Two new optional header fields, Content-ID and Content-Description, have
been defined.

Multipart messages:  The definition has changed so that body-parts are
no longer messages, though the syntax is the same.  A new distinguished
closing delimiter is now required.  The content-type for multipart can
now specify a character set, which made it seem reasonable to reinstate
the notion of a text prefix & postfix in the specified character set. 
(US-centrism was a major criticism of earlier proposals to allow text in
the prefix & postfix.)

Added a new notion of "RFC-XXXX-compliant" implementations, defining a
minimal subset to be implemented to earn such a label.

Network Working Group -- Request for Comments: XXXX

                Mechanisms for Specifying and Describing
                  the Format of Internet Message Bodies

                     Nathaniel Borenstein, Bellcore
                           Ned Freed, Innosoft

                                May 1991

Status of This Memo

This document suggests extensions to the RFC 822 message representation
protocol to allow multi-part textual and non-textual messages to be
represented and exchanged without loss of information. Discussion and
suggestions for improvements are welcome.  This memo does not specify an
Internet standard, but it is intended to be a step towards a standard. 
This draft document will be submitted to the RFC editor as a protocol
specification.  Distribution of this memo is unlimited.  Please send
comments to Nathaniel Borenstein <nsb@thumper.bellcore.com>.

Table of Contents

Introduction
The Content-Type Header Field
The Content-TransferEncoding Header Field
      Quoted-Printable Content-TransferEncoding
      Base64 Content-TransferEncoding
Additional Optional Content- Header Fields
      Optional Content-ID Header Field
      Optional Content-Description Header Field
      Optional Content-Size Header Field
RFC-XXXX Compliance
Summary
Acknowledgements
References
Appendix [APP-CONTENTTYPES] -- Partial List of Predefined Content-Type Values
Appendix [APP-TEXT] -- The TEXT Content-type and the MAILASCII Character Set
Appendix [APP-MULTIPART] -- The "Multipart" Content-Type
Appendix [APP-SIMPLE] --  Simple Non-ASCII Text Example
Appendix [APP-COMPLEX] -- A Complex Multipart Example

Introduction

Since its publication in 1982, RFC 822 [RFC-822] has defined the
standard format of textual mail messages on the Internet.  Its success
has been such that the RFC 822 format has been adopted, wholly or
partially, well beyond the confines of the Internet and of SMTP
transport, as defined by RFC 821 [RFC-821].  As the format has seen
wider use, a number of limitations have become increasingly problematic
for the user community.

RFC 822 was intended to specify a format for text messages.  As such,
non-text messages, such as multimedia messages that might include audio
or images, are simply not mentioned.  Even in the case of text, however,
RFC 822 is inadequate for the needs of email users whose languages
require the use of character sets richer than US ASCII [REF-ANSI].  For
mail containing audio, video, Japanese text, or even text in most
European languages, RFC 822 does not specify enough to permit
interoperability.

One of the notable limitations of RFC 821/822 based mail systems is the
fact that they limit the contents of electronic mail messages to
relatively short lines of seven-bit ASCII.  This forces a user to
convert any non-textual data that she may wish to send into a seven-bit
ASCII representation before invoking her local mail UA (User Agent
program).  Examples of such encodings currently used in the Internet
include pure hexadecimal, uuencode, the 3-in-4 base 64 scheme specified
in RFC 1113, the Andrew Toolkit Representation [REF-ATK], and many
others.

These limitations become even more apparent as gateways are designed to
allow for the exchange of mail messages between RFC 822 hosts and X.400
hosts.  X.400 [REF-X400] specifies mechanisms for the inclusion of
non-textual body parts within electronic mail messages.  The current
standards for the mapping of X.400 messages to RFC 822 messages specify
that either X.400 non-textual body parts should be converted to (not
encoded in) an ASCII format, or that they should be discarded, notifying
the RFC 822 user that discarding has occurred.  This is clearly
undesirable, as information that a user may wish to receive is lost. 
Even though a user's UA may not have the capability of dealing with the
non-textual body part, the user might have some mechanism external to
the UA that can extract useful information from the body part. 
Moreover, it does not allow for the fact that the message may eventually
be gatewayed back into an X.400 MHS, where the non-textual information
would definitely become useful again.

This memo describes several mechanisms that combine to solve these
problems.  In particular, it describes:

1.  A Content-type header field, generalized from RFC 1049 [RFC-1049],
which can be used to describe the type of data in the body of a message
and to fully specify the representation (encoding) of such data.

2.  A Content-TransferEncoding header field, which can be used to
describe an auxiliary encoding that was applied to the data in order to
allow it to pass through the mail transport layer.

3.  A "text" content-type value, which can be used to represent text
information in a number of character sets in a standardized manner.

4.  A "multipart" content-type value, which can be used to combine
several separate body-parts, which may be made of different types of
data, into a single message.

5.  A "binary" content-type value, which can be used to transmit
uninterpreted or partially-interpreted binary data, and hence to
implement an email file transfer service.

6.  "Message" and "Digest" content-type values for encapsulating one or
more mail messages.

7.  Several additional content-type values, which can be used by
consenting User Agents to interoperate with additional message types
such as audio, images, and more.

8.  Several optional header fields that can be used to further describe
the data in a message body or body-part, in particular the Content-Size,
Content-ID, and Content-Description header fields.

Finally, to specify and promote a minimal level of interoperability,
this memo describes a subset of the above mechanisms that defines
"compliance" with this memo.  That is, it specifies the minimal subset
required for an implementation to be called "RFC-XXXX-compliant."

The Content-Type Header Field

The Content-Type header field was previously defined in RFC 1049, and is
reaffirmed and generalized here.  The remainder of this section is
derived from RFC 1049, and, where different, is intended to supersede it.

The Content-Type  header field is used to specify the type of data in a
message, by giving a type name, and to provide auxiliary information
that may be required for certain types.   In addition, a distinguished
syntax is defined for specifying character set information.  After the
type name and the optional character set, the remainder of the header
field is simply a set of parameter specifications, as defined for each
named type, and an optional comment.

(DISCUSSION:  It has been suggested that character sets can be specified
in the same way as any other auxiliary information, and that character
set specification is meaningless for content-types such as "audio" and
therefore should not be broadly defined as part of the top-level syntax.
  However, character sets have been given a distinguished syntax in
order to aid gateways that need to do character set translation without
necessarily understanding all possible content-types.  Such translation
should not, however, be undertaken lightly, as the complexities involved
are formidable and easily underestimated.)  

(COMPATIBILITY NOTE:  Readers familiar with RFC 1049 Content-types will
notice that the syntax has been generalized substantially.  However,
RFC 1049 content-types are all compliant with the new syntax.  In
particular, RFC 1049 content-types omitted the character-set
specification, and always had at most two of the parts now called
"parameters", which were distinguished by their position as indicating a
version number and a resource reference.)

In the Extended BNF notation of RFC-822, we define a Content-type header
field value as follows:

Content-Type:= type ["/" char-set] *[";" parameter]
		[comment]

parameter :=      local-part

char-set := "MAILASCII"/
            "ISO-10646" /
            "ISO-8859-" *DIGIT /
            "ISO-2022"

type   := "TEXT" /
          "MESSAGE" /
          "MULTIPART" /
          "DIGEST" /
          "BINARY" /
          "AUDIO" /
          "IMAGE" /
          "PEM-MESSAGE" /
          "PARTIAL-MESSAGE" /
          "EXTERNAL-REFERENCE" /
          "POSTSCRIPT" /
          "TeX" /
          "TROFF" /
          "DVI" /
          "ODA" /
          "X-BE2" /
          "X-"atom

These values are not case sensitive.  POSTSCRIPT, Postscript, and
POStscriPT are all equivalent.  
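The grammar above can be illustrated with a small parsing sketch.  This
is only one possible reading, not part of the memo: the function name
parse_content_type is hypothetical, trailing comments and quoted strings
are not handled, and treating character-set names as case-insensitive is
an assumption.

```python
import re

# Character-set names from the grammar above; "ISO-8859-" may be followed by digits.
_CHARSET_RE = re.compile(r"MAILASCII|ISO-10646|ISO-8859-\d*|ISO-2022")


def parse_content_type(value):
    """Split a Content-Type value into (type, char_set, parameters).

    Hypothetical helper covering `type ["/" char-set] *[";" parameter]` only.
    Trailing RFC 822 comments and quoted strings are deliberately not handled.
    """
    head, *params = [piece.strip() for piece in value.split(";")]
    type_name, _, char_set = head.partition("/")
    type_name = type_name.strip().upper()         # type names are not case sensitive
    char_set = char_set.strip().upper() or None   # assumption: same for char-sets
    if char_set is not None and not _CHARSET_RE.fullmatch(char_set):
        raise ValueError(f"unknown character set: {char_set}")
    return type_name, char_set, [p for p in params if p]
```

Under these assumptions, "POStscriPT; 1.0" and "POSTSCRIPT; 1.0" parse to
the same result, with the type name folded to upper case.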

This set of type names is not intended to be exhaustive.  More may be
defined later.  The only constraint on the definition of such names is
the desire that their uses not conflict.  That is, it would be
undesirable to have two different communities using "Content-type: foo"
to mean two different things.  The process of defining new
content-types, then, is not intended to be a mechanism for imposing
restrictions, but simply a mechanism for publicizing content-type
usages.  There are, therefore, only two acceptable mechanisms for
defining new content-type values:

    1.  Private values (starting with "X-") may be defined
        unilaterally.

    2.  "Standard" values may be defined by the publication of
        an Internet RFC.  The RFC need not be very long, but must
        define the content-type, its associated parameter syntax,
        and the format of the body of a message so marked.

Several specific predefined "type" fields are explained in the
appendices of this memo.

If no Content-type header field is present, "text" is assumed, with the
default character set (MAILASCII) as specified in Appendix [APP-TEXT].  
This is consistent with the default message body type as defined by RFC
822. 

It should be noted that the list of Content-type values given in the
appendices is expected to be augmented in time, via the mechanisms
described above.  We have simply attempted, in this RFC, to give as many
standard Content-type definitions as was possible given the current
state of our knowledge.  The Content-type values defined here are
compatible with the values defined by RFC 1049.

The Content-TransferEncoding Header Field

Many content-types are represented, in their "natural" format, as 8-bit
or binary data.  Such data cannot be transmitted over existing Internet
mail mechanisms because both RFC 821 and RFC 822 restrict mail messages
to 7 bit data with reasonably short lines.  It is necessary, therefore,
to define a standard mechanism for encoding such data in an acceptable
manner.

This RFC specifies that such encodings will be indicated by a new
"Content-TransferEncoding" header field.  The Content-TransferEncoding
field is used to indicate the type of transformation that has been used
to represent the message body in an acceptable manner.  

It should be noted, also, that there is considerable interest and effort
being expended on extending mail transport to permit 8-bit or binary
data.  If such extensions ever become commonplace, the
Content-TransferEncoding mechanism will quickly become irrelevant, and
it is therefore desirable not to "overload" Content-TransferEncoding
with additional mechanisms that might still be useful in such a future. 
For this reason, Content-TransferEncoding is restricted in its scope to
refer to nothing but the 7-bit encoding question.  Matters such as the
basic format in which information is "encoded" are to be handled by
another mechanism.  

Unlike Content-types, which are expected to proliferate, it is expected
that there will never be more than a few different
Content-TransferEncoding values, both because there is less need for
variation and because the effect of variation in
Content-TransferEncoding would be more problematic.  However,
establishing only a single Content-TransferEncoding mechanism does not
seem possible.  In particular, there is a tradeoff between the desire
for a compact and efficient encoding of binary data and the desire for a
readable encoding of data that is mostly, but not entirely, 7-bit data. 
For this reason, at least two encoding mechanisms are necessary, a
"readable" encoding and a "dense" encoding.   

A third encoding, for compressed ("super-dense") data, is also strongly
desirable.  This RFC does not specify a "compressed" encoding, due to
the uncertain legal state of the UNIX "compress" command and a lack of
certainty, during the drafting of this RFC, regarding the right way to
define a standard compression algorithm.  It is hoped that a compressed
Content-TransferEncoding will be defined in a future RFC.  Any
compression algorithm for such a use should be unambiguously defined and
without legal encumbrances.

The Content-TransferEncoding field is designed to specify a two-way
mapping between the "native" representation of a type of data and a
representation that can be readily exchanged using 7 bit mail transport
protocols as defined by RFC 821 (SMTP). This field has not been defined
by any previous RFC. The field's value is a single atom specifying the
type of encoding, as enumerated below.  Formally:

Content-TransferEncoding:=	"BASE64"/
			"QUOTED-PRINTABLE"/
			"8BIT"/"BINARY"/
			"7BIT"/"X-"atom

These values are not case sensitive.  That is, Base64 and BASE64 and
bAsE64 are all equivalent.  An encoding type of 7BIT implies that the
message is already in a seven-bit mail-ready representation. This value
is assumed if the Content-TransferEncoding header field is not present. 
If the message is stored or transported via a mechanism that permits
8-bit data, a Content-TransferEncoding of "8bit" should be used.  If the
message is stored or transported via a mechanism that permits arbitrary
binary data, a Content-TransferEncoding of "binary" should nonetheless
be used.  (DISCUSSION:  The distinction between the
Content-TransferEncoding values of "binary," "8bit," and "7bit" may seem
unimportant in an 8-bit binary environment, but clear labeling will be
of enormous value to gateways between 8-bit and 7-bit systems.  The
difference between "8bit" and "binary" is that "8bit" implies adherence
to SMTP limits on line length and CR/LF semantics, whereas "binary" does
not.)

Implementors may define new content encoding values, but should prefix
them with "x-" to indicate their non-standard status, e.g.
"Content-TransferEncoding:  x-my-new-encoding".   However, unlike
Content-types, the creation of new Content-TransferEncoding values is
explicitly discouraged, as it seems likely to hinder interoperability
with little potential benefit.
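As an illustration of the rules above (case-insensitive values, 7BIT
assumed when the field is absent, private values prefixed with "x-"), a
receiving agent might normalize the field roughly as follows.  The
function and constant names are hypothetical, and rejecting unknown
values with an exception is an assumption, not something this memo
requires.

```python
STANDARD_ENCODINGS = {"BASE64", "QUOTED-PRINTABLE", "8BIT", "BINARY", "7BIT"}


def normalize_transfer_encoding(header_value=None):
    """Return the canonical upper-case encoding name for a header value.

    A missing header (None) is treated as 7BIT.  Names that start with
    "X-" are accepted as private, non-standard values.
    """
    if header_value is None:
        return "7BIT"
    name = header_value.strip().upper()           # values are not case sensitive
    if name in STANDARD_ENCODINGS or (name.startswith("X-") and len(name) > 2):
        return name
    raise ValueError(f"unrecognized Content-TransferEncoding: {header_value!r}")
```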

If a Content-TransferEncoding header field appears as part of a message
header, it applies to the entire message body, whether or not that body
is of type "multipart."  If it is of type multipart, the encoding
applies recursively to all of the encapsulated parts, including their
encapsulated headers.  If a Content-TransferEncoding header field
appears as part of an encapsulation's headers, it applies only to the
body of the encapsulated part.  If the encapsulated part is itself of
type "multipart", the encoding applies recursively to all of the
encapsulated parts within that encapsulated part.

It should be noted that, because email is character-oriented, the
mechanisms described here are mechanisms for encoding arbitrary byte
streams, not bit streams.  If a bit stream is to be encoded via one of
these mechanisms, it should first be converted to a byte stream using
the "big-endian" bit order, in which the earlier bits in a stream become
the higher-order bits in a byte.  A bit stream not ending at an 8-bit
boundary should be padded with zeroes.  If the precise bit count is
needed, it can be given in the Content-Size header field, described
later in this document.
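The bit-to-byte conversion just described can be sketched as follows.
The function name pack_bits is hypothetical, and returning the bit count
alongside the bytes is merely a convenience for a caller that wants to
record the precise count.

```python
def pack_bits(bits):
    """Pack an iterable of 0/1 values into bytes, earliest bit highest-order.

    Returns (data, bit_count); a short final byte is zero-padded on the right.
    """
    bits = list(bits)
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk += [0] * (8 - len(chunk))           # pad a short final group with zeroes
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | (bit & 1)        # earlier bits end up more significant
        out.append(byte)
    return bytes(out), len(bits)
```

For example, the three-bit stream 1, 0, 1 becomes the single byte with
binary value 10100000, with a bit count of 3.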

The following sections will define the two standard encoding mechanisms.

Quoted-Printable Content-TransferEncoding

The Quoted-Printable encoding is intended to represent data that is
largely, but not entirely, 7 bit ASCII.  Printable ASCII portions of
body parts encoded in this way should be recognizable by humans, if
necessary, without translation.

In this encoding, ASCII characters 9 (tab), 10 (nl), 12 (np), 13 (cr),
32 through 57, inclusive, 59, 60, and 62 through 126, inclusive, are
unchanged.  All other characters, including characters 58 (:), 61 (=),
and 127 (DEL), are to be represented as determined by the following
rules:

    Rule #1:  Any 8 bit value may be represented by a ":" followed by a
    two digit hexadecimal representation of the character's 8-bit value.
     Thus, for example, character 12 (control-L, or formfeed) can be
    represented by ":0C", the equal-sign character (61) can be
    represented by ":3D", and the colon character (58) itself can be
    represented by ":3A".  Rule #1 is the REQUIRED representation for
    characters 127 through 160 and for character 255.

    Rule #2:  An 8 bit value from 161 through 254 may, alternately, be
    represented by an equal-sign character followed by the single
    character obtained by the removal of the high order bit, i.e. by
    subtracting 128 from the value.  Thus  the 8 bit value 193 may be
    represented as "=A".  Rule #2 is completely optional, given rule #1,
    but is provided for improved readability of some 8-bit character
    sets in which turning on the 8th bit produces a character similar to
    the corresponding 7 bit character, e.g. the 8th bit simply adds an
    umlaut.  

    Rule #3:  The literal equal-sign and colon characters must
    themselves be quoted by colons.  Thus, the colon may be represented
    as "::" and the equal-sign as ":=".  Note that this is not ambiguous
    with regard to rule #1, because neither ":" nor "=" is
    part of the hexadecimal alphabet.

    Rule #4:  A colon at the end of a line may be used to indicate a
    non-significant line break.  That is, if one needs to include a long
    line without line breaks, a message encoded with the
    quoted-printable encoding should include "soft" line breaks in which
    the line break is preceded by a colon.  Thus if the "raw" form of
    the line is a single line that says:

    Now's the time for all men to come to the aid of their country. 
    Now's the time for all men to come to the aid of their country. 
    Now's the time for all men to come to the aid of their country.

    This could be represented, in the quoted-printable encoding, as

    Now's the time for all men to come to the aid of their country.  :
    Now's the time for all men to come to the aid of their country.  :
    Now's the time for all men to come to the aid of their country.  

    This provides a mechanism with which long lines are encoded in such
    a way as to be restored by the user agent.    The quoted-printable
    encoding REQUIRES that lines be broken so that they are no more than
    79 characters long, using soft line breaks when necessary.

    Rule #5:  Although the SPACE (32) and TAB (9) characters may
    generally be represented as themselves, they should NOT be so
    represented at the end of a line, because some MTA's are known to
    remove "white space" from the end of a line.  In such cases, the
    characters MUST be represented as in rule #1 (as ":20" and ":09"
    respectively) or as themselves, followed by a soft line break
    followed by a real line break.  IMPORTANT NOTE:  In decoding a
    quoted-printable message, any trailing white space on a line should
    be deleted, as it will have been added by intermediate transport
    agents.
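The encoding rules above can be sketched for a single logical line as
follows.  This is a minimal illustration, not a reference
implementation: the name qp_encode_line is hypothetical, the caller is
assumed to have already split the data at its line breaks, and the
optional forms of rules #2 and #3 are not used (everything that needs
quoting is written in the rule #1 hexadecimal form).

```python
# Byte values that pass through unchanged, minus the line-break characters
# (10 and 13), which this sketch expects the caller to have split on already.
SAFE = {9, 12} | set(range(32, 58)) | {59, 60} | set(range(62, 127))
MAX_LINE = 79


def qp_encode_line(raw):
    """Encode one logical line (bytes, no CR/LF) into a list of encoded lines."""
    tokens = [chr(b) if b in SAFE else ":%02X" % b for b in raw]   # rule 1
    if tokens and tokens[-1] in (" ", "\t"):
        tokens[-1] = ":%02X" % ord(tokens[-1])                     # rule 5
    lines, current = [], ""
    for token in tokens:
        # Leave room for the ":" that marks a soft line break (rule 4),
        # so no emitted line exceeds MAX_LINE characters.
        if len(current) + len(token) > MAX_LINE - 1:
            lines.append(current + ":")
            current = ""
        current += token
    lines.append(current)
    return lines
```

Every returned line except the last ends in the soft-line-break colon, so
a space that happens to fall just before a break is never the final
character on the line.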

It is also recommended that the persistence of character codes less than
32 should not be relied on, particularly the TAB, CR, and LF characters.
 Even though TAB and form-feed are permitted in this encoding, they can
be quoted, and this is wise if their precise persistence is critical.

NOTE ABOUT CR AND LF in quoted-printable encoded messages:  CR or LF
characters that are not part of a CR/LF sequence must be
encoded as :0D and :0A, respectively, in messages that use the
Quoted-Printable encoding.  Sequences such as CR LF LF are also invalid;
the only correct unencoded sequence is CR LF CR LF.  Although RFC-822
defines CR and LF as ordinary characters when used outside of the CR/LF
sequence, some implementations treat one (or both) as equivalent to
end-of-line or as error characters that are discarded.  Messages which
contain embedded bare CR or LF characters should use rule #1
to encode these characters "safely".  (Discussion: Some environments use
a bare CR or bare LF as the local end-of-line convention.  If a message
contains embedded bare CR or LF characters, it is impossible to
transform it from Internet to local conventions without interfering with
this local convention.)  The presence of a CR LF sequence in a
quoted-printable-encoded message is to be interpreted as an end-of-line
marker, to be represented as such according to local convention by the
decoding agent.  If it is necessary to send a binary sequence of CR LF
in such a message, these characters should be represented as :0D:0A in
order to prevent them from being re-converted into the local end-of-line
convention.
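A decoder following rules #1 through #5 and the end-of-line note above
might look roughly like this.  It is a sketch under stated assumptions:
the name qp_decode and its eol parameter (standing in for the local
end-of-line convention) are hypothetical, the input is a list of lines
already split at CR LF, and malformed input is not handled gracefully.

```python
def qp_decode(lines, eol=b"\n"):
    """Decode quoted-printable lines (str, without CR/LF) into bytes."""
    out = bytearray()
    for line in lines:
        line = line.rstrip(" \t")                 # rule 5: drop transit-added white space
        i, soft_break = 0, False
        while i < len(line):
            ch = line[i]
            if ch == ":":
                nxt = line[i + 1:i + 2]
                if nxt == "":                     # rule 4: colon at end of line
                    soft_break = True
                    i += 1
                elif nxt in (":", "="):           # rule 3: quoted literal
                    out.append(ord(nxt))
                    i += 2
                else:                             # rule 1: two hexadecimal digits
                    out.append(int(line[i + 1:i + 3], 16))
                    i += 3
            elif ch == "=":                       # rule 2: restore the high-order bit
                out.append(ord(line[i + 1]) + 128)
                i += 2
            else:
                out.append(ord(ch))
                i += 1
        if not soft_break:
            out += eol                            # real line break, in local convention
    return bytes(out)
```

Note that a binary CR LF written as :0D:0A comes back as the two literal
bytes regardless of the eol value, which is the distinction the note
above is drawing.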

Since the hyphen character ("-") is represented as itself in the
Quoted-Printable encoding, the usual care must be taken, when
encapsulating a quoted-printable encoded message  or body part in a
multipart message, to ensure that the encapsulation boundary does not
appear anywhere in the message.  See the definition of multipart
messages, in Appendix [APP-MULTIPART].

Base64 Content-TransferEncoding

The Base64 Content-TransferEncoding is designed to represent arbitrary 8
bit data in a form that is not humanly readable.  The encoding and
decoding algorithms are simple, but the encoded data is only about 33
percent larger than the unencoded data.  This encoding is also used in
Privacy Enhanced Mail applications; it is described in RFC 1113. The
ability in RFC 1113 to embed clear text within such an encoding is not
allowed in this context, however. The following description of the
encoding is adapted from RFC 1113; apart from the exclusion of the "*"
mechanism for embedded clear text and the definition of a portable
newline syntax, using the comma character, there are no significant
technical changes from RFC 1113.

A 64-character subset of International Alphabet IA5 is used, enabling 6
bits to be represented per printable character.  (The proposed subset of
characters is represented identically in IA5 and ASCII.) One additional
character, "=", is used to signify special processing functions.  The
character "=" is used for padding within the printable encoding
procedure. The encoding function's output is delimited into text lines
(using local conventions), with each line except the last containing
exactly 64 printable characters and the final line containing 64 or
fewer printable characters.  (This line length is easily printable and
is guaranteed to satisfy SMTP's 1000 character transmitted line length
limit.)  Although implementations are encouraged to be liberal in
accepting lines of different lengths if they are received, they should
only compose lines of the specified lengths.

The encoding process represents 24-bit groups of input bits as output
strings of 4 encoded characters. Proceeding from left to right, a
24-bit input group is formed by concatenating 3 8-bit input groups; this
is then treated as 4 concatenated 6-bit groups.  When encoding a bit
stream via the base64 encoding, the bit stream should be presumed to be
ordered with the most-significant-bit first.  That is, the first bit in
the stream will be the high-order bit in the first byte, and the eighth
bit will be the low-order bit in the first byte, and so on.

Each 6-bit group is used as an index into an array of 64 printable
characters. The character referenced by the index is placed in the
output string. These characters, identified in Table 1 below, are
selected so as to be universally representable, and the set excludes
characters with particular significance to SMTP (e.g., ".", "<CR>",
"<LF>").

                                 Table 1

   Value Encoding  Value Encoding  Value Encoding  Value Encoding
       0 A            17 R            34 i            51 z
       1 B            18 S            35 j            52 0
       2 C            19 T            36 k            53 1
       3 D            20 U            37 l            54 2
       4 E            21 V            38 m            55 3
       5 F            22 W            39 n            56 4
       6 G            23 X            40 o            57 5
       7 H            24 Y            41 p            58 6
       8 I            25 Z            42 q            59 7
       9 J            26 a            43 r            60 8
      10 K            27 b            44 s            61 9
      11 L            28 c            45 t            62 +
      12 M            29 d            46 u            63 /
      13 N            30 e            47 v
      14 O            31 f            48 w         (pad) =
      15 P            32 g            49 x
      16 Q            33 h            50 y
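The grouping and table lookup described above can be sketched for one
complete 24-bit group as follows.  The names ALPHABET and encode_group
are hypothetical; the string simply lists the Table 1 characters in
value order.

```python
import string

# Table 1 as a string: index i holds the character that encodes value i.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(three_bytes):
    """Encode exactly 3 bytes (one 24-bit group) as 4 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    group = int.from_bytes(three_bytes, "big")    # 24 bits, first byte highest
    indexes = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indexes)
```

Incomplete final groups need the padding rules, which this sketch does
not cover.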

Special processing is performed if fewer than 24 bits are available at
the end of a message or encapsulated part of a message.  A full encoding
quantum is always completed at the end of a message. When fewer than 24
input bits are available in an input group, zero bits are added (on the
right) to form an integral number of 6-bit groups.  Output character
positions which are not required to represent actual input data are set
to the character "=".  Since all canonically encoded output is an
integral number of octets, only the following cases can arise: (1) the
final quantum of encoding input is an integral multiple of 24 bits;
here, the final unit of encoded output will be an integral multiple of 4
characters with no "=" padding, (2) the final quantum of encoding input
is exactly 8 bits; here, the final unit of encoded output will be two
characters followed by two "=" padding characters, or (3) the final
quantum of encoding input is exactly 16 bits; here, the final unit of
encoded output will be three characters followed by one "=" padding
character.
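The three padding cases can be illustrated with a complete encoder
sketch.  The names b64_encode and wrap64 are hypothetical; wrap64 shows
the 64-character line rule described earlier, and the comma end-of-line
marker is not handled here.

```python
import string

# Table 1 as a string: index i holds the character that encodes value i.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64_encode(data):
    """Encode bytes with "=" padding; returns one unbroken string."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        missing = 3 - len(chunk)                  # 0, 1, or 2 bytes short of a group
        group = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(group >> s) & 0x3F] for s in (18, 12, 6, 0)]
        if missing:
            chars[4 - missing:] = ["="] * missing  # unused positions become "="
        out.append("".join(chars))
    return "".join(out)


def wrap64(text):
    """Split encoded text into lines of 64 characters (the last may be shorter)."""
    return [text[i:i + 64] for i in range(0, len(text), 64)]
```

An 8-bit final quantum yields two characters plus "==", and a 16-bit
final quantum yields three characters plus "=", matching cases (2) and
(3) above.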

One addition is made to the RFC 1113 specification of this encoding: 
The comma character (",", ASCII 44) may be used to represent an
"end-of-line" or "end-of-record" marker.  If line-oriented data are
encoded using base64, it is desirable to restore end-of-line markers
according to the local convention.  The RFC 1113 specification, as given
above, offers no way to differentiate between a binary file including a
CRLF sequence and a portable end-of-line marker.  This memo augments
that mechanism to permit such differentiation, as follows.  To represent
an end-of-line marker:

    1.  Treat the byte stream preceding the end-of-line as
    terminating at the end of the line -- that is, pad with "="
    characters as appropriate to complete the representation of the
    line.

    2.  Insert a comma character.

    3.  Resume the encoding starting a new 24-bit input group with
    the first character on the next line.

Thus, while encoding the binary sequence "a-b-c-CR-LF-a-b-c"  yields
"YWJjDQphYmM=", encoding "a-b-c" followed by an end-of-line followed by
"a-b-c" yields "YWJj,YWJj".  They will be translated back into the same
thing if the local end-of-line convention is CRLF, but they will be
translated back differently if the end-of-line convention is anything
other than CRLF.
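
The comma convention might be sketched as follows.  This is a minimal
illustration, not a reference implementation: it assumes Python's
standard base64 module, the function names encode_lines and
decode_lines are hypothetical, and it ignores any wrapping of the
encoded output into short lines.

```python
import base64


def encode_lines(lines):
    """Encode each line separately and join the pieces with commas.

    Each element of lines holds the bytes of one line, without its
    end-of-line marker.  Encoding line by line pads each line with "="
    as needed and restarts the 24-bit grouping after every comma.
    """
    return ",".join(base64.b64encode(line).decode("ascii") for line in lines)


def decode_lines(encoded, eol=b"\n"):
    """Decode comma-separated base64, writing eol for each comma."""
    return eol.join(base64.b64decode(piece) for piece in encoded.split(","))


print(encode_lines([b"abc", b"abc"]))
print(decode_lines("YWJj,YWJj", eol=b"\r\n"))
```

The eol argument stands in for the local end-of-line convention, which
the decoder, not the encoder, decides.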

Since the hyphen character ("-") is not used, there is no need to worry
about quoting apparent encapsulation boundaries within base64-encoded
body parts.

Additional Optional Content- Header Fields

Optional Content-ID Header Field

In constructing a high-level user agent, it may be desirable to allow
one message body-part to make reference to another.  This may be done
using the "Content-ID" header field, which is syntactically identical to
the "Message-ID" header field:

Content-ID := "<" msg-id ">"

Optional Content-Description Header Field

It may be desirable to associate some descriptive information with a
given body-part.  For example, it may be useful to mark an "image"
body-part as "a picture of the Space Shuttle Endeavor."  Such text may
be placed in the Content-Description header field.  The text will be
assumed to be in the same character set as the multipart message of
which it is a part.

Content-Description := *text

Optional Content-Size Header Field

In the discussions of earlier drafts of this memo, some people indicated
a strong preference for using a size-counting scheme to delimit the
boundaries between encapsulated parts of multipart messages.  This was
rejected because such schemes are not, in general, sufficiently robust
across the SMTP transport layer.  For example, line counts can be
altered by line-wrapping MTAs, and byte counts can be altered in any
number of ways, and may be confused by crossing boundaries in which the
size of an end-of-line marker changes.  However, there are restricted
environments in which either or both of these counts can be relied upon,
and in such environments it may be desirable to implement a count-based
approach to delimiters.  Therefore this memo specifies a conventional
way to do this, in order to promote interoperability among systems that
are able to take this approach.

In such cases, boundary delimiters, as defined above, are still
required.  However, the header area of an encapsulated part may include
an optional Content-Size header which indicates where the encapsulated
part ends, if its size has not been altered.  The size may be measured
in lines, bytes, or bits.  Those who use the Content-Size header field
should still preserve the encapsulation boundaries, and should recognize
that other agents are free to ignore it in favor of complete reliance on
encapsulation boundaries.

The Content-Size header field is defined as follows:

Content-Size := 1*DIGIT "lines"
	/ 1*DIGIT "bytes"
	/ 1*DIGIT "bits"

Note that each encapsulated part should still end with an end-of-line
followed by an encapsulation boundary.  However, a message store that
wishes, for example, to use a storage format that is largely RFC
822-compliant, but includes binary storage of binary objects, can use
the Content-Size header field to indicate whether or not the final
end-of-line is to be interpreted as part of the binary object.  If the
end-of-line follows the number of bytes specified for the encapsulation,
then it is not part of the encapsulation.

The size given by the Content-Size header field is the size of the
encapsulation's body only, not counting the blank line that separates
the header from the body.  In other words, the four bytes CRLF CRLF,
which separate header from body, are NOT counted as part of the
content-size.
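
As an illustration only, a receiver might read the field and measure a
body along the following lines.  The sketch assumes Python; the names
parse_content_size and body_size_in_bytes are hypothetical, and the
tolerance for white space and for upper-case unit names is an
assumption of the sketch rather than something this memo specifies.

```python
import re

# Assumption: optional white space around the count and the unit.
SIZE_RE = re.compile(r"^\s*([0-9]+)\s*(lines|bytes|bits)\s*$", re.IGNORECASE)


def parse_content_size(value):
    """Return (count, unit) from a Content-Size field value, or None."""
    match = SIZE_RE.match(value)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).lower()


def body_size_in_bytes(part: bytes) -> int:
    """Count the body only: everything after the first CRLF CRLF.

    Simplification: a part with no header fields, which starts with a
    bare blank line, is not handled here.
    """
    _header, separator, body = part.partition(b"\r\n\r\n")
    return len(body) if separator else 0


print(parse_content_size("1024 bytes"))
print(body_size_in_bytes(b"Content-Size: 5 bytes\r\n\r\nhello"))
```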

RFC-XXXX Compliance

The mechanisms described in this memo are open-ended.  It is definitely
not expected that all implementations will implement all of the
content-types described, nor that they will all share the same
extensions.  In order to promote interoperability, however, it is useful
to define "RFC-XXXX-Compliance" as a certain level of implementation
that allows the useful interworking of messages with content that
differs from US ASCII text.  In this section, we specify the
requirements for such compliance.

An RFC-XXXX-Compliant mail user agent must:

    1.  Recognize the Content-TransferEncoding header field, and
    decode data encoded with either the quoted-printable or base64
    encoding.  (If a compressed encoding is ever agreed to, it
    should also become part of all compliant user agents.)

    2.  Recognize and interpret the Content-type header field, and
    avoid showing an unsuspecting user raw data that has a
    content-type field other than text.

    3.  Explicitly handle the following content-type values, as
    defined in the appendices:

        -- text, with at least the MAILASCII character set.

        -- message, with at least the MAILASCII character set.

        -- multipart, although parallel parts may be serialized.

        -- digest, with at least the MAILASCII character set.

        -- binary, although no particular subtype recognition is
        required.

    4.  Upon encountering an unrecognized content-type, an
    implementation should treat it as if it had a content-type of
    "binary" with no parameter sub-arguments.

A user agent that meets the above conditions is said to be RFC-XXXX
compliant.  The meaning of this phrase is that it is assumed to be
"safe" to send virtually any kind of properly-marked data to users of
such mail systems, because they will at least be able to treat the data
as undifferentiated binary, and will not simply splash it onto the
screen of unsuspecting users.
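
The fallback rule in requirements 2 through 4 might be sketched as
follows.  This is a hedged illustration in Python: the names
KNOWN_TYPES, effective_type, and may_display_raw are hypothetical, and
treating type names as case-insensitive, and splitting them from their
parameters at ";" or "/", are assumptions drawn from the examples in
this memo.

```python
import re

# The content-type values a compliant agent must handle explicitly.
KNOWN_TYPES = {"text", "message", "multipart", "digest", "binary"}


def effective_type(content_type_value):
    """Return the type to act on, treating unrecognized types as binary."""
    if content_type_value is None:
        return "text"  # no Content-type field: the default applies
    main_type = re.split(r"[;/]", content_type_value, maxsplit=1)[0]
    main_type = main_type.strip().lower()
    return main_type if main_type in KNOWN_TYPES else "binary"


def may_display_raw(content_type_value):
    """Only text may be shown to the user without further handling."""
    return effective_type(content_type_value) == "text"


print(effective_type("X-BE2; 12"))
print(may_display_raw("image; pbm"))
```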

Summary

Using the Content-Type and Content-TransferEncoding header fields, it is
possible to include, in a standardized way, arbitrary types of data
objects in RFC 822 mail messages, without breaking any of the existing
restrictions imposed by RFC 821 and RFC 822.  Using the "multipart"
content-type, it is possible to mix multiple objects of different types
in a single message.  Additional optional header fields provide
conventional mechanisms for certain extensions deemed desirable by many
implementors.  Finally, a number of useful content-types are defined for
general use by consenting user agents.

For more information, the authors of this document may be contacted via
Internet mail:

             Nathaniel Borenstein <nsb@thumper.bellcore.com>
                  Ned Freed <ned@hmcvax.claremont.edu>

Acknowledgements

This RFC is the result of the collective effort of a large number of
people, at several IETF meetings and on the IETF-SMTP and IETF-822
mailing lists.  Although any enumeration seems doomed to suffer from
egregious omissions, the following are among the many contributors to
this effort:  Harald Alvestrand, Randall Atkinson, Kevin Carosso, Mark
Crispin, Dave Crocker, Walt Daniels, Frank Dawson, Hitoshi Doi, Kevin
Donnelly, Johnny Eriksson, Craig Everhart, Roger Fajman, Alain Fontaine,
David Herron, Bruce Howard, Bill Janssen, Risto Kankkunen, Phil Karn,
Tim Kehres, Neil Katin, Steve Kille, Anders Klemets, John Klensin,
Vincent Lau, Timo Lehtinen, John MacMillan, Rick McGowan, Leo
Mclaughlin, Goli Montaser-Kohsari, Keith Moore, Mark Needleman, John
Noerenberg, David J. Pepper, Jonathan Rosenberg, Jan Rynning, Mark
Sherman, Keld Simonsen, Bob Smart, Einar Stefferud, Michael Stein, Taro
Suzuki, Steve Uhler, Stuart Vance,  Erik van der Poel, Peter Vanderbilt,
Greg Vaudreuil, Brian Wideen, Glenn Wright, and David Zimmerman.  The
authors apologize for any omissions from this list, which were certainly
unintentional.

References

[REF-PS]  Adobe Systems, Inc.  Postscript Language Reference Manual. 
Addison-Wesley, Reading, Mass., 1985.

[REF-SGML]  ISO TC97/SC18.  Standard Generalized Markup Language. Tech.
Rept. DIS 8879, ISO, 1986.

[REF-TEX]  Knuth, Donald E.  The TEXbook.  Addison-Wesley, Reading,
Mass., 1984.

[REF-TROFF]  Ossanna, Joseph F. NROFF/TROFF User's Manual.  Bell
Laboratories, Murray Hill, New Jersey, 1976.  Computing Science
Technical Report No.54.

[REF-SCRIBE]  Scribe Systems.  SCRIBE Document Production Software. 
Scribe Systems, 1985. Fourth Edition.

[REF-ISO646] International Standard--Information Processing--ISO 7-bit
coded  character set for information interchange, ISO 646:1983.

[REF-ISO-2022] International Standard--Information Processing--ISO 7-bit
and  8-bit coded character sets--Code extension techniques, ISO
2022:1986.

[REF-ANSI] Coded Character Set--7-Bit American Standard Code for 
Information Interchange, ANSI X3.4-1986.

[REF-X400]  Schicker, Pietro, "Message Handling Systems, X.400", Message
Handling Systems and Distributed Applications, E. Stefferud, O-j.
Jacobsen, and P. Schicker, eds., North-Holland, 1989, pp. 3-41.

[RFC-821] Postel, J.B.  Simple Mail Transfer Protocol.  August, 1982,
Network Information Center, RFC-821. 

[RFC-822]   Crocker, D.  Standard for the format of ARPA Internet text
messages.   August, 1982, Network Information Center, RFC-822.

[RFC-934]   Rose, M.T.; Stefferud, E.A.  Proposed standard for message 
encapsulation.  January, 1985, Network Information Center, RFC-934.

[RFC-1049]  Sirbu, M.A.  Content-type header field for Internet
messages.  March, 1988, Network Information Center, RFC-1049. 

[RFC-1113]  Linn, J.  Privacy enhancement for Internet electronic mail:
Part I -  message encipherment and authentication procedures [Draft]. 
August, 1989, Network Information Center, RFC-1113.

[RFC-1148]  Kille, S.E.  Mapping between X.400(1988) / ISO 10021 and RFC
822.  March, 1990, Network Information Center, RFC-1148.

[RFC-1154]  Robinson, D.; Ullmann, R.  Encoding header field for
internet messages. April, 1990, Network Information Center, RFC-1154.

[REF-ATK] Borenstein, Nathaniel S., Multimedia Applications Development
with the Andrew Toolkit, Prentice Hall, 1990.

[REF-CCITT84c]  CCITT SG 5/VII, "Recommendations X.420," Message
Handling Systems: Interpersonal Messaging User Agent Layer, October 1984.

[REF-CCITT/ISO88b]  CCITT/ISO, "CCITT Recommendations X.420/ ISO IS
10021-7", Message Handling Systems: Interpersonal Messaging System,

[REF-ODA]  ISO 8613; Information Processing: Text and Office System;
Office Document Architecture (ODA) and Interchange Format (ODIF), Part
1-8 1989.

[REF-ULAW] ***************

[REF-ALAW] ***************

[REF-DES] ****************

[REF-PBM] ****************

[REF-G3FAX] ************

[REF-ISO-10646] ************

[REF-ISO-8859] **********

[REF-AFS] ************

Appendix [APP-CONTENTTYPES] -- Partial List of Predefined Content-Type Values

TEXT -- Indicates the body or body part contains textual information. 
The precise meaning of text body parts is given in Appendix [APP-TEXT].

MESSAGE -- Indicates that the body or body part is an encapsulated
message, with the syntax of an RFC 822 message.   If a character set
specification is given, it applies to the uninterpreted textual fields
in the RFC 822 message header area.  Thus it can be used to represent
address and subject  information in non-ASCII character sets.  The
character set specification in the "Content-type: message" field does
NOT apply to the body of the encapsulated message.  Thus, to encapsulate
a message with non-ASCII characters in both the header fields and in the
body, you would need something like the following:

    From: <ASCII form>
    Subject:  <ASCII form>
    Content-type:  message/iso-10646

    From: <iso-10646-form>
    Subject: <iso-10646-form>
    Content-type: text/iso-10646

    Message body in iso-10646 character set.

MULTIPART -- Indicates the body or body part contains multiple
encapsulated body parts, each of which may be of a different
content-type.  The precise syntax of a "multipart" message is defined in
Appendix [APP-MULTIPART].

DIGEST -- Indicates that the body or body part is a digest of
encapsulated messages.  The digest content-type is syntactically
identical to the multipart content-type, but the parts are to be
interpreted as encapsulated messages rather than as simple body parts. 
The digest content-type is also suitable for encapsulation of a single
message with a prefix, e.g. for a rejection message when mail cannot be
delivered.

BINARY -- Indicates that the body or body part is binary data.  A
character set may be specified, but its automatic interpretation is
unlikely to be meaningful.  The parameters for type binary are a set of
attribute/value pairs, of the form "NAME=VALUE", separated by the usual
semicolons.  The set of possible attributes to be defined includes, but
is not limited to:

    NAME -- a suggested name for the binary data as a file.

    TYPE -- the type of binary data

    CONVERSIONS -- the set of operations that have been performed on
    the data before putting it in the mail (and before any
    Content-TransferEncoding that might have been applied).  If
    multiple conversions have occurred, they should be specified in
    the order they were applied, and separated by commas.  

The values for these attributes are left undefined at present, but may
require specification in the future.  An example of a common (though
discouraged) usage might be:

    Content-type:  binary; name=foo.tar; type=tar; \
            conversions=compress,uuencode

However, the use of such mechanisms as uuencode and compress is
explicitly discouraged, in favor of the more standardized
Content-TransferEncoding mechanism.  In particular, uuencode is not
well-suited for mail transport because it is ill-defined; it comes in
several incompatible versions, many of which do not work in a pipe; and
it uses characters that do not translate well into certain
representations (e.g. EBCDIC) and are not transmitted reliably over
certain connections (e.g. those that remove trailing white space from a
line).

The recommended action for an implementation that receives binary mail
of an unrecognized type is to simply offer to put the data in a file,
with any Content-TransferEncoding undone, or perhaps to use it as input
to a user-specified process.  Implementations are warned NOT to
implement a path-search mechanism whereby an arbitrary program named in
the Content-type header (e.g. the "type=" subfield) is found and
executed using the binary data as input.  Such an implementation could
open up a significant security problem, the elucidation of which is left
as an exercise for the reader.
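
One cautious way to follow this advice is sketched below, assuming
Python.  The function names parse_binary_params and safe_filename and
the default name "attachment.bin" are hypothetical.  The sketch expects
an already unfolded field value, does not handle quoted values, and
never executes anything named in the header: it only derives a file
name under which the data could be offered to the user.

```python
import os


def parse_binary_params(field_value):
    """Split 'binary; NAME=VALUE; ...' into a dict keyed by lower-cased name."""
    params = {}
    for piece in field_value.split(";")[1:]:
        name, separator, value = piece.partition("=")
        if separator:
            params[name.strip().lower()] = value.strip()
    return params


def safe_filename(params, default="attachment.bin"):
    """Keep only the last path component of the suggested name."""
    suggested = params.get("name", "").replace("\\", "/")
    suggested = os.path.basename(suggested)
    return suggested if suggested not in ("", ".", "..") else default


params = parse_binary_params(
    "binary; name=foo.tar; type=tar; conversions=compress,uuencode")
print(safe_filename(params))
print(params["conversions"].split(","))  # conversions in the order applied
```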

AUDIO -- Indicates that the body or body part contains audio data.
The first parameter specifies the audio representation format;
predefined case-insensitive values are "U-law" [REF-ULAW] and "A-law"
[REF-ALAW].  (U-law and A-law are the American and European audio
telephony standards.)  The second parameter may be used to name a header
format (e.g. "Sun").  The third parameter may be used to give the size,
in bytes, of the header that precedes the actual audio data.  This byte
count applies to the raw audio data, not to the size of the data as
represented in, for example, the base64 encoding.

IMAGE -- Indicates that the body or body part contains an image.  The
first parameter specifies the image format; predefined
case-insensitive values include "G3Fax" for Group Three Fax [REF-G3FAX]
and "pbm", "pgm", and "ppm" for the "portable bitmap" formats [REF-PBM]
for black and white, grey scale, or color images.

PEM-MESSAGE -- Indicates that the body or body part is an encapsulated
message encrypted with DES encryption [REF-DES] and formatted as the
encapsulated portion of Privacy Enhanced Mail according to RFC 1113
[RFC-1113].   In this case, the body-part is ONLY the privacy-enhanced
encapsulated part that, according to RFC 1113, occurs between the
encapsulation boundaries.  The boundaries themselves (lines of the form
"   -----PRIVACY-ENHANCED MESSAGE BOUNDARY-----") are not included.

PARTIAL-MESSAGE -- Indicates that the body or body part is a fragment of
a larger message.  Three subfields must be specified in the content-type
field:  The first is a unique identifier, to be used to match the parts
together.  The second, an integer, is the part number.  The third,
another integer, is the total number of parts.  Thus, part 2 of a 3-part
message might have the following header field:

    Content-type: Partial-Message; oc=jpbe0M2Yt4s; 2; 3

When the parts of a message broken up in this manner are put together,
the result is a complete RFC-822 format message, which may have its own
Content-type header field, and thus may contain any other data type.
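
Reassembly might be sketched as follows.  This is an illustration
under stated assumptions, written in Python: the function names
parse_partial and reassemble are hypothetical, parts are assumed to be
numbered from 1, and each fragment is given as the value of its
Content-type field together with its body text.

```python
def parse_partial(field_value):
    """Return (identifier, part_number, total) from the field value."""
    kind, ident, number, total = [s.strip() for s in field_value.split(";")]
    if kind.lower() != "partial-message":
        raise ValueError("not a partial-message field")
    return ident, int(number), int(total)


def reassemble(fragments):
    """Join the bodies of (field_value, body) pairs in part order."""
    parsed = [(parse_partial(value), body) for value, body in fragments]
    idents = {info[0] for info, _ in parsed}
    totals = {info[2] for info, _ in parsed}
    if len(idents) != 1 or len(totals) != 1:
        raise ValueError("fragments do not belong to one message")
    total = totals.pop()
    by_number = {info[1]: body for info, body in parsed}
    if len(parsed) != total or set(by_number) != set(range(1, total + 1)):
        raise ValueError("missing or duplicate parts")
    return "".join(by_number[n] for n in range(1, total + 1))


fragments = [
    ("Partial-Message; oc=jpbe0M2Yt4s; 2; 3", "B"),
    ("Partial-Message; oc=jpbe0M2Yt4s; 3; 3", "C"),
    ("Partial-Message; oc=jpbe0M2Yt4s; 1; 3", "A"),
]
print(reassemble(fragments))
```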

EXTERNAL-REFERENCE -- Indicates that the body or body part is primarily
a placeholder for the data that are intended to be conveyed, presumably
because too much data is involved for the underlying mail transport
mechanism to handle.  The subfields are, as in the case of the "binary"
content-type, attribute-value pairs.  In this case, the subfields
describe a mechanism for accessing the binary data.   The set of
possible attributes includes, but is not limited to:

    FILENAME -- The name of a file that contains the external data.

    SITE -- one or more domain names, comma separated, of machines
    that are known to have access to the data file.

    REAL-TYPE -- The real content-type of the data, once retrieved.

    EXPIRATION -- The date (in the format "month day, year") after
    which the existence of the external data is not guaranteed.

With the emerging possibility of very wide-area file systems [REF-AFS],
it becomes very hard to know in advance the set of machines where a file
will and will not be accessible directly from the file system. 
Therefore it makes sense to provide both a file name, to be tried
directly, and the name of one or more sites from which the file is known
to be accessible.  An implementation can try to retrieve remote files
using FTP or any other protocol, using anonymous file retrieval or
prompting the user for the necessary name and password.  However, the
external-reference mechanism is not intended to be limited to file
retrieval.  One can imagine, for example, using unique identifiers and a
video server for external references to video clips.  However, this memo
explicitly defines only the FILENAME and SITE attributes for retrieval
purposes, as this is the only retrieval method that is currently widely
applicable.  Other attributes may be defined as needed.

The "REAL-TYPE" attribute may be used to specify a new content-type
header field to be applied to the data once retrieved, as the data are
assumed to be only the body of a message, not including any header
information.  Note that semicolons may be quoted within subfields.  Thus
an external reference to an image in G3FAX format might have the
following content-type header field:

    Content-Type: external-reference; \
        filename=/usr/local/images/contact.g3; \
        site=thumper.bellcore.com; \
        real-type="image; g3fax"; \
        expiration="September 23, 1997"

If a message is of content-type "external-reference", then the actual
body of the message is ignored.
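
Because semicolons may be quoted within subfields, a parser cannot
simply split the field at every semicolon.  A minimal sketch in Python
follows; the names split_params and parse_external_reference are
hypothetical, the input is assumed to be an already unfolded field
value using the FILENAME attribute listed above, and backslash escapes
inside quotes are not handled.

```python
def split_params(field_value):
    """Split at semicolons that lie outside double quotes."""
    pieces, current, in_quotes = [], [], False
    for ch in field_value:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == ";" and not in_quotes:
            pieces.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    pieces.append("".join(current).strip())
    return pieces


def parse_external_reference(field_value):
    """Return (type, params) with attribute names lower-cased."""
    kind, *rest = split_params(field_value)
    params = {}
    for piece in rest:
        name, separator, value = piece.partition("=")
        if separator:
            params[name.strip().lower()] = value.strip().strip('"')
    return kind.lower(), params


value = ('external-reference; filename=/usr/local/images/contact.g3; '
         'site=thumper.bellcore.com; real-type="image; g3fax"; '
         'expiration = "September 23, 1997"')
kind, params = parse_external_reference(value)
print(params["real-type"])
print(params["site"].split(","))  # SITE may list several domain names
```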

POSTSCRIPT -- Indicates the body or body part consists of information
encoded using the PostScript page description language developed by Adobe
Systems, Inc. [REF-PS].  For type "postscript" the first parameter is a
version-number field ("1.0", "2.0", or "null"), and the second field is
a comma-separated list of resource references, including, but not
limited to, "laserprep2.9", "laserprep3.0", "laserprep3.1", and
"laserprep4.0".

TeX -- Indicates the body or body part contains embedded formatting
information according to the syntax of the TeX document production
language [REF-TEX].

TROFF -- Indicates the body or body part contains embedded formatting
information according to the syntax specified for the TROFF formatting
package developed by AT&T Bell Laboratories [REF-TROFF].  For type
"troff" the parameters include, but are not limited to, "eqn", "tbl",
"me", and the names of other troff macro packages.  Alternate character
set specifications are acceptable.

ODA -- Indicates that the body or body part is an ODA document,
containing a whole document encoded according to the Office Document
Architecture [REF-ODA].   The single parameter following "Content-type:
ODA" should be either "; ODIF" or "; ODL/SDIF" to indicate the ODA
encoding type.  Any additional information needed to process the
document must be included in the document profile which is included in
the document.

DVI -- Indicates the body or body part is information in the device
independent file format produced by TROFF or TeX.

X-BE2 -- Indicates the body or body part is Andrew-format information
[REF-ATK].    The first parameter is the Andrew datastream version
number, and the second

"X-"atom -- Any type value beginning with the characters "X-" and not
defined here or in another RFC is a private value, to be used by
consenting mail systems by mutual agreement.  Any format without a
rigorous and public definition should be named with an "X-" prefix.

Appendix [APP-TEXT] -- The TEXT Content-type and the MAILASCII Character Set

In keeping with historical practice and expectations, the default
content-type for internet mail is "text", and the default character set
is the one specified by RFC 822.  This content-type can be explicitly
specified as "text", and the character set as "mailascii".

Alternatively, a different character set may be specified, in which case
the body text is in the specified character set.  A recommended list of
predefined character sets can be found at the end of this appendix. 
Note that if the specified character set includes 8-bit data, the
Content-TransferEncoding header field is required in order to transmit
the message via SMTP.

The default character set has been the subject of some confusion and
ambiguity in the past.  Its definition is spelled out here to reduce
such ambiguity in the future.

The MAILASCII character set is based on a series of standards and on the
historical standard practice in the Internet mail community.  However,
the precise meaning of this character set has been the subject of some
debate.  In this appendix, therefore, we define the MAILASCII character
set.  It is our belief that this definition corresponds with the default
assumptions made for messages without Content-type headers, as defined
by RFC 822.

The message body is coded in the character set of American Standard Code
for Information Interchange, sometimes known as "7-bit ASCII". This is
not an arbitrary seven-bit character code, but indicates that the
message body uses character coding that uses the exact correspondence of
codes to characters specified in ASCII.  National use variations of
ISO646 [REF-ISO646] are not ASCII, and neither an explicit "ASCII"
character set, nor "MAILASCII", nor the default (omission of a character
set) should be used when characters are coded using them.   (Discussion:
RFC821 very explicitly specifies "ASCII", and references  an earlier
version of the American Standard cited in [REF-ANSI].  Whether that
specification, rather than a reference to an International Standard, was
done deliberately or out of convenience or ignorance, is no longer
interesting: insofar as one of the purposes of specifying a content-type
and character set is to permit the receiver to unambiguously determine
how the sender intended the coded message to be interpreted, assuming
anything other than "strict ASCII" as the default would risk
unintentional and incompatible changes to the semantics of messages now
being transmitted.    This also implies that messages containing
characters coded according  to national variations on ISO646, or using
code-switching procedures (e.g., those of ISO2022), as well as 8-bit or
multiple  octet character encodings MUST use an appropriate character
set specification to be consistent with this specification.)    

Because of the restriction imposed on message bodies by RFC 822 and, in
practice, by Message Transport Agents that are more-or-less compliant
with RFC 821, implementors should be careful in several ways regarding
MAILASCII text:  

    (1) Delimiters other than CR-LF pairs may be used in the local
    representation of a message on some systems.  The persistence of
    CR-LF pairs should not be relied on.

    (2) Isolated CR and LF characters are not well tolerated in
    general; they may be lost or converted to delimiters on some
    systems, and hence should not be relied on.

    (3) TAB characters may be misinterpreted or may be automatically
    converted to variable numbers of spaces.  This is unavoidable in
    some environments, notably those not based on the ASCII
    character set. Such conversion is STRONGLY DISCOURAGED, but it
    may occur, and users of MAILASCII format should not rely on the
    persistence of TAB characters.

    (4) Lines longer than 78 characters may be wrapped or truncated
    in some environments. Line wrapping and line truncation are
    STRONGLY DISCOURAGED, but unavoidable in some cases.
    Applications which depend on lines not being wrapped should use
    mechanisms other than unencoded MAILASCII bodyparts to transmit
    messages. 

    (5)  Trailing "white space" characters (SPACE, TAB, etc.) on a
    line may be discarded by some transport agents, and hence should
    not be relied on.

Please note that the above list is NOT a list of recommended practices
-- we do not recommend that MTAs alter the character of white space, or
wrap long lines.  These are known BAD practices on established networks,
and implementors must guard against the bad effects they can cause.
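
A composing agent could check outgoing text for hazards (2) through (5)
along the following lines.  This is a sketch only, assuming Python; the
function name mailascii_hazards and the hazard labels it returns are
hypothetical, and the text is assumed to use CR-LF line delimiters.
Hazard (1) concerns local storage and cannot be detected this way.

```python
def mailascii_hazards(text: str, max_len: int = 78):
    """Return a sorted list of hazard labels found in text."""
    hazards = set()
    if any(ord(ch) > 127 for ch in text):
        hazards.add("non-ascii")
    # Remove CR-LF pairs; any CR or LF left over is an isolated one.
    remainder = text.replace("\r\n", "")
    if "\r" in remainder or "\n" in remainder:
        hazards.add("isolated-cr-or-lf")
    for line in text.split("\r\n"):
        if "\t" in line:
            hazards.add("tab")
        if len(line) > max_len:
            hazards.add("long-line")
        if line != line.rstrip(" \t"):
            hazards.add("trailing-white-space")
    return sorted(hazards)


print(mailascii_hazards("hello\r\nworld\r\n"))
print(mailascii_hazards("a\tb \r\n"))
```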

See RFC 821, RFC 822, and RFC 1113 for additional information about
canonical SMTP formats.  Authors of software which composes "MAILASCII"
in compliance with this RFC should be well-acquainted with SMTP formats.

The complete MAILASCII character set is listed below: 

***** SHOULD WE KEEP IN THE CONTROL CHARS????

 0 nul  16 dle  32 sp   48  0   64  @   80  P    96  `   112  p 
 1 soh  17 dc1  33  !   49  1   65  A   81  Q    97  a   113  q 
 2 stx  18 dc2  34  "   50  2   66  B   82  R    98  b   114  r 
 3 etx  19 dc3  35  #   51  3   67  C   83  S    99  c   115  s 
 4 eot  20 dc4  36  $   52  4   68  D   84  T   100  d   116  t 
 5 enq  21 nak  37  %   53  5   69  E   85  U   101  e   117  u 
 6 ack  22 syn  38  &   54  6   70  F   86  V   102  f   118  v 
 7 bel  23 etb  39  '   55  7   71  G   87  W   103  g   119  w 
 8 bs   24 can  40  (   56  8   72  H   88  X   104  h   120  x 
 9 ht   25 em   41  )   57  9   73  I   89  Y   105  i   121  y 
10 nl   26 sub  42  *   58  :   74  J   90  Z   106  j   122  z 
11 vt   27 esc  43  +   59  ;   75  K   91  [   107  k   123  { 
12 np   28 fs   44  ,   60  <   76  L   92  \   108  l   124  |
13 cr   29 gs   45  -   61  =   77  M   93  ]   109  m   125  } 
14 so   30 rs   46  .   62  >   78  N   94  ^   110  n   126  ~ 
15 si   31 us   47  /   63  ?   79  O   95  _   111  o   127 del

Beyond MAILASCII, one can imagine an enormous proliferation of character
sets.  It is the opinion of the authors of this memo that a large number
of character sets is NOT a good thing.  We would prefer to specify a
single character set that can be used universally for representing all
of the world's languages in electronic mail.  Unfortunately, there is no
clear choice for such a universal representation, and existing practice
in several communities seems to point to the continuing use of multiple
character sets in the near future.  For this reason, we define names for
a small number of character sets for which a strong constituent base
exists.  We recommend the use of ISO-10646 wherever possible.

The defined character set names are:

MAILASCII -- as defined above.

ISO-10646 -- as defined in [REF-ISO-10646] 

ISO-8859-X -- where "X" is to be replaced, as necessary, for the
national use variants of ISO-8859 [REF-ISO-8859]

ISO-2022 -- as defined in [REF-ISO-2022]

In the opinion of the authors, this is already far more character sets
than are really desirable, and implementors are discouraged from
defining new ones unless absolutely necessary.

***** I AM SURE THAT I NEED SOME FLESHING OUT OF THE ABOVE DEFINITIONS &
REFERENCES

Appendix [APP-MULTIPART] -- The "Multipart" Content-Type

In the case of multiple part messages, a "multipart" Content-type field
should appear in the RFC 822 message header. The message body is then
assumed to contain multiple parts separated by encapsulation boundaries.
Each of the parts is defined, syntactically, as a complete RFC 822
message in miniature.  That is, what is found between the encapsulation
boundaries is a header area, a blank line, and a body area, in
accordance with the RFC 822 syntax for a message.  However, body parts
are NOT to be interpreted as actually being RFC 822 messages.  To begin
with, NO header fields are actually required in body parts.  A body part
that starts with a blank line, therefore, is a body part for which all
default values are to be assumed.  In such a case, of course, the
absence of a Content-type header field implies that the encapsulation is
MAILASCII text.  The only header fields that have defined meaning for
body-parts are those the names of which begin with "Content-".  All
other header fields are to be ignored in body-parts, and may be
discarded by gateways.  They are permitted to appear in body parts only
for ease of conversion between messages and body parts.

It must be understood that body parts are NOT messages.  For example, a
gateway between Internet and X.400 mail must be able to tell the
difference between a body part that consists of an image and a body part
that consists of an encapsulated message, the body of which is an image.
In order to represent the latter, the body part should have
"Content-type: message", and its body (after the blank line) should be
the encapsulated message, with its own "Content-type: image" header
field.  Body parts use the same syntax as messages because there are
many legitimate cases in which a body part might be converted into a
message, or vice versa.  The identical syntax makes such conversions
easy, but the distinction between the two must be understood by
implementors.  (For the special case in which all parts are actually
messages, a "digest" content-type is also defined.)

As stated previously, each pair of consecutive body parts is separated
by an encapsulation boundary.  The encapsulation boundary MUST NOT
appear inside any of the encapsulated parts.  Thus, it is crucial that
the composing agent be able to choose and specify the boundary that will
separate the parts.  

The Content-type field for multipart  messages requires two
supplementary fields.  The first is used to specify a version number and
should be either "1-S" or "1-P".  The two versions have identical
syntax, but the "-P" is intended as a hint, to receivers, that the parts
are intended to be viewed in parallel rather than sequentially.
Implementations that cannot show the parts in parallel, or that choose
not to do so, are free to treat all multipart messages of version "1-P"
as if they were version "1-S".  However, all implementations should
check the version number, to ensure graceful behavior in the event that
an incompatible future version of multipart messages is defined later.

The second supplementary field, which is always required for multipart
messages, is used to specify the format of the encapsulation boundary. 
The encapsulation boundary is defined as a line consisting entirely of
two hyphen characters ("-", decimal code 45) followed by the second
parameter of the Content-type header field with any leading or trailing
white space removed.  (DISCUSSION:  The specification that white space
be removed is intended to eliminate the possible introduction of
ambiguity caused by the addition or deletion of white space by message
transport agents.  The hyphens are for rough compatibility with the
earlier RFC 934 method of message encapsulation, and for ease of
searching for the boundaries in some implementations.  However, it
should be noted that multipart messages are NOT completely compatible
with RFC 934 encapsulations; in particular, they do not obey RFC 934
quoting conventions for embedded lines that begin with hyphens.)

Thus, a typical multipart content-type header field might look like this:

Content-type: multipart; 1-S; gc0p4Jq0M2Yt08jU534c0p

This indicates that the message consists of several parts, each itself
structured as an RFC 822 message, which are intended to be viewed
one-at-a-time, and that the parts are separated by the line

--gc0p4Jq0M2Yt08jU534c0p

The encapsulation boundaries must not appear within the encapsulations,
and should be no longer than 70 characters, not counting the two leading
hyphens.
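
(ILLUSTRATION:  A receiving agent might derive the delimiter line from
the header field roughly as in the Python sketch below.  The function
names are hypothetical, and the sketch assumes that the boundary itself
contains no semicolon.  It also treats the 70-character limit as a hard
error, which is stricter than the "should" above.)

```python
def parse_multipart_content_type(value):
    """Split a field body such as 'multipart; 1-S; xyz' into
    (version, boundary), stripping surrounding white space."""
    fields = [f.strip() for f in value.split(";")]
    if len(fields) != 3 or fields[0].lower() != "multipart":
        raise ValueError("not a multipart content-type: %r" % value)
    _, version, boundary = fields
    if not boundary:
        raise ValueError("missing boundary parameter")
    if len(boundary) > 70:
        # Stricter than the draft, which only says "should".
        raise ValueError("boundary longer than 70 characters")
    return version, boundary


def delimiter_line(boundary):
    """Two hyphens followed by the boundary with white space removed."""
    return "--" + boundary.strip()
```

Applied to the field body "multipart; 1-S; gc0p4Jq0M2Yt08jU534c0p",
the first function yields the version "1-S" and the boundary
"gc0p4Jq0M2Yt08jU534c0p", and the second yields the delimiter line
shown above.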

The encapsulation boundary following the last body-part should be a
distinguished delimiter that indicates that no further body-parts will
follow.  Such a delimiter is identical to the previous delimiters, with
the addition of two more hyphens at the end of the line:

--gc0p4Jq0M2Yt08jU534c0p--

It should be noted that there is room for additional information prior
to the first encapsulation boundary and following the final such
boundary.    In these "prefix" and "postfix" areas, arbitrary text may
be included.  
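
(ILLUSTRATION:  The division of a body into prefix, parts, and postfix
can be sketched in Python as follows.  The function name split_multipart
is hypothetical.  Ignoring trailing white space on delimiter lines is an
assumption of this sketch, made for tolerance, and is not stated in this
draft.)

```python
def split_multipart(body, boundary):
    """Split `body` into (prefix, parts, postfix) using `boundary`."""
    delimiter = "--" + boundary
    closer = delimiter + "--"
    prefix, parts, postfix = [], [], []
    current = prefix  # lines before the first delimiter are the prefix
    for line in body.splitlines():
        stripped = line.rstrip()  # assumption: tolerate trailing space
        if current is postfix:
            # Everything after the closing delimiter is postfix text.
            postfix.append(line)
        elif stripped == closer:
            current = postfix
        elif stripped == delimiter:
            parts.append([])
            current = parts[-1]
        else:
            current.append(line)
    return (
        "\n".join(prefix),
        ["\n".join(p) for p in parts],
        "\n".join(postfix),
    )
```

Each returned part still begins with its own (possibly empty) header
area, so a part that starts with a blank line has no header fields.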

It is legitimate for a multipart message to specify an alternate
character set.  In such cases, the specified character set applies to
the prefix area, the postfix area, and the textual portions of the
body-part headers.  Distinguished portions of the body-part headers,
such as the words "Content-type:", are to retain their interpretation
in US ASCII.

The use of "Content-Type: Multipart" as a message part within another
"Content-Type: Multipart" is explicitly allowed.  In such cases, for
obvious reasons, care must be taken to ensure that each nested multipart
message uses a different boundary delimiter.  See Appendix
[APP-COMPLEX] for an example of nested multipart messages.

The use of content-type "Multipart" with only a single included part may
be useful in certain contexts, and is explicitly permitted.

Overall, the body of a multipart message may be specified as follows:

body := prefix 1*encapsulation close-delimiter postfix

encapsulation := delimiter CRLF message

delimiter := "--" <boundary parameter from Content-type field>

close-delimiter := delimiter "--"

prefix := *text

postfix := *text

message := <as defined in RFC 822, with all header fields
            optional, containing no lines matching "delimiter">
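
(ILLUSTRATION:  A composing agent could assemble a body that follows
the grammar above roughly as in this Python sketch.  The function name
compose_multipart is hypothetical.  The sketch assumes that each message
is followed by a CRLF before the next delimiter, which the grammar above
leaves implicit.)

```python
CRLF = "\r\n"


def compose_multipart(parts, boundary, prefix="", postfix=""):
    """Build: prefix 1*encapsulation close-delimiter postfix."""
    if not parts:
        raise ValueError("at least one encapsulation is required")
    delimiter = "--" + boundary
    for part in parts:
        # A message must contain no lines matching the delimiter.
        if any(line.startswith(delimiter) for line in part.splitlines()):
            raise ValueError("boundary appears inside a part")
    out = []
    if prefix:
        out.append(prefix)
    for part in parts:
        out.append(delimiter)  # encapsulation := delimiter CRLF message
        out.append(part)
    out.append(delimiter + "--")  # close-delimiter := delimiter "--"
    if postfix:
        out.append(postfix)
    return CRLF.join(out) + CRLF
```

A part with no header fields should begin with CRLF, so that the blank
line separating headers from text is present.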

Appendix [APP-SIMPLE] -- Simple Non-ASCII Text Example

***** FILL IN HERE WITH AN EXAMPLE OF NON-ASCII TEXT.  Can someone
provide me with a cute example from a non-ASCII character set?

Appendix [APP-COMPLEX] -- A Complex Multipart Example

What follows is the outline of a complex multipart message.  This
message has three parts to be displayed serially:  an introductory plain
text part, an embedded multipart message, and a closing encapsulated
text message in a non-ASCII character set.  The embedded multipart
message has two parts to be displayed in parallel, a picture and an
audio fragment.

    From: ...
    Subject: ...
    Content-type: multipart; 1-s; tweedledum

    This is a multipart message.  
    Since I've not specified another character set, 
    this "prefix" area is in US ASCII.
    --tweedledum

    ...Some more text appears here...
    [Note that the preceding blank line means 
    no header fields were given and this is text,
    with charset US ASCII.]
    --tweedledum
    Content-type: multipart; 1-p; tweedledee

    This is a multipart message.  
    If you are reading this text, you might want to 
    consider changing to a user agent that understands 
    how to properly display multipart messages.
    --tweedledee
    Content-type: u-law; 8000 HZ; X-NEXT
    Content-TransferEncoding: base64

    ... base64-encoded NeXT-format audio data goes here....
    --tweedledee
    Content-type: image; G3FAX
    Content-TransferEncoding: Base64

    ... base64-encoded FAX data goes here....
    --tweedledee--
    --tweedledum
    Content-type: message/ISO-8859-1

    From: Keld Jørn Simonsen (name can be non-ASCII)
    Subject: whatever
    Content-type: Text/ISO-8859-1
    Content-TransferEncoding: Quoted-printable

    ... Closing text goes here ...
    --tweedledum--