Re: [ietf-smtp] Proper definition of the term "email payload".

John C Klensin <john-ietf@jck.com> Mon, 01 April 2019 02:53 UTC

Date: Sun, 31 Mar 2019 22:53:27 -0400
From: John C Klensin <john-ietf@jck.com>
To: Viruthagiri Thirumavalavan <giri@dombox.org>, Mark Sapiro <mark@msapiro.net>
cc: "R. David Murray" <rdmurray@bitdance.com>, Barry Warsaw <barry@python.org>, ietf-smtp@ietf.org
Message-ID: <221B317451ECA8F7674E1AE6@PSB>
In-Reply-To: <CAOEezJQo=TQcBqVGypW4YD4rLT0JNnfa1zZ9eh1gjQz9Cz8htA@mail.gmail.com>
References: <20190401000943.57BB2143638@mail.wooz.org> <FCBB7422-295F-4E58-AC4E-42A6894C6406@python.org> <75720834-0bdc-deba-40fc-24f17d5a6752@msapiro.net> <CAOEezJQo=TQcBqVGypW4YD4rLT0JNnfa1zZ9eh1gjQz9Cz8htA@mail.gmail.com>
X-Mailer: Mulberry/4.0.8 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-smtp/H6t0MiQcrZ71-DZLZl7X7xz5Me4>
Subject: Re: [ietf-smtp] Proper definition of the term "email payload".

Hi.

In the hope that we don't need to iterate further, let me try to
express the problem in a different way, one that is entirely
consistent with Dave Crocker's comments and a few others.

The difficulty with a term like "payload" is that the definition
depends on where one is looking from.  Internet email is a
layered system and the perspective depends on the layer.  For
SMTP (from RFC 821 through 5321 fairly consistently), what it is
transferring ("the payload") would be the message content
starting after the DATA command (or equivalent) and continuing
to the end of data indication (normally CRLF.CRLF).  From a
header specification standpoint pre-MIME (e.g., from the
perspective of RFC 822), the payload would probably be the message
body after the blank line that indicates the end of the headers
although I suppose one could construct an argument that would
distinguish between trace information and everything else.  When
we get to MIME (and especially "content-type=multipart/"), one
might claim that multipart messages have multiple payloads, one
per body part after the message headers and MIME body part
headers are excluded.  
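
To make the layering concrete, here is a minimal sketch using
Python's standard email package.  The sample message, the
addresses, and the variable names are invented for illustration,
and mapping each layer onto a particular call is only one
interpretation of the discussion, not something any of the RFCs
define.

```python
from email import message_from_bytes, policy

# Hypothetical content, as it might follow the SMTP DATA command.
raw = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: layers\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"first part\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"\r\n"
    b"second part\r\n"
    b"--XYZ--\r\n"
)

# SMTP view: everything between DATA and the end-of-data indication.
smtp_payload = raw

# RFC 822 view: whatever follows the first blank line.
header_block, _, rfc822_body = raw.partition(b"\r\n\r\n")

# MIME view: one payload per non-multipart body part.
msg = message_from_bytes(raw, policy=policy.default)
mime_payloads = [
    part.get_content() for part in msg.walk() if not part.is_multipart()
]

print(len(smtp_payload), len(rfc822_body), len(mime_payloads))
```

The same bytes yield three different answers to "what is the
payload", which is the point: the answer depends on the layer
doing the asking.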

Dave noted that the term payload "does not appear at all in RFC
5321 or RFC 5322 or RFC 3501".  As the author of one of those
documents, I can say that the omission is no accident and is
closely connected to the discussion above.

best,
   john



--On Monday, April 1, 2019 06:34 +0530 Viruthagiri
Thirumavalavan <giri@dombox.org> wrote:

> Thanks Mark. You have written beautifully. And yes your answer
> makes sense.
> 
> On Mon, Apr 1, 2019 at 6:21 AM Mark Sapiro <mark@msapiro.net>
> wrote:
> 
>> To elaborate just a bit on what Barry says, as far as the
>> Python email library is concerned, the stuff that comes over
>> the wire following the SMTP DATA command (RFC 821 and
>> successors) is the email.message.Message object. If you want
>> to see the whole thing, you use the as_string() or as_bytes()
>> methods on that object.
>> 
>> That object consists of headers and body as described in RFC
>> 822 and successors. The Python email library refers to that
>> body as the payload of that message object.
>> 
>> I think this is all consistent and reasonable in terms of
>> what the email library is trying to do.
>> 
>> In the RFC 821 context, the metadata is the envelope which
>> has a sender and recipients and the entire message is the
>> data, but in the RFC 822 context, the data is split into
>> headers and body and we choose to call the body the payload.
>> 
>> This is a semantic issue. In your "box of beer" example, the
>> service that delivers it considers the payload to be the box
>> and contents, but the consumer considers the payload to be
>> only the contents (and maybe just the beer and not the cans).
>> Take your pick.
>> 
>> I.e., there is no one definitive answer to your question. You
>> have reasons for considering the RFC 821 DATA to be the
>> payload, and you are not wrong, and we have reasons for
>> considering the RFC 822 body to be the payload, and we are
>> not wrong either.
>> 
>> Forwarded message.
>> >  From: Barry Warsaw <barry@python.org>
>> >  Subject: Re: Proper definition of the term "email payload".
>> >  Date: March 31, 2019 at 17:09:30 PDT
>> >  To: Viruthagiri Thirumavalavan <giri@dombox.org>
>> >  Cc: ietf-smtp@ietf.org, "R. David Murray"
>> >  <rdmurray@bitdance.com>, Mark Sapiro <msapiro@value.net>
>> > 
>> > 
>> >  Hi, I hope you (and they!) don't mind me CCing two other
>> >  people who have worked extensively on Python's email
>> >  library, and in fact much more than I have in recent
>> >  years.  RDM has done the bulk of the work on the
>> >  new-in-Python-3 APIs, and Mark is a long time core
>> >  developer on GNU Mailman (the project that spawned
>> >  Python's email library).
>> > 
>> >  There are two ways I think about this, and I'll use the
>> >  original RFC numbers to clarify.  There's RFC 821, which
>> >  describes the on-the-wire protocol for SMTP transfers,
>> >  embodied in Python's smtplib library. Then there's RFC
>> >  822, which describes the format of the content of that
>> >  SMTP transfer, but not the protocol itself.  Of course
>> >  there are lots of developments along the way, but that's
>> >  unimportant for the way I think about these things.
>> > 
>> >  What I think you are describing, where the headers are
>> >  part of the payload, is more akin to RFC 821.  That's
>> >  the payload as far as the actual bytes-on-the-wire are
>> >  concerned.  Python's email library is for RFC 822 (and
>> >  the many, many elaborations thereof), so in that case, the
>> >  payload is the body of the message.  In more practical
>> >  terms, the implementation makes this clear, and the APIs
>> >  you use to change headers are different in form and
>> >  function than the ones you use to change the body of the
>> >  message.
>> > 
>> >  I think the Python documentation is fairly clear about this
>> >  distinction.  At least, I don't remember seeing any
>> >  feedback to the contrary, although RDM may have a better
>> >  sense of that.  Of course, we are always open to
>> >  improvements in Python's documentation.
>> > 
>> >  Cheers,
>> >  -Barry
>> > 
>> >> On Mar 31, 2019, at 10:57, Viruthagiri Thirumavalavan
>> >> <giri@dombox.org> wrote:
>> >> 
>> >> Hello IETF,
>> >> 
>> >> I need some clarification about the term "email payload".
>> >> 
>> >> Wikipedia says
>> >> 
>> >> In computing and telecommunications, the payload is the
>> >> part of transmitted data that is the actual intended
>> >> message. Headers and metadata are sent only to enable
>> >> payload delivery
>> >> 
>> >> Python email library documentation says this.
>> >> 
>> >> An email message consists of headers and a payload (which
>> >> is also referred to as the content). Headers are RFC 5322
>> >> or RFC 6532 style field names and values, where the field
>> >> name and value are separated by a colon. The colon is not
>> >> part of either the field name or the field value. The
>> >> payload may be a simple text message, or a binary object,
>> >> or a structured sequence of sub-messages each with their
>> >> own set of headers and their own payload. The latter type
>> >> of payload is indicated by the message having a MIME type
>> >> such as multipart/* or message/rfc822.
>> >> 
>> >> It looks like the Python email library author, Barry Warsaw,
>> >> followed a definition similar to the one in Wikipedia when
>> >> defining his library functions. But I feel that calling
>> >> ONLY the email "Body Part" the "payload" is wrong. The term
>> >> "payload" should refer to the entire "Message Part" in
>> >> email, i.e. both headers and body.
>> >> 
>> >> When you place an order for a "box of beer", you are not
>> >> paying only for the "beer cans", but also paying for the
>> >> "container box". So the payload here is the entire box.
>> >> 
>> >> HTTP Example:
>> >> 
>> >> HTTP/1.1 200 OK
>> >> Date: Sun, 10 Oct 2010 23:26:07 GMT
>> >> Content-Type: text/html
>> >> Content-Length: 1234
>> >> 
>> >> <html>
>> >> 
>> >> <head>
>> >> <title>Hello World!</title>
>> >> </head>
>> >> 
>> >> <body>
>> >> (more contents)
>> >>  .
>> >>  .
>> >>  .
>> >> </body>
>> >> </html>
>> >> 
>> >> 
>> >> If you take a closer look at this HTTP example, the
>> >> headers are just instructions for the client. The end
>> >> user doesn't need to worry about any piece of information
>> >> found in those headers. So the Wikipedia definition is
>> >> perfectly suited to applications like HTTP.
>> >> 
>> >> But in email, when a mail gets transferred from Hop A to
>> >> Hop C via Hop B, the user at Hop A actually wants to
>> >> deliver the whole "message part" to Hop C. If Hop B
>> >> strips the headers and transfers only the "Body" part, then
>> >> it becomes an "Anonymous" message. So the end user
>> >> requires the information found in the "Headers" too, e.g.
>> >> From, Subject, Date etc. [In HTTP, the title tag is
>> >> equivalent to Subject and it's found in the "head" markup,
>> >> not in the HTTP headers.]
>> >> 
>> >> As you can see, the user is interested in the "entire
>> >> message". So the term "actual intended message" should
>> >> refer to the "whole message" extracted from the DATA
>> >> command. The "actual intended message" should be pictured
>> >> like this in email.
>> >> 
>> >> Also note that, when you migrate your mails to another
>> >> mail service, you need the whole message with Headers, not
>> >> just the body.
>> >> 
>> >> Based on my points, I believe calling only the "Body" part
>> >> the "Payload" is wrong. I would love to hear your thoughts
>> >> on this. If Barry Warsaw is here, I would love to know your
>> >> opinion too.
>> >> 
>> >> PS: I actually asked this question two years ago on a
>> >> Stack Exchange website. I wasn't satisfied with the answer
>> >> I got there. I don't want to use the term incorrectly in
>> >> my application. That's why I'm posting it here.
>> >> 
>> >> Thanks
>> >> --
>> >> Best Regards,
>> >> 
>> >> Viruthagiri Thirumavalavan
>> >> Dombox, Inc.
>> 
>> 
>> --
>> Mark Sapiro <mark@msapiro.net>        The highway is for gamblers,
>> San Francisco Bay Area, California    better use your sense - B. Dylan
>>
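
The split described in the quoted messages (RFC 821 envelope
versus RFC 822 headers and body, with separate APIs for headers
and body) can be sketched as follows.  This is only an
illustration: the addresses and variable names are invented, and
nothing is actually sent.

```python
from email.message import EmailMessage

# RFC 822 layer: headers and body are set through different APIs.
msg = EmailMessage()
msg["From"] = "alice@example.com"   # headers: mapping-style access
msg["To"] = "bob@example.com"
msg["Subject"] = "box of beer"
msg.set_content("Twelve cans, as ordered.\n")   # body: content API

# What the library calls the payload is the body only.
library_payload = msg.get_content()

# RFC 821 layer: the envelope lives outside the message, and the
# whole serialized message (headers plus body) is what would
# follow the DATA command.
envelope_from = "bounces@example.com"   # need not match the From header
envelope_to = ["bob@example.com"]
data_content = msg.as_bytes()

# Sending is left out on purpose.  With a reachable server (the
# host name here is hypothetical) it would be roughly:
#   smtplib.SMTP("mail.example.com").sendmail(
#       envelope_from, envelope_to, data_content)

print(library_payload)
print(data_content.decode("ascii"))
```

Note that the envelope sender never appears in the serialized
message, while the Subject header does: both readings of
"payload" are visible in the same few lines.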