[ietf-822] Most common mail header fields seen with nonsyntactic values

Peter Occil <poccil14@gmail.com> Tue, 24 July 2018 03:28 UTC

The following is a list of email header fields for which, in practice, I find that a significant proportion of values use a form different from the field's documented syntax.

This list is, for the moment, for your information only.  Whether the documents defining the header fields listed below should be updated (whether to accommodate how those fields are used in practice or otherwise), or what error handling a program should use if it encounters any of these header fields, are matters that require further discussion.  (In one case, the note in RFC 5322 provides guidance on error handling, but in many other cases, those documents don't seem to suggest or require any particular error-handling behavior.)

ARC-Authentication-Results.

Some nonsyntactic values of this header field contain a "header.b" parameter value containing a slash, which cannot occur in a "pvalue".

Authentication-Results.

Many nonconforming Authentication-Results values are of an unusual form that I've already reported elsewhere, in the "dmarc" mailing list.  Unlike most of the other forms I report here, this one may be truly nonconforming.

Other nonsyntactic Authentication-Results values--

- don't mention the domain name of the authentication server (they generally have a comment like "(sender IP is ...)"),
- contain a "header.b" parameter value containing a slash, which cannot occur in a "pvalue",
- contain an "x-tls.subject" parameter right after the authserv name (which only one specific implementation apparently generates), and/or
- contain "d=<pvalue>" or "reason=<pvalue>" after the form "<method>=<result> (comment)", which doesn't conform to the documented syntax.
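
To illustrate the slash issue: an unquoted "pvalue" is either an RFC 2045 "token" or a domain name (optionally preceded by a local part and an at-sign), and "token" excludes "/" along with the other "tspecials".  A minimal Python sketch of the token check only follows; the function name is hypothetical, and the quoted-string and domain-name alternatives are deliberately not covered.

```python
# RFC 2045 "tspecials": characters that may not appear in an unquoted token.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_rfc2045_token(text: str) -> bool:
    """Return True if text is a non-empty RFC 2045 token (hypothetical helper)."""
    if not text:
        return False
    for ch in text:
        # Tokens exclude space, control characters, non-ASCII, and tspecials.
        if ord(ch) <= 32 or ord(ch) >= 127 or ch in TSPECIALS:
            return False
    return True
```

A base64 signature fragment such as "ab/cd" fails this check because of the slash, whereas "abcd" passes.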


Content-ID.

Of the Content-ID header fields I've seen in practice, a significant proportion (almost half) do not follow the syntax of "msg-id", even though they contain angle brackets.  Some examples use UUIDs inside angle brackets rather than "msg-id"s with an at-sign, while other examples, such as "<example.jpg>" and "<down_arrow>", were evidently generated to be message-unique rather than "world-unique" as required by RFC 2045 sec. 7.  (On the other hand, I see very few Message-ID header fields that do not follow the syntax of that header field.)  A smaller number of fields do not use angle brackets at all; some of these have the values "html-body" and "text-body".
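
The three groups described above could be told apart with something like the following Python sketch.  It is illustrative only: the function name and category labels are hypothetical, and the pattern is a simplified "msg-id" (dot-atom-text on both sides of the at-sign) that ignores domain literals, obsolete forms, and surrounding comments.

```python
import re

# Simplified msg-id: "<" dot-atom-text "@" dot-atom-text ">".
# Assumption: domain literals and obsolete syntax are not handled.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"
MSG_ID = re.compile(rf"<{DOT_ATOM}@{DOT_ATOM}>")


def classify_content_id(value: str) -> str:
    """Sort a Content-ID value into one of three rough groups."""
    value = value.strip()
    if MSG_ID.fullmatch(value):
        return "msg-id"
    if value.startswith("<") and value.endswith(">"):
        return "bracketed, not msg-id"
    return "no angle brackets"
```

With this sketch, "<example.jpg>" and "<down_arrow>" fall into the second group and "html-body" into the third.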

List-Archive.

All of the nonsyntactic List-Archive values I've seen so far involve GitHub URLs.  Here the URL appears without angle brackets.

List-ID.

Some List-ID values either contain no dots or domain names, or consist of numbers or underscore-separated number sequences with no angle brackets.
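
A rough Python sketch of a check for these two cases follows.  It assumes the RFC 2919 shape of an optional phrase followed by "<list-label.namespace>"; the function name and the returned labels are hypothetical, and comments and folding are not handled.

```python
import re
from typing import Optional


def list_id_issue(value: str) -> Optional[str]:
    """Return a short description of the problem, or None if none is found."""
    # Look for a trailing <...> part; any leading phrase is ignored.
    match = re.search(r"<([^<>]*)>\s*$", value)
    if match is None:
        return "no angle brackets"
    if "." not in match.group(1):
        return "no dot in list-id"
    return None
```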

List-Unsubscribe.

Many nonsyntactic List-Unsubscribe values involve either URLs not appearing in angle brackets, or URLs encoded with RFC 2047 encoded words (compare with Content-Location, which does allow the latter).
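
For the List-* fields (List-Archive, List-Unsubscribe, and so on), a minimal Python sketch of spotting URLs that appear outside angle brackets might look like this.  The function name is hypothetical, the URL pattern is a loose approximation, and RFC 2047 encoded words are not detected.

```python
import re


def bare_list_urls(value: str) -> list[str]:
    """Return URL-like items that are not enclosed in angle brackets."""
    # Drop every <...> item, then look for URL-like text in what is left.
    remainder = re.sub(r"<[^<>]*>", "", value)
    # Loose approximation: a scheme, a colon, then non-separator characters.
    return re.findall(r"[A-Za-z][A-Za-z0-9+.-]*:[^\s,<>]+", remainder)
```

A value consisting only of bracketed URLs yields an empty list, while a value such as "https://example.com/unsub" (a made-up URL) is returned as a bare item.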

Received.

Many nonsyntactic Received bodies--

- include fractional seconds in the date and time,
- include unquoted IPv6 addresses (which contain colons and don't conform to the "received-token" syntax), 
- have no semicolon before the date/time, and/or
- include a "for" clause containing "<multiple recipients>" (without the quotation marks).

In one case, I have noticed a Received header field with an ASCII control character (U+0001, I think) in the "by" clause; unfortunately such a field is not downgradable under RFC 6857, nor can it appear in a generated header field under RFC 5322.
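
Some of the Received problems above can be detected with simple text checks.  The Python sketch below is only a rough illustration: the function name and issue labels are hypothetical, the semicolon test does not account for semicolons inside comments, and the unquoted-IPv6 case is left out because it needs a real tokenizer.

```python
import re

# A time of day followed by a fractional part, e.g. 20:28:51.123
FRACTIONAL_TIME = re.compile(r"\b\d{2}:\d{2}:\d{2}\.\d+")


def received_issues(body: str) -> list[str]:
    """List the problems this sketch can spot in a Received field body."""
    issues = []
    # Crude: any semicolon counts, even one inside a comment.
    if ";" not in body:
        issues.append("no semicolon before date/time")
    if FRACTIONAL_TIME.search(body):
        issues.append("fractional seconds")
    if "<multiple recipients>" in body:
        issues.append("for <multiple recipients>")
    # Control characters other than tab, CR, and LF.
    if any((ord(c) < 32 and c not in "\t\r\n") or ord(c) == 127 for c in body):
        issues.append("control character")
    return issues
```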

Received-SPF.

Many nonsyntactic Received-SPF bodies include an unquoted IPv6 address in the "client-ip" parameter (which conforms to neither "dot-atom" nor "quoted-string" because of the colons), and some include an unquoted email address in the "envelope-from" parameter.
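
A minimal Python sketch of detecting the unquoted-IPv6 case follows.  The function name is hypothetical, and the pattern assumes the parameter value ends at a semicolon or whitespace, which is a simplification of the real key-value syntax.

```python
import ipaddress
import re

# Capture the raw value following "client-ip=" up to ";" or whitespace.
CLIENT_IP = re.compile(r"client-ip\s*=\s*([^;\s]+)")


def unquoted_ipv6_client_ip(body: str) -> bool:
    """Return True if client-ip holds an IPv6 address that is not quoted."""
    match = CLIENT_IP.search(body)
    if match is None:
        return False
    raw = match.group(1)
    if raw.startswith('"'):
        return False  # quoted-string form, which is syntactically allowed
    try:
        return ipaddress.ip_address(raw).version == 6
    except ValueError:
        return False  # not an IP address at all
```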

Return-Path.

Many Return-Path header-field values don't include angle brackets (and so appear as "addr-spec" rather than "path", the form required by RFC 5322).  A very small number also include a display name (and appear as "mailbox" under that RFC).
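
The three shapes mentioned here can be roughly distinguished as in the Python sketch below.  It is a heuristic only: the function name and labels are hypothetical, and comments or folding white space around the value are not handled.

```python
def classify_return_path(value: str) -> str:
    """Guess which RFC 5322 production a Return-Path value resembles."""
    value = value.strip()
    if value.startswith("<") and value.endswith(">"):
        return "path"  # includes the empty path "<>"
    if value.endswith(">") and "<" in value:
        return "mailbox with display name"
    return "addr-spec or other"
```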

-------

For other standard header fields, nonconforming values occur very rarely if at all (in my experience).

--Peter