Re: [spfbis] Most common mail header fields seen with nonsyntactic values

Peter Occil <> Wed, 25 July 2018 03:17 UTC


Adding additional mailing lists because three header fields listed below (Received-SPF, Authentication-Results, ARC-Authentication-Results) are within their scope.


From: Peter Occil
Sent: Monday, July 23, 2018 11:28 PM
Subject: Most common mail header fields seen with nonsyntactic values

The following is a list of email header fields for which, in practice, a significant proportion of the values I find do not follow the documented syntax.

This list is, for the moment, for your information only.  Whether the documents defining the header fields listed below should be updated (whether to accommodate how those fields are used in practice or otherwise), or what error handling a program should use if it encounters any of these header fields, are matters that require further discussion.  (In one case, the note in RFC 5322 provides guidance on error handling, but in many other cases, those documents don't seem to suggest or require any particular error-handling behavior.)


Some nonsyntactic ARC-Authentication-Results values contain a "header.b" parameter value containing a slash, which cannot occur in a "pvalue".


Many nonconforming Authentication-Results values are of an unusual form that I've already reported elsewhere, in the "dmarc" mailing list.  Unlike most of the other forms I report here, this one may be truly nonconforming.

Other nonsyntactic Authentication-Results values--

- don't mention the domain name of the authentication server (they generally have a comment like "(sender IP is ...)"),
- contain a "header.b" parameter value containing a slash, which cannot occur in a "pvalue",
- contain an "x-tls.subject" parameter right after the authserv name (which only one specific implementation apparently generates), and/or
- contain "d=<pvalue>" or "reason=<pvalue>" after the form "<method>=<result> (comment)", which doesn't conform to the documented syntax.
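The "header.b" case can be illustrated with a minimal sketch. It assumes the parameter value is written unquoted, so that it has to match the MIME "token" production from RFC 2045 (the "value" alternative of "pvalue"); the domain-name alternative is not covered. The function name is made up for illustration.

```python
import re

# MIME "token" (RFC 2045): one or more printable ASCII characters other than
# space, control characters, and the tspecials ( ) < > @ , ; : \ " / [ ] ? =
_TOKEN = re.compile(r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+")


def is_mime_token(text: str) -> bool:
    """Return True if text is a valid unquoted MIME token (hypothetical helper)."""
    return _TOKEN.fullmatch(text) is not None


print(is_mime_token("MfKa2eaR"))  # plain alphanumerics are a token
print(is_mime_token("S8W5/7FG"))  # "/" is a tspecial, so this is not a token
```

Since "header.b" usually carries the start of a base64 signature, a slash can turn up whenever the signature happens to contain one; "+" is acceptable in a token, but "/" and "=" are not.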


Of the Content-ID header fields I've seen in practice, a significant proportion (almost half) do not follow the syntax of "msg-id", even though they contain angle brackets.  Some examples use UUIDs inside angle brackets rather than "msg-id"s with an at-sign, while other examples, such as "<example.jpg>" and "<down_arrow>", were evidently generated to be unique only within the message rather than "world-unique" as required by RFC 2045 sec. 7.  (On the other hand, I see very few instances of Message-ID header fields not following the syntax of that header field.)  A smaller number of Content-ID fields do not use angle brackets at all, and some of them include the values "html-body" and "text-body".
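A rough sketch of a "msg-id" check follows. It is deliberately simplified: it accepts only the form "<dot-atom-text@dot-atom-text>" and ignores CFWS, the obsolete forms, and domain literals that RFC 5322 also allows. The function name is hypothetical.

```python
import re

# atext from RFC 5322; dot-atom-text is runs of atext separated by single dots.
_ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
_DOT_ATOM_TEXT = rf"{_ATEXT}+(?:\.{_ATEXT}+)*"
# Simplified msg-id: no CFWS, no obsolete syntax, no domain literals.
_MSG_ID = re.compile(rf"<{_DOT_ATOM_TEXT}@{_DOT_ATOM_TEXT}>")


def looks_like_msg_id(value: str) -> bool:
    """Return True if value matches the simplified msg-id form above."""
    return _MSG_ID.fullmatch(value.strip()) is not None


for sample in ("<part1.abc@example.com>", "<example.jpg>", "<down_arrow>", "html-body"):
    print(sample, looks_like_msg_id(sample))
```

Under this check, only the first sample is accepted; the other three lack the at-sign, the angle brackets, or both.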


All of the nonsyntactic List-Archive values I've seen so far involve GitHub URLs.  Here the URL appears without angle brackets.


Some List-Id values include neither dots nor domain names, and others consist of numbers or underscore-separated number sequences with no angle brackets.


Many nonsyntactic List-Unsubscribe values involve either URLs not appearing in angle brackets, or URLs encoded with RFC 2047 encoded words (compare with Content-Location, which does allow the latter).
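The two List-Unsubscribe cases can be told apart with a small heuristic, sketched below. It assumes the conforming form is a comma-separated list of angle-bracketed URLs and ignores comments, which the field also permits; the function name and the labels it returns are made up for illustration.

```python
import re

# One or more <URL> items separated by commas (comments are not handled).
_BRACKETED_LIST = re.compile(r"\s*<[^<>\s]+>\s*(?:,\s*<[^<>\s]+>\s*)*")
# RFC 2047 encoded word: =?charset?B-or-Q?encoded-text?=
_ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")


def classify_list_unsubscribe(value: str) -> str:
    """Return 'encoded-word', 'bracketed', or 'other' for a field value."""
    if _ENCODED_WORD.search(value):
        return "encoded-word"
    if _BRACKETED_LIST.fullmatch(value):
        return "bracketed"
    return "other"


print(classify_list_unsubscribe("<mailto:unsub@example.com>, <https://example.com/unsub>"))
print(classify_list_unsubscribe("https://example.com/unsub"))
```

The bare URL in the second call falls into "other", which is the bucket a program would then have to decide how to handle.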


Many nonsyntactic Received bodies--

- include fractional seconds in the date and time,
- include unquoted IPv6 addresses (which contain colons and don't conform to the "received-token" syntax), 
- have no semicolon before the date/time, and/or
- include a "for" clause containing "<multiple recipients>" (without the quotation marks).
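Three of the four Received quirks listed above can be spotted with simple text heuristics, as in the sketch below. It is not a parser for the Received grammar: a semicolon inside a comment would fool the first check, and the unquoted IPv6 case is left out because it needs real tokenization. The function name and quirk labels are hypothetical.

```python
import re
from typing import List

# A time of day followed by a decimal fraction, e.g. 20:17:12.345
_FRACTIONAL_SECONDS = re.compile(r"\b\d{1,2}:\d{2}:\d{2}\.\d+")
_MULTIPLE_RECIPIENTS = re.compile(r"\bfor\s+<multiple recipients>", re.IGNORECASE)


def received_quirks(value: str) -> List[str]:
    """Return labels for the nonsyntactic patterns found in a Received body."""
    quirks = []
    if ";" not in value:
        quirks.append("no-semicolon")
    if _FRACTIONAL_SECONDS.search(value):
        quirks.append("fractional-seconds")
    if _MULTIPLE_RECIPIENTS.search(value):
        quirks.append("multiple-recipients")
    return quirks


print(received_quirks(
    "from a.example by b.example with ESMTP id 123 Tue, 24 Jul 2018 20:17:12.345 -0700"
))
```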

In one case, I have noticed a Received header field with an ASCII control character (U+0001, I think) in the "by" clause; unfortunately such a field is not downgradable under RFC 6857, nor can it appear in a generated header field under RFC 5322.


Many nonsyntactic Received-SPF bodies include an unquoted IPv6 address in the "client-ip" parameter (which conforms to neither "dot-atom" nor "quoted-string" because of the colons), and some include an unquoted email address in the "envelope-from" parameter.
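To make the "dot-atom" or "quoted-string" requirement concrete, here is a minimal sketch of a check on a single Received-SPF parameter value. It assumes the value has already been split off from its key, and it uses a simplified quoted-string (no folding white space, no comments). The function name is hypothetical.

```python
import re

# atext from RFC 5322; note that ":" and "@" are not atext characters.
_ATEXT = r"[A-Za-z0-9!#$%&'*+\-/=?^_`{|}~]"
_DOT_ATOM = re.compile(rf"{_ATEXT}+(?:\.{_ATEXT}+)*")
# Simplified quoted-string: any characters except quote, backslash, CR, LF,
# plus backslash escapes.
_QUOTED_STRING = re.compile(r'"(?:[^"\\\r\n]|\\.)*"')


def is_valid_spf_param_value(value: str) -> bool:
    """Return True if value is a dot-atom or a (simplified) quoted-string."""
    return bool(_DOT_ATOM.fullmatch(value) or _QUOTED_STRING.fullmatch(value))


print(is_valid_spf_param_value("192.0.2.1"))      # IPv4 fits dot-atom
print(is_valid_spf_param_value("2001:db8::1"))    # colons are not atext
print(is_valid_spf_param_value('"2001:db8::1"'))  # quoting makes it acceptable
```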


Many Return-Path header-field values don't include angle brackets (and appear as "addr-spec", rather than "path" as required by RFC 5322).  A very small number also include a display name (and appear as "mailbox" under that RFC).
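The three Return-Path shapes described above can be distinguished roughly as follows. This is a string-level sketch only: it does not validate the address itself, it ignores comments, and the function name and labels are made up for illustration.

```python
def classify_return_path(value: str) -> str:
    """Roughly classify a Return-Path value by its outer shape."""
    text = value.strip()
    if text.startswith("<") and text.endswith(">"):
        return "path"  # includes the empty path "<>"
    if "<" in text and text.endswith(">"):
        return "mailbox-with-display-name"
    if "@" in text:
        return "addr-spec"
    return "other"


for sample in ("<user@example.com>", "user@example.com", "Some User <user@example.com>"):
    print(sample, "->", classify_return_path(sample))
```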


For other standard header fields, nonconforming values occur very rarely if at all (in my experience).