Re: [Ietf-message-headers] Re: Jabber-ID header field

Bruce Lilly <> Mon, 02 October 2006 03:09 UTC

From: Bruce Lilly <>
Organization: Bruce Lilly
Subject: Re: [Ietf-message-headers] Re: Jabber-ID header field
Date: Sun, 1 Oct 2006 23:08:08 -0400
User-Agent: KMail/1.9.4
References: <> <> <>
In-Reply-To: <>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <>
List-Id: "Discussion list for header fields used in Internet messaging applications." <>

On Thu September 28 2006 18:54, Frank Ellermann wrote:
> Bruce Lilly wrote:

> > some proposed syntax conflicts with RFC 2822 (e.g. ';' is
> > an RFC 2822 "special").
> Grmbl, yes.  Do you have a tool for this ?

None needed; the list of RFC 822/2822 specials is small and easily
remembered -- when I see [RFC 4622]

     pathxmpp  = [ nodeid "@" ] host [ "/" resid ]
     nodeid    = *( unreserved / pct-encoded / nodeallow )
     nodeallow = "!" / "$" / "(" / ")" / "*" / "+" / "," / ";" /
                 "=" / "[" / "\" / "]" / "^" / "`" / "{" / "|" /
                 "}"
     resid     = *( unreserved / pct-encoded / resallow )
     resallow  = "!" / DQUOTE / "$" / "&" / "'" / "(" / ")" /
                 "*" / "+" / "," / ":" / ";" / "<" / "=" / ">" /
                 "[" / "\" / "]" / "^" / "`" / "{" / "|" / "}"

I can immediately see (in "resallow") conflicts with DQUOTE, "(", ")",
":", ";", "<", ">", "[", "\", "]", and ",", and of course "@" in
pathxmpp itself.  The "nodeallow" construct repeats several of these,
including "\".  Indeed "." seems to be the only 822/2822 special not
directly in conflict with 4622 [and that is pulled in via 3986
"unreserved"].  Some of these ("@", ".") are unlikely to cause trouble;
others are likely to cause varying degrees of trouble depending on
parser construction.  N.B. [RFC 2822]:

   Each of the characters in specials can be used to indicate
   a tokenization point in lexical analysis.

Unbalanced use of specials which (in 2822 fields) always appear in properly
nested and balanced pairs, but which 4622 does not require to be balanced or
properly nested, is highly likely to cause trouble for all but the most
robust of parsers.  That includes DQUOTE, parentheses, colon/semicolon, square
brackets, and angle brackets (N.B. all specials but at-sign, dot, comma, and
backslash).
Backslash is also highly likely to cause trouble as it is used for quoting
and has some rather baroque rules about where it may legally be used for
quoting (presumably just a plain character elsewhere) -- and many
implementations run afoul of those rules.
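
The overlap described above can be checked mechanically.  What follows is
only a rough sketch in Python: the character sets are transcribed by hand
from the RFC 2822 "specials" production and from the 4622 ABNF quoted
earlier, and the helper name conflicts() is made up for illustration.

```python
# Character sets transcribed by hand (an assumption of this sketch).
RFC2822_SPECIALS = set('()<>[]:;@\\,."')

# RFC 4622 "resallow" and "nodeallow" as quoted above.  The closing "}"
# in nodeallow is taken from RFC 4622; it does not affect the result.
RESALLOW = set('!"$&\'()*+,:;<=>[\\]^`{|}')
NODEALLOW = set('!$()*+,;=[\\]^`{|}')


def conflicts(allowed):
    """Return the RFC 2822 specials that the given set permits unencoded."""
    return sorted(RFC2822_SPECIALS & allowed)


if __name__ == "__main__":
    print("resallow conflicts: ", " ".join(conflicts(RESALLOW)))
    print("nodeallow conflicts:", " ".join(conflicts(NODEALLOW)))
    print("specials not in resallow:", sorted(RFC2822_SPECIALS - RESALLOW))
```

The last line shows which specials "resallow" leaves alone; "@" is then
supplied by pathxmpp itself, which leaves only ".".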

Now percent-encoding can be used to avoid troublesome direct use of certain
characters, e.g. as implied by RFC 2369.  However, there remain several sticky
issues:
1. obviously, the particular troublesome characters should be explicitly
   listed in the ABNF and normative text
2. from a practical point of view, to make any use of the content, encoding
   and decoding functions need to be available.  These will have to be
   defined or reference made to an existing definition.  But RFC 2369
   references RFC 1738 whereas RFC 4622 references RFC 3986, and the lists of
   "reserved" / "unreserved" characters are markedly different between 1738
   and 3986.  The utility of the RFC 2369 (and 2919) fields in a mail
   environment, together with the near-complete separation of mail and
   chat applications (except possibly in one old implementation of one client
   line), means that the 1738 characters are going to continue to be used [for
   backward compatibility], no matter what RFC 3986 says.
3. The combination of the necessity of encoding/decoding and the separation
   of client functionality means that naive copy-and-paste (between MUA and
   chat applications) is unlikely to work well.  That is one reason why a
   transparent (i.e. not requiring specialized encoding) mechanism such as a
   media type is infinitely preferable to a highly specialized encoding of a
   (probably undisplayed) message header field; a media type whose content is
   unencoded CAN be copied-and-pasted even with naive client implementations,
   and can be easily saved to a file for export/import in cases where
   copy-and-paste is not provided among applications.
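
To make point 2 concrete, here is a small Python sketch comparing the
"unreserved" sets of RFC 1738 and RFC 3986 and percent-encoding a value
under the 3986 rules.  The sets are transcribed by hand from the two RFCs,
encode_3986() is a hypothetical helper name, and it relies on the standard
urllib.parse.quote() leaving exactly the 3986 unreserved characters alone
when safe="" (true of Python 3.7 and later).

```python
import string
from urllib.parse import quote, unquote

ALNUM = set(string.ascii_letters + string.digits)

# RFC 1738: unreserved = alpha | digit | safe | extra
UNRESERVED_1738 = ALNUM | set("$-_.+") | set("!*'(),")
# RFC 3986: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED_3986 = ALNUM | set("-._~")

only_1738 = sorted(UNRESERVED_1738 - UNRESERVED_3986)
only_3986 = sorted(UNRESERVED_3986 - UNRESERVED_1738)


def encode_3986(text):
    """Percent-encode everything outside the RFC 3986 unreserved set."""
    return quote(text, safe="")


if __name__ == "__main__":
    print("unreserved only in 1738:", " ".join(only_1738))
    print("unreserved only in 3986:", " ".join(only_3986))
    sample = 'res(1);"x"'
    encoded = encode_3986(sample)
    print(encoded, "->", unquote(encoded))
```

A value encoded by a 1738-era generator may therefore carry characters such
as "(" or "," unencoded which a 3986-based generator would have escaped.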

> > I believe that the peculiarity of use of an obs- construct
> > in a proposed new field has already been noted.
> That's just the known issue of "no folding in a FWS directly
> before a CRLF" combined with the 2822 accept obs-FWS MUSTard
> in a pending erratum.

With one exception, the obs- constructs in 2822 exist to accommodate legal
822 and earlier syntax generation of legacy fields.  There is no reason for a
newly defined field to incorporate any obs- construct; 2822 rules prohibit
new generation of obs- constructs in any case, and precisely because the
field is new, there are no 822 or pre-822 generators of the field to worry
about.

> > exactly which mailbox or set of mailboxes corresponds to the
> > draft use of the word "sender"?
> >         Resent-Sender:
> >         Resent-From:,
> >         From:,
> >         Sender: (see RFC 3192) "FAX=+33-1-88335215"
> >         Reply-To: foo list:,;
> It's no Resent-Jabber-ID, the Resent-* are out.

So where is the prohibition against a resender adding such a field?  For
that matter, where is the specification of who may insert/modify/interpret
such a field (see RFC 4249 sect. 3.3.1)?

>   The
> I-D says that the Sender sets the Jabber-ID.

The draft does not capitalize "sender", nor does it explicitly state whether
"sender" means the mailbox in the Sender field if present; neither does it
state what to use if the optional Sender field (in the case of a sole author)
is absent.  Note also [RFC 822]

        This field contains the authenticated identity  of  the  AGENT
        (person,  system  or  process)  that sends the message.  It is
        intended for use when the sender is not the author of the mes-
        sage,  or  to  indicate  who among a group of authors actually
        sent the message.  If the contents of the "Sender" field would
        be  completely  redundant  with  the  "From"  field,  then the
        "Sender" field need not be present and its use is  discouraged
        (though  still legal).  In particular, the "Sender" field MUST
        be present if it is NOT the same as the "From" Field.

        The Sender mailbox  specification  includes  a  word  sequence
        which  must correspond to a specific agent (i.e., a human user
        or a computer program) rather than a standard  address.   This
        indicates  the  expectation  that  the field will identify the
        single AGENT (person,  system,  or  process)  responsible  for
        sending  the mail and not simply include the name of a mailbox
        from which the mail was sent.  For example in the  case  of  a
        shared login name, the name, by itself, would not be adequate.
        The local-part address unit, which refers to  this  agent,  is
        expected to be a computer system term, and not (for example) a
        generalized person reference which can  be  used  outside  the
        network text message context.

N.B. the Sender field does not necessarily reference a person's mailbox
and in general is not usable for addressing (RFC 3834 section 4 
reinforces these facts and emphasizes that the field need not be valid
for replies).

> > There appears to be no provision for line folding within an
> > ID (RFC 4622 "pathxmpp" production).
> Yes, same issue as for <msg-id> and <addr-spec> in RFC 2822.

In mail (the applicable application for addr-spec in messages), the length of
an addr-spec local-part is bounded by the SMTP protocol specifications, and
(contrary to your implication) line folding in addr-spec is permitted by RFC
2822 on either side of the "@" by [CFWS] trailing local-part (via dot-atom
and quoted-string) on the left, and by leading [CFWS] on the right (dot-atom
and domain-literal).  The length of the text form of a domain name is bounded
(255 octets) by the DNS specifications; domain literals are likewise of
bounded length.  As for msg-id, the domain name or literal bounds still
apply; the left-hand-side is limited in length by the overall line length
limit of 998 octets due to the prohibition of folding within msg-id.

However, URIs and URI-derived constructs typically do not specify upper
bounds -- indeed both "nodeid" and "resid" relevant to the draft under
discussion are specified with neither lower nor upper bounds in the
referenced RFC 4622.
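
As a rough illustration of the length problem, the Python sketch below tests
whether an unfoldable value still fits within the RFC 2822 limit of 998
octets per line (excluding CRLF).  The function name and the example
identifiers are hypothetical; "Jabber-ID" is simply the field name under
discussion.

```python
MAX_LINE_OCTETS = 998  # RFC 2822 line length limit, excluding CRLF


def fits_on_one_line(field_name, identifier):
    """True if 'Name: value' fits on one line without any folding."""
    line = f"{field_name}: {identifier}"
    return len(line.encode("utf-8")) <= MAX_LINE_OCTETS


if __name__ == "__main__":
    short_id = "juliet@example.com"          # made-up example value
    long_id = "n" * 1000 + "@example.com"    # unbounded nodeid, per 4622
    print(fits_on_one_line("Jabber-ID", short_id))
    print(fits_on_one_line("Jabber-ID", long_id))
```

Note that the field name, colon and space consume part of the budget, so the
usable length for the value is less than 998 octets.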

> What do you propose, a statement "make sure that a Jabber-ID
> isn't longer than 997 octets" ?

No, unless it is proposed that the draft contain an amendment of 4622
imposing upper bounds on "nodeid" and "resid" such that "pathxmpp" can
never be longer than 997 octets (and that would have to include any
encoding, etc.).  Otherwise, there's not much point in defining a field
that simply will not work for some subset of otherwise valid (outside of
the Internet Message Format) IDs.  Aside from the recommended method of
simply avoiding the whole sticky mess by using a media type, it might (or
might not) be possible to adapt existing mechanisms for handling long URIs
(RFCs 2017, 2231 (probably not, as its scope is MIME field parameters), and

> > What happens when an ID exceeds 998 octets in length
> The nice MSA will reject the message as specified in RFC 4409.

MSAs are optional components.  Whether or not an MSA is present, the message
is likely to bounce back to the designated handler for error responses (as
specified via SMTP MAIL FROM); in the subset of cases where that is the
message initiator, the only fallback mechanism for conveying such an ID
with the message would be a suitably-tagged MIME-part which is transfer
encoded using MIME mechanisms (including the "binary" identity transformation
if that is supported by SMTP body transport).  So if a suitable media type
needs to be defined anyway (for fallback), then one might as well use that
as the primary mechanism (with all of the advantages noted in my earlier
messages), avoiding the pitfalls associated with the proposed field. [In
cases where the message initiator has delegated handling of errors, the
initiator may be unaware of a failure; note that such failure would prevent
delivery of not only the errant proposed field, but the entire message!]
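
For illustration only, the following Python sketch carries an arbitrarily
long ID in a separate MIME part using the standard email package.  No media
type has been registered for this purpose; the subtype "x-example-jid" and
the function name are placeholders invented for the sketch.

```python
from email.message import EmailMessage

# Placeholder subtype; no such media type is registered.
HYPOTHETICAL_SUBTYPE = "x-example-jid"


def message_with_id_part(body_text, identifier):
    """Build a message whose ID travels in its own MIME part."""
    msg = EmailMessage()
    msg["Subject"] = "example"
    msg.set_content(body_text)
    # Bytes content is base64 transfer-encoded by default, so the encoded
    # lines stay short however long the identifier is.
    msg.add_attachment(
        identifier.encode("utf-8"),
        maintype="application",
        subtype=HYPOTHETICAL_SUBTYPE,
    )
    return msg


if __name__ == "__main__":
    long_id = "n" * 2000 + "@example.com"
    msg = message_with_id_part("hello", long_id)
    longest = max(len(line) for line in msg.as_bytes().splitlines())
    print("longest line in message:", longest, "octets")
```

The transfer encoding is handled by the MIME machinery, so the ID itself
needs no field-specific encoding and its length is not constrained by the
header line limit.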

> Your media type proposal strikes me as odd, the JID could then
> be also added to some existing vcard or similar "attachment"...

Possibly, if the syntax is compatible with the existing media type (esp.
encoding; N.B. 7bit and 8bit encodings still retain the 998 octet line
length limit).  If it is in fact compatible, I have no objection to a
suitable modification of that existing type [it would of course have to go
through the media type review process, and I have no idea what comments that
might evoke, though I expect that the 1738/3986 conflict would arise (RFC
2426 references 1738)].  I suspect that due not only to changes from 1738
through 2396 to 3986 but also from the earlier registration procedures to the
current RFCs 4288/4289 (BCP 13), an RFC 2426 successor would be needed, and
that is likely to be a non-trivial task (references would need to be
organized as Normative vs. Informative, boilerplate revised, etc.; if the
revision is intended as Standards Track progress (to Draft) then an
implementation report is required).  Likewise, resolving all of the problems
noted with conflicts between 2822 and the proposed field would likely be
non-trivial (and would still leave unresolved practical problems related to
copy-and-paste, etc.).  A new application media subtype seems to be the fast
track to getting the desired functionality.

> Nobody would use it

And your evidence to support that assertion is...? (N.B. vcards ARE in

> With the 
> JID as header field you implicitly have the vcard,

No, vcards are not header fields; they are in fact conveyed as a MIME media
type.

> if you have 
> an UA supporting to retrieve the user info of the given JID.

And the UAs which do so are...? (unlikely to exist, due to the 1738 vs.
3986 conflicts and the above noted separation of client functionality)
