Problems with in-reply-to threading

Harald Tveit Alvestrand <harald@alvestrand.no> Fri, 31 August 2001 12:03 UTC

Received: from localhost (localhost [[UNIX: localhost]]) by above.proper.com (8.11.6/8.11.3) id f7VC3ql06724 for ietf-822-bks; Fri, 31 Aug 2001 05:03:52 -0700 (PDT)
Received: from eikenes.alvestrand.no ([217.13.28.204]) by above.proper.com (8.11.6/8.11.3) with ESMTP id f7VC3oD06714 for <ietf-822@imc.org>; Fri, 31 Aug 2001 05:03:50 -0700 (PDT)
Received: from [192.168.1.31] (eikenes.alvestrand.no [217.13.28.203]) by eikenes.alvestrand.no (Postfix) with ESMTP id 5945F61C4C; Fri, 31 Aug 2001 14:02:21 +0200 (CEST)
Date: Fri, 31 Aug 2001 14:00:57 +0200
From: Harald Tveit Alvestrand <harald@alvestrand.no>
To: imap@u.washington.edu, ietf-822@imc.org
Subject: Problems with in-reply-to threading
Message-ID: <15189671.999266457@[192.168.1.31]>
X-Mailer: Mulberry/2.1.0 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Sender: owner-ietf-822@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-822/mail-archive/>
List-ID: <ietf-822.imc.org>
List-Unsubscribe: <mailto:ietf-822-request@imc.org?body=unsubscribe>

Hi,
recently I have had some problems with messages being threaded 
inappropriately in my (IMAP) inbox.

Being the kind of person I am, I more or less hunted it down, finding these 
gems:

1) Header:
In-Reply-To: Message from Harald Tveit Alvestrand <harald@alvestrand.no>
   of "Fri, 06 Jul 2001 13:17:51 +0200." 
<759246899.994425471@[192.168.1.31]>

2) RFC 2822:

4.5.4. Obsolete identification fields

   The obsolete "In-Reply-To:" and "References:" fields differ from the
   current syntax in that they allow phrase (words or quoted strings) to
   appear.  The obsolete forms of the left and right sides of msg-id
   allow interspersed CFWS, making them syntactically identical to
   local-part and domain respectively.

obs-message-id  =       "Message-ID" *WSP ":" msg-id CRLF

obs-in-reply-to =       "In-Reply-To" *WSP ":" *(phrase / msg-id) CRLF

Note the absence, inherited from 822, of anything indicating the proper
content of a "phrase".

3) THREAD specification - draft-ietf-imapext-thread-07, section 6.3:

            If a message does not contain a References header line, or
            the References header line does not contain any valid
            Message IDs, then use the FIRST (if any) valid Message ID
            found in the In-Reply-To header line as the only reference
            (parent) for this message.

               Note: Although RFC 822 permits multiple Message IDs in
               the In-Reply-To header, in actual practice this
               discipline has not been followed.  For example,
               In-Reply-To headers have been observed with email
               addresses after the Message ID, and there are no good
               heuristics for software to determine the difference.
               This is not a problem with the References header however.

            If a message does not contain an In-Reply-To header line, or
            the In-Reply-To header line does not contain a valid Message
            ID, then the message does not have any references (NIL).

My capitalization.
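
To see how the rule quoted in (3) behaves on the header in (1), here is a minimal sketch. It is not taken from the THREAD draft: the pattern MSG_ID and the function first_msg_id are hypothetical names, and "valid Message ID" is assumed to mean nothing more than <left@right> with no whitespace or nested angle brackets.

```python
import re

# Assumption: any <left@right> token counts as a "valid Message ID".
# Under that reading an angle-bracketed email address qualifies too.
MSG_ID = re.compile(r"<[^<>@\s]+@[^<>@\s]+>")


def first_msg_id(in_reply_to):
    """Return the FIRST message-ID-shaped token, or None (NIL)."""
    match = MSG_ID.search(in_reply_to)
    return match.group(0) if match else None


# The In-Reply-To value from example (1), unfolded onto one line.
header = (
    "Message from Harald Tveit Alvestrand <harald@alvestrand.no> "
    'of "Fri, 06 Jul 2001 13:17:51 +0200." '
    "<759246899.994425471@[192.168.1.31]>"
)

parent = first_msg_id(header)
```

Under these assumptions, parent is the sender's email address rather than the real message ID, so every reply generated this way is threaded under the same bogus parent.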

Even more worrisome is this header:

References: <harald@alvestrand.no>
	<257869316.998951974@[192.168.1.31]>
	<200108281438.f7SEci101350@hygro.adsl.duke.edu>
	<E15bttt-0004pr-00@roam.psg.com>
	<4475164.999069825@localhost>

which was apparently generated from a different instance of the 
In-Reply-To header shown in (1), following the algorithm of RFC 2822 
section 3.6.4:

   The "References:" field will contain the contents of the parent's
   "References:" field (if any) followed by the contents of the parent's
   "Message-ID:" field (if any).  If the parent message does not contain
   a "References:" field but does have an "In-Reply-To:" field
   containing a single message identifier, then the "References:" field
   will contain the contents of the parent's "In-Reply-To:" field
   followed by the contents of the parent's "Message-ID:" field (if
   any).  If the parent has none of the "References:", "In-Reply-To:",
   or "Message-ID:" fields, then the new message will have no
   "References:" field.

(It seems to have missed the part about the In-Reply-To field containing a 
single message identifier, though...)
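
The section 3.6.4 rule quoted above, including the "single message identifier" condition, could be sketched roughly as follows. This is an illustration only: build_references and MSG_ID are hypothetical names, and a message identifier is assumed to be any <left@right> token.

```python
import re

# Assumption: a message identifier is any <left@right> token.
MSG_ID = re.compile(r"<[^<>@\s]+@[^<>@\s]+>")


def build_references(parent_refs, parent_in_reply_to, parent_msg_id):
    """Build the References value for a reply from the parent's fields.

    Each argument is the parent's field value as a string, or None if
    the parent lacks that field. Returns None if no field results.
    """
    parts = []
    if parent_refs:
        # Parent has References: copy its contents (whitespace unfolded).
        parts.append(" ".join(parent_refs.split()))
    elif parent_in_reply_to:
        # Fall back to In-Reply-To only if it holds exactly one identifier.
        ids = MSG_ID.findall(parent_in_reply_to)
        if len(ids) == 1:
            parts.append(ids[0])
    if parent_msg_id:
        parts.append(parent_msg_id)
    return " ".join(parts) if parts else None


# An In-Reply-To of the kind shown in (1): address first, then the ID.
irt = (
    "Message from Harald Tveit Alvestrand <harald@alvestrand.no> "
    'of "Fri, 06 Jul 2001 13:17:51 +0200." '
    "<759246899.994425471@[192.168.1.31]>"
)
refs = build_references(None, irt, "<4475164.999069825@localhost>")
```

With the check in place, an In-Reply-To like the one in (1) contributes nothing, because it contains two identifier-shaped tokens, and refs holds only the parent's Message-ID. The check cannot help when the address is the only angle-bracketed token in the field.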

The product creating the initial problem seems to be an MH variant's 
group-reply-to init file (replgroupcomps); we can fix it one install at a 
time, but this seems like a glorious time for clarifications....

Suggested fixes:

1) In RFC 2822bis, state that the msg-id form of obs-in-reply-to MUST 
contain a message-ID, and NOT an email address (a "phrase" cannot contain 
an unquoted angle bracket, so it is only the msg-id that allows it)

2) In RFC 2822bis section 3.6.4, state that the In-Reply-To should only be 
used to form References if it has a single message-ID, and that the reason 
is that users of obs-in-reply-to sometimes put email addresses in their 
In-Reply-To fields.

3) Unless someone says that a reasonably widespread implementation exists 
that puts FIRST a message-ID and THEN an email address into the in-reply-to 
field, change the THREAD specification to pick up the LAST instead of the 
first identifier.
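
A rough sketch of what fix (3) would change, with the same caveats as any heuristic: last_msg_id and MSG_ID are hypothetical names, and an identifier is assumed to be any <left@right> token.

```python
import re

# Assumption: an identifier is any <left@right> token.
MSG_ID = re.compile(r"<[^<>@\s]+@[^<>@\s]+>")


def last_msg_id(in_reply_to):
    """Return the LAST identifier-shaped token, or None (NIL)."""
    ids = MSG_ID.findall(in_reply_to)
    return ids[-1] if ids else None


# MH-style header from example (1): address first, message ID last.
mh_style = (
    "Message from Harald Tveit Alvestrand <harald@alvestrand.no> "
    'of "Fri, 06 Jul 2001 13:17:51 +0200." '
    "<759246899.994425471@[192.168.1.31]>"
)
picked = last_msg_id(mh_style)

# Hypothetical header with the opposite order: message ID first, then
# an address. Picking the last token would select the address here.
reversed_style = "<123.456@example.org> from <someone@example.org>"
picked_reversed = last_msg_id(reversed_style)
```

Picking the last token recovers the real message ID for the header in (1), but would choose the address for a header in the reversed order, which is why the fix depends on no widespread implementation emitting that order.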

What do people think?

                  Harald