Re: [ietf-822] utf8 messages

Ned Freed <ned.freed@mrochek.com> Fri, 15 August 2014 01:50 UTC

Date: Thu, 14 Aug 2014 18:42:38 -0700
From: Ned Freed <ned.freed@mrochek.com>
In-reply-to: "Your message dated Thu, 14 Aug 2014 17:41:24 -0700" <CABa8R6sbFQHaP=YgrejJjUKJS+20BFP+kATZ+PrDnTgUhPMpHw@mail.gmail.com>
References: <CABa8R6tWEhjjZSvq6NbM7EimokOms3suZufn0-6N1SB_fzGM8Q@mail.gmail.com> <01PB9FABWA4E0000SM@mauve.mrochek.com> <CABa8R6tns-idiZTj=+vb9fVNyH-nNYT+w9oNMb80XbCs5osvFw@mail.gmail.com> <01PBABOOL4QO0000SM@mauve.mrochek.com> <CABa8R6vBqS1ewmTtHh8tTOdzobsWpvSEokRxOqpj1Oq3hA+vsw@mail.gmail.com> <01PBBWUH11D60000SM@mauve.mrochek.com> <CABa8R6uJ--4Fcntdgef+h6ZXjP_q0q7hZaBW-SOozMTtiE918g@mail.gmail.com> <01PBCGZERCU20000SM@mauve.mrochek.com> <CABa8R6sbFQHaP=YgrejJjUKJS+20BFP+kATZ+PrDnTgUhPMpHw@mail.gmail.com>
To: Brandon Long <blong@google.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/ietf-822/gQ33o9GF5rAfwxKWIrpMXRJgt2s
Cc: ietf-822@ietf.org, Ned Freed <ned.freed@mrochek.com>
Subject: Re: [ietf-822] utf8 messages

> On Wed, Aug 13, 2014 at 6:34 PM, Ned Freed <ned.freed@mrochek.com> wrote:

> > > Let me try one more time, since something isn't making it through.
> >
> > > I have three messages.  One message has an entirely 7bit header with 2047
> > > encoded subject.  Another message is a 6532 message, with the subject in
> > > utf8.  A third message has a cp-1250 8bit subject.  There are two 8bit
> > > bytes in the subject in both of the last two messages, and in the cp1250
> > > case, those two bytes happen to also be a valid utf8 character.
> >
> > > We want to be able to parse all three of those and do so correctly.  We
> > > know the third type is technically invalid, but we see millions of such
> > > messages every day; dropping all of those would be a disservice to our
> > > users.  We currently see way more of such messages than we do of 6532
> > > messages... though in practice, the most common charset now is utf-8, so
> > > I guess those are now the same as 6532 messages that have leaked.
> >
> > I thought I understood the problem you were attempting to solve, but now
> > I'm totally confused, because this seems to have nothing to do with
> > additional labeling of legitimate EAI messages at all.
> >

> My point is that without a label, I can't tell the difference between the
> 6532 messages and the illegitimate messages, given just the message.

Yes, that's exactly what I thought. But given that, surely it makes more sense
to label the illegitimate messages as such, especially since you're going to want
to set the EAI message bit for some of them, making it essentially an orthogonal
setting?
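
To make the ambiguity concrete, here is a minimal sketch in Python of the
three cases described in the quoted text. The sample subject bytes and the
helper names (decode_2047, plausible_charsets) are assumptions chosen for
illustration; they are not taken from any message in this thread. The 2047
case carries its charset in-band, while the raw 8bit bytes decode cleanly
under both utf-8 and cp1250, so the bytes alone cannot say which was meant.

```python
from email.header import decode_header


def decode_2047(value):
    """Decode an RFC 2047 encoded-word value; the charset is labeled in-band."""
    parts = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            parts.append(data)
    return "".join(parts)


def plausible_charsets(raw, candidates=("utf-8", "cp1250")):
    """Return every candidate charset under which the raw bytes decode cleanly."""
    ok = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok


# Hypothetical sample data: the same two 8bit bytes in both forms.
encoded_word = "=?utf-8?q?caf=C3=A9?="   # 7bit header, 2047-encoded subject
raw_8bit = b"caf\xc3\xa9"                # unlabeled 8bit subject bytes

subject_2047 = decode_2047(encoded_word)    # unambiguous: charset is stated
candidates = plausible_charsets(raw_8bit)   # both charsets accept the bytes
as_utf8 = raw_8bit.decode("utf-8")          # reading as a 6532 message
as_cp1250 = raw_8bit.decode("cp1250")       # reading as unlabeled cp1250
```

Both decodings of raw_8bit succeed but yield different text, which is why
some out-of-band label is needed; whether that label marks the legitimate
messages or the illegitimate ones is the question under discussion.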

Did you even bother to read the rest of my response?

				Ned