Re: [ietf-822] utf8 messages

Brandon Long <blong@google.com> Fri, 15 August 2014 00:41 UTC

In-Reply-To: <01PBCGZERCU20000SM@mauve.mrochek.com>
References: <CABa8R6tWEhjjZSvq6NbM7EimokOms3suZufn0-6N1SB_fzGM8Q@mail.gmail.com> <01PB9FABWA4E0000SM@mauve.mrochek.com> <CABa8R6tns-idiZTj=+vb9fVNyH-nNYT+w9oNMb80XbCs5osvFw@mail.gmail.com> <01PBABOOL4QO0000SM@mauve.mrochek.com> <CABa8R6vBqS1ewmTtHh8tTOdzobsWpvSEokRxOqpj1Oq3hA+vsw@mail.gmail.com> <01PBBWUH11D60000SM@mauve.mrochek.com> <CABa8R6uJ--4Fcntdgef+h6ZXjP_q0q7hZaBW-SOozMTtiE918g@mail.gmail.com> <01PBCGZERCU20000SM@mauve.mrochek.com>
Date: Thu, 14 Aug 2014 17:41:24 -0700
Message-ID: <CABa8R6sbFQHaP=YgrejJjUKJS+20BFP+kATZ+PrDnTgUhPMpHw@mail.gmail.com>
From: Brandon Long <blong@google.com>
To: Ned Freed <ned.freed@mrochek.com>
Content-Type: multipart/alternative; boundary="90e6ba614c64e9e5e90500a047db"
Archived-At: http://mailarchive.ietf.org/arch/msg/ietf-822/4VQ4Q6ALOBPmb0tyF-SpFsW_cRA
Cc: ietf-822@ietf.org
Subject: Re: [ietf-822] utf8 messages

On Wed, Aug 13, 2014 at 6:34 PM, Ned Freed <ned.freed@mrochek.com> wrote:

> > Let me try one more time, since something isn't making it through.
>
> > I have three messages.  One message has an entirely 7bit header with a
> > 2047-encoded subject.  Another message is a 6532 message, with the
> > subject in utf8.  A third message has a cp1250 8bit subject.  There are
> > two 8bit bytes in the subject in both of the last two messages, and in
> > the cp1250 case, those two bytes happen to also be a valid utf8 character.
>
> > We want to be able to parse all three of those and do so correctly.  We
> > know the third type is technically invalid, but we see millions of such
> > messages every day; dropping all of those would be a disservice to our
> > users.  We currently see way more of such messages than we do of 6532
> > messages... though in practice, the most common charset now is utf-8, so
> > I guess those are now the same as 6532 messages that have leaked.
>
> I thought I understood the problem you were attempting to solve, but now
> I'm totally confused, because this seems to have nothing to do with
> additional labeling of legitimate EAI messages at all.
>

My point is that without a label, I can't tell the difference between the
6532 messages and the illegitimate messages, given just the message.
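
To make the ambiguity concrete, here is a rough Python sketch. The two bytes
(0xC3 0xA9) and the helper name decode_labeled are my own illustrative
choices, not taken from any real message. The same bytes decode cleanly as
utf8 and as cp1250 but to different text, whereas a 2047 encoded word
carries its charset label and so has only one reading.

```python
from email.header import decode_header

# Hypothetical raw subject bytes, chosen only for illustration:
# they form one valid UTF-8 character and are also valid cp1250.
raw = b"\xc3\xa9"

as_utf8 = raw.decode("utf-8")      # one character, U+00E9
as_cp1250 = raw.decode("cp1250")   # two characters, U+0102 then U+00A9

# Both decodes succeed, so the bytes alone cannot say which was intended.
ambiguous = as_utf8 != as_cp1250


def decode_labeled(subject):
    """Decode a subject whose 8bit content is carried in RFC 2047 encoded words.

    Hypothetical helper: the charset comes from the label in the encoded
    word itself, so no guessing is needed.
    """
    parts = []
    for data, charset in decode_header(subject):
        if isinstance(data, bytes):
            parts.append(data.decode(charset or "ascii"))
        else:
            # Plain unencoded text is returned as str by decode_header.
            parts.append(data)
    return "".join(parts)


# Same two bytes, but each encoded word names its charset explicitly.
labeled_cp1250 = decode_labeled("=?windows-1250?Q?=C3=A9?=")
labeled_utf8 = decode_labeled("=?utf-8?Q?=C3=A9?=")
```

With the label, labeled_cp1250 and labeled_utf8 come out as different
strings from identical bytes; without it, a parser is left guessing.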

Brandon