Re: [MMUSIC] Resolving IESG issues with RFC4566bis-35: a=charset

Christer Holmberg <christer.holmberg@ericsson.com> Sat, 08 June 2019 17:52 UTC

From: Christer Holmberg <christer.holmberg@ericsson.com>
To: Paul Kyzivat <pkyzivat@alum.mit.edu>, IETF MMUSIC WG <mmusic@ietf.org>
Thread-Topic: [MMUSIC] Resolving IESG issues with RFC4566bis-35: a=charset
Thread-Index: AQHVHT65eKOky9R93ke35kJypanbCqaSPb6A
Date: Sat, 08 Jun 2019 17:52:43 +0000
Message-ID: <1D536F12-1008-462C-BFF8-876B6A03F274@ericsson.com>
References: <155922060388.22145.12090008162284261785.idtracker@ietfa.amsl.com> <5b944fc8-3f97-55e6-2faf-45bfd11c5837@alum.mit.edu> <CALaySJJjwG26NLCJqFdo2yW_JhYCYbY+ADHENa490XqM539U2A@mail.gmail.com> <d1954b5e-f7bb-40e1-88dc-5565212b517d@www.fastmail.com> <1f37b132-98b7-af5e-7997-e0fd095ff207@alum.mit.edu> <834ef36e-e664-55de-292f-8c7cf3b3b868@alum.mit.edu>
In-Reply-To: <834ef36e-e664-55de-292f-8c7cf3b3b868@alum.mit.edu>
Archived-At: <https://mailarchive.ietf.org/arch/msg/mmusic/sspyfkLsPHnxfMbiuScqov5ewOA>

Hi,

I don't have a strong opinion, but I would really like to avoid a "MUST-MAY-SHOULD NOT" definition.

One bold approach would be to allow only a fixed set of charsets, and to mandate support for all of them. The definition and usage of any additional charsets would need to be specified in a separate document.
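
To make that concrete, here is a minimal sketch in Python of how a parser might enforce such a closed set. The names ALLOWED_CHARSETS and check_charset are hypothetical, the members of the set simply mirror the charsets named in Alexey's suggestion quoted below rather than any agreed text, and the case-insensitive comparison is an assumption about how the names would be matched.

```python
# Hypothetical closed set of charsets. The members mirror the charsets named
# in the suggestion under discussion; they are not agreed specification text.
ALLOWED_CHARSETS = {"utf-8", "us-ascii", "iso-8859-1"}


def check_charset(attribute_line: str) -> str:
    """Return the lower-cased charset name from an 'a=charset:<name>' line.

    Raises ValueError if the line is not an a=charset attribute, or if the
    named charset is outside the closed set.
    """
    prefix = "a=charset:"
    if not attribute_line.startswith(prefix):
        raise ValueError("not an a=charset attribute")
    # Assumption: charset names are compared case-insensitively.
    name = attribute_line[len(prefix):].strip().lower()
    if name not in ALLOWED_CHARSETS:
        raise ValueError(f"charset {name!r} is not in the permitted set")
    return name
```

With a closed set there is a single outcome per charset (accepted or rejected), which is what avoids the three-tier "MUST-MAY-SHOULD NOT" wording.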

Regards,

Christer


On 07/06/2019, 17.38, "mmusic on behalf of Paul Kyzivat" <mmusic-bounces@ietf.org on behalf of pkyzivat@alum.mit.edu> wrote:

    MMUSIC SDP fans,
    
    The message below already went to mmusic, but here I'm reducing the 
    distribution list to only mmusic so we don't spam the iesg with our 
    internal discussion.
    
    It seems that Alexey doesn't want to let us sweep the charset issues 
    under the rug.
    
    Would his suggestion to restrict the charsets permitted to be used be 
    acceptable? Repeating it:
    
    >> I would actually suggest that the document should tighten the definition of which charsets are allowed. For textual media types we now recommend use of UTF-8 (which should be the default) and possibly allowing a few others.
    >> 
    >> So I suggest that the new definition of a=charset be along the lines of "MUST support UTF-8 and US-ASCII. MAY support ISO-8859-1. SHOULD NOT use any other charsets".
    
    	Thanks,
    	Paul
    
    On 6/7/19 10:24 AM, Paul Kyzivat wrote:
    > Alexey,
    > 
    > When we first realized the issues with charset we thought we were very 
    > near the end of these revisions, and there didn't seem to be much taste 
    > for opening this can of worms. But the collection of IESG comments has 
    > led me to do a fair number of revisions. So I will ask again if there is 
    > willingness to make this kind of change. The main concern is with 
    > backward compatibility: is there any use in the wild of other charsets? 
    > I doubt it, but don't have any data to back that up.
    > 
    > (The whole a=charset thing is a pain without much gain. It is a lot of 
    > trouble to identify which things are and aren't charset-dependent.)
    > 
    >      Thanks,
    >      Paul
    > 
    > On 6/7/19 8:09 AM, Alexey Melnikov wrote:
    >> Hi Barry/Paul,
    >>
    >> On Mon, Jun 3, 2019, at 8:54 PM, Barry Leiba wrote:
    >>> Hi, Paul.  Sticking my oar in with Alexey's here, just on a couple of 
    >>> items:
    >>>
    >>>>> In Section 1:
    >>>>>
    >>>>> electronic mail using the MIME extensions [RFC5322]
    >>>>>
    >>>>> This needs another reference for MIME. E.g. RFC 2045.
    >>>>
    >>>> I don't understand. This paragraph is referencing examples of protocols
    >>>> that can be used to *transport* SDP. RFC5322 references the mail message
    >>>> format that would be used to encapsulate SDP if it were transported via
    >>>> email. (Though it doesn't actually mention the *transport* protocols
    >>>> used for mail messages.)
    >>>>
    >>>> ISTM that it is the containing protocols that should reference rfc2045.
    >>>> RFC5322 does so, and so says how to carry SDP in mail messages. SIP is
    >>>> itself effectively an extension to RFC2045 though it doesn't say so.
    >>>
    >>> Alexey's point is that you explicitly mention "MIME extensions" and
    >>> don't provide a reference for it.  I'll go a bit farther to say that
    >>> you're not just talking about message *format* here, but also SMTP as
    >>> the transport (more correctly, application-layer) protocol, yes?  So
    >>> this should say something more like, "electronic mail [RFC5321] using
    >>> the MIME extensions [RFC2045]".  I don't think you need 5322, because
    >>> 822 is cited by 2045, and that is obsoleted by 2822, and that by 5322.
    >>> But I think you do need to cite SMTP and MIME.
    >>
    >> Yes, exactly.
    >>
    >>>>> In 6.10:
    >>>>>
    >>>>>      Note that a character set specified MUST still prohibit the use of
    >>>>>      bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR).
    >>>>>
    >>>>> This doesn’t actually say what you intended. None of the common charsets
    >>>>> prohibit these bytes. I think you meant that when using such charsets, these
    >>>>> characters MUST NOT be used in values.
    >>>
    >>> Adding to what Alexey says, and maybe clarifying a bit: Character set
    >>> and encoding are different things.  The character set is the
    >>> abstraction of the characters used, and the encoding is how they're
    >>> represented.  The encoding is what creates the bytes on the wire.  One
    >>> problem is that "ASCII" refers to both, so it's confusing.  But with
    >>> Unicode, "Unicode" is the character set and "UTF-8" is (usually) the
    >>> encoding.
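
The distinction can be seen by encoding the same characters two ways. This is only an illustrative sketch: the helper name forbidden_in and the sample string are made up for the example, and it assumes Python's built-in "utf-8" and "utf-16-be" codecs.

```python
# The same abstract characters produce different octets under different
# encodings, so whether a forbidden octet shows up depends on the encoding.
FORBIDDEN = {0x00, 0x0A, 0x0D}  # Nul, LF, CR


def forbidden_in(text: str, encoding: str) -> set:
    """Return the forbidden octet values that appear when text is encoded."""
    return FORBIDDEN.intersection(text.encode(encoding))


# Plain letters, a space, and U+010A; none of these is a control character.
name = "SDP \u010a"

print(forbidden_in(name, "utf-8"))      # UTF-8 octets: no forbidden values
print(forbidden_in(name, "utf-16-be"))  # UTF-16 octets: 0x00 and 0x0A appear
```

None of the characters is Nul, LF, or CR, yet the UTF-16 octets include 0x00 and 0x0A, which is why the restriction has to be stated in terms of bytes on the wire.
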
    >>
    >> Right. And the term "charset" refers to an encoding of a particular 
    >> character set. It might be worth using it below.
    >>
    >>> But your point here is that the three byte values you list MUST NOT
    >>> appear in the string, and that has nothing to do with the character
    >>> set or the encoding.  Those three bytes are prohibited.
    >>>
    >>> You say that quite well in Section 5:
    >>>
    >>>     Text-containing fields such as the session-name-field and
    >>>     information-field are octet strings that may contain any octet with
    >>>     the exceptions of 0x00 (Nul), 0x0a (ASCII newline), and 0x0d (ASCII
    >>>     carriage return).
    >>>
    >>> ... and in 5.13:
    >>>
    >>>     Attribute values are octet strings, and MAY use any octet value
    >>>     except 0x00 (Nul), 0x0A (LF), and 0x0D (CR).
    >>>
    >>> But in 6.10 I think you want something more like this:
    >>>
    >>> OLD
    >>>     Note that a character set specified MUST still prohibit the use of
    >>>     bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR).  Character sets requiring
    >>>     the use of these characters MUST define a quoting mechanism that
    >>>     prevents these bytes from appearing within text fields.
    >>> NEW
    >>>     Note that the restriction specified in Section 5 applies: these strings
    >>>     MUST NOT contain the bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR).
    >>>     Character encodings that use these bytes MUST define a quoting
    >>>     mechanism that prevents these bytes from appearing within the text
    >>>     strings.
    >>> END
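
A minimal sketch of the check the NEW text implies, working on the octet string as received and independent of any charset. The names FORBIDDEN_OCTETS and find_forbidden_octets are hypothetical and are not taken from the draft.

```python
from typing import List, Tuple

# Octets that must not appear in text fields or attribute values,
# regardless of the charset in use: Nul, LF, CR.
FORBIDDEN_OCTETS = frozenset({0x00, 0x0A, 0x0D})


def find_forbidden_octets(value: bytes) -> List[Tuple[int, int]]:
    """Return (offset, octet) pairs for every forbidden octet in value."""
    return [(i, b) for i, b in enumerate(value) if b in FORBIDDEN_OCTETS]


# A UTF-8 value passes; the same kind of text in UTF-16 does not.
print(find_forbidden_octets("Caf\u00e9".encode("utf-8")))
print(find_forbidden_octets("Hi".encode("utf-16-le")))
```

Because the check never decodes the value, it behaves the same whatever a=charset says; it is the encoding, or a quoting mechanism on top of it, that has to keep these octets out.
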
    >>
    >> I think this is much better, although "use these bytes" is still 
    >> ambiguous. E.g. if these bytes are used to shift between encoding 
    >> modes within a particular charset, then there is a problem. If they 
    >> are just used to convey specific characters, it might not be.
    >>
    >> However, see my comment below.
    >>
    >>>> I don't recall what the state of character set definitions was in 1998
    >>>> when this was first published. But it appears that they got carried away
    >>>> and over-generalized. It is easy to understand how one might choose to
    >>>> use ISO 8859-1 rather than UTF-8 since they are closely related and
    >>>> byte-oriented. But it is unclear how one might use some other registered
    >>>> charsets, such as EBCDIC, or other encodings of ISO 10646, such as UTF-16.
    >>>>
    >>>> The bottom line is that use of alternate charsets other than 8859-1 is
    >>>> underspecified. We considered revamping the definition of charset, but
    >>>> didn't want to open that can of worms, since in practice it isn't an issue.
    >>>
    >>> I appreciate that, and I think this isn't the place to tackle that.
    >>> So we just need to get the text here to accurately reflect what you're
    >>> trying to say.
    >>
    >> I would actually suggest that the document should tighten the 
    >> definition of which charsets are allowed. For textual media types we 
    >> now recommend use of UTF-8 (which should be the default) and possibly 
    >> allowing a few others.
    >>
    >> So I suggest that the new definition of a=charset be along the lines 
    >> of "MUST support UTF-8 and US-ASCII. MAY support ISO-8859-1. SHOULD 
    >> NOT use any other charsets".
    >>
    >>> Hoping to be helpful,
    >>> Barry
    >>>
    >>
    > 
    > _______________________________________________
    > mmusic mailing list
    > mmusic@ietf.org
    > https://www.ietf.org/mailman/listinfo/mmusic
    