[MMUSIC] Resolving IESG issues with RFC4566bis-35: a=charset

Paul Kyzivat <pkyzivat@alum.mit.edu> Fri, 07 June 2019 14:38 UTC

From: Paul Kyzivat <pkyzivat@alum.mit.edu>
To: IETF MMUSIC WG <mmusic@ietf.org>
References: <155922060388.22145.12090008162284261785.idtracker@ietfa.amsl.com> <5b944fc8-3f97-55e6-2faf-45bfd11c5837@alum.mit.edu> <CALaySJJjwG26NLCJqFdo2yW_JhYCYbY+ADHENa490XqM539U2A@mail.gmail.com> <d1954b5e-f7bb-40e1-88dc-5565212b517d@www.fastmail.com> <1f37b132-98b7-af5e-7997-e0fd095ff207@alum.mit.edu>
Message-ID: <834ef36e-e664-55de-292f-8c7cf3b3b868@alum.mit.edu>
Date: Fri, 07 Jun 2019 10:38:07 -0400
Archived-At: <https://mailarchive.ietf.org/arch/msg/mmusic/Uyt65p4qqjcw8fAbF1ArqScDAFQ>

MMUSIC SDP fans,

The message below already went to mmusic, but here I'm reducing the 
distribution list to only mmusic so we don't spam the iesg with our 
internal discussion.

It seems that Alexey doesn't want to let us sweep the charset issues 
under the rug.

Would his suggestion to restrict the set of permitted charsets be 
acceptable? Repeating it:

>> I would actually suggest that the document should tighten the definition of which charsets are allowed. For textual media types we now recommend use of UTF-8 (which should be the default) and possibly allowing a few others.
>> 
>> So I suggest that the new definition of a=charset be along the lines of "MUST support UTF-8 and US-ASCII. MAY support ISO-8859-1. SHOULD NOT use any other charsets".
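
To make the proposed wording concrete, here is a minimal sketch of how a
receiver might classify an a=charset value under it. The function name
classify_charset and the tier labels are hypothetical, and Python's codec
registry is only a stand-in for the IANA charset registry (an assumption;
the two do not list exactly the same names).

```python
import codecs

# Tiers taken from the suggested wording; names are Python's canonical
# codec names, used here as an assumed stand-in for IANA charset names.
REQUIRED = {"utf-8", "ascii"}   # "MUST support UTF-8 and US-ASCII"
OPTIONAL = {"iso8859-1"}        # "MAY support ISO-8859-1"


def classify_charset(name: str) -> str:
    """Classify an a=charset value under the proposed rule (hypothetical helper)."""
    try:
        # Normalize aliases such as "US-ASCII" or "latin-1" to one name.
        canonical = codecs.lookup(name).name
    except LookupError:
        return "unknown"
    if canonical in REQUIRED:
        return "required"
    if canonical in OPTIONAL:
        return "optional"
    # Known charset, but covered by "SHOULD NOT use any other charsets".
    return "discouraged"


if __name__ == "__main__":
    for value in ("UTF-8", "US-ASCII", "ISO-8859-1", "UTF-16", "cp037"):
        print(value, "->", classify_charset(value))
```

Under this sketch UTF-16 and EBCDIC variants such as cp037 still parse as
known charsets but fall into the "SHOULD NOT" tier.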

	Thanks,
	Paul
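
For anyone following the octet discussion in the quoted thread below, here
is a small sketch of the restriction on 0x00, 0x0A and 0x0D in text fields,
and of why byte-oriented charsets pass it while UTF-16 does not. The helper
names forbidden_octets and encode_field are hypothetical; nothing here comes
from the draft beyond the three byte values.

```python
from typing import List

# Octets that text fields and attribute values must not contain.
FORBIDDEN = frozenset(b"\x00\x0a\x0d")


def forbidden_octets(value: bytes) -> List[int]:
    """Return the offsets of forbidden octets in an encoded field value."""
    return [i for i, octet in enumerate(value) if octet in FORBIDDEN]


def encode_field(text: str, charset: str) -> bytes:
    """Encode text for a field, rejecting encodings that emit forbidden octets."""
    data = text.encode(charset)
    bad = forbidden_octets(data)
    if bad:
        raise ValueError(f"{charset}: forbidden octets at offsets {bad}")
    return data


if __name__ == "__main__":
    # Byte-oriented charsets produce no forbidden octets for ordinary text.
    print(forbidden_octets("Séminaire".encode("utf-8")))
    print(forbidden_octets("Séminaire".encode("iso-8859-1")))
    # UTF-16 emits 0x00 for every ASCII-range character.
    print(forbidden_octets("Hi".encode("utf-16-le")))
    # U+010A encodes as 0x0A 0x01 in UTF-16LE: a "newline" byte appears
    # even though the text contains no newline character.
    print(forbidden_octets("\u010a".encode("utf-16-le")))
```

The check works on the encoded bytes, not on the characters, which is the
distinction the thread draws between a character set and its encoding.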

On 6/7/19 10:24 AM, Paul Kyzivat wrote:
> Alexey,
> 
> When we first realized the issues with charset we thought we were very
> near the end of these revisions, and there didn't seem to be much taste
> for opening this can of worms. But the collection of IESG comments has
> led me to do a fair number of revisions. So I will ask again if there is
> willingness to make this kind of change. The main concern is backward
> compatibility: is there any use of other charsets in the wild?
> I doubt it, but don't have any data to back that up.
> 
> (The whole a=charset thing is a pain without much gain. It is a lot of
> trouble to identify which things are and aren't charset-dependent.)
> 
>      Thanks,
>      Paul
> 
> On 6/7/19 8:09 AM, Alexey Melnikov wrote:
>> Hi Barry/Paul,
>>
>> On Mon, Jun 3, 2019, at 8:54 PM, Barry Leiba wrote:
>>> Hi, Paul.  Sticking my oar in with Alexey's here, just on a couple of 
>>> items:
>>>
>>>>> In Section 1:
>>>>>
>>>>> electronic mail using the MIME extensions [RFC5322]
>>>>>
>>>>> This needs another reference for MIME. E.g. RFC 2045.
>>>>
>>>> I don't understand. This paragraph is referencing examples of protocols
>>>> that can be used to *transport* SDP. RFC5322 references the mail message
>>>> format that would be used to encapsulate SDP if it were transported via
>>>> email. (Though it doesn't actually mention the *transport* protocols
>>>> used for mail messages.)
>>>>
>>>> ISTM that it is the containing protocols that should reference rfc2045.
>>>> RFC5322 does so, and so says how to carry SDP in mail messages. SIP is
>>>> itself effectively an extension to RFC2045 though it doesn't say so.
>>>
>>> Alexey's point is that you explicitly mention "MIME extensions" and
>>> don't provide a reference for it.  I'll go a bit farther to say that
>>> you're not just talking about message *format* here, but also SMTP as
>>> the transport (more correctly, application-layer) protocol, yes?  So
>>> this should say something more like, "electronic mail [RFC5321] using
>>> the MIME extensions [RFC2045]".  I don't think you need 5322, because
>>> 822 is cited by 2045, and that is obsoleted by 2822, and that by 5322.
>>> But I think you do need to cite SMTP and MIME.
>>
>> Yes, exactly.
>>
>>>>> In 6.10:
>>>>>
>>>>>      Note that a character set specified MUST still prohibit the use of
>>>>>      bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR).
>>>>>
>>>>> This doesn’t actually say what you intended. None of the common charsets
>>>>> prohibit these bytes. I think you meant that when using such charsets,
>>>>> these characters MUST NOT be used in values.
>>>
>>> Adding to what Alexey says, and maybe clarifying a bit: Character set
>>> and encoding are different things.  The character set is the
>>> abstraction of the characters used, and the encoding is how they're
>>> represented.  The encoding is what creates the bytes on the wire.  One
>>> problem is that "ASCII" refers to both, so it's confusing.  But with
>>> Unicode, "Unicode" is the character set and "UTF-8" is (usually) the
>>> encoding.
>>
>> Right. And the term "charset" means an encoding of a particular
>> character set. It might be worth using it below.
>>
>>> But your point here is that the three byte values you list MUST NOT
>>> appear in the string, and that has nothing to do with the character
>>> set or the encoding.  Those three bytes are prohibited.
>>>
>>> You say that quite well in Section 5:
>>>
>>>     Text-containing fields such as the session-name-field and
>>>     information-field are octet strings that may contain any octet with
>>>     the exceptions of 0x00 (Nul), 0x0a (ASCII newline), and 0x0d (ASCII
>>>     carriage return).
>>>
>>> ... and in 5.13:
>>>
>>>     Attribute values are octet strings, and MAY use any octet value
>>>     except 0x00 (Nul), 0x0A (LF), and 0x0D (CR).
>>>
>>> But in 6.10 I think you want something more like this:
>>>
>>> OLD
>>>     Note that a character set specified MUST still prohibit the use of
>>>     bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR).  Character sets requiring
>>>     the use of these characters MUST define a quoting mechanism that
>>>     prevents these bytes from appearing within text fields.
>>> NEW
>>>     Note that the restriction specified in Section 5 applies: these strings
>>>     MUST NOT contain the bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR).
>>>     Character encodings that use these bytes MUST define a quoting
>>>     mechanism that prevents these bytes from appearing within the text
>>>     strings.
>>> END
>>
>> I think this is much better, although "use these bytes" is still 
>> ambiguous. E.g. if these bytes are used to shift between encoding 
>> modes within a particular charset, then there is a problem. If they 
>> are just used to convey specific characters, it might not be.
>>
>> However, see my comment below.
>>
>>>> I don't recall what the state of character set definitions was in 1998
>>>> when this was first published. But it appears that they got carried away
>>>> and over-generalized. It is easy to understand how one might choose to
>>>> use ISO 8859-1 rather than UTF-8 since they are closely related and
>>>> byte-oriented. But it is unclear how one might use some other registered
>>>> charsets, such as EBCDIC, or other encodings of ISO 10646, such as UTF-16.
>>>>
>>>> The bottom line is that use of alternate charsets other than 8859-1 is
>>>> underspecified. We considered revamping the definition of charset, but
>>>> didn't want to open that can of worms, since in practice it isn't an issue.
>>>
>>> I appreciate that, and I think this isn't the place to tackle that.
>>> So we just need to get the text here to accurately reflect what you're
>>> trying to say.
>>
>> I would actually suggest that the document should tighten the 
>> definition of which charsets are allowed. For textual media types we 
>> now recommend use of UTF-8 (which should be the default) and possibly 
>> allowing a few others.
>>
>> So I suggest that the new definition of a=charset be along the lines 
>> of "MUST support UTF-8 and US-ASCII. MAY support ISO-8859-1. SHOULD 
>> NOT use any other charsets".
>>
>>> Hoping to be helpful,
>>> Barry
>>>
>>
> 