Re: [MMUSIC] Alexey Melnikov's No Objection on draft-ietf-mmusic-rfc4566bis-35: (with COMMENT)

Paul Kyzivat <pkyzivat@alum.mit.edu> Fri, 07 June 2019 14:25 UTC

To: Alexey Melnikov <aamelnikov@fastmail.fm>, Barry Leiba <barryleiba@computer.org>
Cc: The IESG <iesg@ietf.org>, Flemming Andreasen <fandreas@cisco.com>, mmusic-chairs@ietf.org, draft-ietf-mmusic-rfc4566bis@ietf.org, mmusic@ietf.org
References: <155922060388.22145.12090008162284261785.idtracker@ietfa.amsl.com> <5b944fc8-3f97-55e6-2faf-45bfd11c5837@alum.mit.edu> <CALaySJJjwG26NLCJqFdo2yW_JhYCYbY+ADHENa490XqM539U2A@mail.gmail.com> <d1954b5e-f7bb-40e1-88dc-5565212b517d@www.fastmail.com>
From: Paul Kyzivat <pkyzivat@alum.mit.edu>
Message-ID: <1f37b132-98b7-af5e-7997-e0fd095ff207@alum.mit.edu>
Date: Fri, 07 Jun 2019 10:24:50 -0400
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:60.0) Gecko/20100101 Thunderbird/60.7.0
MIME-Version: 1.0
In-Reply-To: <d1954b5e-f7bb-40e1-88dc-5565212b517d@www.fastmail.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/mmusic/tvRKvkf2dRAP8VSd3vt-B1HI3fc>
Subject: Re: [MMUSIC] Alexey Melnikov's No Objection on draft-ietf-mmusic-rfc4566bis-35: (with COMMENT)
List-Id: Multiparty Multimedia Session Control Working Group <mmusic.ietf.org>

Alexey,

When we first realized the issues with charset, we thought we were very 
near the end of these revisions, and there didn't seem to be much appetite 
for opening this can of worms. But the collection of IESG comments has 
led me to make a fair number of revisions. So I will ask again whether 
there is willingness to make this kind of change. The main concern is 
backward compatibility: are any other charsets used in the wild? 
I doubt it, but I don't have any data to back that up.

(The whole a=charset thing is a pain without much gain. It is hard to 
identify which things are and aren't charset-dependent.)
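
The charset-dependence, and the byte restriction quoted from Sections 5 
and 5.13 further down, can be illustrated with a small Python sketch. 
The helper name `sdp_safe` is hypothetical: it only checks the encoded 
bytes of a value for NUL, LF and CR and does not parse SDP. It shows 
why byte-oriented charsets such as UTF-8 and ISO-8859-1 fit the rule 
while UTF-16 does not.

```python
# Octets that SDP text fields and attribute values may never contain,
# per the Section 5 and 5.13 text quoted in this thread.
FORBIDDEN_OCTETS = frozenset({0x00, 0x0A, 0x0D})


def sdp_safe(text: str, charset: str) -> bool:
    """Return True if `text`, encoded in `charset`, avoids NUL, LF and CR.

    Hypothetical helper: it checks bytes only and does not parse SDP.
    """
    encoded = text.encode(charset)  # the charset decides the wire bytes
    return FORBIDDEN_OCTETS.isdisjoint(encoded)


if __name__ == "__main__":
    value = "Caf\u00e9 session"
    for charset in ("utf-8", "iso-8859-1", "utf-16-be"):
        print(charset, value.encode(charset), sdp_safe(value, charset))
```

Running it reports True for UTF-8 and ISO-8859-1 (which encode the 
same text to different bytes) but False for UTF-16-BE, because UTF-16 
encodes every ASCII character with a 0x00 byte.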

	Thanks,
	Paul

On 6/7/19 8:09 AM, Alexey Melnikov wrote:
> Hi Barry/Paul,
> 
> On Mon, Jun 3, 2019, at 8:54 PM, Barry Leiba wrote:
>> Hi, Paul.  Sticking my oar in with Alexey's here, just on a couple of items:
>>
>>>> In Section 1:
>>>>
>>>> electronic mail using the MIME extensions [RFC5322]
>>>>
>>>> This needs another reference for MIME. E.g. RFC 2045.
>>>
>>> I don't understand. This paragraph is referencing examples of protocols
>>> that can be used to *transport* SDP. RFC5322 references the mail message
>>> format that would be used to encapsulate SDP if it were transported via
>>> email. (Though it doesn't actually mention the *transport* protocols
>>> used for mail messages.)
>>>
>>> ISTM that it is the containing protocols that should reference rfc2045.
>>> RFC5322 does so, and so says how to carry SDP in mail messages. SIP is
>>> itself effectively an extension to RFC2045 though it doesn't say so.
>>
>> Alexey's point is that you explicitly mention "MIME extensions" and
>> don't provide a reference for it.  I'll go a bit farther to say that
>> you're not just talking about message *format* here, but also SMTP as
>> the transport (more correctly, application-layer) protocol, yes?  So
>> this should say something more like, "electronic mail [RFC5321] using
>> the MIME extensions [RFC2045]".  I don't think you need 5322, because
>> 822 is cited by 2045, and that is obsoleted by 2822, and that by 5322.
>> But I think you do need to cite SMTP and MIME.
> 
> Yes, exactly.
> 
>>>> In 6.10:
>>>>
>>>>      Note that a character set specified MUST still prohibit the use of
>>>>      bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR).
>>>>
>>>> This doesn’t actually say what you intended. None of the common charsets
>>>> prohibit these bytes. I think you meant that when using such charsets, these
>>>> characters MUST NOT be used in values.
>>
>> Adding to what Alexey says, and maybe clarifying a bit: Character set
>> and encoding are different things.  The character set is the
>> abstraction of the characters used, and the encoding is how they're
>> represented.  The encoding is what creates the bytes on the wire.  One
>> problem is that "ASCII" refers to both, so it's confusing.  But with
>> Unicode, "Unicode" is the character set and "UTF-8" is (usually) the
>> encoding.
> 
> Right. And the term "charset" is the encoding of a particular character set. It might be worth using it below.
> 
>> But your point here is that the three byte values you list MUST NOT
>> appear in the string, and that has nothing to do with the character
>> set or the encoding.  Those three bytes are prohibited.
>>
>> You say that quite well in Section 5:
>>
>>     Text-containing fields such as the session-name-field and
>>     information-field are octet strings that may contain any octet with
>>     the exceptions of 0x00 (Nul), 0x0a (ASCII newline), and 0x0d (ASCII
>>     carriage return).
>>
>> ... and in 5.13:
>>
>>     Attribute values are octet strings, and MAY use any octet value
>>     except 0x00 (Nul), 0x0A (LF), and 0x0D (CR).
>>
>> But in 6.10 I think you want something more like this:
>>
>> OLD
>>     Note that a character set specified MUST still prohibit the use of
>>     bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR).  Character sets requiring
>>     the use of these characters MUST define a quoting mechanism that
>>     prevents these bytes from appearing within text fields.
>> NEW
>>     Note that the restriction specified in Section 5 applies: these strings
>>     MUST NOT contain the bytes 0x00 (Nul), 0x0A (LF), and 0x0d (CR).
>>     Character encodings that use these bytes MUST define a quoting
>>     mechanism that prevents these bytes from appearing within the text
>>     strings.
>> END
> 
> I think this is much better, although "use these bytes" is still ambiguous. E.g. if these bytes are used to shift between encoding modes within a particular charset, then there is a problem. If they are just used to convey specific characters, it might not be.
> 
> However, see my comment below.
> 
>>> I don't recall what the state of character set definitions was in 1998
>>> when this was first published. But it appears that they got carried away
>>> and over-generalized. It is easy to understand how one might choose to
>>> use ISO 8859-1 rather than UTF-8 since they are closely related and
>>> byte-oriented. But it is unclear how one might use some other registered
>>> charsets, such as EBCDIC, or other encodings of ISO 10646, such as UTF-16.
>>>
>>> The bottom line is that use of alternate charsets other than 8859-1 is
>>> underspecified. We considered revamping the definition of charset, but
>>> didn't want to open that can of worms, since in practice it isn't an issue.
>>
>> I appreciate that, and I think this isn't the place to tackle that.
>> So we just need to get the text here to accurately reflect what you're
>> trying to say.
> 
> I would actually suggest that the document should tighten the definition of which charsets are allowed. For textual media types we now recommend using UTF-8 (which should be the default) and possibly allowing a few others.
> 
> So I suggest that the new definition of a=charset be along the lines of "MUST support UTF-8 and US-ASCII. MAY support ISO-8859-1. SHOULD NOT use any other charsets".
> 
>> Hoping to be helpful,
>> Barry
>>
>
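
Alexey's proposed rule for a=charset is simple enough to express as a 
lookup. The Python sketch below is illustrative only: `CharsetPolicy` 
and `classify_charset` are hypothetical names, charset names are 
compared case-insensitively, and IANA aliases such as "latin1" are not 
resolved, so they fall into the SHOULD NOT bucket.

```python
from enum import Enum


class CharsetPolicy(Enum):
    MUST = "MUST support"
    MAY = "MAY support"
    SHOULD_NOT = "SHOULD NOT use"


# Charset names as listed in the proposal; aliases are not handled.
_MUST = {"utf-8", "us-ascii"}
_MAY = {"iso-8859-1"}


def classify_charset(attr_line: str) -> CharsetPolicy:
    """Classify an 'a=charset:<name>' line under the proposed rule."""
    prefix = "a=charset:"
    if not attr_line.startswith(prefix):
        raise ValueError(f"not an a=charset attribute: {attr_line!r}")
    name = attr_line[len(prefix):].strip().lower()
    if name in _MUST:
        return CharsetPolicy.MUST
    if name in _MAY:
        return CharsetPolicy.MAY
    return CharsetPolicy.SHOULD_NOT


if __name__ == "__main__":
    for line in ("a=charset:UTF-8", "a=charset:ISO-8859-1", "a=charset:UTF-16"):
        print(line, classify_charset(line).value)
```

The sketch only inspects an explicit attribute; handling the case where 
no a=charset line is present (UTF-8 by default) is left to the caller.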