Re: [MMUSIC] Comments on ABNF in 4566bis

Colin Perkins <csp@csperkins.org> Thu, 04 December 2014 12:26 UTC

On 3 Dec 2014, at 18:07, Paul Kyzivat <pkyzivat@alum.mit.edu> wrote:
> On 12/1/14 6:39 PM, Colin Perkins wrote:
> 
>>>> *a=charset:*
>>>> - This is a name from the registry at
>>>> http://www.iana.org/assignments/character-sets/character-sets.xhtml and needs to follow that syntax
>>> 
>>> In this and other cases where what goes in the SDP is a reference to something defined in a registry, I am inclined to give a simplified syntax that might be a superset of what is permitted in the registry. It doesn't matter a whole lot if it allows some invalid values, because such names will never be in the registry.
>>> 
>>> This will help implementers. And when the exact formal definition isn't available in ABNF, or is very complex, it can also ease formal checking of the SDP ABNF.
>>> 
>>> Given that explanation, is there anything wrong with what I have?
>> 
>> Why not reference the correct ABNF from RFC 2978? I don’t understand the desire to put something incorrect here.
> 
> I've looked again at 2978, and it is less than clear. It says:
> 
>   Each assigned name MUST uniquely identify a single charset.  All
>   charset names MUST be suitable for use as the value of a MIME content
>   type charset parameter and hence MUST conform to MIME parameter value
>   syntax.  This applies even if the specific charset being registered
>   is not suitable for use with the "text" media type.
>   ...
>   Finally, charsets being registered for use with the "text" media type
>   MUST have a primary name that conforms to the more restrictive syntax
>   of the charset field in MIME encoded-words [RFC-2047, RFC-2184] and
>   MIME extended parameter values [RFC-2184].  A combined ABNF
>   definition for such names is as follows:
> 
>     mime-charset = 1*mime-charset-chars
>     ...
> 
> The final paragraph above seems to say that mime-charset only applies for charsets registered for use with "text", but based on the first paragraph this syntax could apply for all charset names.
> 
> And this syntax doesn't apply any limit to how long the names can be, while the IANA registry itself says:
> 
>   The character set names may be up to 40 characters taken from the
>   printable characters of US-ASCII.  However, no distinction is made
>   between use of upper and lower case letters.
> 
> I based my ABNF on that. But yes, I could reference mime-charset from 2978.
> 
> Note also that I currently reference "Preferred MIME Name", but have a note asking if I should instead be referencing "Name". These are references to column headings in the table in the registry, for lack of something better to reference. Looking again at the registry, I think I need instead to say:
> 
> OLD:
> 
>   The charset specified MUST be one of those registered with IANA, such
>   as ISO-8859-1.  The character set identifier is a US-ASCII string and
>   MUST be compared against the IANA identifiers using a case-
>   insensitive comparison.  If the identifier is not recognised or not
>   supported, all strings that are affected by it SHOULD be regarded as
>   octet strings.
> 
> NEW:
> 
>   The charset specified MUST be one of those registered in the IANA
>   Character Sets registry
>   (http://www.iana.org/assignments/character-sets) such
>   as ISO-8859-1.  The character set identifier is a US-ASCII string and
>   MUST be compared against identifiers from the "Name" or "Preferred
>   MIME Name" field of the registry using a case-insensitive
>   comparison.  If the identifier is not recognised or not supported,
>   all strings that are affected by it SHOULD be regarded as octet
>   strings.
> 
> I think this is needed because the identifiers in the "Preferred MIME Name" field are not repeated in the "Name" field, but both are allowed to be used in references.

I expect the charset community has common practices in this area. It might be worth asking the designated expert for the registry which field is preferred for matching, and how to reference the name syntax. Alternatively, the APPS area director can likely suggest a reviewer who can advise on the right thing to do.
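
To make the question concrete, here is a rough sketch of the matching rule as the proposed NEW text describes it: compare the identifier case-insensitively against both the "Name" and "Preferred MIME Name" fields, and fall back to treating the strings as octets if the charset is not recognised or not supported. This is only an illustration, not proposed text. The function names, the one-row registry excerpt, and the reading of "printable US-ASCII" as 0x21-0x7E without space are all assumptions; the 40-character limit is taken from the registry text quoted above.

```python
import re

# Assumption: "printable characters of US-ASCII" is read as 0x21-0x7E (no
# space); the 40-character limit comes from the IANA registry text.
_CHARSET_NAME = re.compile(r"[\x21-\x7E]{1,40}")

# Hand-written, illustrative excerpt: (Name, Preferred MIME Name or None).
# A real implementation would load the full IANA Character Sets registry.
REGISTRY_EXCERPT = [
    ("ISO_8859-1:1987", "ISO-8859-1"),
]


def is_plausible_charset_name(name: str) -> bool:
    """Loose superset check: 1 to 40 printable US-ASCII characters."""
    return _CHARSET_NAME.fullmatch(name) is not None


def build_index(rows):
    """Collect both registry fields, lowercased for case-insensitive lookup."""
    return {field.lower() for row in rows for field in row if field}


def decode_or_octets(raw: bytes, identifier: str, known: set):
    """Decode raw with the named charset, or return the octets unchanged.

    Octets are returned when the identifier is malformed, not in the
    registry index, or not supported by the local codec library.
    """
    if not is_plausible_charset_name(identifier):
        return raw
    if identifier.lower() not in known:
        return raw  # not recognised
    try:
        return raw.decode(identifier)
    except (LookupError, UnicodeDecodeError):
        return raw  # recognised but not supported here, or undecodable


known = build_index(REGISTRY_EXCERPT)
print(decode_or_octets(b"caf\xe9", "iso-8859-1", known))
print(decode_or_octets(b"caf\xe9", "X-Unknown", known))
```

Indexing both fields sidesteps the problem that "Preferred MIME Name" values are not repeated in the "Name" field, but whether that is the right behaviour is exactly what the designated expert should confirm.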

-- 
Colin Perkins
https://csperkins.org/