Re: [Gen-art] Gen-ART Review of draft-newman-i18n-comparator-13.txt

Martin Duerst <duerst@it.aoyama.ac.jp> Wed, 16 August 2006 10:44 UTC

Message-Id: <6.0.0.20.2.20060816161139.082531f0@localhost>
Date: Wed, 16 Aug 2006 17:25:42 +0900
To: Spencer Dawkins <spencer@mcsr-labs.org>, General Area Review Team <gen-art@ietf.org>
From: Martin Duerst <duerst@it.aoyama.ac.jp>
Subject: Re: [Gen-art] Gen-ART Review of draft-newman-i18n-comparator-13.txt
In-Reply-To: <047601c6c066$caba4500$0600a8c0@china.huawei.com>
References: <00fc01c63c21$db3d68e0$0500a8c0@china.huawei.com> <047601c6c066$caba4500$0600a8c0@china.huawei.com>
Cc: Arnt Gulbrandsen <arnt@oryx.com>, Lisa Dusseault <lisa@osafoundation.org>, chris.newman@Sun.COM

At 21:31 06/08/15, Spencer Dawkins wrote:

>Review Comments:
>
>2.2.  Purpose
>
>   Collations abstraction layer for comparison functions so that these
>   comparison functions can be used in multiple protocols.
>
>I am just barely able to parse this sentence so that it's not a sentence fragment. I think the problem is that "functions" is being used as a verb and as a noun in the same sentence. I saw later in the document that you had changed "function"-the-noun to "operation", so should be easy to fix. But this isn't an editorial comment, because I'm not sure what the sentence is saying.

"comparison functions" is a noun phrase in both cases. What's missing is a
verb after the first word, e.g.:

Collations *provide an* abstraction layer...     or
Collations *function as* an abstraction layer

or so.


>4.2.2.  Equality
>    ...
>    In this specification, the return values of the equality test are
>    called "match", "no-match" and "undefined".  This is not a
>    specification, merely a choice of phrasing.
>
>What does the last sentence mean? (Brian Carpenter asked me, so he doesn't know, either).

I suggest changing the sentence:
    This is not a specification, merely a choice of phrasing.
To something like:
    Each protocol has to specify how to represent these values.
or something in that direction. Also, maybe adding a sentence
to 5.2 mentioning this would help overall understanding.
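
To make the point concrete, here is a rough Python sketch of the split I
have in mind: the collation only produces one of three abstract results,
and each protocol picks its own representation. The wire tokens "YES",
"NO" and "BAD" and the names Equality and encode_result are invented for
this example; they are not taken from the draft or from any real protocol.

```python
from enum import Enum


class Equality(Enum):
    """The three abstract results of the equality test named in 4.2.2."""
    MATCH = "match"
    NO_MATCH = "no-match"
    UNDEFINED = "undefined"


# Hypothetical wire representation chosen by one imaginary protocol.
# Another protocol could map the same three results to status codes,
# booleans plus an error, or anything else it specifies.
EXAMPLE_WIRE_FORM = {
    Equality.MATCH: "YES",
    Equality.NO_MATCH: "NO",
    Equality.UNDEFINED: "BAD",
}


def encode_result(result):
    """Translate an abstract equality result into the example wire token."""
    return EXAMPLE_WIRE_FORM[result]
```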

>5.2.  Operations
>
>...
>
>   Although the collation's substring function provides a list of
>   matches, a protocol need not provide all that to the client.  It may
>   provide only the first matching substring, or even just the
>   information that the substring search matched.
>
>Hmmm. I am trying to remember that you're not defining a protocol, only describing what protocols do and don't do, but I'm trying to read this from the application's perspective, and having a hard time understanding how (for example) an application that is trying to display what is matching responds when the protocol only provides an indication that something matched. You may say this is what the protocol developers are supposed to worry about ("if you think applications will want to display what matches, you'd better define the protocol so that this information is returned"), and that's OK. I'm just struggling a bit here.

To take a specific example, assume that IMAP provides a function to select
all the emails that contain a specific string (IMAP probably does provide
such a function, but I don't know for sure). IMAP will only tell you which
mails matched, and then your MUA will have to download these to show the
details.
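
As a rough sketch of that division of labour (Python, with made-up names;
plain string search stands in for a real collation, and this is not how
any IMAP server is actually implemented): the substring function yields
every match, while the protocol-level search only reports which messages
matched.

```python
def find_all(haystack, needle):
    """Return (start, end) offsets of every occurrence of needle."""
    matches = []
    if not needle:
        return matches
    start = haystack.find(needle)
    while start != -1:
        matches.append((start, start + len(needle)))
        # Step forward by one so overlapping occurrences are also found.
        start = haystack.find(needle, start + 1)
    return matches


def search_mailbox(messages, needle):
    """Hypothetical protocol-level search: expose only the matching ids.

    messages maps a message id to its text. The match offsets computed
    by find_all are deliberately discarded here.
    """
    return [msg_id for msg_id, text in messages.items()
            if find_all(text, needle)]
```

A client that wants to highlight the matched text then has to fetch the
listed messages and run its own search over them.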


>6.  Use by Existing Protocols
>
>...
>
>   IMAP [16] also collates, although that is explicit only when the
>   COMPARATOR [18] extension is used.  The built-in IMAP substring
>   operation and the ordering provided by the SORT [17] extension may
>   not meet the requirements made in this document.
>
>   Other protocols may be in a similar position.
>
>   In IMAP, the default collation is i;ascii-casemap, because its
>   operations most closely resembles IMAP's built-in operations.
>
>EDITORIAL: I'm guessing that the previous paragraph should be moved up one? At the very least, I'm confused because I'm not sure if the top paragraph in this extract describes the differences between i;ascii-casemap and IMAP's built-in operations or is talking about something else.

My reading is that i;ascii-casemap is a rather separate topic. The
second (one-line) paragraph applies to the first, so I guess the order
of the paragraphs shouldn't be changed. But I'm not an IMAP expert.

Apart from that, there is a small typo: change
"operations most closely resembles" to "operations most closely resemble".


>9.1.1.  ASCII Numeric Collation Description
>
>   The "i;ascii-numeric" collation is a simple collation intended for
>   use with arbitrary sized unsigned decimal integer numbers stored as
>   octet strings.  US-ASCII digits (0x30 to 0x39) represent digits of
>   the numbers.  Before converting from string to integer, the input
>   string is truncated at the first non-digit character.  All input is
>   valid; strings which do not start with a digit represent positive
>   infinity.
>
>Is it obvious to everyone except me that leading zeros are ignored? The examples giving a little further down say so - is making this point in examples normative enough?

It's not only the examples that say so, but also the details of the
equality operation. But I agree that adding "Leading zeroes are allowed
but ignored." or so to the first paragraph would help.

On a different level, I'm not at all sure that this collation needs to
be defined on octet strings. This is also in conflict with section
4.1, where it says "The i;ascii-numeric (Section 9.1) operation operates
on numbers." The best way to word things is to have this collation work
on characters, but sort them as numbers. The relevant sentence in
section 4.1 can just be removed. The first paragraph of 9.1 should be
reworded as follows:

   The "i;ascii-numeric" collation is a simple collation intended for
   use with arbitrary sized unsigned decimal integer numbers.
   These numbers are represented by digits from the US-ASCII range
   (U+0030 to U+0039).  Before converting from string to integer, the input
   string is truncated at the first non-digit character.  All input is
   valid; strings which do not start with a digit represent positive
   infinity. Leading zeroes are allowed but ignored.
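
For illustration only, the reworded paragraph could be sketched in Python
roughly as follows. The function names are mine, and treating two
"positive infinity" inputs as equal is an assumption of this sketch, not
something the paragraph above states.

```python
import math


def ascii_numeric_value(s):
    """Map a string to the number it represents under the rules above."""
    digits = []
    for ch in s:
        if "0" <= ch <= "9":      # US-ASCII digits only (U+0030 to U+0039)
            digits.append(ch)
        else:
            break                 # truncate at the first non-digit character
    if not digits:
        return math.inf           # does not start with a digit: +infinity
    return int("".join(digits))   # int() disregards leading zeroes


def ascii_numeric_compare(a, b):
    """Return -1, 0 or 1 for a < b, a == b, a > b.

    Assumption: two inputs that both represent positive infinity
    compare as equal.
    """
    va, vb = ascii_numeric_value(a), ascii_numeric_value(b)
    return (va > vb) - (va < vb)
```

Python integers are unbounded, which fits "arbitrary sized", although
recent Python versions cap by default the length of decimal strings that
int() accepts, so very long inputs would need that limit raised.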


>9.2.1.  ASCII Casemap Collation Description
>
>...
>
>   The i;ascii-casemap collation is well suited to to use with many
>   internet protocols and computer languages.  Use with natural language
>   is often inappropriate: even though the collation apparently supports
>   languages such as Italian and English, in real-world use it tends to
>   stumble over words such as "naive", names such as "Llwyd", people and
>   place names containing non-ASCII, euro and pound sterling symbols,
>   quotation marks, dashes/hyphens, etc.
>
>OK, this may be inadvertantly funny - are "naive" and "Llwyd" supposed to include a non-ascii character, or is that sentence saying something else? (Welcome to the world of the RFC Editor)

A separate point: "to to" in the first line should be fixed.
I'll get back to this in response to Lisa.

>13.  Open Issues
>
>    ... adding a
>    note to the RFC editor to possibly replace the 3066 reference
>
>From Brian: Surely this needs to be done?

Yes, this needs to be done. We should replace the reference with
draft-ietf-ltru-registry-14.txt (and maybe also draft-ietf-ltru-matching-15.txt;
both of these together will be BCP 47, and I don't know whether/how we can
refer to half of a BCP :-( ).

>From Spencer: I'm thinking that the "checking the SP SP "1" SP SP string for correctness" also needs to be done pretty soon :-0

I'd be glad to check, but I'm not sure what needs to be checked.
Everything looks good, except that:
- In the string that is being searched, spaces seem to be collapsed,
  i.e. the draft says " 1 " (SP SP "1" SP SP) where it should say
  "  1  " (SP SP "1" SP SP). This may be an effect of XML2RFC
  or some other production effect. In the case of XML2RFC, I guess
  there should be a workaround. If we don't find a workaround at
  this stage, we have to make a note to the RFC editor.
  (and we need a note to ourselves to make sure we check this again,
   too).
- I'd personally change "the substring operation could" to
  "the substring operation should", but that may be too late
  (and lowercase 'should' is better avoided in RFCs).

Regards,    Martin.



#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     

