Re: [VCARDDAV] Questions about text handling in vCard 4.0 (rev 11)

Daisuke Miyakawa <d.miyakawa@gmail.com> Wed, 07 July 2010 03:09 UTC

In-Reply-To: <AANLkTin6HWxlksKWyZLcYTRJPRoaCVbS_j8UNFyy1HLn@mail.gmail.com>
References: <AANLkTik6O1nZvjdDRn1bdGb20xKbWJApIsnwfTJ8BbRa@mail.gmail.com> <4C31CF6B.9050500@viagenie.ca> <AANLkTimt74eL5nCfDFK2QgHggyL9qONlqAUDOWKjan-l@mail.gmail.com> <4C31DA5F.6030906@viagenie.ca> <AANLkTin2KEkx8wphdHhdQj2H9sY0VjR85JTsRjc7rJWr@mail.gmail.com> <4C3333A4.4000307@viagenie.ca> <AANLkTikGQf5yj_9TmjHIySqhuyoa-QIFLXZmqev6xFK4@mail.gmail.com> <4C333FE5.6010004@viagenie.ca> <AANLkTin6HWxlksKWyZLcYTRJPRoaCVbS_j8UNFyy1HLn@mail.gmail.com>
Date: Wed, 07 Jul 2010 12:09:12 +0900
Message-ID: <AANLkTimQ5O7LEqTeS3Nl9EjYz9XoG3djZZGWlPFxsRFE@mail.gmail.com>
From: Daisuke Miyakawa <d.miyakawa@gmail.com>
To: Simon Perreault <simon.perreault@viagenie.ca>
Content-Type: multipart/alternative; boundary="00163630f661802a52048ac37e49"
Cc: vcarddav@ietf.org
Subject: Re: [VCARDDAV] Questions about text handling in vCard 4.0 (rev 11)

Let me add two real-world cases we can refer to. I'd like to know your
opinion about them.

1) Java
http://java.sun.com/javase/technologies/core/basic/intl/faq.jsp

s1=hello there
s2=\u3053\u3093\u306b\u3061\u306f

Java is Unicode-aware and allows both formats, though it uses UTF-16 rather
than UTF-8.
I don't usually see the \u format since it is not readable, but I sometimes
see it in comments, which javadoc handles appropriately and emits as
human-readable HTML.
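
To make the s2 line above concrete, here is a rough sketch (in Python, only
for illustration) of what a reader of such a file has to do with \uXXXX
escapes. The name decode_u_escapes is made up for this example, and the
sketch deliberately ignores the other escapes a real properties parser
handles (for example an escaped backslash in front of the "u").

```python
import re

# Matches one Java-style escape: a backslash, "u", and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_u_escapes(text: str) -> str:
    """Replace each \\uXXXX escape with the character it names."""
    # Each escape is one UTF-16 code unit, so a character outside the BMP
    # arrives as two escapes (a surrogate pair).
    units = _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Round-tripping through UTF-16 joins surrogate pairs into one character;
    # an unpaired surrogate raises UnicodeDecodeError.
    return units.encode("utf-16", "surrogatepass").decode("utf-16")


print(decode_u_escapes(r"s2=\u3053\u3093\u306b\u3061\u306f"))
```

Plain ASCII text such as the s1 line passes through unchanged, so both
formats can live in the same file.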

2) ICU4C

I'm not familiar with the ICU library as a whole, but I have experience
looking at its collator config.
For example, icu4c/data/coll/ja.txt says...

ja{
    Version{"2.0.41.26"}
    collations{
        standard{
            Sequence{
                " [strength 3 ] [hiraganaQ on ]"
                "&ヽ=ゝ"
...

 "<條<梛<梃<檮<梹<桴<梵<梠<梺<椏<梍<桾<椁<棊<椈<棘<椢<椦<棡<椌<棍<棔<棧<棕<椶<椒<椄<棗<棣<椥<棹<棠<棯<椨"
...
                "&←=←"
                "&↑=↑"
                "&→=→"
                "&↓=↓"
                "&│=│"
                "&■=■"
                "&○=○"
                "&、=、"
                "&。=。"
                "&「=「"
                "&」=」"
                "&ん<'\u0020'<'!'<'\"'<'#'<'
...


In zh.txt, I can see '\uE84D'.

Even ICU4C, a reliable Unicode library developed by a Unicode authority,
allows this kind of format (though it uses \u or \U, not \x), so I think it
is worth considering support for this kind of format in vCard.
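
As a rough idea of how small the receiver side could be, here is a sketch in
Python. Everything about \xNNNN here is an assumption, not something in the
draft: I assume exactly four hex digits naming one code point, and I assume
it sits next to the usual backslash escapes for backslash, comma, semicolon
and newline. The function name unescape_text is hypothetical.

```python
import re

# One pass over the value: either the hypothetical \xNNNN escape or one of
# the ordinary single-character escapes (\\  \,  \;  \n  \N).
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{4}|[\\,;nN])")


def unescape_text(value: str) -> str:
    """Decode backslash escapes in a text value, including assumed \\xNNNN."""

    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body[0] == "x":
            # Assumed form: four hex digits naming a single code point.
            return chr(int(body[1:], 16))
        if body in ("n", "N"):
            return "\n"
        # Escaped backslash, comma, or semicolon: keep the character itself.
        return body

    return _ESCAPE.sub(replace, value)


print(unescape_text(r"\x5BAE\x5DDD\, Daisuke"))
```

Doing it in a single left-to-right pass matters: an escaped backslash
followed by "x3053" stays as literal text instead of being decoded. Four
digits cannot reach characters outside the BMP, so a real proposal would
have to settle the digit count first.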

Thanks,

On 6 July 2010 at 23:54, Daisuke Miyakawa <d.miyakawa@gmail.com> wrote:

>
>
> 2010/7/6 Simon Perreault <simon.perreault@viagenie.ca>
>
> On 2010-07-06 10:25, Daisuke Miyakawa wrote:
>> > One proposal I can do is that composer side are allowed (but not
>> > recommended) to use the format only when they cannot emit the word (like
>> > when users want to edit foreign friends name without an appropriate IME)
>> > but know codepoint for that characters.
>> > Receiver side MUST be able to decode \xNNNN to appropriate Unicode form.
>> > This is a kind of dirty compromise but there's no technical difficulty
>> > nor theoretically insufficiency. I suppose I can implement
>> > sender/receiver easily.
>>
>> I think you are confusing encoding with how a particular implementation
>> renders the text.
>
>
>
>> It is perfectly fine for an implementation to render text as \xNNNN when
>> e.g. a font is unavailable.
>>
>> It is also perfectly fine for an implementation to take \xNNNN as input.
>>
>> But what goes on the wire is pure UTF-8. How it gets rendered or how it
>> is input by a user is completely irrelevant for the vCard data format.
>
>
> Well, I think we should also take care of actual use cases vCard is treated
> by users, not only from the view of purity of UTF-8. I know \xNNNN makes the
> text dirtier.
>
> This is my understanding: If we have contact application with 100%
> capability, we don't need to take care of \xNNNN topic at all. But my
> (short) experience tells me that, really often, I had to look at bare data
> with emacs, vi, or notepad.
> At that stage, adding \xNNNNNNNN is really practically useful fallback for
> reading/editing the data.
>
> Feel free to correct me. I know my saying is really biased toward "how it
> can be implemented and be used",
> not "how it should be written as a specification".
>
> Thanks!
>
>
>
>>  Simon
>> --
>> NAT64/DNS64 open-source --> http://ecdysis.viagenie.ca
>> STUN/TURN server        --> http://numb.viagenie.ca
>> vCard 4.0               --> http://www.vcarddav.org
>>
>
>
>
> --
> Daisuke Miyakawa (宮川大輔)
> d.miyakawa@gmail.com
>



-- 
Daisuke Miyakawa (宮川大輔)
d.miyakawa@gmail.com