[VCARDDAV] Proposal around escape character handling (2nd round)
Daisuke Miyakawa <d.miyakawa@gmail.com> Tue, 13 July 2010 13:35 UTC
Hi group!

As you may know, I made some proposals about escape characters. Fortunately or unfortunately, the thread became too long for other people to look over. So I'd like to reorganize (and slightly modify) my proposals and ask for your opinions again.

Please let me know if I have missed something important from the previous thread (e.g. ignored your opinion), though I checked it carefully.

******
What the current draft (rev12) defines:

Backslash, semicolon, comma, and new line MUST be encoded in accordance with the following rules:
- backslash <-> \\
- semicolon <-> \;
- comma <-> \,
- new line <-> \n or \N

Currently, how a semicolon should be handled depends on which property contains it. According to Simon's description:

> The rule for comma, backslash, and newline are global and apply
> everywhere (see section 3.3). However, the rule for semicolon only
> applies to some properties which use the semicolon as separator. It is
> up to the people defining a particular X- property to decide whether
> they want to use the semicolon as separator.

******
Proposal 1 (new):

The one-to-one rules above MUST be applied to all properties, including X- properties, for uniformity between properties. In other words, semicolons MUST be escaped even when the property does not allow multiple values (like (0, 1)).

In this proposal, how readers must/should act when ';' appears unescaped in those properties is undefined. I don't think "undefined" is a good idea, but I cannot think of a better way to state it in a formal specification.

******
Proposal 2 (new):

Add one additional one-to-one mapping.
- \t <-> TAB

Reason:
- This convention is used to encode ordinary text, not in vCard but in other text handlers (starting from the C language). I thought it might be better to add it; vCard looks "exceptional" without this rule.
- I think white space should be treated carefully, and \t is a typical and important case that deserves special care.
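
To make the writer side concrete, here is a minimal Python sketch of the rev12 mappings with Proposal 1 (semicolons are always escaped) and, optionally, Proposal 2 (TAB). The function name escape_value and the escape_tab flag are invented for this illustration and do not come from the draft. Line folding and CRLF normalization are out of scope here.

```python
# Hypothetical helper for illustration only; names are not from the draft.

# One-to-one mappings from rev12. Under Proposal 1 the semicolon entry is
# applied to every property, not only to those that use ';' as a separator.
BASE_ESCAPES = {
    "\\": "\\\\",  # backslash -> \\
    ";": "\\;",    # semicolon -> \;
    ",": "\\,",    # comma     -> \,
    "\n": "\\n",   # new line  -> \n
}


def escape_value(text, escape_tab=False):
    """Escape a property value character by character.

    escape_tab=True additionally applies the TAB <-> \\t mapping
    suggested in Proposal 2.
    """
    table = dict(BASE_ESCAPES)
    if escape_tab:
        table["\t"] = "\\t"
    # Mapping each character exactly once avoids escaping a backslash
    # that was itself produced by an earlier replacement.
    return "".join(table.get(ch, ch) for ch in text)
```

For example, escape_value("Doe;John") yields the eight characters Doe\;John regardless of which property the value belongs to.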

I'd say that I have rarely seen TAB in actual vCards. This proposal is just for keeping vCard 4.0 consistent with the escaping rules used in other systems (like programming languages).
# For example, see http://www.python.org/dev/peps/pep-3138/

******
Proposal 3: (currently up to the group's decision)

\uNNNN <-> (a Unicode character with charcode 0xNNNN)
\U00NNNNNN <-> (a Unicode character with charcode 0xNNNNNN, where 0xNNNNNN SHOULD be more than 0x10000)

The proposal above is mainly based on http://www.python.org/dev/peps/pep-3138/

I modified this proposal a bit (from "\x" to "\u", "\U").

Reason: Python, JavaScript and some other programming languages use the \u and \U formats for encoding Unicode, not \x (which is usually for 8 bits). This does not apply to Perl, where \x{NNNN} and 0xNNNN seem to be used. I'm not sure about Ruby. What I can tell is that programming languages actually support this kind of escaping rule. Theoretically we don't need to consider surrogate pairs as Python does, but I think convention and uniformity are important from a practical viewpoint.

Yea: 1 (only me)
Reason:
- Other actual implementations using Unicode (Java, Python, icu4c) support this format.
- I prefer actual usability to theoretical simplicity, while I agree that "theoretically" this spec is not needed.
  -- My experience handling Unicode with surrogate pairs tells me that this mapping is practically useful.
- I don't think \xNN is needed in vCard, as we don't need to take care of the readability of 8-bit data. We just need b encoding for it.

Nay: 4 (Simon, Cyrus, Julian, and Barry)

Simon's reply:
> I disagree again. If you are using a given character in a sentence,
> whether it is visible or not, it is because you intend the recipient to
> read it. Otherwise, the character would not be useful and would not be
> present.
> For example, in this paragraph I used many spaces which are
> invisible and I don't think we would gain anything by replacing them
> with \x20 in a vCard. We are encoding user-readable text in vCard, not
> random bits.

Barry's reply:
> a single wire format is sufficient; display of non-ASCII characters is an orthogonal issue

******
Proposal 4 (new):

When a vCard entry contains escape sequences that are not defined in the vCard 4.0 spec, readers SHOULD just remove the backslash and keep the wrongly escaped character as is. (This is "SHOULD" rather than "MUST" because robust vCard readers may have to cope with wrong input produced by other composers.)

e.g.
This is \a pen. -> This is a pen.
(Readers SHOULD NOT interpret \a as an alert, as the C language does, but as just the character 'a'.)

Julian's reply related to this proposal:
> on \ escaping: either allow \ in front of any character, or be *very* clear that using it when it's not needed makes the vCard invalid (test cases? validator service?)

I suppose just requesting readers to remove the invalid backslash would suffice.

I agree that my proposals depend rather heavily on practical viewpoints. Feel free to correct me if they have problems (from the viewpoint of the IETF way or something else I'm not familiar with).

Thanks,

--
Daisuke Miyakawa (宮川大輔)
d.miyakawa@gmail.com
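
For illustration, the reader-side behavior described in Proposals 2 to 4 could look roughly like the Python sketch below. The name unescape_value is invented here, and the handling of out-of-range \U values is an assumption (the proposal does not cover that case); the sketch treats them like any other undefined escape. Proposal 3 did not have group consensus, so its \u and \U branches are shown only to make the mapping concrete. Surrogate pairs written as two \uNNNN escapes are not combined.

```python
import re

# Escapes with a defined meaning: rev12 rules plus \t from Proposal 2.
SIMPLE_ESCAPES = {"\\": "\\", ";": ";", ",": ",", "n": "\n", "N": "\n", "t": "\t"}

# A backslash followed by \uNNNN, \UNNNNNNNN, or any single character.
_ESCAPE_RE = re.compile(
    r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL
)

MAX_CODE_POINT = 0x10FFFF


def unescape_value(text):
    """Decode an escaped property value (hypothetical helper)."""

    def replace(match):
        hex_digits = match.group(1) or match.group(2)
        if hex_digits is not None:
            code = int(hex_digits, 16)
            if code <= MAX_CODE_POINT:
                return chr(code)  # Proposal 3: \uNNNN / \U00NNNNNN
            # Assumption: an out-of-range value is handled like an
            # undefined escape, so only the backslash is dropped.
            return match.group(0)[1:]
        ch = match.group(3)
        # Proposal 4: for an undefined escape such as \a, drop the
        # backslash and keep the character itself.
        return SIMPLE_ESCAPES.get(ch, ch)

    return _ESCAPE_RE.sub(replace, text)
```

With this sketch, the input This is \a pen. decodes to This is a pen., matching the example in Proposal 4. A lone backslash at the very end of the input has nothing after it to match and is left unchanged.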