Re: [VCARDDAV] Proposal around escape character handling (2nd round)

Cyrus Daboo <cyrus@daboo.name> Tue, 13 July 2010 14:59 UTC

Date: Tue, 13 Jul 2010 10:59:42 -0400
From: Cyrus Daboo <cyrus@daboo.name>
To: Daisuke Miyakawa <d.miyakawa@gmail.com>, vcarddav@ietf.org
Message-ID: <36FF8BB750F3694C0400EAC1@caldav.corp.apple.com>
In-Reply-To: <AANLkTilx6XgI2iosuKf5zmHnLggkmYe4EeeN-PijvI5K@mail.gmail.com>
References: <AANLkTilx6XgI2iosuKf5zmHnLggkmYe4EeeN-PijvI5K@mail.gmail.com>

Hi Daisuke,

--On July 13, 2010 10:35:32 PM +0900 Daisuke Miyakawa <d.miyakawa@gmail.com> wrote:

> ****** Proposal 1 (new):
> The one-to-one rules above MUST be applied to "all" the properties, even
> including X- properties, for uniformity between properties.
> In other words, semicolons MUST be escaped even when the property does
> not allow multiple values (like (0, 1)).
>
>
> In this proposal, how readers must/should act when ';' is given without
> escape in those properties is undefined. I don't think "undefined" is a
> good idea,
> but I cannot think of a better way to state it in a formal
> specification.

A ';' is used as a separator for "compound" values, not multiple values 
(i.e. the "cardinality" (0, 1) has no bearing on that). In compound values 
it is vitally important to distinguish the field delimiters (';') from 
ordinary text occurrences of the delimiter character ('\;').

So there is a real problem here. For example, if in the future I define a 
new property FOO that uses a ;-delimited compound value, then I want to be 
sure that clients not aware of this property will properly "round-trip" it. 
So if I send such a client this:

FOO:delimited;text\;string

I want to be sure that when it parses and re-generates the vCard, the exact 
same value (octet-by-octet) comes back. In that case the client must not 
unescape; if it did, it would generate:

FOO:delimited;text;string

Alternatively, if the client left the \; as-is and then re-generated it, it 
could end up with:

FOO:delimited;text\\;string

and that would be wrong! Now, a smart client might spot the use of \; in the 
original value, "mark" the value as compound, and then apply compound 
text generation rules when re-generating it (in which case it would 
hopefully generate the correct result). However, I am not sure we can rely 
on that; if we want to, then we need text clearly explaining what has to 
happen. Failing that, there is no way to define a new property that uses a 
compound TEXT value and ensure that it is backwards compatible (short of 
using something other than ; and , as delimiters).
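A rough Python sketch of the three outcomes above. The function names are made up for illustration; the two "naive" clients model the scenarios described rather than any real implementation, and the compound-aware parser is lenient (any other escaped character is taken literally). The compound-aware round trip is exact for this FOO value, but not octet-exact for every input (e.g. \N would come back as \n).

```python
def parse_fields(raw: str) -> list[str]:
    """Compound-aware parse: split on unescaped ';', then unescape each field.

    Lenient sketch: an unrecognized escape such as \\x is read as 'x'.
    """
    fields: list[str] = []
    buf: list[str] = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            buf.append("\n" if nxt in "nN" else nxt)
            i += 2
        elif ch == ";":
            fields.append("".join(buf))  # unescaped ';' ends a field
            buf = []
            i += 1
        else:
            buf.append(ch)
            i += 1
    fields.append("".join(buf))
    return fields


def generate_fields(fields: list[str]) -> str:
    """Compound-aware generation: escape each field, then join with ';'."""
    def esc(s: str) -> str:
        # Backslash must be escaped first so later escapes are not doubled.
        return (s.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n"))
    return ";".join(esc(f) for f in fields)


def unescape_as_plain_text(raw: str) -> str:
    """Client unaware of FOO: unescapes and writes the result back as-is."""
    return raw.replace("\\;", ";")


def keep_raw_then_escape_backslashes(raw: str) -> str:
    """Client keeps the escape as literal text, then escapes backslashes."""
    return raw.replace("\\", "\\\\")


original = r"delimited;text\;string"
print("FOO:" + unescape_as_plain_text(original))            # FOO:delimited;text;string
print("FOO:" + keep_raw_then_escape_backslashes(original))  # FOO:delimited;text\\;string
print("FOO:" + generate_fields(parse_fields(original)))     # FOO:delimited;text\;string
```

Scanning character by character, rather than splitting with a "not preceded by a backslash" regex, is what lets a value such as a\\;b (an escaped backslash followed by a real delimiter) split correctly.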

Given all of that, I think we need text stating that new properties 
(registered, X-, vendor, etc.) MUST NOT use COMMA or SEMI-COLON text 
delimiting, as existing clients may not round-trip them. If a "structured" 
value is required, then a different delimiter has to be used. We may want 
to pick a specific character for that for IANA-registered properties, if we 
care.

> ****** Proposal 2 (new):
> Add one additional one-to-one mapping.
> - \t <-> TAB
>
>
> Reason:
> - This convention is used to encode ordinary text, not in vCard but
> in other text handlers (descended from C), so I thought it might
> be better to add this
> -- I felt vCard looks "exceptional" without this rule.
> - I think white space should be treated carefully, and \t is a typical and
> important case for us to take care of specially.
>
>
> I'd say that I have rarely seen TAB in actual
> vCards.
> This proposal is just for keeping consistency between vCard 4.0 and the
> escaping rules used in other systems (like programming languages).
># For example, see http://www.python.org/dev/peps/pep-3138/

Please see my comments on your proposal #4.

Also, if we adopt this, it needs to be absolutely clear that a TAB 
character used for line folding MUST NOT be encoded, i.e. this is 
definitely not allowed:

DESCRIPTION:Long line of text that is folded
\tright here
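A minimal sketch of the required ordering, assuming the RFC 6350-style folding rule (a CRLF followed by a single SPACE or HTAB is removed on unfolding) and the proposed, not yet agreed, \t escape: writers escape first and then fold with a raw TAB; readers unfold first and then unescape. The helper names are hypothetical.

```python
import re

# A fold is CRLF followed by exactly one SPACE or HTAB; unfolding deletes all three.
_FOLD = re.compile(r"\r\n[ \t]")


def unfold(text: str) -> str:
    """Undo line folding. Readers must do this before any unescaping."""
    return _FOLD.sub("", text)


def escape_value(value: str) -> str:
    """Escape a TEXT value, including the *proposed* \\t for a content TAB."""
    return (value.replace("\\", "\\\\").replace(";", "\\;")
                 .replace(",", "\\,").replace("\n", "\\n")
                 .replace("\t", "\\t"))


def fold(line: str, width: int = 75) -> str:
    """Fold an already-escaped line; the fold TAB is written raw, never as \\t.

    Counts characters rather than octets, which is a simplification.
    """
    if width < 2:
        raise ValueError("width must be at least 2")
    parts = [line[:width]]
    rest = line[width:]
    while rest:
        parts.append(rest[:width - 1])  # leave room for the leading TAB
        rest = rest[width - 1:]
    return "\r\n\t".join(parts)


line = "DESCRIPTION:" + escape_value("Long line of text\twith a TAB in it")
folded = fold(line, width=20)
assert unfold(folded) == line  # the TAB inside the value survives as \t

# The encoding ruled out above: the fold TAB itself written as "\t".
bad = folded.replace("\r\n\t", "\r\n\\t")
assert unfold(bad) != line  # no longer a fold, so the lines do not rejoin
```

Only the TAB that is part of the value is escaped; the TAB introduced by folding stays a raw HTAB, so unfolding can still recognize it.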

> ****** Proposal 3: (currently up to the group's decision)
> \uNNNN <-> (a Unicode character with charcode 0xNNNN)
> \U00NNNNNN <-> (a Unicode character with charcode 0xNNNNNN, where
> 0xNNNNNN SHOULD be more than 0x10000)
>
>
> The proposal above is mainly based on
> http://www.python.org/dev/peps/pep-3138/
> I modified this proposal a bit (from "\x" to "\u", "\U").

I will reiterate my opposition to this. I think it adds too much extra 
complexity to parsing. We have support for UTF-8; let's leave it at that. 
I worry that if we allow this, someone else will turn around and ask us to 
support &amp; style entities too!
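To make the parsing cost concrete, here is a hypothetical Python sketch of what a Proposal 3 reader would need (none of this is part of any vCard spec). The new escapes have to share a single left-to-right scan with \\ and \;, otherwise an escaped backslash followed by "u0041" is misread as \u0041, and every code point needs range checks that plain UTF-8 text does not.

```python
import re

# Hypothetical reader for Proposal 3 (NOT part of vCard 4.0). All escapes are
# matched in one left-to-right scan, so "\\u0041" (escaped backslash + text)
# is not mistaken for \u0041. A lone trailing backslash is left as-is here.
_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL)
_SIMPLE = {"\\": "\\", ";": ";", ",": ",", "n": "\n", "N": "\n"}


def unescape_with_proposal3(text: str) -> str:
    def replace(m):
        hex_digits = m.group(1) or m.group(2)
        if hex_digits is None:
            ch = m.group(3)
            if ch not in _SIMPLE:
                raise ValueError(f"undefined escape \\{ch}")
            return _SIMPLE[ch]
        cp = int(hex_digits, 16)
        # Extra validation that plain UTF-8 input never needs:
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point U+{cp:04X}")
        return chr(cp)

    return _ESCAPE.sub(replace, text)
```

With UTF-8 the reader simply decodes the octets; everything in this function is additional surface area for bugs and interoperability differences.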


> ****** Proposal 4 (new):
> When a vCard entry happens to contain escape sequences undefined in the
> vCard 4.0 spec, readers SHOULD just remove the backslash and keep the
> wrongly escaped character as is.
> (This is "SHOULD" rather than "MUST", because astute vCard readers may
> have to cope with malformed input produced by other composers.)
>
>
> e.g. This is \a pen. -> This is a pen.
> (Readers SHOULD NOT understand \a as an alert (like C-language requires)
> but just a character 'a').

Doing this would be at odds with your proposals #2 and #3. If we had 
allowed clients to do this in the past, then a client could have 
legitimately generated "\t" and expected that to be mapped to "t" (the same 
goes for "\u", "\x", and "\U"). That would prevent us from extending the \ 
escape characters to have \t -> TAB. So, since I think we should leave our 
options open for extending \ escaping in the future, I believe we should 
not legitimize the use of \ escaping for characters the spec does not 
explicitly allow.

The spec is very clear right now:

"In all other cases, escaping MUST NOT be used."

I think that meets Julian's point about being clear on what is allowed and 
what is not.
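A small sketch of the difference, assuming the allowed escapes are \\, \;, \, and \n/\N (the set RFC 6350 ended up with). The function and parameter names are hypothetical. A lenient (Proposal 4) reader maps \t to "t" today; if \t -> TAB were added later, the same octets would decode differently, which is the conflict described above.

```python
# Escapes assumed allowed (the set RFC 6350 ended up with); anything else is an error.
ALLOWED = {"\\": "\\", ";": ";", ",": ",", "n": "\n", "N": "\n"}


def unescape(value: str, strict: bool = True, allowed: dict = ALLOWED) -> str:
    """strict=True follows "In all other cases, escaping MUST NOT be used";
    strict=False is Proposal 4 (drop the backslash, keep the character)."""
    out: list[str] = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(value):
            raise ValueError("dangling backslash")
        nxt = value[i + 1]
        if nxt in allowed:
            out.append(allowed[nxt])
        elif strict:
            raise ValueError(f"escaping not allowed for {nxt!r}")
        else:
            out.append(nxt)  # Proposal 4: silently drop the backslash
        i += 2
    return "".join(out)


# Lenient reading today: "\t" means "t".
today = unescape(r"a\tb", strict=False)
# If \t -> TAB were added later, the same octets would mean something else.
extended = unescape(r"a\tb", allowed={**ALLOWED, "t": "\t"})
print(repr(today), repr(extended))  # 'atb' 'a\tb'
```

A strict reader rejects r"This is \a pen." outright, which keeps every undefined escape available for a future revision to define.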

-- 
Cyrus Daboo