[VCARDDAV] Questions, Concerns, and Errata concerning vcardrev-13

Rohit Khare <Rohit@Khare.org> Tue, 12 October 2010 06:44 UTC

The very least I owe our colleagues here on the list is a detailed  
reading of the current draft-ietf-vcarddav-vcardrev-13.txt

I wanted to share some fairly low-level notes I took as I read through  
the spec. Some are errata, which may have already been posted to a  
wiki page [1]. Others are observations that, while frank, cranky, or  
(hopefully) humorous, are intended to reflect what an outsider might  
question when their boss first assigns him or her to “add vCard4 to  
our product by next week!” and starts reading from page 1 — without  
any of the WG history or archives at hand.

At a high level, I don’t believe there’s such a beast as a “file
format that humans aren’t exposed to.” Short of taking extreme measures
to prevent comprehensibility (e.g. ASN.1 BER), developers, advanced  
users, and administrators who have to put such things in envelopes  
(e.g. DB fields) all have to encounter and puzzle out the meaning of  
fields and work around bugs. Few will be English speakers, so I’m not  
appealing to semantically workable meanings for words, but many will  
be cutting and pasting instructions from the Interwebs to get the job  
done, and they deserve our sympathy — not our scorn.

Finally, I don’t like many of my own ‘editorial recommendations,’ but  
I wanted to respect the spirit of the group’s request for ‘alternate  
text’ by October 11. I am solely responsible for the text that  
follows, which means I’m glad to take the blame and even happier to  
share the credit :)

— Rohit


[1] https://wiki.mozilla.org/VCard4#draft_13_section_by_section_review


* case-insensitivity (page 7). Raised a red flag for me about the XML  
mapping I knew was coming up in association with this spec, since the  
XML will necessarily be case-sensitive. Is there evidence of case- 
mixing being interoperable at present, in the wild?

Editorial recommendation: If the data supports this, recommend that  
implementers prefer one case style as canonical, whether or not they  
are “liberal in what they accept”
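
To make that recommendation concrete, here is a minimal Python sketch of "accept any case, emit one case". It assumes a simplified content line with no group prefix and no quoted parameter values; the function name is hypothetical and nothing here comes from the draft itself.

```python
def canonicalize_name(content_line):
    """Upper-case the property name and parameter names; leave values alone."""
    # Everything before the first ":" is the name plus parameters.
    head, sep, value = content_line.partition(":")
    name, *params = head.split(";")
    canon = [name.upper()]
    for param in params:
        key, eq, val = param.partition("=")
        # Only the parameter name is folded; the parameter value is kept as-is.
        canon.append(key.upper() + eq + val)
    return ";".join(canon) + sep + value
```

A reader would call this once on input, so that everything downstream (including any XML mapping) only ever sees one spelling of each name.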

* The group concept arrives early on (page 9). Only much later would  
the revision history clue me in that it’s been a controversial idea  
that’s made it in and out of the drafts.

In any case, the “group.” prefix came out of nowhere as I was reading  
the syntax, without reference to its roots in the HOME/WORK  
distinction. Whenever I introduce a new degree of freedom, I prefer to  
ground it with specific, evocative examples as soon as possible. At  
this point, even after reviewing the whole doc, I’m unclear on why it  
exists — are there other cultures that have lots of evidence of a  
taxonomy with higher valence than Home/Work? Are implementations  
incorrectly assuming the group prefix correlates to a user-facing  
label (fields should never determine display, imo)?

Editorial recommendation: a better explanation inline, or at least a  
specific forward reference. Instead, the early arrival of such a weak  
claim (SHOULD/MAY) reduced my confidence that this would be an  
interoperable spec.
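
For what it is worth, this is the kind of grounding example I was hoping to find inline. It is only a sketch in Python: the group names "home" and "work" and both helper names are made up for illustration, and the parsing assumes no quoted parameter values.

```python
from collections import defaultdict

def split_group(content_line):
    """Return (group, NAME, value); group is None when there is no prefix."""
    head, _, value = content_line.partition(":")
    name_part = head.split(";", 1)[0]          # drop any parameters
    group, _, name = name_part.rpartition(".")
    return (group or None, name.upper(), value)

def collect_groups(lines):
    """Map each group prefix to the (NAME, value) pairs that share it."""
    groups = defaultdict(list)
    for line in lines:
        group, name, value = split_group(line)
        groups[group].append((name, value))
    return dict(groups)

# Hypothetical card: the prefix ties a phone number to an address.
card = [
    "FN:Jane Doe",
    "home.TEL:+1-555-0100",
    "home.ADR:;;1 Elm Street;Anytown;;;",
    "work.TEL:+1-555-0199",
]
```

Here `collect_groups(card)` keeps the "home" phone and address together, which is the only thing the prefix expresses; it says nothing about what label a UI should show.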

* “space-saving reasons” (page 8). Caught my eye, because it  
contradicted the (debatable) point I’ve heard repeatedly of late, that  
vCard is not a human-readable format. In which case, saving space is  
ZIP’s job (or some other content-transfer-encoding), not a legitimate  
reason to make any tradeoff that makes writing a parser harder than it  
should be…

* … specifically because it makes comma and semicolon handling  
worryingly complex (§3.3). My spidey-senses start tingling when I have  
to scan over special handling for escaping. I always hope for  
something so mandatory that I can pre-process my input with a regex,  
but I usually don’t get that moment of satori :) Short of that, I  
became concerned that if I’m going to be ABNF-driven, that’s fine, but  
having input processing rules that aren’t context-sensitive seemed  
surprising (every comma, everywhere).

Editorial recommendation: MUST always unescape all input (obviously);  
MUST escape commas in fields that repeat; SHOULD escape commas in  
fields that don’t — in the end, implementers must be prepared to read  
in all commas permitted by the ABNF in §3.2 (the actual ABNF, not the  
warning text about “some value types” at the end — otherwise, exclude  
it from SAFE-CHAR), and not those separately forbidden by the side- 
agreement in §3.3.

Confidence: Low — I may have misunderstood the mechanics here.
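
To check my own understanding of the mechanics, here is a rough Python sketch of the read and write sides. It assumes the backslash escapes are limited to backslash, comma, semicolon, and newline (written as \n or \N); the function names are hypothetical, and this is not a claim about what the ABNF strictly requires.

```python
import re

def split_unescaped(value, sep=","):
    """Split on `sep` unless it is part of a backslash escape pair."""
    parts, current, i = [], [], 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            current.append(value[i:i + 2])  # keep the escape pair intact
            i += 2
        elif ch == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts

_UNESCAPE = {"\\\\": "\\", "\\,": ",", "\\;": ";", "\\n": "\n", "\\N": "\n"}

def unescape(text):
    """Undo the escapes in one already-split component."""
    return re.sub(r"\\[\\,;nN]", lambda m: _UNESCAPE[m.group(0)], text)

def escape(text):
    """Escape every backslash, comma, semicolon, and newline on output."""
    return (text.replace("\\", "\\\\").replace(",", "\\,")
                .replace(";", "\\;").replace("\n", "\\n"))
```

The order matters on input: split first, then unescape each piece. A writer that always calls `escape` never has to ask whether the field happens to be one of the repeating ones.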

* 4.2 - data: URIs are very handy here (RFC 2397). Other URL schemes  
would also be helpful, and I recognize that can apply by virtue of  
6.2.4. Data uris also support BASE64 (which is what Apple Address Book  
already preferred http://markmail.org/message/s6rcxdh6u5ylqtm4 )

Editorial consideration: Since data: uris specify the MIME Types of  
their payload, it may be worth illustrating this in the reference so  
that implementers are reminded of the potential of multiple,  
conflicting image types, and to (at least) prefer the one found in the  
data: uri.

In general, though, I suspect FMTTYPE should not have any normative
meaning for a URI. The MIME Type specified when retrieving a  
representation of that resource, whether over the network or by  
decoding data: uris, is controlling.

Editorial recommendation: remove fmttype-param from this rule:

      refer-param = "VALUE=uri" / fmttype-param
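
As an illustration of why the type inside the URI can be treated as controlling, here is a small Python sketch that pulls the media type and payload out of an RFC 2397 data: URI. The function name is hypothetical, and the sketch does not validate the media type syntax.

```python
import base64
from urllib.parse import unquote_to_bytes

def parse_data_uri(uri):
    """Return (media_type, payload_bytes) for an RFC 2397 data: URI."""
    if not uri.lower().startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, data = uri[5:].partition(",")
    if not sep:
        raise ValueError("missing ',' separator")
    is_base64 = header.lower().endswith(";base64")
    if is_base64:
        header = header[: -len(";base64")]
    # RFC 2397 default when no media type is given.
    media_type = header or "text/plain;charset=US-ASCII"
    payload = base64.b64decode(data) if is_base64 else unquote_to_bytes(data)
    return media_type, payload
```

A consumer that sees a conflicting FMTTYPE next to such a URI already has the answer in hand: the first element of the returned pair.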

* 4.4 boolean is misspelled (third time it occurs on the first line).  
Also, I don’t understand why so much case conversion is permitted  
here? Are there lots of examples of interoperability failures, or can  
we just remind authors the ABNF is the controlling legal language (all  
CAPS per page 10 - not at all “case insensitive”!). They may accept
“TrUE” at their own risk, but should never generate it.

Editorial recommendation: mandate all-caps case.

* 4.6 FLOAT does not permit scientific notation. Is that worth warning  
implementers, who may just use their favorite language’s %f formatting?
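
A concrete case of the trap, sketched in Python: the language's default float-to-string conversion switches to exponent notation for very small or very large values, so a writer has to ask for plain digits explicitly. The helper name is hypothetical.

```python
import math
from decimal import Decimal

def format_plain_float(x):
    """Render a finite float as plain decimal digits, never with an exponent."""
    if not math.isfinite(x):
        raise ValueError("only finite values can be written")
    # repr() gives the shortest string that round-trips; Decimal's "f"
    # format then expands any exponent into ordinary digits.
    return format(Decimal(repr(x)), "f")
```

By contrast, `str(1e-07)` in Python yields a string containing "e", which a parser that follows the FLOAT rule would have to reject.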

* 5.0 (top of page 17) Implementers MUST ignore undocumented or  
private use parameters, but MAY ignore group prefixes (3.2). Given the  
risk of merge loops, especially when a PID is specified, dropping  
fields seems worse than round-tripping intact. I wonder why that was  
ruled out? Evidence of propagation failures in the wild might help  
decide this issue.

* Case-insensitivity arises throughout the doc, again in 5.2 (‘b’ and  
‘B’). As an implementer, I wish I didn’t have to read the text to  
discover gotchas: either the ABNF should be mixed-case, or it should declare
outright that uppercase is preferred and other casings MUST NOT be propagated.

* 5.3 The motivational scenario for non-vCard-aware consumers of vCard  
data seems quite made-up. For that matter, the hypothetical search  
engine just got some witness killed because it indexed the location as  
TEXT but didn’t obey CLASS ;) And what’s with the sudden preference  
for lower-case symbols? I was half expecting to be told elsewhere in  
the doc that TYPEs would be case-insensitive too.

* PREF reminds me of content-negotiation in HTTP. It was a beautiful  
idea that was convoluted too far to really ever work. The similarities  
begin with the arbitrary 0-100 scale (oops! 1-100, see? And 1 > 100,
don’t forget). This kind of scaling jiggery-pokery is unlikely to
interoperate, much less compose (concatenating records from multiple
sources). Where is the actual reference to a running system with that
much subtlety of expression?

Editorial Recommendation: By contrast, I can bet you that users care  
immensely about the *exact order* that fields were typed in. Why not  
make this ORDER, if we don’t want the original sequential occurrence  
of lines in the file to be controlling law? If anything, I’d wish we  
ruled out the use of PREF precisely because the same four characters  
exist in RFC 2426 with a completely different meaning (and it’s a  
presence/absence token to boot, not a parameter)
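
To show how little the 1-100 scale buys over plain ordering, here is a Python sketch of what a consumer ends up doing with it. It assumes properties arrive as (value, pref) pairs with pref either an integer from 1 to 100 or None when the parameter is absent; that representation and the function name are mine, not the draft's.

```python
def order_by_pref(properties):
    """Sort (value, pref) pairs: lowest PREF first, unmarked entries last.

    sorted() is stable, so ties and unmarked entries keep the order in
    which they appeared in the file.
    """
    def key(prop):
        _value, pref = prop
        return (pref is None, pref if pref is not None else 0)
    return sorted(properties, key=key)
```

The only information that survives is a ranking, plus file order as the tie-breaker. Concatenating two cards that each use PREF=1 leaves the merged result ambiguous, which is the composition problem described above.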

* Aside: one new function that’s become integral to my use of an
‘address book’ in the decades since RFC 2426 is the ‘call log’. I can  
imagine that it might be a useful illustrative example for how to use  
private extensions and propose new IANA registries to show how a  
cellphone developer could extend vCard to round-trip their address  
book more completely…

* 5.5. Is ALTID a number or an entity tag or what? I mean, if every  
single example uses a digit, I was inclined to believe it was close to  
an ordinal position referent system, until I was rudely awakened at the
very last line of the section that it was text (and hence a string- 
equivalence index referent system). What do real implementations do  
when marshaling these graph structures?

* 5.7 “act like tags” is devoid of meaning. What definition or  
citation are we providing for this? And the cross-dependency with the  
setting of KIND to ‘individual’ is exactly the kind of English- 
language lawyering no implementer wants to read up on after getting  
the ABNF working. In general, I’d strike KIND entirely, until some  
braver working group wants to pursue “vThing” on its merits, rather
than glomming it onto the interchange format for a class of software
(address books) that has never aspired to catalog anything more general
than methods of addressing communications.

In addition, TYPE is one of the more vacuous/contentious four-letter  
strings in computing. Look no further than RFC 2426 where Type exists  
as a descriptive mixed-case string in every other section header,
because then it was being used the way ‘properties’ is in the current
spec.

“Multiple, different uses” is the brightest of bright red flags that a
spec will interoperate poorly in the field. I already have nightmarish  
visions of having to compile truth tables into my validation code that  
correspond to the listings in this section. To say nothing of making  
future extensibility a nightmare to maintain, since I’d have to find  
and recompile all of those lookup tables when a new property comes  
along…

* 5.7 also contains a typo: FIBURL. Unless that’s a new way to lie to  
your boss that you aren’t going to be available for that meeting —  
ever :)

* 5.8 c’mon, really? You’re going to justify *@&# like DEATH on  
genealogical bases, and I can’t even create a stock vCard for Jesus H.  
Christ? Where’s B.C.E.? And next I suppose I’ll be told I can’t use
http://en.wikipedia.org/wiki/French_Republican_Calendar for cataloging
notable guillotinings. Not to mention offending followers of
http://en.wikipedia.org/wiki/Hebrew_calendar ,
http://en.wikipedia.org/wiki/Japanese_era_name , or
http://en.wikipedia.org/wiki/Stardate . And don’t forget TIME while
you’re at it: http://en.wikipedia.org/wiki/Swatch_Internet_Time

Punting by saying, “well, there’s only one true defined meaning, but  
you can plug in anything else you want as x- or ask IANA” is not a  
decision that supports diversity — it’s a decision to favor exactly  
one point of view.

This kind of Eurocentric dead-white-male crap is inexcusable in my  
opinion. Either there’s a legitimate degree of freedom required here,  
with documented, interoperable use cases, or it’s just useless — no,  
scratch that, *offensive* - preening.

Now, of course, it happens to be that every single use of *address  
book* software that I’ve been exposed to uses Gregorian dates. Every  
other CALSCALE use that might make sense for people, organizations, or  
things in general is appealing to the existence of software yet  
unwritten — and if you can use it to create a concordance of Star Trek  
characters, I’ll bet it won’t be called an “address book” or a  
“contact card” either.

* 5.10 should cite RFC 5870 http://geouri.org/ as the recommended URI  
scheme, or a forward reference to 6.5.2. Chasing this down was the  
first time I became aware that “GEO” is *both* a parameter and a  
property, but I suppose that ship has sailed. No implementer should be  
burdened by caring about which word-beginning-with-p to use…
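
For reference, a geo: URI is simple enough to parse in a few lines. This Python sketch assumes the basic RFC 5870 shape of latitude, longitude, optional altitude, then ";name=value" parameters; the function name is hypothetical and no range checking is done.

```python
def parse_geo_uri(uri):
    """Return (lat, lon, alt_or_None, params) for a geo: URI."""
    scheme, sep, rest = uri.partition(":")
    if scheme.lower() != "geo" or not sep:
        raise ValueError("not a geo: URI")
    coords, *raw_params = rest.split(";")
    numbers = [float(n) for n in coords.split(",")]
    if len(numbers) not in (2, 3):
        raise ValueError("expected lat,lon or lat,lon,alt")
    alt = numbers[2] if len(numbers) == 3 else None
    params = {}
    for raw in raw_params:
        key, _, val = raw.partition("=")
        params[key.lower()] = val   # parameter names are folded to lower case
    return numbers[0], numbers[1], alt, params
```

Whichever of the two GEOs a value shows up in, the same parser would apply if both simply carried this URI.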

* 5.12 solves only one narrow problem — was there no other way to  
indicate XML versions without this additional layer of out-of-band  
signaling? Of course, I’m in favor of dropping the XML property  
entirely — whether or not there’s ever a standard XML mapping of vCard  
itself.

Editorial recommendation: VERSION directly conflicts with the older  
description of §3.6.9 in RFC 2426 to boot, so I would drop it entirely.

===========
I stopped editing as closely at this point. It’s extremely late  
tonight on October 11. Here’s the fairly raw dump of the remaining  
bullet points in my scratchpad buffer. Many may be wrong, of course!
===========

* 5.13 doesn’t permit full MIME types? MIME types have parameters too,  
so might be worth noting. Or it might be too pedantic for words...

* 6.1.4 Q: where the hell does KIND come from? A: the impulse to model  
things no address book has ever modeled before.

* 6.2.3 I’ve seen nickname used to store screenname — there are a lot  
more custom services and social networks that need to dump member  
listings, and I can imagine more will go with the shortcut “username  
is like a kind of alias or nickname, right”

The 3rd example of nick=boss is especially perverse, since that’s a  
job title or role relationship — and, once again, a level of detail  
I’ve never imagined any real user typing into a contact system to  
remind themselves how to refer to a colleague, client, or relative.  
When examples lack credibility, specifications lack credibility.

* 6.2.4 PHOTO - bare subtypes? Why not just describe it as *incorrect*
(but found in the wild — you have to cull the herd at some point).  
Having a nearly-mandatory Encoding with a single possible value  
bothers me, but can’t easily be ruled out — at least we got the option  
of URIs for this field in return for that complexity.

* 6.2.5 calscale is just getting in the way again here. Why are we so  
sure of what it can modify if we have no examples of even a second  
type of calscale? I can make up hypothetical use cases that say even a  
bare calscale would be helpful, without any date or datetime at all…

* 6.2.10 is absence supposed to be inferred as 0 or 9? Because let me  
tell you, the politically freighted “not known” is not at all the same  
as “not included”. Properties ought to have default values, and the  
debate this one would provoke is worth watching.

Furthermore, it’s utterly unreasonable to use a scalar unit (integer)  
to represent a limited group — it’s a %d and will be used as such. So  
why not at least mandate a testable rule for interoperability by
claiming that unrecognized integers should survive round-trips? See
http://en.wikipedia.org/wiki/Third_gender

Interestingly, HR-XML mentions that gender specification may be  
illegal in certain contexts: see section 3 of
http://ns.hr-xml.org/2_5/HR-XML-2_5/CPO/GuidelinesForISOUtilities.html

(and yes, I did find
http://www.ietf.org/mail-archive/web/vcarddav/current/msg00997.html
and the thread at
http://lists.w3.org/Archives/Public/public-contacts-coord/2010JulSep/0010.html )

Too bad this is only in German:
http://translate.google.com/translate?hl=en&sl=de&u=http://de.wikipedia.org/wiki/Datenstandards_zur_Beschreibung_des_Geschlechts&ei=bvqzTPfZFoy2vQPhv7mbCg&sa=X&oi=translate&ct=result&resnum=1&ved=0CBYQ7gEwAA&prev=/search%3Fq%3D%2522ISO%2B5218%2522%2B%252Bintersex%26hl%3Den%26safe%3Doff%26client%3Dsafari%26rls%3Den%26prmd%3Div
It’s the best reference I could find on the topic, and it shows that ISO
5218 was extended by the central cancer registry (NACCR) with 3/Other
(hermaphrodite) and 4/Transsexual. There’s another half-dozen  
standards in there. Not to mention a telling point that the UK  
Government Data Standards includes a birth-sex and a current-sex field…
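
The round-trip rule suggested above for 6.2.10 is easy to state as code. In this Python sketch the four labels are the ISO 5218 codes; the function names are hypothetical, and treating a missing property as None (rather than silently as 0 or 9) is my assumption, not something the draft says.

```python
ISO_5218 = {0: "not known", 1: "male", 2: "female", 9: "not applicable"}

def read_code(raw):
    """Return (code, label), or None when the property is absent.

    Unrecognized integers are kept, with label None, instead of being
    dropped or coerced to a known value.
    """
    if raw is None:
        return None
    code = int(raw)
    return code, ISO_5218.get(code)

def write_code(code):
    """Write back exactly the integer that was read."""
    return str(code)
```

With this shape a card carrying one of the extended values (3 or 4 in the registry mentioned above) passes through an unaware implementation unchanged, and "absent" stays distinguishable from "not known".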

* 6.3.1 positional notation isn’t that much better than CSV, is it  
really? Even if it is mandated by some other standard (and a citation
at this point in the doc would be helpful), why not make the ABNF more
developer-friendly by actually marking tokens, in order, for each of  
the levels. I don’t want to see devs counting ;s…
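
Here is what naming the positions might look like from the developer's side, as a Python sketch. It assumes the seven-component address order familiar from RFC 2426 (post office box, extended address, street, locality, region, postal code, country) and ignores escaped semicolons; the field names are my own labels.

```python
ADR_FIELDS = (
    "po_box", "extended", "street",
    "locality", "region", "postal_code", "country",
)

def parse_adr(value):
    """Turn a positional ADR value into a dict keyed by component name."""
    parts = value.split(";")          # assumes no escaped semicolons
    if len(parts) > len(ADR_FIELDS):
        raise ValueError("too many components")
    # Pad short values so every name is present.
    parts += [""] * (len(ADR_FIELDS) - len(parts))
    return dict(zip(ADR_FIELDS, parts))
```

With the names written down once, nobody has to remember that the locality is the fourth field, counting from one.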

* 6.3.2 I never thought much about a pragma (escape hatch) just to  
print a preformatted label. Seems far from DRY, but not worth fighting.

* 6.4.1 Editorial suggestion: make a table? it’s a wall of text and as  
a dev, I’d appreciate the Least You Have To Know (TM).

Also, what happened to Telex?:)
http://www.cio.com/article/598628/10_Technologies_That_Should_Be_Extinct_but_Aren_t_
http://www.economist.com/node/10609367 (telex: a faint ping; also at
http://edwardlucas.blogspot.com/2008/02/telex-lives-on.html )
Telex is still used for ships at sea; this company was advertising on
a Google search for the term today: http://plainsailing.org/telex.html
And for the final entertainment from the time capsule: how hip ad
firms use MCI Mail to send telexes! A 1988 story on “THE EXECUTIVE
COMPUTER; Sending a Telex From Your Desk”:
http://select.nytimes.com/gst/abstract.html?res=FB0710FE3F550C778DDDA10894D0484D81

* 6.4.2 interaction with IDNs? non-Internet email addrs? Twitter  
handles? PEOPLE WILL PUT WHATEVER THEY NEED TO IN THOSE FIELDS — even  
deadlines shoved into anniversary b/c they need the client name to pop  
up on their calendar in 6 weeks to sell again. so be aware of that  
whenever we finally mature to a standard that’s expected to  
interoperate in the real-world, not the lab.

* 6.5.1 Typo: “dailight”

* 6.5.2 We’re not shy about mandating formats for values that allow  
actual interoperability; I wish I knew from the doc (c.f. mabbet) why  
there’s a MAY here for other formats. Perhaps we can have vCards for  
stars, when someone invents a celestial: scheme for right ascension
and declination :)

* I found 6.6.2 sufficiently confusing I had to consult the X.520  
text. I found a freely-available copy at
http://www.itu.int/rec/dologin.asp?lang=e&id=T-REC-X.520-200508-I!!PDF-E&type=items ,
http://www.x500standard.com/index.php?n=Ig.LatestAvail that confused
me further until I hit this phrase: “people sharing the same  
occupation.” — that is, in colloquial English, Role == “occupation” or  
perhaps “trade”/“profession”. But anyway, I’ll drop it — that this  
field even exists is a sign of how much Versit, in turn, signed over  
decisionmaking to ITU… and we don’t have any statistical data in the  
wild about how it’s even used, and by whom.

* 6.6.6. is aptly numbered: what the heck was going on when the spec
wandered into the territory of social applications without a single
citation to an authority about this oddly-curated collection of
potential relationships?

At least XFN can point to an ‘installed base’ of tens of millions of  
example social graphs in the wild and a field-tested universe of link  
relations. “private extensions may be used” is no excuse for a lack of  
discretion. What language even has “supervisee”?
You have to escalate from the free Merriam-Webster Online Dictionary
because it’s “only available in our premium Merriam-Webster
Unabridged Dictionary.”

* 6.7.1 Comparing it to §3.6.1 in RFC 2426, I think it’s nifty to drag  
the field into the 21st century by recasting it as “tags” - but if
we really do mean to accommodate tags, in the sense of the rel=tag  
microformat, then the values have to be URLs (the visible tag is the  
final base pathname component). So either the non-normative statement  
“Also known as "tags".” has some other defensible meaning, or it  
probably should avoid that four-letter word IMHO.

Since it’s a legacy field, I know we can’t change the spelling, but I  
also find it amusing that it’s the only plural field name in the  
entire spec…
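
To spell out the rel=tag reading mentioned above, where the visible tag is the final path component of a URL, here is a short Python sketch. The function name and the example.org URLs are placeholders, and only percent-decoding is handled.

```python
from urllib.parse import urlparse, unquote

def tag_from_url(url):
    """Return the last path segment of a tag URL, percent-decoded."""
    path = urlparse(url).path.rstrip("/")   # tolerate a trailing slash
    return unquote(path.rsplit("/", 1)[-1])
```

Under that reading a value like "internet" would have to be written as something like http://example.org/tags/internet , which is clearly not what the legacy field holds today.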

* Another bug going back to §3.6.2 of RFC 2426 is basing NOTE on the  
“X.520 Description” — the actual X.520 spec is “text that describes  
the associated object” rather than “supplemental information”. More
like an acronym expansion than a warning that the fax machine will be  
off-hook after 5:15PM.

And, of course, neither definition reflects what really goes into a  
NOTE field: CRM, cases, birthdays of your client’s children, and so on…

* 6.8.1 When upgrading CLASS with additional informative text about  
the basically-useless PRIVATE/PUBLIC/CONFIDENTIAL distinction dating  
back to 3.7.1 in RFC 2426, has anyone ever field tested what  
distinction a developer would naively draw between PRIVATE and  
CONFIDENTIAL? Pop quiz time…

I think the explanatory text is a valiant effort to imply some
behavior never seen enforced by interoperable running code, instead of
putting a stake through the heart of unused fields. As stated in §9,
it is merely “desired,” but by whom? After all, “That policy is not
enforced in any way” (as it was in 2426, where they chose not to
dignify CLASS with definitions).

(CLASS is absent from vCard 2.1, http://www.imc.org/pdi/vcard-21.txt )
This also makes a mockery of the template in 10.2.6. for “TOP SECRET”  
— how about actually useful examples, such as perhaps a “NOINDEX”  
classification to warn other services not to “crawl” that card for  
searching…?

* 6.8.2 are there examples of KEY in the wild? Extending its  
definition from vCard3 to use FMTTYPE permits the addition of MIME  
typing information, which from my humble knowledge of the field, is  
still inadequate to the task of identifying how the key would be used.  
But I also can’t recommend opening up this can of worms to permit a  
common, easily used non-“b” encoding of key as text (e.g. SSH, in RFC  
4716; or PEM, in RFC 1421; or Asymmetric Key Packages, RFC 5958; see
http://www.cryptosys.net/pki/rsakeyformats.html for even more).