Re: APPSDIR review of draft-farrell-decade-ni-07, major design issue (one or two URI schemes)

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Tue, 12 June 2012 09:13 UTC

Message-ID: <4FD7083A.6080502@it.aoyama.ac.jp>
Date: Tue, 12 Jun 2012 18:13:30 +0900
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
To: Stephen Farrell <stephen.farrell@cs.tcd.ie>
Subject: Re: APPSDIR review of draft-farrell-decade-ni-07, major design issue (one or two URI schemes)
References: <4FCDD499.7060206@it.aoyama.ac.jp> <4FCDE96E.5000109@cs.tcd.ie>
In-Reply-To: <4FCDE96E.5000109@cs.tcd.ie>
Cc: Graham Klyne <GK@ninebynine.org>, IETF discussion list <ietf@ietf.org>, "draft-farrell-decade-ni@tools.ietf.org" <draft-farrell-decade-ni@tools.ietf.org>

Hello Stephen,

This mail responds to your points on the main technical issue that I 
have identified.

On 2012/06/05 20:11, Stephen Farrell wrote:

> On 06/05/2012 10:42 AM, "Martin J. Dürst" wrote:
>> Hello everybody,
>>
>> [For replies, please trim the cc list, thanks!]

Done, removed apps-discuss@ietf.org for the moment.


>> Major design issue:
>>
>> The draft defines two schemes, which differ only slightly, and mostly
>> just gratuitously (see also editorial issues).
>> These are the ni: and the nih: scheme. As far as I understand, they
>> differ as follows:
>>                                      ni:                nih:
>> authority:                          optional           disallowed
>> ascii-compatible encoding:          base64url          base16
>> check digit:                        disallowed         optional
>> query part:                         optional           disallowed
>> decimal presentation of algorithm:  disallowed         possible
>>
>> The usability of URIs is strongly influenced by the number of different
>> schemes: the smaller the number, the better. As a somewhat made-up
>> example, if the original URIs had been separated into httph: for HTML
>> pages and httpi: for images, or any other arbitrary subdivision that one
>> can envision, that would have hurt the growth and extensibility of the
>> Web. Creating new URI schemes is occasionally necessary, and the ideas
>> that led to this draft definitely seem to warrant a new scheme (*), but
>> there's no reason for two schemes.
>> [(*) I know people who would claim that the .well-known http/https thing
>> is completely sufficient, no new scheme needed at all.]
>>
>> More specifically, if the original URIs had been separated into httpm:
>> (for machines) and httph: (for humans), the Web for sure wouldn't have
>> grown at the speed it did (and does) grow. In practice, there are huge
>> differences in human 'speakability' for URIs (and IRIs, for that
>> matter); compare e.g. http://google.com with
>> http://www.google.co.jp/#sclient=psy-ab&hl=en&site=&source=hp&q=hash&oq=hash&aq=f&aqi=g4&aql=
>> (which I have significantly shortened to hopefully eliminate potential
>> privacy issues), or compare the average mailto: URI with the average
>> data: URI. However, what's important is that there never has been a
>> strong dividing line between machine-only and human-only URIs or
>> schemes; the division has always been very gradual. Short and mainly
>> human-oriented URIs have of course been handled by machines, and on the
>> other hand, very long URIs have been spoken when really necessary.
>> "Speakability" has been maintained to some extent by scheme designers,
>> and to some extent by "survival of the fittest" (URIs that weren't very
>> speakable (or spellable/memorizable/guessable/...), and their Web sites,
>> might just die out slowly).
>>
>> It should also be noted that the resistance against multiple URI schemes
>> may have been low because there are so many different ways to express
>> hashes in the draft anyway, and one more (the nih: section is the last
>> one before the examples section) didn't seem like much of a deal
>> anymore. But when it comes to URIs, one less is a lot better than one more.
>>
>> In the above ni:/nih: distinction, nih: seems to have been added as an
>> afterthought after realizing that reading an ni: URI aloud over the
>> phone may be somewhat suboptimal because there is a need for repeated
>> "upper case" - "lower case" (surely very quickly shortened to "upper" -
>> "lower" and then to "up" - "low" or something similar). It is not a bad
>> idea to try to make sure that IETF technology, and URIs in particular,
>> are accessible to people with certain kinds of dyslexia. (There are
>> indeed people who have tremendous difficulties with distinguishing
>> upper- and lower-case letters, and this may or may not be connected with
>> other aspects of dyslexia.) It is however totally unclear to this
>> reviewer why this has to lead to two different URI schemes with other
>> gratuitous differences.
>>
>> Finding a solution is rather easy (of course, other solutions may also
>> be possible): Merge the schemes, so that authority, check digit, and
>> query part are all optional (an authority part and/or a query part may
>> very well be very useful in human communication, and a check digit won't
>> hurt when transmitted electronically) and the decimal presentation of
>> the algorithm is always allowed, and use base32
>> (http://tools.ietf.org/html/rfc4648) as the encoding. This leads to a
>> 16.6% less efficient encoding of the value part of the ni: URI, but
>> given that other URI-related encodings, e.g. the %-encoding resulting
>> when converting an IRI to a URI, are much less efficient, and that URI
>> infrastructure these days can handle URIs with more than 1000 bytes,
>> this should not be a serious problem. Also, there's a separate binary
>> format (section 6) that is more compact already.
>
> I strongly disagree with merging ni & nih. Though that clearly
> could be done, it would be an error.
>
> There was no such comment on the uri-review list and the designated
> expert was happy. That review was IMO the time for such comments
> and second-guessing the designated expert at this stage seems
> contrary to the registration requirements. So process-wise I
> think your main comment is late.

First, if IETF Last Call is too late to make serious technical comments 
on drafts, then I think we have to rename it to IETF Too-Late Call.

Second, designated experts are there to check for minimum requirements 
for a registration, and to give advice as they see fit (and have time). 
I'm myself a designated expert on "Character Sets", and I have 
definitely in the past approved, and would again in the future approve, 
registrations for stuff on which I would complain strongly if the 
question was "is this a good technical solution".

Graham Klyne, the designated expert for URI scheme registrations, has 
confirmed offline that he does not see his role as "expert reviewer" as 
judging the technical merit of a URI scheme proposal.


> But in any case, I also think you're wrong technically in this
> case.

Let's see. I hope we agree that we should come to a conclusion on this 
issue on technical merits, rather than on process details.


> nih *is* intended for a corner case, where humans need to speak these
> URIs and was added as a direct result of requirements from the core
> WG and not as an afterthought. ni URIs are not intended for that
> and so there really are IMO different requirements, (esp. e.g.
> checkdigit) that are best met with different schemes.

I agree that the value of a checkdigit is very limited for communication 
among machines (and for communication among humans with the help of 
machines, such as in the case of email).

On the other hand, I can't understand why (even assuming we needed a 
separate scheme) there is no authority and no query part in nih.

For the authority, I'd assume that it would be as useful when the URI is 
transmitted e.g. over the phone as when it is transmitted e.g. over email.

For the query part, there are already various ideas and proposals 
floating around, and at least some of them would be of interest for when 
the URI is transmitted e.g. over the phone. Also, even if we currently 
didn't have any actual proposals for query parameters, I think it would 
be a very bad idea to exclude them a priori for transmission e.g. over 
the phone.
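To make the point concrete, here is a minimal Python sketch of a single merged scheme in which authority and query part stay optional for every use. It assumes the base32 value encoding proposed in the review above and the ni: layout "ni://" [authority] "/" alg ";" value ["?" query]; the function names and the example query "ct=text/plain" are illustrative only, not taken from the draft.

```python
import base64
import hashlib
from typing import Optional


def encode_value(digest: bytes) -> str:
    """Base32 (RFC 4648) without '=' padding, shown in lower case."""
    return base64.b32encode(digest).decode("ascii").rstrip("=").lower()


def decode_value(value: str) -> bytes:
    """Inverse of encode_value; base32 ignores case, so decode with casefold."""
    padding = "=" * (-len(value) % 8)  # restore padding to a multiple of 8
    return base64.b32decode(value + padding, casefold=True)


def build_uri(digest: bytes, alg: str = "sha-256",
              authority: Optional[str] = None,
              query: Optional[str] = None) -> str:
    """One scheme for every use: authority and query are always optional."""
    uri = "ni://" + (authority or "") + "/" + alg + ";" + encode_value(digest)
    if query:
        uri += "?" + query
    return uri


digest = hashlib.sha256(b"Hello World!").digest()
print(build_uri(digest))
print(build_uri(digest, authority="example.com", query="ct=text/plain"))
```

Because base32 decoding can ignore case, a value read aloud needs no "upper"/"lower" annotations; the price is 52 value characters for a 32-byte SHA-256 digest instead of 43 with unpadded base64url.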


> Merging ni/nih would also add more complexity for no benefit,
> which would be a bad idea.

Can you please explain what kind of complexity would have to be added? 
In terms of specification, merging the two schemes doesn't seem to be 
difficult or complex at all. Also, in terms of implementation, the only 
additions to the ni: scheme that become necessary are the check digit 
and the expression of the "suite id" as a decimal. It's very difficult 
for me to imagine that this would add significant complexity to an 
implementation; if code for nih: exists, that can mostly just be moved over.
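The two additions named above can be sketched in a few lines of Python. This assumes the nih: check digit is Luhn's mod N algorithm with N = 16 over lower-case hex (the same functions accept any lower-case alphabet, e.g. a base32 one), and it uses a hypothetical name-to-suite-ID table; the real IDs are whatever the draft's hash algorithm registry assigns.

```python
# Hypothetical mapping from algorithm names to decimal suite IDs (illustrative).
SUITE_IDS = {"sha-256": 1, "sha-256-128": 2, "sha-256-120": 3}

HEX = "0123456789abcdef"


def _luhn_sum(chars: str, alphabet: str, start_factor: int) -> int:
    """Luhn mod N sum, walking right to left and alternating factors 2 and 1."""
    n = len(alphabet)
    factor = start_factor
    total = 0
    for ch in reversed(chars):
        addend = factor * alphabet.index(ch)
        total += addend // n + addend % n  # add the two base-N "digits"
        factor = 1 if factor == 2 else 2
    return total


def check_char(value: str, alphabet: str = HEX) -> str:
    """Check character for value; '-' separators are ignored."""
    digits = value.replace("-", "").lower()
    n = len(alphabet)
    return alphabet[(n - _luhn_sum(digits, alphabet, 2) % n) % n]


def is_valid(value_with_check: str, alphabet: str = HEX) -> bool:
    """True if the last character is the correct check character."""
    digits = value_with_check.replace("-", "").lower()
    return _luhn_sum(digits, alphabet, 1) % len(alphabet) == 0


def alg_field(alg: str, decimal: bool = False) -> str:
    """Algorithm field as a name (e.g. 'sha-256') or as a decimal suite ID."""
    return str(SUITE_IDS[alg]) if decimal else alg


print(check_char("a1"))                    # prints 4
print(is_valid("a1" + check_char("a1")))   # prints True
print(alg_field("sha-256", decimal=True))  # prints 1
```

Neither piece depends on the value encoding, which is why moving it from nih: into ni: is mostly a matter of relocating code.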


> Your analogy about httpm/h may appear reasonable, but it is always
> unreasonable to draw conclusions from analogies. It is also unwise
> to reason from counterfactuals, which we'd also be doing if we
> accepted your argument. So I find that speculation utterly useless
> to be honest.

It is definitely unreasonable to draw conclusions from analogies *only*. 
But if you think that the httpm/h analogy is wrong, and that ni/nih is 
different, could you please explain *what* is different?


> In this case, we are dealing with different requirements so this
> should stay as-is.

If "different requirements" is your main (or only) real argument, could 
you at least explain exactly how they are different? Just that one 
requirement came from the core WG and others from other WGs or other 
parties doesn't help me to understand how the actual requirements 
differ. (Please note that even if the requirements differ, that doesn't 
mean that we need different technology to address them.)

Why do you say that ni: URIs are not intended for humans to speak? What 
am I supposed to do if I get an ni: URI in a mail message and want to call you
on the phone to tell you about it? If I want to send somebody the
information in an ni: URI by mail, should I use only the ni: version or 
only the nih: version, or both, if I can't exclude that the recipient 
may want to relay this information via voice?


> Finally, we have (some, early,) running code that matches the
> current draft and that ought also count for something

How much? The boilerplate on every ID is pretty clear that they are not
set in stone. Also, the changes needed to merge the two schemes are not
rocket science, quite to the contrary. (I hereby volunteer to fix the
Ruby version, just to show how little is involved.)


> when compared
> to a change that would be a gratuitous dis-improvement

In what sense would merging the two schemes be a dis-improvement? Can 
you please explain?


> based it
> seems upon dubious argument

If you think that my arguments are dubious, please explain exactly why.


> that is also offered at the wrong
> point in the process.

See above. If there's something wrong with IETF Last Call, or with the 
fact that the Apps Area Directorate does reviews (which I don't think), 
then that should be addressed separately. For this discussion, I hope we 
can concentrate on technical issues.


Regards,   Martin.