Re: [urn] I want URNs for hashes and large random numbers

Sean Leonard <dev+ietf@seantek.com> Fri, 12 September 2014 18:18 UTC

Message-ID: <54133919.5010103@seantek.com>
Date: Fri, 12 Sep 2014 11:19:05 -0700
From: Sean Leonard <dev+ietf@seantek.com>
To: John C Klensin <john-ietf@jck.com>
References: <54129263.7080109@seantek.com> <541293C5.9030205@gmx.de> <54129F90.3090201@seantek.com> <725D9113FA12205449854DF4@JcK-HP8200.jck.com>
In-Reply-To: <725D9113FA12205449854DF4@JcK-HP8200.jck.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/8HyopFtK8urp7vam5bCeWhuZF1Y
Cc: Julian Reschke <julian.reschke@gmx.de>, urn@ietf.org
Subject: Re: [urn] I want URNs for hashes and large random numbers

On 9/12/2014 2:18 AM, John C Klensin wrote:
> Sean,
>
> Just to help me understand, two questions...
>
I think there were more than two questions. :)

[SNIP]
>
>> Specifically I want the definition of URN to be able to
>> accommodate these kinds of naming schemes.
>>
>> It's not a challenge to the ni: URI. It's just being
>> realistic: people are using mathematically deterministic
>> processes to uniquely and persistently identify (i.e., name)
>> things in real life already. So if URNs uniquely and
>> persistently identify things, don't we have a match?
> Again, what do you believe bars that use?   Is it in 2141, 3986,
> or one of the documents the WG is now working on?  Or are you
> just trying to warn against our making some change that would
> make such a URN namespace invalid?

Over the years I have made a couple of URN proposals. One source of 
pushback has been that when the URN uses some large random number or 
cryptographic hash, the assignment process does not "guarantee" enough 
uniqueness. Some respondents said that you need to have an organization 
or an IANA registry doling out identifiers. (There were/are other 
issues, but they are out of scope for this discussion.)

If you compare RFC 3406 and 
draft-ietf-urnbis-rfc3406bis-urn-ns-reg-09, it seems clear that 
non-human processes are permitted, so long as they provide "consistent 
assignment"; organizations are no longer required. Compare, in 
particular, Page 5 of RFC 3406 with Page 4 of the urnbis draft.


>
> Going back to your original note in this thread:
>
> --On Thursday, September 11, 2014 23:27 -0700 Sean Leonard
> <dev+ietf@seantek.com> wrote:
>
>> and I want identifiers that are valid in their respective
>> namespace, that represent (unbroken) cryptographic hashes or
>> large random numbers, to be valid URNs as well, without any
>> complaints:
>>
>> urn:oid:2.25.324969006592305634633390616021200786553 ***
> It is one of several substantive points that have gotten lost in
> the 3986 debate, one that probably should have been on my
> "decisions to make" list as "do we still believe this?", but one
> of the reasons 3406bis is moving toward a registration model
> (rather than the IETF Consensus one called for in 3406) is
> precisely to allow easy registration and use of NIDs for
> externally-defined namespaces.
[SNIP]
(Some of the premise of that text was incorrect; hopefully my 
description of UUIDs clarifies the issue.)
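
For reference, the 2.25 arc in the urn:oid example above is the UUID 
arc defined in ITU-T X.667: the whole 128-bit UUID is written as a 
single decimal integer. A minimal Python sketch of that mapping, 
assuming only the standard `uuid` module (the helper names are 
hypothetical):

```python
import uuid

# Prefix for UUIDs expressed under the joint-iso-itu-t uuid(25) arc (ITU-T X.667).
OID_UUID_PREFIX = "urn:oid:2.25."


def uuid_to_oid_urn(u: uuid.UUID) -> str:
    """Render a UUID as an OID URN: its 128-bit value in decimal under 2.25."""
    return OID_UUID_PREFIX + str(u.int)


def oid_urn_to_uuid(urn: str) -> uuid.UUID:
    """Parse a urn:oid:2.25.<decimal> name back into a UUID.

    Raises ValueError if the prefix is wrong, the arc is not a decimal
    integer, or the value does not fit in 128 bits.
    """
    if not urn.lower().startswith(OID_UUID_PREFIX):
        raise ValueError("not a urn:oid:2.25. name")
    arc = urn[len(OID_UUID_PREFIX):]
    if not arc.isdigit():
        raise ValueError("UUID arc must be a decimal integer")
    return uuid.UUID(int=int(arc))  # uuid.UUID rejects values >= 2**128


name = "urn:oid:2.25.324969006592305634633390616021200786553"
u = oid_urn_to_uuid(name)
print(u)                              # the same 128-bit value in UUID notation
assert uuid_to_oid_urn(u) == name     # the mapping round-trips
print(uuid_to_oid_urn(uuid.uuid4()))  # a fresh random (version 4) UUID as an OID URN
```

No registry is consulted at any step: uniqueness rests entirely on how 
the UUID itself was generated.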

All I'll say is that we need some standards. URNs shouldn't be a 
free-for-all.

>> This tells me that an organization is not required; it is
>> sufficient to define a process or algorithm (e.g.,
>> cryptographic hashing operation, or "pick a random large
>> number") that guarantees uniqueness within certain
>> constraints, and call it a day.
> I believe that was the intent.   It is also an excellent
> example, IMO, of why we have to be careful that the needs and
> perspectives of one particular cluster of URN namespaces,
> definitions, and user community do not define things in ways
> that exclude the equally-reasonable requirements of other
> communities.

Yup. And this is a reasonable requirement, based on engineering 
discipline and experience.

[SNIP]
>> Should 3406bis be written to simply accept that,
>> i.e., to allow pointing to an external specification and, if
>> what it specifies is not unique, assume it is Someone Else's
>> Problem.   Or should there be a requirement for an explanation
>> of why the string is [sufficiently] unique and under what
>> conditions (e.g., the likelihood of collisions with a
>> cryptographic hash is not zero but is calculable) and some
>> serious attempt to review that explanation?

Per the above, the metric that I advocate for is "engineering 
reasonableness". Persistence and location-independence (specifically, 
independence from Internet topology) are clear and desirable goals for 
URNs, and they are properties that URLs such as http, ldap, gopher, ws, 
snmp, dns, file, and others lack. (What the URLs mentioned have in 
common is that they use the // syntax with an authority component; 
thus, they are intended to identify resources accessible via the 
Internet using IP or DNS.)
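
The distinction can be checked mechanically. Per RFC 3986, a URI has 
an authority component exactly when the part after the scheme begins 
with "//" (the authority may be empty, as in file:///). A small Python 
sketch, assuming the standard `urllib.parse` module; the helper name 
and example hostnames are hypothetical:

```python
from urllib.parse import urlsplit


def authority_of(uri: str):
    """Return the authority component (possibly empty), or None if absent.

    urlsplit() reports netloc == "" both for an empty authority and for no
    authority at all, so the presence of "//" is checked explicitly first.
    """
    _scheme, sep, rest = uri.partition(":")
    if not sep or not rest.startswith("//"):
        return None
    return urlsplit(uri).netloc


for ident in (
    "http://www.example.com/index.html",
    "ldap://ldap.example.com/dc=example,dc=com",
    "file:///etc/hosts",
    "urn:oid:2.25.324969006592305634633390616021200786553",
):
    print(ident.split(":", 1)[0], repr(authority_of(ident)))
# http 'www.example.com'
# ldap 'ldap.example.com'
# file ''
# urn None
```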

Anyway, to get that persistence, you have to delegate the allocation 
of names to some process. All I'm saying is that processes based on 
natural phenomena (that is, phenomena bounded and describable by 
physics, mathematics, or other sciences) are just as reliable as 
commitments by human organizations. Organizations (being made up of 
humans) fail; organizations make mistakes; organizations change their 
minds. Natural phenomena don't change like that; they have other 
problems. Sometimes natural phenomena (e.g., MD5) fail spectacularly. 
But what organizations and natural processes have in common is that, 
for engineering purposes, they're good enough.
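
To make the hash case concrete: naming by cryptographic hash is a 
deterministic process, so the same bytes always yield the same name, 
with no registry involved. The sketch below assumes the ni: URI form 
from RFC 6920 (algorithm name, then the base64url-encoded digest 
without padding) and uses SHA-256 rather than the broken MD5; the 
helper name is hypothetical.

```python
import base64
import hashlib


def ni_name(content: bytes) -> str:
    """Derive an RFC 6920-style ni: name (no authority) from content bytes."""
    digest = hashlib.sha256(content).digest()  # 32-byte SHA-256 digest
    # base64url alphabet, with the trailing "=" padding removed
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "ni:///sha-256;" + value


a = ni_name(b"Hello World!")
assert a == ni_name(b"Hello World!")  # same input, same name, every time
assert a != ni_name(b"Hello World?")  # a one-byte change yields a different name
print(a)
```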

In the case of large random numbers, as long as the address space is 
large enough, the probability of collision is small enough for all 
practical purposes. The same can be said of cryptographic hash 
operations (in fact, keeping collisions that improbable is the 
engineered purpose of a cryptographic hash operation, and it is what 
distinguishes a cryptographic hash from a non-cryptographic one).
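
To put rough numbers on "small enough": the usual birthday-bound 
approximation for n values drawn uniformly at random from 2^b 
possibilities gives p(collision) ~= 1 - exp(-n(n-1) / 2^(b+1)). A 
minimal Python sketch, assuming 122 random bits (the random payload of 
a version 4 UUID); the helper names are hypothetical:

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate P(at least one collision) among n uniform random b-bit values.

    Birthday bound: p ~= 1 - exp(-n(n-1) / (2 * 2**bits)).
    expm1 keeps precision when the exponent is tiny.
    """
    exponent = n * (n - 1) / (2 * 2**bits)  # exact integers until the division
    return -math.expm1(-exponent)


def names_for_probability(p: float, bits: int) -> float:
    """Invert the bound: roughly how many random b-bit names give collision chance p."""
    return math.sqrt(2 * 2**bits * -math.log1p(-p))


for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,} random v4 UUIDs: p(collision) ~ {collision_probability(n, 122):.1e}")
print(f"{names_for_probability(0.5, 122):.2e} v4 UUIDs for a 50% chance of any collision")
```

With 122 bits, even a trillion names leaves the collision probability 
around 10^-13, and a 50% chance requires roughly 2.7 x 10^18 names.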

Honestly, engineering reasonableness is achievable under the 
Specification Required registration policy, as long as the people 
approving the specifications are reasonable engineers. :-) I don't think that 
rfc3406bis-urn-ns-reg needs to lower the bar. We just need to accept 
that there is more than one way to generate identifiers that are unique 
within the bounds that matter for engineered systems.

Sean