Re: [urn] gbs Name space identifier

worley@ariadne.com (Dale R. Worley) Thu, 12 September 2019 02:12 UTC

From: worley@ariadne.com
To: Philip R Brenan <philiprbrenan@gmail.com>
Cc: urn@ietf.org
In-Reply-To: <CALhwFR=5Y3gjTX62P10HT_fHGWZV5t9ov=siWmWKD9MaA4EUhA@mail.gmail.com> (philiprbrenan@gmail.com)
Sender: worley@ariadne.com
Date: Wed, 11 Sep 2019 22:12:07 -0400
Message-ID: <87r24m4614.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/A503CGa6xfaMMy8eUCO6Un4oLmc>
Subject: Re: [urn] gbs Name space identifier

Certainly an interesting idea, but some things need to be clarified:

One thing you want to clarify is what sort of resources the URNs
specify.  From the way you write, I believe that they are intended to be
what are called BLOBs, finite sequences of octets, or files as a Unix
user thinks of them, with no metadata.  But that should be stated
explicitly.

>    <T> is a string of one or more characters drawn from: [a-zA-Z0-9_] which
>    identifies the type of content from a list of types published by the
>    registrant at https://metacpan.org/pod/Dita::GB::Standard::Types .

I attempted to obtain the list of valid types at the given URL, but was
unsuccessful.  That page seemed to be a very top-level discussion of
"The GB Standard".

>    <G> is a string of one or more characters: [a-zA-Z0-9_] chosen
>    algorithmically depending on the value of the <T> component. The possible
>    algorithms will be published on https://metacpan.org by the registrant. The
>    user is directed to the appropriate algorithm by a link published beside the
>    description of type <T> at:
>    https://metacpan.org/pod/Dita::GB::Standard::Types

As written, this provides no real constraint on the <G> value, other
than its character set, because there seems to be no constraint on the
algorithms involved.

I think the approach you want to take is to state that for each type <T>
there will be a published algorithm, and <G> has to conform to the
algorithm for <T>.
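
A minimal sketch of that constraint, assuming a registry that maps each
type name to exactly one algorithm.  The registry contents, the type name
"example_type", and the function name g_conforms are hypothetical and do
not come from the proposal; only the character set is quoted from it.

```python
import re
from typing import Callable, Dict

# Character set quoted from the proposal for the <T> and <G> components.
COMPONENT = re.compile(r"[a-zA-Z0-9_]+")

# Hypothetical registry: one published algorithm per type <T>.
# The single entry is a placeholder, not a real GB Standard type.
ALGORITHMS: Dict[str, Callable[[bytes], str]] = {
    "example_type": lambda content: "len_%d" % len(content),
}

def g_conforms(t: str, g: str, content: bytes) -> bool:
    """Return True if <G> is what the algorithm registered for <T> yields."""
    if not (COMPONENT.fullmatch(t) and COMPONENT.fullmatch(g)):
        return False          # wrong character set
    algorithm = ALGORITHMS.get(t)
    if algorithm is None:
        return False          # <T> is not on the published list
    return algorithm(content) == g
```

With such a registry, an arbitrary string in the right character set is
no longer an acceptable <G>; it has to be the output of the algorithm
registered for <T>.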

>    Identifier persistence considerations:
>
>        Persistence is guaranteed by the immutability over time of the MD5 sum
>        of the <B> component.

Because of <B>, there is a guarantee that if a URN refers to a resource
at one time and refers to a resource at another time, then the two
resources are identical.  That is the minimum needed for a URN
definition.

But there is no statement that types will never be removed from the
list of types, or that users will not decide to use a different type
for the same resource, leaving two URNs referring to the same resource.
Similarly, what guarantees are there regarding the <G> algorithms over
time?

None of these questions are critical, but it would be much better if
you declared your intentions.

>     Equivalence is determined by comparing the <G> components of the two items
>     to be compared.  If they are equal the two items are considered to be equal.
>     Otherwise they are considered to be unequal even if the underlying content
>     is in fact identical.

If equivalence is determined solely by the <G> values, why are the <T>
and <B> values present in the URN?
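
To make the question concrete, here is a small sketch of the equivalence
rule as quoted.  The GbsName tuple and the component values are
hypothetical; the proposal's actual URN syntax is not reproduced here, so
the three components are simply held as separate fields.

```python
from typing import NamedTuple

class GbsName(NamedTuple):
    """Hypothetical holder for the three components of a gbs URN."""
    t: str  # <T>, the content type
    g: str  # <G>, the algorithmically chosen string
    b: str  # <B>, the MD5 sum of the content

def equivalent(x: GbsName, y: GbsName) -> bool:
    # The quoted rule: only the <G> components are compared.
    return x.g == y.g

first = GbsName(t="type_a", g="same_g", b="md5_of_first")
second = GbsName(t="type_b", g="same_g", b="md5_of_second")
```

Under the quoted rule, equivalent(first, second) is true even though the
two names differ in both <T> and <B>, which is why the role of those two
components needs to be explained.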

>    The validity of the urn can be checked as follows:
>
>    Check that the <T> component is on the published list of possibilities.
>
>    Check that the <G> component is computed correctly when the algorithm named
>    by the <T> component is applied to the content.
>
>    Check the the <B> component matches the MD5 sum of the content.

That last item is ill-defined, as it requires that one have the resource
for the URN, but there is no algorithm for constructing the resource from
the URN.
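
A sketch of the <B> check shows the dependency.  It assumes <B> is the
lowercase hexadecimal MD5 digest, which is not stated in the text quoted
above; the function name b_matches is hypothetical.

```python
import hashlib

def b_matches(b: str, content: bytes) -> bool:
    """Compare <B> with the MD5 sum of the content.

    Assumption: <B> is the hexadecimal form of the digest.
    """
    return hashlib.md5(content).hexdigest() == b.lower()
```

The content argument cannot be derived from the URN; the caller has to
obtain the resource by some other means before the check can be run.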

Dale