Re: [urn] gbs Name space identifier (Dale R. Worley) Thu, 12 September 2019 02:12 UTC

From: Dale R. Worley
To: Philip R Brenan
Date: Wed, 11 Sep 2019 22:12:07 -0400
Subject: Re: [urn] gbs Name space identifier

Certainly an interesting idea, but there are some things that need to be
clarified.

One thing you want to clarify is what sort of resources the URNs
specify.  From the way you write, I believe that they are intended to be
what are called BLOBs, finite sequences of octets, or files as a Unix
user thinks of them, with no metadata.  But that should be stated
explicitly.

>    <T> is a string of one or more characters drawn from: [a-zA-Z0-9_] which
>    identifies the type of content from a list of types published by the
>    registrant at .

I attempted to obtain the list of valid types at the given URL, but was
unsuccessful.  That page seemed to be a very top-level discussion of
"The GB Standard".

>    <G> is a string of one or more characters: [a-zA-Z0-9_] chosen
>    algorithmically depending on the value of the <T> component. The possible
>    algorithms will be published on by the registrant. The
>    user is directed to the appropriate algorithm by a link published beside the
>    description of type <T> at:

As written, this provides no real constraint on the <G> value, other
than its character set, because there seems to be no constraint on the
algorithms involved.

I think the approach you want to take is to state that for each type <T>
there will be a published algorithm, and <G> has to conform to the
algorithm for <T>.
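To make that concrete, here is a minimal Python sketch.  The type name
`txt`, its "first line of the content" algorithm, and all helper names
are hypothetical (I could not retrieve the real list of types); the
point is only that a character-set check accepts any <G>, while a
per-type algorithm actually pins <G> down.

```python
import re
from typing import Callable, Dict

# The only constraint the draft currently states: one or more of [a-zA-Z0-9_].
G_CHARSET = re.compile(r"[A-Za-z0-9_]+")

def slug_from_first_line(content: bytes) -> str:
    """Hypothetical <G> algorithm: the first line of the content, with every
    character outside [A-Za-z0-9_] replaced by '_'."""
    first_line = content.decode("utf-8", errors="replace").split("\n", 1)[0]
    return re.sub(r"[^A-Za-z0-9_]", "_", first_line) or "_"

# Hypothetical registry: each type <T> is bound to exactly one published algorithm.
G_ALGORITHMS: Dict[str, Callable[[bytes], str]] = {
    "txt": slug_from_first_line,
}

def g_charset_ok(g: str) -> bool:
    """Character-set check alone: accepts essentially any identifier."""
    return G_CHARSET.fullmatch(g) is not None

def g_conforms(t: str, g: str, content: bytes) -> bool:
    """<G> is valid only if it is what the algorithm for <T> produces."""
    algorithm = G_ALGORITHMS.get(t)
    if algorithm is None:
        return False  # unknown type: there is no algorithm to conform to
    return g_charset_ok(g) and algorithm(content) == g
```

With content `b"Hello world\nmore"`, `g_charset_ok("anything_at_all")`
is true, but `g_conforms("txt", "anything_at_all", ...)` is false; only
`"Hello_world"` conforms.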

>    Identifier persistence considerations:
>        Persistence is guaranteed by the immutability over time of the MD5 sum
>        of the <B> component.

Because of <B>, there is a guarantee that if a URN refers to a resource
at one time and to a resource at another time, then the two resources
are identical.  And that is the minimum needed for a URN.
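A sketch of that guarantee in Python, assuming <B> is the lowercase
hexadecimal MD5 digest of the resource's octets (the draft does not say
which encoding is used).  Note that it holds only up to MD5 collisions,
which can be constructed deliberately, so it guards against accidental
change rather than against a malicious publisher.

```python
import hashlib

def md5_hex(content: bytes) -> str:
    # Assumption: <B> is the lowercase hex MD5 digest of the octets.
    return hashlib.md5(content).hexdigest()

def consistent_over_time(b: str, content_then: bytes, content_now: bytes) -> bool:
    """True if both retrievals match <B>, which (barring an MD5 collision)
    means the two retrieved resources are the same octet sequence."""
    return md5_hex(content_then) == b and md5_hex(content_now) == b
```

For example, with `b = md5_hex(b"abc")`, a later retrieval returning
`b"abd"` fails the check.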

But there is no statement that types will never be removed from the list
of types, or that users will not decide to use a different type for the
same resource, leaving two URNs referring to the same resource.
Similarly, what guarantees are there regarding the <G> algorithms over
time?

None of these questions are critical, but it would be much better if
you declared your intentions.

>     Equivalence is determined by comparing the <G> components of the two items
>     to be compared.  If they are equal the two items are considered to be equal.
>     Otherwise they are considered to be unequal even if the underlying content
>     is in fact identical.

If equivalence is determined solely by the <G> values, why are the <T>
and <B> values present in the URN?
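A small sketch of the difference, using a hypothetical `GbsUrn` record
(the field names are mine, not the draft's): under the rule as written,
two URNs with different <T> and <B> but the same <G> compare equal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GbsUrn:
    # Hypothetical model of the three components.
    t: str  # type, from the registrant's list
    g: str  # component chosen by the algorithm for <t>
    b: str  # MD5 sum of the content

def equivalent_per_draft(x: GbsUrn, y: GbsUrn) -> bool:
    """Equivalence as the draft words it: compare <G> only."""
    return x.g == y.g

def equivalent_all_components(x: GbsUrn, y: GbsUrn) -> bool:
    """Alternative reading: every component must match."""
    return (x.t, x.g, x.b) == (y.t, y.g, y.b)

u1 = GbsUrn("txt", "readme", "900150983cd24fb0d6963f7d28e17f72")
u2 = GbsUrn("pdf", "readme", "d41d8cd98f00b204e9800998ecf8427e")
```

Here `equivalent_per_draft(u1, u2)` is true even though the types and
content hashes differ, while `equivalent_all_components(u1, u2)` is false.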

>    The validity of the urn can be checked as follows:
>    Check that the <T> component is on the published list of possibilities.
>    Check that the <G> component is computed correctly when the algorithm named
>    by the <T> component is applied to the content.
>    Check that the <B> component matches the MD5 sum of the content.

That last item is ill-defined, as it requires that one already have the
resource the URN identifies, but there is no algorithm for retrieving the
resource from the URN.
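Written out as a minimal Python sketch, the three checks make the gap
visible.  The registry and its single type `len` (whose <G> is `n`
followed by the content length) are hypothetical stand-ins for the
registrant's published list, and <B> is assumed to be a lowercase hex
MD5 digest.  Two of the three checks take the content as an input that
nothing in the URN tells you how to obtain.

```python
import hashlib
from typing import Callable, Dict, Optional

# Hypothetical stand-in for the published types and their <G> algorithms.
G_ALGORITHMS: Dict[str, Callable[[bytes], str]] = {
    "len": lambda content: "n" + str(len(content)),
}

def check_validity(t: str, g: str, b: str, content: Optional[bytes]) -> bool:
    """The draft's three checks, in order."""
    if t not in G_ALGORITHMS:                    # 1. <T> is a published type
        return False
    if content is None:
        # Checks 2 and 3 cannot be performed from the URN alone.
        raise ValueError("cannot check <G> or <B> without the resource")
    if G_ALGORITHMS[t](content) != g:            # 2. <G> computed correctly
        return False
    return hashlib.md5(content).hexdigest() == b  # 3. <B> is the MD5 sum
```

Raising instead of returning False keeps "could not check" distinct
from "the URN is invalid".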