Re: [urn] gbs Name space identifier

Philip R Brenan <> Wed, 25 September 2019 18:46 UTC


I have removed the link in question because the explanation of the derivation
of the *<T>* component was deemed unsatisfactory.  Here is what I was trying
to achieve:

It is anticipated that the GB Standard represented by the *urn:gbs* name
space could be usefully applied to a number of different document types,
such as Dita, DocBook, Word and HTML.  The <T> component is designed to
separate these various name spaces.  At the moment the only <T> in active
use is *dita* for Dita documents.  Within the Dita space the algorithm for
computing the *<G>* component is included in:

as gbStandardFileName().
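
A minimal Python sketch of how a consumer might check a <T> value: first
against the syntax rule quoted from the draft further down (one or more
characters drawn from [a-zA-Z0-9_]), then against the registrant's list of
types.  The function name check_t and the KNOWN_TYPES set are hypothetical;
because the published list is not reachable, the set assumes it holds only
*dita*, the one type described above as in active use.

```python
import re

# Assumption: the registrant's published list currently contains only "dita",
# the single <T> value described as in active use.
KNOWN_TYPES = {"dita"}

# Syntax rule from the draft: one or more characters drawn from [a-zA-Z0-9_].
T_SYNTAX = re.compile(r"[a-zA-Z0-9_]+")


def check_t(t: str) -> str:
    """Hypothetical check of a <T> value: syntax first, then registration."""
    if not T_SYNTAX.fullmatch(t):
        return "malformed"
    return "known" if t in KNOWN_TYPES else "unregistered"
```

With this sketch, check_t("dita") returns "known", check_t("docbook")
returns "unregistered" (anticipated above but not yet listed), and
check_t("doc-book") returns "malformed".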

The computation of the <G> component is performed by examining the text
between whichever of the following *XML* tags exist in a particular Dita
document, in the order in which they appear:

 title mainbooktitle booktitlealt

The text between these tags is used to form the <G> component after
converting each run of characters other than a-zA-Z0-9 to a single
underscore.  This method was chosen because it produces the most readable
names, closely aligned with what authors expect to see as a file name.
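
A minimal Python sketch of the <G> step as described, using only the
standard library.  It is not the gbStandardFileName() routine referenced
above: the function name g_component is hypothetical, and the sketch
assumes the text comes from the first matching tag in document order
(including the text of any nested elements), with no trimming of leading
or trailing underscores.

```python
import re
import xml.etree.ElementTree as ET

# Tags whose text may supply the <G> component, as listed above.
TITLE_TAGS = ("title", "mainbooktitle", "booktitlealt")


def g_component(dita_xml: str) -> str:
    """Hypothetical sketch of the <G> computation for a Dita document."""
    root = ET.fromstring(dita_xml)
    # root.iter() walks elements in document order, starting with the root.
    # Assumption: the first matching element supplies the text.
    for el in root.iter():
        if el.tag in TITLE_TAGS:
            text = "".join(el.itertext())  # include text of nested elements
            # Collapse each run of characters outside a-zA-Z0-9 into one "_".
            return re.sub(r"[^a-zA-Z0-9]+", "_", text)
    raise ValueError("no title, mainbooktitle or booktitlealt element found")
```

For example, g_component('<topic id="t1"><title>Install the <b>GB</b>
Standard!</title></topic>') returns "Install_the_GB_Standard_".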

The purpose of the GB Standard is to control the explosion of duplicate
Dita topics that tends to occur as documents evolve.  Typically when a new
product is documented, the author takes the existing set of linked topic
files comprising the documentation of the product, duplicates all of these
files to preserve the linkage structure, then makes a small number of
changes to a few of the duplicated files, leaving the bulk of the topic
files unchanged.  It is difficult to reuse the original topic files in situ
because of the need to maintain the links between them.

The GB Standard seeks to reduce this exponential growth of topic files by
giving each topic a unique, deterministic name so that links between topics
can be expressed in a way that endures as the topic files are copied over.
As proposed, the GB Standard allows a server to quickly determine whether
it already has a copy of a file by computing the GB Standard name of the
incoming file and comparing it with the names of the files stored locally.
If the name already exists, the stored file is reused; if it does not, the
server adds the incoming file to its list of available files.
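
The lookup described above amounts to keying the server's store by GB
Standard name.  The Python sketch below is a hypothetical in-memory
illustration: the TopicStore class and its accept() method are not part of
the proposal, and the naming function is passed in so that any
implementation of the <G> computation can be plugged in.

```python
from typing import Callable, Dict, Tuple


class TopicStore:
    """Hypothetical in-memory store of topic files keyed by GB Standard name."""

    def __init__(self, name_of: Callable[[str], str]) -> None:
        self._name_of = name_of           # computes the GB Standard name of a file
        self._files: Dict[str, str] = {}  # name -> stored file content

    def accept(self, content: str) -> Tuple[str, bool]:
        """Store an incoming file unless its name is already present.

        Returns (name, reused), where reused is True if a copy already existed.
        """
        name = self._name_of(content)
        if name in self._files:
            return name, True             # name exists: reuse the stored copy
        self._files[name] = content       # new name: add to the available files
        return name, False

    def __len__(self) -> int:
        return len(self._files)
```

For instance, with a toy naming function, store = TopicStore(lambda text:
text.strip().lower()), the first store.accept("Intro") returns ("intro",
False) and a later store.accept(" intro ") returns ("intro", True).  Using a
dictionary turns the comparison against every locally stored name into a
single hash lookup; the scheme is only as reliable as the naming function,
since two different files that map to the same name would be treated as
copies.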

It is not currently intended that the GB Standard name be used to locate
off-site copies of a file: as things stand, this could only be achieved by
querying, in turn, each server known to store files in this manner.  Please
tell me whether a *urn* must be able to uniquely locate files as well as
classify them.  If it is a requirement that a *urn* can be used to locate a
topic file anywhere in the world, then I need to rethink this aspect of the
GB Standard and update my application for the *gbs* namespace accordingly.
If location is not required, would the description of the computation of
the <G> component and adequate documentation of the standard names for the
<T> component be the elements that need work to progress this application
further?

On Tue, Sep 24, 2019 at 12:51 PM <> wrote:

> > >    <T> is a string of one or more characters drawn from: [a-zA-Z0-9_] which
> > >    identifies the type of content from a list of types published by the
> > >    registrant at .
> >
> > I attempted to obtain the list of valid types at the given URL, but was
> > unsuccessful.  That page seemed to be a very top-level discussion of
> > "The GB Standard".
> That URL gives me a 404...
> Best,
> Lars


Phil <>
