Re: [urn] gbs Name space identifier

Philip R Brenan <philiprbrenan@gmail.com> Sun, 29 September 2019 13:57 UTC

References: <CALhwFR=5Y3gjTX62P10HT_fHGWZV5t9ov=siWmWKD9MaA4EUhA@mail.gmail.com> <87r24m4614.fsf@hobgoblin.ariadne.com> <trinity-ca77aa47-8a00-419e-bfe8-867543668e08-1569325868991@3c-app-webde-bap33> <CALhwFRmtVK_xjZZQcw7JRyuW7PEr4n0keb3CnAyfsGyJjfxf3Q@mail.gmail.com> <HE1PR07MB30972990D54C3FF07D4D5712FA860@HE1PR07MB3097.eurprd07.prod.outlook.com>
In-Reply-To: <HE1PR07MB30972990D54C3FF07D4D5712FA860@HE1PR07MB3097.eurprd07.prod.outlook.com>
From: Philip R Brenan <philiprbrenan@gmail.com>
Date: Sun, 29 Sep 2019 14:57:25 +0100
Message-ID: <CALhwFRmk_XzXHdpQXCDCp95cUGEpd9tmLTi4yjg+AAtpNhPg9g@mail.gmail.com>
To: "Hakala, Juha E" <juha.hakala@helsinki.fi>
Cc: "lars.svensson@web.de" <lars.svensson@web.de>, "urn@ietf.org" <urn@ietf.org>, "Dale R. Worley" <worley@ariadne.com>
Content-Type: multipart/alternative; boundary="000000000000bdf0eb0593b18092"
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/wdX8Aw522RsxYQpfL6UIMYecsww>
Subject: Re: [urn] gbs Name space identifier

Hi *Juha*:

Thank you for your helpful comments.

I have updated this document based on the email discussion as follows:

1 - Clarified that the principal purpose of the URN being applied for is
naming topics rather than locating them.

2 - Specified that currently there is only one <T> type active, namely
"dita".

3 - Specified the computation of the <G> component in this document rather
than by reference elsewhere.

4 - Expanded the discussion of the purpose of the URN.

5 - Expanded the discussion on naming versus location and why naming is so
useful in this context.

6 - Expanded the discussion of inter-operability.

Please let me know which areas of this application might require further
elaboration.

Per: https://tools.ietf.org/html/rfc8141

Namespace ID:

   gbs

Registration Information:

   Version: 1
   Date:    2019-09-27

Declared registrant of the namespace:

   Name:    Ryffine Inc.
   Address: 445 N Broadway, Denver, CO 80203
   Contact: Philip R Brenan
   E-mail:  philiprbrenan@gmail.com
   www:     http://www.ryffine.com

Purpose:

   To allow organizations to share content written in XML to the Dita
   Standard:

   http://docs.oasis-open.org/dita/dita/v1.3/os/part2-tech-content/dita-v1.3-os-part2-tech-content.html

   without the exponential duplication that occurs without the name space
   standardization provided by a URN.

   Dita is a technical documentation standard promulgated by OASIS: a
   nonprofit consortium that drives the development, convergence and
   adoption of open standards for the global information society as noted
   at https://www.oasis-open.org/org

   A major goal of Dita is to enable authors to build documents from small
   reusable components called topics and then to share and reuse these
   topics via collections to enable other documents to be built more
   rapidly.

   As a consequence of the current addressing mechanism used to link Dita
   topics together within a document the number of such topics in existence
   tends to grow exponentially over time as documents evolve.  Typically
   when a new version of a product is documented the author takes the
   existing set of linked topic files comprising the documentation of the
   product, duplicates all of these files to preserve the complex linkage
   structure between these topics, then makes a small number of changes to
   a few of the duplicated files, leaving the bulk of the topic files
   unchanged.  At the moment it is difficult to reuse the original topic
   files in situ because of the need to maintain the links between them.

   The GB Standard as currently implemented at:

   https://metacpan.org/pod/Dita::GB::Standard

   seeks to reduce this exponential growth of topic files by giving each
   topic a unique deterministic name so that links between topics can be
   expressed in a way that endures as the topic files are copied over time.

   As proposed, the GB Standard allows a collection of Dita topics to
   quickly determine whether it already has a copy of an incoming topic by
   computing the GB Standard name of the topic and comparing it to the
   names of all such topics already collected locally ready for
   publication.  If the name already exists then the incoming topic is
   discarded and the existing topic is reused; if the name does not exist
   in the collection then the collection adds the incoming topic to its
   list of topics available for publication.
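
   The collection behaviour described above can be sketched in Python as
   follows.  This is only an illustration, not the registrant's
   implementation: the class TopicCollection, its offer() method and the
   stand-in naming function demo_name() are hypothetical names introduced
   for this example, and any function that maps topic content to its GB
   Standard name could be supplied instead.

```python
import hashlib
from typing import Callable, Dict


def demo_name(content: bytes) -> str:
    """Hypothetical stand-in for the GB Standard name of a topic.

    It uses only an MD5 hex digest of the content so that the example
    stays self-contained; a real collection would use the full name.
    """
    return hashlib.md5(content).hexdigest()


class TopicCollection:
    """A collection that keeps at most one copy of each named topic."""

    def __init__(self, name_of: Callable[[bytes], str]) -> None:
        self._name_of = name_of
        self._topics: Dict[str, bytes] = {}

    def offer(self, content: bytes) -> bool:
        """Add an incoming topic; return False if it was already held."""
        name = self._name_of(content)
        if name in self._topics:
            # Name already exists: discard the incoming copy, reuse ours.
            return False
        self._topics[name] = content
        return True

    def __len__(self) -> int:
        return len(self._topics)
```

   Offering the same topic twice leaves the collection with a single copy,
   because the second offer finds the name already present.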

   At the same time, the GB standard provides a human readable name for each
   topic which assists authors in selecting topics from each collection for
   reuse.

   The GB standard has been used by the applicant since 2016 to successfully
   build and maintain several large collections of topics.

   The purpose of this application then is to formalize the GB Standard
   naming convention as a globally recognized URN to enable standardized
   topic naming among organizations collaborating on the production of
   collections of technical documentation using Dita.  The proposed URN
   will not, as it stands, provide immediate global location of topics so
   named; instead, it provides a standardized method of querying one or
   more collections of such topics by both humans and computers in an
   efficient manner.


Syntax:

   urn:gbs:<T>:<G>:<B>

   where:

   <T> is a string of one or more characters drawn from: [a-zA-Z0-9_] which
   identifies the type of content being classified. At this point in time
   only one such type is in active use: the "dita" type. It is possible
   that further types might be required in the future; if so, this document
   will be updated to reflect these new types.

   <G> is a string of 1 to 64 characters drawn from: [a-zA-Z0-9_].  When <T>
   has the value: "dita" (currently the only permissible value), <G> is
   computed by concatenating the text between whichever of the following
   XML tags exist in the Dita topic, in the order in which they appear in
   that topic:

     <title>  <mainbooktitle>  <booktitlealt>

   The text between these tags is used to form the <G> component after
   converting runs of all characters other than a-zA-Z0-9 to single
   underscores and truncating after character 64 if the resulting string is
   longer than 64 characters in length. This method was chosen based on
   operational experience as it produces readable names that are closely
   aligned with what authors expect to see as a topic name.
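
   A minimal Python sketch of this computation is given below.  It is not
   the Perl implementation, and it rests on assumptions that the text above
   does not settle: only the first occurrence of each of the three tags is
   used, the pieces are joined with a single space before conversion, and
   leading or trailing underscores are kept.  The function name compute_g()
   is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The three tags named in the text above.
TITLE_TAGS = ("title", "mainbooktitle", "booktitlealt")


def compute_g(topic_xml: str) -> str:
    """Derive a <G> component from the title text of a Dita topic."""
    root = ET.fromstring(topic_xml)
    seen = set()
    parts = []
    # Element.iter() walks the tree in document order.
    for element in root.iter():
        if element.tag in TITLE_TAGS and element.tag not in seen:
            # Assumption: only the first occurrence of each tag counts.
            seen.add(element.tag)
            parts.append("".join(element.itertext()))
    # Assumption: pieces are joined with a space before conversion.
    text = " ".join(parts)
    # Runs of characters other than a-zA-Z0-9 become a single underscore.
    name = re.sub(r"[^a-zA-Z0-9]+", "_", text)
    # Truncate after character 64.
    return name[:64]
```

   For a topic whose title is "Introduction to the GB Standard" this sketch
   yields Introduction_to_the_GB_Standard.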

   <B> is the MD5 sum https://en.wikipedia.org/wiki/MD5 of the content being
   identified presented as a 32 character lowercase hexadecimal string drawn
   from: [a-f0-9]{32} . Presenting the MD5 sum in lowercase, last and
   therefore to the right has the beneficial side effect of allowing authors
   to visually ignore it and concentrate instead on the <G> component in the
   majority of cases where the <G> component happens to be (almost) unique.
   This arrangement makes the GB Standard name useful to both humans and
   computers.
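
   The <B> component and the assembled name might be computed as in the
   Python sketch below.  It assumes that the MD5 sum is taken over the raw
   bytes of the topic exactly as stored, with no normalization, which this
   document does not specify; the function names compute_b() and
   build_urn() are hypothetical.

```python
import hashlib


def compute_b(content: bytes) -> str:
    """Return the MD5 sum of the content as 32 lowercase hex digits."""
    # Assumption: the digest covers the raw bytes with no normalization.
    return hashlib.md5(content).hexdigest()


def build_urn(t: str, g: str, content: bytes) -> str:
    """Assemble urn:gbs:<T>:<G>:<B> from its three components."""
    return f"urn:gbs:{t}:{g}:{compute_b(content)}"
```

   Because hexdigest() always returns lowercase digits, the result already
   has the presentation form described above.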

Assignment:

   Identifier uniqueness considerations:

       Uniqueness is provided by the <B> component which, being an MD5 sum,
       is guaranteed to be identical for identical content and very
       probably different for differing content.

   Identifier persistence considerations:

       Persistence is guaranteed by the immutability over time of the MD5
       sum that forms the <B> component.

   Process of identifier assignment:

       <T> is currently set to "dita".

       <G> is chosen algorithmically depending on the value of <T> using the
       topic as input as described above.

       <B> is chosen by computing the MD5 sum of the content.

  For example:


 urn:gbs:dita:Introduction_to_the_GB_Standard:dddb7e2c29d2c8b9d87187fdf52a2702
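
   As an illustration, a name of this form can be split into its components
   with a regular expression, as in the Python sketch below.  The pattern
   follows the character sets given under Syntax; accepting any letter case
   and normalizing <B> to lowercase is an assumption based on the statement
   that case is immaterial.  The names GbsName and parse_gbs_urn() are
   hypothetical.

```python
import re
from typing import NamedTuple, Optional

# urn:gbs:<T>:<G>:<B> with the character sets given under Syntax.
GBS_URN = re.compile(
    r"urn:gbs:([a-z0-9_]+):([a-z0-9_]{1,64}):([0-9a-f]{32})",
    re.IGNORECASE,
)


class GbsName(NamedTuple):
    t: str
    g: str
    b: str


def parse_gbs_urn(urn: str) -> Optional[GbsName]:
    """Split a gbs URN into <T>, <G> and <B>, or return None if malformed."""
    match = GBS_URN.fullmatch(urn.strip())
    if match is None:
        return None
    t, g, b = match.groups()
    # Assumption: <B> is normalized to lowercase for later comparison.
    return GbsName(t, g, b.lower())
```

   Applied to the example above, the sketch returns the components dita,
   Introduction_to_the_GB_Standard and dddb7e2c29d2c8b9d87187fdf52a2702.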

Resolution:

    Content cannot be directly located by this standard.  However, URNs are
    not necessarily required to provide location services initially:
    providing a globally unique name is valuable in its own right because
    it encourages the development of, and convergence on, a small number of
    large, shared, inter-operable, global collections of topics within each
    of which the uniqueness of the URN is sufficient to provide a location
    service.

    Equivalence is determined by comparing (ignoring case) the <B>
    components of the two topics to be compared.  If they are equal the two
    topics are considered to be equal. Otherwise they are considered to be
    unequal even if the underlying content is in fact identical. The
    characteristics of the MD5 sum ensure that only a small number of
    topics will be unnecessarily duplicated as a result of such false
    negatives.
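
    The equivalence rule can be sketched in Python as below.  The function
    name same_topic() is hypothetical, and the sketch assumes that both
    arguments are well-formed names whose last colon-separated field is
    the <B> component.

```python
def same_topic(urn_a: str, urn_b: str) -> bool:
    """Compare two gbs URNs by their <B> components, ignoring case."""
    # <B> is the last colon-separated field; <T> and <G> are not compared.
    b_a = urn_a.rsplit(":", 1)[-1].lower()
    b_b = urn_b.rsplit(":", 1)[-1].lower()
    return b_a == b_b
```

    Two names with different <G> components but the same <B> component
    therefore compare as equal.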

Security and Privacy:

   The validity of the URN can be checked as follows:

   Check that the <T> component is "dita".

   Check that the <G> component is computed correctly as described above.

   Check that the <B> component matches the MD5 sum of the content.
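
   The three checks might be combined as in the following Python sketch.
   The function is_valid_gbs_urn() is a hypothetical name; the algorithm
   for <G> is passed in as the parameter compute_g rather than repeated
   here, and the case-insensitive comparisons together with the check of
   the "urn:gbs" prefix are assumptions that go beyond the list above.

```python
import hashlib
from typing import Callable


def is_valid_gbs_urn(
    urn: str,
    content: bytes,
    compute_g: Callable[[bytes], str],
) -> bool:
    """Check a gbs URN against the content it is supposed to name."""
    parts = urn.split(":")
    if len(parts) != 5:
        return False
    scheme, nid, t, g, b = parts
    # Assumption: the prefix is also checked, ignoring case.
    if scheme.lower() != "urn" or nid.lower() != "gbs":
        return False
    # Check that the <T> component is "dita".
    if t.lower() != "dita":
        return False
    # Check that the <G> component is computed correctly.
    if g.lower() != compute_g(content).lower():
        return False
    # Check that the <B> component matches the MD5 sum of the content.
    return b.lower() == hashlib.md5(content).hexdigest()
```

   A name fails validation as soon as the content changes, because the MD5
   sum of the new content no longer matches the <B> component.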

Inter-operability:

   The case of the letters chosen is immaterial and can be safely ignored in
   all computations on the proposed URN as only the <B> component is used
   for comparisons.

   Dita topics that do not contain ASCII characters suitable for
   constructing the <G> component will be accommodated by adding a new
   value to the list of values accepted by the <T> component and specifying
   the corresponding algorithm for computing the <G> component in an update
   to this document.

Additional Information:

   An implementation in Perl of the GB Standard as specified above when
   <T> is equal to "dita" is located at:

   https://metacpan.org/pod/Dita::GB::Standard

References:

   ASCII: https://en.wikipedia.org/wiki/ASCII

   Dita specification:
http://docs.oasis-open.org/dita/dita/v1.3/os/part2-tech-content/dita-v1.3-os-part2-tech-content.html

   MD5 Sum: https://en.wikipedia.org/wiki/MD5

   XML: https://en.wikipedia.org/wiki/XML

On Thu, Sep 26, 2019 at 5:15 AM Hakala, Juha E <juha.hakala@helsinki.fi>
wrote:

> Hello Philip,
>
>
>
> as regards this:
>
>
>
> Please tell me whether it is necessary for a *urn* to be able to uniquely
> locate files as well as classify them?
>
>
>
> URNs don’t have to provide resolution services, so the resolver (if
> any) does not need to know the location or locations of the identified
> resource, or to link the URN to these URL / URLs. You may want to mention
> in the urn:gbs namespace registration request that no resolution services
> are anticipated.
>
>
>
> It might be useful to add to the request the sentences below on document
> types and <T> values, with a note that for the time being only Dita
> documents are within scope. And some background information about Dita
> might be useful as well, for those who are not familiar with it.
>
>
>
> Best regards,
>
>
>
> Juha
>
>
>
> *From:* urn <urn-bounces@ietf.org> *On Behalf Of *Philip R Brenan
> *Sent:* Wednesday, 25 September 2019 21:46
> *To:* lars.svensson@web.de
> *Cc:* urn@ietf.org; Dale R. Worley <worley@ariadne.com>
> *Subject:* Re: [urn] gbs Name space identifier
>
>
>
> I have removed the link in question as the explanation of the derivation
> of the *<T>* component was deemed unsatisfactory.  Here is what I was
> trying to achieve:
>
>
>
> It is anticipated that the GB Standard represented by the *urn:* *gbs*
> name space could be usefully applied to a number of different document
> types, such as Dita, DocBook, Word, Html etc.  The <T> component is
> designed to separate these various name spaces. At the moment the only <T>
> in active use  is *dita* for Dita documents.  Within the Dita space the
> algorithm for computing the *<G>* component is included in:
>
>
>
> https://metacpan.org/pod/Dita::GB::Standard
>
>
>
> as gbStandardFileName().
>
>
>
> The computation of the <G> component is performed by examining the text
> between whichever of the following *xml* tags exist in a particular Dita
> document in the order in which they appear:
>
>
>
>  title mainbooktitle booktitlealt
>
>
>
> The text between these tags is used to form the <G> component after
> converting runs of all characters other than a-zA-Z0-9 to single
> underscores. This method was chosen because it produces the most readable
> names that are closely aligned with what authors expect to see as a file
> name.
>
>
>
> The purpose of the GB Standard is to control the explosion of duplicate
> Dita topics that tends to occur as documents evolve.  Typically when a new
> product is documented, the author takes the existing set of linked topic
> files comprising the documentation of the product, duplicates all of these
> files to preserve the linkage structure,  then makes a small number of
> changes to a few of the duplicated files, leaving the bulk of the topic
> files unchanged.  It is difficult to reuse the original topic files in situ
> because of the need to maintain the links between them.
>
>
>
> The GB Standard seeks to reduce this exponential growth of topic files by
> giving each topic a unique deterministic name so that links between topics
> can be expressed in a way that endures as the topic files are copied over
> time.
>
>
>
> As proposed, the GB Standard allows a server to quickly determine whether
> it has a copy of a file by computing the GB Standard name of an incoming
> file and comparing it to the names of all such files stored locally.   If
> the name already exists then that file is reused, if the name does not
> exist on the server then the server adds the incoming file to its list of
> files available.
>
>
>
> It is not the current intention to use the GB Standard name to locate off
> site copies of a file - as things stand this could only be achieved by
> querying each server known to store files in this manner in turn.  Please
> tell me whether it is necessary for a *urn* to be able to uniquely locate
> files as well as classify them?  If it is a requirement that a *urn *can
> be used to locate a topic file anywhere in the world then I need to rethink
> this aspect of the GB Standard and update my application for the *gbs*
> namespace accordingly.  If location is not necessarily required then the
> description of the computation of the <G> component and adequate
> documentation of the standard names in the <T> would seem to be the
> elements that need work to progress this application further?
>
>
>
> On Tue, Sep 24, 2019 at 12:51 PM <lars.svensson@web.de> wrote:
>
> > >    <T> is a string of one or more characters drawn from: [a-zA-Z0-9_]
> > >    which identifies the type of content from a list of types published
> > >    by the registrant at
> > >    https://metacpan.org/pod/Dita::GB::Standard::Types .
> >
> > I attempted to obtain the list of valid types at the given URL, but was
> > unsuccessful.  That page seemed to be a very top-level discussion of
> > "The GB Standard".
>
> That URL gives me a 404...
>
> Best,
>
> Lars
>
>
>
> --
>
> Thanks,
>
> Phil <https://opentokrtc.com/room/phil>
>
> Philip R Brenan <https://opentokrtc.com/room/phil>
>


-- 
Thanks,

Phil <https://opentokrtc.com/room/phil>

Philip R Brenan <https://opentokrtc.com/room/phil>