Per: https://tools.ietf.org/html/rfc8141

Namespace ID:

   gbs

Registration Information:

   Version: 1
   Date:    2019-10-23

Declared registrant of the namespace:

   Name:    Ryffine Inc.
   Address: 445 N Broadway, Denver, CO 80203
   Contact: Philip R Brenan
   E-mail:  philiprbrenan@gmail.com
   www:     http://www.ryffine.com

Purpose:

   To allow organizations to share content written in Xml to the Dita Standard:
   http://docs.oasis-open.org/dita/dita/v1.3/os/part2-tech-content/dita-v1.3-os-part2-tech-content.html
   without the exponential duplication that occurs without the name space
   standardization provided by a URN.

   Dita is a technical documentation standard promulgated by OASIS: a nonprofit
   consortium that drives the development, convergence and adoption of open
   standards for the global information society as noted at
   https://www.oasis-open.org/org

   A major goal of Dita is to enable authors to build documents from small
   reusable components called topics and then to share and reuse these topics
   via collections to enable other documents to be built more rapidly.

   As a consequence of the current addressing mechanism used to link Dita
   topics together within a document the number of such topics in existence
   tends to grow exponentially over time as documents evolve.  Typically when a
   new version of a product is documented the author takes the existing set of
   linked topic files comprising the documentation of the product, duplicates
   all of these files to preserve the complex linkage structure between these
   topics, then makes a small number of changes to a few of the duplicated
   files, leaving the bulk of the topic files unchanged.  At the moment it is
   difficult to reuse the original topic files in situ because of the need to
   maintain the links between them.

   The GB Standard as currently implemented at:

   https://metacpan.org/pod/Dita::GB::Standard

   seeks to reduce this exponential growth of topic files by giving each topic
   a unique deterministic name so that links between topics can be expressed in
   a way that endures as the topic files are copied over time.

   As proposed, the GB Standard allows a collection of Dita topics to quickly
   determine whether it already has a copy of an incoming topic by computing
   the GB Standard name of the topic and comparing it to the names of all such
   topics already collected locally ready for publication. If the name already
   exists then the incoming topic is discarded and the existing topic is
   reused; if the name does not exist in the collection then the collection
   adds the incoming topic to its list of topics available for publication.
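
   The check described above can be sketched as follows.  This is a minimal
   illustration, not the author's implementation: the class name
   TopicCollection and the method name offer are hypothetical, the collection
   is held in memory, and the GB Standard name is assumed to have been computed
   already by the caller.

```python
from typing import Dict


class TopicCollection:
    """Hypothetical in-memory collection of topics keyed by GB Standard name."""

    def __init__(self) -> None:
        self._topics: Dict[str, bytes] = {}

    def offer(self, name: str, content: bytes) -> bool:
        """Offer an incoming topic to the collection.

        Returns True if the topic was added, or False if a topic with the
        same name is already held, in which case the incoming copy is
        discarded and the existing one is reused.
        """
        if name in self._topics:
            return False
        self._topics[name] = content
        return True

    def __len__(self) -> int:
        return len(self._topics)
```

   Offering the same name twice leaves the collection with a single copy of
   the topic.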

   At the same time, the GB standard provides a human readable name for each
   topic which assists authors in selecting topics from each collection for
   reuse.

   The GB standard has been used by the applicant since 2016 to successfully
   build and maintain several large collections of topics.

   The purpose of this application then is to formalize the GB Standard naming
   convention as a globally recognized URN to enable standardized topic naming
   among organizations collaborating on the production of collections of
   technical documentation using Dita.  The proposed URN will not, as it
   stands, provide immediate global location of topics so named; instead, it
   provides a standardized method of querying one or more collections of such
   topics by both humans and computers in an efficient manner.


Syntax:

   urn: gbs : <T> : <G> : <B>

   where:

   <T> is a string of one or more characters drawn from: [a-zA-Z0-9_] which
   identifies the type of content being classified. At this point in time only
   one such type is in active use: the "dita" type. It is possible that further
   types might be required in the future; if so, this document will be updated
   to reflect these new types.

   <G> is a string of 1 to 64 characters drawn from: [a-zA-Z0-9_].  When <T>
   has the value: "dita" (currently the only permissible value), <G> is
   computed by concatenating the text between whichever of the following Xml
   tags exist in the Dita topic in the order in which they appear in that
   topic:

     <title>  <mainbooktitle>  <booktitlealt>

   The text between these tags is used to form the <G> component after
   converting runs of all characters other than a-zA-Z0-9 to single underscores
   and truncating after character 64 if the resulting string is longer than 64
   characters in length. This method was chosen based on operational experience
   as it produces readable names that are closely aligned with what authors
   expect to see as a topic name.
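
   A minimal sketch of the <G> computation, written from the description
   above rather than taken from the reference implementation.  The function
   name g_component is hypothetical.  Two points are assumptions because the
   text does not settle them: the texts of multiple title tags are joined with
   a single space before conversion, and leading or trailing underscores
   produced by the conversion are kept.

```python
import re
import xml.etree.ElementTree as ET

# Tags whose text contributes to the <G> component, per the description above.
TITLE_TAGS = {"title", "mainbooktitle", "booktitlealt"}


def g_component(topic_xml: str) -> str:
    """Compute a <G> component for a Dita topic supplied as an XML string."""
    root = ET.fromstring(topic_xml)
    # root.iter() walks elements in document order, so titles are collected
    # in the order in which they appear in the topic.
    titles = ["".join(e.itertext()) for e in root.iter() if e.tag in TITLE_TAGS]
    # Assumption: titles are joined with a single space before conversion.
    text = " ".join(titles)
    # Runs of characters other than a-zA-Z0-9 become a single underscore,
    # then the result is truncated after character 64.
    return re.sub(r"[^a-zA-Z0-9]+", "_", text)[:64]
```

   For a topic whose only title is "Introduction to the GB Standard" this
   yields Introduction_to_the_GB_Standard.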

   <B> is the MD5 sum https://en.wikipedia.org/wiki/MD5 of the content being
   identified presented as a 32 character hexadecimal string represented by
   characters drawn from:

     a-fA-F0-9

   with uppercase and lowercase versions of a character being considered
   equivalent.

   Where possible, the <B> component should be presented to humans in lowercase
   as operational experience indicates that this makes it easier for humans to
   both locate and ignore the <B> component and concentrate instead on the <G>
   component, which is usually much easier for humans to remember, say aloud
   and thus reason about than the <B> component.
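
   A minimal sketch of the <B> computation using Python's standard hashlib
   module.  The function names are hypothetical, and it is an assumption that
   "the content being identified" means the raw bytes of the topic file.

```python
import hashlib


def b_component(content: bytes) -> str:
    """Return the MD5 sum of the content as 32 lowercase hexadecimal characters."""
    return hashlib.md5(content).hexdigest()


def same_b(b1: str, b2: str) -> bool:
    """Compare two <B> components, treating upper and lower case as equivalent."""
    return b1.lower() == b2.lower()
```

   hexdigest() already returns lowercase, which matches the preferred
   presentation to humans.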

Assignment:

   Identifier uniqueness considerations:

       Uniqueness rests on the <B> component being an MD5 sum, which is
       guaranteed to be identical for identical content and very probably
       different for differing content.

   Identifier persistence considerations:

       Persistence is guaranteed by the immutability over time of the MD5 sum
       of the <B> component.

   Process of identifier assignment:

       <T> is currently set to "dita".

       <G> is chosen algorithmically depending on the value of <T> using the
       topic as input as described above.

       <B> is chosen by computing the MD5 sum of the content.

   For example:

       urn:gbs:dita:Introduction_to_the_GB_Standard:dddb7e2c29d2c8b9d87187fdf52a2702
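
   The syntax can be checked mechanically.  The sketch below is illustrative
   only: the names URN_RE, parse_urn and build_urn are hypothetical, and the
   pattern follows the character sets and lengths given in the Syntax section,
   with the "urn" and "gbs" labels matched without regard to case as RFC 8141
   allows.

```python
import re
from typing import Dict

# Pattern built from the Syntax section: <T> is one or more of [a-zA-Z0-9_],
# <G> is 1 to 64 of [a-zA-Z0-9_], <B> is 32 hexadecimal characters.
URN_RE = re.compile(
    r"^(?i:urn:gbs):"
    r"(?P<t>[a-zA-Z0-9_]+):"
    r"(?P<g>[a-zA-Z0-9_]{1,64}):"
    r"(?P<b>[a-fA-F0-9]{32})$"
)


def parse_urn(urn: str) -> Dict[str, str]:
    """Split a gbs URN into its <T>, <G> and <B> components."""
    m = URN_RE.match(urn)
    if m is None:
        raise ValueError(f"not a well-formed gbs URN: {urn!r}")
    return m.groupdict()


def build_urn(t: str, g: str, b: str) -> str:
    """Assemble a gbs URN, rejecting components that break the syntax."""
    urn = f"urn:gbs:{t}:{g}:{b}"
    parse_urn(urn)  # raises ValueError if any component is malformed
    return urn
```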

Resolution:

    Content cannot be directly located by this standard.  However, URNs are
    not necessarily required to provide location services initially: providing
    a globally unique name is valuable in its own right because it encourages
    the development of, and convergence on, a small number of large, shared,
    inter-operable, global collections of topics within each of which the
    uniqueness of the URN is sufficient to provide a location service.

    Equivalence is determined by comparing (ignoring case) just the <B>
    components of the two topics to be compared.  If they are equal the two
    topics are considered to be equal, even if, as a result of an MD5 collision,
    the content of the underlying documents is in fact different. The
    characteristics of the MD5 sum make such an occurrence extremely unlikely;
    see, for example, Mead:

    "Unique File Identification in the National Software
     Reference Library"

    at:

    https://www.nist.gov/sites/default/files/draft-060530.pdf

    Authors who have concerns over the possible impact of an MD5 collision
    on their work should not use this name space.
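
    The equivalence rule above can be sketched as follows.  The function
    names are hypothetical, and the sketch assumes both inputs are already
    well-formed URNs in this name space, so that the <B> component is
    whatever follows the final colon.

```python
def b_of(urn: str) -> str:
    """Return the <B> component of a gbs URN, folded to lowercase."""
    return urn.rsplit(":", 1)[-1].lower()


def equivalent(urn_a: str, urn_b: str) -> bool:
    """Two topics are equivalent when their <B> components match, ignoring case.

    The <T> and <G> components play no part in the comparison.
    """
    return b_of(urn_a) == b_of(urn_b)
```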

Security and Privacy:

   Access to the content of the topic named by the URN is required to check the
   validity of the URN. The validity of the URN for a topic can be checked as
   follows:

   Check that the <T> component is "dita".

   Check that the <G> component is computed correctly as described above.

   Check that the <B> component matches the MD5 sum of the content.
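
   The three checks can be combined into one routine.  This is a sketch under
   stated assumptions, not the reference implementation: the function name
   is_valid_urn is hypothetical, the content is taken to be the raw bytes of
   the topic file, multiple titles are assumed to be joined with a single
   space, and the <G> component is compared exactly, including case.

```python
import hashlib
import re
import xml.etree.ElementTree as ET

TITLE_TAGS = {"title", "mainbooktitle", "booktitlealt"}


def is_valid_urn(urn: str, content: bytes) -> bool:
    """Check a gbs URN against the topic content it claims to name."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "gbs":
        return False
    t, g, b = parts[2:]

    # Check 1: the <T> component is "dita".
    if t != "dita":
        return False

    # Check 2: the <G> component matches the titles found in the content.
    try:
        root = ET.fromstring(content)
    except ET.ParseError:
        return False
    titles = ["".join(e.itertext()) for e in root.iter() if e.tag in TITLE_TAGS]
    expected_g = re.sub(r"[^a-zA-Z0-9]+", "_", " ".join(titles))[:64]
    if g != expected_g:
        return False

    # Check 3: the <B> component matches the MD5 sum, ignoring case.
    return b.lower() == hashlib.md5(content).hexdigest()
```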

Inter-operability:

   For many computations, the case of the letters in the URN is immaterial and
   can be safely ignored because only the <B> component is authoritative:
   although the preferred presentation of the <B> component is in lowercase to
   minimize its visual impact on human readers, the <B> component may be
   represented using letters of either case.

   The MD5 sum represented by the <B> component can undergo degradation in
   copying, storage or transmission yet still be recoverable by querying
   significant collections of topics for topics with similar <B> components
   given that the anticipated size of the topic space is of order 1e10 versus
   an MD5 space size of order 1e38.
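
   One way such a recovery query might work is sketched below.  The source
   does not define "similar"; this sketch assumes it means a small number of
   differing characters at the same positions (Hamming distance) between
   strings of equal length.  The function names and the max_errors threshold
   are hypothetical.

```python
from typing import Iterable, Optional


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def recover_b(damaged: str, known: Iterable[str], max_errors: int = 4) -> Optional[str]:
    """Find the unique known <B> component closest to a damaged one.

    Returns None when nothing is close enough or when the closest match
    is ambiguous.
    """
    damaged = damaged.lower()
    candidates = sorted(
        (hamming(damaged, k), k)
        for k in {k.lower() for k in known}
        if len(k) == len(damaged)
    )
    if not candidates or candidates[0][0] > max_errors:
        return None
    if len(candidates) > 1 and candidates[1][0] == candidates[0][0]:
        return None  # two equally close matches: refuse to guess
    return candidates[0][1]
```

   Because the anticipated topic space is so sparse relative to the MD5
   space, two unrelated <B> components are very unlikely to lie within a few
   characters of each other.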

   Ideally, the <G> component should be presented in the case designated by the
   original author of the topic.  In cases where this is not possible, the <G>
   component can undergo degradation and still remain useful, for example: high
   degradation rates in the <G> component have been noticed when the <G>
   component is spoken out loud by people collaborating in a shared work space
   on fewer than 1e2 topics. The <G> component has a space size of at least 1e6
   as evinced by the number of Wikipedia articles in English:

   https://www.wikipedia.org/

   Operational experience has confirmed, so far, that the <G> component is
   capable of tolerating the significant degradation of case and spelling that
   normally occurs in human speech.

   If the <G> component of two topics is intentionally identical or identical
   after degradation then the identity of a specific topic can be confirmed by
   saying the first few characters of its <B> component using a phonetic
   alphabet, such as the one used by NATO:

   https://en.wikipedia.org/wiki/NATO_phonetic_alphabet
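
   A small sketch of spelling out the start of a <B> component.  The function
   name spell_b_prefix and the default of four characters are hypothetical
   ("the first few" is not quantified in the text), and plain English digit
   names are used for the digits as a simplifying assumption.

```python
# Code words for the sixteen hexadecimal characters.  The letters use the
# NATO phonetic alphabet; the digit names are plain English (an assumption).
CODE_WORDS = {
    "a": "Alfa", "b": "Bravo", "c": "Charlie",
    "d": "Delta", "e": "Echo", "f": "Foxtrot",
    "0": "Zero", "1": "One", "2": "Two", "3": "Three", "4": "Four",
    "5": "Five", "6": "Six", "7": "Seven", "8": "Eight", "9": "Nine",
}


def spell_b_prefix(b: str, count: int = 4) -> str:
    """Spell the first `count` characters of a <B> component as code words."""
    return " ".join(CODE_WORDS[ch] for ch in b.lower()[:count])
```

   For the <B> component dddb7e2c29d2c8b9d87187fdf52a2702 the first four
   characters are spoken as "Delta Delta Delta Bravo".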

   It is anticipated that names from the proposed name space will be embedded
   in XML topic references, file names, URL queries and commands entered via the
   command line.  XML is sensitive to spaces and the following characters:

     <>'"=&

   File systems are often sensitive to file names containing:

     :/\.

   The query portion of a URL is sensitive to:

     &=#

   The command line is sensitive to spaces and:

     "'*.?\$()[]{}

   The <T>, <G>, <B> components avoid these characters to facilitate
   inter-operability between these systems.

   To facilitate the construction of file names and URLs containing references
   to topics named by this proposed name space, the formal name assigned by
   this proposal may be intentionally degraded by omitting the words 'urn' and
   'gbs', omitting the <T> component, replacing the colon between the <G> and
   <B> components with an underscore and adding a file name extension if the
   formal URN can be reliably recovered from the degraded version in the
   context within which the degraded version is being used. In such a context,
   a formal URN:

     urn:gbs:dita:Introduction_to_the_GB_Standard:dddb7e2c29d2c8b9d87187fdf52a2702

   may be intentionally degraded to:

     Introduction_to_the_GB_Standard_dddb7e2c29d2c8b9d87187fdf52a2702.xml

   Other acceptable degradations will be published as updates to this document.
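
   The file name degradation above, and its reversal, can be sketched as
   follows.  The names degrade, recover and FILE_RE are hypothetical.
   Recovery assumes the context supplies the omitted <T> component (here
   defaulting to "dita") and relies on the <B> component always being exactly
   32 hexadecimal characters, so the underscore that separates <G> from <B>
   can be found even though <G> itself may contain underscores.

```python
import re


def degrade(urn: str, extension: str = "xml") -> str:
    """Turn a formal gbs URN into a file name of the form <G>_<B>.<extension>."""
    _urn, _nid, _t, g, b = urn.split(":")
    return f"{g}_{b}.{extension}"


# <G>, an underscore, exactly 32 hexadecimal characters, then an extension.
FILE_RE = re.compile(r"^(?P<g>[a-zA-Z0-9_]+)_(?P<b>[a-fA-F0-9]{32})\.[a-zA-Z0-9]+$")


def recover(file_name: str, t: str = "dita") -> str:
    """Rebuild the formal URN from a degraded file name."""
    m = FILE_RE.match(file_name)
    if m is None:
        raise ValueError(f"not a degraded gbs name: {file_name!r}")
    return f"urn:gbs:{t}:{m['g']}:{m['b']}"
```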

   Dita topics that do not contain ASCII characters suitable for constructing
   the <G> component will be accommodated by adding a new value to the list of
   values accepted by the <T> component and specifying the corresponding
   algorithm for computing the <G> component in an update to this document.

Additional Information:

   An implementation in Perl of the proposed name space when <T> is equal to
   "dita" is located at:

   https://metacpan.org/pod/Dita::GB::Standard

References:

   ASCII: https://en.wikipedia.org/wiki/ASCII

   CRC-32: https://en.wikipedia.org/wiki/Cyclic_redundancy_check#CRC-32_algorithm

   Dita: http://docs.oasis-open.org/dita/dita/v1.3/os/part2-tech-content/dita-v1.3-os-part2-tech-content.html

   File extension: https://en.wikipedia.org/wiki/List_of_filename_extensions

   MD4: https://en.wikipedia.org/wiki/MD4

   MD5: https://en.wikipedia.org/wiki/MD5

   Mead: https://www.nist.gov/sites/default/files/draft-060530.pdf

   NATO: https://en.wikipedia.org/wiki/NATO_phonetic_alphabet

   SHA-2: https://en.wikipedia.org/wiki/SHA-2

   URL: https://en.wikipedia.org/wiki/URL

   Wikipedia: https://www.wikipedia.org/

   XML: https://en.wikipedia.org/wiki/XML