Re: [urn] gbs Name space identifier

Philip R Brenan <philiprbrenan@gmail.com> Thu, 03 October 2019 23:33 UTC

From: Philip R Brenan <philiprbrenan@gmail.com>
Date: Fri, 04 Oct 2019 00:33:02 +0100
Message-ID: <CALhwFR=LOoennVu3Vvo7dzDVW+kJRBgV9ei1pYZmt-F1YhxMtg@mail.gmail.com>
To: "Hakala, Juha E" <juha.hakala@helsinki.fi>
Cc: "lars.svensson@web.de" <lars.svensson@web.de>, "urn@ietf.org" <urn@ietf.org>, "Dale R. Worley" <worley@ariadne.com>
Content-Type: multipart/mixed; boundary="000000000000b88f8a05940a029a"
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/Qb-fcLvO58ubQF4SC5k166dPUTQ>
Subject: Re: [urn] gbs Name space identifier
Precedence: list

Hi *Juha*:

I would be pleased to elevate this proposal to ISO status and to play
whatever role is required of me in that process, if there is someone in your
group who is willing to guide me through it.  To that end I have made the
suggested edit, removing the superfluous *h* from the proposed standard, and
attached the result as a file for easy reference.  Please tell me how I
might progress this application further.


On Wed, Oct 2, 2019 at 5:42 AM Hakala, Juha E <juha.hakala@helsinki.fi>
wrote:

> Hello Philip,
>
>
>
> as far as I am concerned, the application does not require any further
> editing, except that you should remove the extra “h” from
>
>
>
> hhttps://metacpan.org/pod/Dita::GB::Standard
>
>
>
> IMO one of the main challenges with namespace registration requests has
> been that the authors themselves usually know well the intended scope of
> the proposed namespace. Alas, it may be difficult to communicate this
> information to reviewers who might not be familiar with the context. The
> text below is now pretty good at explaining the initial usage of urn:gbs
> identifiers.
>
>
>
> I think it is a good idea that there is also room for growth. In ISO TC
> 46/SC 9 (which deals with ISO standard identifiers like ISSN, ISBN and DOI)
> we recently evaluated a new work item proposal for an identifier that would
> have been machine generated from the content of the identified textual
> resource like urn:gbs:dita’s. Alas, the proposed syntax and algorithms were
> not rigorous enough, and the proposal was not approved.
>
>
>
> As an aside, in order to make urn:gbs:dita’s ISO standard identifiers, it
> would only be necessary to standardize RFC 8141 in ISO. Would you have any
> interest in elevating the status of urn:gbs like that? Since the IETF is a
> Category A liaison with ISO TC 46, the fast-track process could be used to
> quickly create an ISO standard which is identical to the original one.
>
>
>
> All the best,
>
>
>
> Juha
>
>
>
> *From:* Philip R Brenan <philiprbrenan@gmail.com>
> *Sent:* Sunday, 29 September 2019 16:57
> *To:* Hakala, Juha E <juha.hakala@helsinki.fi>
> *Cc:* lars.svensson@web.de; urn@ietf.org; Dale R. Worley <worley@ariadne.com>
> *Subject:* Re: [urn] gbs Name space identifier
>
>
>
> Hi *Juha*:
>
>
>
> Thank you for your helpful comments.
>
>
>
> I have updated this document based on the email discussion as follows:
>
> 1 - Clarified that the principal purpose of the URN being applied for is
> naming topics rather than locating them.
>
> 2 - Specified that currently there is only one <T> type active, namely
> "dita".
>
> 3 - Specified the computation of the <G> component in this document rather
> than by reference elsewhere.
>
> 4 - Expanded the discussion of the purpose of the URN.
>
> 5 - Expanded the discussion on naming versus location and why naming is so
> useful in this context.
>
> 6 - Expanded the discussion of inter-operability.
>
> Please let me know which areas of this application might require further
> elaboration.
>
> Per: https://tools.ietf.org/html/rfc8141
>
> Namespace ID:
>
>    gbs
>
> Registration Information:
>
>    Version: 1
>    Date:    2019-09-27
>
> Declared registrant of the namespace:
>
>    Name:    Ryffine Inc.
>    Address: 445 N Broadway, Denver, CO 80203
>    Contact: Philip R Brenan
>    E-mail:  philiprbrenan@gmail.com
>    www:     http://www.ryffine.com
>
> Purpose:
>
>    To allow organizations to share content written in Xml to the Dita
>    Standard:
>
>    http://docs.oasis-open.org/dita/dita/v1.3/os/part2-tech-content/dita-v1.3-os-part2-tech-content.html
>
>    without the exponential duplication that occurs without the name space
>    standardization provided by a URN.
>
>    Dita is a technical documentation standard promulgated by OASIS: a
>    nonprofit consortium that drives the development, convergence and
>    adoption of open standards for the global information society as noted
>    at https://www.oasis-open.org/org
>
>    A major goal of Dita is to enable authors to build documents from small
>    reusable components called topics and then to share and reuse these
>    topics via collections to enable other documents to be built more
>    rapidly.
>
>    As a consequence of the current addressing mechanism used to link Dita
>    topics together within a document the number of such topics in existence
>    tends to grow exponentially over time as documents evolve.  Typically
>    when a new version of a product is documented the author takes the
>    existing set of linked topic files comprising the documentation of the
>    product, duplicates all of these files to preserve the complex linkage
>    structure between these topics, then makes a small number of changes to
>    a few of the duplicated files, leaving the bulk of the topic files
>    unchanged.  At the moment it is difficult to reuse the original topic
>    files in situ because of the need to maintain the links between them.
>
>    The GB Standard as currently implemented at:
>
>    hhttps://metacpan.org/pod/Dita::GB::Standard
>
>    seeks to reduce this exponential growth of topic files by giving each
>    topic a unique deterministic name so that links between topics can be
>    expressed in a way that endures as the topic files are copied over time.
>
>    As proposed, the GB Standard allows a collection of Dita topics to
>    quickly determine whether it already has a copy of an incoming topic by
>    computing the GB Standard name of the topic and comparing it to the
>    names of all such topics already collected locally ready for
>    publication.  If the name already exists then the incoming topic is
>    discarded and the existing topic is reused; if the name does not exist
>    in the collection then the collection adds the incoming topic to its
>    list of topics available for publication.
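The check-then-add behaviour described above might be sketched in Python as
follows.  This is only an illustration, not the Perl implementation: the
function name receive_topic and the use of a plain dictionary as the
"collection" are assumptions made for the example, and the GB Standard name
is taken as already computed by the caller.

```python
def receive_topic(collection: dict[str, bytes], name: str, content: bytes) -> bytes:
    """Return the copy of the topic that the collection holds under `name`.

    `collection` maps GB Standard names to topic content (hypothetical layout).
    """
    if name in collection:
        # Name already known: discard the incoming copy, reuse the existing one.
        return collection[name]
    # Name not seen before: the incoming topic becomes available for publication.
    collection[name] = content
    return content
```

Because the lookup is by name only, a collection never needs to compare topic
content directly once the name has been computed.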
>
>    At the same time, the GB standard provides a human readable name for
>    each topic which assists authors in selecting topics from each
>    collection for reuse.
>
>    The GB standard has been used by the applicant since 2016 to
>    successfully build and maintain several large collections of topics.
>
>    The purpose of this application then is to formalize the GB Standard
>    naming convention as a globally recognized URN to enable standardized
>    topic naming among organizations collaborating on the production of
>    collections of technical documentation using Dita.  The proposed URN
>    will not, as it stands, provide immediate global location of topics so
>    named; instead, it provides a standardized method of querying one or
>    more collections of such topics by both humans and computers in an
>    efficient manner.
>
>
> Syntax:
>
>    urn: gbs : <T> : <G> : <B>
>
>    where:
>
>    <T> is a string of one or more characters drawn from: [a-zA-Z0-9_] which
>    identifies the type of content being classified. At this point in time
>    only one such type is in active use: the "dita" type. It is possible
>    that further types might be required in the future; if so, this document
>    will be updated to reflect these new types.
>
>    <G> is a string of 1 to 64 characters drawn from: [a-zA-Z0-9_].  When
>    <T> has the value: "dita" (currently the only permissible value), <G> is
>    computed by concatenating the text between whichever of the following
>    Xml tags exist in the Dita topic in the order in which they appear in
>    that topic:
>
>      <title>  <mainbooktitle>  <booktitlealt>
>
>    The text between these tags is used to form the <G> component after
>    converting runs of all characters other than a-zA-Z0-9 to single
>    underscores and truncating after character 64 if the resulting string is
>    longer than 64 characters in length. This method was chosen based on
>    operational experience as it produces readable names that are closely
>    aligned with what authors expect to see as a topic name.
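As a rough illustration of the <G> computation, here is a Python sketch.  It
is not the Perl implementation: the function name g_component is
hypothetical, and it assumes that every matching element in the topic
contributes its text and that the pieces are joined with no separator, which
the text above does not spell out.

```python
import re
import xml.etree.ElementTree as ET

# Tags whose text contributes to <G>, as listed in the registration.
TITLE_TAGS = {"title", "mainbooktitle", "booktitlealt"}

def g_component(topic_xml: str) -> str:
    """Derive a <G> string from the title-like elements of a Dita topic."""
    root = ET.fromstring(topic_xml)
    pieces = []
    for element in root.iter():  # iter() walks the tree in document order
        if element.tag in TITLE_TAGS:
            pieces.append("".join(element.itertext()))
    text = "".join(pieces)  # assumption: concatenated without a separator
    # Runs of anything other than a-zA-Z0-9 become one underscore, then truncate.
    return re.sub(r"[^a-zA-Z0-9]+", "_", text)[:64]
```

For a topic whose only title is "Introduction to the GB Standard" this yields
Introduction_to_the_GB_Standard, matching the <G> part of the example name
given under Assignment.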
>
>    <B> is the MD5 sum https://en.wikipedia.org/wiki/MD5 of the content
>    being identified presented as a 32 character lowercase hexadecimal
>    string drawn from: [a-z0-9]{32} . Presenting the MD5 sum in lowercase,
>    last and therefore to the right has the beneficial side effect of
>    allowing authors to visually ignore it and concentrate instead on the
>    <G> component in the majority of cases where the <G> component happens
>    to be (almost) unique.  This arrangement makes the GB Standard name
>    useful to both humans and computers.
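The <B> component can be illustrated with Python's standard hashlib module.
The function name b_component is hypothetical, and the sketch assumes the
"content" is the raw bytes of the topic file with no normalization, which
the text above does not specify.

```python
import hashlib

def b_component(content: bytes) -> str:
    """Return the MD5 sum of the content as 32 lowercase hexadecimal characters."""
    # hexdigest() already returns lowercase hex digits.
    return hashlib.md5(content).hexdigest()
```

The result always uses the digits 0-9 and the letters a-f, so it falls
within the [a-z0-9]{32} pattern given above.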
>
> Assignment:
>
>    Identifier uniqueness considerations:
>
>        Uniqueness is provided by the <B> component: being an MD5 sum, it is
>        guaranteed to be identical for identical content and very probably
>        different for differing content.
>
>    Identifier persistence considerations:
>
>        Persistence is guaranteed by the immutability over time of the MD5
>        sum of the <B> component.
>
>    Process of identifier assignment:
>
>        <T> is currently set to "dita".
>
>        <G> is chosen algorithmically depending on the value of <T> using
>        the topic as input as described above.
>
>        <B> is chosen by computing the MD5 sum of the content.
>
>   For example:
>
>
>  urn:gbs:dita:Introduction_to_the_GB_Standard:dddb7e2c29d2c8b9d87187fdf52a2702
>
> Resolution:
>
>     Content cannot be directly located by this standard.  However, URNs
>     are not necessarily required to provide location services initially:
>     providing a globally unique name is valuable in its own right because
>     it encourages the development of, and convergence on, a small number of
>     large, shared, inter-operable, global collections of topics within each
>     of which the uniqueness of the URN is sufficient to provide a location
>     service.
>
>     Equivalence is determined by comparing (ignoring case) the <B>
>     components of the two topics to be compared.  If they are equal the two
>     topics are considered to be equal. Otherwise they are considered to be
>     unequal even if the underlying content is in fact identical. The
>     characteristics of the MD5 sum ensure that only a small number of
>     topics will be unnecessarily duplicated as a result of such false
>     positive equivalences.
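The equivalence rule can be sketched as a comparison of the final
colon-separated component of two names, ignoring case.  The function name
same_topic is hypothetical; the sketch relies on <B> being the last
component and on none of the components containing a colon.

```python
def same_topic(urn_a: str, urn_b: str) -> bool:
    """Compare two urn:gbs names by their <B> components only, ignoring case."""
    b_a = urn_a.rsplit(":", 1)[-1].lower()  # text after the last colon
    b_b = urn_b.rsplit(":", 1)[-1].lower()
    return b_a == b_b
```

Note that two names with different <G> components still compare as equal
under this rule when their <B> components agree.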
>
> Security and Privacy:
>
>    The validity of the URN can be checked as follows:
>
>    Check that the <T> component is "dita".
>
>    Check that the <G> component is computed correctly as described above.
>
>    Check that the <B> component matches the MD5 sum of the content.
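The three checks above might be combined as in the following Python sketch.
It is an illustration under stated assumptions rather than the Perl
implementation: the function name check_gbs_urn is hypothetical, the
expected <G> value is assumed to have been computed separately from the
topic titles, and the content is assumed to be the raw bytes of the topic.

```python
import hashlib

def check_gbs_urn(urn: str, content: bytes, expected_g: str) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "gbs":
        return ["not a urn:gbs name with <T>, <G> and <B> components"]
    t, g, b = parts[2:]
    problems = []
    if t != "dita":
        problems.append("<T> is not 'dita'")
    if g != expected_g:
        problems.append("<G> does not match the title-derived value")
    if b.lower() != hashlib.md5(content).hexdigest():
        problems.append("<B> does not match the MD5 sum of the content")
    return problems
```

Returning every failed check, rather than stopping at the first, lets a
collection report exactly why an incoming name was rejected.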
>
> Inter-operability:
>
>    The case of the letters chosen is immaterial and can be safely ignored
>    in all computations on the proposed URN as only the <B> component is
>    used for comparisons.
>
>    Dita topics that do not contain ASCII characters suitable for
>    constructing the <G> component will be accommodated by adding a new
>    value to the list of values accepted by the <T> component and specifying
>    the corresponding algorithm for computing the <G> component in an update
>    to this document.
>
> Additional Information:
>
>    An implementation in Perl of the GB Standard as specified above when
>    <T> is equal to "dita" is located at:
>
>    https://metacpan.org/pod/Dita::GB::Standard
>
> References:
>
>    ASCII: https://en.wikipedia.org/wiki/ASCII
>
>    Dita specification:
> http://docs.oasis-open.org/dita/dita/v1.3/os/part2-tech-content/dita-v1.3-os-part2-tech-content.html
>
>    MD5 Sum: https://en.wikipedia.org/wiki/MD5
>
>    XML: https://en.wikipedia.org/wiki/XML
>
>
>
> On Thu, Sep 26, 2019 at 5:15 AM Hakala, Juha E <juha.hakala@helsinki.fi>
> wrote:
>
> Hello Philip,
>
>
>
> as regards this:
>
>
>
> Please tell me whether it is necessary for a *urn* to be able to uniquely
> locate files as well as classify them?
>
>
>
> URNs don’t have to provide resolution services, so the resolver (if
> any) does not need to know the location or locations of the identified
> resource, or to link the URN to its URL or URLs. You may want to mention
> in the urn:gbs namespace registration request that no resolution services
> are anticipated.
>
>
>
> It might be useful to add to the request the sentences below on document
> types and <T> values, with a note that for the time being only Dita
> documents are within scope. And some background information about Dita
> might be useful as well, for those who are not familiar with it.
>
>
>
> Best regards,
>
>
>
> Juha
>
>
>
> *From:* urn <urn-bounces@ietf.org> *On Behalf Of* Philip R Brenan
> *Sent:* Wednesday, 25 September 2019 21:46
> *To:* lars.svensson@web.de
> *Cc:* urn@ietf.org; Dale R. Worley <worley@ariadne.com>
> *Subject:* Re: [urn] gbs Name space identifier
>
>
>
> I have removed the link in question as the explanation of the derivation
> of the *<T>* component was deemed unsatisfactory.  Here is what I was
> trying to achieve:
>
>
>
> It is anticipated that the GB Standard represented by the *urn:* *gbs*
> name space could be usefully applied to a number of different document
> types, such as Dita, DocBook, Word, Html etc.  The <T> component is
> designed to separate these various name spaces. At the moment the only <T>
> in active use is *dita* for Dita documents.  Within the Dita space the
> algorithm for computing the *<G>* component is included in:
>
>
>
> https://metacpan.org/pod/Dita::GB::Standard
>
>
>
> as gbStandardFileName().
>
>
>
> The computation of the <G> component is performed by examining the text
> between whichever of the following *xml* tags exist in a particular Dita
> document in the order in which they appear:
>
>
>
>  title mainbooktitle booktitlealt
>
>
>
> The text between these tags is used to form the <G> component after
> converting runs of all characters other than a-zA-Z0-9 to single
> underscores. This method was chosen because it produces the most readable
> names that are closely aligned with what authors expect to see as a file
> name.
>
>
>
> The purpose of the GB Standard is to control the explosion of duplicate
> Dita topics that tends to occur as documents evolve.  Typically when a new
> product is documented, the author takes the existing set of linked topic
> files comprising the documentation of the product, duplicates all of these
> files to preserve the linkage structure,  then makes a small number of
> changes to a few of the duplicated files, leaving the bulk of the topic
> files unchanged.  It is difficult to reuse the original topic files in situ
> because of the need to maintain the links between them.
>
>
>
> The GB Standard seeks to reduce this exponential growth of topic files by
> giving each topic a unique deterministic name so that links between topics
> can be expressed in a way that endures as the topic files are copied over
> time.
>
>
>
> As proposed, the GB Standard allows a server to quickly determine whether
> it has a copy of a file by computing the GB Standard name of an incoming
> file and comparing it to the names of all such files stored locally.  If
> the name already exists then that file is reused; if the name does not
> exist on the server then the server adds the incoming file to its list of
> files available.
>
>
>
> It is not the current intention to use the GB Standard name to locate off
> site copies of a file - as things stand this could only be achieved by
> querying each server known to store files in this manner in turn.  Please
> tell me whether it is necessary for a *urn* to be able to uniquely locate
> files as well as classify them?  If it is a requirement that a *urn *can
> be used to locate a topic file anywhere in the world then I need to rethink
> this aspect of the GB Standard and update my application for the *gbs*
> namespace accordingly.  If location is not necessarily required then the
> description of the computation of the <G> component and adequate
> documentation of the standard names in the <T> would seem to be the
> elements that need work to progress this application further?
>
>
> On Tue, Sep 24, 2019 at 12:51 PM <lars.svensson@web.de> wrote:
>
> > >    <T> is a string of one or more characters drawn from: [a-zA-Z0-9_]
> > >    which identifies the type of content from a list of types published
> > >    by the registrant at https://metacpan.org/pod/Dita::GB::Standard::Types .
> >
> > I attempted to obtain the list of valid types at the given URL, but was
> > unsuccessful.  That page seemed to be a very top-level discussion of
> > "The GB Standard".
>
> That URL gives me a 404...
>
> Best,
>
> Lars
>
>
>
> --
>
> Thanks,
>
> Phil <https://opentokrtc.com/room/phil>
>
> Philip R Brenan <https://opentokrtc.com/room/phil>
>


-- 
Thanks,

Phil <https://opentokrtc.com/room/phil>

Philip R Brenan <https://opentokrtc.com/room/phil>

Attachment: gbStandardUrnRegistration.txt
