Re: [urn] gbs Name space identifier
Philip R Brenan <philiprbrenan@gmail.com> Thu, 03 October 2019 23:33 UTC
From: Philip R Brenan <philiprbrenan@gmail.com>
Date: Fri, 04 Oct 2019 00:33:02 +0100
Message-ID: <CALhwFR=LOoennVu3Vvo7dzDVW+kJRBgV9ei1pYZmt-F1YhxMtg@mail.gmail.com>
To: "Hakala, Juha E" <juha.hakala@helsinki.fi>
Cc: "lars.svensson@web.de" <lars.svensson@web.de>, "urn@ietf.org" <urn@ietf.org>, "Dale R. Worley" <worley@ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/Qb-fcLvO58ubQF4SC5k166dPUTQ>
Subject: Re: [urn] gbs Name space identifier
Hi *Juha*:

I would be pleased to elevate this proposal to ISO status and to play whatever role is required of me in that process, if there is someone in your group who is willing to guide my actions to make this happen. To that end I have made the suggested edit to remove the superfluous *h* from the proposed standard and attached the result as a file for easy reference.

Please tell me how I might progress this application further.

On Wed, Oct 2, 2019 at 5:42 AM Hakala, Juha E <juha.hakala@helsinki.fi> wrote:

> Hello Philip,
>
> as far as I am concerned, the application does not require any further editing, except that you should remove the extra “h” from
>
> hhttps://metacpan.org/pod/Dita::GB::Standard
>
> IMO one of the main challenges with namespace registration requests has been that the authors themselves usually know well the intended scope of the proposed namespace. Alas, it may be difficult to communicate this information to reviewers who might not be familiar with the context. The text below is now pretty good at explaining the initial usage of urn:gbs identifiers.
>
> I think it is a good idea that there is also room for growth. In ISO TC 46/SC 9 (which deals with ISO standard identifiers like ISSN, ISBN and DOI) we recently evaluated a new work item proposal for an identifier that would have been machine generated from the content of the identified textual resource, like urn:gbs:dita’s. Alas, the proposed syntax and algorithms were not rigorous enough, and the proposal was not approved.
>
> As an aside, in order to make urn:gbs:dita’s ISO standard identifiers, it would be necessary to standardize just RFC 8141 in ISO. Would you have any interest in elevating the status of urn:gbs like that? Since IETF is a Category A liaison with ISO TC 46, the fast track process could be used to quickly create an ISO standard which is identical with the original one.
>
> All the best,
>
> Juha
>
> *From:* Philip R Brenan <philiprbrenan@gmail.com>
> *Sent:* Sunday, 29 September 2019 16:57
> *To:* Hakala, Juha E <juha.hakala@helsinki.fi>
> *Cc:* lars.svensson@web.de; urn@ietf.org; Dale R. Worley <worley@ariadne.com>
> *Subject:* Re: [urn] gbs Name space identifier
>
> Hi *Juha*:
>
> Thank you for your helpful comments.
>
> I have updated this document based on the email discussion as follows:
>
> 1 - Clarified that the principal purpose of the URN being applied for is naming topics rather than locating them.
>
> 2 - Specified that currently there is only one <T> type active, namely "dita".
>
> 3 - Specified the computation of the <G> component in this document rather than by reference elsewhere.
>
> 4 - Expanded the discussion of the purpose of the URN.
>
> 5 - Expanded the discussion on naming versus location and why naming is so useful in this context.
>
> 6 - Expanded the discussion of inter-operability.
>
> Please let me know which areas of this application might require further elaboration.
>
> Per: https://tools.ietf.org/html/rfc8141
>
> Namespace ID:
>
> gbs
>
> Registration Information:
>
> Version: 1
> Date: 2019-09-27
>
> Declared registrant of the namespace:
>
> Name: Ryffine Inc.
> Address: 445 N Broadway, Denver, CO 80203
> Contact: Philip R Brenan
> E-mail: philiprbrenan@gmail.com
> www: http://www.ryffine.com
>
> Purpose:
>
> To allow organizations to share content written in Xml to the Dita Standard:
>
> http://docs.oasis-open.org/dita/dita/v1.3/os/part2-tech-content/dita-v1.3-os-part2-tech-content.html
>
> without the exponential duplication that occurs without the name space standardization provided by a URN.
>
> Dita is a technical documentation standard promulgated by OASIS: a nonprofit consortium that drives the development, convergence and adoption of open standards for the global information society as noted at https://www.oasis-open.org/org
>
> A major goal of Dita is to enable authors to build documents from small reusable components called topics and then to share and reuse these topics via collections to enable other documents to be built more rapidly.
>
> As a consequence of the current addressing mechanism used to link Dita topics together within a document, the number of such topics in existence tends to grow exponentially over time as documents evolve. Typically when a new version of a product is documented the author takes the existing set of linked topic files comprising the documentation of the product, duplicates all of these files to preserve the complex linkage structure between these topics, then makes a small number of changes to a few of the duplicated files, leaving the bulk of the topic files unchanged. At the moment it is difficult to reuse the original topic files in situ because of the need to maintain the links between them.
>
> The GB Standard as currently implemented at:
>
> hhttps://metacpan.org/pod/Dita::GB::Standard
>
> seeks to reduce this exponential growth of topic files by giving each topic a unique deterministic name so that links between topics can be expressed in a way that endures as the topic files are copied over time.
>
> As proposed, the GB Standard allows a collection of Dita topics to quickly determine whether it already has a copy of an incoming topic by computing the GB Standard name of the topic and comparing it to the names of all such topics already collected locally ready for publication.
> If the name already exists then the incoming topic is discarded and the existing topic is reused; if the name does not exist in the collection then the collection adds the incoming topic to its list of topics available for publication.
>
> At the same time, the GB standard provides a human readable name for each topic which assists authors in selecting topics from each collection for reuse.
>
> The GB standard has been used by the applicant since 2016 to successfully build and maintain several large collections of topics.
>
> The purpose of this application then is to formalize the GB Standard naming convention as a globally recognized URN to enable standardized topic naming among organizations collaborating on the production of collections of technical documentation using Dita. The proposed URN will not, as it stands, provide immediate global location of topics so named; instead, it provides a standardized method of querying one or more collections of such topics by both humans and computers in an efficient manner.
>
> Syntax:
>
> urn: gbs : <T> : <G> : <B>
>
> where:
>
> <T> is a string of one or more characters drawn from: [a-zA-Z0-9_] which identifies the type of content being classified. At this point in time only one such type is in active use: the "dita" type. It is possible that further types might be required in the future; if so, this document will be updated to reflect these new types.
>
> <G> is a string of 1 to 64 characters drawn from: [a-zA-Z0-9_].
> When <T> has the value "dita" (currently the only permissible value), <G> is computed by concatenating the text between whichever of the following Xml tags exist in the Dita topic, in the order in which they appear in that topic:
>
> <title> <mainbooktitle> <booktitlealt>
>
> The text between these tags is used to form the <G> component after converting runs of all characters other than a-zA-Z0-9 to single underscores and truncating after character 64 if the resulting string is longer than 64 characters. This method was chosen based on operational experience as it produces readable names that are closely aligned with what authors expect to see as a topic name.
>
> <B> is the MD5 sum https://en.wikipedia.org/wiki/MD5 of the content being identified, presented as a 32 character lowercase hexadecimal string drawn from: [a-z0-9]{32} . Presenting the MD5 sum in lowercase, last and therefore to the right, has the beneficial side effect of allowing authors to visually ignore it and concentrate instead on the <G> component in the majority of cases where the <G> component happens to be (almost) unique. This arrangement makes the GB Standard name useful to both humans and computers.
>
> Assignment:
>
> Identifier uniqueness considerations:
>
> Uniqueness is guaranteed by the <B> component being an MD5 sum, which is thus guaranteed to be identical for identical content and very probably different for differing content.
>
> Identifier persistence considerations:
>
> Persistence is guaranteed by the immutability over time of the MD5 sum of the <B> component.
>
> Process of identifier assignment:
>
> <T> is currently set to "dita".
>
> <G> is chosen algorithmically depending on the value of <T> using the topic as input as described above.
>
> <B> is chosen by computing the MD5 sum of the content.
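
A minimal sketch of the assignment process quoted above, written in Python rather than the Perl of Dita::GB::Standard, and not taken from that module. The function names are hypothetical. It assumes that every occurrence of the three tags contributes, that titles are joined with a space, that markup nested inside a title is dropped, that no trimming is applied beyond what the quoted text describes, and that the MD5 sum is taken over the UTF-8 bytes of the whole topic; the application does not settle these points.

```python
import hashlib
import re

# Title-bearing tags named in the application, matched in document order.
TITLE_TAGS = re.compile(
    r"<(title|mainbooktitle|booktitlealt)\b[^>]*>(.*?)</\1\s*>", re.DOTALL
)


def g_component(topic: str) -> str:
    """Readable <G> part: title text, non-alphanumeric runs -> "_", max 64."""
    # Assumption: nested markup inside a title is dropped; titles are joined by a space.
    texts = [re.sub(r"<[^>]*>", " ", m.group(2)) for m in TITLE_TAGS.finditer(topic)]
    g = re.sub(r"[^a-zA-Z0-9]+", "_", " ".join(texts))[:64]
    if not g:
        raise ValueError("no title text found; <G> needs at least one character")
    return g


def b_component(topic: str) -> str:
    """<B> part: MD5 sum of the content as 32 lowercase hexadecimal digits."""
    # Assumption: the digest is taken over the UTF-8 bytes of the whole topic.
    return hashlib.md5(topic.encode("utf-8")).hexdigest()


def gbs_name(topic: str, t: str = "dita") -> str:
    """Assemble urn:gbs:<T>:<G>:<B> for one topic."""
    return f"urn:gbs:{t}:{g_component(topic)}:{b_component(topic)}"


topic = "<concept id='c1'><title>Introduction to the GB Standard</title></concept>"
print(gbs_name(topic))
```

For this sample topic the <G> part comes out as Introduction_to_the_GB_Standard; the <B> part depends on the exact bytes hashed, so it will not in general match a digest computed over a different serialization of the same topic.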
>
> For example:
>
> urn:gbs:dita:Introduction_to_the_GB_Standard:dddb7e2c29d2c8b9d87187fdf52a2702
>
> Resolution:
>
> Content cannot be directly located by this standard. However, URNs are not necessarily required to provide location services initially: providing a globally unique name is valuable in its own right because it encourages the development of, and convergence on, a small number of large, shared, inter-operable, global collections of topics within each of which the uniqueness of the URN is sufficient to provide a location service.
>
> Equivalence is determined by comparing (ignoring case) the <B> components of the two topics to be compared. If they are equal the two topics are considered to be equal. Otherwise they are considered to be unequal even if the underlying content is in fact identical. The characteristics of the MD5 sum ensure that only a small number of topics will be unnecessarily duplicated as a result of such false positive equivalences.
>
> Security and Privacy:
>
> The validity of the URN can be checked as follows:
>
> Check that the <T> component is "dita".
>
> Check that the <G> component is computed correctly as described above.
>
> Check that the <B> component matches the MD5 sum of the content.
>
> Inter-operability:
>
> The case of the letters chosen is immaterial and can be safely ignored in all computations on the proposed URN as only the <B> component is used for comparisons.
>
> Dita topics that do not contain ASCII characters suitable for constructing the <G> component will be accommodated by adding a new value to the list of values accepted by the <T> component and specifying the corresponding algorithm for computing the <G> component in an update to this document.
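
The equivalence rule and two of the three validity checks quoted above can be sketched as follows. This is an illustration only, with hypothetical names (GbsName, parse, equivalent, matches_content). It assumes the <B> component is restricted to hexadecimal digits, which is narrower than the [a-z0-9] stated in the syntax, and that the whole name is matched case-insensitively. The <G> check is left out because it needs the topic's title text recomputed as described in the application.

```python
import hashlib
import re
from typing import NamedTuple, Optional

# urn:gbs:<T>:<G>:<B>, matched case-insensitively since case is said to be immaterial.
GBS_URN = re.compile(
    r"urn:gbs:([A-Za-z0-9_]+):([A-Za-z0-9_]{1,64}):([0-9a-f]{32})", re.IGNORECASE
)


class GbsName(NamedTuple):
    t: str  # content type, currently only "dita"
    g: str  # human readable part
    b: str  # MD5 sum of the content


def parse(urn: str) -> Optional[GbsName]:
    """Split a name into its components, or return None if it is malformed."""
    m = GBS_URN.fullmatch(urn)
    return GbsName(*m.groups()) if m else None


def equivalent(urn_a: str, urn_b: str) -> bool:
    """Two names are equivalent when their <B> components match, ignoring case."""
    a, b = parse(urn_a), parse(urn_b)
    return a is not None and b is not None and a.b.lower() == b.b.lower()


def matches_content(urn: str, content: bytes) -> bool:
    """Check <T> is "dita" and <B> is the MD5 sum of the supplied bytes."""
    name = parse(urn)
    return (
        name is not None
        and name.t.lower() == "dita"
        and name.b.lower() == hashlib.md5(content).hexdigest()
    )


example = "urn:gbs:dita:Introduction_to_the_GB_Standard:dddb7e2c29d2c8b9d87187fdf52a2702"
print(parse(example))
```

Because only <B> takes part in the comparison, two names with different <G> parts but the same digest are treated as equivalent under this reading.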
>
> Additional Information:
>
> An implementation in Perl of the GB Standard as specified above when <T> is equal to "dita" is located at:
>
> https://metacpan.org/pod/Dita::GB::Standard
>
> References:
>
> ASCII: https://en.wikipedia.org/wiki/ASCII
>
> Dita specification: http://docs.oasis-open.org/dita/dita/v1.3/os/part2-tech-content/dita-v1.3-os-part2-tech-content.html
>
> MD5 Sum: https://en.wikipedia.org/wiki/MD5
>
> XML: https://en.wikipedia.org/wiki/XML
>
> On Thu, Sep 26, 2019 at 5:15 AM Hakala, Juha E <juha.hakala@helsinki.fi> wrote:
>
> Hello Philip,
>
> as regards this:
>
> Please tell me whether it is necessary for a *urn* to be able to uniquely locate files as well as classify them?
>
> URNs don’t have to provide resolution services, so the resolver (if any) does not need to know the location or locations of the identified resource, or to link the URN to these URL / URLs. You may want to mention in the urn:gbs namespace registration request that no resolution services are anticipated.
>
> It might be useful to add to the request the sentences below on document types and <T> values, with a note that for the time being only Dita documents are within scope. And some background information about Dita might be useful as well, for those who are not familiar with it.
>
> Best regards,
>
> Juha
>
> *From:* urn <urn-bounces@ietf.org> *On behalf of* Philip R Brenan
> *Sent:* Wednesday, 25 September 2019 21:46
> *To:* lars.svensson@web.de
> *Cc:* urn@ietf.org; Dale R. Worley <worley@ariadne.com>
> *Subject:* Re: [urn] gbs Name space identifier
>
> I have removed the link in question as the explanation of the derivation of the *<T>* component was deemed unsatisfactory.
> Here is what I was trying to achieve:
>
> It is anticipated that the GB Standard represented by the *urn:* *gbs* name space could be usefully applied to a number of different document types, such as Dita, DocBook, Word, Html etc. The <T> component is designed to separate these various name spaces. At the moment the only <T> in active use is *dita* for Dita documents. Within the Dita space the algorithm for computing the *<G>* component is included in:
>
> https://metacpan.org/pod/Dita::GB::Standard
>
> as gbStandardFileName().
>
> The computation of the <G> component is performed by examining the text between whichever of the following *xml* tags exist in a particular Dita document in the order in which they appear:
>
> title mainbooktitle booktitlealt
>
> The text between these tags is used to form the <G> component after converting runs of all characters other than a-zA-Z0-9 to single underscores. This method was chosen because it produces the most readable names that are closely aligned with what authors expect to see as a file name.
>
> The purpose of the GB Standard is to control the explosion of duplicate Dita topics that tends to occur as documents evolve. Typically when a new product is documented, the author takes the existing set of linked topic files comprising the documentation of the product, duplicates all of these files to preserve the linkage structure, then makes a small number of changes to a few of the duplicated files, leaving the bulk of the topic files unchanged. It is difficult to reuse the original topic files in situ because of the need to maintain the links between them.
>
> The GB Standard seeks to reduce this exponential growth of topic files by giving each topic a unique deterministic name so that links between topics can be expressed in a way that endures as the topic files are copied over time.
>
> As proposed, the GB Standard allows a server to quickly determine whether it has a copy of a file by computing the GB Standard name of an incoming file and comparing it to the names of all such files stored locally. If the name already exists then that file is reused; if the name does not exist on the server then the server adds the incoming file to its list of files available.
>
> It is not the current intention to use the GB Standard name to locate off site copies of a file - as things stand this could only be achieved by querying each server known to store files in this manner in turn. Please tell me whether it is necessary for a *urn* to be able to uniquely locate files as well as classify them? If it is a requirement that a *urn* can be used to locate a topic file anywhere in the world then I need to rethink this aspect of the GB Standard and update my application for the *gbs* namespace accordingly. If location is not necessarily required then the description of the computation of the <G> component and adequate documentation of the standard names in the <T> would seem to be the elements that need work to progress this application further?
>
> On Tue, Sep 24, 2019 at 12:51 PM <lars.svensson@web.de> wrote:
>
> > > <T> is a string of one or more characters drawn from: [a-zA-Z0-9_] which identifies the type of content from a list of types published by the registrant at https://metacpan.org/pod/Dita::GB::Standard::Types .
> >
> > I attempted to obtain the list of valid types at the given URL, but was unsuccessful. That page seemed to be a very top-level discussion of "The GB Standard".
>
> That URL gives me a 404...
>
> Best,
>
> Lars
>
> --
> Thanks,
> Phil <https://opentokrtc.com/room/phil>
> Philip R Brenan <https://opentokrtc.com/room/phil>
>
> --
> Thanks,
> Phil <https://opentokrtc.com/room/phil>
> Philip R Brenan <https://opentokrtc.com/room/phil>

--
Thanks,
Phil <https://opentokrtc.com/room/phil>
Philip R Brenan <https://opentokrtc.com/room/phil>