Re: [urn] Suggested PWID URN for Persistent Web IDentifiers

Re: [urn] Suggested PWID URN for Persistent Web IDentifiers - version 3

worley@ariadne.com (Dale R. Worley) Sat, 08 September 2018 14:08 UTC

From: worley@ariadne.com
To: elzi@kb.dk, urn@ietf.org
In-Reply-To: <f5b4lgl9htu.fsf@troutbeck.inf.ed.ac.uk>
Sender: worley@ariadne.com
Date: Sat, 08 Sep 2018 10:08:37 -0400
Message-ID: <87y3cculne.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/urn/iDVhRqQtdsX1zzfGrZyY0UAULV8>
Subject: Re: [urn] Suggested PWID URN for Persistent Web IDentifiers - version 3
Precedence: list

Re-reading this thread, I think Henry's comments are significant in
regard to how this proposal will interact with actual archive use.  I
don't know enough about archiving to speak well to that, but there are
significant technical points that seem to connect with or parallel
Henry's archiving considerations.


One critical element is the archive identifier.  There seem to be two
cases regarding what is intended.

One case is where the PWID URNs are constructed by the archive organization
itself.  Thus, the archive organization can choose the archive
identifier, and presumably that will be a domain name that it controls.
That is not a perfect solution (we've had trouble with it in other URN
proposals), but it's clear.  But in that situation, the archive is
probably already minting URLs that can be used to retrieve archived
resources, so it's not clear that there is a large benefit to be gained.

The other case is where the URNs are constructed by third parties, that
is, neither the archive nor the consumer of the URN.  As Henry says,
this usage has the nature of a citation.  What is unclear in this case
is who selects the archive identifier of the archive.

One possibility in this case is that it is expected that there will be
only a small number of archives, and there's an IANA registry of archive
identifiers, and any person is allowed to register an identifier for any
archive.  Presumably, expert review could be used to keep the situation
under control.  This is alluded to by:

      On long term, there should be created a registry that keeps track
      of identifiers of archives over time, since they are likely to
      change names, merge etc. when taking about a 100 year period.


The proposal generally assumes that an archived resource can be
identified using just:
- the archive identifier
- the URL whose contents were archived
- the time at which the contents were archived
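
As a rough illustration of that three-part assumption, here is a minimal
sketch of the tuple as a record.  The class name, field names, and sample
values are all hypothetical; none of them come from the proposal.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ArchivedReference:
    """Hypothetical record of the three parts assumed to identify an archived resource."""
    archive_id: str        # identifier of the archive (e.g. a domain name it controls)
    url: str               # the URL whose contents were archived
    archived_at: datetime  # the time at which the contents were archived


# Sample values are made up for illustration only.
ref = ArchivedReference(
    archive_id="archive.example",
    url="http://www.example.com/index.html",
    archived_at=datetime(2018, 9, 8, 14, 8, tzinfo=timezone.utc),
)
```

Whether these three fields really suffice is exactly what the points below
call into question.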

Embedding the URL into the URN presents syntactic problems.  The
characters [ and ] can be used in the host-address part of a URL, and ?
is used for queries.  The latter I consider to be particularly
important, as web links can often include query-parts.  This problem
needs to be solved, as a syntax that "covers about 80-95% of all cases"
is a syntax that doesn't suffice for the problem at hand.
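
One possible way out, offered only as a sketch and not as what the
proposal specifies, is to percent-encode the troublesome characters of the
archived URL before embedding it.  The helper names below are
hypothetical, and keeping ":" and "/" unencoded is my own assumption.

```python
from urllib.parse import quote, unquote


def encode_archived_url(url: str) -> str:
    """Percent-encode an archived URL for embedding (hypothetical helper).

    Letters, digits, and "-._~" are always left alone by quote(); here ":"
    and "/" are also kept.  Everything else, including "?", "[", "]", "#",
    and "%", is percent-encoded.
    """
    return quote(url, safe=":/")


def decode_archived_url(encoded: str) -> str:
    """Recover the original URL from its percent-encoded form."""
    return unquote(encoded)


original = "http://[2001:db8::1]/search?q=urn&lang=en"
embedded = encode_archived_url(original)
```

The encoding is reversible, so a URL with an IPv6 host or a query-part
survives the round trip through `decode_archived_url`.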


I don't see a need to worry about fragment-parts of archived URLs, as
the fragment structure is inherently embedded in the resource retrieved
via the URL.  So an archive can archive the URL-without-fragment, the
URN can reference that URL, and the user can attach the required
fragment-part to the URN to specify the desired fragment.
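
A minimal sketch of that division of labor, assuming the fragment is
simply split off before archiving and re-attached to the URN afterwards.
The function names and the "urn:example:" placeholder are hypothetical
and do not reflect the PWID syntax.

```python
from urllib.parse import urldefrag


def split_for_archiving(url: str) -> tuple[str, str]:
    """Return (URL-without-fragment, fragment) for a URL as cited."""
    base, fragment = urldefrag(url)
    return base, fragment


def attach_fragment(urn: str, fragment: str) -> str:
    """Append a fragment-part to a URN, if there is one to append."""
    return f"{urn}#{fragment}" if fragment else urn


base, frag = split_for_archiving("http://example.com/page?x=1#sec2")
# Placeholder URN; the real URN would be built from `base` by whatever
# syntax the proposal settles on.
cited = attach_fragment("urn:example:archived-page", frag)
```

The archive only ever sees `base`; the fragment travels with the citation.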


There is some lack of clarity about coverage-spec.  If the "coverage" is
*part* of the archived resource, then it is, or should be, a fragment,
and that can be specified by appending a fragment-part to the URN.  If
the coverage is metadata about the resource, it seems to be undefined
what forms the metadata resource could take.  But some coverage values
seem to suggest that the referenced information is a set of resources
which contains as a member the resource designated by the recorded URL.
This concept gets very interesting indeed.  There doesn't seem to be any
defined resource type for "a web site", or even "all the files needed to
display a web page".  Also, it's not clear what the archival-time means
in this context, since presumably the archive need not contain archived
copies of everything in the aggregate that were all made at the
specified archival-time.

Dale
