Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

Juha Hakala <juha.hakala@helsinki.fi> Fri, 02 May 2014 19:56 UTC

Message-ID: <5363F867.60503@helsinki.fi>
Date: Fri, 02 May 2014 22:56:23 +0300
From: Juha Hakala <juha.hakala@helsinki.fi>
To: John C Klensin <john-ietf@jck.com>, jehakala@mappi.helsinki.fi
References: <C93A34DBE97565AD96CEC321@JcK-HP8200.jck.com> <534BED18.9090009@gmx.de> <3D39F1AA700A179F3C051DE2@JcK-HP8200.jck.com> <534D3410.50607@ninebynine.org> <54ecc96adba240159cf624c54c507136@BL2PR02MB307.namprd02.prod.outlook.com> <952E89C207E59D25CD5953D6@JCK-EEE10> <20140502180642.Horde.k922N8-cIl2au4mAP9neJA2@webmail.helsinki.fi> <86412DCF67470AFC510CD4F4@JcK-HP8200.jck.com>
In-Reply-To: <86412DCF67470AFC510CD4F4@JcK-HP8200.jck.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/urn/37Oi4fd1PoYgOboKZrPIRJhCkKM
Cc: julian.reschke@gmx.de, urn@ietf.org, Graham Klyne <GK@ninebynine.org>
Subject: Re: [urn] [apps-discuss] URNs are not URIs (another look at RFC 3986)

Hello,

On 2.5.2014 20:16, John C Klensin wrote:
> (apps-discuss list dropped)
>
> Juha,
>
> I found this (both the part quoted below and the rest of your
> note) very helpful.  Thanks.  One question below...
>
> --On Friday, May 02, 2014 18:06 +0300 jehakala@mappi.helsinki.fi
> wrote:
>
> I would assume that, if the poem is published as part of a
> collection, the collection appears in a book, and the book is
> identified with an ISBN, the mechanism for finding the poem
> within the book becomes part of a specification about the book
> and its identifier.
Sort of. I left out some details from my description, but I'll add them 
here, because similar techniques are being put into use not only in 
libraries but in other organizations as well.

In our jargon, there is a host record describing the book, and component 
part records describing the poem, article, image or any other thing in 
the book. There is a bidirectional link between the host and component 
part records. Similar linking techniques are in use when libraries 
describe, for instance, serials and serial articles, or CDs and the tracks 
on them. The work-level record of a poem, track, article or any other 
component part will be linked directly to the component part record. 
Persistent identifiers are required at all levels, and there can be 
several levels: for instance, periodical articles often have embedded 
images.
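To make the host/component structure concrete, here is a minimal Python sketch of records linked in both directions, each level carrying its own persistent identifiers. It is not any library system's actual record format: the class and field names, the level labels, and the identifier strings are all hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity comparison: records link to each other in cycles
class Record:
    """A descriptive record; each level carries its own persistent identifiers."""
    level: str                  # illustrative labels: "host", "component", "work"
    title: str
    identifiers: list[str] = field(default_factory=list)  # may hold several PIDs
    # Upward link from a component part to its host; excluded from repr so that
    # printing does not follow the host -> component -> host cycle.
    host: Optional[Record] = field(default=None, repr=False)
    components: list[Record] = field(default_factory=list)
    work: Optional[Record] = None  # work-level record linked to this component

    def add_component(self, part: Record) -> None:
        """Attach a component part, keeping the host/component link bidirectional."""
        part.host = self
        self.components.append(part)


# Placeholder identifiers only; these are not real ISBNs or URNs.
book = Record("host", "Collected poems", ["urn:isbn:hypothetical-0001"])
poem = Record("component", "A poem", ["urn:nbn:fi:hypothetical-0002"])
poem.work = Record("work", "A poem (work)", ["urn:nbn:fi:hypothetical-0003"])
book.add_component(poem)

# Navigate both ways: host -> component and component -> host.
print([c.title for c in book.components])  # ['A poem']
print(poem.host.title)                     # Collected poems
```

Nesting deeper (an image inside an article inside an issue) is just another add_component call on the article record.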

These component parts are sometimes components only in a logical sense: 
each track or article may be available as a separate file. But if a file 
contains many component resources, the file syntax may reveal this 
internal structure. For instance, when the National Library of Finland 
digitizes serials, we often create structured METS/ALTO XML files whose 
encoding shows the logical structure of the issue.
> That is an instruction about the component
> of the book, not what I think would normally be considered
> "metadata about the book".  And depending on other conventions
> or protocols (in the traditional, not computer sense), that
> mechanism might be shown as "page 57ff", "Chapter 5", or
> something else.   Some introducing (or surrounding) syntax would
> be needed for that sort of mechanism description and one would
> probably want it to be different from the mechanisms used to
> query, or extract information from, the "metadata about the
> book".

The METS/ALTO encoding commonly used for digitized textual resources is 
very expressive. Not only can we encode articles, but also chapters, 
pages, paragraphs etc.; it is also possible to describe the font and size 
of each OCR'd character, and to record the probability that the OCR result 
for each character is correct.
>   
>
> While the distinctions are real, the details may be a matter of
> convention.  For example, if the book's table of contents were
> considered metadata about the book, the poem might also be found
> within the book by asking a question of that metadata and then
> using the result.
A table of contents (and an abstract) is often provided as metadata for 
non-fiction books, so it is possible to find, for instance, an article 
within a book even if there is no separate metadata record for it. 
Whether it is possible to provide a direct link to such an article depends 
on the encoding of the (text) file. Rich encoding takes time, but serves 
the users well.
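The point about the encoding deciding what kind of link is possible can be sketched in a few lines of Python. Everything here is hypothetical: the TocEntry type, the element_id field (standing in for an ID of the article's element in a richly encoded file), the "#id" fragment convention, and the example URL.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class TocEntry:
    title: str
    page: int
    # Hypothetical field: ID of the matching element in the digitized file, set
    # only when the file was encoded richly enough to mark the article.
    element_id: Optional[str] = None


def locate(toc: list[TocEntry], query: str, file_url: str) -> list[str]:
    """Search the TOC metadata; link directly only if the encoding allows it."""
    hits = []
    for entry in toc:
        if query.lower() in entry.title.lower():
            if entry.element_id:   # rich encoding: deep link into the file
                hits.append(f"{file_url}#{entry.element_id}")
            else:                  # minimal encoding: only a page reference
                hits.append(f"{file_url} (p. {entry.page})")
    return hits


# Placeholder book and URL, for illustration only.
toc = [TocEntry("Introduction", 1),
       TocEntry("Digitizing serials", 17, element_id="art-2"),
       TocEntry("Serials and OCR quality", 43)]
print(locate(toc, "serials", "https://example.org/book.xml"))
# ['https://example.org/book.xml#art-2', 'https://example.org/book.xml (p. 43)']
```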

>
> Is that just about right?  If not, where am I confused?

I don't think you are ;-).

  Juha
>
> thanks,
>      john
>