Query on discovery algorithm for LRDD (draft-hammer-discovery-04)

John Panzer <jpanzer@google.com> Thu, 25 March 2010 23:41 UTC

From: John Panzer <jpanzer@google.com>
Date: Thu, 25 Mar 2010 16:41:29 -0700
Message-ID: <cb5f7a381003251641h3ab009a7i88bc442c0e1c9f24@mail.gmail.com>
Subject: Query on discovery algorithm for LRDD (draft-hammer-discovery-04)
To: apps-discuss@ietf.org
Cc: webfinger@googlegroups.com, "salmon-protocol@googlegroups.com" <salmon-protocol@googlegroups.com>

All,

This is my attempt to translate
*http://tools.ietf.org/html/draft-hammer-discovery-04* into an actionable
algorithm, as a first step toward writing a truly general resource
discovery library that matches the latest spec.  It is written under the
assumption that a client may wish to query for sets of links with a given
rel value, and does not want to retrieve and re-parse host-meta files
any more often than necessary.  Feedback welcome!

*Definitions*

   - *startURI*: Starting point for discovery; input into the discovery
   process.  Discovery produces metadata for *startURI*.
   - *metadata*: A collection of links and properties for the resource.
   - *link*: An abstract edge between the *startURI* resource and another
   resource identified by a field *href*.  Has a *rel* field indicating the
   relationship from *startURI* to *href*.  Has an *lrdd_pri* field indicating
   ordering within the *rel* value.  Has a *secure* boolean field indicating
   whether it was retrieved securely.  Can be annotated with arbitrary
   *properties*.
   - *property*: A piece of metadata about a resource, less structured than
   a *link*.  Only present for XRD-derived data.  Has a *secure* boolean
   field indicating whether it was retrieved securely.
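
As a rough illustration, the definitions above could be written down as
typed dictionaries.  This is only a sketch: the field names rel, href,
lrdd_pri, secure and type come from the text, while the "value" field on a
property and the dictionary layout itself are assumptions made for the
example.

```python
from typing import List, TypedDict


class Property(TypedDict):
    type: str     # property type URI, e.g. the XRD Property "type" attribute
    value: str    # assumed field: the property's content
    secure: bool  # True only if the property was retrieved securely


class Link(TypedDict):
    rel: str                    # relationship from startURI to href
    href: str                   # the resource at the other end of the edge
    lrdd_pri: int               # ordering within the rel value
    secure: bool                # True only if the link was retrieved securely
    properties: List[Property]  # arbitrary annotations on the link


class Metadata(TypedDict):
    links: List[Link]
    properties: List[Property]


example: Metadata = Metadata(
    links=[Link(rel="lrdd", href="http://example.com/describe",
                lrdd_pri=1, secure=False, properties=[])],
    properties=[],
)
```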



*Algorithm*

   1. Determine priority order (Host-priority or Resource-priority):
      1. lookup HostMetaLRDD(startURI)
      2. if above lookup fails because Host(startURI) is not available,
      priority = Resource (goto step 2)
      3. Else, does result contain XRD/Property@type=
      http://lrdd.net/priority/resource?
      4. If yes, then priority = Resource (goto step 2)
      5. Otherwise, priority = Host (goto step 2)
   2. If priority = Host:  *// priority order is host-meta, HTTP Link,
   <Link> elements*
      1. metadata = lookup HostMetaLRDD(startURI, 1)
      2. augment metadata with result of lookup HeaderLinks(startURI, 2)
      3. augment metadata with result of lookup ResourceLinks(startURI, 3)
   3. If priority = Resource:  *// priority order is <Link> elements, HTTP
   Link, host-meta*
      1. metadata = lookup ResourceLinks(startURI, 1)
      2. augment metadata with result of lookup HeaderLinks(startURI, 2)
      3. augment metadata with result of lookup HostMetaLRDD(startURI, 3)
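
The top-level steps above might be sketched roughly as follows.  This is
not a reference implementation: the three lookups and the augment step are
passed in as callables, metadata is assumed to be a dict with "links" and
"properties" lists, and the HostUnavailable exception is a hypothetical
way of signalling that Host(startURI) is not available.

```python
RESOURCE_PRIORITY = "http://lrdd.net/priority/resource"


class HostUnavailable(Exception):
    """Hypothetical signal that Host(startURI) could not be reached."""


def discover(start_uri, host_meta_lrdd, header_links, resource_links, augment):
    # Step 1: probe host-meta to choose the priority order.
    try:
        probe = host_meta_lrdd(start_uri, 1)
        resource_first = any(
            p["type"] == RESOURCE_PRIORITY for p in probe["properties"]
        )
    except HostUnavailable:
        resource_first = True

    # Steps 2 and 3: the same three sources, in opposite orders.
    if resource_first:
        sources = [resource_links, header_links, host_meta_lrdd]
    else:
        sources = [host_meta_lrdd, header_links, resource_links]

    metadata = sources[0](start_uri, 1)
    for pri, source in enumerate(sources[1:], start=2):
        try:
            found = source(start_uri, pri)
        except HostUnavailable:
            continue  # assumption: an unreachable host-meta contributes nothing
        augment(metadata, found, pri)
    return metadata
```

The host-meta lookup runs twice in the Host-priority case, once as the
probe and once as the first source; the sketch relies on the local cache
described below to make the second call cheap.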


*where*

HostMetaLRDD(uri, pri) is:

   1. Revalidate local XRD data cache for
   hostof(*uri*)/.well-known/host-meta.
   2. Find *the first link* with rel="lrdd" in host-meta document (call the
   href value *lrdd_url*)
   3. Revalidate local XRD data cache for *lrdd_url*, annotating each link
   with *lrdd_pri*=pri
   4. Return metadata (non-lrdd links and properties) for *lrdd_url*
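
Step 2 of HostMetaLRDD could look something like the sketch below, which
assumes the host-meta document has already been fetched (caching is not
shown) and is an XRD document in the XRD 1.0 namespace.  The function name
is made up for the example, and it only looks at href attributes, as the
step above does.

```python
import xml.etree.ElementTree as ET

XRD_NS = "{http://docs.oasis-open.org/ns/xri/xrd-1.0}"


def first_lrdd_href(host_meta_xml):
    """Return the href of the first rel="lrdd" Link that has one, or None."""
    root = ET.fromstring(host_meta_xml)
    # iter() walks the tree in document order, so the first match wins.
    for link in root.iter(XRD_NS + "Link"):
        if link.get("rel") == "lrdd" and link.get("href"):
            return link.get("href")
    return None


sample = (
    '<XRD xmlns="http://docs.oasis-open.org/ns/xri/xrd-1.0">'
    '<Link rel="author" href="http://example.com/about"/>'
    '<Link rel="lrdd" href="http://example.com/describe"/>'
    '<Link rel="lrdd" href="http://example.com/other"/>'
    '</XRD>'
)
lrdd_url = first_lrdd_href(sample)
```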



HeaderLinks(uri, pri) is:

   1. If uri is not dereferenceable (e.g., acct:) return empty set
   2. Revalidate local resource link cache for *uri*
   3. Retrieve Link: headers for resource from cache, annotating each link
   with *lrdd_pri*=pri
   4. Return links to caller
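
A simplified sketch of HeaderLinks, assuming the Link: header values have
already been retrieved (the cache is not shown) and are passed in as a
list of strings.  The parsing is deliberately naive: it assumes no "<"
appears inside a parameter value and only extracts the rel parameter.
Treating only http and https URIs as dereferenceable is also an assumption.

```python
import re
from urllib.parse import urlsplit

_LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
_REL = re.compile(r';\s*rel\s*=\s*"?([^";,]*)"?', re.IGNORECASE)


def header_links(uri, link_header_values, pri):
    # Step 1: non-dereferenceable URIs (e.g. acct:) yield the empty set.
    if urlsplit(uri).scheme.lower() not in ("http", "https"):
        return []
    links = []
    for value in link_header_values:
        # One header value may carry several comma-separated links.
        for href, params in _LINK_VALUE.findall(value):
            match = _REL.search(params)
            if match is None:
                continue
            # A rel parameter may hold several space-separated values.
            for rel in match.group(1).split():
                links.append({"rel": rel, "href": href, "lrdd_pri": pri})
    return links


found = header_links(
    "http://example.com/resource",
    ['<http://example.com/meta>; rel="lrdd"; type="application/xrd+xml", '
     '<http://example.com/a>; rel=author'],
    2,
)
```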



ResourceLinks(uri, pri) is:

   1. If uri is not dereferenceable (e.g., acct:) return empty set
   2. Revalidate local resource link cache for *uri*, annotating each link
   with *lrddsrc*=resource
   3. Retrieve MIME-type-specific set of links known for resource,
   annotating each link with *lrdd_pri*=pri
   4. Return links to caller
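
For the HTML case, the MIME-type-specific step of ResourceLinks might be
sketched as below, using the standard library HTML parser.  It assumes the
resource body has already been fetched and is HTML; other content types
(Atom, for example) would need their own extractors.  The class and
function names are made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects <link rel=... href=...> elements in document order."""

    def __init__(self, pri):
        super().__init__()
        self.pri = pri
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if not rel or not href:
            return
        # Space-separated rel values become one link per rel name.
        for name in rel.split():
            self.links.append({"rel": name, "href": href, "lrdd_pri": self.pri})


def html_resource_links(html_text, pri):
    collector = LinkCollector(pri)
    collector.feed(html_text)
    collector.close()
    return collector.links


page = ('<html><head><link rel="lrdd" href="http://example.com/meta.xrd">'
        '<link rel="alternate me" href="/feed"/></head></html>')
links = html_resource_links(page, 3)
```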



augment X with Y with priority pri is:

   1. For each set of links y in Y sharing link relation r:
      1. if r=="lrdd", and X does not already contain a link with
      rel="lrdd", augment X with results of:
         1. Revalidate local XRD data cache for first *href* for rel="lrdd",
         annotating each link with *lrdd_pri*=pri
         2. Augment X with resulting metadata from *href*
      2. else, add list of links y to X.
   2. Add all properties in Y to X.
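
One possible reading of the augment step, as a sketch.  Metadata is assumed
to be a dict with "links" and "properties" lists, and fetch_lrdd is a
hypothetical callable standing in for the XRD cache lookup.  Two choices
here are interpretations rather than things stated above: the first lrdd
link is itself recorded in X so the LRDD document used can be identified,
and any lrdd links seen after X already has one are skipped.

```python
def augment(x, y, pri, fetch_lrdd):
    """Merge metadata y into x in place."""
    # Group y's links by rel; dicts preserve first-seen order.
    groups = {}
    for link in y["links"]:
        groups.setdefault(link["rel"], []).append(link)

    for rel, group in groups.items():
        if rel != "lrdd":
            x["links"].extend(group)
        elif not any(link["rel"] == "lrdd" for link in x["links"]):
            first = group[0]
            x["links"].append(first)  # records which LRDD document was used
            described = fetch_lrdd(first["href"], pri)
            x["links"].extend(
                link for link in described["links"] if link["rel"] != "lrdd"
            )
            x["properties"].extend(described["properties"])
        # Otherwise X already has an lrdd link; extra ones are skipped.

    x["properties"].extend(y["properties"])
```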



NB: Some link types can contain space-separated rel values, which are
interpreted as a series of links with each of the space-separated rel value
names in the above algorithm.

NB: Ordering of links and properties from each source with the same rel/type
values is retained throughout, with the caveat that HTTP headers do not have
a guaranteed stable order.

Security is not shown here, but is based on either TLS used throughout the
retrievals or (for XRD) the XRD signature specification.  Content type
specific security may also be used to verify links within content (e.g.,
Magic Signatures over link elements).  When set, the *secure* boolean
indicates that a specific link or property was retrieved securely; when not
set, it indicates that the retrieval path contained one or more insecure
steps.  The algorithm will also reject data that is red-flagged (e.g., if a
CRL check shows that a certificate has been revoked as opposed to merely
expiring).  (?)
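
As a very rough sketch of how the *secure* flag might be computed: the
helper below assumes the client records the list of URLs fetched on the
way to a link or property, and treats "TLS throughout" as "every URL in
that list is https".  The function name, the retrieval_path list and the
signature_verified flag are all assumptions for illustration; certificate
and CRL checks are not modelled.

```python
from urllib.parse import urlsplit


def is_secure(retrieval_path, signature_verified=False):
    """True if the data was signature-verified or every hop used https."""
    if signature_verified:
        return True
    # An empty path proves nothing, so it is treated as insecure.
    return bool(retrieval_path) and all(
        urlsplit(url).scheme.lower() == "https" for url in retrieval_path
    )
```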

*Result*:

An ordered collection of metadata, consisting of links and properties, each
of which is annotated with its *lrddsrc* provenance.  There is at most one
rel="lrdd" link in the collection which identifies the specific LRDD
document used, if any.

A client which wishes to coalesce links and properties found in all three
places within a given rel or type value can just query the collection for
the desired rel or type and retrieve collections of links or properties.

Once retrieved, the set of metadata can be queried.  An important/useful
query is to retrieve the list of links with a given rel value:

get_links(metadata, rel)  # Get sequence of links with given rel value,
                          # or empty list if not found.

Sometimes you may require that only the first set of links found count:

get_top_priority_links(metadata, rel)  # Get sequence of links with given
                                       # rel value, ignoring all but the
                                       # first set found

...and of course you can do a similar query for properties.
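
The two queries could be sketched as follows, assuming metadata is a dict
whose "links" list holds dicts with "rel" and "lrdd_pri" keys.  Reading
"the first set found" as "the links with the lowest lrdd_pri for that rel"
is an interpretation, not something the algorithm above spells out.

```python
def get_links(metadata, rel):
    """All links with the given rel value, in collection order."""
    return [link for link in metadata["links"] if link["rel"] == rel]


def get_top_priority_links(metadata, rel):
    """Only the links for rel that came from the highest-priority source."""
    links = get_links(metadata, rel)
    if not links:
        return []
    top = min(link["lrdd_pri"] for link in links)
    return [link for link in links if link["lrdd_pri"] == top]


metadata = {
    "links": [
        {"rel": "author", "href": "http://example.com/a1", "lrdd_pri": 1},
        {"rel": "salmon", "href": "http://example.com/s", "lrdd_pri": 2},
        {"rel": "author", "href": "http://example.com/a2", "lrdd_pri": 3},
    ],
    "properties": [],
}
all_authors = get_links(metadata, "author")
top_authors = get_top_priority_links(metadata, "author")
```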