Re: [vwrap] Client-side caching, URIs, and scene scalability

Nexii Malthus <nexiim@gmail.com> Thu, 30 September 2010 04:48 UTC

Date: Thu, 30 Sep 2010 05:49:24 +0100
From: Nexii Malthus <nexiim@gmail.com>
To: Morgaine <morgaine.dinova@googlemail.com>
Cc: "vwrap@ietf.org" <vwrap@ietf.org>
Subject: Re: [vwrap] Client-side caching, URIs, and scene scalability

Sounds awesome, and like a nicely evaluated approach.

- Nexii

On Mon, Sep 27, 2010 at 8:23 PM, Morgaine <morgaine.dinova@googlemail.com> wrote:

> That's good to hear, John, but it can be made far more effective and
> efficient.
>
> Your approach allows a client to avoid requesting the data, but only at the
> cost of requesting the ETag header, which entails making a network request.
> The approach that I am advocating would eliminate the network requests
> altogether, because the hash is part of the asset identifier that is held by
> the region and is handed to the client.  Requesting hashes with another
> round trip doesn't scale as scenes grow massively.
>
> Avoiding unnecessary network accesses will become crucial as new worlds
> expand the asset pool with millions of replicated assets in seconds.
> Hash-based URIs would also provide virtual worlds with asset resilience,
> since fallback services can be queried automatically for the known asset
> hash when retrieval from the initial asset URI fails.  You can't do that
> when the hash is held by an asset service that is now inaccessible.
>
>
> Morgaine.
>
>
>
>
>
> ==============================
>
> On Mon, Sep 27, 2010 at 6:26 PM, Hurliman, John <john.hurliman@intel.com> wrote:
>
>> Agreed. We are already doing this in the SimianGrid asset server by using
>> the ETag HTTP header to deliver a SHA256 hash of asset data. This allows a
>> client to do a HEAD request before fetching data and is compatible with
>> existing web caching systems.
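A minimal sketch of the client-side check John describes, assuming the asset service returns the SHA-256 digest of the payload as a strong ETag (the exact quoting and header handling of SimianGrid are not shown here; names are illustrative):

```python
import hashlib

def etag_for(data: bytes) -> str:
    # Strong ETag in the style described: SHA-256 hex digest of the payload.
    return '"%s"' % hashlib.sha256(data).hexdigest()

def needs_fetch(head_etag: str, cache: dict) -> bool:
    # Compare the ETag returned by a HEAD request against the local cache
    # index; only issue the full GET when the digest is unknown.
    return head_etag.strip('"') not in cache

# Simulated flow: the asset is stored under its digest after first download.
data = b"mesh-or-texture-bytes"
cache = {}
etag = etag_for(data)
print(needs_fetch(etag, cache))   # True: not yet cached, issue the GET
cache[etag.strip('"')] = data     # index the cache by the digest itself
print(needs_fetch(etag, cache))   # False: digest known, skip the GET
```

Note that this still costs one HEAD round trip per asset, which is the point Morgaine addresses below.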
>>
>>
>>
>> John
>>
>>
>>
>> *From:* vwrap-bounces@ietf.org [mailto:vwrap-bounces@ietf.org] *On Behalf
>> Of *Morgaine
>> *Sent:* Monday, September 27, 2010 9:53 AM
>> *To:* vwrap@ietf.org
>> *Subject:* [vwrap] Client-side caching, URIs, and scene scalability
>>
>>
>>
>> We've discussed the structure of caps, URIs and asset addressing here many
>> times in the past.  I would like us to examine this issue in the specific
>> context of *client-side caching* and *scene growth*, which we have not
>> previously addressed.  Scalability is a matter of huge importance (as well
>> as being part of the IETF mission statement), and I'm particularly
>> interested in making sure that VWRAP standards are scalable in key
>> dimensions.
>>
>> Scenes will inevitably rise in size and complexity with the passage of
>> time.  In quite a short while we can expect millions of assets in a scene
>> within the field of view of an agent, and further orders of magnitude not
>> long after.  While some may be tempted to call this "sci fi", observing the
>> increase in memory, disk, and other computing resources over time suggests
>> otherwise.  From kilo to mega, giga and tera, it is only in hindsight that
>> we realize how badly we underestimate exponential growth.
>>
>> This becomes relevant when deciding on URI formats.  It's no use defining
>> an elegant URI format if it doesn't scale as scene complexity rises.
>>
>> What this means for us when we are designing the structure of URIs is that
>> we need to focus on what the URI is for, namely data access, both local and
>> remote.  When designing for scalability through good use of caching, our
>> goal is to avoid a client having to perform each remote access if at all
>> possible.  If our elegant URI format results in clients needing to access
>> data remotely despite it already being cached locally, then our elegant
>> addressing scheme has failed.  "Elegant but non-scalable" is not the mark of
>> success, so let's check against this requirement.
>>
>> When a region tells the client about the items it currently holds (narrowed
>> down by an interest list in an optimized implementation), it does so by
>> listing the items in the scene using item identifiers of some kind.  The
>> client can then use each identifier as an index into its local cache, and
>> then request from the relevant asset services only those items that are not
>> already cached.  This is easy in an isolated world where identifiers can be
>> world-global.  Where it breaks down is when worlds interoperate, and those
>> arbitrary identifiers (e.g. UUIDs, or URIs based on them) become useless for
>> deciding whether an item common to multiple worlds is actually in the cache
>> or not.  Done wrongly, it can easily result in repeat downloading on a
>> massive scale.
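The failure mode described above can be shown in a few lines (a toy cache keyed by world-minted identifiers; purely illustrative):

```python
import uuid

# Two worlds independently mint UUIDs for byte-identical tree assets.
tree_data = b"cypress-mesh-v1"
world_a_id = str(uuid.uuid4())
world_b_id = str(uuid.uuid4())

cache = {}
cache[world_a_id] = tree_data   # cached after visiting world A

# Visiting world B: the identifier differs, so the lookup misses and the
# identical bytes are downloaded again.
print(world_b_id in cache)      # False: a spurious re-download
```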
>>
>> Local or global identifiers will work poorly unless they're an intrinsic
>> property of the actual data being indexed.  They fail because the same data
>> used in two different worlds won't have a common
>> URI-based cache index unless it happens to be supplied by the same asset
>> service.  The same item replicated in thousands of worlds would end up being
>> stored thousands of times in the cache.  While the storage cost may be of
>> little consequence, the repeated access cost is not, because round-trip
>> times have very limited downward scalability.
>>
>> The engineering solution to this is pretty obvious:  scene component
>> identifiers should include a *hash or digest over the data*, this
>> information being separable from other parts of the identifier/URI so that
>> it can be used as a key into the cache.  The cache is king, and terabyte
>> caches should be regarded as normal now, with petabyte caches not so many
>> years down the line.  The goal of "Never download the same thing twice" is
>> already reasonable with terabyte drives today, never mind tomorrow.  VWRAP
>> needs to embrace this, if it is to be a scalable interop standard.
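One way the separable digest might look in practice (the URI layout below is purely illustrative, not a proposed VWRAP format):

```python
import hashlib

def make_asset_uri(service: str, data: bytes) -> str:
    # The asset service computes the digest once, at ingest time, and
    # embeds it as a separable component of the identifier.
    digest = hashlib.sha256(data).hexdigest()
    return f"{service}/assets/sha256/{digest}"

def cache_key(uri: str) -> str:
    # The client peels the digest back out and uses it as the cache index,
    # independent of which asset service the URI points at.
    return uri.rsplit("/", 1)[-1]

data = b"oak-mesh-v2"
uri_a = make_asset_uri("https://assets.worldA.example", data)
uri_b = make_asset_uri("https://assets.worldB.example", data)
print(cache_key(uri_a) == cache_key(uri_b))   # True: one entry, no refetch
```

The key property is that two different URIs for the same bytes collapse to one cache slot.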
>>
>> Note that in the above, my reference to "data" *excludes metadata* by
>> intent.  Two objects may be quite separate, with totally different metadata,
>> yet denote exactly the same data, which would give them the same hash digest
>> and hence share a cache index.  This situation is likely to be extremely
>> common, especially for environmental items such as trees, vegetation and
>> other natural elements.  We can easily foresee a situation in which people
>> create their brand new world by unpacking a region archive and releasing
>> another few million items into the metaverse.  If those items were cached
>> the first time that they were seen in one world, they would not need to be
>> loaded again from this new world, if we design our URIs with good
>> engineering properties and foresight.
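The data/metadata split might be sketched as follows (the asset record shape here is invented for illustration; only the payload feeds the digest):

```python
import hashlib

def digest(data: bytes) -> str:
    # The digest covers only the payload bytes, never the metadata.
    return hashlib.sha256(data).hexdigest()

pine = b"pine-tree-mesh"
asset_1 = {"name": "Pine (Forest of Elms)", "creator": "alice", "data": pine}
asset_2 = {"name": "Arbre 42",              "creator": "bob",   "data": pine}

# Different metadata, identical payload: the two objects share one
# digest and hence one cache entry.
print(digest(asset_1["data"]) == digest(asset_2["data"]))   # True
```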
>>
>> Cache scalability as the number of worlds with common assets rises is one
>> issue, but there is also another related one on the horizon.  As we move
>> away from SL's primitive assets and 1-level linksets towards *hierarchical
>> objects* that allow object composition, the number of virtual objects made
>> from reusable components will skyrocket, because builders will be riding on
>> the shoulders of giants, just like in RL engineering.  This again will
>> result in massive cross-world sharing of replicated components.
>>
>> In summary:  The asset identifiers supplied by a region to a client should
>> contain an *explicit hash/digest over the data* (calculated *ONCE* by the
>> relevant asset service of course, not by each region), to allow client-side
>> caches to be highly effective at eliminating unnecessary network traffic.
>> This will be very important in a metaverse of countless worlds, huge amounts
>> of shared data, and massive scenes.
>>
>>
>> Morgaine.
>>
>> PS. Hash digests in asset URIs deliver two other important benefits as
>> well, beyond scalability:
>>
>>    - They provide the interesting property of *near-universal asset
>>    addressing*.  This may appeal to those who focus on social aspects of
>>    digital content such as imposing property semantics, in which case using a
>>    URI format that almost uniquely identifies assets can kill several birds
>>    with one stone.
>>
>>
>>    - They provide isolation from host and network outages.
>>    Near-universal asset addressing means that when an asset service fails to
>>    respond or returns an error code, a new URI containing the same hash digest
>>    could be manufactured and sent to a second asset service as fallback.  The
>>    benefits of this for *virtual world resilience* are of course
>>    immense.  Resilience is so important that I suggest it should be a protocol
>>    requirement.  The fact that we would gain resilience automatically as a mere
>>    side-effect of digest-based addressing highlights the rather nice properties
>>    of this design.
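The fallback behaviour in the second point could be as simple as the sketch below, where `fetch` stands in for a real HTTP client and the service URLs and digest-URI layout are hypothetical; note that digest-based addressing also gives an integrity check for free:

```python
import hashlib

def fetch_with_fallback(digest: str, services: list, fetch) -> bytes:
    # Try each known asset service in turn; because the digest names the
    # content itself, any service holding the bytes is an acceptable source.
    for service in services:
        try:
            data = fetch(f"{service}/assets/sha256/{digest}")
        except OSError:
            continue  # host down or error response: try the next service
        if hashlib.sha256(data).hexdigest() == digest:
            return data  # the bytes provably match the requested asset
    raise LookupError(f"asset {digest} unavailable on all known services")

# Simulated transport: the primary service is down, the fallback responds.
d = hashlib.sha256(b"rock").hexdigest()
store = {f"https://backup.example/assets/sha256/{d}": b"rock"}

def fake_fetch(url):
    if url not in store:
        raise OSError("unreachable")
    return store[url]

print(fetch_with_fallback(
    d, ["https://primary.example", "https://backup.example"],
    fake_fetch) == b"rock")   # True: recovered from the fallback service
```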
>>
>>
>> -- End.
>>
>> _______________________________________________
>> vwrap mailing list
>> vwrap@ietf.org
>> https://www.ietf.org/mailman/listinfo/vwrap
>>
>>
>
> _______________________________________________
> vwrap mailing list
> vwrap@ietf.org
> https://www.ietf.org/mailman/listinfo/vwrap
>
>