Re: [vwrap] Client-side caching, URIs, and scene scalability

Morgaine <morgaine.dinova@googlemail.com> Mon, 27 September 2010 19:23 UTC

In-Reply-To: <62BFE5680C037E4DA0B0A08946C0933D012AD7E419@rrsmsx506.amr.corp.intel.com>
References: <AANLkTin5GF7=qPXYTOFyB0T-2C4JrS2=xaDKo0wZC+fH@mail.gmail.com> <62BFE5680C037E4DA0B0A08946C0933D012AD7E419@rrsmsx506.amr.corp.intel.com>
Date: Mon, 27 Sep 2010 20:23:54 +0100
Message-ID: <AANLkTimEBbz5zCtRU8BcO+o65hCwhSxE_R8HM9UyCh40@mail.gmail.com>
From: Morgaine <morgaine.dinova@googlemail.com>
To: "vwrap@ietf.org" <vwrap@ietf.org>
Content-Type: multipart/alternative; boundary="00c09f8e5e905082b3049142abe1"
Subject: Re: [vwrap] Client-side caching, URIs, and scene scalability

That's good to hear, John, but the scheme can be made far more effective
and efficient.

Your approach allows a client to avoid requesting the data, but only at the
cost of requesting the ETag header, which entails making a network request.
The approach that I am advocating would eliminate the network requests
altogether, because the hash is part of the asset identifier that is held by
the region and is handed to the client.  Requesting hashes with another
round trip doesn't scale as scenes grow massively.
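
As a rough illustration of the difference in round trips, here is a minimal
sketch.  AssetRef, plan_with_etag and plan_with_embedded_hash are hypothetical
names invented for this example, and `head` stands in for whatever HTTP HEAD
call a client would make to read the ETag; none of this is SimianGrid or VWRAP
API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRef:
    """Hypothetical identifier handed out by a region: locator plus data digest."""
    url: str
    sha256: str


def plan_with_etag(urls, cache, head):
    """ETag approach: one HEAD round trip per asset just to learn its digest.

    `head` is an injected callable (url -> digest) standing in for the network.
    """
    round_trips = 0
    to_fetch = []
    for url in urls:
        digest = head(url)  # network request, even when the data is cached
        round_trips += 1
        if digest not in cache:
            to_fetch.append(url)
    return to_fetch, round_trips


def plan_with_embedded_hash(refs, cache):
    """Digest-in-identifier approach: the cache check needs no network at all."""
    to_fetch = [ref for ref in refs if ref.sha256 not in cache]
    return to_fetch, 0
```

In both cases only the uncached assets are downloaded; the difference is that
the first plan costs one round trip per asset in the scene before any download
starts, while the second costs none.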

Avoiding unnecessary network accesses will become crucial as new worlds
expand the asset pool with millions of replicated assets in seconds.
Hash-based URIs would also provide virtual worlds with asset resilience,
since fallback services can be queried automatically for the known asset
hash when retrieval from the initial asset URI fails.  You can't do that
when the hash is held by an asset service that is now inaccessible.
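
The fallback idea could look something like the sketch below.  The URL layout
(base URL plus "/" plus digest), the function name and the injected `fetch`
callable are all assumptions made for illustration.  Re-hashing the returned
bytes is an extra step I am adding here, since a known digest lets the client
check data from a service it has no other reason to trust.

```python
import hashlib


def fetch_with_fallback(digest, services, fetch):
    """Try each asset service in turn for the asset with the given SHA-256 digest.

    `services` is an ordered list of base URLs (initial service first).
    `fetch` is an injected callable (url -> bytes) that raises OSError on failure.
    """
    for base in services:
        url = f"{base}/{digest}"  # hypothetical URL layout
        try:
            data = fetch(url)
        except OSError:
            continue  # service unreachable or returned an error: try the next one
        if hashlib.sha256(data).hexdigest() == digest:
            return data  # content matches the digest the region gave us
    raise LookupError(f"no service could supply asset {digest}")
```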


Morgaine.





==============================

On Mon, Sep 27, 2010 at 6:26 PM, Hurliman, John <john.hurliman@intel.com> wrote:

> Agreed. We are already doing this in the SimianGrid asset server by using
> the ETag HTTP header to deliver a SHA256 hash of asset data. This allows a
> client to do a HEAD request before fetching data and is compatible with
> existing web caching systems.
>
>
>
> John
>
>
>
> *From:* vwrap-bounces@ietf.org [mailto:vwrap-bounces@ietf.org] *On Behalf
> Of *Morgaine
> *Sent:* Monday, September 27, 2010 9:53 AM
> *To:* vwrap@ietf.org
> *Subject:* [vwrap] Client-side caching, URIs, and scene scalability
>
>
>
> We've discussed the structure of caps, URIs and asset addressing here many
> times in the past.  I would like us to examine this issue in the specific
> context of *client-side caching* and *scene growth*, which we have not
> previously addressed.  Scalability is a matter of huge importance (as well
> as being part of the IETF mission statement), and I'm particularly
> interested in making sure that VWRAP standards are scalable in key
> dimensions.
>
> Scenes will inevitably rise in size and complexity with the passage of
> time.  In quite a short while we can expect millions of assets in a scene
> within the field of view of an agent, and further orders of magnitude not
> long after.  While some may be tempted to call this "sci fi", observing the
> increase in memory, disk, and other computing resources over time suggests
> otherwise.  From kilo to mega, giga and tera, it's only when we look back
> that we realize that our inability to visualize exponential growth is epic.
>
> This becomes relevant when deciding on URI formats.  It's no use defining
> an elegant URI format if it doesn't scale as scene complexity rises.
>
> What this means for us when we are designing the structure of URIs is that
> we need to focus on what the URI is for, namely data access, both local and
> remote.  When designing for scalability through good use of caching, our
> goal is to avoid a client having to perform each remote access if at all
> possible.  If our elegant URI format results in clients needing to access
> data remotely despite it already being cached locally, then our elegant
> addressing scheme has failed.  "Elegant but non-scalable" is not the mark of
> success, so let's check against this requirement.
>
> When a region tells the client about the items it currently holds (narrowed
> down by an interest list in an optimized implementation), it does so by
> listing the items in the scene using item identifiers of some kind.  The
> client can then use each identifier as an index into its local cache, and
> then request from the relevant asset services only those items that are not
> already cached.  This is easy in an isolated world where identifiers can be
> world-global.  Where it breaks down is when worlds interoperate, and those
> arbitrary identifiers (e.g. UUIDs or URIs based on them) become useless for
> deciding whether an item common to multiple worlds is actually in the cache
> or not.  Done wrongly, it can easily result in repeat downloading on a
> massive scale.
>
> Local or global identifiers will work poorly unless they're an intrinsic
> property of the actual data being indexed.  The reason they won't work well
> is that the same data used in two different worlds won't have a common
> URI-based cache index unless it happens to be supplied by the same asset
> service.  The same item replicated in thousands of worlds would end up being
> stored thousands of times in the cache.  While the storage cost may be of
> little consequence, the repeated access cost is not, because round-trip
> times have very limited downward scalability.
>
> The engineering solution to this is pretty obvious:  scene component
> identifiers should include a *hash or digest over the data*, this
> information being separable from other parts of the identifier/URI so that
> it can be used as a key into the cache.  The cache is king, and terabyte
> caches should be regarded as normal now, with petabyte caches not so many
> years down the line.  The goal of "Never download the same thing twice" is
> already reasonable with terabyte drives today, never mind tomorrow.  VWRAP
> needs to embrace this, if it is to be a scalable interop standard.
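
To make "separable" concrete, here is one possible shape, purely as a sketch.
The URI format (a `sha256` query parameter) and the cache_key name are
assumptions for this example only; no URI format has been agreed for VWRAP.

```python
from urllib.parse import parse_qs, urlsplit


def cache_key(uri):
    """Return the data digest carried by an asset URI, or None if it has none.

    Assumes a hypothetical format such as
    https://assets.example.org/asset/1234?sha256=<hex digest>
    where the digest can be split off from the host and path.
    """
    query = parse_qs(urlsplit(uri).query)
    values = query.get("sha256")
    return values[0].lower() if values else None
```

Two URIs that name different asset services but carry the same digest yield
the same key, so the second one is a cache hit without any network access.
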
>
> Note that in the above, my reference to "data" *excludes metadata* by
> intent.  Two objects may be quite separate, with totally different metadata,
> yet denote exactly the same data, which would give them the same hash digest
> and hence share a cache index.  This situation is likely to be extremely
> common, especially for environmental items such as trees, vegetation and
> other natural elements.  We can easily foresee a situation in which people
> create their brand new world by unpacking a region archive and releasing
> another few million items into the metaverse.  If those items were cached
> the first time that they were seen in one world, they would not need to be
> loaded again from this new world, if we design our URIs with good
> engineering properties and foresight.
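
A small sketch of the data/metadata split follows.  The SceneObject class and
its fields are hypothetical, chosen only to show that the digest covers the
data bytes and nothing else.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Hypothetical scene item: per-object metadata plus the shared data bytes."""
    name: str    # metadata, not hashed
    owner: str   # metadata, not hashed
    data: bytes  # the actual asset content

    def data_digest(self):
        """SHA-256 over the data only, so metadata never affects the cache index."""
        return hashlib.sha256(self.data).hexdigest()
```

Two trees with different names and owners in different worlds, but the same
mesh bytes, would then share one cache entry.
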
>
> Cache scalability as the number of worlds with common assets rises is one
> issue, but there is also another related one on the horizon.  As we move
> away from SL's primitive assets and 1-level linksets towards *hierarchical
> objects* that allow object composition, the number of virtual objects made
> from reusable components will skyrocket, because builders will be riding on
> the shoulders of giants, just like in RL engineering.  This again will
> result in massive cross-world sharing of replicated components.
>
> In summary:  The asset identifiers supplied by a region to a client should
> contain an *explicit hash/digest over the data* (calculated *ONCE* by the
> relevant asset service of course, not by each region), to allow client-side
> caches to be highly effective at eliminating unnecessary network traffic.
> This will be very important in a metaverse of countless worlds, huge amounts
> of shared data, and massive scenes.
>
>
> Morgaine.
>
> PS. Hash digests in asset URIs deliver two other important benefits as
> well, beyond scalability:
>
>    - They provide the interesting property of *near-universal asset
>    addressing*.  This may appeal to those who focus on social aspects of
>    digital content such as imposing property semantics, in which case using a
>    URI format that almost uniquely identifies assets can kill several birds
>    with one stone.
>
>
>    - They provide isolation from host and network outages.  Near-universal
>    asset addressing means that when an asset service fails to respond or
>    returns an error code, a new URI containing the same hash digest could be
>    manufactured and sent to a second asset service as fallback.  The benefits
>    of this for *virtual world resilience* are of course immense.
>    Resilience is so important that I suggest it should be a protocol
>    requirement.  The fact that we would gain resilience automatically as a mere
>    side-effect of digest-based addressing highlights the rather nice properties
>    of this design.
>
>
> -- End.