Re: [vwrap] Client-side caching, URIs, and scene scalability

"Hurliman, John" <john.hurliman@intel.com> Mon, 27 September 2010 17:25 UTC

Return-Path: <john.hurliman@intel.com>
X-Original-To: vwrap@core3.amsl.com
Delivered-To: vwrap@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 9C8F03A6D5C for <vwrap@core3.amsl.com>; Mon, 27 Sep 2010 10:25:53 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.858
X-Spam-Level:
X-Spam-Status: No, score=-4.858 tagged_above=-999 required=5 tests=[AWL=-1.175, BAYES_50=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_MED=-4, SARE_MILLIONSOF=0.315]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kx8Iw-SlTlvM for <vwrap@core3.amsl.com>; Mon, 27 Sep 2010 10:25:51 -0700 (PDT)
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24]) by core3.amsl.com (Postfix) with ESMTP id 9BC823A6C9B for <vwrap@ietf.org>; Mon, 27 Sep 2010 10:25:51 -0700 (PDT)
Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga102.jf.intel.com with ESMTP; 27 Sep 2010 10:26:26 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos; i="4.57,243,1283756400"; d="scan'208,217"; a="661595671"
Received: from rrsmsx602.amr.corp.intel.com ([10.31.0.33]) by orsmga001.jf.intel.com with ESMTP; 27 Sep 2010 10:26:26 -0700
Received: from rrsmsx601.amr.corp.intel.com (10.31.0.151) by rrsmsx602.amr.corp.intel.com (10.31.0.33) with Microsoft SMTP Server (TLS) id 8.2.254.0; Mon, 27 Sep 2010 11:26:25 -0600
Received: from rrsmsx506.amr.corp.intel.com ([10.31.0.39]) by rrsmsx601.amr.corp.intel.com ([10.31.0.151]) with mapi; Mon, 27 Sep 2010 11:26:25 -0600
From: "Hurliman, John" <john.hurliman@intel.com>
To: "vwrap@ietf.org" <vwrap@ietf.org>
Date: Mon, 27 Sep 2010 11:26:24 -0600
Thread-Topic: [vwrap] Client-side caching, URIs, and scene scalability
Thread-Index: ActeZH777Z/SBMfaRPyQyI9qafuRWAABGcdg
Message-ID: <62BFE5680C037E4DA0B0A08946C0933D012AD7E419@rrsmsx506.amr.corp.intel.com>
References: <AANLkTin5GF7=qPXYTOFyB0T-2C4JrS2=xaDKo0wZC+fH@mail.gmail.com>
In-Reply-To: <AANLkTin5GF7=qPXYTOFyB0T-2C4JrS2=xaDKo0wZC+fH@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
acceptlanguage: en-US
Content-Type: multipart/alternative; boundary="_000_62BFE5680C037E4DA0B0A08946C0933D012AD7E419rrsmsx506amrc_"
MIME-Version: 1.0
Subject: Re: [vwrap] Client-side caching, URIs, and scene scalability
X-BeenThere: vwrap@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Virtual World Region Agent Protocol - IETF working group <vwrap.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/vwrap>, <mailto:vwrap-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/vwrap>
List-Post: <mailto:vwrap@ietf.org>
List-Help: <mailto:vwrap-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/vwrap>, <mailto:vwrap-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Sep 2010 17:25:53 -0000

Agreed. We are already doing this in the SimianGrid asset server by using the ETag HTTP header to deliver a SHA-256 hash of the asset data. This allows a client to do a HEAD request before fetching the data, and it is compatible with existing web caching systems.
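
As a minimal sketch of the client side, assuming an asset service that advertises the SHA-256 digest of the asset body in its ETag header (the endpoint URL and cache layout here are illustrative, not SimianGrid's actual API):

    import hashlib
    import requests

    ASSET_URL = "http://assets.example.com/assets/"  # hypothetical endpoint

    # Local cache keyed by the hex-encoded SHA-256 digest of asset data.
    cache = {}

    def fetch_asset(asset_id):
        """Skip the download when the server's ETag matches a digest
        we already hold locally."""
        url = ASSET_URL + asset_id
        head = requests.head(url)
        head.raise_for_status()
        digest = head.headers.get("ETag", "").strip('"')
        if digest in cache:
            return cache[digest]          # cache hit: no body transferred
        body = requests.get(url).content  # cache miss: download once
        assert hashlib.sha256(body).hexdigest() == digest  # verify integrity
        cache[digest] = body
        return body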

John

From: vwrap-bounces@ietf.org [mailto:vwrap-bounces@ietf.org] On Behalf Of Morgaine
Sent: Monday, September 27, 2010 9:53 AM
To: vwrap@ietf.org
Subject: [vwrap] Client-side caching, URIs, and scene scalability

We've discussed the structure of caps, URIs and asset addressing here many times in the past.  I would like us to examine this issue in the specific context of client-side caching and scene growth, which we have not previously addressed.  Scalability is a matter of huge importance (as well as being part of the IETF mission statement), and I'm particularly interested in making sure that VWRAP standards are scalable in key dimensions.

Scenes will inevitably grow in size and complexity with the passage of time.  In quite a short while we can expect millions of assets in a scene within the field of view of an agent, and further orders of magnitude not long after.  While some may be tempted to call this "sci-fi", the growth of memory, disk, and other computing resources over time suggests otherwise.  From kilo to mega, giga and tera, it is only in hindsight that we realize how poor we are at visualizing exponential growth.

This becomes relevant when deciding on URI formats.  It's no use defining an elegant URI format if it doesn't scale as scene complexity rises.

What this means when we design the structure of URIs is that we need to focus on what a URI is for, namely data access, both local and remote.  When designing for scalability through good use of caching, our goal is to avoid remote accesses wherever possible.  If our elegant URI format results in clients fetching data remotely despite it already being cached locally, then our elegant addressing scheme has failed.  "Elegant but non-scalable" is not the mark of success, so let's check against this requirement.

When a region tells the client about the items it currently holds (narrowed down by an interest list in an optimized implementation), it does so by listing the items in the scene using item identifiers of some kind.  The client can then use each identifier as an index into its local cache, and request from the relevant asset services only those items that are not already cached.  This is easy in an isolated world where identifiers can be world-global.  Where it breaks down is when worlds interoperate, and those arbitrary identifiers (e.g. UUIDs, or URIs based on them) become useless for deciding whether an item common to multiple worlds is actually in the cache.  Done wrongly, this can easily result in repeated downloading on a massive scale.
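
To make that flow concrete, here is a rough sketch of the client's side of the exchange; the function names and the shape of the scene message are hypothetical:

    def handle_scene_update(item_ids, cache, asset_service):
        """Render cached items immediately; request only the rest.

        In an isolated world, world-global UUIDs make this work.
        Across worlds, the same data arrives under different ids, so
        the 'missing' test below falsely triggers re-downloads."""
        missing = [i for i in item_ids if i not in cache]
        for item_id in missing:
            cache[item_id] = asset_service.fetch(item_id)  # remote round trip
        return [cache[i] for i in item_ids]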

Local or global identifiers will work poorly unless they are an intrinsic property of the actual data being indexed.  The reason is that the same data used in two different worlds won't share a common URI-based cache index unless it happens to be supplied by the same asset service.  The same item replicated in thousands of worlds would end up being stored thousands of times in the cache.  While the storage cost may be of little consequence, the repeated access cost is not, because round-trip times have very limited downward scalability.

The engineering solution to this is pretty obvious:  scene component identifiers should include a hash or digest over the data, with this information separable from the other parts of the identifier/URI so that it can be used as a key into the cache.  The cache is king, and terabyte caches should be regarded as normal now, with petabyte caches not so many years down the line.  The goal of "Never download the same thing twice" is already reasonable with terabyte drives today, never mind tomorrow.  VWRAP needs to embrace this if it is to be a scalable interop standard.
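
As an illustration only (no URI format has been settled on here), suppose the digest were carried as a separable path segment; the cache key then falls straight out of the URI:

    from urllib.parse import urlsplit

    def cache_key_from_uri(uri):
        """Extract the content digest from an asset URI of the
        hypothetical form:

            http://assets.worldA.example/asset/sha256:3a7bd3.../texture

        Two worlds serving the same data produce different URIs but
        the same digest segment, so the cache lookup still hits."""
        for segment in urlsplit(uri).path.split("/"):
            if segment.startswith("sha256:"):
                return segment[len("sha256:"):]
        return None  # no digest present: treat the URI as an opaque key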

Note that in the above, my reference to "data" intentionally excludes metadata.  Two objects may be quite separate, with totally different metadata, yet denote exactly the same data, which would give them the same hash digest and hence a shared cache index.  This situation is likely to be extremely common, especially for environmental items such as trees, vegetation and other natural elements.  We can easily foresee people creating a brand new world by unpacking a region archive and releasing another few million items into the metaverse.  If those items were cached the first time they were seen in one world, they would not need to be loaded again from the new world, provided we design our URIs with good engineering properties and foresight.
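
A trivial demonstration of why the digest should cover the data alone; the asset structure here is invented:

    import hashlib

    tree_mesh = b"...binary mesh data..."  # identical bytes in both worlds

    # Same data, entirely different metadata in two different worlds:
    asset_in_world_a = {"name": "Oak 42", "creator": "alice", "data": tree_mesh}
    asset_in_world_b = {"name": "Baum 7", "creator": "bob", "data": tree_mesh}

    # Hashing only the data yields one cache key for both, so the mesh
    # is downloaded once; hashing data plus metadata would yield two.
    key_a = hashlib.sha256(asset_in_world_a["data"]).hexdigest()
    key_b = hashlib.sha256(asset_in_world_b["data"]).hexdigest()
    assert key_a == key_b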

Cache scalability as the number of worlds with common assets rises is one issue, but there is a related one on the horizon.  As we move away from SL's primitive assets and 1-level linksets towards hierarchical objects that allow object composition, the number of virtual objects made from reusable components will skyrocket, because builders will be standing on the shoulders of giants, just as in RL engineering.  This again will result in massive cross-world sharing of replicated components.

In summary:  The asset identifiers supplied by a region to a client should contain an explicit hash/digest over the data (calculated ONCE by the relevant asset service of course, not by each region), to allow client-side caches to be highly effective at eliminating unnecessary network traffic.  This will be very important in a metaverse of countless worlds, huge amounts of shared data, and massive scenes.


Morgaine.

PS. Hash digests in asset URIs deliver two other important benefits beyond scalability:

 *   They provide the interesting property of near-universal asset addressing.  This may appeal to those who focus on social aspects of digital content such as imposing property semantics, in which case using a URI format that almost uniquely identifies assets can kill several birds with one stone.

 *   They provide isolation from host and network outages.  Near-universal asset addressing means that when an asset service fails to respond or returns an error code, a new URI containing the same hash digest can be manufactured and sent to a second asset service as a fallback (sketched below).  The benefits of this for virtual world resilience are immense.  Resilience is so important that I suggest it should be a protocol requirement.  The fact that we would gain resilience automatically as a mere side-effect of digest-based addressing highlights the rather nice properties of this design.
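
A rough sketch of that fallback, reusing the digest-in-URI idea from above; both service URLs are invented:

    import requests

    ASSET_SERVICES = [
        "http://assets.primary.example",  # preferred service (hypothetical)
        "http://assets.mirror.example",   # fallback holding replicated data
    ]

    def resilient_fetch(digest):
        """Try each asset service in turn; the digest addresses the
        data itself rather than a host, so any service holding a copy
        can satisfy the request."""
        for service in ASSET_SERVICES:
            try:
                resp = requests.get("%s/asset/sha256:%s" % (service, digest),
                                    timeout=5)
                if resp.ok:
                    return resp.content
            except requests.RequestException:
                continue  # host or network outage: try the next service
        raise LookupError("no asset service could supply sha256:" + digest)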

-- End.