[vwrap] Client-side caching, URIs, and scene scalability

Morgaine <morgaine.dinova@googlemail.com> Mon, 27 September 2010 16:52 UTC

We've discussed the structure of caps, URIs and asset addressing here many
times in the past.  I would like us to examine this issue in the specific
context of *client-side caching* and *scene growth*, which we have not
previously addressed.  Scalability is a matter of huge importance (as well
as being part of the IETF mission statement), and I'm particularly
interested in making sure that VWRAP standards are scalable in key
dimensions.

Scenes will inevitably grow in size and complexity with the passage of
time.  Before long we can expect millions of assets within an agent's field
of view, and further orders of magnitude not long after.  Some may be
tempted to call this "sci fi", but the historical growth of memory, disk,
and other computing resources suggests otherwise.  From kilo to mega, giga,
and tera, it is only in hindsight that we realize how poor we are at
visualizing exponential growth.

This becomes relevant when deciding on URI formats.  It's no use defining an
elegant URI format if it doesn't scale as scene complexity rises.

What this means when we design the structure of URIs is that we need to
focus on what a URI is for, namely data access, both local and remote.  When
designing for scalability through good use of caching, the goal is to spare
the client a remote access whenever the data is already held locally.  If
our elegant URI format forces clients to fetch data remotely even though it
is already cached, then our elegant addressing scheme has failed.  "Elegant
but non-scalable" is not the mark of success, so let's check against this
requirement.

When a region tells the client about the items it currently holds (narrowed
down by an interest list in an optimized implementation), it does so by
listing the items in the scene using item identifiers of some kind.  The
client can then use each identifier as an index into its local cache, and
request from the relevant asset services only those items that are not
already cached.  This is easy in an isolated world where identifiers can be
world-global.  It breaks down when worlds interoperate: arbitrary
identifiers (e.g. UUIDs, or URIs based on them) become useless for deciding
whether an item common to multiple worlds is actually in the cache or not.
Done wrongly, this can easily result in repeat downloading on a massive
scale.
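
To make that reconciliation step concrete, here is a minimal sketch in
Python.  The shapes involved (a flat identifier-to-bytes cache, a plain
list of scene identifiers) are illustrative assumptions on my part, not
anything VWRAP has defined:

    from typing import Dict, Iterable, List

    class AssetCache:
        """Local client-side store, indexed by whatever identifier
        regions hand out."""

        def __init__(self) -> None:
            self._store: Dict[str, bytes] = {}

        def has(self, key: str) -> bool:
            return key in self._store

        def put(self, key: str, data: bytes) -> None:
            self._store[key] = data

    def items_to_fetch(scene_items: Iterable[str],
                       cache: AssetCache) -> List[str]:
        """Return only the identifiers that must still be fetched remotely."""
        return [item_id for item_id in scene_items if not cache.has(item_id)]

Everything hinges on what the identifier is: if it is a world-local UUID,
the same tree seen in two worlds produces two cache misses, which is
exactly the breakdown described above.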

Identifiers, whether local or global, will work poorly unless they are an
intrinsic property of the actual data being indexed.  The same data used in
two different worlds won't share a URI-based cache index unless it happens
to be supplied by the same asset service, so an item replicated across
thousands of worlds would end up being stored thousands of times in the
cache.  The storage cost may be of little consequence, but the repeated
access cost is not, because round-trip times have very limited downward
scalability.
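
The failure mode, illustrated with hypothetical URIs: identical bytes
served by two unrelated asset services acquire two distinct cache keys, so
visiting the second world triggers a pointless re-download:

    tree_mesh = b"...identical mesh data in both worlds..."

    uri_world_a = "vwrap://assets.worlda.example/asset/0a1b2c3d"  # arbitrary UUID
    uri_world_b = "vwrap://assets.worldb.example/asset/9f8e7d6c"  # different UUID

    cache = {uri_world_a: tree_mesh}   # cached after visiting world A
    print(uri_world_b in cache)        # False: cache miss, repeat download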

The engineering solution to this is pretty obvious:  scene component
identifiers should include a *hash or digest over the data*, with that
digest separable from the other parts of the identifier/URI so that it can
be used as a key into the cache.  The cache is king, and terabyte caches
should be regarded as normal now, with petabyte caches not so many years
down the line.  The goal of "Never download the same thing twice" is
already reasonable with terabyte drives today, never mind tomorrow.  VWRAP
needs to embrace this if it is to be a scalable interop standard.
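
As a sketch of what this could look like, assume the digest travels as a
distinguishable URI component (a "sha256=" query parameter here; the actual
VWRAP syntax is of course undecided).  The cache key is then the digest
alone, so a hit can be satisfied no matter which service supplied the URI:

    import hashlib
    from typing import Optional
    from urllib.parse import parse_qs, urlparse

    def digest_of(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def cache_key(asset_uri: str) -> Optional[str]:
        """Extract the separable data digest from an asset URI, if present."""
        params = parse_qs(urlparse(asset_uri).query)
        return params["sha256"][0] if "sha256" in params else None

    uri_a = ("vwrap://assets.worlda.example/asset/0a1b2c3d?sha256="
             + digest_of(b"mesh"))
    uri_b = ("vwrap://assets.worldb.example/asset/9f8e7d6c?sha256="
             + digest_of(b"mesh"))
    assert cache_key(uri_a) == cache_key(uri_b)  # one cache entry across worlds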

Note that in the above, my reference to "data" *excludes metadata* by
intent.  Two objects may be quite separate, with totally different metadata,
yet denote exactly the same data, which would give them the same hash digest
and hence share a cache index.  This situation is likely to be extremely
common, especially for environmental items such as trees, vegetation and
other natural elements.  We can easily foresee people creating a brand new
world by unpacking a region archive and releasing another few million items
into the metaverse.  If those items were cached the first time they were
seen in one world, they would never need to be loaded again from the new
one, provided we design our URIs with good engineering properties and
foresight.
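
A tiny sketch of that data/metadata split, with made-up field names: only
the canonical data bytes feed the digest, so items that differ in name,
creator, or permissions still collide onto a single cache entry:

    import hashlib

    oak_in_world_a = {"name": "Old Oak",  "creator": "alice",
                      "data": b"<tree mesh>"}
    oak_in_world_b = {"name": "Arbre 17", "creator": "bob",
                      "data": b"<tree mesh>"}

    def data_digest(item: dict) -> str:
        """Hash the data alone; metadata deliberately plays no part."""
        return hashlib.sha256(item["data"]).hexdigest()

    # Different metadata, identical data: same digest, one cached copy.
    assert data_digest(oak_in_world_a) == data_digest(oak_in_world_b)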

Cache scalability as the number of worlds with common assets rises is one
issue, but there is a related one on the horizon.  As we move away from
SL's primitive assets and 1-level linksets towards *hierarchical objects*
that allow object composition, the number of virtual objects made from
reusable components will skyrocket, because builders will be standing on
the shoulders of giants, just as in RL engineering.  This again will result
in massive cross-world sharing of replicated components.

In summary:  The asset identifiers supplied by a region to a client should
contain an *explicit hash/digest over the data* (calculated *ONCE* by the
relevant asset service of course, not by each region), to allow client-side
caches to be highly effective at eliminating unnecessary network traffic.
This will be very important in a metaverse of countless worlds, huge amounts
of shared data, and massive scenes.
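
To make the "calculated once" rule concrete, here is a sketch of a
hypothetical asset service that hashes each blob a single time at upload
and bakes the digest into every identifier it subsequently hands out, so
neither regions nor clients ever need to re-hash the data:

    import hashlib
    from typing import Dict
    from uuid import uuid4

    class AssetService:
        def __init__(self, base_uri: str) -> None:
            self.base_uri = base_uri
            self.blobs: Dict[str, bytes] = {}

        def ingest(self, data: bytes) -> str:
            """Store the data and return a digest-bearing URI for it."""
            digest = hashlib.sha256(data).hexdigest()  # computed once, here
            self.blobs[digest] = data
            return f"{self.base_uri}/asset/{uuid4()}?sha256={digest}"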


Morgaine.

PS. Hash digests in asset URIs deliver two further important benefits,
beyond scalability:


   - They provide the interesting property of *near-universal asset
   addressing*.  This may appeal to those who focus on social aspects of
   digital content such as imposing property semantics, in which case using a
   URI format that almost uniquely identifies assets can kill several birds
   with one stone.


   - They provide isolation from host and network outages.  Near-universal
   asset addressing means that when an asset service fails to respond or
   returns an error code, a new URI containing the same hash digest can be
   manufactured and directed at a second asset service as a fallback (see
   the sketch after this list).  The benefits of this for *virtual world
   resilience* are of course immense.  Resilience is so important that I
   suggest it should be a protocol requirement.  The fact that we would
   gain resilience automatically as a mere side-effect of digest-based
   addressing highlights the rather nice properties of this design.
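
Here is that resilience idea as a sketch, under the same assumed URI shape
as before: on failure, rewrite only the authority part of the URI, keep the
digest, and retry against a fallback service.  fetch() merely stands in for
whatever transport VWRAP eventually adopts:

    from typing import List
    from urllib.parse import urlparse, urlunparse

    def fetch(uri: str) -> bytes:
        raise NotImplementedError("placeholder for the real transport")

    def fetch_with_fallback(uri: str, fallback_hosts: List[str]) -> bytes:
        parts = urlparse(uri)
        for host in [parts.netloc, *fallback_hosts]:
            candidate = urlunparse(parts._replace(netloc=host))
            try:
                return fetch(candidate)  # the digest still names the data
            except OSError:
                continue                 # dead or erroring service: try next
        raise LookupError(f"no asset service could supply {uri}")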

