Re: [vwrap] Client-side caching, URIs, and scene scalability

That's good to hear, John, but it can be made far more effective and
efficient.

Your approach allows a client to avoid requesting the data, but only at the
cost of requesting the ETag header, which still entails a network round trip
per asset.  The approach that I am advocating would eliminate those requests
altogether for cached items, because the hash is part of the asset identifier
that is held by the region and handed to the client.  Fetching hashes with an
extra round trip per asset doesn't scale as scenes grow massively.
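A minimal sketch of the difference in Python. It assumes a hypothetical URI layout in which the hex SHA-256 digest is the last path segment after a `sha256` segment (`<base>/sha256/<hex>`). The layout, the `DigestCache` class and the `fetch` callback are illustrative only, not part of any VWRAP or SimianGrid interface. Because the cache is keyed by the digest extracted from the identifier, a cache hit needs no network access at all, whereas a HEAD/ETag check costs one round trip per asset even when the data is already cached.

```python
import hashlib
from typing import Callable, Dict, Iterable, Optional, Tuple
from urllib.parse import urlparse


def digest_from_uri(uri: str) -> str:
    """Return the hex SHA-256 digest from a hypothetical '<base>/sha256/<hex>' URI."""
    parts = urlparse(uri).path.rstrip("/").split("/")
    if len(parts) < 2 or parts[-2] != "sha256":
        raise ValueError(f"no sha256 digest in {uri!r}")
    return parts[-1].lower()


class DigestCache:
    """Local cache keyed by the digest of the asset data, not by the full URI."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def get(self, digest: str) -> Optional[bytes]:
        return self._store.get(digest)

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._store[digest] = data
        return digest


def load_scene(uris: Iterable[str], cache: DigestCache,
               fetch: Callable[[str], bytes]) -> Tuple[Dict[str, bytes], int]:
    """Resolve every URI, calling fetch() (a stand-in for an HTTP GET) only on misses.

    Returns the resolved data and the number of network requests made.
    """
    resolved: Dict[str, bytes] = {}
    requests = 0
    for uri in uris:
        digest = digest_from_uri(uri)   # no network access needed for this
        data = cache.get(digest)
        if data is None:
            data = fetch(uri)
            requests += 1
            # Verify the payload against the digest before caching it.
            if hashlib.sha256(data).hexdigest() != digest:
                raise ValueError(f"digest mismatch for {uri}")
            cache.put(data)
        resolved[uri] = data
    return resolved, requests
```

For example, the same tree mesh referenced as `https://assets.world-a.example/sha256/<hex>` in one world and as `https://assets.world-b.example/sha256/<hex>` in another is fetched only once: the second `load_scene` call reports zero requests, even though the two URIs name different hosts.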

Avoiding unnecessary network accesses will become crucial as new worlds
expand the asset pool with millions of replicated assets in seconds.
Hash-based URIs would also provide virtual worlds with asset resilience,
since fallback services can be queried automatically for the known asset
hash when retrieval from the initial asset URI fails.  You can't do that
when the hash is held by an asset service that is now inaccessible.
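A minimal sketch of that fallback in Python, assuming a hypothetical `<base>/sha256/<hex>` URI layout that every participating asset service accepts. The service base URLs and the `get` callback are stand-ins, not an existing API. Because the client already holds the digest, it can build the equivalent URI for each service in turn and verify whatever comes back, so a fallback service cannot substitute different data.

```python
import hashlib
from typing import Callable, List, Sequence


def fetch_with_fallback(digest: str, services: Sequence[str],
                        get: Callable[[str], bytes]) -> bytes:
    """Try each asset service in turn for the same content digest.

    `services` are base URLs of hypothetical asset services that accept the
    '<base>/sha256/<hex>' layout; `get` stands in for an HTTP GET that raises
    OSError (e.g. ConnectionError) when a service is unreachable.
    """
    errors: List[str] = []
    for base in services:
        uri = f"{base.rstrip('/')}/sha256/{digest}"
        try:
            data = get(uri)
        except OSError as exc:
            errors.append(f"{uri}: {exc}")
            continue
        # Any service may answer, so only accept bytes that match the digest.
        if hashlib.sha256(data).hexdigest() == digest:
            return data
        errors.append(f"{uri}: digest mismatch")
    raise LookupError("; ".join(errors) or "no asset services configured")
```

With the standard library, `get` could be `lambda uri: urllib.request.urlopen(uri, timeout=10).read()`, since `urllib.error.URLError` (and therefore `HTTPError`) derives from `OSError`.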

Morgaine.

==============================

On Mon, Sep 27, 2010 at 6:26 PM, Hurliman, John <john.hurliman@intel.com> wrote:

> Agreed. We are already doing this in the SimianGrid asset server by using
> the ETag HTTP header to deliver a SHA256 hash of asset data. This allows a
> client to do a HEAD request before fetching data and is compatible with
> existing web caching systems.
>
> John
>
> *From:* vwrap-bounces@ietf.org [mailto:vwrap-bounces@ietf.org] *On Behalf
> Of *Morgaine
> *Sent:* Monday, September 27, 2010 9:53 AM
> *To:* vwrap@ietf.org
> *Subject:* [vwrap] Client-side caching, URIs, and scene scalability
>
> We've discussed the structure of caps, URIs and asset addressing here many
> times in the past.  I would like us to examine this issue in the specific
> context of *client-side caching* and *scene growth*, which we have not
> previously addressed.  Scalability is a matter of huge importance (as well
> as being part of the IETF mission statement), and I'm particularly
> interested in making sure that VWRAP standards are scalable in key
> dimensions.
>
> Scenes will inevitably rise in size and complexity with the passage of
> time.  In quite a short while we can expect millions of assets in a scene
> within the field of view of an agent, and further orders of magnitude not
> long after.  While some may be tempted to call this "sci-fi", observing the
> increase in memory, disk, and other computing resources over time suggests
> otherwise.  From kilo to mega, giga and tera, it's only when we look back
> that we realize that our inability to visualize exponential growth is epic.
>
> This becomes relevant when deciding on URI formats.  It's no use defining
> an elegant URI format if it doesn't scale as scene complexity rises.
>
> What this means for us when we are designing the structure of URIs is that
> we need to focus on what the URI is for, namely data access, both local and
> remote.  When designing for scalability through good use of caching, our
> goal is to spare the client every remote access that can possibly be
> avoided.  If our elegant URI format results in clients needing to access
> data remotely despite it already being cached locally, then our elegant
> addressing scheme has failed.  "Elegant but non-scalable" is not the mark of
> success, so let's check against this requirement.
>
> When a region tells the client about the items it currently holds (narrowed
> down by an interest list in an optimized implementation), it does so by
> listing the items in the scene using item identifiers of some kind.  The
> client can then use each identifier as an index into its local cache, and
> then request from the relevant asset services only those items that are not
> already cached.  This is easy in an isolated world where identifiers can be
> world-global.  Where it breaks down is when worlds interoperate, and those
> arbitrary identifiers (e.g. UUIDs or URIs based on them) become useless for
> deciding whether an item common to multiple worlds is actually in the cache
> or not.  Done wrongly, it can easily result in repeat downloading on a
> massive scale.
>
> Local or global identifiers will work poorly unless they're an intrinsic
> property of the actual data being indexed.  They won't work well because
> the same data used in two different worlds won't have a common
> URI-based cache index unless it happens to be supplied by the same asset
> service.  The same item replicated in thousands of worlds would end up being
> stored thousands of times in the cache.  While the storage cost may be of
> little consequence, the repeated access cost is not, because round-trip
> times have very limited downward scalability.
>
> The engineering solution to this is pretty obvious:  scene component
> identifiers should include a *hash or digest over the data*, this
> information being separable from other parts of the identifier/URI so that
> it can be used as a key into the cache.  The cache is king, and terabyte
> caches should be regarded as normal now, with petabyte caches not so many
> years down the line.  The goal of "Never download the same thing twice" is
> already reasonable with terabyte drives today, never mind tomorrow.  VWRAP
> needs to embrace this, if it is to be a scalable interop standard.
>
> Note that in the above, my reference to "data" *excludes metadata* by
> intent.  Two objects may be quite separate, with totally different metadata,
> yet denote exactly the same data, which would give them the same hash digest
> and hence share a cache index.  This situation is likely to be extremely
> common, especially for environmental items such as trees, vegetation and
> other natural elements.  We can easily foresee a situation in which people
> create their brand new world by unpacking a region archive and releasing
> another few million items into the metaverse.  If those items were cached
> the first time that they were seen in one world, they would not need to be
> loaded again from this new world, if we design our URIs with good
> engineering properties and foresight.
>
> Cache scalability as the number of worlds with common assets rises is one
> issue, but there is also another related one on the horizon.  As we move
> away from SL's primitive assets and 1-level linksets towards *hierarchical
> objects* that allow object composition, the number of virtual objects made
> from reusable components will skyrocket, because builders will be riding on
> the shoulders of giants, just like in RL engineering.  This again will
> result in massive cross-world sharing of replicated components.
>
> In summary:  The asset identifiers supplied by a region to a client should
> contain an *explicit hash/digest over the data* (calculated *ONCE* by the
> relevant asset service of course, not by each region), to allow client-side
> caches to be highly effective at eliminating unnecessary network traffic.
> This will be very important in a metaverse of countless worlds, huge amounts
> of shared data, and massive scenes.
>
> Morgaine.
>
> PS. Hash digests in asset URIs deliver two other important benefits as
> well, beyond scalability:
>
>    - They provide the interesting property of *near-universal asset
>    addressing*.  This may appeal to those who focus on social aspects of
>    digital content such as imposing property semantics, in which case using a
>    URI format that almost uniquely identifies assets can kill several birds
>    with one stone.
>
>    - They provide isolation from host and network outages.  Near-universal
>    asset addressing means that when an asset service fails to respond or
>    returns an error code, a new URI containing the same hash digest could be
>    manufactured and sent to a second asset service as fallback.  The benefits
>    of this for *virtual world resilience* are of course immense.
>    Resilience is so important that I suggest it should be a protocol
>    requirement.  The fact that we would gain resilience automatically as a mere
>    side-effect of digest-based addressing highlights the rather nice properties
>    of this design.
>
> -- End.
>
> _______________________________________________
> vwrap mailing list
> vwrap@ietf.org
> https://www.ietf.org/mailman/listinfo/vwrap