Re: [vwrap] Is 'Data Z' immutable (like a git snapshot)?

Carlo, I too would like to hear what Vaughn had in mind in the areas you
mention, although it's perfectly possible that he was trying to keep the
issues of mutability and addressing out of the initial picture entirely,
just to keep things simple.  They are important issues though, and I'm glad
that you pointed them out.

Here I'll provide my views about this general area.

First of all, mutability breaks caching entirely, so it needs to be
approached with great caution.  Caching can make the difference between a
great service and a totally unusable one (or at least one that doesn't work
very well), so if mutability is allowed then things need to be designed in
such a way that caching still works *most of the time*, namely in between
mutations.

This is an issue which has occupied many minds for decades, and I think it
can be distilled into the widely accepted notion that cached objects should
never be overwritten, but only joined by updated versions.  At a stroke this
lets highly concurrent asset services avoid the thorny issue of writing in
the presence of concurrent readers, and at the same time it lets caches have
very simple update semantics since nothing is mutable from their
perspective.  The only cost is some loss of disk space to hold old versions,
which is rarely significant given disk sizes and costs today.
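
To make the "never overwrite, only add versions" idea concrete, here is a minimal Python sketch. It assumes a hypothetical store keyed by (name, version). Writers take a lock. Readers and caches never need one and never invalidate anything, because an entry, once written, cannot change:

```python
import threading


class AppendOnlyStore:
    """Hypothetical asset store: each (name, version) entry is written once
    and never overwritten; an update only ever adds a new version."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, int], bytes] = {}
        self._lock = threading.Lock()  # serializes writers; readers never wait

    def add_version(self, name: str, payload: bytes) -> int:
        """Store payload as the next version of name and return its number."""
        with self._lock:
            latest = max((v for (n, v) in self._data if n == name), default=0)
            self._data[(name, latest + 1)] = payload
            return latest + 1

    def get(self, name: str, version: int) -> bytes:
        # No lock needed: an entry that exists can never change afterwards.
        return self._data[(name, version)]


class ReadThroughCache:
    """Hypothetical cache with no invalidation logic at all: because entries
    are immutable, a cache hit is always correct."""

    def __init__(self, store: AppendOnlyStore) -> None:
        self._store = store
        self._entries: dict[tuple[str, int], bytes] = {}
        self.misses = 0  # how often the store had to be consulted

    def get(self, name: str, version: int) -> bytes:
        key = (name, version)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._store.get(name, version)
        return self._entries[key]
```

After two calls to `add_version("chair", ...)`, version 1 is still retrievable alongside version 2, and the cache fetches each version from the store exactly once. The only cost is the space taken by the old version.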

While that is good for asset services and caches, it does place the burden
of achieving mutability on the parties who require it.  And that of course
is how burdens should be borne.

What this means for virtual worlds is that a mutator becomes responsible for
notifying endpoints that something has mutated, so that they can fetch the
new versions.  This needn't be an onerous requirement, and it may not even
need to be wrapped up in security tape, since having had access once
probably qualifies you for an unconditional update.  In any case, the
original item is still in its asset service (as well as in users' terabyte
caches), so one very useful property of this approach is that a simulation
can't be broken for long by a poor update since reverting is always
possible.  That's an engineering plus point.
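
A rough sketch of the mutator's side follows. It assumes a hypothetical handle that records the version ids it has published for one object and calls back subscribed endpoints whenever the current version moves. Reverting a poor update only moves the pointer back, because the older version's data was never deleted:

```python
from typing import Callable, List

# A listener receives (object name, new version id) and fetches the data itself.
Listener = Callable[[str, str], None]


class MutatorHandle:
    """Hypothetical record a mutator keeps for one in-world object: the
    versions it has published and the endpoints to notify on change."""

    def __init__(self, name: str, first_version: str) -> None:
        self.name = name
        self.history: List[str] = [first_version]
        self._listeners: List[Listener] = []

    @property
    def current(self) -> str:
        return self.history[-1]

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def _notify(self) -> None:
        for listener in self._listeners:
            listener(self.name, self.current)

    def update(self, new_version: str) -> None:
        """Publish a new version; older versions stay in the asset service."""
        self.history.append(new_version)
        self._notify()

    def revert(self) -> None:
        """Back out a bad update by pointing at the previous version again."""
        if len(self.history) < 2:
            raise ValueError("nothing to revert to")
        self.history.pop()  # only the pointer moves; no stored data is deleted
        self._notify()
```

No authorization is attached to the notification here, which matches the idea that prior access already qualifies an endpoint for the update.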

In respect of addressing, you mentioned using hashes as item addresses, and
this is of course my preferred strategy, which I have described and
advocated several times this week, and back in September.  Hash-based
addressing has numerous excellent engineering properties that put it head
and shoulders above other schemes.  I say go with that approach.  In any
event, when we pit alternative schemes against each other on merit, I bet
nothing will come close to hash-based.
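
Here is a minimal sketch of hash-based addressing. It uses SHA-256 rather than the MD5 you mentioned, since practical MD5 collisions are known. Two properties fall out directly. Identical bytes always get the same address, so an edit that is changed back does not create a duplicate, which a freshly minted UUID per write would. And a copy fetched from any server or cache can be verified against its own address:

```python
import hashlib


def content_address(payload: bytes) -> str:
    """The item's address is the SHA-256 digest of its bytes."""
    return hashlib.sha256(payload).hexdigest()


def verify(address: str, payload: bytes) -> bool:
    """A copy fetched from any server or cache can be checked locally."""
    return content_address(payload) == address


class ContentAddressedStore:
    """Hypothetical in-memory store keyed by content address."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, payload: bytes) -> str:
        address = content_address(payload)
        self._blobs.setdefault(address, payload)  # identical data stored once
        return address

    def get(self, address: str) -> bytes:
        return self._blobs[address]

    def __len__(self) -> int:
        return len(self._blobs)
```

Storing `b"red chair"`, then `b"blue chair"`, then `b"red chair"` again leaves only two blobs, and both `put` calls for the red chair return the same address.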

On the issue of caps, I think that keeping their semantics simple has great
merit, because heavyweight schemes are less likely to be accepted, and
complex ones are unlikely to be implemented uniformly.  But in any case, as
I described to Vaughn, caps should be *optional* anyway (i.e. asset-
dependent).  Having to acquire a cap in order to fetch a Creative Commons
licensed asset is unnecessary, and indeed it is rather comic.  A cap only
needs to be requested when the asset requires it, and only those assets that
need it should bear the burden.
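
As a sketch of "caps only where the asset requires them", here is a hypothetical asset service that checks a per-asset `requires_cap` flag (an assumed name) before asking for a capability. Caps are minted as HMACs over the asset id with a server secret. That is just one simple way to make them unforgeable, not a proposal for the protocol:

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Asset:
    payload: bytes
    requires_cap: bool = False  # hypothetical per-asset flag; False for CC assets


class AssetService:
    """Hypothetical service that demands a capability only for assets
    flagged as needing one."""

    def __init__(self) -> None:
        self._assets: Dict[str, Asset] = {}
        self._key = secrets.token_bytes(32)  # server secret used to mint caps

    def add(self, asset_id: str, asset: Asset) -> None:
        self._assets[asset_id] = asset

    def _cap_for(self, asset_id: str) -> str:
        return hmac.new(self._key, asset_id.encode(), hashlib.sha256).hexdigest()

    def grant_cap(self, asset_id: str) -> str:
        # A real service would authorize the requester here; omitted in this sketch.
        return self._cap_for(asset_id)

    def fetch(self, asset_id: str, cap: Optional[str] = None) -> bytes:
        asset = self._assets[asset_id]
        if asset.requires_cap:
            if cap is None or not hmac.compare_digest(cap, self._cap_for(asset_id)):
                raise PermissionError(f"{asset_id} requires a capability")
        return asset.payload
```

A Creative Commons asset is fetched with no cap at all, while a flagged asset is refused until a cap is presented. These caps never expire, so the question of cap lifetime is left open here.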

The issue of flag bits (or more generally, property fields) for assets is
one that interests me a lot, as it relates to the above.  As I described to
Vaughn, it is the assets that impose requirements on the protocol, not vice
versa, so it is the assets that should carry the properties that control the
protocol.
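
One possible layout for such property fields follows the split you suggest in your question 5. The content address covers only the payload. Property fields, such as the hypothetical `requires_cap` and `cacheable`, travel alongside it, and the protocol consults them. This is a sketch under that assumption, not a settled design:

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class AssetRecord:
    payload: bytes
    # Hypothetical property fields; deliberately NOT covered by the hash.
    properties: Dict[str, Any] = field(default_factory=dict)

    @property
    def address(self) -> str:
        return hashlib.sha256(self.payload).hexdigest()


def needs_cap(record: AssetRecord) -> bool:
    """The asset, not the protocol, decides whether a cap is required."""
    return bool(record.properties.get("requires_cap", False))


def may_cache(record: AssetRecord) -> bool:
    """Likewise for caching; an asset is cacheable unless it says otherwise."""
    return bool(record.properties.get("cacheable", True))
```

Changing a property leaves the address untouched. For the same reason, copies held on different asset servers can disagree about their properties, which is the desync risk you raise. This sketch does nothing to reconcile them.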

Morgaine.

==============================

On Fri, Apr 8, 2011 at 9:34 PM, Carlo Wood <carlo@alinoe.com> wrote:

> This is very nice work and I think it clarifies a lot.
>
> I have a few very basic questions.
>
> I see that you request Z from two different Asset servers
> (originally A, but also C when a backup has been stored there).
>
> No matter how you look at it, that is distributed storage:
> compare it with git or mercurial; if Z on A could be changed
> then the backup would be outdated, and that is not what
> we want. So, I conclude that 'Z' (or rather, the handle
> used to refer to Z when asking for it) therefore refers to
> unmodifiable data (compare git's hex IDs that point to
> a snapshot).
>
> 1) Is that correct?
>
> If it is correct and the handle for Z used in messages
> like "give me a cap for Z" refers to immutable data, then
> the most logical thing to do would be to use a (large)
> hash value of the data (i.e. MD5) or a UUID that was
> generated when the data was last changed (which is less
> good because it would cause duplicated data when something
> is changed back to what it was before), or to use an
> ID that is less large, but whose uniqueness would be
> guaranteed by including a (for that authority unique) ID
> of the authority that made the last change. The advantage
> of the latter is less bandwidth (smaller IDs) and knowledge
> of the authority built right into the ID of something (handy
> for routing). Disadvantages are: it suffers from the same problem
> as the UUID in that it will duplicate data whenever the
> same data is generated and it requires administration to
> make sure that every authority has a unique ID (compare
> giving people unique IP addresses).
>
> 2) Which of those is used? Hash, UUID or a smaller, specially
> crafted ID?
>
> 3) What would happen if someone gets a cap for Z, a
> modifiable object; then the object is modified and then
> the cap is used? Does the cap have a guaranteed lifetime?
>
> 4) If not every previous state of an object is kept
> eternally and after changing some object old caps (and
> their static data) disappear - then how will the asset
> servers that contain copies know that? They are not
> necessarily aware that anything changed at all for
> that object, so they can't know what the lifetime
> is. It seems that once you make a copy you are doomed
> to keep it forever or do garbage collection for assets
> that are never accessed anymore.
>
> 5) I can imagine that at SOME TIME in the future we will want
> a few bits (or more) that ARE mutable, even though the
> asset ID (Z) refers to snapshot (immutable) data.
> Are we going to build into the protocol a provision for
> such mutable bits?
>
> This would mean, in the case that the ID is a hash (i.e. MD5),
> that the asset data consists of a payload over which the hash
> is taken, plus a mutable part that is not included in the
> hash. Obviously, such data could then get out of sync
> between asset servers holding copies.
>
> --
> Carlo Wood <carlo@alinoe.com>
> _______________________________________________
> vwrap mailing list
> vwrap@ietf.org
> https://www.ietf.org/mailman/listinfo/vwrap
>