Re: [vwrap] Is 'Data Z' immutable (like a git snapshot)?

Carlo Wood <carlo@alinoe.com> Sat, 09 April 2011 01:56 UTC

Date: Sat, 09 Apr 2011 03:58:37 +0200
From: Carlo Wood <carlo@alinoe.com>
To: vwrap@ietf.org
Message-ID: <20110409035837.0d324940@hikaru.localdomain>
In-Reply-To: <BANLkTi=__DRJ-FGvVwsQWyiDkZgz_ekg0g@mail.gmail.com>
References: <BANLkTint6CiMRZWj59sEYM2j7VoKgz4-Bw@mail.gmail.com> <AANLkTimuVubm5Becx8cg_Uq2Gdj8EjHL7maMyqWOeYCJ@mail.gmail.com> <AANLkTi=0iBKxo0_yv2LWsExzrKUjJLqP5Ua2uHB=M_7d@mail.gmail.com> <AANLkTi=QH+c-19PvavnXU+pgWyaqpAA0F5G5SMd6h4JR@mail.gmail.com> <5365485D-FFAE-46CA-B04E-D413E85FB1D1@gmail.com> <4D97E7FE.7010104@gmail.com> <4D97EEC1.7020207@gmail.com> <BANLkTi=9CXCtb=ryFtMuyG2w9ifb-2urkA@mail.gmail.com> <4D98AC5F.70501@gmail.com> <BANLkTikci18U3S-fz6k4doVTdtUig7j=zw@mail.gmail.com> <BANLkTim8uUNmGU91mYmXQX6_Eqqp92--WQ@mail.gmail.com> <20110408223402.36ae68a9@hikaru.localdomain> <BANLkTi=__DRJ-FGvVwsQWyiDkZgz_ekg0g@mail.gmail.com>
Subject: Re: [vwrap] Is 'Data Z' immutable (like a git snapshot)?

On Sat, 9 Apr 2011 02:11:00 +0100
Morgaine <morgaine.dinova@googlemail.com> wrote:

> First of all, mutability breaks caching entirely, so it needs to be
> approached with great caution.  Caching can make the difference
> between a great service and a totally unusable one (or at least one
> that doesn't work very well), so if mutability is allowed then things
> need to be designed in such a way that caching still works *most of
> the time*, namely in between mutations.

Yup, I totally agree. As you might know, I worked a lot on
improving the IRC protocol in the past. IRC works with handles
(nicks and channel names) and everything else is mutable (i.e.,
the channel topic, the channel modes, etc). This turned out to
be an unsolvable horror (and trust me on that please, I worked
7 years on this topic). What I did was change nick names into
numerics that are *immutable*, so at least the bloody handle
wasn't changing all the time, heheh (channel names are already
immutable), and then assign authorities for the nick names
(namely the servers that the users are connected to). Authorities
have the nice property that their message stream is exclusively
outward, away from the authority, and is therefore streamable:
if every mutating operation is kept in order then everything
remains synched in the end. This requires that one always ASKS
the authority to change something rather than telling it that
you changed something. For example, I changed it so that when you
KICK someone from a channel, a request is sent to the server
that person is connected to, and that server actually sends
out the message that this user is removed from the channel.
An authority for channels was never made; there I tried to
solve the problem with timestamps.
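
To make the "ask the authority" idea concrete, here is a minimal sketch
in Python. The class and method names (Authority, Replica, request_kick)
are made up for illustration and are not taken from any IRC server; the
only point is that requests flow towards the authority while ordered
mutations flow away from it.

```python
from dataclasses import dataclass, field


@dataclass
class Authority:
    """Owns one piece of state; the only party allowed to emit mutations."""
    members: set = field(default_factory=set)
    seq: int = 0                                   # order of outgoing mutations
    subscribers: list = field(default_factory=list)

    def request_kick(self, nick: str) -> bool:
        """Others ASK for a change; the authority decides and announces it."""
        if nick not in self.members:
            return False                           # refused, nothing is broadcast
        self.members.discard(nick)
        self.seq += 1
        for replica in self.subscribers:
            replica.apply(self.seq, ("kick", nick))
        return True


@dataclass
class Replica:
    """Applies mutations strictly in the order the authority sent them."""
    members: set = field(default_factory=set)
    last_seq: int = 0

    def apply(self, seq: int, op: tuple) -> None:
        assert seq == self.last_seq + 1, "out-of-order mutation"
        kind, nick = op
        if kind == "kick":
            self.members.discard(nick)
        self.last_seq = seq


authority = Authority(members={"alice", "bob"})
replica = Replica(members={"alice", "bob"})
authority.subscribers.append(replica)

authority.request_kick("bob")      # accepted and broadcast
authority.request_kick("carol")    # refused: carol is not a member
```

The replica never changes its own state; it only replays what the
authority announced, which is why ordering is enough to stay in sync.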

I don't think we should go this route (using timestamps).
Also, the authority "solution" has the major disadvantage
that it only works in a non-cyclic routing tree; as soon as
any rerouting takes place you get into MAJOR problems
for messages that were still under way.

The ONLY way to really avoid all those nightmares is to adopt
how git and mercurial work: immutable data that some hash ID
refers to, which, as you said, is "never" deleted.
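
A rough sketch of what such a store could look like, in Python. The
choice of SHA-256 and the ImmutableStore name are my assumptions for
the example, not something we agreed on; git itself uses a different
object format.

```python
import hashlib


class ImmutableStore:
    """Content-addressed store: the ID of an item is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        asset_id = hashlib.sha256(data).hexdigest()
        # Never overwrite: an existing ID already refers to exactly this data.
        self._blobs.setdefault(asset_id, data)
        return asset_id

    def get(self, asset_id: str) -> bytes:
        return self._blobs[asset_id]


store = ImmutableStore()
v1 = store.put(b"red shirt")
v2 = store.put(b"blue shirt")   # an "edit" is a new object with a new ID
```

Note that there is no update or delete method at all: the old version
stays reachable under its old ID, so anything cached under that ID can
never become stale.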

> This is an issue which has occupied many minds for decades, and I
> think it can be distilled into the widely accepted notion that cached
> objects should never be overwritten, but only joined by updated
> versions.  At a stroke this lets highly concurrent asset services
> avoid the thorny issue of writing in the presence of concurrent
> readers, and at the same time it lets caches have very simple update
> semantics since nothing is mutable from their perspective.  The only
> cost is some loss of disk space to hold old versions, which is rarely
> significant given disk sizes and costs today.

You totally convinced me :). And for the others: note that I tried
it in other ways for YEARS. So it means something that I'm now
convinced that mutability wasn't right and that we should avoid it.

> While that is good for asset services and caches, it does place the
> burden of achieving mutability on the parties who require it.  And
> that of course is how burdens should be borne.

Also very true.

> What this means for virtual worlds is that a mutator becomes
> responsible for notifying endpoints that something has mutated, so
> that they can fetch the new versions.

This sounds like an 'authority' however: one entity and one alone
can issue the outgoing message that something has changed...
That is not how it should work though. I can't put my finger
on the difference yet though :/... If we go for immutability
then why are we suddenly talking about having to notify others
that something changed?

Of course, things DO change in-world. For example, someone could
detach something - and attach something else. Then the Agent of
that avatar is the authority I'd think: the viewer requests the
Agent to make a change, and if approved then the Agent tells the
viewer and everyone else that the change was made.

I guess that the big difference is that in those outgoing messages
there is no large 'data'.. no ASSET data.  The Asset servers themselves
would never do this. Only world-tied (location-tied) things will do
this: avatars and rezzed objects, with respectively the Agent
and the Region as authority/source of the mutating messages.

Routing probably all goes through the Region server: if someone
changes an attachment while they are 4000 meters away (in the same
sim) then that STILL has to be routed to everyone else in the sim,
and not later. Basically at least, the Agent *tells* the Region
server that the avatar's appearance has changed and the Region sends
outward messages of that fact to everyone who is connected.

As long as all mutating messages go in one direction, away from
the authority (the Agent in this case), and there are no RE-routing
issues (there are none here), there are no problems
with this model.

>  This needn't be an onerous
> requirement, and it may not even need to be wrapped up in security
> tape, since having had access once probably qualifies you for an
> unconditional update.  In any case, the original item is still in its
> asset service (as well as in users' terabyte caches), so one very
> useful property of this approach is that a simulation can't be broken
> for long by a poor update since reverting is always possible.  That's
> an engineering plus point.

If all things are well, then the outgoing 'mutation' message from
the Agent/Region only contains new asset ID's, not the data itself.
And people will get the new data using the new ID.
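
As a sketch of what I mean (Python, with made-up names like Viewer and
on_mutation, and a plain dict standing in for the asset server): the
mutation message is tiny, and the viewer only goes to the asset server
for IDs it has never seen.

```python
import hashlib

asset_server = {}            # hash ID -> immutable bytes (stand-in for a service)


def publish(data: bytes) -> str:
    """Store data under its hash and return the ID."""
    asset_id = hashlib.sha256(data).hexdigest()
    asset_server.setdefault(asset_id, data)
    return asset_id


class Viewer:
    def __init__(self):
        self.cache = {}      # local cache, never needs invalidation
        self.fetches = 0     # how often we had to hit the asset server
        self.attachment = None

    def on_mutation(self, message: dict) -> None:
        """The message carries only the new asset ID, never the data."""
        asset_id = message["attachment_id"]
        if asset_id not in self.cache:
            self.cache[asset_id] = asset_server[asset_id]
            self.fetches += 1
        self.attachment = asset_id


hat = publish(b"hat mesh")
wings = publish(b"wings mesh")

viewer = Viewer()
for new_id in (hat, wings, hat):          # avatar swaps attachments back and forth
    viewer.on_mutation({"attachment_id": new_id})
```

Switching back to the hat costs no fetch at all, because the cached
data for that ID is still valid by construction.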

This sounds perfectly ok for textures (which aren't even mutable
in-world), but what about a little change to the shape of a prim
of an object?  An object (consisting of many prims) is an asset:
you can store it in an asset server and later retrieve it
again.  Hence, we have to realize that such objects are NOT
mutable. Only once they are rezzed are they mutable. This is a
requirement for things to work. A rezzed object is therefore
not an asset: it's world-data that can change (with the Region
as authority? or the owner/viewer maybe, when they are online).
Only once an object is taken back into inventory is the data
sent to an asset server and a new ID created. Until that point
all the data for such objects is stored exclusively in the region,
and people obtain the data for the shape of objects (not the textures,
but which texture ID is used on which face) from the region,
not from an asset server.
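
A small sketch of that life cycle in Python. The functions rez and
take, the JSON encoding and the prim layout are all invented for the
example; the only thing it is meant to show is that the rezzed copy is
mutable region state and that a new ID appears only at take time.

```python
import hashlib
import json

asset_server = {}                       # hash ID -> immutable serialized object


def store_asset(obj: dict) -> str:
    """Serialize deterministically and store under the hash of the bytes."""
    blob = json.dumps(obj, sort_keys=True).encode()
    asset_id = hashlib.sha256(blob).hexdigest()
    asset_server.setdefault(asset_id, blob)
    return asset_id


def rez(asset_id: str) -> dict:
    """The region gets its own mutable working copy (world-data, not an asset)."""
    return json.loads(asset_server[asset_id])


def take(rezzed: dict) -> str:
    """Taking into inventory snapshots the current state as a new asset."""
    return store_asset(rezzed)


original_id = store_asset({"prims": [{"shape": "box", "size": 1.0}]})

rezzed = rez(original_id)
rezzed["prims"][0]["size"] = 2.0        # edited in-world, only the region knows
new_id = take(rezzed)                   # now it becomes an asset again
```

The original asset is untouched by the in-world edit, and taking an
unmodified object back simply yields the ID it already had.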

This follows from the mutability argument (not from the fact that
this is how it works in SL too).

> In respect of addressing, you mentioned using hashes as item
> addresses, and this is of course my preferred strategy, which I have
> described and advocated several times this week, and back in
> September.  Hash-based addressing has numerous excellent engineering
> properties that put it head and shoulders above other schemes.  I say
> go with that approach.  In any event, when we pit alternative schemes
> against each other on merit, I bet nothing will come close to
> hash-based.

I can't think of any disadvantages. The hash has to have a large size
of course, comparable to UUIDs. Still, I'm willing to assume that
with a lot of effort it would be possible to create an asset with
a given hash that in fact is different... Would that be a problem?

Once you know the data, you can reconstruct the item anyway. You
also know the ID (hash or not). You can only know the hash if you
already know the data anyway, or when you have access to it (of course).
Being able to then create an asset that has the same hash, but in
fact is white noise (I definitely can't think of any other way
to construct a known hash) should simply result in the white noise
being discarded: if someone "uploads" data that has an already existing
hash, then discard the uploaded data and use the old data. The result
is that the uploader/hacker didn't gain anything at all.
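
The "first upload wins" rule can be sketched as follows (Python). To be
able to show a collision at all, the example uses a deliberately weak
8-bit hash, weak_hash, which is purely an assumption for the demo; a
real service would of course use a full-size hash.

```python
import hashlib


def weak_hash(data: bytes) -> str:
    # Deliberately tiny (8-bit) hash so that a collision is easy to produce.
    return hashlib.sha256(data).hexdigest()[:2]


def upload(store: dict, data: bytes, hash_fn=weak_hash) -> str:
    """Store data under its hash, but never replace what is already there."""
    asset_id = hash_fn(data)
    if asset_id not in store:
        store[asset_id] = data
    # Otherwise the uploaded bytes are simply dropped: first writer wins.
    return asset_id


store = {}
original = b"genuine asset"
asset_id = upload(store, original)

# Brute-force some different payload ("white noise") with the same weak hash.
forged = next(
    candidate
    for candidate in (str(i).encode() for i in range(100_000))
    if weak_hash(candidate) == asset_id
)
forged_id = upload(store, forged)
```

After the forged upload, the ID still resolves to the genuine data, so
the attacker has gained nothing.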
 
> On the issue of caps, I think that keeping their semantics simple has
> great merit, because heavyweight schemes are less likely to be
> accepted, and complex ones are unlikely to be implemented uniformly.
> But in any case, as I described to Vaughn, caps should be *optional*
> anyway (ie. asset dependent).  Having to acquire a cap in order to
> fetch a Creative Commons licensed asset is unnecessary, and indeed it
> is rather comic.  A cap only needs to be requested when the asset
> requires it, and only those assets that need it should bear the
> burden.

Hear hear! I like this idea very much :).
But it is unrelated to the ID / immutable data of course.
This is just about how easy it is to get a cap for something.
I guess that what you mean is that a free asset should have a
cap that consists of its hash, so that if you know the ID/hash
(i.e. 'Z') you know immediately where to get it.

> The issue of flag bits (or more generally, property fields) for
> assets is one that interests me a lot, as it relates to the above.
> As I described to Vaughn, it is the assets that impose requirements
> on the protocol, not vice versa, so it is the assets that should
> carry the properties that control the protocol.

I've often wished for mutable bits in no-modify objects in SL.
There are numerous applications for it! The idea is that
you have a no-modify object but as owner can still change
certain bits that define how it is used AND that are
stored when you take it back into your inventory (and preserved
the next time you rez it). However, 'no modify' and
the immutability that we talked about are different things
of course! If we assume that such bits are simple changes
to the object that are allowed by the owner at all times,
then the only disadvantage is that (apparently) all data
of the object needs to be stored multiple times: if there
are 4 bits and the user "plays" with them until they have had
all 16 possibilities in their inventory once, then we'd have
16 copies of the same data in the asset server.

This however is just a way to look at it (compare with
the way we look at how a git server works: that view
is highly inefficient too, though easy to grasp).

The real implementation can of course make it so that it
doesn't store that data 16 times...
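
One possible way to do that, sketched in Python and loosely modelled on
how git separates blobs from the trees that point at them: store the
big body once, and let each variant be a tiny record holding the body's
hash plus the flag bits. The record format and the put_object name are
invented for this example.

```python
import hashlib

blobs = {}                               # hash ID -> immutable bytes


def put(data: bytes) -> str:
    asset_id = hashlib.sha256(data).hexdigest()
    blobs.setdefault(asset_id, data)
    return asset_id


def put_object(body: bytes, flags: int) -> str:
    """The object ID is the hash of a small record: body hash + 4 flag bits."""
    body_id = put(body)                  # stored once, however often it is reused
    record = f"{body_id}:{flags:04b}".encode()
    return put(record)


body = b"x" * 10_000                     # stand-in for the large object data
ids = {put_object(body, flags) for flags in range(16)}
```

All 16 variants get their own immutable ID, but the store holds one
body plus 16 records of a few dozen bytes each.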

-- 
Carlo Wood <carlo@alinoe.com>