Re: [ogpx] My take on Teleports and protocol resilience

Vaughn Deluca <vaughn.deluca@gmail.com> Mon, 26 October 2009 19:22 UTC

In-Reply-To: <20091026150239.GA6496@alinoe.com>
References: <20091025121547.GB7775@alinoe.com> <9b8a8de40910251530u273830a5k47bdd8927b7efc76@mail.gmail.com> <20091026150239.GA6496@alinoe.com>
Date: Mon, 26 Oct 2009 20:22:26 +0100
Message-ID: <9b8a8de40910261222q30780f0fp1c8f4fa38383ab5d@mail.gmail.com>
From: Vaughn Deluca <vaughn.deluca@gmail.com>
To: Carlo Wood <carlo@alinoe.com>
Content-Type: multipart/alternative; boundary=0015175cd1625b1ccf0476db7b3a
Cc: ogpx@ietf.org
Subject: Re: [ogpx] My take on Teleports and protocol resilience

On Mon, Oct 26, 2009 at 4:02 PM, Carlo Wood <carlo@alinoe.com> wrote:

> That's not how I'd put it Vaughn...
> First of all, for the data proposed (for example, the currently active
>

LOL, I have the strong feeling we agree on almost all counts; I was merely
pointing out some side effects and trying to get a handle on the actual
costs your solution would carry. Your post again explains the logic behind
your proposal very clearly, but that was not my point. I fully agree with
you that it is a very logical and robust solution; my question was about
the efficiency costs.

At the end of your post you come to that point:

>If you'd want to avoid this (minimal!?) RAM usage on the AS
>then you are forced to drop the AS as authority,

First, my question was how much data we are *actually* talking about here.
You use the animation as an example, but there is much more in the avatar's
state than just the current animations. If we serialise the avatar's state to
send it over to the next region, how much data is that? What if I am a
full-size Dragon with a thousand active scripts and hundreds and hundreds of
attachments? And what can we expect it to be in the future?
I *suspect* it's on average not a critical amount, yet I was hoping somebody
who *knows* would chime in.

Second, as I mentioned in my post, I am not very concerned about the storage
space, but more about the *bandwidth*. In your proposal all changes pass
from viewer to AS to region instead of directly from viewer to region.
Whether or not you call the info a copy, it is present at *both* locations, and
since you want to keep the AS in the loop, all updates pass through the AS,
potentially crossing the whole real world. How much traffic is that, and
should we be worried?
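
One way to frame the bandwidth question is a back-of-envelope calculation. The sketch below is only an illustration: `UpdateLoad`, `relay_bytes_per_second` and every number passed to them are hypothetical placeholders, not measurements from any real grid. It assumes the AS receives each update once from the viewer and forwards it once to the region.

```python
from dataclasses import dataclass


@dataclass
class UpdateLoad:
    """Hypothetical per-avatar update load; the values are placeholders."""
    updates_per_second: float  # state changes one viewer emits per second
    bytes_per_update: int      # serialized size of one state change


def relay_bytes_per_second(load: UpdateLoad, avatars: int) -> float:
    """Traffic the AS carries when it relays every update.

    Each update is received once (viewer -> AS) and sent once
    (AS -> region), so the AS carries it twice. On a direct
    viewer -> region path the AS would carry none of it.
    """
    per_avatar = load.updates_per_second * load.bytes_per_update
    return 2 * per_avatar * avatars
```

Plugging in measured update rates and message sizes (which this thread does not yet have) would turn the "should we be worried?" question into a number per Agent Domain.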

>and there is no way you can recover gracefully in case of
>network failure.

Right, that is the consequence of being more efficient. I am trying to get
some good estimates for the price of that robustness. But maybe I should
look at the source code and make an estimate myself...

>Worse, in case of a teleport, you need to CHANGE
>who the authority is, which gives heaps and heaps of potential
>problems.

It depends on what you are aiming for. You are paying in network traffic and
storage space for robustness and flexibility. In some use cases you will
want to optimise the robustness of the system, yet if I am running an Agent
Domain as a business, I might really not be pleased with the traffic your
solution makes me carry, just in case some region I do not own fails...
Again, it might all be peanuts, but I would like to *know*.

Finally, there might actually be another solution. After all, the *real*
"natural" authority is the viewer.  The viewer can keep a local copy of
avatar state. In case of a network failure the AS could ask the viewer to
send the authoritative info so it can re-create the state and send it over
to the new region. Would that work?
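
A minimal sketch of that idea, assuming the viewer remembers every change it originated. The class and method names (`Viewer`, `AgentService.recover`, `Region.receive_state`) are invented for illustration and are not part of any OGP draft. Note that it can only restore state the viewer itself produced; anything that lives purely on the region side would not be covered.

```python
class Viewer:
    """Keeps a local copy of the avatar state changes it has made."""

    def __init__(self):
        self.local_state = {}

    def change(self, key, value):
        self.local_state[key] = value

    def snapshot(self):
        # Hand out a copy so the receiver cannot mutate the viewer's state.
        return dict(self.local_state)


class Region:
    def __init__(self, name):
        self.name = name
        self.avatar_state = None

    def receive_state(self, state):
        self.avatar_state = dict(state)


class AgentService:
    """In this variant the AS stores no avatar state of its own."""

    def recover(self, viewer, new_region):
        # After a failure, ask the viewer for the authoritative info
        # and pass it on to the region that will host the avatar.
        state = viewer.snapshot()
        new_region.receive_state(state)
        return state
```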

-Vaughn




> animation(s)), the AS *is* the natural authority and origin to begin with.
>
> Allow me to introduce some more theory / jargon. Consider some variable
> that changes over time (let's call it 'data'). The value of the variable
> plays a role in more than one host, and synchronization is done via a
> network. Because networks are not reliable it must be defined who is
> the authority, so that in case of failure, things can be re-synchronized
> from the authority. This means automatically that the authority must
> keep a copy of the data, of course.
>
> Now consider the following:
>
>  Authority             Remote
>  value      -------->  value
>
> I call this a 'downstream' synchronization: the data is always and only
> first changed on the authority, and then propagated away from it.
>
> 'Upstream' messages that would change the value are almost guaranteed to lead
> to problems. Simple example: assume some variable has value 'A',
> the authority changes it to 'B', and some downstream host changes it
> at the same time to 'C'. You'd then have:
>
>  Authority                   Remote
>  'B'       ---B-->  <--C---  'C'
>
> ending in
>
>  Authority                   Remote
>  'C'                         'B'
>
> Hence it is much better to work with requests upstream:
>
>  Authority                           Remote
>  'B'       ---B-->  <--request:C---  'A'
>
>  Authority                           Remote
>  'C'       ---C-->                   'B'
>
>  Authority                           Remote
>  'C'                                 'C'
>
> where there exists a race condition whether or not we end
> up with 'C' or 'B', but at least things stay synchronized.
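
The request-upstream pattern in the diagrams above can be sketched as follows. This is an illustration only: `Authority`, `Remote` and `handle_request` are hypothetical names, delivery is modelled as a direct method call, and the authority grants every request.

```python
class Remote:
    def __init__(self, value):
        self.value = value

    def apply(self, value):
        # A remote never changes the value on its own; it only applies
        # what the authority sends downstream.
        self.value = value


class Authority:
    def __init__(self, value, remotes):
        self.value = value
        self.remotes = remotes

    def set(self, value):
        # Change the value here first, then propagate it downstream.
        self.value = value
        for remote in self.remotes:
            remote.apply(value)

    def handle_request(self, value):
        # An upstream message is a request, not a state change.
        # This sketch always grants it.
        self.set(value)
```

Whichever of `set('B')` and `handle_request('C')` the authority processes last determines the final value, but both hosts end up holding the same one.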
>
>
> Ok, back to the animation state of an avatar.
> If we state that the AS is the authority, then you can hardly
> speak of 'keeping a copy', it's a trivial thing that both
> the AS as well as the simulator need to know this value.
>
>  AS                       Simulator
>  'animation'    ----->    'animation'
>
> If you'd want to avoid this (minimal!?) RAM usage on the AS
> then you are forced to drop the AS as authority, and there
> is no way you can recover gracefully in case of network
> failure. Worse, in case of a teleport, you need to CHANGE
> who the authority is, which gives heaps and heaps of potential
> problems.
>
> On Sun, Oct 25, 2009 at 11:30:48PM +0100, Vaughn Deluca wrote:
> > > [AS:animation state] --- message stream --> [simulator:animation state copy]
> >
> > What you are suggesting here is keeping a copy of the avatar state. We would be
> > increasing bandwidth and storage space to gain flexibility. I do not think
> > storage space is a real problem. I am not sure about bandwidth.
>
> There would be NO increase in bandwidth imho, because the AS *is* the
> natural authority to begin with: if the user controls the avatar, changes
> it, then the source/origin is the AS (well, the viewer, but it's no
> big deal to use the AS as authority; this DOES mean however that the
> viewer basically only sends requests, and not real state changes).
>
> This picture also shows that the most logical place for HUD scripts
> is the AS as well, not the simulator.
>
> > Is there anybody who can put some real numbers on this? How much extra work
> > would the region have to do? Is it still acceptable if the Agent service is in
> > Australia and the region in Europe?
> >
> > -Vaughn
> >
> > On Sun, Oct 25, 2009 at 1:15 PM, Carlo Wood <carlo@alinoe.com> wrote:
> >
> >     Let's start with some brainstorming...
> >
> >     Things that might be relevant, in no particular order.
> >
> >     * State of avatar
> >     * State of attachments
> >     * Location of avatar
> >     * Time of unresponsiveness of avatar (position) towards viewer
> >     * Time of unresponsiveness of attachments (attaching/detaching/scripts)
> >     * Perception of the viewer
> >     * Perception of other viewers
> >     * Region boundary crossing
> >     * Teleport over larger distance
> >     * User configurable parameters
> >     * did I miss something important?
> >
> >
> >     Real message starts here:
> >
> >     Something that protocol (or implementation)- wise leads to problems
> >     are copies of the same data on different hosts; this always leads
> >     to desynchronization of this data with heaps of untrackable problems
> >     (bugs).
> >
> >     The ONLY way to keep a copy of some data reliably synchronized is by
> >     having a stream of state changes being sent from one host to another,
> >     where the messages that contain those state changes always keep
> >     the same order and the stream is never terminated/lost (in which case
> >     a full-resynchronization would be necessary). Note that this means
> >     that the source of the state changes has to be a single point of
> >     origin, which automatically means that we can identify the REAL
> >     (original) copy of the data. Thus:
> >
> >      [original] ---> stream of state changes --> [copy]
> >
> >     Let's call this a "unidirectional state" for lack of a better word.
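
A minimal sketch of such a unidirectional state, assuming sequence numbers are used to detect a lost or reordered message. The names `Origin`, `Copy` and `full_snapshot` are hypothetical, and the copy resynchronizes by calling the origin directly, which a real deployment would do over the network.

```python
class Origin:
    """Single point of origin: every state change starts here."""

    def __init__(self):
        self.state = {}
        self.seq = 0

    def change(self, key, value):
        self.state[key] = value
        self.seq += 1
        # The returned tuple is the message sent down the stream.
        return (self.seq, key, value)

    def full_snapshot(self):
        return self.seq, dict(self.state)


class Copy:
    def __init__(self):
        self.state = {}
        self.seq = 0

    def receive(self, message, origin):
        seq, key, value = message
        if seq != self.seq + 1:
            # Not exactly the next message: the stream was broken,
            # so fall back to a full resynchronization.
            self.seq, self.state = origin.full_snapshot()
            return False
        self.state[key] = value
        self.seq = seq
        return True
```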
> >
> >     [PS I'm ignoring actual implementation details here. I'm NOT
> >        saying that currently anyone is using a TCP-stream of
> >        state changes, so don't quote JUST this part with the comment
> >        that this is not how it currently works, Infinity :p
> >        Instead, this is a mathematical approach, the abstract
> >        equivalent of any possible deployment case].
> >
> >     Basically, we want to avoid copies of the same data.
> >
> >
> >     The simulator contains a lot of state information that cannot
> >     be moved away from the simulator though: because the state is
> >     needed to calculate the interactions between all objects and
> >     avatars in the region, which would become way too slow if,
> >     for example, several agent domain services would have to be
> >     queried all the time. I can imagine that this also holds for
> >     the attachments on an avatar.
> >
> >     This means that if an avatar moves from one region to another,
> >     all this state information has to be transferred too.
> >
> >     Thus, the origin sends state information to the new region.
> >     If this fails, then we want the user to resume at its old
> >     location: we need to keep a copy of the state until we know
> >     the teleport was successful. In other words, we will have
> >     a copy of data at two different hosts, temporarily.
> >
> >     One way to make sure that this is not a problem is by
> >     freezing all state; then copy it, and only once the copy
> >     is successful, destroy the old data and resume the simulation
> >     in the other region.
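
The freeze, copy, then destroy sequence can be sketched like this. `RegionSim` and `teleport` are hypothetical names, and the `accept` flag merely simulates a destination that refuses or fails the copy; the sketch assumes a responsive source region.

```python
class RegionSim:
    def __init__(self, name, accept=True):
        self.name = name
        self.accept = accept   # simulates whether the copy succeeds
        self.avatars = {}      # avatar_id -> state dict
        self.frozen = set()    # avatars whose state must not change

    def receive(self, avatar_id, state):
        if not self.accept:
            return False
        self.avatars[avatar_id] = dict(state)
        # Keep the new copy frozen until the old one is destroyed.
        self.frozen.add(avatar_id)
        return True


def teleport(avatar_id, source, dest):
    # 1. Freeze all state at the source.
    source.frozen.add(avatar_id)
    # 2. Copy it to the destination.
    if dest.receive(avatar_id, source.avatars[avatar_id]):
        # 3. Copy succeeded: destroy the old data, resume in the new region.
        del source.avatars[avatar_id]
        source.frozen.discard(avatar_id)
        dest.frozen.discard(avatar_id)
        return True
    # Copy failed: the user resumes at the old location.
    source.frozen.discard(avatar_id)
    return False
```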
> >
> >     The problem we try to tackle now is the case where the old
> >     region is not responsive...
> >
> >
> >     In this case we have to notice that the AD can ALSO detect
> >     that at least the new region is able to host the avatar:
> >     the teleport can *partly* succeed. This can be detected
> >     independent of copying all the state information.
> >
> >     Secondly, we have to realize that the 'location' of the
> >     avatar is (state) data that does NOT have to be copied
> >     (we're changing it anyway) and therefore is not part of
> >     said state, and does not suffer from the copy-problem.
> >
> >     The same could be said about animations (which is currently
> >     broken in Second Life: you must stop all animations
> >     before teleporting or a desync happens): we could state
> >     that after a teleport no animations are active, and leave
> >     it to the viewer to re-initiate any required animations.
> >     However, animations can be made unidirectional (meaning
> >     that if anyone but the agent service wants to change the
> >     animation, it has to send a request message to the AD,
> >     which then grants it, so that the actual state change always
> >     originates from the agent service (AS)).
> >
> >     As a result, we can think of the animation state as:
> >
> >
> >      [AS:animation state] --- message stream --> [simulator:animation state copy]
> >
> >     which means that we can teleport, keeping the correct
> >     animation(s) without bothering with the source region.
> >
> >
> >     The only clear case where this kind of 'trick' doesn't work
> >     (I think it's not a trick, but the best way to implement this)
> >     is for script states: those change way too frequently to host
> >     them on the AS.
> >
> >
> >     Conclusion
> >
> >     Thus, in the case of an unresponsive source region,
> >     we CAN - without risks of desynchronization - immediately
> >     transfer the avatar (location) and animations, and
> >     any other INFREQUENTLY CHANGING, UNIDIRECTIONAL DATA
> >     that can be stored on the AS (and buffered on the
> >     simulator), like clothing UUIDs, and attachments.
> >
> >     However, the scripts in the attachments will not run
> >     until their state is transferred from the old region.
> >
> >     Imho, this is hardly a problem: after a user-configurable
> >     timeout we leave it to the user what he wants to do:
> >     logout or reset the scripts :p... Ok, we just reset
> >     the scripts after some time, where the timeout is
> >     determined by the AD with possible input from the user.
> >
> >     --
> >     Carlo Wood <carlo@alinoe.com>
> >     _______________________________________________
> >     ogpx mailing list
> >     ogpx@ietf.org
> >     https://www.ietf.org/mailman/listinfo/ogpx
> >
> >
>
> --
> Carlo Wood <carlo@alinoe.com>
>