Re: [ogpx] My take on Teleports and protocol resilience
Carlo Wood <carlo@alinoe.com> Mon, 26 October 2009 15:02 UTC
Date: Mon, 26 Oct 2009 16:02:39 +0100
From: Carlo Wood <carlo@alinoe.com>
To: Vaughn Deluca <vaughn.deluca@gmail.com>
Message-ID: <20091026150239.GA6496@alinoe.com>
References: <20091025121547.GB7775@alinoe.com>
<9b8a8de40910251530u273830a5k47bdd8927b7efc76@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <9b8a8de40910251530u273830a5k47bdd8927b7efc76@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: ogpx@ietf.org
Subject: Re: [ogpx] My take on Teleports and protocol resilience

That's not how I'd put it, Vaughn...

First of all, for the data proposed (for example, the currently active
animation(s)), the AS *is* the natural authority and origin to begin with.

Allow me to introduce some more theory / jargon.

Consider some variable that changes over time (let's call it 'data').
The value of the variable plays a role on more than one host, and
synchronization is done via a network. Because networks are not reliable,
it must be defined who the authority is, so that in case of failure,
things can be re-synchronized from the authority. This automatically
means that the authority must keep a copy of the data, of course.

Now consider the following:

Authority               Remote
  value    -------->    value

I call this a 'downstream' synchronization: the data is always and only
first changed on the authority, and then propagated away from it.

'Upstream' messages that would change the value are almost guaranteed to
lead to problems. A simple example: assume some variable has value 'A',
the authority changes it to 'B', and some downstream host changes it
at the same time to 'C'. You'd then have:

Authority               Remote
   'B'     ---B-->
                <--C---   'C'

ending in

Authority               Remote
   'C'                    'B'

Hence it is much better to work with requests upstream:

Authority               Remote
   'B'     ---B-->
         <--request:C---  'A'

Authority               Remote
   'C'     ---C-->        'B'

Authority               Remote
   'C'                    'C'

where there is a race condition that decides whether we end up with
'C' or 'B', but at least things stay synchronized.

Ok, back to the animation state of an avatar. If we state that the AS
is the authority, then you can hardly speak of 'keeping a copy'; it's
a trivial thing that both the AS and the simulator need to know
this value.

AS                      Simulator
'animation'   ----->    'animation'

If you'd want to avoid this (minimal!?) RAM usage on the AS then you
are forced to drop the AS as authority, and there is no way you
can recover gracefully in case of network failure.
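
To make the 'requests upstream' idea concrete, here is a minimal sketch in
Python. The names (Authority, Remote, on_request, on_update) are made up for
illustration and are not part of any OGP draft; the "network" is just a
direct method call, and this authority grants every request it receives.

```python
from dataclasses import dataclass, field


@dataclass
class Remote:
    """Downstream host: holds a copy and never changes it on its own."""
    value: str = "A"

    def on_update(self, value: str) -> None:
        # Only downstream messages from the authority change the copy.
        self.value = value


@dataclass
class Authority:
    """Single origin of the value; every change is applied here first."""
    value: str = "A"
    remotes: list = field(default_factory=list)

    def set(self, value: str) -> None:
        self.value = value
        for remote in self.remotes:  # downstream propagation
            remote.on_update(value)

    def on_request(self, value: str) -> None:
        # An upstream message is only a request; this sketch always grants it.
        self.set(value)


remote = Remote()
authority = Authority(remotes=[remote])

authority.set("B")         # the authority changes 'A' to 'B'
authority.on_request("C")  # the remote asked for 'C' at about the same time
print(authority.value, remote.value)
```

Whichever of the two changes the authority handles last wins, but both
sides always end up with the same value.
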

Dropping the AS as authority is worse still in case of a teleport: you
then need to CHANGE who the authority is, which gives heaps and heaps
of potential problems.

On Sun, Oct 25, 2009 at 11:30:48PM +0100, Vaughn Deluca wrote:
>
> [AS:animation state] --- message stream --> [simulator:animation state copy]
>
> What you are suggesting here is keeping a copy of the avatar state. We would be
> increasing bandwidth and storage space to gain flexibility. I do not think
> storage space is a real problem. I am not sure about bandwidth.

There would be NO increase in bandwidth imho, because the AS *is* the natural
authority to begin with: if the user controls the avatar and changes it, then
the source/origin is the AS (well, the viewer, but it's no big deal to
use the AS as authority; this DOES mean however that the viewer
basically only sends requests, and not real state changes).

This picture also shows that the most logical place for HUD scripts
is the AS as well, not the simulator.

> Is there anybody who can put some real numbers on this? How much extra work
> would the region have to do? Is it still acceptable if the Agent service is in
> Australia and the region in Europe?
>
> -Vaughn
>
> On Sun, Oct 25, 2009 at 1:15 PM, Carlo Wood <carlo@alinoe.com> wrote:
>
> Let's start with some brainstorming...
>
> Things that might be relevant, in no particular order.
>
> * State of avatar
> * State of attachments
> * Location of avatar
> * Time of unresponsiveness of avatar (position) towards viewer
> * Time of unresponsiveness of attachments (attaching/detaching/scripts)
> * Perception of the viewer
> * Perception of other viewers
> * Region boundary crossing
> * Teleport over larger distance
> * User configurable parameters
> * did I miss something important?
>
>
> Real message starts here:
>
> Something that, protocol-wise (or implementation-wise), leads to problems
> is having copies of the same data on different hosts; this always leads
> to desynchronization of this data with heaps of untrackable problems
> (bugs).
>
> The ONLY way to keep a copy of some data reliably synchronized is by
> having a stream of state changes being sent from one host to another,
> where the messages that contain those state changes always keep
> the same order and the stream is never terminated/lost (in which case
> a full resynchronization would be necessary). Note that this means
> that the source of the state changes has to be a single point of
> origin, which automatically means that we can identify the REAL
> (original) copy of the data. Thus:
>
> [original] ---> stream of state changes --> [copy]
>
> Let's call this a "unidirectional state" for lack of a better word.
>
> [PS I'm ignoring actual implementation details here. I'm NOT
> saying that currently anyone is using a TCP-stream of
> state changes, so don't quote JUST this part with the comment
> that this is not how it currently works, Infinity :p
> Instead, this is a mathematical approach, the abstract
> equivalent of any possible deployment case].
>
> Basically, we want to avoid copies of the same data.
>
>
> The simulator contains a lot of state information that cannot
> be moved away from the simulator though, because the state is
> needed to calculate the interactions between all objects and
> avatars in the region, which would become way too slow if,
> for example, several agent domain services would have to be
> queried all the time. I can imagine that this also holds for
> the attachments on an avatar.
>
> This means that if an avatar moves from one region to another,
> all this state information has to be transferred too.
>
> Thus, the origin sends state information to the new region.
> If this fails, then we want the user to resume at its old
> location: we need to keep a copy of the state until we know
> the teleport was successful. In other words, we will have
> a copy of data at two different hosts, temporarily.
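
The "[original] ---> stream of state changes --> [copy]" picture quoted
above could be sketched roughly like this. It is only an illustration under
my own assumptions: the sequence numbers, the Copy class and the snapshot
message are hypothetical, not something the quoted text or any draft
specifies. The copy applies changes strictly in order and, as soon as it
sees a gap, stops trusting itself until a full resynchronization arrives.

```python
class Copy:
    """Receiving side of a one-way stream of state changes."""

    def __init__(self):
        self.state = {}
        self.next_seq = 0          # sequence number expected next
        self.needs_resync = False

    def on_change(self, seq, key, value):
        if self.needs_resync:
            return  # ignore changes until a full snapshot arrives
        if seq != self.next_seq:
            # A lost or reordered message: the copy can no longer be trusted.
            self.needs_resync = True
            return
        self.state[key] = value
        self.next_seq += 1

    def on_snapshot(self, seq, state):
        # Full resynchronization from the original.
        self.state = dict(state)
        self.next_seq = seq
        self.needs_resync = False


copy = Copy()
copy.on_change(0, "animation", "walk")
copy.on_change(2, "animation", "fly")      # message 1 was lost
print(copy.needs_resync, copy.state)       # True {'animation': 'walk'}
copy.on_snapshot(3, {"animation": "fly"})  # the original sends everything
copy.on_change(3, "animation", "sit")
print(copy.needs_resync, copy.state)       # False {'animation': 'sit'}
```
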
>
> One way to make sure that this temporary second copy is not a
> problem is by freezing all state; then copy it, and only once the copy
> is successful, destroy the old data and resume the simulation
> in the other region.
>
> The problem we try to tackle now is the case where the old
> region is not responsive...
>
>
> In this case we have to notice that the AD can ALSO detect
> that at least the new region is able to host the avatar:
> the teleport can *partly* succeed. This can be detected
> independently of copying all the state information.
>
> Secondly, we have to realize that the 'location' of the
> avatar is (state) data that does NOT have to be copied
> (we're changing it anyway) and therefore is not part of
> said state, and does not suffer from the copy-problem.
>
> The same could be said about animations (which is currently
> broken in Second Life: you must stop all animations
> before teleporting or a desync happens): we could state
> that after a teleport no animations are active, and leave
> it to the viewer to re-initiate any required animations.
> However, animations can be made unidirectional (meaning
> that if anyone but the agent service wants to change the
> animation, it has to send a request message to the AD,
> which then grants it, so that the actual state change always
> originates from the agent service (AS)).
>
> As a result, we can think of the animation state as:
>
>
> [AS:animation state] --- message stream --> [simulator:animation state copy]
>
> which means that we can teleport, keeping the correct
> animation(s) without bothering with the source region.
>
>
> The only clear case where this kind of 'trick' doesn't work
> (I think it's not a trick, but the best way to implement this)
> is for script states: those change way too frequently to host
> them on the AS.
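
The freeze / copy / destroy sequence quoted above, for state that has to
live on the simulator, might look like the following sketch. Region,
receive() and teleport() are hypothetical names, and an unreachable
destination is simulated with a flag; a real deployment would of course
have to deal with timeouts and partial transfers as well.

```python
import copy


class Region:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.avatars = {}    # avatar id -> state dict
        self.frozen = set()  # avatar ids whose simulation is paused

    def receive(self, avatar_id, state):
        if not self.reachable:
            raise ConnectionError(f"{self.name} did not accept the state")
        self.avatars[avatar_id] = copy.deepcopy(state)


def teleport(avatar_id, source, destination):
    """Freeze, copy, and destroy the old data only after the copy succeeded."""
    source.frozen.add(avatar_id)                   # 1. freeze all state
    try:
        destination.receive(avatar_id, source.avatars[avatar_id])  # 2. copy
    except ConnectionError:
        source.frozen.discard(avatar_id)           # failed: resume at old location
        return source
    del source.avatars[avatar_id]                  # 3. destroy the old data
    source.frozen.discard(avatar_id)
    return destination                             # 4. resume in the other region


old, new = Region("old"), Region("new")
down = Region("down", reachable=False)
old.avatars["avatar-1"] = {"scripts": {"hud": 42}}

print(teleport("avatar-1", old, down).name)  # old
print(teleport("avatar-1", old, new).name)   # new
```

Note that this sketch still depends on the source region answering, which
is exactly the case that breaks down when the old region is unresponsive.
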
>
>
> Conclusion
>
> Thus, in the case of an unresponsive source region,
> we CAN - without risks of desynchronization - immediately
> transfer the avatar (location) and animations, and
> any other INFREQUENTLY CHANGING, UNIDIRECTIONAL DATA
> that can be stored on the AS (and buffered on the
> simulator), like clothing UUIDs, and attachments.
>
> However, the scripts in the attachments will not run
> until their state is transferred from the old region.
>
> Imho, this is hardly a problem: after a user-configurable
> timeout we leave it to the user what he wants to do:
> logout or reset the scripts :p... Ok, we just reset
> the scripts after some time, where the timeout is
> determined by the AD with possible input from the user.
>
> --
> Carlo Wood <carlo@alinoe.com>
> _______________________________________________
> ogpx mailing list
> ogpx@ietf.org
> https://www.ietf.org/mailman/listinfo/ogpx
>
>

--
Carlo Wood <carlo@alinoe.com>