Re: [ogpx] Teleports and protocol resilience

Morgaine <morgaine.dinova@googlemail.com> Fri, 23 October 2009 16:47 UTC


Looking back at the replies in this thread, I think that the goal and the
means to achieve it didn't quite come across.

I was trying to address only one very specific issue: protocol resilience
under source region non-responsiveness, which is common enough that it
merits addressing.  I did not suggest that there be any perceivable change
of teleport semantics under normal operation (because no such change is
needed), only a change in service coupling.  The semantics we experience in
SL and in Opensim would remain completely unchanged, except in the single
case of source region non-responsiveness.  Under this single anomalous case
there *would* be a perceivable change, but that change would be a huge
improvement.

There would be no new decoherence introduced since exactly the same state
changes would occur on TP as before, with no possibility of agent state
change in the source region once the AD accepts the TP.

All that's needed to achieve such resilience for teleport at the protocol
level is a slight revision of operation phasing to permit greater execution
overlap, as I outlined.  This is independent of anything else that happens
in the course of the overall teleport operation --- the change of phasing
would affect only the transfer of *agent location*, nothing else.
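
As a rough illustration of that phasing, here is a minimal Python sketch in which Derez on the source region and Rez on the destination region are started together, and the agent's location changes as soon as the destination answers. The function names, the timeout value, and the simulated delays are all assumptions made for the example; nothing here is taken from a VWRAP draft.

```python
import asyncio

async def derez_avatar(region: str, responsive: bool) -> str:
    # Hypothetical stand-in for the Derez request to the source region.
    if not responsive:
        await asyncio.sleep(3600)  # a jammed simulator: effectively never answers
    return f"derez ok: {region}"

async def rez_avatar(region: str) -> str:
    # Hypothetical stand-in for the Rez request to the destination region.
    await asyncio.sleep(0.01)
    return f"rez ok: {region}"

async def teleport(source: str, dest: str, source_responsive: bool,
                   derez_timeout: float = 0.05) -> dict:
    # Both operations start together; neither depends on the other.
    derez_task = asyncio.create_task(derez_avatar(source, source_responsive))
    rez_task = asyncio.create_task(rez_avatar(dest))

    # The agent location changes once the destination accepts the Rez.
    rez_result = await rez_task

    # Derez gets a bounded wait; a stuck session is the region operator's problem.
    try:
        derez_result = await asyncio.wait_for(derez_task, derez_timeout)
    except asyncio.TimeoutError:
        derez_result = f"derez pending: {source}"

    return {"location": dest, "rez": rez_result, "derez": derez_result}

if __name__ == "__main__":
    print(asyncio.run(teleport("R1", "R2", source_responsive=False)))
```

With a responsive source region the result is the same as in the sequential case; only when the source region is jammed does the outcome differ, and then the agent still arrives at the destination.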

In particular, it should not be confused with the separate requirement for
instantiation of assets or objects at destination, nor with the matter of
serializing and deserializing script states.  The latter has not even been
defined for VWRAP, so it's hard to talk about changing it.  In any event,
this isn't about those aspects of teleport, and doesn't affect them --- they
would continue to work as before.

One of the central aspects of VWRAP is that the protocol is based on a
multiple services model, and one of the key approaches in highly scalable
systems design is to keep services decoupled to the largest extent
possible.  That's what I'm proposing here, a *partial decoupling* that has
no normal semantic change but which does have benefits in anomalous
situations.

Agent location change *can* be decoupled significantly from asset
instantiation change and script state transfer.  My suggestion referred to
this decoupled *agent location change* only, not to asset and simulation
services.  Those other two services undergo state transitions at the same
time as change of agent location does on TP, but services should never be
coupled together unnecessarily, and in this case the coupling can be left
very weak.  The three types of service operations can proceed each at their
own independent rates, coupled at TP initiation time and nowhere else.
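
The weak coupling described above might look something like the following Python sketch, in which three service operations share a single start signal and otherwise run independently. The service names and delays are invented for illustration only.

```python
import asyncio

async def service(name: str, delay: float,
                  tp_initiated: asyncio.Event, log: list) -> None:
    # The only coupling point: every service waits for TP initiation.
    await tp_initiated.wait()
    # After that, each service proceeds at its own rate (simulated delay).
    await asyncio.sleep(delay)
    log.append(name)

async def run_teleport() -> list:
    tp_initiated = asyncio.Event()
    log: list = []
    tasks = [
        asyncio.create_task(service("agent_location", 0.01, tp_initiated, log)),
        asyncio.create_task(service("asset_instantiation", 0.10, tp_initiated, log)),
        asyncio.create_task(service("script_state", 0.05, tp_initiated, log)),
    ]
    tp_initiated.set()            # TP initiation releases all three at once
    await asyncio.gather(*tasks)  # no service waits on another's completion
    return log

if __name__ == "__main__":
    print(asyncio.run(run_teleport()))
```

In this sketch the agent location change completes first simply because it has the least work to do, not because the other two services are ordered after it.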

It should be noted that the legacy protocols do some of this already, in
that the agent is already active in the destination region long before her
avatar or objects have appeared.  Furthermore, the avatar currently
continues to be visible in the source region for a while after the agent
becomes active in the destination region, because of normal operation
latencies, sim-side queueing, and client lag.  This is a normal part of
current operation, and is not considered an anomaly. What's important is
that no new state change to the agent is possible in the source region after
TP is initiated, and that would remain true.
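
That invariant can be sketched on the AD side roughly as follows, in Python. The class and method names are hypothetical; the point is only that once a TP is initiated, the AD refuses agent state changes coming from the source region, however long the avatar lingers there.

```python
class AgentDomain:
    """Toy AD that tracks one agent's location and state (illustrative only)."""

    def __init__(self, region: str) -> None:
        self.current_region = region
        self.disabled_regions: set = set()
        self.state: dict = {}

    def initiate_teleport(self, dest: str) -> None:
        # Mark the source region as disabled for further agent state changes.
        self.disabled_regions.add(self.current_region)
        # The destination may have been disabled by an earlier TP; re-enable it.
        self.disabled_regions.discard(dest)
        self.current_region = dest

    def apply_state_change(self, region: str, key: str, value) -> bool:
        # Reject updates from disabled regions or from anywhere but the
        # agent's current region, so brief double presence is harmless.
        if region in self.disabled_regions or region != self.current_region:
            return False
        self.state[key] = value
        return True


if __name__ == "__main__":
    ad = AgentDomain("R1")
    ad.apply_state_change("R1", "health", 100)
    ad.initiate_teleport("R2")
    print(ad.apply_state_change("R1", "health", 0))  # late update from R1
    print(ad.state)
```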

The impact of this on the other parts of the puzzle needs to wait until
those other parts are examined.  We're not there yet, but I would hope that
improving teleport protocol resilience would be a desirable goal when the
only noticeable change in semantics occurs under fault conditions and
provides a major improvement on current behaviour.


Morgaine.






======================================

On Tue, Oct 13, 2009 at 6:13 AM, Morgaine <morgaine.dinova@googlemail.com> wrote:

> One of the advantages we have in developing the VWRAP protocols is that we
> are able to look back at legacy SL and Opensim protocols and recognize
> design mistakes or limitations in them.  This allows us to avoid repeating
> such mistakes or limitations in the next generation of systems.
>
> One of the most common sources of frustration and dissatisfaction is
> simulator non-responsiveness.  While this has many possible causes, in VWRAP
> we are not interested in the internal implementation of simulators, but we
> *ARE* interested in the ability of a protocol endpoint to perform its duty
> within the protocol.  A jammed simulator host is in many cases quite unable
> to perform its protocol duties, or in some cases only exceedingly slowly,
> often timing out in a TP for example.  We have a huge amount of experience
> of this happening in both SL and Opensim, so it is a practical reality.  On
> occasion, simulators will be unable to fulfil their part in a protocol, and
> this needs to be taken into account because it is *not uncommon*.
>
> One key area in which the above is relevant is in teleports *OUT* of a
> simulator that is under distress.  Quite often users wish nothing more than
> to *leave* the region being run by a dying simulator, but when
> teleport-out requires cooperation from the host that one is trying to leave
> then this is often not possible at all.  In this situation, the only remedy
> in existing systems is to forcibly terminate the client and relog in another
> region.  We should avoid such out-of-protocol remedies being necessary
> through good protocol design.
>
> In VWRAP, we have both Rez Avatar and Derez Avatar capabilities, which lead
> to corresponding protocol operations during teleport.  If R1 is a region
> being run by a non-responsive simulator from which we want to escape, and R2
> is another region to which we wish to go, if the protocol requires a Derez
> in R1 to be completed before a Rez in R2 can commence then the user will
> have difficulties.  Clearly we don't want this.
>
> In http://tools.ietf.org/html/draft-hamrick-ogp-intro-00 , it is made
> clear that "*The agent domain MUST also remove the avatar from it's
> current location before placing the avatar in the destination location*."
> This suggests that the protocol will be sensitive to R1 non-responsiveness.
> While we do not yet have an actual VWRAP Teleport draft, it seems likely
> that its initial incarnation will have that same problem built in.
>
> I suggest that the protocol define Derez and Rez as *concurrent* and
> *non-dependent* operations to avoid this situation.  The AD can mark R1 as
> disabled for all further agent state changes --- this will provide all the
> protection needed to prevent brief double-presence anomalies from being
> significant.  If a jammed R1 refuses to give up its hold on the avatar, then
> at least the user will not suffer from it.  Reaping dead simulator sessions
> then becomes a problem for the region operator alone, and not for the AD,
> the user, and the region as happens now.
>
>
> Morgaine.