[ogpx] Teleports and protocol resilience

Morgaine <morgaine.dinova@googlemail.com> Tue, 13 October 2009 05:13 UTC

One of the advantages we have in developing the VWRAP protocols is that we
are able to look back at legacy SL and Opensim protocols and recognize
design mistakes or limitations in them.  This allows us to avoid repeating
such mistakes or limitations in the next generation of systems.

One of the most common sources of frustration and dissatisfaction is
simulator non-responsiveness.  While this has many possible causes, in VWRAP
we are not interested in the internal implementation of simulators, but we
*ARE* interested in the ability of a protocol endpoint to perform its duty
within the protocol.  A jammed simulator host is in many cases unable to
perform its protocol duties at all, or can perform them only exceedingly
slowly, often timing out during a teleport (TP), for example.  We have a huge
amount of experience of this happening in both SL and Opensim, so it is a
practical reality.  On occasion, simulators will be unable to fulfil their
part in a protocol, and this needs to be taken into account because it is
*not uncommon*.

One key area in which the above is relevant is teleports *OUT* of a
simulator that is in distress.  Quite often users wish nothing more than
to *leave* the region being run by a dying simulator, but when teleport-out
requires cooperation from the host that one is trying to leave, this is
often not possible at all.  In this situation, the only remedy in existing
systems is to forcibly terminate the client and log in again in another
region.  Good protocol design should make such out-of-protocol remedies
unnecessary.

In VWRAP, we have both Rez Avatar and Derez Avatar capabilities, which lead
to corresponding protocol operations during teleport.  Let R1 be a region
run by a non-responsive simulator from which we want to escape, and R2
another region to which we wish to go.  If the protocol requires the Derez
in R1 to complete before the Rez in R2 can commence, then the user will
be stuck whenever R1 fails to respond.  Clearly we don't want this.
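
To make the failure mode concrete, here is a minimal Python sketch of a
teleport in which Rez depends on Derez.  Everything in it is hypothetical
(the function names, the timeout value, and the use of a long sleep to stand
in for a jammed simulator); none of it comes from a VWRAP draft.

```python
import asyncio

async def derez(region, responsive):
    # Hypothetical stand-in for a Derez Avatar request sent to a region.
    if not responsive:
        # A jammed simulator does not answer in any useful time.
        await asyncio.sleep(3600)
    return f"derezzed from {region}"

async def rez(region):
    # Hypothetical stand-in for a Rez Avatar request sent to a region.
    return f"rezzed in {region}"

async def sequential_teleport(r1_responsive, timeout=0.05):
    # Rez in R2 may only start once the Derez in R1 has completed,
    # so a hung R1 blocks the whole teleport until the timeout fires.
    try:
        await asyncio.wait_for(derez("R1", r1_responsive), timeout)
    except asyncio.TimeoutError:
        return "teleport failed: R1 did not complete Derez"
    return await rez("R2")
```

With a responsive R1 the call reaches the Rez in R2; with a jammed R1 the
user never gets as far as R2, even though R2 itself is perfectly healthy.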

http://tools.ietf.org/html/draft-hamrick-ogp-intro-00 states that
"*The agent domain MUST also remove the avatar from it's current
location before placing the avatar in the destination location*."  This
suggests that the protocol will be sensitive to R1 non-responsiveness.
While we do not yet have an actual VWRAP Teleport draft, it seems likely
that its initial incarnation will have that same problem built in.

I suggest that the protocol define Derez and Rez as *concurrent* and
*non-dependent* operations to avoid this situation.  The AD can mark R1 as
disabled for all further agent state changes, which provides all the
protection needed to keep brief double-presence anomalies from being
significant.  If a jammed R1 refuses to give up its hold on the avatar, then
at least the user will not suffer from it.  Reaping dead simulator sessions
then becomes a problem for the region operator alone, and not for the AD,
the user, and the region together, as happens now.
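
As a rough illustration of the suggestion, the Python sketch below issues
Derez and Rez as independent operations and has the AD disable R1 first.
The AgentDomain class, its fields, the function names, and the timeout are
all assumptions made up for this example; they are not taken from any VWRAP
or OGP draft, and a real AD would need far more state than this.

```python
import asyncio

class AgentDomain:
    """Hypothetical AD bookkeeping, for illustration only."""
    def __init__(self):
        self.disabled_regions = set()
        self.location = "R1"

    def accept_state_change(self, region):
        # Agent state changes arriving from a disabled region are ignored.
        return region not in self.disabled_regions

async def derez(region, responsive):
    # Hypothetical Derez Avatar request; a jammed region never answers.
    if not responsive:
        await asyncio.sleep(3600)
    return True

async def rez(region):
    # Hypothetical Rez Avatar request.
    return True

async def concurrent_teleport(ad, src, dst, src_responsive, derez_timeout=0.05):
    # Disable the source first, so a brief double presence cannot
    # alter agent state.
    ad.disabled_regions.add(src)
    # Start the Derez, but do not make the Rez wait for it.
    derez_task = asyncio.create_task(derez(src, src_responsive))
    if await rez(dst):
        ad.location = dst
    # The Derez outcome is informational only. Here we simply stop waiting;
    # reaping the stale session is left to the region operator.
    done, pending = await asyncio.wait({derez_task}, timeout=derez_timeout)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return {"location": ad.location, "derez_completed": derez_task in done}
```

The user arrives in R2 whether or not R1 ever answers, and anything a
lingering R1 session later reports is discarded because R1 is disabled.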


Morgaine.