[vwrap] text serialization of DSD objects (the application/dsd+text content type)

Meadhbh Hamrick <ohmeadhbh@gmail.com> Sun, 16 January 2011 18:02 UTC

hey peeps,

reading ZenMondo's LSL+HTTP+MySQL object transfer paper reminded me
that when i was deploying sl8.us, i needed a serialization format that
was MUCH easier to parse in LSL. i'm not going to argue the merits of
LSL vs. some hypothetical future where we've moved to something
better. i think we all know LSL has its limitations, but it has the
benefit of having been deployed in both SL and in a slightly different
format in OpenSim.

so the purpose of this serialization scheme is to allow objects in
existing SL or OpenSim instances to more easily participate in VWRAP
services. yeah, this is all pretty much moot since it looks like VWRAP
will either be taken round back and put out of its misery or will turn
into a group to document and bless HyperGrid. but fwiw, i'm building
loosely coupled services based on the stuff documented in previous IDs
from this group, and if you hooked up wireshark or pcap to the sl8.us
servers and grabbed the session keys for the process out of /dev/mem,
this is what you would see.

i'm assuming you've read the LLSD draft(s) and understand their
purpose. more specifically, you should understand the role
serialization schemes play in LLSD.

this is obviously preliminary information. it will eventually be
included as part of a forthcoming internet draft.

intro:

the "text serialization scheme" for DSD is intended to be used in
environments that are computationally constrained. specifically,
environments where parsing and producing JSON, XML or binary data is
impractical. the author has successfully used this scheme with 8- and
16-bit microcontrollers and with LSL scripts inside Second Life.

mime type & file extension:

in systems where an object serialization is carried across a transport
that understands MIME types (like S/MIME or HTTP(S)), either of two
MIME types may be used. application/dsd+text is preferred, but
text/plain is also acceptable. the "canonical" file extension for text
serialized objects is ".dsdt" (so update your desktop software now!)

service endpoints that receive messages without a mime type, or with
the mime type 'text/plain' SHOULD interpret the contents as being
serialized with this scheme. in other words, we're optimizing for the
condition where "constrained" devices can't use typed transports.
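
for illustration, the content-type rule above could be sketched in
python roughly like this. the function name and the way parameters such
as charset are stripped are assumptions of the sketch, not part of the
scheme:

```python
from typing import Optional

# both type names come from the post; everything else here is illustrative
TEXT_SCHEME_TYPES = {"application/dsd+text", "text/plain"}


def uses_text_scheme(content_type: Optional[str]) -> bool:
    # no mime type at all: the post says to assume the text scheme
    if content_type is None or not content_type.strip():
        return True
    # drop parameters such as "; charset=utf-8" and compare case-insensitively
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in TEXT_SCHEME_TYPES
```

an endpoint that also speaks JSON or XML would presumably check for
those types first and only fall back to this test afterwards.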

the serialization:

text serialization encodes information into "human readable" UTF-8
contents. it borrows XML's conception of valid code-points. that is,
only a couple control characters are valid in text encoding. the XML
entry from the wikipedia says it best. from
http://en.wikipedia.org/wiki/XML :

  Unicode code points in the following ranges are valid in XML 1.0 documents:

  * U+0009, U+000A, U+000D: these are the only C0 controls accepted in XML 1.0;
  * U+0020–U+D7FF, U+E000–U+FFFD: this excludes some (not all)
non-characters in the BMP (all surrogates, U+FFFE and U+FFFF are
forbidden);
  * U+10000–U+10FFFF: this includes all code points in supplementary
planes, including non-characters.
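
as a rough python sketch, the ranges quoted above translate into a
simple range check. the function names are made up for this example:

```python
def is_valid_code_point(cp: int) -> bool:
    # the three C0 controls plus the three ranges from the XML 1.0 list
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_code_points(text: str) -> list:
    # code points in text that fall outside the valid ranges
    return [ord(ch) for ch in text if not is_valid_code_point(ord(ch))]
```

note that every invalid code point is below U+10000, so each one fits
in four hex digits.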

the text serialization scheme encodes information into discrete "text
lines." a "text line" is an arbitrary length sequence of UTF-8 encoded
code points that ends with a CR, LF or CR LF, where CR is a "carriage
return" or %x000D, LF is a "line feed" or %x000A, and a CR LF is a
carriage return followed by a line feed.
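
a minimal python sketch of splitting a received body into text lines,
assuming the whole body is available as bytes (a constrained device
would more likely read it a byte at a time). the function name is
hypothetical:

```python
import re

# CR LF is listed first so it is not read as two separate line endings
_EOL = re.compile(r"\r\n|\r|\n")


def split_text_lines(data: bytes) -> list:
    text = data.decode("utf-8")
    lines = _EOL.split(text)
    # a properly terminated last line leaves one empty trailing item; drop it
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```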

in general, each scalar value is encoded into a single text line.
the beginning and end of vector values (maps, arrays) are encoded
into single text lines.

text lines have the format of:

<key> : <type tag> : <value> <EOL>

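where <key> is a text key value used in maps (it is empty for values
that aren't map members). <type tag> is a single character identifying
the type of the <value> that follows. <EOL> is a CR, LF, or CR LF. the
spaces around the colons above are only there for readability; the
examples below don't include them. there are also tags for "beginning
of document" and "version."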
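
because colons inside values are escaped, a line can be pulled apart by
splitting on the first two colons only. a small python sketch, assuming
keys never contain a colon (the post doesn't say) and that no
whitespace surrounds the separators; the names are made up for the
example:

```python
from typing import NamedTuple


class TextLine(NamedTuple):
    key: str    # empty outside of maps
    tag: str    # single-character type tag
    value: str  # still escaped at this point


def parse_line(line: str) -> TextLine:
    # split on the first two colons; the value keeps whatever follows
    parts = line.split(":", 2)
    if len(parts) != 3:
        raise ValueError("expected <key>:<type tag>:<value>")
    key, tag, value = parts
    if len(tag) != 1:
        raise ValueError("type tag must be a single character")
    return TextLine(key, tag, value)
```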

and here's the list of tags:

* - start of document - all text serialized documents MUST
begin with a "start of document" line that looks like: ":*:" + <EOL>.
this is useful for systems that use "magic numbers" to identify
content types.

v - version - the version tag is an optional tag used to identify
documents encoded by future versions of this specification. the
<value> is a simple string representing the version of the spec. this
spec uses the version tag "1" - thus a version line from a document
encoded using rules described here would look like ":v:1" + <EOL>.

u - undefined - represents the "undefined" scalar. example: ":u:" + <EOL>

b - boolean - represents the "boolean" scalar. encodes one of two
values: true or false. true is represented with the value "T" while
false is represented with any value other than "T" or the lack of a
value. for clarity, implementations SHOULD use the character "F".
examples: ":b:T" + <EOL>, ":b:F" + <EOL>, ":b:" + <EOL> or ":b:i
actually represent a valid false value" + <EOL>

i - integer - represents the "integer" scalar. example: ":i:9" + <EOL>

r - real - represents the "real" scalar. example: ":r:3.14159" + <EOL>

s - string - represents a string. the colon character (':'), the
backslash character ('\') and characters that are not allowed in the
XML encoding are escaped with the sequence backslash, u and four hex
digits. ergo, you can represent the escape character with "\u001B". a
colon would be represented with "\u003A". example: ":s:i am an example
string" + <EOL>

d - date - represents a date string. example: ":d:2011-01-16T09:42:00Z" + <EOL>

w - UUID - represents a uuid scalar. example:
":w:f1828128-0ab2-4a24-b6c9-b567ea875d4d" + <EOL>

x - URI - represents a URI. example:
":x:http\u003A//example.org/whatever.php" + <EOL>

n - binary - represents a binary encoded string. no newlines allowed.
example: ":n:aGVsbG8uCg==" + <EOL> (the value here is the base64
encoding of "hello." followed by a line feed)

[ - array (open) - this represents the beginning of an array. example:
":[:" + <EOL>

] - array (close) - this represents the end of an array. example:
":]:" + <EOL>

{ - map (open) - represents the beginning of a map. example:
":{:" + <EOL>

} - map (close) - represents the end of a map. example:
":}:" + <EOL>
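
to make the string and boolean rules above concrete, here's a rough
python sketch of escaping, unescaping and boolean decoding. the
function names are invented for the example. one assumption to flag
loudly: the sketch also escapes CR and LF inside values. the post
doesn't say what to do with them, but a raw line ending inside a value
would end the text line early.

```python
import re


def _needs_escape(ch: str) -> bool:
    cp = ord(ch)
    valid_xml = (
        cp == 0x09  # CR and LF are left out here on purpose (assumption)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )
    return ch in ":\\" or not valid_xml


def escape_value(text: str) -> str:
    # backslash, u, four hex digits for every character that needs it
    return "".join(
        "\\u%04X" % ord(ch) if _needs_escape(ch) else ch for ch in text
    )


_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_value(value: str) -> str:
    # literal backslashes are always escaped, so one pass is enough
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


def decode_boolean(value: str) -> bool:
    # only "T" is true; anything else, including nothing, is false
    return value == "T"
```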

here's an example text encoding:

:*:
:v:1
:{:
description:s:this is a point in 3 space
point:[:
:r:0.0
:r:14.9
:r:17.0
:]:
control:x:http\u003A//example.org/c/a0b53baa-db54-4257-b9bc-ea15069f3adc
:}:
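
for anyone who wants to poke at the example above, here is a minimal
python sketch of a parser (python rather than LSL, and not the sl8.us
implementation). assumptions of the sketch: empty lines are skipped,
the version line is read and ignored, and date, uuid and binary values
are left as plain strings. all names are hypothetical.

```python
import re

_EOL = re.compile(r"\r\n|\r|\n")
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def _unescape(value):
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)


# decoders for the scalar tags; d, w and n stay as strings in this sketch
_SCALARS = {
    "u": lambda v: None,
    "b": lambda v: v == "T",
    "i": int,
    "r": float,
    "s": _unescape,
    "x": _unescape,
    "d": str,
    "w": str,
    "n": str,
}


def parse_document(text):
    lines = [ln for ln in _EOL.split(text) if ln]
    if not lines or lines[0] != ":*:":
        raise ValueError("missing start-of-document line")
    root = []       # holds the single top-level value
    stack = [root]  # containers currently open, innermost last
    for line in lines[1:]:
        parts = line.split(":", 2)
        if len(parts) != 3:
            raise ValueError("malformed line: %r" % line)
        key, tag, value = parts
        if tag == "v":
            continue  # version line: accepted but not checked here
        if tag in ("]", "}"):
            closed = stack.pop()
            expected = list if tag == "]" else dict
            if closed is root or not isinstance(closed, expected):
                raise ValueError("unbalanced %r" % tag)
            continue
        if tag == "[":
            item = []
        elif tag == "{":
            item = {}
        elif tag in _SCALARS:
            item = _SCALARS[tag](value)
        else:
            raise ValueError("unknown type tag: %r" % tag)
        parent = stack[-1]
        if isinstance(parent, dict):
            parent[key] = item   # map members are stored under their key
        else:
            parent.append(item)  # array members ignore the (empty) key
        if tag in ("[", "{"):
            stack.append(item)
    if len(stack) != 1 or len(root) != 1:
        raise ValueError("document must hold exactly one complete value")
    return root[0]
```

fed the example document, this should give back a map with a
"description" string, a "point" array of three reals and a "control"
URI with its colon unescaped.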

anyway, if you're familiar with other serialization schemes, this
should be pretty straightforward to implement.

if anyone's interested, i'll dig up and sanitize the LSL
implementation i'm using for the in-world sl8.us tools.

-cheers
-meadhbh
--
meadhbh hamrick * it's pronounced "maeve"
@OhMeadhbh * http://meadhbh.org/ * OhMeadhbh@gmail.com