[ogpx] type-system : guidance for binary serialization implementers

Meadhbh Hamrick <ohmeadhbh@gmail.com> Sun, 28 March 2010 17:33 UTC

From: Meadhbh Hamrick <ohmeadhbh@gmail.com>
Date: Sun, 28 Mar 2010 10:34:02 -0700
Message-ID: <b325928b1003281034s55fae732n7be979446759bd12@mail.gmail.com>
To: ogpx <ogpx@ietf.org>
Content-Type: text/plain; charset="ISO-8859-1"
Subject: [ogpx] type-system : guidance for binary serialization implementers
List-Id: Virtual World Region Agent Protocol - IETF working group <ogpx.ietf.org>
List-Archive: <http://www.ietf.org/mail-archive/web/ogpx>

so this has been stuck in my craw for a couple of weeks, and i'm
finally remembering to post it to the list.

the JSON and XML serialization schemes use well-known grammars that
have wide adoption. not so for the binary serialization. if i wanted
to write an XML parser, i would have a lot of 3rd party resources and
discussion to guide me, specifically when it comes to handling
errors.

as i was writing my own implementation of the LLSD binary
serialization scheme, a couple issues came up.

i think we should *somewhere* address them. either as a section of the
type-system document, or as an informational RFC. i would like to have
clear guidance for how implementers would handle the following
exceptional events.

one of the things we were trying to do was make a type system and
serialization schemes that were resilient in the face of error. so i
think there's a strong preference that we recover as best we can from
an error instead of throwing up our hands and just saying "error!"

i would actually go and look at how LL implemented the binary
serialization (i'm sure it's in the viewer code somewhere) but i'm in
the middle of my gnu/bsd cool down period, and i don't want to reset
the timer. i think there may be other people in the same situation,
which is why we should have a document describing what we should do in
the face of parsing errors.

so... if someone out there who's more of a viewer person wants to look
at the linden implementation and describe its behavior in a textual
format, and then someone wants to look at the opensim implementation
and describe its behavior in a textual format, we could then document
how things work and avoid this situation for other people.

here's a list of encoding / decoding errors i think about, along with
suggestions for how to handle them.

1. the object count following the opening tag ('{' or '[') is less
than the actual number of objects in the collection. (that is, the
closing tag is not found after iterating through "object count" number
of elements, but is found later in the stream.)

the "simple" thing to do would be to simply say, "the array ends after
$(OBJECT_COUNT) number of elements, even if there's not a closing
tag."

the "smart" thing to do would be to look at the LLIDL definition of
the array you're parsing and try to come up with a "best fit" for
what's supposed to be there vs. what's actually there. in other words,
you look at the LLIDL and if it says you're supposed to have three
elements in the array and you have three elements in the array, then
you're done. or maybe you're supposed to have one additional element
in the array.

the "smart" thing to do could get quite complicated, while the
"simple" thing is sure to introduce failures that the "smart" thing
could recover from.

2. the object count following the opening tag ('{' or '[') is greater
than the actual number of objects in the collection. (that is, you get
a closing tag before "object count" number of elements)

the "simple" thing to do is to simply say, "a closing tag ends the collection."

combining this with the previous error case, we could say, "a
collection continues until the closing tag is reached or
$(OBJECT_COUNT) elements are found in the collection."
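
to make the combined rule concrete, here's a minimal sketch in python.
it's not taken from any existing implementation. it assumes a
simplified wire format (a one-byte tag, a 4-byte big-endian object
count after '[', and only 'i' elements carrying a 4-byte big-endian
integer), and the name parse_int_array is made up for illustration.

```python
import struct

def parse_int_array(buf, pos=0):
    """Parse '[' <count> <elements> ']' and return (items, next_pos).

    Assumed toy format: only 'i' elements (4-byte big-endian ints).
    The collection ends at a closing tag OR after <count> elements,
    whichever comes first.
    """
    if buf[pos:pos + 1] != b"[":
        raise ValueError("expected '[' at offset %d" % pos)
    (count,) = struct.unpack_from(">I", buf, pos + 1)
    pos += 5  # skip the tag byte and the 4-byte count

    items = []
    while len(items) < count:
        tag = buf[pos:pos + 1]
        if tag == b"]":
            break  # error case 2: closing tag arrived before <count> elements
        if tag != b"i":
            raise ValueError("unsupported tag %r at offset %d" % (tag, pos))
        (value,) = struct.unpack_from(">i", buf, pos + 1)
        items.append(value)
        pos += 5

    # consume the closing tag if it is there; tolerate its absence
    # (error cases 1 and 3)
    if buf[pos:pos + 1] == b"]":
        pos += 1
    return items, pos
```

note what happens in error case 1 with this sketch: the array is cut
off at the declared count and the extra elements are left in the
stream, so the caller still has to decide what to do with them.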

3. there is no closing tag. (that is, after "object count" number of
elements, you find a tag that is not a '}' or a ']'.)

this is more or less the same as error case number 1 above.

though i wonder if it makes sense to differentiate the two cases. if
we did, the parser would have to be smart enough to look forward in
the stream, count the number of opening tags, then count the number of
closing tags and see if they're unbalanced. ugh.

4. lack of 'k' elements in a map. (that is, you haven't reached the
"object count" number of elements and are expecting a 'k' but get
something else.)

my druthers would be to say, if you encounter a type that can be
converted to a string, you just do the conversion and use the result
as the key.

another way to handle it would be to ignore it.
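
here's a rough python sketch of the "convert to a string" option. the
function name read_key is hypothetical, and the layout is an
assumption for illustration: 'k' and 's' are followed by a 4-byte
big-endian length and that many UTF-8 bytes, and 'i' is followed by a
4-byte big-endian integer. other convertible types would need their
own branches.

```python
import struct

def read_key(buf, pos):
    """Read a map key at buf[pos] and return (key, next_pos).

    A proper 'k' tag is used as-is. A string or integer found where a
    key was expected is converted to a string and used as the key.
    """
    tag = buf[pos:pos + 1]
    if tag in (b"k", b"s"):
        (length,) = struct.unpack_from(">I", buf, pos + 1)
        start = pos + 5
        return buf[start:start + length].decode("utf-8"), start + length
    if tag == b"i":
        (value,) = struct.unpack_from(">i", buf, pos + 1)
        return str(value), pos + 5
    # nothing sensible to convert; the caller could skip the entry instead
    raise ValueError("cannot use tag %r at offset %d as a key" % (tag, pos))
```

so an 'i' holding 42 in key position would produce the key "42"
rather than an error.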

5. you encounter a bare 'k'. (that is, you encounter a 'k' tag outside
the context of a map.)

i vote for "convert to string"

6. duplicate keys in a map. (actually this applies to all serializations)

i vote for "the last key in the map with the same name wins." that is,
if you have two or more keys with the same name, the one found
furthest in the stream is the one that's used.
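
as a small illustration of "last key wins" in python (merge_pairs is
a made-up helper name): if the decoder collects key/value pairs in
stream order, plain assignment into a dict gives this behavior for
free. python's standard json module also keeps the last value by
default when an object repeats a name.

```python
import json

def merge_pairs(pairs):
    """Build a map from (key, value) pairs in stream order.

    A later pair with the same key overwrites the earlier one, so the
    value found furthest in the stream is the one that is kept.
    """
    result = {}
    for key, value in pairs:
        result[key] = value
    return result

# pairs as a decoder might yield them, with "a" appearing twice
decoded = merge_pairs([("a", 1), ("b", 2), ("a", 3)])

# the stdlib JSON parser applies the same policy to duplicate names
from_json = json.loads('{"a": 1, "a": 2}')
```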

ideas? comments?
--
meadhbh hamrick * it's pronounced "maeve"
@OhMeadhbh * http://meadhbh.org/ * OhMeadhbh@gmail.com