Re: [apps-discuss] Standardizing protocol buffers

Phillip Hallam-Baker <hallam@gmail.com> Fri, 12 October 2012 19:13 UTC

+1 and how does all this relate to JSON?

Also we should note that we have another very similar scheme inside TLS.

There are good reasons to prefer binary encoding over JSON and vice versa.
But it would be rather nice if we had a way to abstract the encoding choice
from the protocol.

For example, JSON does not support binary types which makes it rather
inefficient as a wrapper for video streams or the like. Another problem
with JSON is that it uses an escaped encoding for strings and those raise
security issues and performance issues. [And no, telling people not to make
mistakes is not a security solution]
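
To make the binary point concrete, here is a minimal sketch of what carrying octets inside JSON tends to look like. Base64 is an assumption on my part (it is one common workaround, not something the draft specifies), and the names wrap_binary and unwrap_binary are made up for illustration.

```python
import base64
import json


def wrap_binary(payload: bytes) -> str:
    """Carry raw octets in a JSON document by base64-encoding them first."""
    return json.dumps({"data": base64.b64encode(payload).decode("ascii")})


def unwrap_binary(document: str) -> bytes:
    """Reverse of wrap_binary: parse the JSON, then base64-decode the field."""
    return base64.b64decode(json.loads(document)["data"])


payload = bytes(range(256)) * 3           # 768 arbitrary octets
document = wrap_binary(payload)
encoded_chars = len(base64.b64encode(payload))  # 4 characters per 3 octets
```

Every three octets of payload become four characters of text, before any JSON string escaping or parsing cost is counted.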

There are also advantages to JSON of course but which one is best is going
to depend on the target application. So I think that is a choice that
should be made in deployment rather than by the protocol designer.


I think we should try to minimize the number of packing options. Most
decisions do not matter at all; some matter a lot but have a very clear
answer (don't repeat DER); and some depend on circumstances.


As far as the details go, there are only a few useful design choices in a
packing scheme. I am certain this proposal has some of them wrong.


1) Integer encoding

The encoding of negative integers needs to be much better thought out, as
does the fact that an integer might well be larger than 64 bits.

The Varint scheme uses the top bit of each octet to encode the run
length. This means that if we AND each octet of an integer with 0x80 we
will get one of the following:

0x00
0x80 0x00
0x80 0x80 0x00
...
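
A minimal sketch of that pattern, assuming the usual base-128 layout (seven payload bits per octet, low-order group first, top bit set on every octet except the last) and non-negative values only. The function names are mine, not the draft's.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint for a non-negative integer, low-order group first."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        group = n & 0x7F              # next seven payload bits
        n >>= 7
        if n:
            out.append(group | 0x80)  # top bit set: more octets follow
        else:
            out.append(group)         # top bit clear: final octet
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode the first varint found at the start of data."""
    value = 0
    shift = 0
    for octet in data:
        value |= (octet & 0x7F) << shift
        shift += 7
        if not octet & 0x80:
            return value
    raise ValueError("truncated varint")


def continuation_bits(data: bytes) -> list[int]:
    """AND each octet with 0x80, as described above."""
    return [octet & 0x80 for octet in data]
```

For example, 300 encodes as the two octets 0xAC 0x02, and masking them with 0x80 gives 0x80 0x00.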

The problem with this scheme is that it requires each integer to be packed
into 7-bit form, which is rather inefficient: lots of shifting and masking. A
more time-efficient scheme is to follow the ASN.1 approach and pack the run
length into the first byte in some fashion. So if the top byte is:

0xxx xxxx length = 1
10xx xxxx length = 2
110x xxxx length = 3
1110 xxxx length = 4
1111 nnnn length = 4 + nnnn

One of the big advantages of this approach is that it avoids the need for
the 7-bit shifting, which is not only inefficient but also requires special-case
fixup for negative integers. A two's complement number can be fitted in very
easily.
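
A rough sketch of a decoder for the table above, under assumptions that are mine rather than anything specified: "length" counts the first byte, the x bits of the first byte are the high-order payload bits, the remaining bytes are big-endian, and the whole payload is read as two's complement. Note that the table as written lets both 1110 xxxx and 1111 0000 mean length 4; the sketch follows it literally.

```python
def prefix_info(first: int) -> tuple[int, int]:
    """Return (total_length, payload_bits_in_first_octet) per the table."""
    if first & 0x80 == 0x00:
        return 1, 7                   # 0xxx xxxx
    if first & 0xC0 == 0x80:
        return 2, 6                   # 10xx xxxx
    if first & 0xE0 == 0xC0:
        return 3, 5                   # 110x xxxx
    if first & 0xF0 == 0xE0:
        return 4, 4                   # 1110 xxxx
    return 4 + (first & 0x0F), 0      # 1111 nnnn


def decode_prefixed(data: bytes) -> int:
    """Decode one signed integer from the start of data."""
    length, first_bits = prefix_info(data[0])
    if len(data) < length:
        raise ValueError("truncated integer")
    payload = data[0] & ((1 << first_bits) - 1)
    for octet in data[1:length]:
        payload = (payload << 8) | octet   # whole-octet moves only
    total_bits = first_bits + 8 * (length - 1)
    # Two's complement: if the top payload bit is set, the value is negative.
    if total_bits and payload >> (total_bits - 1):
        payload -= 1 << total_bits
    return payload
```

The length is known after reading one byte, and the sign needs only a single subtraction at the end rather than a separate encoding for negative values.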


2) Tagged or untagged data

JSON and XML data is tagged which means that it is very flexible. Adding an
extra data field does not cause chaos. It was not immediately apparent to
me which approach this proposal takes.

ASN.1 data gives a choice on a per field basis. This does not seem to be a
helpful approach (to say the least).


3) Typed or untyped

This is not the same as tagged or untagged. A data structure {"a" : 1} has
different semantics from {"b" : 1}. The tags may imply a type but they are
not themselves a type declaration.

JSON data is implicitly typed by the syntax. ASN.1 is not.

Any conversion from a binary format to JSON needs to have the type
information. That may come from either a schema or be explicit in the data
stream as in BER encoding.


4) Can it be implemented as a linear traversal?

The really insane part of ASN.1 is actually two lines buried in the X.500
documentation defining DER. According to the DER encoding rules, an object
is represented using the definite length encoding. So each object is
prefixed by the total length of the data fields.

This is a problem because the included data fields may also be objects and
the length field is encoded using variable length encoding. So emitting DER
encoding requires a totally pointless double recursion down the tree and
back again, keeping track of multiple values. It is just stupid, stupid,
stupid.
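
To illustrate, a deliberately naive sketch of a definite-length emitter. Assumptions: single-octet tags, and a made-up tree representation where a node is (tag, bytes) for a primitive or (tag, list of nodes) for a constructed object. The length octets follow the standard short and long definite forms.

```python
def der_length(n: int) -> bytes:
    """Definite-length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def content_size(content) -> int:
    """Size of a node's content; constructed nodes must walk their children."""
    if isinstance(content, bytes):
        return len(content)
    return sum(encoded_size(child) for child in content)


def encoded_size(node) -> int:
    """Full size of a node: tag + length octets + content."""
    _tag, content = node
    inner = content_size(content)
    return 1 + len(der_length(inner)) + inner


def emit(node, out: bytearray) -> None:
    """Write a node; the length must be known before any child is written."""
    tag, content = node
    out.append(tag)
    out += der_length(content_size(content))   # first walk: measure
    if isinstance(content, bytes):
        out += content
    else:
        for child in content:                  # second walk: write
            emit(child, out)


# SEQUENCE { INTEGER 5, OCTET STRING of 200 octets }
tree = (0x30, [(0x02, b"\x05"), (0x04, bytes(200))])
encoded = bytearray()
emit(tree, encoded)
```

Nothing can be written for the outer object until every nested length, including the variable-size length octets themselves, has been worked out. A real encoder would cache the sizes instead of recomputing them at each level, which is exactly the extra bookkeeping complained about above.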

Oh and there is absolutely no security reason for DER encoding, none, zip,
nada.

Again, the draft seems pretty vague on this point as well.






On Thu, Oct 11, 2012 at 4:26 PM, Carsten Bormann <cabo@tzi.org> wrote:

> On Oct 11, 2012, at 19:21, "Rex Fernando (rex)" <rex@cisco.com> wrote:
>
> > http://www.ietf.org/id/draft-rfernando-protocol-buffers-00.txt
>
> Hey, I can do that, too.
>
> http://www.ietf.org/id/draft-bormann-apparea-bpack-00.txt
>
> How many of these do we need?
>
> (I'm actually quite serious with binarypack, as there is a lot of stuff
> out there that uses msgpack, and it probably would be a good idea to do a
> version with IETF change control.  Area of application for me is mostly
> constrained node/network REST, but of course it is a quite universal
> hammer.  Not the impact wrench that protocol buffers is :-)
>
> Grüße, Carsten



-- 
Website: http://hallambaker.com/