Re: [Anima-signaling] CBOR/JSON

Toerless Eckert <eckert@cisco.com> Fri, 29 May 2015 20:17 UTC

Date: Fri, 29 May 2015 13:17:07 -0700
From: Toerless Eckert <eckert@cisco.com>
To: Markus Stenberg <markus.stenberg@iki.fi>
Message-ID: <20150529201707.GU5551@cisco.com>
References: <11521A55-1E5B-4070-A3F6-121F7F319B70@iki.fi> <556777BA.3060200@gmail.com> <20150528205830.GC5551@cisco.com> <2D8BA190-0ADC-4AC5-B914-ACF44491C1CD@iki.fi>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <2D8BA190-0ADC-4AC5-B914-ACF44491C1CD@iki.fi>
User-Agent: Mutt/1.4.2.2i
Archived-At: <http://mailarchive.ietf.org/arch/msg/anima-signaling/Pyfh9ZRaE06jCRsqlMn60rXrCco>
Cc: Joe Hildebrand <jhildebr@cisco.com>, anima-signaling@ietf.org
Subject: Re: [Anima-signaling] CBOR/JSON
Precedence: list

Markus,

For me this thread is very helpful because i can't compete with your
or Joe's 1e<N large> coding experience, so i am trying to learn from both your insights.

  One background i have is my 1e<more than i cared> non-interop
  deployment experience with various X.N00 applications back in the 90s.
  IMHO, ISO software back then was written on the principle of "Send/generate
  based on your liberal interpretation of humongous specs and accept on input
  only what you cannot find a good reason to reject". And diagnostics
  was a pain. Every time i had to do it, it somehow was related to something ASN.1.

So, the one property i'd like to have is an encoding where the parser does not
have to give up upon an unexpected element or mismatch of schemata but can proceed,
with maybe just a (very often optional) part of the parse-tree missing/ignored. And
in my understanding (correct me if i am wrong), the best way to achieve this
is a simple self-descriptive structured encoding implemented via widely reused SDKs
(so as to minimize actual parser errors in the code). That's what i was hoping to get
from CBOR. And to the best of my understanding, it would be a lot harder to get
with PER.
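
To make that property concrete, here is a minimal sketch (not code from
this thread). It assumes a tiny hand-written decoder for a subset of CBOR
(unsigned integers, byte/text strings, arrays, maps), and the field names
"name", "ttl" and "x-new" are made up for illustration. The point is only
that the receiver can skip an element it does not understand because every
item carries its own type and length. A real implementation would use a
vetted library instead of this decoder.

```python
def decode(buf, pos=0):
    """Decode one CBOR item (subset only); return (value, next_pos)."""
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:                       # argument stored in the initial byte
        arg = info
    elif info in (24, 25, 26, 27):      # argument in the next 1/2/4/8 bytes
        size = 1 << (info - 24)
        arg = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
    else:
        raise ValueError("unsupported additional info %d" % info)
    if major == 0:                      # unsigned integer
        return arg, pos
    if major in (2, 3):                 # byte string / text string
        raw = bytes(buf[pos:pos + arg])
        return (raw if major == 2 else raw.decode("utf-8")), pos + arg
    if major == 4:                      # array of arg items
        items = []
        for _ in range(arg):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if major == 5:                      # map of arg key/value pairs
        result = {}
        for _ in range(arg):
            key, pos = decode(buf, pos)
            val, pos = decode(buf, pos)
            result[key] = val
        return result, pos
    raise ValueError("unsupported major type %d" % major)


# Hypothetical message: {"name": "node1", "ttl": 60, "x-new": [1, 2, h'ff']}
MESSAGE = bytes.fromhex(
    "a3"                        # map with 3 pairs
    "646e616d65" "656e6f646531" # "name": "node1"
    "6374746c" "183c"           # "ttl": 60
    "65782d6e6577" "83010241ff" # "x-new": [1, 2, h'ff']
)

KNOWN_FIELDS = {"name", "ttl"}  # what this (older) receiver understands


def receive(buf):
    """Decode the whole message, keep known fields, report the rest."""
    msg, _ = decode(buf)
    known = {k: v for k, v in msg.items() if k in KNOWN_FIELDS}
    ignored = sorted(k for k in msg if k not in KNOWN_FIELDS)
    return known, ignored


known, ignored = receive(MESSAGE)
print(known)    # fields the receiver acts on
print(ignored)  # fields it skipped without giving up on the message
```

The receiver never needs a schema to step over "x-new": the array header
tells it how many items follow, and each item says how long it is.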

I would certainly like to better understand what the best options are to
leverage schemata to define, create & validate and parse the encodings of the structured
data.

Yang: sure, we can define the object model/schemata with Yang, but we should agree why:

-> Yang is often used to enable multiple encodings. I think we should do a good enough
   job upfront for anima that we do not need multiple encodings. As for me, whatever
   encoding comes up, even if i don't like it, i'd rather stick to just one encoding.

   Let's say anima decides on XML and wakes up later learning that it doesn't fit IoT
   encoding, and needs to add a second encoding: that is where Yang would be very
   helpful to ensure consistent definition across encodings. But i think avoiding
   this is why we have this discussion here.

-> Yang is a common language for definition, making mutual review easier.
   That's, i think, the important but only thing Yang gives us in Anima.
   .. what am i missing ?

I have seen enough efforts that simply tried to understand the constraints of Yang,
then defined everything first in their encoding, and later added a Yang definition.
This was mostly done in recent years when Yang toolchains were not always
available.

Which brings me to the create & validate/parse coding question: I don't know what
really good & lightweight toolchain options exist that would make a schema-based
approach attractive. If for example i look into encoding JSON, it seems it very
much depends also on your programming language. If you use JS, you pretty much
have JSON as part of the language itself, so schema creation is 1:1 with the programming
language. And hopefully there would be an easy way to permit/check just the CBOR
compatible subset.

Parsing/Validation on receipt based on schema & SDK for it would be great. I also
thought that there might be JSON schema definitions that could be used for that
against the CBOR encoding... Yes/No ? And of course those CBOR compatible JSON
schemas could be created from Yang once there are appropriate toolchains.

And the beauty could be that all of these schemata related aspects could be
optional and be used when deemed beneficial by the coder, without impacting
the parsability of the self-descriptive encoding.
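
A rough sketch of what "optional" could look like in practice (an
illustration only, not an existing toolchain). It assumes the message has
already been decoded into a plain dict, and it uses a made-up, hand-rolled
schema format (field name -> expected type and required flag) rather than
JSON Schema or Yang. Validation is a separate step the coder may call or
skip, and unknown fields are deliberately not treated as errors.

```python
# Hypothetical schema: field name -> (expected Python type, required?)
SCHEMA = {
    "name": (str, True),
    "ttl": (int, True),
    "tags": (list, False),
}


def validate(msg, schema=SCHEMA):
    """Return a list of problems; an empty list means the message conforms.

    Fields that are not in the schema are ignored, so a newer sender can
    add elements without breaking an older receiver.
    """
    problems = []
    for field, (expected, required) in schema.items():
        if field not in msg:
            if required:
                problems.append("missing required field: " + field)
        elif not isinstance(msg[field], expected):
            problems.append(
                "wrong type for %s: %s" % (field, type(msg[field]).__name__)
            )
    return problems


# Decoding already succeeded; validation is an extra, optional check.
decoded = {"name": "node1", "ttl": 60, "x-new": [1, 2]}
print(validate(decoded))                          # conforming message
print(validate({"name": "node1", "ttl": "60"}))   # type mismatch reported
```

Because decoding does not depend on the schema, a receiver without the
schema (or with an older one) can still parse the message and simply
skips the check.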

Cheers
    Toerless

On Fri, May 29, 2015 at 05:25:26PM +0300, Markus Stenberg wrote:
> On 28.5.2015, at 23.58, Toerless Eckert <eckert@cisco.com> wrote:
> > Lets keep Markus on the thread replying to his arguments...
> > 
> > Markus:
> > 
> > Just a quick round of pro-CBOR arguments:
> > 
> > - I do not want to make TLV parsing/generation easier, i want
> >  to eliminate the need for a lot of coders having to redo it over and
> >  over:
> 
> This is the main reason for using ASN.1 or YANG+x, not CBOR. You have multiple encodings, well-defined, and some of them more efficient than CBOR too, based on a single schema, and still extensible as needed :)
> 
> CBOR states its schemaless-ness is a feature, but it is actually a bug; you want an extensible schema if you go towards something pre-defined here. Otherwise you wind up parsing/validating data structure subsets your parser produces _anyway_ to make sure they at least fulfill the criteria you want, by hand, before actually doing anything with it.
> 
> There are a number of binary JSON encodings, and while CBOR is from IETF, I don't see it in the wild in the things I do, so I do not particularly have a strong opinion either way about it. Raw JSON I consider a bad idea if you need to transport binary data as it is still no longer readable etc, and a number of parsers are not actually 100% binary-proof so hello hex string encoding.
> 
> > - There is a wide range of code quality generated by different coders
> >  and under different commercial conditions (time cost pressure,
> >  care for safety/reliability). Consider the code quality
> >  generated by the average commercial situation compared to re-using
> >  (eg: CBOR) libraries that are continuously re-vetted.
> 
> Considering the number of bugs I have seen in e.g. popular JSON parsers (json-c), I am not as optimistic as you are about their quality. But I am sure "this time it is different".
> 
> > - According to Cabo, he has seen TLV encodings that were
> >  less efficient than CBOR. Which is to me attestation of a lot of
> >  low level designs taking a lot of shortcuts and end up being less
> >  extensible mid-term.
> 
> Sure, you can define better or worse TLV encodings. CBOR is moderately efficient but not insanely so (I could define a more efficient one, but there is a tradeoff between encoding efficiency and en/decode complexity obviously.)
> 
> > - Personal data point: Extending IPFIX with additional hierarchical
> >  subdomains. The IETF process is hard enough, the coding effort in 
> >  implementations is even worse.
> > 
> > - Aka: In general i am sick and tired of having to backfit extensions
> >  into closed minded TLV designs, and there is no way to have the
> >  same degree of extensibility designed into a random new design as
> >  would be in CBOR/JSON. If you did, you'd just reinvent CBOR/JSON,
> >  and the whole exercise of doing your own TLV design is to get away
> >  with less work. Which then just comes with all these catch22. More
> >  generally: Hierarchical structure is much less baked into TLV designs
> >  than into structured data designs.
> 
> JSON is crap for this usecase, so please just leave it out.
> 
> If you want structured data, you want also schema so CBOR is not really helpful in my opinion.
> 
> > - Diagnostics and policy language in network devices really suffer
> >  from binary TLVs. Most likely TLV designs have no good idea of
> >  transitive transparency of unknown TLVs while this is a lot more
> >  commonly well defined when you start with XML/HTML/JSON, and once
> >  you have this extensibility but binary TLV you have no way on older
> >  devices to deal with that stuff. It's all just numeric. A mix of
> >  binary and for extensions strings like possible in CBOR looks like
> >  a very attractive option to investigate.
> 
> See above. CBOR is just modern ASN.1 _encoding_. XML is again not so great if you need to transport lots of binary content (hello, hex blobs in XML and you suddenly have TLV-ish stuff inside XML binary blobs rapidly).
> 
> Anyway, I would recommend ASN.1 over CBOR personally. It is mature, proven technology with actually superior features in this use case. Another option would be YANG schema encoded in XML or JSON (or even CBOR, if someone defines a mapping for it), if you want to pursue this road. All of these options are extensible, and _provide validation for "free"_, and _well-defined schema to stick in your standard_.
> 
> Finally, all this said, my original recommendation still stands if you do not want schema (CBOR in and of itself I consider awkward because you define want schema in your document); I have had a not so fun time with ASN.1, JSON, XML, and various binary JSON variants over the years and usually they have _not_ been easier than the relatively simple binary formats you can define for a particular application.
> 
> Cheers,
> 
> -Markus
>
