Re: [dtn] "Block data length" field in BPbis

I agree.  I'll make this change in the next edition of the BPbis I-D.

Scott

-----Original Message-----
From: dtn <dtn-bounces@ietf.org> On Behalf Of Birrane, Edward J.
Sent: Wednesday, July 11, 2018 9:01 AM
To: Felix Walter <felix.walter@tu-dresden.de>; dtn@ietf.org
Subject: Re: [dtn] "Block data length" field in BPbis

Felix,

I agree and had a similar concern, though related to security.  When encrypting block-type-specific fields with a cipher suite whose cipher-text is the same size as the plain-text, we need to length-encode the cipher-text, which results in size expansion whenever we *do not* model these fields as a CBOR byte string. Sure, overflow bytes could be placed in a BCB, but that's extra processing we'd like to avoid.

For example, let's assume we have a block whose block-type-specific fields are 3 uint8_t's of values 0x1, 0x2, and 0x3.

Represented as a byte string (h'010203'), we would have a serialization of 0x43010203. If we recognize this as a CBOR byte string, we can treat just the 0x010203 part as plain-text; the resulting cipher-text would be 0xC1C2C3, encoded as something like 0x43C1C2C3. That is, there is no size expansion because we "re-use" the length field.

Represented as an array ([1,2,3]), we would have a serialization of 0x83010203. We'd need to pass that whole thing in as plain-text to a cipher suite, which would generate cipher-text of 0xC1C2C3C4, which would then need to have a length provided. If represented as a CBOR byte string, the resultant serialized cipher-text would look like 0x44C1C2C3C4, which is one byte larger than the original serialization.
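
The two cases above can be reproduced with a small sketch. Everything here is an assumption for illustration: the helper names are made up, the encoder covers only byte strings and arrays of unsigned integers, and toy_cipher is a length-preserving XOR stand-in rather than a real cipher suite (so the cipher-text bytes for the array case differ from the 0xC1C2C3C4 used above).

```python
def cbor_head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte plus argument for major type `major`."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24, n])
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + n.to_bytes(2, "big")
    if n < 0x100000000:
        return bytes([(major << 5) | 26]) + n.to_bytes(4, "big")
    return bytes([(major << 5) | 27]) + n.to_bytes(8, "big")


def encode_bstr(data: bytes) -> bytes:
    """CBOR byte string (major type 2): length head followed by raw bytes."""
    return cbor_head(2, len(data)) + data


def encode_uint_array(values) -> bytes:
    """CBOR array (major type 4) of unsigned integers (major type 0)."""
    return cbor_head(4, len(values)) + b"".join(cbor_head(0, v) for v in values)


def toy_cipher(plaintext: bytes) -> bytes:
    """Hypothetical length-preserving stand-in for a cipher. NOT secure."""
    return bytes(b ^ 0xC0 for b in plaintext)


# Case 1: fields modeled as a byte string. Only the content is encrypted,
# and the existing length head is re-used for the cipher-text.
fields = bytes([1, 2, 3])
as_bstr = encode_bstr(fields)
as_bstr_encrypted = encode_bstr(toy_cipher(fields))

# Case 2: fields modeled as an array. The whole serialization is the
# plain-text, and the cipher-text needs a new byte-string head.
as_array = encode_uint_array([1, 2, 3])
as_array_encrypted = encode_bstr(toy_cipher(as_array))

print(as_bstr.hex(), "->", as_bstr_encrypted.hex())
print(as_array.hex(), "->", as_array_encrypted.hex())
```

In this sketch the byte-string case keeps its 4-byte size after encryption, while the array case grows from 4 to 5 bytes.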

Unless I am missing something, we can't get around this situation. For that reason, I recommend using byte strings to capture block-type-specific fields, and I concur with Felix's analysis.

-Ed

Edward J. Birrane, III, Ph.D.
Embedded Applications Group Supervisor
Principal Staff, Space Exploration Sector Johns Hopkins Applied Physics Laboratory
(W) 443-778-7423 / (F) 443-228-3839

-----Original Message-----
From: dtn <dtn-bounces@ietf.org> On Behalf Of Felix Walter
Sent: Tuesday, July 10, 2018 11:04 AM
To: dtn@ietf.org
Subject: Re: [dtn] "Block data length" field in BPbis

Hi,

Thinking again about this topic and having had a look at a possible integration of the current draft into uPCN, I have to retract my statement that I consider removing the "Block data length" field the best solution. Though it saves the hassle of storing the whole block-specific data in serialized form somewhere when serializing the bundle, and though it simplifies the use of a standard CBOR encoder, it may make parsing bundles with large (and maybe unknown) blocks not only terribly inefficient, but also vulnerable to DoS, as Gilbert pointed out below.

We now have to keep a buffer within the BPbis parser which we constantly resize when reading a new chunk of block-specific data. There is no chance of knowing when the data stream will end. Having to deal with a bit more overhead in the serializer while already "knowing" the bundle structures was much safer, in my opinion.

Currently, I think the best solution would be to have the block-specific data serialized separately and provided, including a length, as a single array element of the canonical block. For the payload block, this single element should be the byte string containing the payload. For other blocks, it could also be a byte string or that special "serialized-CBOR-in-CBOR" type (I'm not familiar with that) as mentioned by Matt below.
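
A minimal sketch of what this buys the parser, assuming the block-specific data is a definite-length CBOR byte string. The function names and the max_len policy parameter are hypothetical and not taken from uPCN or the draft. The parser reads only the byte-string head, checks the announced length against a limit, and jumps over the content without decoding it:

```python
def read_bstr_head(buf: bytes, pos: int):
    """Return (content_start, content_length) for the byte string at `pos`."""
    if pos >= len(buf):
        raise ValueError("truncated input")
    major, info = buf[pos] >> 5, buf[pos] & 0x1F
    if major != 2:
        raise ValueError("expected a CBOR byte string")
    if info < 24:
        # Length is stored directly in the initial byte.
        return pos + 1, info
    if info in (24, 25, 26, 27):
        # Length follows as a 1-, 2-, 4-, or 8-byte big-endian integer.
        width = 1 << (info - 24)
        if pos + 1 + width > len(buf):
            raise ValueError("truncated length field")
        length = int.from_bytes(buf[pos + 1:pos + 1 + width], "big")
        return pos + 1 + width, length
    raise ValueError("indefinite-length strings are not handled in this sketch")


def skip_block_data(buf: bytes, pos: int, max_len: int) -> int:
    """Skip one byte-string-wrapped data field; return the next offset."""
    start, length = read_bstr_head(buf, pos)
    if length > max_len:
        # The size is known before any content is read, so oversized
        # blocks can be rejected early instead of buffered.
        raise ValueError("block data exceeds local limit")
    end = start + length
    if end > len(buf):
        raise ValueError("truncated block data")
    return end


# h'010203' followed by one more data item (the unsigned integer 5).
example = bytes.fromhex("4301020305")
print(skip_block_data(example, 0, max_len=1024))
```

The work done per block is constant regardless of how deeply nested the wrapped content is, which is the property discussed above.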

We had some internal discussions with Marius and Lucas (who implemented BPbis support in uPCN) and concluded that this would be the best solution. We see the following main advantages:

* You get much more deterministic parsing times. The maximum depth of the CBOR structure is limited. Possible DoS scenarios resulting from former unlimited "stack depth" are prevented.

* Separation of concerns: why should a bundle router parse blocks unrelated to its operation?

* It also addresses the challenges we faced in the serializer with two redundant length fields, as mentioned in my first mail on this topic.

Drawbacks:

* The resulting overhead is, again, an explicit length field. Compared to v10, we only drop the redundant length field right before the actual payload data. Compared to v11, we reintroduce a length field before the block data.

* The CBOR parser and serializer have to be invoked once per block, not only once per bundle.

We consider the overhead in bundle size and computation pretty much negligible compared to the aforementioned advantages and, thus, would strongly support adapting BPbis to use serialized block data.
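
To put a rough number on the size overhead, here is a small sketch (the function name is hypothetical) of how many bytes a definite-length CBOR byte-string head takes for a given content length, following the standard CBOR argument encoding:

```python
def bstr_head_size(n: int) -> int:
    """Bytes needed for the head of a definite-length CBOR byte string."""
    if n < 24:
        return 1      # length fits in the initial byte
    if n < 2**8:
        return 2      # initial byte + 1-byte length
    if n < 2**16:
        return 3      # initial byte + 2-byte length
    if n < 2**32:
        return 5      # initial byte + 4-byte length
    return 9          # initial byte + 8-byte length


# Illustrative block data sizes only; these are not measurements.
for size in (10, 200, 1500, 65_536, 10 * 1024 * 1024):
    head = bstr_head_size(size)
    print(f"{size:>10} bytes of block data: {head}-byte head "
          f"({100 * head / size:.4f} % overhead)")
```

The head never exceeds 9 bytes per block, so the relative overhead shrinks quickly as the block data grows.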

What do you think?

Felix

On 31.05.2018 at 14:56, Clark, Gilbert J. (GRC-LCA0) wrote:
> As a little context, I'm trying to parse and apply policy to many parallel bundle ... flows, I guess? ... at rates of (eventually) 200+ Gbps.  While 200 Gbps sounds impressive, that number is far less important than the number of bundles it translates to per second: processing times, after all, are generally much more tied to the number of bundles than to the actual data rates at which a specific link / path / whatever is operating.
>
> [rabbit_hole] This is related to one objection I have to proactive
> fragmentation: since processing time at intermediate nodes tends to 
> scale with the number of bundles seen as opposed to the specific data 
> rates one is operating at (within reason, of course), one generally 
> wants to take steps to *minimize* the number of bundles flowing 
> through the system at any given time.  By unnecessarily carving large 
> pieces of data up into a number of smaller bundles without cause, one 
> is forcing intermediate nodes to spend more cycles on processing said 
> bundles flowing through them. [/rabbit_hole]
>
> Having to walk entire bundles in search of specific canonical blocks is bad for me: it injects variance into the per-bundle processing time.  Large degrees of variance are suboptimal because they force one to over-design a system to hit relatively conservative targets in terms of the bundles that can be processed per second.  
>
> I'll also note that, worst-case, walking a bunch of variable-length fields seems like it could act as an effective denial of service attack: one could hypothetically craft bundles which would take forever (relatively speaking) to completely walk as part of a vain search for a target canonical block ... which may or may not exist.
>
> Anyway, I'm not terribly attached to the index idea, but ... *anything* to keep seek / search times for individual canonical blocks more constant will make design / implementation of systems that need to process bundles at high rates quite a bit more efficient in the way they can be designed and the way they can operate, you know?
>
> FWIW,
> Gilbert
>
> P.S. - I'll note that building fast-path processing for variable-length fields does suck a little bit.  I know the goal is extensibility here, but ... fixed-size sets of headers would have been so much nicer to work with.
>
> The views expressed in this mail reflect the opinions of the author.  They are, therefore, not intended to reflect official positions of NASA or the U.S. Government.
>  
> On 5/31/18, 12:49 AM, "Matt Wronkiewicz" <wronkiew@gmail.com> wrote:
>
>     Serializing canonical blocks, or just the block-specific data, has
>     some additional benefits.
>     
>     Some intermediate nodes may need to limit the CBOR tree depth for
>     static analysis of memory usage. Such a node might then encounter a
>     bundle with an application-specific block whose tree depth exceeds
>     that limit, a bundle it otherwise would have been able to process
>     correctly.
>     
>     In the current version, canonical blocks have to be decoded just to
>     find the next block. Encapsulating the block-specific data reduces the
>     amount of parsing that needs to be done to find all the relevant
>     blocks. Encapsulating whole blocks would also reduce parsing and make
>     finding the total length faster.
>     
>     I would prefer to see the CBOR array format used rather than including
>     an index of offsets. Adding it to the beginning of a bundle would be
>     messy, and adding it to the end is redundant. Also sticking with CBOR
>     serialization makes both the spec and the bundles more concise.
>     
>     CBOR has an optional tag for serialized CBOR structures. The encoding
>     is specified by the protocol, so including tags is just wasted bytes.
>     
>     Matt
>     
>     On Wed, May 30, 2018 at 3:18 PM, Clark, Gilbert J. (GRC-LCA0)
>     <gilbert.j.clark@nasa.gov> wrote:
>     > What about e.g. an index field in the primary block that includes an array of both the offset and type of each canonical block included within that particular bundle?
>     >
>     > The ability to index directly to specific canonical blocks without needing to walk the bundle to find them at all would be nice.  It would also be useful to reduce the variance in processing time where e.g. a policy does need to be applied to a canonical block that comes later in a bundle.
>     >
>     > -Gilbert
>     >
>     > On 5/30/18, 4:36 PM, "dtn on behalf of Felix Walter" <dtn-bounces@ietf.org on behalf of felix.walter@tu-dresden.de> wrote:
>     >
>     >     Marc,
>     >
>     >     yes, for intermediate nodes this requires parsing the complete CBOR
>     >     representation for all blocks. It also implies that a full CBOR parser
>     >     always has to be used because it is not known in advance which types
>     >     will be contained in unknown blocks.
>     >
>     >     An alternative would be to specify the "block payload" as being a single
>     >     CBOR byte string, containing the (serialized) block-specific data.
>     >     Though not as nice as having the bundle as a single large CBOR object,
>     >     from an implementation perspective, this is probably the simplest
>     >     solution. For example, the status blocks would then contain a serialized
>     >     CBOR array in the payload field.
>     >
>     >     Felix
>     >
>     >     On 30.05.2018 at 21:37, Marc Blanchet wrote:
>     >     > On 30 May 2018, at 15:10, Felix Walter wrote:
>     >     >
>     >     >> Scott,
>     >     >>
>     >     >> great, I think removing the field is the best solution.
>     >     >>
>     >     >
>     >     > much safer to have one single place for authority. However, does that
>     >     > require more parsing from the intermediate nodes if they need to
>     >     > somewhat parse some blocks for policy decisions for example?
>     >     >
>     >     > Marc.
>     >     >
>     >     >> Felix
>     >     >>
>     >     >> On 30.05.2018 at 20:54, Burleigh, Scott C (312B) wrote:
>     >     >>> Felix, sorry, I am finally replying to this email: you are right
>     >     >>> that CBOR representation would provide all of the individual lengths
>     >     >>> of the block-type-specific data fields that are summed in the block
>     >     >>> data length field, and as such the block data length field is
>     >     >>> redundant.  My first impulse on re-reading your email was simply to
>     >     >>> revise the definition of "Block data length" as you suggest.  But on
>     >     >>> reflection I think it actually makes more sense to remove block data
>     >     >>> length from the specification and instead specifically require that
>     >     >>> all block-type-specific data fields appear in CBOR representation.
>     >     >>>
>     >     >>> I want to post version 11 of this specification later today, before
>     >     >>> version 10 expires, and at this point I plan to go ahead with
>     >     >>> removal of the block data length field.  If anyone has a technical
>     >     >>> argument to make in defense of retaining block data length in bpbis,
>     >     >>> please speak up this afternoon?
>     >     >>>
>     >     >>> Scott
>     >     >>>
>     >     >>> -----Original Message-----
>     >     >>> From: dtn <dtn-bounces@ietf.org> On Behalf Of Felix Walter
>     >     >>> Sent: Friday, March 23, 2018 9:22 AM
>     >     >>> To: dtn@ietf.org
>     >     >>> Subject: [dtn] "Block data length" field in BPbis
>     >     >>>
>     >     >>> Hi,
>     >     >>>
>     >     >>> We just had a short talk with Scott, Ed, and Rick about the "Block
>     >     >>> data length" [1] field of the canonical block in BPbis that I would
>     >     >>> like to forward to the list. As far as I understand it, this value
>     >     >>> is the count of (serialized) bytes still belonging to the block, but
>     >     >>> following the length field.
>     >     >>>
>     >     >>> Because CBOR is used for the block-type-specific data, the length
>     >     >>> field by itself is redundant. For example, in the payload block, it
>     >     >>> will always be followed by a "CBOR byte string" representing the
>     >     >>> payload data. (This contains a length as well.) It needs to be
>     >     >>> considered in implementations that all these length fields are
>     >     >>> variable-length themselves - however, this turned out to be no big
>     >     >>> issue. I also see the point of having the "Block data length"
>     >     >>> available to the parser, to be able to skip over the whole block
>     >     >>> data of complex extension blocks without even parsing them.
>     >     >>>
>     >     >>> Are there any further reasons for having the "Block data length"
>     >     >>> available which I have missed? Or does anyone have a strong opinion
>     >     >>> on whether this should be removed or kept?
>     >     >>>
>     >     >>> By the way, the language concerning the "Block data length" should
>     >     >>> probably be modified slightly as it refers to "[...] the aggregate
>     >     >>> length of all remaining fields of the block, i.e., the
>     >     >>> block-type-specific data fields.", though, this may (now) be
>     >     >>> followed by the CRC checksum.
>     >     >>>
>     >     >>> Felix
>     >     >>>
>     >     >>> [1] https://tools.ietf.org/html/draft-ietf-dtn-bpbis-10#section-4.2.3
>     >     >>>
>     >     >>> _______________________________________________
>     >     >>> dtn mailing list
>     >     >>> dtn@ietf.org
>     >     >>> https://www.ietf.org/mailman/listinfo/dtn
