Re: [dtn] "Block data length" field in BPbis

Hi,

Thinking again about this topic and having had a look at a possible
integration of the current draft into uPCN, I have to retract my
statement that removing the "Block data length" field is the best
solution. Removing it saves the hassle of storing the whole
block-specific data in serialized form somewhere when serializing the
bundle, and it simplifies the use of a standard CBOR encoder. However,
it may make parsing bundles with large (and maybe unknown) blocks not
only terribly inefficient but also vulnerable to DoS, as Gilbert pointed
out below.

We now have to keep a buffer within the BPbis parser which we constantly
resize when reading a new chunk of block-specific data. There is no
way of knowing in advance where the data will end. Having to deal with a
bit more overhead in the serializer, which already "knows" the bundle
structures, was much safer, in my opinion.
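
To illustrate the point, here is a minimal Python sketch (not uPCN code;
the function name item_end is made up for this mail) that finds the end
of a definite-length CBOR item. For nested data it has to visit every
contained item header, whereas a byte string is finished after one
header. It assumes a complete, well-formed buffer and does no bounds
checking.

```python
def item_end(buf, pos=0):
    """Return (end_offset, heads_visited) for the CBOR item at buf[pos].

    Illustrative only: definite-length items, no bounds checks.
    """
    pending = 1   # number of items still to be walked
    heads = 0     # number of item headers we had to decode
    while pending:
        initial = buf[pos]
        major, info = initial >> 5, initial & 0x1F
        pos += 1
        if info < 24:
            value = info
        elif info <= 27:
            size = 1 << (info - 24)          # 1, 2, 4 or 8 argument bytes
            value = int.from_bytes(buf[pos:pos + size], "big")
            pos += size
        else:
            raise ValueError("indefinite lengths not handled in this sketch")
        heads += 1
        pending -= 1
        if major in (2, 3):      # byte/text string: skip the content at once
            pos += value
        elif major == 4:         # array: 'value' more items follow
            pending += value
        elif major == 5:         # map: 'value' key/value pairs follow
            pending += 2 * value
        elif major == 6:         # tag: exactly one tagged item follows
            pending += 1
        # major 0, 1, 7: the argument bytes were the whole item
    return pos, heads


nested = bytes.fromhex("83018202 0341ff")   # [1, [2, 3], h'ff']
wrapped = bytes([0x47]) + nested             # the same data inside a byte string
print(item_end(nested))    # every header inside has to be decoded
print(item_end(wrapped))   # a single header tells us where the data ends
```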

Currently, I think the best solution would be to have the block-specific
data serialized separately and provided, including a length, as a single
array element of the canonical block. For the payload block, this single
element should be the byte string containing the payload. For other
blocks, it could also be a byte string or that special
"serialized-CBOR-in-CBOR" type (I'm not familiar with that) as mentioned
by Matt below.
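
As a rough sketch of what I mean (Python, hand-rolled encoder; the
function names and the exact field order of the block are assumptions
for illustration, the draft would define the real layout), the
block-specific data becomes one CBOR byte string, which carries its own
length:

```python
def cbor_head(major, value):
    """Encode a CBOR initial byte plus argument for the given major type."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def cbor_uint(n):
    return cbor_head(0, n)


def cbor_bstr(data):
    # The header of the byte string is the explicit length of the data.
    return cbor_head(2, len(data)) + data


def cbor_array(items):
    return cbor_head(4, len(items)) + b"".join(items)


def encode_canonical_block(block_type, block_number, flags, crc_type, data):
    """Hypothetical layout: four unsigned integers, then the serialized
    block-specific data as a single byte string element."""
    return cbor_array([
        cbor_uint(block_type),
        cbor_uint(block_number),
        cbor_uint(flags),
        cbor_uint(crc_type),
        cbor_bstr(data),
    ])


# A payload block: the single element is the payload itself.
payload_block = encode_canonical_block(1, 0, 0, 0, b"hello")

# Another block: its data is serialized separately (here the CBOR
# array [1, 2]) and then wrapped in the byte string.
inner = cbor_array([cbor_uint(1), cbor_uint(2)])
extension_block = encode_canonical_block(7, 2, 0, 0, inner)

print(payload_block.hex())
print(extension_block.hex())
```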

We had some internal discussions with Marius and Lucas (who implemented
BPbis support in uPCN) and concluded that this would be the best
solution. We see the following main advantages:

* You get much more deterministic parsing times. The maximum depth of
the CBOR structure a bundle parser has to handle is limited, which
prevents possible DoS scenarios resulting from the formerly unlimited
"stack depth".

* Separation of concerns: why should a bundle router parse blocks
unrelated to its operation?

* It also addresses the challenges we faced in the serializer with two
redundant length fields, as mentioned in my first mail on this topic.
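
The first two points can be sketched as follows (Python, illustrative
only; find_block and read_head are made-up names, and the five-element
block layout with the data as the last element is an assumption). A
router looking for one block type reads a handful of headers per block
and skips the data by its length, without ever looking inside:

```python
def read_head(buf, pos):
    """Decode one CBOR item header; return (major, argument, next_pos)."""
    if pos >= len(buf):
        raise ValueError("truncated input")
    initial = buf[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise ValueError("indefinite or reserved length not handled here")
    size = 1 << (info - 24)
    if pos + size > len(buf):
        raise ValueError("truncated input")
    return major, int.from_bytes(buf[pos:pos + size], "big"), pos + size


def find_block(buf, wanted_type):
    """Return the (still serialized) data of the first block of wanted_type,
    or None. 'buf' is assumed to hold concatenated canonical blocks."""
    pos = 0
    while pos < len(buf):
        major, count, pos = read_head(buf, pos)
        if major != 4 or count != 5:
            raise ValueError("expected a 5-element array")
        fields = []
        for _ in range(4):
            major, value, pos = read_head(buf, pos)
            if major != 0:
                raise ValueError("expected an unsigned integer")
            fields.append(value)
        major, length, pos = read_head(buf, pos)
        if major != 2:
            raise ValueError("expected a byte string")
        if pos + length > len(buf):
            raise ValueError("truncated input")
        if fields[0] == wanted_type:
            return buf[pos:pos + length]
        pos += length   # skip the block data without parsing it
    return None


blocks = bytes.fromhex("850702000043820102" "85010100004568656c6c6f")
print(find_block(blocks, 1))
```

The work per block is constant here, no matter how deeply nested the
skipped block data is.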

Drawbacks:

* The resulting overhead is, again, an explicit length field. Compared to
v10, we only drop the redundant length field right before the actual
payload data. Compared to v11, we reintroduce a length field before
the block data.

* The CBOR parser and serializer have to be invoked once per block, not
only once per bundle.
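
To put a number on the first drawback: the length is carried in the
header of the CBOR byte string, whose size depends only on the data
length. A small Python helper (the name bstr_header_size is made up for
this mail) following the standard CBOR argument encoding:

```python
def bstr_header_size(data_len):
    """Number of bytes a CBOR byte string header needs for data_len bytes."""
    if data_len < 24:
        return 1          # length fits into the initial byte
    if data_len < 2 ** 8:
        return 2          # initial byte + 1 length byte
    if data_len < 2 ** 16:
        return 3          # initial byte + 2 length bytes
    if data_len < 2 ** 32:
        return 5          # initial byte + 4 length bytes
    if data_len < 2 ** 64:
        return 9          # initial byte + 8 length bytes
    raise ValueError("too large for a definite-length byte string")


for size in (10, 200, 1024, 1_000_000):
    header = bstr_header_size(size)
    print(f"{size} bytes of block data: {header} byte(s) of header "
          f"({100.0 * header / size:.4f} %)")
```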

We consider the overhead, both in bundle size and in computation,
pretty much negligible compared to the aforementioned advantages and
would thus strongly support adapting BPbis to serialized block data.

What do you think?

Felix

Am 31.05.2018 um 14:56 schrieb Clark, Gilbert J. (GRC-LCA0):
> As a little context, I'm trying to parse and apply policy to many parallel bundle ... flows, I guess? ... at rates of (eventually) 200+ Gbps.  While 200 Gbps sounds impressive, that number is far less important than the number of bundles it translates to per second: processing times, after all, are generally much more tied to the number of bundles than to the actual data rates at which a specific link / path / whatever is operating.
>
> [rabbit_hole] This is related to one objection I have to proactive fragmentation: since processing time at intermediate nodes tends to scale with the number of bundles seen as opposed to the specific data rates one is operating at (within reason, of course), one generally wants to take steps to *minimize* the number of bundles flowing through the system at any given time.  By unnecessarily carving large pieces of data up into a number of smaller bundles without cause, one is forcing intermediate nodes to spend more cycles on processing said bundles flowing through them. [/rabbit_hole]
>
> Having to walk entire bundles in search of specific canonical blocks is bad for me: it injects variance into the per-bundle processing time.  Large degrees of variance are suboptimal because they force one to over-design a system to hit relatively conservative targets in terms of the bundles that can be processed per second.  
>
> I'll also note that, worst-case, walking a bunch of variable-length fields seems like it could act as an effective denial of service attack: one could hypothetically craft bundles which would take forever (relatively speaking) to completely walk as part of a vain search for a target canonical block ... which may or may not exist.
>
> Anyway, I'm not terribly attached to the index idea, but ... *anything* to keep seek / search times for individual canonical blocks more constant will make design / implementation of systems that need to process bundles at high rates quite a bit more efficient in the way they can be designed and the way they can operate, you know?
>
> FWIW,
> Gilbert 
>
> P.S. - I'll note that building fast-path processing for variable-length fields does suck a little bit.  I know the goal is extensibility here, but ... fixed-size sets of headers would have been so much nicer to work with.
>
> The views expressed in this mail reflect the opinions of the author.  They are, therefore, not intended to reflect official positions of NASA or the U.S. Government.
>  
> On 5/31/18, 12:49 AM, "Matt Wronkiewicz" <wronkiew@gmail.com> wrote:
>
>     Serializing canonical blocks, or just the block-specific data, has
>     some additional benefits.
>     
>     Some intermediate nodes may need to limit the CBOR tree depth for
>     static analysis of memory usage. They might then encounter a bundle
>     with an application-specific block that has a tree depth beyond their
>     limit, which they otherwise would have been able to process correctly.
>     
>     In the current version, canonical blocks have to be decoded just to
>     find the next block. Encapsulating the block-specific data reduces the
>     amount of parsing that needs to be done to find all the relevant
>     blocks. Encapsulating whole blocks would also reduce parsing and make
>     finding the total length faster.
>     
>     I would prefer to see the CBOR array format used rather than including
>     an index of offsets. Adding it to the beginning of a bundle would be
>     messy, and adding it to the end is redundant. Also sticking with CBOR
>     serialization makes both the spec and the bundles more concise.
>     
>     CBOR has an optional tag for serialized CBOR structures. The encoding
>     is specified by the protocol, so including tags is just wasted bytes.
>     
>     Matt
>     
>     On Wed, May 30, 2018 at 3:18 PM, Clark, Gilbert J. (GRC-LCA0)
>     <gilbert.j.clark@nasa.gov> wrote:
>     > What about e.g. an index field in the primary block that includes an array of both the offset and type of each canonical block included within that particular bundle?
>     >
>     > The ability to index directly to specific canonical blocks without needing to walk the bundle to find them at all would be nice.  It would also be useful to reduce the variance in processing time where e.g. a policy does need to be applied to a canonical block that comes later in a bundle.
>     >
>     > -Gilbert
>     >
>     > On 5/30/18, 4:36 PM, "dtn on behalf of Felix Walter" <dtn-bounces@ietf.org on behalf of felix.walter@tu-dresden.de> wrote:
>     >
>     >     Marc,
>     >
>     >     yes, for intermediate nodes this requires parsing the complete CBOR
>     >     representation for all blocks. It also implies that a full CBOR parser
>     >     always has to be used because it is not known in advance which types
>     >     will be contained in unknown blocks.
>     >
>     >     An alternative would be to specify the "block payload" as being a single
>     >     CBOR byte string, containing the (serialized) block-specific data.
>     >     Though not as nice as having the bundle as a single large CBOR object,
>     >     from an implementation perspective, this is probably the simplest
>     >     solution. For example, the status blocks would then contain a serialized
>     >     CBOR array in the payload field.
>     >
>     >     Felix
>     >
>     >     Am 30.05.2018 um 21:37 schrieb Marc Blanchet:
>     >     > On 30 May 2018, at 15:10, Felix Walter wrote:
>     >     >
>     >     >> Scott,
>     >     >>
>     >     >> great, I think removing the field is the best solution.
>     >     >>
>     >     >
>     >     > much safer to have one single place for authority. However, does that
>     >     > require more parsing from the intermediate nodes if they need to
>     >     > somewhat parse some blocks for policy decisions for example?
>     >     >
>     >     > Marc.
>     >     >
>     >     >> Felix
>     >     >>
>     >     >> Am 30.05.2018 um 20:54 schrieb Burleigh, Scott C (312B):
>     >     >>> Felix, sorry, I am finally replying to this email: you are right
>     >     >>> that CBOR representation would provide all of the individual lengths
>     >     >>> of the block-type-specific data fields that are summed in the block
>     >     >>> data length field, and as such the block data length field is
>     >     >>> redundant.  My first impulse on re-reading your email was simply to
>     >     >>> revise the definition of "Block data length" as you suggest.  But on
>     >     >>> reflection I think it actually makes more sense to remove block data
>     >     >>> length from the specification and instead specifically require that
>     >     >>> all block-type-specific data fields appear in CBOR representation.
>     >     >>>
>     >     >>> I want to post version 11 of this specification later today, before
>     >     >>> version 10 expires, and at this point I plan to go ahead with
>     >     >>> removal of the block data length field.  If anyone has a technical
>     >     >>> argument to make in defense of retaining block data length in bpbis,
>     >     >>> please speak up this afternoon?
>     >     >>>
>     >     >>> Scott
>     >     >>>
>     >     >>> -----Original Message-----
>     >     >>> From: dtn <dtn-bounces@ietf.org> On Behalf Of Felix Walter
>     >     >>> Sent: Friday, March 23, 2018 9:22 AM
>     >     >>> To: dtn@ietf.org
>     >     >>> Subject: [dtn] "Block data length" field in BPbis
>     >     >>>
>     >     >>> Hi,
>     >     >>>
>     >     >>> We just had a short talk with Scott, Ed, and Rick about the "Block
>     >     >>> data length" [1] field of the canonical block in BPbis that I would
>     >     >>> like to forward to the list. As far as I understand it, this value
>     >     >>> is the count of (serialized) bytes still belonging to the block, but
>     >     >>> following the length field.
>     >     >>>
>     >     >>> Because CBOR is used for the block-type-specific data, the length
>     >     >>> field by itself is redundant. For example, in the payload block, it
>     >     >>> will always be followed by a "CBOR byte string" representing the
>     >     >>> payload data. (This contains a length as well.) It needs to be
>     >     >>> considered in implementations that all these length fields are
>     >     >>> variable-length themselves - however, this turned out to be no big
>     >     >>> issue. I also see the point of having the "Block data length"
>     >     >>> available to the parser, to be able to skip over the whole block
>     >     >>> data of complex extension blocks without even parsing them.
>     >     >>>
>     >     >>> Are there any further reasons for having the "Block data length"
>     >     >>> available which I have missed? Or does anyone have a strong opinion
>     >     >>> on whether this should be removed or kept?
>     >     >>>
>     >     >>> By the way, the language concerning the "Block data length" should
>     >     >>> probably be modified slightly as it refers to "[...] the aggregate
>     >     >>> length of all remaining fields of the block, i.e., the
>     >     >>> block-type-specific data fields.", though, this may (now) be
>     >     >>> followed by the CRC checksum.
>     >     >>>
>     >     >>> Felix
>     >     >>>
>     >     >>> [1] https://tools.ietf.org/html/draft-ietf-dtn-bpbis-10#section-4.2.3
>     >     >>>
>     >     >>> _______________________________________________
>     >     >>> dtn mailing list
>     >     >>> dtn@ietf.org
>     >     >>> https://www.ietf.org/mailman/listinfo/dtn