Re: [dtn] working group last call on draft-ietf-dtn-bpbis

Stephen Farrell <stephen.farrell@cs.tcd.ie> Tue, 24 January 2017 03:33 UTC


Hiya,

I've done a review of bpbis-06 and have many comments. (Sorry:-)

Overall I don't think this is ready, and some more discussion
of some of the issues is needed. Since I've not followed the
list as closely as I'd have liked I may have missed some such
discussion in which case pointing me at the relevant bit(s) of
the archive would be a fine response.

I've tried to separate stuff into things that'd cause me to
ballot DISCUSS in IESG evaluation (*), things that might not,
and nitty things. I hope that helps, but don't take that
categorisation too seriously:-)

Cheers,
S.

(*) Note that I'll likely not still be on the IESG when this
gets there (I escape in March) so the fact that I would ballot
DISCUSS is not that relevant to what other ADs might do.


Possibly major issues (DISCUSS like)
------------------------------------

(A) intro: The last bullet list of the things that are not
specified here seems problematic for a PS and I think needs
more discussion/work. I'm not sure if it's only the text that
needs work, or if the missing specification is required now.
Taking the bullets one at a time (numbers are in order of
presentation): 1) This isn't clear enough, I'm not sure what's
being omitted, 2) Omitting routing I think is fair, 3) It's
also fair to omit RIB/FIB issues, 4) it's not ok to omit
security mechanism definition, (making [BPSEC] normative and
waiting on that in the RFC editor queue would fix this, and is
IMO needed), 5) I'm not sure what's right here.  I think it'd
be good to have some list discussion about this, as it'll
certainly come up in IETF LC and IESG review and having list
traffic at which to point will help back up whatever conclusions
are reached.  In particular in 4.3, I don't think it's
acceptable for the BIB and BCB to not have a normative
reference. Similarly, the "TBD"s for the other extension block
types are not appropriate. (But those can likely be
informative refs.)

(B) The spec is overly prescriptive in many places and ought be
loosened up wherever possible. All we need is interop and not
the kind of conformance at which this spec seems to aim (but
maybe miss). For example the "retention constraint" stuff has
absolutely no reason to be a MUST. As another, I think section
5.4 only needs a MUST in step 4 and all the rest are bogus and
a bad plan.  Also in 3.1, the text here is often far too
prescriptive and, I suspect, based on only a couple of
implementation strategies.  There are many more examples.  I
think it'd be a good plan to do an editing pass to get rid of
as many as possible of the extraneous and unnecessary
constraints that are here.  Examples feature in other comments,
but I've not tried
to be exhaustive in spotting all instances of this.

(C) Many of the flags related to reporting provide ways in which
the BP, if it became widely deployed (even if not planned to be
widely used), could be a significant (D)DoS accelerator. Has
anyone figured out the scale factors involved (e.g. if N bogus
blocks say "report if this can't be processed"), whether those
might be significant and if so what potential countermeasures
might apply? Absent such an analysis, or fixing the problem,
I'd argue it'd seem irresponsible to standardise the BP.  I'd
say for a PS, the minimum is that BPAs MUST default to not
sending all these new bundles except when specifically
configured to be so verbose.  This also affects 5.1 and maybe
elsewhere which says an agent MUST emit admin bundles if asked.
In 5.6, step 2: again the SHOULD needs to be qualified in order
to not have the BP be a fine DoS accelerator (given
non-singletons). Step 3's SHOULD for this is even worse as a
bad actor could include many such blocks.  In 5.13 - I think
that is too many custody signals. If one envisages DTNs with
custodians located at links that are particularly subject to
disruption, then those may be few in number and having all
other nodes/routers emit custody signals for each bundle not
taken into custody seems hugely inefficient and unnecessary.
There may be more examples.  FWIW, my guess is that if all the
current reporting is kept, then the IESG will require some kind
of applicability statement about the kind of network in which
the BP can safely be deployed. For me, fixing the problem is a
better approach than constraining it via an applicability
statement.
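
To make the scale-factor question a bit more concrete, here's a rough
back-of-envelope sketch in Python. None of this is from the draft: the
function names, the "one report per flagged block per handling node"
model and the default-off reporting_enabled switch are all assumptions
made up for illustration, so treat the numbers as an upper-bound guess
and not as an analysis.

```python
def reports_triggered(flagged_blocks: int, nodes_on_path: int,
                      destinations: int = 1) -> int:
    """Rough upper bound on reports caused by ONE bogus bundle.

    Assumed model (hypothetical, not from the draft): every node that
    handles the bundle emits one report per block that asks for a report
    on failure, and a non-singleton destination multiplies the number of
    paths (and hence handling nodes) by `destinations`.
    """
    if flagged_blocks < 0 or nodes_on_path < 0 or destinations < 1:
        raise ValueError("counts must be >= 0 and destinations >= 1")
    return flagged_blocks * nodes_on_path * destinations


def amplification(flagged_blocks: int, nodes_on_path: int,
                  destinations: int = 1,
                  reporting_enabled: bool = False) -> int:
    """Bundles arriving at the report-to victim per attacker bundle.

    reporting_enabled defaults to False, mirroring the suggestion that
    BPAs stay quiet unless specifically configured to be verbose.
    """
    if not reporting_enabled:
        return 0
    return reports_triggered(flagged_blocks, nodes_on_path, destinations)
```

With reporting off by default the factor is zero; switching it on for a
bundle carrying 10 flagged blocks over 5 hops gives 50 under this model,
and a 3-member destination would triple that.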

A bit less major (maybe not DISCUSS-worthy)
-------------------------------------------

(1) 3.1's definition of a bundle correctly says that bundles
are better when they include all the meta-data that might be
useful. If considered naively, that conflicts with modern
approaches to privacy where we want to ensure that meta-data is
only seen by those (nodes) that need to see meta-data, as one
form of data minimisation. OTOH, one could argue that such
bundles will ensure that meta-data and payloads enjoy the same
security services, which is a good thing. In any case, I think
it'd be useful to have a discussion about the privacy aspects
of the BP, esp the ways in which those may be different from
other protocols. For example, would we expect report-to URIs to
commonly allow re-identification of a person? I don't recall
we've ever really discussed such issues.

(2) 3.1, destination: I think this ought be clear that delivery
to some node in the endpoint represents success, i.e.  that the
BP does not force successful delivery to all or failure as a
binary choice.

(3) What bad things would follow if 3.2 was deleted?  It may be
that I'm too familiar with DTN (and hence not a good judge if
this section is useful or not) but I didn't find it useful.
Also - is 3.2 normative? If not, I'm even happier to see it go.
If it is, then I gotta wonder if it conflicts with other text
later. And I see there are a few 2119 MUSTs in there so I guess
you do mean it to be normative or did they sneak in by
accident whilst editing? (As can happen.) If not deleting this
section, I'd argue to find all the bits of text in it that are
needed and move them all elsewhere and then delete the section.

(4) 3.2: The idea that the EID fully determines the MRG seems
just wrong to me. While that might be a nice theory, I figure
it's way more likely that the routing scheme determines how
many copies of a bundle are rx'd at how many instances of the
destination EID. What'd be bad about losing that concept and
letting (the determinants of) the MRG be unspecified here?

(5) 3.2, "Custody of a bundle MAY be taken only if the
destination of the bundle is a singleton endpoint." That's
plain wrong. Not all custodians can know about the destination
being a singleton or not. And before you say it, I don't
believe in the flag in 4.1.3 that allows an origin to specify
this - I've never seen a real example of when that's useful -
the only nearby case I recall was where the developer (me:-)
knew we wanted distribution to all nodes in a multi-member
endpoint but with best effort in terms of getting to them all
and with custody and less frequent application layer re-tx's to
ensure we got to as many as possible.  This also affects 5.2.

(6) 3.3, I'm not sure this is useful either. What'd break if it
were deleted? (But then I never liked those bits of DTNRG's
work either;-)

(7) Including some examples and an RFC 7942 implementation
status section would be a good thing, if easily done. That
would help progression and increase confidence in the
correctness of the spec.

(8) 4.1.6: Was sub-second timing discussed by the WG?  I'm not
terribly pushed on that myself, but it'd be a shame to do an
interop breaking change in the BP without discussing that
topic. A reason to think about this is that there may be
inter-VM (or intra-data-centre) reasons to consider the BP with
sub-second timing as interesting.  It'd be a shame to make that
impossible just to make it slightly simpler to represent time.

(9) 4.2.2, creation time rules: I don't see why it'd be a
problem if node-id=X, creation-time=0,counter=0,lifetime=2s is
used in two bundles emitted 3 seconds apart. Why does that
justify a MUST NOT in the spec?

(10) 4.2.2: The "30 seconds" rule also seems wrong to me, as is
the "MUST NEVER" (not a 2119 term btw) for re-use of the seq
no, which is unrealistic.  As an example of that last: what do
you expect to happen with a node that usually knows the wall
clock time, but, at this moment, knows that it does not? E.g.
previous logs have some real dates, but current clock is
1970-01-01 or whatever. I think this is fixable but the current
language is too prescriptive. Best might be to weaken the
language here and to see what implementations do in the real
world.
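
As a minimal sketch of what a node in that situation might do (this is
my illustration, not something the draft specifies): use creation time 0
whenever the clock is known to be wrong and rely on a counter that never
resets while the process runs. The names make_timestamp_source and clock
are hypothetical, and clock() returning None to mean "I know I don't
know the time" is an assumption.

```python
import itertools
from typing import Callable, Optional, Tuple


def make_timestamp_source(
    clock: Callable[[], Optional[int]],
) -> Callable[[], Tuple[int, int]]:
    """Return a function yielding (creation_time, sequence) pairs.

    clock() is assumed to return seconds as an int, or None when the
    node knows its wall-clock time is not trustworthy.
    """
    counter = itertools.count()  # never reset during this process lifetime

    def next_timestamp() -> Tuple[int, int]:
        now = clock()
        creation_time = 0 if now is None else now
        # The counter alone makes each pair unique within this process,
        # whether or not the clock is usable.
        return (creation_time, next(counter))

    return next_timestamp
```

Note what this does not give you: after a restart with no persisted
counter and still no clock, (0, 0) will be emitted again. That is the
sort of case where an absolute "never re-use" rule looks unrealistic.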

(11) 4.3.1: Is the SHALL here right? I would have thought a
SHOULD is better to allow for legacy interop with 5050 via
gateways, in which case there may be no node ID. That might be
better off handled in some generic fashion though, and not
piecemeal with each mention of node ID.

(12) 4.3.2: I don't believe it is correct to drop a bundle due
to the lack of a previous node block, which is what seems to be
implied here. Not all routing schemes need this and so it ought
not be a MUST. Maybe a SHOULD is enough, but even if you say
"MUST insert this" then I would like to argue that "receivers
can decide to not care" be stated explicitly e.g.  by saying
that bundles MUST NOT be rejected solely due to the lack of
this EB.

(13) I'm not sure if you have all the right "watch out for the
null EID node ID" text needed. (I didn't go back over
everything, but it'd seem wrong e.g. in a current custodian
AR.)

(14) 5.4: "The bundle protocol agent MUST determine which
node(s) to forward the bundle to." That's ungrammatical and
close to BS - what if I want to multicast or broadcast the
bundle or use some other opportunistic CLA? Or a sneakernet
where nobody knows who'll be next hop.

(15) 5.4: The text about the flow label should be deleted as it
says nothing. If including this, then the flow label spec may
need to be a normative ref (arguably).

(16) 5.4 - I think this is badly misleading. There will be many
cases where a bundle cannot be forwarded now but may be
forwarded later. Am I wrong in reading this section as
precluding that?

(17) 5.6, step 4 says one MUST handle "custody transfer
redundancy" but that term seems undefined.

(18) 5.6 (step 5) points back to 5.3 which points to 5.7 or
5.4. I don't think such GOTOs are a good idea really.  I
suggest removing lots of this and adding in some informative
(i.e. non-normative) pseudo code (or real code) as an appendix.

(19) I think some rules related to custody and fragmentation
may be missing. For example, if bundle A is multicast and
reaches two nodes on different paths who take custody
(custodians C1/C2) and who both fragment but differently (into
F11/F12 and F21/F22 resp) with eventually a custody ack for F21
reaching C1.  Assume F21 is longer than F11, what is C1 to do
with F11 when a custody timer expires?  Ought it re-transmit or
consider that the custody ack for F21 matched F11 sufficiently
well?  I'm not sure what'd be right here, if such cases can
happen.  I'd be fine with the spec admitting that some such
corner cases exist, or maybe it's easy enough to figure out,
not sure.
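
One way to phrase the question is as byte-range coverage. The sketch
below is only an illustration under assumptions: it treats a fragment as
an (offset, length) range of the original payload, and it assumes a
custody ack for one fragment can be matched against a differently-cut
fragment of the same bundle at all, which is exactly what the spec would
need to say. Fragment, covered_by and uncovered_ranges are hypothetical
names.

```python
from typing import List, NamedTuple, Tuple


class Fragment(NamedTuple):
    offset: int  # start of this fragment within the original payload
    length: int  # number of payload bytes carried

    @property
    def end(self) -> int:
        return self.offset + self.length


def covered_by(held: Fragment, acked: Fragment) -> bool:
    """True if every byte of `held` lies inside the acked fragment."""
    return acked.offset <= held.offset and held.end <= acked.end


def uncovered_ranges(held: Fragment,
                     acked: List[Fragment]) -> List[Tuple[int, int]]:
    """Half-open byte ranges of `held` not covered by any acked fragment."""
    gaps = []
    cursor = held.offset
    for frag in sorted(acked, key=lambda f: f.offset):
        if frag.end <= cursor:
            continue            # entirely before what we still need
        if frag.offset >= held.end:
            break               # entirely after the held fragment
        if frag.offset > cursor:
            gaps.append((cursor, frag.offset))
        cursor = max(cursor, frag.end)
        if cursor >= held.end:
            break
    if cursor < held.end:
        gaps.append((cursor, held.end))
    return gaps
```

Taking a 1000-byte payload with F11 = Fragment(0, 400), F12 =
Fragment(400, 600) and F21 = Fragment(0, 600): an ack for F21 fully
covers F11, but leaves bytes 600 to 1000 of F12 outstanding, so C1 could
in principle stop re-transmitting F11 and would still owe part of F12.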

(20) 5.10.1, I've always wondered why custody timer expiry is
covered here, and not really considered a part of DTN routing.
It seems to me to make more sense to couple custody timing and
routing. If that resonated with folks, I think the change would
be to make the timer-related text here into an illustrative
example and to say that such things are better considered
together with routing and/or by chunks of code that are
somewhat more topology/disruption-aware of the situation in the
particular DTN.

(21) section 9 seems woefully incomplete - why is it ok to say
"will be required" at WGLC? Surely the WG should at least have
discussed the set of registries needed and the registration
rules for those? E.g. do we still want CCSDS to be able to add
entries to some of those registries as we did with 5050?  And
has the WG considered how do all the things in this draft
relate to the set of IANA registries related to the BP? [1,2]
(In the case of [2], section 4.1.5.1 really probably does need
to say something.)

  [1] https://www.iana.org/assignments/bundle/bundle.xhtml
  [2] https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml#uri-schemes-1

Seemingly more minor or nitty things
------------------------------------

- abstract: "This Internet Draft" is no longer appropriate
language.

- abstract: I think this ought capture the fact that this
version is not interoperable with 5050. That's not a bad thing,
but worth noting here.

- intro: I don't think the "sales" language is needed or
appropriate in the first couple of paras. It should be entirely
ok to say "we've learned stuff and fixed stuff."

- intro: "Custodial forwarding" is too terse at this point but
also hard to explain briefly, is really a mechanism and not a
capability, and maybe not such a highlight, so I'd delete that
bullet.

- intro: "[TUT]" is quite outdated and using dtnrg.org for the
reference isn't wise (particularly at the moment when we're
seeing SERVFAIL from the relevant NS;-)

- intro: this is a bit self-serving, but maybe a reference to
the architectural retrospective [3] that Kevin and I wrote
might be useful here, though I've not checked if it touches on
enough of the issues behind the differences between 5050 and
this.

   [3] https://ieeexplore.ieee.org/document/4530739

- Figure 1: I wonder if it'd be worth pointing out that the BP
does not have to run over a layer 4 that runs over a layer 3
etc. The figure and this text do give the impression that a
"proper" transport is needed, which isn't the case.
(Tactically, I'm not sure if the text as-is, or something more
correct, would make getting a new RFC easier or harder - I
guess it'll depend on the reader;-)

- Figure 2: Few if any of the applications I've used with the
BP had an administrative element. That's maybe down to the
experimental nature of the work we've done but I don't think
it's correct to imply that all applications using the BP need
to be able to handle admin records, if that's what you're
implying. (I'm not sure.) I'd say indicating that that's an
optional thing would be right.

- 3.1, singleton: not sure if it's clear enough that all
endpoints are sets, so this may puzzle folks. Maybe add e.g.
"remember that endpoints are sets," not sure.

- 3.1, forwarding: the text is odd - "sustained effort" is not
mandatory, and which node is meant by "that node" here?

- 4: The first two SHALL statements are odd in that there's no
way in which one could implement this spec and not conform to
those I think. In cases like that it's fine to avoid 2119
language. Not a big deal though, as the current IESG don't get
anal about that, though some ADs in the past have done;-)

- 4: last item MUST be break stop code. Is a decoder supposed
to barf a bundle if this is not true? More generally, same
question applies for all MUSTs stated only in terms of what the
encoding must match.

- 4.1.1: why >1 CRC type? That seems bogus. None or strong
seems better to me. (And I'd go for a crypto hash for strong.)
I assume the WG discussed this and found that there are real
use-cases for each of those specified. While those don't need
to be in the spec, can someone tell me what they are as I'm not
at all sure, e.g. why a 16 bit CRC is useful as an option.

- 4.1.3: "enables anonymous bundle transmission" - that's
overstated, chances are that something in the CLA will be
identifying, or allow re-identification, so I think what you
want to say is that omitting the source EID helps with, but
does not ensure, nymity.

- 4.1.5.1: RFC3986 is the correct reference here, so the spec
text is correct as-is. It may however be worth taking a look at
the whatwg web page that has sometimes claimed to supersede
3986 for the browser-related things in which whatwg have an
interest.  That's just in case there're some useful error
handling considerations on the whatwg web page, (on the day you
look at it;-). It's also the case that since BP EIDs are URIs,
it's possible that strings that comply with today's or
yesterday's whatwg web page may end up in the BP, so it'd be
good to know if any of those (that are not valid according to
3986) might cause a problem with the CBOR encoding.

- 4.1.5.2: Danger, metaphysics! "Every node MUST be a member of
at least one singleton endpoint." This entire section is
over-thought.  I think all you need to say is that nodes that
emit bundles need to have an EID they can use as a source EID
for as long as necessary.

- 4.2.1: this entire section is duplicative. That's a bad idea.

- 4.2.2: 2nd para is badly written - that'd encourage coders to
use the values 8,9,10 and 11 in ways that might be unwise.

- 4.2.2: wrt "anonymous" see earlier comment

- 4.2.2: description of creation time is duplicative, except
the earlier text didn't cover relative time.

- 4.3.3: Is "Bundle Age Block" a good name? BAB used to mean
another type of block, so that could confuse maybe. (That said,
I forget how long we've had this name.)

- 4.3.4: Do you need to say that the hop limit MUST NOT be
changed, once a hop count EB is added? Also, can any node add
one of these, if one was not previously present?

- 5: It's not necessary to say that new RFCs can supersede
this.  That's just standard IETF process.

- 5.2: mentions "dispatch pending" as if I should know what
that is - is all the retention constraint stuff sufficiently
explained I wonder? (Personally I don't think you need to
mandate all this stuff and you cannot tell if an implementation
has done it or not so I'd not bother trying to be so
prescriptive.)

- 5.4: "at the last possible moment... MUST..." that's a bit
silly as it seems to require BP code inside a NIC which is not
how this'll usually be implemented.

- 5.5: I'm not convinced that the MUSTs here are right for all
DTNs. I reckon that 5.5 could just as well say "MAY delete" and
the BP would be fine. That might also provide some additional
flexibility for some routing schemes. That said, I won't press
on this - if this doesn't resonate with folks now, and later
turns out to be useful, I don't think we'd have such a hard
time modifying BPAs where needed.

- 5.6: Again, this is overly prescriptive.

- 5.6, step 4: I wonder if an implementer will get all this
right.

- 5.9: Badly implemented, re-assembly can create a memory
consumption DoS vector, perhaps esp. if attempted on a
non-destination node. It'd be better to warn about that. And
maybe change from MAY for in-path reassembly to SHOULD NOT.
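
For what such a warning might translate to in code, here is a minimal
sketch of a reassembly store with a hard byte budget. It is an
assumption-laden illustration and not anything from the draft:
ReassemblyBuffer, max_bytes and the "refuse the fragment when over
budget" policy are all hypothetical choices.

```python
from typing import Dict, Hashable


class ReassemblyBuffer:
    """Holds fragments awaiting reassembly, capped at max_bytes in total."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used = 0
        # bundle id -> {fragment offset: payload bytes}
        self._pending: Dict[Hashable, Dict[int, bytes]] = {}

    def add(self, bundle_id: Hashable, offset: int, data: bytes) -> bool:
        """Store a fragment; return False if that would exceed the budget."""
        frags = self._pending.get(bundle_id, {})
        if offset in frags:
            return True  # simplification: same-offset duplicates are ignored
        if self.used + len(data) > self.max_bytes:
            return False  # caller forwards the fragment as-is or drops it
        self._pending.setdefault(bundle_id, {})[offset] = data
        self.used += len(data)
        return True

    def drop(self, bundle_id: Hashable) -> None:
        """Release everything held for one bundle (e.g. on expiry)."""
        frags = self._pending.pop(bundle_id, {})
        self.used -= sum(len(d) for d in frags.values())
```

A non-destination node using something like this can simply forward a
fragment it has no room for, which is part of why SHOULD NOT seems a
safer default for in-path reassembly.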

- 5.11: does this mean that a custodian MUST ignore a custody
signal destined for some other custodian?

- Figure 6: I don't get when reason codes 5 to 8 would really
be used. Are they in fact needed?  (They seem a bit
implementation specific to me, but I've not gone looking.)

- section 8: First sentence is bogus.

- section 8: [SECO] isn't a good reference. It's outdated and I
doubt will be picked up.