Re: [dtn-interest] comments on draft-irtf-dtnrg-bundle-checksum-01

Lloyd Wood <L.Wood@surrey.ac.uk> Wed, 05 March 2008 23:49 UTC

Message-Id: <200803052353.m25NrFT27501@cisco.com>
X-Mailer: QUALCOMM Windows Eudora Version 7.1.0.9
Date: Wed, 05 Mar 2008 23:53:12 +0000
To: Delay Tolerant Networking Interest List <dtn-interest@mailman.dtnrg.org>, "Symington, Susan F." <susan@mitre.org>
From: Lloyd Wood <L.Wood@surrey.ac.uk>
In-Reply-To: <8E507634779E22488719233DB3DF9FF0021E3B10@IMCSRV4.MITRE.ORG >
References: <47C8495C.4060607@cs.tcd.ie> <200802291919.m1TJJIh22133@cisco.com> <47C94AB8.9040603@cs.tcd.ie> <200803011631.m21GVtK10491@cisco.com> <47C99F5E.9040606@cs.tcd.ie> <200803012152.m21LqAK18405@cisco.com> <8E507634779E22488719233DB3DF9FF0021E3B10@IMCSRV4.MITRE.ORG>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Authentication-Results: ams-dkim-2; header.From=L.Wood@surrey.ac.uk; dkim=neutral
Subject: Re: [dtn-interest] comments on draft-irtf-dtnrg-bundle-checksum-01

Hi Susan,

First of all, thanks for taking the time to review and comment on our draft.

We'll deal with the first part here, and with mutable canonicalization later
in a separate email. Inline below:

At Wednesday 05/03/2008 13:01 -0500, Symington, Susan F. wrote:
>Wesley, Lloyd, and Will,
>
>I've read through your bundle_checksum_01 draft.
>
>I agree that it would be nice to have a mechanism that enables every
>node in the network to verify whether a bundle that it receives for
>forwarding or delivery has had any errors introduced into it. As you
>point out, there is no point in wasting network resources forwarding a
>bundle across the network if it will only be discarded at its
>destination because it fails an integrity check. Furthermore, there is
>no point in taking custody of a bundle for storing and later
>forwarding if it will only be discarded at its destination because it
>fails an integrity check. This is basically the reasoning behind using
>the Bundle Authentication Block (BAB). The idea is to identify and
>discard erroneous bundles at the first node at which they are
>received.

Yes - the weakness in relying on the BAB is that the BAB can only detect
in-transit errors over a single bundle hop.

Corruption in a node followed by an okay transfer won't be picked up
by any BAB or by any convergence layer checksum or custody transfer
subsequently, and the corrupted bundle will be relayed to its
destination, consuming network resources, before errors are detected
(using end-to-end security and the PIB if encryption is in use).

From this point of view, the BAB is equivalent to (though at a slightly higher
layer than) a convergence layer checksum (if the convergence layer has one).
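
A minimal sketch of this failure mode, in Python. The assumptions here are
ours and not from the draft: SHA-256 stands in for whatever check a BAB or
convergence layer would use, and `relay` and its `corrupt_at` parameter are
hypothetical names used only for illustration.

```python
import hashlib
from typing import List, Optional, Tuple

def digest(data: bytes) -> bytes:
    # SHA-256 is a stand-in for any hop-by-hop or end-to-end check.
    return hashlib.sha256(data).digest()

def relay(payload: bytes, hops: int,
          corrupt_at: Optional[int] = None) -> Tuple[bytes, List[bool]]:
    """Relay payload over `hops` hops; return (payload, per-hop check results).

    Each sender computes a fresh per-hop check over whatever it currently
    holds, as a BAB would. corrupt_at names the node whose storage flips a
    bit before that node forwards the bundle.
    """
    hop_checks_ok = []
    for node in range(hops):
        if node == corrupt_at:
            # In-node corruption: happens before the per-hop check is computed.
            payload = bytes([payload[0] ^ 0x01]) + payload[1:]
        hop_check = digest(payload)       # computed by the sending node
        received = payload                # the transfer itself is clean
        hop_checks_ok.append(digest(received) == hop_check)
    return payload, hop_checks_ok

original = b"bundle payload"
e2e_check = digest(original)              # computed once, at the source
delivered, hop_results = relay(original, hops=4, corrupt_at=2)
print(all(hop_results))                   # every per-hop check passes
print(digest(delivered) == e2e_check)     # the end-to-end check does not
```

Every per-hop check passes because each one is computed over the
already-corrupted copy; only the check computed at the source notices.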


>You assert that computing checksums on a hop-by-hop basis would be
>inadequate because this would allow errors that are introduced within
>a bundle node's memory, storage, or forwarding system to go unnoticed.
>My first thought is that I wonder how much of a concern this really
>is. I don't know; I'm just wondering.

Our take is that for large bundles, which undergo a lot of segmentation and
reassembly, the risk of introduced error, corruption or overwriting is far higher
than for single small datagrams, which don't have as many computations done
on or around them. For nodes in challenging environments (radiation, lack of
error-correcting memory, holding onto bundles long-term, etc.) the storage
risk is also higher than for Internet routers trying to hold onto packets for
as little time as possible.

That would make Jonathan Stone's 'Traces of Internet packets for the past
two years show that between 1 packet in 1,100 and 1 packet in 32,000
fails the TCP checksum, even on links where link-level CRCs should catch
all but 1 in 4 billion errors' [When the CRC and TCP Checksum Disagree,
ACM SIGCOMM 2000] a best-case scenario.

Here, we can think of the link-level CRC for a single datagram frame
as the equivalent of a per-link BAB check for a single, larger,
reassembled bundle with a higher probability of introduced error.
Despite the link-layer checks, the number of packets failing the
end-to-end check is significant.

We can extrapolate from this to predict that, despite the BAB checks,
and due to the reasons outlined above, the number of bundles failing
the end-to-end PIB check will also be proportionally significant.
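
As a rough illustration of that extrapolation, here is a sketch in Python.
It assumes (our simplification, not a measured result) that each
packet-sized piece of a bundle independently slips an error past the
per-hop checks with probability p, so that a bundle of n pieces contains at
least one such error with probability 1 - (1 - p)^n. The values of p are the
two ends of Stone's range; the piece counts are hypothetical.

```python
def bundle_error_probability(p_piece: float, n_pieces: int) -> float:
    """P(at least one of n independent pieces carries an undetected error)."""
    return 1.0 - (1.0 - p_piece) ** n_pieces

# Stone's measured range for packets failing the TCP checksum.
for p in (1 / 32_000, 1 / 1_100):
    for n in (1, 100, 10_000):          # hypothetical pieces per bundle
        prob = bundle_error_probability(p, n)
        print(f"p={p:.2e}  n={n:>6}  P(bundle errored)={prob:.4f}")
```

Under the independence assumption the per-bundle probability only grows
with bundle size, which is why the single-packet figures read as a best case.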


>Defining a hop-by-hop "checksum"
>(I'm using this term as shorthand for any unkeyed or NULL-keyed
>ciphersuite) for use in the BAB may not be ideal, but it seems a lot
>less fraught with complexities than defining an end-to-end checksum
>that has the property that it can be verified at any intermediate node.
>An end-to-end checksum could be used in conjunction with a BAB
>checksum to ensure that errors that are missed by use of the BAB
>checksum are found at the destination.

However, you have to expend network resources to get to the
destination. And, if you're sending an encrypted bundle through nodes
lacking the key, you MUST expend resources to get to the destination,
because only at the destination can that encrypted bundle be decrypted
and checked. Errors can't be detected early to lead to early discards.

With the unencrypted ciphersuites our draft defines, checking a complete
bundle at an intermediate hop at least becomes possible - no key required.
This leads to the odd situation where the resend loop and network utilization
for unencrypted complete bundles is smaller and faster than for encrypted
complete bundles, as errors can be picked up sooner. It's more
DTN-friendly in that it increases DTN performance.

Performance considerations would then encourage using the insecure
(unkeyed or NULL-keyed) ciphersuites, with applications doing their own
private security outside the Bundle Protocol, to get bundle checks at
intermediate nodes and an overall performance increase.

(And utilization and efficiency concerns would encourage intermediate
nodes to favour forwarding unencrypted traffic they can check and know
is good over PIB-encrypted traffic that they can't and know nothing about.
We're presuming that transfer of reliable data is the goal here; it will
not be for some applications, but it is for the majority.)
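
The difference in control loop length can be sketched as follows, in
Python. The linear path, the `can_verify` list and the function name are
hypothetical, chosen only to illustrate the argument above.

```python
from typing import Sequence

def hops_before_detection(can_verify: Sequence[bool], corrupted_at: int) -> int:
    """Count the hops an errored bundle travels before a node can detect it.

    can_verify[i] says whether node i can check the end-to-end checksum;
    the last node is the destination. corrupted_at is the index of the node
    in which the error was introduced (after that node's own check).
    """
    for node in range(corrupted_at + 1, len(can_verify)):
        if can_verify[node]:
            return node - corrupted_at
    raise ValueError("no node on the path can verify the bundle")

path_len = 6
# Unkeyed checksum: every node holding the complete bundle can verify it.
clear = [True] * path_len
# Encrypted payload, intermediate nodes lack the key: only the destination can.
encrypted = [False] * (path_len - 1) + [True]

print(hops_before_detection(clear, corrupted_at=1))      # caught at the next node
print(hops_before_detection(encrypted, corrupted_at=1))  # carried to the destination
```

In a disconnected DTN each of those extra hops may cost a contact
opportunity, not milliseconds.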


>The use of both a BAB checksum
>and a PIB (end-to-end) checksum in combination would allow all errors
>that were not introduced in the nodes themselves to be discovered
>immediately, and all other errors to be discovered at the destination.

It's the (eventually) 'at the destination' which consumes network resources
and slows things up, but this seems to be more of a problem for encrypted
ciphersuites than for the new insecure ciphersuites.

Also, if there's a custody transfer along the way after an error has been
introduced, there may no longer be a viable pristine copy of the bundle
available to be resent; with custody transfer the problem has
been handed off. So, once the corrupted bundle reaches the end node,
there's no recourse. (This can be ameliorated somewhat by sending back
a checksum in the custody transfer receipt and not discarding the bundle
until you've verified the checksum, or better but far more resource-
consuming, having the source node hang on to its original copy until it gets
some sort of ack from the destination. That's a long control loop, though.)
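
The first of those mitigations, a checksum carried in the custody transfer
receipt, might look roughly like this in Python. This is a sketch under
our own assumptions: the `Custodian` class, its method names and the use
of SHA-256 are hypothetical and are not defined by the Bundle Protocol or
by our draft.

```python
import hashlib
from typing import Dict

class Custodian:
    """Holds bundles until the next custodian's receipt checksum matches."""

    def __init__(self) -> None:
        self.held: Dict[str, bytes] = {}

    def send(self, bundle_id: str, payload: bytes) -> None:
        # Keep our copy; a receipt arriving is not enough to discard it.
        self.held[bundle_id] = payload

    def on_custody_receipt(self, bundle_id: str, receipt_checksum: str) -> bool:
        """Release the held copy only if the receipt's checksum matches it.

        Returns True if custody was released, False if a resend is needed.
        """
        ours = hashlib.sha256(self.held[bundle_id]).hexdigest()
        if receipt_checksum == ours:
            del self.held[bundle_id]
            return True
        return False   # receiver stored a different bundle; keep ours, resend

def receipt_for(received_payload: bytes) -> str:
    # The receiver reports a checksum of what it actually stored.
    return hashlib.sha256(received_payload).hexdigest()

node = Custodian()
node.send("b1", b"science data")
print(node.on_custody_receipt("b1", receipt_for(b"science dat\x00")))  # mismatch
print(node.on_custody_receipt("b1", receipt_for(b"science data")))     # match
```

This only covers corruption up to the moment the receipt is generated;
errors introduced later in the receiver's storage still need an
end-to-end check.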


>So, performing checksumming in the BAB may be something to reconsider.
>(It also has the added benefit of protecting all the mutable fields in
>the bundle.)

...during transfers only, alas - not while sitting in memory. It's better
than nothing, but as Stone suggests, it won't be sufficient by itself. And it
doesn't give much over a convergence-layer checksum.


>If you have already considered this carefully, or if this
>was already discussed and dismissed at the last DTNRG meeting, which I
>missed, my apologies.

We think our basic arguments about the BAB not addressing the core
reliability and performance issues are outlined above.

>You want to use the Payload Integrity Block (PIB) to compute checksums
>at the source and verify them at the destination because this is the
>only way to ensure integrity along the entire path. While verifying at
>the destination may be relatively easy, computing checksums at the
>source and being able to verify them at any node along the way to the
>destination is much harder, as you have discovered.

The main reason it's harder is that security prevents it! Place encryption
within an outer end-to-end reliability wrapper, and this becomes less hard,
as the reliability wrapper can be checked at each node (provided that
fragmentation does not occur.)

It seems that the easiest way to get end-to-end security within an
end-to-end reliability wrapper will be for applications to use the
insecure ciphersuites introduced in this draft for reliability and
checking within the network (when not fragmented), and to do their own
security outside the Bundle Protocol. The wrapper can then be checked
wherever a complete bundle is present (which should include all custody
transfer destinations), for faster detection of introduced errors. (This
is a solution C to add to the A and B outlined previously in the thread
below.)
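
A minimal sketch of such a wrapper, in Python. The assumptions are ours:
SHA-256 stands in for one of the unkeyed ciphersuites, the `wrap` and
`check` names and the digest-then-payload layout are hypothetical, and
the application's own encryption is represented by placeholder bytes
because the network only ever sees it as opaque data.

```python
import hashlib

DIGEST_LEN = hashlib.sha256().digest_size   # 32 bytes

def wrap(ciphertext: bytes) -> bytes:
    """Prepend an unkeyed digest computed over the already-encrypted payload."""
    return hashlib.sha256(ciphertext).digest() + ciphertext

def check(wrapped: bytes) -> bool:
    """Verify the outer wrapper; needs no key, so any node can call this."""
    outer, ciphertext = wrapped[:DIGEST_LEN], wrapped[DIGEST_LEN:]
    return hashlib.sha256(ciphertext).digest() == outer

# The application encrypts outside the Bundle Protocol; to the network the
# result is just opaque bytes (a placeholder value here, not real ciphertext).
opaque = bytes.fromhex("8f3a11c09d5e")
bundle = wrap(opaque)
print(check(bundle))                         # intact bundle verifies

damaged = bundle[:-1] + bytes([bundle[-1] ^ 0xFF])
print(check(damaged))                        # errored bundle is rejected
```

An unkeyed digest like this detects accidental errors only; anyone able to
alter the payload can recompute it, so it is a reliability check and not
a security one.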


>The main question that stands out for me as I read your draft is how it
>will interact with fragmentation, and I don't think the answer is
>favorable. Suppose a bundle is generated by an application and custody
>transfer is requested. According to your draft, the bundle must have a
>PIB checksum attached to it. Then suppose at some later node the bundle
>is fragmented into two parts. These two parts will themselves each be
>a bundle, one having a payload block with the first n bytes of the
>original payload and the other having a payload block with the last m
>bytes of the original payload. The first fragment will have the PIB
>checksum in it. When these fragments are forwarded to separate
>next-hop nodes, the node that receives the first fragment will not be
>able to verify the PIB checksum in it because the payload in that
>fragment is not the payload over which the PIB checksum was calculated.
>If this node tries to perform the PIB checksum verification, it will
>end up discarding the bundle. The PIB checksum can only be verified at
>a node that has all fragments and is able to reassemble them so as to
>recover the entire original payload.

Yes, fragmentation is thorny; that hasn't changed.

There's probably a case to be made for calculating per-fragment checksums.
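
To show what per-fragment checksums would and would not buy, here is a
sketch in Python. It is not something the draft specifies; the `Fragment`
layout, the function names and the use of SHA-256 are assumptions made
for illustration.

```python
import hashlib
from typing import List, Tuple

Fragment = Tuple[int, bytes, str]   # (offset, data, checksum of data)

def fragment(payload: bytes, size: int) -> List[Fragment]:
    """Split payload, attaching a checksum computed over each fragment."""
    return [
        (off, payload[off:off + size],
         hashlib.sha256(payload[off:off + size]).hexdigest())
        for off in range(0, len(payload), size)
    ]

def fragment_ok(frag: Fragment) -> bool:
    """Check one fragment in isolation, without the rest of the payload."""
    _, data, checksum = frag
    return hashlib.sha256(data).hexdigest() == checksum

payload = b"0123456789abcdef"
whole_checksum = hashlib.sha256(payload).hexdigest()
frags = fragment(payload, size=10)

# The whole-payload checksum cannot be verified against a lone fragment...
print(hashlib.sha256(frags[0][1]).hexdigest() == whole_checksum)
# ...but each fragment's own checksum can be checked where it is received.
print(all(fragment_ok(f) for f in frags))
```

A per-fragment checksum is computed by the fragmenting node over whatever
that node holds, so it cannot catch errors introduced before
fragmentation; the original whole-payload checksum is still needed once
the bundle has been reassembled.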

[mutable canonicalization]

We'll deal with mutable canonicalization in a later email.

>Thank you for acknowledging me in your draft. I hope my comments
>continue to be helpful.

They are. Thank you.

L. (for Will/Wes)


>>-----Original Message-----
>>From: dtn-interest-bounces@mailman.dtnrg.org
>>[mailto:dtn-interest-bounces@mailman.dtnrg.org] On Behalf Of Lloyd Wood
>>Sent: Saturday, March 01, 2008 4:52 PM
>>To: Delay Tolerant Networking Interest List
>>Subject: Re: [dtn-interest] Draft DTNRG Philly Agenda...
>>
>>At Saturday 01/03/2008 18:24 +0000, Stephen Farrell wrote:
>>>Lloyd Wood wrote:
>>>> At Saturday 01/03/2008 12:23 +0000, Stephen Farrell wrote:
>>>>> I was assuming we'd just have the chat without PPT since we
>>>>> already had a presentation on this before - has it changed
>>>>> significantly?
>>>> 
>>>> http://tools.ietf.org/id/draft-irtf-dtnrg-bundle-checksum-01.txt
>>>> 
>>>> Yes, this draft has changed significantly. Good you asked, 
>>otherwise the chairs and others might not be aware of this and 
>>might not think to read this working group 
>>>
>>>This is not an IETF working group. Rather basic differences 
>>exist between IRTF RGs and IETF WGs.
>>
>>Sorry, slip of the tongue. The -irtf- in the title indicates
>>it's been adopted as a *research* group... draft, which is
>>what I meant to indicate.
>>
>>You asked if this draft has changed. It has. My announcement 
>>of the draft in:
>>http://maillists.intel-research.net/pipermail/dtn-interest/2008
>>-February/003020.html
>>did not call out those changes. As I said below, my fault. I 
>>didn't attempt to draw any attention to its content to attract 
>>interest. So, here's a summary of the important bits.
>>
>>Even using the 'rework existing ciphersuites to add new 
>>ciphersuites for reliability only' method that we've now 
>>carefully explored in a couple of different ways before writing:
>>http://tools.ietf.org/id/draft-irtf-dtnrg-bundle-checksum-01.txt
>>we've found a couple of problems that this approach cannot 
>>solve, which the draft talks about:
>>
>>1. We've identified a problem with the concept of Custody 
>>Transfer, in that unless the (nth) recipient along can be sure 
>>it has the bundle unaltered and unerrored, its custody receipt 
>>back to the n-1th node really isn't worth much. As it stands 
>>in the existing Bundle Protocol, Custody Transfer ignores 
>>errors creeping in, because it can't detect them. We think 
>>Custody Transfer needs to have protection against errors to be 
>>at all meaningful; a node can't detect errors with encrypted 
>>bundles that it is relaying but doesn't have the keys for.
>>
>>2. We've identified cases where resends across the network are 
>>possible much faster for reliability-only clear-ciphersuite 
>>bundles than secured encrypted bundles, because a node along 
>>the path can perform an end-to-end reliability check on the
>>clear bundle using the checksum, detect errors introduced 
>>earlier, and get a resend of the bundle earlier with a tighter 
>>and faster control loop. With encrypted payloads where 
>>intermediate nodes don't have the keys, this check of the 
>>payload is only possible at a node with the key, leading to a 
>>longer control loop as the now-errored payload still has to be 
>>forwarded to its security destination before the error can be 
>>picked up. In disconnected DTNs, lengthening this loop is far 
>>more significant than it is in the terrestrial internet where 
>>added hops only add milliseconds (say, comparing the IPv4 
>>at-every-node and IPv6-at-destination-only header checks). Use 
>>of secured bundles can lead to a network performance problem
>>that discourages use of security or of third-party untrusted-node
>>relaying of secured bundles in DTNs.
>>
>>So, in the approach in the draft, introducing reliability 
>>without security to the Bundle Protocol for reliable, 
>>unencrypted, payloads, brings benefits that security alone in 
>>the BP can't have for encrypted payloads (meaningful Custody 
>>Transfer, shortened control loops).
>>
>>We have identified two approaches to avoiding these two problems:
>>
>>A. End-to-end bundle security within an outer end-to-end 
>>reliability wrapper for intermediate nodes to use to check
>>bundles and detect errors before forwarding is one way around 
>>this performance problem, and it also makes for meaningful 
>>Custody Transfer. This would be a major change in how the 
>>bundle protocol is structured (analogous to the push/pop stuff 
>>discussed with Peter on this list a while back, pushing an 
>>outer e2e reliability checksum around an encrypted-e2e bundle 
>>payload.) We haven't attempted to look at an implementation 
>>that would do this in more detail.
>>
>>B. The other way around these two issues is to only ever 
>>forward through trusted nodes which have all the keys, so that 
>>those nodes can check the content for introduced errors (or 
>>tampering) and where Custody Transfer becomes meaningful again 
>>as bundles can be checked for errors before being relayed on, 
>>with early resends requested in tighter control loops if 
>>errors are detected.
>>
>>
>>>And the sarcasm is neither welcome nor effective. It just makes
>>>you look silly and IMO makes it less likely your ideas will gain
>>>ready (or any) acceptance.
>>
>>The ideas should stand on their own merit - once they're 
>>clearly presented and understood by the group. Our written 
>>conclusions in this draft form the starting point for any 
>>discussion at the meeting, and we can summarise them neatly 
>>and concisely in pictures in five minutes before the 
>>discussion. Believe me, the stuff you've just waded through 
>>reading above, which took us months to be able to even 
>>articulate, is better in pictures...
>>
>>L.
>>
>>>Stephen.
>>>
>>>> draft before the meeting. (My fault for assuming you'd read 
>>the draft. We had to read your latest security drafts to write 
>>it!) There's an important control loop/network performance 
>>argument that needs to be presented and discussed.
>>>
>>>> 
>>>> I include a copy of your latest agenda below, along with 
>>drafts where I know them, which are always worth including in 
>>the agendas. Reading the drafts before the talks increases 
>>understanding immensely. A discussion without a talk on the 
>>issues first, by those who have actually studied the issues 
>>closely enough to write them down coherently, to attempt to 
>>bring those who haven't up to speed and set the topics for 
>>discussion is really not worth having. Well-thought-out 
>>written arguments are the starting point for discussion here.
>>>> 
>>>> (The previous presentation in Chicago was on the very different
>>>>  http://tools.ietf.org/html/draft-eddy-dtnrg-checksum-00
>>>> different name, different approach, completely rewritten since.)
>>>> 
>>>> L.
>>>> 
>>>> DTNRG IETF-71 Meeting
>>>> 
>>>> Last changed on 20080301
>>>> 
>>>> Our meeting slot is THURSDAY 0900-1130, March 13, 2008, in 
>>the Franklin 5 room.
>>>> 
>>>> We're planning to do some BP/LTP interop starting on 
>>Thursday after the meeting slot and continuing into Friday morning.
>>>> Agenda
>>>> Time    Topic   Who (link to slides)
>>>> 0900-0905       Welcome & agenda bash   Chairs
>>>> 0905-0930       DTNRG document status/news/interop plans    
>>    Chairs & others
>>>> 0940-1000       Security/Integrity chat         Howard Weiss (tbc)
>>>>                       
>>http://tools.ietf.org/html/draft-irtf-dtnrg-bundle-checksum-01 
>>needs presenting first
>>>> 1000-1010       DTN reference code      Mike Demmer
>>>> 1010-1020       Future reference implementation(s)      Chris Small
>>>> 1020-1030       N4C project     Elwyn Davies/Avri Doria
>>>>                       (related to 
>>http://tools.ietf.org/html/draft-irtf-dtnrg-prophet-00 I'm guessing.)
>>>> 1030-1040       UK-DMC  Will Ivancic
>>>>                       (results from implementing 
>>http://tools.ietf.org/html/draft-wood-dtnrg-saratoga-03 from orbit)
>>>> 1040-1050       Intentional Naming      Pritwish Basu
>>>> 1050-1100       Using HTTP      Lloyd Wood
>>>>                       
>>http://tools.ietf.org/html/draft-wood-dtrng-http-dtn-delivery-01
>>>> 1100-1110       Telemetry Control Protocol      Mike Petkevich
>>>> 1110-1130       DARPA DTN work  Preston Marshall
>>>> 
>>>>> S.
>>>>>
>>>>> PS: the 20 minutes left over was notional - I'd of course
>>>>> forgotten a couple of slots that'd been requested earlier.
>>>>> Now fixed. [1]
>>>>>
>>>>> [1] http://www.ietf.org/proceedings/08mar/agenda/DTNRG.html

Saratoga: http://www.ee.surrey.ac.uk/Personal/L.Wood/dtn/

<http://www.ee.surrey.ac.uk/Personal/L.Wood/><L.Wood@surrey.ac.uk>