Re: [dtn-interest] New bundle internet draft and paper comparing protocols for reliability errored delivery.

Lloyd Wood <L.Wood@surrey.ac.uk> Sun, 22 July 2007 09:04 UTC

To: Scott Burleigh <Scott.Burleigh@jpl.nasa.gov>
From: Lloyd Wood <L.Wood@surrey.ac.uk>
Subject: Re: [dtn-interest] New bundle internet draft and paper comparing protocols for reliability errored delivery.
Cc: dtn-interest@mailman.dtnrg.org

At Wednesday 18/07/2007 00:48 -0700, Scott Burleigh wrote:
>Lloyd Wood wrote:
>>We have written up a paper examining reliability in protocols, and
>>comparing four different transport protocols that offer delivery of
>>errored data to applications. The insights gained in this work allow
>>us to identify areas of the Licklider and bundle protocols where
>>reliability suffers and more design work is needed. This paper
>>fleshes out the ideas I expressed at the Dublin meet.
>>http://www.ee.surrey.ac.uk/Personal/L.Wood/publications/internet-drafts/draft-wood-dtnrg-saratoga/wood-eddy-checksum-coverage-submitted.pdf
>> Checksum Coverage and Delivery of Errored Content
>>Lloyd Wood, Wesley M. Eddy, Jim McKim and Will Ivancic
>
>Lloyd, thanks for posting this. 

You're welcome.

(Speaking solely for myself here. I really should let the paper stand for itself, but I can't accept your comments, or let them pass without responding.)


> I very much enjoyed the paper’s thorough explication of checksum coverage and your survey of the ways in which that protection is provided in widely used communication protocols.  

Thank you.


>The very interesting and concisely introduced concept of errored content was new to me, as it’s a communication service that I can’t recall any deep space science mission ever asking for.

Errored content is something that we normally strive to avoid, using checksums. Deliberately letting errors pass up the stack is relatively new, and worth examining. (I'm a little puzzled by the growing move to support allowing errors through, though I can see the value for unidirectional broadcast and long-delay links, and also in saving unnecessary overlapping checksum computation throughout the network, which may become more important as MTU sizes rise. It just seems like an edge case, of use to far too few applications and in far too few topologies to be generally worth bothering with, and a bad fit with existing network stacks, which overlap checksum coverage across multiple layers.)


>Something odd happens on page 7, when you start talking about LTP.  The appropriately dispassionate technical tone of the paper up to that point gives way to some surprisingly loaded language: LTP "claims" to permit carriage of reliable and partially reliable payloads,

LTP's "partially reliable" service is not SCTP's "partially reliable" service. SCTP packets are always checksummed with a strong CRC. LTP's packets are not, so the services are not equivalent. We indicate that the term "partially reliable" should not be taken at face value -- untangling the term and differentiating between retransmission persistence and content integrity has taken me quite some time (I had to write the paper as part of that process), and sparing others that confusion seems worthwhile.

The term "reliable" conflates delivery and content, and is best qualified to indicate which is meant. When we say 'reliable', what do we really mean? Used unqualified, the presumption is that both reliable delivery of packets and reliable content in the delivered packets (headers as well as payloads) are provided, in the sense that TCP is 'reliable' - even though TCP's checksum is relatively weak, it's always there. Conversely, if either of these two properties is missing, the protocol must be considered unreliable.
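
(To make that distinction concrete, here is a minimal Python sketch - purely illustrative, not any protocol's actual wire format - of a receiver that is only 'reliable' in the full sense when both properties hold:)

    import zlib

    def receive(pkt, expected_seq):
        # Delivery reliability: the expected packet arrived (sequencing/ARQ).
        delivered_ok = (pkt["seq"] == expected_seq)
        # Content reliability: header and payload are what was sent (checksum).
        content_ok = (zlib.crc32(pkt["header"] + pkt["payload"]) == pkt["crc"])
        # TCP-style "reliable" means both; lose either check and the service is
        # unreliable in one sense or the other.
        return delivered_ok and content_ok

A protocol that provides only the first check, or only the second, gives you just one half of "reliable".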


> "one would hope" that its frames are checksummed, LTP "unlike many of the protocols described here, is clearly not robust or error-rejecting".

The paper defines robustness as the ability to self-check to ensure frame integrity. LTP does not detect or reject errors in its own headers or payloads; that would require checksum protection - the cross-hatched coverage shown in the diagrams - which LTP lacks. Yet LTP needs to do just that detection and rejection, since LTP can be run over a vast variety of link layers, which may not provide adequate levels of protection (in coverage, or in strength of coverage) for LTP, or which may introduce packet corruption in passing their payloads to LTP.


>  The "Also of interest" paragraph on page 8 is especially puzzling, as it’s a discussion of the relative length and complexity of the LTP and Saratoga specifications, far removed from the subject of the paper.   

It's a simple observation that protocol reuse leads to simpler protocol specification, since you can build on what has gone before. As you say, it's a single paragraph - in a ten-page paper.


>A suspicious mind might speculate that this otherwise meticulously researched and scrupulously objective paper is merely a pretext for initiating a little hatchet job on LTP, for reasons that pass understanding.

We analyse protocols for checksums. In doing so, we point out that LTP has no checksum, and we explore the ramifications of that. (The ramifications of not having checksums are probably greater for Stephen's LTP-T.)


>But this is of no consequence, as I think you probably need to remove all mention of LTP from the paper anyway.  It is completely orthogonal to your subject.

Hardly. LTP permits delivery of errored content, just like UDP-Lite, Saratoga and DCCP. However, unlike UDP-Lite, Saratoga over UDP/UDP-Lite, and DCCP, LTP does not limit where the errors can occur, and does not check the integrity of its own headers (or trailers) - that's rather rare in any transport protocol design, and worthy of comment in itself. Examining the failure modes resulting from corruption of individual header/trailer fields would also be interesting.
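
(For contrast, a rough sketch of the UDP-Lite-style coverage limit that LTP lacks. This is simplified - it ignores the pseudo-header and UDP-Lite's rule that a coverage of zero means the whole datagram, and uses CRC-32 as a stand-in for the real Internet checksum - but it shows how errors are confined to the bytes beyond the stated coverage while the headers stay protected:)

    import zlib

    HEADER_LEN = 8  # assumed fixed header length for this sketch

    def accept(datagram, coverage, received_check):
        # Coverage must at least span the header and cannot exceed the datagram.
        if coverage < HEADER_LEN or coverage > len(datagram):
            return False
        # Verify only the covered prefix; an error there means the frame is rejected.
        if zlib.crc32(datagram[:coverage]) != received_check:
            return False
        # Errors beyond 'coverage' are deliberately passed up to the application.
        return True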

LTP and the bundle protocol both fail to follow the simple principle of always checking that what is received is what was sent - that integrity of content is always preserved. (That check can only be achieved as a welcome, unintentional side-effect of their optional security frameworks.) LTP's lack of robustness would matter less if the overlying bundle protocol were always robust and implemented end-to-end integrity checks. A lot of reliance is placed on the LTP and bundle security frameworks to cover the headers and payloads and to address that shortcoming.

As it is, since LTP and the bundle protocol do not check the integrity of their payloads or headers, they push reliability checks down to the link layer (substituting for LTP) and up to the application for an end-to-end check (substituting for the bundle).


> LTP is not, as you state, a transport-layer protocol that permits delivery of errored payloads.  

LTP propagates any errors in received payloads up the protocol stack to the bundle layer, without any checksums to prevent that from happening. That permits delivery of errored payloads.


>There is nothing in the design of LTP that contemplates the delivery of data in which even a single bit is flipped, ever.  

Exactly. And that is the LTP design's greatest failing: the possibility that a received LTP packet may be corrupted in any way, in its payload or in its headers/trailers, is simply not countenanced in the base design. LTP lives in a perfect world - yet the real world isn't perfect.

I suggest that the LTP designers read Stone's papers, which report observations of the occurrence of errors in real systems.


>Note that the last paragraph of section 5 of the LTP specification says "The underlying data-link layer is required to never deliver incompletely received LTP segments to LTP.  In the absence of the use of LTP authentication [LTPEXT], LTP also requires the underlying data-link layer to perform data integrity checking of the segments received.  Specifically, the data-link layer is expected to detect any corrupted segments received and to silently discard them."

Just specifying that does not make it reality - and, more importantly, that expectation is not sufficient. It won't detect bugs in the data-link layer delivering packets to LTP, for example. The link layer may receive a packet just fine, but a software bug could overwrite part of the packet after link-layer checking and before delivery to LTP. LTP (and the bundle layer above it, which doesn't implement a checksum either) may never know, because they don't test to find out -- they just presume all is fine in headers and data. Again, Stone (referenced in the paper) has covered this issue in real systems in some depth.
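
(A toy Python illustration of that failure mode - the 'bug' and the names are contrived, but the point stands: the link CRC was checked before the corruption happened, so only a checksum carried end-to-end with the data can catch it.)

    import zlib

    def send(payload):
        # Sender computes an end-to-end checksum over what it actually sent.
        return payload, zlib.crc32(payload)

    def link_receive(frame):
        # Stand-in for a link layer whose own CRC check has already passed.
        return frame

    def buggy_copy(payload):
        # A driver/stack bug corrupts a byte after the link check, before LTP.
        return payload[:5] + b"\x00" + payload[6:]

    payload, e2e_crc = send(b"bundle payload bytes")
    received = buggy_copy(link_receive(payload))
    assert zlib.crc32(received) != e2e_crc  # only the end-to-end check notices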


>That is, the premise on which all of section V.D is written is incorrect.  LTP is designed to be used directly over a variety of link protocols, exactly as you say on page 8.  It is not an end-to-end transport protocol that requires its own checksum to protect against errors in intermediate processing;

yet it relies on travelling over link layers that must each implement a protective checksum, and any errors introduced in intermediate processing will be carried onwards, hop by hop. (And LTP-T is intended as just such an end-to-end protocol, no?)


> the intent of the design is that each LTP segment is to occupy a single lower-layer frame (see section 4.1 of the spec), so that the checksumming in that lower-layer frame protects the integrity of the segment against errors caused by transmission noise.  A lower-layer checksum in a frame carrying an LTP segment is not just "desirable", it is explicitly mandatory.

That seems to be hoping for a bit much. If a lower-layer checksum is mandatory, you would expect to see a MUST rather than "expected" in the specification. But, as an eventual standards-process point, such a MUST/requirement is being applied not to LTP, but to the universe of link-layer protocols outside LTP and outside the IETF's control. Good luck with ever enforcing that "mandatory" requirement. And that requirement still won't help with overall system reliability, as Stone's work and the end-to-end principle point out.

(Since LTP packets are supposed to fit in individual link-layer frames, why not also mandate that the link-layer frame use available link ARQ for reliable delivery? That would remove the need for LTP to do ARQ, in much the same way that link checksums are thought to suffice for higher-layer checksums. If the problem of ensuring reliable content is handed off to the link layer, the problem of ensuring reliable delivery across the link can certainly be handed off to the link layer as well. And the link-layer ARQ mechanism should always be appropriate for the link.)

The assumptions in the LTP specification are that:

- an LTP packet will always fit into a single link-layer frame without further segmentation. Given the vast number of possible link layers for LTP, this is unlikely to hold everywhere: it rules out classes of link layer, or mandates that LTP must be run over UDP/IP over that class of link layer to get protection against bugs in reassembly of segmented packets. (I see Stephen relaxes this segmentation requirement for LTP-T, which will have greater reliability problems here.)

- that link-layer frame will always checksum the entire LTP packet robustly and detect ALL possible errors. This assumes that the checksums of all link layers are equivalent in protecting against errors, so that LTP packets are uniformly robust and equivalent in integrity no matter what link layer they have travelled over - in effect, that all checksums are infinitely strong, leading to perfect LTP packets. In reality, LTP over different links becomes "variably reliable" - which is to say, unreliable. This means bundle contents are at the mercy of the least reliable link in the path (see the sketch after this list), and that a single bad/buggy/weakly-checked link layer can corrupt LTP packets and bundles -- undetectably, if the security frameworks aren't in place.

- there is no intermediate-processing corruption, after the link checksum check has been carried out (though mandating single frames reduces the amount of processing required, and the possibility of corruption occurring).

- the link-layer frame will not be subject to reliable ARQ, as deciding to do ARQ or not (for red/green packets) is solely LTP's job.

Those assumptions cannot be guaranteed in reality.
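
(A back-of-the-envelope Python sketch of the second assumption failing. The residual rates - the chance that a hop's link check silently passes a corrupted segment - are invented purely for illustration, but they show how the end-to-end figure is dominated by the weakest hop:)

    # Illustrative residual undetected-error rates, one per hop on the path.
    residual = [1e-12, 1e-12, 1e-7, 1e-12]  # one weakly-checked hop

    p_clean = 1.0
    for p in residual:
        p_clean *= (1.0 - p)

    p_undetected = 1.0 - p_clean
    print(p_undetected)  # ~1e-7: set almost entirely by the weakest link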

LTP does not check for errors itself. LTP relies on a link-layer checksum to catch transmission errors. LTP also relies on a higher-layer end-to-end checksum to catch any processing errors (which the bundle format doesn't provide, so we've now proposed one in a draft to cover at least some of the bundle payload).

Given this, how can LTP be described as "a reliable convergence layer" when LTP clearly is not reliable? "An unreliable convergence layer offering reliable or unreliable delivery" would be a more accurate description of LTP.


>Also, LTP's "partial reliability" concept is carefully explained - or, if you prefer, "bounded" - in section 3.3 of the LTP Motivation draft: "
>LTP regards each block of data as comprising two parts: a 'red-part', whose delivery must be assured by acknowledgment and retransmission as necessary, and a 'green-part' whose delivery is attempted, but not assured."  Acknowledgment and retransmission, not checksum. 

...simply because there are no checksums in LTP, which is a rather interesting point worth making in a paper discussing checksum coverage and reliability in protocols, don't you think? (We make similar observations about the weaknesses of the IPv6 header design, which have to be compensated for by having every transport protocol over IPv6 - even ICMP - include a pseudo-header checksum.)
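
(For reference, a sketch of that compensation in Python. The pseudo-header layout follows RFC 2460 - 16-byte source and destination addresses, a 32-bit upper-layer length, three zero bytes and the next-header value - summed with the standard RFC 1071 Internet checksum, so corruption of the IPv6 addresses shows up as a transport-layer checksum failure even though IPv6 itself has no header checksum.)

    import struct

    def internet_checksum(data):
        # RFC 1071 ones'-complement sum of 16-bit words.
        if len(data) % 2:
            data += b"\x00"
        total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
        while total >> 16:
            total = (total & 0xFFFF) + (total >> 16)
        return ~total & 0xFFFF

    def ipv6_pseudo_header(src, dst, upper_len, next_header):
        # src and dst are the 16-byte IPv6 source and destination addresses.
        return (src + dst + struct.pack("!I", upper_len)
                + b"\x00\x00\x00" + bytes([next_header]))

    def transport_checksum(src, dst, next_header, segment):
        # The checksum carried by every transport protocol over IPv6 (even ICMPv6),
        # computed over the pseudo-header plus the transport segment.
        return internet_checksum(ipv6_pseudo_header(src, dst, len(segment), next_header) + segment)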


> This is "reliability" in the sense of what you call "delivery" reliability in section I, the sense in which TCP is often characterized as a "reliable" protocol, not reliability in the sense of "being able to trust" the data in received packets (which, I agree, is a valuable and interesting distinction). 

It's not just necessary to know that you can trust and rely on the payload data - it's also necessary to be able to trust and rely on the headers that frame that data, so that it is delivered correctly.


> It is in fact vaguely similar to the "partial reliability" extension for SCTP, which likewise identifies circumstances under which the rules on acknowledgment and retransmission of some part(s) of a data unit in transit may be set aside.  

"Vaguely similar" is too... vague. The SCTP partial reliability extension is concerned solely with varying acknowledgement repeat persistence, which it goes into in some detail.  SCTP is concerned solely with that persistence because SCTP packets are always checksummed with a strong CRC, so the other requirement for reliability - that of getting the same header and payload data that was sent - doesn't come into it. That SCTP checksum covers the payload and headers. LTP doesn't have that.

Yes, you could say that LTP's partially reliable service is just less reliable and simply "more partial" than SCTP's partially reliable service, as SCTP has both the checksums and limited persistence, while LTP green packets have no checksums and no persistence (unless, of course, the link ARQ gives green packets unwanted persistence anyway). But, instead of using the ambiguous term 'partial reliability', just use the word 'unreliable'. It's much clearer to the reader.


>Neither has anything to do with
>varying the portion of the data unit that is covered by the checksum, as in DCCP and UDP-Lite.

LTP packets are not checksummed and not reliable. Varying the portion of the data unit covered by the checksum simply can't be done in LTP, because LTP has no checksum. As a transport protocol, LTP is basically equivalent to UDPv4 with checksums always turned off. You can probably get away with turning off UDP checksums if you run UDP over IPsec end-to-end to compensate; similarly, you can have LTP security and the NULL authentication sort-of compensate for the lack of LTP checksums. The problem here is that the NULL authentication and hardcoded key are entirely optional - just like relying on IPsec to cover for turning off the UDPv4 checksum, a practice shown to be problematic, which is why the UDPv6 checksum is mandatory. Integrity should be mandatory, or at least bounded - where you know you can trust the frame headers, but not the payload, say.


>Perhaps this clears up the paradox you encountered on determining that the full lower-layer checksum that is required for the protection of "red" segments prevents the use of "green" segments for delivery of errored content to the application: "green" LTP segments were never intended to be used for any such purpose, so there is no paradox.

Since LTP leaves out a checksum, the green segments clearly can be used for such a purpose - though they're most useful for that purpose when the lower stack does not contain checksums that reject errored packets before they reach LTP, the bundle layer and the application; otherwise only corruption due to processing (bugs, memory event upsets) is passed up.

So what do you believe green packets will be used for? The fact remains that LTP packets can be used for the delivery of errored payloads. (Though, given that the headers can be errored as well, delivery is less trustworthy than with the other protocols we examine.)

Again, the paradox is this:
- a green packet is sent over a link layer that checks data integrity.
- the link layer detects a problem in receiving the packet, because, say, a single bit was flipped.
- the packet is not passed to LTP.

A bit flip is turned into a packet-length erasure; it might be simpler and more efficient never to send the green packets at all, since they're optional. But since a green packet is supposed to fit into a single link frame that can be retransmitted by link ARQ, a 'partly unreliable' green frame can always arrive at the next hop, thanks to that ARQ mechanism. Another paradox - and one worth including in the paper.
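
(The effect in numbers - the frame size and bit error rate here are assumed purely for illustration:)

    # A checking link layer turns any bit error in a green segment into the
    # loss of the whole segment.
    frame_bits = 12000   # an assumed ~1500-byte frame
    ber = 1e-6           # an assumed link bit error rate

    p_segment_erased = 1.0 - (1.0 - ber) ** frame_bits
    print(p_segment_erased)  # ~0.012: roughly one green segment in eighty vanishes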


>The reason for this, again, is that LTP was not designed for the delivery of errored payloads.  It was designed for reliable transmission over deep-space RF links.  

More accurate to say that LTP was not designed to protect against errored payloads. LTP's design and specification permit delivery of errored payloads, as a consequence of not checking for payload (or header) errors.

"Reliability" encompasses content and delivery - delivery of all packets can lead to reliable content, but is not in itself sufficient. A protocol should include checks on the reliability of its headers and payloads. LTP doesn't do that. How can LTP be called reliable, when it doesn't check that it's reliable?

Let's look again at figure 1 of the paper. LTP doesn't check the integrity of its own headers or its payload. The bundle protocol doesn't check the integrity of its own headers or its payload either (but needs to, since LTP isn't endhost-to-endhost, unlike other transport protocols, so relying on the bundle and LTP code being colocated and hoping for the best, as e.g. web browsers do, doesn't suffice). The stack model followed is 1(d), where any checking is left solely to the link layer. That's not reliable.

The deep-space link is supposed to ensure reliability here by preventing all transmission errors before LTP sees them? Why does the link need LTP, then? It could use its own ARQ, which would suffice in the same way link checksums are said to suffice. Why does the bundle protocol need LTP, when it can just run directly over the link layer, or over UDP/IP over that link layer?

(btw, "reliable transmission" conflates delivery and content - transmission is just the sending of symbols. Not a good term, imo.)


>Such links are currently used only by spacecraft on scientific missions, and it has been our experience that neither spacecraft operators nor deep space mission scientists have much interest in errored data. 

I would think that spacecraft operators and deep space mission scientists have a _lot_ of interest in _preventing_ errored data.


> A small number of undetected flipped bits in a set of engineering telemetry records or a multispectral image could induce people to make some expensive mistakes.

Yes. Good to see that you acknowledge the problem of undetected flipped bits (whether from transmission or from processing/storage). Perhaps you'd like to suggest how the LTP specification could be improved to detect and prevent these?

LTP and the bundle protocol can give deep space mission designers errored content - and, as you say, they don't want it. You seriously expect spacecraft designers to use and rely on such unreliable protocols that can let errored data through? Really?


>  Moreover, in many cases the data from spacecraft are compressed to optimize bandwidth utilization, and bit errors in compressed data records tend to make recovery of the original uncompressed data difficult.

So they won't be sent using green packets, then. I think the LTP specification could be strengthened by removing the "green" machinery altogether; the specification is complex enough as it is.


>So you are quite correct in saying that LTP not as capable as, say, Saratoga as a means of streaming "Everybody Loves Raymond" in high definition from low Earth orbit. 

I did not say that.

Are you saying that LTP green packets can't be used for streaming of, say, time-sensitive telemetry that is considered useless if it arrives too late? Time sensitivity - where ARQ is pointless because a resend would miss its useful window, as in VoIP - is the only reason I can think of for using green packets, though that's not a good fit with bundle transfers, and LTP lacks the variable persistence of SCTP.

Speaking of things we say, Joab Jackson's IEEE Spectrum article has you on record as saying that LTP makes a link reliable. Really? It would seem that the link makes LTP reliable.


> It’s also not as capable as ibuprofen as a means of relieving headaches.  Neither fact is relevant.
>
>Summing up, I think you’ll have a stronger paper if you simply remove LTP from it altogether and stick to a survey of protocols that deliver errored content. 

That's the second time you've brought that up. Scott, you're suggesting any and all analysis or criticism of LTP be removed from our paper -- and you're an author of the LTP specifications. Do you realise what that attempt at suppression of criticism and discussion of LTP's problems looks like?

Comparisons made between other protocols and LTP are entirely valid. The LTP design permits delivery of errored content, and has no mechanisms to prevent that from occurring. Saying 'must be run over a link layer with a strong but unspecified checksum' is not much of a mechanism. Unlike many of the other protocols surveyed, LTP also does not check the integrity of its own headers or trailers, which is certainly worthy of note. (To give an example, unlike LTP, even ATM cells always check their headers to make sure the description of what they are carrying is accurate - and, unlike the bundle protocol, AAL5 frames' headers and trailers are always checksummed too.)
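
(Sketching that comparison - using CRC-32 as a generic stand-in, not the actual ATM HEC or AAL5 procedures: the point is simply that a header which fails its own check is never acted upon, so a corrupted description of the payload can't misdirect that payload.)

    import zlib

    def parse_cell(header, header_check, payload):
        # Refuse to act on a header that fails its own integrity check.
        if zlib.crc32(header) != header_check:
            return None  # discard: the description of the payload can't be trusted
        return header, payload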

LTP can deliver errored content, because LTP does not check for errors. LTP simply delivers any errors to the application. LTP does not protect against errored content. LTP's design permits this.

Summing up, I think you'll have a stronger protocol that will be more reliable if you include a checksum (though making that optional NULL authentication mandatory might well work).


>Section VI of the paper seems like an excellent discussion of how to engineer an optimum stack along these lines.

Yes. Though whether this would really be worth doing for the set of applications that could usefully take advantage of it is worth asking.


>One last thought: I think your proposal to add a checksum block as a Bundle Protocol extension is excellent, for just the reasons you discuss in sections I and II.  

And in my opinion it's a shame that we had to write that draft. I personally view the lack of attention paid to reliability, and to the well-known lessons of the end-to-end principle, in the LTP and bundle specifications as weaknesses in their designs. An attempt should be made to address these engineering failings, and that's what we've expended time and effort to lead the way on.

As for "reasons that pass understanding" - if the reasons in sections I and II are new to you, I strongly recommend reading Saltzer's and Stone's papers, as referenced.

> I haven’t had time to read your draft yet, but I think the concept has a lot of merit.

Yes, reliability has merits. Perhaps a DTN reliability workshop would be a good idea? See you in Chicago.

L.

http://www.ee.surrey.ac.uk/Personal/L.Wood/dtn/


<http://www.ee.surrey.ac.uk/Personal/L.Wood/><L.Wood@surrey.ac.uk>