Re: TCP checksum escapes and iSCSI error recovery design

julian_satran@il.ibm.com Tue, 17 April 2001 13:24 UTC

Received: from ece.cmu.edu ([128.2.236.200]) by ietf.org (8.9.1a/8.9.1a) with SMTP id JAA07190 for <ips-archive@odin.ietf.org>; Tue, 17 Apr 2001 09:24:34 -0400 (EDT)
Received: (from majordom@localhost) by ece.cmu.edu (8.11.0/8.10.2) id f3HB9sK08256 for ips-outgoing; Tue, 17 Apr 2001 07:09:54 -0400 (EDT)
X-Authentication-Warning: ece.cmu.edu: majordom set sender to owner-ips@ece.cmu.edu using -f
Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [195.212.91.199]) by ece.cmu.edu (8.11.0/8.10.2) with ESMTP id f3HB95r08225 for <ips@ece.cmu.edu>; Tue, 17 Apr 2001 07:09:06 -0400 (EDT)
Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22]) by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id NAA348520 for <ips@ece.cmu.edu>; Tue, 17 Apr 2001 13:08:58 +0200
From: julian_satran@il.ibm.com
Received: from d12mta02.de.ibm.com (d12mta01_cs0 [9.165.222.237]) by d12relay01.de.ibm.com (8.8.8m3/NCO v4.96) with SMTP id NAA112552 for <ips@ece.cmu.edu>; Tue, 17 Apr 2001 13:08:58 +0200
Received: by d12mta02.de.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id C1256A31.003D3E73 ; Tue, 17 Apr 2001 13:08:56 +0200
X-Lotus-FromDomain: IBMIL@IBMDE
To: ips@ece.cmu.edu
Message-ID: <C1256A31.003D3D6C.00@d12mta02.de.ibm.com>
Date: Tue, 17 Apr 2001 12:54:43 +0200
Subject: Re: TCP checksum escapes and iSCSI error recovery design
Mime-Version: 1.0
Content-type: text/plain; charset="us-ascii"
Content-Disposition: inline
Sender: owner-ips@ece.cmu.edu
Precedence: bulk


Mallikarjun,

I completely agree.  That is why I insisted on having the "must recover"
removed from the requirements doc.
However, we should make it possible for those that are willing to make the
extra effort (and pay for it) to recover a transcontinental backup session
without blowing it up.

The one additional point we should consider is the impact iSCSI
gateways/proxies have on recovery, as they are one of the reasons why we have
segment checksums that go beyond the "pure transport" end-to-end error
detection and forced us into iSCSI end-to-end mode.

Regards,
Julo

"Mallikarjun C." <cbm@rose.hp.com> on 17/04/2001 01:46:21

Please respond to cbm@rose.hp.com

To:   ips@ece.cmu.edu
cc:
Subject:  Re: TCP checksum escapes and iSCSI error recovery design




Let me add some comments to this excellent analysis from Jim -
in particular what this means to the ERT work in progress.

Broadly, there seem to be three different camps of thinking (with
possible finer variations) -

A. trust TCP implicitly: checksum is adequate, no iSCSI digests.
B. trust TCP explicitly: verify TCP, and signal a service delivery
   subsystem failure to SCSI on a CRC error (tear down the session) since
   it happens so very rarely.
C. verify and recover: verify TCP, and do a full recovery as needed
   since TCP checksum escapes happen in a random, non-deterministic
   way (maybe frequently at times).

The attached discussion seems to indicate that the truth is somewhere
between camps B and C.  In the absence of other conclusive evidence
placing us fully in camp B, the recourse is to crisply and unambiguously
specify the error recovery mechanisms while ensuring interoperability
with, and minimal penalty for, camp B.  Speaking for the Error Recovery
Team, this for now continues to be the operating assumption of ERT as
we specify algorithms, trim options, and work on the wording.
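
For illustration only, a minimal Python sketch of the shape of that decision;
the function, the policy labels, and the use of a plain CRC-32 are assumptions
made up for the example, not anything the ERT has specified:

    import binascii

    def handle_pdu(header: bytes, data: bytes, digest: int, camp: str) -> str:
        # Camp A trusts TCP implicitly and never checks an iSCSI digest.
        if camp == "A":
            return "deliver"
        # Camps B and C both verify ("trust but verify"); plain CRC-32 is a
        # stand-in here, since the digest polynomial was still being chosen.
        if binascii.crc32(header + data) == digest:
            return "deliver"
        # Camp B treats an escape as a delivery subsystem failure; camp C
        # recovers within iSCSI, e.g. by having the PDU retransmitted.
        return "drop-session" if camp == "B" else "recover"
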
--
Mallikarjun


Mallikarjun Chadalapaka
Networked Storage Architecture
Network Storage Solutions Organization
MS 5668   Hewlett-Packard, Roseville.
cbm@rose.hp.com


>All,
>In designing the iSCSI error recovery mechanisms there has been considerable
>focus on the general deficiency of the TCP checksum.  It seems that what iSCSI
>error recovery should really be based upon is the overall profile of TCP
>checksum escapes for the networks that will carry iSCSI traffic ("checksum
>escapes" being defined as those cases where corrupted data is passed through
>TCP and delivered to the upper layer protocol as good).  The design of iSCSI
>error recovery needs to start with a clear and agreed upon set of
>assumptions regarding the profile for TCP checksum escapes across space and
>time. Thus, the question is not simply "How often is corrupted data passed
>through TCP and delivered to iSCSI", but more along the lines of  "What is
>the distribution of TCP checksum escapes across network paths and through
>time?"
>
>iSCSI error recovery should be neither over-designed nor under-designed and will
>be fundamentally different if the assumption is that checksum escapes or
>bursts of such occur every 5 minutes or 5 hours or 5 days, and if the
>assumption is that there are a few bad paths or that all paths are bad
>sometimes.
>
>It would make sense for iSCSI to have minimal error recovery if the spatial
>profile (discussed below) of TCP errors were bi-modal (i.e. there are good
>paths and bad paths). IOW, if the checksum escapes happen deterministically
>always or not at all, iSCSI can simply drop the connection on a digest
>failure because the operation switched into error mode.  OTOH, we cannot
>adopt this simplistic approach if TCP can be in the shades of gray between
>the two modes.  Attached email discussion indicates that load levels and
>memory usage models along the path play a key role in determining this
>aspect. Employing an iSCSI-level digest/CRC appears to be the wise approach
>regardless of the TCP operational model (bimodal or otherwise), to detect
>escapes when they happen. IOW, the principle of "trust but verify" is apt to
>be applied here. The question is what is the appropriate action for iSCSI to
>take if the verification detects data corruption.
>
>To start this discussion toward a set of assumptions regarding TCP checksum
>escape profiles and an appropriate iSCSI error recovery design, this email
>includes responses from several individuals working in this area (Vern
>Paxson, Craig Partridge, Jonathan Stone), along with links to their original
>papers.
>
>Regards,
>Jim Wendt
>Networked Storage Architecture
>Network Storage Solutions Organization
>Hewlett-Packard Company / Roseville, CA
>Tel: +1 916 785-5198
>Fax: +1 916 785-0391
>Email: jim_wendt@hp.com <mailto:jim_wendt@hp.com>
>
>
>----------------------------------------------------------
>Here is a copy of my original email to Vern Paxson, Craig Partridge,
>Jonathan Stone, and Jeff Chase:
>
>Hi All,
>I'm e-mailing this to Vern, Craig, Jonathan, and Jeff in hopes of gathering
>information around TCP checksum failure profiles and Internet data failure
>characteristics in general.  I'm Jim Wendt and I've recently started working
>on both the iSCSI error recovery and RDMA/TCP-Framing activities. I'm
>emailing you directly because of your specific research and experience
>regarding TCP checksum failures and behavior.
>
>My immediate interest is in regards to TCP checksum failure rates
>(undetected errors) and how iSCSI should handle these undetected errors.  As
>you probably already know, the iSCSI protocol is a mapping of SCSI device
>protocols onto TCP (or SCTP eventually).  I have read the three papers (When
>the CRC and TCP Checksum Disagree, Performance of Checksums and CRCs over
>Real Data - both versions, End-to-End Internet Packet Dynamics).
>
>The iSCSI WG has come to the conclusion that the TCP checksum provides an
>insufficient level of data protection, and that a CRC-32 will be used for
>data integrity verification on each iSCSI PDU (the specific CRC polynomial
>to employ is being studied now). Thus, the assumption is that if corrupted
>payload data does pass through TCP and is undetected by the TCP checksum,
>then the corruption will be detected at the iSCSI PDU level via the CRC-32
>check.
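
As a concrete illustration of that check, a small Python sketch follows; the
framing and the zlib CRC-32 polynomial are stand-ins chosen for the example,
since the specific iSCSI digest polynomial was still being studied:

    import zlib

    def attach_digest(pdu: bytes) -> bytes:
        # Append a 4-byte CRC over the PDU, standing in for the iSCSI digest.
        return pdu + zlib.crc32(pdu).to_bytes(4, "big")

    def digest_ok(wire: bytes) -> bool:
        pdu, digest = wire[:-4], wire[-4:]
        return zlib.crc32(pdu) == int.from_bytes(digest, "big")

    frame = bytearray(attach_digest(b"iSCSI data segment"))
    frame[5] ^= 0x40                  # model a corruption that TCP passed as good
    assert not digest_ok(bytes(frame))
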
>
>What we are considering now is what level and philosophy of error handling
>and recovery is put into iSCSI. It seems to me that fundamentally different
>error handling behaviors would be put into iSCSI based on a known rate or
>profile of occurrence of bad PDUs (those PDUs that are passed up through TCP
>as being OK, but for which the iSCSI level CRC-32 indicates a data error).
>Thus, iSCSI error handling might be defined differently if the expected PDU
>error rate is one every 5 minutes as opposed to one every 5 hours (or 5
>days). Also, the error handling might be different if errors are bursty
>(i.e. a sequence of bad PDUs) rather than evenly spread through time.
>
>I would appreciate hearing your thoughts (and any supporting data references
>I could employ) regarding the nature of TCP checksum failure profiles across
>time and space (i.e. network paths).   More specifically:
>
>* How does the Internet work in the face of TCP error escapes?  If there
>truly is a relatively high level of application level data corruption
>(perhaps 1 in 16M packets), then how does the Internet and associated
>applications manage to function?
>
>* What is the distribution of errors across network paths?  Are most network
>paths very good (low TCP error escapes) with only a few paths that are
>really bad, or are TCP error escapes more evenly distributed across all
>network paths?  If the spatial profile is that there are good and bad paths,
>then do the error escape rates on these two classes of paths correspond to
>the high/low bounds for TCP error escapes (1 in 16 million / 1 in 10
>billion)?
>
>* What is the distribution of errors through time?  Do TCP error escapes
>occur individually and randomly through time, or are TCP error escapes more
>bi-modal where most of the time there are no errors and occasionally there
>is a clump or burst of TCP error escapes?  If the temporal profile for TCP
>error escapes is that there are good periods and infrequent but severe bad
>periods, then what is the duty cycle for these periods (how bad/good for how
>long), and what are the error escape rates during these periods?
>
>* What about a stronger TCP checksum?  I don't believe that anyone ever
>actually employed RFC1146 (TCP Alternate Checksum).  Has there been any
>recent thinking about actually improving TCP's end-to-end data integrity
>checking?  I suppose that the existing middle box infrastructure won't allow
>for this. However, I'm considering submitting a draft and starting to push
>for a stronger TCP checksum or CRC, but I would like to get feedback from
>all of you on the technical feasibility and possible acceptance of this
>proposal before taking it to the public forums.
>
>* What about a stronger SCTP checksum? SCTP currently uses the Adler-32
>checksum. Perhaps an optional stronger CRC-32 should be defined for SCTP.
>Has there been any thinking in this direction? Again, I am considering
>pursuing this with the SCTP folks but would appreciate any feedback you have
>to offer first.
>
>Thanks,
>Jim
>
>Jim Wendt
>Networked Storage Architecture
>Network Storage Solutions Organization
>Hewlett-Packard Company / Roseville, CA
>Tel: +1 916 785-5198
>Fax: +1 916 785-0391
>Email: jim_wendt@hp.com <mailto:jim_wendt@hp.com>
>
>
>----------------------------------------------------------
>Craig Partridge writes:
>
>Hi Jim:
>
>Here's my quick set of answers, which Vern, Jonathan and others can
>refine.
>
>>* How does the Internet work in the face of TCP error escapes?  If there
>>truly is a relatively high level of application level data corruption
>>(perhaps 1 in 16M packets), then how does the Internet and associated
>>applications manage to function?
>
>The answer is that applications appear to function because people don't
>notice the errors or resort to backups, or whatever.  It isn't clear
>exactly how we're managing to survive.  (Back in the old days when NFS
>had these problems, people used to assume a disk failure had occurred when
>in fact the network had trashed their data).
>
>>* What is the distribution of errors across network paths?  Are most network
>>paths very good (low TCP error escapes) with only a few paths that are
>>really bad, or are TCP error escapes more evenly distributed across all
>>network paths?  If the spatial profile is that there are good and bad paths,
>>then do the error escape rates on these two classes of paths correspond to
>>the high/low bounds for TCP error escapes (1 in 16 million / 1 in 10
>>billion)?
>
>The answer is that there are good and bad paths.  On good paths you'll
>probably see fewer escapes than 1 in 10 billion -- you can probably treat
>them as essentially error free.  If you start seeing a modest number of
>errors, then either (a) there's a broken router in your path or (b)
>there's a broken end system.  If we had a better way of identifying the
>source of errors, I'd say that if you ever see an error from a host,
>declare it broken.
>
>>* What is the distribution of errors through time?  Do TCP error escapes
>>occur individually and randomly through time, or are TCP error escapes more
>>bi-modal where most of the time there are no errors and occasionally there
>>is a clump or burst of TCP error escapes?  If the temporal profile for TCP
>>error escapes is that there are good periods and infrequent but severe bad
>>periods, then what is the duty cycle for these periods (how bad/good for how
>>long), and what are the error escape rates during these periods?
>
>They're not random, but I don't think we know enough about the timing to say.
>My hunch is that a lot are based on end system load (high loads
>on network cards tend to encourage certain classes of bus errors).
>
>>* What about a stronger TCP checksum?  I don't believe that anyone ever
>>actually employed RFC1146 (TCP Alternate Checksum).  Has there been any
>>recent thinking about actually improving TCP's end-to-end data integrity
>>checking?  I suppose that the existing middle box infrastructure won't allow
>>for this. However, I'm considering submitting a draft and starting to push
>>for a stronger TCP checksum or CRC, but I would like to get feedback from
>>all of you on the technical feasibility and possible acceptance of this
>>proposal before taking it to the public forums.
>
>That's a tricky question and not one I've thought enough about to have
>a strong view -- except to say that it isn't all the checksum's fault --
>Jonathan's done work showing that putting the checksum at the end
>dramatically
>improves its efficacy in certain situations.
>
>>* What about a stronger SCTP checksum? SCTP currently uses the Adler-32
>>checksum. Perhaps an optional stronger CRC-32 should be defined for SCTP.
>>Has there been any thinking in this direction? Again, I am considering
>>pursuing this with the SCTP folks but would appreciate any feedback you have
>>to offer first.
>
>There's considerable reason to believe that Adler-32 is a lousy checksum --
>it uses more bits (32 vs. 16) and will probably detect fewer errors than
>either Fletcher-16 or the TCP checksum.  If one is going to invest energy
>in fixing a checksum, fixing SCTP's checksum is probably the first priority.
>
>Craig
>
>----------------------------------------------------------
>Vern Paxson writes:
>
>Just a couple of follow-on points:
>
>    - I suspect the Internet survives because the bulk of the
>      traffic isn't all that critical (Web pages, particularly
>      large items like images and perhaps video clips), so when
>      they're corrupted, nothing breaks in a major way.
>
>    - One point to consider is that if you use IPSEC, then you
>      get very strong protection from its integrity guarantees
>
>- Vern
>
>----------------------------------------------------------
>Jonathan Stone writes:
>
>In message <200104092135.f39LZVS21092@daffy.ee.lbl.gov>Vern Paxson writes
>
>Jim,
>
>Craig's answer is an excellent summary.  Fixing the SCTP checksum is a
>priority; I have a chapter in my thesis addressing some of the issues
>there, and I believe Craig and I plan to write a paper.
>Stronger TCP checksums are an interesting idea, but the lag on
>deploying new TCP features seems to be  5 years or more.
>
>
>If I re-did the study, I would try to log headers of ``good'' packets
>(those where the checksum matches) as well as the entire packets with
>checksum errors.  With only data on the `bad' packets (which is partly
>to meet privacy concerns, which were a big hurdle for us),
>it's hard to give good answers to your questions about the time or path
>dependency of bad checksums.
>
>One thing we can say is that there appear to be certain `bad' hosts
>which emit very high rates of packets with incorrect checksums.  One
>host we noticed in the Stanford trace sent 2 packets every second for
>around three hours, totalling about 20% of the total packets in that
>trace.  We can't say whether that's a host problem or a path problem;
>but either way that error rate would worry me, if I were using iSCSI
>on that host (or path).  That said, we did find what looks to be a
>router with a bad memory bit, which is clearly path-dependent
>(though hitting that particular memory word may be time- and load-dependent
>as well).
>
>Further, some of the bad checksums appear to be due to software
>bugs in specific OS releases.  As the Internet evolves (old OS revs
>replaced by newer ones), the rate of those specific errors will
>evolve over time.
>
>
>Last, the DORM trace in our SIGCOMM 2000 paper is the only trace where
>our monitoring point was directly adjacent to end-hosts, with no
>intervening IP routers.  On that Ethernet segment, I saw about 1 in
>80,000 packets with a valid frame CRC, but a bad IP checksum --
>often a deletion of a 16-bit word from the IP address.  I tried
>to `repair' some  IP headers (e.g., by inserting what seemed
>to be a missing 16 bits of an IP source address).
>After the repair, the IP checksum seemed to be correct.
>
>That points to a problem occurring after the sending IP layer computes
>its IP header checksum, but before the frame-level CRC is computed.
>That suggests an error either in the NIC device driver, in its DMA
>engine, or somewhere on the NIC itself (dropped from a FIFO?).
>
>That led Craig and me to suggest that where possible, checksums be
>computed early and verified late.  That's a caution about a potential
>downside of outboard checksumming: it covers less of the path to an
>application buffer than is covered by software checksums.  Software
>checksums can catch errors which occur between the driver and the NIC
>(bus errors, DMA, FIFO overruns, what-have-you); but outboard hardware
>checksums do not.  (IEN-45 mentions this issue, but the two-pass
>scheme suggested there doubles bus utilization for a given I/O rate.)
>
>Putting middle boxes in the path just exacerbates this.
>
>Whether or not to use outboard checksumming is entirely up to
>NIC designers. We merely raise the issue that it checks less of
>the datapath than is covered by software checksums.
>
>
>>Just a couple of follow-on points:
>>
>>   - I suspect the Internet survives because the bulk of the
>>     traffic isn't all that critical (Web pages, particularly
>>     large items like images and perhaps video clips), so when
>>     they're corrupted, nothing breaks in a major way.
>>
>>   - One point to consider is that if you use IPSEC, then you
>>     get very strong protection from its integrity guarantees
>
>For a software IPSEC implementation.
>
>A hardware implementation with outboard crypto hardware could
>potentially fall foul of the same kinds of local-to-source-host
>errors, (DMA errors or whatever the ultimate cause is) which our data
>indicates some NICs suffer from.  If the data has already been
>curdled by the time the encrypting accelerator sees it,
>I don't see how IPsec actually enhances integrity.
>
>----------------------------------------------------------
>Vern Paxson writes:
>
>> > - One point to consider is that if you use IPSEC, then you
>> >   get very strong protection from its integrity guarantees
>>
>> For a software IPSEC implementation.
>
>Good point!
>
>         Vern
>
>----------------------------------------------------------
>Craig Partridge writes:
>
>I'd like to clarify my statement to Jim about Adler-32 to be a bit more
>clear
>about what the issues are.   I've also added Mark Adler to the cc list as
>I'd promised Mark to get him results when we had them, and while the results
>aren't quite cooked, if Jim is going to circulate a discussion, Mark
>should see it first.
>
>If you're designing a checksum, there are certain features you'd like it
>to have.  Here's a starting point of a formal definition.  Given the set
>V of all possible bit vectors and a checksum function C(), what we'd like
>is:
>
>    prob(C(v1) == C(v2)) is 1/2**(sizeof C())
>
>that is, given any two v1, v2 being different elements of V, the chance that
>their checksum will collide is the best possible, namely 1 over 2 raised to
>the power of the bitwidth of the result of C().
>
>Three sub points:
>
>    1. This is not quite the same as what the cryptographic checksum folks
>    want.  They actually want it to be very hard [for some computational
>    approximation of hard], given C(v1) to find a v2 such that C(v1) == C(v2).
>    For network checksums, we don't care as we're protecting from errors,
>    not attacks.
>
>    2. If we do not pick v1 and v2 at random, but according to some distribution
>    rule of likely packet sizes, packet contents, etc, we'd still like the
>    equation to be true.  We don't want to be vulnerable to certain
>    traffic patterns, etc.
>
>    3. You can compare the effectiveness of checksums by how close they
>    come to this ideal -- that is, how effectively do they use their
>    range of values?
>
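
As a rough illustration of sub-point 3, one can estimate how close a checksum
comes to the 1/2**(sizeof C()) ideal by sampling random inputs; the toy 8-bit
sum below is hypothetical and only there to keep the experiment cheap:

    import random

    def sum8(data: bytes) -> int:
        # Toy 8-bit additive checksum; its ideal collision rate is 1/256.
        return sum(data) & 0xFF

    def collision_rate(csum, trials=100_000, length=64):
        # Empirical estimate of prob(C(v1) == C(v2)) over random, distinct inputs.
        hits = 0
        for _ in range(trials):
            v1, v2 = random.randbytes(length), random.randbytes(length)
            if v1 != v2 and csum(v1) == csum(v2):
                hits += 1
        return hits / trials

    print(collision_rate(sum8), "vs. ideal", 1 / 256)
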
>OK, so let's talk about Adler-32.  Adler-32 is a neat idea -- it seeks to
>improve the performance of the Fletcher checksum by summing modulo a prime
>number, rather than 255 or 256.
>
>However, it sums bytes (8-bit quantities) into 16-bit fields.  As a result,
>the high bits of the 16-bit fields take some time to fill (they only get
>filled by propagating carries from lower bits) and until the packet
>is quite big (thousands of bytes) you don't get enough mixing in the high
>bits.  Two problems: (a) we're not fully using the 16-bit width, so for
>smaller packets the chance of collision is much greater than 1/2**sizeof(C)
>simply because some bits in the checksum are always (or with very
>high probability) set to 0; and (b) it looks like (and Jonathan's still
>working on this) that the law of large numbers will cause the values to
>cluster still further [I think of this behavior as the result of just
>looking at all the bits instead of just the low order bits mod a prime, we're
>vulnerable to the fact that the sums are not evenly distributed, by
>the law of large numbers]
>
>We're still working on whether the core idea behind Adler-32 (namely working
>modulo a prime) is as powerful as it seems, but it is clear that to have
>a hope of making it comparable to the TCP checksum or Fletcher, you have to
>sum 16-bit quantities into 16-bit fields.
>
>Craig
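
To make the structure Craig describes concrete, here is a minimal Python
sketch of Adler-32's two 16-bit running sums (an illustration only;
zlib.adler32 is the reference it is checked against):

    import zlib

    def adler32_sketch(data: bytes) -> int:
        MOD = 65521                    # the prime modulus
        a, b = 1, 0
        for byte in data:
            a = (a + byte) % MOD       # commutative byte sum
            b = (b + a) % MOD          # running sum of sums (position-dependent)
        return (b << 16) | a

    assert adler32_sketch(b"example") == zlib.adler32(b"example")
    # Each step adds at most 255 to 'a', so for short packets the high bits
    # of the 16-bit halves are rarely exercised -- the effect described above.
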
>
>----------------------------------------------------------
>Jonathan writes:
>
>In message <200104101505.f3AF5BZ23145@aland.bbn.com>Craig Partridge writes
>
>>I'd like to clarify my statement to Jim about Adler-32 to be a bit more clear
>>about what the issues are.   I've also added Mark Adler to the cc list as
>>I'd promised Mark to get him results when we had them, and while the results
>>aren't quite cooked, if Jim is going to circulate a discussion, Mark
>>should see it first.
>>
>>If you're designing a checksum, there are certain features you'd like it
>>to have.  Here's a starting point of a formal definition.  Given the set
>>V of all possible bit vectors and a checksum function C(), what we'd like is:
>>
>>   prob(C(v1) == C(v2)) is 1/2**(sizeof C())
>>
>>that is, given any two v1, v2 being different elements of V, the chance that
>>their checksum will collide is the best possible, namely 1 over 2 raised to
>>the power of the bitwidth of the result of C().
>
>Jim, Craig:
>
>Just to be picky, as I'm working right now on definitions of some of
>these issues for my thesis:
>
>One can list other desirable properties, like wanting each bit of the
>checksum field to have informational entropy 1/2.  Craig's aggregate
>definition falls out from that, with a few extra assumptions.
>
>Also, less formally, desiring each bit of the input data to contribute
>equally to flipping the final state of each output bit.
>That is where Adler32 runs into trouble when given short inputs.
>
>
>>Three sub points:
>>
>>    1. This is not quite the same as what the cryptographic checksum folks
>>    want.  They actually want it to be very hard [for some computational
>>    approximation of hard], given C(v1) to find a v2 such that C(v1) == C(v2).
>>    For network checksums, we don't care as we're protecting from errors,
>>    not attacks.
>
>There are two versions of formal cryptographic invertibility; the
>other criterion is that it be computationally intractable to find *any*
>v1 and v2 such that C(v1) = C(v2).  Crypto folks would generally like
>both.
>
>
>>    2. If we do not pick v1 and v2 at random, but according to some distribution
>>    rule of likely packet sizes, packet contents, etc, we'd still like the
>>    equation to be true.  We don't want to be vulnerable to certain
>>    traffic patterns, etc.
>>
>>    3. You can compare the effectiveness of checksums by how close they
>>    come to this ideal -- that is, how effectively do they use their
>>    range of values?
>>
>>OK, so let's talk about Adler-32.  Adler-32 is a neat idea -- it seeks to
>>improve the performance of the Fletcher checksum by summing modulo a prime
>>number, rather than 255 or 256.
>>
>>However, it sums bytes (8-bit quantities) into 16-bit fields.  As a result,
>>the high bits of the 16-bit fields take some time to fill (they only get
>>filled by propagating carries from lower bits) and until the packet
>>is quite big (thousands of bytes) you don't get enough mixing in the high
>>bits.  Two problems: (a) we're not fully using the 16-bit width, so for
>>smaller packets the chance of collision is much greater than 1/2**sizeof(C)
>>simply because some bits in the checksum are always (or with very
>>high probability) set to 0; and (b) it looks like (and Jonathan's still
>>working on this) that the law of large numbers will cause the values to
>>cluster still further [I think of this behavior as the result of just
>>looking at all the bits instead of just the low order bits mod a prime, we're
>>vulnerable to the fact that the sums are not evenly distributed, by
>>the law of large numbers]
>
>To be fair, and to give the whole picture, there are a couple of
>points here that should be expanded.
>
>The first is that for large input data lengths, we can show that the
>distribution of both 16-bit halves of the Adler-32 sum should actually
>be well distributed.  That holds true for addition mod M, of any
>repeated independent observations of a random variable.  (A proof
>appears in the appendix of the 1998 ToN paper you cited already;
>although that version of the proof may not formally state
>all the necessary assumptions about independent observations.)
>
>However, for networking in general and SCTP in particular, there are
>fairly modest hard upper bounds on the maximum input length.
>SCTP forbids fragmentation.  For the increasingly-pervasive Ethernet
>frame length of 1500 bytes that means SCTP checksums have no more
>than 1480 input bytes.
>
>The data we have -- and I am still working on it -- say that's too
>short to get good coverage of either of the two sums in SCTP -- the purely
>additive commutative sum, or the higher-order running sum of the
>commutative sum (which is position-dependent).
>
>Craig's description gives a good intuition for the first, commutative
>sum. For the position-dependent sum (the running sum of the first sum),
>another good intuition for the computational modelling I've done is to
>imagine a hash-table where we hash, independently, all possible values
>at all possible offsets in a 1480-byte SCTP packet.  The intuition
>isn't so much "law of large numbers" as that SCTP draws its per-byte
>values from one particular corner of a two-dimensional space (the
>hash-table vs. all possible bytes); so it ends up with an uneven
>coverage of the space of all hash values.
>
>
>>We're still working on whether the core idea behind Adler-32 (namely working
>>modulo a prime) is as powerful as it seems, but it is clear that to have
>>a hope of making it comparable to the TCP checksum or Fletcher, you have to
>>sum 16-bit quantities into 16-bit fields.
>
>Just so the reasoning is clear, another alternative is to sum 8-bit
>inputs into 8-bit accumulators, modulo 251.  Given 32 bits of
>available checksum field, the 16-bit sums are preferable.
>
>Again, this is for short inputs.  For large inputs of several tens of
>Kbytes, the Adler32 sum should do much better (at 64Kbytes it should
>give very uniform coverage).  At those input sizes, the comparison comes
>down to Adler32 having 15 pairs of values which are congruent, mod
>65521, whereas a Fletcher sum with 16-bit inputs would have only one;
>versus better `stirring' from a prime modulus.
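
Those counts are easy to reproduce; a short Python check (the helper name is
made up for the illustration):

    def congruent_pairs(modulus, width_bits=16):
        # Pairs of distinct width_bits-bit input values that a mod-`modulus`
        # sum cannot tell apart.
        buckets = {}
        for v in range(2 ** width_bits):
            buckets.setdefault(v % modulus, []).append(v)
        return sum(len(b) * (len(b) - 1) // 2 for b in buckets.values())

    print(congruent_pairs(65521))    # 15 for the Adler-32 modulus
    print(congruent_pairs(65535))    # 1 for a 16-bit Fletcher-style modulus
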
>
>I don't know what the distribution of sizes of zlib-compressed files
>is.  If they are generally large, then our work may not be applicable
>to Adler-32 for its original design purpose.
>
>
>I also don't know the history of why SCTP chose the Adler-32 sum
>rather than, say, CRC-32. The gossip I hear from an IETF-going friend in
>the Bay Area is that there was concern about the performance of
>CRC-32; a direct bit-by-bit shift-and-add was seen as too slow.  I
>hear there was also a supposition that an efficient table-lookup
>version would require very large tables (128 kbits?) and that
>tables that large were prohibitive for PDAs and small handheld devices.
>
>I am *not* suggesting that is an accurate report; it probably isn't.
>But if there's any grain of truth in it, both four-bit and eight-bit
>table-lookup algorithms for CRC32 exist.  Table lookup size need not
>be an issue.  Perhaps we should draw the SCTP authors -- C. Sharp? --
>into this discussion as well.
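
For reference, an eight-bit table-lookup CRC-32 needs only a 256-entry table
(256 x 4 bytes, i.e. 8 kbits); a Python sketch using the common 0xEDB88320
reflected polynomial, checked against zlib.crc32 (the polynomial choice for
iSCSI and SCTP was still open at the time):

    import zlib

    def make_table():
        # 256-entry table for the reflected CRC-32 polynomial 0xEDB88320.
        table = []
        for n in range(256):
            c = n
            for _ in range(8):
                c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
            table.append(c)
        return table

    TABLE = make_table()

    def crc32(data: bytes) -> int:
        crc = 0xFFFFFFFF
        for byte in data:
            crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
        return crc ^ 0xFFFFFFFF

    assert crc32(b"example") == zlib.crc32(b"example")
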
>
>----------------------------------------------------------
>Jonathan writes:
>
>A small terminological correction here:
>
>In message <200104101505.f3AF5BZ23145@aland.bbn.com>Craig Partridge writes
>>
>
>[...]
>
>>
>>so for
>>smaller packets the chance of collision is much greater than 1/2**sizeof(C)
>>simply because some bits in the checksum are always (or with very
>>high probability) set to 0; and (b) it looks like (and Jonathan's still
>>working on this) that the law of large numbers will cause the values to
>>cluster still further [I think of this behavior as the result of just
>>looking at all the bits instead of just the low order bits mod a prime, we're
>>vulnerable to the fact that the sums are not evenly distributed, by
>>the law of large numbers]
>
>I think it's actually a central limit theorem, not the law of large
>numbers.  For addition modulo M, the "central limit theorem" says that
>summation of many independent identically-distributed random variables
>will tend to a uniform distribution, not a normal distribution
>(as does the weighted sum).
>
>----------------------------------------------------------
>Jonathan writes:
>
>Jim,
>
>I forwarded my previous message to Chip Sharp.  (Is the rfc2960
>address, chsharp@cisco.com, still valid?)
>
>He may wish to comment on the SCTP issues as well.
>
>
>----------------------------------------------------------
>----------------------------------------------------------
>Links to papers:
>
>End-to-End Internet Packet Dynamics, Vern Paxson
><http://citeseer.nj.nec.com/cache/papers2/cs/11598/ftp:zSzzSzftp.ee.lbl.govzSzpaperszSzvp-pkt-dyn-ton99.pdf/paxson97endtoend.pdf>
>
>When the CRC and TCP Checksum Disagree, Jonathan Stone, Craig Partridge
><http://www.acm.org/sigcomm/sigcomm2000/conf/paper/sigcomm2000-9-1.pdf>
>
>Performance of Checksums and CRCs over Real Data, Jonathan Stone, Michael
>Greenwald, Craig Partridge, Jim Hughes
><http://citeseer.nj.nec.com/cache/papers2/cs/1909/ftp:zSzzSzftp.dsg.stanford.eduzSzpubzSzpaperszSzsplice-paper.pdf/performance-of-checksums-and.pdf>
>
>----------------------------------------------------------
>
>
>