Re: [tcpm] draft-eggert-tcpm-historicize-00

Joe Touch <touch@isi.edu> Sun, 27 June 2010 18:51 UTC

Message-ID: <4C279DA4.8080604@isi.edu>
Date: Sun, 27 Jun 2010 11:51:16 -0700
From: Joe Touch <touch@isi.edu>
To: "Biswas, Anumita" <Anumita.Biswas@netapp.com>
Cc: tcpm@ietf.org, ananth@cisco.com, L.Wood@surrey.ac.uk

Hi, Anumita,

Biswas, Anumita wrote:
...
> I have heard only anecdotal evidence of a very large company doing very
> large copies of data (about a terabyte) across a local network and then
> finding bit corruptions in the data received on the other end - clearly
> the link CRC and the TCP one's-complement checksum failed to catch
> those errors.

Well, that's the point. It's not so clear that the checks are what failed. It
could easily have been a bug in the implementation, or an error in the hardware
that corrupted data after it was received correctly. I.e., no matter how good
these checks are, there are other ways in which data can be corrupted.

> In my own company, I have seen a case where a very large block (1460
> bytes - a packet size) of data was replaced and went undetected, and we
> could not root-cause the problem - the problem just disappeared after
> showing up for a few weeks - but in this case, it is possible that the
> corruption occurred before the data was sent by TCP itself.

Yes, that's my concern.

> So to me, to hinge whether we support a new TCP CRC-based checksum or
> not on practical experience in the wild is probably not a safe
> approach. Of course, if we had multiple proven examples of this in the
> wild, we'd feel more motivated to do this.

I don't follow this logic. We have no measured evidence of this ever occurring.
The evidence we have to date is that the end system software is more likely the
problem.

Everything so far points to the need for an application protocol check, i.e., a
final, total-transfer checksum on each large transaction (e.g., the whole file).
That was one conclusion of the 2000 paper, FWIW.
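
To make the application-level check concrete, here is a minimal sketch, not
anything specified in this thread. It assumes SHA-256 as the digest (no
particular algorithm is named above), and the names transfer_digest and
verify_transfer are hypothetical. The sender computes one digest over the
whole transfer; the receiver recomputes it over what it actually got, which
catches a replaced 1460-byte block regardless of where the corruption occurred.

```python
import hashlib


def transfer_digest(chunks):
    """Return a SHA-256 hex digest over an iterable of byte chunks.

    The digest covers the whole transfer, independent of how the data
    was split into segments on the wire.
    """
    h = hashlib.sha256()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def verify_transfer(received_chunks, expected_digest):
    """Return True if the received data matches the sender's digest."""
    return transfer_digest(received_chunks) == expected_digest


# Sender side: three segment-sized blocks (illustrative data only).
payload = [b"a" * 1460, b"b" * 1460, b"c" * 1460]
sent_digest = transfer_digest(payload)

# Receiver side: the middle block was replaced somewhere along the path
# or in the end system; the per-packet checks are not consulted here.
corrupted = [payload[0], b"x" * 1460, payload[2]]

print(verify_transfer(payload, sent_digest))
print(verify_transfer(corrupted, sent_digest))
```

The digest would have to travel with the transfer (or out of band) as part of
the application protocol; how that is done is left open here.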

> Also, I am more worried about packet sizes greater than 1500 bytes -
> where the Ethernet CRC becomes weaker. Typical Ethernet MTUs
> across the Internet do not exceed 1500 bytes today - so right now, the
> problem scope is limited to data centers and other such LANs where
> the end-to-end PMTU is greater than 1500 bytes.

That should be verifiable. Then again, if the CRC32 fails on jumbo frames, why
would adding CRC16 at TCP help, other than the fact that it's a 'different' sum?
Your argument suggests that such data centers need different *Ethernet*
checksums, not different TCP ones. Then the data centers can leverage - and
reuse - that where necessary.

Joe