Re: The TCP and UDP checksum algorithm may soon need updating

Craig Partridge <craig@tereschau.net> Sat, 06 June 2020 23:17 UTC

From: Craig Partridge <craig@tereschau.net>
Date: Sat, 06 Jun 2020 17:17:13 -0600
Message-ID: <CAHQj4CdopwpEfyuOVO3ZywTKveQMpnt_WPh_JDRydgNKHVVmhw@mail.gmail.com>
Subject: Re: The TCP and UDP checksum algorithm may soon need updating
To: IETF discussion list <ietf@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/53ROo-7Ix_ax8DoPGVvkJxZDIw0>

I got requests for citations/sources.  So here's what's known.

First, background reading from 20 years ago, which shows the distribution
of errors that we saw in TCP connections then.  Note that while single-bit
errors happen, far more errors span a byte or more.  This paper was also
the origin of the idea of putting MD5 sums (128-bit digests) on files...

@inproceedings{10.1145/347059.347561,
 author = {Stone, Jonathan and Partridge, Craig},
 title = {When the CRC and TCP Checksum Disagree},
 year = {2000},
 isbn = {1581132239},
 publisher = {Association for Computing Machinery},
 address = {New York, NY, USA},
 url = {https://doi.org/10.1145/347059.347561},
 doi = {10.1145/347059.347561},
 booktitle = {Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication},
 pages = {309--319},
 numpages = {11},
 location = {Stockholm, Sweden},
 series = {SIGCOMM '00}
}

OK, on to what people are seeing today.  This paper shows that 1 in every
121 FTP file transfers delivers a file whose MD5 sum does not match the
original (there are multiple possible reasons, but the TCP checksum is a
strong candidate).

@inproceedings{Liu:2018:CSD:3208040.3208053,
 author = {Liu, Zhengchun and Kettimuthu, Rajkumar and Foster, Ian and Rao, Nageswara S. V.},
 title = {Cross-geography Scientific Data Transferring Trends and Behavior},
 booktitle = {Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing},
 series = {HPDC '18},
 year = {2018},
 isbn = {978-1-4503-5785-2},
 location = {Tempe, Arizona},
 pages = {267--278},
 numpages = {12},
 url = {http://doi.acm.org/10.1145/3208040.3208053},
 doi = {10.1145/3208040.3208053},
 acmid = {3208053},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {GridFTP, file transfer, usage management, wide area network},
}
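
For anyone wondering how a file can pass the TCP checksum and still fail
the MD5 comparison, here is a minimal Python sketch.  It is only an
illustration of one known weakness (the checksum is a sum of 16-bit words,
so swapping two words leaves it unchanged); I am not claiming this is the
error pattern behind the mismatches in the paper above.  The function and
variable names are made up for the example, and the checksum is computed
over the payload alone, without the TCP header or pseudo-header.

```python
import hashlib


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, complemented (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload, and a damaged copy with its first two 16-bit
# words swapped ("AB" and "CD" trade places).
sent = b"ABCDEFGH"
received = sent[2:4] + sent[0:2] + sent[4:]

checksum_matches = internet_checksum(sent) == internet_checksum(received)
md5_matches = hashlib.md5(sent).digest() == hashlib.md5(received).digest()

print("TCP-style checksum matches:", checksum_matches)
print("MD5 matches:", md5_matches)
```

The checksum comparison passes because addition does not care about word
order, while the end-to-end MD5 comparison catches the damage.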


On a related point, about 60% of big file transfers in a major energy
network are failing (checksums are one of the suspects):


@inproceedings{shannigrahi2017request,
  title={Request aggregation, caching, and forwarding strategies for improving large climate data distribution with NDN: a case study},
  author={Shannigrahi, Susmit and Fan, Chengyu and Papadopoulos, Christos},
  booktitle={Proceedings of the 4th ACM Conference on Information-Centric Networking},
  pages={54--65},
  year={2017},
  organization={ACM}
}

This reference also sees high error rates:

Kettimuthu, Rajkumar, et al. "Transferring a Petabyte in a Day." Future
Generation Computer Systems 88 (2018): 191-198.

Anecdotally, folks are reporting that some middlebox vendors are not
updating the TCP checksum incrementally but rather letting the outbound
interface simply recompute the entire checksum -- which means that if the
TCP segment gets damaged during middlebox handling, the middlebox will slap
a valid checksum on bad data.
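
To make the difference concrete, here is a rough Python sketch of the two
behaviors.  It is a simplified model under stated assumptions: the
"segment" is just a made-up byte string whose first 16-bit word stands in
for a port the middlebox rewrites, the checksum is carried alongside the
data rather than in a header field, and there is no pseudo-header.  The
incremental update follows equation 3 of RFC 1624; the names are mine and
do not come from any vendor's code.

```python
def fold(total: int) -> int:
    """Fold carries so the one's-complement sum fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """Full checksum over the data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    return ~fold(total) & 0xFFFF


def incremental_update(old_csum: int, old_word: int, new_word: int) -> int:
    """RFC 1624 eqn 3: HC' = ~(~HC + ~m + m'), touching only the changed word."""
    total = (~old_csum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    return ~fold(total) & 0xFFFF


# Sender side: hypothetical segment, first word is port 0x1F90.
original = bytes.fromhex("1f90c001") + b"hello, world"
sender_csum = internet_checksum(original)

# Middlebox legitimately rewrites the port to 0x0050 ...
rewritten = bytes.fromhex("0050") + original[2:]
# ... but one payload byte is also damaged while inside the box.
damaged = bytearray(rewritten)
damaged[8] ^= 0xFF
damaged = bytes(damaged)

# Behavior A: incremental update derived from the sender's checksum.
csum_incremental = incremental_update(sender_csum, 0x1F90, 0x0050)
# Behavior B: outbound interface recomputes over whatever it holds.
csum_recomputed = internet_checksum(damaged)

# Receiver check: does the carried checksum match the bytes that arrived?
detected_with_incremental = internet_checksum(damaged) != csum_incremental
detected_with_recompute = internet_checksum(damaged) != csum_recomputed

print("damage detected (incremental update):", detected_with_incremental)
print("damage detected (full recompute):", detected_with_recompute)
```

With the incremental update the checksum still reflects what the sender
sent, so the receiver sees a mismatch; with a full recompute the damage is
covered by a fresh, valid checksum.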

As I noted -- the literature is slim, which is why a team I'm on is going
to seek more comprehensive error collection which actually captures the
errors in the data, so we can see what kinds of errors are causing trouble.

Craig
-- 
*****
Craig Partridge's email account for professional society activities and
mailing lists.