Re: The TCP and UDP checksum algorithm may soon need updating

tom petch <daedulus@btconnect.com> Wed, 10 June 2020 16:31 UTC

References: <D55AFBFD-0D59-4176-B6BD-D6A1801FEC2C@akamai.com> <6c9f5bd9-6e26-5d25-e66e-bec206473ff3@mtcc.com> <20200610001225.GD3100@localhost> <3ac60a21-4aee-d742-bedc-5be3a4e65471@mtcc.com> <rbpbpp$2fgp$1@gal.iecc.com> <CAHw9_iK226J9FV8+Jdhr3=NBLRr4U-gr21Nn-JXdNK7Gzu66jw@mail.gmail.com>
Date: Wed, 10 Jun 2020 17:10:19 +0100
Message-ID: <1UWB4Lpxjt.1Tg13nQabPR@pc8xp>
In-Reply-To: <CAHw9_iK226J9FV8+Jdhr3=NBLRr4U-gr21Nn-JXdNK7Gzu66jw@mail.gmail.com>
From: "tom petch" <daedulus@btconnect.com>
To: "Warren Kumari" <warren@kumari.net>, "John Levine" <johnl@taugh.com>
Cc: "IETF Discuss" <ietf@ietf.org>
Subject: Re: The TCP and UDP checksum algorithm may soon need updating
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/9e-WASZLV0g5CMfFXYsaS77T2K4>

----- Original Message -----
From: Warren Kumari <warren@kumari.net>
To: John Levine <johnl@taugh.com>
Cc: IETF Discuss <ietf@ietf.org>
Sent: 10/06/2020 15:24:36

On Tue, Jun 9, 2020 at 9:08 PM John Levine <johnl@taugh.com> wrote:
>
> In article <3ac60a21-4aee-d742-bedc-5be3a4e65471@mtcc.com>,
> Michael Thomas  <mike@mtcc.com> wrote:
> >So the long and short of this entire issue seems to be: is the
> >uncaught error rate serious enough to warrant rethinking weak
> >transport and frankly L2 layer error detection? ...
>
> Having read the papers that Craig referenced, that's my interpretation.
>
> One of them is about a big physics application which sends multiple
> terabytes of data over the net using what looks like a version of
> FTP that transfers several files at once.  They send the data as a lot
> of 4 gig files. When they started verifying file checksums, they
> found about 20% of the received files were corrupted in transit.

I'm assuming you are talking about "Cross-Geography Scientific Data
Transferring Trends and Behavior", which contains (Section 4.1
Checksum, encryption, and reliability, p.12):
"We note that if a user changes a file during a transfer, this action
can be reported as an integrity failure. We cannot distinguish this
from an actual failure."


Yes, I'm sure that checksum errors do exist, but from my quick checks
I haven't been seeing anything like the error rates discussed here --
and, as a quick sanity check, 4GB is on the same order as a DVD/many
OS distributions:
RedHat: 8.2.0 - 8GB, 7.9 Beta - 4GB --
https://developers.redhat.com/products/rhel/download
Ubuntu: 18.04 (Desktop): 2GB -- https://releases.ubuntu.com/18.04/
Pop!_OS: 20.04LTS: 2.36 GB (NVIDIA) -- https://pop.system76.com/
Fedora 32: Standard ISO image for x86_64: 1.9GB --
https://getfedora.org/en/server/download/
Kali Linux 64-Bit (Installer): 3.6GB -- https://www.kali.org/downloads/
Linux Mint 19.3 "Tricia" - Cinnamon (64-bit): 1.9GB --
https://www.linuxmint.com/edition.php?id=274
debian-10.4.0-amd64-DVD-3.iso: 4.4GB --
https://cdimage.debian.org/debian-cd/current/amd64/iso-dvd/


<tp>
Warren

I have had two experiences of errors in downloads.

One was downloading PDFs from an academic institution, when about one in three was corrupt and unusable; repeating the download eventually succeeded.  A typical size was 350 kilobytes.

The other was downloading large reference XML documents, when almost every one was corrupt; but, being text, the corruption was apparent and I could edit the characters to make the tags valid and view the documents.  A typical size was 250 kilobytes.

I download a lot from other organisations, such as the IETF, using FTP whenever possible, and have never detected corruption anywhere else.  My inference was, in both cases, that the corruption was taking place in the backend servers and not in the network, i.e. that TCP was doing its job of reliably delivering corrupted data!

Tom Petch



I'm assuming that a: almost all of us have downloaded multiple copies
of at least a few of these, and b: we check the hashes on the ISOs we
downloaded.
I certainly haven't been seeing anything like 1 in 5, or 1 in 10 ISO
downloads with a corrupt hash[0].
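
A minimal sketch of the kind of ISO hash check referred to above, assuming
the distribution publishes a SHA-256 digest. The function names are
illustrative only, and the file path and published digest are whatever the
reader supplies; nothing here is taken from any particular distribution's
tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`.

    The file is read in chunks so that multi-gigabyte ISOs do not have
    to fit in memory.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a downloaded file against a published hex digest.

    Surrounding whitespace and letter case in the published value are
    ignored, since checksum files vary in how they are formatted.
    """
    return sha256_of_file(path) == published_hex.strip().lower()
```

A mismatch only says that the bytes on disk differ from what the publisher
hashed; it does not say whether the damage happened in the network, on the
server, or on the local disk.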

I also move significant amounts of data around - perhaps I'm just
blessed, but if I were getting corruption on anything approaching that
level, I'm sure I'd have noticed - 20% errors in 4GB files mean I
should be seeing a corruption once every ~20GB. I regularly move TBs
around (backups, DRBD, large containers, databases, etc) - ssh/scp
will log "2: Packet corrupt" (or "Corrupted MAC on input.
Disconnecting: Packet corrupt" on the server side). I stuff all of my
logs into a combination of Logstash and Loki, and querying this gives
no occurrences of this message:

Loki: {job="syslog",type="server"} |~ "sshd.*Corrupt MAC" == 0
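
For anyone without a log aggregator, a rough equivalent can be sketched
over plain syslog lines. This assumes the logs are available as text and
that the two message texts quoted above ("Packet corrupt" and "Corrupted
MAC on input") appear verbatim; the function name and the sample lines in
the usage note are hypothetical. One thing worth checking when writing such
a pattern: a regex containing "Corrupt MAC" does not match the text
"Corrupted MAC on input", so the sketch matches the quoted strings
directly.

```python
import re

# Matches either of the two message texts quoted above.
# "Corrupted MAC on input" is what the server side logs; "Packet corrupt"
# appears in the disconnect message on both sides.
CORRUPTION_RE = re.compile(r"Corrupted MAC on input|Packet corrupt")


def count_corruption_events(lines):
    """Count log lines that report an ssh packet or MAC corruption."""
    return sum(1 for line in lines if CORRUPTION_RE.search(line))
```

Typical use would be something like
`count_corruption_events(open("/var/log/auth.log", errors="replace"))`,
where the log path is an assumption that depends on the system. A line
carrying both messages is counted once.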

Again, I'm sure that there are checksum errors, but I think that a:
there is lots of data that can be easily looked at to estimate
occurrence (including from CDNs and large scale operators), b: we need
to prioritize what we work on.
I'd love to see people having a look at their systems and reporting
what sorts of errors they see....
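
As a rough way to turn one's own transfer volumes into an expected number
of corrupt files, the "20% of 4GB files" figure above can be scaled
linearly. This is only a sketch under a strong simplifying assumption: that
corruption is spread evenly over the bytes transferred, regardless of path,
host, or application. The function names and default values are
illustrative.

```python
def gb_per_corruption(file_gb=4.0, corrupt_fraction=0.20):
    """Average volume transferred per corrupt file, in GB."""
    return file_gb / corrupt_fraction


def expected_corrupt_files(total_gb, file_gb=4.0, corrupt_fraction=0.20):
    """Expected number of corrupt files when moving `total_gb` of data
    in files of `file_gb` each, at the given per-file corruption rate."""
    return (total_gb / file_gb) * corrupt_fraction
```

With the defaults, `gb_per_corruption()` gives about 20 GB per corrupt
file, and moving 1000 GB would be expected to produce about 50 corrupt
files, which is the scale of problem that would be hard to miss in
day-to-day operation.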

W
[0]: actually I've only once seen the checksum not match, and that was
because of a NAT box which tried to do ALG fixups in the payload and
replaced all occurrences of the external address bit-pattern with the
internal one...

>
> In that application they resend the corrupt files and they obviously
> need to make the files smaller. But retransmitting a file at a time seems
> a lot less efficient than improving the checksums and using the
> existing TCP packet level retransmission.
>
> --
> Regards,
> John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail. https://jl.ly
>
--
I don't think the execution is relevant when it was obviously a bad
idea in the first place.
This is like putting rabid weasels in your pants, and later expressing
regret at having chosen those particular rabid weasels and that pair
of pants.
   ---maf