Re: The TCP and UDP checksum algorithm may soon need updating

Warren Kumari <> Wed, 10 June 2020 14:25 UTC

From: Warren Kumari <>
Date: Wed, 10 Jun 2020 10:24:36 -0400
Message-ID: <>
Subject: Re: The TCP and UDP checksum algorithm may soon need updating
To: John Levine <>
Cc: IETF Discuss <>

On Tue, Jun 9, 2020 at 9:08 PM John Levine <> wrote:
> In article <>,
> Michael Thomas  <> wrote:
> >So the long and short of this entire issue seems to be: is the
> >uncaught error rate serious enough to warrant rethinking weak
> >transport and, frankly, L2 error detection? ...
> Having read the papers that Craig referenced, that's my interpretation.
> One of them is about a big physics application which sends multiple
> terabytes of data over the net using what looks like a version of
> FTP that transfers several files at once.  They send the data as a lot
> of 4 gig files. When they started verifying file checksums, they
> found about 20% of the received files were corrupted in transit.

I'm assuming you are talking about "Cross-Geography Scientific Data
Transferring Trends and Behavior", which contains (Section 4.1
Checksum, encryption, and reliability, p.12):
"We note that if a user changes a file during a transfer, this action
can be reported as an integrity failure. We cannot distinguish this
from an actual failure."

Yes, I'm sure that checksum errors do exist, but from my quick checks
I haven't been seeing anything like the error rates discussed here --
and, as a quick sanity check, 4GB is on the same order as a DVD/many
OS distributions:
RedHat: 8.2.0 - 8GB, 7.9 Beta - 4GB
Ubuntu: 18.04 (Desktop): 2GB
Pop!_OS: 20.04LTS: 2.36 GB (NVIDIA)
Fedora 32: Standard ISO image for x86_64: 1.9GB
Kali Linux 64-Bit (Installer): 3.6GB
Linux Mint 19.3 "Tricia" - Cinnamon (64-bit): 1.9GB
debian-10.4.0-amd64-DVD-3.iso: 4.4GB

I'm assuming that a: almost all of us have downloaded multiple copies
of at least a few of these, and b: we check the hashes on the ISOs we
download. I certainly haven't been seeing anything like 1 in 5, or
1 in 10 ISO downloads with a corrupt hash[0].
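
For anyone who wants to make that hash check routine, here is a minimal
Python sketch. It assumes the distributor publishes a SHA-256 digest
for the image; the function names are hypothetical and chosen only for
illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so a multi-GB ISO never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """True if the file's SHA-256 equals the digest the distributor published."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

A mismatch only says that the bytes on disk differ from what was
published. It does not say where in the path the damage happened.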

I also move significant amounts of data around - perhaps I'm just
blessed, but if I were getting corruption on anything approaching that
level, I'm sure I'd have noticed - 20% errors in 4GB files mean I
should be seeing a corruption once every ~20GB. I regularly move TBs
around (backups, DRBD, large containers, databases, etc) - ssh/scp
will log "2: Packet corrupt" (or "Corrupted MAC on input.
Disconnecting: Packet corrupt" on the server side). I stuff all of my
logs into a combination of Logstash and Loki, and querying this gives
no occurrences of this message:

Loki: {job="syslog",type="server"} |~ "sshd.*Corrupt MAC" == 0
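
The "once every ~20GB" estimate above is simple arithmetic. A minimal
Python sketch, assuming 4GB files, a 20% per-file corruption rate as in
the quoted claim, and corruption that is independent from one file to
the next (the function names are hypothetical):

```python
def gb_per_corrupt_file(file_size_gb, corrupt_fraction):
    """Average amount of data moved per corrupted file, in GB."""
    return file_size_gb / corrupt_fraction


def expected_corrupt_files(total_gb, file_size_gb, corrupt_fraction):
    """Expected number of corrupted files when moving total_gb of data."""
    return (total_gb / file_size_gb) * corrupt_fraction


def prob_all_clean(n_files, corrupt_fraction):
    """Probability that n_files independent transfers all arrive clean."""
    return (1 - corrupt_fraction) ** n_files


print(f"{gb_per_corrupt_file(4, 0.20):.1f}")           # 20.0
print(f"{expected_corrupt_files(1000, 4, 0.20):.1f}")  # 50.0
print(f"{prob_all_clean(10, 0.20):.3f}")               # 0.107
```

Under those assumptions, moving 1TB as 4GB files should produce about
50 corrupted files, and ten clean transfers in a row would happen only
about 11% of the time.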

Again, I'm sure that there are checksum errors, but I think that a:
there is lots of data that can be easily looked at to estimate
occurrence (including from CDNs and large scale operators), b: we need
to prioritize what we work on.
I'd love to see people having a look at their systems and reporting
what sorts of errors they see....
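
For those without Loki, a rough equivalent is to count the sshd
messages quoted above in a plain-text syslog. A minimal Python sketch
follows. The regular expression is an assumption about how those
messages appear in a log: it accepts "Corrupt MAC", "Corrupted MAC",
and "Packet corrupt". The sample lines are made up for illustration
and are not real log output.

```python
import re

# Assumed pattern: sshd lines mentioning a corrupt(ed) MAC or a corrupt packet.
SSHD_CORRUPTION = re.compile(r"sshd.*(Corrupt(ed)? MAC|Packet corrupt)")


def count_corruption_messages(lines):
    """Count log lines that look like sshd packet-corruption reports."""
    return sum(1 for line in lines if SSHD_CORRUPTION.search(line))


# Hypothetical sample lines, not taken from a real system.
sample = [
    "Jun 10 07:25:13 host sshd[1234]: Corrupted MAC on input.",
    "Jun 10 07:25:13 host sshd[1234]: Disconnecting: Packet corrupt",
    "Jun 10 07:25:14 host sshd[1235]: Accepted publickey for user",
]
print(count_corruption_messages(sample))  # 2
```

The function takes any iterable of lines, so an open log file can be
passed in directly. Note that the pattern matters: a filter written as
"Corrupt MAC" would not match a line that says "Corrupted MAC".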

[0]: actually I've only once seen the checksum not match, and that was
because of a NAT box which tried to do ALG fixups in the payload and
replaced all occurrences of the external address bit-pattern with the
internal one...

> In that application they resend the corrupt files and they obviously
> need to make the files smaller. But retransmitting a file at a time seems
> a lot less efficient than improving the checksums and using the
> existing TCP packet level retransmission.
> --
> Regards,
> John Levine, Primary Perpetrator of "The Internet for Dummies",
> Please consider the environment before reading this e-mail.
I don't think the execution is relevant when it was obviously a bad
idea in the first place.
This is like putting rabid weasels in your pants, and later expressing
regret at having chosen those particular rabid weasels and that pair
of pants.