The TCP and UDP checksum algorithm may soon need updating

Craig Partridge <craig@tereschau.net> Thu, 04 June 2020 19:12 UTC

From: Craig Partridge <craig@tereschau.net>
Date: Thu, 04 Jun 2020 13:12:14 -0600
Message-ID: <CAHQj4Cf_vgXYEL=x4DCEnpwNxZpJQSD-h6MWmhMWpYwPF9XFow@mail.gmail.com>
Subject: The TCP and UDP checksum algorithm may soon need updating
To: ietf@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/UlX-EcQvxErWPUTsrf0lUyRZYMQ>

Hi folks:

This note is intended as an invitation to think a bit about a potential
hard problem.

There's a small body of literature suggesting that the TCP checksum is
regularly failing to detect errors and that we're getting close to the point
where using an MD5 authentication check will be insufficient (i.e. the
number of times the TCP checksum fails to detect errors is so large that
TCP passes through enough errors that the MD5 check won't catch all of
them).  This situation is due to the growth in both total traffic and the
size of individual data transfers.  This is not a surprise -- it was
anticipated 20 years ago, when studies showed the TCP checksum was quite
weak.
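
As an illustration of that weakness, here is a minimal Python sketch of the
16-bit ones' complement checksum that TCP and UDP use, in the style described
in RFC 1071.  The function name and the sample payloads are assumptions made
for this example, and the pseudo-header is left out.  It shows one well-known
blind spot: swapping two aligned 16-bit words changes the data but not the
checksum.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the 16-bit ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Sample bytes taken from the numerical example in RFC 1071.
original = bytes.fromhex("0001f203f4f5f6f7")
# Same bytes with the first two 16-bit words exchanged (illustrative damage).
swapped = bytes.fromhex("f2030001f4f5f6f7")

print(hex(internet_checksum(original)))
print(internet_checksum(original) == internet_checksum(swapped))
```

Because addition is commutative, any reordering of aligned 16-bit words goes
unnoticed, as does any pair of changes that cancel in the sum.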

I'm part of a team that is setting out to do a big set of measurements to
figure out (a) whether the other reports that we're close to a tipping point
are correct; and (b) what kinds of errors are getting through.  That data will
tell us if a new checksum is warranted.  We hope to know in about a year.
We have time to think.

If we need a new checksum, then we are in an interesting space.  There is a
defined way to negotiate a new checksum in TCP (RFC 1146, now obsolete, but
we can presumably unobsolete it).  But all the middleboxes that like to
muck with the TCP header and data would have to both (a) learn about the
option and (b) implement the new checksum algorithm(s).  Middleboxes are
the problem because if an end system doesn't update the checksum, that's on
the end system owner and their willingness to risk bad data.  But if an end
system updates and can't transfer data due to the middlebox's ignorance,
that's a larger system problem.

Then there's UDP.  UDP has no options.  We could retrofit options by
leveraging the reserved zero checksum value and some magic codes at the
start or end of the UDP data, but that's ugly.  Or we could define a UDPv2
(UDP has no version numbers either!) and give it another IP protocol
number.  But if we don't fix UDP, protocols above UDP, like QUIC, need
fixing...
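
For context on the reserved zero value: in UDP over IPv4 (RFC 768), a
checksum field of zero means the sender generated no checksum, and a computed
checksum of zero is transmitted as all ones.  A minimal Python sketch of that
rule follows; the function names are hypothetical, and it does not attempt to
model any option-retrofitting scheme.

```python
def udp_checksum_field(computed: int) -> int:
    """Value a sender places in the UDP checksum field (RFC 768 rule).

    A computed checksum of zero is sent as 0xFFFF, so that zero stays
    reserved to mean "no checksum was generated".
    """
    return 0xFFFF if computed == 0 else computed


def sender_used_checksum(field: int) -> bool:
    """True if the received checksum field says a checksum was generated."""
    return field != 0


print(hex(udp_checksum_field(0)))
print(sender_used_checksum(0))
```

In ones' complement arithmetic 0x0000 and 0xFFFF both represent zero, which
is why one of the two encodings is free to carry a separate meaning.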

I don't think we'll need to fix IP and ICMP, as the consequences of
occasional error aren't a big deal for them.  A misrouted packet or
unreadable ICMP in every million packets or so is probably OK.

At any rate, in a spare moment, it's worth pondering how one might proceed.

Thanks!

Craig

PS: Some folks may wonder if we couldn't protect ourselves by using a
bigger MAC than MD5.  Yes, but (a) that doesn't solve the problem for
protocols that don't do message authentication; and (b) MACs are lousy
checksums.

That second statement may surprise folks, so here's the one paragraph
point.  A checksum for error checking (i.e. not proof against adversaries)
should be designed to detect all instances of common types of errors and,
for errors other than those common types, detect errors proportionate to
the width (in bits) of the checksum. Thus, for a checksum of width W bits,
we'd expect it to fail to detect an error with a probability of 1 in
2^(W+4) or better.  Some newer checksums may be able to do even better,
like 1 in 2^(2*W). Whereas a MAC of width W bits can only fail to catch
errors with a probability of 1 in 2^W due to the additional requirement to
thwart an adversary (not sure this is a proven property, but it has
consistently been true).  So, for the same width in bits, a checksum
catches many more errors -- and checksums are much cheaper to compute
than MACs.
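
To make the comparison concrete, here is a small Python sketch that plugs a
width into the three rough rates quoted above.  The width of 32 bits and the
function name are assumptions chosen for illustration; the rates are the
paragraph's rules of thumb, not measurements.

```python
from fractions import Fraction


def undetected_rates(width_bits: int) -> dict:
    """Rough chance of missing an error, per the rules of thumb above."""
    return {
        "MAC": Fraction(1, 2 ** width_bits),
        "checksum": Fraction(1, 2 ** (width_bits + 4)),
        "newer checksum": Fraction(1, 2 ** (2 * width_bits)),
    }


rates = undetected_rates(32)  # 32 bits is an arbitrary example width
for name, rate in rates.items():
    # The denominator is a power of two; bit_length() - 1 gives its exponent.
    print(f"{name}: 1 in 2^{rate.denominator.bit_length() - 1}")

# How many times more errors the MAC misses than the plain checksum.
print(rates["MAC"] / rates["checksum"])
```

Under these assumptions the W+4 rule means the MAC lets through 16 times as
many errors as a checksum of the same width, whatever W is.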

-- 
*****
Craig Partridge's email account for professional society activities and
mailing lists.