RE: dataplane encapsulation considerations - checksums
"Osborne, Eric" <eric.osborne@level3.com> Wed, 10 December 2014 18:16 UTC
From: "Osborne, Eric" <eric.osborne@level3.com>
To: "stbryant@cisco.com" <stbryant@cisco.com>, "l.wood@surrey.ac.uk" <l.wood@surrey.ac.uk>, "akatlas@gmail.com" <akatlas@gmail.com>, "routing-discussion@ietf.org" <routing-discussion@ietf.org>
Subject: RE: dataplane encapsulation considerations - checksums
Date: Wed, 10 Dec 2014 18:16:30 +0000
Message-ID: <63CB93BC589C1B4BAFDB41A0A19B7ACD010FC5D2@USIDCWVEMBX08.corp.global.level3.com>
References: <CAG4d1rd60hK8=WtYw-nid_Z7Z8+TvdzA52fNx3pFjND+eDWAfA@mail.gmail.com>, <54877D58.9050002@cisco.com> <DB4PR06MB457F278EAF9C84BCA20E665AD620@DB4PR06MB457.eurprd06.prod.outlook.com>, <54883947.8000302@cisco.com> <1418217891573.89120@surrey.ac.uk> <5488728F.5030003@cisco.com>
In-Reply-To: <5488728F.5030003@cisco.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/routing-discussion/-RyE3-QCuy6JLp5uxd2yK1of46M

Throwing in my two cents' worth of data. I looked at two routers - one in the US, one in South America. The TCP numbers (largely BGP and SSH) are:

Router #1:
Rcvd: 3990580271 Total, 125489 no port
60097 checksum error, 43239 bad offset, 0 too short

60097/3990580271 = 0.0015%

Router #2:
1291670737 packets received
20332 discarded for bad checksums

(20332/1291670737) = 0.0015% again

A little worse than Stewart's 1/10^5, but not much. Certainly not high on my list of Things That Give Me Ulcers.

eric

> -----Original Message-----
> From: routing-discussion [mailto:routing-discussion-bounces@ietf.org] On Behalf Of Stewart Bryant
> Sent: Wednesday, December 10, 2014 11:19 AM
> To: l.wood@surrey.ac.uk; akatlas@gmail.com; routing-discussion@ietf.org
> Subject: dataplane encapsulation considerations - checksums
>
> Changing the thread title to match the subject
>
> Lloyd
>
> Those numbers are orders of magnitude larger than anything I have
> seen elsewhere.
>
> If we look at them in detail they are interesting because if the no ports
> are c/s errors as you hypothesise, then only 1/3 of the errors are seemingly
> caught by the c/s. I would thus expect you to be arguing strongly
> that we need to deprecate the existing c/s in favour of something
> much stronger such as Fletcher, or ideally a crypto checksum.
> In the case of the link state IGPs we have operators that turn on
> link state security not for security but for the enhanced checksum
> and that seems to be what you need here in the host stacks.
>
> However we have another study that is worth looking at:
>
> https://www.verisigninc.com/assets/VRSN_Bitsquatting_TR_20120320.pdf
>
> This looks at the UDP c/s errors received by DNS servers, and they
> saw an error rate of 1 in 10^5. However if you read the detail there are
> some systematic effects going on, with a significant fraction seemingly due
> to transmit host stack problems which may be what you are seeing.
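Stewart's contrast between the existing ones' complement checksum and "something much stronger such as Fletcher" can be illustrated with a small sketch. The Python below is not from the thread: it implements the RFC 1071 Internet checksum and a textbook Fletcher-16, and the sample bytes are arbitrary (hypothetical). Because the ones' complement checksum is just a sum of 16-bit words, it cannot see two words being swapped; Fletcher's second running sum weights each byte by its position, so it can.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 style: ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def fletcher16(data: bytes) -> int:
    """Textbook Fletcher-16: two running sums mod 255; sum2 depends on byte position."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1


# Arbitrary example bytes (hypothetical, not taken from any capture in the thread).
original = bytes.fromhex("4500003c1c4640004006b1e6")
# Reordering error: the first two 16-bit words arrive swapped.
swapped = original[2:4] + original[0:2] + original[4:]

print(internet_checksum(original) == internet_checksum(swapped))  # True: swap not detected
print(fletcher16(original) == fletcher16(swapped))                # False: swap detected
```

This shows only one class of error (reordering). Neither function stands in for the crypto checksum Stewart mentions, and Fletcher-16 has blind spots of its own (for example, it cannot distinguish a 0x00 byte from a 0xFF byte).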
>
> Now I think all this points to two things. Firstly, from the perspective
> of a host, the c/s looks inadequate and the host stack may need to be
> revised.
>
> On the other hand, the UDP error rate that the paper reports is an
> overestimate of the error rate in the tunnel case, since only an error in
> the IP or UDP header can cause misdelivery; all other errors show up as
> payload errors, which the payload's own error protection should catch.
> Although I do note that they report some systematic effect in terms of
> bit position, which may point either way.
>
> One other aspect of this, which is important in Routing: if we are seeing
> the error rates you imply getting past the TCP c/s, surely the routing
> protocols will be importing long-term errors into the routing subsystems
> (specifically BGP and LDP). Are people observing that, and if so don't we
> need to fix it?
>
> - Stewart
>
>
> On 10/12/2014 13:24, l.wood@surrey.ac.uk wrote:
> > Stewart,
> >
> > Who needs an NMS? Let's go old school.
> > TCP and UDP have equivalent pseudo-header+payload ones' complement checksums. So data from TCP can be considered somewhat equivalent, as the technology is the same, and TCP is somewhat better instrumented and monitored than UDP is.
> >
> > The sample below, from two core switches, on TCP connections to them (not, alas, through them), shows e.g. 10287 TCP checksum errors for 4.3 million TCP/IP packets received. That would be an error rate of 0.24% - the other switch is 0.45%. Note the figure of e.g. 19458 no port: it is entirely possible that something inside the core network is trying a port that isn't there, but corruption of port numbers is also possible. These are firewalled core devices with limited traffic to them, so I would consider this a best case, without throwing edge device data into the mix.
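The percentages quoted in this message can be re-derived from the raw counters. The Python sketch below is not from the thread and does only that arithmetic. `fraction_caught` is a hypothetical helper encoding Stewart's reading: it assumes every "no port" drop was really a corrupted port number that the checksum missed, which Lloyd offers only as one possibility.

```python
def error_rate(bad: int, total: int) -> float:
    """Fraction of received segments that a counter flagged as bad."""
    return bad / total


def fraction_caught(checksum_errors: int, no_port: int) -> float:
    """Share of suspected corruptions caught by the checksum, ASSUMING every
    'no port' drop was a corrupted port number the checksum missed
    (a hypothesis raised in the thread, not an established fact)."""
    return checksum_errors / (checksum_errors + no_port)


# (checksum errors, no port, total received), copied from counters quoted in this message.
COUNTERS = {
    "SWT1": (10287, 19458, 4332545),
    "SWT2": (28419, 37608, 6370948),
    "Router #1": (60097, 125489, 3990580271),
}

for name, (cksum, no_port, total) in COUNTERS.items():
    print(f"{name}: checksum errors {error_rate(cksum, total):.4%}, "
          f"caught if 'no port' = corruption: {fraction_caught(cksum, no_port):.2f}")

# Stone's "1 in 400" versus Lloyd's restatement as 10,000 in 4,000,000.
print(error_rate(10_000, 4_000_000) == 1 / 400)
```

For SWT1 this reproduces both Lloyd's 0.24% and Stewart's "only 1/3 caught" (about 0.35), and for Router #1 Eric's 0.0015%.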
> >
> > Extrapolating from TCP checksum rates to UDP checksum rates, from traffic to a device to traffic through it, and from v4 to v6 is left as an exercise for the reader.
> >
> > Incidentally, these rates pretty much match the 1 in 400 observation made early in Stone's SIGCOMM 2000 paper. 1:400 is 10,000:4,000,000
> > http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-9-1.pdf
> > Well, what a surprise: physics is still physics, and this particular SIGCOMM paper is still valid.
> > Checksums fail because of corruption. But at least there's a checksum to catch the corruption. Sans checksums, the corruption goes unnoticed.
> >
> > The main reason no one has seen misbehaviour with MPLS is that no one is looking for it or instrumenting for it. (Just like MPLS doesn't need TTL, because no one ever sees MPLS routing loops, right?) Please go measure MPLS and report back.
> >
> > Traversing over UDP, as suggested in draft-ietf-mpls-in-udp and draft-ietf-tsvwg-gre-in-udp-encap, increases the scope for corruption across a longer path, rather than individual links - and the UDP checksum is an important last check.
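Lloyd's "pseudo-header+payload" checksum and Stewart's point that header errors cause misdelivery meet in one place: the UDP checksum covers the IP addresses, so a packet delivered to the wrong host fails that host's check. The Python sketch below illustrates this for IPv4 following RFC 768; it is not from the thread, and the addresses (documentation ranges), ports and payload are hypothetical.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum with end-around carry (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header from RFC 768: addresses, zero, protocol 17, UDP length."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, 17, udp_length))


def add_udp_checksum(src: str, dst: str, segment: bytes) -> bytes:
    """Fill in the checksum field (bytes 6-7) of a UDP segment whose field is zero."""
    csum = ~ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment) & 0xFFFF
    csum = csum or 0xFFFF  # RFC 768: a computed zero is transmitted as all ones
    return segment[:6] + struct.pack("!H", csum) + segment[8:]


def udp_checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """Receiver check: pseudo-header + segment must sum to all ones.
    Does not handle the IPv4 'no checksum' value of zero."""
    return ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment) == 0xFFFF


# Hypothetical tunnel packet: documentation addresses, arbitrary ports and payload.
SRC, DST = "192.0.2.1", "198.51.100.7"
payload = b"encapsulated inner packet"
segment = struct.pack("!HHHH", 50000, 60000, 8 + len(payload), 0) + payload
segment = add_udp_checksum(SRC, DST, segment)

print(udp_checksum_ok(SRC, DST, segment))             # intended receiver: True
print(udp_checksum_ok(SRC, "198.51.100.6", segment))  # misdelivered after a 1-bit error: False
```

The check only helps if the receiver actually verifies it: in IPv4 a transmitted UDP checksum of zero means "not computed", which this sketch does not model.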
> >
> > SWT1#sh tcp stat
> > Rcvd: 4332545 Total, 19458 no port
> >       10287 checksum error, 4959 bad offset, 0 too short
> >       62946 packets (2289083 bytes) in sequence
> >       93 dup packets (19167 bytes)
> >       34 partially dup packets (1878 bytes)
> >       107 out-of-order packets (17980 bytes)
> >       8 packets (8 bytes) with data after window
> >       0 packets after close
> >       0 window probe packets, 614 window update packets
> >       682 dup ack packets, 0 ack packets with unsend data
> >       179823 ack packets (37459618 bytes)
> > Sent: 2049509 Total, 0 urgent packets
> >       1774234 control packets (including 345 retransmitted)
> >       249103 data packets (37439059 bytes)
> >       308 data packets (41767 bytes) retransmitted
> >       46 data packets (4876 bytes) fastretransmitted
> >       25718 ack only packets (3764 delayed)
> >       0 window probe packets, 104 window update packets
> > 9649 Connections initiated, 1230 connections accepted, 10857 connections established
> > 1760675 Connections closed (including 30 dropped, 1749812 embryonic dropped)
> > 653 Total rxmt timeout, 9 connections dropped in rxmt timeout
> > 0 Keepalive timeout, 6 keepalive probe, 0 Connections dropped in keepalive
> >
> >
> > SWT2#sh tcp stat
> > Rcvd: 6370948 Total, 37608 no port
> >       28419 checksum error, 1749 bad offset, 0 too short
> >       181714 packets (3507560 bytes) in sequence
> >       130 dup packets (4925 bytes)
> >       164 partially dup packets (302 bytes)
> >       35 out-of-order packets (278 bytes)
> >       64 packets (4549 bytes) with data after window
> >       0 packets after close
> >       0 window probe packets, 2006 window update packets
> >       3865 dup ack packets, 0 ack packets with unsend data
> >       857922 ack packets (92503306 bytes)
> > Sent: 3793587 Total, 0 urgent packets
> >       2667720 control packets (including 215 retransmitted)
> >       1042867 data packets (92804839 bytes)
> >       1772 data packets (74842 bytes) retransmitted
> >       192 data packets (7176 bytes) fastretransmitted
> >       81025 ack only packets (15840 delayed)
> >       0 window probe packets, 19 window update packets
> > 28735 Connections initiated, 5928 connections accepted, 34638 connections established
> > 2626485 Connections closed (including 1349 dropped, 2591827 embryonic dropped)
> > 1987 Total rxmt timeout, 4 connections dropped in rxmt timeout
> > 0 Keepalive timeout, 0 keepalive probe, 0 Connections dropped in keepalive
> >
> > Lloyd Wood
> > http://about.me/lloydwood
> >
> > error-free modern networking technology? ha.
>
> _______________________________________________
> routing-discussion mailing list
> routing-discussion@ietf.org
> https://www.ietf.org/mailman/listinfo/routing-discussion
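Anyone repeating this "old school" measurement on their own boxes needs the same three counters pulled out of the show output. The Python sketch below is a hypothetical helper, not an existing tool: its regular expressions match only the `Rcvd:` and `checksum error` lines as quoted in this thread, other IOS releases may format them differently, and Eric's Router #2 ("discarded for bad checksums") uses a different format that it deliberately rejects.

```python
import re

# Hypothetical patterns, matching only the counter lines quoted in this thread.
RCVD_RE = re.compile(r"Rcvd:\s+(\d+)\s+Total,\s+(\d+)\s+no port")
CKSUM_RE = re.compile(r"(\d+)\s+checksum error")


def parse_tcp_stat(text: str) -> dict[str, int]:
    """Extract the received-total, no-port and checksum-error counters."""
    rcvd = RCVD_RE.search(text)
    cksum = CKSUM_RE.search(text)
    if rcvd is None or cksum is None:
        raise ValueError("missing 'Rcvd:' or 'checksum error' counters")
    return {
        "total": int(rcvd.group(1)),
        "no_port": int(rcvd.group(2)),
        "checksum_error": int(cksum.group(1)),
    }


# First lines of Lloyd's SWT2 output, as quoted above.
SWT2_OUTPUT = """\
SWT2#sh tcp stat
Rcvd: 6370948 Total, 37608 no port
      28419 checksum error, 1749 bad offset, 0 too short
      181714 packets (3507560 bytes) in sequence
"""

stats = parse_tcp_stat(SWT2_OUTPUT)
rate = stats["checksum_error"] / stats["total"]
print(f"{stats['checksum_error']} / {stats['total']} = {rate:.2%}")
```

Raising an error on unrecognised output, rather than returning zeros, avoids silently reporting a 0% error rate for a router whose output format differs.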