Re: routing area design team on dataplane encapsulation considerations

<l.wood@surrey.ac.uk> Wed, 10 December 2014 13:25 UTC

From: l.wood@surrey.ac.uk
To: akatlas@gmail.com, routing-discussion@ietf.org, stbryant@cisco.com
Subject: Re: routing area design team on dataplane encapsulation considerations
Date: Wed, 10 Dec 2014 13:24:52 +0000
Message-ID: <1418217891573.89120@surrey.ac.uk>
References: <CAG4d1rd60hK8=WtYw-nid_Z7Z8+TvdzA52fNx3pFjND+eDWAfA@mail.gmail.com>, <54877D58.9050002@cisco.com> <DB4PR06MB457F278EAF9C84BCA20E665AD620@DB4PR06MB457.eurprd06.prod.outlook.com>, <54883947.8000302@cisco.com>
In-Reply-To: <54883947.8000302@cisco.com>
Archived-At: http://mailarchive.ietf.org/arch/msg/routing-discussion/oKt2ElDp1OuQuAhWEvvVHMCP248

Stewart,

Who needs an NMS? Let's go old school.
TCP and UDP use equivalent ones' complement checksums over pseudo-header plus payload. So data from TCP can be considered roughly representative, as the technology is the same, and TCP is somewhat better instrumented and monitored than UDP is.
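
For anyone who wants to poke at this, here is a minimal Python sketch of that shared checksum, assuming an IPv4 pseudo-header and UDP. The helper names, documentation-range addresses and port numbers are my own illustrative choices, not anything taken from the switches. It builds a datagram, flips one bit in the destination port, and shows that verification then fails.

```python
import socket
import struct

UDP_PROTO = 17  # IP protocol number for UDP (TCP is 6)

def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum; odd-length data is padded with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total

def pseudo_header(src_ip: bytes, dst_ip: bytes, proto: int, length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero byte, protocol, length."""
    return src_ip + dst_ip + struct.pack("!BBH", 0, proto, length)

def build_udp(src_ip: bytes, dst_ip: bytes, sport: int, dport: int, payload: bytes) -> bytes:
    """Build a UDP datagram with a non-zero checksum filled in."""
    length = 8 + len(payload)
    unsummed = struct.pack("!HHHH", sport, dport, length, 0) + payload
    covered = pseudo_header(src_ip, dst_ip, UDP_PROTO, length) + unsummed
    csum = ~ones_complement_sum(covered) & 0xFFFF
    if csum == 0:
        csum = 0xFFFF  # in UDP, a transmitted zero means "no checksum"
    return struct.pack("!HHHH", sport, dport, length, csum) + payload

def verify(src_ip: bytes, dst_ip: bytes, proto: int, segment: bytes) -> bool:
    """A valid segment sums to 0xFFFF once its checksum field is included."""
    covered = pseudo_header(src_ip, dst_ip, proto, len(segment)) + segment
    return ones_complement_sum(covered) == 0xFFFF

# Illustrative values only: documentation addresses and arbitrary ports.
SRC = socket.inet_aton("192.0.2.1")
DST = socket.inet_aton("198.51.100.2")
datagram = build_udp(SRC, DST, 49152, 50000, b"hello")

corrupted = bytearray(datagram)
corrupted[3] ^= 0x01  # flip the low bit of the destination port

print(verify(SRC, DST, UDP_PROTO, datagram))
print(verify(SRC, DST, UDP_PROTO, bytes(corrupted)))
```

The TCP case differs only in the protocol number and the header layout; the pseudo-header and the ones' complement arithmetic are the same.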

The sample below from two core switches, for TCP connections to them (not, alas, through them), shows e.g. 10287 TCP checksum errors for 4.3 million TCP/IP packets received. That is an error rate of 0.24%; the other switch is at 0.45%. Note the figure of e.g. 19458 no port: it is entirely possible that something inside the core network is trying a port that isn't there, but corruption of port numbers is also possible. These are firewalled core devices with limited traffic to them, so I would consider this a best case, without throwing edge device data into the mix.
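
The percentages can be reproduced from the "Rcvd:" counters in the sh tcp stat output quoted further down in this message. A small Python sketch, with the counters copied by hand and the dictionary layout and function name being my own assumptions:

```python
# Counters copied from the "Rcvd:" lines of the two sh tcp stat outputs.
counters = {
    "SWT1": {"total": 4332545, "checksum_error": 10287, "no_port": 19458},
    "SWT2": {"total": 6370948, "checksum_error": 28419, "no_port": 37608},
}

def rate_percent(count: int, total: int) -> float:
    """Share of received packets, as a percentage."""
    return 100.0 * count / total

for name, c in counters.items():
    checksum_rate = rate_percent(c["checksum_error"], c["total"])
    no_port_rate = rate_percent(c["no_port"], c["total"])
    one_in = c["total"] / c["checksum_error"]
    print(f"{name}: checksum errors {checksum_rate:.2f}%, "
          f"no port {no_port_rate:.2f}%, roughly 1 in {one_in:.0f} packets")
```

This only counts packets addressed to the switches themselves, so it says nothing directly about transit traffic.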

Extrapolating from TCP checksum rates to UDP checksum rates, from traffic to the switches to traffic through them, and from v4 to v6 is left as an exercise for the reader.

Incidentally, these rates pretty much match the 1 in 400 observation made early in Stone's SIGCOMM 2000 paper. 1:400 is 10,000:4,000,000
http://conferences.sigcomm.org/sigcomm/2000/conf/paper/sigcomm2000-9-1.pdf
Zut alors, quelle surprise, physics is still physics, this particular SIGCOMM paper is still valid.
Checksums fail because of corruption. But at least there's a checksum to catch the corruption. Sans checksums, the corruption is unnoticed. 

The main reason no one has seen misbehaviour with MPLS is that no one is looking for it or instrumenting for it. (Just like MPLS doesn't need TTL, because no one ever sees MPLS routing loops, right?) Please go measure MPLS and report back.

Traversing over UDP, as suggested in draft-ietf-mpls-in-udp and draft-ietf-tsvwg-gre-in-udp-encap, increases the scope for corruption across a longer path, rather than individual links - and the UDP checksum is an important last check.

SWT1#sh tcp stat
Rcvd: 4332545 Total, 19458 no port
      10287 checksum error, 4959 bad offset, 0 too short
      62946 packets (2289083 bytes) in sequence
      93 dup packets (19167 bytes)
      34 partially dup packets (1878 bytes)
      107 out-of-order packets (17980 bytes)
      8 packets (8 bytes) with data after window
      0 packets after close
      0 window probe packets, 614 window update packets
      682 dup ack packets, 0 ack packets with unsend data
      179823 ack packets (37459618 bytes)
Sent: 2049509 Total, 0 urgent packets
      1774234 control packets (including 345 retransmitted)
      249103 data packets (37439059 bytes)
      308 data packets (41767 bytes) retransmitted
      46 data packets (4876 bytes) fastretransmitted
      25718 ack only packets (3764 delayed)
      0 window probe packets, 104 window update packets
9649 Connections initiated, 1230 connections accepted, 10857 connections established
1760675 Connections closed (including 30 dropped, 1749812 embryonic dropped)
653 Total rxmt timeout, 9 connections dropped in rxmt timeout
0 Keepalive timeout, 6 keepalive probe, 0 Connections dropped in keepalive


SWT2#sh tcp stat
Rcvd: 6370948 Total, 37608 no port
      28419 checksum error, 1749 bad offset, 0 too short
      181714 packets (3507560 bytes) in sequence
      130 dup packets (4925 bytes)
      164 partially dup packets (302 bytes)
      35 out-of-order packets (278 bytes)
      64 packets (4549 bytes) with data after window
      0 packets after close
      0 window probe packets, 2006 window update packets
      3865 dup ack packets, 0 ack packets with unsend data
      857922 ack packets (92503306 bytes)
Sent: 3793587 Total, 0 urgent packets
      2667720 control packets (including 215 retransmitted)
      1042867 data packets (92804839 bytes)
      1772 data packets (74842 bytes) retransmitted
      192 data packets (7176 bytes) fastretransmitted
      81025 ack only packets (15840 delayed)
      0 window probe packets, 19 window update packets
28735 Connections initiated, 5928 connections accepted, 34638 connections established
2626485 Connections closed (including 1349 dropped, 2591827 embryonic dropped)
1987 Total rxmt timeout, 4 connections dropped in rxmt timeout
0 Keepalive timeout, 0 keepalive probe, 0 Connections dropped in keepalive

Lloyd Wood
http://about.me/lloydwood

error-free modern networking technology? ha.
________________________________________
From: Stewart Bryant <stbryant@cisco.com>
Sent: Wednesday, 10 December 2014 11:15 PM
To: Wood L  Dr (Electronic Eng); akatlas@gmail.com; routing-discussion@ietf.org
Subject: Re: routing area design team on dataplane encapsulation considerations

Lloyd

Let's have a current data driven approach to this. You are in a position to
look inside an NMS and tell us what modern data error rates are.

A counter existence proof is MPLS networking which acts at the same
layer that we are discussing and does not seem to have a misdelivery
or "not in table drop rate" that is getting on anyone's radar. Certainly
no one is asking us to upgrade MPLS to add this protection.

- Stewart

On 10/12/2014 11:37, l.wood@surrey.ac.uk wrote:
> It's an issue with zero checksums, i.e. not having a functional working checksum. Non-zero checksums are not an issue.
>
> It's a bit like 60s car manufacturers saying 'do we really need to add seatbelts? They're expensive.' Seatbelts are clearly an issue! Like creationism and evolution, it's not clearcut. We should teach the controversy!
>
> Lloyd Wood
> http://about.me/lloydwood
> ________________________________________
> From: routing-discussion <routing-discussion-bounces@ietf.org> on behalf of Stewart Bryant <stbryant@cisco.com>
> Sent: Wednesday, 10 December 2014 9:53:12 AM
> To: Alia Atlas; routing-discussion@ietf.org
> Subject: Re: routing area design team on dataplane encapsulation considerations
>
> Alia
>
> On 09/12/2014 22:46, Alia Atlas wrote:
>> * IPv6 header protection (non-zero UDP checksum over IPv6 issue)
> I am not sure if it is the non-zero UDP checksum over IPv6 issue, or
> the zero UDP checksum over IPv6 issue.
>
> Most people doing tunneling seem quite happy with zero but get pushback
> from the transport area.
>
> Perhaps the topic is really
>
> * IPv6 header protection (UDP checksum issue)
>
> - Stewart
>

