Re: [Tofoo] VXLAN (UDP tunnel protocols) and non-zero checksums

Tom Herbert <therbert@google.com> Fri, 02 May 2014 16:11 UTC

In-Reply-To: <53637052.9050405@cisco.com>
References: <CA+mtBx8+OyN5UUsL-sS1AuPF69p6=T3kw4Mq-BogjQhEF-Cpsw@mail.gmail.com> <CF86DC33.F39B6%kreeger@cisco.com> <CA+mtBx9E=NopE=Evm1u7air4_R_eCUM6WvaOW+mw7m6LDGemDw@mail.gmail.com> <CF86F645.F3CBB%kreeger@cisco.com> <CA+mtBx8fwd8O47PvYqaBn6MFuQ6DYbYKrvfQs5CLO8M+WSxarw@mail.gmail.com> <53637052.9050405@cisco.com>
Date: Fri, 02 May 2014 09:10:58 -0700
Message-ID: <CA+mtBx_kkh8VBrBC_hLa6=VyMbzmq3SoJN-9NeZ=-d0gh1Ltog@mail.gmail.com>
From: Tom Herbert <therbert@google.com>
To: stbryant@cisco.com
Content-Type: multipart/alternative; boundary="047d7bd74a2cfb49cb04f86d065b"
Archived-At: http://mailarchive.ietf.org/arch/msg/tofoo/dVkUYVkTnjuE1UphqWV_K8TW1Bk
Cc: "tofoo@ietf.org" <tofoo@ietf.org>, "nvo3@ietf.org" <nvo3@ietf.org>, "Larry Kreeger (kreeger)" <kreeger@cisco.com>, "mallik_mahalingam@yahoo.com" <mallik_mahalingam@yahoo.com>, "ddutt.ietf@hobbesdutt.com" <ddutt.ietf@hobbesdutt.com>
Subject: Re: [Tofoo] VXLAN (UDP tunnel protocols) and non-zero checksums
List-Id: "Discussion list for Tunneling over Foo \(with\)in IP networks \(TOFOO\)." <tofoo.ietf.org>
List-Archive: <http://www.ietf.org/mail-archive/web/tofoo/>

On Fri, May 2, 2014 at 3:15 AM, Stewart Bryant <stbryant@cisco.com> wrote:

>  On 01/05/2014 04:14, Tom Herbert wrote:
>
>
>
>> Assuming we are using Ethernet (I don't believe this can be a
>> requirement either) this only provides hop-to-hop protection, not
>> end-to-end. I don't have a completely error-free network, and checksum
>> errors, while low, are non-zero.
>
>
> Interesting, for years I have been looking for someone to put forward
> some statistics on the error rate in a modern network, to get some
> hard data on the UDP c/s issue. What error rate are you seeing?
>

We see well under 100 errors per day throughout our network. This is
internal traffic only; rates on traffic from the Internet are much higher,
as one would expect. I haven't done all the math, but this may be in line
with the nominal error rate in the network and the probability of an
undetected error getting through the MAC CRC.
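
A rough way to do that math, as a sketch only: assume bit errors are
independent, and assume a corrupted frame slips past a 32-bit CRC with
probability about 2^-32 (a common approximation for random corruption, not
a measured property of any device). The function name, the default frame
size, and the inputs are all hypothetical; no traffic volumes or bit error
rates from the network described above are implied.

```python
import math


def expected_crc_escapes(frames_per_day, bit_error_rate,
                         frame_bits=12000, crc_bits=32):
    """Estimate corrupted frames per day that pass the link-layer CRC.

    Assumptions (hypothetical model, not measured data):
      - bit errors are independent with probability bit_error_rate
      - a corrupted frame passes the CRC with probability 2**-crc_bits
    """
    # P(at least one bit error in a frame), computed stably for tiny rates:
    # 1 - (1 - ber)**frame_bits
    p_corrupt = -math.expm1(frame_bits * math.log1p(-bit_error_rate))
    return frames_per_day * p_corrupt * 2.0 ** -crc_bits
```

This models only corruption on the wire. Corruption introduced inside a
device, after one hop's CRC has been checked and before the next hop's CRC
is generated, is not covered by the MAC CRC at all, which is the
hop-to-hop limitation mentioned earlier.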

My bigger concern is the effect when hardware (or software) fails. Most
hardware failures are fairly noticeable and don't corrupt packets; the
usual effects are packet loss or increased latency. We have seen cases,
though, where hardware fails by corrupting packet data and/or not doing
validation correctly. In the worst case, we have seen undetected data
corruption make it all the way to the application. Failures like this are
still very rare, but when they occur they may be difficult to debug, and
even identifying the failure may take a long time.

To mitigate the risk of undetected data corruption in the network,
applications commonly perform a separate end-to-end CRC over sensitive
data; this is a component of an RPC layer, for instance. I think we'd like
the probability of undetected data corruption to be no more than 2^-64
(I believe the VNI in VXLAN qualifies as sensitive data).
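
As a minimal sketch of the end-to-end check described above (not any
particular RPC layer's wire format), an application can append a CRC-32
trailer when sending and verify it on receipt. The names seal and unseal
and the 4-byte big-endian trailer layout are assumptions made for
illustration.

```python
import struct
import zlib


def seal(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload (hypothetical framing)."""
    return payload + struct.pack("!I", zlib.crc32(payload) & 0xFFFFFFFF)


def unseal(message: bytes) -> bytes:
    """Verify the CRC-32 trailer and return the payload, or raise ValueError."""
    if len(message) < 4:
        raise ValueError("message too short to carry a CRC trailer")
    payload, trailer = message[:-4], message[-4:]
    (expected,) = struct.unpack("!I", trailer)
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        raise ValueError("end-to-end CRC mismatch")
    return payload
```

Because the check is computed by the sender and verified by the receiver,
it covers every hop, NIC, and host stack in between. A 32-bit CRC on its
own leaves a residual of roughly 2^-32 under a random-corruption model, so
reaching a 2^-64 target would take a wider check or several independent
ones.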

Note that neither a checksum nor a CRC provides authentication, which is
emerging as another requirement for intra-data-center traffic, and
encryption as a requirement won't be far behind.

> Also can you help me understand what class of switch/router you are
> seeing this on: h/w forwarding, s/w forwarding, host based s/w
> forwarding... If you know whether the packet memory has any form of error
> detection/protection that would also be interesting.
>

We haven't looked into it at that level; any of those would be suspect,
including the NIC and host software. The normal errors seem fairly random,
and since they're being detected at the RX host, it isn't obvious how to
determine where in the network the corruption actually occurred.

There are probably both devices that have memory protection and legacy ones
that don't.

Tom


> Thanks
>
> Stewart