Re: [Idr] I-D Action: draft-ietf-idr-error-handling-03.txt

Jeff Wheeler <jsw@inconcepts.biz> Mon, 10 December 2012 10:40 UTC

On Mon, Dec 10, 2012 at 4:50 AM,  <bruno.decraene@orange.com> wrote:
>>http://mailman.nanog.org/pipermail/nanog/2012-November/053754.html
>
> In that specific case, it was not a BGP protocol error, but a bug in the router receiving the UPDATE. i.e. draft-ietf-idr-error-handling would not change this.
> However, this is an example that error in flags, including in well known attributes, may happen.

It is also an example of a case where more choices for fault handling
would have significantly reduced the operational impact on networks
that had the buggy routers.  If my customers who had OpenBGPd could
simply configure their router with something like "bgp fault-tolerance
attribute-malformed treat-as-withdraw", they could have got themselves
back online.
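
To make the idea concrete, here is a minimal sketch of operator-selected
fault handling.  Everything in it is hypothetical: the fault-class string,
the Action names, and handle_fault() are illustrations only, not the CLI
or API of OpenBGPd or any other implementation.  The RIB is assumed to be
a plain dict keyed by prefix.

```python
from enum import Enum


class Action(Enum):
    SESSION_RESET = "session-reset"          # classic behaviour: tear the session down
    TREAT_AS_WITHDRAW = "treat-as-withdraw"  # drop only the affected routes


# Assumption: with no operator override, the router resets the session.
DEFAULT_ACTION = Action.SESSION_RESET


def handle_fault(policy, fault, prefixes, rib):
    """Apply the operator's chosen action for one faulty UPDATE.

    policy   -- dict mapping a fault-class name to an Action (hypothetical)
    fault    -- fault-class name, e.g. "attribute-malformed"
    prefixes -- prefixes carried by the UPDATE, or None if they could not
                be determined
    rib      -- dict of prefix -> route, modified in place
    """
    action = policy.get(fault, DEFAULT_ACTION)
    if action is Action.TREAT_AS_WITHDRAW and prefixes is not None:
        for prefix in prefixes:
            rib.pop(prefix, None)  # withdraw only what the UPDATE named
        return Action.TREAT_AS_WITHDRAW
    # Either the operator did not opt in, or the prefixes are unknown, in
    # which case this sketch has nothing safe to withdraw.
    return Action.SESSION_RESET


rib = {"192.0.2.0/24": "peerA", "198.51.100.0/24": "peerA"}
policy = {"attribute-malformed": Action.TREAT_AS_WITHDRAW}
result = handle_fault(policy, "attribute-malformed", ["192.0.2.0/24"], rib)
print(result.value, sorted(rib))
```

The point of the sketch is only that the choice sits in the operator's
policy table rather than being hard-coded; what a router should do when
the prefixes are unknown is left open here.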

Because this is not allowed by the current BGP protocol, vendors have
a disincentive to provide such flexibility, even though it could
sometimes help customers.  I understand why vendors do not want to
give their customers "too much rope" but as the uses for BGP continue
to expand, and more new code and new vendors continue to use it within
the network, vendors are likely to find that customers demand more
choices for handling malfunctions.

> And in general, I agree with Jeff & Rob: BGP is mission critical to networks. Shutting it down for a wrong flag/bit should not be the first option.

I would like to show a detailed example of how a mistake that is not in
a TYPE or LENGTH field can make it impossible for the router to know
what prefixes are in an UPDATE.  Chris's post focused on "framing
errors" but these are unfortunately not the only kind of errors that
are serious.

Imagine if a route is originated with MPLS labels in the reachability,
and the bottom-of-stack bit is not set on any of the labels.  Do you
know what happens?  Receiving routers are not only unable to decide
what to do with this update, they can't even figure out what prefixes
the update is for, because the bottom-of-stack bit indicates to the
receiver where the prefixes actually begin.  In effect, the label data
is part of the framing.
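
The following is a rough sketch of a receiver walking one labeled NLRI
entry as laid out in RFC 3107: a one-octet length in bits, then 3-octet
label entries (20-bit label in the high bits, bottom-of-stack flag in the
lowest bit), then the prefix.  The function names are made up for
illustration, and the parser handles a single entry only.  It shows that
without a bottom-of-stack bit the parser either runs off the end of the
entry or mistakes prefix octets for a label.

```python
def encode_labeled_nlri(labels, prefix_bytes, prefix_len, set_bos=True):
    """Build one NLRI entry; set_bos=False reproduces the faulty sender."""
    body = b""
    for i, label in enumerate(labels):
        last = i == len(labels) - 1
        entry = (label << 4) | (1 if (last and set_bos) else 0)
        body += entry.to_bytes(3, "big")
    total_bits = 24 * len(labels) + prefix_len
    return bytes([total_bits]) + body + prefix_bytes[: (prefix_len + 7) // 8]


def parse_labeled_nlri(data):
    """Return (labels, prefix_bytes, prefix_len_bits) for one NLRI entry."""
    remaining_bits = data[0]
    pos = 1
    labels = []
    while True:
        if remaining_bits < 24 or pos + 3 > len(data):
            raise ValueError("no bottom-of-stack bit before end of NLRI")
        entry = int.from_bytes(data[pos:pos + 3], "big")
        pos += 3
        remaining_bits -= 24
        labels.append(entry >> 4)  # high 20 bits are the label value
        if entry & 1:              # lowest bit is bottom-of-stack
            break
    nbytes = (remaining_bits + 7) // 8
    if pos + nbytes > len(data):
        raise ValueError("truncated prefix")
    return labels, data[pos:pos + nbytes], remaining_bits


good = encode_labeled_nlri([100], bytes([10, 1, 2]), 24)
print(parse_labeled_nlri(good))

# Same route, but the sender never sets bottom-of-stack.
bad = encode_labeled_nlri([100], bytes([10, 1, 2]), 24, set_bos=False)
try:
    parse_labeled_nlri(bad)
except ValueError as exc:
    print("unparseable:", exc)

# Worse: the prefix 10.1.3/24 ends in an odd octet, so its last bit looks
# like a bottom-of-stack flag and the prefix is swallowed as a label.
worse = encode_labeled_nlri([100], bytes([10, 1, 3]), 24, set_bos=False)
print(parse_labeled_nlri(worse))
```

In the last case the parser reports two labels and a zero-length prefix,
which is exactly the kind of unsafe guess a receiver should not act on.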

I have made notes and illustrations on RFC3107 Pg3:
http://inconcepts.biz/~jsw/img/1120824-rfc3107pg3.jpg
Sorry this is hand-drawn and scanned into the computer but it is legible.

What should your router do if it can't even figure out what prefixes
are being updated?  It can't "treat as withdraw" but some option
better than close the session should be permissible.

I really think a wholesale modification to the BGP protocol will
eventually be necessary because of all the new uses for BGP.  Yes,
MP-BGP is allowing it to do new things, but not with the kind of fault
tolerance that customers should expect.  Improving the choices for
handling faults is clearly a good effort.  However, it is unfortunate
that there are so many kinds of bugs which require the router to make
potentially unsafe guesses.

Until there is a BGPv5 with an improved approach to length encoding
and more resilience against faulty implementation on one device
cascading to others in the network, we will continue to have
questionable solutions to bad problems.  I think it's important to
realize this, so that myopia about BGPv4's limits does not discourage
anyone from wanting to make it as robust as possible if the operator
decides to configure their router using vendor-supplied robustness
options.

-- 
Jeff S Wheeler <jsw@inconcepts.biz>
Sr Network Operator  /  Innovative Network Concepts