Re: [Idr] I-D Action: draft-ietf-idr-error-handling-03.txt

Jeff Wheeler <jsw@inconcepts.biz> Mon, 10 December 2012 23:16 UTC

Date: Mon, 10 Dec 2012 18:16:04 -0500
Message-ID: <CAPWAtbJy3icE6SrcDaHm3R9M79VeNAaC5kg_jfarjc7PBLA6Wg@mail.gmail.com>
From: Jeff Wheeler <jsw@inconcepts.biz>
To: Brian Dickson <brian.peter.dickson@gmail.com>
Cc: idr@ietf.org
Subject: Re: [Idr] I-D Action: draft-ietf-idr-error-handling-03.txt

On Mon, Dec 10, 2012 at 3:25 PM, Brian Dickson
<brian.peter.dickson@gmail.com> wrote:
> It is unfortunate that UPDATE messages do not have any sort of identifier.

My knee-jerk reaction is that I am not very excited about your idea, but it is
useful to point out that LDP has message identifiers.  LDP needs to
keep more state around about pending LSP setup to support ordered
mode, merge, and downstream on demand, but there are also useful
diagnostic messages that will include the original message ID when
various fault conditions are observed.

> It would be nice to be able to have the receiver send a NOTIFICATION that
> says, in effect, UPDATE #foo was garbled, please send JUST the afi/safi &
> NLRI so I know what to ignore.

If this happens, the NOTIFICATION should be answered not with an
ordinary UPDATE but with either a new Message Type or else an UPDATE
carrying a very unambiguous field, one that is sure not to be
confused or corrupted, signaling that this is a retry of a previous
UPDATE and that, if it can't be parsed, the receiver should not send
that same NOTIFICATION again.
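
A minimal sketch of the receiver-side rule described above, assuming
UPDATEs carry an identifier and a retry marker as proposed in this
thread. Neither exists in BGP today; the names RetryGuard,
on_unparseable, and is_retry are hypothetical and only illustrate how
the re-send request can be limited to one attempt per UPDATE.

```python
from dataclasses import dataclass, field


@dataclass
class RetryGuard:
    """Receiver-side bookkeeping: which UPDATE IDs have we already asked about?

    Hypothetical: assumes each UPDATE has an ID and a retry marker.
    """
    requested: set = field(default_factory=set)

    def on_unparseable(self, update_id: int, is_retry: bool) -> str:
        # A failed retry, or a second failure for an ID we already asked
        # about, must not produce another re-send request (no loop).
        if is_retry or update_id in self.requested:
            self.requested.discard(update_id)
            return "give_up"
        self.requested.add(update_id)
        return "request_resend"

    def on_parsed(self, update_id: int) -> None:
        # The retry (or original) parsed fine; forget the pending request.
        self.requested.discard(update_id)
```

Keeping the set of already-requested IDs means the loop is broken even
if the retry marker itself arrives damaged.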

But if your ultimate aim is to ask the peer for the information
you need to effect a WITHDRAW, then you could just invent a way to
demand the peer send you that WITHDRAW.  The cool thing about that is,
then the peer can actually have knowledge of what route is broken.  It
could save that information to support investigation of the problem on
both sides of what may be an eBGP session.  At minimum, it could mark
the RIB entry so it knows that route is not advertised to you, and
this could be available on the CLI.

Now, I do not think that anyone will jump to implement your idea, and
my knee-jerk reaction was "oh no!", but really, your idea has serious
technical advantages.  Of course you observe that it has some
disadvantages too (more state on the sender side).  If one were inventing
"BGP 5" surely this ability would be sensible.  Maybe it is even a
sensible Capability for BGP today.  After all, the vendor can always
allow the operator to deactivate the Capability if they wish.

> Maybe this could be handled via another UPDATE Attribute - the
> identification for an UPDATE itself?
> Then refer to this in the appropriate NOTIFICATION, type 3, new subtype,
> with parameter identifying the problem UPDATE?

That works as long as the sender doesn't respond with another UPDATE
that gets confused and results in another NOTIFICATION, in a loop.
But that is fixable, as I describe above.

> Then there is the question of the original UPDATE (generally on-the-fly
> data). If the error happens soon enough, maybe the sender still has the
> UPDATE in his/her outgoing TCP buffer?

The sender won't have the UPDATE in his TCP buffer because it is
basically guaranteed the receiver will already have ACKed that segment,
either before or when sending the NOTIFICATION, unless some deep TCP
stack tweaking is done to delay the ACK.  Since the segment is ACKed,
it will be freed on the sender side, as there is no reason to
keep it around.  Also, extracting the *malformed* UPDATE NLRI out of
the TCP send buffer to then try to re-send it sounds like a good way
for both the sender and receiver to get confused.

But even if your specific notion of reaching into TCP is bad, your
concept is good; fooling around with TCP just isn't a very good way
to implement it.  The vendors can of course choose to implement it
however they please.

> Otherwise, this would mean having to hold onto UPDATEs for possibly longer
> than desired.

I guess there is no free lunch, but what costs does your idea have
other than spending some RAM to be capable of re-sending the NLRI from
a recent UPDATE, and a little bit of CPU to manage the structure
storing these recent UPDATEs?

> And of course, this pushes the "ACK" further up the stack, from TCP to the
> parent application (BGP). Again, maybe not that bad an idea, but definitely
> non-trivial.

Either it provides a fault tolerance mechanism at the cost of some
RAM+CPU, with the sender evicting the information needed to re-send
NLRI after some timeout or memory limit has been reached, or else BGP
advertisements become less of a simple advertisement and a little
closer to a request/response, except that the response isn't anything
more than "understood," unlike LDP, where responses are requests for
LSP setup, etc.
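
To make the RAM+CPU cost concrete, here is a rough sketch of the
sender-side structure, assuming UPDATEs are numbered as proposed in
this thread. The class name RecentUpdateCache, the limits, and the
stored fields are all assumptions for illustration; a real
implementation would store whatever its RIB makes cheap.

```python
import time
from collections import OrderedDict


class RecentUpdateCache:
    """Keeps AFI/SAFI and NLRI of recently sent UPDATEs for possible re-send.

    Hypothetical sketch: entries are evicted by age or by entry count.
    """

    def __init__(self, max_entries=1024, max_age=30.0, clock=time.monotonic):
        self._entries = OrderedDict()  # update_id -> (sent_at, afi, safi, nlri)
        self._max_entries = max_entries
        self._max_age = max_age
        self._clock = clock

    def remember(self, update_id, afi, safi, nlri):
        self._evict_expired()
        self._entries.pop(update_id, None)  # re-used ID moves to the newest slot
        self._entries[update_id] = (self._clock(), afi, safi, tuple(nlri))
        # Memory limit: drop the oldest entries first.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def lookup(self, update_id):
        """Return (afi, safi, nlri), or None if the entry was evicted."""
        self._evict_expired()
        entry = self._entries.get(update_id)
        if entry is None:
            return None
        _, afi, safi, nlri = entry
        return afi, safi, nlri

    def _evict_expired(self):
        # Entries are in send order, so expired ones are always at the front.
        now = self._clock()
        while self._entries:
            sent_at = next(iter(self._entries.values()))[0]
            if now - sent_at <= self._max_age:
                break
            self._entries.popitem(last=False)
```

If lookup returns None, the sender simply cannot honor the request, and
the receiver falls back to whatever it would have done without this
mechanism.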

Anyway, I think your idea is clever, but has about zero chance of
being liked by most people, because it is not that easy to implement
and the frequency of need for it to actually mitigate or prevent an
outage is extremely low.  If I were inventing "BGP 5" I would not
include the requirement for a sender to be able to re-transmit
information from an already-transmitted Message because this could
really be difficult on route-reflectors after major events, certainly
on ASBRs at IXPs or with many eBGP customers receiving DFZ, IXP
route-servers, and so on.  Instead I would just invent BGP 5 to have a
more robust Message structure and be done with silly errors that are
difficult to recover from.  By the time you go to the effort to
program your feature, you might as well just program a superior
Message structure and not have any CPU/RAM expense for
already-transmitted data.

Just my $0.02.
-- 
Jeff S Wheeler <jsw@inconcepts.biz>
Sr Network Operator  /  Innovative Network Concepts