Re: [Idr] [spring] Error Handling for BGP-LS with Segment Routing

Rob Shakir <robjs@google.com> Fri, 21 December 2018 16:23 UTC

From: Rob Shakir <robjs@google.com>
Date: Fri, 21 Dec 2018 08:23:04 -0800
Message-ID: <CAHd-QWu8RjwnwJ8LXWpjTmY=VHA4PwZt=uP+H5M4AnKQVBeG7w@mail.gmail.com>
To: Alvaro Retana <aretana.ietf@gmail.com>
Cc: Robert Raszuk <rraszuk@gmail.com>, "idr@ietf.org" <idr@ietf.org>, SPRING WG <spring@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/AxYI0vgp5e8bFddFDwI3xhq0bNg>
Subject: Re: [Idr] [spring] Error Handling for BGP-LS with Segment Routing

Alvaro,

I think this is one of the difficulties of overloading a protocol like BGP
with different datasets -- it's not simple to say how particular attributes
are actually going to be used within a protocol deployment. This was one of
the things noted in RFC 7606 -- i.e., I can make *any* attribute affect
forwarding if I write a policy that accepts or rejects an UPDATE based on
the presence of that attribute.
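To make the RFC 7606 point concrete, here is a minimal Python sketch, assuming a hypothetical import policy that accepts a route only if it carries an attribute named "example-attr". The attribute name, the Route type and the helper functions are illustrative only, not taken from any BGP implementation. Once attribute discard strips the malformed attribute, the policy rejects the route, so an attribute with no forwarding semantics of its own ends up changing forwarding.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Route:
    prefix: str
    attributes: Dict[str, object] = field(default_factory=dict)


def import_policy(route: Route) -> bool:
    # Hypothetical policy: accept only routes carrying "example-attr".
    return "example-attr" in route.attributes


def attribute_discard(route: Route, malformed: str) -> Route:
    # RFC 7606 attribute discard: drop the malformed attribute, keep the route.
    kept = {k: v for k, v in route.attributes.items() if k != malformed}
    return Route(route.prefix, kept)


def receive(rib: Dict[str, Route], route: Route) -> None:
    # Install the route if policy accepts it; otherwise remove any earlier copy
    # (a new UPDATE for the same prefix replaces the previous one).
    if import_policy(route):
        rib[route.prefix] = route
    else:
        rib.pop(route.prefix, None)


rib: Dict[str, Route] = {}
receive(rib, Route("192.0.2.0/24", {"example-attr": 1}))
accepted_before = "192.0.2.0/24" in rib  # True

# The same prefix arrives again, but "example-attr" is malformed and discarded.
bad = Route("192.0.2.0/24", {"example-attr": b"\xff"})
receive(rib, attribute_discard(bad, "example-attr"))
accepted_after = "192.0.2.0/24" in rib  # False: policy now rejects the route
```

The policy is deliberately trivial; any policy keyed on the presence of an attribute behaves the same way once that attribute can be discarded.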

In general, any topology discovery mechanism (whether used in real time or
not) needs to define how it handles cases where it might end up with
missing information. Let's consider the different discovery mechanisms we
have today:

   - IGP listening -- in this case, if we have some malformed IS-IS TLV,
   then we might end up discarding this information (whether at the
   listening node, or at a device earlier in the chain that didn't flood it),
   meaning that we know there is a potential gap in the topology.
   - Streaming telemetry -- speaking particularly to gNMI for LSDB
   streaming encoded using the OpenConfig model: here, we tolerate partial
   input, taking as much information as can be parsed, and we have a way to
   carry unknown TLVs (which might include those that cannot be successfully
   parsed) as binary data to the external consumer. The approach is therefore
   "as complete data as possible", but it shares the same characteristic: we
   can still end up losing data.
   - BGP-LS with attribute discard -- this has some information loss, since
   some attributes in the input data may be malformed, and we discard them
   at the receiver.
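A rough sketch of the "as complete data as possible" behaviour from the streaming telemetry case, assuming IS-IS style framing (1-octet type, 1-octet length) and a hypothetical decoder table; this is not the OpenConfig/gNMI encoding itself. Unknown or undecodable TLVs are passed through as opaque bytes, but once a length field overruns the buffer the framing is lost and the remaining bytes cannot be attributed to any TLV, which is where data is still lost.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical decoder table; TLV 137 is the IS-IS Dynamic Hostname TLV (RFC 5301).
DECODERS: Dict[int, Callable[[bytes], object]] = {
    137: lambda v: v.decode("ascii"),
}


def parse_tlvs(data: bytes) -> Tuple[Dict[int, object], List[Tuple[int, bytes]], bytes]:
    """Parse 1-octet type / 1-octet length / value triples as completely as possible.

    Returns (decoded, raw, trailing):
      decoded:  TLVs we could decode, keyed by type
      raw:      unknown or undecodable TLVs, passed through as opaque bytes
      trailing: bytes that could not be framed as a TLV (this data is lost)
    """
    decoded: Dict[int, object] = {}
    raw: List[Tuple[int, bytes]] = []
    i = 0
    while i + 2 <= len(data):
        tlv_type, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) < length:
            # Length overruns the buffer: framing is lost from here on.
            return decoded, raw, data[i:]
        decoder = DECODERS.get(tlv_type)
        if decoder is None:
            raw.append((tlv_type, value))  # unknown: carry as binary
        else:
            try:
                decoded[tlv_type] = decoder(value)
            except ValueError:  # includes UnicodeDecodeError
                raw.append((tlv_type, value))  # known but unparseable: carry as binary
        i += 2 + length
    return decoded, raw, data[i:]


data = (
    bytes([137, 3]) + b"r1a"          # hostname, decodes
    + bytes([250, 2, 0xAA, 0xBB])     # no decoder in this table: passed through
    + bytes([137, 2, 0xFF, 0xFE])     # not ASCII: passed through
    + bytes([22, 10, 0x01, 0x02])     # claims 10 octets, only 2 present
)
decoded, raw, trailing = parse_tlvs(data)
```

With this input, TLV 137 decodes to "r1a", TLV 250 and the second TLV 137 reach the consumer as raw bytes, and the truncated TLV 22 ends up in `trailing`.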

Given that the source of the data is the IGP, and that information might be
discarded there, it doesn't seem to me that we can really guarantee strong
consistency of an off-box view of the network, since we can't guarantee
strong consistency across the IGP domain itself.

Thus, I'm not sure that the issue being highlighted here actually makes a
difference when we're considering the overall system design -- we always
need to deal with the fact that, in the presence of malformed protocol
messages, the view of the network at the path computation node might not
exactly match the network's current state. One motivation for exporting the
LSDB via streaming telemetry is the ability to provide such validation ("do
all nodes within my IGP domain, including listeners, have a consistent view
of the state of the network?").
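The validation question above could be approximated by comparing LSDB summaries exported from each node. This is a minimal sketch, assuming each node (including listeners) exports a hypothetical map of LSP ID to sequence number; a real check would also need to tolerate LSPs that are still being flooded, and could compare checksums as well.

```python
from typing import Dict, Set

# Hypothetical export format: LSP ID -> sequence number, one map per node.
LsdbSummary = Dict[str, int]


def inconsistent_lsps(views: Dict[str, LsdbSummary]) -> Set[str]:
    """Return the LSP IDs on which the nodes' views disagree.

    An LSP is flagged if any node is missing it, or if nodes hold
    different sequence numbers for it.
    """
    all_ids: Set[str] = set().union(*views.values())
    flagged: Set[str] = set()
    for lsp_id in all_ids:
        seqs = {view.get(lsp_id) for view in views.values()}  # None means missing
        if len(seqs) > 1:
            flagged.add(lsp_id)
    return flagged


views = {
    "r1": {"0000.0000.0001.00-00": 7, "0000.0000.0002.00-00": 4},
    "r2": {"0000.0000.0001.00-00": 7, "0000.0000.0002.00-00": 4},
    "listener": {"0000.0000.0001.00-00": 7},  # e.g. dropped r2's LSP as malformed
}
print(inconsistent_lsps(views))  # {'0000.0000.0002.00-00'}
```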

If the discussion is "should we adopt treat-as-withdraw or attribute
discard?", I don't think there is really any difference between the two
from the system perspective in this situation. Either way, we still have
the same potentially inconsistent view of the network.
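A minimal sketch of why the choice does not change the system-level picture, assuming a controller that keeps a hypothetical table of link NLRI to link-state attributes (the link name, attribute keys and values are made up for illustration). Under treat-as-withdraw the link disappears; under attribute discard the link stays but loses the whole BGP-LS Attribute. Both resulting views differ from what the IGP actually holds.

```python
from typing import Dict

# Hypothetical controller view: link NLRI key -> link-state attributes.
Links = Dict[str, Dict[str, int]]


def apply_update(view: Links, link: str, ls_attr: Dict[str, int],
                 malformed: bool, mode: str) -> None:
    """Apply one BGP-LS UPDATE for `link` under the given error-handling mode."""
    if not malformed:
        view[link] = dict(ls_attr)
    elif mode == "treat-as-withdraw":
        view.pop(link, None)  # the NLRI is treated as withdrawn
    elif mode == "attribute-discard":
        view[link] = {}  # NLRI kept, whole malformed BGP-LS Attribute dropped
    else:
        raise ValueError(f"unknown mode: {mode}")


def diff(view: Links, truth: Links) -> Dict[str, str]:
    """Describe how the controller's view differs from the IGP's state."""
    out: Dict[str, str] = {}
    for link, attrs in truth.items():
        if link not in view:
            out[link] = "missing"
        elif view[link] != attrs:
            out[link] = "incomplete"
    return out


truth: Links = {"r1-r2": {"te_metric": 10, "adj_sid": 24001}}
results = {}
for mode in ("treat-as-withdraw", "attribute-discard"):
    view = {}
    apply_update(view, "r1-r2", truth["r1-r2"], malformed=True, mode=mode)
    results[mode] = diff(view, truth)
print(results)
# {'treat-as-withdraw': {'r1-r2': 'missing'}, 'attribute-discard': {'r1-r2': 'incomplete'}}
```

In both cases the path computation node has to cope with a view that does not match the network, which is the point being made above.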

For these reasons, I'd err on the side of leaving this unchanged in the
current specification(s).

Cheers,
r.

On Wed, Dec 19, 2018 at 10:13 AM Alvaro Retana <aretana.ietf@gmail.com>
wrote:

> On December 18, 2018 at 6:23:19 PM, Robert Raszuk (rraszuk@gmail.com)
> wrote:
>
> Robert:
>
> Hi!
>
> What comes as #1 question to your points is a comparison of SR controller
> with regular BGP RR.
>
> I think it is safe to assume that error handling on an SR controller would
> be no more aggressive than on RRs. So if there is an error, the updates may
> be dropped on the RRs themselves, logged, and a proper NOC alarm generated.
>
> IMO this is no different regardless of whether you use SR with BGP-LS or
> just plain regular BGP routing.
>
> In general, I agree that error handling should be the same regardless of
> the type of BGP speaker (RR, controller, PE, whatever).
>
> So unless your goal here is to point out a deficiency of the BGP error
> handling RFC, I am not sure what is so specific to BGP-LS and SR.
>
> No, the goal is not to point at any deficiency in the error handling RFC.
> I just replied to Bruno saying: “I don’t want to rehash the discussion
> from rfc7606 about the types of approaches and whether there should be more
> or not (or what those could be)…. I’m just pointing out that I think the
> current approach is not the right one for all applications.”
>
> When BGP-LS was defined, it was noted that the “information present in
> this document carries purely application-level data that has no immediate
> corresponding forwarding state impact.”  I think that SR has a direct
> impact on the forwarding state of the network.  That is what is specific
> about BGP-LS+SR.
>
>
> To be clear, this thread is about using BGP-LS with applications that have
> an impact on forwarding/route selection in the network, like SR (Bruno
> pointed at lsvr and there may be others).  It is not about the error
> handling approaches (rfc7606) or BGP sessions in general…just that specific
> application.
>
> Thanks for helping me clarify what I mean.  Hopefully this makes more
> sense. ;-)
>
> Alvaro.