Re: [Idr] [spring] Error Handling for BGP-LS with Segment Routing

Rob Shakir <robjs@google.com> Thu, 03 January 2019 22:40 UTC

References: <CAMMESsz8Z_B1aH-4wYL-V9cV=5Xse+tpKqXFish6+V+td7KKzw@mail.gmail.com> <CA+b+ERmic4UXsuWW08SKOH_hwhC5pA+o-J1pHOoT8n2LGJHUng@mail.gmail.com> <CAMMESszxvEFTdsdCS6yEM=Yi6iy=gnrOqWbD07wFTedY90hLkA@mail.gmail.com> <CAHd-QWu8RjwnwJ8LXWpjTmY=VHA4PwZt=uP+H5M4AnKQVBeG7w@mail.gmail.com> <CAMMESsxQhNtW4GEvucv6A2Sh2=_sxm9wigRax+9Gj3C7caBV5A@mail.gmail.com>
In-Reply-To: <CAMMESsxQhNtW4GEvucv6A2Sh2=_sxm9wigRax+9Gj3C7caBV5A@mail.gmail.com>
From: Rob Shakir <robjs@google.com>
Date: Thu, 03 Jan 2019 14:40:10 -0800
Message-ID: <CAHd-QWskekEA1HrJbAGnwPrv8b2+jy12qg9iazmn4kXDgsN15Q@mail.gmail.com>
To: Alvaro Retana <aretana.ietf@gmail.com>
Cc: SPRING WG <spring@ietf.org>, Robert Raszuk <rraszuk@gmail.com>, "idr@ietf. org" <idr@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/Q1Y_TVe7aW4wGOWYcKnme1dNWbs>
Subject: Re: [Idr] [spring] Error Handling for BGP-LS with Segment Routing

Hi Alvaro,

Also speaking as a WG participant :-)

On Thu, Jan 3, 2019 at 1:40 PM Alvaro Retana <aretana.ietf@gmail.com> wrote:

> BGP-LS only defines a mechanism through which it may miss information, but
> not how to handle it — or maybe it does (?): by using attribute discard it
> just accepts that the information might be missing going forward…and
> doesn’t attempt to do anything.  Maybe this quote is true: “Doing Nothing
> Often Leads to the Very Best Something” — Winnie the Pooh
>

I think that it defines *something*, albeit not explicitly. Essentially, as
I read it, we're saying "when an attribute encoded by the advertising
BGP-LS source is incorrect, BGP-LS as a system will prefer to use
partial information" (partial, because the NLRI could still be parsed, so
we assume that some information does get through).
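
As a rough illustration of that reading (a sketch only -- `LinkEntry`,
`parse_attr` and `receive` are hypothetical names, and the TLV layout is
simplified to a 2-octet type and 2-octet length), the receiver keeps the
NLRI, drops the attribute it could not parse, and records that it did so:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkEntry:
    """Hypothetical receiver-side record for one link NLRI."""
    nlri: str                    # the NLRI parsed successfully
    attrs: Optional[dict]        # None when the attribute was discarded
    attr_discarded: bool = False


def parse_attr(raw: bytes) -> dict:
    """Toy TLV parser: 2-octet type, 2-octet length, then the value."""
    tlvs = {}
    i = 0
    while i < len(raw):
        if i + 4 > len(raw):
            raise ValueError("truncated TLV header")
        tlv_type = int.from_bytes(raw[i:i + 2], "big")
        length = int.from_bytes(raw[i + 2:i + 4], "big")
        if i + 4 + length > len(raw):
            raise ValueError("TLV length overruns the attribute")
        tlvs[tlv_type] = raw[i + 4:i + 4 + length]
        i += 4 + length
    return tlvs


def receive(nlri: str, raw_attr: bytes) -> LinkEntry:
    """Attribute discard: keep the NLRI even if the attribute is malformed."""
    try:
        return LinkEntry(nlri, parse_attr(raw_attr))
    except ValueError:
        return LinkEntry(nlri, None, attr_discarded=True)


# 1095 is just an example type code here.
good = receive("link-A-B", bytes([0x04, 0x47, 0x00, 0x03, 0x00, 0x00, 0x0A]))
bad = receive("link-B-C", bytes([0x04, 0x47, 0x00, 0x09, 0x00]))  # bad length
```

The consumer still learns that link-B-C exists; what it loses is everything
the attribute would have said about that link.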

> That action may be ok in the general case…but I think that doing nothing
> may not be enough/appropriate for an application like SR, because it is
> explicitly calculating paths….
>

> The point I’m trying to bring up is not necessarily treat-as-withdraw vs.
> attribute discard…. But, first, is attribute discard
> enough/appropriate/good for a BGP-LS application such as SR?  If it isn’t,
> second, is there a different approach that would be better?  Maybe we then
> come to a point where something can change…or accept the limitations of the
> system and be clear about them.  I fully realize that I may be the only one
> who thinks there’s an issue…
>

My point was really the same... The question I was trying to raise is "what
is the alternative that you would suggest?". Other technologies that
fulfill the same role as BGP-LS (those that I described) don't take a very
different approach.

Clearly, it's bad to calculate paths with incomplete information about the
topology of the network. It's also bad to calculate zero paths because you
discarded the entire topology based on an error. We're between a rock
and a hard place in terms of maintaining system functionality here -- all
systems that do the same job as BGP-LS have to make some form of
compromise about which constraint (correctness, or connectivity) they
violate.
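
A toy example of that compromise (a sketch under stated assumptions:
`build_view` and `shortest_path_cost` are hypothetical, and substituting a
default metric for a discarded attribute is my assumption about what a
consumer might do, not something any specification requires):

```python
import heapq


def shortest_path_cost(links, src, dst):
    """Dijkstra over {(a, b): metric}; links are treated as bidirectional."""
    adj = {}
    for (a, b), metric in links.items():
        adj.setdefault(a, []).append((b, metric))
        adj.setdefault(b, []).append((a, metric))
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, metric in adj.get(u, []):
            nd = d + metric
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None  # no path in this view of the topology


def build_view(updates, policy, default_metric=10):
    """updates: list of (link, metric); metric None = malformed attribute."""
    view = {}
    for link, metric in updates:
        if metric is not None:
            view[link] = metric
        elif policy == "attribute-discard":
            view[link] = default_metric  # link kept, metric is a guess
        elif policy == "treat-as-withdraw":
            continue                     # link dropped from the view
        else:
            raise ValueError(f"unknown policy: {policy}")
    return view


# A-B arrived with a malformed attribute; B-C arrived intact.
updates = [(("A", "B"), None), (("B", "C"), 10)]
discard_cost = shortest_path_cost(build_view(updates, "attribute-discard"), "A", "C")
withdraw_cost = shortest_path_cost(build_view(updates, "treat-as-withdraw"), "A", "C")
```

With attribute discard we still get a path from A to C, but its cost rests
on a guessed metric (correctness is violated); with treat-as-withdraw there
is no path at all (connectivity is violated).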

This is why I was arguing for leaving things unchanged -- the correctness
constraint seems OK to violate by default. If there are deployments where
connectivity is the preferable constraint to violate, then it is possible
to react to the fact that attribute-discard occurred (or to not configure
RFC 7606 error handling, if the implementation supports this).

Describing these compromises is, of course, a good idea. However, it's not
clear where this description would go -- we don't really have a document
that describes this overall system and how it might be implemented today.

Cheers and HNY!
r.



>
> Thanks!!
>
> Alvaro.
>
>
> On December 21, 2018 at 11:23:16 AM, Rob Shakir (robjs@google.com) wrote:
>
> Alvaro,
>
> I think this is one of the difficulties of overloading a protocol like BGP
> with different datasets -- it's not simple to say how particular attributes
> are actually going to be used within a protocol deployment. This was one of
> the things that was noted in 7606 -- i.e., I can make *any* attribute
> really affect forwarding if I write a policy that accepts/rejects some
> UPDATE based on the presence of that attribute.
>
> In general, any topology discovery mechanism (whether used in real-time or
> not) needs to define how it handles cases where it might end up with
> missing information. Let's consider the different discovery mechanisms
> that we have today:
>
>    - IGP listening -- in this case, if we have some malformed IS-IS TLV,
>    then we might end up discarding this information (whether it be at the
>    listening node, or a device that didn't flood it earlier in the chain) --
>    meaning that we know that we have some potential gap in the topology.
>    - Streaming telemetry -- speaking particularly to gNMI for LSDB
>    streaming encoded using the OpenConfig model, here, we are tolerant to
>    getting as much information as can be parsed, and have a way to carry
>    unknown TLVs (which might include those that cannot be successfully parsed)
>    as binary data to the external consumer. This means that the approach is
>    "as complete data as possible", but it shares the same characteristic: we
>    can still end up losing data.
>    - BGP-LS with attribute discard -- this has some information loss,
>    since we'll have some attributes that could be malformed in the input data,
>    and we discard them at the receiver.
>
> It doesn't seem to me that, given the source of the data is the IGP, and
> we might have information discarded there -- that we can really guarantee
> strong consistency of an off-box view of the network, since we can't
> guarantee strong consistency across the IGP domain itself.
>
> Thus, I'm not sure that the issue that is being highlighted here actually
> makes a difference when we're considering the overall system design -- we
> always need to deal with the fact that the view of the network at the path
> computing node might not match exactly the network's current state in the
> presence of malformed protocol messages. One motivation for having the LSDB
> via streaming telemetry is the ability to provide such validation ("do all
> nodes within my IGP domain, including listeners, have a consistent view of
> the state of the network?").
>
> If the discussion is "should we adopt treat-as-withdraw vs. attribute
> discard?" -- I don't think that from the system perspective there is really
> any difference between the two in this situation. We still have the same
> potentially inconsistent view of the network.
>
> For these reasons, I'd err on the side of leaving this unchanged in the
> current specification(s).
>
> Cheers,
> r.
>
>