From nobody Fri Aug 20 15:59:50 2021
Return-Path: <jefftant.ietf@gmail.com>
X-Original-To: idr@ietfa.amsl.com
Delivered-To: idr@ietfa.amsl.com
Content-Type: multipart/alternative;
 boundary=Apple-Mail-7B8FC2E1-A25F-4C6B-8812-036F1ED25D37
Content-Transfer-Encoding: 7bit
From: Jeff Tantsura <jefftant.ietf@gmail.com>
Mime-Version: 1.0 (1.0)
Date: Fri, 20 Aug 2021 15:59:38 -0700
Message-Id: <5D874729-4FC1-4AD6-BD6C-BDC473B6B342@gmail.com>
References: <CANJ8pZ-mz8KuWBkgYjuKSRwTW2ycq2VkQKYRqnnFquPQEQ=GNg@mail.gmail.com>
Cc: Jeffrey Haas <jhaas@pfrc.org>, idr@ietf.org, draft-chen-bgp-redist@ietf.org
In-Reply-To: <CANJ8pZ-mz8KuWBkgYjuKSRwTW2ycq2VkQKYRqnnFquPQEQ=GNg@mail.gmail.com>
To: Enke Chen <enchen@paloaltonetworks.com>
X-Mailer: iPhone Mail (18G82)
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/xYUOyBzxIRX7pKGYNUnAu_m6t9Y>
Subject: Re: [Idr] draft-chen-bgp-redist - A Juniper Perspective
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/idr>,
 <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/idr/>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/idr>,
 <mailto:idr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 20 Aug 2021 22:59:49 -0000


--Apple-Mail-7B8FC2E1-A25F-4C6B-8812-036F1ED25D37
Content-Type: text/plain;
	charset=utf-8
Content-Transfer-Encoding: 7bit

Jeff,

Many thanks for the analysis and detailed explanation of Junos behavior.
It would be great if other vendors did similar work and posted it to the list.

Perhaps the RFC 5004 vs. router-id tie-breaking behavior (in densely meshed eBGP topologies) deserves a good discussion as well.

Cheers,
Jeff

> On Aug 20, 2021, at 14:28, Enke Chen <enchen@paloaltonetworks.com> wrote:
>
> Hi, Jeff:
>
> Thanks for the pointer to JUNOS's BGP path selection. It's great that
> GateD/JUNOS already has some consideration of "admin-distance" /
> "preference" in BGP. That should take care of the case described in
> Sec. 2.1 ("On a single router"). It's not clear to me, though, whether
> and how the cases described in Sec. 2.2 ("Network-wide behavior") are
> handled.
>
> This is just a quick comment. I am still trying to digest your email.
>
> -- Enke
>
>> On Fri, Aug 20, 2021 at 12:22 PM Jeffrey Haas <jhaas@pfrc.org> wrote:
>> Enke & Jenny,
>>
>> While I'm inclined to say that the matters described in
>> draft-chen-bgp-redist aren't a problem, it's probably more reasonable to say
>> "this isn't a problem for everyone".
>>
>> I'm not going to debate what other implementations do.  Instead, I'll spend
>> much of my message discussing what the implementation I work on does.
>> This does leave us, as a working group, with an interesting headache:
>> whether there's more general work to do here that merits discussion as an
>> RFC.
>>
>> ----
>>
>> One thing I can contribute, should further discussion be needed, is the
>> sloppy ground that RFC 4271 left with regard to interactions with non-BGP
>> protocols.  We had quite a difficult time getting the text for this "right"
>> and there were multiple pushes to try to NOT discuss non-BGP protocols at
>> all.  Such discussions got very deep in the internal details of various
>> implementations.  What we did end up with in RFC 4271 was the following:
>>
>> The BGP RIBs are a distinct entity from the "Routing Table".
>>
>> Section 9.1.2 places the best BGP route in the Loc-RIB.
>> It also says, "Whether the new BGP route replaces an existing non-BGP route
>> in the Routing Table depends on the policy configured on the BGP speaker."
>>
>> Section 9.1.3 for Route Dissemination then says the following:
>>  :    The Phase 3 decision function is invoked on completion of Phase 2, or
>>  :    when any of the following events occur:
>>  : [...]
>>  :       b) when locally generated routes learned by means outside of BGP
>>  :          have changed
>>  : [...]
>>  :
>>  :    All routes in the Loc-RIB are processed into Adj-RIBs-Out according
>>  :    to configured policy.
>>
>> So... there's not a lot of normative text there.  The phrase "configured
>> policy" in the last sentence above is the wiggle room the RFC has to say
>> "I'm originating a route".
>>
>> The general call in the RFC not to advertise things you can't forward to
>> also provides some wiggle room for arguing that the best route in the
>> Routing Table is what should be redistributed:
>>  :    A route SHALL NOT
>>  :    be installed in the Adj-Rib-Out unless the destination, and NEXT_HOP
>>  :    described by this route, may be forwarded appropriately by the
>>  :    Routing Table.
>>
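A minimal Python sketch of the Phase 3 rule quoted above, assuming the Routing Table can be modeled as a plain list of prefixes and export policy as a simple predicate. Route, covers and build_adj_rib_out are hypothetical names, not taken from RFC 4271 or any implementation, and "forwarded appropriately" is simplified here to "covered by some Routing Table prefix".

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Route:
    prefix: str    # destination, e.g. "203.0.113.0/24"
    next_hop: str  # BGP NEXT_HOP, e.g. "192.0.2.1"


def covers(routing_table: Iterable[str], target: str) -> bool:
    """True if some Routing Table prefix contains target (an address or a prefix)."""
    t = ipaddress.ip_network(target, strict=False)
    for entry in routing_table:
        net = ipaddress.ip_network(entry)
        # subnet_of() raises TypeError across IPv4/IPv6, so compare versions first.
        if net.version == t.version and t.subnet_of(net):
            return True
    return False


def build_adj_rib_out(loc_rib: Iterable[Route],
                      routing_table: Iterable[str],
                      export_policy: Callable[[Route], bool]) -> List[Route]:
    """Process Loc-RIB routes into one peer's Adj-RIB-Out (simplified Phase 3)."""
    table = list(routing_table)
    adj_rib_out = []
    for route in loc_rib:
        if not export_policy(route):  # "according to configured policy"
            continue
        # RFC 4271: SHALL NOT install unless the destination and NEXT_HOP
        # may be forwarded appropriately by the Routing Table.
        if covers(table, route.prefix) and covers(table, route.next_hop):
            adj_rib_out.append(route)
    return adj_rib_out


if __name__ == "__main__":
    rt = ["203.0.113.0/24", "198.51.100.0/24", "192.0.2.0/24"]
    loc = [Route("203.0.113.0/24", "192.0.2.1"),
           Route("198.51.100.0/24", "10.9.9.9")]  # NEXT_HOP not resolvable
    print(build_adj_rib_out(loc, rt, lambda r: True))
```

With that table only the 203.0.113.0/24 route reaches the Adj-RIB-Out, because 10.9.9.9 is not covered by any Routing Table entry.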
>> Anyway, the text is sloppy, and the questions of how redistribution is done
>> and whether it is shown in the Loc-RIB view or as an override to the
>> Adj-RIB-Out keep surfacing in discussions of things like BMP or the YANG
>> modules.  It's still not as cleanly settled as it should be.
>>
>> Offering my opinion on this abstract model, I've tended to think of a
>> better route in the Routing Table (e.g. a static route with a better admin
>> distance/preference) as being selected into the Loc-RIB by considering it a
>> route injected via a virtual Adj-RIB-In.  But that's just my personal
>> justification.
>>
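The virtual Adj-RIB-In model above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how JUNOS or any other implementation works: Candidate, select_for_loc_rib and VIRTUAL are hypothetical names, the preference values are illustrative, and the whole BGP decision process is reduced to a single LOCAL_PREF step.

```python
from dataclasses import dataclass
from typing import Dict, Optional

VIRTUAL = "routing-table"  # hypothetical name for the virtual Adj-RIB-In


@dataclass(frozen=True)
class Candidate:
    source: str          # BGP peer, or VIRTUAL for a route taken from the Routing Table
    preference: int      # admin distance / preference; lower is better
    local_pref: int = 0  # only meaningful for BGP-learned candidates


def select_for_loc_rib(adj_ribs_in: Dict[str, Dict[str, Candidate]],
                       best_non_bgp: Dict[str, Candidate],
                       prefix: str) -> Optional[Candidate]:
    """Pick the Loc-RIB entry for prefix, treating the Routing Table's best
    non-BGP route as if it had arrived on one extra (virtual) Adj-RIB-In."""
    ribs = dict(adj_ribs_in)
    ribs[VIRTUAL] = best_non_bgp  # the virtual Adj-RIB-In
    candidates = [rib[prefix] for rib in ribs.values() if prefix in rib]
    if not candidates:
        return None
    # Preference first, then one stand-in BGP step (higher LOCAL_PREF wins),
    # then source name so that ties resolve deterministically.
    return min(candidates, key=lambda c: (c.preference, -c.local_pref, c.source))


if __name__ == "__main__":
    peers = {"peerA": {"203.0.113.0/24": Candidate("peerA", 170, 100)},
             "peerB": {"203.0.113.0/24": Candidate("peerB", 170, 200)}}
    static = {"203.0.113.0/24": Candidate(VIRTUAL, 5)}
    print(select_for_loc_rib(peers, static, "203.0.113.0/24").source)  # routing-table
    print(select_for_loc_rib(peers, {}, "203.0.113.0/24").source)      # peerB
```

In this model the Routing Table's route competes in the same selection as the BGP candidates, so a static route with a better preference simply wins the Loc-RIB slot instead of overriding the Adj-RIB-Out afterwards.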
>> ----
>>
>> Juniper's implementation roughly works like the following.  Since it's
>> derived from GateD, many implementations with the same heritage will
>> behave in a similar fashion.
>>
>> In a single routing table, there is a total ordering on all contributors to
>> a given destination.
>>
>> The highest level of ordering is "preference", which roughly corresponds to
>> admin-distance.
>>
>> When routes are BGP routes, the BGP routes are ordered based largely on
>> standard RFC rules.  The deviations from those rules are the usual vendor
>> deviations from standards, driven by when the later standards were
>> published and by the mix of features the operator wants.  One example is
>> whether tie-breaking is temporal (RFC 5004) or deterministic, based on
>> router-id.
>>
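To make the temporal vs. router-id distinction concrete, here is a simplified Python sketch. Path, rank and best_path are hypothetical names; rank stands in for every decision step before the router-id comparison, and the RFC 5004 conditions are reduced to "all tied paths are external and the current best is among them".

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Path:
    peer_router_id: int    # BGP Identifier of the advertising peer
    external: bool         # learned over eBGP
    rank: Tuple[int, ...]  # stand-in for all steps before router-id; lower is better


def best_path(paths: Sequence[Path], current: Optional[Path], use_rfc5004: bool) -> Path:
    best_rank = min(p.rank for p in paths)
    tied = [p for p in paths if p.rank == best_rank]
    if use_rfc5004 and current in tied and all(p.external for p in tied):
        # RFC 5004 style: keep the existing best external path rather than
        # switching on router-id. Fewer transitions, but the outcome depends
        # on arrival order (temporal).
        return current
    # Router-id style: lowest peer router-id wins whatever the arrival order.
    return min(tied, key=lambda p: p.peer_router_id)


if __name__ == "__main__":
    a = Path(peer_router_id=2, external=True, rank=(0,))
    b = Path(peer_router_id=1, external=True, rank=(0,))
    print(best_path([a, b], current=a, use_rfc5004=True) is a)   # True: a stays best
    print(best_path([a, b], current=a, use_rfc5004=False) is b)  # True: router-id wins
```

The trade-off is churn versus order-independence: RFC 5004 avoids a best-path transition, while router-id tie-breaking gives an answer that does not depend on which path arrived first.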
>> BGP routes vs. non-BGP routes are not directly comparable, even when they
>> have the same preference, but we have criteria that permit them to be
>> selected deterministically.
>>
>> For non-BGP routes vs. non-BGP routes of two different protocols, we
>> similarly select deterministically, but may use particular properties of
>> the routes that are comparable, e.g. metric.
>>
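A rough Python sketch of a single total ordering over all contributors to one destination, following the layers described above: preference first, then BGP rules among BGP routes, a fixed rule for BGP vs. non-BGP, and metric among non-BGP routes. The field names, the preference values and, in particular, the rule that non-BGP routes sort ahead of BGP routes at equal preference are assumptions made only for illustration; the actual JUNOS criteria are in the Juniper documentation referenced in this message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Contributor:
    protocol: str                   # "bgp", "static", "ospf", ...
    preference: int                 # roughly admin distance; lower is better
    metric: Optional[int] = None    # compared only among non-BGP routes
    bgp_rank: Tuple[int, ...] = ()  # stand-in for the RFC 4271 decision steps


def order_key(c: Contributor):
    """Sort key giving a total order over all contributors to one destination."""
    if c.protocol == "bgp":
        # BGP vs. BGP: ordered by (simplified) standard BGP rules.
        return (c.preference, 1, c.bgp_rank, "")
    # Non-BGP: placed ahead of BGP at equal preference (an arbitrary, fixed
    # rule assumed here only to keep the choice deterministic), then by metric
    # where available (missing metrics last), then by protocol name.
    return (c.preference, 0, (c.metric is None, c.metric or 0), c.protocol)


if __name__ == "__main__":
    routes = [
        Contributor("bgp", 170, bgp_rank=(-100, 2)),
        Contributor("ospf", 170, metric=20),
        Contributor("static", 5),
        Contributor("bgp", 170, bgp_rank=(-200, 3)),
        Contributor("rip", 170, metric=10),
    ]
    # static first, then rip, ospf, and the two BGP routes in BGP order
    for c in sorted(routes, key=order_key):
        print(c.protocol, c.preference, c.metric, c.bgp_rank)
```

Building the order from one sort key, rather than from a hand-written pairwise comparator, keeps it transitive, which is what makes the selection deterministic regardless of the order in which routes arrive.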
>> Much of this process is documented here:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.juniper.net_documentation_en-5FUS_junos_topics_reference_general_routing-2Dprotocols-2Daddress-2Drepresentation.html&d=DwIBAg&c=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo&r=OPLTTSu-451-QhDoSINhI2xYdwiMmfF5A2l8luvN11E&m=-dwyFCMbZS-tuTtu3oL1HNFBH-2mpqRlUm9F7wDkv6g&s=R7AEen-1wMEbbOC4rK4EjfOnMRVPTfQpMnYKMWKGmAA&e=
>>
>> For the scenarios described in draft-chen-bgp-redist, JUNOS doesn't have
>> determinism issues.
>>
>> In our more recent multi-threaded BGP mode, we at one point in the design
>> had an issue somewhat related to the one described in the draft.  In our
>> implementation, we address the issue by making the Routing Table and the
>> BGP thread always exchange the best route in the local total ordering.
>>
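One way to picture that design, sketched in Python: the Routing Table side keeps every contributor and only ever hands the BGP thread the current best under its local ordering. RoutingTable, bgp_thread and the (preference, source) tuple are hypothetical, only the Routing Table to BGP direction is shown, and this is not Juniper's code.

```python
import queue
import threading
from typing import Dict, List, Tuple

Route = Tuple[int, str]  # (preference, source); lower sorts first


class RoutingTable:
    """Holds every contributor per prefix; hands BGP only the current best."""

    def __init__(self, to_bgp: queue.Queue) -> None:
        self.contributors: Dict[str, List[Route]] = {}
        self.sent: Dict[str, Route] = {}
        self.to_bgp = to_bgp

    def add(self, prefix: str, route: Route) -> None:
        self.contributors.setdefault(prefix, []).append(route)
        best = min(self.contributors[prefix])  # local total ordering (simplified)
        if self.sent.get(prefix) != best:      # only a change of best is sent
            self.sent[prefix] = best
            self.to_bgp.put((prefix, best))


def bgp_thread(inbox: queue.Queue, view: Dict[str, Route]) -> None:
    """Consumes updates until a None sentinel; its view holds only best routes."""
    while True:
        msg = inbox.get()
        if msg is None:
            return
        prefix, best = msg
        view[prefix] = best


if __name__ == "__main__":
    inbox: queue.Queue = queue.Queue()
    bgp_view: Dict[str, Route] = {}
    worker = threading.Thread(target=bgp_thread, args=(inbox, bgp_view))
    worker.start()
    rib = RoutingTable(inbox)
    rib.add("203.0.113.0/24", (170, "bgp:peerA"))
    rib.add("203.0.113.0/24", (5, "static"))       # new best: sent
    rib.add("203.0.113.0/24", (170, "bgp:peerB"))  # not best: not sent
    inbox.put(None)
    worker.join()
    print(bgp_view)  # {'203.0.113.0/24': (5, 'static')}
```

Because the BGP side never receives a route that is not the local best, it cannot redistribute a route that the Routing Table would not itself select.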
>> -- Jeff
> _______________________________________________
> Idr mailing list
> Idr@ietf.org
> https://www.ietf.org/mailman/listinfo/idr

--Apple-Mail-7B8FC2E1-A25F-4C6B-8812-036F1ED25D37--

