[RTG-DIR] Rtgdir early review of draft-ietf-idr-sr-policy-safi-00

Zhaohui Zhang via Datatracker <noreply@ietf.org> Fri, 01 March 2024 03:14 UTC
From: Zhaohui Zhang via Datatracker <noreply@ietf.org>
To: rtg-dir@ietf.org
Cc: draft-ietf-idr-sr-policy-safi.all@ietf.org, idr@ietf.org
Message-ID: <170926285323.21559.2544259526462856240@ietfa.amsl.com>
Reply-To: Zhaohui Zhang <zzhang@juniper.net>
Date: Thu, 29 Feb 2024 19:14:13 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtg-dir/E9qyJKHjRpMTNOAXnJo9jVi0nQg>
Reviewer: Zhaohui Zhang
Review result: Has Issues

Abstract

   A Segment Routing (SR) Policy is an ordered list of segments (i.e.,
   instructions) that represent a source-routed policy.  An SR Policy
   consists of one or more candidate paths, each consisting of one or
   more segment lists.  A headend may be provisioned with candidate
   paths for an SR Policy via several different mechanisms, e.g., CLI,
   NETCONF, PCEP, or BGP.

"an ordered list of segments" or "ordered lists of segments" or
"ordered lists of ordered segments"?
I assume each candidate path can have at least one "ordered list of segments".

   This document introduces a BGP subsequent address family (SAFI) for
   IPv4 and IPv6 address families.  In UPDATE messages of those AFI/
   SAFIs, the NLRI identifies an SR Policy Candidate Path while the
   attributes encode the segment lists and other details of that SR
   Policy Candidate Path.

Does a candidate path include the endpoint and color information?

   While for simplicity we may write that BGP advertises an SR Policy,
   it has to be understood that BGP advertises a candidate path of an SR
   policy and that this SR Policy might have several other candidate
   paths provided via BGP (via an NLRI with a different distinguisher as
   defined in Section 2.1), PCEP, NETCONF, or local policy
   configuration.

   Typically, a controller defines the set of policies and advertises
   them to policy headend routers (typically ingress routers).  These
   policy advertisements use the BGP extensions defined in this
   document.  The policy advertisement is, in most but not all cases,
   tailored for a specific policy headend.  In this case, the
   advertisement may be sent on a BGP session to that headend and not
   propagated any further.

"in most cases" and "in this case" - are they the same?

   Alternatively, a router (i.e., a BGP egress router) advertises SR
   Policies representing paths to itself.  In this case, it is possible
   to send the policy to each headend over a BGP session to that
   headend, without requiring any further propagation of the policy.

What's the difference from the previous case? Whether it is sent from an
egress router or a controller should not matter, right?

   An SR Policy intended only for the receiver will, in most cases, not
   traverse any Route Reflector (RR, [RFC4456]).

Is the above paragraph correct/needed? I suppose in most cases
they will traverse an RR after all - whether it is from a controller or
an egress PE.

   In some situations, it is undesirable for a controller or BGP egress
   router to have a BGP session to each policy headend.  In these
   situations, BGP Route Reflectors may be used to propagate the
   advertisements.  In certain other deployments, it may be necessary
   for the advertisement to propagate through a sequence of one or more
   ASes within an SR Domain (refer to Section 7 for the associated
   security considerations).  To make this possible, an attribute needs
   to be attached to the advertisement that enables a BGP speaker to
   determine whether it is intended to be a headend for the advertised
   policy.  This is done by attaching one or more Route Target Extended
   Communities to the advertisement [RFC4360].

How is further propagation prevented after the headend is reached?

   The BGP extensions for the advertisement of SR Policies include
   following components:

The BGP extensions are for the advertisement of SR Policy
Candidate Paths, not SR Policies themselves, right?

   *  One or more IPv4 address format route target extended community
      ([RFC4360]) attached to the SR Policy advertisement and that
      indicates the intended headend of such an SR Policy advertisement.

and IPv6? s/format/specific/?

   The SR Policy SAFI route updates use the Tunnel Encapsulation
   Attribute to signal an SR Policy - i.e., a tunnel itself.  Its usage

An SR Policy Candidate Path, not an SR Policy?

Good to see "a tunnel itself" mentioned here :-)
I've always thought the "SR Policy" is a convoluted term for tunnel :-)

   of this attribute is hence very different from [RFC9012] where this
   attribute is associated with a BGP route update (e.g., for Internet
   or VPN routes) to specify the tunnel which is used for forwarding
   traffic for that route.  This document does not update or change the
   usage of the Tunnel Encapsulation Attribute as specified in [RFC9012]
   for existing AFI/SAFIs as specified in that document.  The details of
   processing of the Tunnel Encapsulation Attribute for the SR Policy
   SAFI are specified in Section 2.2 and Section 2.3.


Good to see the difference is pointed out here. I've always thought
Tunnel Encapsulation Attribute (TEA) is shoehorned here but
I guess it is too late to change that.


   The Color Extended Community (as defined in [RFC9012]) is used to
   steer traffic into an SR Policy, as described in section 8.8 of
   [RFC9256].  The Section 3 of this document updates [RFC9012] with
   modifications to the format of the Flags field of the Color Extended
   Community by using the two leftmost bits of that field.

   *  Policy Color: 4-octet value identifying (with the endpoint) the
      policy.  The color is used to match the color of the destination
      prefixes to steer traffic into the SR Policy as specified in
      section 8 of [RFC9256].

   *  Endpoint: value identifies the endpoint of a policy.  The Endpoint
      may represent a single node or a set of nodes (e.g., an anycast
      address).  The Endpoint is an IPv4 (4-octet) address or an IPv6
      (16-octet) address according to the AFI of the NLRI.  The address
      can be either a unicast or an unspecified address (0.0.0.0 for
      IPv4, :: for IPv6) as specified in section 2.1 of [RFC9256].

Can you call it out as "null endpoint" that was used later?
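
As an aside on the Endpoint text quoted above, here is a minimal Python sketch of telling a unicast endpoint apart from the unspecified ("null") one. It uses only the standard `ipaddress` module; the function name `is_null_endpoint` is hypothetical and not a term from the draft.

```python
import ipaddress


def is_null_endpoint(raw: bytes) -> bool:
    """Return True if the NLRI Endpoint field holds the unspecified address.

    `raw` is the 4-octet (IPv4) or 16-octet (IPv6) Endpoint field.
    """
    if len(raw) == 4:
        return ipaddress.IPv4Address(raw).is_unspecified   # 0.0.0.0
    if len(raw) == 16:
        return ipaddress.IPv6Address(raw).is_unspecified   # ::
    raise ValueError("Endpoint must be 4 or 16 octets")
```

Whichever name the draft settles on, a check like this only needs the raw octets, since the address family already follows from the field length.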

   It is important to note that any BGP speaker receiving a BGP message
   with an SR Policy NLRI, will process it only if the NLRI is among the

There is a lot of "processing" before it is deemed "among the best paths",
right? Do you mean the "SRPM" will process it only if the NLRI is among
the best paths?

   best paths as per the BGP best-path selection algorithm.  In other
   words, this document leverages the existing BGP propagation and best-
   path selection rules.  Details of the procedures are described in
   Section 4.

   SR Policy SAFI NLRI: <Distinguisher, Policy-Color, Endpoint>
   Attributes:
      Tunnel Encapsulation Attribute (23)
         Tunnel Type: SR Policy (15)
             Binding SID
             SRv6 Binding SID
             Preference
             Priority
             Policy Name

Policy name seems to be a property for policy not the candidate path.
What if the names do not match among different candidate paths of the same
policy?

             Policy Candidate Path Name
             Explicit NULL Label Policy (ENLP)
             Segment List
                 Weight
                 Segment
                 Segment
                 ...
             ...

   Figure 2: SR Policy Encoding

2.3.  Applicability of Tunnel Encapsulation Attribute Sub-TLVs

   The Tunnel Egress Endpoint and Color sub-TLVs, as defined in
   [RFC9012], may also be present in the SR Policy encodings.

Why do we say the above given the following paragraph? They seem to
be contradictory.

   The Tunnel Egress Endpoint and Color Sub-TLVs of the Tunnel
   Encapsulation Attribute are not used for SR Policy encodings and
   therefore their value is irrelevant in the context of the SR Policy
   SAFI NLRI.  If present, the Tunnel Egress Endpoint sub-TLV and the
   Color sub-TLV MUST be ignored by the BGP speaker and MAY be removed
   from the Tunnel Encapsulation Attribute during propagation.

   Similarly, any other sub-TLVs (including those defined in [RFC9012])
   whose applicability is not specifically defined for the SR Policy
   SAFI MUST be ignored by the BGP speaker and MAY be removed from the
   Tunnel Encapsulation Attribute during propagation.

Why don't we say that any sub-TLVs not defined in this document must not
be present and must be ignored?

   Preference, Binding SID, SRv6 Binding SID, Segment-List, Priority,
   Policy Name, Policy Candidate Path Name, and Explicit NULL Label
   Policy are all optional sub-TLVs introduced for the BGP Tunnel
   Encapsulation Attribute [RFC9012] being defined in this section.

Should the segment-list be mandatory?
What does it mean if the segment-list is empty?

   When the Binding SID sub-TLV is used to signal an SRv6 SID, the
   choice of its SRv6 Endpoint Behavior [RFC8986] to be instantiated is
   left to the headend node.  It is RECOMMENDED that the SRv6 Binding
   SID sub-TLV defined in Section 2.4.3, that enables the specification
   of the SRv6 Endpoint Behavior, be used for signaling of an SRv6
   Binding SID for an SR Policy candidate path.

Is there a choice here? Shouldn't the behavior be that traffic with that
Binding SID is steered into this policy?
The whole paragraph is hard to parse.

   *  Binding SID: If the length is 2, then no Binding SID is present.
      If the length is 6 then the Binding SID is encoded in 4 octets
      using the format below.  Traffic Class (TC), S, and TTL (Total of
      12 bits) are RESERVED and MUST be set to zero and MUST be ignored.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |          Label                        | TC  |S|       TTL     |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 6: Binding SID Label Encoding

      If the length is 18 then the Binding SID contains a 16-octet SRv6
      SID.

Why do we need a 16-octet Binding SID since we have the following
"SRv6 Binding SID Sub-TLV"?
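
To make the three length cases quoted above concrete, here is a rough Python sketch of the decoding. It assumes the sub-TLV value starts with one octet of Flags and one octet of RESERVED (which is what a length of 2 with no Binding SID suggests); the function name `decode_binding_sid` is hypothetical.

```python
import ipaddress
import struct


def decode_binding_sid(value: bytes):
    """Decode the value of a Binding SID sub-TLV.

    Assumed layout: Flags (1 octet), RESERVED (1 octet), then an optional
    Binding SID of 4 octets (MPLS label format) or 16 octets (SRv6 SID).
    Returns (flags, binding_sid), where binding_sid is None, an int label,
    or an IPv6Address.
    """
    if len(value) not in (2, 6, 18):
        raise ValueError("Binding SID sub-TLV length must be 2, 6 or 18")
    flags = value[0]
    body = value[2:]
    if not body:
        return flags, None                      # length 2: no Binding SID
    if len(body) == 4:
        (word,) = struct.unpack("!I", body)
        return flags, word >> 12                # top 20 bits; TC, S, TTL ignored
    return flags, ipaddress.IPv6Address(body)   # length 18: 16-octet SRv6 SID
```

The 18-octet branch is the case the question above is about: it duplicates what the separate SRv6 Binding SID sub-TLV can carry, without the Endpoint Behavior.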

2.4.3.  SRv6 Binding SID Sub-TLV

   The SRv6 Binding SID sub-TLV is optional.  More than one SRv6 Binding
   SID sub-TLVs MAY be signaled in the same SR Policy encoding to
   indicate one or more SRv6 SIDs, each with potentially different SRv6
   Endpoint Behaviors to be instantiated.

Why would there be more than one signaled, and why would there be different
endpoint behaviors? Isn't the behavior simply "steer into the SR policy"?

      -  S-Flag: This flag encodes the "Specified-BSID-only" behavior.
         It is used by SRPM as described in section 6.2.3 in [RFC9256].

I have trouble understanding this "Specified-BSID-only" behavior.


      -  I-Flag: This flag encodes the "Drop Upon Invalid" behavior.  It
         is used by SRPM as described in section 8.2 in [RFC9256].

I also have trouble understanding this "Drop Upon Invalid" behavior.
I read rfc9256 but still can't put the two together.

2.4.4.2.  Segment Sub-TLVs


   The Segment sub-TLVs are optional and MAY appear multiple times in
   the Segment List sub-TLV.

Why are they optional? What is the use case of an empty segment list?

2.4.4.2.2.  Segment Type B

   The Type B Segment Sub-TLV encodes a single SRv6 SID.  The format is
   as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type      |   Length      |     Flags     |   RESERVED    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //                       SRv6 SID (16 octets)                  //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //     SRv6 Endpoint Behavior and SID Structure (optional)     //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   *  Flags: 1 octet of flags as defined in Section 2.4.4.2.3.

   *  SRv6 SID: 16 octets of IPv6 address.

   *  SRv6 Endpoint Behavior and SID Structure: Optional, as defined in
      Section 2.4.4.2.4.

When this is part of a segment list, what is the significance of the Flags and
SRv6 Endpoint Behavior and SID Structure?
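
For reference, the Type B layout quoted above could be parsed roughly as follows. This is only a sketch: it assumes the Length field counts the octets after itself, so the value is 18 octets without the optional part and 26 octets with it (the optional part being the 8-octet structure of Figure 23). The names `TypeBSegment` and `parse_type_b_value` are hypothetical.

```python
import ipaddress
from typing import NamedTuple, Optional


class TypeBSegment(NamedTuple):
    flags: int
    sid: ipaddress.IPv6Address
    behavior_and_structure: Optional[bytes]  # raw 8 octets, if present


def parse_type_b_value(value: bytes) -> TypeBSegment:
    """Parse the octets that follow the Type and Length fields.

    Assumed layout: Flags (1), RESERVED (1), SRv6 SID (16), then an
    optional 8-octet Endpoint Behavior and SID Structure.
    """
    if len(value) not in (18, 26):
        raise ValueError("Type B value must be 18 or 26 octets")
    flags = value[0]
    sid = ipaddress.IPv6Address(value[2:18])
    extra = value[18:] or None
    return TypeBSegment(flags, sid, extra)
```

Nothing in this parse depends on the Flags or the optional structure, which is the point of the question above: a headend can build the segment list from the SID alone.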

   The TLV 2 defined for the advertisement of Segment Type B in the
   earlier versions of this document has been deprecated to avoid
   backward compatibility issues.

Why would deprecating them avoid backward compatibility issues?
If there are implementations/deployments based on earlier versions,
deprecating them won't help.
If there are no implementations/deployments based on earlier versions,
there is no backward compatibility issue.

Perhaps just remove "to avoid ..."?

2.4.4.2.3.  Segment Flags

   The Segment Types sub-TLVs described above may contain the following
   flags in the "Flags" field defined in Section 6.8:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   |V|   |B|       |
   +-+-+-+-+-+-+-+-+

   Figure 22: Segment Flags

   where:

      V-Flag: This flag, when set, is used by SRPM for "SID
      verification" as described in Section 5.1 of [RFC9256].

I have trouble understanding the V-Flag. How is the headend supposed to verify
the BSID or any segment in the segment list?
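
For reference, a small Python sketch of reading the two flags shown in Figure 22. It assumes the usual IETF convention that bit 0 in the figure is the most significant bit of the octet; the constant and function names are hypothetical.

```python
V_FLAG = 0x80  # bit 0 in the figure (most significant bit)
B_FLAG = 0x10  # bit 3 in the figure


def decode_segment_flags(flags: int) -> dict:
    """Return the V and B flag states from the 1-octet Flags field."""
    return {"V": bool(flags & V_FLAG), "B": bool(flags & B_FLAG)}
```

Decoding the bit is the easy part; what the headend then does for "SID verification" is what the question above is asking about.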

2.4.4.2.4.  SRv6 SID Endpoint Behavior and Structure

   The Segment Types sub-TLVs described above MAY contain the SRv6
   Endpoint Behavior and SID Structure [RFC8986] encoding as described
   below:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Endpoint Behavior       |            Reserved           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    LB Length  |  LN Length    | Fun. Length   |  Arg. Length  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 23: SRv6 SID Endpoint Behavior and Structure

   where:

      Endpoint Behavior: 2 octets.  It carries the SRv6 Endpoint
      Behavior code point for this SRv6 SID as defined in section 9.2 of
      [RFC8986].  When set with the value 0xFFFF (i.e., Opaque), the
      choice of SRv6 Endpoint Behavior is left to the headend.

      Reserved: 2 octets of reserved bits.  This field MUST be set to
      zero on transmission and MUST be ignored on receipt.

      Locator Block Length: 1 octet.  SRv6 SID Locator Block length in
      bits.

      Locator Node Length: 1 octet.  SRv6 SID Locator Node length in
      bits.

      Function Length: 1 octet.  SRv6 SID Function length in bits.

      Argument Length: 1 octet.  SRv6 SID Arguments length in bits.

How is this different from the "SRv6 SID Structure Sub-Sub-TLV" in RFC9252?
Why not reuse that one?
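
For comparison purposes, the 8-octet encoding of Figure 23 can be unpacked as in the sketch below. The names are hypothetical, and the check that the four lengths do not exceed 128 bits is my own assumption, based only on an SRv6 SID being 128 bits long.

```python
import struct
from typing import NamedTuple

OPAQUE_BEHAVIOR = 0xFFFF  # "choice of behavior is left to the headend"


class SidBehaviorStructure(NamedTuple):
    endpoint_behavior: int
    lb_len: int   # Locator Block length, bits
    ln_len: int   # Locator Node length, bits
    fun_len: int  # Function length, bits
    arg_len: int  # Arguments length, bits


def parse_behavior_structure(data: bytes) -> SidBehaviorStructure:
    """Unpack Endpoint Behavior (2), Reserved (2) and four 1-octet lengths."""
    if len(data) != 8:
        raise ValueError("expected 8 octets")
    behavior, _reserved, lb, ln, fun, arg = struct.unpack("!HHBBBB", data)
    # Reserved is ignored on receipt, as the quoted text requires.
    if lb + ln + fun + arg > 128:
        raise ValueError("SID structure exceeds 128 bits")
    return SidBehaviorStructure(behavior, lb, ln, fun, arg)
```

The last four octets carry a locator block, locator node, function and argument length, each in bits.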

2.4.5.  Explicit NULL Label Policy Sub-TLV

   To steer an unlabeled IP packet into an SR policy, it is necessary to
   create a label stack for that packet, and push one or more labels
   onto that stack.

Do you mean SR-MPLS policy?
Perhaps remove ", and push one or more labels onto that stack"?
Perhaps change "Explicit NULL Label Policy" to
"Explicit NULL Label Behavior"? The word "policy" here gets tangled
with "SR Policy".

4.2.1.  Validation of an SR Policy NLRI

   When a BGP speaker receives an SR Policy NLRI from a neighbor it MUST
   first perform validation based on the following rules in addition to
   the validation described in Section 5:

   *  The SR Policy NLRI MUST include a distinguisher, color, and
      endpoint field which implies that the length of the NLRI MUST be
      either 12 or 24 octets (depending on the address family of the
      endpoint).

   *  The SR Policy update MUST have either the NO_ADVERTISE community
      or at least one route target extended community in IPv4-address
      format or both.  If a router supporting this specification
      receives an SR Policy update with no route target extended
      communities and no NO_ADVERTISE community, the update MUST be
      considered as malformed.

What about IPv6-address specific RT?
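
The two validation rules quoted above can be sketched in Python as follows. The field order comes from the "<Distinguisher, Policy-Color, Endpoint>" notation; the 4-octet Distinguisher width is inferred from 12 = 4 + 4 + 4 and is an assumption, as are all the names. `body` is the NLRI without any leading length octet.

```python
import ipaddress
import struct
from typing import NamedTuple, Union

AFI_IPV4, AFI_IPV6 = 1, 2  # IANA address family numbers


class SrPolicyNlri(NamedTuple):
    distinguisher: int
    color: int
    endpoint: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_sr_policy_nlri(afi: int, body: bytes) -> SrPolicyNlri:
    """Check the 12/24-octet rule against the AFI, then split the fields."""
    expected = {AFI_IPV4: 12, AFI_IPV6: 24}.get(afi)
    if expected is None or len(body) != expected:
        raise ValueError("NLRI length inconsistent with AFI")
    distinguisher, color = struct.unpack("!II", body[:8])
    if afi == AFI_IPV4:
        endpoint = ipaddress.IPv4Address(body[8:])
    else:
        endpoint = ipaddress.IPv6Address(body[8:])
    return SrPolicyNlri(distinguisher, color, endpoint)


def has_required_communities(has_no_advertise: bool,
                             ipv4_route_target_count: int) -> bool:
    """Second rule: NO_ADVERTISE, or at least one IPv4-format RT, or both."""
    return has_no_advertise or ipv4_route_target_count > 0
```

As written, an update carrying only IPv6-address-specific route targets fails the second check, which is the gap the question above points at.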

4.2.2.  Eligibility for Local Use of an SR Policy NLRI

   If one or more route targets are present and none matches the local
   BGP Identifier, then, while the SR Policy NLRI is valid, it is not
   usable on the receiver node.

Does the route target have to match the local BGP identifier?
As long as the receiver is configured with a local RT that matches one
of the advertised RTs, it should be fine, right? That is how VPN RT
works and I suppose the same can be used here.

When should the BGP update stop being propagated if RT is used?
Never? or should a matching RT be removed by each matching receiver
and then the propagation stops when there is no RT left?
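
To illustrate the rule as quoted (not the VPN-style import I suggest above), here is a Python sketch. It assumes "matches" means the Global Administrator field of an IPv4-address-specific Route Target (type 0x01, sub-type 0x02 per RFC 4360) equals the local BGP Identifier, and that an update with no route targets at all is not excluded by this rule. The function name is hypothetical.

```python
import ipaddress
from typing import Iterable


def is_usable_on_receiver(route_targets: Iterable[bytes],
                          bgp_identifier: str) -> bool:
    """Apply the quoted eligibility rule to 8-octet extended communities."""
    local_id = ipaddress.IPv4Address(bgp_identifier).packed
    rts = list(route_targets)
    if not rts:
        # Assumption: the quoted rule only applies when RTs are present.
        return True
    for rt in rts:
        is_ipv4_rt = len(rt) == 8 and rt[0] == 0x01 and rt[1] == 0x02
        # Octets 2..5 carry the Global Administrator (an IPv4 address).
        if is_ipv4_rt and rt[2:6] == local_id:
            return True
    return False
```

Note that this only decides local use. It says nothing about when propagation should stop, which is the open question above.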

   By default, a BGP node receiving an SR Policy NLRI SHOULD NOT remove
   route target extended community before propagation.  An
   implementation MAY provide support for configuration to filter and/or
   remove route target extended community before propagation.

Isn't the above applicable to any AFI/SAFI? Why do we need to specify that?

5.  Error Handling and Fault Management

   A BGP Speaker MUST perform the following syntactic validation of the
   SR Policy NLRI to determine if it is malformed.  This includes the
   validation of the length of each NLRI and the total length of the
   MP_REACH_NLRI and MP_UNREACH_NLRI attributes.  It also includes the
   validation of the consistency of the NLRI length with the AFI and the
   endpoint address as specified in Section 2.1.

   When the error determined allows for the router to skip the malformed
   NLRI(s) and continue the processing of the rest of the update
   message, then it MUST handle such malformed NLRIs as 'Treat-as-
   withdraw'.  In other cases, where the error in the NLRI encoding
   results in the inability to process the BGP update message (e.g.
   length related encoding errors), then the router SHOULD handle such
   malformed NLRIs as 'AFI/SAFI disable' when other AFI/SAFI besides SR
   Policy are being advertised over the same session.  Alternately, the
   router MUST perform 'session reset' when the session is only being
   used for SR Policy or when it 'AFI/SAFI disable' action is not
   possible.

Is the above generic BGP handling?
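
As I read the quoted paragraph, the decision it describes boils down to the sketch below. The enum, function and parameter names are hypothetical, and the three booleans are my own simplification of the conditions in the text.

```python
from enum import Enum


class Action(Enum):
    TREAT_AS_WITHDRAW = "treat-as-withdraw"
    AFI_SAFI_DISABLE = "AFI/SAFI disable"
    SESSION_RESET = "session reset"


def nlri_error_action(can_skip_malformed_nlri: bool,
                      other_afi_safi_on_session: bool,
                      afi_safi_disable_possible: bool) -> Action:
    """Pick the handling for a malformed SR Policy NLRI."""
    if can_skip_malformed_nlri:
        # The rest of the UPDATE can still be processed.
        return Action.TREAT_AS_WITHDRAW
    if other_afi_safi_on_session and afi_safi_disable_possible:
        # The session also carries other AFI/SAFIs; only SR Policy is disabled.
        return Action.AFI_SAFI_DISABLE
    # SR Policy-only session, or disabling the AFI/SAFI is not possible.
    return Action.SESSION_RESET
```

None of the three branches looks specific to the SR Policy SAFI, hence the question above.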

   The validation of the TLVs/sub-TLVs introduced in this document and
   defined in their respective sub-sections of Section 2.4 MUST be
   performed to determine if they are malformed or invalid.  The
   validation of the Tunnel Encapsulation Attribute itself and the other
   TLVs/sub-TLVs specified in Section 13 of [RFC9012] MUST be done as
   described in that document.  In case of any error detected, either at
   the attribute or its TLV/sub-TLV level, the "treat-as-withdraw"
   strategy MUST be applied.  This is because an SR Policy update
   without a valid Tunnel Encapsulation Attribute (comprising of all
   valid TLVs/sub-TLVs) is not usable.

The above says the validation of those in Section 2.4 may lead to
"treat-as-withdraw" - I assume this is BGP handling. Does that not
conflict with the following paragraph?

   The validation of the individual fields of the TLVs/sub-TLVs defined
   in Section 2.4 are beyond the scope of BGP as they are handled by the
   SRPM as described in the individual TLV/sub-TLV sub-sections.  A BGP
   implementation MUST NOT perform semantic verification of such fields
   nor consider the SR Policy update to be invalid or not usable based
   on such validation.

6.  IANA Considerations

   This document uses code point allocations from the following existing
   registries:

   *  Subsequent Address Family Identifiers (SAFI) Parameters registry

   *  BGP Tunnel Encapsulation Attribute Tunnel Types registry under the
      BGP Tunnel Encapsulation registry

   *  BGP Tunnel Encapsulation Attribute sub-TLVs registry under the BGP
      Tunnel Encapsulation registry

   *  Color Extended Community Flags registry under the BGP Tunnel
      Encapsulation registry

Do we need to mention the above for the already allocated code points?
If yes, should we mention the values as well?
Actually, I see 6.1~6.4 below, so the above is not needed at all.

   This document also requests the creation of the following new
   registries:

   *  SR Policy Segment List Sub-TLVs under the BGP Tunnel Encapsulation
      registry

   *  SR Policy Binding SID Flags under the BGP Tunnel Encapsulation
      registry

   *  SR Policy SRv6 Binding SID Flags under the BGP Tunnel
      Encapsulation registry

   *  SR Policy Segment Flags under the BGP Tunnel Encapsulation
      registry

   *  Color Extended Community Color-Only Types registry under the BGP
      Tunnel Encapsulation registry

Similarly, we probably don't need the above. Just a nit.