[Idr] Comments on draft-ietf-idr-5g-edge-service-metadata

Jeffrey Haas <jhaas@pfrc.org> Fri, 27 October 2023 21:01 UTC

Date: Fri, 27 Oct 2023 17:01:20 -0400
From: Jeffrey Haas <jhaas@pfrc.org>
To: draft-ietf-idr-5g-edge-service-metadata@ietf.org, idr@ietf.org
Message-ID: <20231027210120.GA22739@pfrc.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/5UNxEK8zb-Hk2YRQfDc2X8pt0eA>
Subject: [Idr] Comments on draft-ietf-idr-5g-edge-service-metadata
Precedence: list

Authors,

My apologies for not providing earlier review of your document, especially
in light of request for early allocation.  Life happens, and life has been
messy the last few months.

I have several concerns covering the PDU formats and BGP procedures for this
document, but most of them are not severe and likely to be able to be
addressed in an update.  Depending on how some are addressed, the
implementors of the draft may want to defer their code point request until
they can do further discussion and testing of the result of these
conversations.

High level assessment for the chairs and AD:
--------------------------------------------

Asking for early assignment for a path attribute code point has a
requirement for a certain level of stability for the feature.  This is
primarily motivated by inconsistent parsing of the attribute, or
inconsistent feature behavior, can result in anything from BGP session
resets to traffic blackholes.

While the intent of this feature appears to be "walled garden" scenarios,
the fact that BGP is used in these environments in circumstances that can
leak on accident, these issues should be discussed.

The majority of the PDU considerations are covered, but not all.  It'd be
good to get some clarity on the items flagged below as they are likely to
impact the ability to have interoperable implementations.

I find the site-id procedures to be unclear and some additional text might
be helpful.

The potential inconsistencies in route selection and the partial deployment
model MUST be addressed.  I suspect the intended deployment scenario is such
that this may not be a problem when the feature is kept contained in a
consistently deployed walled garden. 

The feature does note that filtering of the attribute may be necessary.
However, we're getting very publicly reminded that path attributes often go
further than we like, and cause outages and security issues.  I'd strongly
recommend the attribute escape scenario be dealt with prior to early
assignment.


Boring BGP details:
-------------------

Transitivity:
=============

The transitivity of thie new path attribute isn't currently clear in the
document.  Here are some conflicting sections:

: 4.1. Metadata Path Attribute
: 
: The Metadata Path Attribute is an optional transitive BGP Path attribute to
: carry metrics and metadata about the edge services attached to the egress
: router.

Here, we're saying the attribute is optional, transitive.

: 4.1.1. Metadata Path Attribute Handling Procedure 
: [...]
: 
: When a BGP Speaker does not recognize some of the Sub-TLVs within one
: Metadata Path Attribute in a BGP UPDATE message, the BGP Speaker should
: forward the received BGP UPDATE message without any change if the BGP UPDATE
: message is marked as transitive.

BGP UPDATEs themselves aren't "transitive".  What's the intention here?
Simply that if the BGP route is propagated that unrecognized TLVs should be
propagated?

Perhaps consider using the RFC 4271 normative definition of Route:

:    Route
:       A unit of information that pairs a set of destinations with the
:       attributes of a path to those destinations.  The set of
:       destinations are systems whose IP addresses are contained in one
:       IP address prefix carried in the Network Layer Reachability
:       Information (NLRI) field of an UPDATE message.

This avoids trying to discuss the UPDATE.  UPDATEs are just PDU packing to
carry Routes and covers the destination and the Path Attributes.

: 4.1.2. TLV Format 
: [...]
: 
: The second high-order bit (bit 1): set to 0 to indicate that the
: service-metadata is not transitive. Only intended for the receiving router.

Here, it's marked as non-transitive.

Other PDU details:
==================

: 4.1.2. TLV Format 
: [...]
: 
: - The third high-order bit (bit 2): same as specified by RFC4721.
: - The fourth high-order bit (bit 3): set to 1 to indicate there are two
:   octets for the Length field.

Don't try to describe the details here.  It's base BGP protocol behavior.
In particular, the text covering the extended length behavior is erroneous
and had been mentioned previously in another thread.  The Length will be one
or two octets based on the length of the contained attribute value.

: 4.3. Capacity Availability Index Metadata 
: 
: The Capacity Availability Index Sub-TLV:
: 
:  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |      CapAvailIdx Sub-Type     |         Reserved              |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: |        Site-ID (2 octets)     | Site Availability Percentage  |
: +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Where's the Length?  Is Reserved a typo?

Note that the site-id and site availability percentage breaks your "all
sub-tlv values are 32 bit integers". (Noted below commenting on §4.1.2)

: 4.4.3. Raw Load Measurement Sub-TLV 
: [...]
: Raw Load Measurement Sub-TLV has the following format:
: 
:    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
:   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:   | Raw-Load-Measurement Sub-Type |               Length          |
:   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:   |                   Measurement Period                          |
:   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:   |           total number of packets to the Edge Service         |
:   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:   |           total number of packets from the Edge Service       |
:   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:   |           total number of bytes to the Edge Service           |
:   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:   |           total number of bytes from the Edge Service         |
:   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

This also breaks the "all sub-tlv values are 32 bit integers" rule.

Additionally, these values likely want to be 64-bits wide.  Given network
capacities and speeds these days 32 bits is likely too small.

: - Measurement Period: BGP Update period in Seconds or user-specified period.

Bad idea.  Either specify seconds or provide an additional field so that
that measurement quantum can be determined by the applications or management
platforms.

Error Handling:
---------------

: 4.1.3. Error Handling 
: 
: A BGP speaker SHOULD NOT include more than one Metadata Path Attribute in
: one BGP Update message.

Largely shouldn't be mentioned.  It's a session fatal error per RFC 4271, §6.3.

: A BGP UPDATE message that includes the Metadata Path Attribute doesn't
: change the BGP Error Handling procedure specified in the [RFC7606]. 

Similarly, don't mention.  You don't get to break these rules. :-)

: If one of the Sub-TLVs has an invalid value, e.g., out of its specified
: ranges, the Sub-TLV with the invalid value is ignored by the BGP receiver.

Ignored is fine.  Should the TLV be stripped prior to propagation?  Unlike
unknown TLVs, a known TLV with an invalid value could be cleaned up.

: By default, no notification is required unless configured to send a
: notification to its management system. 

I don't think you mean to use "notification" here since that word tends to
be used for the NOTIFICATION PDU as part of a session reset.  Is your intent
that logging of the error locally or to a management system is optional
rather than a SHOULD?

: 8. Validation and Error Handling 

I'd suggest condensing the 4.1.3 and 8. error handling sections together.



General sub-TLV questions/issues:
=================================

: 4.1.2. TLV Format 
: 
: All values in the Sub-TLVs are unsigned 32 bits integers.

Are you sure you really want to force these values to be constrained to 32
bits?  This may preclude adding other types with more flexible values in the
future, even if something boring like uint64 or decimal64.

Minimally, you should say if these are signed or unsigned integers.  Based
on usage throughout the rest of the document, I suspect they're intended to
be unsigned.

Having decided what the required - fixed-sized - length is, the error
handling considerations should be addressed.  The section 4.1.3 text covers
invalid (semantic) values, but doesn't address syntactic issues with this
field.  If the length is not correct, is only the sub-TLV itself considered
malformed or is the entire Path Attribute malformed?  If the latter, the
usual RFC 7606 treat-as-withdraw behaviors should be done.

Speaking of Length, the document should describe the semantics of the
sub-TLV length field and whether it is only the length of the value portion
of the field or inclusive of the type and length field itself.  This
occasionally varies in IETF protocols and can lead to interop issues.


Route Churn Considerations:
---------------------------

Several fields contain values that are intended to be some level of dynamic
metric.  (Zero surprise given that's the purpose of the feature!)  This
includes the Capacity Availability Index, Site Delay Prediction Index,
Service Delay Prediction Index, Raw Load Measurement, etc.

Section 7 at least addresses that metric change can impact path selection,
and attempts to provide a default lower bound for such churn.  Good!

However, since this mechanism is intended to be able to be used on routes
that are used for BGP nexthop resolution (e.g., labeled unicast), the churn
in these metrics can result in not only churn of the prefixes carrying the
data, but dependent routes.

This churn is highly analogous to the impacts of features such as RSVP
auto-bandwidth which is known to have significant negative network impacts.
It's minimally responsible to mention this broader impact.

4.3. Capacity Availability Index Metadata Issues:
-------------------------------------------------

Section 4.3.2 discussses that this TLV is used to help decorate routes that
have a nexthop where the routes share a site-id.  However it's unclear if
this TLV is intended to be used BOTH on the routes used as nexthops and the
routes that resolve over said nexthops as the correlator?

Can routes used as nexthops have more than one site-id bound to them?
Can the routes resolving over them?

If the same TLV is used on the routes resolving over the nexthops, how is
the site availability percentage filled in?

The following text is also unclear:
: However, it is unnecessary to include the Site Capacity Availability Index
: for every BGP Update message if there is no change to the site-reference
: identifier or the Capacity Availability value for the service instances.

BGP uses implicit withdraws when UPDATES advertise a given NLRI with a set
of Path Attributes.  If the Path Attributes contain metadata that excludes
the capacity TLV, I'd presume that it goes away from the route in question.
What is the intent of the text above?

Route Selection Considerations:
-------------------------------

: 5.2. Integrating with BGP decision process 

: For the selected services configured to be influenced by the Edge Service
: Metadata, the ingress router BGP Decision process [IDR-CUSTOM-DECISION]

While the custom decision process draft is adopted work, it's not widely
implemented.

If the proposal is using the cost community, what is the recommended
contents of that community for these procedures?

Inconsistent Route Selection Considerations:
============================================

Devices that recognize TLVs inconsistently may have inconsistent route
selection.  This should be flagged as an issue.

Features that impact BGP's route selection need significant additional
scrutiny, and IDR review hasn't always been great about catching such
things.  The primary issue is that when such features are inconsistently
deployed, or their inputs are inconsistently made use of (see comment
above), BGP Speakers in the network can come to different conclusions about
what the active route should be.  

This can result in forwarding loops.

Features that use optional non-transitive path attributes can be more safely
deployed in a network, but still require the feature to be consistently
deployed within an IGP domain in many cases.  This document is currently
unclear about the scope of deployment, but with the focus on the ingresses,
it has the feel that the intent that partial deployment is a consideration.

The text in section 6 of your document seems to confirm that partial
deployment is under consideration.

See general considerations in draft-haas-idr-bgp-attribute-escape, §3 for
some discussion on these points.

Note that tunneling to the nexthop can mitigate some of the forwarding loop
considerations in some cases, but not all.

Scoping Considersations:
------------------------

RTC procedures:
===============

: 6. Service Metadata Propagation Scope 
: 
: For each registered low-latency Service, BGP RT Constrained Distribution
: [RFC4684] can be used to form the Group interested in the Service. The
: "Service ID", an IP address prefix, is the Route Target.

Is the intention here that a general-purpose IP-formatted route-target
should contain the service ID?

This seems to be limiting the feature's use to only VPN service routes using
this specific format.  If that's not the case and this is intended as a way
to mark subsets of routes using this feature for constrained distribution,
you probably want to use a new extended community type/subtype.

Note that rt-constrain on non-route-target extended communities has been
discussed previously on IDR, and some drafts trying to codify the same, but
support for the same is currently not deployed to the best of my knowledge.

The procedures covering exactly what happens for advertising the RTC route
and attracting the group routes is unclear.  See my other questions about
site-id in my comments above. 

An example here would be quite helpful.

Attribute Escape Considerations:
================================

This draft specifies a new path attribute, that may be optionally
transitive, or not.  (The need for clarification is in prior comments.)

This new attribute may be attached to Internet scoped routes.

Section 9 and Section 10 attempt to limit the deployment of this feature
within "trusted domains" "between Ingress and egress routers of one single
BGP domain".

Section 4.1.1 attempts to address the scoping consideration further by:
: In order to prevent distribution of the BGP Metadata Path Attribute beyond
: its intended scope of applicability, attribute filtering SHOULD be deployed
: to remove the BGP Metadata Path attribute at the administrative boundary.

As addressed in draft-haas-idr-bgp-attribute-escape, such filtering desires
and expectations of limited domains have tended to be wishful thinking and
we keep ending up with operational accidents.

I would STRONGLY suggest that the Path Attribute definition be updated to
provide additional scoping information wherein a remote BGP domain receiving
an escaped metadata Path Attribute can determine that it should NOT be
locally used for the procedures discussed in this document.

An example of such a change would be to add an Autonomous System number to
the Path Attribute providing the context of the AS that should be using the
contained information.

-- Jeff

[Idr] Comments on draft-ietf-idr-5g-edge-service-… Jeffrey Haas
Re: [Idr] Comments on draft-ietf-idr-5g-edge-serv… Linda Dunbar
Re: [Idr] Comments on draft-ietf-idr-5g-edge-serv… Jeffrey Haas