Re: [Idr] Feedback on 5575-bis

Christoph Loibl <> Thu, 04 October 2018 10:19 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id E1AF7130E17 for <>; Thu, 4 Oct 2018 03:19:48 -0700 (PDT)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -1.899
X-Spam-Status: No, score=-1.899 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001] autolearn=ham autolearn_force=no
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id wFKJwHbfnwy9 for <>; Thu, 4 Oct 2018 03:19:43 -0700 (PDT)
Received: from ( [IPv6:2001:858:2:8::235]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by (Postfix) with ESMTPS id 39440130DFB for <>; Thu, 4 Oct 2018 03:19:43 -0700 (PDT)
Received: from ([] helo=[]) by with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from <>) id 1g80LP-0006EB-Gj; Thu, 04 Oct 2018 11:55:06 +0200
From: Christoph Loibl <>
Message-Id: <>
Content-Type: multipart/alternative; boundary="Apple-Mail=_0BE6A93F-6D0D-416C-91A9-C8FAB141FE0E"
Mime-Version: 1.0 (Mac OS X Mail 11.5 \(3445.9.1\))
Date: Thu, 4 Oct 2018 12:19:27 +0200
In-Reply-To: <>
To: Jeffrey Haas <>
References: <>
X-Mailer: Apple Mail (2.3445.9.1)
Archived-At: <>
Subject: Re: [Idr] Feedback on 5575-bis
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Inter-Domain Routing <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Thu, 04 Oct 2018 10:19:49 -0000

Hi Jeff, 

Thank you very much for your detailed feedback on the draft.

As I promised this is the full report on your feedback on 5575bis draft. I appreciate discussion on the list. And will update the draft document next week according to the discussion.

However, I really need to make clear (again): With this -BIS we want to make things clearer and want to improve interoperability. We avoid fundamental changes that lead to incompatibility (esp. when it comes to encoding of the NLRI and propagation of the NLRI). Please always keep that in mind when suggesting changes (basically this is *not* flowspec 2.0)

Said that, I think the current RFC5575 implementations (at least those I am aware of) are widely compatible/compliant implementations also of the 5575BIS but may lack some minor details that have previously not been specified to that granularity (in RFC5575).

See my comments inline (CL).

Cheers Christoph

> From: Jeffrey Haas < <>>
> Subject: [Idr] Feedback on 5575-bis
> Date: 12. July 2018 at 20:58:55 CEST
> To: <>
> As I promised Sue, I've done a thorough re-read of draft-ietf-idr-rfc5575bis.
> This re-read was partially prompted by recent work in my employer's
> implementation so this was very fresh.
> Here are my comments:
> :   A Flow Specification received from an external autonomous system will
> :   need to be validated against unicast routing before being accepted.
> :   If the aggregate traffic flow defined by the unicast destination
> :   prefix is forwarded to a given BGP peer, then the local system can
> :   safely install more specific flow rules that may result in different
> :   forwarding behavior, as requested by this system.
> Issue: Redirections may need additional scrutiny.  While the destination
> based check on the received unicast routing is good, it's still possible for
> the external AS to influence forwarding beyond "rate-limit/drop" if scrutiny
> is not given to the other control communities.  This issue is compounded
> with the redirect-ip feature, not in this document.
> I'd suggest this be mentioned in an operational security considerations
> section.

CL: As a matter of fact there is already a paragraph mentioning this particular problem. However I suggest to make it clearer with a operational security considerations section as you suggested:

   While the general verification of the traffic filter NLRI is
   specified in this document (Section 6) the traffic filtering actions
   received by a third party may need custom verification or filtering.
   In particular all non traffic-rate actions may allow a third party to
   modify packet forwarding properties and potentially gain access to
   other routing-tables/VPNs or undesired queues.  This can be avoided
   by proper filtering of action communities at network borders and by
   mapping user-defined communities (see Section 7) to expose certain
   forwarding properties to third parties.

> :   In many deployments of BGP Flow Specification, the Flow Specification
> :   information has replace existing host length route advertisements.
> s/replace/replaced/.  Also, it's unclear in the context of this paragraph
> why this is interesting.  A bit of explanatory text explaining the use of
> such most-specific routes for redirecting traffic would be useful.

CL: Suggest to remove the sentence from the draft

> :   Experience with previous BGP extensions has also shown that the
> :   maximum capacity of BGP speakers has been gradually increased
> :   according to expected loads.  Taking into account Internet unicast
> :   routing as well as additional applications as they gain popularity.
> The last sentence is dangling.

CL: Suggest to rephrase:

   Experience with previous BGP extensions has also shown that the
   maximum capacity of BGP speakers has been gradually increased
   according to expected loads.  For example Internet unicast routing as
   well as other BGP applications increased their maximum capacity as
   they gain popularity.

> :   Flow Specification components must follow strict type ordering by
> :   increasing numerical order.  A given component type may (exactly
> :   once) or may not be present in the specification.  If present, it
> :   MUST precede any component of higher numeric type value.
> .... and what if they don't?  Is it malformed?  Should the implementation
> attempt to re-order?

CL: [1] I sent an email roughly a month ago to start a discussion about the encoding of the NLRI - with no response:

*) NLRI is widely ment opaque to BGP (no changes to RFC5575)


*) BGP needs to decode the NLRI to the first component (DST-IP) for the verification (Section 6) (not more of it! - the rest is part of the application that receives the NLRI)

Since changing the above properties would introduce a fundamental change to FS I strongly beleave that it does not fit into a 5575bis.

If the components are not in the right order the encoding _is_wrong_. However BGP cannot detect this, because its knowledge about the correct encoding ends after the very first component (DST-IP). So, to obey the rules (RFC5575) of an opaque NLRI FS-garbage gets propagated. Only when the decoding of the NLRI actually happens (not in BGP) such invalids can be detected and need to be ignored. 

Additionally reordering is not possible because BGP has no understanding of the components (which is actually good because it allows future extensions) - (btw. reordering/re-encoding of the NLRI is potentially dangerous because it may produce “ghost”/“mutated" duplicates of an NLRI in the network).

I suggest to add a paragraph to the draft to make it clear.

Please also refer to my Email one or two months ago - the suggested changes (100% compatible with current implementations) may also help to introduce future extensions.

> :   All combinations of component types within a single NLRI are allowed,
> :   even if the combination makes no sense from a semantical perspective.
> :   If a given component type within a prefix in unknown, the prefix in
> :   question cannot be used for traffic filtering purposes by the
> :   receiver.  Since a Flow Specification has the semantics of a logical
> :   AND of all components, if a component is FALSE, by definition it
> :   cannot be applied.  However, for the purposes of BGP route
> :   propagation, this prefix should still be transmitted since BGP route
> :   distribution is independent on NLRI semantics.
> The issue here is when we encounter an unknown component, we can no longer
> verify the syntactic correctness of the NLRI.  This means we will pass
> potentially invalid entries downstream.  Downstream receivers must thus not
> drop the session, but should use treat-as-withdraw semantics.
> This issue in validation, as I tersely noted in a prior IDR thread, really
> needs discussion in the document.  It impacts the ability to verify, to add
> new component types, and potentially cause invalid NLRI to propagate
> downstream.

CL: See the above comment. As long as BGP treats the NLRI as an opaque value we do not have a problem with all this (not even with new FS components). Downstram propagation of the garbage can be partly avoided (see my Email on that topic) at some cost. But this issue exists with every BGP opaque NLRI - this is not special to FS.

> :   The <type, value> encoding is chosen in order to allow for future
> :   extensibility.
> It should be mentioned that due to the property above that extensions are
> problematic.

CL: This is true, and I think that should be removed. (See also the 2 comments above)

> 4.2.3:
> :     0   1   2   3   4   5   6   7
> :   +---+---+---+---+---+---+---+---+
> :   | e | a |  len  | 0 |lt |gt |eq |
> :   +---+---+---+---+---+---+---+---+
> :
> :        Numeric operator
> The 0 field is unclear what the expected behavior is when it's NOT zero.
> :      a - AND bit.  If unset, the previous term is logically ORed with
> :      the current one.  If set, the operation is a logical AND.  It
> :      should be unset in the first operator byte of a sequence.  The AND
> :      operator has higher priority than OR for the purposes of
> :      evaluating logical expressions.
> It's not clear what should be done if AND is set in the first list entry.
> :      len - length of the value field for this operand encodes 1 (00) -
> :      4 (11) bytes.  Type 3 flow component values are always encoded as
> :      single byte (len = 00).
> "always encoded" begs the question about what to do when it isn't.  Being
> encoded as 4 bytes of length but a length of 1 is still correct, e.g.  Does
> Postel's maxim apply here?

CL: I suggest change all the occurrences of 0, and must be 0 to “MUST be encoded as 0 and ignored on decoding”. Also we suggest (SHOULD) to encode the length of the compoents. Furthermore we suggest a-and bit MUST be unset on first operator, and ignored on decode (mainly for compatibility reasons - I do not want to discuss if Postel was right or not :-) ):

      e - end-of-list bit.  Set in the last {op, value} pair in the

      a - AND bit.  If unset, the previous term is logically ORed with
      the current one.  If set, the operation is a logical AND.  In the
      first operator byte of a sequence it MUST be encoded as unset and
      and MUST be treated as always unset on decoding.  The AND operator
      has higher priority than OR for the purposes of evaluating logical

      len - length of the value field for this operand encodes 1 (00) -
      4 (11) bytes.  Type 3 flow component values SHOULD be encoded as
      single byte (len = 00).

      0 - MUST be set to 0 on NLRI encoding, and MUST be ignored during

      lt - less than comparison between data and value.

      gt - greater than comparison between data and value.

      eq - equality between data and value.

> 4.2.4
> :      Defines a list of {operator, value} pairs that matches source OR
> :      destination TCP/UDP ports.  This list is encoded using the numeric
> :      operator format defined in Section 4.2.3.  Values are encoded as
> :      1- or 2-byte quantities.
> It's not clear what to do if the encoding is > 2 byte.  Ditto for sections
> 4.2.5, 4.2.6, etc.

CL: Same solution as the above. (4.2.3)

> 4.2.9 has a similar "must be zero" set of bits that need to have their
> behavior specified.

CL: Same solution as above (4.2.3)

> 4.2.11
> :      Defines a list of {operator, value} pairs used to match the 6-bit
> :      DSCP field [RFC2474].  This list is encoded using the numeric
> :      operator format defined in Section 4.2.3.  Values are encoded
> :      using a single byte.  The two most significant bits are zero and
> :      the six least significant bits contain the DSCP value.
> It's not specified what to do when the top two bits are set.

CL: Same solution as above (4.2.3)

> 4.2.12
> :    +---+---+---+---+---+---+---+---+
> :    |   Reserved    |LF |FF |IsF|DF |
> :    +---+---+---+---+---+---+---+---+
> :
> :      Bitmask values:
> :
> :         Bit 7 - Don't fragment (DF)
> :
> :         Bit 6 - Is a fragment (IsF)
> :
> :         Bit 5 - First fragment (FF)
> :
> :         Bit 4 - Last fragment (LF)
> While the general behavior of unexpected lengths is already unclear, it's
> even less clear here.  
> Reserved behavior is not defined.  "Must be zero on transmit, should be zero
> on receive?"
> It's also not specified whether this is "any of" or not.  This impacts
> whether multiple bits being set is valid for some combinations.

CL: Regarding the reserved bits: Same solution as above (4.2.3).

Regarding the any-of or distinct/bitwise match. This is explained in section 4.2.9 where the bit-operator is introduced.

   m -   Match bit.  If set, this is a bitwise match operation defined
      as "(data AND value) == value"; if unset, (data AND value)
      evaluates to TRUE if any of the bits in the value mask are set in
      the data

> 5.
> :   filtering (prevention of traffic-based, denial-of-service (DOS)
> :   attacks, traffic filtering in the context of BGP/MPLS VPN service,
> :   and centralized traffic control for SDN/NFV networks) requires
> The separate clauses here likely need to be semicolon separated since the
> first clause is a compound separated by a comma.  Alternatively leave to the
> RFC editor to straight it out.

CL: Editorial changes.

> :   Section 8 has details on the limitation
> :   of previous mechanisms and why BGP Flow Specification version 1
> :   provides a solution for to prevent DOS and aid BGP/MPLS VPN filtering
> :   rules.
> I think this is the first time in the document this is referred to as v1 of
> the spec.

CL: This is definitely wrong and all occurrences of FS v1 need to be removed.

> :   This Flow Specification NLRI defined above to convey information
> :   about traffic filtering rules for traffic that should be discarded or
> :   handled in manner specified by a set of pre-defined actions (which
> :   are defined in BGP Extended Communities).  This mechanism is
> :   primarily designed to allow an upstream autonomous system to perform
> :   inbound filtering in their ingress routers of traffic that a given
> :   downstream AS wishes to drop.
> There's several bits of awkwardness in the above paragraph.  But among them
> is the presumption of drop being the default choice in the last sentence.

CL: There is a default in the specification under section 7 (accept). This is again also a question of extensibility. If new action-communities are “invented” a older implementation may not be able to identify those extended communities and thus may think that *no* action is specified (so it is forwarding the traffic to a node that knows about that new action). 

However FS-filter matching normally stops at the first match (subsequent FS-filters after the match are not taken into consideration). 

I suggest to remove the invalid “default” choice from this paragraph and explain the behaviour as described above. Section 7 as you pointed out below however should be a bit more verbose on the matching process.

> 5.1
> :   The relative order of two Flow Specification rules is determined by
> :   comparing their respective components.  The algorithm starts by
> :   comparing the left-most components of the rules. 
> The importance of whether the components are sorted becomes critical here.
> Given the semantic ambiguity of components the receiver is unable to decode
> (unknown), the length check also can't be applied here.  This is mostly
> important for likely buggy cases where two NLRI are semantically identical
> but mis-ordered.  Compare vs. the RFC 3107 withdraw ambiguities for label
> value some time ago.
> A trite example is consider two NLRI both containing the type 4 port field,
> both containing the same 1-byte value (e.g. 255), but one encoded using a
> single byte of length, the other with two bytes of length.  Is this an
> implicit update or not?
> Somewhat similarly, duplicate components aren't specifically called out, but
> potentially implied (as noted in an IDR exchange) by the requirements of the
> sort order.  Exactly what the behavior is in the face of duplicates should
> be specified.

CL: NLRI re-en-coding is not acceptable since it introduces a hole lot of problems. Opaqueness of the NLRI to BGP is somehow vital to avoid such situations altogether. See also the comment above [1].

> 6.
> :   The underlying concept is that the neighboring AS that advertises the
> :   best unicast route for a destination is allowed to advertise flow-
> :   spec information that conveys a more or equally specific destination
> :   prefix.  Thus, as long as there are no more specific unicast routes,
> :   received from a different neighboring AS, which would be affected by
> :   that filtering rule.
> Observation: An address space hijack that is not filtered, e.g.
> advertisement of a more specific route, may prevent inter-as flowspec from
> validating.  This might be worth putting in the operational considerations
> section.

CL: Yes, adding the operational considerations section:

   Since verfication of the traffic filtering NLRI is tied to the
   announcement of the best unicast route, a unfiltered address space
   hijack (e.g. advertisement of a more specific route) may cause this
   verification to fail and consequently prevent Flow Specification
   filters from being accepted by a peer.
> 7.
> :   Multiple traffic actions may be present for a single NLRI.  The
> :   traffic actions are processed in ascending order of the sub-type
> :   found in the BGP Extended Communities.  If not all of them can be
> :   processed the filter SHALL NOT be applied at all (for example: if for
> :   a given flow there are the action communities rate-limit-bytes and
> :   traffic-marking attached, and the plattform does not support one of
> :   them also the other shall not be applied for that flow).
> Typo on platform.
> The extended communities defined are not only a given sub-type, but also a
> type field.  The above suggestion is ambiguous about the case where the
> subtypes are equal but the types are distinct.
> I'm also wondering if the "applied at all" is the desired wording.  Does
> that mean, as earlier presented in the draft, we get:
> :   The default action for a traffic filtering Flow Specification is to
> :   accept IP traffic that matches that particular rule.
> Or does it mean that the rule is ignored?  The implications are different: A
> pass permits traffic to pass whereas ignore is effectively a "delete rule",
> which may cause traffic that would otherwise pass the filter to be impacted
> differently by subsequent rules.

CL: You are right this is not very clear. I think that a rule where not all of the actions can be performed should generate a match (meaning that subsequent filters are not evaluated) and the packet is treated as default-accept.

> 7.1
> :   The remaining 4 octets carry the maximum rate information in IEEE
> :   floating point [IEEE.754.1985] format, units being bytes per second.
> :   A traffic-rate of 0 should result on all traffic for the particular
> :   flow to be discarded.
> It is worth noting that while we could simply end up with preposterously
> large numbers, there is also the edge case that these floating point values
> may be NEGATIVE.  The draft may wish to note that negative numbers should be
> one of: ignored (see prior comment about how this impacts rule order), or
> treated as zero (discard)

CL: I suggest to add a lower bound there:

   flow to be discarded.  On encoding the traffic-rate MUST NOT be
   negative.  On decoding negative values MUST be treated as zero
   (discard all traffic).

> 7.2
> See 7.1

CL: See 7.1 ;-)

> 7.4
> There's a level of ambiguity in what happens when there's more than one
> redirect extended community:
> :   Interferes with: All other redirect functions.  All redirect                                     
> :   functions are mutually exclusive.  If this redirect function exists,
> :   then no other redirect functions can be processed.
> We're allowed to install to any VRF:
> :   If several local instances match this criteria,
> :   the choice between them is a local matter
> but this is partially also interacting with the text from earlier:
> :   Multiple traffic actions may be present for a single NLRI.  The
> :   traffic actions are processed in ascending order of the sub-type
> :   found in the BGP Extended Communities.  If not all of them can be
> :   processed the filter SHALL NOT be applied at all
> So, consider the case where we have two redirect extended communities.  If
> one of them does not have any matching VRFs, should the "SHALL NOT be
> applied at all" kick in?  If both can be used, do we pick the first as
> implied by ascending order?  Or do we take a wider reading of the 7.4 text
> and let the implementation just pick one?

CL: As of section 7.6. it is not allowed to have multiple redirect extended communities (-> interfering actions -> treat as withdraw):

   1.  Redirect-action-communities (0x8008, 0x8108, 0x8208):

       The three redirect-communities are mutually exclusive.  Only a
       single redirect community may be associated with a Flow
       Specification otherwise they are interfering.

> This indirectly raises a different point: For each of the action communities
> that aren't expected to allow multiple instances, how do you deal with the
> repeats?  E.g. two traffic-rate extended communities?

CL: As of section 7.6 this is not allowed:

   2.  All traffic-action communities (including redirect-actions):

       Multiple occurences of the same (sub-type and type) traffic-
       action associated with a Flow Specification are always
See also the above

> 7.6 covers the fact that only a single redirect community can be associated
> with a flowspec.  However, given that some routers may have target:A working
> but not target:B and vice versa elsewhere, we shouldn't necessarily restrict
> this.  

CL: yes this is a known drawback if we limit the redirect actions. We may also go with sorting them and picking the lowest/highest possible.

> 7.6.1 illustrates the idea that they are a conflict that should be treated
> as withdraw.  I believe this is not the desired behavior.

CL: as above in the specification it actually is. But as you pointed out maybe it is better to loosen this restriction and get some ordering in place (what should happen if a device is able to follow 2 redirect rules? We should not duplicate the packet :-) )  

> 9.2 Another reference to this being version 1.

CL: remove.
> 13.
> s/MuPedro/Pedro/

CL: apply regex/replace

> — 


Christoph Loibl <> | CL8-RIPE | PGP-Key-ID: 0x4B2C0055 | <>