[Idr] Feedback on 5575-bis

Jeffrey Haas <jhaas@pfrc.org> Thu, 12 July 2018 18:59 UTC

Return-Path: <jhaas@slice.pfrc.org>
X-Original-To: idr@ietfa.amsl.com
Delivered-To: idr@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 03EAE130F3C for <idr@ietfa.amsl.com>; Thu, 12 Jul 2018 11:59:10 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.901
X-Spam-Level:
X-Spam-Status: No, score=-1.901 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id sXa7ncpo7fSa for <idr@ietfa.amsl.com>; Thu, 12 Jul 2018 11:59:07 -0700 (PDT)
Received: from slice.pfrc.org (slice.pfrc.org [67.207.130.108]) by ietfa.amsl.com (Postfix) with ESMTP id 1A311130DC5 for <idr@ietf.org>; Thu, 12 Jul 2018 11:59:06 -0700 (PDT)
Received: by slice.pfrc.org (Postfix, from userid 1001) id 580221E28F; Thu, 12 Jul 2018 14:58:56 -0400 (EDT)
Date: Thu, 12 Jul 2018 14:58:55 -0400
From: Jeffrey Haas <jhaas@pfrc.org>
To: idr@ietf.org
Message-ID: <20180712185855.GQ12853@pfrc.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/60L9CxbFhcUKByuxlffKjpxyUoc>
Subject: [Idr] Feedback on 5575-bis
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/idr>, <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/idr/>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 12 Jul 2018 18:59:11 -0000

As I promised Sue, I've done a thorough re-read of draft-ietf-idr-rfc5575bis.
This re-read was partially prompted by recent work in my employer's
implementation so this was very fresh.

Here are my comments:

:   A Flow Specification received from an external autonomous system will
:   need to be validated against unicast routing before being accepted.
:   If the aggregate traffic flow defined by the unicast destination
:   prefix is forwarded to a given BGP peer, then the local system can
:   safely install more specific flow rules that may result in different
:   forwarding behavior, as requested by this system.

Issue: Redirections may need additional scrutiny.  While the destination
based check on the received unicast routing is good, it's still possible for
the external AS to influence forwarding beyond "rate-limit/drop" if scrutiny
is not given to the other control communities.  This issue is compounded
with the redirect-ip feature, not in this document.

I'd suggest this be mentioned in an operational security considerations
section.

:   In many deployments of BGP Flow Specification, the Flow Specification
:   information has replace existing host length route advertisements.

s/replace/replaced/.  Also, it's unclear in the context of this paragraph
why this is interesting.  A bit of explanatory text explaining the use of
such most-specific routes for redirecting traffic would be useful.

:   Experience with previous BGP extensions has also shown that the
:   maximum capacity of BGP speakers has been gradually increased
:   according to expected loads.  Taking into account Internet unicast
:   routing as well as additional applications as they gain popularity.

The last sentence is dangling.

:   Flow Specification components must follow strict type ordering by
:   increasing numerical order.  A given component type may (exactly
:   once) or may not be present in the specification.  If present, it
:   MUST precede any component of higher numeric type value.

... and what if they don't?  Is it malformed?  Should the implementation
attempt to re-order?

:   All combinations of component types within a single NLRI are allowed,
:   even if the combination makes no sense from a semantical perspective.
:   If a given component type within a prefix in unknown, the prefix in
:   question cannot be used for traffic filtering purposes by the
:   receiver.  Since a Flow Specification has the semantics of a logical
:   AND of all components, if a component is FALSE, by definition it
:   cannot be applied.  However, for the purposes of BGP route
:   propagation, this prefix should still be transmitted since BGP route
:   distribution is independent on NLRI semantics.

The issue here is when we encounter an unknown component, we can no longer
verify the syntactic correctness of the NLRI.  This means we will pass
potentially invalid entries downstream.  Downstream receivers must thus not
drop the session, but should use treat-as-withdraw semantics.

This issue in validation, as I tersely noted in a prior IDR thread, really
needs discussion in the document.  It impacts the ability to verify, to add
new component types, and potentially cause invalid NLRI to propagate
downstream.

:   The <type, value> encoding is chosen in order to allow for future
:   extensibility.

It should be mentioned that due to the property above that extensions are
problematic.

4.2.3:
:     0   1   2   3   4   5   6   7
:   +---+---+---+---+---+---+---+---+
:   | e | a |  len  | 0 |lt |gt |eq |
:   +---+---+---+---+---+---+---+---+
:
:        Numeric operator

The 0 field is unclear what the expected behavior is when it's NOT zero.

:      a - AND bit.  If unset, the previous term is logically ORed with
:      the current one.  If set, the operation is a logical AND.  It
:      should be unset in the first operator byte of a sequence.  The AND
:      operator has higher priority than OR for the purposes of
:      evaluating logical expressions.

It's not clear what should be done if AND is set in the first list entry.

:      len - length of the value field for this operand encodes 1 (00) -
:      4 (11) bytes.  Type 3 flow component values are always encoded as
:      single byte (len = 00).

"always encoded" begs the question about what to do when it isn't.  Being
encoded as 4 bytes of length but a length of 1 is still correct, e.g.  Does
Postel's maxim apply here?

4.2.4
:      Defines a list of {operator, value} pairs that matches source OR
:      destination TCP/UDP ports.  This list is encoded using the numeric
:      operator format defined in Section 4.2.3.  Values are encoded as
:      1- or 2-byte quantities.

It's not clear what to do if the encoding is > 2 byte.  Ditto for sections
4.2.5, 4.2.6, etc.

4.2.9 has a similar "must be zero" set of bits that need to have their
behavior specified.

4.2.11
:      Defines a list of {operator, value} pairs used to match the 6-bit
:      DSCP field [RFC2474].  This list is encoded using the numeric
:      operator format defined in Section 4.2.3.  Values are encoded
:      using a single byte.  The two most significant bits are zero and
:      the six least significant bits contain the DSCP value.

It's not specified what to do when the top two bits are set.

4.2.12
:    +---+---+---+---+---+---+---+---+
:    |   Reserved    |LF |FF |IsF|DF |
:    +---+---+---+---+---+---+---+---+
:
:      Bitmask values:
:
:         Bit 7 - Don't fragment (DF)
:
:         Bit 6 - Is a fragment (IsF)
:
:         Bit 5 - First fragment (FF)
:
:         Bit 4 - Last fragment (LF)


While the general behavior of unexpected lengths is already unclear, it's
even less clear here.  

Reserved behavior is not defined.  "Must be zero on transmit, should be zero
on receive?"

It's also not specified whether this is "any of" or not.  This impacts
whether multiple bits being set is valid for some combinations.

5.

:   filtering (prevention of traffic-based, denial-of-service (DOS)
:   attacks, traffic filtering in the context of BGP/MPLS VPN service,
:   and centralized traffic control for SDN/NFV networks) requires

The separate clauses here likely need to be semicolon separated since the
first clause is a compound separated by a comma.  Alternatively leave to the
RFC editor to straight it out.

:   Section 8 has details on the limitation
:   of previous mechanisms and why BGP Flow Specification version 1
:   provides a solution for to prevent DOS and aid BGP/MPLS VPN filtering
:   rules.

I think this is the first time in the document this is referred to as v1 of
the spec.

:   This Flow Specification NLRI defined above to convey information
:   about traffic filtering rules for traffic that should be discarded or
:   handled in manner specified by a set of pre-defined actions (which
:   are defined in BGP Extended Communities).  This mechanism is
:   primarily designed to allow an upstream autonomous system to perform
:   inbound filtering in their ingress routers of traffic that a given
:   downstream AS wishes to drop.

There's several bits of awkwardness in the above paragraph.  But among them
is the presumption of drop being the default choice in the last sentence.

5.1
:   The relative order of two Flow Specification rules is determined by
:   comparing their respective components.  The algorithm starts by
:   comparing the left-most components of the rules. 

The importance of whether the components are sorted becomes critical here.
Given the semantic ambiguity of components the receiver is unable to decode
(unknown), the length check also can't be applied here.  This is mostly
important for likely buggy cases where two NLRI are semantically identical
but mis-ordered.  Compare vs. the RFC 3107 withdraw ambiguities for label
value some time ago.

A trite example is consider two NLRI both containing the type 4 port field,
both containing the same 1-byte value (e.g. 255), but one encoded using a
single byte of length, the other with two bytes of length.  Is this an
implicit update or not?

Somewhat similarly, duplicate components aren't specifically called out, but
potentially implied (as noted in an IDR exchange) by the requirements of the
sort order.  Exactly what the behavior is in the face of duplicates should
be specified.

6.
:   The underlying concept is that the neighboring AS that advertises the
:   best unicast route for a destination is allowed to advertise flow-
:   spec information that conveys a more or equally specific destination
:   prefix.  Thus, as long as there are no more specific unicast routes,
:   received from a different neighboring AS, which would be affected by
:   that filtering rule.


Observation: An address space hijack that is not filtered, e.g.
advertisement of a more specific route, may prevent inter-as flowspec from
validating.  This might be worth putting in the operational considerations
section.

7.
:   Multiple traffic actions may be present for a single NLRI.  The
:   traffic actions are processed in ascending order of the sub-type
:   found in the BGP Extended Communities.  If not all of them can be
:   processed the filter SHALL NOT be applied at all (for example: if for
:   a given flow there are the action communities rate-limit-bytes and
:   traffic-marking attached, and the plattform does not support one of
:   them also the other shall not be applied for that flow).

Typo on platform.
The extended communities defined are not only a given sub-type, but also a
type field.  The above suggestion is ambiguous about the case where the
subtypes are equal but the types are distinct.

I'm also wondering if the "applied at all" is the desired wording.  Does
that mean, as earlier presented in the draft, we get:

:   The default action for a traffic filtering Flow Specification is to
:   accept IP traffic that matches that particular rule.

Or does it mean that the rule is ignored?  The implications are different: A
pass permits traffic to pass whereas ignore is effectively a "delete rule",
which may cause traffic that would otherwise pass the filter to be impacted
differently by subsequent rules.

7.1
:   The remaining 4 octets carry the maximum rate information in IEEE
:   floating point [IEEE.754.1985] format, units being bytes per second.
:   A traffic-rate of 0 should result on all traffic for the particular
:   flow to be discarded.

It is worth noting that while we could simply end up with preposterously
large numbers, there is also the edge case that these floating point values
may be NEGATIVE.  The draft may wish to note that negative numbers should be
one of: ignored (see prior comment about how this impacts rule order), or
treated as zero (discard)

7.2
See 7.1

7.4
There's a level of ambiguity in what happens when there's more than one
redirect extended community:

:   Interferes with: All other redirect functions.  All redirect                                     
:   functions are mutually exclusive.  If this redirect function exists,
:   then no other redirect functions can be processed.

We're allowed to install to any VRF:
:   If several local instances match this criteria,
:   the choice between them is a local matter

but this is partially also interacting with the text from earlier:
:   Multiple traffic actions may be present for a single NLRI.  The
:   traffic actions are processed in ascending order of the sub-type
:   found in the BGP Extended Communities.  If not all of them can be
:   processed the filter SHALL NOT be applied at all

So, consider the case where we have two redirect extended communities.  If
one of them does not have any matching VRFs, should the "SHALL NOT be
applied at all" kick in?  If both can be used, do we pick the first as
implied by ascending order?  Or do we take a wider reading of the 7.4 text
and let the implementation just pick one?

This indirectly raises a different point: For each of the action communities
that aren't expected to allow multiple instances, how do you deal with the
repeats?  E.g. two traffic-rate extended communities?

7.6 covers the fact that only a single redirect community can be associated
with a flowspec.  However, given that some routers may have target:A working
but not target:B and vice versa elsewhere, we shouldn't necessarily restrict
this.  

7.6.1 illustrates the idea that they are a conflict that should be treated
as withdraw.  I believe this is not the desired behavior.

9.2 Another reference to this being version 1.

13.
s/MuPedro/Pedro/

-- Jeff