Re: [Idr] Feedback on 5575-bis

Jeffrey Haas <jhaas@pfrc.org> Thu, 04 October 2018 16:28 UTC

Return-Path: <jhaas@slice.pfrc.org>
X-Original-To: idr@ietfa.amsl.com
Delivered-To: idr@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8A8D0130E64 for <idr@ietfa.amsl.com>; Thu, 4 Oct 2018 09:28:13 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.901
X-Spam-Level:
X-Spam-Status: No, score=-1.901 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 57gQIW5QOH-7 for <idr@ietfa.amsl.com>; Thu, 4 Oct 2018 09:28:10 -0700 (PDT)
Received: from slice.pfrc.org (slice.pfrc.org [67.207.130.108]) by ietfa.amsl.com (Postfix) with ESMTP id 989571294D0 for <idr@ietf.org>; Thu, 4 Oct 2018 09:28:10 -0700 (PDT)
Received: by slice.pfrc.org (Postfix, from userid 1001) id 695631E45C; Thu, 4 Oct 2018 12:27:38 -0400 (EDT)
Date: Thu, 04 Oct 2018 12:27:37 -0400
From: Jeffrey Haas <jhaas@pfrc.org>
To: Christoph Loibl <c@tix.at>
Cc: idr@ietf.org
Message-ID: <20181004162737.GD17157@pfrc.org>
References: <20180712185855.GQ12853@pfrc.org> <46D2DE8D-F552-444B-82C8-46DF301B1D10@tix.at>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <46D2DE8D-F552-444B-82C8-46DF301B1D10@tix.at>
User-Agent: Mutt/1.5.21 (2010-09-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/idr/NJi1sCg5YcVq-A07NkfaiurnfJg>
Subject: Re: [Idr] Feedback on 5575-bis
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/idr>, <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/idr/>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 04 Oct 2018 16:28:14 -0000

Christoph,

On Thu, Oct 04, 2018 at 12:19:27PM +0200, Christoph Loibl wrote:
> Thank you very much for your detailed feedback on the draft.
> 
> As I promised this is the full report on your feedback on 5575bis draft. I appreciate discussion on the list. And will update the draft document next week according to the discussion.
> 
> However, I really need to make clear (again): With this -BIS we want to make things clearer and want to improve interoperability. We avoid fundamental changes that lead to incompatibility (esp. when it comes to encoding of the NLRI and propagation of the NLRI). Please always keep that in mind when suggesting changes (basically this is *not* flowspec 2.0)

[Some moralization about -bis documents.]

-bis documents are usually issued to clarify or fix broken behaviors.  While
this often means fixing easy errata, it also means the opportunity to
address things fundamentally broken in the original specificiation.

While I realize the authors' intent may be to do the least work necessary,
changes are already present that impact previously on-the-wire valid
encoding in a way that is not necessarily fully backward compatible.  And
*that's fine*, as long as it has working group consensus.

(Specific example being restriction of various filters to be of a specific
field length when that field length had a natural length, but wasn't
previously restricted against larger lengths in the original spec.)

So, while it may be the authors' intent to try to keep things small, it's
not out of order to try to make sure important things are addressed.  

> > Issue: Redirections may need additional scrutiny.  While the destination
> > based check on the received unicast routing is good, it's still possible for
> > the external AS to influence forwarding beyond "rate-limit/drop" if scrutiny
> > is not given to the other control communities.  This issue is compounded
> > with the redirect-ip feature, not in this document.
> > 
> > I'd suggest this be mentioned in an operational security considerations
> > section.
> 
> CL: As a matter of fact there is already a paragraph mentioning this particular problem. However I suggest to make it clearer with a operational security considerations section as you suggested:
> 
>    While the general verification of the traffic filter NLRI is
>    specified in this document (Section 6) the traffic filtering actions
>    received by a third party may need custom verification or filtering.
>    In particular all non traffic-rate actions may allow a third party to
>    modify packet forwarding properties and potentially gain access to
>    other routing-tables/VPNs or undesired queues.  This can be avoided
>    by proper filtering of action communities at network borders and by
>    mapping user-defined communities (see Section 7) to expose certain
>    forwarding properties to third parties.

This is reasonable text.

> 
> 
> > 
> > :   In many deployments of BGP Flow Specification, the Flow Specification
> > :   information has replace existing host length route advertisements.
> > 
> > s/replace/replaced/.  Also, it's unclear in the context of this paragraph
> > why this is interesting.  A bit of explanatory text explaining the use of
> > such most-specific routes for redirecting traffic would be useful.
> 
> CL: Suggest to remove the sentence from the draft

Works for me.

> > :   Flow Specification components must follow strict type ordering by
> > :   increasing numerical order.  A given component type may (exactly
> > :   once) or may not be present in the specification.  If present, it
> > :   MUST precede any component of higher numeric type value.
> > 
> > .... and what if they don't?  Is it malformed?  Should the implementation
> > attempt to re-order?
> 
> CL: [1] I sent an email roughly a month ago to start a discussion about the encoding of the NLRI - with no response:

You may have noticed I and others have erratic IETF reply time. :-)

> *) NLRI is widely ment opaque to BGP (no changes to RFC5575)
> 
> but: 
> 
> *) BGP needs to decode the NLRI to the first component (DST-IP) for the verification (Section 6) (not more of it! - the rest is part of the application that receives the NLRI)
> 
> Since changing the above properties would introduce a fundamental change to FS I strongly beleave that it does not fit into a 5575bis.
> 
> If the components are not in the right order the encoding _is_wrong_. However BGP cannot detect this, because its knowledge about the correct encoding ends after the very first component (DST-IP). So, to obey the rules (RFC5575) of an opaque NLRI FS-garbage gets propagated. Only when the decoding of the NLRI actually happens (not in BGP) such invalids can be detected and need to be ignored. 

The "opaque" requirement is the thing I fundamentally question and I believe
is deserving of more WG discussion.  As you note in your own paper on
flowspec given at RIPE 74:

"In that case such a BGP UPDATE may lead to BGP sessions flapping not only
on adjacent routers but also on remote routesrs not directly connected with
the router that originates that NLRI as long as it is part of that same
flow-specification domain".

We fall back to an NLRI form of what I have called "optional transitive
nonsense" when discussing such errors for Path Attributes.  

Don't understand it?  Propogate it.  Cause issues downstream.

As you note in that same paper, at least one vendor fails to propagate
flowspec routes when they don't install a filter.

My employer's implementation currently drops sessions when components that
aren't understood are received.  This is arguably wrong, but 5575 existed
before RFC 7606 error handling procedures became the norm.  These days,
"treat as withdraw" may be the best behavior if not propagated opaquely.

I'll note that your paper didn't test for this condition.  Finding out what
the other implementations do is reasonable in a -bis situation to determine
if this "propagate opaque" is the best idea.

> Additionally reordering is not possible because BGP has no understanding of the components (which is actually good because it allows future extensions) - (btw. reordering/re-encoding of the NLRI is potentially dangerous because it may produce “ghost”/“mutated" duplicates of an NLRI in the network).

This is already problematic.  Consider the case of L3VPNs where the label
portion of the NLRI wasn't cleanly specified.  A result of this was edge
cases of key comparison being done differently on some implementations.

Basically, while I agree with the core point, BGP extensions already force
us to consider such things.

> I suggest to add a paragraph to the draft to make it clear.
> 
> Please also refer to my Email one or two months ago - the suggested changes (100% compatible with current implementations) may also help to introduce future extensions.

URL, please?

> CL: See the above comment. As long as BGP treats the NLRI as an opaque value we do not have a problem with all this (not even with new FS components). Downstram propagation of the garbage can be partly avoided (see my Email on that topic) at some cost. But this issue exists with every BGP opaque NLRI - this is not special to FS.

NLRI is generally NOT opaque in BGP.  This is a fundamental design issue of
the protocol that prevents the deployment of completely generic propogation
infrastructure.

What did you have in mind here?

> > 4.2.3:
> > :     0   1   2   3   4   5   6   7
> > :   +---+---+---+---+---+---+---+---+
> > :   | e | a |  len  | 0 |lt |gt |eq |
> > :   +---+---+---+---+---+---+---+---+
> > :
[...]
> > 
> > :      len - length of the value field for this operand encodes 1 (00) -
> > :      4 (11) bytes.  Type 3 flow component values are always encoded as
> > :      single byte (len = 00).
> > 
> > "always encoded" begs the question about what to do when it isn't.  Being
> > encoded as 4 bytes of length but a length of 1 is still correct, e.g.  Does
> > Postel's maxim apply here?
> 
> CL: I suggest change all the occurrences of 0, and must be 0 to “MUST be encoded as 0 and ignored on decoding”. Also we suggest (SHOULD) to encode the length of the compoents. Furthermore we suggest a-and bit MUST be unset on first operator, and ignored on decode (mainly for compatibility reasons - I do not want to discuss if Postel was right or not :-) ):

This is a point where a blanket declaration risks making existing
implementations non-compliant.  Again, poentially in scope, but not quite
consistent with your stated aims. :-)

>       a - AND bit.  If unset, the previous term is logically ORed with
>       the current one.  If set, the operation is a logical AND.  In the
>       first operator byte of a sequence it MUST be encoded as unset and
>       and MUST be treated as always unset on decoding.  The AND operator
>       has higher priority than OR for the purposes of evaluating logical
>       expressions.

While I think this is reasonable, it does run up against some issues with
regards to encoding.  As you note, how we treat issues for reordering apply
here.

This is because implementations that follow this behavior are likely to mask
off the and behavior.  Implementations often regenerate the NLRI from
internal data structures.  As a result, doing the right thing runs into one
of your other concerns.

IMO, the right thing to do is keep the text above.

>       len - length of the value field for this operand encodes 1 (00) -
>       4 (11) bytes.  Type 3 flow component values SHOULD be encoded as
>       single byte (len = 00).

I'm good with this.  Same to other length comments later on.

>       0 - MUST be set to 0 on NLRI encoding, and MUST be ignored during
>       decoding

SHOULD is more appropriate here, lest you break future extensions.

> > 4.2.11
> > :      Defines a list of {operator, value} pairs used to match the 6-bit
> > :      DSCP field [RFC2474].  This list is encoded using the numeric
> > :      operator format defined in Section 4.2.3.  Values are encoded
> > :      using a single byte.  The two most significant bits are zero and
> > :      the six least significant bits contain the DSCP value.
> > 
> > It's not specified what to do when the top two bits are set.
> 
> CL: Same solution as above (4.2.3)

SHOULD be zero and ignored on receipt is fine.


> 
> > 
> > 4.2.12
> > :    +---+---+---+---+---+---+---+---+
> > :    |   Reserved    |LF |FF |IsF|DF |
> > :    +---+---+---+---+---+---+---+---+
> > :
> > :      Bitmask values:
> > :
> > :         Bit 7 - Don't fragment (DF)
> > :
> > :         Bit 6 - Is a fragment (IsF)
> > :
> > :         Bit 5 - First fragment (FF)
> > :
> > :         Bit 4 - Last fragment (LF)
> > 
> > 
> > While the general behavior of unexpected lengths is already unclear, it's
> > even less clear here.  
> > 
> > Reserved behavior is not defined.  "Must be zero on transmit, should be zero
> > on receive?"
> > 
> > It's also not specified whether this is "any of" or not.  This impacts
> > whether multiple bits being set is valid for some combinations.
> 
> CL: Regarding the reserved bits: Same solution as above (4.2.3).
> 
> Regarding the any-of or distinct/bitwise match. This is explained in section 4.2.9 where the bit-operator is introduced.
> 
>    m -   Match bit.  If set, this is a bitwise match operation defined
>       as "(data AND value) == value"; if unset, (data AND value)
>       evaluates to TRUE if any of the bits in the value mask are set in
>       the data

Agreed.

> > :   This Flow Specification NLRI defined above to convey information
> > :   about traffic filtering rules for traffic that should be discarded or
> > :   handled in manner specified by a set of pre-defined actions (which
> > :   are defined in BGP Extended Communities).  This mechanism is
> > :   primarily designed to allow an upstream autonomous system to perform
> > :   inbound filtering in their ingress routers of traffic that a given
> > :   downstream AS wishes to drop.
> > 
> > There's several bits of awkwardness in the above paragraph.  But among them
> > is the presumption of drop being the default choice in the last sentence.
> 
> CL: There is a default in the specification under section 7 (accept). This is again also a question of extensibility. If new action-communities are “invented” a older implementation may not be able to identify those extended communities and thus may think that *no* action is specified (so it is forwarding the traffic to a node that knows about that new action). 
> 
> However FS-filter matching normally stops at the first match (subsequent FS-filters after the match are not taken into consideration). 
> 
> I suggest to remove the invalid “default” choice from this paragraph and explain the behaviour as described above. Section 7 as you pointed out below however should be a bit more verbose on the matching process.

I'll look for the proposed text.

> > 5.1
[...]
> > Somewhat similarly, duplicate components aren't specifically called out, but
> > potentially implied (as noted in an IDR exchange) by the requirements of the
> > sort order.  Exactly what the behavior is in the face of duplicates should
> > be specified.
> 
> CL: NLRI re-en-coding is not acceptable since it introduces a hole lot of problems. Opaqueness of the NLRI to BGP is somehow vital to avoid such situations altogether. See also the comment above [1].

See again prior comments.  Many implementations build data structurres from
received NLRI.  If you understand it, you may regenerate it in canonical
order.

If you don't understand it, your implementation has no choice to treat it
opaquely.

The issue here is cases where you do understand it, but it's inconsistent.
So, I don't think your general opaque comment holds here.

Our implementation is likely at some future point to permit propagation of
opaque entries, but will likely default to "treat as withdraw".

> CL: Yes, adding the operational considerations section:
> 
>    Since verfication of the traffic filtering NLRI is tied to the
>    announcement of the best unicast route, a unfiltered address space
>    hijack (e.g. advertisement of a more specific route) may cause this
>    verification to fail and consequently prevent Flow Specification
>    filters from being accepted by a peer.

Thanks.

> > 
> > 7.
> > :   Multiple traffic actions may be present for a single NLRI.  The
> > :   traffic actions are processed in ascending order of the sub-type
> > :   found in the BGP Extended Communities.  If not all of them can be
> > :   processed the filter SHALL NOT be applied at all (for example: if for
> > :   a given flow there are the action communities rate-limit-bytes and
> > :   traffic-marking attached, and the plattform does not support one of
> > :   them also the other shall not be applied for that flow).
> > 
> > Typo on platform.
> > The extended communities defined are not only a given sub-type, but also a
> > type field.  The above suggestion is ambiguous about the case where the
> > subtypes are equal but the types are distinct.
> > 
> > I'm also wondering if the "applied at all" is the desired wording.  Does
> > that mean, as earlier presented in the draft, we get:
> > 
> > :   The default action for a traffic filtering Flow Specification is to
> > :   accept IP traffic that matches that particular rule.
> > 
> > Or does it mean that the rule is ignored?  The implications are different: A
> > pass permits traffic to pass whereas ignore is effectively a "delete rule",
> > which may cause traffic that would otherwise pass the filter to be impacted
> > differently by subsequent rules.
> 
> CL: You are right this is not very clear. I think that a rule where not all of the actions can be performed should generate a match (meaning that subsequent filters are not evaluated) and the packet is treated as default-accept.

I think that default accept is probably the SAFEST behavior, but is perhaps
not what is desired.  I suggest flagging this issue for explicit WG
discussion.

> > 7.1
> > :   The remaining 4 octets carry the maximum rate information in IEEE
> > :   floating point [IEEE.754.1985] format, units being bytes per second.
> > :   A traffic-rate of 0 should result on all traffic for the particular
> > :   flow to be discarded.
> > 
> > It is worth noting that while we could simply end up with preposterously
> > large numbers, there is also the edge case that these floating point values
> > may be NEGATIVE.  The draft may wish to note that negative numbers should be
> > one of: ignored (see prior comment about how this impacts rule order), or
> > treated as zero (discard)
> > 
> 
> CL: I suggest to add a lower bound there:
> 
>    flow to be discarded.  On encoding the traffic-rate MUST NOT be
>    negative.  On decoding negative values MUST be treated as zero
>    (discard all traffic).

This seems reasonable.

> > 7.4
> > There's a level of ambiguity in what happens when there's more than one
> > redirect extended community:
> > 
> > :   Interferes with: All other redirect functions.  All redirect                                     
> > :   functions are mutually exclusive.  If this redirect function exists,
> > :   then no other redirect functions can be processed.
> > 
> > We're allowed to install to any VRF:
> > :   If several local instances match this criteria,
> > :   the choice between them is a local matter
> > 
> > but this is partially also interacting with the text from earlier:
> > :   Multiple traffic actions may be present for a single NLRI.  The
> > :   traffic actions are processed in ascending order of the sub-type
> > :   found in the BGP Extended Communities.  If not all of them can be
> > :   processed the filter SHALL NOT be applied at all
> > 
> > So, consider the case where we have two redirect extended communities.  If
> > one of them does not have any matching VRFs, should the "SHALL NOT be
> > applied at all" kick in?  If both can be used, do we pick the first as
> > implied by ascending order?  Or do we take a wider reading of the 7.4 text
> > and let the implementation just pick one?
> 
> CL: As of section 7.6. it is not allowed to have multiple redirect extended communities (-> interfering actions -> treat as withdraw):

Again, that's a change of behavior and one that interferes with scenarios
where flowspec routes may be propagated among a set of routers with disjoint
VRFs.

>    1.  Redirect-action-communities (0x8008, 0x8108, 0x8208):
> 
>        The three redirect-communities are mutually exclusive.  Only a
>        single redirect community may be associated with a Flow
>        Specification otherwise they are interfering.
> 
> 
> > This indirectly raises a different point: For each of the action communities
> > that aren't expected to allow multiple instances, how do you deal with the
> > repeats?  E.g. two traffic-rate extended communities?
> 
> CL: As of section 7.6 this is not allowed:
> 
>    2.  All traffic-action communities (including redirect-actions):
> 
>        Multiple occurences of the same (sub-type and type) traffic-
>        action associated with a Flow Specification are always
>        interfering.
> See also the above
> 
> 
> > 7.6 covers the fact that only a single redirect community can be associated
> > with a flowspec.  However, given that some routers may have target:A working
> > but not target:B and vice versa elsewhere, we shouldn't necessarily restrict
> > this.  
> > 
> 
> CL: yes this is a known drawback if we limit the redirect actions. We may also go with sorting them and picking the lowest/highest possible.

This is a change of behavior and one I disagree with. :-)  Also, one that
I'll fight all the way through WGLC.  This is deployed behavior among my
customer-base.

That said, my request is that the spec discuss inconsistencies rather than
trying to preclude it.

It's also worth noting that "magic community" issues are endemic to BESS
documents.  I've been wanting to write a draft there discussing it, but it's
been on my low priority queue.

-- Jeff