Re: [bess] Shepherd's review of draft-ietf-bess-nsh-bgp-control-plane-06

"Adrian Farrel" <adrian@olddog.co.uk> Wed, 06 March 2019 15:05 UTC

Reply-To: adrian@olddog.co.uk
From: Adrian Farrel <adrian@olddog.co.uk>
To: stephane.litkowski@orange.com, draft-ietf-bess-nsh-bgp-control-plane@ietf.org
Cc: bess@ietf.org
References: <6687_1551262912_5C7664C0_6687_242_18_9E32478DFA9976438E7A22F69B08FF924C199D40@OPEXCAUBMA3.corporate.adroot.infra.ftgroup> <090901d4d063$75aa6cf0$60ff46d0$@olddog.co.uk> <30790_1551796864_5C7E8A80_30790_14_1_9E32478DFA9976438E7A22F69B08FF924C19B882@OPEXCAUBMA3.corporate.adroot.infra.ftgroup>
In-Reply-To: <30790_1551796864_5C7E8A80_30790_14_1_9E32478DFA9976438E7A22F69B08FF924C19B882@OPEXCAUBMA3.corporate.adroot.infra.ftgroup>
Date: Wed, 06 Mar 2019 15:05:37 -0000
Organization: Old Dog Consulting
Message-ID: <036f01d4d42e$0ec01340$2c4039c0$@olddog.co.uk>
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/3a450rKnIpSKCD8K8BbcmsMO_-A>

Thanks again Stephane,

I think we have closure on most (but not all) of your points. I'll post another revision now because it makes the incremental changes easier to process. But we can have another go round if any of the unresolved issues merit it.

One thing to push back on from before was the use of "portal" or "gateway". We were using "portal" and you asked us to change to "gateway". I initially thought that would be OK, but on reflection we think that "gateway" has specific connotations in networking where it means an interworking function between two different protocols and this is most definitely not what is intended. So, because "portal" is a synonym, we prefer to go back to using that.

   Thus the SFF can be seen as a portal in the underlay network through
   which a particular SFI is reached.

Cheers,
Adrian

> New comment:
>
> " When the SFF receives the packet and the NSH back from the SFI it
>   MUST select the next SFI"
>
> [SLI] Even if I agree that this is the intended behavior, it is not the
> purpose of this document to set the dataplane behavior of NSH. I
> think keeping the "MUST" as lower case is fine.

Ah yes.
I got carried away.

>>> The Figure 1 is not really used in this section as part of the existing
>>> text. It would be better to have a companion text that explains the
>>> figure.
>>
>> Wow! Yes. That's embarrassing.
>
> [SLI] The provided text is good. Just a few comments:
>
> - the figure wraps on two pages, I missed the SFa in the figure as it is
>   located on the other page. It would be great if you could make it fit
>   on one page.

Hmmm, yes.
I will make the figure small enough to fit on one page, but I won't handle pagination at this stage: the RFC Editor will resolve that in the final formatting.

> - don't you need tunnels between SFF2 and SFF3 and between SFF1
>   and SFF4 (full mesh)? I agree that tunnels may be established on demand
>   if an SFC is using two SFFs, but here we don't have this information.

You *could* certainly have those tunnels. But in practice there is unlikely to be a full mesh. This is a bit like a content distribution problem: a multi-layer network engineering solution is needed to place the SFs and decide which SFFs need to be connected with tunnels. Furthermore, if an SFC will never need to take SFs in a particular order, then tunnels wouldn't be needed.

> As the figure is already complex, adding tunnels may overload it. Maybe
> we could add a text telling that to simplify the figure only some tunnels
> between SFFs are represented.

Right, no need to complicate the figure.
I'll add...
      <t>Note that, for convenience and clarity, <xref target="SFCarch" /> shows only a few tunnels between
         SFFs.  There could be a full mesh of such tunnels, or more likely, a selection of tunnels connecting
         key SFFs to enable the construction of SFPs and to balance load and traffic in the network.</t>

> - the example of SFC looks strange to me as SFd may be used twice
>   in the chain why not using " SFa, an SF of type SFTx, and SFe" ?

Bad text, well caught.
Text should read...

      Suppose an SFC needs to include SFa,
      an SF of type SFTx, and SFc.

> - There is a sentence telling that the figure illustrates loadbalancing,
>   however I think that the sentence " A number of SFPs can be
>   constructed using any instance of SFb or using SFd." is not enough
>   to describe the loadbalancing. Who is doing the loadbalancing ?

OK. Now reads...

      <t>This figure demonstrates how load balancing can be achieved by creating several SFPs that satisfy
         the same SFC.  Suppose an SFC needs to include SFa, an SF of type SFTx, and SFc.  A number of SFPs
         can be constructed using any instance of SFb or using SFd.  Load balancing may be applied at two
         places:
         <list style="symbols">
           <t>The Classifier may distribute different flows onto different SFPs to share the load in the
              network and across SFIs.</t>
           <t>SFF-2 may distribute different flows (on the same SFP) to different instances of SFb to share
              the processing load.</t>
         </list></t>
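
As a rough illustration of the two decision points in that text, here is a minimal sketch. It is not taken from the draft: the SFP and SFb instance names, the 5-tuple flow key, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib

# Hypothetical SFPs that all satisfy the SFC "SFa, an SF of type SFTx, SFc".
SFPS = ["SFP-1 (via SFb)", "SFP-2 (via SFd)"]

# Hypothetical instances of SFb attached to SFF-2.
SFB_INSTANCES = ["SFb-1", "SFb-2", "SFb-3"]


def flow_key(src, dst, sport, dport, proto):
    """Build a stable byte string identifying a flow (assumed 5-tuple)."""
    return f"{src}|{dst}|{sport}|{dport}|{proto}".encode()


def pick(candidates, key):
    """Deterministically map a flow key onto one of the candidates."""
    return candidates[zlib.crc32(key) % len(candidates)]


def classify(flow):
    """Decision point 1: the Classifier spreads flows across SFPs."""
    return pick(SFPS, flow_key(*flow))


def sff2_select_instance(flow):
    """Decision point 2: SFF-2 spreads flows on one SFP across SFb instances."""
    return pick(SFB_INSTANCES, flow_key(*flow))
```

Both functions are deterministic, so every packet of a given flow lands on the same SFP and the same SFb instance. Different flows may land on different ones.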

>>> “The Service Function Type identifies a service function”. I don’t
>>> think we can really say that, it identifies the type of service the SF
>>> is providing but not the SF itself.
>>
>> Yes
>
> [SLI] The new text sounds strange.  Even if it is correct, it sounds as a repetition:
> " The Service Function Type identifies a service function type".
> Could we use something like : "The Service Function Type identifies the
> functions/features a service function can offer".

OK

>>> How is the nexthop encoded in the NLRI ?
>>
>> A bit confused about this question.
>
> [SLI] I'm talking about the nexthop field of the MP_REACH_NLRI attribute, 
> you must set a nexthop field even if it is not used for forwarding and you
> need to set how it is encoded.

Ah, that!
Yes, it's just a loopback address of the advertising SFF.
Added a paragraph for that.
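
For readers less familiar with the field in question, the sketch below lays out the value of an MP_REACH_NLRI attribute in the RFC 4760 order: AFI, SAFI, next hop length, next hop, one reserved octet, then the NLRI. It covers the attribute value only, not the attribute flags/type/length header. The function name is hypothetical, and the AFI/SAFI values in the usage note are placeholders, not code points taken from the draft.

```python
import ipaddress
import struct


def mp_reach_nlri_value(afi, safi, next_hop, nlri):
    """Return the MP_REACH_NLRI attribute value as bytes (RFC 4760 layout).

    afi and safi are supplied by the caller; next_hop is an IPv4 or IPv6
    address string (here, assumed to be a loopback of the advertising SFF);
    nlri is the already-encoded NLRI.
    """
    nh = ipaddress.ip_address(next_hop).packed  # 4 octets (IPv4) or 16 (IPv6)
    header = struct.pack("!HBB", afi, safi, len(nh))
    reserved = b"\x00"
    return header + nh + reserved + nlri
```

For example, `mp_reach_nlri_value(0xFFFF, 0xFF, "192.0.2.1", b"")` (placeholder AFI and SAFI, empty NLRI) yields nine octets, four of which are the loopback address.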

>>> I don’t see the “error handling” behavior associated with this attribute
>>> (discard, treat-as-withdraw…)
>>
>> I think the errors are covered by section 6 of RFC 4271, but we need to
>> point to it.
>
> [SLI] You have added " Malformed SFP attributes, or those that in error in
> some way, MUST be handled as described in Section 6 of [RFC4271]"
> This is not enough as RFC7606 allows for more "graceful" processing of
> errors and it's up to each new attribute to have its own behavior in terms
> of error processing. RFC7606 has some guidelines.

This one will take a little more time to work up some text.
We'll get back to you.

>>> Section 4.1
>
> [SLI] I have read it again, I think the last sentence was causing me
> some trouble:
> "An SFF
>   that has a presence in multiple service function overlay networks
>   (i.e., imports more than one RT) may find it helpful to maintain
>   separate forwarding state for each overlay network.".
>
> The isolation of the controlplane and forwarding information between
> tenants is a mandatory thing for security reason. The "may find it
> helpful" makes this something nice to have making the multitenancy
> case not widely deployed.

OK, I see the problem.
Of course, each SFP is separate routing state (like a signalled TE-LSP), so it is not technically necessary to keep separate routing state for each overlay network. The tenants would have no visibility into the state at SFFs.
And from a black box point of view, no one can tell the difference, so we don't need to instruct implementations.
But we can agree that an implementation would be highly likely to maintain separate forwarding state for each overlay network.
So I will strengthen this to s/may find it helpful/will usually/
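
One way to picture "separate forwarding state for each overlay network" is a table per imported RT, each keyed by the NSH Service Path Identifier and Service Index. This is only a sketch: the class, its method names, the RT strings, and the next-hop labels are all hypothetical, and nothing here is required by the draft.

```python
class SffForwardingState:
    """Per-overlay forwarding tables for one SFF, one table per imported RT."""

    def __init__(self, imported_rts):
        # One independent table per overlay network (identified by its RT).
        self._tables = {rt: {} for rt in imported_rts}

    def install(self, rt, spi, si, next_hop):
        """Install a forwarding entry in the table of a single overlay."""
        if rt not in self._tables:
            raise KeyError(f"RT {rt} is not imported by this SFF")
        self._tables[rt][(spi, si)] = next_hop

    def lookup(self, rt, spi, si):
        """Look up a next hop; entries in other overlays are never consulted."""
        return self._tables[rt].get((spi, si))
```

With this layout the same (SPI, SI) pair can exist in two overlays without either entry being visible from the other.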

> NEW COMMENT:
> Section 5:
> "Note that each FlowSpec update MUST be tagged with the route target
>  of the overlay or VPN network for which it is intended."
> [SLI] You should be more clear that VPN-IPv4 and VPN-IPv6 Flowspec
> families must be used, it's not just a matter of RTs.

A couple of the authors have discussed this a bit and we are puzzled.

RFC 5575 section 8 discusses the applicability of Flowspecs to VPNs.
https://www.iana.org/assignments/flow-spec/flow-spec.xhtml#flow-spec-2 does not list any VPN Flowspecs.
draft-ietf-pce-pcep-flowspec makes observations about VPN identification and applicability to Flowspecs.
draft-ietf-idr-flowspec-l2vpn has a redefinition of SAFI 134 to apply Flowspecs to an L2VPN environment.

I suspect that you are referring to the last of these four references.

Maybe you could suggest some text that would cover your concern.

>>> Section 7.1
>>>
>>> While I understand that the node doing the classification can perform a deep
>>> packet inspection to get an entropy indicator, any intermediate node cannot
>>> set it again as the NSH header will be there.
>> 
>> *If* you want entropy for the underlay network, you have to get
>> it from somewhere. And an entropy label is going to be a lot more
>> practical than hoping that each hop in the underlay can do some
>> form of hash.
>>
>> I don't propose any changes to the document for this point, but 
>> do feel free to negotiate.
>
> [SLI] I don't think that the current text actually solves the loadbalancing
> issue in the underlay. However I'm also wondering if it's the job of this
> draft (which defines a controlplane) to define where to set the entropy
> indicator and what are the requirements in terms of hashing. It's more a
> dataplane issue, out of scope of this draft. I don't know if there are already
> other documents in SFC or other WGs dealing with entropy issues when
> NSH is there.
> So I see two options:
> - you address fully the problem in your draft
> - or you make it out of scope, so some text should be 
>   removed. You can still say that there is a problem to solve.

I like your second option 😊

And the problem is made more complex when you consider that there may be different underlays between successive SFFs meaning that entropy has to be mapped between tunnel types.

Actually, this is a good change. This is a control plane document, not a full system specification. We have deleted the whole entropy section, and strengthened the text in 2.2 to note that "something must be done".

> I don’t like the representation of RD using “192.0.2.1,1” as the “,” can
> be confusing with a regular separator. Why not using :”192.0.2.1:1”
> notation which is well known ?
>
> [SLI] I don't see the change in v08

Snafu
Made the change in the XML and didn't regenerate the TXT.
Yes, "idiot" is an appropriate word.
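
For reference, the "192.0.2.1:1" notation corresponds to a Type 1 Route Distinguisher as laid out in RFC 4364: a 2-octet type field, a 4-octet IPv4 address as administrator, and a 2-octet assigned number. A minimal sketch of converting between the text form and the 8-octet wire form follows. The function names are hypothetical.

```python
import ipaddress
import struct


def encode_rd_type1(text):
    """Encode 'a.b.c.d:n' as an 8-octet Type 1 RD."""
    admin, _, assigned = text.rpartition(":")
    admin_bytes = ipaddress.IPv4Address(admin).packed
    return struct.pack("!H4sH", 1, admin_bytes, int(assigned))


def format_rd_type1(rd):
    """Format an 8-octet Type 1 RD as 'a.b.c.d:n'."""
    rd_type, admin_bytes, assigned = struct.unpack("!H4sH", rd)
    if rd_type != 1:
        raise ValueError(f"not a Type 1 RD (type field is {rd_type})")
    return f"{ipaddress.IPv4Address(admin_bytes)}:{assigned}"
```

The colon form round-trips cleanly, whereas a comma inside a list of RDs would be ambiguous with the list separator.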

>>> Section 8.9.1:
>>>
>>> How does an SFF know that an attached SFI is stateful ? I don’t
>>> think it can know that.
>>
>> Well, how does the SFF know which SFIs are attached and what
>> their types are?
>> The registration of SFIs to their SFFs is out of scope of this document 
>> (I think it was raised as a separate function in draft-ietf-sfc-control-plane).
>
> [SLI] Fine, could you tell that it is out of scope ?

Certainly.

>>> I don’t think that the fact that SFF2 is used in both direction is safe
>>> from a load balancing perspective.
>>> If the hashing algorithm used by SFF2 is sensitive to the order of the
>>> keys (like source vs dest address, or source vs dest port), it may 
>>> provide a different SFI as a result of the hashing between the
>>> forward and the reverse flow.
>>
>> It's unclear what hashing might be used, but it seems to me that the
>> primary choice will be based on the SPI.
>> Of course, if the hash goes further (i.e., payload) and the SFF is aware
>> of forward/reverse traffic it is capable of hashing the right fields.
>>
>> But does this smell of an implementation detail?
>
> [SLI] Of course that will be implementation dependent, but the fact that
>  it is implementation dependent makes the behavior unpredictable and
> does not ensure that you will get symmetry.

OK. I went to add some text, looked for the right place and found section 7.3 where we have...

   For bidirectional SFPs where the same instance of a stateful SF must
   be traversed in both directions, it is not enough to leave the choice
   of service function instance as a local choice even if the load
   balancing is stable because coordination would be required between
   the decision points in the forward and reverse directions and this
   may be hard to achieve in all cases except where it is the same SFF
   that makes the choice in both directions.

   Note that this approach necessarily increases the amount of SFP state
   in the network (i.e., there are more SFPs).  It is possible to
   mitigate this effect by careful construction of SFPs built from a
   concatenation of other SFPs.

I think that covers it.
I'll put in a back pointer from 8.9.1 to 7.3.
Will also add a note to 8.9.1 to the effect that the problem can be resolved by a combination of detailed (choice-free) SFPs, and suitable programming of Classifiers.
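
To make the symmetry concern concrete, here is a small sketch of the difference between an order-sensitive flow key and an order-insensitive one at a single SFF. It is illustrative only: the draft does not specify any hashing, and the key formats, the CRC32 hash, and the function names are assumptions.

```python
import zlib


def directional_key(src, dst, sport, dport):
    """Order-sensitive key: forward and reverse packets give different keys."""
    return f"{src}:{sport}>{dst}:{dport}".encode()


def symmetric_key(src, dst, sport, dport):
    """Order-insensitive key: the two endpoints are sorted before hashing."""
    a, b = sorted([(src, sport), (dst, dport)])
    return f"{a[0]}:{a[1]}<>{b[0]}:{b[1]}".encode()


def select_sfi(instances, key):
    """Pick an SFI deterministically from the hash of the key."""
    return instances[zlib.crc32(key) % len(instances)]
```

With `symmetric_key`, the forward flow and its reverse always select the same SFI at that SFF. With `directional_key` they may or may not, which is the unpredictability raised in the review. Neither helps when the choice is made at different SFFs in the two directions, which is the case the text from section 7.3 addresses.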

>>> Section 9:
>>>
>>> Do we have to set limits on receiving nodes in term of number of
>>> states received from the controller to mitigate some attack ?
>>
>> I'm not sure, but probably not. But anyway, that would be out of scope.
>
> [SLI] I'm challenging, as the SecDir may challenge you on that point 
> or a similar one.

So the attack would rely on the controller being subverted, or the communications between controller and SFF being spoofed.

In the former case we have a complete disaster (similar to a route reflector being subverted), and the things that go wrong will be far worse than an overload attack on an SFF. And, not only is this problem out of scope, but it is not defendable in open systems. I think we rely on regulated software upgrades.

The communications between controller and SFF do not form part of this specification. So, I will add a paragraph to highlight the risk.

>>> The text talks about security of BGP, what kind of mechanism
>>> should be put in place ?
>>
>> This has similar security behaviour to a L3VPN, so I guess the same rules apply.
>
> [SLI] It would be good to say this in the sec considerations. As you use
> a similar distribution mechanism to RFC4364, the same sec considerations apply.

OK

>>> Do we have any interdomain considerations ?
>>
>> 8300 says that the intended scope is for use within a single provider's operational
>> domain.
>
> [SLI] That would be good to remind it as well.

OK. Added.

>>> References:
>>>
>>> I think that the mpls-sfc and mpls-sfc-encaps should also be
>>> normative as you are defining a controlplane to use them.
>>
>> I don't mind doing that.
>
> [SLI] These two are more debatable. Let's keep them as info, and 
> we will see if IESG raises any concern.

Well, actually, since those two docs are on the Standards Track and are well ahead in the pipe, let's make them Normative.
