Re: [bess] Shepherd's review of draft-ietf-bess-nsh-bgp-control-plane-06

<stephane.litkowski@orange.com> Tue, 05 March 2019 14:41 UTC

Return-Path: <stephane.litkowski@orange.com>
X-Original-To: bess@ietfa.amsl.com
Delivered-To: bess@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id D7AF7126C7E; Tue, 5 Mar 2019 06:41:09 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.599
X-Spam-Level:
X-Spam-Status: No, score=-2.599 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7, SPF_PASS=-0.001, UNPARSEABLE_RELAY=0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id cUKOVjjifIRj; Tue, 5 Mar 2019 06:41:06 -0800 (PST)
Received: from orange.com (mta240.mail.business.static.orange.com [80.12.66.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 65700124B0C; Tue, 5 Mar 2019 06:41:06 -0800 (PST)
Received: from opfedar03.francetelecom.fr (unknown [xx.xx.xx.5]) by opfedar20.francetelecom.fr (ESMTP service) with ESMTP id 44DKMN29lhz8tRK; Tue, 5 Mar 2019 15:41:04 +0100 (CET)
Received: from Exchangemail-eme6.itn.ftgroup (unknown [xx.xx.13.20]) by opfedar03.francetelecom.fr (ESMTP service) with ESMTP id 44DKMN1MgXzCqkr; Tue, 5 Mar 2019 15:41:04 +0100 (CET)
Received: from OPEXCAUBMA3.corporate.adroot.infra.ftgroup ([fe80::90fe:7dc1:fb15:a02b]) by OPEXCAUBMA1.corporate.adroot.infra.ftgroup ([fe80::f04d:ad3c:61de:a175%21]) with mapi id 14.03.0435.000; Tue, 5 Mar 2019 15:41:04 +0100
From: stephane.litkowski@orange.com
To: "adrian@olddog.co.uk" <adrian@olddog.co.uk>, "draft-ietf-bess-nsh-bgp-control-plane@ietf.org" <draft-ietf-bess-nsh-bgp-control-plane@ietf.org>
CC: "bess@ietf.org" <bess@ietf.org>
Thread-Topic: Shepherd's review of draft-ietf-bess-nsh-bgp-control-plane-06
Thread-Index: AdTN15CohDC0LG//QgG1ys5nLdIbiQCg4JYAAL9ELHA=
Date: Tue, 05 Mar 2019 14:41:03 +0000
Message-ID: <30790_1551796864_5C7E8A80_30790_14_1_9E32478DFA9976438E7A22F69B08FF924C19B882@OPEXCAUBMA3.corporate.adroot.infra.ftgroup>
References: <6687_1551262912_5C7664C0_6687_242_18_9E32478DFA9976438E7A22F69B08FF924C199D40@OPEXCAUBMA3.corporate.adroot.infra.ftgroup> <090901d4d063$75aa6cf0$60ff46d0$@olddog.co.uk>
In-Reply-To: <090901d4d063$75aa6cf0$60ff46d0$@olddog.co.uk>
Accept-Language: fr-FR, en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.114.13.245]
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/Plh9PF7YrFdXZcYmvuj8hDRXfK0>
Subject: Re: [bess] Shepherd's review of draft-ietf-bess-nsh-bgp-control-plane-06
X-BeenThere: bess@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: BGP-Enabled ServiceS working group discussion list <bess.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/bess>, <mailto:bess-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/bess/>
List-Post: <mailto:bess@ietf.org>
List-Help: <mailto:bess-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/bess>, <mailto:bess-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 05 Mar 2019 14:41:10 -0000

Hi Adrian,

Thanks for the updates. More comments inline.
I have trimmed the answers we have an agreement on and added some comments.

Brgds,


-----Original Message-----
From: Adrian Farrel [mailto:adrian@olddog.co.uk] 
Sent: Friday, March 01, 2019 20:18
To: LITKOWSKI Stephane OBS/OINIS; draft-ietf-bess-nsh-bgp-control-plane@ietf.org
Cc: bess@ietf.org
Subject: RE: Shepherd's review of draft-ietf-bess-nsh-bgp-control-plane-06
> Introduction:
> - When “classifier” is used as an example of SF, do you refer to the classifier used to steer
>    the traffic onto an SFC ? If yes, I don’t think it could be considered as an SF.

Ah, no. Hmm. Tricky.
A SF could be responsible for classifying traffic into different buckets for different uses. This is a very similar function to an 'SFC Classifier' (responsible for classifying traffic onto different SFCs).
I can't think of a way to remove this confusion except by deleting this example. Instead, I have boosted the list of examples by borrowing from RFC 7498.

[SLI] Ok, there was a confusion, I thought you were talking about the SFC Classifier rather than a more generic classifier.
The way the text is in the new version is fine for me.


New comment:

" When the SFF receives the packet and the NSH back from the SFI it
   MUST select the next SFI"

[SLI] Even if I agree that this is the intended behavior, it is not the purpose of this document to set the dataplane behavior of NSH. I think keeping the "MUST" as lower case is fine.




> “Service Function Type (SFT) that
   is the category of SF that is supported by an SFF”. 
>
> Don’t you mean SFI rather than SFF ?

Nope.
SFFs support attached SFIs.
An SFI is an instance of an SF.
An SF has a type that is its SFT.

[SLI] I agree with your statement.
That may be an "English issue" on my side. I attached "that is supported by an SFF" to the "SFT" and not to the "SF" when reading.


> The Figure 1 is not really used in this section as part of the existing text. I 
> would be better to have a companion text that explains the figure.

Wow! Yes. That's embarrassing.

[SLI] The provided text is good. Just few comments:
- the figure wraps on two pages, I missed the SFa in the figure as it is located on the other page. It would be great if you could make it fit on one page.
- don't you need tunnels between SFF2 and SFF3 and between SFF1 and SFF4 (full mesh). I agree that tunnel may be established on demand if an SFC is using two SFFs, but here we don't have this information.
As the figure is already complex, adding tunnels may overload it. Maybe we could add a text telling that to simplify the figure only some tunnels between SFFs are represented.
- the example of SFC looks strange to me as SFd may be used twice in the chain why not using " SFa, an SF of type SFTx, and SFe" ?
- There is a sentence telling that the figure illustrates loadbalancing, however I think that the sentence " A number of SFPs can be constructed using any instance of SFb or using SFd." is not enough to describe the loadbalancing. Who is doing the loadbalancing ?


> I don’t like (personal opinion), the “grouping” of SFIs in the Figure 1 as part of an SFT. 
> What strikes me is that it could be confused with the usual representation of an SF 
> composed of multiple SFIs. Again that’s just a personal feeling.

Interesting. That confusion does sort of exist.

I don't think an SF is composed of multiple SFIs. 
An SF has a type: its SFT
The may be multiple instances of an SF, the SFIs, all of the same type.
And, of course, it is the SFIs that are attached to the SFF.
There may be multiple SFs of the same type. These are not necessarily to be indicated as SFIs. For example, two SFs of the same type may be accessed through different SFFs.

I think it might help if the text discussed this, but I don't think that it is right to make a hierarchy in the figure.

However, since we're here, the figure could be made a bit better along with the text to describe it.

[SLI] I agree, the figure looks good now.



> “The Service Function Type identifies a service function”. I don’t
> think we can really say that, it identifies the type of service the SF
> is providing but not the SF itself.

Yes

[SLI] The new text sounds strange.  Even if it is correct, it sounds as a repetition:
" The Service Function Type identifies a service function type".
Could we use something like : "The Service Function Type identifies the functions/features of service function can offer".


> “Each node hosting an SFI
>   must originate an SFIR for each type of SF that it hosts, and it may
>   advertise an SFIR for each instance of each type of SF.” 
>
> Is it really “and” ? I mean can we just summarize by using “Each 
> node hosting an SFI MUST originate an SFIR for each instance of
> each type of SF” ?

No. This makes a scaling optimization. If you don't need the creator of the SFP to know that there are 57 instances of an SF, you just advertise once for the type of SF and allow the local SFF select which SFI to use. On the other hand, if you want to allow the load balancing to be done by a controller, you can advertise.

Adding a brief explanation.

[SLI] The provided explanation is fine and improves the understanding.



> How is the nexthop encoded in the NLRI ?

A bit confused about this question. We're in the section that talks about the SFIR: the concept of nexthop applies to the SFP and so is found in the SFPR.

Perhaps the presence of the Tunnel Encapsulation attribute is the point of confusion. That attribute is there to say what protocol(s) the SFF is expecting to see used for packets arriving for the SFIs.

Or maybe you were expecting the NLRI to encode a prefix like it used to in the good old days?

[SLI] I'm talking about the nexthop field of the MP_REACH_NLRI attribute, you must set a nexthop field even if it is not used for forwarding and you need to set how it is encoded.


> I don’t see the “error handling” behavior associated with this attribute
> (discard, treat-as-withdraw…)

The errors would be...
- malformed attribute (need to cover this)
- wrong number of Association TLVs (but they are optional and multiple are allowed, so no error)
- Association TLV in error (there is a paragraph for this in 3.2.1.1)
- missing Hop TLV (need to cover this)
- Hop TLV in error (need to cover this)

I think the errors are covered by section 6 of RFC 4271, but we need to point to it.

[SLI] You have added " Malformed SFP attributes, or those that in error in some way, MUST be	
 	   handled as described in Section 6 of [RFC4271]"
This is not enough ad RFC7606 allows for a more "graceful" process of errors and it's up to each new attribute to have its own behavior in term of error processing. RFC7606 has some guidelines.


> Section 4.1
>
> Having an SFF being part of a single overlay network is IMO a very limited
> use case, in such a case, I don’t think that you need an RT at all. The SFF 
> could be part of a L3VPN which will be used as an underlay.
> An SFF being “multitenant” is more valid and defacto requires to maintain
> separate forwarding states (VRF like…). This is mandatory to maintain the
> tenant isolation and RTs are very useful here to know the appropriate
> context to put the routes in.

I wonder whether there is a misunderstanding.

We have...

   Every node in a service function overlay network is configured with
   one or more import RTs.

That means that an SFF, for example, can be in multiple overlays, and each SFP is placed within an individual overlay.

I re-read 4.1 and don't see it saying that an SFF is part of a single overlay. Can you point me at what I should change?

[SLI] I have read it again, I think the last sentence was causing me some trouble:
"An SFF
   that has a presence in multiple service function overlay networks
   (i.e., imports more than one RT) may find it helpful to maintain
   separate forwarding state for each overlay network.".
The isolation of the controlplane and forwarding information between tenants is a mandatory thing for security reason. The "may find it helpful" makes this something nice to have making the multitenancy case not widely deployed.


NEW COMMENT:
Section 5:
"Note that each FlowSpec update MUST be tagged with the route target	
 	   of the overlay or VPN network for which it is intended."
[SLI] You should be more clear that VPN-IPv4 and VPN-IPv6 Flowspec families must be used, it's not just a matter of RTs.


> Section 7.1
>
> While I understand that the node doing the classification can perform a deep
> packet inspection to get an entropy indicator, any intermediate node cannot
> set it again as the NSH header will be there.

Right. Exactly. Hence...
   However, when an NSH is
   included in a packet, those key fields may be inaccessible.  For
   example, the fields may be too far inside the packet for a forwarding
   engine to quickly find them and extract their values, or the node
   performing the examination may be unaware of the format and meaning
   of the NSH and so unable to parse far enough into the packet.

> Here is an example:
>
> PacketsIn -→ Classifier ->SFF1 (SFI1) -> SFF2(SFI2) ->SFF3(SFI3)->PacketsOut to dest
>
> Classifier pushes NSH header onto the packets as well as the underlay tunnel
> (using an entropy indicator in UDP or ELI/EL). Where the packets comes in at
> SFF1, IMO, it strips the underlay tunnels including the entropy indicator.
> Which means that when the packet with NSH header will go to SFF2, it will
> not have anymore entropy indicator and there is no way to build it again as
> the NSH header prevents a new deep packet inspection. Am I missing
> something ? Maintaining a per packet state is IMO not doable , especially if the
> SFI modifies the packet content.

OK. 
So this is the thing. *If* you want entropy for the underlay network, you have to get it from somewhere. And an entropy label is going to be a lot more practical that hoping that each hope in the underlay can do some form of hash.
There are some other options:
- use more fine-grained SFPs so that the SPI is effectively a key to 
  the entropy
- put the entropy as metadata in the NSH requiring that the SFF
  be NSH-aware
- encode the necessary entropy information in the tunnel attribute
- don't use entropy

I don't propose any changes to the document for this point, but do feel free to negotiate.

[SLI] I don't think that the current text actually solves the loadbalancing issue in the underlay. However I'm also wondering if it's the job of this draft (which defines a controlplane) to define where to set the entropy indicator and what are the requirements in term of hashing. It's more a dataplane issue, out of scope of this draft. I don't know if there is already other document in SFC or other WGs dealing with entropy issues when NSH is there.
So I see two options:
- you address fully the problem in your draft
- or you make it out of scope, so some text should be removed. You can still say that there is a problem to solve.


> I don’t like the representation of RD using “192.0.2.1,1” as the “,” can
> be confusing with a regular separator. Why not using :”192.0.2.1:1”
> notation which is well known ?

Obviously not so well known that I knew it well 😊
[SLI] All implementations are using this notation.

But, it's a good change.
[SLI] I don't see the change in v08

> Section 8.9.1:
>
> How does an SFF know that an attached SFI is stateful ? I don’t
> think it can know that.

Well, how does the SFF know which SFIs are attached and what their types are?
The registration of SFIs to their SFFs is out of scope of this document (I think it was raised as a separate function in draft-ietf-sfc-control-plane).


[SLI] Fine, could you tell that it is out of scope ?


> I don’t think that the fact that SFF2 is used in both direction is safe
> from a load balancing perspective.
> If the hashing algorithm used by SFF2 is sensible to the order of the
> keys (like source vs dest address, or source vs dest port), it may 
> provide a different SFI as a result of the hashing between the
> forward and the reverse flow.

It's unclear what hashing might be used, but it seems to me that the primary choice will be based on the SPI.
Of course, if the hash goes further (i.e.., payload) and the SFF is aware of forward/reverse traffic it is capable of hashing the right fields.

But does this smells of an implementation detail?

[SLI] Of course that will be implementation dependent, but the fact that it is implementation dependent makes the behavior unpredictable and does not ensure that you will get symmetry.




> Section 9:
>
> Do we have to set limits on receiving nodes in term of number of
> states received from the controller to mitigate some attack ?

I'm not sure, but probably not. But anyway, that would be out of scope.

[SLI] I'm challenging, as the SecDir may challenge you on that point or a similar one.


> The text talks about security of BGP, what kind of mechanism
> should be put in place ?

To be honest, I don't know.
This has similar security behaviour to a L3VPN, so I guess the same rules apply.

[SLI] That would be good to tell this in the sec considerations. As you use similar distribution mechanism as RFC4364, the same sec considerations applies.

> Do we have any interdomain considerations ?

8300 says that the intended scope is for use within a single provider's operational domain.
[SLI] That would be good to remind it as well.


> References:
> Shouldn’t RFC7665 be normative?

Probably. It gives us a downref, but maybe the IESG doesn't mind them these days.

[SLI] I agree, however this is a mandatory building block to understand this document, that's why I would prefer to have it as normative.


> I think that the mpls-sfc and mpls-sfc-encaps should also be
> normative as you are defining a controlplane to use them.

I don't mind doing that.
[SLI] These two are more debatable. Let's keep them as info, and we will see if IESG raises any concern.


_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.
Thank you.