Re: [bess] Shepherd's review of draft-ietf-bess-nsh-bgp-control-plane-06

"Adrian Farrel" <adrian@olddog.co.uk> Fri, 01 March 2019 19:17 UTC

Return-Path: <adrian@olddog.co.uk>
X-Original-To: bess@ietfa.amsl.com
Delivered-To: bess@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 98EED130F35; Fri, 1 Mar 2019 11:17:58 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.599
X-Spam-Level:
X-Spam-Status: No, score=-2.599 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_LOW=-0.7, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Xq7p5d8yS9-S; Fri, 1 Mar 2019 11:17:53 -0800 (PST)
Received: from mta7.iomartmail.com (mta7.iomartmail.com [62.128.193.157]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 575C3130F2F; Fri, 1 Mar 2019 11:17:53 -0800 (PST)
Received: from vs1.iomartmail.com (vs1.iomartmail.com [10.12.10.121]) by mta7.iomartmail.com (8.14.4/8.14.4) with ESMTP id x21JHoAe031984; Fri, 1 Mar 2019 19:17:50 GMT
Received: from vs1.iomartmail.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 9502A2203A; Fri, 1 Mar 2019 19:17:50 +0000 (GMT)
Received: from asmtp2.iomartmail.com (unknown [10.12.10.249]) by vs1.iomartmail.com (Postfix) with ESMTPS id 7E03A22040; Fri, 1 Mar 2019 19:17:50 +0000 (GMT)
Received: from LAPTOPK7AS653V ([87.112.237.8]) (authenticated bits=0) by asmtp2.iomartmail.com (8.14.4/8.14.4) with ESMTP id x21JHmGP009670 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Fri, 1 Mar 2019 19:17:49 GMT
Reply-To: adrian@olddog.co.uk
From: Adrian Farrel <adrian@olddog.co.uk>
To: stephane.litkowski@orange.com, draft-ietf-bess-nsh-bgp-control-plane@ietf.org
Cc: bess@ietf.org
References: <6687_1551262912_5C7664C0_6687_242_18_9E32478DFA9976438E7A22F69B08FF924C199D40@OPEXCAUBMA3.corporate.adroot.infra.ftgroup>
In-Reply-To: <6687_1551262912_5C7664C0_6687_242_18_9E32478DFA9976438E7A22F69B08FF924C199D40@OPEXCAUBMA3.corporate.adroot.infra.ftgroup>
Date: Fri, 01 Mar 2019 19:17:48 -0000
Organization: Old Dog Consulting
Message-ID: <090901d4d063$75aa6cf0$60ff46d0$@olddog.co.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Outlook 16.0
Thread-Index: AQJikq86JmpEvAOqlfHnTU/L7Rjo2qTX6iuA
Content-Language: en-gb
X-Originating-IP: 87.112.237.8
X-Thinkmail-Auth: adrian@olddog.co.uk
X-TM-AS-GCONF: 00
X-TM-AS-Product-Ver: IMSVA-9.0.0.1623-8.2.0.1013-24464.002
X-TM-AS-Result: No--12.729-10.0-31-10
X-imss-scan-details: No--12.729-10.0-31-10
X-TMASE-Version: IMSVA-9.0.0.1623-8.2.1013-24464.002
X-TMASE-Result: 10--12.729100-10.000000
X-TMASE-MatchedRID: IeZYkn8zfFrxIbpQ8BhdbOYAh37ZsBDCbv16+gil4jedCqKtxM6bh8Rk oeJ3OFRcT7JBDjlfRui+V0MBPePjBdkrWtCrI2IZb/5HBZ6dvRgpWss5kPUFdNW0BaT1GcXqlOt s73NQ5aOOqvmKjiN1Ds81NvjC6w1UVF6CAiRuOSL/VoEOchXiKZZ6zKu0q4rtC/+dM49Ci+x+F4 83Rr9JuY6P8DfrwSqEuPAycf9X/EuPIr9Wpu0YXL928T84h7p3dT5f0Z8VP9QjymkPL84cP4gxX ifB59/NDaQorz++OMvKNdqUqAQW32jez20/QikWZlRzaO1xpJ2EQiKo28GuYyG8WMGwsRk0BuaW JMDEarYlRLkfeq7NhcC/6BBbQR8dgBfp8CCuth745KHDXm2JAXH9gj5nlAmSKAzGd8VeOIjn0Lp DrddNEuNGsB0teqxGI8UVBDAyPnoAHiVC0gfg9gPZZctd3P4BviRliDV2nyyBQ/sPpJq/zPkUmy 1Y1jH8gjDkrNTrYcRuxytUTKTafceyjdU31qpL7DzBuedLDxvwZGE/+dMc1rRWD4ydITYY7HVf5 B/dAqIM0M82sr4D85nXzeukfgq63OnZo8D3VmUyDra4i6b7VAJcNZSQwTLW+9DFNdcliA5ZzCui GAOKUH1nAfpYG6HxvrcjKMCMHyYVwSNstZW9srdQIb8hCnY+IR1rLBJm/M6O418gpkLmx+DGRKQ QH0piKnFstIIcn7gPu5VZvLINUjzqOmsIGhM5QML2WBMvCAWUi9wB9gmcSglbhF7ZTanLLdiJel lAWf9IrI3stU+lT0EjneCa3zkPKs2mb3//dMOeAiCmPx4NwFkMvWAuahr8m5N2YHMD0b8MyrfP9 j+C1d934/rDAK3zhG2qikEpQGXCNmQNWN153vi9ZBqaGH2DZioXNrI412fpu8qD6dDii52Yt0X/ FJ9b
X-TMASE-SNAP-Result: 1.821001.0001-0-1-12:0,22:0,33:0,34:0-0
Archived-At: <https://mailarchive.ietf.org/arch/msg/bess/PmAZw24WkW5fCRKE0NeBx9xt-NM>
Subject: Re: [bess] Shepherd's review of draft-ietf-bess-nsh-bgp-control-plane-06
X-BeenThere: bess@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: BGP-Enabled ServiceS working group discussion list <bess.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/bess>, <mailto:bess-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/bess/>
List-Post: <mailto:bess@ietf.org>
List-Help: <mailto:bess-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/bess>, <mailto:bess-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 01 Mar 2019 19:18:09 -0000

Hello Stephane,

Thanks for this review. It is very thorough and has helped improve the document.

We have posted an update to the draft, and there are responses to your review, below.

Thanks,
Adrian

> The document is globally well written with good examples that help the understanding.
> However it requires some refinements in the normative language used: some statement
> should be normative but are not using upper case words.

Yes. I found 16 'must' to convert to 'MUST'. Some lowercase remains, but I believe they are correct.

> Abstract:
> - Please expand “BGP” on first use

'BGP' is well known according to https://www.rfc-editor.org/materials/abbrev.expansion.txt

> - Please give a reference associated to “Network Service Header”

OK. Obviously we cannot give a citation in the Abstract, but we can use plain text.

> Introduction:
> - When “classifier” is used as an example of SF, do you refer to the classifier used to steer
>    the traffic onto an SFC ? If yes, I don’t think it could be considered as an SF.

Ah, no. Hmm. Tricky.
A SF could be responsible for classifying traffic into different buckets for different uses. This is a very similar function to an 'SFC Classifier' (responsible for classifying traffic onto different SFCs).
I can't think of a way to remove this confusion except by deleting this example. Instead, I have boosted the list of examples by borrowing from RFC 7498.

> - I think the “conventional” approach described in the intro is really old school and
>  even before SFC, we had some means, like dynamic routing protocols and other
>  mechanisms like separate routing tables, to steer the traffic through a set of services.
>  I agree that there is also some drawback associated with such approaches like the
>  operational complexity of provisioning.

OK. I *think* the change here is s/Conventionally/Historically/  and change the tense to past.

> Section 1.2:
> It would be good to give a definition for the additional terms that are
> defined in this document.

OK

> Section 2.1:
> 
> As the section is focused on dataplane, it would be good to rename it 
> as “Reminder on NSH dataplane” or something like that. It does not
> provide a functional overview of the controlplane which is what was
> expected based on the title.

Yeah, the CP is in 2.2.
We go for "Overview of Service Function Chaining"

> “A special Service Function, called a Classifier, is located at each
>   ingress point to a service function overlay network.  It assigns the
>   packets of a given packet flow to a specific Service Function Path.“
>
> Again, based on my understanding, the Classifier cannot really be
> considered as an SF , at least this is a component out of the SFC 
> which steers the traffic onto an SFC.

Well, rather than have a philosophical debate, I have changed 
OLD
A special Service Function, called a Classifier,
NEW
A special functional element, called a Classifier,
END

> “An unknown or invalid SPI SHALL be treated as an error and the SFF
>   MUST drop the packet.  Such errors SHOULD be logged, and such logs
>   MUST be subject to rate limits.”
>
> I found strange to have such statement in this document which is focused
> on the controlplane while this statement is a dataplane statement. Isn’t it
> part of RFC8300 Section 3 : 
> “3.  Update the NSH: SFs MUST decrement the service index by one.  If
>       an SFF receives a packet with an SPI and SI that do not
>       correspond to a valid next hop in a valid SFP, that packet MUST
>       be dropped by the SFF.”
>
> The next paragraph deals also with normative dataplane statement which 
> is IMO out of scope of this document.

You're right. There is no place for normative text in section 2.1.
Dropped to lower case, and added "as required by RFC 8300".

> Section 2.2
>
> “The SFIR describes a particular instance of a
>    particular Service Function”. 
>
> Would it be good to talk about “Service Function Instance” rather than
> instance of a Service Function ? That’s understandable, of course, 
> however I think it’s good to reuse the exact terminology. 

It just reads better 😊

Moving to...

“The SFIR describes a particular instance of a particular Service Function (i.e., an SFI)..."

> • I think the SFT definition is appearing a bit late in this section and in the document 
>  as it has been referenced already multiple times before.

This is now in 1.2.
The only previous mention of SFT is in the bullet list for the SFPR definition two paragraphs higher.
But I have jiggled 2.2 to bring SFT to the top.

> “Service Function Type (SFT) that
   is the category of SF that is supported by an SFF”. 
>
> Don’t you mean SFI rather than SFF ?

Nope.
SFFs support attached SFIs.
An SFI is an instance of an SF.
An SF has a type that is its SFT.

> “Thus the SFF can be seen as a portal…”. Would “gateway” 
> be more suitable rather than “portal” ?

Oh, have I been writing too many fairy stories?

It's pretty much synonymous, with a nice ring to it, but no pain involved in changing to 'gateway'.

> The Figure 1 is not really used in this section as part of the existing text. I 
> would be better to have a companion text that explains the figure.

Wow! Yes. That's embarrassing.

> I don’t like (personal opinion), the “grouping” of SFIs in the Figure 1 as part of an SFT. 
> What strikes me is that it could be confused with the usual representation of an SF 
> composed of multiple SFIs. Again that’s just a personal feeling.

Interesting. That confusion does sort of exist.

I don't think an SF is composed of multiple SFIs. 
An SF has a type: its SFT
The may be multiple instances of an SF, the SFIs, all of the same type.
And, of course, it is the SFIs that are attached to the SFF.
There may be multiple SFs of the same type. These are not necessarily to be indicated as SFIs. For example, two SFs of the same type may be accessed through different SFFs.

I think it might help if the text discussed this, but I don't think that it is right to make a hierarchy in the figure.

However, since we're here, the figure could be made a bit better along with the text to describe it.

> Section 3:
> “they must use BGP Capabilities”: is it a normative MUST ?

Ack

> Section 3.1:
>
> s/a two byte Type field and a six byte/a two bytes Type field and a six bytes/

Got to love English 😊
You use singular in this case.

> “Two SFIs of the same SFT must be associated”. Is it a normative 
> MUST ? Same comment for next sentences in the paragraph (multiple
> occurrences)

Ack

> “The Service Function Type identifies a service function”. I don’t
> think we can really say that, it identifies the type of service the SF
> is providing but not the SF itself.

Yes

> “Each node hosting an SFI
>   must originate an SFIR for each type of SF that it hosts, and it may
>   advertise an SFIR for each instance of each type of SF.” 
>
> Is it really “and” ? I mean can we just summarize by using “Each 
> node hosting an SFI MUST originate an SFIR for each instance of
> each type of SF” ?

No. This makes a scaling optimization. If you don't need the creator of the SFP to know that there are 57 instances of an SF, you just advertise once for the type of SF and allow the local SFF select which SFI to use. On the other hand, if you want to allow the load balancing to be done by a controller, you can advertise.

Adding a brief explanation.

> “A BGP Update containing one or more SFIRs will also include”. Is it a
> MUST or SHOULD “also include”?

Yeah, a MUST, I guess.

> How is the nexthop encoded in the NLRI ?

A bit confused about this question. We're in the section that talks about the SFIR: the concept of nexthop applies to the SFP and so is found in the SFPR.

Perhaps the presence of the Tunnel Encapsulation attribute is the point of confusion. That attribute is there to say what protocol(s) the SFF is expecting to see used for packets arriving for the SFIs.

Or maybe you were expecting the NLRI to encode a prefix like it used to in the good old days?

> Section 3.1.1
>
> • s/“It can be included”/”It MAY be included”/ ?

Sure.

> That would be great to give more details about the usage of pools

OK.

> The encoding in Figure 4 is not matching the text. Figure 4 has Type=0x80,
> Sub-type=TBD6, text says type=TBD6 and subtype=0x00

Eeeek! And Figure 5.

The text is correct. I have fixed the figures.

> How is the pool ID managed ?

OK. That's a good one.
It's a globally unique value, and so it needs to be centrally managed. As also the assignment of SFIs to the pools.
Note has been added.

> Section 3.1.2
>
> Same comment about Figure 5 not matching the text.

Ack

> Section 3.2:
>
> s/a two byte Type field and a six byte/a two bytes Type field and a six bytes/

Same reply

> Same comment about normative language “must” vs “MUST”

Ack

> How is the nexthop encoded in the NLRI ?

Maybe a better place for the question, if we are not still talking about the use of NLRI to carry a prefix.
The SFPR carries the SFP which is a full "path" including the next hops.

> Section 3.2.1:
>
> From a readability point of view, it would be good to tell in the second sentence that the
> requested attribute is Optional Transitive (we can see it in the description but it is also
> good to have it in the intro of the attribute).

OK

> “The presence rules and meanings are as follows.”. It would be good to use normative language
> in the following sentences.

I have done s/OPTIONAL/optional/
Feels to me that the rest is statement of fact rather than direction to the implementation.

> I don’t see the “error handling” behavior associated with this attribute
> (discard, treat-as-withdraw…)

The errors would be...
- malformed attribute (need to cover this)
- wrong number of Association TLVs (but they are optional and multiple are allowed, so no error)
- Association TLV in error (there is a paragraph for this in 3.2.1.1)
- missing Hop TLV (need to cover this)
- Hop TLV in error (need to cover this)

I think the errors are covered by section 6 of RFC 4271, but we need to point to it.

> Section 3.2.1.1
> “It may be present” or “It MAY be present” ?

Yes

> Section 4.1
>
> Having an SFF being part of a single overlay network is IMO a very limited
> use case, in such a case, I don’t think that you need an RT at all. The SFF 
> could be part of a L3VPN which will be used as an underlay.
> An SFF being “multitenant” is more valid and defacto requires to maintain
> separate forwarding states (VRF like…). This is mandatory to maintain the
> tenant isolation and RTs are very useful here to know the appropriate
> context to put the routes in.

I wonder whether there is a misunderstanding.

We have...

   Every node in a service function overlay network is configured with
   one or more import RTs.

That means that an SFF, for example, can be in multiple overlays, and each SFP is placed within an individual overlay.

I re-read 4.1 and don't see it saying that an SFF is part of a single overlay. Can you point me at what I should change?

> Section 7.1
>
> While I understand that the node doing the classification can perform a deep
> packet inspection to get an entropy indicator, any intermediate node cannot
> set it again as the NSH header will be there.

Right. Exactly. Hence...
   However, when an NSH is
   included in a packet, those key fields may be inaccessible.  For
   example, the fields may be too far inside the packet for a forwarding
   engine to quickly find them and extract their values, or the node
   performing the examination may be unaware of the format and meaning
   of the NSH and so unable to parse far enough into the packet.

> Here is an example:
>
> PacketsIn -→ Classifier ->SFF1 (SFI1) -> SFF2(SFI2) ->SFF3(SFI3)->PacketsOut to dest
>
> Classifier pushes NSH header onto the packets as well as the underlay tunnel
> (using an entropy indicator in UDP or ELI/EL). Where the packets comes in at
> SFF1, IMO, it strips the underlay tunnels including the entropy indicator.
> Which means that when the packet with NSH header will go to SFF2, it will
> not have anymore entropy indicator and there is no way to build it again as
> the NSH header prevents a new deep packet inspection. Am I missing
> something ? Maintaining a per packet state is IMO not doable , especially if the
> SFI modifies the packet content.

OK. 
So this is the thing. *If* you want entropy for the underlay network, you have to get it from somewhere. And an entropy label is going to be a lot more practical that hoping that each hope in the underlay can do some form of hash.
There are some other options:
- use more fine-grained SFPs so that the SPI is effectively a key to 
  the entropy
- put the entropy as metadata in the NSH requiring that the SFF
  be NSH-aware
- encode the necessary entropy information in the tunnel attribute
- don't use entropy

I don't propose any changes to the document for this point, but do feel free to negotiate.

> Section 8
>
> It would have been good to have two SFIs with the same SFT
> on the same SFF in the example.

It's a long way through, but 8.9.1 contains an example of that.

> I don’t like the representation of RD using “192.0.2.1,1” as the “,” can
> be confusing with a regular separator. Why not using :”192.0.2.1:1”
> notation which is well known ?

Obviously not so well known that I knew it well 😊
But, it's a good change.

> Section 8.9.1:
>
> How does an SFF know that an attached SFI is stateful ? I don’t
> think it can know that.

Well, how does the SFF know which SFIs are attached and what their types are?
The registration of SFIs to their SFFs is out of scope of this document (I think it was raised as a separate function in draft-ietf-sfc-control-plane).

> I don’t think that the fact that SFF2 is used in both direction is safe
> from a load balancing perspective.
> If the hashing algorithm used by SFF2 is sensible to the order of the
> keys (like source vs dest address, or source vs dest port), it may 
> provide a different SFI as a result of the hashing between the
> forward and the reverse flow.

It's unclear what hashing might be used, but it seems to me that the primary choice will be based on the SPI.
Of course, if the hash goes further (i.e.., payload) and the SFF is aware of forward/reverse traffic it is capable of hashing the right fields.

But does this smells of an implementation detail?

> Section 8.9.2:
>
> How can the controller instruct the classifier how to place
> traffic ?

This is describe in Section 7.5 which describes a new Flowspec Action extended community that assigns a specified flow to a specified [SPI, SI, SFT].  This allows a controller to program all classifiers consistently.

An alternative would be to use PCEP based on draft-ietf-pce-pcep-flowspec with some SFC extensions (there was a draft in PCE for SFC extensions, but it was a bit premature), but that is beyond the scope of this document.

> In addition, this could involve multiple classifiers that need
> to be coordinated.

Agreed. This is part of the programming of classifiers mentioned above.

> The ingress forward classifier and the ingress reverse classifier should
> use SFPs within the same assoc.

Yes, that's what they do in the example.
The Assoc-SPI points to the associated SFP.

> Section 9:
>
> Do we have to set limits on receiving nodes in term of number of
> states received from the controller to mitigate some attack ?

I'm not sure, but probably not. But anyway, that would be out of scope.

> The text talks about security of BGP, what kind of mechanism
> should be put in place ?

To be honest, I don't know.
This has similar security behaviour to a L3VPN, so I guess the same rules apply.

> Do we have any interdomain considerations ?

8300 says that the intended scope is for use within a single provider's operational domain.

> This is a controller based approach, so the security of the controller
> itself is also important.

It's true that one approach to building a network architecture (probably the wise approach?) is controller based. But this document is providing a BGP based control plane not a full solution, so we think that discussion is out of scope.

> References:
> Shouldn’t RFC7665 be normative?

Probably. It gives us a downref, but maybe the IESG doesn't mind them these days.

> I think that the mpls-sfc and mpls-sfc-encaps should also be
> normative as you are defining a controlplane to use them.

I don't mind doing that.