Re: Tsvart early review of draft-ietf-rtgwg-net2cloud-problem-statement-22

Lukasz Bromirski <lukasz.bromirski@gmail.com> Mon, 17 April 2023 16:57 UTC

From: Lukasz Bromirski <lukasz.bromirski@gmail.com>
Message-Id: <A7BCC79A-45FA-4427-8068-BCC0162BA538@gmail.com>
Subject: Re: Tsvart early review of draft-ietf-rtgwg-net2cloud-problem-statement-22
Date: Mon, 17 Apr 2023 18:57:06 +0200
In-Reply-To: <CO1PR13MB492041AD9B39FFE205EAF5AA859F9@CO1PR13MB4920.namprd13.prod.outlook.com>
Cc: David Black <david.black@dell.com>, "tsv-art@ietf.org" <tsv-art@ietf.org>, "draft-ietf-rtgwg-net2cloud-problem-statement.all@ietf.org" <draft-ietf-rtgwg-net2cloud-problem-statement.all@ietf.org>, "rtgwg@ietf.org" <rtgwg@ietf.org>
To: Linda Dunbar <linda.dunbar@futurewei.com>
References: <168055635654.11507.17750417804419163710@ietfa.amsl.com> <PH0PR13MB49229EDCFEC1D54173EA590585999@PH0PR13MB4922.namprd13.prod.outlook.com> <FDAE23AC-5834-4EC3-B368-249F94E9DE9F@gmail.com> <CO1PR13MB492041AD9B39FFE205EAF5AA859F9@CO1PR13MB4920.namprd13.prod.outlook.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/rtgwg/aTF9gCdmvirtHSho9NTLFAsNxAA>

Linda,

Thanks - my responses inline:

> On 16 Apr 2023, at 03:16, Linda Dunbar <linda.dunbar@futurewei.com> wrote:
> 
> Lukasz, 
>  
> Thank you very much for reviewing the document and the comments.
> Please see below for the resolutions to your comments.
>  
> Linda
>  
> From: Łukasz Bromirski <lukasz.bromirski@gmail.com>
> Sent: Friday, April 14, 2023 5:38 PM
> To: Linda Dunbar <linda.dunbar@futurewei.com>
> Cc: David Black <david.black@dell.com>; tsv-art@ietf.org; draft-ietf-rtgwg-net2cloud-problem-statement.all@ietf.org; rtgwg@ietf.org
> Subject: Re: Tsvart early review of draft-ietf-rtgwg-net2cloud-problem-statement-22
>  
> Hi Linda, Group,
>  
> Let me offer some points related to the latest version of the draft:
>  
> 1. "DSVPN" - this is a Huawei-specific term describing VPNs that allow for dynamic connections between spokes, which itself is a 1:1 copy of Cisco DMVPN down to the use of NHRP and mGRE (https://support.huawei.com/enterprise/en/doc/EDOC1100112360/a485316c/overview-of-dsvpn). Shouldn't we avoid vendor-specific product/solution names in RFC documents?
>  
> It's actually called out again in Section 4.2 later on, alongside Cisco's DMVPN (which itself is not defined anywhere).
> [Linda] Agree with your point. Is “NHRP [RFC2735] based multi-point VPN” a better name? Or can you suggest a name to indicate the NHRP-based multi-point-to-point or multi-point-to-multi-point tunnels among those clients’ own virtual routers?

NHRP is just a piece of the wider architecture, but yes, if you’re looking for something short to use as a description of the concept, I’d say something like “Dynamic VPN solution for p2p or p2mp”.

However, I’d question whether this definition is even needed, given that the term is mentioned only five times, four of which are “DMVPN or DSVPN”, while a definition of DMVPN is nowhere to be found. There’s a clear skew toward Huawei-specific naming here.

The term can easily be replaced with the definition above, without explicitly mentioning vendor-specific names at all, given how it’s used to describe specific use cases.
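
To make the concept concrete for readers who haven’t touched that family of solutions, here is a greatly simplified Python sketch of what the “dynamic” part boils down to - purely illustrative, not tied to any vendor’s implementation: spokes register their public (NBMA) addresses with a hub, and later resolve each other’s NBMA address through the hub so they can build direct spoke-to-spoke tunnels.

    # Greatly simplified model of NHRP-style next-hop resolution
    # (illustration only, not a protocol implementation).
    hub_registry = {}  # tunnel (protocol) address -> NBMA (public) address

    def register(tunnel_addr, nbma_addr):
        # A spoke registers its mapping with the hub ("Registration").
        hub_registry[tunnel_addr] = nbma_addr

    def resolve(tunnel_addr):
        # Another spoke asks the hub for that mapping ("Resolution")
        # and can then bring up a direct spoke-to-spoke tunnel.
        return hub_registry.get(tunnel_addr)

    register("10.0.0.2", "192.0.2.10")    # spoke A
    register("10.0.0.3", "198.51.100.7")  # spoke B
    print(resolve("10.0.0.3"))            # spoke A learns where to tunnel directly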

> 2. 
>  
> "3.1: [...] Cloud GWs need to peer with a larger variety of parties, via private circuits or IPsec over public internet."
>  
> As far as I understood, the whole Section 3.1 tries to underline the need for a flexible/resilient BGP implementation, and I agree with that. However, I'd argue that a lot of cloud-based connections happen via BGP directly over the Internet, not necessarily through private circuits or IPsec. Section 4.2 of the draft even mentions some examples of that use case.
>  
> [Linda] Azure’s ExpressRoute (https://azure.microsoft.com/en-us/products/expressroute) and AWS’s Direct Connect (https://aws.amazon.com/directconnect/) are via private circuits, which are widely used. They all limit the inbound routes, both via Direct Connect and via the Internet.

That’s correct, but that’s not the point I was trying to make. Some connections run BGP directly, without the need for either MPLS VPNs (who even allows that?) or IPsec. The document focuses heavily on the connectivity to the cloud being one of those two options (MPLS VPN or IPsec), and that’s simply not the case.

Direct peerings and/or GRE tunneling are used as well, not to mention any form of tunneling when the customer deploys their own virtual routers at the edge of a specific cloud service, which is very common at this point.

Also, case in point below:

> There's so much focus in the document on only two types of connection - MPLS VPN or IPsec. The actual use case of connecting your workload to the cloud can be easily addressed by any type of overlay routing, like GRE or VXLAN/GENEVE terminated on the virtual cloud gateway.
> [Linda] When the Cloud connection is via the Internet, IPsec is used exclusively as the outer header. Within the IPsec payload, client routes can be encapsulated in VXLAN/GENEVE, which is not the focus of the document.

It is not.

Look at the GRE options for example (https://docs.aws.amazon.com/vpc/latest/tgw/tgw-connect.html) or general connectivity for private direct peerings.

On top of that, enterprises do deploy their own virtual routers at the edge of public clouds and bring their own way of extending overlays, up to and including EVPN-based solutions running on top of GRE or VXLAN/GENEVE encapsulations. There’s no need for IPsec in those cases: typically the traffic is already encrypted via TLS at the application layer, or in some cases the enterprise simply doesn’t care about encrypting the traffic at all. There are a number of reasons for that, not depending on a specific internetworking configuration being one of the most important ones.

> "When inbound routes exceed the maximum routes threshold for a peer, the current common practice is generating out of band alerts (e.g., Syslog) via management system to the peer, or terminating the BGP session (with cease notification messages [RFC 4486] being sent)."
>  
> For completeness' sake, shouldn't we explicitly state what the action is in the first case? Typically, the additional routes above the threshold are ignored, and this in turn may lead to other reachability problems.
> [Linda] At the current time, there is no standard procedure for when inbound routes exceed the maximum limits or cross certain thresholds. We are planning to write a standards-track draft in the IDR WG to kick-start the discussion, such as sending notifications when a threshold is crossed, ignoring routes that are not originated by the clients, or having some kind of policy for ignoring additional routes. There will be a lot of debate on this subject. The IDR WG has had many attempts at this in the past. None has reached consensus.

Got it, but the current generation of routers typically gives you two options: drop the session (described as option #2) or keep the session up and ignore the additional prefixes. It’s not really that important here, but I just wanted to ask whether it would make sense to clarify what happens in option #1.
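
Just to make the two behaviours concrete, a rough Python sketch (purely illustrative, not any particular vendor’s implementation):

    # Rough sketch of the two common maximum-prefix behaviours
    # (illustration only, not any particular implementation).
    MAX_PREFIXES = 3

    def apply_max_prefix(received, mode="ignore"):
        """Return (accepted_prefixes, session_up)."""
        if len(received) <= MAX_PREFIXES:
            return received, True
        if mode == "terminate":
            # Option #2: send a Cease NOTIFICATION ("Maximum Number of
            # Prefixes Reached", RFC 4486) and tear the session down.
            return [], False
        # Option #1: keep the session up and silently ignore the excess
        # prefixes - exactly the behaviour that can lead to the
        # reachability problems mentioned above.
        return received[:MAX_PREFIXES], True

    routes = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
    print(apply_max_prefix(routes, mode="ignore"))     # session stays up, last route ignored
    print(apply_max_prefix(routes, mode="terminate"))  # session torn down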

> "3.4.1: [...] Therefore, the edge Cloud that is the closest doesn't contribute much to the overall latency."
>  
> How that's a problem?
>  
> [Linda] Here is what is intended to say:
> The difference in routing distances to multiple server instances in different edge Clouds is relatively small. Therefore, the edge Cloud with the shortest routing distance might not be the one providing the best overall latency.
Yeah, that version makes more sense. The one I quoted should be fixed then?
 
> "4.3: [...] However, traditional MPLS-based VPN solutions are sub-optimized for dynamically connecting to workloads/applications in cloud DCs."
>  
> The whole section says existing MPLS VPNs and/or IPsec tunnels are being used to connect to Cloud DCs. So how exactly are the "traditional MPLS-based VPNs" "sub-optimized" if at the same time they're the exact means the document mentions for solving the problem?
> [Linda] “sub-optimal” because
> The Provider Edge (PE) nodes of the enterprise’s VPNs might not have direct connections to the third-party cloud DCs used by the enterprise to provide easy access to its end users. When the user base changes, the enterprise’s workloads/applications may be migrated to a new cloud DC location closest to the new user base. The existing MPLS VPN provider might not have PEs at the new location. Deploying PEs routers at new locations is not trivial, which defeats one of the benefits of Clouds’ geographically diverse locations allowing workloads to be as close to their end-users as possible.

Yeah, that’s the point I make below. Given how distributed current ISP infrastructure is (where it can provide MPLS VPNs to customers) versus how centralized and limited cloud DC physical connectivity is, this statement is not true. It’s easier to find an MPLS VPN offering at a given geographic location than to find a cloud DC there; cloud DCs are very limited in number. Take a look at the AWS site map (https://aws.amazon.com/about-aws/global-infrastructure/) and compare it with any major ISP’s PoP map.
 
> "4.3. [...] The existing MPLS VPN provider might not have PEs at the new location. Deploying PEs routers at new locations is not trivial, which defeats one of the benefits of Clouds' geographically diverse locations allowing workloads to be as close to their end-users as possible."
>  
> When reading this literally, I'd say that any SP offering MPLS VPNs will in any case be more flexible in terms of reach (if it covers a given geography) than the pretty much fixed and limited number of cloud DCs available. However, I sense the intent here was to underline the role of "agile" DCs set up by, for example, "cloud" stacks of 5G services (and similar services), and if so, that would likely require some clarification to be well understood.
> [Linda] Setting up MPLS circuits takes weeks/months.

Sure, but that’s not what the point says. 

> "4.3. [...] As MPLS VPNs provide more secure and higher quality services, choosing a PE closest to the Cloud GW for the IPsec tunnel is desirable to minimize the IPsec tunnel distance over the public Internet."
>  
> MPLS VPNs provide more secure and higher quality services.... than what?
> [Linda] MPLS VPNs utilize private links. Entrance to MPLS VPNs with edge filters provides additional filtering. These are more secure than the public Internet.

I could agree with that, but such reasoning should be stated explicitly (why we believe that’s so). Different people will think about “more secure” in different ways. Some focus on encryption, some on authentication, some on the routing security you seem to mention. I assume the point was about all of that, so let’s clarify that.

>  
> "4.3. [...] As multiple Cloud DCs are interconnected by the Cloud provider's own internal network, the Cloud GW BGP session might advertise all of the prefixes of the enterprise's VPC, regardless of which Cloud DC a given prefix is actually in. This can result in inefficient routing for the end-to-end data path."
>  
> That's true, but either we praise the use of anycast (in the doc above) or we claim it's inferior to instead polluting the routing table (announcing more prefixes) or limiting visibility (announcing fewer prefixes). You can't really have it both ways.
> [Linda] the intent of the section is to document the problem and describe a workaround method:
> To get around this problem, virtual routers in Cloud DCs can be used to attach metadata (e.g., a GENEVE header or an IPv6 optional header) to indicate the geo-location of the Cloud DCs.
> Can you suggest a better text?

Maybe something like:

“As multiple Cloud DCs are interconnected by the Cloud provider’s own internal network, its topology and routing policies are not transparent or even visible to the Enterprise customer. While the Cloud GW BGP sessions will normally advertise prefixes across the Enterprise’s VPCs, and that typically achieves the goals of universal connectivity, load balancing (due to ECMP) and high availability (multiple Cloud DC points can go down and the rest will still provide service), it’s worth noting that this default configuration may not provide the best, or even a stable, end-to-end data path for customer traffic.”

It may need some simplification.
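
Purely as an illustration of what such metadata could look like on the wire, here is a hypothetical Python sketch of a GENEVE option TLV (RFC 8926 option format) carrying a made-up geo-location code - the option class and type values are invented for the example and are not assigned anywhere:

    # Hypothetical sketch: a GENEVE option TLV (RFC 8926 option format)
    # carrying a made-up geo-location code. The class/type values below
    # are invented for illustration only.
    import struct

    def geneve_geo_option(geo_code: bytes) -> bytes:
        data = geo_code.ljust(4, b"\x00")[:4]   # option data padded to a 4-byte multiple
        opt_class = 0xFFF0                       # illustrative value only
        opt_type = 0x01                          # hypothetical "geo-location" type
        length = len(data) // 4                  # length field counts 4-byte words of data
        header = struct.pack("!HBB", opt_class, opt_type, length & 0x1F)
        return header + data

    print(geneve_geo_option(b"US1").hex())       # 8-byte option carrying "US1"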

> "5. As described in [Int-tunnels], IPsec tunnels can introduce MTU problems. This document assumes that endpoints manage the appropriate MTU sizes, therefore, not requiring VPN PEs to perform the fragmentation when encapsulating user payloads in the IPsec packets."
>  
> Well, typically no - it's 2023, and while PMTUD is still broken in parts of the Internet that are abusively controlled or censored, the real problem here is with networks that run above the typical 1500 bytes, which is common for virtual environments and was likely the reason that text was put in place. Maybe underlining this would make sense in this paragraph?
> [Linda] IETF drafts use plain text. Can’t use underline. Can you suggest a better wording for this? Thank you.

“As described in [Int-tunnels], IPsec tunnels can introduce MTU problems. It’s worth observing that some applications tend to use bigger packets, as cloud environments operate on virtual network interface cards where an MTU of 9000 is commonly used. This can create problems when the traffic has to transit a physical network that typically uses an MTU of 1500 or even lower and therefore requires fragmentation.

This document assumes that endpoints manage the appropriate MTU sizes and that Path MTU Discovery is able to dynamically adjust TCP traffic based on the true end-to-end maximum MTU. We also assume that VPN PEs don’t need to perform fragmentation when encapsulating user payloads in IPsec packets. If that’s not the case for a given deployment, care needs to be taken to appropriately size the devices and make sure they can handle the fragmentation, as that’s typically a very processing-intensive activity.”
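
To make the arithmetic concrete (back-of-the-envelope only - the exact overheads vary with the encapsulation options and cipher suite used):

    # Back-of-the-envelope MTU arithmetic; overhead values are approximate
    # and depend on the exact encapsulation options and cipher suite used.
    PHYSICAL_MTU = 1500      # typical transit / Internet path MTU
    CLOUD_NIC_MTU = 9000     # jumbo MTU commonly used on virtual NICs in cloud DCs

    OVERHEAD = {
        "gre": 24,               # outer IPv4 (20) + GRE (4)
        "vxlan": 50,             # outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14)
        "ipsec_esp_tunnel": 73,  # rough worst case: outer IPv4 + ESP header/IV/padding/ICV
    }

    for encap, overhead in OVERHEAD.items():
        usable = PHYSICAL_MTU - overhead
        # Anything a cloud workload sends above 'usable' bytes has to be
        # fragmented (or avoided via PMTUD / TCP MSS clamping) before it
        # can cross the 1500-byte transit path.
        print(f"{encap}: usable inner MTU ~{usable} bytes; "
              f"a {CLOUD_NIC_MTU}-byte packet would need fragmentation")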

> "5.2. IPSec" -> "IPsec"
> [Linda] changed.
>  
> "5.2. IPSec encap & decap are very processing intensive, which can degrade router performance. NAT also adds to the performance burden."
>  
> That's why nowadays IPsec is executed in hardware, or in a "hardware-accelerated" software path (like QAT for pure-x86 workloads), as is, typically, NAT on the enterprise gear that qualifies as the "PE" so often mentioned in this document.
> [Linda] Is the “hardware-accelerated” path performance also impacted by a larger number of IPsec flows? Can you suggest a better wording?

“IPsec encapsulation and decapsulation is processing-intensive, and is typically offloaded to hardware acceleration cards on modern edge routers. There is also a risk of additional load related to fragmentation. Some other features, like NAT or packet filtering, can also be offloaded to hardware acceleration, improving performance. However, when such capabilities are executed purely on the device CPU(s), care should be taken to properly size those devices so they can handle the additional load.”

> "5.2. [...] When enterprise CPEs or gateways are far away from cloud DC gateways or across country/continent boundaries, performance of IPsec tunnels over the public Internet can be problematic and unpredictable."
>  
> ...compared to? Pure IP routing between the same IPs? 
> [Linda] comparing with private links.

That may need clarification then. 

> "7. [...] via Public IP ports which are exposed"
>  
> Wouldn't it make sense to use 'interfaces' here? "ports" has a TCP/UDP Layer 4 connotation.
> [Linda] On routers, the term “physical ports” is commonly used, like Ethernet ports, OC12 ports, Wi-Fi ports, etc.

Interfaces. The common term is ‘interfaces’; ‘ports’ is typically used only in data sheets.

“Ports” immediately suggests “services listening on port X”, not ports in the sense of interfaces.

I’d ask anyone on the list with actual networking infra experience to add their own comments.

> "7. [...] Potential risk of augmenting the attack surface with inter-Cloud DC connection by means of identity spoofing, man-in-the-middle, eavesdropping or DDoS attacks. One example of mitigating such attacks is using DTLS to authenticate and encrypt MPLS-in-UDP encapsulation (RFC 7510)."
>  
> How it is different than protection offered by IPsec?
> [Linda] This section is about those attacks on the public-facing "interface" that supports IPsec.

Yeah, but DTLS would be susceptible to the same, or an even easier, DDoS than IPsec. So that would likely need a complete rewrite to address both at the same time, just like the point below:

> "7. [...] When IPsec tunnels established from enterprise on-premises CPEs are terminated at the Cloud DC gateway where the workloads or applications are hosted, traffic to/from an enterprise's workload can be exposed to others behind the data center gateway (e.g., exposed to other organizations that have workloads in the same data center).
>  
> To ensure that traffic to/from workloads is not exposed to unwanted entities, IPsec tunnels may go all the way to the workload (servers, or VMs) within the DC."
>  
> How that problem statement would be different than DTLS solution/protection from the beginning of the section? 
>  
> [Linda] DTLS is at the Transport Layer. Here we are talking about the IP layer. The answer to that security question is long. As you know, IPsec has different attack planes than DTLS, at different costs. Are you looking for a chart that compares this facet? Or can you simply reference the appropriate RFCs?

No, the whole point of my commenting on that and on the DTLS section was that they try to describe risks, but reference each other as the solution to the problem. Neither of them is actually a solution to the problem, first of all.

Thanks,
— 
Łukasz Bromirski