Re: [nvo3] [Tsv-art] Can you please enter READY for Tsvart last call review of draft-ietf-nvo3-vmm-14?

Hi Linda, 

It is good that you seek confirmation that a reviewer's issues have been
addressed. I do hope that reviewers will in most cases be responsive and provide
feedback. However, in some cases the reviewer may not be available to provide
timely feedback. TSVART reviewers, I would appreciate it if you would indicate
when you are unable to follow up.

If a reviewer becomes unresponsive, either by not responding or by indicating
that they are unavailable to follow up on a non-ready TSVART review, please note
that this review does not block the draft from progressing. It is up to the
responsible AD to judge whether the issues raised in IETF last call have been
addressed, and you as document shepherd can indicate your position to the AD.

In the end, we TSV ADs will consider the review that was done and the current
state of the document. If, in our judgement, there are unresolved issues, we
will comment on or even DISCUSS the draft.

I hope that clarifies things.

Magnus Westerlund
TSV AD

On Wed, 2020-04-15 at 16:03 +0000, Linda Dunbar wrote:
> Matthew,
>  
> I have spent a lot of time and effort addressing comments from the TSV review
> and have revised the draft 8 times, but the TSV Directorate hasn’t entered READY.
> Is there any way to move this draft forward? Or are we just going in circles?
>  
> Thank you very much.
>  
> Linda Dunbar
>  
> From: Linda Dunbar 
> Sent: Friday, April 3, 2020 12:31 PM
> To: Black, David <David.Black@dell.com>; Bob Briscoe <ietf@bobbriscoe.net>;
> Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>; sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; draft-ietf-nvo3-vmm.all@ietf.org
> Subject: RE: Can you please enter READY for Tsvart last call review of draft-
> ietf-nvo3-vmm-14?
>  
> Bob,
>  
> Can you please respond? We have revised the draft 8 times since.
> Has the latest version addressed your concerns?
> If not, please let us know as soon as possible. If yes, can you please enter
> “READY” for the TSV review?
>  
> Thank you very much for your review and comments.
>  
> Linda Dunbar
>  
> From: Black, David <David.Black@dell.com> 
> Sent: Thursday, April 2, 2020 12:36 PM
> To: Linda Dunbar <linda.dunbar@futurewei.com>; Bob Briscoe <
> ietf@bobbriscoe.net>; Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>; 
> sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; draft-ietf-nvo3-vmm.all@ietf.org;
> Black, David <David.Black@dell.com>
> Subject: RE: Can you please enter READY for Tsvart last call review of draft-
> ietf-nvo3-vmm-14?
>  
> The concerns that I’ve been working on are resolved.
>  
> I defer to Bob on whether his remaining concerns about TCP (part of [E]) and
> overall clarity ([2/3/4]) have been satisfactorily addressed.
>  
> Thanks, --David
>  
> From: Linda Dunbar <linda.dunbar@futurewei.com> 
> Sent: Thursday, April 2, 2020 11:39 AM
> To: Black, David; Bob Briscoe; Bocci, Matthew (Nokia - GB); sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3; draft-ietf-nvo3-vmm.all@ietf.org
> Subject: Can you please enter READY for Tsvart last call review of draft-ietf-
> nvo3-vmm-14?
>  
> David and Bob,
>  
> Thank you very much for your comments and suggestions on draft-ietf-nvo3-vmm,
> and for your continued review of and comments on the recent 8 revisions.
>  
> I hope the latest draft (the -14 version) has resolved all your concerns.
>  
> Can you please enter READY for the TSV review? So that this draft can go
> forward.
>  
> Thank you.
> Really appreciate your help.
>  
> Linda
>  
> From: Linda Dunbar 
> Sent: Wednesday, April 1, 2020 9:26 PM
> To: 'Black, David' <David.Black@dell.com>; Bob Briscoe <ietf@bobbriscoe.net>;
> Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>; sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; draft-ietf-nvo3-vmm.all@ietf.org
> Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-
> 04
>  
> David,
>  
> Thanks. I have accepted your suggested wording in the -14 version.
>  
> Linda
>  
> From: Black, David <David.Black@dell.com> 
> Sent: Wednesday, April 1, 2020 3:21 PM
> To: Linda Dunbar <linda.dunbar@futurewei.com>; Bob Briscoe <
> ietf@bobbriscoe.net>; Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>; 
> sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; draft-ietf-nvo3-vmm.all@ietf.org;
> Black, David <David.Black@dell.com>
> Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-
> 04
>  
> Hi Linda,
>  
> > Why can’t we simply say that the VM is moving away from the Gateway?
>  
> Sure, that works, but there’s a subtlety in that the VM may be using different
> gateways for different flows.  In addition, “need” is the wrong verb, as the
> prior Gateway(s) can be expected to continue to work – they just may not be the
> best choices after the VM move.
>  
> OLD
> After a VM is moved to a new NVE, the VM’s corresponding Gateway may need to
> change as well. If there is no other entity (or node) to take over the
> corresponding NVO3 Gateway function for the VM under the new NVE, the network
> path between the VM and the external entities needs to be hair-pinned to the
> NVO3 Gateway used prior to the VM move.
> NEW
> Moving a VM to a new NVE may move the VM away from the NVO3 Gateway(s) used by
> the VM’s traffic, e.g., some traffic may be better handled by an NVO3 Gateway
> that is closer to the new NVE than the NVO3 Gateway that was used before the
> VM move.  If NVO3 Gateway changes are not possible for some reason, then the
> VM’s traffic can continue to use the prior NVO3 Gateway(s), which may have
> some drawbacks, e.g., longer network paths.
>  
> This also removes use of “hair-pinned” to avoid having to define that term.
>  
> Thanks, --David
>  
> From: Linda Dunbar <linda.dunbar@futurewei.com> 
> Sent: Wednesday, April 1, 2020 1:00 PM
> To: Black, David; Bob Briscoe; Bocci, Matthew (Nokia - GB); sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3; draft-ietf-nvo3-vmm.all@ietf.org
> Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-
> 04
>  
> David,
>  
> Thank you for the suggested wording. I accept all the changes for the -13
> version, except that I have a question about the following one:
>  
> “After a VM is moved to a new NVE, the VM's corresponding Gateway
>      functionality for communication with each external entity may benefit
>      from being moved to a different network node. “
>  
>  
> It is odd to say that the “VM’s corresponding Gateway functionality for the
> communication with external entity may benefit …”.
> We don’t know if the Gateway actually benefits or not, especially if the
> communication may be hair-pinned back.
>  
> Why can’t we simply say that the VM is moving away from the Gateway?
>  
> How about the following?
>  
> After a VM is moved to a new NVE, the VM’s corresponding Gateway may need to
> change as well. If there is no other entity (or node) to take over the
> corresponding NVO3 Gateway function for the VM under the new NVE, the network
> path between the VM and the external entities needs to be hair-pinned to the
> NVO3 Gateway used prior to the VM move.
>  
> Thank you.
> Linda
>  
> From: Black, David <David.Black@dell.com> 
> Sent: Tuesday, March 31, 2020 1:14 PM
> To: Linda Dunbar <linda.dunbar@futurewei.com>; Bob Briscoe <
> ietf@bobbriscoe.net>; Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>; 
> sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; draft-ietf-nvo3-vmm.all@ietf.org;
> Black, David <David.Black@dell.com>
> Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-
> 04
>  
> Hi Linda,
>  
> Something like half a dozen versions later, I think we’re down to editorial
> cleanup of the three sections that I’ve worked on, and then Bob should weigh
> in on what he considers to be ok vs. in need of further attention.  Edits
> follow:
>  
> -- Section 4.1
> OLD
>      This document assumes that the inter-VNs communication and the
>      communication with external entities are via the NVO3 Gateway as
>      described in RFC8014 (NVO3 Architecture). RFC 8014 (Section 5.3)
>      describes the NVO3 Gateway function which is to relay traffic onto
>      and off of a virtual network, i.e. among different VNs.
> NEW
>      This document assumes that the inter-VNs communication and the
>      communication with external entities are via NVO3 Gateway
>      functionality as described in Section 5.3 of RFC 8014 [RFC8014].
>      NVO3 Gateways relay traffic onto and off of a virtual network,
>      enabling communication both across different VNs and with external
>      entities.
>  
> Final “i.e.” in old text was overlooked in editing – it’s incorrect.
>  
> OLD
>      There are different policies on the NVO3 Gateway to govern the
>      communication among VNs and with external entities (or hosts).
> NEW
>      NVO3 Gateway functionality enforces appropriate policies to control
>      communication among VNs and with external entities (e.g., hosts).
>  
> OLD
>      After a VM is moved to a new NVE, the VM's corresponding Gateway
>      may need to change as well. If such a change is not possible, then
>      the path to the external entities need to be hair-pinned to the
>      NVO3 Gateway used prior to the VM move.
> NEW
>      After a VM is moved to a new NVE, the VM's corresponding Gateway
>      functionality for communication with each external entity may benefit
>      from being moved to a different network node.  If it is not possible
>      to move that functionality, then the network paths between the VM
>      and external entities may be hair-pinned to the network node or nodes
>      used for the Gateway functionality prior to the VM move.
>     
> -- Section 8
>  
>      For unexpected events, such as unexpected failure, a VM might need
>      to move to a new NVE, which is called Hot VM Failover in this
>      document. For Hot VM Failover, there are redundant primary and
>      secondary VMs whose states are synchronized by means that are
>      outside the scope of this draft. If the VM in the primary NVE
>      fails, there is no need to actively move the VM to the secondary
>      NVE because the VM in the secondary NVE can immediately pick up
>      the processing. Details of how state is synchronized between the
>      primary and secondary VMs are beyond the scope of this document.
>  
> Remove last sentence in above paragraph (“Details of how state is synchronized
> ...”), as it’s redundant with the second sentence (I meant to remove that
> sentence, but overlooked this in working on the last round of edits).
>  
> -- Section 10
>  
> OLD
>      Some Data Centers have their NVO3
>      Gateways to be equipped with capability to mitigate ARP/ND
>      threats, such as periodically exchanging its ARP/ND cache with
>      NVA's central control system to validate the ARP/ND cache learned
>      by the NVE with the VM Management System.
> NEW
>      Some Data Centers deploy additional functionality in their NVO3
>      Gateways for mitigation of ARP/ND threats, e.g., periodically
>      sending each Gateway’s ARP/ND cache contents to the NVA or other
>      central control system to check for ARP/ND cache entries that
>      are not consistent with the locations of VMs and their IP addresses
>      indicated by the VM Management System.
>  
> Thanks, --David
>  
> From: Linda Dunbar <linda.dunbar@futurewei.com> 
> Sent: Monday, March 30, 2020 10:26 PM
> To: Black, David; Bob Briscoe; Bocci, Matthew (Nokia - GB); sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3; draft-ietf-nvo3-vmm.all@ietf.org
> Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-
> 04
>  
> David,
>  
> Thank you very much for the suggested wording.
> I accept all of them in the -12 version.
>  
> Thanks for catching the typo IPv40
>  
> As for your comment on “that I don’t understand what how “exchanging its
> ARP/ND cache with NVA’s central control system” does anything useful to
> counter the threats”,
> I added the following explanation:
>  
> Some Data Centers have their NVO3 Gateways to be equipped with capability to
> mitigate ARP/ND threats, such as periodically exchanging its ARP/ND cache with
> NVA’s central control system to validate the ARP/ND cache learned by the NVE
> with the VM Management System.
> Linda
>  
> From: Black, David <David.Black@dell.com> 
> Sent: Monday, March 30, 2020 7:55 PM
> To: Linda Dunbar <linda.dunbar@futurewei.com>; Bob Briscoe <
> ietf@bobbriscoe.net>; Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>; 
> sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; draft-ietf-nvo3-vmm.all@ietf.org;
> Black, David <David.Black@dell.com>
> Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-
> 04
>  
> Hi Linda,
>  
> Status summary:
>  [A] & [B] – already done
>  [C] – Edits to Section 8 text follow below.
>  [D] – Improved, but now Section 4.1 needs some rearrangement and editing,
> more below.
>  [E] – My concerns with what is now Section 4.3 have been addressed, but Bob
> will need to opine on his remaining TCP concerns.
>  [F] – Edits to Section 10 text follow below.
>  [2/3/4] – These do not appear to have been addressed, as repeating the
> summaries of past actions that did not address these concerns is not helpful. 
> Please try again, and I leave further follow-up on these three concerns to
> Bob.
>  
> -- [D] Section 4.1 changes
>  
> The newly inserted second paragraph covers what needs to be covered, but the
> section is now seriously confused about whether it is about Inter-VN
> communication, external communication or both. So, please:
>  * Swap the order of the first two paragraphs, so that the new paragraph comes
>    first, as it has the broader scope.
>  * Change the title of Section 4.1 to “Inter-VN and External Communication” or
>    something similar that reflects the broader scope.
>  * Edit the resulting first two paragraphs for flow and consistency, including
>    moving the reference to Section 5.3 of RFC 8014 into the new first paragraph.
>  
> -- [C] Section 8 changes
>  
> To try to speed this up, I’m going to provide most of the necessary changes,
> as the current text is wrong in multiple ways, starting with the assertion
> that a load balancer is required for Hot VM Failover.  There is widely used
> running code that provides counterexamples to that incorrect assertion.
>  
> OLD
>      For unexpected events, such as unexpected failure, a VM might need
>      to move to a new NVE, which is called Hot VM Failover in this
>      document. For Hot VM Failover, there are VMs in both primary and
>      secondary NVEs. They can provide services simultaneously as in
>      load-share mode of operation.  If the VM in the primary NVE fails,
>      there is no need to actively move the VM to the secondary NVE
>      because the VM in the secondary NVE can immediately pick up the
>      processing. It is out of the scope of this document on how and
>     what information are exchange between the two VMs under two
>      different NVE
> NEW
>      For unexpected events, such as unexpected failure, a VM might need
>      to move to a new NVE, which is called Hot VM Failover in this
>      document. For Hot VM Failover, there are redundant primary and
>      secondary VMs whose states are synchronized by means that are
>      outside the scope of this draft. If the VM in the primary NVE fails,
>      there is no need to actively move the VM to the secondary NVE
>      because the VM in the secondary NVE can immediately pick up the
>      processing. Details of how state is synchronized between the
>      primary and secondary VMs are beyond the scope of this document.
>  
> As part of this, I’ve removed the “They can provide services simultaneously as
> in load-share mode of operation” sentence, as that’s not true of all Hot VM
> Failover mechanisms.
> In addition, the implied assertion that the VMs involved cannot share the same
> NVE is not correct in all cases, so I also removed that implication in the new
> text.
>  
> OLD
>      The VM Failover to the new NVE is transparent to the peers that
>      communicate with this VM. This can be achieved by both active VM
>      and standby VM share the same TCP port and same IP address. There
>      must be a load balancer that can distribute the packets to the VM
>      under the new NVE. The new VM can pick up providing service while
>      the sender (peer) still continues to receive Ack from the old VM
>      and chooses not to use the service of the secondary responding VM.
>      If the situation (loading condition of the primary responding VM)
>      changes the secondary responding VM may start providing service to
>      the sender (peers).
> NEW
>      The VM Failover to the new NVE is transparent to the peers that
>      communicate with this VM. This can be achieved by both active VM
>      and standby VM share the same TCP port and same IP address, and
>      using distributed load balancing functionality that controls which
>      VM responds to each service request.  In the absence of a
>      failure, the new VM can pick up providing service while
>      the sender (peer) still continues to receive Ack from the old VM.
>      If the situation (loading condition of the primary responding VM)
>      changes the secondary responding VM may start providing service to
>      the sender (peers).  On failure, the sender (peer) may have to
>      retry the request, so this structure is limited to requests that
>      can be safely retried.
>  
>      If load balancing functionality is not used, the VM Failover can
>      be made transparent to the sender (peers) without relying on
>      request retry by using techniques described in section 4 that
>      do not depend on the primary VM or its associated NVE doing
>      anything after the failure.  This restriction is necessary because
>      a failure that affects the primary VM may also cause its associated
>      NVE to fail (e.g., if the NVE is located in the hypervisor that
>      hosts the primary VM and the underlying physical server fails,
>      both the primary VM and the hypervisor that contains the NVE fail
>      as a consequence).
>  
> I’ve kept in the load balancer discussion and corrected it, plus added a
> paragraph to cover situations in which a load balancer is not used.
>  
> The following paragraph needs to be removed:
>  
>      If TCP states are not properly synchronized among the two VMs, the
>      VM under the New NVE after failover can force the peers to re-
>      establish a new TCP connection by stopping the previous TCP
>      connection. As most TCP connections are short lived, re-
>      establishing a new one is not a big problem.
>  
> As Bob has pointed out, the “not a big problem” conclusion is indefensible. 
> It’s also actually incorrect as stated - to make it correct requires further
> constraints on both the load balancer and the senders, so I suggest simply
> removing the entire paragraph that contains it, as the first set of edits
> (above) assert synchronization of all VM state, which includes TCP state.
>  
> -- [F] Section 10 edits
>  
> The following two edits are necessary to correct technical problems:
>  
> OLD
>     Security threats for the data and control plane for overlay 
>      networks are discussed in [RFC8014].  ARP (IPv40 and ND (IPv6) are
>      not secure, especially if we accept gratuitous versions in multi-
>      tenant environment.
> NEW
>      Security threats for the data and control plane for overlay 
>      networks are discussed in [RFC8014].  ARP (IPv4) and ND (IPv6) are
>      not secure, especially if they can be sent gratuitously across tenant
>      boundaries in a multi-tenant environment.
>  
> Although I do admire the wishful thinking in the IPv40 typo :-).
>  
> OLD
>      In Layer-3 based overlay data center networks, ARP and ND messages
>      can be used to mount address spoofing attacks.  An NVE may have
>      untrusted VMs attached. This usually happens in cases like the VMs
>      running third party applications.
> NEW
>      In overlay data center networks, ARP and ND messages
>      can be used to mount address spoofing attacks from untrusted VMs
>      and/or other untrusted sources. Examples of untrusted VMs include
>      running third party applications (i.e., applications not written
>      by the tenant who controls the VM).
>  
> The original use of “Layer-3” is incorrect – these threats also apply to
> Layer-2 service because ARP and ND are layer 2 packets.
>  
> Finally, this paragraph is not clear:
>  
>      Because of those threats, VM management system needs to apply
>      stronger security mechanisms when add a VM to an NVE. Some tenants
>      may have requirement that prohibit their VMs to be co-attached to
>      the NVEs with other tenants. Some Data Centers have their NVO3
>      Gateways to be equipped with capability to mitigate ARP/ND
>      threats, such as periodically exchanging its ARP/ND cache with
>      NVA's central control system.
>  
> The good news is that this paragraph is responsive to the threats described
> earlier, but ... the bad news is that I don’t understand how
> “exchanging its ARP/ND cache with NVA’s central control system” does anything
> useful to counter the threats, so that needs to be explained.
>  
> Thanks, --David
>  
> From: Linda Dunbar <linda.dunbar@futurewei.com> 
> Sent: Monday, March 30, 2020 5:38 PM
> To: Black, David; Bob Briscoe; Bocci, Matthew (Nokia - GB); sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3; draft-ietf-nvo3-vmm.all@ietf.org
> Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-
> 04
>  
> David,
>  
> Resolutions to your comments are inserted below, marked as [Linda]:
>  
> Thank you.
>  
> Linda
>  
> From: Black, David <David.Black@dell.com> 
> Sent: Saturday, March 28, 2020 8:43 PM
> To: Linda Dunbar <linda.dunbar@futurewei.com>; Bob Briscoe <
> ietf@bobbriscoe.net>; Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>; 
> sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; draft-ietf-nvo3-vmm.all@ietf.org;
> Black, David <David.Black@dell.com>
> Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-
> 04
>  
> Hi Linda,
>  
> This time, I’ll respond, again summarizing what needs to be done.
>  
> [A] & [B] These two concerns about draft status and keywords are resolved in
> the -10 version.
>  
> The intended status for this draft in the datatracker is now Informational,
> and all use of RFC 2119/RFC 8174 keywords has been removed, including the
> keyword boilerplate.  The reason for removal of the keyword boilerplate is the
> fact that no RFC 2119/RFC 8174 keywords remain in the draft.
>  
> [Linda] Thank you for the information.
>  
> [C-1] As Section 8 has not been removed, it needs some serious editing
> attention.
>  
> First of all, Section 8 needs to start by making a clear distinction between
> Planned VM moves (Hot, Warm and Cold VM Mobility, as those terms are defined
> in Section 2) and Unplanned VM Moves (Hot Standby, as described in Section
> 8).  Also, Hot (VM) Standby is not a good name for this VM replication
> mechanism – I would suggest Hot VM Failover.
> [Linda] Thank you very much for the suggestion. How about the following
> changes in Section 8?
>  
> VM Hot mobility is to enable uninterrupted running of the application or
> workload instantiated on the VM when the VM running conditions changes, such
> as utilization overload, hardware running condition changes, or others. Hot,
> Warm and Cold mobility is planned activities which are managed by VM
> management system.
> 
> For unexpected events, such as unexpected failure, a VM might need to move to
> a new NVE, which is called Hot VM Failover in this document. For Hot VM
> Failover,  there are VMs in both primary and secondary NVEs. They have
> identical information and can provide services simultaneously as in load-share 
> mode of operation.
>  
>  
> Second, there are a number of incorrect statements in Section 8, starting with
> “They [VMs] have identical information and can provide services simultaneously
> as in load-share mode of operation.”  The “can provide services
> simultaneously” portion of that statement is incorrect in full generality. 
> That portion of the statement is correct under a number of assumptions and
> limitations that need to be stated in Section 8.  Further, under those
> assumptions and limitations, the two VMs do not “have identical information”
> when operating in load-share mode because they are processing different
> portions of the shared load.  This nonetheless works because the differences
> in the VM states don’t matter as a consequence of the assumptions/limitations
> that need to be added to Section 8.
> [Linda] Change the wording to the following:
>  
> For Hot VM Failover, there are VMs in both primary and secondary NVEs. They
> can provide services simultaneously as in load-share mode of operation.  If
> the VM in the primary NVE fails, there is no need to actively move the VM to
> the secondary NVE because the VM in the secondary NVE can immediately pick up
> the processing. The VM Failover to the new NVE is transparent to the peers
> that communicate with this VM. It is out of the scope of this document on how
> and what information are exchange between the two VMs under two different NVE.
>  
>  
> A similar problem affects “one option is to start with and maintain TCP
> connections to two different VMs at the same time.”  Unmodified applications
> in other VMs and elsewhere in the Internet do not maintain two simultaneous
> connections to two different VMs in the fashion described, so text needs to be
> added to Section 8 to explain where the other ends of those two TCP
> connections are located and what (if any) modifications to applications and/or
> other components are necessary to use this option.
>  
> [Linda] Change the wording to the following:
> The VM Failover to the new NVE is transparent to the peers that communicate
> with this VM. This can be achieved by both active VM and standby VM share the
> same TCP port and same IP address. There must be a load balancer that can
> distribute the packets to the VM under the new NVE. The new VM can pick up
> providing service while the sender (peer) still continues to receive Ack from
> the old VM and chooses not to use the service of the secondary responding VM. 
> If the situation (loading condition of the primary responding VM) changes the
> secondary responding VM may start providing service to the sender (peers).
> If TCP states are not properly synchronized among the two VMs, the VM under
> the New NVE after failover can force the peers to re-establish a new TCP
> connection by stopping the previous TCP connection. As most TCP connections
> are short lived, re-establishing a new one is not a big problem.
>  
>  
> [C-2]  In this text in Section 7:
> The Warm VM mobility refers to having the backup entities receive backup
> information at more frequent intervals
> The terms “backup entities” and “backup information” are not defined.  They
> need to be defined (e.g., in Section 2) and explained in Section 7.
> [Linda] how about changing the wording to the following:
> The Warm VM mobility refers to having the functional components under the new
> NVE to receive running status of the VM at frequent intervals, so that it can
> take less time to launch the VM under the new NVE and other NVEs that
> communicate with the VM can be notified of the VM move more promptly.  The
> duration of the interval determines the effectiveness (or benefit) of Warm VM
> mobility.  The larger the duration, the less effective the Warm VM mobility
> option becomes.
>  
> [D] The new Section 4.1 text on Gateways is a good start, but is not
> sufficient, as it only covers the VN-to-VN use case.
>  
> The scope of RFC 8014 Section 5.3 gateways is much broader than VN-to-VN,
> e.g., it includes the use case that Bob asked about: “Inter-communication
> through an NVO3 Gateway equally applies to communication with any external
> entity, whether in another VN or just in another N (without the V, I mean just
> any other network, whether virtual or not).”  See the first paragraph of RFC
> 8014 Section 5.3 to understand the full scope of gateway functionality that
> needs to be addressed in this draft.
> [Linda] When a VM communicates with an external entity, the VM is effectively
> communicating with a peer in a different network.  Communicating with hosts in
> another VNs and external hosts are through NVO3 Gateway.
>  
> When a VM communicates with an external entity, the VM is effectively
> communicating with a peer in a different network or a globally reachable
> host.  Communicating with hosts in other VNs and external hosts are all
> through the NVO3 Gateway. There are different policies on the NVo3 Gateway to
> govern the communication among VNs and with external hosts.
>  
>  
> [E] The removal of the “Task” and ICMP text resolves my concerns, but Bob will
> need to respond on his TCP-related concerns (below).
>  
> [F] The Security Considerations are improved, as there is a specific
> dependence of some of the mechanisms in this draft on IPv4 ARP and IPv6 ND
> messages – I suggest pointing that out, possibly as an inserted new second
> sentence in the Security Considerations section.
>  
> OLD
>      In Layer-3 based overlay data center networks, the problem of
>      address spoofing may arise.
> NEW
>      In Layer-3 based overlay data center networks, ARP and ND messages
>      can be used to mount address spoofing attacks.
>  
> [Linda] Thanks for the suggestion. Changed accordingly.
>  
> Further down in the same paragraph, the text mentions denial of service
> attacks via ARP/ND overload, but also needs to mention the much simpler black-
> hole attacks that can be mounted by sending a falsified ARP/ND message to
> indicate that the victim’s IP address has moved to the attacker’s VM.  That
> technique can also be used to mount man-in-the-middle attacks with some more
> effort to ensure that the intercepted traffic is eventually delivered to the
> victim.  There is no discussion about countering these attacks aside from:
>  
>      This requires VM management system to apply stronger security
>      mechanisms when add a VM to an NVE. VM Management system is out of
>      scope of this document.
>  
> [Linda] thanks for the suggestion. The wording is changed to the following:
>  
> In Layer-3 based overlay data center networks, ARP and ND messages can be used
> to mount address spoofing attacks.  An NVE may have untrusted VMs attached.
> This usually happens in cases like the VMs running third party applications. 
> Those untrusted VMs can send falsified ARP (IPv4) and ND (IPv6) messages,
> causing NVE, NVO3 Gateway, and NVA to be overwhelmed and not able to perform
> legitimate functions. The attacker can intercept, modify, or even stop data
> in-transit ARP/ND messages intended for other VNs and initiate DDOS attacks to
> other VMs attached to the same NVE. A simple black-hole attacks can be mounted
> by sending a falsified ARP/ND message to indicate that the victim’s IP address
> has moved to the attacker’s VM.  That technique can also be used to mount man-
> in-the-middle attacks with some more effort to ensure that the intercepted
> traffic is eventually delivered to the victim. 
> The locator-identifier mechanism given as an example (ILA) doesn't include
> secure binding. It doesn't discuss how to securely bind the new locator to the
> identifier.
>  
> Because of those threats, the VM management system needs to apply stronger
> security mechanisms when adding a VM to an NVE. Some tenants may have
> requirements that prohibit their VMs from being co-attached to the same NVEs
> as other tenants. Some Data Centers equip their NVO3 Gateways with the
> capability to mitigate ARP/ND threats, such as periodically exchanging their
> ARP/ND caches with the NVA’s central control system.
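
To make the black-hole attack and one possible countermeasure concrete, here
is a minimal sketch. It assumes the NVA holds the authoritative binding of each
IP address to a (MAC, NVE) pair and that an NVE can consult it before accepting
an ARP/ND announcement. The names Nva, Nve and on_arp_announcement are
hypothetical and do not come from the draft or from RFC 8014.

```python
class Nva:
    """Hypothetical stand-in for the NVA's authoritative address bindings."""

    def __init__(self):
        self.bindings = {}  # IP -> (MAC, NVE id)

    def register(self, ip, mac, nve_id):
        # Assumed to be called by the VM management system when it attaches
        # or moves a VM, never by the VM itself.
        self.bindings[ip] = (mac, nve_id)

    def lookup(self, ip):
        return self.bindings.get(ip)


class Nve:
    """Hypothetical NVE that learns IP -> MAC bindings from attached VMs."""

    def __init__(self, nve_id, nva, validate):
        self.nve_id = nve_id
        self.nva = nva
        self.validate = validate
        self.arp_cache = {}  # IP -> MAC

    def on_arp_announcement(self, ip, mac):
        """Handle a gratuitous ARP / unsolicited NA from an attached VM.

        Returns True if the local cache was updated.
        """
        if self.validate and self.nva.lookup(ip) != (mac, self.nve_id):
            # The NVA does not say this address lives here with this MAC:
            # treat the announcement as falsified and ignore it.
            return False
        self.arp_cache[ip] = mac
        return True


VICTIM_IP = "10.0.0.5"
VICTIM_MAC = "aa:aa:aa:aa:aa:aa"
ATTACKER_MAC = "ee:ee:ee:ee:ee:ee"

nva = Nva()
nva.register(VICTIM_IP, VICTIM_MAC, "nve-1")

# Without validation, the attacker's VM on nve-2 captures the victim's address.
naive = Nve("nve-2", nva, validate=False)
naive_accepted = naive.on_arp_announcement(VICTIM_IP, ATTACKER_MAC)

# With validation, the same falsified announcement is rejected.
strict = Nve("nve-2", nva, validate=True)
strict_accepted = strict.on_arp_announcement(VICTIM_IP, ATTACKER_MAC)
```

This only covers the claim that an address has moved. It says nothing about
how the NVA itself is protected, nor about the e2e return-routability checks
discussed elsewhere in this thread.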
> 
>  
> The Security Area is likely to object to that as insufficient - if the attacks
> are in scope Security Considerations for a draft, then so are the
> countermeasures to the attacks.
>  
> [2/3/4]  Bob’s 3 editorial comments (numbered 2, 3 and 4 below for historical
> reasons) have not been addressed.  They need to be addressed.
>  
> [Linda] I think those have been addressed in -10 version, specifically:
>  
> 2. [NOT ADDRESSED] Signposting of where sub-cases start and end in S.4.1
> [Linda] changed.
>  
> 3. [NOT ADDRESSED] Indecision over whether packets are silently dropped,
> dropped with an ICMP message, forwarded or tunnelled.
> [Linda] removed the ICMP discussion.
>  
> 4. [NOT ADDRESSED] The order in which the various stages of mobility occur
> was jumbled. Some stages have been ruled out of scope, but they are still
> mentioned in jumbled order.
> [Linda] add the reference to RFC7666 that describe the states and MIBs of VMs
> managed by Hypervisor.
>  
>  
> Thanks, --David
>  
> From: Linda Dunbar <linda.dunbar@futurewei.com> 
> Sent: Friday, March 27, 2020 4:10 PM
> To: Bob Briscoe; Black, David; Bocci, Matthew (Nokia - GB); sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3; draft-ietf-nvo3-vmm.all@ietf.org
> Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-
> 04
>  
> [EXTERNAL EMAIL]
> Bob,
>  
> Resolutions to your comments, as reflected in the next (-10) version, are
> inserted below, marked with [Linda-10].
>  
> Linda
>  
> From: Bob Briscoe <ietf@bobbriscoe.net> 
> Sent: Friday, March 27, 2020 10:23 AM
> To: Linda Dunbar <linda.dunbar@futurewei.com>; Black, David <
> David.Black@dell.com>; Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>; 
> sarikaya@ieee.org
> Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; draft-ietf-nvo3-vmm.all@ietf.org
> Subject: Re: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-
> 04
>  
> Linda, inline...
> 
> On 26/03/2020 23:28, Linda Dunbar wrote:
> > David,
> >  
> > Thank you for reviewing the revised version and providing more comments.
> > Resolutions to your comments are inserted below, marked with [Linda]:
> >  
> > Thank you. Linda
> >  
> > From: Black, David <David.Black@dell.com> 
> > Sent: Wednesday, March 25, 2020 8:13 PM
> > To: Linda Dunbar <linda.dunbar@futurewei.com>; Bob Briscoe <
> > ietf@bobbriscoe.net>; Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>;
> > sarikaya@ieee.org
> > Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; draft-ietf-nvo3-vmm.all@ietf.org
> > ; Black, David <David.Black@dell.com>
> > Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-
> > vmm-04
> >  
> > Quick responses based on the recently posted -08 version and the discussion
> > below.
> >  
> > [A] & [B] – Informational as intended RFC status is ok.
> > The RFC 2119/RFC 8014 keyword boilerplate needs to be removed from Section
> > 2, as none of those keywords are now used.
> > [Linda] Okay removed, though I did see other Informational drafts have this
> > statement.
> >  
> > [C] Hot and cold standby are still problem areas:
> >  
> > [Linda] This document only covers the case of VM moves being Planned.
> > ...
> > [Linda] “Standby modes of VM movement” is out of the scope of this document
> > as indicated in the newly inserted statement in the Introduction.
> >  
> > Section 8 on the Hot Standby mechanism needs to be removed from the draft,
> > as the Hot Standby mechanism is a fault tolerance mechanism that is
> > primarily intended for Unplanned moves, e.g., as the draft states:  If the
> > VM in the primary NVE fails, there is no need to actively move the VM to the
> > secondary NVE because the VM in the secondary NVE already contain identical
> > information.
> >  
> > [Linda]  Section 8 is on “Other Options”. Hot Redundancy is one option that
> > doesn’t require moving VM in case anything happens. Removed the word “VM
> > Mobility” from the title. Since this is an informational draft, the “Other
> > Option” is to give a perspective of other ways to achieve reliability.
> 
> [BB] Is any of section 8 relevant to the scope of the draft now?
> 
> I thought achieving reliability would be out of scope, now you've said (a
> couple of paras earlier): "This document only covers the case of VM moves
> being Planned" and "“Standby modes of VM movement” is out of the scope of this
> document". Surely, no-one can improve reliability with only planned moves.
> Reliability is all about unplanned moves.
> 
> [Linda-10] VM Hot mobility is to enable uninterrupted running of the
> application or workload instantiated on the VM when the VM's running
> conditions change, such as utilization overload, hardware running condition
> changes, or others.  The process improves the reliability of the application
> running on the VM, even though it is a planned move. The Hot Standby option,
> by contrast, is to prevent unexpected failure conditions.
> 
> >  
> >  
> > In addition, the “cold standby entity” in Section 7 should be renamed to
> > something more meaningful.
> > [Linda] How about changing the text to the following?
> >  
> > Cold VM mobility can be facilitated by the VM management system exchanging
> > the needed states between the Old NVE and the New NVE. The cold mobility
> > option can be used for non-critical applications and services that can
> > tolerate interrupted TCP connections.
> > 
> 
> [BB] It's not just S.8. In S.7 (of draft-09), there's still the following text
> (which was what I was originally talking about when I raised this concern):
> 
>      The Warm VM mobility refers the backup entities receive backup
>      information at more frequent intervals.  The duration of the
>      interval determines the warmth of the option.  The larger the
>      duration, the less warm (and hence cold) the Warm VM mobility
>      option becomes.
> 
> As far as I understand it, text that talks about backup entities receiving
> backup info at frequent intervals is describing standby for failover, not
> mobility. Even tho the text gives it the name "Warm VM Mobility", the meaning
> of all the words is surely about standby and failover.
> 
> [Linda-10] The statement you quote was deleted in the -09 version. The new
> text is as follows:
>  
> Both Cold and Warm VM mobility (or migration) refer to the VM being
> completely shut down at the Old NVE before being restarted at the New NVE.
> Therefore, all transport services to the VM are restarted.
> 
> In this document, all VM mobility is initiated by the VM Management System.
> Cold VM mobility only exchanges the needed states between the Old NVE and the
> New NVE after the VM attached to the Old NVE is completely shut down. There is
> a time delay before the new VM is launched. The cold mobility option can be
> used for non-critical applications and services that can tolerate interrupted
> TCP connections.
> 
> Warm VM mobility refers to having the backup entities receive backup
> information at more frequent intervals, so that it takes less time to launch
> the VM under the new NVE, and other NVEs that communicate with the VM can be
> notified prior to the VM move.  The duration of the interval determines the
> effectiveness (or benefit) of Warm VM mobility.  The larger the duration, the
> less effective the Warm VM mobility option becomes.
> 
> For Hot VM Mobility, once a VM moves to a New NVE, the VM IP address does not
> change and the VM should be able to continue to receive packets to its
> address(es). The VM needs to send a gratuitous Address Resolution message or
> unsolicited Neighbor Advertisement message upstream after each move.
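
As an illustration of the IPv4 half of that last step, here is a minimal
sketch of the Ethernet frame for a gratuitous ARP announcement: an ARP request
whose sender and target protocol addresses are both the VM's own address, sent
to the broadcast MAC. The function name build_gratuitous_arp and the example
addresses are assumptions for illustration. Actually transmitting the frame
(and the IPv6 unsolicited Neighbor Advertisement) is not shown.

```python
import socket
import struct

ETHERTYPE_ARP = 0x0806
ETHERTYPE_IPV4 = 0x0800
HTYPE_ETHERNET = 1
ARP_REQUEST = 1


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a broadcast Ethernet frame carrying a gratuitous ARP request."""
    hw = bytes.fromhex(mac.replace(":", ""))   # 6-byte sender MAC
    addr = socket.inet_aton(ip)                # 4-byte IPv4 address
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType.
    eth_header = broadcast + hw + struct.pack("!H", ETHERTYPE_ARP)

    # ARP header: hardware type, protocol type, address lengths, operation.
    arp_header = struct.pack(
        "!HHBBH", HTYPE_ETHERNET, ETHERTYPE_IPV4, 6, 4, ARP_REQUEST
    )
    # Sender MAC/IP, then target MAC (zero) and target IP (same as sender IP).
    arp_body = hw + addr + b"\x00" * 6 + addr

    return eth_header + arp_header + arp_body


frame = build_gratuitous_arp("02:00:00:00:00:01", "10.0.0.5")
```

Because any attached VM can emit such a frame, this is the same message an
untrusted VM would falsify in the spoofing attacks discussed under Security
Considerations.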
> 
>  
> >  
> > [D] The suggested NVO3 gateway text is incorrect because it includes this
> > statement:
> > > RFC 8014 (Section 5.3) has the discussion whether a VM move may result in
> > or cannot result in a change to the network node providing the NVO3 Gateway
> > functionality
> >  
> > Section 5.3 of RFC 8014 does not contain that discussion; please do go read
> > RFC 8014.  That discussion (which is specific to VM mobility) needs to be
> > written and added to this draft.
> > [Linda] sorry, I literally took your wording. Change the text to the
> > following to amend what is missing from RFC8014.
> > Inter VNs (Virtual Networks) communication refers to communication among
> > tenants (or hosts) belonging to different VNs. Those tenants can be attached
> > to the NVEs co-located in the same Data Center or in different Data centers.
> > This document assumes that the inter-VNs communication is via the NVO3
> > Gateway as described in RFC8014 (NVO3 Architecture). RFC 8014 (Section 5.3)
> > describes the NVO3 Gateway function which is to relay traffic onto and off
> > of a virtual network, i.e. among different VNs.
> > 
> > After a VM is moved to a new NVE, the VM’s corresponding Gateway may need to
> > change as well. If such a change is not possible, then the path to the
> > external entity needs to be hair-pinned to the NVO3 Gateway used prior to
> > the VM move.
> > 
> >  
> 
> [BB] Works for me.
> 
> Nonetheless, surely this doesn't just apply to inter-VN communication. Inter-
> communication through an NVO3 Gateway equally applies to communication with
> any external entity, whether in another VN or just in another N (without the
> V, I mean just any other network, whether virtual or not).
> 
> >  
> >  
> > [E] Two parts
> > E-a) Section 4.2.1 is now Section 4.3 and the new text is an improvement,
> > but it would be even clearer to remove the term “Task” entirely from this
> > draft, as the scope (stated in the Introduction and Abstract) is VM
> > mobility.  Discussion about application of the techniques described in this
> > draft to entities other than VMs should be placed in a separate new section
> > near the end of the draft to avoid confusing readers.  This would also
> > address a significant portion of Bob’s “new problem” below.
> > [Linda] Okay, removed.
> >  
> > E-b) Discussion of ICMP effects has not been added.   If that discussion is
> > not going to be added, then the text on use of ICMP needs to be removed.
> > [Linda] Okay, removed.
> 
> [BB] Can you explain (at least in this email) what you think is meant by the
> following sentence:
> 
>      impact is relatively small when TCP
>      connections are automatically closed in the network stack during a
>      migration event.  
> 
> I think this is saying that network elements close TCP connections. Normally
> only a host participating in a connection can close it. The only way a network
> element would be able to close a TCP connection would be to spoof RST or FIN
> packets (if the TCP connection is not authenticated or encrypted). Is this
> what is meant?
> 
> [Linda-10] meaning the TCP connection to the application instantiated on the
> VM is dropped during VM migration.  To make it more clear, change the text to
> the following:
> 
> The simplest course of action is to drop all TCP connections to the
> applications running on the VM during a migration. 
> 
> 
> 
> On the other hand, if it means the network discards all packets, that will
> probably eventually cause something on one of the end-hosts to time out and
> close the connection, but it will be a long and painful way to close a
> connection, lasting perhaps tens of minutes or longer.
> 
> Similarly, it's not clear what the subject of this sentence is, i.e. what is
> doing the pausing:
> 
>     More involved approach to connection migration entails pausing the
>      connection, packaging connection state and sending to target,
>      instantiating connection state in the peer stack,
> 
> As above, normally only a host can pause one of its TCP connections. But
> everything else in the draft is talking about actions taken by network
> elements. So if this does mean the host is in control of pausing, it needs to
> make that clear, and describe how the network communicates the need to pause
> to each application.
> 
> [Linda-10]  add “proxy to the application (or the application itself)” to make
> it more clear.
> 
> A more involved approach to connection migration entails a proxy for the
> application (or the application itself) pausing the connection, packaging the
> connection state and sending it to the target, instantiating the connection
> state in the peer stack, and restarting the connection. 
> >  
> > [F] Security Considerations are still a problem – the added security text is
> > not specific to VM mobility.
> >  
> > [Linda] How about changing to the following:
> > Security threats for the data and control plane for overlay networks are
> > discussed in [RFC8014].  ARP (IPv4) and ND (IPv6) are not secure, especially
> > if we accept gratuitous versions in multi-tenant environment.
> > 
> > In Layer-3 based overlay data center networks, the problem of address
> > spoofing may arise.  An NVE may have untrusted VMs attached. This usually
> > happens in cases like the VMs running third party applications.  Those
> > untrusted VMs can send falsified ARP (IPv4) and ND (IPv6) messages, causing
> > NVE, NVO3 Gateway, and NVA to be overwhelmed and not able to perform
> > legitimate functions. The attacker can intercept, modify, or even stop data
> > in-transit ARP/ND messages intended for other VNs and initiate DDOS attacks
> > to other VMs attached to the same NVE.
> > 
> > This requires VM management system to apply stronger security mechanisms
> > when add a VM to an NVE. VM Management system is out of scope of this
> > document.
> > 
> >  
> 
> [BB] With the L3 VM mobility cases that are still included in scope, a new
> secure binding needs to be instantiated between the unchanged identifier and
> the new locator. This is unlikely to be something that a 'VM management
> system' will be in a position to do, because it involves e2e checks that a
> message from one IP address to another and the response back to the first
> address, actually get routed back to the originator. 
> 
> The locator-identifier mechanism given as an example (ILA) doesn't include
> secure binding, AFAICT. BTW, the cited ILA draft is the old one (draft-
> herbert-nvo3-ila), which has now moved to draft-herbert-intarea-ila. And even
> that expired 18 months ago. It doesn't discuss how to securely bind the new
> locator to the identifier. There are just some fairly high level comments in
> the security considerations section about using IPSec or GUESEC, both of which
> have nothing to do with securing the binding between locators and identifiers
> nor protecting against IP address spoofing (they protect the /information/ in
> the connection, not the integrity of the address bindings). Similarly, draft-
> herbert-ila-mobile has some discussion of security, but not of the binding
> between locator and identifier (again AFAICT).
> 
> So, if it's still appropriate to use ILA as an example, I think this security
> gap needs to be highlighted in the Security Considerations section of nvo3-
> vmm. Alternatively you might want to refer to another example mechanism, such
> as HIP, which was designed for IP address mobility using a secure binding
> between the identifier and each locator. However, I have no idea whether other
> aspects of HIP would make it inapplicable for VM Mobility.
> 
> [Linda-10] I would think the security issue for ILA should be addressed in the
> ILA draft. Otherwise, if that draft changes, my claim of their security issue
> is no longer valid. Well, to make you happy, I added  the following wording:
> 
> The locator-identifier mechanism given as an example (ILA) doesn't include
> secure binding. It doesn't discuss how to securely bind the new locator to the
> identifier.
> > -------
> > For completeness, here are the remaining 3 structure and comprehensibility
> > problems that Bob noted:
> > [NOT ADDRESSED] Signposting of where sub-cases start and end in S.4.1
> > [Linda] changed.
> 
> [BB] 4.2 (which was 4.1) still reads as one long block of prose with no clear
> indication of where the different cases stop. The main problem is the case
> where the client requests the same MAC address. This para seems to have been
> inserted into the IPv4 case, half-way through, and it's not clear that it ends
> and goes back to other matters to do with the IPv4 case. It's also not clear
> why a client requesting the same MAC address isn't addressed under the IPv6
> case. Here (again) are the 4 different parts of this subsection that I could
> identify:
> 
> * IPv4 
> * end-user client 
> * 2 paras starting "Other NVEs communicating with this virtual machine..."
> [Not clear that the end-user case has ended and we have returned to the
> general IPv4 case?]
> [Linda-10] Other NVEs communicating with this VM is referring to “Other NVEs
> that have attached VMs communicating with this VM”.
>  
> Other NVEs that have attached VMs or the NVO3 Gateway that have external
> entities communicating with this VM may still have the old ARP entry
> 
> * IPv6 [Strictly, it still hasn't said whether the end-user client case has
> ended.] [Also, it doesn't explain why there is no need for an end-user client
> case under IPv6?]
> 
> [Linda-10] Not sure what you mean by “end-user client case under IPv6”.
> This section is about a VM having an IPv6 address.
> This section describes the approach to prevent other NVEs from communicating
> with the Old NVE. Nevertheless, I removed some extra wording to make the
> description tighter.  
>  
>  
> 
> > [NOT ADDRESSED] Indecision over whether packets are silently dropped,
> > dropped with an ICMP message, forwarded or tunnelled.
> > [Linda] removed the ICMP discussion.
> 
> [BB] The ICMP text was a different problem, not this one. This one is about
> how there's no clear structure to say which solution is used in which
> circumstances. Tunnelling comes up in the L2 section, packet drop comes up
> here and there, but it says it can also forward packets, without saying why it
> would drop packets if it could forward them. You can close TCP connections.
> You can pause them. It's written like "Whatever; You can do this, or that or
> the other. Do what you want. Whatever. Am I bovvered? Look at my face. Is my
> face bovvered? Whatever."
> 
> [Linda-10] please read the updated 09 version. The old NVE can forward the
> received packets destined towards the VM being moved to the new NVE. That is
> how Tunneling is achieved.
> 
> [NOT ADDRESSED] The order in which the various stages of mobility occur was
> jumbled. Some stages have been ruled out of scope, but they are still
> mentioned in jumbled order.
> > [Linda] add the reference to RFC7666 that describe the states and MIBs of
> > VMs managed by Hypervisor.
> 
> [BB] The order of text in this draft will not be solved by referring to
> another draft that gives things in the right order.
> 
> > Thanks, --David
> >  
> 
> [BB] Thanks to David for helping me out here. And thanks to Linda for starting
> to get to grips with this draft. I hope you can see now that it was not in
> good shape.
> 
> I'm still wanting to know why the IETF needs to publish this draft (and
> therefore why we're all having to sink our time into getting it into shape).
> As more and more of it is being removed as out of scope, I'm even less certain
> who is going to ever read this draft, or why they would need to.
> 
> [Linda-10] The intent of the draft is to describe the behavior of NVEs, NVO3
> Gateways, and the NVA upon VMs being moved, such as how to promptly time out
> ARP/ND entries upon moving, send VM change notifications to the NVA, and
> send queries to the NVA when entries are not found, so that the ARP/ND
> caches on other NVEs can be updated promptly.  
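
The behaviour listed there (timing out entries, reacting to a VM change
notification, and querying the NVA on a miss) can be sketched as below. This
is a minimal sketch under stated assumptions: the class NveCache, its method
names and the fixed TTL are hypothetical, and the NVA is reduced to a lookup
callable. None of this is taken from the draft.

```python
class NveCache:
    """Hypothetical per-NVE cache of VM address -> NVE mappings."""

    def __init__(self, nva_lookup, ttl_seconds):
        self.nva_lookup = nva_lookup  # callable: ip -> NVE name, or None
        self.ttl = ttl_seconds
        self.entries = {}             # ip -> (NVE name, expiry time)

    def resolve(self, ip, now):
        """Return the NVE for ip, querying the NVA on a miss or expiry."""
        entry = self.entries.get(ip)
        if entry is not None and entry[1] > now:
            return entry[0]
        nve = self.nva_lookup(ip)
        if nve is None:
            self.entries.pop(ip, None)
            return None
        self.entries[ip] = (nve, now + self.ttl)
        return nve

    def on_vm_moved(self, ip):
        """Drop the entry at once when the NVA signals that the VM moved."""
        self.entries.pop(ip, None)


nva_table = {"10.0.0.5": "nve-1"}
cache = NveCache(nva_table.get, ttl_seconds=30)

first = cache.resolve("10.0.0.5", now=0)    # miss: learned from the NVA
nva_table["10.0.0.5"] = "nve-2"             # the VM is moved
stale = cache.resolve("10.0.0.5", now=10)   # old entry is still cached
cache.on_vm_moved("10.0.0.5")               # notification clears it
fresh = cache.resolve("10.0.0.5", now=11)   # re-queried from the NVA
```

The gap between `stale` and `fresh` is the window that a prompt timeout or a
change notification is meant to shrink.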
> As a Directorate reviewer for the Security Area, OpArea, and GenArt Area, I
> have reviewed many drafts queued to become RFCs that have far less content
> than what is described in this draft.
> 
> Linda
> 
> Bob
> 
> > From: Linda Dunbar <linda.dunbar@futurewei.com> 
> > Sent: Wednesday, March 25, 2020 1:47 PM
> > To: Black, David; Bob Briscoe; Bocci, Matthew (Nokia - GB); 
> > sarikaya@ieee.org
> > Cc: tsv-art@ietf.org; NVO3; draft-ietf-nvo3-vmm.all@ietf.org
> > Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-
> > vmm-04
> >  
> > [EXTERNAL EMAIL]
> > David and Bob,
> >  
> > Thank you very much for the prompt replies and comments.
> > Inserted below are the proposed resolutions to your comments and
> > suggestions.
> >  
> > Linda
> >  
> > From: Black, David <David.Black@dell.com> 
> > Sent: Tuesday, March 24, 2020 10:05 PM
> > To: Bob Briscoe <ietf@bobbriscoe.net>; Linda Dunbar <
> > linda.dunbar@futurewei.com>; Bocci, Matthew (Nokia - GB) <
> > matthew.bocci@nokia.com>; sarikaya@ieee.org
> > Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; draft-ietf-nvo3-vmm.all@ietf.org
> > ; Black, David <David.Black@dell.com>
> > Subject: RE: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-
> > vmm-04
> >  
> > Trying to help out – here are some summarized comments on the first six
> > issues that Bob listed as not addressed – this summary does not include the
> > requests for a serious editorial pass and draft restructuring for clarity.
> >  
> > -- [A] Purpose of draft &
> > -- [B] Normative text
> >  
> > The draft header indicates that its intended status is Informational, but
> > the datatracker indicates Best Current Practice (BCP).   Which one is
> > correct?
> >  
> > The answer to that question affects what to do about the two instances of
> > “MUST”.   Based on what I see in this draft, Informational seems more
> > appropriate than BCP.
> > [Linda] Agree with you. Will ask the NVO3 Chair to change the status to
> > “Informational” to be consistent with the status stated in the draft. The
> > two instances of “MUST” have been removed in the -08 version.
> >  
> >  
> > -- [C] VM mobility vs. VM redundancy
> >  
> > Bob> #. The draft silently slips back and forth between VM mobility and VM
> > redundancy,
> >  
> > It does indeed, almost exactly as Bob describes:
> >  
> > [BB] In -07 warm standby is still described as it was in -04: state update
> > messages at regular intervals. That is redundancy, not mobility.
> >  
> > With one exception - the draft discusses hot and cold standby, but not warm
> > standby – I think Bob meant to refer to cold standby.  That said ...
> >  
> > The crucial distinction that the draft needs to make clear is between
> > planned moves, i.e., mobility, and unplanned moves, i.e., standby-based
> > failover (in this context, “failover” is a better word than “redundancy”). 
> > In both cases, network connections have to be moved and/or re-established,
> > and the mechanisms to do so are related which is why it makes sense for the
> > draft to cover both classes of mechanisms.
> >  
> > [Linda] This document only covers the case of VM moves being Planned. Even
> > with Cold Migration, it is referring to VM Management System  informs the VM
> > in the Location (NVE) to be launched. I can add the following statement in
> > the introduction in 08 version to make it clear:
> > This document is strictly within the DCVPN, as defined by the NVO3 Framework
> > [RFC 7365]. The intent is to describe Layer 2 and Layer 3 Network behavior
> > when VMs are moved from one NVE to another. This document assumes that the
> > VM move is initiated by the VM management system, i.e. a planned move. How and
> > when to move VM are out of the scope of this document. RFC7666 already has
> > the description of the MIB for VMs controlled by Hypervisor. The impact of
> > VM mobility on higher layer protocols and applications is outside its scope.
> >  
> >  
> > The Standby modes of VM movement are missing from the conventions in Section
> > 2, and the distinction between planned and unplanned moves should be made
> > clear there or in Section 3 before getting into the details of the specific
> > modes.  I’ll also observe that Section 8 leaves out a lot of the details
> > that are required to make Hot Standby work, and in particular, “provide
> > services simultaneously as in load-share mode of operation” is not possible
> > in all situations – if the VM has significant internal state then load
> > balancing among the two VMs may cause failover to behave incorrectly because
> > the VM that is failed over to will have different internal state from the
> > failed VM.
> >  
> > [Linda] “Standby modes of VM movement” is out of the scope of this document
> > as indicated in the newly inserted statement in the Introduction.
> >  
> > I also suggest that the authors look at RFC 7666 which contains a state
> > machine for VM states – use of that state machine and the states that it
> > contains may help improve the explanation of the various VM movement
> > mechanisms in this draft.
> >  
> > [Linda] the reference to RFC7666 including the state machine is added to the
> > introduction and to Section 4.2. The state machine of VMs is referenced in
> > Section 4.2.
> > RFC7666 has the more detailed description of the State Machine of VMs
> > controlled by Hypervisor.
> > 
> >  
> >  
> > -- [D] External entities.
> >  
> > Bob> #. Applicability is fairly clearly outlined, but it is not clear
> > whether hosts corresponding with the mobile VMs are part of the same
> > controlled environment or on the uncontrolled public Internet.
> > 
> >  
> > 
> > I see ... Bob’s comment needs to be translated into NVO3 terminology ...
> > just a minute, bear with me ...
> >  
> > What Bob is asking for is a discussion of how things work when an NVO3
> > Gateway provides connectivity between the VM and an external entity outside
> > the NV Domain.  The authors should refer to Section 5.3 of RFC 8014 (NVO3
> > Architecture), and discuss the resulting implications, including whether a
> > VM move may result in or cannot result in a change to the network node
> > providing the NVO3 Gateway functionality – if such a change is not possible,
> > then the path to the external entity may be hair-pinned to the NVO3 Gateway
> > used prior to the VM move.
> > [Linda] The current Introduction already stated that the communications are
> > among VMs within the DC and with External entities.
> > There are communications among tasks belonging to one tenant and
> > communications among tasks belonging to different tenants or with external
> > entities.
> > I can add your suggested wording to the Section 4:
> > This document assumes that the communication with external entities are via
> > the NVO3 Gateway as described in RFC8014 (NVO3 Architecture). RFC 8014
> > (Section 5.3) has the discussion whether a VM move may result in or cannot
> > result in a change to the network node providing the NVO3 Gateway
> > functionality – if such a change is not possible, then the path to the
> > external entity may be hair-pinned to the NVO3 Gateway used prior to the VM
> > move.
> >  
> > [E]  Section 4.2.1 is problematic.
> >  
> > E-a) Bob and the authors appear to be talking past each other, as the VM’s
> > IP addresses do not change as a consequence of what Section 4.2.1
> > describes.   However, Section 4.2.1 is sufficiently poorly written that it
> > led Bob to the conclusion that the VM’s IP addresses do change ... and
> > looking at  that text now as if I knew nothing about VM mobility, I can see
> > how that text could lead a non-expert reader to that incorrect conclusion. 
> > In particular, identifier/locator separation is not necessary for what’s
> > going on in Section 4.2.1.
> >  
> > [Linda] Revise the 4.2.1  description to the following:
> > The term “Task” refers to an entity that is instantiated on a VM or a
> > container; in other words, a Task can be an “Application” or a “workload”
> > running on a VM or a Container.
> > 
> > Moving a Task running on a VM attached to one NVE to another VM attached to
> > a New NVE is the same as moving the VM from one NVE to the New NVE. The VM
> > attached to the New NVE needs to be assigned the same address as the VM
> > attached to the Old NVE, which is called Address Migration in this document.
> > Here is an example of the steps involved in Address Migration:
> > 
> > * Configure the IPv4/v6 address on the target VM/NVE.
> > * Suspend use of the address on the old NVE.  This includes handling
> >   established connections.  A state may be established to drop packets or
> >   send an ICMPv4 or ICMPv6 destination unreachable message when packets to
> >   the migrated address are received. Refer to the VM State Machine described
> >   in RFC7666.
> > * Push the new NVE-VM mapping to other NVEs which have attached VMs
> >   communicating with the VM being moved.  All relevant NVEs will learn the
> >   new mapping via their corresponding NVA.
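
The three steps above can be sketched as a small simulation. This is a minimal
sketch, assuming that each NVE keeps a set of locally attached addresses, a
set of suspended addresses, and a table of remote mappings that the NVA
pushes. All class, method and variable names are hypothetical, and only the
"drop" option for suspended addresses is modelled (no ICMP).

```python
class Nve:
    """Hypothetical NVE with local, suspended and remote address state."""

    def __init__(self, name):
        self.name = name
        self.local_addresses = set()  # addresses of locally attached VMs
        self.suspended = set()        # addresses migrated away from here
        self.remote_map = {}          # VM address -> name of the serving NVE

    def receive(self, dst_ip):
        """Return what this NVE does with a packet addressed to dst_ip."""
        if dst_ip in self.local_addresses:
            return "deliver"
        if dst_ip in self.suspended:
            return "drop"
        if dst_ip in self.remote_map:
            return "forward to " + self.remote_map[dst_ip]
        return "query NVA"


class Nva:
    """Hypothetical NVA that pushes NVE-VM mappings to the NVEs it serves."""

    def __init__(self, nves):
        self.nves = nves
        self.mapping = {}

    def push(self, ip, nve_name):
        self.mapping[ip] = nve_name
        for nve in self.nves:
            if nve.name != nve_name:
                nve.remote_map[ip] = nve_name


def migrate_address(ip, old, new, nva):
    new.local_addresses.add(ip)       # step 1: configure on the target
    old.local_addresses.discard(ip)   # step 2: suspend on the old NVE
    old.suspended.add(ip)
    nva.push(ip, new.name)            # step 3: push the new mapping


old, new, peer = Nve("old"), Nve("new"), Nve("peer")
nva = Nva([old, new, peer])
old.local_addresses.add("10.0.0.5")
nva.push("10.0.0.5", "old")

before = peer.receive("10.0.0.5")
migrate_address("10.0.0.5", old, new, nva)
after = peer.receive("10.0.0.5")
```

Which behaviour the old NVE should apply to packets that still arrive
(dropping them, as here, or forwarding them to the new NVE) is exactly the
point left open in the discussion in this thread.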
> >  
> >  
> > E-b) Also ...
> >  
> > Bob> For useful tutorial on how TCP responds to ICMP destination unreachable
> > messages
> > ...
> > [Linda] Layer 4 connection state handling is out of the scope of NVo3.
> >  
> > It appears that “Layer 4 connection state handling” includes the effects of
> > ICMP messages on connections.  That may be outside the scope of NVO3, but it
> > is within the scope of this draft because the draft discusses sending
> > “ICMPv4 or ICMPv6 destination unreachable message when packets to the
> > migrated address are received.”   Having brought up that concept, the draft
> > is responsible for discussing the effects and consequences of sending such
> > messages.  If the authors do not intend to discuss that topic, then Section
> > 4.2.1 ought to be removed.
> > 
> > [F] Security considerations
> >  
> > Bob> #. Security Considerations: repeats issues in other drafts that are not
> > specific to mobility, but it does not mention any security issues
> > specifically due to VM mobility.
> > 
> > That looks like a problem, if for no other reason than a SECDIR review of
> > the DetNet MPLS data plane draft has raised the analogous issue with that
> > draft.  I don’t see a SECDIR review for this draft in the data tracker – I
> > suggest that the draft shepherd ask for a SECDIR review and ensure that this
> > concern is brought to the attention of the SECDIR reviewer.
> >  
> > [Linda] Security specific to VM Mobility has the following:
> >  
> > There are several issues in a multi-tenant environment that create
> > problems.  In Layer-2 based overlay data center networks, lack of security
> > in VXLAN, corruption of VNI can lead to delivery to wrong tenant.  Also, ARP
> > in IPv4 and ND in IPv6 are not secure, especially if we accept gratuitous
> > versions.  When these are done over a UDP   encapsulation, like VXLAN, the
> > problem is worse since it is trivial for a non-trusted entity to spoof UDP
> > packets.
> > 
> >  
> > Thanks, --David
> >  
> > From: Tsv-art <tsv-art-bounces@ietf.org> On Behalf Of Bob Briscoe
> > Sent: Tuesday, March 24, 2020 7:34 PM
> > To: Linda Dunbar; Bocci, Matthew (Nokia - GB); sarikaya@ieee.org
> > Cc: tsv-art@ietf.org; NVO3; draft-ietf-nvo3-vmm.all@ietf.org
> > Subject: Re: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-
> > vmm-04
> >  
> > [EXTERNAL EMAIL]
> > Linda,
> > 
> > On 24/03/2020 20:10, Linda Dunbar wrote:
> > > Bob,
> > >  
> > >  
> > > With regard to the purpose of the document, the Abstract stated very
> > > clearly:
> > > This document describes virtual machine mobility solutions commonly used
> > > in data centers built with overlay-based network. This document is
> > > intended for describing the solutions and the impact of moving VMs (or
> > > applications) from one Rack to another connected by the Overlay networks.
> > >  
> > >  
> > > The Purpose is “for describing the solutions and the impact of moving VMs
> > > (or applications) from one Rack to another connected by the Overlay
> > > networks.”
> > 
> > [BB2] I quoted those sentences to you myself. I explained that they are what
> > the document is about. But they are not a reason for why the IETF is
> > publishing it.
> > 
> > >  
> > > Other changes and replies to your comments are inserted below. Please let
> > > us know as soon as possible (hopefully not another 3 months) if they are
> > > acceptable?
> > 
> > [BB2] I felt guilty about not having noticed the emails about this. Until I
> > read it. Then I thought about the few days' work I had put into that review
> > (I am not employed by anyone - I do this on my own time). And of the 14 main
> > points I made, only the 5 easy ones had been addressed and the 9 main ones
> > had been completely ignored. 
> > 
> > Anyway, my email was to Matthew as document shepherd. As I said below, I
> > believe my role as a reviewer is not to decide whether this document is
> > acceptable. It is to state which of my review comments have not been
> > addressed, say how important I think each is, and clarify to remove any
> > misunderstanding. Then it is up to Matthew to decide what you need to do (or
> > not).
> > 
> > Nonetheless, I am particularly concerned about the descriptions of
> > connections being paused or stopped, which are so far from how TCP/IP works,
> > that it makes me certain that this document is a fantasy - it can never have
> > been implemented. Or if it has been, there must be some major assumptions
> > about the applications in this multi-tenant DC that are not being stated.
> > 
> > more...
> > 
> > >  
> > >  
> > > Linda Dunbar
> > >  
> > >  
> > >  
> > > From: Bob Briscoe <ietf@bobbriscoe.net> 
> > > Sent: Monday, March 23, 2020 6:59 AM
> > > To: Bocci, Matthew (Nokia - GB) <matthew.bocci@nokia.com>; 
> > > sarikaya@ieee.org
> > > Cc: tsv-art@ietf.org; NVO3 <nvo3@ietf.org>; 
> > > draft-ietf-nvo3-vmm.all@ietf.org
> > > Subject: Re: [nvo3] [Tsv-art] Tsvart last call review of draft-ietf-nvo3-
> > > vmm-04
> > >  
> > > Matthew,
> > > 
> > > It is a long time since I reviewed this (Sep 2018), and I'm sorry for
> > > being unresponsive back in Aug 2019 when the -05 revision that aimed to
> > > address my -04 review comments was released.
> > > 
> > > I have to say, IMO this document is still far from suitable for
> > > publication as an RFC. The IETF surely cannot publish material that
> > > demonstrates such a loose grasp of the subject area, particularly the
> > > impact of mobility on layer 4 and above. However, that is not my call. All
> > > I can do as a reviewer is identify those of my comments that have not been
> > > addressed, and why it is important to address them. Then you, as document
> > > shepherd, can decide whether the IESG will need these issues to be
> > > addressed.
> > > 
> > > So, below, I work through my previous top level comments, identifying
> > > whether they have been addressed or not. I assume the editorial nits that
> > > I identified have been addressed (I haven't checked).
> > > 
> > > I also have to say that, as a general rule, the only ones of my review
> > > comments that have been addressed are the 'easy' ones that could be dealt
> > > with using something like find-and-replace techniques. Those that question
> > > the subject matter itself, have usually not been addressed at all. 
> > > 
> > > =================================
> > > 
> > > > #. The introduction does not say what the purpose of publishing this
> > > > draft is.
> > > 
> > > [NOT ADDRESSED]
> > > 
> > > [BB] Changing the intended status to Informational has helped to address
> > > some of my concerns. However, an informational document still has to have
> > > a target readership and a purpose. This vmm document still doesn't seem to
> > > have a purpose, or if it does, it doesn't say what its purpose is.
> > > 
> > > When it says it... 
> > >  ...describes solutions that support VM mobility,
> > > 
> > > or that:
> > >  ...there is a desire to document comprehensive VM mobility solutions that
> > > cover
> > >      both IPv4 and IPv6.
> > > 
> > > ...they are not reasons for the IETF to publish an informational RFC. They
> > > are circular reasons - they just say, "we are writing this because it
> > > describes something". They do not say why it is relevant for the IETF to
> > > publish this description. Is it something that the IETF needs to
> > > understand before it moves in to standardize aspects of VM mobility? Does
> > > this document provide new insights that have not been understood before?
> > > Do multiple VM mobility solutions already exist, but a standard is needed,
> > > because they don't interoperate?
> > > 
> > > > #. It does not seem as if the NVO WG has discussed the purpose of using
> > > > normative text in this draft. See detailed comments.
> > > 
> > > [NOT ADDRESSED]
> > > 
> > > [BB] In -07 there are still the same two "MUSTs" as in -04. This is
> > > intended to be an informational RFC, so no-one will be required to comply
> > > with it. Admittedly, informational RFCs can contain normative keywords in
> > > certain special cases (e.g. an informative copy of a specification of a
> > > protocol that is outside the IETF's change control). But this draft is not
> > > one of those special cases. 
> > > 
> > > By my understanding of what normative keywords are for, these "MUSTs"
> > > ought to be replaced with "has to" or "needs to".
> > > 
> > > [Linda] How about changing to the following?
> > > The Old NVE needs to tunnel these in-flight packets to the New NVE to
> > > avoid packet loss.
> > > 
> > >  
> > 
> > [BB2] Yup. But before you start making minor tweaks, let's jump to the point
> > about stopping and pausing connections, which I think throws the whole
> > document into doubt...
> > 
> > >  
> > > 
> > > > #. The draft silently slips back and forth between VM mobility and VM
> > > > redundancy, without recognizing the differences. See detailed comments.
> > > 
> > > [Linda] There is no single mention of VM Redundancy in the 07 version.
> > > What do you mean?  “Warm Mobility” depends on having relevant information
> > > (or status) about the VM at the target location.  That is different from
> > > running redundant instance at the target location.
> > > 
> > > [NOT ADDRESSED]
> > > 
> > > [BB] In -07 warm standby is still described as it was in -04: state update
> > > messages at regular intervals. That is redundancy, not mobility. In
> > > particular, Section 7 has VM Mobility in the title, but starts slipping
> > > into talking about standby after the second para.
> > > 
> > > [Linda] In this document, the definition of “WARM Mobility” is referring
> > > to having the VM’s relevant information on the target location.
> > > 
> > > 
> > > Warm standby is useful for resilience against failures, but it is not
> > > mobility. In standby, processing does not /move/, which is what /mobility/
> > > means. Standby never releases the 'Old' address and, because there is no
> > > first move there is never a second or subsequent move. So routing and
> > > addressing do not become increasingly fragmented.
> > > 
> > > [Linda] this document doesn’t address the concept of “Warm Standby”.
> > > 
> > > 
> > > As the Intro says, the purpose of VM mobility is...
> > > 
> > >      It is highly desirable [RFC7364] to allow
> > >      VMs to be moved dynamically (live, hot, or cold move) from one
> > >      server to another for dynamic load balancing or optimized work
> > >      distribution.
> > > 
> > > In contrast, warm standby does not ever release the 'Old' processing
> > > resources, so it cannot be used for dynamically balancing load and
> > > optimizing work distribution.
> > > 
> > > The following gives the impression that warm 'mobility' is on a spectrum
> > > between hot and cold VM mobility:
> > > 
> > >      The larger the
> > >      duration, the less warm (and hence cold) the Warm VM mobility
> > >      option becomes.
> > >  
> > > [Linda] are you satisfied with the following change?
> > > Warm VM mobility refers to backup entities receiving backup information
> > > at more frequent intervals.  The duration of the interval determines the
> > > effectiveness (or benefit) of Warm VM mobility.  The larger the duration,
> > > the less effective the Warm VM mobility option becomes.
> > > 
> > 
> > [BB] The problem here is a lot deeper than can be fixed with a few word
> > changes.
> > 
> > > Warm standby is on a different spectrum to hot and cold mobility. But both
> > > hot and cold mobility involve one volley of messages. The only difference
> > > is whether processing is stopped or not. Repeating messages is not part
> > > way between one message and one message. Regularly repeating messages is
> > > not part way between leaving a process running and stopping it.
> > > 
> > > [Linda] Agree with you. But this document doesn’t cover “Warm Standby”.
> > >  
> > > 
> > > This is still muddle-headed thinking - about redundancy, not mobility. I'm
> > > not saying redundancy is not important - it's extremely important. I'm
> > > just saying it's out of scope for a VM mobility draft.
> > > 
> > > > #. Please adopt different terminology than "source NVE" and "destination
> > > > NVE", which are really poor choices of terms for an intermediate node.
> > > > See detailed comments. Why not use "old NVE" and "new NVE", which is
> > > > what you mean?
> > > 
> > >  
> > > [Linda] There is no reference to “source NVE” or “destination NVE” in the
> > > 07 version.
> > 
> > [BB2] The sentences starting with '#' are copies of my original review
> > comments (as explained at the start of my email). This one was addressed in
> > your updated drafts, which is why I said '[ADDRESSED]' right below it. And
> > thanked you for addressing it.
> > 
> > >  
> > > 
> > > [ADDRESSED] Thank you
> > > 
> > > > #. Applicability is fairly clearly outlined, but it is not clear whether
> > > > hosts corresponding with the mobile VMs are part of the same controlled
> > > > environment or on the uncontrolled public Internet. See detailed
> > > > comments.
> > > > 
> > > 
> > > [NOT PROPERLY ADDRESSED]
> > > 
> > > [BB] The introduction now contains the useful scoping sentence:
> > > 
> > >      There are
> > >      communication among tasks belonging to one tenant and
> > >      communication among tasks belonging to different tenants or with
> > >      external entities.
> > > 
> > > However, AFAICT, the draft still does not address the case where an
> > > internal entity moves but it needs to continue to communicate with an
> > > external entity that is not controlled by (or even aware of) the NVA.
> > > 
> > > [Linda] the behavior on how the Old NVE needs to tunnel packets to the New
> > > NVE applies to both external traffic and internal traffic.
> > >  
> > 
> > [BB2] But, by definition, an external entity is not aware of an NVA and
> > there's no reason to assume it has any logic to communicate with an NVA.
> > 
> > > A system of tasks within a controlled environment all talking with each
> > > other would be like Schroedinger's cat in a box - a quaint thought
> > > experiment but not useful. The outside world has to be able to continually
> > > give input (requests, data, events, code, etc) and get output (responses,
> > > transformed data, notifications, etc). So a VM mobility solution cannot
> > > just separate off the outside world as if that's a different problem that
> > > is not in scope. 
> > > 
> > > It's not difficult to do this, but it surely has to be done.
> > > 
> > > > 
> > > > #. Section 4.2.1 on L3 VM mobility reads like some potential half-
> > > > thought-through ideas on how to solve L3 mobility, rather than current
> > > > practice, let alone best current practice. Either current practice
> > > > should be described instead, or the scope of the draft should be
> > > > narrowed solely to L2 VM mobility. See detailed comments.
> > > 
> > > [NOT ADDRESSED]
> > > 
> > > [BB] None of the inadequate text in this section has been altered (except
> > > terminology find-and-replace that happened to hit words in this section).
> > > 
> > > The draft still doesn't deal with the reality of where layer-4 connection
> > > state is held (in the end applications and the transport stack under
> > > them). The authors seem to believe that network elements can close or
> > > pause Layer-4 connections. For this, applications have to be built with
> > > logic to handle IP address migration. But the scope of this draft is
> > > multi-tenant DCs, where the DC infrastructure has no knowledge of which
> > > applications each tenant might be using.
> > > 
> > > My comments all still apply, and can still be found quoted way down this
> > > email (find text: "#. L3 Mobility").
> > >  
> > > 
> > > 
> > > For a useful tutorial on how TCP responds to ICMP destination unreachable
> > > messages (defined as soft errors), and the dilemmas surrounding how TCP
> > > should respond, see RFC5461 "TCP's Reaction to Soft Errors".
> > > It's probably also worth reading RFC6069 "Making TCP More Robust to Long
> > > Connectivity Disruptions (TCP-LCD)".
> > > Other transport protocols (e.g. SCTP, QUIC/UDP, RTP/UDP, etc) face similar
> > > dilemmas.
> > > 
> > > [Linda] Layer 4 connection state handling is out of the scope of NVO3. A
> > > separate draft can be written in the Transport Area to deal with Layer 4
> > > connection state handling.
> > >  
> > 
> > [BB2] Thank you for finally admitting the parts of this draft about L3
> > migration are a fantasy. 
> > 
> > When the L3 parts of the draft talk about using ICMP messages to stop
> > connections or pause connections, it needs to be understood that a
> > connection in TCP/IP /is/ the state at layer 4. The relevant Layer 4
> > connection state /is/ the IP address of its peer that the end system holds.
> > If you want to avoid more and more tunnels; if you want to prevent IP
> > address space fragmentation, you have to either change that state (the IP
> > address) that is held at layer 4, or make it somehow reference another IP
> > address. 
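
A minimal sketch of the point above, assuming a simplified endpoint that
identifies each connection by its address/port 4-tuple. The class and field
names are hypothetical and are not taken from the draft or from any real TCP
stack; the addresses are documentation prefixes.

```python
from typing import Dict, NamedTuple, Optional


class FourTuple(NamedTuple):
    """The identity of a connection as held by an end system (assumed model)."""
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


class ConnectionTable:
    """Hypothetical per-host table of layer-4 connection state."""

    def __init__(self) -> None:
        self._conns: Dict[FourTuple, str] = {}

    def open(self, key: FourTuple) -> None:
        # The peer's IP address is part of the key, i.e. part of the state.
        self._conns[key] = "ESTABLISHED"

    def lookup(self, key: FourTuple) -> Optional[str]:
        # A segment is matched to a connection only if all four fields match.
        return self._conns.get(key)


table = ConnectionTable()
before_move = FourTuple("198.51.100.7", 443, "192.0.2.10", 50000)
table.open(before_move)

# The peer VM moves and acquires a new L3 address; nothing in the network
# has changed the state held at this end system.
after_move = before_move._replace(remote_ip="203.0.113.20")
```

In this model, `table.lookup(before_move)` still finds the connection, while
`table.lookup(after_move)` finds nothing: the end system has no state for the
new address unless something at layer 4 or above changes it.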
> > 
> > There is no mechanism in this draft to change the IP address that is written
> > into packets to get them from the sender to the receiving end. If dealing
> > with connection state is out of scope, and you don't have any other
> > mechanism, the only outcome is continually more tunnels or continually more
> > address fragmentation. 
> > 
> > There are ways to solve this problem, e.g. by adding a layer of abstraction
> > so that the IP addresses held in the L4 connection state are identifiers but
> > not locators. Solutions like HIP, LISP, etc. take this approach. But it's
> > not sufficient to not solve this problem at all, and instead invent a
> > fantasy solution that has no relation to how the TCP/IP architecture works,
> > then say that it would have worked, if only the transport area had written a
> > draft that solved the VM mobility problem for us.
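
A rough sketch of the identifier/locator idea mentioned above, assuming a
single in-memory mapping table. This is only an illustration of the
abstraction; it does not reproduce how HIP or LISP actually encode or
distribute mappings, and all names here are hypothetical.

```python
from typing import Dict


class LocatorMap:
    """Hypothetical mapping from a stable identifier to a current locator."""

    def __init__(self) -> None:
        self._locators: Dict[str, str] = {}

    def register(self, identifier: str, locator: str) -> None:
        # Called at first attachment and again after every move.
        self._locators[identifier] = locator

    def resolve(self, identifier: str) -> str:
        # Raises KeyError if the identifier was never registered.
        return self._locators[identifier]


mapping = LocatorMap()
mapping.register("vm-a", "192.0.2.10")

# The connection state at the peer holds only the identifier.
connection_peer = "vm-a"
locator_before = mapping.resolve(connection_peer)

# The VM moves: only the mapping changes, not the connection state.
mapping.register("vm-a", "203.0.113.20")
locator_after = mapping.resolve(connection_peer)
```

The design choice being illustrated is that the value stored in connection
state (`connection_peer`) never changes across the move, so no tunnel from the
old location is needed once the mapping has been updated.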
> > 
> > >  
> > > 
> > > > # The VM's file system is described as state that moves with the VM
> > > > (S.6), but VM mobility solutions often move the VM but stitch it back to
> > > > its (unmoved) storage. Conversely, the storage can also move independent
> > > > of the VM.
> > > >  
> > > 
> > > [ADDRESSED]
> > > 
> > > [BB] Thank you.
> > > 
> > > > #. The draft omits some of the security, transport and management
> > > > aspects of VM mobility. See detailed comments.
> > > 
> > > [NOT ADDRESSED]
> > > 
> > > [BB] The Security Considerations section is unchanged. It still only
> > > refers to previously known security issues with overlays in general, and
> > > does not discuss security vulnerabilities specific to VM mobility. In
> > > particular the need for the transport to recheck for address spoofing
> > > after each address change, which was identified in my review, which can
> > > still be found quoted below under "#. Gaps".
> > > 
> > > [Linda] the Security section has the following statement:
> > > In Layer-3 based overlay data center networks, the problem of address
> > > spoofing may arise.  An NVE may have untrusted tasks attached.
> > 
> > [BB2] Yes, I know the Security Considerations section says that. And it said
> > that at draft-04 as well. You don't seem to appreciate that I spent the
> > whole of my Sunday afternoon and Monday morning reading and checking the
> > latest version and refreshing my memory to establish whether anything has
> > changed since -04 to address my review comments. It hasn't. The Security
> > Considerations section in -07 still only refers to known security issues
> > with overlays in general. 
> > 
> > With VM mobility, the specific /new/ security concern is that a L3 address
> > change requires address spoofing to be *re*checked before the move can be
> > considered secure again. The original anti-address-spoofing check was done
> > e2e (that's one purpose of the transport layer handshake). So somehow, a VM
> > mobility solution has to trigger the transport layer to initiate another
> > handshake (which is not something transport protocols are designed to do).
> > Unless there is a solution to this problem, there is no solution to L3 VM
> > mobility.
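
One way to picture the re-check described above is a generic nonce echo: the
unmoved end sends a random value to the new address and trusts that address
only if the value comes back. The sketch below assumes this simple model; the
`PathValidator` class and its methods are hypothetical and are not specified
by the draft or by any transport protocol.

```python
import secrets
from typing import Dict, Set


class PathValidator:
    """Hypothetical re-validation of a peer address after a move."""

    def __init__(self) -> None:
        self._pending: Dict[str, bytes] = {}  # address -> outstanding nonce
        self.validated: Set[str] = set()

    def challenge(self, address: str) -> bytes:
        # Generate an unpredictable nonce to send to the claimed new address.
        nonce = secrets.token_bytes(8)
        self._pending[address] = nonce
        return nonce

    def response(self, address: str, echoed: bytes) -> bool:
        # Accept only if this address was challenged and echoed the nonce.
        expected = self._pending.pop(address, None)
        if expected is not None and secrets.compare_digest(expected, echoed):
            self.validated.add(address)
            return True
        return False


validator = PathValidator()
nonce = validator.challenge("203.0.113.20")

# The genuine new location received the nonce and can echo it.
genuine = validator.response("203.0.113.20", nonce)

# An address that was never challenged cannot be validated by guessing.
spoofed = validator.response("203.0.113.99", b"\x00" * 8)
```

The sketch only shows why some end-to-end exchange is needed after the address
changes; it says nothing about how a VM mobility event would trigger it.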
> > 
> > > There is still no mention of the impact of the additional delay / latency,
> > > during a mobility event.
> > > [Linda] that is beyond the scope of this document
> > >  
> > 
> > [BB] To quote the intro, "This document is intended for describing the
> > solutions and the impact of moving VMs ..."
> > 
> > > There also is still no mention of statistics gathering, which would be
> > > needed to be able to make decisions on when to migrate VMs. But I guess
> > > the decision on when to migrate can be ruled to be at a higher scope than
> > > this draft.
> > > 
> > > [Linda] that is VM manager’s job, out of the scope of NVO3 WG and out of
> > > scope of this document.
> > > 
> > > > #. The draft reads as if different sections have been written by
> > > > different authors and no-one has edited the whole to give it a coherent
> > > > structure, or to ensure consistency (both technical and editorial)
> > > > between the parts. See detailed comments.
> > > 
> > > [Linda] the 07 version should have fixed the problem.
> > 
> > [BB] Of the 5 main problems I pointed out (listed below), only the 2 easy
> > ones were addressed.
> > 
> > > [MOSTLY NOT ADDRESSED]
> > > 
> > > [BB] I made a number of points to help improve the structure and
> > > comprehensibility. Two easy 'find-and-replace' ones have been dealt with,
> > > but the three that require more than editorial knowledge have not:
> > > [ADDRESSED] VM/task was being used in place of L2/L3. This has been taken
> > > on board. Thanks.
> > > [NOT ADDRESSED] Signposting of where sub-cases start and end in S.4.1
> > > [NOT ADDRESSED] Indecision over whether packets are silently dropped,
> > > dropped with an ICMP message, forwarded or tunnelled.
> > > [NOT ADDRESSED] The order in which the various stages of mobility occur
> > > was jumbled. Some stages have been ruled out of scope, but they are still
> > > mentioned in jumbled order.
> > > [ADDRESSED] Replace hypervisor with NVE. Thanks.
> > >  
> > > 
> > > > #. The quality of the English grammar does not allow a reviewer to
> > > > concentrate on the technical aspects rather than the English. It would
> > > > have been useful if one of the English-speaking co-authors had improved
> > > > the English before submission for review. See detailed comments.
> > > 
> > > [ADDRESSED] Thanks.
> > > 
> > > ==================
> > > [BB] New problem
> > > 
> > > Whilst checking over -07, I noticed that the definitions of task, workload
> > > and VM are now all interchangeable.
> > > 
> > >       Tasks:  Task is a program instantiated or running on a virtual
> > >                machine or container.  Tasks in virtual machines or
> > >                containers can be migrated from one server to another.
> > >                We use task, workload and virtual machine
> > >                interchangeably in this document.
> > > 
> > > Then in 4.2:
> > > 
> > >      Even though the term VM and Task are used interchangeably in this
> > >      document, the term Task is used in the context of Layer-3
> > >      migration mainly to have slight emphasis on the moving an entity
> > >      (Task) that is instantiated on a VM or a container.
> > > 
> > > The correct way to deal with confusion between different concepts is not
> > > just to say "Well, if you squint up your eyes, all words mean roughly the
> > > same thing really."
> > > 
> > 
> > Regards
> > 
> > 
> > 
> > 
> > Bob
> > 
> > > 
> > > Bob
> > > 
> > > On 24/02/2020 13:10, Bocci, Matthew (Nokia - GB) wrote:
> > > > Hi Bob, Authors
> > > >  
> > > > I am the document shepherd for this draft. It has now been updated to
> > > > v07 following the shepherd’s review and WG last call.
> > > >  
> > > > Please can you let me know where we are with addressing the comments in
> > > > this review?
> > > >  
> > > > Thanks
> > > >  
> > > > Matthew
> > > >  
> > > > From: Bob Briscoe <ietf@bobbriscoe.net>
> > > > Date: Friday, 21 September 2018 at 10:34
> > > > To: "sarikaya@ieee.org" <sarikaya@ieee.org>
> > > > Cc: "tsv-art@ietf.org" <tsv-art@ietf.org>, NVO3 <nvo3@ietf.org>, IETF <
> > > > ietf@ietf.org>, "draft-ietf-nvo3-vmm.all@ietf.org" <
> > > > draft-ietf-nvo3-vmm.all@ietf.org>
> > > > Subject: Re: [nvo3] [Tsv-art] Tsvart last call review of draft-ietf-
> > > > nvo3-vmm-04
> > > > Resent from: <alias-bounces@ietf.org>
> > > > Resent to: <sarikaya@ieee.org>, Linda Dunbar <Linda.dunbar@huawei.com>,
> > > > <vumip1@gmail.com>, <tom@herbertland.com>, <sadikshi@cisco.com>, <
> > > > matthew.bocci@nokia.com>, Sam Aldrin <aldrin.ietf@gmail.com>, Yizhou Li
> > > > <liyizhou@huawei.com>, MARTIN VIGOUREUX <martin.vigoureux@nokia.com>,
> > > > Deborah Brungard <db3546@att.com>, <aretana.ietf@gmail.com>, Matthew
> > > > Bocci <matthew.bocci@nokia.com>
> > > > Resent date: Friday, 21 September 2018 at 10:34
> > > >  
> > > > Behcet,
> > > > 
> > > > Linda made a load of responses to my review, some of which I disagree with
> > > > so I would like to respond to them. I need responses to those two
> > > > questions first though, 'cos everything else depends on those.
> > > > 
> > > > 
> > > > Bob
> > > > 
> > > > On 20/09/18 15:30, Behcet Sarikaya wrote:
> > > > > Dear Bob,
> > > > > On Wed, Sep 19, 2018 at 9:53 AM Bob Briscoe <ietf@bobbriscoe.net>
> > > > > wrote:
> > > > > > Behcet,
> > > > > > 
> > > > > > I would like to make significant responses to many of Linda's
> > > > > > responses, but until we get answers to the two pre-requisite
> > > > > > questions I've given, I can't be sure how to respond.
> > > > > > 
> > > > > > So rather than promising a new version with no prior discussion, I
> > > > > > believe it would be much more fruitful to engage in this
> > > > > > conversation. I'm trying to help.
> > > > > > 
> > > > > 
> > > > >  
> > > > > You already made a detailed review.
> > > > > Your two points are clarifications from your detailed review.
> > > > > When I said we will revise I meant we will revise based on your
> > > > > detailed review.
> > > > > After we post our revision you can do whatever you wish.
> > > > >  
> > > > > Sincerely,
> > > > > Behcet 
> > > > > > Cheers
> > > > > > 
> > > > > > 
> > > > > > Bob
> > > > > > 
> > > > > > On 19/09/18 15:46, Behcet Sarikaya wrote:
> > > > > > > Hi Bob,
> > > > > > >  
> > > > > > > Thank you for your comments.
> > > > > > > The authors are currently discussing your points and we will come
> > > > > > > up with a revision soon after the discussions are over.
> > > > > > >  
> > > > > > > Regards,
> > > > > > > Behcet 
> > > > > > > On Tue, Sep 18, 2018 at 6:03 PM Bob Briscoe <ietf@bobbriscoe.net>
> > > > > > > wrote:
> > > > > > > > Linda,
> > > > > > > > 
> > > > > > > > Until we can all understand the answers to the following two
> > > > > > > > questions, I don't think we can discuss what track this draft
> > > > > > > > ought to be on, let alone move on to your responses to all my
> > > > > > > > other points.
> > > > > > > > 
> > > > > > > > 1/ Applicability
> > > > > > > > 
> > > > > > > > You say this draft solely applies to connections with both ends
> > > > > > > > within the controlled DC environment. But the draft says it's
> > > > > > > > about multi-tenant DCs. Are there any multi-tenant DCs that
> > > > > > > > restrict all VMs to only communicate with other VMs within the
> > > > > > > > same controlled DC environment? 
> > > > > > > > 
> > > > > > > > 2/ Purpose of publishing as an RFC
> > > > > > > > 
> > > > > > > > When I said:
> > > > > > > > 
> > > > > > > > > #. The introduction does not say what the purpose of
> > > > > > > > > publishing this draft is.
> > > > > > > > 
> > > > > > > > you responded:
> > > > > > > > 
> > > > > > > > > [Linda] The first paragraph on Page 3 has the description why
> > > > > > > > > VM Mobility is needed.
> > > > > > > > 
> > > > > > > > Whether VM Mobility is needed was not my question. My question
> > > > > > > > was what is the purpose of the IETF publishing an RFC about VM
> > > > > > > > Mobility? And particularly, what is /this/ RFC intended to
> > > > > > > > achieve? 
> > > > > > > > 
> > > > > > > > Are the authors trying to argue for a particular approach vs.
> > > > > > > > others? Are you trying to write a tutorial? Are you trying to
> > > > > > > > give the pros and cons of different approaches? Are you trying
> > > > > > > > to give advice on good practice (with the implication that
> > > > > > > > alternative practices are less good)? Are you trying to clarify
> > > > > > > > ideas by writing them down? Are you trying to outline the
> > > > > > > > implications of VM Mobility for other protocols being developed
> > > > > > > > within the NVO WG?
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Bob
> > > > > > > > 
> > > > > > > > On 10/09/18 19:16, Linda Dunbar wrote:
> > > > > > > > > Bob,
> > > > > > > > >  
> > > > > > > > > > Thank you very much for reviewing the draft and providing
> > > > > > > > > > in-depth comments. I am very sorry for the delayed response
> > > > > > > > > > due to traveling.
> > > > > > > > >  
> > > > > > > > > Replies to your comments are inserted below marked by [Linda]:
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Bob Briscoe [mailto:ietf@bobbriscoe.net] 
> > > > > > > > > Sent: Monday, September 03, 2018 9:45 PM
> > > > > > > > > To: tsv-art@ietf.org
> > > > > > > > > Cc: nvo3@ietf.org; ietf@ietf.org; 
> > > > > > > > > draft-ietf-nvo3-vmm.all@ietf.org
> > > > > > > > > Subject: Tsvart last call review of draft-ietf-nvo3-vmm-04
> > > > > > > > >  
> > > > > > > > > Reviewer: Bob Briscoe
> > > > > > > > > Review result: Not Ready
> > > > > > > > >  
> > > > > > > > > I have been selected as the Transport Directorate reviewer for
> > > > > > > > > this draft. The Transport Directorate seeks to review all
> > > > > > > > > transport or transport-related drafts as they pass through
> > > > > > > > > IETF last call and IESG review, and sometimes on special
> > > > > > > > > request. The purpose of the review is to provide assistance to
> > > > > > > > > the Transport ADs. For more information about the Transport
> > > > > > > > > Directorate Reviews and the Transport Area Review Team, please
> > > > > > > > > see 
> > > > > > > > > > https://trac.ietf.org/trac/tsv/wiki/TSV-Directorate-Reviews
> > > > > > > > >  
> > > > > > > > > In this case, very very few of the review comments relate to
> > > > > > > > > transport issues, although the greatest issue concerns a
> > > > > > > > > desire that the network could pause or stop connections during
> > > > > > > > > L3 VM Mobility, which is certainly a transport issue.
> > > > > > > > >  
> > > > > > > > > > [Linda] There is “Hot Migration”, in which the transport
> > > > > > > > > > service continues, and there is “Cold Migration”, a common
> > > > > > > > > > practice in many data centers, which stops the task running
> > > > > > > > > > at the old place and moves it to the new place before
> > > > > > > > > > restarting, as described in the Task Migration.
> > > > > > > > > Is it helpful to add this description to the draft?
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > > > ==Summary==
> > > > > > > > >  
> > > > > > > > > The technical aspects of the draft concerning L2 VM mobility
> > > > > > > > > (within a subnet) seem sound. However, this is only part of
> > > > > > > > > the draft, which has the following
> > > > > > > > > issues:
> > > > > > > > >  
> > > > > > > > > #. The introduction does not say what the purpose of
> > > > > > > > > publishing this draft is.
> > > > > > > > > It seems that, rather than describing a specific protocol or
> > > > > > > > > protocols, it intends to describe the overall system procedure
> > > > > > > > > that would typically be used in DCs for VM mobility. It is
> > > > > > > > > tagged as a BCP, but it does not say who needs this BCP, why
> > > > > > > > > it is useful for the IETF to publish this BCP, how wide the
> > > > > > > > > authors' knowledge is of current practice (given DCs are
> > > > > > > > > private), or why this is a BCP rather than a protocol spec.
> > > > > > > > >  
> > > > > > > > > [Linda] The first paragraph on Page 3 has the description why
> > > > > > > > > VM Mobility is needed. Is it helpful to move this paragraph to
> > > > > > > > > the beginning of the Introduction Section?
> > > > > > > > > > “Virtualization which is being used in almost all of today’s
> > > > > > > > > > data centers enables many virtual machines to run on a single
> > > > > > > > > > physical computer or compute server. Virtual machines (VM) need
> > > > > > > > > > hypervisor running on the physical compute server to provide
> > > > > > > > > > them shared processor/memory/storage. Network connectivity is
> > > > > > > > > > provided by the network virtualization edge (NVE) [RFC8014].
> > > > > > > > > > Being able to move VMs dynamically, or live migration, from one
> > > > > > > > > > server to another allows for dynamic load balancing or work
> > > > > > > > > > distribution and thus it is a highly desirable feature
> > > > > > > > > > [RFC7364].”
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > > > The draft starts out (S.3) as if it intends to say what a good
> > > > > > > > > VM Mobility protocol should or shouldn't do, but the rest of
> > > > > > > > > the document doesn't give any reasoning for these
> > > > > > > > > recommendations, it just asserts what appears to be one view
> > > > > > > > > of how a whole VM Mobility system works, sometimes referring
> > > > > > > > > to one example protocol RFC for a component part, but more
> > > > > > > > > often with no references or details.
> > > > > > > > >  
> > > > > > > > > [Linda] Is it helpful to move the paragraph above to the
> > > > > > > > > beginning of the Introduction Section? So that audience is
> > > > > > > > > aware of why VM Mobility is needed. And then follow up with
> > > > > > > > > what a good VM Mobility protocol should or shouldn't do?
> > > > > > > > >  
> > > > > > > > > #. It does not seem as if the NVO WG has discussed the purpose
> > > > > > > > > of using normative text in this draft. See detailed comments.
> > > > > > > > >  
> > > > > > > > > > [Linda] The “Intended status” of the draft is “Best Current
> > > > > > > > > > Practice”. So none of the text is “normative”. Is that okay?
> > > > > > > > >  
> > > > > > > > > #. The draft silently slips back and forth between VM mobility
> > > > > > > > > and VM redundancy, without recognizing the differences. See
> > > > > > > > > detailed comments.
> > > > > > > > >  
> > > > > > > > > > [Linda] The word “redundancy” is used only once in the entire
> > > > > > > > > > document, in the context of the “Hot standby option”, to
> > > > > > > > > > indicate that the “redundancy” of “the VMs in both primary and
> > > > > > > > > > secondary domains have identical information and can provide
> > > > > > > > > > services simultaneously as in load-share mode of operation” is
> > > > > > > > > > expensive.
> > > > > > > > >  
> > > > > > > > > #. Please adopt different terminology than "source NVE" and
> > > > > > > > > "destination NVE", which are really poor choices of terms for
> > > > > > > > > an intermediate node. See detailed comments. Why not use "old
> > > > > > > > > NVE" and "new NVE", which is what you mean?
> > > > > > > > > > [Linda] Thanks for the suggestion. We will change to “old
> > > > > > > > > > NVE” and “new NVE”.
> > > > > > > > >  
> > > > > > > > > #. Applicability is fairly clearly outlined, but it is not
> > > > > > > > > clear whether hosts corresponding with the mobile VMs are part
> > > > > > > > > of the same controlled environment or on the uncontrolled
> > > > > > > > > public Internet. See detailed comments.
> > > > > > > > > > [Linda] “Hosts” are the applications running on the VMs. They
> > > > > > > > > > are under the same controlled environment, not on the
> > > > > > > > > > uncontrolled public Internet.
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > > > #. Section 4.2.1 on L3 VM mobility reads like some potential
> > > > > > > > > half-thought-through ideas on how to solve L3 mobility, rather
> > > > > > > > > than current practice, let alone best current practice. Either
> > > > > > > > > current practice should be described instead, or the scope of
> > > > > > > > > the draft should be narrowed solely to L2 VM mobility. See
> > > > > > > > > detailed comments.
> > > > > > > > > > [Linda] This is referring to “Cold Migration”, which is a
> > > > > > > > > > common practice in many data centers.
> > > > > > > > >  
> > > > > > > > > > #. The VM's file system is described as state that moves with
> > > > > > > > > the VM (S.6), but VM mobility solutions often move the VM but
> > > > > > > > > stitch it back to its (unmoved) storage. Conversely, the
> > > > > > > > > storage can also move independent of the VM.
> > > > > > > > > > [Linda] It depends. When a VM moves to a different zone, its
> > > > > > > > > > storage/files can become inaccessible.
> > > > > > > > >  
> > > > > > > > > #. The draft omits some of the security, transport and
> > > > > > > > > management aspects of VM mobility. See detailed comments.
> > > > > > > > > [Linda] Can you provide some text?
> > > > > > > > >  
> > > > > > > > > #. The draft reads as if different sections have been written
> > > > > > > > > by different authors and no-one has edited the whole to give
> > > > > > > > > it a coherent structure, or to ensure consistency (both
> > > > > > > > > technical and editorial) between the parts. See detailed
> > > > > > > > > comments.
> > > > > > > > >  
> > > > > > > > > > [Linda] We can improve this.
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > > > #. The quality of the English grammar does not allow a
> > > > > > > > > reviewer to concentrate on the technical aspects rather than
> > > > > > > > > the English. It would have been useful if one of the English-
> > > > > > > > > speaking co-authors had improved the English before submission
> > > > > > > > > for review. See detailed comments.
> > > > > > > > > > [Linda] Can you help, perhaps by becoming a co-author to
> > > > > > > > > > improve the English?
> > > > > > > > >  
> > > > > > > > > ==Detailed Comments==
> > > > > > > > >  
> > > > > > > > > ===#. Normative statements===
> > > > > > > > >  
> > > > > > > > > In the body of the document, there is just one occurrence of
> > > > > > > > > normative text (actually two "MUST"s, but both state a common
> > > > > > > > > requirement - just written separately for IPv4 and IPv6). This
> > > > > > > > > merely serves to imply that everything else the document says
> > > > > > > > > is less important or optional, which was probably not the
> > > > > > > > > intention.
> > > > > > > > > > [Linda] The goal is to indicate that any solution for moving
> > > > > > > > > > the VM “MUST” follow this rule. That makes sense, doesn’t it?
> > > > > > > > >  
> > > > > > > > > At the start there is a requirements section, which states
> > > > > > > > > what a VM Mobility protocol "SHOULD" or "SHOULD NOT" do. I
> > > > > > > > > think this is intended as a set of goals for the rest of the
> > > > > > > > > document. If so, these "SHOULDs" are not intended to apply to
> > > > > > > > > implementations, so they ought not to be capitalized.
> > > > > > > > >  
> > > > > > > > > [Linda] okay, will change.
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > > > The first requirement, "Data center network SHOULD support
> > > > > > > > > virtual machine mobility in IPv6", is written as a requirement
> > > > > > > > > on all DC networks, not on implementations. I assume this was
> > > > > > > > > intended to read as "Data center network virtual machine
> > > > > > > > > mobility protocols SHOULD support IPv6". Even then, it doesn't
> > > > > > > > > really add anything to say VM mobility should support v6 and
> > > > > > > > > > it should support v4. An L2 solution won't, while an L3
> > > > > > > > > > solution will undoubtedly support at least one of them.
> > > > > > > > > > [Linda] Agree. Will change it to “Data center that support IPv6
> > > > > > > > > address should …”
> > > > > > > > >  
> > > > > > > > > I'm not sure that 'protocol' is the right word anyway; I think
> > > > > > > > > 'VM Mobility procedure' would be a better phrase, because it
> > > > > > > > > includes steps such as suspending the VM, which is more than a
> > > > > > > > > protocol.
> > > > > > > > > [Linda] yes. Will change to “Procedure”.
> > > > > > > > >  
> > > > > > > > > The requirement "Virtual machine mobility protocol MAY support
> > > > > > > > > host routes to accomplish virtualization", is not followed up
> > > > > > > > > at all in the rest of the draft.
> > > > > > > > > Even if this requirement stays, the last 3 words should be
> > > > > > > > > deleted.
> > > > > > > > >  
> > > > > > > > > [Linda] will change to “Host Route can be used to support the
> > > > > > > > > Virtual Machine Mobility Procedure.”
> > > > > > > > >  
> > > > > > > > > By the end of the draft, the solution falls far short of the
> > > > > > > > > most relevant "Requirements" anyway, so one assumes the title
> > > > > > > > > of the section ought to have been "Goals". Specifically, even
> > > > > > > > > in the simpler case of L2 VM mobility, S.4.1 says that
> > > > > > > > > triangular routing and tunnelling persist "until a neighbour
> > > > > > > > > cache entry times out". A cache timeout is about 10 orders of
> > > > > > > > > magnitude longer than the requirement to only persist "while
> > > > > > > > > handling packets in flight", which would be a few milliseconds
> > > > > > > > > at most (the time for packets to clear the network that were
> > > > > > > > > already launched into flight when the old VM stopped).
> > > > > > > > >  
> > > > > > > > > Whatever, it would be preferable for the draft to give
> > > > > > > > > rationale for these requirements, rather than just assert
> > > > > > > > > them. This would help to shed light on the merits of the
> > > > > > > > > different trade offs that solutions choose.
> > > > > > > > >  
> > > > > > > > > [Linda] Agree, will add.
> > > > > > > > >  
> > > > > > > > > ===#. Mobility vs. Redundancy===
> > > > > > > > >  
> > > > > > > > > Redundancy and mobility have a lot of similarities, but they
> > > > > > > > > have different goals. With mobility, it is necessary to know
> > > > > > > > > the exact instant when one set of state is identical to the
> > > > > > > > > other so it can hand over. With redundancy, the aim is to keep
> > > > > > > > > two (or more) sets of state evolving through the same sequence
> > > > > > > > > of changes, but there is no need to know the point at which
> > > > > > > > > one is the same as the other was at a certain point.
> > > > > > > > > > [Linda] Agree with what you said. The word “redundancy” is
> > > > > > > > > > used only once in the entire document, in the context of the
> > > > > > > > > > “Hot standby option”, to indicate that the “redundancy” of “the
> > > > > > > > > > VMs in both primary and secondary domains have identical
> > > > > > > > > > information and can provide services simultaneously as in
> > > > > > > > > > load-share mode of operation” is expensive.
> > > > > > > > >  
> > > > > > > > > The draft slips from mobility to resilience in the following
> > > > > > > > > places:
> > > > > > > > > > * S.2. Terminology: Warm VM Mobility is defined without any
> > > > > > > > > >   ending, as if it is permanent replication.
> > > > > > > > > > * S.7. "Handling of Hot, Warm and Cold Virtual Machine
> > > > > > > > > >   Mobility" is actually all about redundancy, and doesn't
> > > > > > > > > >   address mobility explicitly.
> > > > > > > > >  
> > > > > > > > > > [Linda] Will add definitions of “hot migration”, “cold
> > > > > > > > > > migration”, and “warm migration”.
> > > > > > > > >  
> > > > > > > > > ===#. Terminology===
> > > > > > > > >  
> > > > > > > > > Packets run from the source at A to the destination at B via
> > > > > > > > > NVE1, then via NVE2. Please don't call NVE1 and NVE2 the
> > > > > > > > > source NVE and the destination NVE.
> > > > > > > > > In future, no-one will thank you for the apparent
> > > > > > > > > contradictions when they continually stumble over phrases like
> > > > > > > > > > this one in S.4.1: "...send their packets to the source NVE".
> > > > > > > > >  
> > > > > > > > > The term "packets in flight" is used incorrectly to refer to
> > > > > > > > > all the packets sent to the old NVE after the VM has moved,
> > > > > > > > > even if they were launched into flight long after the old VM
> > > > > > > > > stopped receiving packets.
> > > > > > > > >  
> > > > > > > > > > [Linda] Thanks for the comments. Will change.
> > > > > > > > >  
> > > > > > > > > BTW, I think s/before/after/ in: "that have old ARP or
> > > > > > > > > neighbor cache entry before VM or task migration".
> > > > > > > > >  
> > > > > > > > > > I think: s/IP-based VM mobility/L3 VM mobility/ throughout,
> > > > > > > > > > because "based" sounds (to me) like the mobility control
> > > > > > > > > > protocol is over (i.e. based on) IP.
> > > > > > > > >  
> > > > > > > > > ===#. Applicability===
> > > > > > > > >  
> > > > > > > > > In section 4.2 it says that the protocol mostly used as the IP
> > > > > > > > > based task migration protocol is ILA. This implies that all
> > > > > > > > > hosts corresponding with the mobile VMs are either part of the
> > > > > > > > > same controlled environment, or they are proxied via nodes
> > > > > > > > > that are part of the same controlled environment (I only have
> > > > > > > > > passing knowledge of ILA, but I understand that it depends on
> > > > > > > > > ILA routers on the path). If I am correct, this aspect of
> > > > > > > > > scope needs to be made clear from the start.
> > > > > > > > >  
> > > > > > > > > > Also under the heading of applicability, the sentence "Since
> > > > > > > > > migrations should be relatively rare events" appears very late
> > > > > > > > > in the document (S.4.2.1). The assumed level of churn ought to
> > > > > > > > > be stated nearer the start.
> > > > > > > > >  
> > > > > > > > > [Linda] yes, under the same controlled environment.
> > > > > > > > >  
> > > > > > > > > ===#. L3 Mobility===
> > > > > > > > > L2 VM mobility is independent of the application, because
> > > > > > > > > resolution of L2 mappings is delegated to the stack. In
> > > > > > > > > contrast, L3 VM mobility is only feasible under certain
> > > > > > > > > conditions, because an application needs an IP address to open
> > > > > > > > > a socket (resolution of DNS names is not delegated to the
> > > > > > > > > stack, and apps can use IP addresses directly anyway).
> > > > > > > > >  
> > > > > > > > > Examples of the 'certain conditions':
> > > > > > > > > a) /All/ applications used in the whole DC load balancing
> > > > > > > > > scheme contain IP address migration logic for /all/ their
> > > > > > > > > connections; 
> > > > > > > > > b) VMs running solely applications that support IP address
> > > > > > > > > > migration register this fact with the NVA, and it only selects
> > > > > > > > > such VMs for mobility. 
> > > > > > > > > c) An abstraction is layered over /all/ the IP addresses
> > > > > > > > > exposed to applications (at both ends) so that the IP
> > > > > > > > > addresses that applications use are solely identifiers  (e.g.
> > > > > > > > > ILA, LISP, HIP), not also locators.
> > > > > > > > >  
> > > > > > > > > The introduction says the draft is about VM mobility in a
> > > > > > > > > multi-tenant DC, so the DC admin will not know the range of
> > > > > > > > > applications being used. This excludes condition (a) above.
> > > > > > > > > When the draft says "...if all applications running are known
> > > > > > > > > to handle this gracefully...", it doesn't quantify just how
> > > > > > > > > restrictive this condition is, and it gives no explanation of
> > > > > > > > > how this knowledge might be 'known' or which function within
> > > > > > > > > the system 'knows' it.
> > > > > > > > >  
> > > > > > > > > S.4.2.1 contains what seems like plenty of arm-waving.
> > > > > > > > > * "TCP connections could be automatically closed in the
> > > > > > > > > network stack during a migration event."
> > > > > > > > >         o There is no TCP connection state in the network
> > > > > > > > > stack.
> > > > > > > > >         o Even if the network starts to drop every packet, the
> > > > > > > > > TCP connection
> > > > > > > > >         state persists in the end-points for a duration of the
> > > > > > > > > order of 30-90
> > > > > > > > >         minutes (OS-dependent) before TCP deems the connection
> > > > > > > > > is broken. 
> > > > > > > > >        o Other transport protocols have similar designs
> > > > > > > > > (including the app-layer
> > > > > > > > >         of protocols over UDP).
> > > > > > > > > * "More involved approach to connection migration":
> > > > > > > > >         o pausing the connection [does this refer to an actual
> > > > > > > > > feature of any
> > > > > > > > >         L4 protocol?] 
> > > > > > > > >         o packaging connection state and sending to target
> > > > > > > > > [does
> > > > > > > > >         this assume logic written into the application, or is
> > > > > > > > > this assuming the
> > > > > > > > >         stack handles this and the app is restricted to using
> > > > > > > > > some form of
> > > > > > > > >         separate identifier/locator addresses?] 
> > > > > > > > >         o instantiating connection state in the peer stack
> > > > > > > > > [ditto?].
> > > > > > > > >  
> > > > > > > > > There's some arm-waving in S.7 too:
> > > > > > > > >   "Cold Virtual Machine mobility is facilitated by the VM
> > > > > > > > > initially
> > > > > > > > >    sending an ARP or Neighbor Discovery message at the
> > > > > > > > > destination NVE
> > > > > > > > >    but the source NVE not receiving any packets inflight."
> > > > > > > > >    [How is it arranged for the source NVE not to receive any
> > > > > > > > > packets in flight?]
> > > > > > > > >  
> > > > > > > > > And in S.7:
> > > > > > > > >   "In hot
> > > > > > > > >    standby option, regarding TCP connections, one option is to
> > > > > > > > > start
> > > > > > > > >    with and maintain TCP connections to two different VMs at
> > > > > > > > > the same
> > > > > > > > >    time."
> > > > > > > > >    [This sounds like resilience logic has been written into
> > > > > > > > > the application,
> > > > > > > > >    which would be a special case but not something VM mobility
> > > > > > > > > infrastructure
> > > > > > > > >    could depend on.]
> > > > > > > > >  
> > > > > > > > > [Linda] will add.
> > > > > > > > >  
> > > > > > > > > ===#. Gaps===
> > > > > > > > > #. Security Considerations: repeats issues in other drafts
> > > > > > > > > that are not specific to mobility, but it does not mention any
> > > > > > > > > security issues specifically due to VM mobility. It says that
> > > > > > > > > address spoofing may arise in a DC (sort-of implying it is
> > > > > > > > > worse than in non-DC environments, but not saying why). The
> > > > > > > > > handshake at the start of a connection (e.g. TCP, SCTP, QUIC)
> > > > > > > > > checks for source address spoofing. So L3 VM mobility would be
> > > > > > > > > more vulnerable to source address spoofing in cases where the
> > > > > > > > > mobile VM was the connection initiator and there was not a new
> > > > > > > > > handshake after the move. However, this draft does not contain
> > > > > > > > > any detailed mobility protocols, so it is not possible to
> > > > > > > > > identify any specific security flaws.
> > > > > > > > >  
> > > > > > > > > #. Transport Issues: Effect of delay on the transport: Cold
> > > > > > > > > mobility introduces significant delay, and other forms less,
> > > > > > > > > but still some delay. It should be pointed out that some
> > > > > > > > > applications (e.g. real-time) will therefore not be useful if
> > > > > > > > > subjected to VM mobility. Similarly, even a short period of
> > > > > > > > > delay will drive most congestion controls to severely reduce
> > > > > > > > > throughput. These points might be self-evident, but perhaps
> > > > > > > > > they should be stated explicitly.
> > > > > > > > >  
> > > > > > > > > BTW, in the L3 VM mobility case, the draft often refers to TCP
> > > > > > > > > connections, but the address bindings of any transport
> > > > > > > > > protocols would have to be migrated due to VM mobility (e.g.
> > > > > > > > > SCTP; sequences of datagrams over UDP; streams over UDP such
> > > > > > > > > as with RTP, QUIC).
> > > > > > > > >  
> > > > > > > > > #. Management Issues: perhaps the draft ought to recommend
> > > > > > > > > statistics gathering (e.g. time taken, amount of duplicate
> > > > > > > > > data) to aid a DC's future decisions on the cost-benefit of
> > > > > > > > > moving a VM. The OPSDIR review says a BCP does not /have/ to
> > > > > > > > > describe management issues, but this document seems to
> > > > > > > > > describe a whole system procedure, not just a protocol, which
> > > > > > > > > then surely includes the management plane.
> > > > > > > > >  
> > > > > > > > > [Linda] can you become a co-author and add those in?
> > > > > > > > >  
> > > > > > > > > ===#. Incoherent Structure===
> > > > > > > > >  
> > > > > > > > > S.4.1. happens to talk about VMs moving, while S.4.2. happens
> > > > > > > > > to talk about tasks moving, but this is not the distinguishing
> > > > > > > > > aspect of these two sections (anyway, S.2. says "the draft
> > > > > > > > > > uses task and VM interchangeably"):
> > > > > > > > > > * "4.1 VM Migration" is about "L2 VM Mobility" so this ought
> > > > > > > > > >   to be the section heading,
> > > > > > > > > > * "4.2 Task Migration" is about "L3 VM Mobility" so this ought
> > > > > > > > > >   to be the section heading.
> > > > > > > > > > It would also help not to switch from VM to task across these
> > > > > > > > > > sections - it's just a distraction.
> > > > > > > > >  
> > > > > > > > > S.4.1 needs better signposting of where each sub-case ends
> > > > > > > > > > (Subsections might be useful to solve this):
> > > > > > > > > > * IPv4
> > > > > > > > > > * end-user client
> > > > > > > > > > * 2 paras starting "All NVEs communicating with this virtual
> > > > > > > > > >   machine..." [Not clear that the end-user case has ended and
> > > > > > > > > >   we have returned to the general IPv4 case?]
> > > > > > > > > > * IPv6 [Strictly, it still hasn't said whether the end-user
> > > > > > > > > >   client case has ended.] [Also, it doesn't explain why there
> > > > > > > > > >   is no need for an end-user client case under IPv6?]
> > > > > > > > > >  
> > > > > > > > > > Sections 5 & 6 seem to be about either L2 or L3 mobility,
> > > > > > > > > > whereas Sections 7 & 8 seem to be restricted to L2.
> > > > > > > > >  
> > > > > > > > > The draft vacillates over what to do with packets arriving at
> > > > > > > > > > the old NVE in the L3 case (see also L3 mobility above):
> > > > > > > > > > * S.4.2 first says packets are dropped, possibly with an ICMP
> > > > > > > > > >   error message;
> > > > > > > > > >   o then later it says they are silently dropped;
> > > > > > > > > >   o then in the very next sentence it says either silently
> > > > > > > > > >     drop them or forward them to the new location
> > > > > > > > > > * S.5 says they should not be lost, but instead delivered to
> > > > > > > > > >   the destination hypervisor
> > > > > > > > > >   o then it describes how they are tunnelled (which is not the
> > > > > > > > > >     same as "forwarding").
> > > > > > > > >  
> > > > > > > > > > The order in which all the stages of mobility are given is
> > > > > > > > > > jumbled up across sections that also appear in arbitrary
> > > > > > > > > > order:
> > > > > > > > > > * S.5 prepares, establishes, uses then stops a tunnel, but it
> > > > > > > > > >   doesn't say where the other stages fit between these steps
> > > > > > > > > >         o When tunneling packets, it talks about the
> > > > > > > > > >           *migrating* VM not the *migrated* VM, which implies
> > > > > > > > > >           tunnelling has started before the new VM is running.
> > > > > > > > > >           Does this imply there is a huge buffer?
> > > > > > > > > >         o It says "Stop Tunneling Packets - When source NVE
> > > > > > > > > >           stops receiving packets destined to..." but it is
> > > > > > > > > >           never clear when a source has stopped sending packets
> > > > > > > > > >           to a destination, unless it explicitly closes the
> > > > > > > > > >           connection (e.g. with a FIN in the case of TCP).
> > > > > > > > > >           Often there are long gaps between packets, because
> > > > > > > > > >           many flows are 'thin' (meaning the application
> > > > > > > > > >           frequently has nothing to send). These gaps can last
> > > > > > > > > >           for milliseconds, hours or even days without any
> > > > > > > > > >           implication that the connection has ended.
> > > > > > > > > > * Then S.6. describes moving state, but doesn't say that this
> > > > > > > > > >   is not after the previous tunnelling steps (or where it fits
> > > > > > > > > >   within those steps).
> > > > > > > > > > * Then S.7 describes hot, warm and cold mobility, but doesn't
> > > > > > > > > >   lay out the tunnelling or steps to move state in each case.
> > > > > > > > > > * Then S.8 says it's about VM life-cycle, but just gives the
> > > > > > > > > >   very first 3 steps for allocation of resources to a VM, then
> > > > > > > > > >   abruptly ends, without even starting the VM, let alone
> > > > > > > > > >   getting to move it.
> > > > > > > > >  
> > > > > > > > > S.5 exhibits another inconsistency by talking about the
> > > > > > > > > hypervisor, not the NVE.
> > > > > > > > >  
> > > > > > > > > ==#. Nits==
> > > > > > > > >  
> > > > > > > > > Nits with the English are too numerous to mention them all.
> > > > > > > > > Below are pointers to general problems as well as some
> > > > > > > > > individual instances.
> > > > > > > > >  
> > > > > > > > > S.4
> > > > > > > > >   "Layer 2 and Layer 3 protocols are described next.  In the
> > > > > > > > > following
> > > > > > > > >    sections, we examine more advanced features."
> > > > > > > > >         s/following/subsequent/
> > > > > > > > >  
> > > > > > > > > S.4.1
> > > > > > > > > Expand WSC, MSC and NVA on first use.
> > > > > > > > >  
> > > > > > > > > s/the VM moves in the same link/the VM moves in the same
> > > > > > > > > subnet/
> > > > > > > > >  
> > > > > > > > > "i.e. end-user clients ask for the same MAC address upon
> > > > > > > > > migration. [...] to ensure that the same IPv4 address is
> > > > > > > > > assigned to the VM." I think s/IPv4/MAC/ was intended?
> > > > > > > > >  
> > > > > > > > > "  All NVEs communicating with this virtual machine uses the
> > > > > > > > > old ARP
> > > > > > > > >    entry.  If any VM in those NVEs need to talk to the new VM
> > > > > > > > > in the
> > > > > > > > >    destination NVE, it uses the old ARP entry."
> > > > > > > > > Repetition: these 2 sentences say the same. (The mistake is
> > > > > > > > > also repeated when these 2 sentences are repeated for IPv6).
> > > > > > > > >  
> > > > > > > > > S.4.2.1
> > > > > > > > > s/Push the new mapping to hosts./Push the new mapping to
> > > > > > > > > communicating hosts./
> > > > > > > > >  
> > > > > > > > > S.5.
> > > > > > > > > > The IPv4/IPv6 pairs of paras for "tunnel establishment" and
> > > > > > > > > > "tunneling packets" only differ in the words "IPv4"/"IPv6". So
> > > > > > > > > > in each case a single para could be given for IP (irrespective
> > > > > > > > > > of whether v4 or v6).
> > > > > > > > >  
> > > > > > > > > Thank you very much.
> > > > > > > > >  
> > > > > > > > > Linda Dunbar
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > > >  
> > > > > > > > > 
> > > > > > > > > _______________________________________________
> > > > > > > > > Tsv-art mailing list
> > > > > > > > > Tsv-art@ietf.org
> > > > > > > > > https://www.ietf.org/mailman/listinfo/tsv-art
> > > > > > > > 
> > > > > > > >  
> > > > > > > > 
> > > > > > > > -- 
> > > > > > > > ________________________________________________________________
> > > > > > > > Bob Briscoe                               http://bobbriscoe.net/
> > > > > > > 
> > > > > > >  
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > Tsv-art mailing list
> > > > > > > Tsv-art@ietf.org
> > > > > > > https://www.ietf.org/mailman/listinfo/tsv-art
> > > > > > 
> > > > > >  
> > > > > > 
> > > > > > -- 
> > > > > > ________________________________________________________________
> > > > > > Bob Briscoe                               http://bobbriscoe.net/
> > > > > 
> > > > >  
> > > > > 
> > > > > _______________________________________________
> > > > > nvo3 mailing list
> > > > > nvo3@ietf.org
> > > > > https://www.ietf.org/mailman/listinfo/nvo3
> > > > 
> > > >  
> > > > 
> > > > -- 
> > > > ________________________________________________________________
> > > > Bob Briscoe                               http://bobbriscoe.net/
> > > >  
> > > > 
> > > > _______________________________________________
> > > > nvo3 mailing list
> > > > nvo3@ietf.org
> > > > https://www.ietf.org/mailman/listinfo/nvo3
> > > 
> > >  
> > > 
> > > -- 
> > > ________________________________________________________________
> > > Bob Briscoe                               http://bobbriscoe.net/
> > >  
> > > 
> > > _______________________________________________
> > > Tsv-art mailing list
> > > Tsv-art@ietf.org
> > > https://www.ietf.org/mailman/listinfo/tsv-art
> > 
> >  
> > 
> > -- 
> > ________________________________________________________________
> > Bob Briscoe                               http://bobbriscoe.net/
> >  
> > 
> > _______________________________________________
> > Tsv-art mailing list
> > Tsv-art@ietf.org
> > https://www.ietf.org/mailman/listinfo/tsv-art
> 
>  
> 
> -- 
> ________________________________________________________________
> Bob Briscoe                               http://bobbriscoe.net/
> _______________________________________________
> Tsv-art mailing list
> Tsv-art@ietf.org
> https://www.ietf.org/mailman/listinfo/tsv-art
-- 
Cheers

Magnus Westerlund 

----------------------------------------------------------------------
Networks, Ericsson Research
----------------------------------------------------------------------
Ericsson AB                 | Phone  +46 10 7148287
Torshamnsgatan 23           | Mobile +46 73 0949079
SE-164 80 Stockholm, Sweden | mailto: magnus.westerlund@ericsson.com
----------------------------------------------------------------------
