Re: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-04

Bob Briscoe <ietf@bobbriscoe.net> Mon, 23 March 2020 11:59 UTC

To: "Bocci, Matthew (Nokia - GB)" <matthew.bocci@nokia.com>, "sarikaya@ieee.org" <sarikaya@ieee.org>
Cc: "tsv-art@ietf.org" <tsv-art@ietf.org>, NVO3 <nvo3@ietf.org>, "draft-ietf-nvo3-vmm.all@ietf.org" <draft-ietf-nvo3-vmm.all@ietf.org>
From: Bob Briscoe <ietf@bobbriscoe.net>
Message-ID: <fd0efba8-014a-819d-ae8f-43a54562ca06@bobbriscoe.net>
Date: Mon, 23 Mar 2020 11:58:55 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-art/Bd-Z63cxNEMyYW93KeH7yoKQcPs>
Subject: Re: [Tsv-art] [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-04

Matthew,

It is a long time since I reviewed this (Sep 2018), and I'm sorry for 
being unresponsive back in Aug 2019 when the -05 revision that aimed to 
address my -04 review comments was released.

I have to say, IMO this document is still far from suitable for 
publication as an RFC. The IETF surely cannot publish material that 
demonstrates such a loose grasp of the subject area, particularly the 
impact of mobility on layer 4 and above. However, that is not my call. 
All I can do as a reviewer is identify those of my comments that have 
not been addressed, and why it is important to address them. Then you, 
as document shepherd, can decide whether the IESG will need these issues 
to be addressed.

So, below, I work through my previous top level comments, identifying 
whether they have been addressed or not. I assume the editorial nits 
that I identified have been addressed (I haven't checked).

I also have to say that, as a general rule, the only ones of my review 
comments that have been addressed are the 'easy' ones that could be 
dealt with using something like find-and-replace techniques. Those that 
question the subject matter itself have usually not been addressed at all.

=================================

> #. The introduction does not say what the purpose of publishing this 
> draft is.

*[NOT ADDRESSED]*

[BB] Changing the intended status to Informational has helped to address 
some of my concerns. However, an informational document still has to 
have a target readership and a purpose. This vmm document still doesn't 
seem to have a purpose, or if it does, it doesn't say what its purpose is.

When it says it...
  ...describes solutions that support VM mobility,

or that:
  ...there is a desire to document comprehensive VM mobility solutions
  that cover both IPv4 and IPv6.

...they are not reasons for the IETF to publish an informational RFC. 
They are circular reasons - they just say, "we are writing this because 
it describes something". They do not say why it is relevant for the IETF 
to publish this description. Is it something that the IETF needs to 
understand before it moves in to standardize aspects of VM mobility? 
Does this document provide new insights that have not been understood 
before? Do multiple VM mobility solutions already exist, but a standard 
is needed, because they don't interoperate?

> #. It does not seem as if the NVO WG has discussed the purpose of 
> using normative text in this draft. See detailed comments.

*[NOT ADDRESSED]*

[BB] In -07 there are still the same two "MUSTs" as in -04. This is 
intended to be an informational RFC, so no-one will be required to 
comply with it. Admittedly, informational RFCs can contain normative 
keywords in certain special cases (e.g. an informative copy of a 
specification of a protocol that is outside the IETF's change control). 
But this draft is not one of those special cases.

By my understanding of what normative keywords are for, these "MUSTs" 
ought to be replaced with "has to" or "needs to".


> #. The draft silently slips back and forth between VM mobility and VM 
> redundancy, without recognizing the differences. See detailed comments.


*[NOT ADDRESSED]*

[BB] In -07 warm standby is still described as it was in -04: state 
update messages at regular intervals. That is redundancy, not mobility. 
In particular, Section 7 has VM Mobility in the title, but starts 
slipping into talking about standby after the second para.

Warm standby is useful for resilience against failures, but it is not 
mobility. In standby, processing does not /move/, which is what 
/mobility/ means. Standby never releases the 'Old' address and, because 
there is no first move, there is never a second or subsequent move. So 
routing and addressing do not become increasingly fragmented.

As the Intro says, the purpose of VM mobility is...

      It is highly desirable [RFC7364  <https://tools.ietf.org/html/rfc7364>] to allow
      VMs to be moved dynamically (live, hot, or cold move) from one
      server to another for dynamic load balancing or optimized work
      distribution.


In contrast, warm standby does not ever release the 'Old' processing 
resources, so it cannot be used for dynamically balancing load and 
optimizing work distribution.

The following gives the impression that warm 'mobility' is on a spectrum 
between hot and cold VM mobility:

      The larger the
      duration, the less warm (and hence cold) the Warm VM mobility
      option becomes.


Warm standby is on a different spectrum to hot and cold mobility. Both 
hot and cold mobility involve one volley of messages; the only 
difference is whether processing is stopped or not. Repeating messages 
is not part way between one message and one message. Regularly repeating 
messages is not part way between leaving a process running and stopping it.

This is still muddle-headed thinking - about redundancy, not mobility. 
I'm not saying redundancy is not important - it's extremely important. 
I'm just saying it's out of scope for a VM mobility draft.


> #. Please adopt different terminology than "source NVE" and 
> "destination NVE", which are really poor choices of terms for an 
> intermediate node. See detailed comments. Why not use "old NVE" and 
> "new NVE", which is what you mean?

[*ADDRESSED*] Thank you

> #. Applicability is fairly clearly outlined, but it is not clear 
> whether hosts corresponding with the mobile VMs are part of the same 
> controlled environment or on the uncontrolled public Internet. See 
> detailed comments.
>

[*NOT PROPERLY ADDRESSED*]

[BB] The introduction now contains the useful scoping sentence:

      There are
      communication among tasks belonging to one tenant and
      communication among tasks belonging to different tenants or with
      external entities.


However, AFAICT, the draft still does not address the case where an 
internal entity moves but it needs to continue to communicate with an 
external entity that is not controlled by (or even aware of) the NVA.

A system of tasks within a controlled environment all talking with each 
other would be like Schroedinger's cat in a box - a quaint thought 
experiment but not useful. The outside world has to be able to 
continually give input (requests, data, events, code, etc) and get 
output (responses, transformed data, notifications, etc). So a VM 
mobility solution cannot just separate off the outside world as if 
that's a different problem that is not in scope.

It's not difficult to do this, but it surely has to be done.


>
>
> #. Section 4.2.1 on L3 VM mobility reads like some potential 
> half-thought-through ideas on how to solve L3 mobility, rather than 
> current practice, let alone best current practice. Either current 
> practice should be described instead, or the scope of the draft should 
> be narrowed solely to L2 VM mobility. See detailed comments.

[*NOT ADDRESSED*]

[BB] None of the inadequate text in this section has been altered 
(except terminology find-and-replace that happened to hit words in this 
section).

The draft still doesn't deal with the reality of where layer-4 
connection state is held (in the end applications and the transport 
stack under them). The authors seem to believe that network elements can 
close or pause Layer-4 connections. For this, applications have to be 
built with logic to handle IP address migration. But the scope of this 
draft is multi-tenant DCs, where the DC infrastructure has no knowledge 
of which applications each tenant might be using.

My comments all still apply, and can still be found quoted way down this 
email (find text: "#. L3 Mobility").

For a useful tutorial on how TCP responds to ICMP destination unreachable 
messages (defined as soft errors), and the dilemmas surrounding how TCP 
should respond, see RFC5461 "TCP's Reaction to Soft Errors".
It's probably also worth reading RFC6069 "Making TCP More Robust to Long 
Connectivity Disruptions (TCP-LCD)".
Other transport protocols (e.g. SCTP, QUIC/UDP, RTP/UDP, etc) face 
similar dilemmas.
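
As a rough illustration of the soft/hard distinction discussed in RFC5461 
(which builds on the rules in RFC1122), here is a minimal Python sketch. 
The names Reaction and classify_dest_unreachable are hypothetical, 
invented for this example; only the ICMPv4 code numbers and their 
soft/hard grouping come from the RFCs. It is not how any particular TCP 
stack is implemented.

```python
from enum import Enum


class Reaction(Enum):
    """Hypothetical labels for how a TCP endpoint might react."""
    SOFT = "note the error, keep retransmitting"
    HARD = "abort the connection"
    PMTUD = "lower the path MTU estimate"
    UNCLASSIFIED = "not classified in this sketch"


# ICMPv4 Destination Unreachable (type 3) codes.
SOFT_CODES = {0, 1, 5}  # net unreachable, host unreachable, source route failed
HARD_CODES = {2, 3}     # protocol unreachable, port unreachable
FRAG_NEEDED = 4         # fragmentation needed and DF set


def classify_dest_unreachable(code: int) -> Reaction:
    """Map an ICMPv4 Destination Unreachable code to a reaction class."""
    if code in SOFT_CODES:
        return Reaction.SOFT
    if code in HARD_CODES:
        return Reaction.HARD
    if code == FRAG_NEEDED:
        # RFC 1122 groups code 4 with the hard errors, but stacks that do
        # Path MTU Discovery (RFC 1191) treat it as an MTU signal instead.
        return Reaction.PMTUD
    return Reaction.UNCLASSIFIED


if __name__ == "__main__":
    for code in range(7):
        print(code, classify_dest_unreachable(code).value)
```

The relevance to VM mobility: a host-unreachable message generated while 
a VM is moving falls in the soft class, so under these rules an 
established TCP connection would note it and keep retrying. The network 
sending it does not close or pause the connection.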


>
> # The VM's file system is described as state that moves with the VM 
> (S.6), but VM mobility solutions often move the VM but stitch it back 
> to its (unmoved) storage. Conversely, the storage can also move 
> independent of the VM.
[*ADDRESSED*]

[BB] Thank you.

>
> #. The draft omits some of the security, transport and management 
> aspects of VM mobility. See detailed comments.

[*NOT ADDRESSED*]

[BB] The Security Considerations section is unchanged. It still only 
refers to previously known security issues with overlays in general, and 
does not discuss security vulnerabilities specific to VM mobility. In 
particular the need for the transport to recheck for address spoofing 
after each address change, which was identified in my review, which can 
still be found quoted below under "#. Gaps".
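
To make the spoofing recheck concrete, one known approach is a 
return-routability check in the style of QUIC's path validation: after 
an address change, the endpoint sends a fresh random challenge to the 
new address and trusts that address only once the challenge is echoed 
back. The sketch below is a minimal Python illustration under that 
assumption. The class PathValidator, its method names, and the example 
address are all hypothetical; the draft defines none of them.

```python
import hmac
import secrets


class PathValidator:
    """Hypothetical tracker of peer addresses that passed a challenge."""

    def __init__(self):
        self.validated = set()  # addresses that echoed their challenge
        self.pending = {}       # address -> outstanding challenge bytes

    def on_address_change(self, new_addr):
        """Create a fresh random challenge to be sent to new_addr."""
        challenge = secrets.token_bytes(8)
        self.pending[new_addr] = challenge
        return challenge

    def on_response(self, addr, data):
        """Accept addr only if it echoes the challenge issued to it."""
        expected = self.pending.get(addr)
        if expected is not None and hmac.compare_digest(expected, data):
            del self.pending[addr]
            self.validated.add(addr)
            return True
        return False

    def is_validated(self, addr):
        """True once addr has proved it can receive at that address."""
        return addr in self.validated


if __name__ == "__main__":
    validator = PathValidator()
    new_addr = ("192.0.2.10", 4789)  # documentation-range example address
    challenge = validator.on_address_change(new_addr)
    print(validator.is_validated(new_addr))
    validator.on_response(new_addr, challenge)
    print(validator.is_validated(new_addr))
```

The design point is that the check is repeated for every new address, so 
validation of the 'Old' address carries no weight after a move.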

There is still no mention of the impact of the additional delay / 
latency during a mobility event.

There also is still no mention of statistics gathering, which would be 
needed to be able to make decisions on when to migrate VMs. But I guess 
the decision on when to migrate can be ruled to be at a higher scope 
than this draft.


>
> #. The draft reads as if different sections have been written by 
> different authors and no-one has edited the whole to give it a 
> coherent structure, or to ensure consistency (both technical and 
> editorial) between the parts. See detailed comments.

[*MOSTLY NOT ADDRESSED*]

[BB] I made a number of points to help improve the structure and 
comprehensibility. Two easy 'find-and-replace' ones have been dealt 
with, but the three that require more than editorial knowledge have not:

 1. [ADDRESSED] VM/task was being used in place of L2/L3. This has been
    taken on board. Thanks.
 2. [NOT ADDRESSED] Signposting of where sub-cases start and end in S.4.1
 3. [NOT ADDRESSED] Indecision over whether packets are silently
    dropped, dropped with an ICMP message, forwarded or tunnelled.
 4. [NOT ADDRESSED] The order in which the various stages of mobility
    occur was jumbled. Some stages have been ruled out of scope, but
    they are still mentioned in jumbled order.
 5. [ADDRESSED] Replace hypervisor with NVE. Thanks.


>
> #. The quality of the English grammar does not allow a reviewer to 
> concentrate on the technical aspects rather than the English. It would 
> have been useful if one of the English-speaking co-authors had 
> improved the English before submission for review. See detailed comments.

[*ADDRESSED*] Thanks.

==================
[BB] New problem

Whilst checking over -07, I noticed that the definitions of task, 
workload and VM are now all interchangeable.

       Tasks:  Task is a program instantiated or running on a virtual
                machine or container.  Tasks in virtual machines or
                containers can be migrated from one server to another.
                We use task, workload and virtual machine
                interchangeably in this document.


Then in 4.2:

      Even though the term VM and Task are used interchangeably in this
      document, the term Task is used in the context of Layer-3
      migration mainly to have slight emphasis on the moving an entity
      (Task) that is instantiated on a VM or a container.


The correct way to deal with confusion between different concepts is not 
just to say "Well, if you squint up your eyes, all words mean roughly 
the same thing really."



Bob


On 24/02/2020 13:10, Bocci, Matthew (Nokia - GB) wrote:
>
> Hi Bob, Authors
>
> I am the document shepherd for this draft. It has now been updated to 
> v07 following the shepherd’s review and WG last call.
>
> Please can you let me know where we are with addressing the comments 
> in this review?
>
> Thanks
>
> Matthew
>
> *From: *Bob Briscoe <ietf@bobbriscoe.net>
> *Date: *Friday, 21 September 2018 at 10:34
> *To: *"sarikaya@ieee.org" <sarikaya@ieee.org>
> *Cc: *"tsv-art@ietf.org" <tsv-art@ietf.org>, NVO3 <nvo3@ietf.org>, 
> IETF <ietf@ietf.org>, "draft-ietf-nvo3-vmm.all@ietf.org" 
> <draft-ietf-nvo3-vmm.all@ietf.org>
> *Subject: *Re: [nvo3] [Tsv-art] Tsvart last call review of 
> draft-ietf-nvo3-vmm-04
> *Resent from: *<alias-bounces@ietf.org>
> *Resent to: *<sarikaya@ieee.org>, Linda Dunbar 
> <Linda.dunbar@huawei.com>, <vumip1@gmail.com>, <tom@herbertland.com>, 
> <sadikshi@cisco.com>, <matthew.bocci@nokia.com>, Sam Aldrin 
> <aldrin.ietf@gmail.com>, Yizhou Li <liyizhou@huawei.com>, MARTIN 
> VIGOUREUX <martin.vigoureux@nokia.com>, Deborah Brungard 
> <db3546@att.com>, <aretana.ietf@gmail.com>, Matthew Bocci 
> <matthew.bocci@nokia.com>
> *Resent date: *Friday, 21 September 2018 at 10:34
>
> Behcet,
>
> Linda made a load of responses to my review, some of which I disagree 
> with so I would like to respond to them. I need responses to those two 
> questions first though, 'cos everything else depends on those.
>
>
> Bob
>
> On 20/09/18 15:30, Behcet Sarikaya wrote:
>
>     Dear Bob,
>
>     On Wed, Sep 19, 2018 at 9:53 AM Bob Briscoe <ietf@bobbriscoe.net
>     <mailto:ietf@bobbriscoe.net>> wrote:
>
>         Behcet,
>
>         I would like to make significant responses to many of Linda's
>         responses, but until we get answers to the two pre-requisite
>         questions I've given, I can't be sure how to respond.
>
>         So rather than promising a new version with no prior
>         discussion, I believe it would be much more fruitful to engage
>         in this conversation. I'm trying to help.
>
>     You already made a detailed review.
>
>     Your two points are clarifications from your detailed review.
>
>     When I said we will revise I meant we  will revise based on your
>     detailed review.
>
>     After we post our revision you can do what ever you wish.
>
>     Sincerely,
>
>     Behcet
>
>         Cheers
>
>
>         Bob
>
>         On 19/09/18 15:46, Behcet Sarikaya wrote:
>
>             Hi Bob,
>
>             Thank you for your comments.
>
>             The authors are currently discussing your points and we
>             will come up with a revision soon after the discussions
>             are over.
>
>             Regards,
>
>             Behcet
>
>             On Tue, Sep 18, 2018 at 6:03 PM Bob Briscoe
>             <ietf@bobbriscoe.net <mailto:ietf@bobbriscoe.net>> wrote:
>
>                 Linda,
>
>                 Until we can all understand the answers to the
>                 following two questions, I don't think we can discuss
>                 what track this draft ought to be on, let alone move
>                 on to your responses to all my other points.
>
>                 1/ Applicability
>
>                 You say this draft solely applies to connections with
>                 both ends within the controlled DC environment. But
>                 the draft says it's about multi-tenant DCs. Are there
>                 any multi-tenant DCs that restrict all VMs to only
>                 communicate with other VMs within the same controlled
>                 DC environment?
>
>                 2/ Purpose of publishing as an RFC
>
>                 When I said:
>
>                     #. The introduction does not say what the purpose
>                     of publishing this draft is.
>
>                 you responded:
>
>                     [Linda] The first paragraph on Page 3 has the
>                     description why VM Mobility is needed.
>
>
>                 Whether VM Mobility is needed was not my question. My
>                 question was what is the purpose of the IETF
>                 publishing an RFC about VM Mobility? And particularly,
>                 what is /this/ RFC intended to achieve?
>
>                 Are the authors trying to argue for a particular
>                 approach vs. others? Are you trying to write a
>                 tutorial? Are you trying to give the pros and cons of
>                 different approaches? Are you trying to give advice on
>                 good practice (with the implication that alternative
>                 practices are less good)? Are you trying to clarify
>                 ideas by writing them down? Are you trying to outline
>                 the implications of VM Mobility for other protocols
>                 being developed within the NVO WG?
>
>
>
>
>                 Bob
>
>                 On 10/09/18 19:16, Linda Dunbar wrote:
>
>                     Bob,
>
>                     Thank you very much for reviewing the draft and
>                     providing in-depth comments. I am very sorry for
>                     the delayed response due to traveling.
>
>                     Replies to your comments are inserted below marked
>                     by [Linda]:
>
>                     -----Original Message-----
>                     From: Bob Briscoe [mailto:ietf@bobbriscoe.net]
>                     Sent: Monday, September 03, 2018 9:45 PM
>                     To: tsv-art@ietf.org <mailto:tsv-art@ietf.org>
>                     Cc: nvo3@ietf.org <mailto:nvo3@ietf.org>;
>                     ietf@ietf.org <mailto:ietf@ietf.org>;
>                     draft-ietf-nvo3-vmm.all@ietf.org
>                     <mailto:draft-ietf-nvo3-vmm.all@ietf.org>
>                     Subject: Tsvart last call review of
>                     draft-ietf-nvo3-vmm-04
>
>                     Reviewer: Bob Briscoe
>
>                     Review result: Not Ready
>
>                     I have been selected as the Transport Directorate
>                     reviewer for this draft. The Transport Directorate
>                     seeks to review all transport or transport-related
>                     drafts as they pass through IETF last call and
>                     IESG review, and sometimes on special request. The
>                     purpose of the review is to provide assistance to
>                     the Transport ADs. For more information about the
>                     Transport Directorate Reviews and the Transport
>                     Area Review Team, please see
>                     https://trac.ietf.org/trac/tsv/wiki/TSV-Directorate-Reviews
>                     <https://trac.ietf.org/trac/tsv/wiki/TSV-Directorate-Reviews>
>
>                     In this case, very very few of the review comments
>                     relate to transport issues, although the greatest
>                     issue concerns a desire that the network could
>                     pause or stop connections during L3 VM Mobility,
>                     which is certainly a transport issue.
>
>                     [Linda] There is “Hot Migration” with transport
>                     service continuing, and there is a “Cold
>                     Migration”, which is a common practice in many
>                     data centers, which stop the task running on the
>                     old place and move to the new place before restart
>                     as described in the Task Migration.
>
>                     Is it helpful to add this description to the draft?
>
>                     ==Summary==
>
>                     The technical aspects of the draft concerning L2
>                     VM mobility (within a subnet) seem sound. However,
>                     this is only part of the draft, which has the
>                     following
>
>                     issues:
>
>                     #. The introduction does not say what the purpose
>                     of publishing this draft is.
>
>                     It seems that, rather than describing a specific
>                     protocol or protocols, it intends to describe the
>                     overall system procedure that would typically be
>                     used in DCs for VM mobility. It is tagged as a
>                     BCP, but it does not say who needs this BCP, why
>                     it is useful for the IETF to publish this BCP, how
>                     wide the authors' knowledge is of current practice
>                     (given DCs are private), or why this is a BCP
>                     rather than a protocol spec.
>
>                     [Linda] The first paragraph on Page 3 has the
>                     description why VM Mobility is needed. Is it
>                     helpful to move this paragraph to the beginning of
>                     the Introduction Section?
>
>                     /“Virtualization which is being used in almost all
>                     of today’s data/
>
>                     /centers enables many virtual machines to run on a
>                     single physical/
>
>                     /computer or compute server. Virtual machines (VM)
>                     need hypervisor/
>
>                     /running on the physical compute server to provide
>                     them shared/
>
>                     /processor/memory/storage. Network connectivity is
>                     provided by the/
>
>                     /network virtualization edge (NVE) [RFC8014].
>                     Being able to move VMs/
>
>                     /dynamically, or live migration, from one server
>                     to another allows for/
>
>                     /dynamic load balancing or work distribution and
>                     thus it is a highly/
>
>                     /desirable feature [RFC7364].”/
>
>                     The draft starts out (S.3) as if it intends to say
>                     what a good VM Mobility protocol should or
>                     shouldn't do, but the rest of the document doesn't
>                     give any reasoning for these recommendations, it
>                     just asserts what appears to be one view of how a
>                     whole VM Mobility system works, sometimes
>                     referring to one example protocol RFC for a
>                     component part, but more often with no references
>                     or details.
>
>                     [Linda] Is it helpful to move the paragraph above
>                     to the beginning of the Introduction Section? So
>                     that audience is aware of why VM Mobility is
>                     needed. And then follow up with what a good VM
>                     Mobility protocol should or shouldn't do?
>
>                     #. It does not seem as if the NVO WG has discussed
>                     the purpose of using normative text in this draft.
>                     See detailed comments.
>
>                     [Linda] The “Intended status” of the draft is
>                     “Best Current Practice”. So all the text are not
>                     “normative”. Is it Okay?
>
>                     #. The draft silently slips back and forth between
>                     VM mobility and VM redundancy, without recognizing
>                     the differences. See detailed comments.
>
>                     [Linda] There is only one usage of “redundancy” in
>                     the entire document, used under the context of
>                     “Hot standby option”, indicating  the “redundancy”
>                     of “the VMs in both primary and secondary domains
>                     have identical information and can provide
>                     services simultaneously as in load-share mode of
>                     operation” being expensive.
>
>                     #. Please adopt different terminology than "source
>                     NVE" and "destination NVE", which are really poor
>                     choices of terms for an intermediate node. See
>                     detailed comments. Why not use "old NVE" and "new
>                     NVE", which is what you mean?
>
>                     [Linda] Thanks for the suggestion. We will change
>                     to “Old NVE”, and “new NVE”.
>
>                     #. Applicability is fairly clearly outlined, but
>                     it is not clear whether hosts corresponding with
>                     the mobile VMs are part of the same controlled
>                     environment or on the uncontrolled public
>                     Internet. See detailed comments.
>
>                     [Linda] “Hosts” are the Apps running on the VM. They
>                     are under the same controlled environment, not on
>                     the uncontrolled public internet.
>
>                     #. Section 4.2.1 on L3 VM mobility reads like some
>                     potential half-thought-through ideas on how to
>                     solve L3 mobility, rather than current practice,
>                     let alone best current practice. Either current
>                     practice should be described instead, or the scope
>                     of the draft should be narrowed solely to L2 VM
>                     mobility. See detailed comments.
>
>                     [Linda] This is referring to “Cold Migration”,
>                     which is a common practice in many data centers.
>
>                     # The VM's file system is described as state that
>                     moves with the VM (S.6), but VM mobility solutions
>                     often move the VM but stitch it back to its
>                     (unmoved) storage. Conversely, the storage can
>                     also move independent of the VM.
>
>                     [Linda] It depends. When a VM moves to a different
>                     zone, the storage/file can become inaccessible.
>
>                     #. The draft omits some of the security, transport
>                     and management aspects of VM mobility. See
>                     detailed comments.
>
>                     [Linda] Can you provide some text?
>
>                     #. The draft reads as if different sections have
>                     been written by different authors and no-one has
>                     edited the whole to give it a coherent structure,
>                     or to ensure consistency (both technical and
>                     editorial) between the parts. See detailed comments.
>
>                     [Linda] we can improve.
>
>                     #. The quality of the English grammar does not
>                     allow a reviewer to concentrate on the technical
>                     aspects rather than the English. It would have
>                     been useful if one of the English-speaking
>                     co-authors had improved the English before
>                     submission for review. See detailed comments.
>
>                     [Linda] can you help?  Becoming a co-author to
>                     improve?
>
>                     ==Detailed Comments==
>
>                     ===#. Normative statements===
>
>                     In the body of the document, there is just one
>                     occurrence of normative text (actually two
>                     "MUST"s, but both state a common requirement -
>                     just written separately for IPv4 and IPv6). This
>                     merely serves to imply that everything else the
>                     document says is less important or optional, which
>                     was probably not the intention.
>
>                     [Linda] The goal is to indicate that any solution
>                     for moving the VM “MUST” follow these rules. They
>                     make sense, don’t they?
>
>                     At the start there is a requirements section,
>                     which states what a VM Mobility protocol "SHOULD"
>                     or "SHOULD NOT" do. I think this is intended as a
>                     set of goals for the rest of the document. If so,
>                     these "SHOULDs" are not intended to apply to
>                     implementations, so they ought not to be capitalized.
>
>                     [Linda] okay, will change.
>
>                     The first requirement, "Data center network SHOULD
>                     support virtual machine mobility in IPv6", is
>                     written as a requirement on all DC networks, not
>                     on implementations. I assume this was intended to
>                     read as "Data center network virtual machine
>                     mobility protocols SHOULD support IPv6". Even
>                     then, it doesn't really add anything to say VM
>                     mobility should support v6 and it should support
>                     v4. A L2 solution won't. While undoubtedly, a L3
>                     solution will at least support one of them.
>
>                     [Linda] Agree. Will change it to “Data centers that
>                     support IPv6 addresses should …”
>
>                     I'm not sure that 'protocol' is the right word
>                     anyway; I think 'VM Mobility procedure' would be a
>                     better phrase, because it includes steps such as
>                     suspending the VM, which is more than a protocol.
>
>                     [Linda] yes. Will change to “Procedure”.
>
>                     The requirement "Virtual machine mobility protocol
>                     MAY support host routes to accomplish
>                     virtualization", is not followed up at all in the
>                     rest of the draft.
>
>                     Even if this requirement stays, the last 3 words
>                     should be deleted.
>
>                     [Linda] will change to “Host Route can be used to
>                     support the Virtual Machine Mobility Procedure.”
>
>                     By the end of the draft, the solution falls far
>                     short of the most relevant "Requirements" anyway,
>                     so one assumes the title of the section ought to
>                     have been "Goals". Specifically, even in the
>                     simpler case of L2 VM mobility, S.4.1 says that
>                     triangular routing and tunnelling persist "until a
>                     neighbour cache entry times out". A cache timeout
>                     is about 10 orders of magnitude longer than the
>                     requirement to only persist "while handling
>                     packets in flight", which would be a few
>                     milliseconds at most (the time for packets to
>                     clear the network that were already launched into
>                     flight when the old VM stopped).
>
>                     Whatever, it would be preferable for the draft to
>                     give rationale for these requirements, rather than
>                     just assert them. This would help to shed light on
>                     the merits of the different trade offs that
>                     solutions choose.
>
>                     [Linda] Agree, will add.
>
>                     ===#. Mobility vs. Redundancy===
>
>                     Redundancy and mobility have a lot of
>                     similarities, but they have different goals. With
>                     mobility, it is necessary to know the exact
>                     instant when one set of state is identical to the
>                     other so it can hand over. With redundancy, the
>                     aim is to keep two (or more) sets of state
>                     evolving through the same sequence of changes, but
>                     there is no need to know the point at which one is
>                     the same as the other was at a certain point.
>
>                     [Linda] Agree with what you said. There is only
>                     one usage of “redundancy” in the entire document,
>                     in the context of the “Hot standby option”, where
>                     it indicates that the “redundancy” of “the VMs in
>                     both primary and secondary domains have identical
>                     information and can provide services
>                     simultaneously as in load-share mode of operation”
>                     is expensive.
>
>                     The draft slips from mobility to resilience in the
>                     following places:
>
>                     * S.2. Terminology: Warm VM Mobility is defined
>                     without any ending, as if it is permanent
>                     replication.
>                     * S.7. "Handling of Hot, Warm and Cold Virtual
>                     Machine Mobility" is actually all about
>                     redundancy, and doesn't address mobility
>                     explicitly.
>
>                     [Linda] Will add definitions of “hot migration”,
>                     “cold migration”, and “warm migration”.
>
>                     ===#. Terminology===
>
>                     Packets run from the source at A to the
>                     destination at B via NVE1, then via NVE2. Please
>                     don't call NVE1 and NVE2 the source NVE and the
>                     destination NVE.
>
>                     In future, no-one will thank you for the apparent
>                     contradictions when they continually stumble over
>                     phrases like this one in S.4.1: "...send their
>                     packets to the source NVE".
>
>                     The term "packets in flight" is used incorrectly
>                     to refer to all the packets sent to the old NVE
>                     after the VM has moved, even if they were launched
>                     into flight long after the old VM stopped
>                     receiving packets.
>
>                     [Linda] Thanks for the comments. Will change.
>
>                     BTW, I think s/before/after/ in: "that have old
>                     ARP or neighbor cache entry before VM or task
>                     migration".
>
>                     I think: s/IP-based VM mobility/L3 VM mobility/
>                     throughout, because "based" sounds (to me) like
>                     the mobility control protocol is over (i.e. based
>                     on) IP.
>
>                     ===#. Applicability===
>
>                     In section 4.2 it says that the protocol mostly
>                     used as the IP based task migration protocol is
>                     ILA. This implies that all hosts corresponding
>                     with the mobile VMs are either part of the same
>                     controlled environment, or they are proxied via
>                     nodes that are part of the same controlled
>                     environment (I only have passing knowledge of ILA,
>                     but I understand that it depends on ILA routers on
>                     the path). If I am correct, this aspect of scope
>                     needs to be made clear from the start.
>
>                     Also under the heading of applicability, the
>                     sentence "Since migrations should be relatively
>                     rare events" appears very late in the document
>                     (S.4.2.1). The assumed level of churn ought to be
>                     stated nearer the start.
>
>                     [Linda] yes, under the same controlled environment.
>
>                     ===#. L3 Mobility===
>
>                     L2 VM mobility is independent of the application,
>                     because resolution of L2 mappings is delegated to
>                     the stack. In contrast, L3 VM mobility is only
>                     feasible under certain conditions, because an
>                     application needs an IP address to open a socket
>                     (resolution of DNS names is not delegated to the
>                     stack, and apps can use IP addresses directly anyway).
>
>                     Examples of the 'certain conditions':
>
>                     a) /All/ applications used in the whole DC load
>                     balancing scheme contain IP address migration
>                     logic for /all/ their connections;
>                     b) VMs running solely applications that support IP
>                     address migration register this fact with the NVA,
>                     and it only selects such VMs for mobility;
>                     c) An abstraction is layered over /all/ the IP
>                     addresses exposed to applications (at both ends)
>                     so that the IP addresses that applications use are
>                     solely identifiers (e.g. ILA, LISP, HIP), not
>                     also locators.
>
>                     The introduction says the draft is about VM
>                     mobility in a multi-tenant DC, so the DC admin
>                     will not know the range of applications being
>                     used. This excludes condition (a) above. When the
>                     draft says "...if all applications running are
>                     known to handle this gracefully...", it doesn't
>                     quantify just how restrictive this condition is,
>                     and it gives no explanation of how this knowledge
>                     might be 'known' or which function within the
>                     system 'knows' it.
>
>                     S.4.2.1 contains what seems like plenty of arm-waving.
>
>                     * "TCP connections could be automatically closed
>                     in the network stack during a migration event."
>
>                             o There is no TCP connection state in the
>                             network stack.
>
>                             o Even if the network starts to drop every
>                             packet, the TCP connection state persists in
>                             the end-points for a duration of the order of
>                             30-90 minutes (OS-dependent) before TCP deems
>                             the connection is broken.
>
>                             o Other transport protocols have similar
>                             designs (including the app-layer of protocols
>                             over UDP).
>
>                     * "More involved approach to connection migration":
>
>                             o pausing the connection [does this refer
>                             to an actual feature of any L4 protocol?]
>
>                             o packaging connection state and sending
>                             to target [does this assume logic written
>                             into the application, or is this assuming
>                             the stack handles this and the app is
>                             restricted to using some form of separate
>                             identifier/locator addresses?]
>
>                             o instantiating connection state in the
>                             peer stack [ditto?].
>
>                     There's some arm-waving in S.7 too:
>
>                       "Cold Virtual Machine mobility is facilitated by
>                        the VM initially sending an ARP or Neighbor
>                        Discovery message at the destination NVE but
>                        the source NVE not receiving any packets
>                        inflight."
>
>                        [How is it arranged for the source NVE not to
>                        receive any packets in flight?]
>
>                     And in S.7:
>
>                       "In hot standby option, regarding TCP
>                        connections, one option is to start with and
>                        maintain TCP connections to two different VMs
>                        at the same time."
>
>                        [This sounds like resilience logic has been
>                        written into the application, which would be a
>                        special case but not something VM mobility
>                        infrastructure could depend on.]
>
>                     [Linda] will add.
>
>                     ===#. Gaps===
>
>                     #. Security Considerations: repeats issues in
>                     other drafts that are not specific to mobility,
>                     but it does not mention any security issues
>                     specifically due to VM mobility. It says that
>                     address spoofing may arise in a DC (sort-of
>                     implying it is worse than in non-DC environments,
>                     but not saying why). The handshake at the start of
>                     a connection (e.g. TCP, SCTP, QUIC) checks for
>                     source address spoofing. So L3 VM mobility would
>                     be more vulnerable to source address spoofing in
>                     cases where the mobile VM was the connection
>                     initiator and there was not a new handshake after
>                     the move. However, this draft does not contain any
>                     detailed mobility protocols, so it is not possible
>                     to identify any specific security flaws.
>
>                     #. Transport Issues: Effect of delay on the
>                     transport: Cold mobility introduces significant
>                     delay, and other forms less, but still some delay.
>                     It should be pointed out that some applications
>                     (e.g. real-time) will therefore not be useful if
>                     subjected to VM mobility. Similarly, even a short
>                     period of delay will drive most congestion
>                     controls to severely reduce throughput. These
>                     points might be self-evident, but perhaps they
>                     should be stated explicitly.
>
>                     BTW, in the L3 VM mobility case, the draft often
>                     refers to TCP connections, but the address
>                     bindings of any transport protocols would have to
>                     be migrated due to VM mobility (e.g. SCTP;
>                     sequences of datagrams over UDP; streams over UDP
>                     such as with RTP, QUIC).
>
>                     #. Management Issues: perhaps the draft ought to
>                     recommend statistics gathering (e.g. time taken,
>                     amount of duplicate data) to aid a DC's future
>                     decisions on the cost-benefit of moving a VM. The
>                     OPSDIR review says a BCP does not /have/ to
>                     describe management issues, but this document
>                     seems to describe a whole system procedure, not
>                     just a protocol, which then surely includes the
>                     management plane.
>
>                     [Linda] can you become a co-author and add those in?
>
>                     ===#. Incoherent Structure===
>
>                     S.4.1. happens to talk about VMs moving, while
>                     S.4.2. happens to talk about tasks moving, but
>                     this is not the distinguishing aspect of these two
>                     sections (anyway, S.2. says "the draft uses task
>                     and VM interchangeably"):
>
>                     * "4.1 VM Migration" is about "L2 VM Mobility" so
>                     this ought to be the section heading,
>                     * "4.2 Task Migration" is about "L3 VM Mobility"
>                     so this ought to be the section heading.
>
>                     It would also help not to switch from VM to task
>                     across these sections - it's just a distraction.
>
>                     S.4.1 needs better signposting of where each
>                     sub-case ends (Subsections might be useful to
>                     solve this):
>
>                     * IPv4
>                     * end-user client
>                     * 2 paras starting "All NVEs communicating with
>                     this virtual machine..." [Not clear that the
>                     end-user case has ended and we have returned to
>                     the general IPv4 case?]
>                     * IPv6 [Strictly, it still hasn't said whether the
>                     end-user client case has ended.] [Also, it doesn't
>                     explain why there is no need for an end-user
>                     client case under IPv6?]
>
>                     Sections 5 & 6 seem to be about either L2 or L3
>                     mobility, whereas Sections 7 & 8 seem to be
>                     restricted to L2.
>
>                     The draft vacillates over what to do with packets
>                     arriving at the old NVE in the L3 case (see also
>                     L3 mobility above):
>
>                     * S.4.2 first says packets are dropped, possibly
>                     with an ICMP error message;
>
>                       o then later it says they are silently dropped;
>
>                       o then in the very next sentence it says either
>                       silently drop them or forward them to the new
>                       location
>
>                     * S.5 says they should not be lost, but instead
>                     delivered to the destination hypervisor
>
>                       o then it describes how they are tunnelled
>                       (which is not the same as "forwarding").
>
>                     The order in which all the stages of mobility are
>                     given is jumbled up across sections that also
>                     appear in arbitrary order:
>
>                     * S.5 prepares, establishes, uses then stops a
>                     tunnel, but it doesn't say where the other stages
>                     fit between these steps
>
>                             o When tunneling packets, it talks about
>                             the *migrating* VM not the *migrated* VM,
>                             which implies tunnelling has started before
>                             the new VM is running. Does this imply there
>                             is a huge buffer?
>
>                             o It says "Stop Tunneling Packets - When
>                             source NVE stops receiving packets destined
>                             to..." but it is never clear when a source
>                             has stopped sending packets to a
>                             destination, unless it explicitly closes the
>                             connection (e.g. with a FIN in the case of
>                             TCP). Often there are long gaps between
>                             packets, because many flows are 'thin'
>                             (meaning the application frequently has
>                             nothing to send). These gaps can last for
>                             milliseconds, hours or even days without any
>                             implication that the connection has ended.
>
>                     * Then S.6. describes moving state, but doesn't
>                     say that this is not after the previous tunnelling
>                     steps (or where it fits within those steps).
>
>                     * Then S.7 describes hot, warm and cold mobility,
>                     but doesn't lay out the tunnelling or steps to
>                     move state in each case.
>
>                     * Then S.8 says it's about VM life-cycle, but just
>                     gives the very first 3 steps for allocation of
>                     resources to a VM, then abruptly ends, without
>                     even starting the VM, let alone getting to move it.
>
>                     S.5 exhibits another inconsistency by talking
>                     about the hypervisor, not the NVE.
>
>                     ==#. Nits==
>
>                     Nits with the English are too numerous to mention
>                     them all. Below are pointers to general problems
>                     as well as some individual instances.
>
>                     S.4
>
>                       "Layer 2 and Layer 3 protocols are described
>                        next.  In the following sections, we examine
>                        more advanced features."
>
>                     s/following/subsequent/
>
>                     S.4.1
>
>                     Expand WSC, MSC and NVA on first use.
>
>                     s/the VM moves in the same link/the VM moves in
>                     the same subnet/
>
>                     "i.e. end-user clients ask for the same MAC
>                     address upon migration. [...] to ensure that the
>                     same IPv4 address is assigned to the VM." I think
>                     s/IPv4/MAC/ was intended?
>
>                     "  All NVEs communicating with this virtual
>                        machine uses the old ARP entry.  If any VM in
>                        those NVEs need to talk to the new VM in the
>                        destination NVE, it uses the old ARP entry."
>
>                     Repetition: these 2 sentences say the same. (The
>                     mistake is also repeated when these 2 sentences
>                     are repeated for IPv6).
>
>                     S.4.2.1
>
>                     s/Push the new mapping to hosts./Push the new
>                     mapping to communicating hosts./
>
>                     S.5.
>
>                     The IPv4/IPv6 pairs of paras for "tunnel
>                     establishment" and "tunneling packets" only differ
>                     in the words "IPv4"/"IPv6". So in each case a
>                     single para could be given for IP (irrespective of
>                     whether v4 or v6).
>
>                     Thank you very much.
>
>                     Linda Dunbar
>
>
>
>                     _______________________________________________
>
>                     Tsv-art mailing list
>
>                     Tsv-art@ietf.org  <mailto:Tsv-art@ietf.org>
>
>                     https://www.ietf.org/mailman/listinfo/tsv-art  <https://www.ietf.org/mailman/listinfo/tsv-art>
>
>
>
>                 -- 
>
>                 ________________________________________________________________
>
>                 Bob Briscoe                               http://bobbriscoe.net/
>
>
>
>     _______________________________________________
>
>     nvo3 mailing list
>
>     nvo3@ietf.org  <mailto:nvo3@ietf.org>
>
>     https://www.ietf.org/mailman/listinfo/nvo3
>
>
>

-- 
________________________________________________________________
Bob Briscoe                               http://bobbriscoe.net/