[nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-04

Bob Briscoe <ietf@bobbriscoe.net> Tue, 04 September 2018 02:44 UTC

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
From: Bob Briscoe <ietf@bobbriscoe.net>
To: tsv-art@ietf.org
Cc: nvo3@ietf.org, ietf@ietf.org, draft-ietf-nvo3-vmm.all@ietf.org
Auto-Submitted: auto-generated
Precedence: bulk
Message-ID: <153602909285.13281.13763046029400746910@ietfa.amsl.com>
Date: Mon, 03 Sep 2018 19:44:52 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/nvo3/BvOX98Yl54vMPqcIsBq3Cnr7usU>
Subject: [nvo3] Tsvart last call review of draft-ietf-nvo3-vmm-04

Reviewer: Bob Briscoe
Review result: Not Ready

I have been selected as the Transport Directorate reviewer for this draft. The
Transport Directorate seeks to review all transport or transport-related drafts
as they pass through IETF last call and IESG review, and sometimes on special
request. The purpose of the review is to provide assistance to the Transport
ADs. For more information about the Transport Directorate Reviews and the
Transport Area Review Team, please see
https://trac.ietf.org/trac/tsv/wiki/TSV-Directorate-Reviews

In this case, very very few of the review comments relate to transport issues,
although the greatest issue concerns a desire that the network could pause or
stop connections during L3 VM Mobility, which is certainly a transport issue.

==Summary==

The technical aspects of the draft concerning L2 VM mobility (within a subnet)
seem sound. However, this is only part of the draft, which has the following
issues:

#. The introduction does not say what the purpose of publishing this draft is.
It seems that, rather than describing a specific protocol or protocols, it
intends to describe the overall system procedure that would typically be used
in DCs for VM mobility. It is tagged as a BCP, but it does not say who needs
this BCP, why it is useful for the IETF to publish this BCP, how wide the
authors' knowledge is of current practice (given DCs are private), or why this
is a BCP rather than a protocol spec.

The draft starts out (S.3) as if it intends to say what a good VM Mobility
protocol should or shouldn't do, but the rest of the document doesn't give any
reasoning for these recommendations, it just asserts what appears to be one
view of how a whole VM Mobility system works, sometimes referring to one
example protocol RFC for a component part, but more often with no references or
details.

#. It does not seem as if the NVO WG has discussed the purpose of using
normative text in this draft. See detailed comments.

#. The draft silently slips back and forth between VM mobility and VM
redundancy, without recognizing the differences. See detailed comments.

#. Please adopt different terminology than "source NVE" and "destination NVE",
which are really poor choices of terms for an intermediate node. See detailed
comments. Why not use "old NVE" and "new NVE", which is what you mean?

#. Applicability is fairly clearly outlined, but it is not clear whether hosts
corresponding with the mobile VMs are part of the same controlled environment
or on the uncontrolled public Internet. See detailed comments.

#. Section 4.2.1 on L3 VM mobility reads like some potential
half-thought-through ideas on how to solve L3 mobility, rather than current
practice, let alone best current practice. Either current practice should be
described instead, or the scope of the draft should be narrowed solely to L2 VM
mobility. See detailed comments.

# The VM's file system is described as state that moves with the VM (S.6), but
VM mobility solutions often move the VM but stitch it back to its (unmoved)
storage. Conversely, the storage can also move independent of the VM.

#. The draft omits some of the security, transport and management aspects of VM
mobility. See detailed comments.

#. The draft reads as if different sections have been written by different
authors and no-one has edited the whole to give it a coherent structure, or to
ensure consistency (both technical and editorial) between the parts. See
detailed comments.

#. The quality of the English grammar does not allow a reviewer to concentrate
on the technical aspects rather than the English. It would have been useful if
one of the English-speaking co-authors had improved the English before
submission for review. See detailed comments.

==Detailed Comments==

===#. Normative statements===

In the body of the document, there is just one occurrence of normative text
(actually two "MUST"s, but both state a common requirement - just written
separately for IPv4 and IPv6). This merely serves to imply that everything else
the document says is less important or optional, which was probably not the
intention.

At the start there is a requirements section, which states what a VM Mobility
protocol "SHOULD" or "SHOULD NOT" do. I think this is intended as a set of
goals for the rest of the document. If so, these "SHOULDs" are not intended to
apply to implementations, so they ought not to be capitalized.

The first requirement, "Data center network SHOULD support virtual machine
mobility in IPv6", is written as a requirement on all DC networks, not on
implementations. I assume this was intended to read as "Data center network
virtual machine mobility protocols SHOULD support IPv6". Even then, it doesn't
really add anything to say VM mobility should support v6 and it should support
v4. A L2 solution won't. While undoubtedly, a L3 solution will at least support
one of them.

I'm not sure that 'protocol' is the right word anyway; I think 'VM Mobility
procedure' would be a better phrase, because it includes steps such as
suspending the VM, which is more than a protocol.

The requirement "Virtual machine mobility protocol MAY support host routes to
accomplish virtualization", is not followed up at all in the rest of the draft.
Even if this requirement stays, the last 3 words should be deleted.

By the end of the draft, the solution falls far short of the most relevant
"Requirements" anyway, so one assumes the title of the section ought to have
been "Goals". Specifically, even in the simpler case of L2 VM mobility, S.4.1
says that triangular routing and tunnelling persist "until a neighbour cache
entry times out". A cache timeout is about 10 orders of magnitude longer than
the requirement to only persist "while handling packets in flight", which would
be a few milliseconds at most (the time for packets to clear the network that
were already launched into flight when the old VM stopped).

Whatever, it would be preferable for the draft to give rationale for these
requirements, rather than just assert them. This would help to shed light on
the merits of the different trade offs that solutions choose.

===#. Mobility vs. Redundancy===

Redundancy and mobility have a lot of similarities, but they have different
goals. With mobility, it is necessary to know the exact instant when one set of
state is identical to the other so it can hand over. With redundancy, the aim
is to keep two (or more) sets of state evolving through the same sequence of
changes, but there is no need to know the point at which one is the same as the
other was at a certain point.

The draft slips from mobility to resilience in the following places:
* S.2. Terminology: Warm VM Mobility is defined without any ending, as if it is
permanent replication. * S.7. "Handling of Hot, Warm and Cold Virtual Machine
Mobility" is actually all about redundancy, and doesn't address mobility
explicitly.

===#. Terminology===

Packets run from the source at A to the destination at B via NVE1, then via
NVE2. Please don't call NVE1 and NVE2 the source NVE and the destination NVE.
In future, no-one will thank you for the apparent contradictions when they
continually stumble over phrases like this one in S.4.1: "...send their packets
to the source NVE".

The term "packets in flight" is used incorrectly to refer to all the packets
sent to the old NVE after the VM has moved, even if they were launched into
flight long after the old VM stopped receiving packets.

BTW, I think s/before/after/ in: "that have old ARP or neighbor cache entry
before VM or task migration".

I think: s/IP-based VM mobility/L3 VM mobility/ throughout, because "based"
sounds (to me) like the mobility control protocol is over (i.e. based on) IP.

===#. Applicability===

In section 4.2 it says that the protocol mostly used as the IP based task
migration protocol is ILA. This implies that all hosts corresponding with the
mobile VMs are either part of the same controlled environment, or they are
proxied via nodes that are part of the same controlled environment (I only have
passing knowledge of ILA, but I understand that it depends on ILA routers on
the path). If I am correct, this aspect of scope needs to be made clear from
the start.

Also under the heading of applicabiliy, the sentence "Since migrations should
be relatively rare events" appears very late in the document (S.4.2.1). The
assumed level of churn ought to be stated nearer the start.

===#. L3 Mobility===
L2 VM mobility is independent of the application, because resolution of L2
mappings is delegated to the stack. In contrast, L3 VM mobility is only
feasible under certain conditions, because an application needs an IP address
to open a socket (resolution of DNS names is not delegated to the stack, and
apps can use IP addresses directly anyway).

Examples of the 'certain conditions':
a) /All/ applications used in the whole DC load balancing scheme contain IP
address migration logic for /all/ their connections; b) VMs running solely
applications that support IP address migration register this fact with the NVA,
and it only select such VMs for mobility. c) An abstraction is layered over
/all/ the IP addresses exposed to applications (at both ends) so that the IP
addresses that applications use are solely identifiers  (e.g. ILA, LISP, HIP),
not also locators.

The introduction says the draft is about VM mobility in a multi-tenant DC, so
the DC admin will not know the range of applications being used. This excludes
condition (a) above. When the draft says "...if all applications running are
known to handle this gracefully...", it doesn't quantify just how restrictive
this condition is, and it gives no explanation of how this knowledge might be
'known' or which function within the system 'knows' it.

S.4.2.1 contains what seems like plenty of arm-waving.
* "TCP connections could be automatically closed in the network stack during a
migration event."
        o There is no TCP connection state in the network stack.
        o Even if the network starts to drop every packet, the TCP connection
        state persists in the end-points for a duration of the order of 30-90
        minutes (OS-dependent) before TCP deems the connection is broken. o
        Other transport protocols have similar designs (including the app-layer
        of protocols over UDP).
* "More involved approach to connection migration":
        o pausing the connection [does this refer to an actual feature of any
        L4 protocol?] o packaging connection state and sending to target [does
        this assume logic written into the application, or is this assuming the
        stack handles this and the app is restricted to using some form of
        separate identifier/locator addresses?] o instantiating connection
        state in the peer stack [ditto?].

There's some arm-waving in S.7 too:
  "Cold Virtual Machine mobility is facilitated by the VM initially
   sending an ARP or Neighbor Discovery message at the destination NVE
   but the source NVE not receiving any packets inflight."
   [How is it arranged for the source NVE not to receive any packets in flight?]

And in S.7:
  "In hot
   standby option, regarding TCP connections, one option is to start
   with and maintain TCP connections to two different VMs at the same
   time."
   [This sounds like resilience logic has been written into the application,
   which would be a special case but not something VM mobility infrastructure
   could depend on.]

===#. Gaps===
#. Security Considerations: repeats issues in other drafts that are not
specific to mobility, but it does not mention any security issues specifically
due to VM mobility. It says that address spoofing may arise in a DC (sort-of
implying it is worse than in non-DC environments, but not saying why). The
handshake at the start of a connection (e.g. TCP, SCTP, QUIC) checks for source
address spoofing. So L3 VM mobility would be more vulnerable to source address
spoofing in cases where the mobile VM was the connection initiator and there
was not a new handshake after the move. However, this draft does not contain
any detailed mobility protocols, so it is not possible to identify any specific
security flaws.

#. Transport Issues: Effect of delay on the transport: Cold mobility introduces
significant delay, and other forms less, but still some delay. It should be
pointed out that some applications (e.g. real-time) will therefore not be
useful if subjected to VM mobility. Similarly, even a short period of delay
will drive most congestion controls to severely reduce throughput. These points
might be self-evident, but perhaps they should be stated explicitly.

BTW, in the L3 VM mobility case, the draft often refers to TCP connections, but
the address bindings of any transport protocols would have to be migrated due
to VM mobility (e.g. SCTP; sequences of datagrams over UDP; streams over UDP
such as with RTP, QUIC).

#. Management Issues: perhaps the draft ought to recommend statistics gathering
(e.g. time taken, amount of duplicate data) to aid a DC's future decisions on
the cost-benefit of moving a VM. The OPSDIR review says a BCP does not /have/
to describe management issues, but this document seems to describe a whole
system procedure, not just a protocol, which then surely includes the
management plane.

===#. Incoherent Structure===

S.4.1. happens to talk about VMs moving, while S.4.2. happens to talk about
tasks moving, but this is not the distinguishing aspect of these two sections
(anyway, S.2. says "the draft uses task and VM interchangeably"): * "4.1 VM
Migration" is about "L2 VM Mobility" so this ought to be the section heading, *
"4.2 Task Migration" is about "L3 VM Mobility" so this ought to be the section
heading. It would also help not to switch from VM to task across these sections
- it's just a distraction.

S.4.1 needs better signposting of where each sub-case ends (Subsections might
be useful to solve this): * IPv4 * end-user client * 2 paras starting "All NVEs
communicating with this virtual machine..." [Not clear that the end-user case
has ended and we have returned to the general IPv4 case?] * IPv6 [Strictly, it
still hasn't said whether the end-user client case has ended.] [Also, it
doesn't explain why there is no need for an end-user client case under IPv6?]
Sections 5 & 6 seem to be about either L2 or L3 mobility, whereas Sections 7 &
8 seem to be restricted to L2.

The draft vacillates over what to do with packets arriving at the old NVE in
the L3 case (see also L3 mobility above): * S4.2 first says packets are
dropped, possibly with an ICMP error message;
  o then later it says they are silently dropped;
  o then in the very next sentence it says either silently drop them or forward
  them to the new location
* S.5 says they should not be lost, but instead delivered to the destination
hypervisor
  o then it describes how they are tunnelled (which is not the same as
  "forwarding").

The order in which all the stages of mobilty are given is jumbled up across
sections that also appear in arbitrary order: * S.5 prepares, establishes uses
then stops a tunnel, but it doesn't say where the other stages fit between
these steps
        o When tunneling packets, it talks about the *migrating* VM not the
        *migrated* VM, which implies tunnelling has started before the new VM
        is running. Does this imply there is a huge buffer? o It says "Stop
        Tunneling Packets - When source NVE stops receiving packets destined
        to..." but it is never clear when a source has stopped sending packets
        to a destination, unless it explicitly closes the connection (e.g. with
        a FIN in the case of TCP). Often there are long gaps between packets,
        because many flows are 'thin' (meaning the application frequently has
        nothing to send). These gaps can last for milliseconds, hours or even
        days without any implication that the connection has ended.
* Then S.6. describes moving state, but doesn't say that this is not after the
previous tunnelling steps (or where it fits within those steps). * Then S.7
describes hot, warm and cold mobility, but doesn't lay out the tunnelling or
steps to move state in each case. * Then S.8 says it's about VM life-cycle, but
just gives the very first 3 steps for allocation of resources to a VM, then
abruptly ends, without even starting the VM, let alone getting to move it.

S.5 exhibits another inconsistency by talking about the hypervisor, not the NVE.

==#. Nits==

Nits with the English are too numerous to mention them all. Below are pointers
to general problems as well as some individual instances.

S.4
  "Layer 2 and Layer 3 protocols are described next.  In the following
   sections, we examine more advanced features."
        s/following/subsequent/

S.4.1
Expand WSC, MSC and NVA on first use.

s/the VM moves in the same link/the VM moves in the same subnet/

"i.e. end-user clients ask for the same MAC address upon migration. [...] to
ensure that the same IPv4 address is assigned to the VM." I think s/IPv4/MAC/
was intended?

"  All NVEs communicating with this virtual machine uses the old ARP
   entry.  If any VM in those NVEs need to talk to the new VM in the
   destination NVE, it uses the old ARP entry."
Repetition: these 2 sentences say the same. (The mistake is also repeated when
these 2 sentences are repeated for IPv6).

S.4.2.1
s/Push the new mapping to hosts./Push the new mapping to communicating hosts./

S.5.
The IPv4/IPv6 pairs of paras for "tunnel estabilshment" and "tunneling packets"
only differ in the words "IPv4"/"IPv6". So in each case a single para could be
given for IP (irrespective of whether v4 or v6).

[nvo3] Tsvart last call review of draft-ietf-nvo3… Bob Briscoe
Re: [nvo3] Tsvart last call review of draft-ietf-… Linda Dunbar
Re: [nvo3] Tsvart last call review of draft-ietf-… Black, David
Re: [nvo3] Tsvart last call review of draft-ietf-… Linda Dunbar
Re: [nvo3] Tsvart last call review of draft-ietf-… Behcet Sarikaya
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Spencer Dawkins at IETF
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Black, David
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Bob Briscoe
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Bob Briscoe
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Behcet Sarikaya
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Bob Briscoe
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Behcet Sarikaya
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Bob Briscoe
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Bocci, Matthew (Nokia - GB)
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Linda Dunbar
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Bob Briscoe
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Linda Dunbar
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Bob Briscoe
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Black, David
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Linda Dunbar
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Black, David
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Linda Dunbar
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Bob Briscoe
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Linda Dunbar
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Black, David
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Linda Dunbar
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Black, David
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Linda Dunbar
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Black, David
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Linda Dunbar
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Black, David
Re: [nvo3] [Tsv-art] Tsvart last call review of d… Linda Dunbar