[Tsv-art] Tsvart last call review of draft-ietf-detnet-architecture-08

Michael Scharf <michael.scharf@hs-esslingen.de> Fri, 28 September 2018 22:24 UTC

To: tsv-art@ietf.org
Cc: draft-ietf-detnet-architecture.all@ietf.org, detnet@ietf.org, ietf@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-art/PrhGhvCNYoAAbEBY8fc73dAfKIQ>

Reviewer: Michael Scharf
Review result: Ready with Issues

The document "Deterministic Networking Architecture"
(draft-ietf-detnet-architecture-08) defines an overall framework for
Deterministic Networking.

As TSV-ART reviewer, I believe that this document has issues as detailed below.

Michael

Major issues:

* It seems that DetNet cannot easily be deployed in the Internet without
additional means. Thus, for a baseline document, one could expect some
explanation on the requirements of deploying DetNet in a network. DetNet
basically requires support in (almost) all network devices transporting DetNet
traffic. That assumption should be explicitly spelt out early in the document,
e.g., in the introduction. There also needs to be an explicit discussion of the
implications if parts of the network are not aware of, or do not support,
DetNet. There is some text in Section 4.2.2 and Section 4.3.3, but I believe
additional explicit discussion is needed at a prominent place. For instance,
can use of DetNet do harm to parts of a network that do not support DetNet? As
a side note, when TCPM
published RFC 8257, the following disclaimer was added: "DCTCP, as described in
this specification, is applicable to deployments in controlled environments
like data centers, but it must not be deployed over the public Internet without
additional measures." I wonder if a similar disclaimer is needed for DetNet. If
there is an implicit assumption that DetNet will be used in homogeneous
environments with mostly DetNet-aware devices within the same organization,
such an assumption should be made explicit.

* It is surprising that there is hardly any discussion on network robustness
and safety; this probably also relates to security. For instance,
misconfiguration of, or errors in, functions performing packet replication could
severely and permanently congest a network and cause harm. How does the DetNet
architecture ensure that a network stays fully operational, e.g., if the topology
changes or there are equipment failures? Probably this can be solved by
implementations (e.g., dynamic control plane), but why are corresponding
requirements not spelt out? Section 3.3.2 speculates that filters and policers
can help, and that may be true, but that probably still assumes consistently
and correctly configured (and well-behaving) devices. And Section 3.3.2 is
vague and mentions an "infinite variety of possible failures" without stating
any requirements or recommendations. There may be further solutions, such as
circuit breakers and the like. Why are such topics not discussed?

* Somewhat related, the document only looks at the impact of failures on the QoS
of DetNet traffic. What is missing is a discussion of how to protect non-DetNet
parts of a network from any harm caused by DetNet mechanisms. Solutions to this
probably exist. But why is the impact on non-DetNet traffic (e.g., in case of
topology changes or failures of DetNet functions) not discussed at all in the
document?

* Regarding security, an architecture like DetNet probably requires that only
authenticated and authorized end systems have access to the data plane. The
security considerations only briefly mention the control aspect ("the
authentication and authorization of the controlling systems").

* For an architecture document, the lack of clarity and consistency regarding
terminology is concerning. This specifically applies to the case of incomplete
networks (as per Section 4.2.2 and 4.3.3) that include "DetNet-unaware nodes".
The document introduces terms such as "DetNet intermediate nodes" but then
repeatedly uses generic terms such as "node" or "hop" that may include
DetNet-unaware nodes. For instance, for incomplete networks, a sentence such as
"The primary means by which DetNet achieves its QoS assurances is to reduce, or
even completely eliminate, congestion within a node as a cause of packet loss"
seems to only apply to "DetNet transit nodes" but not "DetNet-unaware nodes".
Similar ambiguity exists for other uses of the terms "hop" and "node", which may
or may not include DetNet-unaware nodes. It is unclear why the document does
not consistently use the terminology introduced in Section 2.1 in all sections
and clearly distinguish cases with and without DetNet support.

* Section 4.4 refers to RFC 7426, which is an informational RFC on the IRTF stream,
and the document uses the concepts introduced there (e.g., "planes"). This is
very confusing. First, an IETF Proposed Standard should probably refer to
documents having IETF consensus. An example would be RFC 7491, albeit there is
other related work as well, e.g., in the TEAS WG. Second, Section 4.4 is by and
large decoupled from the rest of the document and not specific to DetNet.
Neither do other sections of the document refer to the concepts introduced in
Section 4.4, nor does Section 4.4 use the DetNet terminology or discuss
applicability to DetNet. Section 4.4 even mentions explicitly at the end that
it discusses aspects that are orthogonal to the DetNet architecture. It is not
at all clear why Section 4.4 is in this document. Section 4.4 could be removed
from the document without impacting the rest of the document.

Minor issues:

* Terminology "DetNet transport layer"

  The term "transport layer" has a well-defined meaning in the IETF, e.g.,
  originating from RFC 1122. While "transport" and, e.g., "transport network" are
  used in the IETF for different technologies in different areas, I think the
  term "transport layer" is typically understood to refer to transport
  protocols such as TCP and UDP. As such, I personally find the term "DetNet
  transport layer" misleading and confusing. The confusion is easy to see e.g.
  in Figure 4, where UDP (which is a transport protocol as per RFC 1122) sits
  on top of "transport".

  Based on the document it also may be solution/implementation specific whether
  the "DetNet transport layer" is actually a separate protocol layer compared
  to the "DetNet service layer". Thus it is not clear to me why the word
  "layer" has to be used, specifically in combination "transport layer".

  To me, the term "transport layer" (and "transport protocol") should be
  used for protocols defined in TSV area, consistent with RFC 1122. But this is
  probably a question to be sorted out by the IESG.

* Page 9

   A DetNet node may have other resources requiring allocation and/or
   scheduling,

  This is just one of several examples for inconsistent use of terminology.
  What is a "DetNet node"? That term is not introduced in Section 2.1.

* Page 14

   A DetNet network supports the dedication of a high proportion (e.g.
   75%) of the network bandwidth to DetNet flows.

  No rationale is given for the 75% value. What prevents using 99% of the bandwidth for
  DetNet traffic?
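
  To make the question concrete, here is a minimal sketch of an admission
  check with a configurable cap. The function name, parameters, and numbers
  are hypothetical and not taken from the draft. The arithmetic itself accepts
  any cap, which is why the draft should state what the chosen value is meant
  to protect.

```python
def admit(reserved_bps, request_bps, link_bps, max_percent):
    """Return True if a new reservation keeps DetNet traffic within its cap.

    Hypothetical illustration: max_percent is the share of link bandwidth
    that may be dedicated to DetNet flows (75 in the draft's example).
    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return (reserved_bps + request_bps) * 100 <= max_percent * link_bps


link = 1_000_000_000        # 1 Gb/s link (assumed value)
reserved = 700_000_000      # already reserved for DetNet flows
request = 100_000_000       # new flow asking for 100 Mb/s

print(admit(reserved, request, link, 75))  # rejected with a 75% cap
print(admit(reserved, request, link, 99))  # accepted with a 99% cap
```

  The same request is rejected or admitted depending only on the cap, so the
  value is a policy choice whose rationale belongs in the text.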

* Page 15: Figure 2

  If the term "transport layer" cannot be avoided, the labels in this figure
  should at least be expanded to "DetNet transport layer".

* Page 18: Figure 4

  As already mentioned earlier, Figure 4 is confusing. UDP is a transport
  protocol. If the term "transport" cannot be avoided, the labels in this
  figure should at least be expanded to "DetNet transport".

* Page 23

   If the source transmits less data than this limit
   allows, the unused resource such as link bandwidth can be made
   available by the system to non-DetNet packets.

  Could there be additional requirements on the use of unused resources by
  non-DetNet packets, e.g., regarding preemption? I am just wondering... If
  that were possible, a statement like "... can be made available by the system
  to non-DetNet packets as long as all guarantees are fulfilled" would be on
  the safe side, no?

* Page 27:

   DetNet achieves congestion protection and bounded delivery latency by
   reserving bandwidth and buffer resources at every hop along the path
   of the DetNet flow.

  Why does this sentence use the word "hop"? As far as I understand, in DetNet
  bandwidth and buffer resources are reserved in each DetNet intermediate node.
  If there were hops over IP routers not being DetNet intermediate nodes, no
  resources would be reserved there. As per Section 4.3.3, it is possible to
  deploy DetNet this way. And obviously there can be resource bottlenecks below
  IP, on devices that are not routers... So does "hop" here refer to IP router
  hops or also to devices not processing IP (or IP/MPLS)?
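
  The ambiguity can be shown with a small sketch. The node names and the
  detnet_aware flag are hypothetical. The sketch assumes, as I read Section
  4.3.3, that a path may contain devices that are not DetNet intermediate
  nodes.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    detnet_aware: bool  # hypothetical flag: does this device reserve resources?


def reserving_nodes(path):
    """Devices on the path where bandwidth and buffers can be reserved."""
    return [hop.name for hop in path if hop.detnet_aware]


def unprotected_hops(path):
    """Devices on the path that forward the flow without any reservation."""
    return [hop.name for hop in path if not hop.detnet_aware]


path = [
    Hop("detnet-relay-1", True),
    Hop("ip-router", False),    # IP hop that is not a DetNet node
    Hop("l2-switch", False),    # device below IP, invisible as an IP hop
    Hop("detnet-relay-2", True),
]

print(reserving_nodes(path))
print(unprotected_hops(path))
```

  Under this reading, "every hop" holds only if the second list is empty,
  which is exactly the case the sentence should state explicitly.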

* Page 27:

   Standard queuing and transmission selection algorithms allow a
   central controller to compute the latency contribution of each
   transit node to the end-to-end latency, ...

  The text does not explain why a _central_ controller is needed for this
  computation. Why would a distributed control plane not be able to realize
  this computation? Isn't this implementation-specific?
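
  As a minimal sketch of why the computation looks implementation-specific,
  assume that the end-to-end bound is simply the sum of per-node
  contributions. The names and values below are hypothetical, and the additive
  model is my simplification, not something the draft specifies.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    latency_bound_ns: int  # hypothetical worst-case contribution of this node


def central_bound(path):
    """A controller with a global view sums all contributions at once."""
    return sum(node.latency_bound_ns for node in path)


def distributed_bound(path):
    """Each node adds its own contribution to a value carried hop by hop,
    as a signalling message could do in a distributed control plane."""
    accumulated = 0
    for node in path:
        accumulated += node.latency_bound_ns
    return accumulated


path = [Node("A", 12_000), Node("B", 8_500), Node("C", 20_000)]

print(central_bound(path))
print(distributed_bound(path))
```

  Both functions return the same bound under this assumption, so the sentence
  could refer to "the control plane" without prescribing a central controller.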

* Page 32

  To somebody who is not deeply familiar with DetNet, it is impossible to parse
  the description of the examples in Section 4.7.3. For instance, "VID +
  multicast MAC address" is not introduced. I think this example must be
  expanded with additional context and explanation to be useful to readers.

* Page 34

   There are three classes of information that a central controller or
   distributed control plane needs to know that can only be obtained
   from the end systems and/or nodes in the network.

  Wouldn't it be sufficient to state "Provisioning of DetNet requires knowledge
  about ..."? Does it matter in this context whether the provisioning is done
  by a central controller or a distributed control plane? For instance, could
  the same paragraph also apply to a network that uses _multiple_ central
  controllers, or hybrid combinations of central controllers and distributed
  control planes? In general, an architecture document should be agnostic to
  implementation aspects unless there is a specific need. In this specific
  case, I fail to see a need to discuss the realization of the control plane of
  a network.

Editorial nits:

* Page 9:

   The low-level mechanisms described in Section 4.5 provide the
   necessary regulation of transmissions by an end system or
   intermediate node to provide congestion protection.  The allocation
   of the bandwidth and buffers for a DetNet flow requires provisioning
   A DetNet node may have other resources requiring allocation and/or
   scheduling, that might otherwise be over-subscribed and trigger the
   rejection of a reservation.

  Probably a full stop is missing after "provisioning".

* Page 11: "... along separate (disjoint non-SRLG) paths ..."

  I find this confusing. I would understand e.g. "along separate
  (SRLG-disjoint) paths".

* Page 34:

   When using a peer-
   to-peer control plane, some of this information may be required by a
   system's neighbors in the network.

  Would "acquired" be a better term?

* Page 34:

   o  The identity of the system's neighbors, and the characteristics of
      the link(s) between the systems, including the length (in
      nanoseconds) of the link(s).

  "Latency" or "delay" would probably be better terms if the value is
  measured in nanoseconds.

* Page 35:

   DetNet is provides a Quality of Service (QoS), and as such, does not
   directly raise any new privacy considerations.

  Broken sentence ("is provides").

* Please expand acronyms on first use (e.g., OTN)