Re: [Tsv-art] Tsvart last call review of draft-ietf-detnet-architecture-08

"Scharf, Michael" <Michael.Scharf@hs-esslingen.de> Mon, 19 November 2018 22:15 UTC

From: "Scharf, Michael" <Michael.Scharf@hs-esslingen.de>
To: Lou Berger <lberger@labn.net>
CC: "tsv-art@ietf.org" <tsv-art@ietf.org>, "draft-ietf-detnet-architecture.all@ietf.org" <draft-ietf-detnet-architecture.all@ietf.org>, "detnet@ietf.org" <detnet@ietf.org>
Date: Mon, 19 Nov 2018 22:15:22 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-art/JN58twOOt7ZPMIrmns8wEmq7NyA>

Lou,

> -- 
> I wanted to take a step back from the multiple discussions that were
> spawned by your review -- from a doc shepherd perspective, and see where
> we are.  I know that the authors have sent a -09 version that addresses
> some, but not all issues.
>
> From the exchanges I've seen, I think the key remaining issues are
> related to:
> 
> (a) possibly introduction of congestion in the general internet if
> packets were somehow to escape a detnet domain.  The source of this
> congestion would be inelastic traffic using DetNet or due to congestion
> loss that is masked by PREOF.

These are two major issues that need to be addressed. Note that it may not be sufficient just to add a section on operational and deployment considerations. The existing text in the document will also need to be aligned with normative guidance on how to avoid congestion collapse.

In -09, one example would be Section 3.1. "Primary goals defining the DetNet QoS"

   Congestion protection operates by allocating resources along the path
   of a DetNet flow, e.g., buffer space or link bandwidth.  Congestion
   protection greatly reduces, or even eliminates entirely, packet loss
   due to output packet congestion within the network, but it can only
   be supplied to a DetNet flow that is limited at the source to a
   maximum packet size and transmission rate.  Note that congestion
   protection provided via congestion detection and notification is
   explicitly excluded from consideration in DetNet, as it serves a
   different set of applications.

At a minimum, the last sentence would conflict with a better discussion of congestion in the document; it could simply be removed. In any case, its current wording is not correct, as the IETF term for what it describes is "congestion control".
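To illustrate the condition in the quoted text that a flow be "limited at the source to a maximum packet size and transmission rate", here is a minimal sketch of a token-bucket check. It is not taken from the draft: the class name, the parameters, and the choice of a token bucket are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucketPolicer:
    """Hypothetical source-side limiter (illustration only, not from the draft)."""
    rate_bytes_per_s: float   # assumed committed rate of the flow
    burst_bytes: float        # assumed bucket depth (largest tolerated burst)
    max_packet_bytes: int     # assumed maximum packet size for the flow
    tokens: float = field(init=False)
    last_time: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.burst_bytes  # start with a full bucket

    def conforms(self, now: float, packet_bytes: int) -> bool:
        """Return True if a packet of this size may be sent at time `now` (seconds)."""
        if packet_bytes > self.max_packet_bytes:
            return False  # violates the maximum packet size
        # Refill tokens for the elapsed time, capped at the bucket depth.
        elapsed = max(0.0, now - self.last_time)
        self.tokens = min(self.burst_bytes,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        self.last_time = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # violates the transmission rate


policer = TokenBucketPolicer(rate_bytes_per_s=1000.0, burst_bytes=1500.0,
                             max_packet_bytes=1500)
print(policer.conforms(0.0, 1500))  # a full-size packet drains the bucket
print(policer.conforms(0.0, 100))   # no tokens left at the same instant
```

A check of this kind only describes what a well-behaved source does; it says nothing about what happens when the limit is misconfigured or bypassed, which is the case my comments are concerned with.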

Another example would be Section 3.2.1.1. "Eliminate congestion loss"
 
   The primary means by which DetNet achieves its QoS assurances is to
   reduce, or even completely eliminate, congestion within a DetNet node
   as a cause of packet loss.  This can be achieved only by the
   provision of sufficient buffer storage at each node through the
   network to ensure that no packets are dropped due to a lack of buffer
   storage.  Note that a DetNet flow cannot be throttled, i.e., its
   transmission rate cannot be reduced via explicit congestion
   notification.

This section IMHO has to include a discussion of what happens in the (not expected) case that packets get dropped or that ECN marks are received. It is understood that this would not happen in normal operation of a DetNet network, but I believe just considering the error-free operation of a DetNet network is not sufficient for this document. At least with respect to the risk of traffic escaping from a DetNet network, it is inherently not sufficient to assume that the DetNet network is always error-free.
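As one possible illustration of a reaction to unexpected drops or ECN marks, a per-flow circuit breaker could be sketched as follows. This is loosely in the spirit of the circuit breakers mentioned in my review, not a mechanism defined by the draft; the class name, counters, and thresholds are hypothetical.

```python
class FlowCircuitBreaker:
    """Hypothetical per-flow breaker (illustration only, not from the draft)."""

    def __init__(self, max_bad_ratio: float, min_packets: int) -> None:
        self.max_bad_ratio = max_bad_ratio  # tolerated share of lost or marked packets
        self.min_packets = min_packets      # do not judge an interval with fewer packets
        self.tripped = False

    def report_interval(self, sent: int, lost: int, ecn_marked: int) -> bool:
        """Feed counters for one measurement interval.

        Returns True if the flow may keep sending, False once the breaker has tripped.
        """
        if self.tripped:
            return False
        if sent >= self.min_packets:
            bad_ratio = (lost + ecn_marked) / sent
            if bad_ratio > self.max_bad_ratio:
                self.tripped = True  # stays tripped until reset() is called
        return not self.tripped

    def reset(self) -> None:
        """Manual re-arm, e.g. after an operator has fixed the underlying fault."""
        self.tripped = False


breaker = FlowCircuitBreaker(max_bad_ratio=0.01, min_packets=100)
print(breaker.report_interval(sent=1000, lost=0, ecn_marked=0))   # normal operation
print(breaker.report_interval(sent=1000, lost=50, ecn_marked=0))  # 5% loss trips it
```

The breaker deliberately stays tripped until it is reset, so that inelastic traffic does not resume on its own while the fault persists. Whether such a mechanism is appropriate for DetNet is for the WG to decide.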

As a result, addressing my concerns will most likely require editing several parts of the document.

In addition, I'd like to emphasize that my review comment "It is surprising that there is hardly any discussion on network robustness and safety" covers more than just inelastic traffic that escapes from a DetNet network and masking of packet loss. Given that DetNet traffic may be extremely critical traffic, I really wonder why the document doesn't emphasize more the required robustness against failures *inside* the DetNet network as well as counter-measures. But this is something the WG needs to decide. As TSV-ART reviewer, I will be fine if the document clearly describes how the impact of failures will be isolated inside the DetNet network and will not put the general Internet at risk.


> (b) The use of the term 'transport' in DetNet to refer to what is
> basically a Traffic Engineered sub-network layer, such as is provided
> with MPLS-TE or Optical Transport Networks.

In the Internet architecture, the term 'transport' refers to Internet transport protocols. I doubt that the document can avoid discussing the implications of and interactions with Internet transport protocols such as UDP or TCP. As a result, I disagree that the document can use the term 'transport' to refer to traffic engineered sub-network layers.

From a TSV-ART point of view, the document can either only use the term "transport" for Internet transport protocols and use another term for sub-network layers (as handled in the *routing* area of the IETF), or the document has to clearly distinguish between the Internet transport layer and other uses of the term "transport" and explain the overlap. I believe the former would be less confusing, but I will leave it up to the TSV ADs to discuss terminology overlap in the IESG. As TSV-ART reviewer I insist that the document uses the terms "transport layer" and "transport protocol" only when referring to the Internet transport layer.


> Do you have any other issues that are critical to be addressed
> before this work moves forward?  If so which?

Regarding Section 4.4 I have already deferred the discussion to the IESG. The TSV-ART review comment is that the IESG needs to carefully look at the concepts, terminology, and references in section 4.4.

Regarding my other comments, I acknowledge that -09 is a step forward. But given the cross-dependencies, e.g. regarding terminology and definitions, I will need to read the text completely once there is a proposal for how to address my review. As noted in my review, I believe the document must use terminology clearly and consistently. As an example, a statement in -09 such as "Network nodes supporting DetNet flows have to implement some of the DetNet capabilities (not necessarily all) in order to treat DetNet flows such that their QoS requirements are met" is IMHO too vague. But in such cases it depends on whether there is precise normative guidance elsewhere. And this requires looking at the text as a whole.

Best regards

Michael 



> 
> Thank you,
> 
> Lou
> 
> On 9/28/2018 6:24 PM, Michael Scharf wrote:
> > Reviewer: Michael Scharf
> > Review result: Ready with Issues
> >
> > The document "Deterministic Networking Architecture"
> > (draft-ietf-detnet-architecture-08) defines an overall framework for
> > Deterministic Networking.
> >
> > As TSV-ART reviewer, I believe that this document has issues as
> > detailed below.
> >
> > Michael
> >
> > Major issues:
> >
> > * It seems that DetNet cannot easily be deployed in the Internet without
> > additional means. Thus, for a baseline document, one could expect some
> > explanation on the requirements of deploying DetNet in a network. DetNet
> > basically requires support in (almost) all network devices transporting
> > DetNet traffic. That assumption should be explicitly spelt out early in
> > the document, e.g., in the introduction. There also needs to be an
> > explicit discussion of the implications if not the whole network is aware
> > of or supports DetNet. There is some text in Section 4.2.2 and Section
> > 4.3.3, but I believe additional explicit discussion is needed at a
> > prominent place. For instance, can use of DetNet do harm to parts of a
> > network not supporting DetNet? As a side note, when TCPM published RFC
> > 8257, the following disclaimer was added: "DCTCP, as described in this
> > specification, is applicable to deployments in controlled environments
> > like data centers, but it must not be deployed over the public Internet
> > without additional measures." I wonder if a similar disclaimer is needed
> > for DetNet. If there is an implicit assumption that DetNet will be used
> > in homogeneous environments with mostly DetNet-aware devices within the
> > same organization, such an assumption should be made explicit.
> >
> > * It is surprising that there is hardly any discussion on network
> > robustness and safety; this probably also relates to security. For
> > instance, misconfiguration or errors of functions performing packet
> > replication could severely and permanently congest a network and cause
> > harm. How does the DetNet architecture ensure that a network stays fully
> > operational e.g. if the topology changes or there are equipment failures?
> > Probably this can be solved by implementations (e.g., dynamic control
> > plane), but why are corresponding requirements not spelt out? Section
> > 3.3.2 speculates that filters and policers can help, and that may be
> > true, but that probably still assumes consistently and correctly
> > configured (and well-behaving) devices. And Section 3.3.2 is vague and
> > mentions an "infinite variety of possible failures" without stating any
> > requirements or recommendations. There may be further solutions, such as
> > circuit breakers and the like. Why are such topics not discussed?
> >
> > * Somewhat related, the document only looks at impact of failures to the
> > QoS of DetNet traffic. What is missing is a discussion how to protect
> > non-DetNet parts of a network from any harm caused by DetNet mechanisms.
> > Solutions to this probably exist. But why is the impact on non-DetNet
> > traffic (e.g., in case of topology changes or failures of DetNet
> > functions) not discussed at all in the document?
> >
> > * Regarding security, an architecture like DetNet probably requires that
> > only authenticated and authorized end systems have access to the data
> > plane. The security considerations only briefly mention the control
> > aspect ("the authentication and authorization of the controlling
> > systems").
> >
> > * For an architecture document, the lack of clarity and consistency
> > regarding terminology is concerning. This specifically applies to the
> > case of incomplete networks (as per Section 4.2.2 and 4.3.3) that include
> > "DetNet-unaware nodes". The document introduces terms such as "DetNet
> > intermediate nodes" but then repeatedly uses generic terms such as "node"
> > or "hop" that may include DetNet-unaware nodes. For instance, for
> > incomplete networks, a sentence such as "The primary means by which
> > DetNet achieves its QoS assurances is to reduce, or even completely
> > eliminate, congestion within a node as a cause of packet loss" seems to
> > only apply to "DetNet transit nodes" but not "DetNet-unaware nodes".
> > Similar ambiguity exists for other uses of the terms "hop" and "node",
> > which may or may not include DetNet-unaware nodes. It is unclear why the
> > document does not consistently use the terminology introduced in Section
> > 2.1 in all sections and clearly distinguish cases with and without DetNet
> > support.
> >
> > * Section 4.4 refers to RFC 7426, which is an informational RFC on the
> > IRTF stream, and the document uses the concepts introduced there (e.g.,
> > "planes"). This is very confusing. First, an IETF Proposed Standard
> > should probably refer to documents having IETF consensus. An example
> > would be RFC 7491, albeit there is other related work as well, e.g., in
> > the TEAS WG. Second, Section 4.4 is by and large decoupled from the rest
> > of the document and not specific to DetNet. Neither do other sections of
> > the document refer to the concepts introduced in Section 4.4, nor does
> > Section 4.4 use the DetNet terminology or discuss applicability to
> > DetNet. Section 4.4 even mentions explicitly at the end that it discusses
> > aspects that are orthogonal to the DetNet architecture. It is not at all
> > clear why Section 4.4 is in this document. Section 4.4 could be removed
> > from the document without impacting the rest of the document.
> >
> > Minor issues:
> >
> > * Terminology "DetNet transport layer"
> >
> >    The term "transport layer" has a well-defined meaning in the IETF, e.g.
> >    originating from RFC 1122. While "transport" and e.g. "transport
> >    network" is used in the IETF for different technologies in different
> >    areas, I think the term "transport layer" is typically understood to
> >    refer to transport protocols such as TCP and UDP. As such, I personally
> >    find the term "DetNet transport layer" misleading and confusing. The
> >    confusion is easy to see e.g. in Figure 4, where UDP (which is a
> >    transport protocol as per RFC 1122) sits on top of "transport".
> >
> >    Based on the document it also may be solution/implementation specific
> >    whether the "DetNet transport layer" is actually a separate protocol
> >    layer compared to the "DetNet service layer". Thus it is not clear to
> >    me why the word "layer" has to be used, specifically in the combination
> >    "transport layer".
> >
> >    To me, the terms "transport layer" (and "transport protocol") should be
> >    used for protocols defined in TSV area, consistent with RFC 1122. But
> >    this is probably a question to be sorted out by the IESG.
> >
> > * Page 9
> >
> >     A DetNet node may have other resources requiring allocation and/or
> >     scheduling,
> >
> >    This is just one of several examples for inconsistent use of
> >    terminology. What is a "DetNet node"? That term is not introduced in
> >    Section 2.1.
> >
> > * Page 14
> >
> >     A DetNet network supports the dedication of a high proportion (e.g.
> >     75%) of the network bandwidth to DetNet flows.
> >
> >    The 75% value is not reasoned. What prevents using 99% of the bandwidth
> >    for DetNet traffic?
> >
> > * Page 15: Figure 2
> >
> >    If the term "transport layer" cannot be avoided, the labels in this
> >    figure should at least be expanded to "DetNet transport layer".
> >
> > * Page 18: Figure 4
> >
> >    As already mentioned earlier, Figure 4 is confusing. UDP is a transport
> >    protocol. If the term "transport" cannot be avoided, the labels in this
> >    figure should at least be expanded to "DetNet transport".
> >
> > * Page 23
> >
> >     If the source transmits less data than this limit
> >     allows, the unused resource such as link bandwidth can be made
> >     available by the system to non-DetNet packets.
> >
> >    Could there be additional requirements on the use of unused resources
> >    by non-DetNet packets, e.g., regarding preemption? I am just
> >    wondering... If that was possible, a statement like "... can be made
> >    available by the system to non-DetNet packets as long as all guarantees
> >    are fulfilled" would be on the safe side, no?
> >
> > * Page 27:
> >
> >     DetNet achieves congestion protection and bounded delivery latency by
> >     reserving bandwidth and buffer resources at every hop along the path
> >     of the DetNet flow.
> >
> >    Why does this sentence use the word "hop"? As far as I understand, in
> >    DetNet bandwidth and buffer resources are reserved in each DetNet
> >    intermediate node. If there were hops over IP routers not being DetNet
> >    intermediate nodes, no resources would be reserved there. As per
> >    Section 4.3.3, it is possible to deploy DetNet this way. And obviously
> >    there can be resource bottlenecks below IP, on devices that are not
> >    routers... So does "hop" here refer to IP router hops or also to
> >    devices not processing IP (or IP/MPLS)?
> >
> > * Page 27:
> >
> >     Standard queuing and transmission selection algorithms allow a
> >     central controller to compute the latency contribution of each
> >     transit node to the end-to-end latency, ...
> >
> >    The text does not explain why a _central_ controller is needed for this
> >    computation. Why would a distributed control plane not be able to
> >    realize this computation? Isn't this implementation-specific?
> >
> > * Page 32
> >
> >    To somebody who is not deeply familiar with DetNet, it is impossible to
> >    parse the description of the examples in Section 4.7.3. For instance,
> >    "VID + multicast MAC address" is not introduced. I think this example
> >    must be expanded with additional context and explanation to be useful
> >    to readers.
> >
> > * Page 34
> >
> >     There are three classes of information that a central controller or
> >     distributed control plane needs to know that can only be obtained
> >     from the end systems and/or nodes in the network.
> >
> >    Wouldn't it be sufficient to state "Provisioning of DetNet requires
> >    knowledge about ..."? Does it matter in this context whether the
> >    provisioning is done by a central controller or a distributed control
> >    plane? For instance, could the same paragraph also apply to a network
> >    that uses _multiple_ central controllers, or hybrid combinations of
> >    central controllers and distributed control planes? In general, an
> >    architecture document should be agnostic to implementation aspects
> >    unless there is a specific need. In this specific case, I fail to see
> >    a need to discuss the realization of the control plane of a network.
> >
> > Editorial nits:
> >
> > * Page 9:
> >
> >     The low-level mechanisms described in Section 4.5 provide the
> >     necessary regulation of transmissions by an end system or
> >     intermediate node to provide congestion protection.  The allocation
> >     of the bandwidth and buffers for a DetNet flow requires provisioning
> >     A DetNet node may have other resources requiring allocation and/or
> >     scheduling, that might otherwise be over-subscribed and trigger the
> >     rejection of a reservation.
> >
> >    Probably a full stop is missing after "provisioning".
> >
> > * Page 11: "... along separate (disjoint non-SRLG) paths ..."
> >
> >    I find this confusing. I would understand e.g. "along separate
> >    (SRLG-disjoint) paths".
> >
> > * Page 34:
> >
> >     When using a peer-to-peer control plane, some of this information
> >     may be required by a system's neighbors in the network.
> >
> >    Would "acquired" be a better term?
> >
> > * Page 34:
> >
> >     o  The identity of the system's neighbors, and the characteristics of
> >        the link(s) between the systems, including the length (in
> >        nanoseconds) of the link(s).
> >
> >    "Latency" or "delay" would probably be better terms if the value is
> >    measured in nanoseconds.
> >
> > * Page 35:
> >
> >     DetNet is provides a Quality of Service (QoS), and as such, does not
> >     directly raise any new privacy considerations.
> >
> >    Broken sentence
> >
> > * Please expand acronyms on first use (e.g., OTN)
> >
> >