Re: [Teas] Benjamin Kaduk's Discuss on draft-ietf-teas-native-ip-scenarios-09: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Thu, 03 October 2019 21:05 UTC

Date: Thu, 03 Oct 2019 14:04:53 -0700
From: Benjamin Kaduk <kaduk@mit.edu>
To: Aijun Wang <wangaijun@tsinghua.org.cn>
Cc: The IESG <iesg@ietf.org>, lberger@labn.net, teas-chairs@ietf.org, teas@ietf.org, draft-ietf-teas-native-ip-scenarios@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/teas/f-o4R1IUGCJH0dlJ48l5E0zx02k>

Hi Aijun,

On Thu, Oct 03, 2019 at 04:10:16PM +0800, Aijun Wang wrote:
> 
> Hi, Benjamin:
> Thanks for your review.
> 
> On summary, this draft just gives the scenarios that needed for TE in Native IP network, this is absent in current existing IETF documents. The simulation results just demonstrated the applicability of central control under the global view.
> This document is the base document of other two drafts:
> 
> https://tools.ietf.org/html/draft-ietf-teas-pce-native-ip-04 (Solution Draft)
> 
> https://tools.ietf.org/html/draft-ietf-pce-pcep-extension-native-ip-04(PCEP extension)
> 
> There are currently at least three different solutions trying to accomplish the TE necessities in native IP network. This scenarios and simulation draft is just the start points of these documents.

Okay, so it sounds like the goal of the document is to show that there are
some scenarios/use-cases in which Native IP provides a substantial
improvement from the previous state of the art (or, perhaps, from the other
competing options), and the simulations are just a tool used to augment the
qualitative description with some concrete instantiations.  This does not
match fully with the current Abstract/Introduction, the former of which I
quote from:

                                                       This document
   describes various complex scenarios and simulation results when
   applying a PCE in a native IP network.  This solution, referred to as
   Centralized Control Dynamic Routing (CCDR), integrates the advantage
   of using distributed protocols and the power of a centralized control
   technology.

A pithy summary of that excerpt might be "describes some routing scenarios
and simulation results for CCDR; CCDR combines advantages of distributed
protocols and centralized control".  While those statements seem to be
true, they don't say whether the presented scenarios and results imply
that CCDR is the best, or even a good, solution for the described
scenarios; they are just two facts that don't get tied together.

> More details responses are inline below. Wish they can convince you for the future vote.

I'm pretty sure that we'll need to update the text for me to change my
ballot position, but I also don't foresee significant challenges in doing
so.

> Thanks in advance.
> 
> Aijun Wang
> China Telecom
> 
> >> On Oct 3, 2019, at 13:15, Benjamin Kaduk via Datatracker <noreply@ietf.org> wrote:
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-teas-native-ip-scenarios-09: Discuss
> > 
> > When responding, please keep the subject line intact and reply to all
> > email addresses included in the To and CC lines. (Feel free to cut this
> > introductory paragraph, however.)
> > 
> > 
> > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> > for more information about IESG DISCUSS and COMMENT positions.
> > 
> > 
> > The document, along with other ballot positions, can be found here:
> > https://datatracker.ietf.org/doc/draft-ietf-teas-native-ip-scenarios/
> > 
> > 
> > 
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> > 
> > I have two points for discussion:
> > 
> > (1) If this document was subject to the approval requirements for
> > standards action, it would basically be suffering from "death by
> > abstain"; this seems like a good signal for the IESG to discuss whether
> > it makes sense to approve this document even though the more-lenient
> > document-action requirements would otherwise let it go forward.
> > 
> [Aijun Wang]: This document is the cooperation results of authors from operators, research university and the vendor. The style of the document maybe slight different from the traditional IETF documents, but the aim of it same as others: give some useful information to the community based on our efforts.
> Wish the relevant responses and explanations can convince the reviewer.
> 
> > (2) The document seems incomplete to me.  It has some aspects of being
> > all/any of a use-cases document, an architecture document, and an
> > applicability analysis, but does not seem to have a complete treatment
> > for any of them.  To be clear, there is enough in the document to
> > indicate that the topic merits further work, and there are some
> > interesting results, but I'm not sure that publication as an RFC is
> > appropriate for this document in this form.  Specifically:
> [Aijun Wang] It is mainly one native IP TE scenarios description document. The simulation part of this document just want to convince the reader the addition gain of central control under the global view.
> The architecture/solutions contents is in above mentioned solutions draft.
> 
> > 
> > (2a) use-cases: we see the examples of star topology with BRAS/SR and
> > the simulated network in Figure 6, but there is not much discussion of
> > where these (or similar) scenarios arise in practice, how common they
> > are, and how closely the simulation reflects actual usage.
> 
> [Aijun Wang]: The star topology and simulated topology are all abstraction from our real network deployment.

That's what I expected, but it can be helpful to state this explicitly.
(On the other hand, this is probably textbook stuff, so minimal coverage is
okay.)

Since we're focusing on the "use-cases" portion, to use my terminology,
then the main thing to focus on would be to convince the reader that the
simulation results are likely to translate well to real-world usage.
Mentioning that the presented results hold across many different (random)
simulations, as Éric suggests, would help with this, especially if some
sense of the nature of the distribution across trials can be included (such
as standard deviations).  Drawing the connection between the real-world
topology and model topology (as mentioned above) also helps with this.
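
To make the "distribution across trials" suggestion concrete, here is a
minimal sketch of the kind of summary I have in mind.  The function name and
the input values are placeholders of mine, not anything taken from the
draft's simulations; the real per-trial results would be substituted in.

```python
import statistics


def summarize_trials(values):
    """Return (mean, sample standard deviation) over per-trial results."""
    if len(values) < 2:
        # A standard deviation needs at least two independent trials.
        raise ValueError("need at least two trials")
    return statistics.mean(values), statistics.stdev(values)


# Placeholder numbers for illustration only; NOT results from the draft.
placeholder_results = [10.0, 12.0, 14.0]
mean, stdev = summarize_trials(placeholder_results)
print(f"mean={mean:.1f} stdev={stdev:.1f} over {len(placeholder_results)} trials")
```

Reporting the number of trials alongside the mean and standard deviation
lets the reader judge how much weight a single plotted run deserves.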

As a side note, I do wonder how the simulation results would look if there
were a "spatial affinity" for the attachments of edge nodes to the core, in
that there is a correlation between which subset of core nodes are attached
to individual edge nodes.  I expect the impact to be minor, since the core
is fully connected in the simulated topology, but can't guarantee it.  But
this is a side note since I don't think running such new simulations would
be necessary in order to get the document into a state where it can be
published.

> > 
> > (2b) architecture: a very high-level picture is given ("use a PCE to
> > engineer some of the IP traffic on a network and improve overall
> > efficiency"), but we don't see much about how PCCs will be involved and
> > apply the computed paths or what requirements will need to be met by the
> > protocols and components used to instantiate the architecture
> 
> [Aijun Wang]: The above contents that you concerned is described in the above mentioned solution and PCEP extension drafts. As explained in previous, this draft focus only the overall scenarios and the simulation results.

Sure, and that's a fine way to split up the content -- we don't necessarily
need to include the architecture of the solution in this document.

> > 
> > (2c) applicability: we see some scenarios where the proposed technology
> > shows drastic improvement over the alternative selected for comparison,
> > but there is little to give confidence that this reflects a broad maximum
> > that is robust to environmental variations.  Is the alternative selected
> > for comparison an appropriate one for the cases in question?  How would
> > the proposal react in the face of changes in the environment it runs in,
> > such as link or node failure, changes in the baseline usage, or traffic
> > spikes?  What timescale can it react in and what level of visibility
> > does the PCE need into current conditions in order to be reliable?
> [Aijun Wang] The SDN controller get the overall underlying network conditions via BGP-LS/SNMP/Netflow protocol in about 5 minutes intervals. The optimization process for the simulated complex topology and traffic matrix need only tens of seconds. For further strict requirements, it is easy to enhance the central controller’s capability to tackle the changeable environment.
> 
> The above concerns has more relations to the solutions, we can add such descriptions in the solutions draft at its “Deployment Considerations” part.(https://tools.ietf.org/html/draft-ietf-teas-pce-native-ip-04#section-9)

I agree that these concerns relate to specific solutions and are best
addressed alongside the solutions draft(s); we don't necessarily need to
have this draft cover this topic.

> > 
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> > 
> > If we're using a PCE in a native IP network, how do the computed routes
> > get applied; are we using source routing or just being careful about
> > the prefixes in use?  (Are there going to be any scaling concerns?)
> > 
> 
> [Aijun Wang] These contents are in the above mentioned solution draft. It is not using the current source routing, but the combination of multiple BGP session and the manipulation of the path to BGP nexthop with the help of PCEP protocol. The scalability is discussed in section 9.1 of solution draft(https://tools.ietf.org/html/draft-ietf-teas-pce-native-ip-04#section-9.1)

Okay, thanks.


> 
> 
> > Section 3.1
> > 
> > I don't understand what Figure 1 is intending to convey.  Are "Private
> > Cloud Site" and "Public Cloud Site" supposed to be separate boxes on the
> > edge of the distributed control network?  Why is the "Cloud Based
> > Application" in neither of the named clouds?
> > 
> 
> [Aijun Wang]: The source and destination of the “Cloud-Based Application” are located at the “Private Cloud Site” and “Public Cloud Site”. You can deem these sites as the concepts described in traditional VPN deployment scenarios.

So the "Cloud-Based Application" has components in both Cloud sites, but
excludes the PCE and distributed control network?  Perhaps extending the
box for it to include parts at both sites, or having smaller boxes at both
sites and a line grouping them together, would be clearer.

> > Section 3.2
> > 
> >   Network topology within a Metro Area Network (MAN) is generally in a
> >   star mode as illustrated in Figure 2, with different devices
> > 
> > "Generally" within what scope, commercial ISPs?  I know of things that
> > could be called MANs that use a different topology.
> > 
> [Aijun Wang]: In commercial ISP.

Okay.  This was implied by the previous discussion of "service provider
network", but I'd suggest being explicit about it here.

> > Section 4.1
> > 
> > nit: several sentences are missing spaces after the full stop.
> 
> [Aijun Wang] Will correct them in update version.
> > 
> > Section 4.2
> > 
> > Is a fully-linked core of 100 nodes representative of typical
> > deployments?  That's a lot of links not going to customers!
> 
> [Aijun Wang]: The core nodes can also connects the customers. That is the reasons that the traffic matrix is 500*500, not 100*100
> 
> > 
> > Section 4.3
> > 
> >   The traffic matrix is generated based on the link capacity of
> >   topology.  [...]
> > 
> > I don't know how to interpret this statement.
> > It does sound like the traffic matrix is generated in a somewhat
> > arbitrary fashion, with no stated effort to keep it aligned with
> > real-world traffic patterns.
> > 
> [Aijun Wang] We just keep the overall congestion links ratio is similar with the real network. The arbitration fashion can otherwise certificate the robustness of the simulation process.  

If I understand correctly, there are three important variables for the
simulation: the baseline load, the congestion threshold for each link, and
the distribution of flow bandwidth for the simulated flows.  I don't know
whether the simulation is being run with zero baseline load (so that all
flows are centrally managed), so please say something about that.  For the
other two, would it be accurate to say something like "the congestion
threshold for the edge/core and intra-core links is set to the fixed value
of 90%" or "the congestion threshold for edge/core and intra-core links is
sampled from a distribution representative of observed real-world
environments", and "the bandwidth requirements of generated flows are drawn
from a distribution representative of observed real-world flow data"?

In short, I think we need to say enough about what was done that someone
else could independently run if not the same experiment, one sufficiently
similar that we expect the results to also be similar.
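
As a rough illustration of what "saying enough" could look like, the three
variables above plus a random seed could be written down in one place.
This is only a sketch under my own assumptions: every name here is
hypothetical, the exponential flow-bandwidth distribution is a stand-in I
picked for the example, and none of it describes what the authors actually
ran.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParams:
    """Hypothetical record of the inputs a reader would need to re-run a trial."""
    baseline_load: float         # fraction of link capacity not centrally managed
    congestion_threshold: float  # utilization above which a link counts as congested
    flow_bw_mean_mbps: float     # mean of the assumed exponential flow-size distribution
    seed: int                    # fixed seed so the same flows can be regenerated


def generate_flow_bandwidths(params, n_flows):
    """Draw n_flows bandwidths (Mbps) from the assumed exponential distribution."""
    rng = random.Random(params.seed)
    return [rng.expovariate(1.0 / params.flow_bw_mean_mbps) for _ in range(n_flows)]


def is_congested(params, utilization):
    """A link is congested when its utilization exceeds the stated threshold."""
    return utilization > params.congestion_threshold


# Illustrative values only: 0.9 echoes the "fixed value of 90%" phrasing above.
params = SimulationParams(baseline_load=0.0, congestion_threshold=0.9,
                          flow_bw_mean_mbps=100.0, seed=1)
flows = generate_flow_bandwidths(params, 5)
```

With the seed recorded, two people calling generate_flow_bandwidths() with
the same parameters get the same flows, which is the property that makes
independent comparison possible.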

Thanks,

Ben