[alto] Review of draft-xiang-alto-exascale-network-optimization-01

Yichen Qian <92yichenqian@tongji.edu.cn> Tue, 27 June 2017 08:13 UTC

Hi Qiao, all,

It is interesting to introduce I/O and computation resources into ALTO, so that a query returns more than network resources and can be answered more accurately. Here is my review of the draft:

>>> Sections 4.2 (“Sharing raw site/cluster information would violate sites' privacy constraints.”) and 6 (“How much privacy, including… will be exposed?”) both raise privacy as a problem in providing this information, but the draft does not seem to address it.

Abstract
"In this document, we propose that it is feasible to use existing ALTO services to provides not only network information, but also information about other resources in science networks including computing and storage."

>>> to provides -> to provide

"ExaO provides simple APIs for users to submit and manage dataset transfer and analytic requests and to monitor the status of each request, along with fine-grained local and global network and site state information in real-time. "

>>> Too many “and”s; the sentence is hard to read and understand.
 
1.  Introduction
"Applications such as the Production ANd Distributed Analysis system (PanDA) in ATLAS and the Physics Experiment Data Export system (PhEDEX) in CMS have been developed to manage the data transfers among different cluster sites."
   
>>> have been -> has been

"Section 6 lists several key issues to address in order to realize the proposal of providng multi-resource information by ALTO topology services."

>>> providng -> providing

5.2.  Example: encode storage bandwidth into path vector
"Other than network resource, assume in this topology eh1 and eh3 are equipped with commodity hard drive disk (HDD) while eh2 and eh4 are equipped with SSD."

>>> HDD is spelled out, but SSD is not.
   
"In this example, if we see the end hosts as network elements, the storage I/O bandwidth of each host can also be encoded as an abstract element into the path-vector."

>>> It would be better to add a new figure that shows the storage I/O bandwidth as an abstract element in the topology.
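
>>> To illustrate what such a figure could convey, here is a minimal sketch in Python. The element names, the bandwidth values, the shared link, and the two flows are all my own assumptions for illustration and are not taken from the draft; only the HDD/SSD assignment of the hosts follows the quoted text. Each host's storage appears as one more abstract element on the path, so the bottleneck of a flow can be a disk rather than a link.

```python
# Hypothetical abstract elements and bandwidths (Mbps); values are illustrative only.
element_bandwidth_mbps = {
    "ane:link1": 10_000,    # assumed shared network link
    "ane:hdd_eh1": 1_000,   # storage I/O at eh1 (HDD)
    "ane:ssd_eh2": 4_000,   # storage I/O at eh2 (SSD)
    "ane:hdd_eh3": 1_000,   # storage I/O at eh3 (HDD)
    "ane:ssd_eh4": 4_000,   # storage I/O at eh4 (SSD)
}

# Path vectors for two assumed flows: storage elements sit at both ends of each path.
path_vectors = {
    ("eh1", "eh2"): ["ane:hdd_eh1", "ane:link1", "ane:ssd_eh2"],
    ("eh3", "eh4"): ["ane:hdd_eh3", "ane:link1", "ane:ssd_eh4"],
}


def bottleneck(flow):
    """Return (element name, bandwidth) of the slowest element on the flow's path."""
    path = path_vectors[flow]
    name = min(path, key=lambda element: element_bandwidth_mbps[element])
    return name, element_bandwidth_mbps[name]
```

With these assumed numbers, the flow from eh1 to eh2 is limited by the HDD at eh1 and not by the network link.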

6.  Key Issues
"a large dataset transfer or analytic application always involve many network elements in multiple clusters/sites and the absolute number of involved network elements keep increasing as the scale of clusters increase."

>>> involve -> involves

"There still lacks of an analytics or experimental understanding on the scalability of path-vector and RSA services."

>>> lacks of -> lacks

7.1.  Architecture
"including replica selection, routing path computation and bandwidth allocation, and request parallelization decisions, such as which cluster each sub-request should be placed at the multi-resource orchestrator."

>>> missing a preposition before “the multi-resource orchestrator” -> in the multi-resource…

7.2.1.  User API

>>> Consider adding another API that lets a user list all the requests they have submitted, in case the requestID is lost.
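
>>> A rough sketch of the behaviour I have in mind, in Python. The class, the method names, and the "req-N" identifier format are hypothetical and do not come from the draft; the point is only that requests are indexed by user as well as by requestID.

```python
import itertools
from collections import defaultdict


class RequestRegistry:
    """Hypothetical in-memory registry; not the ExaO User API itself."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._by_user = defaultdict(dict)  # user -> {request_id: description}

    def submit(self, user, description):
        """Record a request and return its (made-up) requestID."""
        request_id = f"req-{next(self._ids)}"
        self._by_user[user][request_id] = description
        return request_id

    def list_requests(self, user):
        """Return a copy of all requests the user has submitted."""
        return dict(self._by_user.get(user, {}))
```

A user who has lost a requestID could call list_requests with their own identity and recover it from the returned mapping.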

7.4.  ALTO Server
"Each ALTO server must provide basic information services as specified in [RFC7285] such as network map, cost map, endpoint cost service (ECS), etc.  To support the fine-grained multi-resource allocation in ExaO, each ALTO server should also provide more fine-grained information about different resources in clusters through ALTO extension services such as the routing state abstraction [DRAFT-RSA], path vector [DRAFT-PV], network graph [DRAFT-NETGRAPH],multi-cost [DRAFT-MC] and cost-calendar [DRAFT-CC] services."
				
>>> Providing basic information along with fine-grained information may produce overlapping or redundant information, for example between the cost map and RSA. This may increase the data transfer burden. Consider ways to reduce the redundancy or to send only part of the information.

7.7.1.  Orchestration Algorithms
"The modular design of ExaO allows the adoption of different orchestration algorithms and methodologies, depending on the specific performance requirements."

>>> If a specific algorithm is used, what interface (inputs/outputs…) should users specify?
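
>>> As an illustration of the kind of interface I am asking about, here is a sketch in Python. Everything in it is an assumption on my side: the interface name, the single allocate method, the choice of per-request demands plus one shared capacity as inputs, and the equal-share policy, which is a placeholder and not MFRA.

```python
from typing import Dict, Protocol


class OrchestrationAlgorithm(Protocol):
    """Hypothetical plug-in interface: demands and capacity in, allocations out."""

    def allocate(self, demands: Dict[str, float], capacity: float) -> Dict[str, float]:
        ...


class EqualShare:
    """Placeholder policy: give each request an equal share, capped at its demand."""

    def allocate(self, demands: Dict[str, float], capacity: float) -> Dict[str, float]:
        if not demands:
            return {}
        share = capacity / len(demands)
        return {request_id: min(demand, share) for request_id, demand in demands.items()}


def run(algorithm: OrchestrationAlgorithm, demands: Dict[str, float], capacity: float) -> Dict[str, float]:
    """The orchestrator only depends on the interface, not on a concrete algorithm."""
    return algorithm.allocate(demands, capacity)
```

If the draft defined the inputs and outputs at this level, a different algorithm could be swapped in without changing the rest of the system.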

7.7.3.  Example: A Max-Min Fairness Resource Allocation Algorithm

>>> For an example, I think the introduction and description of MFRA are too long. Maybe it could be presented as the default algorithm of the system?

8.3.  Constraints of the MFRA Algorithm
"Simply denoting the replica selection as a set of binary constraint will significantly increases the computation complexity of the scheduling process."

>>> increases -> increase