Re: [Nmlrg] links to each slide of presentations//RE: slides of NMLRG #3, June 27th, Athens

Sheng Jiang <jiangsheng@huawei.com> Wed, 29 June 2016 08:10 UTC

Return-Path: <jiangsheng@huawei.com>
X-Original-To: nmlrg@ietfa.amsl.com
Delivered-To: nmlrg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id B5BB012D598 for <nmlrg@ietfa.amsl.com>; Wed, 29 Jun 2016 01:10:50 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -5.646
X-Spam-Level:
X-Spam-Status: No, score=-5.646 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RP_MATCHES_RCVD=-1.426, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id ph6_0VFMKDUR for <nmlrg@ietfa.amsl.com>; Wed, 29 Jun 2016 01:10:47 -0700 (PDT)
Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [119.145.14.65]) (using TLSv1 with cipher RC4-SHA (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 0E8BA12DA10 for <nmlrg@irtf.org>; Wed, 29 Jun 2016 01:10:45 -0700 (PDT)
Received: from 172.24.1.60 (EHLO nkgeml411-hub.china.huawei.com) ([172.24.1.60]) by szxrg02-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id DJO16105; Wed, 29 Jun 2016 16:10:33 +0800 (CST)
Received: from NKGEML515-MBX.china.huawei.com ([fe80::a54a:89d2:c471:ff]) by nkgeml411-hub.china.huawei.com ([10.98.56.70]) with mapi id 14.03.0235.001; Wed, 29 Jun 2016 16:10:24 +0800
From: Sheng Jiang <jiangsheng@huawei.com>
To: David Meyer <dmm@1-4-5.net>
Thread-Topic: [Nmlrg] links to each slide of presentations//RE: slides of NMLRG #3, June 27th, Athens
Thread-Index: AQHR0UXZfny1k5v9oUOv37+HaylYt5/+e2IAgAGH4Uw=
Date: Wed, 29 Jun 2016 08:10:24 +0000
Message-ID: <5D36713D8A4E7348A7E10DF7437A4B927CA897D4@NKGEML515-MBX.china.huawei.com>
References: <5D36713D8A4E7348A7E10DF7437A4B927CA893E9@NKGEML515-MBX.china.huawei.com>, <CAHiKxWhfdukjRVnSKnbLwMwamiqZYoCD350BoNqO4njqkQoU1g@mail.gmail.com>
In-Reply-To: <CAHiKxWhfdukjRVnSKnbLwMwamiqZYoCD350BoNqO4njqkQoU1g@mail.gmail.com>
Accept-Language: en-GB, zh-CN, en-US
Content-Language: en-GB
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.47.66.194]
Content-Type: multipart/alternative; boundary="_000_5D36713D8A4E7348A7E10DF7437A4B927CA897D4NKGEML515MBXchi_"
MIME-Version: 1.0
X-CFilter-Loop: Reflected
X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020206.5773827D.0013, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0, ip=0.0.0.0, so=2013-06-18 04:22:30, dmn=2013-03-21 17:37:32
X-Mirapoint-Loop-Id: d3da655d6566e22179791efb37db52bc
Archived-At: <https://mailarchive.ietf.org/arch/msg/nmlrg/YTkZ9LcXGma6ZOBh0VoR7UiDQoI>
Cc: "nmlrg@irtf.org" <nmlrg@irtf.org>
Subject: Re: [Nmlrg] links to each slide of presentations//RE: slides of NMLRG #3, June 27th, Athens
X-BeenThere: nmlrg@irtf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: Network Machine Learning Research Group <nmlrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/nmlrg>, <mailto:nmlrg-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/nmlrg/>
List-Post: <mailto:nmlrg@irtf.org>
List-Help: <mailto:nmlrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/nmlrg>, <mailto:nmlrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Jun 2016 08:10:50 -0000

Hi, David,



Thanks so much for your email. I share the same concern you have. Actually, this Monday at our NMLRG #3 in Athens, we did discuss this issue of a potential standardized dataset. And we are planning to have an open discussion on the requirements and validation criteria for a potential common dataset at the coming NMLRG #4 in Berlin, at IETF 96.



However, up to now, I have several concerns. Firstly, before we can discuss such a dataset, we need to narrow our target scenarios down to a few very specific use cases, given that there are so many different scenarios in the network area, and they differ so much from each other that they require different datasets. Secondly, when we talked with operators, real data was among their most valuable assets, so it would be almost impossible for them to share it. That leaves two possibilities for obtaining a dataset: a) generating a dataset using simulation; or b) obtaining partial datasets from operators with the sensitive data removed. Both ways are problematic, with the risk that a lot of effort may be wasted in the wrong areas. Thirdly, the availability of data may differ greatly between network environments; it depends on which measurement/cognitive functions have been deployed or implemented.



Despite the above concerns, I think we should: a) pursue research on such datasets, since they would be very useful for demonstrating that machine learning can perform better or provide more flexibility and adaptability; and b) continue our discussion of the various network use cases that could benefit from applying machine learning, in order to serve the wider network community.



Another important point I would like to discuss: you mentioned that "our goal is to provide accurate, repeatable, and explainable results." Ideally, we do want this; it is the traditional expectation for logic-based programming. However, I am not sure it is an achievable target with machine learning. Many machine learning algorithms are based on statistical theory, which may not be that accurate. Some machine learning algorithms are black boxes or have very complicated internal computation, so their results may be hard to explain, and the learning process may be hard to control or intervene in. Repeatability, yes, we have to have that feature for wide adoption. However, repeatability may have to come with some level of tolerance for inaccuracy.



Best regards,



Sheng



________________________________

From: David Meyer [dmm@1-4-5.net]
Sent: 28 June 2016 23:30
To: Sheng Jiang
Cc: nmlrg@irtf.org
Subject: Re: [Nmlrg] links to each slide of presentations//RE: slides of NMLRG #3, June 27th, Athens

Sheng,

Thanks for the pointers. One thing I notice is that we don't have much on what the characteristics of network data might be, and as such, what kinds of existing learning algorithms might be suitable and where additional development might be required. For example, is the collected data IID? Is it time series? Is the underlying distribution stationary? If not, how does that constrain algorithms we might use or develop? For example, how does a given algorithm deal with concept drift/internal covariate shift (in the case of DNNs; see e.g., batch normalization [0]). There are many other such questions, such as is the data categorical (e.g., ports, IP addresses) or is it continuous/discrete (e.g., counters). And if the data in question is categorical, what is the cardinality of the categories (this will inform how such data can be encoded); in the case of IP addresses we can't really one-hot encode addresses because their cardinality is too large (2^32 or 2^128); this has implications for how we build classifiers (in particular, for softmax layers in DNNs of various kinds).

Related to the above is the question of features. What are good features for networking? Where do they come from? Are they domain specific? Can we learn features in the network space the way a DNN does? Can we use autoencoders to discover such features? Or can we use GANs to train DNNs for network classification tasks in an unsupervised manner? Are there other, non-ad-hoc (well-founded) methods we can use, or is every use case a one-off (one would hope not)?

We can apply the same kinds of analyses to the algorithms themselves. For example, while something like k-means is an effective way to get a feeling for how continuous/discrete data hangs together, if our data is categorical, statistical clustering approaches such as LDA might provide a more well-founded approach (of course, as with most Bayesian techniques, the question of approximate inference arises, since in most interesting cases the integral we need to solve, namely the marginal probability of the data, isn't tractable, so we need to resort to MCMC or, more likely, variational inference). And what about the use of SGD/batch normalization etc. with DNNs? And, perhaps more importantly, can we use network data to train DNN policy networks for reinforcement learning, as we saw in deep Q-learning and AlphaGo?
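For intuition on the k-means side, here is a deliberately minimal 1-D sketch in Python; the flow byte counts and cluster count are made up for illustration, and in practice one would use a library implementation (e.g. scikit-learn) on real multi-dimensional measurements.

```python
# Sketch: minimal 1-D k-means to illustrate clustering continuous network
# measurements (e.g. per-flow byte counts). Naive initialization and fixed
# iteration count; illustrative only, not production code.
def kmeans_1d(xs, k=2, iters=20):
    centers = xs[:k]  # naive init: first k points as initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            # assign each point to its nearest center
            idx = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[idx].append(x)
        # recompute each center as its cluster mean (keep old if empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Hypothetical per-flow byte counts with two obvious groups
data = [10.0, 12.0, 11.0, 200.0, 210.0, 190.0]
print(sorted(kmeans_1d(data)))  # → [11.0, 200.0]
```

This works because the data are continuous and a Euclidean mean is meaningful; for categorical fields (ports, addresses) no such mean exists, which is exactly why statistical approaches like LDA become attractive there.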

These comments are all by way of saying that we don't have a solid theoretical understanding (yet) of how techniques that have been so successful in other domains (e.g., DNNs for perceptual tasks) generalize to networking use cases. We will need this understanding if our goal is to provide accurate, repeatable, and explainable results.

In order to accomplish all of this we need, as I have been saying, not only a good understanding of how these algorithms work but also standardized data sets and associated benchmarks so we can tell if we are making progress (or even if our techniques work). Analogies here include MNIST and ImageNet and their associated benchmarks, among others. As mentioned, standardized data sets are key to making progress in the ML for networking space (otherwise how do you know your technique works and/or improves on another technique?). One might assume that these data sets would need to be labeled (as supervised learning is where most of the progress is being made these days), but not necessarily; Generative Adversarial Networks (GANs) have emerged as a new way to train DNNs in an unsupervised manner (this is moving very rapidly; see e.g., https://openai.com/blog/generative-models/).

The summary here is that the "distance" between theory and practice in ML is effectively zero right now due to the incredible rate of progress in the field; this means we need to understand both sides of the theory/practice coin in order to be effective. None of the slide decks provide much background on what the proposed algorithms are, how they work, or why they should be expected to work on network data.

Finally, if you are interested in LDA or other algorithms there are a few short explanatory pieces I have written for my team on http://www.1-4-5.net/~dmm/ml (works in progress).

Thanks,

Dave

[0] https://arxiv.org/pdf/1502.03167.pdf

On Tue, Jun 28, 2016 at 7:03 AM, Sheng Jiang <jiangsheng@huawei.com<mailto:jiangsheng@huawei.com>> wrote:
Oops... The proceedings page for interim meetings does not seem to be as intelligent as the proceedings pages for IETF meetings; our proceedings page does not automatically show the slides. I have sent an email to the IETF Secretariat asking them to fix it. Meanwhile, here in this email are the links to each presentation:

Chair Slides
https://www.ietf.org/proceedings/interim-2016-nmlrg-01/slides/slides-interim-2016-nmlrg-01-0.pdf

Introduction to Network Machine Learning & NMLRG
https://www.ietf.org/proceedings/interim-2016-nmlrg-01/slides/slides-interim-2016-nmlrg-01-1.pdf

Data Collection and Analysis At High Security Lab
https://www.ietf.org/proceedings/interim-2016-nmlrg-01/slides/slides-interim-2016-nmlrg-01-2.pdf

Use Cases of Applying Machine Learning Mechanism with Network Traffic
https://www.ietf.org/proceedings/interim-2016-nmlrg-01/slides/slides-interim-2016-nmlrg-01-3.pdf

Mobile network state characterization and prediction
https://www.ietf.org/proceedings/interim-2016-nmlrg-01/slides/slides-interim-2016-nmlrg-01-4.pdf

Learning how to route
https://www.ietf.org/proceedings/interim-2016-nmlrg-01/slides/slides-interim-2016-nmlrg-01-5.pdf

Regards,

Sheng
________________________________________
From: nmlrg [nmlrg-bounces@irtf.org<mailto:nmlrg-bounces@irtf.org>] on behalf of Sheng Jiang [jiangsheng@huawei.com<mailto:jiangsheng@huawei.com>]
Sent: 28 June 2016 21:03
To: nmlrg@irtf.org<mailto:nmlrg@irtf.org>
Subject: [Nmlrg] slides of NMLRG #3, June 27th, Athens

Hi, nmlrg,

All slides presented at our NMLRG #3 meeting, June 27th, 2016, Athens, Greece, co-located with EUCNC 2016, have been uploaded. They can be accessed through the link below:

https://www.ietf.org/proceedings/interim/2016/06/27/nmlrg/proceedings.html

Best regards,

Sheng
_______________________________________________
nmlrg mailing list
nmlrg@irtf.org<mailto:nmlrg@irtf.org>
https://www.irtf.org/mailman/listinfo/nmlrg