Re: [Nmlrg] links to each slide of presentations//RE: slides of NMLRG #3, June 27th, Athens

Sheng Jiang <jiangsheng@huawei.com> Fri, 01 July 2016 20:36 UTC

From: Sheng Jiang <jiangsheng@huawei.com>
To: David Meyer <dmm@1-4-5.net>
Thread-Topic: [Nmlrg] links to each slide of presentations//RE: slides of NMLRG #3, June 27th, Athens
Thread-Index: AQHR0UXZfny1k5v9oUOv37+HaylYt5/+e2IAgAGH4UyAAAvZgIACSbUQ//+hNICAAe6t3A==
Date: Fri, 01 Jul 2016 20:36:05 +0000
Message-ID: <5D36713D8A4E7348A7E10DF7437A4B927CA8A355@NKGEML515-MBX.china.huawei.com>
References: <5D36713D8A4E7348A7E10DF7437A4B927CA893E9@NKGEML515-MBX.china.huawei.com> <CAHiKxWhfdukjRVnSKnbLwMwamiqZYoCD350BoNqO4njqkQoU1g@mail.gmail.com> <5D36713D8A4E7348A7E10DF7437A4B927CA897D4@NKGEML515-MBX.china.huawei.com> <CAHiKxWhMPjKY24zucg-UND0ARWHbjnOyYv=Tia-yk-Vq6Ta_SA@mail.gmail.com> <5D36713D8A4E7348A7E10DF7437A4B927CA89FB4@NKGEML515-MBX.china.huawei.com>, <CAHiKxWh5D0JiDLTKV3ePT4XNhk9ZhmDu66Z=5hxOpoo5pxkorA@mail.gmail.com>
In-Reply-To: <CAHiKxWh5D0JiDLTKV3ePT4XNhk9ZhmDu66Z=5hxOpoo5pxkorA@mail.gmail.com>
Accept-Language: en-GB, zh-CN, en-US
Content-Language: en-GB
Content-Type: multipart/alternative; boundary="_000_5D36713D8A4E7348A7E10DF7437A4B927CA8A355NKGEML515MBXchi_"
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/nmlrg/WcLkd861BlMmjQ2tU3C_lWnN0UU>
Cc: "nmlrg@irtf.org" <nmlrg@irtf.org>
Subject: Re: [Nmlrg] links to each slide of presentations//RE: slides of NMLRG #3, June 27th, Athens
Precedence: list

Hi, David,

//Cut short, since the original text has become unreadable after so many replies.

I guess I should clarify what I meant by a one-off solution. I meant a specific ML solution for each specific network task, but one that generalizes across networks: it should work on most networks, if not all of them. For example, we should have a DNS-oriented ML solution for malicious attacks on DNS. I am not sure how much it could be generalized together with a router-oriented ML solution for DDoS attacks on routers. I am also not sure whether some parts of these ML solutions could be reused from a more general ML-based analysis of network traffic. But I am interested in working on general learning of the fundamental properties of networks, as you suggested.

With the above explanation, I believe that "one-off" ML solutions for a specific network task, which generalize across networks, are legitimate topics for our RG. I could emphasize the generalization across networks in a future version of the RG charter, but I am not sure how much we could do on generalizing across various network tasks.


>>Despite the above concerns, I think we should: a) do research on certain datasets.

I meant we should try to create "standardized" data sets, as you suggested.


>>Many machine learning algorithms are based on statistical theory, which may
>>not be that accurate.
>><Sheng>The reason that statistics-based ML algorithms can work is that they
>>choose to focus only on high-probability cases and ignore corner cases.

>I don't believe this is correct, but perhaps I don't understand your point. Can
>you give a concrete example of (a). what you mean by statistics-based ML
>algorithm and (b). one that ignores corner cases and how/why?

Maybe the closest word for my "statistics-based" is "probabilistic". Let's take linear regression as an example. Based on the training data, the algorithm works out a curve that can be used for prediction. However, not all of the training data lie on this curve. Only when we expand the curve with a tolerance range do we cover the majority of the training data. There may still be some dispersed data points, which the algorithm chooses to ignore. And there is no guarantee that future data will perfectly match the prediction on this curve. Only once we give a tolerance scope can we be confident, and even then not 100%. The ML chooses to ignore the dispersed data; otherwise it might not be able to produce a smooth curve, since it works on pure probability rather than on understanding the semantics of the data. So, when future data is affected by the same causes that produced the dispersed data, the ML-based prediction would certainly fail. This causes the inaccuracy, and even the tolerance scope may not be able to cover it.
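
To make the regression example concrete, here is a minimal sketch in Python. The data set, the tolerance value, and the function names are all made up for illustration; they do not come from any of our studies. It fits a least-squares line and lists the training points that fall outside the tolerance range. Note that in plain least squares the dispersed point is not literally dropped: it still pulls the fitted line a little, and it simply ends up outside the tolerance range.

```python
# Illustrative sketch only: data, tolerance, and names are assumptions.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def outside_tolerance(xs, ys, slope, intercept, tolerance):
    """Return the points whose residual is larger than the tolerance."""
    return [(x, y) for x, y in zip(xs, ys)
            if abs(y - (slope * x + intercept)) > tolerance]


# Hypothetical training data: roughly y = 2x, plus one dispersed point at x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 25.0, 12.1, 13.9, 16.2]

slope, intercept = fit_line(xs, ys)
dispersed = outside_tolerance(xs, ys, slope, intercept, tolerance=5.0)

print("slope=%.2f intercept=%.2f" % (slope, intercept))
print("points outside the tolerance range:", dispersed)
```

If future data is produced by whatever caused the point at x = 5, the line and its tolerance range will miss it in the same way.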


>BTW, people are going to deploy ML based systems not because they are interesting
>or cool (which they are), but rather because their competitors are/will.

Agree. But this won’t happen until we have successfully persuaded a first wave of users.


>>Also the process of machine learning may be hard to control or intervene in.
>><Sheng> Here, I actually refer to DNN too. We don't know how to intervene in
>>a DNN at run time.

>Do you mean that once you've trained a DNN you can only use it for classification
>(or whatever) and can't further train it, or do you mean that we can't learn from
>additional data, or something else?

No. Of course, a DNN can learn from additional data. What I meant is that we cannot be sure of the DNN’s results or behavior, even though we can choose the training data or the incremental data. This links back to the fact that we do not understand in detail how DNNs work. Therefore, we cannot intervene in or control a DNN so that it behaves 100% the way we want. This is also the trust issue for DNNs, and even for AI.


>><Sheng>Inaccurate or stochastic results are not what many network people can accept.
>>In their eyes, the network is definite, every parameter is definite. However, if
>>you analyze every network parameter, the value may be definite, but
>>they all actually tolerate some level of inaccuracy as long as their
>>design purposes are met. This is the fundamental reason that we can use ML to
>>solve network tasks.

>I didn't understand what you mean here.

I have been questioned many times by traditional network people who think their networks and the parameters on their network devices are definite. Given the inaccuracy of ML prediction results, they are very resistant. However, our studies showed that only the design purposes need to be definitely met, not the values of the parameters. For example, in one of our studies, we got various values for COST in the IS-IS protocol, like 5, 50, 100, 200, 500, 1000, etc. However, further study of the semantics and design purposes showed that the exact values are not meaningful. In any given network, the COST is used to divide interfaces into two or three classes. If ML could learn this design purpose, then it could assign an arbitrary two or three numbers in a given network and the network would work fine.
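
A rough Python sketch of the COST point follows. The interface names, the observed costs, and the replacement values are hypothetical, and the helper functions are only for illustration. The sketch groups interfaces into classes by the rank of their cost, then assigns a different set of numbers that keeps the same classes. It only shows that the class structure survives the renumbering; whether path selection stays identical in a real topology is an assumption here, since shortest-path computation sums costs along a path.

```python
# Illustrative sketch only: interface names and cost values are assumptions.

def cost_classes(interface_costs):
    """Map each interface to the rank of its cost among the distinct costs."""
    distinct = sorted(set(interface_costs.values()))
    rank = {cost: i for i, cost in enumerate(distinct)}
    return {name: rank[cost] for name, cost in interface_costs.items()}


def reassign(classes, new_values):
    """Give every interface the new cost chosen for its class."""
    return {name: new_values[cls] for name, cls in classes.items()}


# Hypothetical costs observed in one network: three distinct values.
observed = {"if0": 5, "if1": 5, "if2": 100, "if3": 1000}

classes = cost_classes(observed)
relabelled = reassign(classes, new_values=[10, 20, 30])

print("classes:   ", classes)
print("relabelled:", relabelled)
# The renumbered network has exactly the same interface classes.
print("same classes:", cost_classes(relabelled) == classes)
```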


Thanks and regards,

Sheng
