Re: [Nmlrg] Machine Learning in network - solicitation for use cases

"Liubing (Leo)" <leo.liubing@huawei.com> Tue, 08 September 2015 08:14 UTC

From: "Liubing (Leo)" <leo.liubing@huawei.com>
To: Brian E Carpenter <brian.e.carpenter@gmail.com>, Sheng Jiang <jiangsheng@huawei.com>, Dacheng Zhang <dacheng.zdc@alibaba-inc.com>, "nmlrg@irtf.org" <nmlrg@irtf.org>
Date: Tue, 8 Sep 2015 08:13:56 +0000
Message-ID: <8AE0F17B87264D4CAC7DE0AA6C406F45C227CF25@nkgeml506-mbx.china.huawei.com>
References: <D20A251E.25E52%dacheng.zdc@alibaba-inc.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2B192@nkgeml512-mbx.china.huawei.com> <D20B2C03.25EC7%dacheng.zdc@alibaba-inc.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2D062@nkgeml512-mbx.china.huawei.com> <D211D160.26495%dacheng.zdc@alibaba-inc.com> <D211D7F2.2651C%dacheng.zdc@alibaba-inc.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2D300@nkgeml512-mbx.china.huawei.com> <55EC9987.9030002@gmail.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2D65D@nkgeml512-mbx.china.huawei.com> <55ED09ED.3090406@gmail.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2DD75@nkgeml512-mbx.china.huawei.com> <8AE0F17B87264D4CAC7DE0AA6C406F45C227BE52@nkgeml506-mbx.china.huawei.com> <55EE6648.4040804@gmail.com>
In-Reply-To: <55EE6648.4040804@gmail.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/nmlrg/31fTu6h6Z095nd_lYG26y6WtqtA>
Subject: Re: [Nmlrg] Machine Learning in network - solicitation for use cases

Hi Brian,

Thanks for the detailed explanation.
Please see inline.

> -----Original Message-----
> From: Brian E Carpenter [mailto:brian.e.carpenter@gmail.com]
> Sent: Tuesday, September 08, 2015 12:39 PM
> To: Liubing (Leo); Sheng Jiang; Dacheng Zhang; nmlrg@irtf.org
> Subject: Re: [Nmlrg] Machine Learning in network - solicitation for use cases
> 
> On 08/09/2015 16:00, Liubing (Leo) wrote:
> 
> ...
> > But I'm curious what item could be labeled as "This is not an attack" or
> > "You missed an attack". E.g., the item is a packet, a stream, or any other
> > kind of N-tuple.
> 
> The two cases are rather different.
> 
> 1. The system signals "attack in progress" to the NOC. The operators have a
> look and decide that there is no attack, it is just some unusual traffic.
> (Example: you are live-streaming the Olympic Games. Two seconds after the
> end of the 100 metres final, there is an enormous burst of traffic.
> The machine learning system signals an attack, because it was not trained on
> the data set from the previous Olympic Games.)
> 
> In this case the NOC operators urgently tell the algorithm it is wrong.
> It needs to learn that the signature of a sudden burst just after the end of an
> event is less likely to be an attack than a sudden burst at another time.
> 
> 2. Someone invents a new kind of DDoS attack, which is therefore not in the
> historical training data. The system doesn't identify it.
> In this case, the NOC operators tell the algorithm "Attack started at <time>."

[Bing] It sounds like the learning objects are mostly traffic burst events? When a traffic burst happens, the machine judges whether it is a DDoS attack or not.
Then the training data might be a set of traffic burst events labeled as normal or abnormal. The challenge would be choosing a set of features from each burst event so that the machine learning program can discover the pattern that separates normal from abnormal.

This is just my hypothetical case; it could be all wrong.
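
To make the hypothetical case concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration only: the feature names (peak_pps, unique_src_ratio, syn_ratio, secs_since_event_end), the scaling constants, the toy numbers, and the choice of a nearest-centroid classifier are not from this thread. It only shows the shape of the idea: burst events become feature vectors, labeled bursts train a classifier, and a NOC correction ("this is not an attack") is just one more labeled example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BurstEvent:
    # All four fields are hypothetical features chosen for illustration.
    peak_pps: float              # peak packets per second during the burst
    unique_src_ratio: float      # distinct source addresses / packets seen
    syn_ratio: float             # fraction of packets that are TCP SYN
    secs_since_event_end: float  # seconds since a scheduled event ended


def features(e: BurstEvent) -> list:
    """Map a burst event to a roughly unit-scaled feature vector."""
    return [
        e.peak_pps / 1e6,
        e.unique_src_ratio,
        e.syn_ratio,
        min(e.secs_since_event_end, 3600) / 3600,
    ]


def train(labeled: list) -> dict:
    """Compute one centroid (per-feature mean) for each label."""
    by_label = {}
    for event, label in labeled:
        by_label.setdefault(label, []).append(features(event))
    return {
        label: [mean(col) for col in zip(*rows)]
        for label, rows in by_label.items()
    }


def classify(centroids: dict, event: BurstEvent) -> str:
    """Return the label whose centroid is closest (squared Euclidean)."""
    x = features(event)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])),
    )


# Toy historical training data (made-up numbers).
training = [
    (BurstEvent(2.0e6, 0.9, 0.8, 3600), "attack"),
    (BurstEvent(1.5e6, 0.8, 0.9, 3000), "attack"),
    (BurstEvent(0.3e6, 0.1, 0.05, 2000), "normal"),
    (BurstEvent(0.4e6, 0.2, 0.1, 3600), "normal"),
]
centroids = train(training)

# A large burst two seconds after an event ends (the Olympic example).
olympic_burst = BurstEvent(1.8e6, 0.3, 0.1, 2)
first_verdict = classify(centroids, olympic_burst)

# The NOC says "this is not an attack": add it as a labeled example, retrain.
training.append((olympic_burst, "normal"))
centroids = train(training)
later_verdict = classify(centroids, BurstEvent(1.7e6, 0.35, 0.12, 5))
```

With these toy numbers the first burst is flagged as an attack, and a similar burst after the operator correction is classified as normal. A real system would need far more data and a more careful choice of features, which is exactly the open question above.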

> This automatically becomes high quality training data for the algorithm: the
> signature of the new traffic at that time is 100% certain to be an attack.

> I think the hard part is extracting useful signatures from the traffic stream in
> real time; the learning/training part is fairly standard.
[Bing] When you said the signature is "100% certain", my understanding is that it is something like the typical virus detection approach, which does an exact match on a particular piece of code that identifies a virus.
If my understanding is correct, the signatures are specific combinations of the features (as mentioned above) for which the classification has 100% confidence. Then I think there is no need to extract the signatures in real time, because the features are all predefined.

However, this is only my view; I might have totally misunderstood you.
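
As a rough illustration of that reading, a signature over predefined features could be as simple as a set of allowed ranges, checked per burst. This is only a sketch of my interpretation, not of what Brian described; the Signature type, the matches helper, the feature names and the thresholds are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: feature name -> (min, max), both inclusive.
Signature = Dict[str, Tuple[float, float]]


def matches(signature: Signature, observed: Dict[str, float]) -> bool:
    """True only if every feature named in the signature is present in the
    observation and falls inside the signature's range."""
    return all(
        name in observed and low <= observed[name] <= high
        for name, (low, high) in signature.items()
    )


# A made-up signature for a SYN flood, built from predefined features.
syn_flood: Signature = {
    "syn_ratio": (0.9, 1.0),
    "unique_src_ratio": (0.7, 1.0),
}

flood_like = {"syn_ratio": 0.95, "unique_src_ratio": 0.8, "peak_pps": 2.0e6}
stream_like = {"syn_ratio": 0.1, "unique_src_ratio": 0.3, "peak_pps": 1.8e6}

is_flood = matches(syn_flood, flood_like)
is_stream_flagged = matches(syn_flood, stream_like)
```

Matching like this is cheap, but the feature values themselves still have to be measured from live traffic, and a new kind of attack would match no existing signature, so this sketch may cover only part of what was meant by extracting signatures in real time.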

Best regards,
Bing

>     Brian