Re: [Nmlrg] Machine Learning in network - solicitation for use cases

"Liubing (Leo)" <leo.liubing@huawei.com> Wed, 09 September 2015 02:13 UTC

From: "Liubing (Leo)" <leo.liubing@huawei.com>
To: Brian E Carpenter <brian.e.carpenter@gmail.com>, Sheng Jiang <jiangsheng@huawei.com>, Dacheng Zhang <dacheng.zdc@alibaba-inc.com>, "nmlrg@irtf.org" <nmlrg@irtf.org>

Hi Brian,

Thanks very much for the information.
The papers are a good way to get a sense of how modern techniques detect DDoS events, rather than relying on my own guesswork :)

Best regards,
Bing

> -----Original Message-----
> From: Brian E Carpenter [mailto:brian.e.carpenter@gmail.com]
> Sent: Wednesday, September 09, 2015 7:05 AM
> To: Liubing (Leo); Sheng Jiang; Dacheng Zhang; nmlrg@irtf.org
> Subject: Re: [Nmlrg] Machine Learning in network - solicitation for use cases
> 
> Bing,
> 
> We are at the limits of my knowledge here. However, as I said, I think that
> automatically extracting event signatures from a real time packet stream is
> the hard part.
> 
> I think a typical approach would be to run a variety of statistical algorithms
> over a sliding window of incoming packet headers.
> This paper talks about one technique (but not used in real time):
> http://www.cs.auckland.ac.nz/CDMTCS//researchreports/266eimann.pdf
> and the corresponding PhD thesis:
> https://researchspace.auckland.ac.nz/bitstream/handle/2292/3427/02whole.pdf
> 
> Regards
>    Brian
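
As a rough illustration of running a statistic over a sliding window of incoming packet headers, here is a minimal Python sketch. It is not taken from the paper or thesis linked above. The class name `SlidingWindow`, the choice of source-address entropy as the statistic, and the window width are all assumptions made for the example.

```python
import math
from collections import Counter, deque


class SlidingWindow:
    """Keeps the packet headers seen in the last `width` seconds (hypothetical helper)."""

    def __init__(self, width):
        self.width = width
        self.packets = deque()  # (timestamp, src_ip) pairs, oldest first

    def add(self, timestamp, src_ip):
        """Record one header and drop any that have fallen out of the window."""
        self.packets.append((timestamp, src_ip))
        while self.packets and self.packets[0][0] <= timestamp - self.width:
            self.packets.popleft()

    def features(self):
        """Return (packet count, Shannon entropy of source addresses in bits)."""
        counts = Counter(src for _, src in self.packets)
        total = len(self.packets)
        entropy = 0.0
        for n in counts.values():
            p = n / total
            entropy -= p * math.log2(p)
        return total, entropy


# Feed a few headers and read the current window statistics.
window = SlidingWindow(width=10)
for ts, src in [(0, "10.0.0.1"), (1, "10.0.0.2"), (2, "10.0.0.1")]:
    window.add(ts, src)
count, entropy = window.features()
```

A real detector would track several such statistics at once and compare each window against a baseline; this sketch only shows the windowing step.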
> 
> 
> On 08/09/2015 20:13, Liubing (Leo) wrote:
> > Hi Brian,
> >
> > Thanks for your detailed explanation.
> > Please see inline.
> >
> >> -----Original Message-----
> >> From: Brian E Carpenter [mailto:brian.e.carpenter@gmail.com]
> >> Sent: Tuesday, September 08, 2015 12:39 PM
> >> To: Liubing (Leo); Sheng Jiang; Dacheng Zhang; nmlrg@irtf.org
> >> Subject: Re: [Nmlrg] Machine Learning in network - solicitation for
> >> use cases
> >>
> >> On 08/09/2015 16:00, Liubing (Leo) wrote:
> >>
> >> ...
> >>> But I'm curious what the item is that could be labeled as "This is not
> >>> an attack" or "You missed an attack". E.g., is the item a packet, a
> >>> stream, or some other kind of N-tuple?
> >>
> >> The two cases are rather different.
> >>
> >> 1. The system signals "attack in progress" to the NOC. The operators
> >> have a look and decide that there is no attack, it is just some unusual
> >> traffic.
> >> (Example: you are live-streaming the Olympic Games. Two seconds after
> >> the end of the 100 metres final, there is an enormous burst of traffic.
> >> The machine learning system signals an attack, because it was not
> >> trained on the data set from the previous Olympic Games.)
> >>
> >> In this case the NOC operators urgently tell the algorithm it is wrong.
> >> It needs to learn that the signature of a sudden burst just after the
> >> end of an event is less likely to be an attack than a sudden burst at
> >> another time.
> >>
> >> 2. Someone invents a new kind of DDoS attack, which is therefore not
> >> in the historical training data. The system doesn't identify it.
> >> In this case, the NOC operators tell the algorithm "Attack started at
> >> <time>."
> >
> > [Bing] It seems the objects being learned are mostly traffic burst events?
> > When a traffic burst happens, the machine judges whether or not it is a DDoS.
> > The training data might then be a set of traffic burst events marked as
> > normal or abnormal, and the challenge would be how to pick a set of features
> > out of a burst event so that the machine learning program can discover the
> > pattern that separates normal from abnormal.
> >
> > This is just my hypothetical case; it could be all wrong.
> >
> >> This automatically becomes high quality training data for the
> >> algorithm: the signature of the new traffic at that time is 100% certain to
> >> be an attack.
> >
> >> I think the hard part is extracting useful signatures from the
> >> traffic stream in real time; the learning/training part is fairly standard.
> > [Bing] When you said the signature is "100% certain", my perception is that
> > it is something like the typical virus detection approach, which does an
> > exact match on a particular piece of code that identifies a virus.
> > If my perception is correct, I think the signatures are particular
> > combinations of the features (as mentioned above) for which the
> > classification pattern has 100% confidence. In that case I don't think the
> > signatures need to be extracted in real time, because the features are all
> > pre-defined.
> >
> > However, this is only my view; I might have totally misunderstood you.
> >
> > Best regards,
> > Bing
> >
> >>     Brian
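
The two feedback cases discussed in this thread (a false alarm corrected by the NOC, and a missed attack reported by the NOC) can be pictured as operator verdicts that become new labeled training examples. The Python sketch below is only an illustration under stated assumptions: a nearest-centroid classifier is swapped in as the simplest possible learner (nobody in the thread proposed it), and the two-element feature vector (burst size scaled to 0..1, and a 0/1 flag for "burst follows the end of a scheduled event") is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BurstClassifier:
    """Nearest-centroid classifier over burst feature vectors (illustrative only)."""

    examples: list = field(default_factory=list)  # (features, is_attack) pairs

    def label(self, features, is_attack):
        """Record an operator verdict as a training example."""
        self.examples.append((tuple(features), bool(is_attack)))

    def _centroid(self, is_attack):
        rows = [f for f, y in self.examples if y == is_attack]
        if not rows:
            return None
        return tuple(sum(col) / len(rows) for col in zip(*rows))

    def predict(self, features):
        """Return True if the burst is closer to the attack centroid."""
        attack, normal = self._centroid(True), self._centroid(False)
        if attack is None:
            return False
        if normal is None:
            return True

        def dist(centroid):
            return sum((a - b) ** 2 for a, b in zip(features, centroid))

        return dist(attack) < dist(normal)


# Hypothetical features: (burst size in 0..1, 1.0 if just after a scheduled event).
clf = BurstClassifier()
clf.label((0.9, 0.0), True)    # historical attack
clf.label((0.2, 0.0), False)   # historical normal traffic
false_alarm = clf.predict((0.9, 1.0))  # burst right after the 100 metres final
clf.label((0.9, 1.0), False)   # NOC feedback: "this is not an attack"
after_feedback = clf.predict((0.85, 1.0))
```

With these made-up numbers the post-event burst is first flagged as an attack, and a similar burst is classified as normal once the operator's correction has been added. A missed attack would be fed back the same way with `is_attack=True`.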