Re: [Nmlrg] Machine Learning in network - solicitation for use cases
Sebastian Abt <sabt@sabt.net> Thu, 17 September 2015 19:40 UTC
From: Sebastian Abt <sabt@sabt.net>
In-Reply-To: <8AE0F17B87264D4CAC7DE0AA6C406F45C227CF25@nkgeml506-mbx.china.huawei.com>
Date: Thu, 17 Sep 2015 21:40:11 +0200
Message-Id: <011F781F-9409-44D6-A006-C899A39053A1@sabt.net>
References: <D20A251E.25E52%dacheng.zdc@alibaba-inc.com>
<5D36713D8A4E7348A7E10DF7437A4B927BB2B192@nkgeml512-mbx.china.huawei.com>
<D20B2C03.25EC7%dacheng.zdc@alibaba-inc.com>
<5D36713D8A4E7348A7E10DF7437A4B927BB2D062@nkgeml512-mbx.china.huawei.com>
<D211D160.26495%dacheng.zdc@alibaba-inc.com>
<D211D7F2.2651C%dacheng.zdc@alibaba-inc.com>
<5D36713D8A4E7348A7E10DF7437A4B927BB2D300@nkgeml512-mbx.china.huawei.com>
<55EC9987.9030002@gmail.com>
<5D36713D8A4E7348A7E10DF7437A4B927BB2D65D@nkgeml512-mbx.china.huawei.com>
<55ED09ED.3090406@gmail.com>
<5D36713D8A4E7348A7E10DF7437A4B927BB2DD75@nkgeml512-mbx.china.huawei.com>
<8AE0F17B87264D4CAC7DE0AA6C406F45C227BE52@nkgeml506-mbx.china.huawei.com>
<55EE6648.4040804@gmail.com>
<8AE0F17B87264D4CAC7DE0AA6C406F45C227CF25@nkgeml506-mbx.china.huawei.com>
To: "Liubing (Leo)" <leo.liubing@huawei.com>
X-Mailer: Apple Mail (2.2104)
Archived-At: <http://mailarchive.ietf.org/arch/msg/nmlrg/h5_dLRpZAL1yF6JcuaGRykGokig>
Cc: "nmlrg@irtf.org" <nmlrg@irtf.org>, Sebastian Abt <sabt@sabt.net>,
Dacheng Zhang <dacheng.zdc@alibaba-inc.com>,
Sheng Jiang <jiangsheng@huawei.com>
Subject: Re: [Nmlrg] Machine Learning in network - solicitation for use cases
List-Id: Network Machine Learning Research Group <nmlrg.irtf.org>
> On 08.09.2015 at 10:13, Liubing (Leo) <leo.liubing@huawei.com> wrote:
>
> Hi Brian,
>
> Thanks for your elaborating explanation.
> Please see inline.
>
>> -----Original Message-----
>> From: Brian E Carpenter [mailto:brian.e.carpenter@gmail.com]
>> Sent: Tuesday, September 08, 2015 12:39 PM
>> To: Liubing (Leo); Sheng Jiang; Dacheng Zhang; nmlrg@irtf.org
>> Subject: Re: [Nmlrg] Machine Learning in network - solicitation for use cases
>>
>> On 08/09/2015 16:00, Liubing (Leo) wrote:
>>
>> ...
>>> But I'm curious about what is the item that could be labeled as "This is not
>>> an attack " or " You missed an attack ". E.g., the item is an packet, a stream,
>>> or any other kind of N-tuple things.
>>
>> The two cases are rather different.
>>
>> 1. The system signals "attack in progress" to the NOC. The operators have a
>> look and decide that there is no attack, it is just some unusual traffic.
>> (Example: you are live-streaming the Olympic Games. Two seconds after the
>> end of the 100 metres final, there is an enormous burst of traffic.
>> The machine learning system signals an attack, because it was not trained on
>> the data set from the previous Olympic Games.)
>>
>> In this case the NOC operators urgently tell the algorithm it is wrong.
>> It needs to learn that the signature of a sudden burst just after the end of an
>> event is less likely to be an attack than a sudden burst at another time.
>>
>> 2. Someone invents a new kind of DDoS attack, which is therefore not in the
>> historical training data. The system doesn't identify it.
>> In this case, the NOC operators tell the algorithm "Attack started at <time>."
>
> [Bing] It feels like the learning objects are mostly traffic burst events? When traffic burst happens, the machine judges whether it's a DDoS or not.
> Then the training data might be a bunch of traffic burst events marked as normal or abnormal.
> And the challenge should be how to pick a set of features out of a burst event for the machine learning program to discover the pattern of normal/abnormal classification.
>
> This is just my hypothetical case, could be all wrong.

As you say, the point really is the representation you choose, i.e. the features your system uses. If you encode traffic only as bps and pps, bursts won't tell you much: both features increase during a flash crowd and during a volumetric DDoS attack alike. You can predict a little more if you also incorporate bps/pps, i.e. bits per packet, since in practice DDoS attacks typically shift that ratio, but this is all still pretty coarse. Moreover, what looks like a DDoS attack from a customer's perspective may not be considered an attack by the carrier; it may even go unnoticed by the carrier because it is low-volume compared to the carrier's usual backbone traffic levels. The takeaway from this example: for reliable attack detection, it is important to choose features that are independent of traffic volume. Volume and bursts alone can only be indicative.

>> This automatically becomes high quality training data for the algorithm: the
>> signature of the new traffic at that time is 100% certain to be an attack.
>
>> I think the hard part is extracting useful signatures from the traffic stream in
>> real time; the learning/training part is fairly standard.
> [Bing] When you said the signature of "100% certain", my perception is that it is something like the typical virus detection approach, which doing exact match of a particular piece of code that could identify a virus.
> If my perception was correct, I think the signatures are some special combinations of the features (as mentioned above) where the classification pattern has a 100% confidence. Then I think it doesn't need to extract the signatures in real time, because the features are all pre-defined.

In both examples the difficulty is backwards-mapping from features to packets/flows/etc.
Typically, your features are derived from a set of packets/flows/etc. collected over a specific period of time. So you have to bin the packets/flows/etc. in a sensible way (i.e., not only by time) before actually extracting feature vectors, in order to be able to draw conclusions about very specific packets/flows, which is especially required for automatic response.

BTW: the same applies to the "high-quality training data" mentioned by Brian. While I agree that manually labelled training data is the gold standard, in the case sketched here an operator would only see the prediction your anomaly detection system (ADS) makes on the feature vectors, and each of those may be based on a bunch of packets/flows/… that contain both legitimate and illegitimate activity. So mapping labels down to individual packets or flows will be a time-consuming manual process.

sebastian
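A minimal Python sketch of the binning and back-mapping idea above, under stated assumptions: the `FlowRecord` type, the helper names, the choice of destination prefix as the binning key and the 60-second window are all hypothetical illustrations, not a prescribed ADS design. It keeps, for every feature vector, the list of flows it was computed from, so that a verdict on the vector can be traced back to concrete flows, and it computes bits per packet next to bps and pps as a feature that does not grow with volume.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical binning parameter: 60-second time windows.
WINDOW_SECONDS = 60.0


@dataclass(frozen=True)
class FlowRecord:
    """Hypothetical, simplified flow record (not a real NetFlow/IPFIX schema)."""
    flow_id: int
    dst_prefix: str  # destination prefix the flow is addressed to, e.g. a /24
    start: float     # flow start time in seconds
    packets: int
    octets: int


def bin_flows(flows):
    """Group flows by (destination prefix, time window), i.e. not only by time."""
    bins = defaultdict(list)
    for flow in flows:
        key = (flow.dst_prefix, int(flow.start // WINDOW_SECONDS))
        bins[key].append(flow)
    return bins


def extract_features(bins):
    """Return per-bin feature vectors plus an index from each bin key to its flow IDs."""
    features = {}
    index = {}
    for key, members in bins.items():
        packets = sum(f.packets for f in members)
        octets = sum(f.octets for f in members)
        bps = 8 * octets / WINDOW_SECONDS  # bits per second: grows with volume
        pps = packets / WINDOW_SECONDS     # packets per second: grows with volume
        # Bits per packet (equivalent to bps / pps): unchanged when packet and
        # byte counts scale by the same factor, so it is volume-independent.
        bits_per_packet = 8 * octets / packets if packets else 0.0
        features[key] = (bps, pps, bits_per_packet)
        # Keep the back-mapping so a verdict on this vector can be traced to flows.
        index[key] = [f.flow_id for f in members]
    return features, index


def propagate_label(index, key, label):
    """Push a label assigned to one feature vector down to every flow in that bin.

    Deliberately coarse: a bin can mix legitimate and attack flows, so the
    resulting per-flow labels still need manual review.
    """
    return {flow_id: label for flow_id in index[key]}
```

For example, calling `extract_features(bin_flows(flows))` on a list of `FlowRecord` objects and then `propagate_label(index, key, "attack")` for a flagged bin yields a per-flow label map. Because every flow in the bin inherits the same label, this only narrows down where an operator has to look; it does not by itself produce clean per-flow training data.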