Re: [Nmlrg] Machine Learning in network - solicitation for use cases

Brian E Carpenter <brian.e.carpenter@gmail.com> Tue, 08 September 2015 23:05 UTC

To: "Liubing (Leo)" <leo.liubing@huawei.com>, Sheng Jiang <jiangsheng@huawei.com>, Dacheng Zhang <dacheng.zdc@alibaba-inc.com>, "nmlrg@irtf.org" <nmlrg@irtf.org>
References: <D20A251E.25E52%dacheng.zdc@alibaba-inc.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2B192@nkgeml512-mbx.china.huawei.com> <D20B2C03.25EC7%dacheng.zdc@alibaba-inc.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2D062@nkgeml512-mbx.china.huawei.com> <D211D160.26495%dacheng.zdc@alibaba-inc.com> <D211D7F2.2651C%dacheng.zdc@alibaba-inc.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2D300@nkgeml512-mbx.china.huawei.com> <55EC9987.9030002@gmail.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2D65D@nkgeml512-mbx.china.huawei.com> <55ED09ED.3090406@gmail.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2DD75@nkgeml512-mbx.china.huawei.com> <8AE0F17B87264D4CAC7DE0AA6C406F45C227BE52@nkgeml506-mbx.china.huawei.com> <55EE6648.4040804@gmail.com> <8AE0F17B87264D4CAC7DE0AA6C406F45C227CF25@nkgeml506-mbx.china.huawei.com>
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Organization: University of Auckland
Message-ID: <55EF69B3.2070305@gmail.com>
Date: Wed, 9 Sep 2015 11:05:23 +1200
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <8AE0F17B87264D4CAC7DE0AA6C406F45C227CF25@nkgeml506-mbx.china.huawei.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/nmlrg/lqeqyDptKQ3BnTpbkxG5CaCYE-M>
Subject: Re: [Nmlrg] Machine Learning in network - solicitation for use cases

Bing,

We are at the limits of my knowledge here. However, as I said, I think
that automatically extracting event signatures from a real-time
packet stream is the hard part.

I think a typical approach would be to run a variety of statistical
algorithms over a sliding window of incoming packet headers.
This paper describes one such technique (though not used in real time):
http://www.cs.auckland.ac.nz/CDMTCS//researchreports/266eimann.pdf
and the corresponding PhD thesis:
https://researchspace.auckland.ac.nz/bitstream/handle/2292/3427/02whole.pdf
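
To make the sliding-window idea concrete, here is a minimal Python
sketch. It is only an illustration under my own assumptions, not the
technique from the paper or thesis above: the Header record, its
fields, the window length and the choice of statistics (packet count,
byte count, address entropy) are all hypothetical.

```python
from collections import Counter, deque
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Header:
    """Hypothetical, minimal packet header record (an assumption)."""
    ts: float     # arrival time in seconds
    src: str      # source address
    dst: str      # destination address
    length: int   # packet length in bytes


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


class SlidingWindow:
    """Keeps the headers seen during the last `span` seconds."""

    def __init__(self, span):
        self.span = span
        self.headers = deque()

    def add(self, header):
        self.headers.append(header)
        # Drop headers that have fallen out of the time window.
        while self.headers and self.headers[0].ts <= header.ts - self.span:
            self.headers.popleft()

    def features(self):
        """A small feature vector ("signature") for the current window."""
        return {
            "packets": len(self.headers),
            "bytes": sum(h.length for h in self.headers),
            "src_entropy": entropy(h.src for h in self.headers),
            "dst_entropy": entropy(h.dst for h in self.headers),
        }


window = SlidingWindow(span=10.0)
for hdr in (
    Header(0.0, "a", "x", 100),
    Header(5.0, "b", "x", 100),
    Header(12.0, "a", "x", 50),   # pushes the ts=0.0 header out
):
    window.add(hdr)
signature = window.features()
```

The dictionary returned by features() is the kind of per-window
signature that could be handed to a learning algorithm. Computing
something like this at line rate, and choosing statistics that
actually separate attacks from unusual but legitimate bursts, is
where I would expect the difficulty to lie.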

Regards
   Brian


On 08/09/2015 20:13, Liubing (Leo) wrote:
> Hi Brian,
> 
> Thanks for your detailed explanation.
> Please see inline.
> 
>> -----Original Message-----
>> From: Brian E Carpenter [mailto:brian.e.carpenter@gmail.com]
>> Sent: Tuesday, September 08, 2015 12:39 PM
>> To: Liubing (Leo); Sheng Jiang; Dacheng Zhang; nmlrg@irtf.org
>> Subject: Re: [Nmlrg] Machine Learning in network - solicitation for use cases
>>
>> On 08/09/2015 16:00, Liubing (Leo) wrote:
>>
>> ...
>>> But I'm curious about what the item is that could be labeled as "This is not
>>> an attack" or "You missed an attack". E.g., the item is a packet, a stream,
>>> or any other kind of N-tuple thing.
>>
>> The two cases are rather different.
>>
>> 1. The system signals "attack in progress" to the NOC. The operators have a
>> look and decide that there is no attack, it is just some unusual traffic.
>> (Example: you are live-streaming the Olympic Games. Two seconds after the
>> end of the 100 metres final, there is an enormous burst of traffic.
>> The machine learning system signals an attack, because it was not trained on
>> the data set from the previous Olympic Games.)
>>
>> In this case the NOC operators urgently tell the algorithm it is wrong.
>> It needs to learn that the signature of a sudden burst just after the end of an
>> event is less likely to be an attack than a sudden burst at another time.
>>
>> 2. Someone invents a new kind of DDoS attack, which is therefore not in the
>> historical training data. The system doesn't identify it.
>> In this case, the NOC operators tell the algorithm "Attack started at <time>."
> 
> [Bing] It seems that the learning objects are mostly traffic burst events? When a traffic burst happens, the machine judges whether it's a DDoS or not.
> Then the training data might be a set of traffic burst events marked as normal or abnormal. And the challenge should be how to pick a set of features out of a burst event so that the machine learning program can discover the pattern for normal/abnormal classification.
> 
> This is just my hypothetical case; it could be all wrong.
> 
>> This automatically becomes high quality training data for the algorithm: the
>> signature of the new traffic at that time is 100% certain to be an attack.
> 
>> I think the hard part is extracting useful signatures from the traffic stream in
>> real time; the learning/training part is fairly standard.
> [Bing] When you said the signature is "100% certain", my perception is that it is something like the typical virus detection approach, which does an exact match on a particular piece of code that identifies a virus.
> If my perception is correct, I think the signatures are some special combinations of the features (as mentioned above) where the classification pattern has 100% confidence. Then I think there is no need to extract the signatures in real time, because the features are all pre-defined.
> 
> However, this is only my view; I might have totally misunderstood you.
> 
> Best regards,
> Bing
> 
>>     Brian