Re: [Nmlrg] Machine Learning in network - solicitation for use cases

Brian E Carpenter <brian.e.carpenter@gmail.com> Thu, 17 September 2015 20:02 UTC

To: Sebastian Abt <sabt@sabt.net>, Sheng Jiang <jiangsheng@huawei.com>
References: <D20A251E.25E52%dacheng.zdc@alibaba-inc.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2B192@nkgeml512-mbx.china.huawei.com> <D20B2C03.25EC7%dacheng.zdc@alibaba-inc.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2D062@nkgeml512-mbx.china.huawei.com> <D211D160.26495%dacheng.zdc@alibaba-inc.com> <D211D7F2.2651C%dacheng.zdc@alibaba-inc.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2D300@nkgeml512-mbx.china.huawei.com> <D2130D6D.26ABF%dacheng.zdc@alibaba-inc.com> <5D36713D8A4E7348A7E10DF7437A4B927BB2DDB6@nkgeml512-mbx.china.huawei.com> <3D0B6D8D-4350-40F0-B09E-4094040A2A7A@sabt.net>
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
Organization: University of Auckland
Message-ID: <55FB1C5D.3070600@gmail.com>
Date: Fri, 18 Sep 2015 08:02:37 +1200
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.2.0
MIME-Version: 1.0
In-Reply-To: <3D0B6D8D-4350-40F0-B09E-4094040A2A7A@sabt.net>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Archived-At: <http://mailarchive.ietf.org/arch/msg/nmlrg/uDtzToctCYFSQt0c0AML833oSPE>
Cc: "nmlrg@irtf.org" <nmlrg@irtf.org>, Dacheng Zhang <dacheng.zdc@alibaba-inc.com>
Subject: Re: [Nmlrg] Machine Learning in network - solicitation for use cases

Sebastian,

On 18/09/2015 07:17, Sebastian Abt wrote:
> 
>> On 08.09.2015 at 05:39, Sheng Jiang <jiangsheng@huawei.com> wrote:
>>
>>>> b) is autonomic reaction possible from the network operational
>>>> perspective after such a DDoS attack is detected? Given that machine learning
>>>> may not be accurate, my guess is that human intervention is needed.
>>>
>>> In current practice, the machine learning procedure is normally offline:
>>> 1) machine learning may not be very accurate; 2) big data processing
>>> needs time and computing resources.  Human involvement is required.
>>
>> What may influence the accuracy of the machine learning result? In other words, how can the accuracy of a machine learning mechanism be improved? This question may not be specific to DDoS protection.
> 
> I think there are many different factors that affect accuracy of a ML system.  Most crucial in my opinion are the following two:
> 
> 1. You need to find an appropriate description of the class(es) you try to learn.  In ML, this process of finding/generating an appropriate description is commonly called feature extraction.  For network security this means that you need to find a way to transform your given representation of traffic (e.g., packets, flow records, SNMP counters, …) such that only the bits relevant to describing normality, or to distinguishing between two classes A and B, are reflected and everything else is dismissed, effectively reducing entropy.

For a real-time system we need a transformation of the packet stream that can run
at line speed and produce a fairly small number of parameters per second, right? Would you
see that as meaning a statistical algorithm, e.g. one producing entropy measures?
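
As a rough illustration of what such an entropy-based transformation could look like, here is a minimal Python sketch. It assumes each time window is already available as a list of (source address, destination port) pairs; the names `shannon_entropy` and `window_features` are hypothetical, and plain Python would not run at line speed. It only shows how a packet stream might be reduced to a few numbers per window.

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def window_features(packets):
    """Reduce one window of (src_ip, dst_port) pairs to three parameters.

    The choice of fields is an assumption made for illustration only.
    """
    packets = list(packets)
    return {
        "packets": len(packets),
        "src_entropy": shannon_entropy(src for src, _ in packets),
        "dport_entropy": shannon_entropy(port for _, port in packets),
    }
```

Many distinct sources hitting a single port would show up as high source entropy combined with low destination-port entropy, but whether that separates an attack from a surge in valid traffic is exactly the open question.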

> The resulting feature vectors should have high intra-class and low inter-class similarity - for whatever notion of similarity you choose.
> 
> 2. Especially for one-class systems that only learn models of normality, it is important to be able to track a change of normality. Otherwise, these systems render themselves useless over time or generate too many false alarms.  As an operator, you can only rely on the results if there are no (significant) baseline changes.  However, detecting this is probably not trivial and as far as I know this is not heavily researched by the network security community.  Some years ago, I read a paper that claimed that such baseline confidence checks are successfully employed in voice recognition systems and are crucial for those systems' reliability.  Unfortunately, I don't have this paper at hand.

Strangely enough my first published paper a million years ago was on human
factors in real-time speech recognition. It turned out that emotional changes
such as user frustration caused large changes in gross parameters of the speech
signal, as large as the differences between speakers. So I think your point
is very important: how to distinguish normal anomalies (like a surge
in valid traffic) from abnormal anomalies (like a surge in DoS packets).
Feature extraction that makes this possible sounds like a hard problem.
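
To make the baseline-tracking issue concrete, here is a minimal Python sketch of one common approach: an exponentially weighted moving average of a single per-window parameter. The class name `EwmaBaseline` and the values of `alpha`, `threshold` and `warmup` are assumptions chosen for illustration. Note that it flags any large deviation, so by itself it cannot tell a surge in valid traffic from a surge in attack packets.

```python
class EwmaBaseline:
    """Track a slowly drifting baseline of one parameter (illustrative only)."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=5):
        self.alpha = alpha          # weight given to each new sample
        self.threshold = threshold  # deviation limit, in standard deviations
        self.warmup = warmup        # samples to observe before flagging
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Return True if x deviates from the baseline, then fold x in."""
        self.n += 1
        if self.mean is None:
            self.mean = float(x)
            return False
        diff = x - self.mean
        std = self.var ** 0.5
        anomalous = (self.n > self.warmup
                     and abs(diff) > self.threshold * max(std, 1e-9))
        # Every sample updates the baseline, so the model follows a change
        # of normality, at the cost of also slowly absorbing an attack.
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        return anomalous
```

Feeding it a steady series followed by one large jump, for example `[b.update(v) for v in [10, 11, 9, 10, 11, 9, 10, 10, 50]]` with `b = EwmaBaseline()`, flags only the final value.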

   Brian