Re: [Nmlrg] Review for draft-jiang-nmlrg-traffic-machine-learning-00.txt

Jérôme François <jerome.francois@inria.fr> Sun, 17 July 2016 21:04 UTC

Message-ID: <578BF2BD.1060900@inria.fr>
Date: Sun, 17 Jul 2016 23:03:57 +0200
From: Jérôme François <jerome.francois@inria.fr>
To: Albert Cabellos <albert.cabellos@gmail.com>, draft-jiang-nmlrg-traffic-machine-learning@ietf.org, nmlrg@irtf.org
References: <CAGE_QewtGRL58K-XLrFOE9a-vMjJEV8v5sthMQ3OeHdzAOKK8A@mail.gmail.com>
In-Reply-To: <CAGE_QewtGRL58K-XLrFOE9a-vMjJEV8v5sthMQ3OeHdzAOKK8A@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nmlrg/IiRqdG7pfmB4qeMs32xWyo8fGzs>

Hi Albert,

>     4.1.  HTTPS Traffic Classification
>
>
> [snip]
>
>        As a concrete example, Google, Facebook or Amazon are service
>        providers while maps, drive, gmail are services of Google.  To
>        identify them when they are accessed by a user, IP addresses and DNS
>        (Domain Name System) names based identification is not reliable as
>        the users can relies on intermediates to respectively serve as proxy
>        or resolve DNS requests.  The SNI (Server Name Indication) [RFC5246]
>        is an extension of HTTPS which is indicated by the user when
>        initiating the TLS handshake (Client Hello).  SNI actually contains
>        the hostname to which the request is addressed.  Such an hostname is
>        significative of the service and service provider name.  However, SNI
>        is an optional field and can be easily forged to circumvent HTTPS
>        filtering without impacting service use [bypasssni].  More advanced
>        mechanisms are hence necessary to improve the robustness of
>        identification even in the case of non collaborative users.
>
>
> I suggest being vendor-agnostic in the examples, the specific examples
> do not improve the draft by any means.
I think the examples help readers understand what we mean by service
provider and service, i.e. they illustrate that having two levels is
common nowadays.
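
To make the two levels concrete, here is a minimal sketch of how an SNI
hostname could be turned into a (service provider, service) label pair.
The function name label_from_sni and the splitting rule are assumptions
made for illustration only; they are not taken from the draft. A real
labelling step would need something like a public-suffix list rather
than this naive split.

```python
from typing import Tuple


def label_from_sni(hostname: str) -> Tuple[str, str]:
    """Split an SNI hostname into (service_provider, service) labels.

    Assumption (illustrative only): the provider is the second-to-last
    DNS label and the service is everything in front of it.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        # Single-label name: no separate service part can be derived.
        return labels[0], ""
    provider = labels[-2]
    service = ".".join(labels[:-2])
    return provider, service


# The provider/service pairs used as examples in the draft.
print(label_from_sni("maps.google.com"))
print(label_from_sni("drive.google.com"))
```

Under this rule, maps.google.com and drive.google.com share the first
level label and differ only at the second level.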
>
> [snip]
>
>
>
>          HTTPS Connection
>                +
>                |(1)
>        +-------v------+
>        |TLS Connection|
>        |Reconstruction|
>        +-------+------+
>                |(2)
>        +-------v------+    (3')                    (4')
>        |  Features    +-------------+----------------------------+
>        |  Extraction  |             |                            |
>        +-------+------+     +-------v---------+             +----v----+
>                |            |Service Provider +------------->Services |
>                |(3)         |L1 model         |   Load      |L2 model |
>                |            +-------^---------+   services  +----^----+
>        +-------v------+             |             model X        |
>        |SNI Labelling |             +----------------------------+
>        +-------+------+                         |(5)
>                |            +-----------------------------------------+
>                +------------>              Training and               |
>                        (4)  |              Models building            |
>                             +-----------------------------------------+
>
>        Two-levels HTTPS traffic classification
>
>        In figure above, step(1) consists in reconstructing the HTTPS
>        connection and retrieving packets on top of which the following
>        metrics are observed (2):
>
>        o  Inter Arrival Time
>
>        o  Packet size
>
>        o  Encrypted data size: this feature has the advantage to be strongly
>           related to the service accessed instead of the packet size which
>           is biased by other lower layer headers
>
>        Based on these values, aggregated features are computed: average,
>        minimum, maximum, 25th percentile, median, 75th percentile.
>
>  
> Does the authors see value on listing all the traffic features in an
> ANNEX?
>
All of them can be found in the referenced paper, and the ones that
contribute to a good classification are those given in the draft. So, as
the author, I would say "no".
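
For reference, the six aggregates listed in the draft can be sketched
for one metric as below. This is only an illustration: the function name
aggregate and the choice of inclusive quantile interpolation are
assumptions, since the draft does not say how the percentiles are
computed.

```python
import statistics


def aggregate(values):
    """Compute the six aggregates named in the draft for one metric.

    `values` holds per-packet observations (e.g. packet sizes) of one
    connection and must contain at least two items.
    Assumption: percentiles use the "inclusive" interpolation method.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "average": statistics.fmean(values),
        "minimum": min(values),
        "maximum": max(values),
        "p25": q1,
        "median": median,
        "p75": q3,
    }


# Hypothetical encrypted data sizes (bytes) observed on one connection.
print(aggregate([100, 200, 300, 400, 500]))
```

The same function would be applied to each of the three metrics (inter
arrival time, packet size, encrypted data size) to build the feature
vector of a connection.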

Best regards,
Jérôme