Re: [Nmlrg] Review for draft-jiang-nmlrg-traffic-machine-learning-00.txt

Jérôme François <jerome.francois@inria.fr> Sun, 17 July 2016 21:04 UTC

Message-ID: <578BF2BD.1060900@inria.fr>
Date: Sun, 17 Jul 2016 23:03:57 +0200
From: Jérôme François <jerome.francois@inria.fr>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
MIME-Version: 1.0
To: Albert Cabellos <albert.cabellos@gmail.com>, draft-jiang-nmlrg-traffic-machine-learning@ietf.org, nmlrg@irtf.org
References: <CAGE_QewtGRL58K-XLrFOE9a-vMjJEV8v5sthMQ3OeHdzAOKK8A@mail.gmail.com>
In-Reply-To: <CAGE_QewtGRL58K-XLrFOE9a-vMjJEV8v5sthMQ3OeHdzAOKK8A@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nmlrg/IiRqdG7pfmB4qeMs32xWyo8fGzs>

Hi Albert,

>     4.1.  HTTPS Traffic Classification
>
>
> [snip]
>
>        As a concrete example, Google, Facebook or Amazon are service
>        providers while maps, drive, gmail are services of Google.  To
>        identify them when they are accessed by a user, IP addresses and DNS
>        (Domain Name System) names based identification is not reliable as
>        the users can relies on intermediates to respectively serve as proxy
>        or resolve DNS requests.  The SNI (Server Name Indication) [RFC5246]
>        is an extension of HTTPS which is indicated by the user when
>        initiating the TLS handshake (Client Hello).  SNI actually contains
>        the hostname to which the request is addressed.  Such an hostname is
>        significative of the service and service provider name.  However, SNI
>        is an optional field and can be easily forged to circumvent HTTPS
>        filtering without impacting service use [bypasssni].  More advanced
>        mechanisms are hence necessary to improve the robustness of
>        identification even in the case of non collaborative users.
>
>
> I suggest being vendor-agnostic in the examples, the specific examples
> do not improve the draft by any means.
I think the examples help the reader understand what we mean by service
provider and service, i.e. they illustrate that having two levels is
common nowadays.
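
To make the two levels concrete, here is a minimal sketch of how an SNI
hostname could be split into a service provider label and a service label.
The function name and the splitting rule are assumptions for illustration
only: the rule naively treats the second-to-last DNS label as the provider,
which is wrong for suffixes such as "co.uk", and it is not necessarily the
labelling used in the referenced work.

```python
def sni_to_labels(hostname):
    """Split an SNI hostname into (service_provider, service).

    Hypothetical rule: the second-to-last DNS label is the provider and
    everything to its left is the service. Multi-label public suffixes
    (e.g. "co.uk") are deliberately not handled in this sketch.
    """
    parts = hostname.lower().rstrip(".").split(".")
    if len(parts) < 2:
        # Single-label name: no separate service level can be derived.
        return parts[0], None
    provider = parts[-2]
    service = ".".join(parts[:-2]) or None
    return provider, service
```

With this rule, "maps.google.com" gives the provider "google" and the
service "maps", while "google.com" gives the provider alone.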
>
> [snip]
>
>
>
>          HTTPS Connection
>                +
>                |(1)
>        +-------v------+
>        |TLS Connection|
>        |Reconstruction|
>        +-------+------+
>                |(2)
>        +-------v------+    (3')                    (4')
>        |  Features    +-------------+----------------------------+
>        |  Extraction  |             |                            |
>        +-------+------+     +-------v---------+             +----v----+
>                |            |Service Provider +------------->Services |
>                |(3)         |L1 model         |   Load      |L2 model |
>                |            +-------^---------+   services  +----^----+
>        +-------v------+             |             model X        |
>        |SNI Labelling |             +----------------------------+
>        +-------+------+                         |(5)
>                |            +-----------------------------------------+
>                +------------>              Training and               |
>                        (4)  |              Models building            |
>                             +-----------------------------------------+
>
>        Two-levels HTTPS traffic classification
>
>        In figure above, step(1) consists in reconstructing the HTTPS
>        connection and retrieving packets on top of which the following
>        metrics are observed (2):
>
>        o  Inter Arrival Time
>
>        o  Packet size
>
>        o  Encrypted data size: this feature has the advantage to be strongly
>           related to the service accessed instead of the packet size which
>           is biased by other lower layer headers
>
>        Based on these values, aggregated features are computed: average,
>        minimum, maximum, 25th percentile, median, 75th percentile.
>
>  
> Does the authors see value on listing all the traffic features in an
> ANNEX?
>
All of them can be found in the referenced paper, and those that contribute
to a good classification are the ones given in the draft. So, as the author,
I would say "no".
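
For readers of the thread, the aggregation step quoted above can be sketched
roughly as follows. This is only an illustration, not the code behind the
paper: the function names, the feature keys and the choice of the
"inclusive" quantile method are assumptions.

```python
from statistics import mean, median, quantiles


def aggregate(values):
    """Return the six aggregates listed in the draft for one metric."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # n=4 yields the three quartile cut points; the middle one is unused
    # because the median is computed separately below.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "p25": q1,
        "median": median(values),
        "p75": q3,
    }


def connection_features(timestamps, packet_sizes, encrypted_sizes):
    """Build one flat feature dict for a reconstructed TLS connection."""
    # Inter-arrival time: difference between consecutive packet timestamps.
    inter_arrival = [b - a for a, b in zip(timestamps, timestamps[1:])]
    metrics = {
        "iat": inter_arrival,
        "pkt_size": packet_sizes,
        "enc_size": encrypted_sizes,
    }
    return {
        f"{name}_{stat}": value
        for name, observed in metrics.items()
        for stat, value in aggregate(observed).items()
    }
```

Each connection then yields 18 values (three metrics times six aggregates),
which could be fed to whatever classifier is used at each level.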

Best regards,
Jérôme
