Re: [Nmlrg] Review for draft-jiang-nmlrg-traffic-machine-learning-00.txt

Albert Cabellos <albert.cabellos@gmail.com> Mon, 18 July 2016 11:33 UTC

In-Reply-To: <578BF2BD.1060900@inria.fr>
References: <CAGE_QewtGRL58K-XLrFOE9a-vMjJEV8v5sthMQ3OeHdzAOKK8A@mail.gmail.com> <578BF2BD.1060900@inria.fr>
From: Albert Cabellos <albert.cabellos@gmail.com>
Date: Mon, 18 Jul 2016 13:33:49 +0200
Message-ID: <CAGE_QewKEGcLqb1XD-h98_sqHxxFAFzt22A-jDG-bWNN0ATQXA@mail.gmail.com>
To: Jérôme François <jerome.francois@inria.fr>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nmlrg/Yzcm2krjr8Zamv0PQdtVBmJfhWE>
Cc: nmlrg@irtf.org, draft-jiang-nmlrg-traffic-machine-learning@ietf.org

Hi Jérôme

Please see inline:

On Sun, Jul 17, 2016 at 11:03 PM, Jérôme François <jerome.francois@inria.fr>
wrote:

> Hi Albert,
>
> 4.1.  HTTPS Traffic Classification
>
>
> [snip]
>
>>    As a concrete example, Google, Facebook or Amazon are service
>>    providers while maps, drive, gmail are services of Google.  To
>>    identify them when they are accessed by a user, identification
>>    based on IP addresses and DNS (Domain Name System) names is not
>>    reliable, as users can rely on intermediaries that respectively
>>    act as proxies or resolve DNS requests.  The SNI (Server Name
>>    Indication) [RFC5246] is a TLS extension sent by the client when
>>    initiating the TLS handshake (Client Hello).  SNI actually contains
>>    the hostname to which the request is addressed.  Such a hostname is
>>    indicative of the service and service provider name.  However, SNI
>>    is an optional field and can be easily forged to circumvent HTTPS
>>    filtering without impacting service use [bypasssni].  More advanced
>>    mechanisms are hence necessary to improve the robustness of
>>    identification even in the case of non-collaborative users.
>
>
> I suggest being vendor-agnostic in the examples; the specific examples do
> not improve the draft.
>
> I guess that the examples help the reader understand what we mean by
> service provider and service, i.e. they illustrate that having two levels
> is common nowadays.
>
>
I think everyone understands this; I suggest keeping the document
vendor-neutral.

>
> [snip]
>
>>
>>
>>      HTTPS Connection
>>            +
>>            |(1)
>>    +-------v------+
>>    |TLS Connection|
>>    |Reconstruction|
>>    +-------+------+
>>            |(2)
>>    +-------v------+    (3')                    (4')
>>    |  Features    +-------------+----------------------------+
>>    |  Extraction  |             |                            |
>>    +-------+------+     +-------v---------+             +----v----+
>>            |            |Service Provider +------------->Services |
>>            |(3)         |L1 model         |   Load      |L2 model |
>>            |            +-------^---------+   services  +----^----+
>>    +-------v------+             |             model X        |
>>    |SNI Labelling |             +----------------------------+
>>    +-------+------+                         |(5)
>>            |            +-----------------------------------------+
>>            +------------>              Training and               |
>>                    (4)  |              Models building            |
>>                         +-----------------------------------------+
>>
>>    Two-level HTTPS traffic classification
>>
>>    In the figure above, step (1) consists of reconstructing the HTTPS
>>    connection and retrieving its packets, on which the following
>>    metrics are observed (2):
>>
>>    o  Inter Arrival Time
>>
>>    o  Packet size
>>
>>    o  Encrypted data size: this feature has the advantage of being
>>       strongly related to the service accessed, unlike the packet size,
>>       which is biased by lower-layer headers
>>
>>    Based on these values, aggregated features are computed: average,
>>    minimum, maximum, 25th percentile, median, 75th percentile.
>>
>>
> Do the authors see value in listing all the traffic features in an annex?
>
> All can be found in the referenced paper, and those that contribute to a
> good classification
> are the ones given in the draft. So, as the author, I would say "no".
>
>
Thanks! In any case, could the authors compile the features that they are
using and check whether any are common? The document should be, to a
reasonable extent, self-contained.
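
To make the request concrete, here is a minimal sketch of how the
aggregated features quoted above (average, minimum, maximum, 25th
percentile, median, 75th percentile) could be computed for one connection.
It is only an illustration, not the authors' implementation: the function
names, the feature naming scheme and the inputs (per-packet timestamps,
packet sizes and encrypted data sizes, assumed to be already extracted) are
hypothetical, and the draft does not say which percentile interpolation is
used, so the "inclusive" method below is an assumption.

```python
from statistics import mean, quantiles


def aggregate(values):
    """Return the six aggregates named in the draft for one metric."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # quantiles(..., n=4) returns the three quartile cut points.
    # The interpolation method is an assumption, not taken from the draft.
    p25, median, p75 = quantiles(values, n=4, method="inclusive")
    return {
        "avg": mean(values),
        "min": min(values),
        "max": max(values),
        "p25": p25,
        "median": median,
        "p75": p75,
    }


def connection_features(timestamps, packet_sizes, encrypted_sizes):
    """Build a flat feature dictionary for one reconstructed connection.

    All three inputs are hypothetical per-packet lists in arrival order.
    """
    # Inter-arrival time: difference between consecutive packet timestamps.
    inter_arrival = [b - a for a, b in zip(timestamps, timestamps[1:])]
    metrics = (
        ("iat", inter_arrival),
        ("pkt_size", packet_sizes),
        ("enc_size", encrypted_sizes),
    )
    features = {}
    for name, series in metrics:
        for stat, value in aggregate(series).items():
            features[f"{name}_{stat}"] = value
    return features


if __name__ == "__main__":
    example = connection_features(
        timestamps=[0.0, 1.0, 3.0, 6.0],
        packet_sizes=[100, 200, 300, 400],
        encrypted_sizes=[60, 160, 260, 360],
    )
    for key in sorted(example):
        print(key, example[key])
```

With three metrics and six aggregates each, this yields 18 values per
connection, which would be short enough to list in the document itself.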

cheers

Albert


> Best regards,
> Jérôme
>
