Re: [Nmlrg] Review for draft-jiang-nmlrg-traffic-machine-learning-00.txt

Albert Cabellos <albert.cabellos@gmail.com> Mon, 18 July 2016 11:33 UTC

In-Reply-To: <578BF2BD.1060900@inria.fr>
References: <CAGE_QewtGRL58K-XLrFOE9a-vMjJEV8v5sthMQ3OeHdzAOKK8A@mail.gmail.com> <578BF2BD.1060900@inria.fr>
From: Albert Cabellos <albert.cabellos@gmail.com>
Date: Mon, 18 Jul 2016 13:33:49 +0200
Message-ID: <CAGE_QewKEGcLqb1XD-h98_sqHxxFAFzt22A-jDG-bWNN0ATQXA@mail.gmail.com>
To: Jérôme François <jerome.francois@inria.fr>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nmlrg/Yzcm2krjr8Zamv0PQdtVBmJfhWE>
Cc: nmlrg@irtf.org, draft-jiang-nmlrg-traffic-machine-learning@ietf.org
Subject: Re: [Nmlrg] Review for draft-jiang-nmlrg-traffic-machine-learning-00.txt

Hi Jérôme

Please see inline:

On Sun, Jul 17, 2016 at 11:03 PM, Jérôme François <jerome.francois@inria.fr>
wrote:

> Hi Albert,
>
> 4.1.  HTTPS Traffic Classification
>
>
> [snip]
>
>>    As a concrete example, Google, Facebook or Amazon are service
>>    providers while maps, drive, gmail are services of Google.  To
>>    identify them when they are accessed by a user, identification
>>    based on IP addresses and DNS (Domain Name System) names is not
>>    reliable, as users can rely on intermediaries to serve as proxies
>>    or to resolve DNS requests, respectively.  The SNI (Server Name
>>    Indication) [RFC5246] is an extension of HTTPS which is indicated
>>    by the user when initiating the TLS handshake (Client Hello).  SNI
>>    actually contains the hostname to which the request is addressed.
>>    Such a hostname is indicative of the service and service provider
>>    name.  However, SNI is an optional field and can be easily forged
>>    to circumvent HTTPS filtering without impacting service use
>>    [bypasssni].  More advanced mechanisms are hence necessary to
>>    improve the robustness of identification even in the case of
>>    non-collaborative users.
>
>
> I suggest being vendor-agnostic in the examples, the specific examples do
> not improve the draft by any means.
>
> I guess that the examples help to understand what we mean by service
> provider and service, i.e. to illustrate that having two levels is
> something common nowadays.
>
>
I think everyone understands this; I suggest keeping the document
vendor-neutral.

>
> [snip]
>
>>
>>
>>      HTTPS Connection
>>            +
>>            |(1)
>>    +-------v------+
>>    |TLS Connection|
>>    |Reconstruction|
>>    +-------+------+
>>            |(2)
>>    +-------v------+    (3')                    (4')
>>    |  Features    +-------------+----------------------------+
>>    |  Extraction  |             |                            |
>>    +-------+------+     +-------v---------+             +----v----+
>>            |            |Service Provider +------------->Services |
>>            |(3)         |L1 model         |   Load      |L2 model |
>>            |            +-------^---------+   services  +----^----+
>>    +-------v------+             |             model X        |
>>    |SNI Labelling |             +----------------------------+
>>    +-------+------+                         |(5)
>>            |            +-----------------------------------------+
>>            +------------>              Training and               |
>>                    (4)  |              Models building            |
>>                         +-----------------------------------------+
>>
>>    Two-level HTTPS traffic classification
>>
>>    In the figure above, step (1) consists of reconstructing the HTTPS
>>    connection and retrieving packets on top of which the following
>>    metrics are observed (2):
>>
>>    o  Inter Arrival Time
>>
>>    o  Packet size
>>
>>    o  Encrypted data size: this feature has the advantage of being
>>       strongly related to the service accessed, unlike the packet
>>       size, which is biased by other lower layer headers
>>
>>    Based on these values, aggregated features are computed: average,
>>    minimum, maximum, 25th percentile, median, 75th percentile.
>>
>>
> Do the authors see value in listing all the traffic features in an annex?
>
> All can be found in the referenced paper and those that contribute to a
> good classification
> are the ones given in the draft. So, as the author, I would say "no".
>
>
Thanks! In any case, can the authors try to put together the features that
they are using and see if there are any common ones? The document should be,
to a reasonable extent, self-contained.
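
As a rough illustration of how little text a self-contained description
would need, here is a minimal Python sketch of the aggregated features
quoted above (average, minimum, maximum, 25th percentile, median, 75th
percentile) computed over inter-arrival time, packet size and encrypted
data size. The function names, the dictionary keys and the choice of the
"inclusive" quantile method are assumptions made for illustration; they are
not taken from the draft or from the referenced paper.

```python
from statistics import mean, quantiles


def aggregate(values):
    """Return the six aggregates listed in the draft for one metric.

    Assumes at least two values; the quartile interpolation method
    ("inclusive") is an illustrative choice, not specified by the draft.
    """
    p25, median, p75 = quantiles(values, n=4, method="inclusive")
    return {
        "avg": mean(values),
        "min": min(values),
        "max": max(values),
        "p25": p25,
        "median": median,
        "p75": p75,
    }


def connection_features(timestamps, packet_sizes, encrypted_sizes):
    """Build a flat feature dictionary for one reconstructed connection.

    All three arguments are hypothetical per-packet lists; timestamps
    must contain at least three entries so that two inter-arrival
    times exist.
    """
    # Inter-arrival time: difference between consecutive packet timestamps.
    inter_arrival = [b - a for a, b in zip(timestamps, timestamps[1:])]
    metrics = {
        "iat": inter_arrival,
        "pkt_size": packet_sizes,
        "enc_size": encrypted_sizes,
    }
    return {
        f"{name}_{stat}": value
        for name, values in metrics.items()
        for stat, value in aggregate(values).items()
    }
```

Three metrics times six aggregates gives 18 features per connection in this
sketch; whether the authors use exactly this set is what the draft should
state.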

cheers

Albert


> Best regards,
> Jérôme
>