Re: [Nmlrg] Available Public Data for Learning Algorithms

Sheng Jiang <jiangsheng@huawei.com> Mon, 26 October 2015 01:16 UTC

From: Sheng Jiang <jiangsheng@huawei.com>
To: Rudra Saha <rudrsaha@gmail.com>, Aunn Raza <12bscssraza@seecs.edu.pk>
Date: Mon, 26 Oct 2015 01:16:22 +0000
Archived-At: <http://mailarchive.ietf.org/arch/msg/nmlrg/5vaX_WXPhb8jM4_2sqA1H-HZZrk>
Cc: Hunain Arif <12bscsharif@seecs.edu.pk>, "nmlrg@irtf.org" <nmlrg@irtf.org>
Subject: Re: [Nmlrg] Available Public Data for Learning Algorithms

Hi, Aunn and all,

Data is certainly very important for machine learning. To evaluate the performance of an ML mechanism for a given use case, a public dataset is necessary.

However, before we discuss any specific ML mechanism with a specific dataset, we first need to study the use cases. The dataset is tightly bound to the use case, and different use cases are associated with different datasets. Either the use case decides what data is needed, or the availability of data decides whether a specific task is feasible. There are many aspects of networking; until we have discussed them and converged on a few specific use cases, it may be a little too early to concern ourselves with any specific dataset.

Thanks and regards,

Sheng

From: nmlrg [mailto:nmlrg-bounces@irtf.org] On Behalf Of Rudra Saha
Sent: Sunday, October 25, 2015 9:54 PM
To: Aunn Raza
Cc: Hunain Arif; nmlrg@irtf.org
Subject: Re: [Nmlrg] Available Public Data for Learning Algorithms


I guess it's a good idea to take up a dataset related to networking and discuss whatever insights we come up with. That being said, I don't know of any particular dataset that we can use. If someone can point one out, that would be great.

Regards,
Rudra
On Oct 25, 2015 6:57 PM, "Aunn Raza" <12bscssraza@seecs.edu.pk> wrote:
Hi Rudra,

There are many such dataset repositories, but here we can discuss datasets with reference to networking applications and network data, since that is the focus of the group.

Regards
Aunn

On Sun, Oct 25, 2015 at 6:22 PM, Rudra Saha <rudrsaha@gmail.com> wrote:

The UCI repository has lots of datasets that can be tinkered with. They have a good mix of both labeled (for supervised learning and related tasks) and unlabeled data.

Regards,
Rudra Saha
On Oct 25, 2015 6:34 PM, "Aunn Raza" <12bscssraza@seecs.edu.pk> wrote:
Hi All,

I have a background in machine learning, and I was hoping to start this thread to share publicly available datasets so that folks among us may use that data to come up with new solutions.

Before sharing links to public datasets, please also indicate whether each one is labeled or unlabeled, and the type of data it contains.

Thanks, Best Regards

Aunn Raza
