Re: [Idnet] A few ideas/suggestions to get us going

David Meyer <dmm@1-4-5.net> Thu, 23 March 2017 13:35 UTC

From: David Meyer <dmm@1-4-5.net>
Date: Thu, 23 Mar 2017 06:35:31 -0700
Message-ID: <CAHiKxWh1nGakQeDDPfLgyQXDx2CM285Fn-34xVuDp2i8PBfkiQ@mail.gmail.com>
To: Rana Pratap Sircar <rana.pratap.sircar@ericsson.com>
Cc: "idnet@ietf.org" <idnet@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/idnet/7LtAAj4u8vtyaxTrB6a_1B8z-nM>

Hey Rana,


On Thu, Mar 23, 2017 at 1:42 AM, Rana Pratap Sircar <
rana.pratap.sircar@ericsson.com> wrote:

> Hi Dave,
>
>
>
> As always, another wonderful initiative from you. I too have been
> struggling with the three points that you mentioned –
>
> 1.       Theory of Networking
>
> 2.       Open datasets that are relevant to networks rather than to
> images and demographics – I am not too sure about initiatives such as
> https://github.com/opentraffic & http://www.caida.org/data/
>
> 3.       Skills
>
>
>
> I feel that apart from this, there are a couple of additional challenges –
>
> 1.       Networks have a fairly complex layered architecture. Thus, any
> problem statement needs to look at a smaller scope (assuming that the
> behavior of the network is the sum of these scopes – which, in my humble
> opinion, is incorrect)
>

dmm> definitely; this goes with the "usable" theory of networking

> 2.       Continuously and rapidly evolving technology
>
dmm> Yes. One way to think about this is that in many cases ML models
assume a "stationary" underlying data-generating distribution [0];
obviously this isn't the case in adversarial situations (this is why
"baselines" are weak in anomaly-detection scenarios: an attacker merely
observes the detector's black-box behavior and adapts accordingly), or in
any other case in which the underlying processes change (for example, in
APT scenarios). I will point out here that this is one place where
distributed representations can help; see [1] for a really nice overview
of representation learning.
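To make the stationarity point concrete, here is a toy sketch (plain
Python, with invented numbers) of a 3-sigma baseline detector: it is well
behaved while the data-generating distribution holds still, and an
adversary who has observed its behavior can shift traffic to sit just
inside the band and evade it.

```python
import random

random.seed(0)

# "Baseline" anomaly detection under a stationarity assumption: learn the
# mean and standard deviation of some metric over a training window, then
# flag anything more than 3 sigma from the mean. (Toy numbers throughout.)
train = [random.gauss(100.0, 5.0) for _ in range(2000)]
mean = sum(train) / len(train)
std = (sum((x - mean) ** 2 for x in train) / len(train)) ** 0.5

def is_anomaly(x):
    return abs(x - mean) > 3.0 * std

# While the underlying distribution really is stationary, the baseline is
# well behaved: almost nothing is flagged.
normal = [random.gauss(100.0, 5.0) for _ in range(2000)]
fp_rate = sum(is_anomaly(x) for x in normal) / len(normal)

# An adversary who has observed the detector's black-box behavior simply
# keeps its traffic inside the 3-sigma band: shifted, but not enough to
# trip the fixed baseline, so most of it sails through undetected. The
# same failure occurs benignly whenever the underlying processes drift.
attack = [random.gauss(108.0, 5.0) for _ in range(2000)]
evasion = sum(not is_anomaly(x) for x in attack) / len(attack)

print(f"false positives under stationarity: {fp_rate:.3f}")
print(f"shifted traffic evading the baseline: {evasion:.3f}")
```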

Thanks,

Dave


[0] The underlying data-generating distribution (DGD) is the process or
set of processes that generates the data we observe; in some sense the
observations are a proxy for this DGD. The behavior of these processes is
what we really want to understand.

[1] https://arxiv.org/pdf/1305.0445.pdf


>
> I would most definitely like to participate & contribute…
>
>
>
> Best regards,
>
> Rana
>
> Ph: +91 88 00 22 4872 <+91%2088002%2024872>
>
> "You can't make the same mistake twice, the second time, it's not a
> mistake, it's a choice." - Anonymous
>
>
>
> *From:* IDNET [mailto:idnet-bounces@ietf.org] *On Behalf Of *David Meyer
> *Sent:* Wednesday, March 22, 2017 10:59 PM
> *To:* idnet@ietf.org
> *Subject:* [Idnet] A few ideas/suggestions to get us going
>
>
>
> Folks,
>
>
>
> I thought I'd try to get some discussion going by outlining some of my
> views as to why networking is lagging other areas in the development and
> application of Machine Learning (ML). In particular, networking is way
> behind what we might call the "perceptual tasks" (vision, NLP, robotics,
> etc) as well as other areas (medicine, finance, ...). The attached slide
> from one of my decks tries to summarize the situation, but I'll give a bit
> of an outline below.
>
>
>
> So why is networking lagging many other fields when it comes to the
> application of machine learning? There are several reasons which I'll try
> to outline here (I was fortunate enough to discuss this with the
> packetpushers crew a few weeks ago, see [0]). These are in no particular
> order.
>
>
>
> First, we don't have a "useful" theory of networking (UTON). One way to
> think about what such a theory would look like is by analogy to what we see
> with the success of convolutional neural networks (CNNs), not only for
> vision but now for many other tasks. In that case there is a theory of how
> vision works, built up from concepts like receptive fields, shared weights,
> simple and complex cells, etc. For example, the input layer of a CNN isn't
> fully connected; rather, its connections reflect the receptive fields of
> the input, in a way that is "inspired" by biological vision (being very
> careful with "biological inspiration"). The same goes for the alternation
> of convolutional and pooling layers, which loosely models the alternation
> of simple and complex cells in the primary visual cortex (V1), the
> secondary visual cortex (V2), and the third visual complex (V3). BTW, such
> a theory seems to be required for transfer learning [1], which we'll need
> if we don't want every network to be analyzed in an ad-hoc, one-off style
> (like we see today).
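The receptive-field and weight-sharing ideas above can be sketched in a
few lines. This is a toy 1-D convolution plus pooling in plain Python,
purely illustrative (real CNN libraries do the same thing in 2-D with
learned kernels):

```python
# Two structural ideas, stripped to a 1-D toy: (1) each output unit of
# conv1d sees only a small local window of the input (its receptive
# field), and (2) the SAME kernel weights are reused at every position
# (weight sharing).
def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(signal, width=2):
    # Pooling plays the "complex cell" role: keep the strongest response
    # in each window, buying a little translation invariance.
    return [max(signal[i:i + width])
            for i in range(0, len(signal) - width + 1, width)]

# An edge-detecting kernel fires wherever the signal jumps, no matter
# where the jump occurs; expressing that positional reuse with a fully
# connected layer would take a separate weight for every position.
signal = [0, 0, 0, 1, 1, 1, 0, 0]
edges = conv1d(signal, [-1, 1])            # +1 at rising edges, -1 at falling
pooled = max_pool([abs(e) for e in edges])
print(edges)   # [0, 0, 1, 0, 0, -1, 0]
print(pooled)  # [0, 1, 1]
```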
>
>
>
> The second thing that we need to think about is publicly available
> standardized data sets. Examples here include MNIST, ImageNet, and many
> others. The result of having these data sets has been the steady ratcheting
> down of error rates on tasks such as object and scene recognition, NLP, and
> others to super-human levels. Suffice it to say we have nothing like these
> data sets for networking. Networking data sets today are largely
> proprietary, and because there is no UTON, there is no real way to compare
> results between them.
>
>
>
> Third, there is a large skill-set gap. Network engineers (us!) typically
> don't have the mathematical background required to build effective machine
> learning systems at scale. See [2] for an outline of some of the
> mathematical skills that are essential for effective ML. There is a lot
> more to this, involving how progress is made in ML (open data, open
> source, open models; in general, open science and the associated
> communities, see e.g. OpenAI [3], Distill [4], and many others). In any
> event, we need to build community and gain new skills if we want to be
> able to develop and apply state-of-the-art machine learning algorithms to
> network data, at scale. The bottom line is that it will be difficult if
> not impossible to be effective in the ML space if we ourselves don't
> understand how these systems work and, further, if we can't build
> explainable systems (noting that explaining what the individual neurons in
> a deep neural network are doing is notoriously difficult; that said, much
> progress is being made). So we want to build explainable, end-to-end
> trained systems, and to accomplish this we ourselves need to understand
> how these algorithms work, both in training and in inference.
>
>
>
> This email is already TL;DR, but I'll add one more point here: we need to
> learn control, not just prediction. Since we live in an inherently
> adversarial environment, we need to take advantage of reinforcement
> learning, and to understand the various attacks being formulated against
> ML; [5] gives one interesting example of attacks against policy networks
> using adversarial examples. See also slides 31 and 32 of [6] for more on
> this topic.
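As a toy illustration of the family of attacks [5] describes
(gradient-based adversarial examples), here is a sketch against a made-up
linear classifier; the weights, input, and epsilon are all invented for
illustration ([5] applies the same idea to deep policy networks):

```python
import math

# The adversary nudges every input feature by a small epsilon in the
# direction that most hurts the model (the sign of the gradient),
# flipping the decision while keeping the worst-case (L-infinity) change
# to the input tiny. Everything below is a toy stand-in.
w = [2.0, -3.0, 1.0]   # hypothetical "detector" weights
b = 0.1

def score(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x):
    return 1 if score(x) > 0 else 0   # 1 = flagged as attack traffic

x = [0.5, -0.2, 0.3]                  # classified as 1 (score = 2.0)

# For a linear model the gradient of the score w.r.t. the input is just
# w, so stepping each feature epsilon against sign(w) lowers the score
# as fast as any perturbation of the same L-infinity size can.
eps = 0.4
x_adv = [xi - eps * math.copysign(1.0, wi) for xi, wi in zip(x, w)]

print(predict(x), predict(x_adv))     # 1 0
```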
>
>
>
> I hope some of this gets us thinking about the problems we need to solve
> in order to be successful in the ML space. There's plenty more of this on
> http://www.1-4-5.net/~dmm/ml and http://www.1-4-5.net/~dmm/vita.html.
>
> I'm looking forward to the discussion.
>
>
>
> Thanks,
>
>
>
> --dmm
>
>
>
>
>
>
>
>
>
> [0] http://packetpushers.net/podcast/podcasts/pq-show-107-applicability-machine-learning-networking/
>
> [1]  http://sebastianruder.com/transfer-learning/index.html
>
> [2]  http://datascience.ibm.com/blog/the-mathematics-of-machine-learning/
>
> [3] https://openai.com/blog/
>
> [4] http://distill.pub/
>
> [5] http://rll.berkeley.edu/adversarial/arXiv2017_AdversarialAttacks.pdf
>
> [6]  http://www.1-4-5.net/~dmm/ml/talks/2016/cor_ml4networking.pptx
>