[nmrg] RE: [IRSG] review of draft-irtf-nmrg-snmp-measure-04.txt

"Bert Wijnen - IETF" <bertietf@bwijnen.net> Mon, 19 May 2008 08:11 UTC

From: Bert Wijnen - IETF <bertietf@bwijnen.net>
To: "Karen R. Sollins" <sollins@csail.mit.edu>, j.schoenwaelder@jacobs-university.de
Date: Mon, 19 May 2008 10:11:37 +0200
Cc: Internet Research Steering Group <irsg@ISI.EDU>, nmrg@ibr.cs.tu-bs.de
Subject: [nmrg] RE: [IRSG] review of draft-irtf-nmrg-snmp-measure-04.txt

Thanks for the discussion so far.

From what I have seen during the development of this document,
I believe that the scope of the document was intentionally
limited to what the document describes. In other words, we
intentionally did not want to include the "impact of SNMP traffic
on other network activity".
From the discussion below, I think what we need to do is to be
somewhat more explicit about the fact that, and why, we have
limited ourselves to that scope.

Karen, please let me know if you agree with my assessment.

Further, Juergen has fixed a few things and made a few more changes
as per the discussion below.

I assume that the NMRG group does agree with this approach.
If not, please speak up soon.

Bert Wijnen
document shepherd

> -----Original Message-----
> From: Karen R. Sollins [mailto:sollins@csail.mit.edu]
> Sent: Monday, 19 May 2008 6:05
> To: j.schoenwaelder@jacobs-university.de; Karen R. Sollins
> CC: nmrg@ibr.cs.tu-bs.de; Bert Wijnen - IETF; Internet Research Steering
> Group
> Subject: Re: [IRSG] review of draft-irtf-nmrg-snmp-measure-04.txt
>
>
> Hi Juergen,
>
> Thanks for your thoughtful responses.  I also did not think that what
> I was suggesting was a lot of work.   At a high level, think about a
> reader who is not part of your group, whom you are trying to convince
> that what you are doing is valuable or whom you would like to
> convince to do such a data collection exercise. I have interspersed
> my comments into your responses below.
>
> 			Cheers,
> 			Karen
>
> At 2:20 PM +0200 5/16/08, Juergen Schoenwaelder wrote:
> >On Mon, May 12, 2008 at 02:52:02PM -0400, Karen R. Sollins wrote:
> >
> >>  Here is my full review of draft-irtf-nmrg-snmp-measure-04.txt.
> >
> >Thank you very much for your substantial review. Below is how I plan
> >to address your comments. For most of your comments, the changes are
> >easy to do. For some of your comments, I suggest making no changes
> >and I try to explain why. Please check and let me and the RG know
> >where you do not agree with my suggested plan of action.
> >
> >>  I do not think this document is quite ready for publication, but it
> >>  is quite close and I don't think this should be major work.
> >>
> >>  My review falls into two parts.  The first is broad comments and the
> >>  second is nits.  Overall, I agree with the working group that it is
> >>  valuable to spell out a set of details such as this, more or less a
> >>  "protocol" in the sense of biologically based research, describing
> >>  what should be done, what kind of information should be collected and
> >>  why and how to handle it.   I have not reviewed the XML or CSV
> >>  representations nor the reference lists in detail.  With that said, I
> >>  have two high level concerns with this document.
> >>
> >>  The first is that there is confusion about whether this document
> >>  describes a proposed measurement effort or (closer to the reality of
> >>  what is in the document) a methodology by which such measurements
> >>  could and perhaps should be taken.  So, given the bulk of the
> >>  document, I do not believe that it is at all explicit about the
> >>  particular measurement exercise that will be done.   With that said,
> >>  there are a few introductory points that are worth changing.
> >>
> >>  1. The abstract includes the sentence, "This document proposes to
> >>  carry out large scale SNMP traffic measurements."  First, the
> >>  document won't be carrying out anything.  More importantly the
> >>  document describes an approach (not even a plan), so I suggest that
> >>  the sentence read, "This document describes an approach to carrying
> >>  out..."
> >
> >changed as suggested
> >
> >>  2. In Section 1, the 4th paragraph reads, "This document describes an
> >>  effort to collect SNMP traffic traces..."  Instead, I suggest that it
> >>  read, "This document recommends an approach to collecting, codifying,
> >>  and handling SNMP traffic traces..."
> >
> >changed as suggested
> >
> >>  The second high level concern I have is that there is talk about
> >>  specific kinds of information to be collected and an interest in not
> >>  only the nature but longer term inferences with perhaps implications
> >>  for future redesign efforts in the SNMP context.    I have two levels
> >>  of concern here.  The most important one is that since network
> >>  management and in particular SNMP is NOT the primary objective of the
> >>  net (the primary objective being the transport of real payload), it
> >>  seems to me that the truly critical question with respect to network
> >>  management traffic is the impact that it has or does not have on that
> >>  real job.  To me this implies that the measurements MUST also include
> >>  contextual information.  As an example, it is probably more important
> >>  to understand whether  or not the network management traffic is
> >>  causing significant congestion for the payload traffic than the
> >>  particular mix or frequency pattern within the network management
> >>  traffic.  Without the complementary contextual information, the
> >>  whole measurement exercise seems to me to be of somewhat narrow
> >>  value.
> >
> >The measurement may be of narrow value from your point of view but
> >please keep in mind that this document is coming from the Network
> >Management Research Group and not from a general Network Measurement
> >Research Group. Our goal is to understand how network management
> >protocols are being used because that has impact on their design and
> >implementation strategies. Further note that in many networks, the
> >management traffic is logically and sometimes even physically
> >separated from the normal traffic and perhaps this is the reason why
> >we did not even think about the question whether management traffic
> >has an impact on normal traffic.
>
> If you want to leave it as is, then I think it would be valuable to
> say as much.  Be specific about what you are not doing, because much
> of the rest of the world looks at network traffic from a broader
> perspective.
>
> >
> >>  Secondarily, the document hits on one of my little pet peeves
> >>  - location.  There is mention that the meta-data must include "where
> >>  the trace was collected".  That, in and of itself, is a challenging
> >>  problem.  Is this geographic, topological, or based on other metrics?
> >>  If it is based on something that can change with time (e.g. IP
> >>  addresses or topology), then without keeping track of how those
> >>  metrics may have changed over time, the data set becomes useless
> >>  because the location becomes uninterpretable.  As you all hint, it is
> >>  important that the collected information be archived, in order to be
> >>  useful at significantly later times, and thus, everything important
> >>  about the situation must be recorded.  I would like to see more
> >>  specificity with respect to location and clearer acknowledgement of
> >>  its need in order to keep the collected information useful outside
> >>  the current context.
> >
> >Makes sense.
> >
> >>  More generally, there are many cases in which there are statements
> >>  about what kind of information should be collected (e.g. one week
> >>  traces, concurrent vs. sequential requests, etc.) where there is no
> >>  explanation of the criticality of this information or how to go
> >>  beyond the immediately observable facts.
> >>
> >>  I realize that it is generally not in the nature of RFCs to provide
> >>  explanation as much as simple unadorned facts, but this is an unusual
> >>  kind of document, because it is describing a potential information
> >>  gathering exercise.  It has some of the explanations of why things
> >>  are important, but it is quite incomplete.  I realize that it is
> >>  important to collect and archive measurement sets exactly because one
> >>  cannot know ahead of time all the potential uses, but there should
> >>  either be a story about collecting everything possible or an initial
> >>  use for each kind of information collected.  In some cases neither
> >>  argument is presented.
> >>
> >>  This all leads to a number of specific questions:
> >>
> >>  1. Section 1: It seems to me that there are TWO key questions with
> >>  respect to SNMP.  The first is how it is being used, which in turn
> >>  leads to the points made in this section, but the second is the
> >>  impact of that traffic.  I think that ought to appear in the
> >>  Introduction as well.
> >
> >See my comments above. So far, the NMRG did not consider the impact of
> >SNMP on other traffic a target of this activity. I don't want to add
> >such text unless I see support from the NMRG and concrete proposals
> >for what should be added.
>
> Again, as above, if you want to leave the scope as it is, then you
> should probably be up front about that, clearly leaving that work to
> another time and place.
>
> >
> >>  2. Section 2.1.  The second paragraph begins with, "It is recommended
> >>  to capture at least a full week of data."  This is never justified or
> >>  explained.  Is one week really enough?  For what?  Why wouldn't
> >>  several weeks be critical, because one week might be anomalous?  Why
> >>  isn't a year critical, since we know that there are annual or
> >>  seasonal differences in traffic behaviors?  Typically, I find that
> >>  one-week data sets often leave me with lots of unanswered questions,
> >>  so justify this.
> >
> >The text actually says:
> >
> >    It is recommended to capture at least a full week of data.  Operators
> >    are encouraged to capture traces over even longer periods of time.
> >
> >The text tries to establish a lower bound of one week and encourages
> >longer capture periods. I would love to get continuous traces but
> >reality is such that this is not feasible. Our idea is simply to catch
> >at least the weekly behaviour. Yes, there is of course also monthly or
> >yearly behaviour but I believe it is not useful to set the bar so high
> >that nobody gives us appropriate traces. I personally believe the text
> >is fine as is.
>
> So, what I was really getting at was the question of why one week was
> the minimum necessary.  So, say something like: one week is the minimum
> over which one can see the diurnal patterns within the weekly pattern,
> and it is understood that, for both computational and storage reasons,
> operators may not want to collect more.
>
> >
> >>  3. Next paragraph: this is where the location question arises.
> >>  Without some completely standardized and self explanatory capturing
> >>  of location information, any data set will be incomparable to any
> >>  other.
> >
> >I expanded "where the trace was collected" to "where the trace was
> >collected (name of the network and/or name of the organization owning
> >the network, description of the measurement point in the network
> >topology where the trace was collected)".
>
> Good.
>
> >
> >>  4. Section 3.3, end of first paragraph:  The sentence reads, "Some
> >>  SNMP implementations approximate networking delays by measuring
> >>  request-response times and it would be useful to understand to what
> >>  extent this is a viable approach."  I agree, but traces will not tell
> >>  you anything about whether behaviors observed in packet traces are
> >>  for this reason or some other reason.  I do not believe you can get
> >>  at this question with the data you are collecting.
> >
> >I think it is possible to analyze retransmission behaviour. Depending
> >on the SNMP version used (and the other versions also depending on
> >implementation choices), you can get information whether a response is
> >just coming late for the original request or it is actually a response
> >to a retransmitted request. We are not talking TCP here; we are talking
> >about application layer retransmissions and SNMP has its own msgID and
> >requestID fields.
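
A minimal sketch of the retransmission analysis described above, in Python.
It pairs each response with the requests that carry the same request-id and
labels the response by how many copies of the request had been seen before
it arrived. The record layout (Msg) and the function name are hypothetical,
not taken from the draft, and the sketch assumes that a manager reuses the
request-id when it retransmits, which is an implementation choice and will
not hold for every trace.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Msg(NamedTuple):
    """One decoded SNMP message from a trace (hypothetical record layout)."""
    time: float       # capture timestamp in seconds
    src: str          # sender address
    dst: str          # receiver address
    kind: str         # "request" or "response"
    request_id: int   # request-id taken from the PDU


def classify_responses(trace):
    """Label each response as first-attempt, after-retransmission or unmatched.

    Returns a list of (request_id, label, delay) tuples, where delay is the
    time since the first copy of the request, or None if no request was seen.
    Assumes retransmitted requests reuse the original request-id.
    """
    # (manager, agent, request_id) -> timestamps of every copy of the request
    sent = defaultdict(list)
    results = []
    for m in sorted(trace, key=lambda m: m.time):
        if m.kind == "request":
            sent[(m.src, m.dst, m.request_id)].append(m.time)
        elif m.kind == "response":
            # A response travels agent -> manager, so swap src and dst.
            copies = sent.get((m.dst, m.src, m.request_id), [])
            delay: Optional[float] = None
            if not copies:
                label = "unmatched"
            else:
                delay = m.time - copies[0]
                label = "first-attempt" if len(copies) == 1 else "after-retransmission"
            results.append((m.request_id, label, delay))
    return results
```

A label of "after-retransmission" only records that a second copy of the
request was on the wire before the response was captured; it does not say
which copy the agent actually answered.
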
>
> The point I was trying to make here is that it is very difficult to
> intuit the reasons behind behaviors seen in the traffic, unless
> someone or something tells you.  So, you can see what the ends do,
> but not why.
>
> >
> >>  5. Section 3.4: Please explain why it is "interesting" (your word) to
> >>  identify whether concurrency or sequentiality is occurring?  What
> >>  will you "learn" if both are observed?  And, if one is occurring more
> >>  frequently or under specific identifiable conditions, what further
> >>  does that tell you?  Just knowing that one or the other occurs is
> >>  only the tip of the iceberg, and without acknowledging the fact that
> >>  these are important and unanswered questions, just learning
> >>  first-order details suggests you are setting the bar too low.
> >
> >The introduction of section 3 says:
> >
> >    The questions raised in the following subsections are meant to be
> >    illustrative and no attempt has been made to provide a complete
> >    list.
> >
> >I believe it is a good idea to first figure out whether there is an
> >iceberg or not (keeping your analogy) and if there is one to ask
> >questions how big the iceberg might be. For SNMP agent implementations
> >that tend to do quite some caching, it is useful to know how well
> >caching strategies are working in real-world networks. The concurrency
> >level an agent experiences has clear impact on that. Furthermore, it
> >will be useful to know how bursty the traffic tends to be or how well
> >managers spread the traffic over polling intervals and this is again
> >related to the concurrency we can extract from traces.
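
One way the concurrency level mentioned above could be read off a trace,
as a rough Python sketch: treat every request/response exchange seen by an
agent as a time interval and find the largest number of intervals that
overlap. The function name and the (request_time, response_time) input
format are assumptions made for illustration, not something the draft
defines.

```python
def max_outstanding(exchanges):
    """Return the peak number of requests in flight at the same time.

    exchanges: iterable of (request_time, response_time) pairs for one agent.
    """
    events = []
    for start, end in exchanges:
        events.append((start, 1))    # request observed: one more in flight
        events.append((end, -1))     # response observed: one fewer in flight
    # Sort by time; at equal timestamps handle the response (-1) before the
    # request (+1), so back-to-back exchanges count as sequential.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

A peak of 1 means the manager waited for each response before sending the
next request; anything higher means requests overlapped at the agent.
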
> >
> >I am not really sure what I should change, perhaps the word
> >"interesting" is the source of the trouble and I should replace this
> >with "valuable"?
>
> What I was trying to ask was that you tell the reader a bit about
> what makes it interesting.  It certainly doesn't have to be complete,
> but just a hint.  I don't think you should change the word
> "interesting", because if it isn't interesting, you probably
> shouldn't be doing it.
>
> >
> >>  6. Section 3.5:  Please explain what you would do  with the
> >>  information about which approach to table retrieval is used.  Again,
> >>  what if the results tell you that both are used?  And, if not, of
> >>  what use is it to know which approach is prevalent?  Mightn't it be
> >>  useful to know the conditions under which one or the other is used
> >>  more commonly?
> >
> >This again has direct impact on agent implementation techniques and
> >caching strategies.
>
> And again, just explain a little.
>
> >
> >>  Now on to the real NITS:
> >>  1. p. 1, Section 1, middle of second paragraph:
> >>  s/In fact, there are many speculations how SNMP/In fact, there are
> >>  many speculations on how SNMP/
> >
> >fixed
> >
> >>  2. p. 6, Section 2.2, 5th paragraph:
> >>  s/for SNMP messages that got fragmented/for SNMP messages that
> >>  were fragmented/
> >
> >fixed
> >
> >>  3. p. 7, Section 2.4, last paragraph:
> >>  s/Improvements in the tool chain may require to go back to the
> >>  original pcap traces and to rebuild all intermediate formats from
> >>  them./Improvements in the tool chain may require going back to the
> >>  original pcap traces and rebuilding all intermediate formats from
> >>  them./
> >
> >fixed
> >
> >>  4. p. 7, Section 2.5, first paragraph:
> >>  s/all scripts used to analyze traffic traces would be/all scripts
> >>  used to analyze traffic traces will be/
> >
> >fixed
> >
> >>  5. p. 8, first sentence:
> >>  s/A common versioned repository for analysis of scripts might be
> >>  useful to establish./It might be useful to establish a common,
> >>  versioned repository for analysis scripts./
> >
> >fixed
> >
> >>  6. p. 6, second paragraph (includes several corrections):
> >>  s/it is suggested that analysis scripts are written in scripting
> >>  languages such as Perl using suitable Perl modules to manipulate XML
> >>  documents [8].  Using a scripting language such as Perl instead of
> >>  system programming languages such as C or C++ has the advantage to
> >>  reduce development time and to make scripts more accessible to
> >>  operators who may want to verify scripts before running them on trace
> >>  files which potentially contain sensitive data./it is suggested that
> >>  analysis scripts be written in scripting languages such as Perl using
> >>  suitable Perl modules to manipulate XML documents [8].  Using a
> >>  scripting language such as Perl instead of system programming
> >>  languages such as C or C++ has the advantage of reducing development
> >>  time and making scripts more accessible to operators who may want to
> >>  verify scripts before running them on trace files which may contain
> >>  sensitive data./
> >
> >fixed
> >
> >>  6. p. 6, end of next paragraph:
> >>  s/might be the only option to deal/might be the only option
> >>  in dealing/
> >
> >fixed
> >
> >>  7. p. 9, Section 3.2.  In this case there are enough corrections in
> >>  close proximity that I've grouped them together.  I suggest that the
> >>  following text:
> >>
> >>  "SNMP is used to periodically poll devices as well as to retrieve
> >>  information on request of an operator or application.  The periodic
> >>  polling leads to periodic traffic patterns while the on demand
> >>  information retrieval causes more aperiodic traffic patterns.  It is
> >>  worthwhile to understand what the relationship is between the amount
> >>  of periodic and aperiodic traffic.  It will be interesting to
> >>  research whether there are multiple levels of periodicity at
> >>  different time scales.
> >>
> >>  The periodic polling behavior may dependent on the application and
> >>  the polling engine it uses.  For example, some management platforms
> >>  allow applications to specify how long polled values may be kept in a
> >>  cache before it is polled again."
> >>
> >>  have the following grammatical changes made:
> >>
> >>  "SNMP is used to periodically poll devices as well as to retrieve
> >>  information at the request of an operator or application.  The
> >>  periodic polling leads to periodic traffic patterns while on-demand
> >>  information retrieval causes more aperiodic traffic patterns.  It is
> >>  worthwhile to understand what the relationship is between the amount
> >>  of periodic and aperiodic traffic.  It will be interesting to
> >>  understand whether there are multiple levels of periodicity at
> >>  different time scales.
> >>
> >>  Periodic polling behavior may be dependent on the application and
> >>  polling engine it uses.  For example, some management platforms allow
> >>  applications to specify how long polled values may be kept in a cache
> >>  before they are polled again."
> >
> >fixed
> >
> >>  8. p. 10, Section 3.6, middle first paragraph:
> >>  s/ adopt/adapt/
> >
> >fixed
> >
> >/js
> >
> >--
> >Juergen Schoenwaelder           Jacobs University Bremen gGmbH
> >Phone: +49 421 200 3587         Campus Ring 1, 28759 Bremen, Germany
> >Fax:   +49 421 200 3103         <http://www.jacobs-university.de/>
>
>
> --
>
> Karen R. Sollins, Ph. D.
> Principal Research Scientist
> M.I.T. CSAIL
> The Stata Center
> 32 Vassar St., 32-G818
> Cambridge, MA 02139
> V: 617/253-6006
> F: 617/253-2673
> E: sollins@csail.mit.edu
>