[nmrg] Re: [IRSG] review of draft-irtf-nmrg-snmp-measure-04.txt

"Karen R. Sollins" <sollins@csail.mit.edu> Mon, 19 May 2008 17:25 UTC

Message-Id: <p06240407c45767a7388d@[192.168.1.105]>
In-Reply-To: <20080519094639.GA27481@elstar.local>
References: <p06240840c44e32552b6b@[18.26.0.27]> <20080516122042.GA19275@elstar.local> <p06240404c456a78f0f60@[192.168.1.105]> <20080519094639.GA27481@elstar.local>
Date: Mon, 19 May 2008 10:25:06 -0700
To: j.schoenwaelder@jacobs-university.de, "Karen R. Sollins" <sollins@csail.mit.edu>
From: "Karen R. Sollins" <sollins@csail.mit.edu>
Cc: Internet Research Steering Group <irsg@ISI.EDU>, nmrg@ibr.cs.tu-bs.de
Subject: [nmrg] Re: [IRSG] review of draft-irtf-nmrg-snmp-measure-04.txt

Hi Juergen,

It is all fine with me.  I think all you need now is concurrence from 
the research group.

			Cheers,
			Karen

At 11:46 AM +0200 5/19/08, Juergen Schoenwaelder wrote:
>On Sun, May 18, 2008 at 09:05:15PM -0700, Karen R. Sollins wrote:
>
>>  Thanks for your thoughtful responses.  I also did not think that what I was
>>  suggesting was a lot of work.   At a high level, think about a reader who
>>  is not part of your group, whom you are trying to convince that what you
>>  are doing is valuable or whom you would like to convince to do such a data
>>  collection exercise. I have interspersed my comments into your responses
>>  below.
>
>I will respond to those things that are still "open" with suggestions
>on how to "close" them.
>
>KS>  The second high level concern I have is that there is talk about
>KS>  specific kinds of information to be collected and an interest in not
>KS>  only the nature but longer term inferences with perhaps implications
>KS>  for future redesign efforts in the SNMP context.    I have two levels
>KS>  of concern here.  The most important one is that since network
>KS>  management and in particular SNMP is NOT the primary objective of the
>KS>  net (the primary objective being the transport of real payload), it
>KS>  seems to me that the truly critical question with respect to network
>KS>  management traffic is the impact that it has or does not have on that
>KS>  real job.  To me this implies that the measurements MUST also include
>KS>  contextual information.  As an example, it is probably more important
>KS>  to understand whether or not the network management traffic is
>KS>  causing significant congestion for the payload traffic than the
>KS>  particular mix or frequency pattern within the network management
>KS>  traffic.  Without the complementary contextual information, the
>KS>  whole measurement exercise seems to me to be of somewhat narrow
>KS>  value.
>
>JS> The measurement may be of narrow value from your point of view but
>JS> please keep in mind that this document is coming from the Network
>JS> Management Research Group and not from a general Network Measurement
>JS> Research Group. Our goal is to understand how network management
>JS> protocols are being used because that has impact on their design and
>JS> implementation strategies. Further note that in many networks, the
>JS> management traffic is logically and sometimes even physically
>JS> separated from the normal traffic and perhaps this is the reason why
>JS> we did not even think about the question whether management traffic
>JS> has an impact on normal traffic.
>
>KS> If you want to leave it as is, then I think it would be valuable to say as
>KS> much.  Be specific about what you are not doing, because much of the rest
>KS> of the world looks at network traffic from a broader perspective.
>
>I think the abstract and the introduction are pretty clearly spelling
>out the scope of the measurement effort. So I am not sure changes are
>needed, but see below.
>
>KS>  1. Section 1: It seems to me that there are TWO key questions with
>KS>  respect to SNMP.  The first is how it is being used, which in turn
>KS>  leads to the points made in this section, but the second is the
>KS>  impact of that traffic.  I think that ought to appear in the
>KS>  Introduction as well.
>
>JS> See my comments above. So far, the NMRG did not consider the impact of
>JS> SNMP on other traffic a target of this activity. I don't want to add
>JS> such text unless I see support from the NMRG and concrete proposals
>JS> for what should be added.
>
>KS> Again, as above, if you want to leave the scope as it is, then you should
>KS> probably be up front about that, clearly leaving that work to another time
>KS> and place.
>
>I propose to add the following paragraph just before the last
>paragraph in section 1:
>
>    The measurement approach described in this document is by design
>    limited to the study of SNMP traffic.  Studies of other management
>    protocols or the impact of management protocols such as SNMP on
>    other traffic sharing the same network resources are left to future
>    efforts.

This is what I was looking for.

>KS>  2. Section 2.1.  The second paragraph begins with, "It is recommended
>KS>  to capture at least a full week of data."  This is never justified or
>KS>  explained.  Is one week really enough?  For what?  Why wouldn't
>KS>  several weeks be critical, because one week might be anomalous?  Why
>KS>  isn't a year critical, since we know that there are annual or
>KS>  seasonal differences in traffic behaviors?  Typically, I find that
>KS>  one-week data sets often leave me with lots of unanswered questions,
>KS>  so justify this.
>
>JS> The text actually says:
>JS>
>JS>    It is recommended to capture at least a full week of data.  Operators
>JS>    are encouraged to capture traces over even longer periods of time.
>JS>
>JS> The text tries to establish a lower bound of one week and encourages
>JS> longer capture periods. I would love to get continuous traces but
>JS> reality is such that this is not feasible. Our idea is simply to catch
>JS> at least the weekly behaviour. Yes, there is of course also monthly or
>JS> yearly behaviour but I believe it is not useful to set the bar so high
>JS> that nobody gives us appropriate traces. I personally believe the text
>JS> is fine as is.
>
>KS> So, what I was really getting at was the question of why one week
>KS> was the minimum necessary.  So, something like that it is the
>KS> minimum over which one can see the diurnal patterns in the weekly
>KS> pattern and it is understood that both for computational and
>KS> storage reasons the operators may not want to collect more.
>
>I have changed the text to the following:
>
>    It is recommended to capture at least a full week of data to capture
>    diurnal patterns and one cycle of weekly behavior.  Operators are
>    strongly encouraged to capture traces over even longer periods of
>    time.

OK by me.

>KS>  4. Section 3.3, end of first paragraph:  The sentence reads, "Some
>KS>  SNMP implementations approximate networking delays by measuring
>KS>  request-response times and it would be useful to understand to what
>KS>  extent this is a viable approach."  I agree, but traces will not tell
>KS>  you anything about whether behaviors observed in packet traces are
>KS>  for this reason or some other reason.  I do not believe you can get
>KS>  at this question with the data you are collecting.
>
>JS> I think it is possible to analyze retransmission behaviour. Depending
>JS> on the SNMP version used (and the other versions also depending on
>JS> implementation choices), you can get information whether a response is
>JS> just coming late for the original request or it is actually a response
>JS> to a retransmitted request. We are not talking TCP here; we are talking
>JS> about application layer retransmissions and SNMP has its own msgID and
>JS> requestID fields.
>
>KS> The point I was trying to make here is that it is very difficult to intuit
>KS> the reasons behind behaviors seen in the traffic, unless someone or
>KS> something tells you.  So, you can see what the ends do, but not why.
>
>I agree, but this is a rather general observation and not necessarily
>specific to 3.3 and it is not clear to me what I should do about it.
>There is already a general remark in the last paragraph of section 2.5
>that one has to be careful with drawing conclusions that go beyond
>what you can really get out of traces.

OK - I'll drop it.

>
>KS>  5. Section 3.4: Please explain why it is "interesting" (your word) to
>KS>  identify whether concurrency or sequentiality is occurring?  What
>KS>  will you "learn" if both are observed?  And, if one is occurring more
>KS>  frequently or under specific identifiable conditions, what further
>KS>  does that tell you?  Just knowing that one or the other occurs is
>KS>  only the tip of the iceberg, and without acknowledging the fact that
>KS>  these are important and unanswered questions, just learning first-order
>KS>  details suggests you are setting the bar too low.
>
>JS> The introduction of section 3 says:
>JS>
>JS>    The questions raised in the following subsections are meant to be
>JS>    illustrative and no attempt has been made to provide a complete
>JS>    list.
>JS>
>JS> I believe it is a good idea to first figure out whether there is an
>JS> iceberg or not (keeping your analogy) and if there is one to ask
>JS> questions how big the iceberg might be. For SNMP agent implementations
>JS> that tend to do quite some caching, it is useful to know how well
>JS> caching strategies are working in real-world networks. The concurrency
>JS> level an agent experiences has clear impact on that. Furthermore, it
>JS> will be useful to know how bursty the traffic tends to be or how well
>JS> managers spread the traffic over polling intervals and this is again
>JS> related to the concurrency we can extract from traces.
>JS>
>JS> I am not really sure what I should change, perhaps the word
>JS> "interesting" is the source of the trouble and I should replace this
>JS> with "valuable"?
>
>KS> What I was trying to ask was that you tell the reader a bit about
>KS> what makes it interesting.  It certainly doesn't have to be
>KS> complete, but just a hint.  I don't think you should change the
>KS> word "interesting", because if it isn't interesting, you probably
>KS> shouldn't be doing it.
>
>I added the following sentence to section 3.4:
>
>    The concurrency level and the amount of redundant requests have
>    implications on caching strategies employed by SNMP agents.

Fine.

>KS>  6. Section 3.5:  Please explain what you would do  with the
>KS>  information about which approach to table retrieval is used.  Again,
>KS>  what if the results tell you that both are used?  And, if not, of
>KS>  what use is it to know which approach is prevalent?  Might it not be
>KS>  useful to know the conditions under which one or the other is used
>KS>  more commonly?
>
>JS> This again has direct impact on agent implementation techniques and
>JS> caching strategies.
>
>KS> And again, just explain a little.
>
>I changed the last sentence of section 3.5 to read as follows:
>
>    [...] It will be useful to know which of
>    these approaches are used on production networks since this has a
>    direct implication on agent implementation techniques and caching
>    strategies.

Fine.

>/js
>
>--
>Juergen Schoenwaelder           Jacobs University Bremen gGmbH
>Phone: +49 421 200 3587         Campus Ring 1, 28759 Bremen, Germany
>Fax:   +49 421 200 3103         <http://www.jacobs-university.de/>


-- 

Karen R. Sollins, Ph. D.
Principal Research Scientist
M.I.T. CSAIL
The Stata Center
32 Vassar St., 32-G818
Cambridge, MA 02139
V: 617/253-6006
F: 617/253-2673
E: sollins@csail.mit.edu