[nmrg] Re: [IRSG] review of draft-irtf-nmrg-snmp-measure-04.txt

Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de> Mon, 19 May 2008 09:46 UTC

Date: Mon, 19 May 2008 11:46:39 +0200
From: Juergen Schoenwaelder <j.schoenwaelder@jacobs-university.de>
To: "Karen R. Sollins" <sollins@csail.mit.edu>
Message-ID: <20080519094639.GA27481@elstar.local>
Mail-Followup-To: "Karen R. Sollins" <sollins@csail.mit.edu>, nmrg@ibr.cs.tu-bs.de, Bert Wijnen - IETF <bertietf@bwijnen.net>, Internet Research Steering Group <irsg@ISI.EDU>
References: <p06240840c44e32552b6b@[18.26.0.27]> <20080516122042.GA19275@elstar.local> <p06240404c456a78f0f60@[192.168.1.105]>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <p06240404c456a78f0f60@[192.168.1.105]>
User-Agent: Mutt/1.5.17 (2007-11-01)
Cc: Internet Research Steering Group <irsg@ISI.EDU>, nmrg@ibr.cs.tu-bs.de
Subject: [nmrg] Re: [IRSG] review of draft-irtf-nmrg-snmp-measure-04.txt

On Sun, May 18, 2008 at 09:05:15PM -0700, Karen R. Sollins wrote:

> Thanks for your thoughtful responses.  I also did not think that what I was 
> suggesting was a lot of work.   At a high level, think about a reader who 
> is not part of your group, whom you are trying to convince that what you 
> are doing is valuable or whom you would like to convince to do such a data 
> collection exercise. I have interspersed my comments into your responses 
> below.

I will respond to those things that are still "open" with suggestions
on how to "close" them.

KS>  The second high level concern I have is that there is talk about
KS>  specific kinds of information to be collected and an interest in not
KS>  only the nature but longer term inferences with perhaps implications
KS>  for future redesign efforts in the SNMP context.    I have two levels
KS>  of concern here.  The most important one is that since network
KS>  management and in particular SNMP is NOT the primary objective of the
KS>  net (the primary objective being the transport of real payload), it
KS>  seems to me that the truly critical question with respect to network
KS>  management traffic is the impact that it has or does not have on that
KS>  real job.  To me this implies that the measurements MUST also include
KS>  contextual information.  As an example, it is probably more important
KS>  to understand whether  or not the network management traffic is
KS>  causing significant congestion for the payload traffic than the
KS>  particular mix or frequency pattern within the network management
KS>  traffic.  Without the complementary contextual information, the
KS>  whole measurement exercise seems to me to be of somewhat narrow
KS>  value. 

JS> The measurement may be of narrow value from your point of view but
JS> please keep in mind that this document is coming from the Network
JS> Management Research Group and not from a general Network Measurement
JS> Research Group. Our goal is to understand how network management
JS> protocols are being used because that has impact on their design and
JS> implementation strategies. Further note that in many networks, the
JS> management traffic is logically and sometimes even physically
JS> separated from the normal traffic and perhaps this is the reason why
JS> we did not even think about the question whether management traffic
JS> has an impact on normal traffic.

KS> If you want to leave it as is, then I think it would be valuable to say as 
KS> much.  Be specific about what you are not doing, because much of the rest 
KS> of the world looks at network traffic from a broader perspective.

I think the abstract and the introduction are pretty clearly spelling
out the scope of the measurement effort. So I am not sure changes are
needed, but see below.

KS>  1. Section 1: It seems to me that there are TWO key questions with
KS>  respect to SNMP.  The first is how it is being used, which in turn
KS>  leads to the points made in this section, but the second is the
KS>  impact of that traffic.  I think that ought to appear in the
KS>  Introduction as well.

JS> See my comments above. So far, the NMRG did not consider the impact of
JS> SNMP on other traffic a target of this activity. I don't want to add
JS> such text unless I see support from the NMRG and concrete proposals
JS> what should be added.

KS> Again, as above, if you want to leave the scope as it is, then you should 
KS> probably be up front about that, clearly leaving that work to another time 
KS> and place.

I propose to add the following paragraph just before the last
paragraph in section 1:

   The measurement approach described in this document is by design
   limited to the study of SNMP traffic.  Studies of other management
   protocols or the impact of management protocols such as SNMP on
   other traffic sharing the same network resources is left to future
   efforts.

KS>  2. Section 2.1.  The second paragraph begins with, "It is recommended
KS>  to capture at least a full week of data."  This is never justified or
KS>  explained.  Is one week really enough?  For what?  Why wouldn't
KS>  several weeks be critical, because one week might be anomalous?  Why
KS>  isn't a year critical, since we know that there are annual or
KS>  seasonal differences in traffic behaviors?  Typically, I find that
KS>  one-week data sets often leave me with lots of unanswered questions,
KS>  so justify this.

JS> The text actually says:
JS>
JS>    It is recommended to capture at least a full week of data.  Operators
JS>    are encouraged to capture traces over even longer periods of time.
JS>
JS> The text tries to establish a lower bound of one week and encourages
JS> longer capture periods. I would love to get continuous traces but
JS> reality is such that this is not feasible. Our idea is simply to catch
JS> at least the weekly behaviour. Yes, there is of course also monthly or
JS> yearly behaviour but I believe it is not useful to set the bar so high
JS> that nobody gives us appropriate traces. I personally believe the text
JS> is fine as is.

KS> So, what I was really getting at was the question of why one week
KS> was the minimum necessary.  So, something like that it is the
KS> minimum over which one can see the diurnal patterns in the weekly
KS> pattern and it is understood that both for computational and
KS> storage reasons the operators may not want to collect more.

I have changed the text to the following:

   It is recommended to capture at least a full week of data to capture
   diurnal patterns and one cycle of weekly behavior.  Operators are
   strongly encouraged to capture traces over even longer periods of
   time.  

KS>  4. Section 3.3, end of first paragraph:  The sentence reads, "Some
KS>  SNMP implementations approximate networking delays by measuring
KS>  request-response times and it would be useful to understand to what
KS>  extent this is a viable approach."  I agree, but traces will not tell
KS>  you anything about whether behaviors observed in packet traces are
KS>  for this reason or some other reason.  I do not believe you can get
KS>  at this question with the data you are collecting.

JS> I think it is possible to analyze retransmission behaviour. Depending
JS> on the SNMP version used (and the other versions also depending on
JS> implementation choices), you can get information whether a response is
JS> just coming late for the original request or it is actually a response
JS> to a retransmitted request. We are not talking TCP here; we are talking
JS> about application layer retransmissions and SNMP has its own msgID and
JS> requestID fields.
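
As a rough illustration of the kind of analysis meant here, the
following Python sketch matches responses to requests by request
identifier and reports how many retransmissions preceded each
response. The record layout (time, manager, agent, request_id, kind)
and the function name are hypothetical, not taken from the draft, and
the sketch assumes that a retransmitted request reuses the same
identifier, which depends on the SNMP version and implementation.

```python
from collections import defaultdict


def classify_responses(messages):
    """Match responses to requests in a time-ordered message list.

    Each message is a hypothetical tuple
    (time, manager, agent, request_id, kind) with kind being
    "request" or "response". Assumes retransmissions reuse the
    same request_id.
    """
    # (manager, agent, request_id) -> send times of all request copies
    sent = defaultdict(list)
    results = []
    for time, manager, agent, request_id, kind in messages:
        key = (manager, agent, request_id)
        if kind == "request":
            sent[key].append(time)
            continue
        # Responses without a matching request in the trace are skipped.
        times = sent.pop(key, None)
        if times is None:
            continue
        results.append({
            "key": key,
            "retransmissions": len(times) - 1,
            "delay_from_first": time - times[0],
            "delay_from_last": time - times[-1],
        })
    return results
```

A response with a non-zero retransmission count and a small
delay_from_last is a candidate for "response to the retransmitted
request"; the trace alone still cannot say which copy the agent
actually answered.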

KS> The point I was trying to make here is that it is very difficult to intuit 
KS> the reasons behind behaviors seen in the traffic, unless someone or 
KS> something tells you.  So, you can see what the ends do, but not why.

I agree, but this is a rather general observation and not necessarily
specific to 3.3 and it is not clear to me what I should do about it.
There is already a general remark in the last paragraph of section 2.5
that one has to be careful with drawing conclusions that go beyond
what you can really get out of traces.

KS>  5. Section 3.4: Please explain why it is "interesting" (your word) to
KS>  identify whether concurrency or sequentiality is occurring?  What
KS>  will you "learn" if both are observed?  And, if one is occurring more
KS>  frequently or under specific identifiable conditions, what further
KS>  does that tell you?  Just knowing that one or the other occurs is
KS>  only the tip of the iceberg, and without acknowledging the fact that
KS>  these are important and unanswered questions, just learning first
KS>  ordered details suggests you are setting the bar too low.

JS> The introduction of section 3 says:
JS>
JS>    The questions raised in the following subsections are meant to be
JS>    illustrative and no attempt has been made to provide a complete
JS>    list.
JS>
JS> I believe it is a good idea to first figure out whether there is an
JS> iceberg or not (keeping your analogy) and if there is one to ask
JS> questions how big the iceberg might be. For SNMP agent implementations
JS> that tend to do quite some caching, it is useful to know how well
JS> caching strategies are working in real-world networks. The concurrency
JS> level an agent experiences has clear impact on that. Furthermore, it
JS> will be useful to know how bursty the traffic tends to be or how well
JS> managers spread the traffic over polling intervals and this is again
JS> related to the concurrency we can extract from traces.
JS>
JS> I am not really sure what I should change, perhaps the word
JS> "interesting" is the source of the trouble and I should replace this
JS> with "valuable"?

KS> What I was trying to ask was that you tell the reader a bit about
KS> what makes it interesting.  It certainly doesn't have to be
KS> complete, but just a hint.  I don't think you should change the
KS> word "interesting", because if it isn't interesting, you probably
KS> shouldn't be doing it.

I added the following sentence to section 3.4:

   The concurrency level and the amount of redundant requests has
   implications on caching strategies employed by SNMP agents.
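
One possible way to extract a concurrency level from a trace is
sketched below in Python. It assumes each request/response exchange
seen by one agent has already been reduced to a (start, end) time
pair; that input format and the function name are illustrative
assumptions, not something the draft defines.

```python
def peak_concurrency(intervals):
    """Return the largest number of exchanges outstanding at once.

    intervals: iterable of (start, end) times, one pair per
    request/response exchange observed for a single agent
    (hypothetical input format).
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # exchange becomes outstanding
        events.append((end, -1))    # exchange completes
    # At equal timestamps, process completions (-1) before starts (+1)
    # so back-to-back exchanges are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

A peak of 1 would indicate strictly sequential polling of that agent,
while larger values indicate concurrent requests that a caching
strategy would have to cope with.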

KS>  6. Section 3.5:  Please explain what you would do  with the
KS>  information about which approach to table retrieval is used.  Again,
KS>  what if the results tell you that both are used?  And, if not, of
KS>  what use is it to know which approach is prevalent?  Mightn't it be
KS>  useful to know the conditions under which one or the other is used
KS>  more commonly?

JS> This again has direct impact on agent implementation techniques and
JS> caching strategies.

KS> And again, just explain a little.

I changed the last sentence of section 3.5 to read as follows:

   [...] It will be useful to know which of
   these approaches are used on production networks since this has a
   direct implication on agent implementation techniques and caching
   strategies.

/js

-- 
Juergen Schoenwaelder           Jacobs University Bremen gGmbH
Phone: +49 421 200 3587         Campus Ring 1, 28759 Bremen, Germany
Fax:   +49 421 200 3103         <http://www.jacobs-university.de/>