[ippm] AD Evaluation of draft-ietf-ippm-alt-mark-07

Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com> Sun, 03 September 2017 03:44 UTC

From: Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>
Date: Sat, 02 Sep 2017 22:44:18 -0500
Message-ID: <CAKKJt-fNBLbG5z+5tWW4168C-ZP4G30akBppcR2dh9HdhSbLsg@mail.gmail.com>
To: draft-ietf-ippm-alt-mark.shepherd@ietf.org
Cc: "ippm-chairs@ietf.org" <ippm-chairs@ietf.org>, draft-ietf-ippm-alt-mark@ietf.org, ippm@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/ippm/bQSp4jo-0qkedCCy6NowzGOj9dw>

Hi, Carlos,

I've divided my evaluation feedback into two sections.

   - I really like this draft,
   - and there are a lot of good ideas in it, and a lot of good text in it,
   - and I have about 6 pages of comments from AD Evaluation.

Please don't be disturbed by the "6 pages of comments" part. I seem to send
lots of comments about really good IPPM work ... and then it gets
published.

Bill and Brian are accustomed to this, after four years with me.

Please let me know what the plan for revision turns out to be. I'm
expecting to see this draft come back soon, because I'm expecting lots of
text to move, but not very much text to change, so I'll leave it in "AD
Evaluation" state, and change the substate to "Revised I-D Needed". So, NOT
returning this to the working group in the datatracker.

Thanks, and it's always a pleasure.

Spencer

--- For the shepherd and chairs (everyone else is welcome to read this
part, but you don't have any actions to take unless the shepherd and chairs
ask you to, so you could also safely skip down to "--- For the authors and
editors", about a page down).

I get no joy from asking this, but I note that this draft has eight names
of authors and editors, so I’m looking for confirmation from the chairs
that all of the names significantly contributed to the document.

(I’d ask the editor who has contributed significantly, but there are six
editors listed. I’ve never seen that before, and can imagine that if the
pen has rotated around, six people could have held it at some point since
it was an individual -00 draft, but is the working group expecting six
people to hold the pen when ADs start sending ballots with DISCUSSes and
Comments?)

The reason I’m asking now is that when this draft goes into IESG
Evaluation, this text from the RFC Editor’s Style Guide
(https://tools.ietf.org/html/rfc7322#section-4.1.1) comes into play:

   The total number of authors or editors on the first page is generally
   limited to five individuals and their affiliations.  If there is a
   request for more than five authors, the stream-approving body needs
   to consider if one or two editors should have primary responsibility
   for this document, with the other individuals listed in the
   Contributors or Acknowledgements section.

and the working group chairs and participants are much better able to
assess who should be listed than the IESG during a telechat.

And, with six editors listed, the IESG wouldn't know who to pick as editor
without asking the people I'm asking now. So, let's handle this in the
working group, ok? :-)

If the shepherd could include that confirmation in
https://datatracker.ietf.org/doc/draft-ietf-ippm-alt-mark/shepherdwriteup/,
probably under Working Group Summary, that would be outstanding, because
that’s where balloting ADs would look before asking questions about this.

I looked at
https://tools.ietf.org/rfcdiff?url1=draft-tempia-ippm-p3m-03.txt&url2=draft-ietf-ippm-alt-mark-07.txt,
and it looks like a lot of that text was carried over without many changes.
That's fine, but the structure of the document didn't change, and some
things should have changed. I have a lot of comments below about the
structure of the document, but most of the comments about structure would
apply verbatim to https://datatracker.ietf.org/doc/draft-tempia-ippm-p3m/,
which had all-Telecom Italia authors and editors, so that structure
probably made sense in that context - just not for a document intended for
implementers of this document as an RFC.

How this happens is up to the chairs and shepherd, but what I'd like is
for the next revision to look as if one person read through the document
and my comments about structure, which I'll tag as "*** Includes suggestion
for change to document structure", and do the right thing.

Because I'm suggesting structural changes that will make diffs unusable if
implemented, it would be helpful for me to see one revision with only
structural changes, and then a second revision with any other changes. I'll
be able to request Last Call sooner if I don't have to read every line of
the next revision again - and I'm asking questions about very few changes
to actual text, so I'd like to be able to spot them quickly after text
MOVES.

--- For the authors and editors

In the Abstract,

   A report on the operational
   experiment done at Telecom Italia is explained in order to give an
   example and show the method applicability.

might read more clearly as “a report is provided in order to explain”.

Recognizing that the authors know very well what’s on their own network,
I’m wondering whether

   Nowadays, most of the traffic in Service Providers' networks carries
   contents that are highly sensitive to packet loss [RFC7680], delay
   [RFC7679], and jitter [RFC3393].

is true for all Service Providers. I wouldn’t have been at all surprised to
see

   Nowadays, most Service Providers' networks carry traffic with contents
   that are highly sensitive to packet loss [RFC7680], delay
   [RFC7679], and jitter [RFC3393].

but the claim in the document is broader - “most traffic is highly
sensitive”.

In this text,

   The method proposed in this document follows the second approach, but
   it doesn't use additional packets to virtually split the flow in
   blocks.  Instead, it "colors" the packets

is there a reference you could provide for this use of the term “colors”? I
mostly ask because the document already provides such good coverage on
references, so this omission stood out. I see

   The previous IETF drafts about this technique were:
   [I-D.cociglio-mboned-multicast-pm] and [I-D.tempia-opsawg-p3m].

in the Acknowledgements section, so I wonder whether mentioning them here
(or soon after) would help readers, but you folks would know what is
helpful.
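For readers new to the term, a minimal sketch of what "coloring" buys may help. It assumes two alternating colors, "A" and "B", that flip every fixed number of packets (the draft describes blocks defined by time; a packet count is used here only to keep the sketch deterministic), and the function names are hypothetical. A node that sees only the color bit can still recover per-block counts, because each run of one color is one block; comparing upstream and downstream counts block by block then gives the loss per block.

```python
from itertools import groupby
from typing import List

def mark(num_packets: int, block_len: int) -> List[str]:
    """Hypothetical marker: color flips every block_len packets (A, B, A, ...)."""
    return ["AB"[(i // block_len) % 2] for i in range(num_packets)]

def block_counts(colors: List[str]) -> List[int]:
    """A node sees only the color bit; each run of one color is one block,
    so the run lengths are the per-block packet counts."""
    return [len(list(run)) for _color, run in groupby(colors)]

def block_loss(up: List[int], down: List[int]) -> List[int]:
    """Per-block loss between an upstream and a downstream node."""
    return [u - d for u, d in zip(up, down)]

sent = mark(20, 5)               # four blocks: A, B, A, B
received = sent[:7] + sent[8:]   # hypothetical: one packet of block 1 is lost

print(block_loss(block_counts(sent), block_counts(received)))  # → [0, 1, 0, 0]
```

This run-based counting assumes that no block is lost entirely and that no packet is reordered across a color change; either would merge or split runs and shift the per-block counts.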

In this text,

   In order to coherently compare timestamps collected on different
   routers, the network nodes must be in sync.

is it obvious to everyone but me what, exactly, must be in sync on the
network nodes? I can think of a number of things, but I’m guessing. If this
is about clocks, I’d expect to see something like “the clocks on the network
nodes must be in sync”.
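If the requirement is about clocks, a tiny example shows why it matters: a one-way delay measured this way is just the difference between two timestamps read from two different clocks, so any offset between those clocks is added, unnoticed, to every result. All numbers below are hypothetical.

```python
def one_way_delay_ms(tx_ts_ms: float, rx_ts_ms: float) -> float:
    """One-way delay from a send timestamp taken on the upstream router
    and a receive timestamp taken on the downstream router."""
    return rx_ts_ms - tx_ts_ms

TRUE_DELAY_MS = 12.0   # hypothetical path delay
CLOCK_OFFSET_MS = 5.0  # hypothetical: downstream clock runs 5 ms ahead

tx = 1000.0                                        # upstream clock reading
rx_synced = tx + TRUE_DELAY_MS                     # clocks in sync
rx_skewed = tx + TRUE_DELAY_MS + CLOCK_OFFSET_MS   # clocks not in sync

print(one_way_delay_ms(tx, rx_synced))  # → 12.0
print(one_way_delay_ms(tx, rx_skewed))  # → 17.0, the offset lands in the result
```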

In this text

   Due to ECMP, packet re-ordering is very common in IP network.  The
   accuracy of marking based PM, especially packet loss measurement, may
   be affected by packet re-ordering.  Take a look at the following
   example:

   Block   :    1    |    2    |    3    |    4    |    5    |...
   --------|---------|---------|---------|---------|---------|---
   Node R1 : AAAAAAA | BBBBBBB | AAAAAAA | BBBBBBB | AAAAAAA |...
   Node R2 : AAAAABB | AABBBBA | AAABAAA | BBBBBBA | ABAAABA |...

                        Figure 5: Packet Reordering

it would be clearer if “Take a look at the following example” provided
enough detail to tell the reader what the reader is looking for - like,
what does Node R1 add to this picture? I think from the text after Figure
5, your point is that when you see reordered packets, you can’t assign them
unambiguously to an interval block, but I don’t know why the packets in the
Node R2 row aren’t enough to make your point, which makes me wonder what
you think is the difference between the Node R1 and Node R2 rows, that you
needed both rows.

If I were guessing, something like ‘in Figure 5, the packet stream for Node
R1 isn’t being reordered, and can be safely assigned to interval blocks,
but the packet stream for Node R2 is being reordered, so there is no safe
way to tell whether the packet with the marker of "B" in block 3 belongs to
block 2 or block 4’ would make your point more clearly, but please explain
what the two rows mean a little more clearly, however you do it.
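One way to make the two rows concrete, under the reading suggested above (R1 is the upstream node sending a single color per block, and R2 is the downstream node observing reordered arrivals): for each block, count the packets at R2 whose color does not match the color R1 used in that block. None of these packets needs to be lost; they simply show up in the "wrong" interval, which is what makes per-block loss counts ambiguous. The helper name is hypothetical.

```python
from typing import List

# Rows of Figure 5, one string per block, read as the order in which
# packet markers are observed at each node during that block.
R1 = ["AAAAAAA", "BBBBBBB", "AAAAAAA", "BBBBBBB", "AAAAAAA"]
R2 = ["AAAAABB", "AABBBBA", "AAABAAA", "BBBBBBA", "ABAAABA"]

def out_of_block(upstream: List[str], downstream: List[str]) -> List[int]:
    """For each block, count downstream packets whose marker differs from
    the single marker the upstream node used in that block."""
    result = []
    for up, down in zip(upstream, downstream):
        expected = up[0]  # R1 sends one color per block
        result.append(sum(1 for m in down if m != expected))
    return result

print(out_of_block(R1, R2))  # → [2, 3, 1, 1, 2]
```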

*** Includes suggestion for change to document structure - I’m looking at

   Before going deeper into the implementation details, it's worth
   mentioning two different strategies that can be used when
   implementing the method:

   o  flow-based: the flow-based strategy is used when only a limited
      number of traffic flows need to be monitored.  This could be the
      case, for example, of IPTV channels or other specific applications
      traffic with high QoS requirements (i.e.  Mobile Backhauling
      traffic).  According to this strategy, only a subset of the flows
      is colored.  Counters for packet loss measurements can be
      instantiated for each single flow, or for the set as a whole,
      depending on the desired granularity.  A relevant problem with
      this approach is the necessity to know in advance the path
      followed by flows that are subject to measurement.  Path rerouting
      and traffic load-balancing increase the issue complexity,
      especially for unicast traffic.  The problem is easier to solve
      for multicast traffic where load balancing is seldom used,
      especially for IPTV traffic where static joins are frequently used
      to force traffic forwarding and replication.  Another application
      is on Mobile Backhauling, implemented with a VPN MPLS in Telecom
      Italia's network; where the monitoring is between the Provider
      Edge nodes of the VPN MPLS.

   o  link-based: measurements are performed on all the traffic on a
      link by link basis.  The link could be a physical link or a
      logical link (for instance an Ethernet VLAN or a MPLS PW).
      Counters could be instantiated for the traffic as a whole or for
      each traffic class (in case it is desired to monitor each class
      separately), but in the second case a couple of counters is needed
      for each class.

   The current implementation in Telecom Italia uses the first strategy.
   As mentioned, the flow-based measurement requires the identification
   of the flow to be monitored and the discovery of the path followed by
   the selected flow.  It is possible to monitor a single flow or
   multiple flows grouped together, but in this case measurement is
   consistent only if all the flows in the group follow the same path.
   Moreover, a Service Provider should be aware that, if a measurement
   is performed by grouping many flows, it is not possible to determine
   exactly which flow was affected by packets loss.  In order to have
   measures per single flow it is necessary to configure counters for
   each specific flow.  Once the flow(s) to be monitored have been
   identified, it is necessary to configure the monitoring on the proper
   nodes.  Configuring the monitoring means configuring the policy to
   intercept the traffic and configuring the counters to count the
   packets.  To have just an end-to-end monitoring, it is sufficient to
   enable the monitoring on the first and the last hop routers of the
   path: the mechanism is completely transparent to intermediate nodes
   and independent from the path followed by traffic flows.  On the
   contrary, to monitor the flow on a hop-by-hop basis along its whole
   path it is necessary to enable the monitoring on every node from the
   source to the destination.  In case the exact path followed by the
   flow is not known a priori (i.e. the flow has multiple paths to reach
   the destination) it is necessary to enable the monitoring system on
   every path: counters on interfaces traversed by the flow will report
   packet count, counters on other interfaces will be null.

and I'm wondering if this is true of Alternate Marking in general, or only
for Telecom Italia. If it’s true in general, this seems really useful for
everyone to know, but it’s in the Telecom Italia implementation report part
of the draft, so it’s easy for people who only want to understand how
Alternate Marking works to stop reading before they read down to it.

Perhaps moving the entire block of text, except for "The current
implementation" sentence, earlier in the document, and leaving only

   The current implementation in Telecom Italia uses the flow-based strategy
   defined in Section (wherever you put it).

in the Telecom Italia section would be helpful.
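If the flow-based text does apply generally, a minimal sketch of its counter arithmetic might help readers. The counter values, flow names, and helper names are hypothetical. Loss for a block is the sum of the counters at the upstream monitoring point minus the sum at the downstream one. Summing over every monitored interface works even when the path is not known in advance, because interfaces the flow did not traverse read zero; grouping flows under one set of counters yields only the total, not which flow lost packets.

```python
from typing import Dict, List

def segment_loss(upstream: List[int], downstream: List[int]) -> int:
    """Packets lost in one block between two monitoring points.

    Each list holds the block counters read on every interface where
    monitoring is enabled at that point. Interfaces the flow did not
    traverse report 0, so summing them is harmless."""
    return sum(upstream) - sum(downstream)

def per_flow_loss(up: Dict[str, List[int]],
                  down: Dict[str, List[int]]) -> Dict[str, int]:
    """Per-flow loss; this needs a separate set of counters per flow."""
    return {flow: segment_loss(up[flow], down[flow]) for flow in up}

# Hypothetical counters for one block: the first-hop router (one interface)
# and the last-hop router (two candidate interfaces, one used per flow).
first_hop = {"f1": [1000], "f2": [500]}
last_hop = {"f1": [997, 0], "f2": [0, 498]}

print(per_flow_loss(first_hop, last_hop))  # → {'f1': 3, 'f2': 2}

# Counting both flows with one set of counters only yields the total.
grouped_up = [sum(c[0] for c in first_hop.values())]
grouped_down = [sum(col) for col in zip(*last_hop.values())]
print(segment_loss(grouped_up, grouped_down))  # → 5
```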

*** Includes suggestion for change to document structure - I think pretty
much all of 5.1.1, 5.1.2, and 5.1.3 would also apply generally, but you
folks would know better. 5.1.4 does seem Telecom Italia-specific.

*** Includes suggestion for change to document structure - Actually,
Sections 6 and 7 are still saying things like

   This document doesn't aim to propose a new Performance Metric but a
   new method of measurement for a few Performance Metrics that have
   already been standardized.

that doesn't seem specific to Telecom Italia. Which makes me wonder how
much actual reporting ON the Telecom Italia implementation itself is in
this document, and how much of it is (perhaps something like)
"generally applicable lessons learned from the Telecom Italia
implementation". Maybe just changing the high-level descriptions would be
better than trying to separate this part of the report into "what's true
for all Alternate Marking" and "what's specific to Telecom Italia".

Please take all this as a compliment. The text flows really well in
sections 5-7. My concern is only that people who need to read it will have
stopped before they read it.

In Section 8,

   the measurements described in this document
   are passive, so there are no packets injected into the network
   causing potential harm to the network itself and to data traffic.

might be clearer if it said "no new packets injected".

(I know you know this stuff, but you're also writing for reviewers, ADs,
and actual implementers of your RFC who may not know that)

*** Includes suggestion for change to document structure - You talk about
"harm caused by the measurement" and "harm to the measurement" (and that's
good), but I think the paragraph that starts

   The measurement itself may be affected by routers

falls into the second group, and then the paragraph that starts

   One of the main security threats in OAM protocols is network
   reconnaissance;

goes back to the first group. If you switched the order, this section would
be divided more clearly into the two groups (and network security folk who
are only worried about "harm caused by the measurement" can safely stop
reading when they get to the parts about "harm to the measurement"). Maybe
dividing the Security Considerations section into two named sub-sections
would be helpful to readers.

I'm not happy about this, but in 2017, we're worried about people who have
really good capabilities for pervasive monitoring (see BCP 188 at
https://tools.ietf.org/html/rfc7258 to remind yourself why). So I'm
thinking that in this text,

   One of the main security threats in OAM protocols is network
   reconnaissance; an attacker can gather information about the network
   performance by passively eavesdropping to OAM messages.  The
   advantage of the methods described in this document is that the
   marking bits are the only information that is exchanged between the
   network devices.  Therefore, passive eavesdropping to data plane
   traffic does not allow attackers to gain information about the
   network performance.

your point is that attackers can't gain information about network
performance from a single monitoring point, and must use synchronized
monitoring points at multiple points on the path, because they have to do
the same kind of measurement and aggregation that Service Providers using
Alternate Marking must do.

If that's true, it's probably worth saying, before a secdir reviewer asks
about it. And I should mention that I just looked at
https://www.ietf.org/mailman/roster/secdir, and you COULD get the author of
BCP 188 as your secdir reviewer ;-)

*** Includes suggestion for change to document structure - It would
probably be helpful if your "Conclusions" section was significantly earlier
in the document - it would strengthen your "Overview of the Method" section
a lot, if it was there - and it's more "Summary" than "Conclusion". At a
minimum, it shouldn't be behind the Security Considerations, because it's
really helpful text, and the current location hides it.
