Re: [trill] TRILL OAM Requirements -

Thomas Narten <narten@us.ibm.com> Thu, 26 April 2012 20:36 UTC

Message-Id: <201204262036.q3QKaRIm020300@cichlid.raleigh.ibm.com>
To: Sam Aldrin <aldrin.ietf@gmail.com>
In-reply-to: <4ECDDE13-BA0F-41E6-BE64-C00CE24D4FBC@gmail.com>
References: <344037D7CFEFE84E97E9CC1F56C5F4A50100E64C@xmb-sjc-214.amer.cisco.com> <OFDB5D95ED.C6594B78-ON872579EB.00764D44-882579EB.007C3F2D@us.ibm.com> <4ECDDE13-BA0F-41E6-BE64-C00CE24D4FBC@gmail.com>
Comments: In-reply-to Sam Aldrin <aldrin.ietf@gmail.com> message dated "Wed, 25 Apr 2012 21:52:42 -0700."
Date: Thu, 26 Apr 2012 16:36:26 -0400
From: Thomas Narten <narten@us.ibm.com>
Cc: trill@ietf.org
Subject: Re: [trill] TRILL OAM Requirements -

Let me see if I can help tease this apart a bit more.

One comment I have (broadly speaking) about the requirements document
is that it could be clearer that it is about providing TRILL
mechanisms that would allow certain functionality to be
provided. Whether an implementation provides all those tools (or
facilities) is a separate matter, and may even be out of scope for
TRILL to *require*.

So, if someone is worried that doing something for lots of flows does
not scale, there is nothing that says an implementation is required to
*implement* such functionality. In addition, let's assume operators
will be smart enough to use facilities that make sense, and that they
won't turn on things that cause their networks to keel over. :-)

Another way to look at it: ICMP defines echo request, TTL exceeded,
etc. Tools like ping and traceroute make use of those
facilities. But nowhere (I think) does the IETF require that an IP
stack provide ping or traceroute.
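
To make the mechanism/tool split concrete, here is a minimal sketch of
the ICMP side only. It builds an Echo Request (type 8, code 0) with the
standard Internet checksum; the function names are illustrative
assumptions, not anything from a spec. A tool like ping is whatever
someone chooses to build on top of this (timers, statistics, output).

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum over 16-bit words (the standard Internet checksum)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """The protocol mechanism: an ICMP Echo Request message, nothing more."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum field zeroed
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


# Hypothetical identifier, sequence number, and payload for illustration.
packet = build_echo_request(ident=0x1234, seq=1, payload=b"trill")
```

Recomputing the checksum over the finished packet yields zero, which is
how a receiver validates it.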

So, TRILL OAM needs to provide mechanisms that can be used to
implement OAM. That is what we need to focus on.

Sam Aldrin <aldrin.ietf@gmail.com> writes:

> Santosh,
> 
> Please find my comments inline with %SAM.
> 
> > On Apr 25, 2012, at 3:36 PM, Santosh Rajagopalan wrote:
> 
> > Comments on this doc:
> > 4.3 Continuity check:
> > This seems to be defined too broadly in the doc. I look at this as a
> > keepalive mechanism which gets set by configuration, and stays on
> > for a longer period of time (as opposed to connectivity
> > verification, which is user-initiated and short-lived). In Ethernet
> > OAM, this is used to keep track of connectivity between adjacent
> > switches and between switches that lie at the edges of a "level".
> > The value of continuity check as defined in this doc seems less
> > certain to me.
> 
> %SAM - What would make is certain in a requirements document in your
> opinion?

MPLS is circuit-oriented, i.e., a path is an end-to-end path. One
identifies a path at the head end. Anything that enters an MPLS tunnel
pops out at the other end. TRILL just doesn't have that notion. So
reusing the definitions from RFC 5860 doesn't immediately make sense to
me.

The notion of a path in TRILL is very different from that in MPLS.

The document says:

>    OAM MUST provide functions that enable an RBridge to perform a
>    Continuity Check to any other RBridge over a specified path or a set
>    of paths.

How do you indicate what "a specified path" is in TRILL? What is the
mechanism for doing so? Is this the same thing as using a flow or is
it something different? But the flow-related requirements are
mentioned elsewhere, so I'm not sure what the above means/implies.

> > What does it mean to have continuity checks for each flow? Each time
> > a flow gets initiated, is an operator expected to setup keepalives
> > for that flow? Or is this expected to get started automatically
> > every time a new flow is detected?
> 
> %SAM - This is requirements document. There should be ability to
> verify any given individual path/flow.

I think what the requirements document should say is that TRILL OAM
must provide mechanisms that allow one to provide continuity checks
for particular flows. How and when those are used, and what kind of
tools actually make use of these mechanisms is a separate matter (and
may even be out-of-scope for TRILL to define in terms of
requirements). But TRILL OAM does need to provide the mechanisms that
allow such connectivity checks to be done.
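
As a rough illustration of "mechanism, not policy", the sketch below
tracks continuity per (peer RBridge, flow) pair and reports the pairs
whose check messages have stopped arriving. Everything here is a
hypothetical assumption for illustration (the class name, the flow
identifiers, the miss limit of 3); nothing in it decides which flows
get monitored, which is the operator's call.

```python
class ContinuityMonitor:
    """Tracks the last time a continuity-check message was seen per (peer, flow)."""

    def __init__(self, interval_s: float, miss_limit: int = 3):
        self.interval_s = interval_s    # expected spacing between check messages
        self.miss_limit = miss_limit    # consecutive misses tolerated before failure
        self.last_seen = {}             # (peer, flow_id) -> timestamp of last message

    def record(self, peer: str, flow_id: str, now: float) -> None:
        """Call when a continuity-check message arrives for this peer and flow."""
        self.last_seen[(peer, flow_id)] = now

    def failed(self, now: float) -> list:
        """Return the (peer, flow_id) pairs silent for longer than the allowed window."""
        window = self.interval_s * self.miss_limit
        return sorted(key for key, seen in self.last_seen.items() if now - seen > window)


# Hypothetical usage: only the flows an operator chose to enable are tracked.
monitor = ContinuityMonitor(interval_s=1.0, miss_limit=3)
monitor.record("RB2", "flow-a", now=0.0)
monitor.record("RB3", "flow-b", now=2.5)
down = monitor.failed(now=4.0)
```

State grows only with the number of enabled (peer, flow) pairs, which is
where the scaling concern raised in the thread would show up.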

> > Either way, this doesn't scale for effort or state.
> 
> %SAM - Just because a vendor device cannot scale, it doesn't mean
>  cannot be done. May be if you can justify with a quantification what
>  scale metrics you are talking, would help to discuss further.

TRILL provides mechanisms; it's a product decision what to implement?
:-)

> > The same goes for continuity checks which check if each link of a
> > multipath is alive. I would recommend that the scope of the
> > continuity check messages get narrowed to measure the connectivity
> > liveliness for rbridges either on a per-hop basis or an end-to-end
> > basis.
> 
> %SAM - Again, I am not sure, based on what quantification you are
> arriving at the conclusion. This is fundamental requirement in
> other network types as well. So,  would like you to provide more
> data points to your argument on why it cannot be done in TRILL,
> whilst the same is done in other networks.

If TRILL just provides mechanisms for such checks, and it is an
operator choice where they are enabled, that would seem a reasonable
way to have it both ways.

> > 4.5. General Requirements
> 
> > "OAM MUST NOT require extensions to or modifications of the TRILL
> >   header."
> > Is the rbridge channel as defined in
>   http://tools.ietf.org/html/draft-eastlake-trill-rbridge-channel-00
>   considered an extension or modification of the TRILL header?
> 
> %SAM - Please ask the author for clarification.

I would say "TRILL header" is defined narrowly by the base spec and
maybe draft-ietf-trill-rbridge-extension-03.txt.

And IMO, the channel header is a payload and is not considered part of
the TRILL header. The document should say this.

> > "OAM, as practical as possible, SHOULD provide a single framework
> >   between TRILL and other similar standards."
> > I'm not sure what this means. Can someone clarify?
> 
> %SAM - We could add more clarifying text here. My take on this is, we
>  should not define individual framework for different OAM
>  components. For example, to quickly achieve ping and trace, we define
>  a packet format. Later, when we want to test multipath/ECMP,
>  oops..cannot be done, so let us define a new format. So, the
>  requirement emphasize on avoiding that. Can it be 100% achievable,
>  simple answer is NO. But we need to try our best. Others could chime
>  in, if they want to add.

My take is that the requirement is somewhat vague, so arguing about
what it means is probably not all that useful. And given that it is a
SHOULD rather than a MUST, it is also not critical to define. But at the
same time, that makes the requirement not all that meaningful. :-)

> > "OAM MUST maintain related error and operational counters."

I interpreted this to mean that an implementation must maintain error
counters (ok, that isn't rocket science). But it says nothing about
how OAM gains access to those counters, or indeed, which counters are
appropriate to record.

> > Again, what does this mean? Is the requirement that the OAM
> > framework defined for trill must be able to query all counters?

I do not believe that this is implied.

> %SAM - What part is not clear? Requirement is specifying the error
>  counters should be maintained on an RBridge. Whether you provide
>  access to query or only display via CLI is implementation specific.

I would like to see the document add the following requirement:

  Error and operational counters SHOULD be accessible via SNMP.

And I would go further and say (though this doesn't need to be said,
as long as it's understood) that there is no need for OAM to define an
alternate mechanism (besides SNMP) for getting access to such
counters. Implementations are free of course to do their own thing as
well (e.g., CLI).
  
> > "OAM MUST provide a single OAM framework for all TRILL OAM functions"
> > Can you provide an example of what would be a violation of this?
> 
> %SAM - see earlier reasoning.

IMO, the above should be a SHOULD. Since "single OAM framework" is a
nebulous term, I would not be comfortable agreeing it is a MUST. The
devil is in the details...

But I agree that it's an appropriate goal.

> > 4.7. Packet Loss
> 
> > This is another section which is very broadly defined. Most switches
> > I know of don't have the ability to maintain packet loss counters
> > per flow, or even per source rbridge (leave aside packet loss for
> > one link of an ecmp between rbridges). The best you can get here is
> > packet losses for a given link, and this is already part of the
> > TRILL MIB and can be queried by SNMP.
> 
> %SAM - When you say 'most', does that mean there are switches which
>  could do that, correct? That is why the word 'SHOULD' is used. If any
>  vendor cannot support, that is their choice. In regds to MIB or SNMP,
>  they are part of OAM umbrella. Not sure why this OAM requirement
>  implemented via SNMP is an issue to you?

I think the better thing to do is look at specific requirements:

>    OAM SHOULD provide the ability to measure packet loss for a
>    specified flow between any two RBridges.

Does this imply a specific requirement on an implementation? Does this
require some new OAM mechanism to be defined (beyond what is needed
for the other requirements)? I'm not sure... And without being sure,
I'm not sure if this should be a SHOULD requirement or not.

E.g., is the above saying that one should be able to run a test using
a specific flow that measures packet loss? I would say yes. And I
think I know how to implement this.

Or is this a requirement that says for a given flow X (whatever that
means) of *actual* *data* *packets*, does OAM need to be able to track
the packet loss *for* *that* *actual* *flow*? If so, I don't know
right off how you could do that and it sounds expensive/complex to
implement. I wouldn't want this to be a requirement. :-)

Thus, further clarification would seem in order.
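
For the first reading above (an explicit test run over a chosen flow),
the arithmetic is simple once probes carry sequence numbers. The sketch
below is only an assumption about how such a test could be scored; the
function name and the use of sequence numbers are hypothetical and are
not taken from the requirements document.

```python
def probe_loss_ratio(sent_seqs, received_seqs) -> float:
    """Fraction of synthetic probes sent on one flow that never came back."""
    sent = set(sent_seqs)
    if not sent:
        raise ValueError("no probes were sent")
    # Ignore anything received that was never sent (duplicates, stray replies).
    received = sent & set(received_seqs)
    return (len(sent) - len(received)) / len(sent)


# Hypothetical run: 10 probes sent on a test flow, sequence numbers 3 and 8 lost.
loss = probe_loss_ratio(range(10), [0, 1, 2, 4, 5, 6, 7, 9])
```

Note that this measures loss of the test probes only. Tracking loss of
the actual data packets of a flow (the second reading) would need
per-flow counters in the forwarding path, which is a different and much
more expensive thing.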

> > 4.8. Packet Delay
> 
> > As defined here, measurement of packet delay seems to require
> > hardware support. Is that the intent? Or is this supposed to measure
> > delay between OAM software processes on the relevant switches?
> 
> %SAM - I think it is both. If you can timestamp at each level, one
>  could provide more granular levels. How you do it? could be
>  proprietary or part of solution documents.

We are not interested in proprietary solutions. :-) What is the
actual TRILL OAM *mechanism* that would be needed to support such a
requirement? That is what we should focus on.
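
One candidate mechanism, sketched here purely as an assumption, is a
request/reply pair carrying four timestamps, the same arithmetic used by
other two-way delay measurements. The function and parameter names are
hypothetical. Whether the result reflects hardware forwarding delay or
delay between OAM software processes depends entirely on where the
timestamps are taken.

```python
def two_way_delay(t1: int, t2: int, t3: int, t4: int) -> int:
    """Round-trip delay excluding the responder's processing time.

    t1: initiator transmits request   (initiator clock)
    t2: responder receives request    (responder clock)
    t3: responder transmits reply     (responder clock)
    t4: initiator receives reply      (initiator clock)

    The two clocks need not be synchronized, because each difference
    is taken between timestamps from the same clock.
    """
    return (t4 - t1) - (t3 - t2)


# Hypothetical timestamps in microseconds; the responder clock has a large offset.
delay_us = two_way_delay(t1=1_000, t2=5_000_400, t3=5_000_450, t4=1_900)
```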

> 
> >
> > 5. General Format of TRILL OAM Messages
> 
> > The inner ethernet header should be called out explicitly, instead
> > of being subsumed in the "Flow entropy" section, since using
> > anything other than ethernet after the trill header would break
> > every single trill compliant piece of silicon I know of.
> 
> %SAM - We have merchant silicon vendors on the author list. I have my
>  opinion but will leave it for them to comment. :D

To be clear, it would be helpful to me if Figure 1 showed what the
TRILL header part contained. Does it contain the inner src/dst MAC and
VLAN, for instance? Actually, it clearly does not. RFC 6325 does not
consider the inner header to be part of the TRILL header.

It would probably help to give an example of what the Flow Entropy
field contains. Also, the inner header fields (like reserved DA
multicast addresses) have special meaning in TRILL. Are there any
special rules for the Flow Entropy field wrt reserved or otherwise
specially defined values?
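
To illustrate why the contents matter, here is a sketch under stated
assumptions: that the Flow Entropy field carries something shaped like
an inner Ethernet header (dst MAC, src MAC, VLAN tag), and that a
transit RBridge hashes it to choose among equal-cost next hops. The
CRC32 hash, the function names, and the example addresses are all
hypothetical; real implementations use their own hash functions and
field selections.

```python
import struct
import zlib
from typing import Sequence


def inner_ethernet_entropy(dst_mac: bytes, src_mac: bytes, vlan_id: int) -> bytes:
    """Assumed entropy layout: dst MAC, src MAC, then an 802.1Q tag (TPID 0x8100)."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    return dst_mac + src_mac + struct.pack("!HH", 0x8100, vlan_id & 0x0FFF)


def pick_next_hop(flow_entropy: bytes, next_hops: Sequence[str]) -> str:
    """Illustrative ECMP choice: hash the entropy bytes, index the next-hop list."""
    if not next_hops:
        raise ValueError("no next hops available")
    return next_hops[zlib.crc32(flow_entropy) % len(next_hops)]


# Hypothetical flow and equal-cost next hops.
entropy = inner_ethernet_entropy(
    dst_mac=bytes.fromhex("00005e005301"),
    src_mac=bytes.fromhex("00005e005302"),
    vlan_id=100,
)
hop = pick_next_hop(entropy, ["RB2", "RB3", "RB4"])
```

An OAM probe follows the same path as a data flow only if its entropy
bytes hash the same way, which is why rules about reserved or special
values in that field need to be spelled out.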

Thomas