Re: [trill] TRILL OAM Requirements -

Sam Aldrin <aldrin.ietf@gmail.com> Thu, 26 April 2012 22:13 UTC

From: Sam Aldrin <aldrin.ietf@gmail.com>
Date: Thu, 26 Apr 2012 15:13:37 -0700
To: Thomas Narten <narten@us.ibm.com>
Cc: "trill@ietf.org" <trill@ietf.org>
Subject: Re: [trill] TRILL OAM Requirements -

See my comments inline.


On Apr 26, 2012, at 1:36 PM, Thomas Narten <narten@us.ibm.com> wrote:

> Let me see if I can help tease this apart a bit more.
> 
> One comment I have (broadly speaking) about the requirements document
> is that it could be clearer that it is about providing TRILL
> mechanisms that would allow certain functionality to be
> provided. Whether an implementation provides all those tools (or
> facilities) is a separate matter, and may even be out-of-scope for
> TRILL to *require*.
> 
> So, if someone is worried that doing something for lots of flows does
> not scale, there is nothing that says an implementation is required to
> *implement* such functionality. In addition, let's assume operators
> will be smart enough to use facilities that make sense, and that they
> won't turn on things that cause their networks to keel over. :-)
Thanks for asserting what I was saying. :D
> 
> Another way to look at it, ICMP defines echo request/TTL
> exceeded/etc. Tools like ping/traceroute make use of those
> facilities. But nowhere (I think) does the IETF require that an IP
> stack provide Ping or traceroute.
That's exactly how it should be with OAM for TRILL as well. Whether one implements it via ping or traceroute is beyond the scope of this doc.
> 
> So, TRILL OAM needs to provide mechanisms that can be used to
> implement OAM. That is what we need to focus on.
> 
> Sam Aldrin <aldrin.ietf@gmail.com> writes:
> 
>> Santosh,
>> 
>> Please find my comments inline with %SAM.
>> 
>>> On Apr 25, 2012, at 3:36 PM, Santosh Rajagopalan wrote:
>> 
>>> Comments on this doc:
>>> 4.3 Continuity check:
>>> This seems to be defined too broadly in the doc. I look at this as a
>>> keepalive mechanism which gets set by configuration, and stays on
>>> for a longer period of time (as opposed to connectivity
>>> verification, which is user-initiated and short-lived). In Ethernet
>>> OAM, this is used to keep track of connectivity between adjacent
>>> switches and between switches that lie at the edges of a "level".
>>> The value of continuity check as defined in this doc seems less
>>> certain to me.
>> 
>> %SAM - What would make it certain in a requirements document, in your
>> opinion?
> 
> MPLS is circuit oriented. I.e., a path is an end to end path. One
> identifies a path at the head end. Anything that enters an MPLS tunnel
> pops out at the other end. Trill just doesn't have that notion. So
> reusing the definitions from RFC5860 doesn't immediately make sense to
> me.
> 
> The notion of a path in TRILL is very different than MPLS.
If it were the same as MPLS, there would be no TRILL; it would all be MPLS. Having said that, whether it is an MPLS packet or a TRILL frame, both traverse devices, and the path can be traced and verified. This includes load balancing, etc. Unicast and multicast exist in both. We should reuse whatever was laid out for MPLS as much as possible. If something is not really applicable, we can always address that, but I haven't seen such a case in the question above.
> 
> The document says:
> 
>>   OAM MUST provide functions that enable an RBridge to perform a
>>   Continuity Check to any other RBridge over a specified path or a set
>>   of paths.
> 
> How do you indicate what "a specified path" is in TRILL? What is the
> mechanism for doing so? Is this the same thing as using a flow or is
> it something different? But the flow-related requirements are
> mentioned elsewhere, so I'm not sure what the above means/implies.
IMO they represent the same thing. Maybe we should clarify what "path" and "flow" mean, and whether they are the same or different.
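To illustrate why the two may coincide, here is a minimal sketch, assuming each RBridge chooses among equal-cost next hops by hashing inner-frame fields. The `FlowKey` fields, the use of SHA-256, and `select_next_hop` are hypothetical; real forwarding hardware uses its own hash and field selection. The point is only that an OAM frame carrying the same flow fields as a data flow is hashed onto the same next hop at every RBridge, so specifying a flow effectively specifies a path.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical flow identifier built from inner-frame fields."""
    inner_src_mac: str
    inner_dst_mac: str
    vlan: int


def select_next_hop(flow: FlowKey, next_hops: List[str]) -> str:
    """Deterministically pick one equal-cost next hop for a flow.

    SHA-256 stands in for whatever hash a real forwarding chip uses.
    """
    if not next_hops:
        raise ValueError("no equal-cost next hops")
    material = f"{flow.inner_src_mac}|{flow.inner_dst_mac}|{flow.vlan}".encode()
    digest = hashlib.sha256(material).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    hops = ["RB2", "RB3", "RB4"]
    data_flow = FlowKey("00:00:5e:00:53:01", "00:00:5e:00:53:02", 100)
    # An OAM probe that copies the data flow's fields...
    probe_flow = FlowKey("00:00:5e:00:53:01", "00:00:5e:00:53:02", 100)
    # ...is hashed onto the same next hop at this RBridge.
    assert select_next_hop(data_flow, hops) == select_next_hop(probe_flow, hops)
    print("both take", select_next_hop(data_flow, hops))
```

Changing any of the hashed fields in the probe may move it to a different next hop, which is why the content of the flow fields matters for path verification.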
> 
>>> What does it mean to have continuity checks for each flow? Each time
>>> a flow gets initiated, is an operator expected to setup keepalives
>>> for that flow? Or is this expected to get started automatically
>>> every time a new flow is detected?
>> 
>> %SAM - This is a requirements document. There should be the ability to
>> verify any given individual path/flow.
> 
> I think what the requirements document should say is that TRILL OAM
> must provide mechanisms that allow one to provide continuity checks
> for particular flows. How and when those are used, and what kind of
> tools actually make use of these mechanisms is a separate matter (and
> may even be out-of-scope for TRILL to define in terms of
> requirements). But TRILL OAM does need to provide the mechanisms to
> allow such connectivity checks to be done.
Yes. The solution document covers how to achieve that. One step at a time.
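As one illustration of how such a mechanism could keep state bounded, here is a minimal receive-side sketch, assuming keepalive-based continuity checks that an operator enables per (remote RBridge nickname, flow) pair. The class names, the flow label, and the 3.5-interval loss threshold (borrowed from common Ethernet CFM practice) are assumptions for illustration, not anything the requirements or solution drafts define. State exists only for sessions that were explicitly enabled, so the cost grows with what the operator turns on rather than with every flow in the network.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical session key: (remote RBridge nickname, operator-chosen flow label).
SessionKey = Tuple[int, str]


@dataclass
class ContinuitySession:
    """Receive-side state for one continuity-check session (illustrative only)."""
    interval_s: float          # configured keepalive transmit interval
    multiplier: float = 3.5    # assumed loss threshold, in intervals (CFM-style)
    last_rx: Optional[float] = None

    def on_keepalive(self, now: float) -> None:
        """Record that a keepalive arrived at time `now`."""
        self.last_rx = now

    def is_up(self, now: float) -> bool:
        """True if a keepalive arrived within multiplier * interval of `now`.

        A session that has never heard a keepalive is reported as down.
        """
        if self.last_rx is None:
            return False
        return now - self.last_rx <= self.multiplier * self.interval_s


class ContinuityMonitor:
    """Tracks only the sessions an operator has explicitly enabled."""

    def __init__(self) -> None:
        self.sessions: Dict[SessionKey, ContinuitySession] = {}

    def enable(self, nickname: int, flow: str, interval_s: float) -> None:
        self.sessions[(nickname, flow)] = ContinuitySession(interval_s)

    def receive(self, nickname: int, flow: str, now: float) -> None:
        # Keepalives for sessions nobody enabled are ignored: no state is created.
        session = self.sessions.get((nickname, flow))
        if session is not None:
            session.on_keepalive(now)

    def down_sessions(self, now: float) -> List[SessionKey]:
        return [key for key, s in self.sessions.items() if not s.is_up(now)]
```

With a 1 s interval, a session whose last keepalive arrived at t = 10 s is still up at t = 13 s and is reported down at t = 14 s.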
> 
>>> Either way, this doesn't scale for effort or state.
>> 
>> %SAM - Just because a vendor's device cannot scale, it doesn't mean it
>> cannot be done. If you can quantify the scale metrics you are talking
>> about, that would help to discuss this further.
> 
> TRILL provides mechanisms, it's a product decision what to implement?
> :-)
> 
>>> The same goes for continuity checks which check if each link of a
>>> multipath is alive. I would recommend that the scope of the
>>> continuity check messages get narrowed to measure the connectivity
>>> liveness for rbridges either on a per-hop basis or an end-to-end
>>> basis.
>> 
>> %SAM - Again, I am not sure what quantification you are basing this
>> conclusion on. This is a fundamental requirement in
>> other network types as well. So I would like you to provide more
>> data points to support your argument on why it cannot be done in TRILL,
>> whilst the same is done in other networks.
> 
> If TRILL just provides mechanisms for such checks, and it is an
> operator choice where they are enabled, that would seem a reasonable
> way to have it both ways.
Ack.
> 
>>> 4.5. General Requirements
>> 
>>> "OAM MUST NOT require extensions to or modifications of the TRILL
>>>  header."
>>> Is the rbridge channel as defined in
>>> http://tools.ietf.org/html/draft-eastlake-trill-rbridge-channel-00
>>> considered an extension or modification of the TRILL header?
>> 
>> %SAM - Please ask the author for clarification.
> 
> I would say "TRILL header" is defined narrowly by the base spec and
> maybe draft-ietf-trill-rbridge-extension-03.txt
> 
> And IMO, the channel header is a payload and is not considered part of
> the TRILL header. The document should say this.
> 
>>> "OAM, as practical as possible, SHOULD provide a single framework
>>>  between TRILL and other similar standards."
>>> I'm not sure what this means. Can someone clarify?
>> 
>> %SAM - We could add more clarifying text here. My take on this is that we
>> should not define individual frameworks for different OAM
>> components. For example, to quickly achieve ping and trace, we define
>> a packet format. Later, when we want to test multipath/ECMP,
>> oops, it cannot be done, so let us define a new format. The
>> requirement emphasizes avoiding that. Is it 100% achievable? The
>> simple answer is NO. But we need to try our best. Others could chime
>> in if they want to add anything.
> 
> My take is that the requirement is somewhat vague, so arguing about
> what it means is probably not all that useful. And given that it is a
> SHOULD, rather than MUST, also not critical to define.  But at the
> same time, it makes the requirement not all that meaningful. :-)
Do you want it moved to a MUST to make it meaningful? :D
> 
>>> "OAM MUST maintain related error and operational counters."
> 
> I interpreted this to mean that an implementation must maintain error
> counters (ok, that isn't rocket science). But it says nothing about
> how OAM gains access to those counters, or indeed, which counters are
> appropriate to record.
We could further classify the error counters. But I do not agree that specifying the access method is part of the scope. If I said XML MUST be used, not many folks would like it.
> 
>>> Again, what does this mean? Is the requirement that the OAM
>>> framework defined for trill must be able to query all counters?
> 
> I do not believe that this is implied.
> 
>> %SAM - What part is not clear? Requirement is specifying the error
>> counters should be maintained on an RBridge. Whether you provide
>> access to query or only display via CLI is implementation specific.
> 
> I would like to see the document add the following requirement:
> 
>  Error and operational counters SHOULD BE accessed via SNMP.
If SNMP, why not CLI, XML, etc.?
> 
> And I would go further and say (though this doesn't need to be said,
> as long as it's understood) that there is no need for OAM to define an
> alternate mechanism (besides SNMP) for getting access to such
> counters. Implementations are free of course to do their own thing as
> well (e.g., CLI).
I still say we should not define any here. If folks think SNMP is great, so be it. But not everyone employs SNMP as 'the' baseline mechanism.
> 
>>> "OAM MUST provide a single OAM framework for all TRILL OAM functions"
>>> Can you provide an example of what would be a violation of this?
>> 
>> %SAM - see earlier reasoning.
> 
> IMO, the above should be a SHOULD. Since "single OAM framework" is a
> nebulous term, I would not be comfortable agreeing it is a MUST. Devil
> is in the details...
> 
> But I agree that it's an appropriate goal.
Yes, it is a noble goal, to ensure that future changes adhere to it. As we all share the same goal, I do not think we should deliberate on it too much, unless someone is planning an alternate mechanism.
> 
>>> 4.7. Packet Loss
>> 
>>> This is another section which is very broadly defined. Most switches
>>> I know of don't have the ability to maintain packet loss counters
>>> per flow, or even per source rbridge (leave aside packet loss for
>>> one link of an ecmp between rbridges). The best you can get here is
>>> packet losses for a given link, and this is already part of the
>>> TRILL MIB and can be queried by SNMP.
>> 
>> %SAM - When you say 'most', does that mean there are switches which
>> could do that, correct? That is why the word 'SHOULD' is used. If any
>> vendor cannot support it, that is their choice. In regards to MIB or SNMP,
>> they are part of the OAM umbrella. Not sure why implementing this OAM
>> requirement via SNMP is an issue for you?
> 
> I think the better thing to do is look at specific requirements:
> 
>>   OAM SHOULD provide the ability to measure packet loss for a
>>   specified flow between any two RBridges.
> 
> Does this imply a specific requirement on an implementation? Does this
> require some new OAM mechanism to be defined (beyond what is needed
> for the other requirements)? I'm not sure... And without being sure,
> I'm not sure if this should be a SHOULD requirement or not.
As this is a requirements/framework document, do you agree that one should be able to measure packet loss for a given path? There are many ways one could implement this. MPLS has its own, and Y.1731 specifies its own way. We could come up with one later as well. But the framework should be able to support it.
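For the test-flow interpretation, a minimal sketch, assuming the initiator sends sequence-numbered probe frames on a chosen flow and the responder reports which sequence numbers it saw. Both ends are collapsed into one hypothetical `ProbeLossTest` object here for brevity; how sequence numbers are carried and reported back is left to a solution document. Counting distinct sequence numbers keeps duplicated probes from masking losses.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ProbeLossTest:
    """Hypothetical synthetic-probe loss test on one chosen flow.

    Initiator and responder state are combined here only for brevity.
    """
    sent: int = 0
    received_seqs: Set[int] = field(default_factory=set)

    def send(self) -> int:
        """Allocate the next sequence number for an outgoing probe."""
        seq = self.sent
        self.sent += 1
        return seq

    def receive(self, seq: int) -> None:
        """Record a probe reported as received by the responder."""
        # Ignore numbers that were never sent; the set absorbs duplicates.
        if 0 <= seq < self.sent:
            self.received_seqs.add(seq)

    def lost(self) -> int:
        # Probes still in flight also count as lost, so read this after a drain period.
        return self.sent - len(self.received_seqs)

    def loss_ratio(self) -> float:
        return self.lost() / self.sent if self.sent else 0.0
```

Sending ten probes and delivering all but numbers 3 and 7 yields two lost probes, a loss ratio of 0.2, even if probe 0 is delivered twice.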
> 
> E.g., is the above saying that one should be able to run a test using
> a specific flow that measures packet loss? I would say yes. And I
> think I know how to implement this.
> 
> Or is this a requirement that says for a given flow X (whatever that
> means) of *actual* *data* *packets*, does OAM need to be able to track
> the packet loss *for* *that* *actual* *flow*? If so, I don't know
> right off how you could do that and it sounds expensive/complex to
> implement. I wouldn't want this to be a requirement. :-)
Agreed. If you specify that, I would not implement it. :D
> 
> Thus, further clarification would seem in order.
> 
>>> 4.8. Packet Delay
>> 
>>> As defined here, measurement of packet delay seems to require
>>> hardware support. Is that the intent? Or is this supposed to measure
>>> delay between OAM software processes on the relevant switches?
>> 
>> %SAM - I think it is both. If you can timestamp at each level, one
>> could provide more granular measurements. How you do it could be
>> proprietary or part of the solution documents.
> 
> We are not interested in proprietary solutions. :-) What is the
> actual TRILL OAM *mechanism* that would be needed to support such a
> requirement? That is what we should focus on.
Measuring RTT, one-way delay, etc. is in scope here. When we consider the payload, we should consider an option for timestamp fields.
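As an illustration of what timestamp fields would enable, a minimal sketch, assuming a request/reply exchange in which four timestamps are recorded: transmit and receive at the initiator (t1, t4) and receive and transmit at the responder (t2, t3). The `DelayProbe` name and integer-microsecond units are assumptions. Round-trip time subtracts the responder's residence time and needs no clock synchronization, while one-way delay is only meaningful if the two clocks are synchronized. Whether the timestamps are taken in hardware or by OAM software changes what the interval includes, not the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayProbe:
    """Timestamps (integer microseconds) from a hypothetical request/reply probe.

    t1: request sent by initiator      (initiator clock)
    t2: request received by responder  (responder clock)
    t3: reply sent by responder        (responder clock)
    t4: reply received by initiator    (initiator clock)
    """
    t1: int
    t2: int
    t3: int
    t4: int

    def round_trip(self) -> int:
        # Total elapsed time at the initiator minus time spent at the responder.
        # Each difference uses a single clock, so no synchronization is needed.
        return (self.t4 - self.t1) - (self.t3 - self.t2)

    def one_way_forward(self) -> int:
        # Mixes both clocks: includes any clock offset unless clocks are synchronized.
        return self.t2 - self.t1

    def one_way_reverse(self) -> int:
        # Same caveat as one_way_forward.
        return self.t4 - self.t3
```

For t1 = 0, t2 = 400, t3 = 1000 and t4 = 1600 microseconds, the round-trip time is 1000 microseconds and the forward one-way delay is 400 microseconds.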

Sam
> 
>> 
>>> 
>>> 5. General Format of TRILL OAM Messages
>> 
>>> The inner ethernet header should be called out explicitly, instead
>>> of being subsumed in the "Flow entropy" section, since using
>>> anything other than ethernet after the trill header would break
>>> every single trill compliant piece of silicon I know of.
>> 
>> %SAM - We have merchant silicon vendors on the author list. I have my
>> opinion but will leave it for them to comment. :D
> 
> To be clear, it would be helpful to me if Figure 1 showed what the
> TRILL header part contained. Does it contain the inner src/dst MAC and
> VLAN, for instance?  Actually, it clearly does not. RFC6325 does not
> consider the inner header to be part of the TRILL header.
> 
> It would probably help to give an example as to what the Flow Entropy
> field contained. Also, the inner header fields (like reserved DA
> multicast addresses) have special meaning in TRILL. Are there any
> special rules for the Flow Entropy field with respect to reserved or
> otherwise specially defined values?
> 
> Thomas
> 