OAM IESG/IAB Joint Design Session
October 12-14 2010

Attendees:
Ron Bonica
Marshall Eubanks
Joel Halpern
Sean Turner
Tim Polk
Russ Housley
Katsushi Kobayashi
David Harrington
Loa Andersson
Robert Rennison
Scott Mansfield
Nurit Sprecher
Adrian Farrell
Stewart Bryant
Lou Berger
Eric Gray

Webex:
Dan R
Fred Baker
Jakov Stein
Deborah Brungard
Gregory Cachet
Jesper 
Michael Schof
Greg Minsky
Mark Lasserre


Ron's Goals:
slides
slide 6:
Ron: ethernet doesn't have OAM; 
Yakov: yes it does; two 
Dan: mentions EFM OAM  
slide 7: multiple types of tunneling
slide 8: 
Adrian: what does application mean?
Ron: email doesn't care about jitter and packet loss
video does care.
Joel: if loss rate gets high; then email still cares; it is a matter of degree
last slide: 
in TP world, it has to work, must have routing plane but you must exercise the FIB and assume RIB is the same.
Adrian: need guideline re tunnels: does user see where the tunnel actually is?
does the tunnel itself have any knowledge of what it is carrying?

Yakov: if ip mechanism is server for 
thing to do is have oam at both layers
top one might trigger things in bottom layer
stuart: there is another community that wants to mix the two layers
some seem to be looking for complete layer isolation.
Yakov: ... layer violations
Ron: sometimes somebody wants to let you know the tunnel has problems, but doesn't want to give details.
sometimes the owner of the tunnel wants to tell you just what is going on?

Yakov:
server layer MUST inform the lower layer
Stuart: but not necessarily the details of the fault
Ron: let's say we have IP running over SONET; don't know which SONET things go down;
if different business domains, 
Stuart: 
if you are running both networks, you expect to display
Yakov: the VPN end to end failure would be seen both at the path and the fault cause it triggered
if same business, probably see both; 
but different business, may 
adrian: we need to be much cleaner in our choice of words.
interlayer info reporting system - not forwarding plane oam.
should be out of scope for this discussion
need to focus on forwarding plane oam that feeds to the reporting mechanism
Yakov:
Rob: passing info is control plane; depends whether overlay or peer model.
stuart: signal might be fastest way; server reports cannot deliver; passing info is layer violation
Yakov: OAMs propagate up;
stuart: some sort of signals are propagated up.
server layer can tell client layer; not the other way
Ron: server layer may or may not inform the client layer of what happens; policy issue; out of scope for this meeting
Deb: need some kind of replacement signal; not exactly oam;
Ron: not a layer violation if lower layer tells higher; violation if higher layer tells lower layer.
Yakov: 806 specifies lower layer can only state that it cannot deliver service;
Deb: different interpretations
Joel: some environments can only say I'm broken; take care of it.
There are other environments; traceroute gives some details.
sometimes the tunnel hides things, sometimes doesn't.
cannot agree that different models must all operate the same;
can be provider policy based on business relationships.
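The disclosure question discussed above (report only "I'm broken" versus report fault details, depending on provider policy) could be sketched roughly as follows. This is only an illustration under assumptions: the names Disclosure, Fault, UpwardNotification and notify_upper_layer are hypothetical and do not come from any draft or RFC.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Disclosure(Enum):
    """Hypothetical per-tunnel policy for what the lower layer reveals."""
    NONE = "none"            # lower layer tells the upper layer nothing
    STATUS_ONLY = "status"   # "I'm broken; take care of it"
    DETAILED = "detailed"    # also reveals which element failed


@dataclass
class Fault:
    tunnel: str
    failed_element: str      # e.g. a link inside the tunnel


@dataclass
class UpwardNotification:
    tunnel: str
    failed_element: Optional[str]   # None when details are hidden


def notify_upper_layer(fault: Fault, policy: Disclosure) -> Optional[UpwardNotification]:
    """Apply the (assumed) disclosure policy before anything is passed upward."""
    if policy is Disclosure.NONE:
        return None
    if policy is Disclosure.STATUS_ONLY:
        return UpwardNotification(fault.tunnel, None)
    return UpwardNotification(fault.tunnel, fault.failed_element)
```

With the same business on both layers the policy would probably be DETAILED; across business domains it might be STATUS_ONLY or NONE.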
adrian: I am in agreement with info going up. I do not see why a client layer cannot notify server layer that something is wrong.
all of this policy stuff is out of scope.
Lou: there are different models of information flow; I don't want to use client and server because that implies formal separation.
some models don't have clear separation. ITU has strict rules about info passing.
but we certainly have other models that should be considered, and it might be more than policy.
multipath and multilink  might require client 
what toolset are we building?
what models are we building? what type of mechanisms we allow and/or require
Yakov: the reason why ITU doesn't allow ...
different service providers; 
Ron: downhill info is mostly: I'm sick - can you tell me whether you're the cause?
Yakov: if path is e2e, we're probably going through a whole bunch of providers. Do I pass the info down to all lower service providers?
Ron: if there is a lossy tunnel with one lossy link, would you want to know the tunnel is lossy, or which link is lossy?
Ron: rather than using client/server, let's use lower/upper layers
let's not focus on ITU versus IETF models
Mark: do we want different tools across different layers.
are we trying to achieve tools across all tunneling technologies?
Ron: if it is possible to find common arch direction for oam for all tunneling mechanisms, so we'll all be doing roughly the same thing.
Mark: when you say tunneling, does that include multi-segment PWE, mpls, gre, etc.?
Ron: if there's an RFC that defines it, we're talking about it.
Mark: one toolset that works across all these?
Ron: common arch so all oam speaks the same language.

OAM Tunneling Considerations:
slides
slide 7:
Joel: it is not obvious to me that management fits with the rest of the topic
Ron: the draft this presentation came from has no mention
what can we do to tunnels; don't we need to discuss?
adrian: favorite hobbyhorse - might be management plane or control plane, but not forwarding plane OAM.
Ron: strike slide from preso
Dan: if you towards outcome, then you might need management considerations in ietf documents.
but management should be removed from preso.
slide 9:
Ron: ping, traceroute in application land - realtime or ad hoc
if done in same layer (sonet, atm) done in realtime.
whether done in realtime or ad hoc is orthogonal to whether application/tunneling space.
Yakov: Eth has mode for realtime and ad hoc?
Ron: 
stuart: it depends on what info application needs.
same oam for both realtime and ad hoc? we shouldn't be forcing support for both
Dan: security implications for both continuous and on-demand modes.
DoS attacks, privacy, and so on ...
I do not think that we can separate completely the oam plane and the management plane
the more I think, the more I see them related, and it comes back again and again

Lou: we should focus on function we are providing rather than how we are doing it.
stuart: we aren't looking for a function; we're looking for a toolkit that can be packaged different ways.
Yakov: another mode is OWAMP that runs in periodic mode (an intermediate mode)
agree that on-demand vs continuous
SLA is normally run ad hoc.
stuart: why isn't the intermediate case just a special case of the other modes?
Yakov: can setup 
Joel: as far as I can tell that's a mgmt setup that wants ad hoc at a certain time for a certain period. It does it when the system asks it to.
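Joel's reading of the periodic mode (a management system asking for ad hoc probes at set times, for a set period) might be sketched like this. It is only a sketch under assumptions: schedule and run_periodic are hypothetical names, and the probe is any caller-supplied on-demand check, not a specific protocol.

```python
from typing import Callable, List, Tuple


def schedule(start: float, duration: float, interval: float) -> List[float]:
    """Times at which management would request an on-demand probe.

    Covers the half-open window [start, start + duration).
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    times = []
    n = 0
    while n * interval < duration:
        times.append(start + n * interval)
        n += 1
    return times


def run_periodic(probe: Callable[[float], bool],
                 start: float, duration: float, interval: float) -> List[Tuple[float, bool]]:
    """Run the same on-demand probe at each scheduled time (no real sleeping here)."""
    return [(t, probe(t)) for t in schedule(start, duration, interval)]
```

In this view the OAM function itself only needs the on-demand mode; the periodicity lives in whatever asks for it.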
stuart: a software app might need to coordinate such on/off functionality
Yakov: special hardware support might be needed.
Ron: seems like tomorrow, we'll have design session time to design interaction between upper/lower layers, what kind of requests go up/down, should it be ad hoc/continuous.
then we can decide what needs to be available in lower layers to provide the info.

OAM - an enduser perspective
slides
slide 2: are both fm and pm useful to users?
slide 3: makes sense in some environments such as eth; do they make sense in ip layer?
fred: clarify ...
slide4 : flow as seen by enduser, not SP
fred: i think this exists already 
...
yakov: new waists - many things running over http, with different service characteristics
can't treat them as a single flow for measuring specific aspects
yakov: 3rd concept:
if NEs cannot distinguish flows
many flows together 
Ron: is flow the appropriate things to consider here?
Yakov: looking from the enduser point of view
can we develop oam that will help enduser - or is this asking too much from oam mechanism?
slide 5:
Joel: have you looked at whether your analysis is dependent on the 5 seconds?
yakov: yes I tested 5sec, 10sec, ... over 10-15sec I had problems storing it.
fred: have you read slatery's paper? you're reproducing her results.
slide 7:
Yakov: residential changes the distribution
slide 8: VPN case
slide 9: 
I think we leverage IPPM to meet point #2
slide 10:
mirror=TWAMP reflector 
at far end of router and near end of INTERNET network 
(this is wrong, but I couldn't capture the description related to the slide.)
two reflectors - one provides connectivity, both provide performance
slide 11:
CC=continuity checks
fps=frames per second
there are tools available, but not implemented on routers; you don't know if access network is causing the problem.

ALU-MPLS-TP-OAM-Proposal.doc
word doc
Mark:
BFD starts at a very fast rate without negotiation
proposal at end is 
-separate state machines

no reason not to use BFD at this time.
Would like to hear from others about whether BFD should be used or should we use another protocol?
Rob: question about whether BFD starts up too fast.
The poll-final mechanism sets the speed
what's the problem?
stuart: the transport community doesn't want negotiation
one variant is to run two BFD sessions
Nurit: this workshop is not only for mpls-tp.
doc is a comparison of BFD and 1731, they aren't comparable.
Mark: there are operators and vendors who want to see 1731; others want BFD
Rob: I haven't heard the rationale for using two channels
Mark: 
Stuart: so you want to run BFD fast on one channel and negotiate the second path
Mark: this is to address one of the main questions
stuart: P2MP doesn't do negotiation
summary -
part of the community doesn't want slow start;
running BFD on the protection channel
Mark: Mostly I wanted community feedback on not having fancy negotiation; two BFDs
Is BFD the right approach for all the tunneling technologies? 
loss delay is BFD-based
all mechanisms cannot be BFD-based
stuart: we're only talking about CC for BFD

Ron: recap
1) we're producing an arch doc recommending what we think
what is an IP tunnel?
should pseudowires be included? if we include too much, we'll kill the whole effort
stuart: there isn't much pseudowire over IP out there; we should let it dominate the discussion.
joel: we have tools already (ping, traceroute, etc) that run over IP.
how can we use the same thing if IP isn't present?
stuart: we carry pseudowire OAM
Joel: if we shoot for a common set of tools, how can we do that?
stuart: we do that already with some tools
traceroute might be used in PW multisegment
it will return the addresses of the IP switches on top
in the transport space, they don't have IP there at all
joel: then I have a problem.
stuart: summary: the concept of running an IP-based tool over non-IP doesn't make sense, but with a PW, an IP-based tool does make sense.
when there is a multisegment PW without IP, we aren't sure what to do.
Mark: we currently use different tools. 
we don't necessarily care about the same things.
today for single segment PW this is not applicable
concerned about single tool arch
stuart: we are similar between MPLS tools and IP tools
so there is some measure of arch commonality that could lead to simplifications.
Ron: tomorrow we'll be talking about what tools we want, what info we want
hopefully the same info would be available from multiple underlying tunnels
then how to build the underlying OAM.
Lou: it would be good to free the document to discuss IP OAM
not tunnel OAM - IP OAM
what should the IP layer be able to expect from the tunnel layer?

Tunnel OAM Requirements and Considerations - Nurit
slides
my slides have things known to everyone.
slide 5: joel: did you leave out MP2MP deliberately?
Nurit: I thought was simply a combination.
slide 13: Joel: it sounds perfectly sensible until we talk about consistency?
what does consistency mean? same protocol? semantics? maybe semantic consistency, but not sure how to get protocol consistency, esp. for non-IP transport
Joel: security that involves key exchange is not scalable.
yes- we need to think about security; saying it must be secure doesn't mean anything.
nurit: not a requirement
slide 14: nurit: we should ensure mpls-tp oam functionality is supported in new tools
slide 15:
adrian: last bullet is unclear/ambiguous. is this two tunnels? two tunnels of different technologies?
if there are two tunnels, then the oam for each tunnel is
mark: OAM message mapping is in a draft; it should be avoided but can be necessary
peer to peer interworking happens. 
if we're designing from the bottom up, maybe we can make these work together
Ron: if you do peer to peer interop, you have an n-squared problem.
mark: there are several cases where it has been done - MPLS-to-ATM, etc.
it's not a requirement; just something that could be done.
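Ron's n-squared remark can be illustrated with simple counting. This is a sketch under assumptions: direct_translators and via_common_model are hypothetical names, and one translator is assumed per ordered pair of OAM technologies.

```python
def direct_translators(n: int) -> int:
    """Peer-to-peer interworking: one mapping for each ordered pair (A -> B and B -> A)."""
    return n * (n - 1)


def via_common_model(n: int) -> int:
    """Common architecture: each technology maps to and from one shared model."""
    return 2 * n


for n in (2, 5, 10):
    print(n, direct_translators(n), via_common_model(n))
```

For two technologies (e.g. MPLS-to-ATM) direct mapping is cheap, which fits Mark's point that it has been done; the growth only bites as more tunnel types are added.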
slide 17
slide 18: mark: what is the difference between an endpoint and a segment end?
slide 19:
Joel: I'm a node and notice a path fails; how do I contact my peer if the path is down.
jakov: is notification for perf mgmt or fault mgmt?
slide 20: Jakov: why only at intermediate points?
where does OAM start? at ingress point?
slide 27: jakov: are you talking about bidirectional failure? 
we also need to consider unidirectional

Forwarding Plane OAM Functionality - Bob
slides
slide 6: nurit: I don't understand the conversion here -
bob: C will generate inband notifications, but path is rerouted through C'
yakov: you can have reroute event
C is going to be generating notifications; it doesn't know it is no longer on the path.
slide 7:
bob: MPLS-TP has stimulated vendors to develop OAM for MPLS
there is work in progress, and we need consensus
sasha: ITU is now working on SLM - synthetic loss measurement
nurit: I think you raised some good points; i still have issues with XXXXX
I think we need to talk about the problem and architecture before discussing protocols
bob: I think mpls-tp still has issues to be worked out, and we need F2F communication

Cross-Layer Mechanism - kobayashi
slides
slide 7: nurit: what is PTP? what is SIRENS?


IP/MPLS OAM - Eric
slides
slide 10: 1731 functions or functionality?
eric: if frames differ too much, and state machine differs too much ...
some drafts propose bringing LM/DM in, but proposals not consistent.
could be simplified here.
Yakov: when you say DM are you talking about the different DMs identified in 1731?
are there several different DMs that would each get separate codepoints?
eric: I think that would be ideal so IANA would allocate only a few points by simplifying this.
if other SDOs, especially country SDOs, invent their own DMs, they might define their own standards, and require lots of different codepoints.
stuart: I don't recall all the things in draft-frost.
it wraps in almost no time, which they didn't do for Internet one.
Let's try to future-proof
eric: right now the drafts are not consistent.
stuart: there is also an issue with timing, but 1731 is a 1588-old-version-only 
loss is based around NTP, so good for Internet/NTP and for ITU-T
if we know what we want we can design one consistent function; if we have 
eric: IEEE has a tendency to send liaison to ITU; IETF says "you don't know what you're doing"
stuart: are we designing OAM for Internet, or for Ethernet
Yakov: are you going to make measurements consistent
stuart: one packet protocol for both two-way and one-way.
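As a rough illustration of why one four-timestamp exchange can serve both two-way and one-way delay measurement, consider the usual arithmetic below. It is a sketch under assumptions: the function names are hypothetical, t1/t4 are taken on the sender's clock and t2/t3 on the reflector's clock, and nothing here is the packet format of 1731 or draft-frost.

```python
def two_way_delay(t1: int, t2: int, t3: int, t4: int) -> int:
    """Round-trip delay minus reflector processing time.

    t1: sender transmit, t4: sender receive (sender clock)
    t2: reflector receive, t3: reflector transmit (reflector clock)
    Any fixed offset between the two clocks cancels out.
    """
    return (t4 - t1) - (t3 - t2)


def one_way_delay(t1: int, t2: int) -> int:
    """Forward delay; only meaningful if both clocks are synchronized."""
    return t2 - t1


# Microsecond timestamps; the reflector clock runs 100 s ahead of the sender.
offset = 100_000_000
t1, t4 = 0, 30_000
t2, t3 = offset + 10_000, offset + 15_000
print(two_way_delay(t1, t2, t3, t4))   # offset cancels
print(one_way_delay(t1, t2))           # offset contaminates the result
```

The one-way figure depends on clock synchronization, which is where the timing issue raised above (1588 versus NTP) comes in.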
Yakov: 
Nurit: we're getting into solution; we need to discuss operational experience etc before designing solution
eric: operators would like consistency between ITU/IEEE and IETF solutions.
stuart: 1731 didn't work properly in MPLS-TP. The IP OAM wasn't precise enough for a transport environment. we produced one that would work in any of these environments.
What more would customers want than something that will work in a unified solution.
draft-frost has now been accepted as WG draft, so we can go forward with it.
Yakov: I see the throughput is in green; I think that might be the weakest part of 1731.
reusing 1731 there is probably a bad idea.
Loa: the draft is probably an evolution from 1731.
Ron: I think we should be arguing about arch, functionality, etc. not what code can be reused.
Eric: this is not the first time this discussion has been done; IETF may have new info, but the discussion has already been done by ITU.
stuart: we started with 1731 to work out the requirements, then we developed a solution.
Loa: with WG hat on ... people saying the IETF way is harder than using 1731, obviously aren't reading the 1731 and IETF documents. They address the same problems.


Traceroute - Jesper
slide 2: Jakov: it is unclear what problem you are solving. there can be multiple layers under traceroute.
Jesper: IPinIP, 
Ron: let's change the example to IPoverIPoverIPoverIP...
it should be possible to send info all the way to the top layer, unless it's against policy at some layer.
Jakov: it has to be per segment
once someone doesn't allow it up, it doesn't get up.
Jakov: are you only interested in getting one layer down, or down into the various layers?
Jesp: end-user will see 
Nurit: we need to talk about level of separation we want between layers.
If it is not fixed, then we may not be able to monitor the whole network
Jakov: I had asked earlier - if this is going down one layer, I would understand this; if we are going down layers, the operator may have difficulty understanding what is being returned.

slide 13:
Ron: I think there is an RFC for unnumbered interfaces, but I don't remember the RFC
Marshall: does anybody think we could change ICMP in this decade?
slide 16:
Ron: changing should to must, if you do RFCXXXX the should already becomes a must, so that becomes a non-issue
Jakov: would it be cleaner to have a flag that says please recurse?
this would be a larger mod to icmp
jesp: response (garbled)
marshall: suppose lisp over gre, and each domain policy allows, how would source know which message came from which router?
jesp: packets will get encapsulated twice
marshall: are you saying the source would then need to parse all this?
and the enduser would not know what tunnels are being used in
jesp: the source would not know when the
the source can determine when the trace becomes recursive, but it's not clear how this would be presented to user.
It is not different than the ways things are done today. 
Certainly my mom wouldn't understand it.
It does give additional info
adrian: I'm worried about how we generalize this. Possibly the definition of a tunnel is at stake. An SP might have different providers to go into at the lower layer, and might need to choose,
and it becomes more complex if traceroute needs to decide how to process. It might be better to be able to package the report from lower layers, so it's delivered as a package. But I'm concerned that we are going to swamp the source who cannot really do anything with the info except to contact the admin for the
Jakov: you might be able to tell which provider needs to be called, depending on the layer in which the faults occur.
Jakov: I don't think the SPs will let you see this anyway.
Jesp: if corp X wants to ... they can contact CPE ...
Jakov: 
adrian: I want to put Ron on the spot; he did an RFC on generic tunnel trace, and it addressed this problem. we didn't get a lot of traction in solution space. Ron, why did 3609 not get traction?
Ron: 3609 was published in 2003, and we've started using tunnels for more things since then. the one we were trying to solve was that tunnels were sharing fate, and we couldn't tell. Now the world may have changed, and it might be more useful to know that tunnels are fate sharing.
stuart: tunnel for v4v6 migration or lisp, where tracing the path
in some cases you don't want to expose how the tunnels are nested.
I don't know how we describe the info hiding case from the transparency case.
Ron: policy on a tunnel-by-tunnel basis.
Jesp: some tunnels
stuart: are you going to need layer 1 traceroutes, as opposed to an in-depth traceroute?
jesp: it can be interesting to explore the levels, 
Ron: there is an implementation that exists - you do a traceroute across the top layer. and then ask if the link is a tunnel. If the answer is yes, you could set the depth bit and search through the tunnel. 
stuart: I'm concerned about the security issues.
Ron: the router set a bit saying that this is a tunnel, and I am willing to do tunnel route for you.
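The approach Ron describes (trace the top layer, and descend into a link only when the router signals both "this is a tunnel" and "I am willing to trace it") might look roughly like the sketch below. Everything here is an assumption for illustration: Hop, its fields, and trace are hypothetical and run over an in-memory topology, not real ICMP.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hop:
    name: str
    inner: List["Hop"] = field(default_factory=list)  # hops inside the tunnel, if any
    willing: bool = False  # per-tunnel policy: router offers to trace the tunnel


def trace(path: List[Hop], max_depth: int, depth: int = 0) -> List[str]:
    """Return indented trace lines, recursing only into advertised tunnels."""
    lines = []
    for hop in path:
        # The "tunnel bit" is only visible when policy allows it.
        advertised = bool(hop.inner) and hop.willing
        lines.append("  " * depth + hop.name + (" (tunnel)" if advertised else ""))
        if advertised and depth < max_depth:
            lines.extend(trace(hop.inner, max_depth, depth + 1))
    return lines


path = [
    Hop("A"),
    Hop("B", inner=[Hop("b1"), Hop("b2")], willing=True),
    Hop("C", inner=[Hop("c1")], willing=False),  # tunnel hidden by policy
]
print("\n".join(trace(path, max_depth=1)))
```

With max_depth=0 this degenerates to an ordinary top-layer traceroute, which is one way to read the distinction between a single-layer trace and an in-depth trace.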
Jesp: that is getting significantly more complex. The original proposal is simple. getting vendors to deploy a more complex approach will make this more difficult.
Jakov: did 5837 get deployed?
Ron: no, but it's a new RFC

Jakov: we should make a note that anytime a user 
any type of OAM traceroute, started by an enduser, should use icmp to start the process.

OAM Overview - Nurit
slides
Summary of OAM Functions spreadsheet
Yakov: VCCV should be called PWE3
Yakov: what is the point of this draft? it discusses a lot more OAM options than we need. Is this for MPLS-TP or is this meant to be a tutorial on OAM. If tutorial, it is missing discussion of what OAM means.
Nurit: when we started in mpls-tp, it was an overview for the mpls-tp team. then we found out that the OPS area WG was working on something similar. So we expanded the target audience. This is sort of a tutorial with reference to existing documents.

for the rest of the day, we will capture what will go into a document.
tomorrow mpls-tp team will use the room.

Ron: who is willing to edit the document from this meeting
Loa: if capture of the meeting is most important, then we can follow the IAB workshop report approach.
Nurit: I think we need a document about what is in scope, motivations, and if it will be the basis for a series of documents, it will need to document what we need to work on, what we need to study, and what documents we need to produce.
Ron: agreed. we need to define tunnels, what tunnels are in scope, and informational - what do we want to know about these tunnels. Then we need to discuss how do we get this info.
Nurit: I think this would only be a partial report because we don't have consensus on the topics and content to be included.
Yakov: 

Ron: I will write the report