OAM IESG/IAB Joint Design Session
October 12-14 2010

Attendees: Ron Bonica Marshall Eubanks Joel Halpern Sean Turner Tim Polk Russ Housley Katsushi Kobayashi David Harrington Loa Andersson Robert Rennison Scott Mansfield Nurit Sprecher Adrian Farrell Stewart Bryant Lou Berger Eric Gray
Webex: Dan R Fred Baker Jakov Stein Deborah Brungard Gregory Cachet Jesper Michael Schof Greg Minsky Mark Lasserre

Ron's Goals: slides

slide 6:
Ron: Ethernet doesn't have OAM;
Yakov: yes it does; two
Dan: mentions EFM OAM
slide 7:
multiple types of tunneling
slide 8:
Adrian: what does application mean?
Ron: email doesn't care about jitter and packet loss; video does care.
Joel: if loss rate gets high, then email still cares; it is a matter of degree
last slide:
in TP world, it has to work, must have routing plane, but you must exercise the FIB and assume RIB is the same.
Adrian: need guideline re tunnels: does user see where the tunnel actually is? does the tunnel itself have any knowledge of what it is carrying?
Yakov: if IP mechanism is server for thing to do is have OAM at both layers; top one might trigger things in bottom layer
Stuart: there is another community that wants to mix the two layers; some seem to be looking for complete layer isolation.
Yakov: ... layer violations
Ron: sometimes somebody wants to let you know the tunnel has problems, but doesn't want to give details. sometimes the owner of the tunnel wants to tell you just what is going on?
Yakov: server layer MUST inform the lower layer
Stuart: but not necessarily the details of the fault
Ron: let's say we have IP running over SONET; don't know which SONET things go down; if different business domains,
Stuart: if you are running both networks, you expect to display
Yakov: the VPN end to end failure would be seen both at the path and the fault cause it triggered
if same business, probably see both; but different business, may
Adrian: we need to be much cleaner in our choice of words.
interlayer info reporting system - not forwarding plane OAM. should be out of scope for this discussion
need to focus on forwarding plane OAM that feeds to the reporting mechanism
Yakov:
Rob: passing info is control plane; depends whether overlay or peer model.
Stuart: signal might be fastest way; server reports cannot deliver; passing info is layer violation
Yakov: OAMs propagate up;
Stuart: some sort of signals are propagated up. server layer can tell client layer; not the other way
Ron: server layer may or may not inform the client layer of what happens; policy issue; out of scope for this meeting
Deb: need some kind of replacement signal; not exactly OAM;
Ron: not a layer violation if lower layer tells higher; violation if higher layer tells lower layer.
Yakov: 806 specifies lower layer can only state that it cannot deliver service;
Deb: different interpretations
Joel: some environments can only say I'm broken; take care of it. There are other environments; traceroute gives some details. sometimes the tunnel hides things, sometimes doesn't. cannot agree that different models must all operate the same; can be provider policy based on business relationships.
Adrian: I am in agreement with info going up. I do not see why a client layer cannot notify server layer that something is wrong. all of this policy stuff is out of scope.
Lou: there are different models of information flow; I don't want to use client and server because that implies formal separation. some models don't have clear separation. ITU has strict rules about info passing. but we certainly have other models that should be considered, and it might be more than policy. multipath and multilink might require client
what toolset are we building? what models are we building? what type of mechanisms we allow and/or require
Yakov: the reason why ITU doesn't allow ... different service providers;
Ron: downhill info is mostly: I'm sick - can you tell me whether you're the cause?
Yakov: if path is e2e, we're probably going through a whole bunch of providers. Do I pass the info down to all lower service providers?
Ron: if there is a lossy tunnel with one lossy link, would you want to know the tunnel is lossy, or which link is lossy?
Ron: rather than using client/server, let's use lower/upper layers
let's not focus on ITU versus IETF models
Mark: do we want different tools across different layers? are we trying to achieve tools across all tunneling technologies?
Ron: if it is possible to find common arch direction for OAM for all tunneling mechanisms, so we'll all be doing roughly the same thing.
Mark: when you're saying tunneling, does that include multi-segment PWE, MPLS, GRE, etc.?
Ron: if there's an RFC that defines it, we're talking about it.
Mark: one toolset that works across all these?
Ron: common arch so all OAM speaks the same language.

OAM Tunneling Considerations: slides

slide 7:
Joel: it is not obvious to me that management fits with the rest of the topic
Ron: in the draft this presentation came from has no mention what can we do to tunnels; don't we need to discuss?
Adrian: favorite hobbyhorse - might be management plane or control plane, but not forwarding plane OAM.
Ron: strike slide from preso
Dan: if you towards outcome, then you might need management cons in IETF documents. but management should be removed from preso.
slide 9:
Ron: ping, traceroute in application land - realtime or ad hoc
if done in same layer (SONET, ATM) done in realtime.
whether done in realtime or ad hoc is orthogonal to whether application/tunneling space.
Yakov: Eth has mode for realtime and ad hoc?
Ron:
Stuart: it depends on what info application needs. same OAM for both realtime and ad hoc? we shouldn't be forcing support for both
Dan: security implications for both continuous and on-demand modes. DoS attacks, privacy, and so on ...
I do not think that we can separate completely the OAM plane and the management plane
the more I think, the more I see them related, and it comes back again and again
Lou: we should focus on function we are providing rather than how we are doing it.
Stuart: we aren't looking for a function; we're looking for a toolkit that can be packaged different ways.
Yakov: another mode is OWAMP that runs in periodic mode (an intermediate mode)
agree that on-demand vs continuous
SLA is normally run ad hoc.
Stuart: why isn't the intermediate case just a special case of the other modes?
Yakov: can setup
Joel: as far as I can tell that's a mgmt setup that wants ad hoc at a certain time for a certain period. It does it when the system asks it to.
Stuart: a software app might need to coordinate such on/off functionality
Yakov: special hardware support might be needed.
Ron: seems like tomorrow, we'll have design session time to design interaction between upper/lower layers, what kind of requests go up/down, should it be ad hoc/continuous. then we can decide what needs to be available in lower layers to provide the info.

OAM - an enduser perspective slides

slide 2:
are both FM and PM useful to users?
slide 3:
makes sense in some environments such as Eth; do they make sense in IP layer?
Fred: clarify ...
slide 4:
flow as seen by enduser, not SP
Fred: I think this exists already ...
Yakov: new waists - many things running over HTTP, with different service characteristics
can't treat them as a single flow for measuring specific aspects
Yakov: 3rd concept: if NEs cannot distinguish flows
many flows together
Ron: is flow the appropriate thing to consider here?
Yakov: looking from the enduser point of view
can we develop OAM that will help enduser - or is this asking too much from OAM mechanism?
slide 5:
Joel: have you looked at whether your analysis is dependent on the 5 seconds?
Yakov: yes, I tested 5sec, 10sec, ... over 10-15sec I had problems storing it.
Fred: have you read slatery's paper? you're reproducing her results.
slide 7:
Yakov: residential change the distribution
slide 8:
VPN case
slide 9:
I think we leverage IPPM to meet point #2
slide 10:
mirror=TWAMP reflector at far end of router and near end of INTERNET network (this is wrong, but I couldn't capture the description related to the slide.)
two reflectors - one provides connectivity, both provide performance
slide 11:
CC=continuity checks
fps=frames per second
there are tools available, but not implemented on routers; you don't know if access network is causing the problem.

ALU-MPLS-TP-OAM-Proposal.doc word doc

Mark: BFD starts at a very fast rate without negotiation
proposal at end is - separate state machines
no reason not to use BFD at this time. Would like to hear from others about whether BFD should be used or should we use another protocol?
Rob: question about whether BFD starts up too fast. The poll-final mechanism sets the speed; what's the problem?
Stuart: the transport community doesn't want negotiation
one variant is to run two BFD sessions
Nurit: this workshop is not only for MPLS-TP. doc is a comparison of BFD and 1731; they aren't comparable.
Mark: there are operators and vendors who want to see 1731; others want BFD
Rob: I haven't heard the rationale for using two channels
Mark:
Stuart: so you want to run BFD fast on one channel and negotiate the second path
Mark: this is to address one of the main queuess
Stuart: P2MP doesn't do negotiation
summary - part of the community doesn't want slow start; running BFD on the protection channel
Mark: Mostly I wanted community feedback on not having fancy negotiation; two BFDs
Is BFD the right approach for all the tunneling technologies?
loss delay is BFD-based
all mechanisms cannot be BFD-based
Stuart: we're only talking about CC for BFD
Ron: recap
1) we're producing an arch doc recommending what we think
what is an IP tunnel? should pseudowires be included?
if we include too much, we'll kill the whole effort
Stuart: there isn't much pseudowire over IP out there; we should let it dominate the discussion.
Joel: we have tools already (ping, traceroute, etc) that run over IP. how can we use the same thing if IP isn't present?
Stuart: we carry pseudowire OAM
Joel: if we shoot for a common set of tools, how can we do that?
Stuart: we do that already with some tools
traceroute might be used in PW multisegment
it will return the addresses of the IP switches on top
in the transport space, they don't IP there at all
Joel: then I have a problem.
Stuart: summary: the concept of running an IP-based over non-IP doesn't make sense, but with a PW, an IP-based tool does make sense. when a multisegment PW without IP, we aren't sure what to do.
Mark: we currently use different tools. we don't necessarily care about the same things.
today for single segment PW this is not applicable
concerned about single tool arch
Stuart: we are similar between MPLS tools and IP tools so there is some measure of arch commonality that could lead to simplifications.
Ron: tomorrow we'll be talking about what tools we want, what info we want
hopefully the same info would be available from multiple underlying tunnels
then how to build the underlying OAM.
Lou: it would be good to free the document to discuss IP OAM
not tunnel OAM - IP OAM
what should the IP layer be able to expect from the tunnel layer?

Tunnel OAM Requirements and Considerations - Nurit slides

my slides have things known to everyone.
slide 5:
Joel: did you leave out MP2MP deliberately?
Nurit: I thought was simply a combination.
slide 13:
Joel: it sounds perfectly sensible until we talk about consistency? what does consistency mean? same protocol? semantics? maybe semantic consistency, but not sure how to get protocol consistency, esp. for non-IP transport
Joel: security that involves key exchange is not scalable.
yes - we need to think about security; saying it must be secure doesn't mean anything.
Nurit: not a requirement
slide 14:
Nurit: we should ensure MPLS-TP OAM functionality is supported in new tools
slide 15:
Adrian: last bullet is unclear/ambiguous. is this two tunnels? two tunnels of different technologies? if there are two tunnels, then the OAM for each tunnel is
Mark: OAM message mapping draft is in a draft; it should be avoided but can be necessary
peer to peer interworking happens.
if we're designing from the bottom up, maybe we can make these work together
Ron: if you do peer to peer interop, you have an n-squared problem.
Mark: there are several cases where it has been done - MPLS-to-ATM, etc.
it's not a requirement; just something that could be done.
slide 17
slide 18:
Mark: what is the difference between an endpoint and a segment end?
slide 19:
Joel: I'm a node and notice a path fails; how do I contact my peer if the path is down?
Jakov: is notification for perf mgmt or fault mgmt?
slide 20:
Jakov: why only at intermediate points? where does OAM start? at ingress point?
slide 27:
Jakov: are you talking about bidirectional failure? we also need to consider unidirectional

Forwarding Plane OAM Functionality - Bob slides

slide 6:
Nurit: I don't understand the conversion here -
Bob: C will generate inband notifications, but path is rerouted through C'
Yakov: you can have reroute event
C is going to be generating notifications; it no longer knows it is no longer on the path.
slide 7:
Bob: MPLS-TP has stimulated vendors to develop OAM for MPLS
there is work in progress, and we need consensus
Sasha: ITU is now working on SLM - synthetic loss measurement
Nurit: I think you raised some good points; I still have issues with XXXXX
I think we need to talk about the problem and architecture before discussing protocols
Bob: I think MPLS-TP still has issues to be worked out, and we need F2F communication

Cross-Layer Mechanism - Kobayashi slides

slide 7:
Nurit: what is PTP?
what is SIRENS?

IP/MPLS OAM - Eric slides

slide 10:
1731 functions or functionality?
Eric: if frames differ too much, and state machine differs too much ...
some drafts propose bringing LM/DM in, but proposals not consistent. could be simplified here.
Yakov: when you say DM are you talking about the different DMs identified in 1731? are there several different DMs that would each get separate codepoints?
Eric: I think that would be ideal so IANA would allocate only a few points by simplifying this.
if other SDOs, especially country SDOs, invent their own DMs, they might define their own standards, and require lots of different codepoints.
Stuart: I don't recall all the things in draft-frost. it wraps in almost no time, which they didn't do for Internet one. Let's try to future-proof
Eric: right now the drafts are not consistent.
Stuart: there is also an issue with timing, but 1731 is a 1588-old-version-only
loss is based around NTP, so good for Internet/NTP and for ITU-T
if we know what we want we can design one consistent function; if we have
Eric: IEEE has a tendency to send liaison to ITU; IETF says "you don't know what you're doing"
Stuart: are we designing OAM for Internet, or for Ethernet?
Yakov: are you going to make measurements consistent?
Stuart: one packet protocol for both two-way and one-way.
Yakov:
Nurit: we're getting into solution; we need to discuss operational experience etc before designing solution
Eric: operators would like consistency between ITU/IEEE and IETF solutions.
Stuart: 1731 didn't work properly in MPLS-TP. The IP OAM wasn't precise enough for a transport environment. we produced one that would work in any of these environments. What more would customers want than something that will work in a unified solution?
draft-frost has now been accepted as WG draft, so we can go forward with it.
Yakov: I see the throughput is in green; I think that might be the weakest part of 1731. reusing 1731 there is probably a bad idea.
Loa: the draft is probably an evolution from 1731.
Ron: I think we should be arguing about arch, functionality, etc. not what code can be reused.
Eric: this is not the first time this discussion has been done; IETF may have new info, but the discussion has already been done by ITU.
Stuart: we started with 1731 to work out the requirements, then we developed a solution.
Loa: with WG hat on ... people saying the IETF way is harder than using 1731 obviously aren't reading the 1731 and IETF documents. They address the same problems.

Traceroute - Jesper

slide 2:
Jakov: it is unclear what problem you are solving. there can be multiple layers under traceroute.
Jesper: IPinIP,
Ron: let's change the example to IPoverIPoverIPoverIP... it should be possible to send info all the way to the top layer, unless it's against policy at some layer.
Jakov: it has to be per segment
once someone doesn't allow it up, it doesn't get up.
Jakov: are you only interested in getting one layer down, or down into the various layers?
Jesp: end-user will see
Nurit: we need to talk about level of separation we want between layers. If it is not fixed, then we may not be able to monitor the whole network
Jakov: I had asked earlier - if this going down one layer, I would understand this; if we are going down layers, the operator may have difficulty understanding what is being returned.
slide 13:
Ron: I think there is an RFC for unnumbered interfaces, but I don't remember the RFC
Marshall: does anybody think we could change ICMP in this decade?
slide 16:
Ron: changing should to must, if you do RFCXXXX the should already becomes a must, so that becomes a non-issue
Jakov: would it be cleaner to have a flag that says please recurse? this would be a larger mod to ICMP
Jesp: response (garbled)
Marshall: suppose LISP over GRE, and each domain policy allows, how would source know which message came from which router?
Jesp: packets will get encapsulated twice
Marshall: are you saying the source would then need to parse all this? and the enduser would not know what tunnels are being used in
Jesp: the source would not know when the ... the source can determine when the trace becomes recursive, but it's not clear how this would be presented to user. It is not different than the ways things are done today. Certainly my mom wouldn't understand it. It does give additional info
Adrian: I'm worried about how we generalize this. Possibly the definition of a tunnel is at stake. An SP might have different providers to go into at the lower layer, and might need to choose, and it becomes more complex if traceroute needs to decide how to process. It might be better to be able to package the report from lower layers, so it's delivered as a package. But I'm concerned that we are going to swamp the source who cannot really do anything with the info except to contact the admin for the
Jakov: you might be able to tell which provider needs to be called, depending on the layer in which the faults occur.
Jakov: I don't think the SPs will let you see this anyway.
Jesp: if corp X wants to ... they can contact CPE ...
Jakov:
Adrian: I want to put Ron on the spot; he did an RFC on generic tunnel trace, and it addressed this problem. we didn't get a lot of traction in solution space. Ron, why did 3609 not get traction?
Ron: 3609 was published in 2003, and we've started using tunnels for more things since then. the one we were trying to solve was that tunnels were sharing fate, and we couldn't tell. Now the world may have changed, and it might be more useful to know that tunnels are fate sharing.
Stuart: tunnel for v4v6 migration or LISP, where tracing the path
in some cases you don't want to expose how the tunnels are nested. I don't know how we describe the info hiding case from the transparency case.
Ron: policy on a tunnel-by-tunnel basis.
Jesp: some tunnels
Stuart: are you going to need layer 1 traceroutes, as opposed to an in-depth traceroute?
Jesp: it can be interesting to explore the levels,
Ron: there is an implementation that exists - you do a traceroute across the top layer, and then ask if the link is a tunnel. If the answer is yes, you could set the depth bit and search through the tunnel.
Stuart: I'm concerned about the security issues.
Ron: the router sets a bit saying that this is a tunnel, and I am willing to do tunnel route for you.
Jesp: that is getting significantly more complex. The original proposal is simple. getting vendors to deploy a more complex approach will make this more difficult.
Jakov: did 5837 get deployed?
Ron: no, but it's a new RFC
Jakov: we should make a note that anytime a user any type of OAM traceroute, started by an enduser, should use ICMP to start the process.

OAM Overview - Nurit slides
Summary of OAM Functions spreadsheet

Yakov: VCCV should be called PWE3
Yakov: what is the point of this draft? it discusses a lot more OAM options than we need. Is this for MPLS-TP or is this meant to be a tutorial on OAM? If tutorial, it is missing discussion of what OAM means.
Nurit: when we started in MPLS-TP, it was an overview for the MPLS-TP team. then we found out that the OPS area WG was working on something similar. So we expanded the target audience. This is sort of a tutorial with reference to existing documents.

for the rest of the day, we will capture what will go into a document. tomorrow MPLS-TP team will use the room.
Ron: who is willing to edit the document from this meeting?
Loa: if capture of the meeting is most important, then we can follow the IAB workshop report approach.
Nurit: I think we need a document about what is in scope, motivations, and if it will be the basis for a series of documents, it will need to document what we need to work on, what we need to study, and what documents we need to produce.
Ron: agreed.
we need to define tunnels, what tunnels are in scope, and informational - what do we want to know about these tunnels. Then we need to discuss how do we get this info.
Nurit: I think this would only be a partial report because we don't have consensus on the topics and content to be included.
Yakov:
Ron: I will write the report
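
Illustrative sketch (not part of the session record): the Traceroute discussion described tracing across the top layer, asking whether a hop is a tunnel, and descending into it only when a depth setting and the tunnel's own policy allow; once a layer refuses, nothing below it is reported. The Python below is a minimal model of that idea under stated assumptions. The names Hop, Tunnel, allow_trace, max_depth and trace are all hypothetical, and this is not the mechanism of RFC 3609, RFC 5837, or any draft discussed at the meeting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Hop:
    """One hop seen by a trace at some layer (hypothetical model)."""
    name: str
    # If this hop is really a tunnel, the lower-layer path inside it.
    tunnel: Optional["Tunnel"] = None


@dataclass
class Tunnel:
    """A tunnel plus its owner's policy on exposing its interior."""
    hops: List[Hop] = field(default_factory=list)
    allow_trace: bool = False  # assumed per-tunnel policy switch


def trace(path: List[Hop], max_depth: int, depth: int = 0) -> List[str]:
    """Return indented report lines for path, recursing into a tunnel
    only while depth < max_depth and that tunnel's policy allows it."""
    report = []
    indent = "  " * depth
    for hop in path:
        if hop.tunnel is None:
            report.append(f"{indent}{hop.name}")
        elif depth < max_depth and hop.tunnel.allow_trace:
            report.append(f"{indent}{hop.name} (tunnel)")
            report.extend(trace(hop.tunnel.hops, max_depth, depth + 1))
        else:
            # Policy or depth limit hides the interior, so nothing
            # below this layer is reported either.
            report.append(f"{indent}{hop.name} (tunnel, interior hidden)")
    return report


if __name__ == "__main__":
    # Inner tunnel refuses tracing; outer tunnel permits it.
    inner = Tunnel(hops=[Hop("P1"), Hop("P2")], allow_trace=False)
    outer = Tunnel(
        hops=[Hop("PE1"), Hop("core", tunnel=inner), Hop("PE2")],
        allow_trace=True,
    )
    top = [Hop("CE1"), Hop("vpn", tunnel=outer), Hop("CE2")]
    for line in trace(top, max_depth=2):
        print(line)
```

With max_depth=0 the same call behaves like a plain top-layer trace, which corresponds to the "one layer only" case raised in the discussion; how such a nested report should be presented to an end user was left open in the session.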