IETF61 RTGWG Minutes

Thanks to John Scudder for taking the notes.
Folks, please review and send your corrections to the list.

-- 
Alex
http://www.psg.com/~zinin

RTGWG IETF 61
-------------

    1.  Agenda bashing, administrivia (chairs)             [5m]  00:05
    2.  Document status (chairs)                           [5m]  00:10

RFC 3906 published (informational)

GTSM -- more comments need to be integrated, last call before 
Minneapolis.  Implementors, please inform the mailing list/authors.

Framework, loopfree, MIB well along.

uloop prevention design team constituted (names already sent to 
list).  Desire to keep membership small (already not small, so maybe 
"less big").  Goal total coverage if possible, extensible if not. 
Design team to report back by December '04.

    3.  Basic IP FRR spec update (Alia)                    [15m] 00:25

Document revved to be more of a spec and less of a survey.  Need to 
read framework too because that's where definitions section is!

To do: Multihomed prefixes, link selection, SRLG.

Need more people to read & comment.  Comments to list please.

    4.  IPFRR MIB (Alia)                                   [20m] 00:45

draft-atlas-ip-local-protect-loopfree-00.txt
Only the first of many MIBs.
Doesn't cover SRLGs (yet?)
Includes protected route table (with NH, alternate NH including alt NH type)
Includes unprotected route table (just route and why)
Global routing stats (various kinds of route counts)
Interface table.
Not covered: IGP (IPFRR enabled? local holddown time?), LDP 
(protected/unprotected FECs, alt NH info including alt label).  Other 
(small) MIBs will probably be needed for these.
Please comment on: is this grouping of MIBs appropriate?
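
A minimal sketch of the table layout described above, assuming (as 
raised in the discussion that follows) that the protected and 
unprotected tables share the IP routing table's index.  All class 
names, field names and addresses are hypothetical and are not taken 
from the draft; the index is simplified to prefix and length.

```python
from dataclasses import dataclass

# Hypothetical, simplified index shared by all three tables.
@dataclass(frozen=True)
class RouteIndex:
    prefix: str
    prefix_len: int

@dataclass
class ProtectedRoute:
    next_hop: str
    alt_next_hop: str
    alt_type: str  # illustrative values: "loop-free", "u-turn"

@dataclass
class UnprotectedRoute:
    reason: str  # why no alternate is available

a = RouteIndex("192.0.2.0", 24)
b = RouteIndex("198.51.100.0", 24)

# The normal routing table holds every route.
routing_table = {a: "10.0.0.1", b: "10.0.0.2"}
# Each route also appears in exactly one of the two IPFRR tables:
# the index is repeated, the protection details are not.
protected = {a: ProtectedRoute("10.0.0.1", "10.0.0.3", "loop-free")}
unprotected = {b: UnprotectedRoute("no alternate found")}

def protection_status(index):
    """Report how a route from the routing table is covered."""
    if index in protected:
        return "protected"
    if index in unprotected:
        return "unprotected: " + unprotected[index].reason
    return "unknown"
```

A manager would walk the routing table and look each index up in the 
two IPFRR tables, which is why a shared index is convenient.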

Alex Zinin: re protected/unprotected route tables, why use a 
different table instead of augmenting an existing table?
Alia: I don't know how to do that, my understanding is you can't 
really extend a MIB, this one is indexed the same as an IP routing 
table MIB which I think is as good as it gets.
Alex: so how do I use these tables?
Joel Halpern: please remember that a MIB is a MIB, it's just used for 
management purposes, it doesn't drive the implementation.
Alex: Are these different sets of routes, will it be recorded twice, 
once in normal routing table and once in unprotected table?
Alia: Yep.
Bill Fenner: I will sometimes admit to being MIB-literate.  This is 
the right thing to do.  Indexes are dup'd but info isn't.
Stewart Bryant: How do we report dynamic info like "repair attempted 
but failed"?  No doubt there will be other dynamic info.
Alia: Q is what level to detect, what level to report at.  Probably 
will be in IGP MIB and not this one.
Stewart:  I think this is really important for O&M, because these 
faults are transient so we must be very attentive to this issue.
Alia:  Yep.  We need to make sure that we can actually detect the 
errors we put in the MIB!
Stewart:  Need to go to ipfix?  Maybe doesn't even need to be in MIB.
Alia:  We should talk about it.
Stewart:  We'll try to write a draft up about it.
Don Fedyk:  We did consider that.  Error reason is in there but 
there's no history associated with it.  Take a look at what we have 
and see what needs to be improved on.
Stewart:  An example of what I'm talking about is we think we have a 
protection path but when we try to send a packet on it, it fails.

Is the MIB grouping sensible, are the MIBs sensible?  Please read and 
comment, or you will get what you deserve.  Right now the draft has 
u-turn alternates in it; should it include other candidate alternate 
types?

Stewart:  First MIB should include basic, where there is common 
ground, then have a different MIB for advanced.
Alia:  All I mean is that there is type defined for "u-turn" for 
alternate type, and a row in interface for "can I break u-turns".
Alex:  Maybe we should just rename u-turn to "reserved"?

Comments to list please.  Very few admit to having read it.

Alex:  We'll ask on the list about making draft a WG doc.
David Ward:  Who will do IGP MIBs?
Alia:  Are you volunteering?
David:  No.  Someone from this WG should do the work and then present 
it to the IGP WGs.
Alia:  Yep.

    5.  Micro-loop prevent DT report (Alia, Mike)          [20m] 01:05
        Discussion                                         [20m] 01:25

draft-bryant-shand-lf-conv-frmwk-00.txt
draft-zinin-microloop-analysis-00.txt
Mike presenting.

Trying to bring order to chaos, we have too many partial solutions 
right now.  Trying to explain, divide solution space into types, 
consider types, summarize.

Basic problem:  Microloops resulting from conventional IGP 
converge-as-fast-as-you-can lose traffic, undoing IPFRR goodness.

Reason for uloops:  Independent/asynchronous decisions.  Loops are 
temporary!  Duration can be much longer than IPFRR time though. 
Duration driven by relative time to update FIBs (i.e., degree of 
asynchrony).  No way to guarantee two routers will take similar 
length of time to update FIBs (from one router's PoV the network 
change may cause just a few routes to change -> fast download, from 
another PoV many routes may change -> slow download).
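
A minimal sketch of how asynchronous FIB updates produce a microloop.  
The three-router topology (A-D cost 1, A-B cost 1, B-D cost 3), the 
failure of link A-D, and the function name are all assumptions made 
for illustration; none of them come from the presentation.

```python
def forward(fibs, src, dst, max_hops=16):
    """Follow per-router next hops; return (path, looped)."""
    path, node = [src], src
    while node != dst and len(path) <= max_hops:
        node = fibs[node][dst]
        if node in path:
            # Packet came back to a router it already visited.
            return path + [node], True
        path.append(node)
    return path, False

# FIB entries toward D before and after link A-D fails.
old = {"A": {"D": "D"}, "B": {"D": "A"}}
new = {"A": {"D": "B"}, "B": {"D": "D"}}

# A has installed its new FIB, B is still downloading its own.
mixed = {"A": new["A"], "B": old["B"]}
path, looped = forward(mixed, "A", "D")
print(path, looped)
```

Here the loop lasts exactly as long as B lags behind A, which is the 
"degree of asynchrony" described above.  Once both routers run the 
new FIB the path is A, B, D and the loop is gone.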

Solution: Controlled convergence.  Inevitably makes convergence 
slower, but this is OK because IPFRR repair covers failure allowing 
leisurely convergence.  But: still want to keep traditional method as 
fallback in case of multiple failures.

Solution taxonomy:
- Controlled information flow (incremental cost change)
- Controlled distributed behavior (synchronized FIB installation, 
ordered FIB changes, path locking)

(See slides for full comparison matrix, highlights follow)
- Incremental cost change -- can take hours
- Synchronized FIB install -- seems simple, but isn't, and dependency on NTP
- Ordered SPFs -- no changes in forwarding plane.  Doesn't deal with 
SRLG (only single failure is supported).  Need to extend algo to a 
per-destination basis.  Long delays if large network diameter.  Worst 
case can be pretty long...
- Path locking -- cons: complete coverage requires additional 
forwarding mechanisms.  Pros: small delay in RIB/FIB installation.

Detailed description of the above four methods

Ordering by signalling
Alex: Is node failure a SRLG case?
Mike, Alia:  No.  Node failure can be handled by any of these techniques.

Ordering by delay
"Lollipop topology" (for example) can make delay-ordered SPF slower 
than needed (known techniques are more pessimistic than needed).
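
A rough sketch of delay-based ordering for a link failure, assuming 
that a router may update only after every router that forwards 
through it toward the failed link has updated, and that each update 
fits in a fixed time budget.  The `via` map, the function name and 
the budget value are hypothetical; this is not the algorithm from 
either draft.

```python
def update_delays(via, fib_budget_s):
    """via maps each router to the neighbor it forwards through toward
    the failed link (None for the router adjacent to the failure)."""
    children = {}
    for router, parent in via.items():
        children.setdefault(router, [])
        if parent is not None:
            children.setdefault(parent, []).append(router)

    def rank(router):
        # Height of the subtree of routers whose traffic transits `router`.
        kids = children[router]
        return 0 if not kids else 1 + max(rank(k) for k in kids)

    # Rank 0 (farthest upstream) updates first; each rank waits one budget.
    return {r: rank(r) * fib_budget_s for r in via}

# A chain R4 -> R3 -> R2 -> R1 plus a short branch R5 -> R1.
via = {"R1": None, "R2": "R1", "R3": "R2", "R4": "R3", "R5": "R1"}
delays = update_delays(via, fib_budget_s=2.0)
print(delays)
```

In this sketch R1 waits three budgets because of the long chain 
behind it, so the worst-case delay grows with the length of the 
longest chain of dependent routers.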

Can combine delay and signalling (optimization of delay-based 
version, point is that signalling doesn't need to be reliable since 
delay backs it up)

Backwards compatibility is a problem.

Alex: how much is it really a problem?  Can't you just announce the
capability in your IGP and only start using the method when all routers
support it?

Mike: yes but that means if you infect your network with one router
that doesn't support this, you've broken the scheme.
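
The exchange above can be sketched as a simple gate, assuming each 
router's support is learned from a capability announced in the IGP.  
The dictionary and function name are hypothetical.

```python
def scheme_usable(capabilities):
    """True only when every router in the area announces support."""
    return all(capabilities.values())

caps = {"R1": True, "R2": True, "R3": False}
before = scheme_usable(caps)   # one legacy router disables the scheme
caps["R3"] = True
after = scheme_usable(caps)    # usable once every router is upgraded
print(before, after)
```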

Path Locking
Three epochs -- change discovery time, use transitional paths time, 
lock to new topology time.
Potential transitional path types -- tunnels, safe neighbors, packet 
marking, u-turn
Sorting out the possibilities -- what are the criteria?  Time to be 
converged (ballpark: 10 sec), simplicity, SRLG support (or really, 
unpredicted multiple failure coverage), no additional mechanisms 
beyond IP (may hurt coverage), common additional mechanisms for this 
and other advanced methods, also work for LDP.

Tentative conclusion:
- Incremental cost change impractical
- Sync'd FIB swap -- skeptical about practicality
- Ordered SPF -- long delay, poor SRLG support -- enough to be an issue?
- Path locking -- seems most promising, many possibilities (ed: but, 
maybe it's just that the newest toy is always the shiniest?)
- Haven't thought of any new methods this morning but we haven't been 
to the bar yet
- Need more brain power on this, more discussion

Danny: Is incremental deployability a hard requirement?
Alex: Yeah, and is 100% coverage required?
Danny: Sure but is incremental really a hard requirement?
Alia: Path locking can be done incrementally.  You can't have a flag day.
Danny: Well not a flag day, but it would be OK to require all routers 
to have same version of code before solution becomes viable.
Alia: But still need to worry about turning it all on.
Andrew Lange: That's what maintenance windows are for.

Vach Kompella: Re sync'd FIB swap -- If requirements were externally 
provided (and included atomic clocks) problem would be easier.  Are 
we making the problem harder than we have to because we are inventing 
our own requirements?
George Swallow: Are you only worried about clock skew during failure?
Mike: Skew isn't the problem, problem is skew in FIB install time.
George: So clock sync is not the biggest issue here actually.
Mike: Yes although I'm nervous about inter-layer dependencies.
Stewart: Well if you can detect that NTP isn't working then you can 
just disable the loopfree thingy.

David Ward: So we've asked for a collection of requirements but have 
no place to collect them.
Alex: Actually we haven't asked for requirements.
David: Wow.
David: How do you multicast?
Mike: General thinking is that you have to get the packet to the 
other side of the failure, can't just drop it off some place and use 
the unicast/downstream approaches because of RPF, etc.
Bill: Two halves to problem, other half is you need state to know 
where downstream neighbors are for mcast.  So fast repair has to 
repair that state as well.  You can get the packet to the other end 
of the failure OR get the join state down the repair path real fast.
Stewart: We're talking about for the repair, right?  For the uloop 
convergence you have lots of time to fix up the mfib?
Everyone: Nope nope.
Bill: You're moving the tree around.  PIM needs to get access to the 
new SPF topology before the new FIB is put into use, that might work.
Alia: At a minimum we have to not break mcast/make it worse! 
Secondary question is how to protect mcast too.

Alex: So getting back to uloop prevention...
David: Design team requested requirements, how are we going to 
provide them Alex?
Alex: Oh, thought you were asking about a requirements document.
Alex: In particular SPs should try to respond to presenters' 
questions/strawman requirements.  SRLGs?  Less than full coverage? 
These are important because they will drive the selection of 
mechanism.
Danny: Where ARE we going to record the requirements?
Alex: The mailing list?
Alia: The taxonomy doc?
Alex: OK.

    6.  Update on draft-atlas-ip-local-protect-uturn (Alia)[20m] 01:45

Changes:
- Explicitly marked packet identification (well known label?).  Makes 
ID'ing potential U-turn packets easier, etc.
- Example algorithm for how to look for U-turn alternates.  (Worst 
case is 1 additional SPF per neighbor.)
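
To show why SPF runs per neighbor are the unit of cost, here is a 
minimal sketch of the simpler loop-free neighbor test: neighbor N of 
S is loop-free for destination D when 
dist(N, D) < dist(N, S) + dist(S, D).  It does not reproduce the 
draft's U-turn search.  The graph, costs and function names are 
assumptions for illustration.

```python
import heapq

def spf(graph, root):
    """Dijkstra shortest-path distances over {node: {neighbor: cost}}."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def loop_free_alternates(graph, s, dest, primary_nh):
    """Neighbors of s whose shortest path to dest does not go back via s."""
    dist_s = spf(graph, s)
    alternates = []
    for n in graph[s]:
        if n == primary_nh:
            continue
        dist_n = spf(graph, n)  # one SPF per candidate neighbor
        if dist_n[dest] < dist_n[s] + dist_s[dest]:
            alternates.append(n)
    return alternates

graph = {
    "S": {"E": 1, "N": 1, "U": 1},
    "E": {"S": 1, "D": 1},
    "N": {"S": 1, "D": 2},
    "U": {"S": 1, "D": 5},
    "D": {"E": 1, "N": 2, "U": 5},
}
print(loop_free_alternates(graph, "S", "D", primary_nh="E"))
```

In this example N passes the test, while U fails it because U's 
shortest path to D runs back through S.  A neighbor like U is the 
kind of case a U-turn alternate is meant to cover.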

Next?:
- Simplify alternate selection
- More detailed explanation considering link protection

Other suggestions?
Other comments?
