[Idr] IETF73 IDR WG meeting minutes

Yakov Rekhter <yakov@juniper.net> Tue, 25 November 2008 15:14 UTC

Attached are the minutes. Please review/comment. The deadline
is Dec 9, 2008.

IETF 73 Minutes

Document status:
RFCs 5291 and 5292 published.

Since the last IETF, idr-as-representation-01 and
as-documentation-reservation-00 went from individual submissions
to IDR WG documents, to IDR WG Last Call, to IETF Last Call, and
now are in RFC editor queue.  "Unheard of productivity."

draft-ietf-idr-flow-spec-02 passed working group last call but Yakov sent
comments to author to fix. Waiting for the revised draft.

draft-jakma-mrai - request to adopt as IDR WG document.

Changes default values of MRAI
Current protocol recommends 30 seconds
Paul's draft recommends 5 seconds

Yakov talked amongst working group members and area directors. The
original question, namely what should be the protocol specified
default value of MRAI, seems like the wrong question.  This is
because BGP operates in extremely diverse environments, carries
multiple types of reachability, etc. There is no one right answer
to the question, even if you isolate the question to just one
environment like Internet. Moreover, various research papers haven't
come to consensus on this either.

The BGP protocol spec should not specify the default value of MRAI -
the protocol spec should specify that an implementation should have
a "knob" that allows MRAI to be configured, and leave it at that.

Therefore, we should not accept draft-jakma-mrai as an IDR
WG document. In view of the above it would be desirable to have a
document that would obsolete the current default MRAI, and replace
it with a new requirement to have a "knob" that allows MRAI to be
configured.
4 octets AS specific generic extended community 

Sent mail to list asking for comments about accepting the draft
as an IDR WG document, got little response.

Room poll - almost no one read it. There seems to be no interest - will 
hold off adopting.

BGP best path selection additional criteria


blackhole avoidance-00 was presented at IETF-68
It's a BGP generic issue, not MPLS specific issue.

Current draft is the result of the discussion.

Deployment scenario: Two islands connected via MPLS network.
A PE-PE data plane failure may result in blackholing of the
island-to-island traffic.
PE1 continues to advertise the reachability
(i.e. control plane not bound to data plane)

The fact that BGP is attracting the traffic is leading to blackhole.

Problem statement:

RFC4271, 9.1.2 defines bestpath selection
checks reachability of nexthop.
- This is not granular enough
The draft provides additional criteria for bestpath selection.

1. Reachability should be resolved in a particular data plane protocol.
2. Path available check for BGP nexthop MAY be performed.

1: Selection of data plane is outside of scope of document - matter of policy.
2: Check to see if the path itself is "lying".


About 5 people read the draft.
Out of people who read the draft, not enough people to accept as WG item.

Eric Rosen presenting

Advertisement of BestExternal Route in BGP

Basic idea:
PE2 selects IBGP learned route as best.
Normally PE2 withdraws its EBGP learned path.

With best external, PE2's EBGP still gets advertised.
PE2 still installs the best path for forwarding.

RR and confederation scenarios presented as well (see slides).

- Creates a total ordering of all paths based on bgp decision process.
- Best external path is the first path in the total order that is external to
  the domain, where domain is AS, Confed, etc.
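
As a rough illustration of the definition above, the following Python
sketch orders candidate paths and returns the first external one. The
Path fields and the simplified preference_key are assumptions made for
this example only; they are not taken from the draft, and the real BGP
decision process has more steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Path:
    name: str          # label used only in this example
    local_pref: int    # higher is preferred
    as_path_len: int   # shorter is preferred
    router_id: int     # lower is preferred (final tie-breaker)
    external: bool     # True if learned from outside the domain (AS, confed, ...)


def preference_key(p: Path):
    # Simplified stand-in for the BGP decision process: smaller key = better.
    return (-p.local_pref, p.as_path_len, p.router_id)


def total_order(paths: List[Path]) -> List[Path]:
    # Total ordering of all paths, best first.
    return sorted(paths, key=preference_key)


def best_external(paths: List[Path]) -> Optional[Path]:
    # First path in the total order that is external to the domain.
    for p in total_order(paths):
        if p.external:
            return p
    return None


paths = [
    Path("ibgp-via-PE1", local_pref=200, as_path_len=2, router_id=1, external=False),
    Path("ebgp-at-PE2", local_pref=100, as_path_len=2, router_id=2, external=True),
    Path("ebgp-at-PE2-alt", local_pref=100, as_path_len=3, router_id=3, external=True),
]
best = total_order(paths)[0]   # installed for forwarding
ext = best_external(paths)     # still advertised with best external
```

In this example PE2 forwards on the IBGP-learned best path while
"ebgp-at-PE2" is the path it would keep advertising.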

- Fast connectivity restoration
- If nexthop of primary route becomes unresolvable, switch to backup route
  without waiting for updates.
- Reduces inter-domain churn
- Switching to secondary route might not affect what the upstream AS sees.
- Reduces persistent IBGP oscillation.

*persistent route oscillation example*

Poll of the people at the meeting showed that several people read
the draft, and there is good support for accepting this draft as
an IDR WG document.

Authors should send mail to mailing list asking to accept this draft
as an IDR WG document.

Fast connectivity restoration using bgp add-path

Presented by Robert Raszuk

Builds on draft-walton-bgp-add-paths-06.

All potential applications of add-path removed from that draft to
make things simpler.  Individual documents for applications are
being issued.

Problem statement:
Each BGP [instance] advertises only the best path.
The same rule applies to route reflectors.
Best exit point failure triggers normal BGP control plane convergence 
that could take [too] long.
Traffic restoration time could be much less if ingress border routers 
knew about multiple exit points within the AS.

Uses draft-walton-bgp-add-paths-06
Define algorithm to select backup paths to advertise
Ingress routers may precompute backup paths and install forwarding 
for fast convergence.

1. Select best
2. Remove all paths whose orig id or nexthop == best (including best)
3. Run bestpath again on remaining paths to select backup
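
The three steps above can be sketched in Python as follows. This is a
minimal sketch, not the draft's algorithm text: the Path fields and the
simplified bestpath comparison are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Path:
    name: str            # label used only in this example
    local_pref: int      # higher is preferred
    as_path_len: int     # shorter is preferred
    originator_id: str   # originator of the path
    next_hop: str        # BGP nexthop


def bestpath(paths: List[Path]) -> Optional[Path]:
    # Simplified stand-in for the full BGP best path selection.
    if not paths:
        return None
    return min(paths, key=lambda p: (-p.local_pref, p.as_path_len, p.originator_id))


def select_best_and_backup(paths: List[Path]) -> Tuple[Optional[Path], Optional[Path]]:
    # Step 1: select best.
    best = bestpath(paths)
    if best is None:
        return None, None
    # Step 2: remove every path sharing the best path's originator id or
    # nexthop (this also removes the best path itself).
    remaining = [
        p for p in paths
        if p.originator_id != best.originator_id and p.next_hop != best.next_hop
    ]
    # Step 3: run bestpath again on what is left to select the backup.
    return best, bestpath(remaining)


paths = [
    Path("A", 100, 2, "10.0.0.1", "192.0.2.1"),
    Path("B", 100, 3, "10.0.0.2", "192.0.2.1"),  # same nexthop as A
    Path("C", 100, 4, "10.0.0.3", "192.0.2.3"),
]
best, backup = select_best_and_backup(paths)
```

Here B is preferred over C by the decision process, but it shares A's
nexthop, so C is the backup: it survives a failure of that exit point.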

How to advertise:
1. Add Path mechanism:
     Add pathid to reachability.
2. Path-id = opaque pair-wise id generated by each speaker.

Border router sending multiple paths:
see slides

"There is a clear need to make a consistent best path selection 
in the IBGP domain."

Use something like add-path to mark 2nd best, etc.

First and second best consistently calculated in the domain via the new 
ATTR_SET path attribute.

No longer do nexthopself - it will then hide the additional paths.
(Requires advertising your external interfaces into BGP.)

ATTR-SET attribute:
Optional non-transitive attribute as a set of TLVs
Each TLV contains an attribute of the path from the border router 
that is not otherwise sent as part of the update:
Attr type:
- Interior cost
- Peer BGP identifier
- IPv4 peer address
- IPv6 peer address
Length: total length of value
Value: the value encodes the attribute of the corresponding type.
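
A type/length/value layout like the one described above could be
encoded and parsed as in the Python sketch below. The numeric type
codes and the field widths (1-octet type, 2-octet length) are
hypothetical; the minutes do not give them.

```python
import ipaddress
import struct
from typing import List, Tuple

# Hypothetical type codes, chosen only for this example.
TYPE_INTERIOR_COST = 1
TYPE_PEER_BGP_ID = 2
TYPE_IPV4_PEER = 3
TYPE_IPV6_PEER = 4


def encode_tlv(attr_type: int, value: bytes) -> bytes:
    # Assumed header: 1-octet type, 2-octet length of the value, network order.
    return struct.pack("!BH", attr_type, len(value)) + value


def decode_tlvs(data: bytes) -> List[Tuple[int, bytes]]:
    # Walk the buffer and return (type, value) pairs in order.
    out = []
    offset = 0
    while offset < len(data):
        if offset + 3 > len(data):
            raise ValueError("truncated TLV header")
        attr_type, length = struct.unpack_from("!BH", data, offset)
        offset += 3
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        out.append((attr_type, data[offset:offset + length]))
        offset += length
    return out


payload = (
    encode_tlv(TYPE_INTERIOR_COST, struct.pack("!I", 20))
    + encode_tlv(TYPE_IPV4_PEER, ipaddress.IPv4Address("192.0.2.1").packed)
)
decoded = decode_tlvs(payload)
```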

Unclear if interior cost is useful.

Fast connectivity restoration:
*see slides*

Other applications:
- Churn reduction
- Load balancing
- Graceful maintenance.

add-paths is not yet a WG document.
Will ask for this one to become WG document if add-paths

Optional non-transitive means that if a speaker doesn't understand
the attribute, it will drop it. Optional non-transitive is not
sufficient to prevent propagation of an attribute across ASes.
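
For reference, RFC 4271 decides what a speaker does with an attribute
it does not recognize from the attribute flag bits. A minimal Python
sketch of that rule follows; the function name and return values are
illustrative only. Note that it covers only speakers that do not
recognize the attribute, which may be one reason the flags alone do
not confine an attribute to an AS.

```python
from typing import Optional, Tuple

# Attribute flag bits from RFC 4271, section 4.3.
FLAG_OPTIONAL = 0x80
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20


def handle_unrecognized(flags: int) -> Tuple[str, Optional[int]]:
    """Return (action, flags to send onward) for an unrecognized attribute."""
    if not flags & FLAG_OPTIONAL:
        # Unrecognized well-known attribute: UPDATE message error.
        return ("error", None)
    if flags & FLAG_TRANSITIVE:
        # Optional transitive: pass it on with the Partial bit set.
        return ("forward", flags | FLAG_PARTIAL)
    # Optional non-transitive: quietly ignore, do not pass it on.
    return ("drop", None)


result_non_transitive = handle_unrecognized(FLAG_OPTIONAL)
result_transitive = handle_unrecognized(FLAG_OPTIONAL | FLAG_TRANSITIVE)
```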

Poll of the people at the meeting showed that not enough people have 
read the draft. 

?: The trigger to switch to the backup path is IBGP?

Robert: It could be. Also BFD or other means.

MIB (Jeff Haas)

KiraKasha (spelling)(Cisco) 

Why is there no system timer?

- Jeff: The argument to the system uptime were updated. Discontinuity 
  could be put in the system timer.

- Also multiple instances of BGP follow the initial timer information.


Iljitsch van Beijnum

bgp inter-as-cost (IAC)

Problem statement:
- Traffic engineering for incoming traffic.
  - Only ASPATH length is communicated to traffic sources.
- Can manipulate AS path length but that is not granular enough.
- Need something new.


AS1 connected to two external ASes, those two talk to 4 others.
AS1 gets traffic asymmetrically.

AS1 tries to prepend.
now gets overloaded on other link.

What if we add an inter-as metric?


By default, [IAC] metric at each hop is +16.

Iljitsch did an analysis using route views data on how commonly path
length is used as the tie-breaker.

For each combination of two routes, he compared AS path lengths for
each prefix.
The analysis showed that 129k to 186k of ~266k prefixes (48 to 70%)
would tie break on length.

The draft conceptually:
- Adds a new attribute: inter-as cost (IAC)
- The metric increases by 2 to 256 per AS hop
- Default IAC is +16
- At nodes recognizing IAC, replace AS path length comparison with IAC
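
A small Python sketch of how such a metric could accumulate and be
compared is given below. It is an illustration under assumptions, not
the draft's procedure: the function names are hypothetical, and the
fallback of "AS path length times 16" for routes without IAC follows
the backward-compatibility idea mentioned in the discussion rather
than any text of the draft.

```python
from typing import Optional

DEFAULT_INCREMENT = 16
MIN_INCREMENT, MAX_INCREMENT = 2, 256


def add_hop(iac: int, increment: int = DEFAULT_INCREMENT) -> int:
    # Each AS hop raises the metric by 2 to 256 (default +16).
    if not MIN_INCREMENT <= increment <= MAX_INCREMENT:
        raise ValueError("increment must be between 2 and 256")
    return iac + increment


def effective_cost(as_path_len: int, iac: Optional[int]) -> int:
    # Nodes that recognize IAC compare it in place of AS path length.
    # Assumed fallback when no IAC is present: path length * default increment.
    return iac if iac is not None else as_path_len * DEFAULT_INCREMENT


# Two routes to AS1's prefix, both with AS path length 2.
via_a = add_hop(add_hop(0))       # default cost on both hops
via_b = add_hop(add_hop(0, 19))   # AS1 adds +3 on the link it wants used less
preferred = "A" if effective_cost(2, via_a) < effective_cost(2, via_b) else "B"
```

Both routes tie on AS path length; the 3-point difference shifts the
preference without the full extra hop that prepending would add.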

Some stuff for backward compatibility and incremental deployment is 
present in the draft and was not presented.

But, there are other ways to do this.  E.g.:
- Attribute that is initialized by source
- Remains unmodified in transit
- Comes after AS path length comparison or inserted elsewhere in 
  tie breakers

*bgp tie breaker slide*

Eric Rosen:
The problem is there is no internet-wide metric in bgp
There's a reason for it.
There's no way for a downstream to tell the upstreams what his policies 
are. It's very hard to use AS path to bias things.

Everyone on the internet is going to have different policies.
Downstreams don't want upstreams to tell them what their policies are.

You're saying two things:
- There's no mechanism
- We don't want a mechanism.
I disagree.

Everyone still gets to express their policy.  It's just finer
granularity.  There's no problem.

Eric Rosen:
Traffic engineering by source AS - the mechanism is a knob provided 
by the updates.  This may not result in the policy you want.

You're not overriding other people's policies - they can do their own.

You're providing a cost on an AS basis.  The topology you're showing
is quite simple.  If you add multiple ASes like a real topology,
then AS 1 is going to bump up the cost of everything to the other

If you add the cost, +3, between 1 and 2 and that's on a per prefix
basis why would I do this if other attributes to do the same thing?

 The other attributes are only applied locally, not transitively.

In the case where 2 is not capable of IAC - then we have the same 

This is covered in the backward compatibility model.

The important thing is to agree there's a problem.  If so, we can work 
on the details.

Tony Li:
Responding to Eric Rosen.  We have a problem today with traffic
engineering.  We have a problem with people injecting more specifics.
It's a grosser tool and inflates the routing table.  This is useful
as another tool. We need this.

Dow Street:
You made the comment that a significant percentage of these paths
were being determined by tie-breakers.  That's an indication other
ASes don't care.

 We have to be careful - we only have this view from route-views.

Lixia Zhang:
Incremental deployment is the biggest challenge.

AS3 doesn't understand the attribute.  Instead of using the attribute,
we take AS path length and multiply it by 16 in order to yield AS
we want.  There's also a multiplier.

Lixia Zhang:
You do not know how many ASes that do/do not understand the new
thing.  How can you figure out the number 56, etc?

Experimentally.  Just like we have with the tools today.

Lixia Zhang:
Trial and error?

That's the way things are today

Tony Li:
People use looking glasses to see the current state of things in 
the Internet.

Lixia Zhang:
Given time, there is some other factor that would yield the factors 
you're seeing.

It's been tried before and failed: DPA (Destination Preference Attribute).
The reason it failed is that there was no way to incrementally deploy it
even within a single AS without introducing persistent forwarding 
loops within an AS.

I looked at that and wrote some text to cover.

The problem was incremental deployment. This also means within an AS.

You're right - if you do this, your traffic will flow differently. 
I think the multiplier helps mitigate this.

It could cause persistent forwarding loops. The problem is within the AS.

As I understand the theory, as long as the metric goes up, we 
shouldn't loop.

Dow Street:
It seems like today, because we have very coarse control of path
length, there's not too much use in trying to tune this in a fine
manner. Do you have something that doesn't have so much ripple
effect within the global system?

Dimitri Papadimitriou:
Urge Iljitsch to talk more to traffic engineering groups.

That would be within the scope of the GROW wg.


Presented by Paul Francis.

-01 draft of FIB suppression with virtual aggregation and default routes.

This document was introduced at the Dublin meeting.

The problem this attacks is that ISPs are trying to extend the
life of old routers whose FIB is unable to hold the default
free zone (DFZ).  Older routers may be moved to the customer edge.
They may be able to cut the FIB in half.

You can get probably a factor of ten FIB reduction using this mechanism. 
(Rough estimate based on topology.)

This is entirely within an AS.  There's debate of whether this is
the best working group to present this mechanism as this doesn't
affect BGP on the wire.

He received a lot of feedback on -00.

Status on how Huawei has implemented this.

Main changes:
- BCP instead of RFC.
- Added edge suppression mode, edges default to core
- Removed need for new attribute - no wire protocol changes

Merge, add, split and remove procedures for virtual prefixes (VP).

Virtual aggregation uses VPs

VP are bigger than any "real" prefix.
Certain routers FIB-install routes (tunnel) to all sub-prefixes in a VP


(Using mpls tunnel)

Edge suppression mode (thanks to Robert Raszuk)
Core routers FIB-install all routes.
Edge routers FIB-install zero or more routes and default to a core.
(Routes to customers, popular prefixes, etc.)
Edge suppression mode allows all edge routers, not just customer 
edges, to have small FIBs.

Joel Halpern:
Does this work if the edge device has more than one connection into 
operator's core?

I forget - will have to take it offline.

Removal of new attribute (thanks to Daniel Ginsburg)
In order to know which prefixes must be FIB installed, routers need to know
- Full set of vps
- VPs for which they are an AP
-00 used new attribute to convey vps
-01 uses configuration.
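
With the VP sets supplied by configuration, the per-route decision
might look like the Python sketch below. The function name, the
example prefixes, and the rule that routes outside every VP are always
installed are assumptions for illustration; see the draft for the
actual procedure.

```python
import ipaddress
from typing import Iterable


def must_fib_install(prefix: str,
                     virtual_prefixes: Iterable[str],
                     ap_for: Iterable[str]) -> bool:
    """Decide whether a real (non-VP) route must be FIB-installed."""
    net = ipaddress.ip_network(prefix)
    ap_nets = {ipaddress.ip_network(v) for v in ap_for}
    for vp in map(ipaddress.ip_network, virtual_prefixes):
        if net.version == vp.version and net.subnet_of(vp):
            # Sub-prefix of a VP: install only if this router is an AP for
            # that VP; otherwise the VP route covers it and it is suppressed.
            return vp in ap_nets
    # Assumption: a route not covered by any VP has no aggregate to fall
    # back on, so it is installed.
    return True


# Configured on every router: the full set of VPs.
all_vps = ["192.0.0.0/7", "198.0.0.0/7"]
# Configured on this router: the VPs for which it is an AP.
my_vps = ["192.0.0.0/7"]

installed = [
    p for p in ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]
    if must_fib_install(p, all_vps, my_vps)
]
```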

FIB-size management sometimes requires redefinition of VPs.
This must be done without service disruption or temporarily large FIB size.
See the draft.

Implementation status:
- In VRP5 (huawei router os)
- Currently GRE (no key) tunnels
- To ASBR: routers must FIB-install routes learned from neighbor AS.
  -  Need auto-configuration  of tunnels to remove this restriction.

Huawei wants to use interdomain tunnels to reduce stretch penalty.

Don't know why Huawei wants to use GRE instead of MPLS.
They're interested in extending it across ASes.

The auto config of the tunnels is work going on in softwires.

Next steps (technical)
- Define automatic configuration of GRE keys in BGP.
- For FIB-suppression, GRE key identifies external peer.

Two possible approaches
- draft-ietf-softwire-encaps-safi (should just work)
- Extended attributes (Huawei engineers prefer this because it reuses 
  an existing mechanism)

*GRE Example*
(effectively BGP MPLS VPN for forwarding)

Next steps for BCP

Lixia Zhang:
Happy to see this design presented here.  This is where APT (in the RRG) is

John Scudder:
Regarding carrying around GRE keys.  The softwires draft contemplated
carrying around key with every route and decided to not do that
since that means if you update, you have to update everything.  Use

Rob Rennis?:
How suboptimal. Just filtering longer prefixes and using defaults will 
get you much of what you want.

Tony Li:
One of the problems of doing this by policy is the edge is where
this may not need to stop.  This means the edge router needs to
carry the full routes.  Need the routes at least in the control
plane. You can then optimize the FIB.

Should ask GROW if operators are willing to deploy this.  

Yeah, will take it to APRICOT, NANOG, etc.

Will take version 2 of this draft to GROW working group.

-00 draft of tunnel endpoints in BGP

Presented by Paul Francis.

These are inter-AS IP tunnels.  This was motivated by stretch and latency
induced by intradomain VA, but other benefits may also exist:
- load balance
- fast restoration

The idea is simple: always FIB-install tunnels, avoid extra hops in 
ASes doing VA.

Tunnel from ingress ISP to egress ISP.  ISPs that do VA in the
middle would then do shortest path through the Internet.

Inter-as ip tunnels could be implemented as extended attributes or
- would welcome feedback

The draft assumes softwire-encaps-safi
- in the softwires draft, the tunnel endpoint must be BGP nexthop
- we extended this across ASes.

Egress tunnels can advertise tunnel parameters but tied to BGP nexthop.

softwire-encaps-safi defines the tunnel encapsulation attribute
- optional transitive
- defines tunnel parameters (GRE, L2TPV3)

Our draft adds a sub-TLV which identifies the tunnel endpoint:
"endpoint address sub-TLV" This means that this tunnel can be used
to reach the NLRI in this update.

AS-PATH is not the same whether tunnel is used or not
- Origin AS is origin for both route to tunnel and route to NLRI
- NLRI containing tunnel address is in the same update

By including AS number in attribute, we detect when this is no longer true
- Could happen, for instance, as a result of upstream aggregation.
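
One way to read the check described above is sketched below in Python:
the tunnel is considered usable only while the origin AS seen in the
AS path still matches the AS number carried in the attribute. The
function name and the flat AS path representation are assumptions; the
draft may define the comparison differently. AS numbers are taken from
the documentation range.

```python
from typing import Sequence


def tunnel_usable(as_path: Sequence[int], attr_origin_as: int) -> bool:
    # The origin AS is the last AS in the path (the first one prepended).
    if not as_path:
        return False
    return as_path[-1] == attr_origin_as


# Route as originated by AS 64496, seen two AS hops away.
ok = tunnel_usable([64500, 64499, 64496], attr_origin_as=64496)

# After AS 64499 aggregates, it becomes the origin of the aggregate,
# while the attribute still names AS 64496.
after_aggregation = tunnel_usable([64500, 64499], attr_origin_as=64496)
```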

All routers in a service provider use the same tunnel endpoint address
- Anycasted across all routers (this is optional if site hosts tunnel 
- Prevents error where an upstream AS aggregates NLRI and drops one of the
  tunnel endpoints

ASes using VA should FIB-install routes to tunnel endpoints
- Makes tunneled packets.

What about load balancing?
- If upstream deaggregates, only one of the resulting routes can have 
  a working tunnel
- Other routes can be used, only without tunnel

One improvement might be to make tunnel address a CIDR block
- Upstream ASes would have to know how to deaggregate the tunnel address.


Tony Li:
Other than the aggregation trick at the end, why advertise a prefix for the
tunnel endpoint?

You need to anycast the entire endpoint across the AS.  You limit
the path choice to one tunnel.  By using a CIDR block, you can
compute multiple paths.  It's a load balance/traffic engineering

Tony Li: 
You can do that with /32's? You only can't re-aggregate.

If an AS further up the chain wants to deaggregate the CIDR block, they can.
You'll need a really big block to allow the whole internet to deaggregate.

Let us remember that about 20 years ago the idea of routing at AS 
granularity was tried in NSFNET Backbone Phase II. It turned 
out to be totally impractical, largely due to the very coarse 
granularity of traffic engineering. 

You definitely need to do a presentation in softwires, and also get
feedback from operators in GROW.
