Re: [rrg] RRG to hibernation

Shane Amante <shane@castlepoint.net> Sun, 11 November 2012 01:39 UTC

Content-Type: text/plain; charset="us-ascii"
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
From: Shane Amante <shane@castlepoint.net>
In-Reply-To: <C64A3635-DE95-41F6-A70C-43597EB58CBB@tcb.net>
Date: Sat, 10 Nov 2012 18:38:57 -0700
Content-Transfer-Encoding: quoted-printable
Message-Id: <81767641-8399-466D-A9F2-F2C07D3BBE0C@castlepoint.net>
References: <20121110032942.BD27018C113@mercury.lcs.mit.edu> <4C845B01-B282-46FB-A4B8-7ADDBCC9C6E5@tcb.net> <B80A8335-49BD-4B90-A024-FA82B1E8EE5F@tony.li> <C64A3635-DE95-41F6-A70C-43597EB58CBB@tcb.net>
To: rrg@irtf.org
Subject: Re: [rrg] RRG to hibernation
Precedence: list

On Nov 10, 2012, at 10:35 AM, Danny McPherson <danny@tcb.net> wrote:
> On Nov 10, 2012, at 12:24 PM, Tony Li wrote:
[--snip--]
>> I agree that some security needs to be deployed.  I'm not convinced that it needs to be BGPSEC.  We've muddled along for many years and never found the gumption to actually deploy anything.  Must not be important to people.  I don't get it, but that's the observable behavior.  
>> 
>> In any case, this doesn't seem like a research topic.  This is pretty clearly an engineering issue.
> 
> I don't agree.  The engineering solution that SIDR is actively working (RPKI-enabled BGPSEC) is pumping out standards track RFCs like there's no tomorrow.  The USG has stated intentions of "expediting secure routing work through the Internet standard process" and "fostering adoption through government procurement vehicles".  
> 
> As an operator this scares the hell out of me, especially considering what they've designed is largely a system to control "what's routed on the Internet and by whom".  They can't seem to do anything in BGP(SEC) without introducing the equivalent of "periodic updates", and undoing all the goodness of things like update packing completely.  
> 
> Some serious thinkers working on this problem would be goodness...

Let me add that I share Danny's concerns ...

However, let me try to take a step back and share with everyone a much broader set of, potentially, architectural concerns that I'm not sure this RG considered during the last round.

BGP was originally designed for flooding of reachability information.  But reachability information is the end result /after/ operators of individual networks have applied _routing_policy_, describing "intent", based on the contractual agreements they have with the parties with whom they directly interconnect.  Assuming you agree with this premise, this presents a paradox from a security PoV.  Specifically, if a downstream network does not have visibility into its upstream network's routing policy, is it practical/feasible for the downstream network to understand the _intended_ propagation of reachability information and, ultimately, connectivity?  Furthermore, is it feasible to carry such information within the control plane itself?  Or, should the control plane be relegated to carrying [strictly] reachability information in real-time, while offboard systems carry accompanying routing policy and security information in order to assist in making "optimal" Inter-Domain routing/forwarding decisions?

A second concern is also related to the original design of BGP and what it has organically evolved into today.  Specifically, BGP is /also/ now being tasked as a generic "message bus" and service discovery mechanism.  Not to pick on anyone in particular, but the following are recent examples that come to my mind wrt this trend:
http://tools.ietf.org/html/draft-ietf-idr-ls-distribution-01
http://tools.ietf.org/html/draft-ietf-idr-operational-message-00
... and, there may be others.  Although, contrast those proposals with what should be most concerning to people in this RG, and in the IETF:
http://tools.ietf.org/html/draft-ietf-grow-ops-reqs-for-bgp-error-handling-05
In short, operators (such as myself) are _extremely_ concerned that a single erroneous update results in a complete reset of BGP sessions.  Due to the overwhelming success of BGP, it's now (and, has been for a while) a mission-critical protocol, thus such catastrophic session resets -- caused by a single malformed UPDATE -- are widely visible/impactful.  This impact is compounded by the 'cost to recover'.  Namely, due to the large and growing amount of information in the RIB (again, not just reachability, but also service-discovery and completely orthogonal information), it takes longer to exchange RIB information and, ultimately, restore services.  Is this really the best we, as an industry, can do?

While the IETF IDR WG has been looking at mechanisms for how BGP may defend against certain types of erroneous BGP UPDATEs on external BGP sessions:
http://tools.ietf.org/html/draft-ietf-idr-error-handling-02
... there does not appear to be any [straightforward] answer with respect to internal BGP sessions, given the requirement that BGP speakers internal to an AS must have a globally consistent RIB and FIB, otherwise packet forwarding loops will result.  And, in my personal operational experience it's _rarely_ the case that malformed UPDATEs are detected at the first ASBR (attached to an eBGP neighbor) in my AS, so I doubt that mechanisms such as draft-ietf-idr-error-handling-02 are an adequate solution to the problems we experience.  IOW, as an operator I desire "defense in depth" where a heterogeneous mix of vendor equipment (HW + SW), participating as interior BGP speakers, has mechanisms to detect *and* automatically recover from malformed UPDATEs received over iBGP sessions.  This is another area that I would point research colleagues toward.
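
To make the contrast above concrete, here is a toy Python sketch of the two reactions to a malformed UPDATE: the base BGP-4 behavior of resetting the session, and the "treat-as-withdraw" style of handling discussed in the IDR error-handling work.  Everything in it (the Update and PeerSession classes, the "malformed" flag, the handler names) is hypothetical and invented for illustration; it is not taken from any draft or implementation, and it assumes the prefixes in the bad UPDATE can still be parsed, which is not always true in practice.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    prefixes: list            # NLRI carried by this UPDATE (toy: plain strings)
    malformed: bool = False   # stand-in for a real attribute-parsing failure


@dataclass
class PeerSession:
    routes: set = field(default_factory=set)  # prefixes learned from this peer
    established: bool = True


def handle_with_reset(session, update):
    """Base behavior: one malformed UPDATE tears down the whole session."""
    if update.malformed:
        session.routes.clear()        # every route from this peer is lost
        session.established = False   # and must be re-learned after restart
        return
    session.routes.update(update.prefixes)


def handle_treat_as_withdraw(session, update):
    """Alternative: drop only the prefixes named in the bad UPDATE."""
    if update.malformed:
        # Assumes the NLRI itself was still parseable.
        session.routes.difference_update(update.prefixes)
        return
    session.routes.update(update.prefixes)


def demo(handler):
    """Learn three prefixes, then receive one malformed UPDATE."""
    session = PeerSession()
    handler(session, Update(["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]))
    handler(session, Update(["203.0.113.0/24"], malformed=True))
    return session
```

Calling demo(handle_with_reset) leaves the session down with no routes, while demo(handle_treat_as_withdraw) keeps the session up and loses only the one affected prefix.  The sketch says nothing about the harder iBGP case, where speakers that disagree on which UPDATE was malformed can end up with inconsistent RIBs.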

So, this raises the classic conundrum: increasing complexity and increasing RIB (and FIB) size, set against the needs of operators who are concerned about the robustness of the protocol and require that it NOT sustain any failures[1].  Something's got to give.

Ultimately, this makes me wonder whether it's no longer _just_ growth of RIB (and, FIB) size that this RG should be (primarily?) focused on.  Rather, will the requirements for:
a) operational robustness, in the face of critical messaging errors in an Inter-Domain Routing Protocol, which the IETF may be unable to address on its own;
b) designing security as a first-class principle of an Inter-Domain Routing Protocol -- either carried within or outside of control-plane reachability information; and
c) increased scalability of RIB (and, other?) information
... lead us down a path of considering that we may be approaching the end-of-the-road for BGPv4 and that we need something new?

Does anyone on this list share similar concerns wrt operational robustness, time to recovery and (then) scalability of BGPv4?

-shane

[1] It is not cool to suggest that operators should just stop asking for new features and we wouldn't have this problem.  :)
