Re: [GROW] ISC Response to draft-ietf-grow-unique-origin-as

Danny McPherson <danny@tcb.net> Thu, 29 September 2011 01:33 UTC

To: Leo Bicknell <bicknell@isc.org>
Cc: grow@ietf.org

Regarding your "industry" and "uniformed" comment: I'm certain many 
folks who commented on, reviewed, and were involved in this effort have
as much operational experience as you and I.  This issue was conceived 
from an operational problem observed with anycasted DNS operations by 
at least one operator with which I am intimately familiar, in incidents 
that also impacted the operations of services with which you are likely 
intimately familiar.  I suggest you find another card to play.

There was an operational trigger, related to anycasted prefixes leaked 
outside of their intended catchment, that led to the development of 
this technique.  Obviously, the desire would be that any globally 
anycasted service resolve services universally, which was presumably 
the case at least at the origin servers.  But when that is not what 
the service's consumers observe, techniques to more rapidly identify 
and classify an operational issue triggered by the routing system 
have real utility.

> There appears to be two kinds of identification at work here.  There
> is identifying the Anycast node that is serving a particular customer,
> and then there is identifying if the route should be originated from
> a particular node.  It is my feeling that more discussion was given
> to the second part in the previous reviews of this draft, while my
> response focused almost entirely on the first part.

No, I think the intention and discussion of this document was
precisely to that effect, and particularly to the case where 
anycasting deviates from traditional models of unicast services 
in that an autonomous system MAY be non-contiguous and MAY have 
multiple non-intuitive "autonomous" operations from many 
locations with a single origin AUTONOMOUS system number.  Providing 
a discriminator in the form of unique origin ASes at the routing 
layer to illustrate the variances is the objective here.

> With respect to detecting hijacked routes I think this proposal is
> totally backwards.  There have been efforts in the RIR community
> to track the origin ASN of a prefix
> (https://www.arin.net/policy/proposals/2006_3.html), which only
> allow a single origin ASN to be specified.  

False.

S 3.5.1: "This additional field will be used to record a list of the 
ASes that the user permits to originate address prefixes within the 
address block."

"list" == plural.

All additional discussion in the sub-bullets related to this item 
is clearly plural as well, as are all the RPKI-esque capabilities
(e.g., multiple ROAs for a given prefix) related to this topic to 
date.  
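
To make the "list of permitted origins" point concrete, here is a
minimal sketch, not taken from the ARIN proposal or any RPKI
implementation.  The table name, the function name, and the values
(documentation prefix 192.0.2.0/24, documentation ASNs 64496-64500)
are all assumptions chosen for illustration.

```python
# Hypothetical authorization table: prefix -> set of ASes permitted to
# originate it. All values are documentation-range placeholders.
AUTHORIZED_ORIGINS = {
    "192.0.2.0/24": {64496, 64497, 64498},
}


def origin_is_authorized(prefix, as_path):
    """Return True if the origin AS (last AS in the path) is permitted."""
    if not as_path:
        return False
    origin = as_path[-1]
    return origin in AUTHORIZED_ORIGINS.get(prefix, set())
```

With a set per prefix, several per-site origins can all be valid at
once; an origin outside the set is the only thing flagged.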

> Research has been done
> on inconsistent-origin ASN's with interesting results
> (http://www.routeviews.org/papers/nanog_origin.pdf).

Slide 4: The fact that 31% of the prefixes in that study, albeit 
a decade old, have inconsistent origins, nicely illustrates the 
lack of utility of inconsistent origin AS reporting.

Slide 6: Indicators of what causes "Fluctuating" in the routing 
system would be far easier to identify and attribute if all anycasted 
sources were from unique origin ASes.  I.e., a discriminator in the
routing system for anycasted prefixes would aid in origin-based routing 
system analysis of anycasted prefixes.

Slide 20: Regarding flap damping, if per-path penalty models (rather 
than per-prefix) were employed, unstable paths would not penalize the 
entire prefix (e.g., paths learned from stable nodes); one could even 
optimize for origin only in this particular example.  Of course, 
per-path implementations would improve many other issues as well.
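
The per-prefix versus per-path distinction can be sketched roughly as
follows.  This is a deliberately simplified illustration, not a flap
damping implementation: there is no penalty decay or reuse threshold,
and the class name, constants, and ASNs are assumptions made up for
the example.

```python
FLAP_PENALTY = 1000        # illustrative penalty added per flap
SUPPRESS_THRESHOLD = 2000  # illustrative suppression threshold


class FlapDamper:
    """Accumulate flap penalties keyed per prefix or per (prefix, path)."""

    def __init__(self, per_path):
        self.per_path = per_path
        self.penalty = {}

    def _key(self, prefix, as_path):
        # Per-path mode keeps a separate counter for each distinct path.
        return (prefix, tuple(as_path)) if self.per_path else prefix

    def record_flap(self, prefix, as_path):
        key = self._key(prefix, as_path)
        self.penalty[key] = self.penalty.get(key, 0) + FLAP_PENALTY

    def is_suppressed(self, prefix, as_path):
        key = self._key(prefix, as_path)
        return self.penalty.get(key, 0) >= SUPPRESS_THRESHOLD
```

If the path via one node flaps twice, the per-prefix damper suppresses
every path to the prefix, while the per-path damper leaves the path
learned from a stable node untouched.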

I see _nothing_ in this work that suggests unique origins are a bad 
idea, although I can certainly see attributes that could be further 
analyzed based on their work if unique origin ASes were used.

Can you be specific about something this technique would impair?

>  Several of
> the BGP monitoring services only allow a single origin ASN when
> monitoring a route.  The IETF's own documents recommend using a
> consistent origin ASN for troubleshooting purposes
> (http://www.ietf.org/rfc/rfc4116.txt, section 3.1.2.2).

Hence the need to discuss the technique in this document, where folks
who offer these services or build tools in house have not already 
adapted their policy and detection capabilities.  

That said, regarding S 3.1.2.2: 

  "Operational troubleshooting is facilitated by the use of a consistent
   origin AS.  This allows import policies to be based on a route's true
   origin rather than on intermediate routing details, which may change
   (e.g., as transit providers are added and dropped by the multihomed
   site)."

This could definitely be argued inversely: using unique upstream ASNs 
as a detection or policy trigger point (as you currently do) is more 
problematic from a policy perspective.  That is certainly the case in 
the routing policies and detection indicators I've attempted to 
enumerate.  

Simply put, if detection or policies can't be defined (discriminated) 
based on authorized origin and a single upstream adjacent AS alone, then 
adjacent ASes and upstreams need to be codified.  If the origins are 
consistent in all locations and adjacent ASes are unique per location 
via the same operator, then the origin, adjacent ASes, and upstream 
adjacent sets all need to be enumerated in detection or routing policies 
for the prefix.  That's the only way you can find a unique discriminator 
in the set to key off of.

> The document requires N ASN's, where N is the number of "sites"
> in the Anycast cloud.  ISC's current method with F-Root requires a
> scant N+1 ASN's, using the +1 to be the consistent origin ASN.  This
> is far easier to input into many of the tools (listing one origin
> ASN, rather than hundreds), and easier to explain to other operators
> (if F isn't originated from 3557, it's wrong).

Yes, it's easier for the operators to have a consistent origin; I have 
definitely heard that.  But when diagnosing problems in the routing 
system and detecting leaks or other anomalies at the routing layer, it
is certainly not easier when an anycasted prefix for a critical 
resource can appear with any set of upstreams anywhere in the system at
any time with no unique discriminators.

> However, I think the most interesting and relevant quote comes out of
> http://irl.cs.ucla.edu/papers/originChange.pdf:
> 
>  Only the origin itself (UCLA in our example) could easily and
>  accurately distinguish between a legitimate origin change and a prefix
>  hijack [4].

But it can't today!  Again, the very application of a common origin 
in all locations for a given prefix, and the thought that the prefix 
might well appear in some new place with the same origin and a new 
upstream AS, or escape outside of its intended catchment, illustrates 
just the problem.  I.e., if it's leaked or hijacked, how the heck would 
anyone know?

The only way you can fix this in the routing system is with both origin 
and path validation, work that is occurring in various places in the 
community.  In the interim, we propose that a unique origin be used per 
location and that the set of plausible upstream ASes for that origin be 
published; folks can then employ this as they desire for detection, 
policy, or other functions.
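
As a rough sketch of how such published data might be used for
detection (an illustration under stated assumptions, not something
specified in the draft): the table below, its name, the function name,
the result labels, and the documentation-range ASNs are all
hypothetical.

```python
# Hypothetical published data: per-site origin AS -> plausible upstream ASes.
PUBLISHED_UPSTREAMS = {
    64496: {64500, 64501},  # site A
    64497: {64502},         # site B
}


def classify(as_path):
    """Classify an observed AS path for the anycasted prefix."""
    # Collapse AS prepending so repeated ASNs count once.
    collapsed = [asn for i, asn in enumerate(as_path)
                 if i == 0 or asn != as_path[i - 1]]
    if not collapsed:
        return "empty-path"
    origin = collapsed[-1]
    if origin not in PUBLISHED_UPSTREAMS:
        return "unauthorized-origin"
    if len(collapsed) == 1:
        return "direct"  # observer peers directly with the origin
    upstream = collapsed[-2]
    if upstream not in PUBLISHED_UPSTREAMS[origin]:
        return "unexpected-upstream"
    return "ok"
```

Because each site has its own origin, the (origin, upstream) pair
identifies which site a path belongs to, and a path for site A seen
behind site B's upstream stands out immediately.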

> My comment was not about the difference between the model proposed and
> the F-Root model, both of which use an ASN per site.  Rather it was
> about the difference between both of those models, and a model which
> uses just a single origin ASN with no per-site ASN.
> 
> I believe in particular if you think about a router with BGP
> multi-path enabled either per-site ASN configuration is likely to
> lead to increased routes in the RIB.  Since this is a general
> recommendation, more folks doing this mean larger RIBs.

Again, not true.  Anycast itself may lead to more paths for a given 
prefix, but this technique has no bearing on that.

>> This isn't aimed at "end users", although ultimately, it is about adding 
>> more transparency to service consumers.  As discussed above, service level 
>> capabilities should indeed exist for "end users" and clients, I agree.  This
>> capability aims to help network operators and others that analyze routing 
>> system information to better diagnose issues with a given service and to 
>> ideally identify any anomalous or persistently misbehaving elements.
> 
> I see many of the arguments for this proposal were that it helped
> in diagnostic capabilities.  Yet in all of the GROW archives I have
> read so far, and in the RFC itself I do not see a single example
> of how this configuration leads to faster/more accurate troubleshooting
> than say the F-Root model.  

OK, re-read the draft and read the responses above; perhaps they help
clarify.

> As I documented at the start of this mail I believe many BGP jockies
> will find this quite counter-intuitive, having been taught in classes
> and in the IETF's own other BGP documents that inconsistent origin
> ASN is bad/dangerous.  If the IETF is going to advise this document,
> then things like RFC 4116 should be updated so they don't appear
> to conflict.

Indeed, hence this document!

> But really the crux of the failing to me here is the lack of supportive
> examples.  I'd love to see an example detailed where this draft makes
> some sort of troubleshooting for Joe BGP Jockey easier, most preferably
> in the draft itself.  I sure can't see how it would make anything
> easier, but perhaps an example would make me turn around on that.  If
> this really does a lot of good, examples should be easy to produce.

Again, see the example above and re-read the draft.  I hope this helps
explain the operational trigger and utility of this technique.

-danny