Re: [Idr] I-D Action: draft-ietf-idr-error-handling-03.txt

"Chris Hall" <chris.hall@highwayman.com> Sun, 09 December 2012 23:45 UTC

From: Chris Hall <chris.hall@highwayman.com>
To: idr@ietf.org
References: <20121121191321.6164.6887.idtracker@ietfa.amsl.com> <50AD2986.90705@cisco.com> <058b01cdd3b4$9f5193b0$ddf4bb10$@highwayman.com> <8ED5B0B0F5B4854A912480C1521F973A0F4940@xmb-rcd-x13.cisco.com> <94913EE5-2864-4EE2-B474-9631430B1E22@ericsson.com> <068701cdd478$2cf01cf0$86d056d0$@highwayman.com> <CAEGVVtBy-zdLz8hVajLnuAqgzfgQHrseK4r-N9=pOZGtqV7LbA@mail.gmail.com> <CAH1iCipfup-GEeJduBti_KHvX1pUZfmZLA3Zz5Y9Aw9xV3fQ9w@mail.gmail.com>
In-Reply-To: <CAH1iCipfup-GEeJduBti_KHvX1pUZfmZLA3Zz5Y9Aw9xV3fQ9w@mail.gmail.com>
Date: Sun, 09 Dec 2012 23:44:51 -0000
Organization: Highwayman
Message-ID: <07e901cdd667$31c593e0$9550bba0$@highwayman.com>

Brian Dickson wrote (on Fri 07-Dec-2012 at 23:13 +0000):
>
> I see the high-level problem Chris describes, and think maybe 
> it requires discussing possible ways of removing 
> doubt/ambiguity over successful parsing of such MP_REACH
> and MP_UNREACH attributes.

As I wrote earlier, given a new capability (by which the sender
undertakes to send the NLRI attributes first), it is possible to allow
for arbitrarily broken attributes but still reliably
"treat-as-withdraw" all NLRI in an UPDATE.

This finesses the question of why one would need to allow for
arbitrarily broken attributes, because in that scenario the degree of
broken-ness is moot.
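To illustrate, a minimal Python sketch of the receiver side. The capability itself is not specified in this message; the sketch assumes only that a sender advertising it places MP_REACH_NLRI (type 14) and MP_UNREACH_NLRI (type 15) ahead of all other attributes. The function name `leading_nlri_attrs` is hypothetical.

```python
from typing import List, Optional, Tuple

Attr = Tuple[int, int, bytes]   # (flags, type code, value)
NLRI_TYPES = (14, 15)           # MP_REACH_NLRI, MP_UNREACH_NLRI (RFC 4760)
EXTENDED_LENGTH = 0x10          # Attribute Flags: length field is 2 octets

def leading_nlri_attrs(attrs: bytes) -> Optional[List[Attr]]:
    """Collect the NLRI attributes at the front of the path attributes.

    Assumes the (hypothetical) capability guarantees they come first, so the
    scan stops at the first other attribute; whatever follows may be
    arbitrarily broken.  Returns None only if the NLRI attributes themselves
    overrun the buffer.
    """
    found, pos = [], 0
    while pos + 2 <= len(attrs) and attrs[pos + 1] in NLRI_TYPES:
        hdr = 4 if attrs[pos] & EXTENDED_LENGTH else 3
        if pos + hdr > len(attrs):
            return None
        length = int.from_bytes(attrs[pos + 2:pos + hdr], "big")
        if pos + hdr + length > len(attrs):
            return None
        found.append((attrs[pos], attrs[pos + 1], attrs[pos + hdr:pos + hdr + length]))
        pos += hdr + length
    return found

mp_unreach = bytes([0x90, 15, 0, 3, 0, 1, 1])   # AFI 1, SAFI 1, no prefixes
garbage = bytes([0xFF, 0x07, 0xFF])             # a broken trailing attribute
print(leading_nlri_attrs(mp_unreach + garbage)) # [(144, 15, b'\x00\x01\x01')]
```

The broken trailing bytes never need to be framed, which is why the degree of broken-ness stops mattering under that assumption.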

Improving error handling -- in particular, avoiding session-reset --
without requiring changes at the sender end requires a deeper
analysis.

--------------------------------

For the avoidance of doubt: the following all relates to a receiver
that cannot assume that the sender always sends the NLRI attributes as
the first attributes -- i.e., it applies to senders whose
implementations comply with the current RFCs (and about whom the
receiver has no special knowledge), and hence to senders who do not
advertise the above-mentioned new capability.

I start from the position that it is essential to identify all NLRI if
"treat-as-withdraw" is to be used.  In the general case this means
being able to identify the start of all attributes, and to be sure
that no attributes have been missed.  This rules out accepting
arbitrarily broken attributes.

For me, the first step in "parsing" attributes is to check the
"framing" (which is sort of the "lexical" level).  A set of attributes
passes the framing check if the sum of all attribute lengths is equal
to the 'Total Attributes Length'.  A set of attributes which does not
pass the framing check is, I suggest, too badly broken to allow all
NLRI to be reliably identified, and therefore cannot be handled as
"treat-as-withdraw".  This allows for all sorts of broken attributes
*except* for attributes with broken lengths.
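A minimal Python sketch of the framing check as described, assuming the receiver already holds the path attributes as a byte string whose size is the 'Total Attributes Length'. Here each attribute's length counts its 3-octet header (4 octets when the Extended Length flag is set) plus its value. The function names are illustrative only.

```python
from typing import List, Tuple

Attr = Tuple[int, int, bytes]   # (flags, type code, value)
EXTENDED_LENGTH = 0x10          # Attribute Flags: length field is 2 octets

def parse_framed(attrs: bytes) -> List[Attr]:
    """Walk the path attributes, raising ValueError unless they exactly fill
    the 'Total Attributes Length' (here: len(attrs))."""
    found, pos, total = [], 0, len(attrs)
    while pos < total:
        hdr = 4 if attrs[pos] & EXTENDED_LENGTH else 3
        if pos + hdr > total:
            raise ValueError(f"truncated attribute header at offset {pos}")
        code = attrs[pos + 1]
        length = int.from_bytes(attrs[pos + 2:pos + hdr], "big")
        if pos + hdr + length > total:
            raise ValueError(f"attribute type {code} overruns the total length")
        found.append((attrs[pos], code, attrs[pos + hdr:pos + hdr + length]))
        pos += hdr + length
    return found   # loop ended with pos == total: the attributes are framed

def frames_ok(attrs: bytes) -> bool:
    try:
        parse_framed(attrs)
        return True
    except ValueError:
        return False

good = bytes([0x40, 1, 1, 0, 0x40, 6, 0])   # ORIGIN, ATOMIC_AGGREGATE
bad = bytes([0x40, 1, 1, 0, 0x40, 6, 1])    # ATOMIC_AGGREGATE claims an absent octet
print(frames_ok(good), frames_ok(bad))      # True False
```

Note that the check only constrains where each attribute begins and ends; it says nothing about what is inside.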

Now, in the framing check, the 'Total Attributes Length' is,
effectively, a checksum -- but not a very strong one.  However, it
may be considered sufficient to allow the attributes to be accepted,
such that any further errors, apart from errors in the NLRI
attributes, can trigger "treat-as-withdraw".

However, I am yet to be convinced of the value of accepting a
well-known attribute (eg. ORIGIN) if it has invalid Flags or does not
have (one of) the known valid Length(s).  I assume that accepting such
an attribute is designed to cope with software issues at the far
end.  However, forming these attributes is very simple, well
exercised, and easily tested code.  It seems to me that this sort of
anomaly is more likely to be symptomatic of a framing issue than it is
to be a well-formed attribute which the sender has, for reasons
unknown, decided to send with unexpected Flags and/or Length.

Therefore, I think that the framing check should be augmented by some
"semantic" checks (e.g. ORIGIN must be not-Optional, Transitive and
must have Length == 1), to make up for (some of) the weakness of the
simple checksum.  This is partly because I feel the weakness of the
checksum is a potential problem -- but I confess this is not evidence
based.  It is also partly because I don't feel a strong need to allow
for such malformations in the well-known attributes anyway -- this is
also not evidence based, but I have seen no evidence to the contrary,
either.
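A Python sketch showing both the weakness and the proposed remedy. Changing the ORIGIN length octet from 1 to 4 makes ORIGIN swallow the following ATOMIC_AGGREGATE, yet the result is still exactly framed; a check of Flags and Length on the well-known attributes catches it. The table of well-known attributes and fixed lengths follows RFC 4271 (AS_PATH is variable length); which attributes such checks should cover is a judgment call, and the names here are illustrative.

```python
from typing import List, Optional, Tuple

Attr = Tuple[int, int, bytes]   # (flags, type code, value)

def parse_framed(attrs: bytes) -> Optional[List[Attr]]:
    """Return the attributes if they exactly fill len(attrs), else None."""
    found, pos = [], 0
    while pos < len(attrs):
        hdr = 4 if attrs[pos] & 0x10 else 3          # Extended Length flag
        if pos + hdr > len(attrs):
            return None
        length = int.from_bytes(attrs[pos + 2:pos + hdr], "big")
        if pos + hdr + length > len(attrs):
            return None
        found.append((attrs[pos], attrs[pos + 1], attrs[pos + hdr:pos + hdr + length]))
        pos += hdr + length
    return found

FLAG_MASK = 0xE0         # Optional | Transitive | Partial (Extended Length ignored)
WELL_KNOWN_FLAGS = 0x40  # well-known: not Optional, Transitive, not Partial
# Well-known type code -> required value length (None = variable, i.e. AS_PATH).
WELL_KNOWN_LENGTHS = {1: 1, 2: None, 3: 4, 5: 4, 6: 0}

def semantic_ok(attrs: List[Attr]) -> bool:
    """Check Flags and Length of the well-known attributes only."""
    for flags, code, value in attrs:
        if code not in WELL_KNOWN_LENGTHS:
            continue                                  # optional: not checked here
        if (flags & FLAG_MASK) != WELL_KNOWN_FLAGS:
            return False
        want = WELL_KNOWN_LENGTHS[code]
        if want is not None and len(value) != want:
            return False
    return True

good = bytes([0x40, 1, 1, 0, 0x40, 6, 0])     # ORIGIN, ATOMIC_AGGREGATE
mangled = bytes([0x40, 1, 4, 0, 0x40, 6, 0])  # ORIGIN length octet 1 -> 4
print(parse_framed(mangled))                  # a single 4-octet ORIGIN: framing passes
print(semantic_ok(parse_framed(good)), semantic_ok(parse_framed(mangled)))  # True False
```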

So, if one states the problem as:

  1) to qualify for "treat-as-withdraw" it must be possible to
     identify all NLRI attributes.

     And each NLRI attribute must be "well-formed".

     And repeated NLRI attributes are automatically cause
     for session-reset.

  2) for (1) the receiver must be able to identify all
     attributes -- because otherwise it cannot be sure of
     identifying all the NLRI ones.

  3) the test for being able to identify all attributes is
     that the attributes are properly "framed".

  4) the framing test may be augmented by various
     semantic constraints, at least for well-known
     attributes.
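Taken together, steps (1) to (4) amount to a decision procedure for each UPDATE. A minimal Python sketch, under stated assumptions: the receiver has the path attributes as bytes; "well-formed" for an NLRI attribute is reduced to a check of the fixed fields of MP_REACH_NLRI / MP_UNREACH_NLRI (RFC 4760), without parsing the prefixes; and `attr_valid` is a hypothetical caller-supplied test for every other attribute. The semantic check in (4) is limited to Flags and Length of a few well-known attributes.

```python
from typing import Callable, List, Optional, Tuple

Attr = Tuple[int, int, bytes]              # (flags, type code, value)
MP_REACH_NLRI, MP_UNREACH_NLRI = 14, 15    # RFC 4760 type codes
NLRI_TYPES = (MP_REACH_NLRI, MP_UNREACH_NLRI)
WELL_KNOWN_LENGTHS = {1: 1, 2: None, 3: 4, 5: 4, 6: 0}   # None = variable length

def parse_framed(attrs: bytes) -> Optional[List[Attr]]:
    """Step (3): return the attributes if exactly framed, else None."""
    found, pos = [], 0
    while pos < len(attrs):
        hdr = 4 if attrs[pos] & 0x10 else 3
        if pos + hdr > len(attrs):
            return None
        length = int.from_bytes(attrs[pos + 2:pos + hdr], "big")
        if pos + hdr + length > len(attrs):
            return None
        found.append((attrs[pos], attrs[pos + 1], attrs[pos + hdr:pos + hdr + length]))
        pos += hdr + length
    return found

def semantic_ok(attrs: List[Attr]) -> bool:
    """Step (4): well-known attributes need masked Flags 0x40 and a valid Length."""
    for flags, code, value in attrs:
        if code in WELL_KNOWN_LENGTHS:
            want = WELL_KNOWN_LENGTHS[code]
            if (flags & 0xE0) != 0x40 or (want is not None and len(value) != want):
                return False
    return True

def nlri_well_formed(code: int, value: bytes) -> bool:
    """Coarse stand-in for step (1): fixed fields only, prefixes not parsed."""
    if code == MP_UNREACH_NLRI:
        return len(value) >= 3                  # AFI (2) + SAFI (1)
    # MP_REACH_NLRI: AFI (2) + SAFI (1) + next-hop length (1) + next hop + reserved (1)
    return len(value) >= 5 and 4 + value[3] + 1 <= len(value)

def classify(attrs: bytes, attr_valid: Callable[[Attr], bool]) -> str:
    parsed = parse_framed(attrs)
    if parsed is None or not semantic_ok(parsed):
        return "session-reset"                  # cannot be sure of finding all NLRI
    nlri = [a for a in parsed if a[1] in NLRI_TYPES]
    codes = [code for _, code, _ in nlri]
    if len(codes) != len(set(codes)):
        return "session-reset"                  # repeated NLRI attribute
    if not all(nlri_well_formed(code, value) for _, code, value in nlri):
        return "session-reset"
    if not all(attr_valid(a) for a in parsed if a[1] not in NLRI_TYPES):
        return "treat-as-withdraw"              # every NLRI is known, so withdraw them
    return "accept"

def valid(attr: Attr) -> bool:
    return attr[1] != 200                       # hypothetical: type 200 is "nonsense"

origin = bytes([0x40, 1, 1, 0])
mp_unreach = bytes([0x90, 15, 0, 3, 0, 1, 1])   # AFI 1, SAFI 1, no prefixes
odd = bytes([0xC0, 200, 1, 0xFF])               # hypothetical Optional-Transitive attr
print(classify(origin + mp_unreach, valid))               # accept
print(classify(origin + mp_unreach + odd, valid))         # treat-as-withdraw
print(classify(origin + mp_unreach + mp_unreach, valid))  # session-reset
```

Semantic nonsense inside an Optional-Transitive attribute lands in "treat-as-withdraw", while anything that puts the framing in doubt resets the session.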

Then there is a debate to be had about each step in the argument from
(1) to (3), and the extent to which (4) should apply (if at all) and
which attributes it should apply to.

I believe this would resolve the ambiguity/contradiction in the
current draft, namely that:

  * "treat-as-withdraw" is mandated as a means to deal with
    arbitrarily broken attributes,

  * but cannot be used unless all NLRI can be identified,

  * which is not possible (in the general case) for
    arbitrarily broken attributes!

Resolution of the issue limits the degree of broken-ness which is
acceptable.  Note that semantic nonsense inside an Optional-Transitive
attribute would be handled as "treat-as-withdraw"... which for me is
the most important case.  (I'm pretty happy dropping a session with a
peer that cannot manage to construct a well-framed set of attributes,
but dropping a session because the peer has innocently passed on a
semantically invalid attribute does upset me.)

I understand that in the well-known RIPE/Duke incident, which raised
the profile of invalid Optional-Transitive attributes, the problem was
that some routers mangled things such that the framing-check above
would have failed.  Which is a shame.

Chris