Re: [Idr] [Technical Errata Reported] RFC4271 (3366)

t.petch <ietfc@btconnect.com> Wed, 03 October 2012 17:21 UTC

Message-ID: <000901cda18b$816c58e0$4001a8c0@gateway.2wire.net>
From: "t.petch" <ietfc@btconnect.com>
To: Shashank Tyagi <Shashank.Tyagi@microsoft.com>, stbryant@cisco.com, shares@ndzh.com, jgs@juniper.net, RFC Errata System <rfc-editor@rfc-editor.org>
References: <20120926090655.84AD3B1E002@rfc-editor.org> <000b01cda0a3$ea0c4b00$4001a8c0@gateway.2wire.net> <B57D4E80DCDFC64D81BCA4D6FB7E8A662207E8D7@SINEX14MBXC401.southpacific.corp.microsoft.com> <04d601cda0b7$d70b1720$4001a8c0@gateway.2wire.net> <B57D4E80DCDFC64D81BCA4D6FB7E8A662207EADF@SINEX14MBXC401.southpacific.corp.microsoft.com>
Date: Wed, 03 Oct 2012 18:21:14 +0100
Cc: idr <idr@ietf.org>
Subject: Re: [Idr] [Technical Errata Reported] RFC4271 (3366)

Shashank

Whether the IETF accepts or rejects a proposed Erratum is up to the AD,
who usually takes guidance from the relevant WG via its mailing list,
which in this case is idr.  Hence I am adding that list as a Cc:.

I remain confused.  You say below

 [shtyagi]
In my opinion the hold timer should not have any role unless the FSM
thinks the BGP connection is active. The hold timer is only intended to
check for activity in BGP session.

My current (a bit rusty) thinking is that the HoldTimer is set when
moving to OpenSent, OpenConfirm, Established states.  Moving to OpenSent
it is set to a large value (e.g. 4 minutes) and if it expires, we revert
to Idle.  If we lose the TCP connection, and this is something that the
BGP engine cannot reliably know - that is the nature of TCP stacks -
then we do not immediately give up but go back to Active, wait for the
ConnectRetryTimer to expire, start a TCP connection, move to Connect and
wait for success with TCP (again); we can then go back to OpenSent and
restart the HoldTimer.

But we can also stay in Active/Connect through one or more expiries of
the ConnectRetryTimer and DelayOpenTimer, and on TCP failure in Connect
go back to Active.

What I think that you are proposing is that this can go on for ever.
What the current specification says is that if you have been going round
this loop for a large amount of time without success, then it is time to
give up.
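
As a rough illustration of that loop (a sketch only, not text from
RFC 4271), the Python below steps through timer expiries after a
TcpConnectionFails event in OpenSent, assuming the peer never accepts a
TCP connection.  The function name `simulate`, its parameters, the 600 s
horizon and the tie-break rule are assumptions made for this example;
the 90 s and 4 minute values are the ones quoted in this thread.

```python
def simulate(hold_remaining, connect_retry=90, horizon=600,
             clear_hold_on_event18=False):
    """Return (state, time_s) reached after Event 18 occurs in OpenSent at t=0.

    Assumption: the peer never accepts TCP, so only timer expiries happen.
    clear_hold_on_event18=True models the proposed erratum (HoldTimer set
    to 0, i.e. not running); False models the HoldTimer left running.
    """
    state = "Active"                      # Event 18 in OpenSent -> Active
    hold_expiry = None if clear_hold_on_event18 else hold_remaining
    retry_expiry = connect_retry          # ConnectRetryTimer restarted at t=0

    while True:
        # Pick whichever running timer fires next.
        pending = [retry_expiry]
        if hold_expiry is not None:
            pending.append(hold_expiry)
        now = min(pending)

        if now > horizon:
            return state, horizon         # still looping when we stop watching

        # Assumed tie-break: if both fire together, HoldTimer is handled first.
        if hold_expiry is not None and now == hold_expiry:
            return "Idle", now            # HoldTimer expiry -> Idle

        # ConnectRetryTimer expired: restart it and (re)try the TCP connection.
        state = "Connect"
        retry_expiry = now + connect_retry


print(simulate(240))                              # HoldTimer left running
print(simulate(240, clear_hold_on_event18=True))  # proposed erratum
```

With the HoldTimer left running at 240 s, the sketch reaches Idle at
240 s after two ConnectRetryTimer expiries; with the HoldTimer cleared
it is still in Connect when the horizon is reached.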

I think the current specification has it right.

As you gather, my first reply misread the FSM, thinking that a KEEPALIVE
had been sent when it had not, but that was also irrelevant; the idea of
the HoldTimer being there to clean things up has not changed.

Tom Petch

----- Original Message -----
From: "Shashank Tyagi" <Shashank.Tyagi@microsoft.com>
To: "t.petch" <ietfc@btconnect.com>; <stbryant@cisco.com>;
<shares@ndzh.com>; <jgs@juniper.net>; "RFC Errata System"
<rfc-editor@rfc-editor.org>
Sent: Tuesday, October 02, 2012 5:36 PM
-----Original Message-----
From: t.petch [mailto:ietfc@btconnect.com]
Sent: Tuesday, October 02, 2012 9:35 PM
To: Shashank Tyagi; stbryant@cisco.com; shares@ndzh.com;
jgs@juniper.net; RFC Errata System
Cc: idr@ietf.org
----- Original Message -----
From: "Shashank Tyagi" <Shashank.Tyagi@microsoft.com>
Sent: Tuesday, October 02, 2012 2:59 PM

Hi Tom,
Thanks for your reply. Yes, it refers to the OpenSent state; sorry, I
forgot to mention that. A few comments:
- In OpenSent the KEEPALIVE has not been sent, only the OPEN message has
been sent. So there would be no receipt of a KEEPALIVE.

<tp>
Right, you are confusing me:-(

Your proposed change was to set the HoldTimer to 0 when transitioning
from OpenSent state to Active which made me think that a KEEPALIVE had
been sent and the HoldTimer could be running in OpenSent and I thought I
had found such a case in the FSM this morning.  Looking again, I cannot
find it so assume that no KEEPALIVE has been sent (I was probably
reading OpenConfirm).

[shtyagi] Probably OpenConfirm :)

What then is your suggested change for the receipt of event 18 in
OpenSent?
To stop the HoldTimer if it is running?
To ensure that the next use of the HoldTimer has a zero value and so is
not started?
Or what?

[shtyagi]
Yes, to set the value to 0. Whenever the TCP connection is established
again (in either ACTIVE state or CONNECT state) and the OPEN message is
sent out, the HoldTimer will get activated again.

The FSM is intended to be robust in the event of all possibilities,
including implementation errors, with the default being, when something
completely impossible happens, go back to Idle state.

In fact, there are an awful lot of corner cases arising out of such
things as connection collisions, data being buffered in the TCP stack
when BGP thinks there is no connection, the TCP connection dropping and
either no one noticing or it restarting, etc., so even a perfect
implementation can do some strange things.  If, thus, the HoldTimer
expires in Active state, the only thing to do IMHO is to revert to Idle
(and call your software supplier if you are sure it cannot happen:-) -
as I said before, the expectation is that the ConnectRetryTimer will
expire first taking it back to OpenSent.

 [shtyagi]
In my opinion the hold timer should not have any role unless the FSM
thinks the BGP connection is active. The hold timer is only intended to
check for activity in BGP session.
All the corner cases that you have mentioned above are already being
handled in the FSM.
BGP packets still queued after a TCP failure, if received in ACTIVE
state, will be discarded with the state changed to IDLE.
I am suggesting that we turn off the HoldTimer only after we decide
that TCP has failed and we are closing the BGP connection.

There is an additional problem if the value is not reset. Even
assuming the ConnectRetryTimer is 90s and the HoldTimer is 4 mins, when
the ConnectRetryTimer expires in ACTIVE state we don't reset the
HoldTimer and we move to CONNECT state.
Ideally, if the peer is not accepting TCP connections, we should remain
stuck in CONNECT state with the ConnectRetryTimer restarting.
But on the above path, with the HoldTimer still holding some value, we
would change state back to IDLE after 1-2 tries, which should not be
the case.
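
The "1-2 tries" figure can be checked with simple arithmetic.  The
helper below is a hypothetical sketch (its name and the whole-second
granularity are assumptions for illustration, not anything defined in
RFC 4271): it counts how many ConnectRetryTimer expiries fit strictly
before a still-running HoldTimer fires.

```python
def retries_before_hold_expiry(hold_remaining_s, connect_retry_s=90):
    """Count ConnectRetryTimer expiries strictly before the HoldTimer fires.

    Assumes whole seconds and that both timers start counting at the
    moment of the TCP failure in OpenSent.
    """
    if hold_remaining_s <= 0:
        return 0
    # Expiries happen at connect_retry_s, 2*connect_retry_s, ...; keep only
    # those earlier than hold_remaining_s.
    return (hold_remaining_s - 1) // connect_retry_s


# Full 4 minutes left on the HoldTimer: expiries at 90 s and 180 s.
print(retries_before_hold_expiry(240))
# Only 100 s left on the HoldTimer: a single expiry at 90 s.
print(retries_before_hold_expiry(100))
```

So with a 90 s ConnectRetryTimer, a HoldTimer that has between 91 s and
240 s left allows one or two retries before it expires.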


Thanks,
Shashank


Tom Petch

- Also, when moving from OPENSENT to ACTIVE in case of TCP failure, the
RFC says to close the BGP connection. Why should the HoldTimer be
running without a BGP connection?

Thanks,
Shashank

-----Original Message-----
From: t.petch [mailto:ietfc@btconnect.com]
Sent: Tuesday, October 02, 2012 7:12 PM
To: yakov@juniper.net; tony.li@tony.li; skh@nexthop.com;
stbryant@cisco.com; adrian@olddog.co.uk; shares@ndzh.com;
jgs@juniper.net; RFC Errata System
Cc: Shashank Tyagi; rfc-editor@rfc-editor.org; idr@ietf.org
Subject: Re: [Idr] [Technical Errata Reported] RFC4271 (3366)

From the Original Text, I infer that this refers to OpenSent, when event
18 occurs.

The Notes say

"HoldTimer should only be used to control time in between BGP packets. "

This is true but in OpenSent state (p.63), a KEEPALIVE has been sent (as
well as an OPEN) so the timer should be running.

"Also in this case it can lead to case in ACTIVE state where HoldTimer
expires before the ConnectRetryTimer leading to IDLE state."

Well, it will lead to Active state (p.59) - that is what the text says -
and event 10, HoldTimer expires, will then take you to Idle state
(p.63).

The expectation is that the HoldTimer has been set to a large value, 4
minutes, which is much greater than the ConnectRetryTimer (90s) and it
is the expiry of the latter that should take you out of Active state to
Connect and thence to OpenSent when the HoldTimer will be restarted.
Note that the receipt of a KEEPALIVE (event 26) also takes you from
Active to Idle - I think that the two cases are similar, that the BGP
connection did not reach second base.  You are in trouble either way and
back to Idle is the safe thing to do.

Of course, if you ignore that advice to set the HoldTimer to a large
value - you have yet to receive an OPEN and so do not know what value
will be acceptable to the peer - and set it instead to 3s, then yes, you
will get back to Idle more often than is desirable.

Nowadays, there is a greater wish to keep BGP connections up, compared
to when we wrote RFC4271, but I think that, in this case, there is
nothing worth keeping up and going back to Idle is the right thing to
do.

(I am not sure what the best distribution for this is so I have used
'Reply All' but expect that that should be trimmed soon).

Tom Petch


----- Original Message -----
From: "RFC Errata System" <rfc-editor@rfc-editor.org>
To: <yakov@juniper.net>; <tony.li@tony.li>; <skh@nexthop.com>;
<stbryant@cisco.com>; <adrian@olddog.co.uk>; <shares@ndzh.com>;
<jgs@juniper.net>
Cc: <shtyagi@microsoft.com>; <rfc-editor@rfc-editor.org>; <idr@ietf.org>
Sent: Wednesday, September 26, 2012 10:06 AM
Subject: [Idr] [Technical Errata Reported] RFC4271 (3366)


>
> The following errata report has been submitted for RFC4271, "A Border
> Gateway Protocol 4 (BGP-4)".
>
> --------------------------------------
> You may review the report below and at:
> http://www.rfc-editor.org/errata_search.php?rfc=4271&eid=3366
>
> --------------------------------------
> Type: Technical
> Reported by: Shashank Tyagi <shtyagi@microsoft.com>
>
> Section: 8.2.2
>
> Original Text
> -------------
> If a TcpConnectionFails event (Event 18) is received, the local
>       system:
>
>         - closes the BGP connection,
>
>         - restarts the ConnectRetryTimer,
>
>         - continues to listen for a connection that may be initiated
>           by the remote BGP peer, and
>
>         - changes its state to Active.
>
> Corrected Text
> --------------
> If a TcpConnectionFails event (Event 18) is received, the local
>       system:
>
>         - closes the BGP connection,
>
>         - sets the HoldTimer to 0,
>
>         - restarts the ConnectRetryTimer,
>
>         - continues to listen for a connection that may be initiated
>           by the remote BGP peer, and
>
>         - changes its state to Active.
>
> Notes
> -----
> HoldTimer should only be used to control time in between BGP packets.
> Also in this case it can lead to case in ACTIVE state where HoldTimer
> expires before the ConnectRetryTimer leading to IDLE state.
>
> Instructions:
> -------------
> This errata is currently posted as "Reported". If necessary, please
> use "Reply All" to discuss whether it should be verified or rejected.
> When a decision is reached, the verifying party (IESG) can log in to
> change the status and edit the report, if necessary.
>
> --------------------------------------
> RFC4271 (draft-ietf-idr-bgp4-26)
> --------------------------------------
> Title               : A Border Gateway Protocol 4 (BGP-4)
> Publication Date    : January 2006
> Author(s)           : Y. Rekhter, Ed., T. Li, Ed., S. Hares, Ed.
> Category            : DRAFT STANDARD
> Source              : Inter-Domain Routing
> Area                : Routing
> Stream              : IETF
> Verifying Party     : IESG