Re: Mirja Kühlewind's Discuss on draft-ietf-bfd-seamless-base-09: (with DISCUSS)

"Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net> Wed, 04 May 2016 16:09 UTC

Subject: Re: Mirja Kühlewind's Discuss on draft-ietf-bfd-seamless-base-09: (with DISCUSS)
From: "Mirja Kuehlewind (IETF)" <ietf@kuehlewind.net>
In-Reply-To: <CCC628B4-D79C-40FB-89ED-79DCF35F883A@cisco.com>
Date: Wed, 04 May 2016 17:54:23 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <B64861C9-A0F4-4613-9432-7237F6B1E7D8@kuehlewind.net>
References: <20160503093512.7446.68991.idtracker@ietfa.amsl.com> <EC687254-EEDA-4EFF-B01B-757449183CED@cisco.com> <1EB9BDDB-483F-48BC-9D2F-D68E6ACA285E@kuehlewind.net> <BB0CE67E-0DA4-420B-AE8D-4F39BDE05D55@cisco.com> <ABC84681-A58E-44B9-9E32-50EB22403603@kuehlewind.net> <2129C9FD-F4C4-4C2D-B3E9-369A754163A4@cisco.com> <E8033AED-3A75-4674-A282-1DBF850EAC09@kuehlewind.net> <7E0A3ABF-1C32-43D1-BC9B-E42BDB3090AB@cisco.com> <4C6B695B-D796-464D-ACCC-74BADD577F32@kuehlewind.net> <CCC628B4-D79C-40FB-89ED-79DCF35F883A@cisco.com>
To: "Carlos Pignataro (cpignata)" <cpignata@cisco.com>
X-Mailer: Apple Mail (2.3124)
Archived-At: <http://mailarchive.ietf.org/arch/msg/rtg-bfd/ZQDgzMwDNtpL5bkuLFPYeXV_y5c>
Cc: The IESG <iesg@ietf.org>, "rtg-bfd@ietf.org" <rtg-bfd@ietf.org>, "draft-ietf-bfd-seamless-base@ietf.org" <draft-ietf-bfd-seamless-base@ietf.org>, "bfd-chairs@ietf.org" <bfd-chairs@ietf.org>

Hi Carlos,

> On 04.05.2016 at 17:44, Carlos Pignataro (cpignata) <cpignata@cisco.com> wrote:
> 
> Hi, Mirja,
> 
>> On May 4, 2016, at 11:29 AM, Mirja Kuehlewind (IETF) <ietf@kuehlewind.net> wrote:
>> 
>> Hi Carlos,
>> 
>>> On 04.05.2016 at 17:13, Carlos Pignataro (cpignata) <cpignata@cisco.com> wrote:
>>> 
>>> Hi, Mirja,
>>> 
>>>> On May 4, 2016, at 10:41 AM, Mirja Kuehlewind (IETF) <ietf@kuehlewind.net> wrote:
>>>> 
>>>> Hi Carlos,
>>>> 
>>>> below.
>>>> 
>>>>> On 04.05.2016 at 16:33, Carlos Pignataro (cpignata) <cpignata@cisco.com> wrote:
>>>>> 
>>>>> Thanks much for the response, Mirja!
>>>>> 
>>>>> I think we are converging, please see inline.
>>>>> 
>>>>>> On May 4, 2016, at 10:13 AM, Mirja Kuehlewind (IETF) <ietf@kuehlewind.net> wrote:
>>>>>> 
>>>>>> Hi Carlos,
>>>>>> 
>>>>>> see below.
>>>>>> 
>>>>>>> On 03.05.2016 at 19:24, Carlos Pignataro (cpignata) <cpignata@cisco.com> wrote:
>>>>>>> 
>>>>>>> Hi, Mirja,
>>>>>>> 
>>>>>>>> On May 3, 2016, at 12:31 PM, Mirja Kuehlewind (IETF) <ietf@kuehlewind.net> wrote:
>>>>>>>> 
>>>>>>>> Hi Carlos,
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 03.05.2016 at 15:40, Carlos Pignataro (cpignata) <cpignata@cisco.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi, Mirja,
>>>>>>>>> 
>>>>>>>>> What is an uncontrolled packet in an IP network, and what entity controls controlled ones? :-)
>>>>>>>> 
>>>>>>>> Questions over questions… :-)
>>>>>>>> 
>>>>>>>> See below...
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> More seriously, please see inline.
>>>>>>>>> 
>>>>>>>>>> On May 3, 2016, at 5:35 AM, Mirja Kuehlewind <ietf@kuehlewind.net> wrote:
>>>>>>>>>> 
>>>>>>>>>> Mirja Kühlewind has entered the following ballot position for
>>>>>>>>>> draft-ietf-bfd-seamless-base-09: Discuss
>>>>>>>>>> 
>>>>>>>>>> When responding, please keep the subject line intact and reply to all
>>>>>>>>>> email addresses included in the To and CC lines. (Feel free to cut this
>>>>>>>>>> introductory paragraph, however.)
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
>>>>>>>>>> for more information about IESG DISCUSS and COMMENT positions.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> The document, along with other ballot positions, can be found here:
>>>>>>>>>> https://datatracker.ietf.org/doc/draft-ietf-bfd-seamless-base/
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>> DISCUSS:
>>>>>>>>>> ----------------------------------------------------------------------
>>>>>>>>>> 
>>>>>>>>>> As S-BFD has no initiation process anymore, it is not guaranteed that the
>>>>>>>>>> receiver/responder actually exists. That means that packets could float
>>>>>>>>>> (uncontrolled) in the network or even outside of the administrative domain
>>>>>>>>>> (e.g. due to configuration mistakes). From my point of view this document
>>>>>>>>>> should recommend/require two things:
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> We have called out the misconfiguration — however:
>>>>>>>>> 
>>>>>>>>>> 1) A maximum number of S-BFD packets that are allowed to be sent without
>>>>>>>>>> getting a response (maybe leading to a local error report).
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> This can result in a deadlock situation if an S-BFD Reflector is enabled much later. I’m very hesitant to cap the packets sent. We can say, and I think it is useful, that an implementation MAY log an error for multiple timeouts.
>>>>>>>> 
>>>>>>>> Okay, I understand that a hard limit probably does not make sense. An error log definitely seems useful.
>>>>>>> 
>>>>>>> OK, that sounds good. See below.
>>>>>>> 
>>>>>>>> Another proposal for consideration: Currently the draft says an initiator should only send one packet per second if the target is in ADMINDOWN state. In this case the state is explicitly announced. However, if the other end just disappears or was never/not yet there, one could use an exponential backoff instead, starting with an interval smaller than one second and then increasing it exponentially. Just an idea...
>>>>>>> 
>>>>>>> Thanks for the proposal. Please keep in mind, however, that this is a protocol for detecting liveness (and lack of it), so increasing the interval exponentially defeats the purpose.
>>>>>>> 
>>>>>>> Further, exponential backoff may not be the best choice when interacting with routing protocols.
>>>>>>> 
>>>>>>> What we currently say is:
>>>>>>>  The criteria for declaring loss of
>>>>>>>  reachability and the action that would be triggered as a result
>>>>>>>  are outside the scope of this document.
>>>>>>> 
>>>>>>> As many of these are implementation choices.
>>>>>>> 
>>>>>>> But we can add at the end “, and MAY include logging an error.”
>>>>>> 
>>>>>> Please do so.
>>>>> 
>>>>> Done.
>>>>> 
>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 2) Egress filtering at the administrative border of the domain that uses
>>>>>>>>>> S-BFD to make sure that no S-BFD packets leave the domain.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> This is no different than any other application that uses UDP; a misconfigured DNS server will result in the same, and a traceroute is also not too different. This seems too onerous of a requirement. An administrative domain filters at ingress.
>>>>>>>> 
>>>>>>>> First of all, just because other protocols might have such a problem, that does not mean it’s okay.
>>>>>>> 
>>>>>>> I agree with this. I had a different point in mind though — trying to specify this on every UDP application might not be the most effective way. Perhaps there’s a UDP guideline you are uncovering.
>>>>>>> 
>>>>>>>> However, correct me if I’m wrong, but here the situation seems slightly different because there is no termination criterion at all, which means an S-BFD node would send useless data forever (… until manual change of the config).
>>>>>>>> 
>>>>>>> 
>>>>>>> But as far as this doc is concerned, let me try to make some clarifications (and a correction).
>>>>>>> 
>>>>>>> There are termination criteria — the document says:
>>>>>>> 
>>>>>>> An SBFDInitiator may be a persistent session on the initiator with a
>>>>>>> timer for S-BFD control packet transmissions (stateful
>>>>>>> SBFDInitiator).  An SBFDInitiator may also be a module, a script or a
>>>>>>> tool on the initiator that transmits one or more S-BFD control
>>>>>>> packets "when needed" (stateless SBFDInitiator).
>>>>>>> 
>>>>>>> For the case in which you have a “when needed” SBFDInitiator, the workflow is like a “ping”.
>>>>>>> 
>>>>>>> For the case in which you have a “persistent” SBFDInitiator, in theory this can run forever (for some value of ever). However, please don’t lose track of why this protocol exists. Having OAM failures and doing nothing about them defeats the purpose of having OAM. Meaning, a red alarm will blink, a horn will honk, and the config state will be changed (manually or by some support system).
>>>>>>> 
>>>>>>> In other words, I do not think this is such a unique case (insofar as running ad infinitum).
>>>>>> 
>>>>>> I still believe that the case where you have a misconfiguration and the initiator sends packets (forever) but never ever gets a reply (and has never seen a reply in the past) is a different case and can be detected and handled separately.
>>>>>> 
>>>>> 
>>>>> Again, I would not qualify this as ‘forever’, but I understand what you mean.
>>>>> 
>>>>>>> 
>>>>>>>> I still believe that egress filtering would be more appropriate here (than ingress) because the domain that is using S-BFD knows about it and therefore can set up the respective filters, and should not spam others while hoping that filters are in place.
>>>>>>>> 
>>>>>>> 
>>>>>>> To me, there is a not insignificant operational complexity in requiring the addition of filters throughout, for one particular application not leaking (where the leak does not cause anything special), and when the leak might happen because of a misconfiguration (or bug) but will be detected by the operational support systems. The ROI does not seem to add up.
>>>>>> 
>>>>>> Okay, the document should probably not require egress filtering in any case, but what about saying something like:
>>>>>> 
>>>>>> “If S-BFD is used, it SHOULD be ensured that S-BFD control packets do not propagate outside of the administrative domain that uses it.”
>>>>>> 
>>>>> 
>>>>> We can add an additional explanation of the problem before a statement, but I do not think that particular SHOULD is actionable. How’s something like:
>>>>> 
>>>>> Explain that without a handshake a persistent initiator can send blindly, and then add: “In such case, operational measures SHOULD be taken to identify if S-BFD packets are not responded to for an extended period of time, and remediate the situation.”
>>>>> 
>>>>>> This is not an uncommon thing to specify also for other protocols.
>>>>>> 
>>>>> 
>>>>> I agree. Frankly, I am happy with either statement, but I think the latter might be more operationally actionable.
>>>>> 
>>>>> Preference?
>>>> 
>>>> I would still prefer something along the lines of what I proposed. I think there could effectively be different actions to be taken here, e.g. egress filtering or measurement to detect failure, as well as no action if for some other reason it can be ensured that such a misconfiguration cannot happen (or is at least very unlikely to happen), e.g. because things are automated and there are additional checks before applying a config.
>>>> 
>>> 
>>> Perhaps I can add “for an extended period of time” to the first statement (or similar wording of your liking)?
>>> 
>>> Your main concern is the “forever”. Let’s ensure it is not “forever”. However, I’m concerned that a single packet out (like a ping to the wrong address) will be violating “it SHOULD be ensured that S-BFD control packets do not propagate outside”.
>> 
>> The concern is not “forever” but putting (unnecessary) load on other networks (by accident). So I agree, one or a few packets are not a problem. So yes, adding “for an extended period of time” is fine. We could also/instead replace the word “ensure” with something else (maybe “control”…?).
>> 
> 
> These two changes would certainly work. 
> 
> Thank you. We will post a new rev today.
> 
> [I still think that a few packets are not “(unnecessary) load” for an IP device; it’s really no different from doing a traceroute and getting an icmp.unreach port unreachable (or if it is critical and unwelcome load for a device, those devices are protected at ingress at their border).

I do agree, but you never know how people might (mis)use things in the future...

> 
> But in any case, I do think that explaining the problem you highlight helps and improves the doc, and the new text on what to do does not hurt.]

Thanks. I’ll clear my discuss now and will have a look at the new version next week!

Mirja


> 
> Thanks,
> 
> — Carlos.
> 
>> Mirja
>> 
>> 
>> 
>>> 
>>> Would that work?
>>> 
>>> Thanks,
>>> 
>>> — Carlos.
>>> 
>>>> The second SHOULD that you proposed is from my point of view actually an additional point that I would also be happy to see in the doc.
>>>> 
>>>> Mirja
>>>> 
>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> — Carlos.
>>>>> 
>>>>>> Mirja
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Does the explanation of the termination criteria help?
>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Seems to me the logging will alert someone/something to take action, and should be enough.
>>>>>>>> 
>>>>>>>> Logging plus alerts is definitely a good thing.
>>>>>>>> 
>>>>>>> 
>>>>>>> I agree.
>>>>>>> 
>>>>>>> Will add “, and MAY include logging an error.” as per above.
>>>>>>> 
>>>>>>> Do you think we should expand on this somewhere else in the document?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> — Carlos.
>>>>>>> 
>>>>>>>> Mirja
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Thoughts?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> — Carlos.
>
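
For illustration only, the outcome discussed in this thread (a persistent SBFDInitiator that is not capped, but reports when packets go unanswered for an extended period) might be sketched roughly as below. This is not text from the draft and not an S-BFD implementation: the class name TimeoutMonitor, the threshold of 3, and the log messages are all assumptions made for the example. It also separates the "never seen a reply" case, which was raised above as possibly indicating a misconfiguration.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("sbfd_sketch")  # hypothetical logger name


@dataclass
class TimeoutMonitor:
    """Tracks unanswered packets for one hypothetical persistent initiator."""
    threshold: int = 3          # assumed value; the draft specifies no number
    unanswered: int = 0         # consecutive packets without a response
    ever_replied: bool = False  # has any response been seen at all?

    def on_timeout(self) -> bool:
        """Record one unanswered packet; return True if an error was logged."""
        self.unanswered += 1
        # Log only at every multiple of the threshold to avoid flooding the log.
        if self.unanswered % self.threshold != 0:
            return False
        if self.ever_replied:
            log.error("no response for %d consecutive packets", self.unanswered)
        else:
            log.error("no response ever seen after %d packets; "
                      "possible misconfiguration", self.unanswered)
        # Transmission is deliberately NOT stopped here: a reflector that is
        # enabled later must still be detected (no hard cap, no deadlock).
        return True

    def on_response(self) -> None:
        """Record a response and reset the consecutive-timeout counter."""
        self.ever_replied = True
        self.unanswered = 0
```

The monitor only reports; what an operator or support system does with the log entry is left open, in line with the "MAY include logging an error" wording agreed above.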