Re: [Tsv-art] Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08

Stig Venaas <stig@venaas.com> Wed, 24 January 2018 18:33 UTC

In-Reply-To: <CE03DB3D7B45C245BCA0D243277949362FFAB5E4@MX307CL04.corp.emc.com>
References: <151675081688.15722.801207813861297527@ietfa.amsl.com> <CAHANBt+a1eoMGNJs5tKvNOtLKKBbW5CHE00ZaUGS62OP5goviw@mail.gmail.com> <CE03DB3D7B45C245BCA0D243277949362FFAB39B@MX307CL04.corp.emc.com> <622506ad-8fb6-f194-3b70-403c26f67d02@gmail.com> <CE03DB3D7B45C245BCA0D243277949362FFAB5E4@MX307CL04.corp.emc.com>
From: Stig Venaas <stig@venaas.com>
Date: Wed, 24 Jan 2018 10:33:13 -0800
Message-ID: <CAHANBt+NWXdfg3zryAU_n8a5hVZeB19c1gQA8eDT+-M8vUKoOA@mail.gmail.com>
To: "Black, David" <David.Black@dell.com>
Cc: Stewart Bryant <stewart.bryant@gmail.com>, "tsv-art@ietf.org" <tsv-art@ietf.org>, "ietf@ietf.org" <ietf@ietf.org>, "pim@ietf.org" <pim@ietf.org>, "draft-ietf-pim-source-discovery-bsr.all@ietf.org" <draft-ietf-pim-source-discovery-bsr.all@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsv-art/4Pe0a_v5T6Ix3xjj1lEp8IeMP2s>
Subject: Re: [Tsv-art] Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08

Hi

I agree keeping it simple is good, but I have some concerns about
requiring a fixed minimum time between messages, like the 10 seconds
in BSR (RFC 5059). I would prefer something like:

A router MUST NOT originate more than N packets per minute. Note that
this does not count packets that are being forwarded by the router.
This document does not mandate how this should be implemented, but
some possible ways could be having a minimal time between each packet,
counting the number of packets originated and resetting the count
every minute, or using a leaky bucket algorithm. One benefit of using
a leaky bucket algorithm is that it can handle bursts better. The
default value of N is 6. The value SHOULD be configurable. Depending
on the network one may want to use a low value allowing new
information to be propagated, but with a large number of routers and
many updates, the total number of messages might become too large and
require too much processing. The PFM mechanism can be used to
distribute many different types of information. When defining new
types, it should be considered what changes, if any, warrant sending
a triggered message.
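
To make the "count the packets originated" option concrete, here is a
minimal sketch in Python. It uses a sliding 60-second window rather than
a fixed per-minute reset or a leaky bucket, so it is only one possible
reading of the text above. The class and method names are hypothetical,
and the caller is assumed to pass in only messages the router
originates, never messages it forwards.

```python
from collections import deque


class OriginationLimiter:
    """Allow at most `limit` originated messages in any `window` seconds.

    Hypothetical helper: the names are illustrative and not taken from
    the draft. Forwarded messages must not be passed through it.
    """

    def __init__(self, limit=6, window=60.0):
        self.limit = limit          # N in the proposed text, default 6
        self.window = window        # one minute, in seconds
        self._sent = deque()        # timestamps of recent originations

    def allow(self, now):
        """Return True and record the send if origination is permitted."""
        # Forget originations that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False


limiter = OriginationLimiter()
# A burst of 8 attempts, one per second: only the first 6 are allowed.
burst = [limiter.allow(float(t)) for t in range(8)]
```

A burst of up to N messages goes out immediately, which is the behavior
a fixed 10-second gap would prevent, while the per-minute bound still
holds for any 60-second span.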

For the GSH (source announcement) TLV, I'll make it clear that a
triggered message is useful when a new source is detected, but one
should not trigger a message due to a source expiring (becoming
inactive).
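
As a rough illustration of that rule, the sketch below tracks active
(source, group) pairs and reports when a triggered message would be
warranted. The class name, method names, and the use of plain strings
for addresses are assumptions made for the example, not part of the
draft.

```python
class SourceTracker:
    """Decide when a source change warrants a triggered message.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self.active = set()  # (source, group) pairs currently active

    def source_seen(self, source, group):
        """Return True only the first time a (source, group) is detected."""
        key = (source, group)
        is_new = key not in self.active
        self.active.add(key)
        return is_new

    def source_expired(self, source, group):
        """Forget the source; expiry never triggers a message."""
        self.active.discard((source, group))
        return False


tracker = SourceTracker()
first = tracker.source_seen("192.0.2.1", "232.1.1.1")     # new source
repeat = tracker.source_seen("192.0.2.1", "232.1.1.1")    # already known
expired = tracker.source_expired("192.0.2.1", "232.1.1.1")
```

The expiry would still be reflected in the next periodic message; it
just does not cause one to be originated early.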

Thoughts?

Stig


On Wed, Jan 24, 2018 at 9:40 AM, Black, David <David.Black@dell.com> wrote:
> That works for me, Thanks, --David
>
>
>> -----Original Message-----
>> From: Stewart Bryant [mailto:stewart.bryant@gmail.com]
>> Sent: Wednesday, January 24, 2018 11:45 AM
>> To: Black, David <david.black@emc.com>; Stig Venaas <stig@venaas.com>
>> Cc: tsv-art@ietf.org; ietf@ietf.org; pim@ietf.org; draft-ietf-pim-source-discovery-bsr.all@ietf.org
>> Subject: Re: Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
>>
>> The problem with complex processing under error conditions is that that
>> is where all the software bugs hang out because they are hard to test
>> and don't show up until you have the problem they are trying to fix.
>>
>> This is a case where you want the simplest possible process like a small
>> burst followed by your 60s interval which seems unlikely to stress any
>> sensibly designed implementation on a reasonably sized network.
>>
>> - Stewart
>>
>>
>> On 24/01/2018 16:30, Black, David wrote:
>> > Hi Stig,
>> >
>> >> I agree with all you wrote and will update the document. However,
>> >> there is one slight issue with the minimum time between origination of
>> >> each message. When a new source is detected, we would like to
>> >> originate a message ASAP so that receivers can start receiving the
>> >> multicast without much delay. A 10s delay would be a rather long time
>> >> if a source was detected right after the previous message was
>> >> originated. I think some delay would be warranted though, in
>> >> particular in a case where perhaps a router starts up and a large
>> >> number of directly connected sources could be detected within a short
>> >> time frame. I think an exponential back-off could make sense here.
>> >> E.g., if it is just one new source, maybe trigger a message ASAP. If a
>> >> new source is detected right after the previous one, wait a bit
>> >> longer, which also allows for aggregation of multiple sources in one
>> >> message if several are detected later. In extreme cases one could
>> >> over time keep increasing the delay until the next update.
>> >> If sufficient we could maybe have a fixed minimum delay of 1s or not,
>> >> but that is probably too short in those extreme cases. Hence maybe an
>> >> exponential back-off.
>> > Exponential back-off sounds like a very good idea - I'd suggest adding
>> > something starting from RFC 5059's back-off functionality.
>> >
>> >> I would appreciate some further guidance on what you think is reasonable
>> >> here, and perhaps whether I can borrow something here from other
>> >> protocols/drafts. Part of the experiment here might be to find out
>> >> what minimum values, or how rapid back-off, is needed based on the
>> >> size of the network, the number of sources, the types of links etc.
>> > In addition to burst scenarios (e.g., router starts up, lots of new sources
>> > detected quickly as a result), I strongly suggest thinking about chaos
>> > scenarios where links and/or routers are coming and going so rapidly that the
>> > source population is in a constant state of flux.  If things are really bad, the
>> > best thing to do may be to shut up and hope that the chaos settles out, as
>> > not much useful will happen until it does, and sending messages about
>> > observed changes risks making things worse.  Again, exponential back-off
>> > makes sense, possibly quite aggressive, e.g., back off from 10 seconds by a
>> > small factor a few times, and if things still look bad, wait at least a minute or
>> > two with further back-off from that longer time until things stabilize.  This
>> > needs more thought on how to adjust the back-off factor, as that
>> > off-the-top-of-my-head example probably exhibits peculiar behavior in scenarios
>> > that are just on the edge of tripping the long delay - some thinking about
>> > what stability means and how to get there may help in figuring out the
>> > relative merits and applicability of backing off further vs. some kind of
>> > dramatic reset, analogous to TCP's congestion window reset on timeout.
>> >
>> > As this is intended to be an experimental RFC, I don’t think a completely
>> > worked-out solution is expected or required - a good discussion of the
>> > problems and explanation of areas that need investigation as part of the
>> > experiment ought to suffice, as suggested in the last sentence quoted above.  I
>> > would add some initial exponential back-off functionality as a starting point.
>> >
>> >> Also note that the general mechanism can be used for many types of
>> >> information. It depends on the information how urgent it is to
>> >> distribute it. Source discovery in particular is fairly urgent.
>> > And that should be discussed, perhaps in Section 3 somewhere.
>> >
>> > Thanks, --David
>> >
>> >
>> >> -----Original Message-----
>> >> From: Stig Venaas [mailto:stig@venaas.com]
>> >> Sent: Tuesday, January 23, 2018 7:44 PM
>> >> To: Black, David <david.black@emc.com>
>> >> Cc: tsv-art@ietf.org; draft-ietf-pim-source-discovery-bsr.all@ietf.org;
>> >> ietf@ietf.org; pim@ietf.org
>> >> Subject: Re: Tsvart telechat review of draft-ietf-pim-source-discovery-bsr-08
>> >>
>> >> Hi, thanks for the great comments.
>> >>
>> >> I agree with all you wrote and will update the document. However,
>> >> there is one slight issue with the minimum time between origination of
>> >> each message. When a new source is detected, we would like to
>> >> originate a message ASAP so that receivers can start receiving the
>> >> multicast without much delay. A 10s delay would be a rather long time
>> >> if a source was detected right after the previous message was
>> >> originated. I think some delay would be warranted though, in
>> >> particular in a case where perhaps a router starts up and a large
>> >> number of directly connected sources could be detected within a short
>> >> time frame. I think an exponential back-off could make sense here.
>> >> E.g., if it is just one new source, maybe trigger a message ASAP. If a
>> >> new source is detected right after the previous one, wait a bit
>> >> longer, which also allows for aggregation of multiple sources in one
>> >> message if several are detected later. In extreme cases one could
>> >> over time keep increasing the delay until the next update.
>> >> If sufficient we could maybe have a fixed minimum delay of 1s or not,
>> >> but that is probably too short in those extreme cases. Hence maybe an
>> >> exponential back-off.
>> >>
>> >> I would appreciate some further guidance on what you think is reasonable
>> >> here, and perhaps whether I can borrow something here from other
>> >> protocols/drafts. Part of the experiment here might be to find out
>> >> what minimum values, or how rapid back-off, is needed based on the
>> >> size of the network, the number of sources, the types of links etc.
>> >>
>> >> Also note that the general mechanism can be used for many types of
>> >> information. It depends on the information how urgent it is to
>> >> distribute it. Source discovery in particular is fairly urgent.
>> >>
>> >> Stig
>> >>
>> >>
>> >> On Tue, Jan 23, 2018 at 3:40 PM, David Black <david.black@dell.com> wrote:
>> >>> Reviewer: David Black
>> >>> Review result: Ready with Issues
>> >>>
>> >>> I've reviewed this document as part of TSV-ART's ongoing effort to review key
>> >>> IETF documents. These comments were written primarily for the transport area
>> >>> directors, but are copied to the document's authors for their information and
>> >>> to allow them to address any issues raised.  When done at the time of IETF Last
>> >>> Call, the authors should consider this review together with any other last-call
>> >>> comments they receive. Please always CC tsv-art@ietf.org if you reply to or
>> >>> forward this review.
>> >>>
>> >>> This draft describes an experimental PFM (PIM Flooding Mechanism) mechanism for
>> >>> flooding PIM information among multicast routers that is a generalized form of
>> >>> the RFC 5059 PIM BSR (BootStrap Router) mechanism, and applies this mechanism
>> >>> to distribution of source group mappings (PFM-SD).
>> >>>
>> >>> Early implementation experience with PFM-SD on low bandwidth radio links
>> >>> (described in Section 2) suggests that the mechanism is able to work better than
>> >>> PIM-SM without starving other traffic in the fashion that PIM-DM may.  This is
>> >>> promising and (in this reviewer's opinion) justifies experimentation at larger
>> >>> scale and in other network environments.  In general, this is a well-written
>> >>> document and the authors should be commended for including the "running code"
>> >>> implementation experience report in Section 2.
>> >>>
>> >>> Flooding mechanisms are very useful, but the time periods that govern sending
>> >>> of flooding messages are crucial to avoid excessive consumption of network
>> >>> resources.  Section 5 of RFC 5059 has a solid discussion of the time periods
>> >>> that apply to use of flooding by the BSR mechanism.   The discussion in this
>> >>> draft is somewhat weaker, raising a couple of minor issues:
>> >>>
>> >>> 1) For PFM-SD, Section 4.2 provides a reasonable discussion of time periods
>> >>> that apply, but appears to be missing a minimum time period between sending
>> >>> messages.   Section 5 of RFC 5059 recommends a default of 10 seconds for that
>> >>> minimum time period by comparison to a default PIM BSR sending interval of 60
>> >>> seconds.  That 10 second minimum default should be added to this draft, as the
>> >>> same default sending interval of 60 seconds is used.
>> >>>
>> >>> 2) For future use of PFM for other purposes, Section 3.3 provides the following
>> >>> guidance:
>> >>>
>> >>>     Each TLV definition will need to define when a triggered PFM message needs
>> >>>     to be originated, and also whether to send periodic messages, and how
>> >>>     frequent.
>> >>>
>> >>> That guidance is correct as far as it goes, but it's not particularly helpful
>> >>> to future protocol designers.   Text should be added to at least point to the
>> >>> examples in section 4.2 of this draft and/or part of Section 5 of RFC 5059 to
>> >>> suggest the sorts of values that have proven to be workable, and perhaps also
>> >>> strongly encourage (SHOULD use) a default minimum time between messages of at
>> >>> least 10 seconds.
>> >>>
>> >>> Understanding this draft requires that the reader be familiar with multicast
>> >>> and PIM, which is reasonable.  In addition, an understanding of PIM BSR is also
>> >>> required, which is perhaps somewhat less reasonable.  An example that this
>> >>> reviewer tripped over is that Section 3 of this draft states that "Like BSR,
>> >>> messages are forwarded hop by hop."  There is no further explanation or
>> >>> definition of "forwarded hop by hop," making it necessary to consult RFC 5059
>> >>> to understand that term, e.g., this has nothing to do with IPv6 hop-by-hop
>> >>> options.  A sentence or two of explanation of this hop by hop forwarding
>> >>> concept ought to be copied and adapted from RFC 5059, and it would be good to
>> >>> check for other concepts that rely on RFC 5059 for definitions.
>> >>>
>> >>>
>>
>