Re: [MBONED] WGLC for draft-ietf-mboned-dc-deploy

"Holland, Jake" <> Fri, 28 February 2020 21:57 UTC

From: "Holland, Jake" <>
To: Mike McBride <>, Leonard Giuliano <>
Date: Fri, 28 Feb 2020 21:57:17 +0000
Subject: Re: [MBONED] WGLC for draft-ietf-mboned-dc-deploy

Hi Mike,

Sorry for taking so long.

This draft seems borderline to me, mostly on editorial grounds.  I support moving
forward, but I'd prefer to see some tuning of the text first.

There are some excellent technical insights in here worth publishing, but as an
informational doc I think it needs to be easier to read and give advice that's
less tentative, caveat-filled, and speculative.

Overall, I think it's about 35% too wordy, and as it stands I'm not sure the
people trying to set up their data center will bother reading it deeply enough to
extract its valuable insights, whereas more straightforward prose that gets to the
point would make it useful to them.

I was trying to do a by-section walkthrough with suggestions, but it was taking me
way too long, and will maybe never be really done, and incorporated too many
judgement calls, so I'll just throw in a few examples of what I'm talking about, and
hope the authors can use them as a guide for how I'd suggest they focus some of their

Overall, I think this doc should go forward, and provides some value even as-is, but
I think it would be more than twice as useful if the text were revised with an eye toward
being concise and decisive, with a specific target audience in mind.  And so I urge the
authors to consider doing so.


1. With respect, this bit from 2.2 reads to me like 3 lines of awful word salad that
would be better said as "Overlays provide":
   "often fervent and arguably partisan debate about the relative merits
   of these overlay technologies belies the fact that, conceptually, it
   may be said that these overlays mainly simply provide"

This is one of the worst examples I saw, but the overwhelming bulk of my editorial
objections are about text that's got similarities to this.  It's gotta be tighter text;
nobody I know can read that kind of stuff for long.  Everything similar to this is the
main thing that I'd like to see changed.

I'm not giving a complete list of detailed examples in this review, but when I said
"35% too wordy overall" in the intro, I mean to suggest that it's probably possible to
say the same thing more effectively by cutting or rephrasing the least essential 35%
of the words.

For the particular snippet above, I was able to suggest about a 96% cut.  Most of the
rest of the text is much less severe, but has similar opportunities distributed
liberally throughout, IMO.

2.  Every sentence with "likely" or "future" in it seems speculative, and usually like
it's trying to justify why someone would bother reading this doc.

I suggest assuming instead that whoever got as far as trying to read this doc already
strongly suspects they want to roll out multicast in a datacenter, and wants to know
how to do it, what to watch out for, where they have to make tricky choices, and what
the important factors in those choices are.  I think they won't care whether things
looked likely when the doc was first written, and will be annoyed at having to wade
through that kind of speculation.

3. The "widely available" deployment guides and best practices in 3.4 should include
example references, IMO.  Searching for "PIM best practices" gives a bunch of "Project
Information Management" junk.

4. North/South and East/West should get a definition and maybe a reference; I don't think
these terms have a well-established usage in the RFC series yet.  Probably leaf/spine

5. The "Applications" section would be better split into subsections.  It's sort of a
wall of text that changes subjects a lot.

6. I think 4.3 is far too abstract.  Phrases like "enticing possibility" and "novel
algorithms and concepts" elide the problem being discussed to the point I don't really
know what it's talking about from reading it.

The reference to [Shabaz19] is a good step in the right direction, but I'd recommend
pulling in some of the references it contains in its "comprehensive overview of other
approaches", and describing the problems they're solving, along with the pros and cons
(especially since an ACM reference comes with a paywall), and trim most of the abstract
description of the solution space in the first 3 paragraphs.


Though my feedback is mainly about editorial issues, I'll also suggest adding one new
technical section about gotchas to watch out for.

I don't insist it be added, especially if it's all well-covered in the references for
the deployment guides and best practices mentioned in 3.4, but I thought I'd offer a
few particulars as suggestions to include in such a section.  It's likely there are
some others I haven't encountered, but below are a few of the most obnoxious that have
bitten me or that I've heard of.

I think what ties these together as nasty gotchas is that you think your network is
working fine, but then it suddenly stops and you have to debug it.  I think these are
probably the failure modes that are most important to highlight.

There may be other such failure scenarios worth listing, but these are the ones I know
of offhand:

- it's important to get redundancy in your IGMP/MLD querier setup, because snooping
relies on seeing the membership reports.  It's easy to accidentally get traffic that
works for 60 or 120 seconds after the spontaneous report from the initial join, then
stops working because nothing is sending the query that causes re-sending of the
report.  Alternatively, when the snooping info expires, traffic starts flooding
everywhere in the layer 2 LAN instead of going only to the joined groups.  Both can
cause disruptions in service.

- it's important to disable IGMPv2 everywhere if you rely on SSM, because seeing IGMPv2
messages can put the devices on a LAN into compatibility mode.  This can even happen
spontaneously if the right sequence of IGMPv3 messages is dropped, and it can be
persistent once it happens and the devices on the LAN continue sending the v2 messages.
This can result in service disruptions when using PIM-SSM or otherwise relying on SSM
for specific (S,G)s, since the older IGMP versions don't have the necessary SSM info.
(With a reference to section 7 of RFC 3376, and probably similar for MLDv2.)

- there's a failure mode from having too many joined groups to re-build the membership
state in the RPF tree before the membership expires.  This can also cause a persistent
service disruption after a single link failure with redundant paths but not a redundant
forwarding tree on an otherwise functional network, and even on a network that can
recover successfully with fewer groups joined, so it can be a nasty surprise that gets
worse with scale of multicast usage, and would have a threshold that depends on the
timers. (I raise this more tentatively because it hasn't hit me, but I've heard of it
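
As a rough illustration of how the timers in the gotchas above interact, here is a
minimal sketch.  The default values are the ones given in section 8 of RFC 3376;
real devices, and snooping switches in particular, may use different timeouts.  The
function names and the constant joins-per-second rebuild rate are hypothetical,
introduced only for this example, and the rebuild model is a deliberate
simplification.

```python
# Timer defaults follow RFC 3376 section 8; all function names are hypothetical.

def group_membership_interval(robustness=2, query_interval_s=125.0,
                              query_response_interval_s=10.0):
    """Time without a report after which a router drops a group's membership."""
    return robustness * query_interval_s + query_response_interval_s


def older_version_querier_present_timeout(robustness=2, query_interval_s=125.0,
                                          query_response_interval_s=10.0):
    """How long a host stays in compatibility mode after seeing an older query."""
    return robustness * query_interval_s + query_response_interval_s


def rebuild_fits(joined_groups, joins_per_second, expiry_s):
    """Assumed model: state is re-built at a constant rate after a link failure.

    Returns True if every group can be re-joined before membership expires.
    """
    return joined_groups / joins_per_second < expiry_s


gmi = group_membership_interval()
print(gmi)                             # 260.0 with the RFC 3376 defaults
print(rebuild_fits(10_000, 100, gmi))  # 100 s of rebuild work fits
print(rebuild_fits(50_000, 100, gmi))  # 500 s of rebuild work does not
```

The point of the last two lines is only that the threshold scales with the number of
joined groups and depends on the timers, so a network that recovers at one scale may
not recover at a larger one.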


I guess I'll leave it at that in the interest of actually sending a review out this
time (I started and got stuck on this response about 3 times, starting in October).

I hope these comments are helpful, and I do think the doc is worth publishing, though
I'd ideally like to see it become easier to read first.

Thanks and regards,

On 2/27/20, 3:13 PM, "Mike McBride" <> wrote:

    mboned crew,
    Only one response to the wglc. One more day. These types of drafts are
    what this wg are chartered to produce. Please give it a quick read and
    respond either way. If it's not useful we will drop it. But if you
    find it at all useful please respond so we can finally be done and
    move to iesg.
    On Thu, Feb 6, 2020 at 12:27 PM Leonard Giuliano
    <> wrote:
    > We would like to begin working group last call on Multicast in the Data
    > Center Overview.  This draft has been recently updated based on feedback
    > from last year's WGLC, where there was some support, but not enough
    > responses to advance the draft.  Please post whether you support/oppose
    > the advancement of the drafts as well as any comments you may have to the
    > list by Feb 28.
    > Most recent version of the draft can be found here:
    > -Chairs