Re: [pim] New Version Notification for draft-yong-rtgwg-igp-multicast-arch-00.txt

Haoweiguo <haoweiguo@huawei.com> Thu, 26 February 2015 01:32 UTC

From: Haoweiguo <haoweiguo@huawei.com>
To: Lucy yong <lucy.yong@huawei.com>, IJsbrand Wijnands <ice@cisco.com>, "pim@ietf.org" <pim@ietf.org>
Date: Thu, 26 Feb 2015 01:32:00 +0000
Archived-At: <http://mailarchive.ietf.org/arch/msg/pim/BSzm9NnaDgWjOkNRp5TChiS51Dg>

Hi Ice,

Thanks for your careful reading. Please see some further explanations below, marked [weiguo].

weiguo

________________________________
From: pim [pim-bounces@ietf.org] on behalf of Lucy yong [lucy.yong@huawei.com]
Sent: Wednesday, February 25, 2015 3:34
To: IJsbrand Wijnands; pim@ietf.org
Subject: Re: [pim] New Version Notification for draft-yong-rtgwg-igp-multicast-arch-00.txt

Hi Ice,

Sorry for the very late response on your mail (http://www.ietf.org/mail-archive/web/pim/current/msg03055.html). Please see inline below.

Dear Lucy,

I have read through the draft; it reads very well. My comments are below:
Lucy: thanks.

1.1 Motivation

Ice: Creating a ‘multicast’ fabric can be done with PIM/mLDP/RSVP-TE as well; this is not a specific benefit of adding tree building inside of the IGP. Note that the MVPN procedures (RFC6513) describe mechanisms similar to what is proposed in this draft, i.e. create an MI-PMSI (default fabric) and S-PMSIs (specific trees), and RFC6514 describes how to auto-provision those trees. It is not clear to me from the draft (or use-case) that there is a reason to avoid BGP.
Lucy: Yes, you can use another protocol to do multicast in an IGP domain. The goal here is to use a single IGP for both unicast and multicast. This simplifies the network and allows automatic configuration of the infrastructure network, which is important for DC networks.
[weiguo]: There are two reasons why an IGP is preferred over BGP.
1. The "multicast" fabric relies on the LSDB of the whole network, and BGP cannot generate an LSDB. In IGP multicast, tree calculation replaces the explicit signalling that PIM uses to construct a bidir tree.
2. Automation. Internal IGP links can be unnumbered interfaces, so no IP address configuration is required; unnumbered interfaces do not hinder the internal IGP fabric calculation.
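
As a rough illustration of point 1 (a sketch under stated assumptions, not the algorithm specified in the draft): if every router holds the same LSDB, each can compute the same tree locally, with no per-tree signalling. The function name `compute_tree`, the LSDB layout (a dict of neighbour costs), the node names, and the lowest-ID tie-break are all assumptions made for this example.

```python
import heapq

def compute_tree(lsdb, root):
    """Return {node: parent} for a shortest-path tree rooted at `root`.

    lsdb is a hypothetical link-state database: {node: {neighbor: cost}}.
    Ties between equal-cost parents are broken by the lowest parent ID,
    so every router running this on the same LSDB gets the same tree.
    """
    parent = {}
    heap = [(0, root, "")]          # (distance, node, node reached via)
    while heap:
        dist, node, via = heapq.heappop(heap)
        if node in parent:          # already placed on the tree
            continue
        parent[node] = via or None  # the root has no parent
        for nbr, cost in lsdb[node].items():
            if nbr not in parent:
                heapq.heappush(heap, (dist + cost, nbr, node))
    return parent

# Hypothetical two-spine, two-leaf fabric with unit link costs.
lsdb = {
    "spine1": {"leaf1": 1, "leaf2": 1},
    "spine2": {"leaf1": 1, "leaf2": 1},
    "leaf1": {"spine1": 1, "spine2": 1},
    "leaf2": {"spine1": 1, "spine2": 1},
}
tree = compute_tree(lsdb, "spine1")
```

Here spine2 is reachable at equal cost via leaf1 and leaf2; the tie-break picks leaf1 on every router, which is what makes a purely computed tree consistent across the domain.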

Ice: Longer convergence times of PIM compared to unicast are due to two factors: the delay of signalling from RIB to PIM AND the amount of state (trees) that needs to be updated in the PIM database. The first delay is relatively small, say on the order of 100ms; the cost of updating a high number of PIM/mLDP trees is higher and increases with the amount of state.
Lucy: this is not always true. PIM convergence time is on top of the IGP convergence time.
[weiguo]: Multicast and unicast use the same protocol, so there is no extra convergence time beyond that of unicast. As for state, this is an enhanced stateful solution compared with PIM-like solutions, and it is different from BIER. In some cases, if receivers for multiple group addresses attach to the same edge devices, those group addresses can share the same distribution tree (tree aggregation), which can effectively reduce the amount of state.
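
To make the tree aggregation point concrete, here is a minimal sketch. It assumes, purely for illustration, that groups are aggregated when their sets of receiver edge devices are identical; the function name `aggregate_groups`, the group addresses, and the device names are hypothetical and not taken from the draft.

```python
from collections import defaultdict

def aggregate_groups(receivers_by_group):
    """Map each distinct set of receiver edges to the groups sharing it.

    receivers_by_group: {group_address: iterable of edge device names}.
    Groups with an identical edge set can share one distribution tree.
    """
    trees = defaultdict(list)
    for group, edges in sorted(receivers_by_group.items()):
        trees[frozenset(edges)].append(group)
    return dict(trees)

# Hypothetical membership: two groups have the same receiver edges.
membership = {
    "239.1.1.1": ["leaf1", "leaf2"],
    "239.1.1.2": ["leaf2", "leaf1"],
    "239.1.1.3": ["leaf3"],
}
shared = aggregate_groups(membership)
```

Three groups end up needing state for only two trees in this example.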

3.2.2. Parallel Local Link Selection

Quote from draft:
   “….Note that if multiple distribution trees are configured in a domain
   or on a router, better load balance among parallel links through the
   tie-breaking algorithm can be achieved. Otherwise, if there is only
   one tree is configured, then only one link in parallel links can be
   used for the corresponding distribution tree. However, calculating
   and maintaining many trees is resource consuming. Operators need to
   balance between two …."

Ice: It is very likely that network operators want to take advantage of ECMP through the network, so having a single tree in the network is not an attractive option. Also, a ‘default’ single tree in the network will cause flooding and wasting of bandwidth (the nature of tree aggregation). Putting the burden on the operators to choose their poison (flooding or state) is not solving any problem for operators compared to how multicast is deployed today. This is one of the key issues we need to address IMO.
Lucy: The solution builds one default tree, and other trees if the operator wants them, so flooding can be avoided. There is no free lunch. It all depends on the operator's goal.
[weiguo]: One default tree doesn't mean flooding in the whole network; it can be pruned for each multicast group. Multiple trees are there to improve link bandwidth utilization in the ECMP case.
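
A minimal sketch of per-group pruning, assuming the default tree is held as a `{node: parent}` map and that the set of edge devices with members for a group is known (for example, learned through the IGP). The function name `prune_tree` and the node names are hypothetical; the draft's actual pruning procedure is in its section 3.4.

```python
def prune_tree(parent, members):
    """Keep only the branches of a tree that lead to group members.

    parent:  {node: parent_node}, with None as the root's parent.
    members: edge devices that have receivers (or senders) for the group.
    """
    keep = set()
    for node in members:
        # Walk up towards the root until we join a branch already kept.
        while node is not None and node not in keep:
            keep.add(node)
            node = parent[node]
    return {n: p for n, p in parent.items() if n in keep}

# Hypothetical default tree rooted at spine1 with four leaves.
default_tree = {
    "spine1": None,
    "leaf1": "spine1",
    "leaf2": "spine1",
    "leaf3": "spine1",
    "leaf4": "spine1",
}
pruned = prune_tree(default_tree, {"leaf1", "leaf3"})
```

Traffic for this group is then forwarded only on the links towards leaf1 and leaf3, not flooded to leaf2 and leaf4.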

3.4. Pruning a Distribution Tree for a Group

Ice: I can see it could be useful to use the link state database to build a tree, but if we’re going down the path of creating multiple trees for different receiver populations, it means you need to maintain state for each of those trees. If we now compare this with mLDP/PIM, you end up with the same amount of state; it's just that it is signalled via the IGP. The way the tree is calculated in the IGP is different from how PIM/mLDP does it, but it's not clear that using the IGP database is any better.
Lucy: Signalling via the IGP saves multicast convergence time and allows the IGP network to be configured automatically.

4.4. Reverse Path Forwarding Check (RPFC)

Ice: This section raises an important issue. The advantage of using the IGP database to build a tree is that it follows the (forward looking) unicast path towards its destination(s). There is no dependency on the RPF check as we know it from PIM and mLDP, which is a simplification. Not having the RPF check makes you more susceptible to loops, as you indicated in this section. By adding RPFC back into the mix to prevent loops, we’re now combining the ‘forward looking’ path selection done by the upstream router and the ‘backwards looking’ (RPFC) accept mechanism on the downstream router. If there are asymmetric paths in the network, there is no guarantee they select the same link, causing the downstream router to incorrectly drop the packets.
Lucy: The RPFC mechanism is built in and can be used. In fact, it just checks the source IP address of the packet against the unicast database. The solution addresses how to ensure that upstream and downstream select the same path in a tree, so the problem you state does not arise.
[weiguo]: RPFC is used in PIM but not in Bidir-PIM. Loops are possible only during the transient period of network convergence. RPFC makes the solution stricter and, in theory, avoids loops; in most common cases it can be optional rather than mandatory.
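
For readers less familiar with the check being debated, here is a generic sketch of an RPF check: a packet is accepted only if it arrives on the interface that unicast routing would use to reach its source. The RIB layout (prefix to interface name), the function names, and the addresses are assumptions for this example; this is not the RPFC procedure from section 4.4 of the draft.

```python
import ipaddress

def rpf_interface(rib, source):
    """Longest-prefix match of `source` in a {prefix: interface} table."""
    addr = ipaddress.ip_address(source)
    best = None
    for prefix, ifname in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, ifname)
    return best[1] if best else None

def rpf_accept(rib, source, arrival_if):
    """Accept only if the packet arrived on the unicast path to its source."""
    expected = rpf_interface(rib, source)
    return expected is not None and expected == arrival_if

# Hypothetical unicast table on a downstream router.
rib = {"10.0.0.0/8": "eth0", "10.1.0.0/16": "eth1"}
on_path = rpf_accept(rib, "10.1.2.3", "eth1")
off_path = rpf_accept(rib, "10.1.2.3", "eth0")
```

The second call is the case Ice describes: if the upstream router forwards on a link other than the one the downstream router's unicast lookup selects, the downstream router drops the packet.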

Summarising:

Using the IGP Link State Database to build an IGP Tree per Root could be useful in some scenarios, but the lack of an RPF check makes this IGP Tree much more susceptible to loops. I think this is a big concern, and adding the RPF check back into the mix just complicates the solution.

It is not clear that the procedures and mechanisms to build pruned IGP Trees are any simpler than how PIM/mLDP/RSVP-TE trees are built.
Lucy: The solution does not simplify here. It is similar to PIM.
[weiguo]: From an implementation perspective, it is hard to say that it is simpler than Bidir-PIM. From an operator's perspective, it is simpler to operate and has a better convergence time.

When looking at the amount of state maintained in the network, it's probably the same. Saying that doing IGP Tree building is better because there is no need to run another protocol like PIM/mLDP is misleading. Obviously the procedures added into the IGP come with a cost, in complexity, state and signalling requirements. This is not something that comes for free, and not everybody who understands unicast IGP knows how the multicast procedures work.
Lucy: Again, we clearly state the motivation for IGP multicast. It has some very useful use cases for DC networks. I am not sure how many DC networks deploy PIM yet.
[weiguo]: In this solution, each node still needs to maintain a multicast forwarding table. For operators, the solution is a relatively evolutionary step compared with PIM. The evolutionary points are:
1. Convergence time
2. Easier maintenance, because fewer protocols are run
3. Automation

If the problem we are trying to solve is driven by ‘plug and play’ and/or auto provisioning, there are existing BGP mechanisms that we can use in combination with PIM/mLDP/RSVP-TE.
Lucy: Add another protocol to achieve that? How much complexity do we want here when we can use one protocol to achieve it? No wonder IT people want some revolution in the network.
[weiguo]: IGP fabric automation is better than a BGP-based fabric. In an IGP fabric, unnumbered interfaces can be used and no IP addresses are needed, so it is easy to realize "plug and play" and/or other auto-provisioning.

Sorry again for the late response.

Thanks,
Lucy

Thx,

Ice.