Re: [Ipoverib] Comments on draft-ietf-ipoib-link-multicast-04.txt

"H.K. Jerry Chu" <Jerry.Chu@eng.sun.com> Sun, 11 May 2003 06:36 UTC

Message-Id: <200305110636.h4B6apKQ693585@jurassic.eng.sun.com>
Date: Sat, 10 May 2003 23:42:33 -0700
From: "H.K. Jerry Chu" <Jerry.Chu@eng.sun.com>
Reply-To: "H.K. Jerry Chu" <Jerry.Chu@eng.sun.com>
Subject: Re: [Ipoverib] Comments on draft-ietf-ipoib-link-multicast-04.txt
To: rbrabson@us.ibm.com
Cc: ipoverib@ietf.org
List-Id: IP over InfiniBand WG Discussion List <ipoverib.ietf.org>

>> >> >- Section 9.0 indicates that "The packet SHOULD be forwarded to
>> >> >  locally connected routers.  [snip] The specific mechanism for a
>> >> >  sender to forward packets to routers are left to implementations".
>> >> >  At a minimum, I think the exact implementation needs to be specified.
>> >> 
>> >> Why? (The draft strives to avoid dictating implementation details
>> >> that do not affect interoperability.)
>> >
>> >For interoperability.  The requirement is the host and router must agree
>> >on using a common mechanism for exchanging the IP multicast packet.
>> >Implementations are free to choose a mechanism as long as all other
>> >implementations with which they communicate also choose to implement the
>> >same mechanism.  I don't think this is an area where we can or should punt
>> >- a minimal, common approach needs to be mandated such that all IPoIB
>> >nodes implement that mechanism so that any two IPoIB nodes will
>> >interoperate.
>> 
>> I still don't see any problem you may have in mind. As long as a sender
>> gets the pkt to the router, why does it matter how it gets there? E.g.,
>> routers will listen to both the broadcast and the all-router multicast
>> addresses. Does it matter which address a pkt is sent to?
>
>I guess I thought that a receiver might choose to discard IB packets which
>are sent to the wrong (M)GID.  If this isn't possible or allowed by IBA,

I don't think it is possible to determine whether a destination link layer
address is "right" or "wrong" wrt a dst IP addr in general.

>then you are right - it doesn't matter how a packet arrives at the local 
>QP, just as long as it arrives.  So, I guess an alternative to using the 
>all-router MGID or the broadcast GID would be to have an IPoIB host 
>somehow determine all listeners and unicast packets to each listener.  Not 
>a real good alternative, but it would work.

Yup.

>
>Even if there are many possible ways to make IPoIB work, we should be 
>choosing the "best" or "most appropriate" one(s) and documenting those. 
>This appears to be sending to the all-router MGID, or possibly the 
>broadcast GID.  Why not simply state these are the mechanisms defined by
>IPoIB, but state that future standards may define additional mechanisms
>for delivery of multicast packets to the IPoIB router?

Again the draft has been carefully written to separate the required
properties of IPoIB from implementation freedom. For a multicast receiver
the draft clearly states it MUST join the appropriate MGID. But for a
multicast sender it doesn't really matter how the link layer manages to
get an outbound multicast pkt to its receivers (including all the MC
routers), as long as it does NOT make any undue assumption about
receivers. One specific consideration is that although we don't
encourage it, we still want to allow implementations that must resort
to broadcast when sending MC pkts to be declared compliant. An
analogy is that IP over Ethernet does not dictate implementation details,
e.g. that a multicast receiver mustn't resort to promiscuous mode when its
hash table overflows...

Perhaps the best approach is not to dictate but to suggest the preferred
implementation (i.e. the all-router MC group).
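
As a rough illustration of the sender-side choice discussed in this thread
(this is not the draft's own code snippet), the Python sketch below picks a
link-layer destination for an outbound multicast pkt. All names (Dest,
choose_destination, the boolean inputs) are hypothetical, and it assumes a
router always creates the all-router group, so that the group's absence
means no MC router is on the link.

```python
from enum import Enum


class Dest(Enum):
    """Possible link-layer destinations for an outbound IP multicast packet."""
    GROUP_MGID = "group MGID"
    ALL_ROUTERS_MGID = "all-router MGID"
    BROADCAST_GID = "broadcast GID"
    NONE = "do not send"


def choose_destination(group_exists: bool,
                       all_routers_exists: bool,
                       broadcast_only: bool = False) -> Dest:
    """Pick where a sender puts a multicast packet (illustrative only).

    group_exists       -- the MGID mapped from the IP group exists on the subnet
    all_routers_exists -- the all-router MGID exists (assumed: a router made it)
    broadcast_only     -- implementation that must resort to broadcast for
                          all multicast traffic (allowed, but not encouraged)
    """
    if broadcast_only:
        return Dest.BROADCAST_GID
    if group_exists:
        # Some receiver joined; the sender would join as SendOnlyNonMember.
        return Dest.GROUP_MGID
    if all_routers_exists:
        # No on-link listener, but a router may forward the packet off-link.
        return Dest.ALL_ROUTERS_MGID
    # No listeners and no routers: nothing to deliver the packet to.
    return Dest.NONE
```

A receiver is unaffected by which branch the sender takes, which is why the
choice can stay a recommendation rather than a requirement.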

>
>> >
>> >> >  The text suggests using all-router multicast group or the broadcast
>> >> >  group.  If a router is required to create the all-router multicast
>> >> >  group (see the previous comment), then wouldn't it be sufficient to
>> >> >  check for the existence of the all-router multicast group and, if it
>> >> >  exists, send to that?  If it doesn't exist, then there can't be any
>> >> >  multicast routers on the link, and since there aren't any on-link
>> >> >  listeners either, there isn't any need to send the packet.
>> >> 
>> >> Doesn't the code snippet following 9.B match pretty much what you
>> >> described above?
>> >
>> >Yes, it does, and if the text matched the code snippet then I wouldn't 
>> >have an issue.  But the text which supports the code snippet doesn't.
>> 
>> I can remove the text if it causes any confusion. I thought it is very
>> obvious if no router exists then there is no need to forward (as
>> illustrated by the code snippet).
>> 
>> >The text basically says you could send to the all-router multicast group
>> >OR the broadcast group.  And either should work, as long as the receiving
>> >IPoIB node MUST accept the IP multicast packets regardless of the IB group
>> >over which they are received (I'm not sure I've seen this explicitly
>> >stated in the draft, but I may have missed it).
>> 
>> I thought this is required by IP already. But please tell me if I'm
>> wrong.
>> 
>> >But do we really need to
>> >have multiple mechanisms to accomplish the same result?  Wouldn't we be
>> >better off just choosing one and going with that?  And how are we going to
>> >find multiple interoperable implementations for each option when the draft
>> >puts no limits on the approaches which can be taken?
>> 
>> The multiple mechanisms were not there in the earlier draft, but were 
>> requested later by my co-author. I personally don't see any harm with
>> it but I don't mind taking it out if Vivek agrees with you.
>
>I responded to a separate post from Vivek on this topic.  I'm OK if both 
>mechanisms (send to the all-router MGID or broadcast GID) are left in, 
>although I do think the best approach is to send to the all-router MGID. 
>Doing so reduces the amount of traffic which needs to be filtered by IPoIB 
>hosts when there are no on-link listeners.

Ok.

>
>> >> >  And shouldn't the IPoIB sender subscribe to the IB multicast group
>> >> >  creation events using a wildcard MGID so that it can
>> >> >  "SendOnlyNonMember" join the group if a listener appears on the local
>> >> >  IB subnet, or "SendOnlyNonMember" join the all-routers group if a
>> >> >  router appears on-link and creates the all-routers multicast group?
>> >> >  And doesn't it need to subscribe to the IB multicast group deletion
>> >> >  events so that it knows when to switch back to sending to the
>> >> >  all-router multicast group and/or stop sending?
>> >> 
>> >> Aren't you saying the same thing (more or less) as what is described in
>> >> the section following the code snippet?
>> >
>> >You are right - I must have missed the text.  But you need to change the
>> >"should" to a "MUST" in that paragraph.
>> 
>> Since the caching part is a "should", the subscription part ought to be
>> a "should" too, right?
>
>Yes, you are right.  Listening for creation/deletion reports doesn't make 
>any sense if the IPoIB node does not cache the MGID for groups which it 
>"NonMember" joins.
>
>> >> >  But I'm wondering why a node needs to do this at all.  Wouldn't it
>> >> >  make more sense for the IPoIB node to instead create the multicast
>> >> >  group and join the group so that IPoIB multicast routers can join the
>> >> >  group?  Yes, the IPoIB sender will need to create the group as a
>> >> >  "FullMember", but this seems to provide a simpler mapping of IP
>> >> >  multicast to IB multicast.
>> >> 
>> >> While what you suggested may simplify the procedure, it essentially
>> >> removes the difference between a multicast sender and a multicast
>> >> receiver, which is wrong. In the IP multicast model a multicast sender
>> >> doesn't need to be a member of a multicast group to send. The draft has
>> >> tried to keep the same distinction.
>> >
>> >However, you are mapping the IP multicast model to the IB multicast model,
>> >and the two are not the same.  The reality is a sender must join an IB
>> >multicast group in order to send to it (albeit as a "SendOnlyNonMember"),
>> 
>> This is the big difference that was debated in IBTA.
>> 
>> >and this step does not exist in the IP model.  I imagine that the vast
>> >majority of the code/complexity of an IPoIB implementation is going to be
>> >dealing with IB multicasting.  I was hoping there was some way to reduce
>> >both at an IPoIB host.  Maybe it is just wishful thinking.
>> 
>> I actually have the same philosophy as well. I'm always against complexity
>> unless it's fully justified. But IB multicast just works differently than
>> IP multicast. We have gone through a number of alternatives and the
>> current one seemed to be a good balance between complexity and correctness.
>> I don't remember all the tradeoffs but if we simply require a sender
>> to be a full-member, how would one know when to leave a group, when to
>> delete a group..., etc. Also wouldn't it enable anyone to write a silly
>> sender program to try to send to a million different addresses and cause
>> all the IB multicast resources to be used up? (The join call by
>> IP_ADD_MEMBERSHIP at least places some resource limitation.)
>
>Yes, there are issues which might well make it more complex.  As I said, 
>maybe it was wishful thinking.  From the sound of it, it was.
>
>> >By the way, after thinking about this a little more, I'm not sure that
>> >creating a group as I suggested makes sense in all cases.  For instance,
>> >it is only useful for an IP multicast group which has a scope greater than
>> >link-local.  For link-local scope, only nodes directly attached to the
>> >IPoIB link are able to receive the packet.  Creating the group for the
>> >purpose of forwarding to an IPoIB multicast router is simply wasteful, as
>> >the IPoIB multicast router will not be able to forward the packet to
>> >another IP link.
>> >
>> >A similar optimization can also be taken with the current approach
>> >described in this draft - for an IP multicast group with link-local scope,
>> >if the IB MGID for that IP multicast group does not exist then an IPoIB
>> >node really doesn't need to send the multicast packet to the all-router
>> >MGID or broadcast GID.
>> 
>> Correct. This level of detail can arguably be left to implementation
>> because it doesn't affect interoperability. But I think this optimization
>> is simple and useful enough that I'll add a note to the draft.
>
>Sounds good.
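
The link-local optimization agreed on above can be sketched as follows in
Python. This is only an illustration, not text from the draft: the function
names and the group_mgid_exists flag are hypothetical. It treats 224.0.0.0/24
as the IPv4 link-local multicast range and reads the 4-bit scope field of an
IPv6 multicast address, counting interface-local (1) and link-local (2) as
scopes a router would never forward.

```python
import ipaddress

# IPv4 multicast addresses in this block are never forwarded off-link.
IPV4_LINK_LOCAL_MCAST = ipaddress.ip_network("224.0.0.0/24")


def is_link_local_multicast(addr: str) -> bool:
    """Return True if addr is a multicast address a router will not forward."""
    ip = ipaddress.ip_address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not a multicast address")
    if ip.version == 4:
        return ip in IPV4_LINK_LOCAL_MCAST
    # IPv6 multicast: ff<flags><scope>::/16, scope is the low nibble of byte 1.
    scope = ip.packed[1] & 0x0F
    return scope in (1, 2)


def should_forward_to_routers(addr: str, group_mgid_exists: bool) -> bool:
    """Decide whether a pkt with no on-link MGID is worth sending to routers.

    If the group's MGID exists the sender uses it directly; otherwise the
    all-router MGID (or broadcast GID) is only useful for wider scopes.
    """
    return not group_mgid_exists and not is_link_local_multicast(addr)
```

For example, should_forward_to_routers("224.0.0.1", False) is False, so the
sender can drop the pkt instead of loading the all-router MGID.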
>
>> >
>> >> >- Section 9.0 also indicates that "A multicast sender must join the
>> >> >  target multicast group as a "SendOnlyNonMember" before outgoing
>> >> >  messages from it can be successfully routed." This text should be
>> >> >  clarified to indicate this is only true if the node has not already
>> >> >  joined the group as either a "FullMember" or "SendOnlyNonMember".
>> >> >  The If-Then-Else logic below correctly shows this behavior, but this
>> >> >  paragraph does not reflect that behavior.
>> >> 
>> >> Although your comment is correct, I'd argue this kind of detail can
>> >> be found in the IBA spec. It has nothing to do with IPoIB in particular.
>> >> So the question becomes how much IBA info we need to repeat in an IPoIB
>> >> RFC.  Since the code snippet already covers this correctly, if you still
>> >> believe this needs to be clarified I can certainly add more text to it.
>> >
>> >My feeling is enough to describe HOW an IPoIB node will USE the IBA
>> >services, and this is one of those cases.  I would prefer to see the text
>> >clarified if you do another revision of the draft, but given the code
>> >snippet describes the correct behavior I wouldn't hold this draft up just
>> >to clarify this paragraph.
>> 
>> Ok.
>> 
>> >
>> >> >- Section 9.0 indicates that "For the rest of attributes, it is
>> >> >  recommended the same values from the all-node multicast/broadcast
>> >> >  group be used." Should this instead say "For the rest of the
>> >> >  attributes, the same values from the all-node multicast/broadcast
>> >> >  group SHOULD be used." Regardless of the wording, this text seems to
>> >> >  contradict Section 7.0 which states a node must "use the rest of
>> >> >  the link attributes associated with the group for all future
>> >> >  communication on the link".  If Section 7.0 is correct, should this be
>> >> >  a MUST instead of a SHOULD?  And, if not, under what circumstances
>> >> >  would a node choose to not use the same values?
>> >> 
>> >> 7.0 describes the attributes used by a node when conducting unicast
>> >> communication. 9.0 describes the attributes used to create a multicast
>> >> group, i.e., for multicast communication. So the question is:
>> >> Must path attributes used for unicast match with those for multicast?
>> >>
>> >> The flexibility allowed in the multicast case in the current draft is
>> >> deliberate. I don't know in reality if this is useful, or even good to
>> >> have. (Comments are welcome.) My design principle has been not to
>> >> make something a requirement unless it is needed for interoperability.
>> >
>> >Flexibility is often times good, but so is describing a minimal set of
>> >functions necessary for interoperability.  If a node selects link
>> >attributes when creating an MGID which do not match the broadcast GID's
>> >attributes, then it is possible that not all nodes which are members of
>> >the IPoIB link will be able to support those attributes.  If this happens,
>> >then the two nodes will not be able to communicate using the IP multicast
>> >group mapped to the MGID.  All of a sudden, an IPoIB node can't send an
>> >ARP request or Neighbor Solicitation to another node.
>> 
>> I think some of this "deliberate" flexibility (or ambiguity) came from
>> the ambiguity of IB itself in some of the multicast attributes such as
>> SL, TClass..., etc. It is not clear to me if they must or even can
>> follow the ones for unicast (i.e. described by the broadcast group)
>> in all cases.
>
>Well, I hope they can, as that is what I was planning on doing for any 
>MGIDs which I had to create.  If they can't, I have no idea how to come up 
>with the appropriate values for the attributes.

The flexibility is not for you :-), but for those who for some reason
know how to come up with an optimal set of values which turn out to
be different than those of the broadcast group.

Well, I'll study the IBA spec more to see what needs to be changed here.
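
To make the "same values unless you know better" idea concrete, here is a
small Python sketch. It is an assumption-laden illustration, not anything
the draft defines: GroupAttrs and attrs_for_new_group are hypothetical names,
and the fields shown (SL, TClass, FlowLabel, HopLimit) are just a plausible
subset of the attributes under discussion.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GroupAttrs:
    """Illustrative subset of the attributes carried by an IB multicast group."""
    sl: int          # service level
    tclass: int      # traffic class
    flow_label: int
    hop_limit: int


def attrs_for_new_group(broadcast: GroupAttrs, **overrides: int) -> GroupAttrs:
    """Derive attributes for a group the node is about to create.

    By default every value is copied from the broadcast group; a node that
    somehow knows a better value can override individual fields.
    """
    return replace(broadcast, **overrides)


# Placeholder values, chosen only for the example.
bcast = GroupAttrs(sl=0, tclass=0, flow_label=0, hop_limit=1)
same = attrs_for_new_group(bcast)          # the recommended default
tuned = attrs_for_new_group(bcast, sl=3)   # a deliberate deviation
```

Copying by default keeps the created group usable by every node that could
join the broadcast group; each override is where that guarantee can be lost.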

>
>> >Perhaps it would be better to state that a node which creates a multicast
>> >group MUST use the same values for the attributes as the all-node
>> >multicast/broadcast group, but a node joining a multicast group must not
>> >verify the values match.  If the node joining the group can't support the
>> >values for the multicast group, then it should log the error.  That way, a
>> >future RFC could describe why/when/how a node would choose to use
>> >different values for the link attributes but still be able to interoperate
>> >with IPoIB nodes.
>> 
>> I'll leave the above for IB folks to comment.
