Re: [pim] Benjamin Kaduk's Discuss on draft-ietf-pim-drlb-13: (with DISCUSS and COMMENT)

On Wed, Dec 11, 2019 at 03:27:22PM -0800, Stig Venaas wrote:
> Hi
> 
> Thanks for the careful review here. A lot of valuable comments. I've
> addressed most of these in the new version I just submitted.
> 
> Please see inline for details.
> 
> On Tue, Dec 3, 2019 at 1:51 PM Benjamin Kaduk via Datatracker
> <noreply@ietf.org> wrote:
> >
> > Benjamin Kaduk has entered the following ballot position for
> > draft-ietf-pim-drlb-13: Discuss
> >
> > When responding, please keep the subject line intact and reply to all
> > email addresses included in the To and CC lines. (Feel free to cut this
> > introductory paragraph, however.)
> >
> >
> > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> > for more information about IESG DISCUSS and COMMENT positions.
> >
> >
> > The document, along with other ballot positions, can be found here:
> > https://datatracker.ietf.org/doc/draft-ietf-pim-drlb/
> >
> >
> >
> > ----------------------------------------------------------------------
> > DISCUSS:
> > ----------------------------------------------------------------------
> >
> > I think we need greater clarity on whether the list of GDR candidate
> > addresses is sorted or not (i.e., whether it is required for protocol
> > operation), as indicated by the rtgdir reviewer.
> > Specifically, Section 5.3 is clear in the descriptive text that the list
> > is sorted (as if a recipient might rely on that behavior), but Section
> > 5.3.2 and Section 5.4 only have it as RECOMMENDED.  Given my
> > understanding of the protocol, it seems that all routers need to receive
> > the DRLB-List in order to perform the GDR selection algorithm, in which
> > case the extra information about the addresses being sorted would not be
> > useful for the calculation.  That would actually suggest that we do not
> > need RFC 2119 keywords here, and could just say (as we do in Section
> > 5.4) that it's recommended for the DR to use a deterministic procedure,
> > such as sorting.
> 
> Right, I agree. I added that it is recommended the first time I
> mention sorting, and changed RECOMMENDED to recommended as the
> protocol works fine either way. I added rationale for the
> recommendation. Namely with certain algorithms, like modulo, the
> result is more predictable if there is a fixed order. We don't want it
> to depend on the order the DR detected the candidates.

Indeed, this all looks good.
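The rationale for the recommendation is predictability, not correctness. A minimal Python sketch illustrates it. It assumes, as described in Section 5.1, that the modulo result is the 0-based ordinal of the acting GDR in the advertised list. The helper names `ordered_candidates` and `gdr_for` are hypothetical. Every router uses the list exactly as the DR advertises it, so an unsorted list still works. The catch is that the flow-to-GDR mapping then depends on the order in which the current DR happened to learn of the candidates.

```python
import ipaddress


def ordered_candidates(addresses):
    """Hypothetical helper: sort candidate addresses by numeric value so the
    advertised list does not depend on the order Hellos were received."""
    return sorted(addresses, key=lambda a: int(ipaddress.ip_address(a)))


def gdr_for(hash_input, candidates):
    """Map an already masked/shifted hash input to the candidate at ordinal
    hash_input % N, where N is the number of listed candidates."""
    return candidates[hash_input % len(candidates)]


# The same three routers, discovered in different orders by two DRs.
order_a = ["10.0.0.3", "10.0.0.1", "10.0.0.2"]
order_b = ["10.0.0.2", "10.0.0.3", "10.0.0.1"]
v = 7  # 7 % 3 == 1

print(gdr_for(v, order_a), gdr_for(v, order_b))  # 10.0.0.1 10.0.0.3
print(gdr_for(v, ordered_candidates(order_a)),
      gdr_for(v, ordered_candidates(order_b)))   # 10.0.0.2 10.0.0.2
```

Sorting numerically rather than as strings also keeps 10.0.0.9 ahead of 10.0.0.10.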

> > I also think the text should be more clear in Section 5.3.2 about the
> > use of the Router Identifier as the "GDR Candidate Address".  I believe
> > (but am not certain) that the intended behavior is that the elected DR
> > use all the PIM Hellos it has received (from candidate GDRs) to assemble
> > the list of candidate "addresses", but instead of using the actual IP
> > addresses it uses the Router Identifier construction described here when
> > assembling the "GDR Candidate Address(es)" field.  The current text
> > leaves unsaid what entity is performing this operation and how the PIM
> > Hello+Router Identifier corresponds to an entry in the list of
> > addresses.  Furthermore, for the IPv6 case, it seems like this
> > substitution procedure interacts very poorly with the masking procedure
> > when the network includes a mix of routers that do/don't send a Router
> > ID (as it may not be possible to set a 32-bit contiguous mask that
> > captures the varying parts of IPv6 router addresses and the space
> > reserved here for "Router ID").
> 
> I added more explicit text here, both when the DRLB forms the list,
> and when a router looks for itself in the list. I've not tried to
> address the IPv6 limitation though. The admin would have to be careful
> what masks to use if some routers support Router-ID.

I think the updates for the Router Identifier usage help a lot; if I was
holding the pen I would probably tweak § 5.3.2's "If the "Interface ID"
option, as specified in [RFC6395], is present in a GDR Candidate's PIM
Hello message, and the "Router Identifier" portion is non-zero" a little
bit to set it in context more clearly, but that's not a Discuss-level
concern and I leave it to your judgment.

I do still worry somewhat about the IPv6 limitation, though, so let me check
my understanding a bit.  It seems that we only take the low-order 32 bits
of the shifted+masked value out of implementation convenience, since 32-bit
integers are fast and (modular) division slow.  I'm not missing some more
fundamental reason why the 32 bits are chosen for the modulo hash
algorithm, am I?  (I mean, yes, IPv4 addresses will only populate 32 bits
of any field, but I am not sure why that length would have to be extended
to IPv6 processing, when zero-padding would produce a well-defined result.)
If it is just a relatively arbitrary choice of number of bits to take,
given that it in theory could have a detrimental effect on the mechanism
when used with IPv6, it would seem to potentially qualify as a "known
technical omission" in the language of RFC 2026, which is something that we
are not supposed to have in Proposed Standards.  Even if the IESG did
decide to waive the requirement, I think it would still be incumbent upon us
to properly document the at-risk scenarios.
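A minimal Python sketch makes the concern concrete. It follows the modulo step as paraphrased in this thread: mask the address, shift right past the mask's trailing zero bits, keep the low-order 32 bits, and reduce modulo the number of GDR candidates. It is not the normative text of Section 5.2. For example, it hashes a single address and ignores how source, group, and RP inputs are combined. The function names are hypothetical.

```python
import ipaddress


def trailing_zeros(mask: int) -> int:
    """Count zero bits below the lowest set bit of a non-zero mask."""
    return (mask & -mask).bit_length() - 1


def modulo_hash(addr: str, mask: int, n_candidates: int) -> int:
    """Return the ordinal of the acting GDR for one address (sketch)."""
    value = int(ipaddress.ip_address(addr)) & mask
    if mask:  # an all-zero mask leaves value == 0, i.e. ordinal 0
        value >>= trailing_zeros(mask)
    return (value & 0xFFFFFFFF) % n_candidates  # only 32 bits survive


FULL_V6 = (1 << 128) - 1  # "use the entire group address"
UPPER_V6 = int(ipaddress.IPv6Address("ffff:ffff:ffff:ffff:ffff:ffff::"))

# Two IPv6 groups that differ only above the low-order 32 bits.
g1, g2 = "ff3e::1:0:0:7", "ff3e::2:0:0:7"
print(modulo_hash(g1, FULL_V6, 3), modulo_hash(g2, FULL_V6, 3))    # 1 1
print(modulo_hash(g1, UPPER_V6, 3), modulo_hash(g2, UPPER_V6, 3))  # 1 2
```

With the all-ones mask, groups that share their low-order 32 bits always land on the same GDR, however many candidates there are. A mask with trailing zero bits moves the 32-bit window, but only one window can be chosen.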

> > I'm concerned about hash algorithm agility (in the vein of BCP 201,
> > though since this is not a cryptographic hash that BCP does not strictly
> > speaking apply), as the rtg-dir review noted.  Specifically, each router
> > has to commit in its Hello to a single hash algorithm, so transitioning
> > to a new algorithm will require accepting reduced functionality during
> > the transition period (a reduced list of potential GDR candidates),
> > which is contrary to the goals of algorithm negotiation espoused in BCP
> > 201.  Is this not a significant concern for this use case?  I see that
> > Section 6 attempts to disclaim discussion of algorithm migration, but I
> > am not yet convinced that it is appropriate to do so.
> 
> It is not clear whether we will define additional algorithms. If we
> do, I think it might be sufficient that the admin manually configures
> each router with the desired algorithm. If we believe there is a need,
> we can define an election mechanism later though. Let us see whether
> there is a need for it. We may not need multiple algorithms either. I
> don't foresee people frequently changing algorithms.

Thanks for the additional discussion.  I agree that within this
architecture it should be possible to introduce an election scheme along
with the new algorithm, should one be needed, so I will not press this
point further.

> > Please also remove from Section 5.7 the stale statement referring to the
> > previous section (see COMMENT).
> 
> Done, thanks.
> 
> >
> > ----------------------------------------------------------------------
> > COMMENT:
> > ----------------------------------------------------------------------
> >
> > Thank you for the Backward Compatibility section; it's great to see that
> > covered explicitly!
> >
> > Per the rtg-dir review, please clarify that each router advertises at
> > most one Hash Algorithm at any given time (or how a multi-algorithm
> > scenario would work).
> 
> Done. Generally in PIM, options are not sent multiple times, but I
> pointed out that at most one option is used.
> 
> > Limiting the GDR candidates to those with the same (highest) priority as
> > the PIM DR seems like it will in practice encourage having multiple
> > routers advertise the same priority value (if that is not already the
> > case).  Are there any operational considerations or risks in having to
> > use the IP-address tie-breaker more often for the non-GDR-capable
> > routers?
> 
> I don't think this is an issue. The only thing I can think of is that
> changing the IP address on a router may have an impact. It may be
> harder to control which router is the DR, but it is not that important
> which router is the DR. What matters is which routers are GDR, and
> that is still controlled by the priority.

Okay, thanks for filling in the blanks that I couldn't.
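A minimal Python sketch shows the candidate selection under discussion. It uses the assumptions stated in this thread: a router is a GDR candidate only if it advertised DRLB-Cap with the same Hash Algorithm as the DR and has the same DR priority as the DR. The `Neighbor` record and the `gdr_candidates` function are hypothetical. The sketch also leaves out Hello holdtime and neighbor-expiry handling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neighbor:
    """Hypothetical summary of what one PIM Hello told us."""
    address: str
    dr_priority: int
    hash_algorithm: Optional[int]  # None: no DRLB-Cap option in the Hello


def gdr_candidates(dr: Neighbor, neighbors: List[Neighbor]) -> List[Neighbor]:
    """Return the DR and any neighbors that advertised DRLB-Cap with the
    DR's Hash Algorithm and share the DR's (highest) priority."""
    return [
        n for n in [dr, *neighbors]
        if n.hash_algorithm is not None
        and n.hash_algorithm == dr.hash_algorithm
        and n.dr_priority == dr.dr_priority
    ]


dr = Neighbor("10.0.0.3", 100, 0)
others = [
    Neighbor("10.0.0.1", 100, 0),     # same priority, same algorithm
    Neighbor("10.0.0.2", 100, None),  # same priority, no DRLB-Cap
    Neighbor("10.0.0.4", 50, 0),      # lower priority
]
print([n.address for n in gdr_candidates(dr, others)])
# ['10.0.0.3', '10.0.0.1']
```

If the elected DR itself lacks DRLB-Cap, the sketch returns an empty list, since there is nothing for it to advertise. Routers that want to be candidates must share the DR's priority, so among them the IP-address tie-breaker, not the priority, decides which one becomes DR.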

> > Section 1
> >
> > Interesting to 1-index the routers but 0-index the links in Figure 2.
> 
> Fixed.
> 
> > Section 3
> >
> >    The extension specified in this document applies to PIM-SM when they
> >    act as last hop routers (there are directly connected receivers).  It
> >
> > nit: this sentence makes more sense when I insert the word "routers"
> > after "PIM-SM".
> 
> Right, thanks.
> 
> >    does not alter the behavior of a PIM DR, or any other routers, on the
> >    first hop network (directly connected sources).  This is because the
> >    source tree is built using the IP address of the sender, not the IP
> >    address of the PIM DR that sends the registers towards the RP.  The
> >    load balancing between first hop routers can be achieved naturally if
> >    an IGP provides equal cost multiple paths (which it usually does in
> >    practice).  Also distributing the load to do registering does not
> >    justify the additional complexity required to support it.
> >
> > In this last sentence, does "registering" refer to setting up the sender
> > or the registration of receivers from DR to RP?
> 
> It's the sender, I clarified that.
> 
> > Section 4
> >
> >    In order to share forwarding load among last hop routers, besides the
> >    normal PIM DR election, the GDR is also elected on the multi-access
> >    LAN.  There is only one PIM DR on the multi-access LAN, but there
> >    might be multiple GDR Candidates.
> >
> > nit: this reads as if there is only a single GDR per LAN the same way
> > that there is only one PIM DR.  But my understanding is that the GDR is
> > per-group, so perhaps a wording tweak is in order, even given the
> > exposition in the following paragraph.
> 
> Agree, fixed.
> 
> >    A Hash Algorithm based on the announced Source, Group, or RP masks
> >    allows one GDR to be assigned to a corresponding multicast state.
> >    And that GDR is responsible for initiating the creation of the
> >    multicast forwarding tree for multicast traffic.
> >
> > nit: s/And that/That/
> >
> > Section 5.1
> 
> Agree.
> 
> > Do we expect the hash masks to be a contiguous set of bits (i.e., not
> > 0xf0f0f0f0)?
> >
> >    The DRLB-List Hello Option contains a list of GDR Candidates.  The
> >    first one listed has ordinal number 0, the second listed ordinal
> >    number 1, and the last one has ordinal number N - 1 if there are N
> >    candidates listed.  The hash value computed will be the ordinal
> >    number of the GDR Candidate that is acting as GDR.
> 
> Added text.
> 
> > nit: I suggest "acting as GDR for the flow in question".
> 
> Good, thanks.
> 
> > I would also consider having some lead-in text to introduce the purpose
> > of the bulleted list that follows, perhaps something like "the input to
> > be hashed is determined according to the following procedure:".
> 
> Done.
> 
> > Section 5.2
> >
> > I suspect that the "keeping only the last 32 bits of the result" step
> > could result in pathological behavior for certain IPv6 addressing
> > schemes; this risk should be discussed in the security considerations
> > (or the limitation removed).  Presumably any hash algorithm more
> > complicated than modulo would not need this step of trimming down to 32
> > bits, too?
> 
> There is some risk here, but at least I would say that IPv6 multicast
> addresses would typically have different last 32 bits. But not
> necessarily. One can choose which 32 bits by applying a mask though.

I think if one uses a "sparse" mask whose low and high bits are more than
32 bits apart, the currently documented procedure will in effect use as the
actual mask the bitwise AND of the supplied mask and 0xffffffff (after
shifting), so that you can't really choose which 32 bits precisely.  We'd
need some way to "compress out" the "holes" in the mask in order to get the
full flexibility.  If that was the intended behavior already, then I think
further clarification is needed.
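A small Python sketch shows the difference. It assumes the documented procedure is: mask, shift right past the trailing zero bits, then keep the low-order 32 bits. `compress_bits` is a hypothetical alternative that is not in the draft. It gathers exactly the bits selected by the mask into a contiguous field, much like the x86 BMI2 PEXT instruction. The function names are illustrative only.

```python
def shift_and_truncate(value: int, mask: int) -> int:
    """Documented-style step (per this thread); mask must be non-zero."""
    shift = (mask & -mask).bit_length() - 1   # trailing zero bits of mask
    return ((value & mask) >> shift) & 0xFFFFFFFF


def compress_bits(value: int, mask: int) -> int:
    """Hypothetical alternative: pack the bits of value selected by mask
    into the low-order bits of the result, preserving their order."""
    result, out_pos = 0, 0
    while mask:
        lowest = mask & -mask        # lowest remaining mask bit
        if value & lowest:
            result |= 1 << out_pos
        out_pos += 1
        mask &= mask - 1             # clear that mask bit
    return result


addr = 0xFF3E_ABCD_0000_0000_0000_0000_0000_1234  # ff3e:abcd::1234
sparse = (0xFFFF << 96) | 0xFFFF  # bits 96-111 and bits 0-15

print(hex(shift_and_truncate(addr, sparse)))  # 0x1234: bits 96-111 lost
print(hex(compress_bits(addr, sparse)))       # 0xabcd1234
```

For a contiguous mask of at most 32 bits, the two functions agree, so the difference only shows up with sparse or wide masks. A compressed result wider than 32 bits would still need a reduction rule, and this sketch does not attempt to define one.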

> > Please define (e.g., by reference to a specific version of the C
> > language) the notation used for these calculations.  (I note that the
> > algorithm applied to IPv6 addresses would require a 128-bit unsigned
> > integer type.)
> 
> I did not do this. The notation is used in other RFCs as well, and is
> the same for all C versions as far as I know. I agree the formulas are
> assuming 128-bit integer type the way they are written, but it is
> possible to implement them (doing the same computation) with only
> 32-bit integers as well.

IMO even "a C-like notation" would be helpful (the "specific version of the
C language" is just an attempt to preemptively satisfy anyone else who asks
for a reference), but this is only COMMENT-level so I'll say no more.
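The reply above notes that the formulas can be computed with only 32-bit integers. A minimal Python sketch shows one way to do it, assuming the IPv6 address and mask are held as four 32-bit words in network (most-significant-first) order. Python integers are unbounded, so the code masks with 0xFFFFFFFF wherever a C `uint32_t` would wrap. `to_words` and `low32_after_shift` are hypothetical names.

```python
M32 = 0xFFFFFFFF


def to_words(x: int) -> list:
    """Split a 128-bit integer into four 32-bit words, most significant first."""
    return [(x >> (96 - 32 * i)) & M32 for i in range(4)]


def low32_after_shift(addr_w: list, mask_w: list, shift: int) -> int:
    """Compute ((addr & mask) >> shift) & 0xFFFFFFFF for 0 <= shift < 128
    using only per-word 32-bit operations."""
    masked = [a & m for a, m in zip(addr_w, mask_w)]
    word, bit = divmod(shift, 32)
    idx = 3 - word                       # word holding result bit 0
    low = masked[idx] >> bit
    if bit and idx > 0:                  # borrow the next word's low bits
        low |= (masked[idx - 1] << (32 - bit)) & M32
    return low


addr = 0x0123_4567_89AB_CDEF_FEDC_BA98_7654_3210
mask = 0xFFFF_0000_FFFF_0000_FFFF_0000_FFFF_0000
for s in (0, 8, 40, 100):
    expected = ((addr & mask) >> s) & M32
    assert low32_after_shift(to_words(addr), to_words(mask), s) == expected
print("32-bit word version matches the 128-bit formula")
```

The final `% N` then operates on a 32-bit value, so no 128-bit arithmetic is needed anywhere in this path.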

> > Section 5.3
> >
> >    All PIM routers include a new option, called "Load Balancing
> >    Capability (DRLB-Cap)" in their PIM Hello messages.
> >
> > nit: I suggest a minor rewording to """PIM routers include a new option,
> > called "Load Balancing Capability (DRLB-Cap)" in their PIM Hello
> > messages, to indicate support for this specification""".  (With the
> > current text the reader is responsible for scoping the "All PIM routers"
> > to "ones that implement this specification".)
> >
> >    Besides this DRLB-Cap Hello Option, the elected PIM DR also includes
> >    a new "DR Load Balancing List (DRLB-List) Hello Option".  The DRLB-
> >    List Hello Option consists of three Hash Masks as defined above and
> >    also a sorted list of GDR Candidate addresses on the LAN.
> 
> Done.
> 
> > Would you mind pointing me at the part of RFC 7761 that describes the
> > procedure/delay used by a router to determine that it is the DR (and
> > thus, when it should start sending DRLB-List)?  It's not entirely clear
> > that we'd need to include that reference in this document, but I'd like
> > to sate my curiosity.
> 
> See section 4.3.2 in 7761.

Thanks!  (I was mostly curious what kind of race-avoidance was in place,
but thinking about it more, since the Hellos are paced anyway, there's
limited scope for issues of that nature.)

-Ben

> >
> > Section 5.3.1
> >
> >       Hash Algorithm: Hash Algorithm type. 0 for the Modulo algorithm
> >       defined in this document.
> >
> > Maybe mention the registry again here?
> 
> Done.
> 
> > Section 5.3.2
> >
> >          This DRLB-List Hello Option MUST only be advertised by the
> >          elected PIM DR.  It MUST be ignored if received from a non-DR.
> >          The option MUST also be ignored if the hash masks are not the
> >          correct number of bits, or GDR Candidate addresses are in the
> >          wrong address family.
> >
> > I'm not sure that any of the cases listed in the last sentence are
> > reliably detectable.
> 
> I didn't change this. The check may just have to be based on the total
> length being as expected.
> 
> > Section 5.4
> >
> >    the order in which the DR learns of new candidates.  Note that, as
> >    non-DR routers, the DR also advertises the DRLB-Cap Hello Option to
> >    indicate its ability to support the new functionality and the type of
> >    GDR election Hash Algorithm.
> >
> > nits: "as for non-DR routers", "the type of GDR election Hash Algorithm
> > it uses"
> 
> Thanks.
> 
> > Section 5.6
> >
> > The requirement in step 1 to run the Hash Algorithm for all groups with
> > local receiver interest seems to imply that all GDR candidates must
> > track and store local receiver interest for all groups, as opposed to
> > without this extension where only the DR strictly needs to do so.  I
> > imagine that generally all/most routers will be tracking this
> > information, though, so in practice this will not be an additional
> > operational burden [that would need to be documented].  But this is not
> > my area of expertise, so please correct me if I'm wrong!
> 
> Yes, they are all expected to track it. In regular PIM, the non-DRs
> are also tracking it.
> 
> > Section 5.7
> >
> >    When a router stops acting as the GDR for a group, or source and
> >    group pair if SSM, it MUST set the Assert metric preference to
> >    maximum (0x7FFFFFFF) and the Assert metric to one less than maximum
> >    (0xFFFFFFFE).  This was also mentioned in the previous section.  That
> >
> > This was not mentioned in the previous section.
> 
> Thanks. Removed. It was in an earlier version.
> 
> > Section 6
> >
> >    An administrator needs to consider what the total bandwidth
> >    requirements are and find a set of routers that together has enough
> >    total capacity, while making sure that each of the routers can handle
> >    its part, assuming that the traffic is distributed roughly equally
> >    among the routers.  Ideally, one should also have enough bandwidth to
> >
> > In a scenario where an attacker can create groups or control how some
> > amount of traffic is split across groups, this assumption of roughly
> > equal distribution will not hold.  Please discuss this in the security
> > considerations.
> 
> Added.
> 
> >    The default masks will use the entire group addresses, and source
> >    addresses if SSM, as part of the hash.  An administrator may set
> >
> > (side note: of course, the only hash algorithm currently defined will
> > only use the last 32 bits of IPv6 addresses)
> 
> Right, unless a mask is used with trailing 0-bits.
> 
> Thanks. Hope my changes are sufficient, but let me know if you feel
> further changes are needed.
> 
> Stig
> 
> >
> > _______________________________________________
> > pim mailing list
> > pim@ietf.org
> > https://www.ietf.org/mailman/listinfo/pim