[magma] MLDv2: editorial comments and questions

[Note that I'm still in xmit mode - making sure I type up all the comments
and questions before I read the responses so far]

Introduction:
-------------

Several of the unclear aspects (such as do the routers send periodic
general queries) below could be ameliorated by having an introduction
section which actually introduces the workings of the protocol.

It would be good if such a section presented this as well as
 - the fact that the routers do not track the membership of
   individual listeners (for scalability reasons)
 - the existence of state change reports as well as current status reports
 - the fact that there are 3 different types of v2 queries
 - the fact that a single (multicast) router on a link is
   elected the querier and it is the only node issuing queries
 - general queries are sent periodically 
 - mc addr and source specific queries are sent in response
   to detecting state changes in a listener
 - the workings of the "suppress" flag would be useful to explain
   at a high level
 - the assumption about robustness and fast leaves
 - the fact that MLD operates independently for each interface - all
   state is per interface

Currently even with a good understanding of MLDv1 it is very hard to
read this specification and get a good grasp of the protocol.
In some cases it requires reverse engineering the overall operation
of the protocol from some minute detailed rules in the bowels of the spec.
This is not good.
Even a new reader (that has never read MLDv1) should be able to get an
overview of how the protocol works without having to read the details.

Minor issues:
-------------

Section 2 has MUST requirements on the service interface for instance,
"MUST NOT be less than 64 addresses per list".
Why is it important for the protocol specification to place such
requirements on the API?
It seems out of place.

There seems to be a section in 4.1 about "source address for queries"
that is missing.
Presumably, as in draft-ietf-mld-source, the query must be sent with a 
link-local source thus an unspecified source is not allowed.
If not then the destination address rule in 4.1.14 would allow
off-link queries.
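
To make the missing rule concrete, here is a minimal Python sketch of the
receive-side check, assuming the rule is "link-local source only" as in
draft-ietf-mld-source. The function name is hypothetical and does not come
from the draft.

```python
import ipaddress


def query_source_is_acceptable(src: str) -> bool:
    """Accept a Query only if its IPv6 source address is link-local.

    Assumption: the unspecified address (::) is not allowed for Queries.
    It lies outside fe80::/10, so the link-local test already rejects it.
    """
    return ipaddress.IPv6Address(src).is_link_local
```

With a check like this, a query forwarded from off-link with a global or
unspecified source is dropped regardless of its destination address.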

What is the motivation for 4.1.14 allowing unicast addresses?
When I read this and appendix A.2 I thought there was some new "fast leave"
scheme that involved targeting reports at individual nodes but there
is no such thing. If the query to unicast is useful for e.g. debugging it
would be good to say that in section 4.1.14.

4.2.13 says
	Routers MUST ignore a report with an unspecified source.
This requires a rule that hosts only send such reports for link-local groups
and some explanatory text in 4.1.13 that the purpose of such reports
is to make MLD snooping switches see that the host is interested
in the group. Otherwise the host can think it sent a report
for some non-ll group while the router ignored that report.
Hence join latency before the host has assigned a link-local address
would suffer.

5.2 doesn't make it clear what happens when a node receives
a multicast address specific (or multicast address and source specific) 
query and the interface has no record for that group.
It seems to say to schedule the multicast address timer
but then not send the report when the timer fires. That seems suboptimal
unless there is something I don't understand here.
Is there a requirement that all queries (even for addresses which
the node is not interested in) be retained until the timer fires?
Is there some subtle ordering dependency that would make things fail
if the listener compared the source list when the query was received?
For instance, could it report the wrong thing if its source list changed
while the timer was running?

Section 6.2 specifies the "multicast address timer".
As far as I understand, the semantics of this timer
are just those of the "exclude mode" timer. Or are its semantics more tricky
than that?

I think it would give a more comprehensible discussion if
section 6.2 specified a separate "include source list"
and "exclude source list" instead of having a single list and
interpreting its content based on the source timer.
An example of possible confusion shows up in 6.5 where it says
   source records.  Source records whose timers are zero (from the
   previous EXCLUDE mode) are deleted.
A source record could presumably have had its timer decrement to
zero without it being an EXCLUDE source record
(yes, per section 6.2.3 such a source record should be removed,
but the delete might not be immediate.)
I realize this part of the protocol works even with this
confusion, but this is one more thing which makes the protocol
hard to understand.

The rules for when a source record as well as a multicast record
can be deleted are hidden in various rules e.g. in section 6.2.3.
It would be clearer to be able to say in one place
	A multicast record in include mode with an empty source
	list carries no semantic information hence it can be deleted.
	Thus any time a record transitions to this state it is deleted
	by the router.
and corresponding text for the source record.
That would simplify the Actions/Comments in the state tables.
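
A minimal Python sketch of what such a single deletion rule could look like
for the multicast record. The class, field, and function names are
hypothetical; the draft defines no such structures.

```python
from dataclasses import dataclass, field


@dataclass
class AddressRecord:
    """Hypothetical router-side state for one multicast address on one link."""
    filter_mode: str = "INCLUDE"               # "INCLUDE" or "EXCLUDE"
    sources: set = field(default_factory=set)  # source addresses in the list

    def carries_no_information(self) -> bool:
        # INCLUDE mode with an empty source list means nobody wants anything.
        return self.filter_mode == "INCLUDE" and not self.sources


def prune(records: dict) -> None:
    """Delete every record that has reached the empty-INCLUDE state."""
    for group in [g for g, r in records.items() if r.carries_no_information()]:
        del records[group]
```

Calling prune() after every state transition would replace the individual
"delete multicast address record" remarks scattered through the tables.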

Section 6.2.2 says
         INCLUDE             Timer >= 0          All listeners in 
                                                 INCLUDE mode.
even though the text above says that the multicast address timer
is not used (presumably never set, decremented, or fires) in INCLUDE
mode. Thus it would make sense to say "Don't care" or "Not used"
instead of "Timer >= 0".

It would be very helpful if section 6.2.2 could say when the multicast address
timer is started/restarted.
I *think* this is each time a report is received in IS_EX or TO_EX state
but I can't find that (and which value is entered in the timer)
spelled out anywhere.

The table in section 6.2.3 looks inconsistent to me.
The two elements:
        INCLUDE            TIMER == 0        Suggest to stop forwarding
                                             traffic from source and
                                             remove source record. If
                                             there are no more source
                                             records, delete multicast
                                             address record      

        INCLUDE         No source element    Suggest to not forward
                                             traffic from source

Presumably the two above are the same. When the source timer
becomes zero in include mode, the individual source record can be
deleted. And when all source records are deleted the multicast
address record can be deleted. But the second item above does not state
"and delete multicast address record". Is this just an editorial
miss?

Furthermore section 6.2.3 has:
        EXCLUDE            TIMER == 0        Suggest to not forward
                                             traffic from source
                                             (DO NOT remove record)
Which record is not to be removed? The source address record or
the multicast address record?

Section 6.4.2 and 6.6.2 talk about "Querier state" but this state
on the router is never defined.
I understand that it is the result of the querier election
that sets the querier state, but I don't expect other readers
to be that good at reading between the lines.
Perhaps querier election and querier state should be defined early
in section 6 instead of towards the end i.e. move 6.6.2 earlier?

Section 6.6.1 and elsewhere talks about "Suppress router-side processing flag".
The only motivation and high-level explanation of this flag is 
in appendix B:
   o The S flag (Suppress Router-Side Processing) is included in queries
     in order to fix robustness issues.
It would be useful for understanding if the issue and solution were
explained in more detail at the higher level. I feel like I'm reading
uncommented assembly code trying to deduce the programmer's intent here!

Section 6.6.2 talks of a "lower IPv6 address". I suspect some more
detail would be useful - even for a 32 bit quantity we have byte-order
issues thus how to compare a 128 bit quantity as an integer might
be subject to different interpretations.
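
One unambiguous reading is to treat each address as an unsigned 128-bit
integer in network byte order. The Python sketch below assumes that
interpretation (the draft does not state it) and uses a hypothetical
function name.

```python
import ipaddress


def lower_ipv6_address(a: str, b: str) -> str:
    """Return whichever address is numerically lower.

    Each address is read as an unsigned 128-bit integer in network byte
    order, i.e. the first octet on the wire is the most significant.
    """
    ia = int.from_bytes(ipaddress.IPv6Address(a).packed, "big")
    ib = int.from_bytes(ipaddress.IPv6Address(b).packed, "big")
    return a if ia <= ib else b
```

Under this reading fe80::1 is lower than fe80::100. Reading the same 16
octets as a little-endian integer reverses that result, which is exactly
the kind of disagreement the text should rule out.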

Section 6.6.2 reads as if the elected querier sends general queries but other
routers might send other queries. I assume that the elected querier is the
only node sending queries. Is this correct?
If so it makes sense to append "as well as other queries." to the last
sentence in the first paragraph.

The second paragraph in 6.6.2 looks inconsistent with section 7.3.1.
The former says that the router should automatically switch to
MLDv1 queries if it hears a v1 query, but the latter talks about
this being administratively assured. At a minimum this is rather confusing.
What should an implementation do?

The description of detecting an old router in 7.2.1 seems rather complex
and hard to read. I understand from section 8 that this is just
a case of 
	when an old query is received switch to old mode and (re)start
	a timer.
	if the timer fires switch to new mode
but I can't parse that from the text in 7.2.1.
(I suspect the complexity is a carry over from IGMPv3 which deals with
3 versions.) Saying "older" isn't needed - "v1" is more specific in
the case of MLDv2.
Same for section 7.3.2.
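
The two-line rule above can be sketched in Python as follows. This is only
an illustration of the simple reading: the class and method names are
hypothetical, and the timeout is left as a parameter rather than a value
taken from the draft.

```python
class CompatState:
    """Hypothetical per-interface MLD version state (names are illustrative)."""

    def __init__(self, timeout: float):
        self.timeout = timeout    # how long to stay in v1 mode after a v1 query
        self.v1_deadline = None   # None means the timer is not running

    def on_v1_query(self, now: float) -> None:
        # Switch to (or stay in) v1 mode and (re)start the timer.
        self.v1_deadline = now + self.timeout

    def mode(self, now: float) -> str:
        # Once the timer has fired, fall back to v2 mode.
        if self.v1_deadline is not None and now >= self.v1_deadline:
            self.v1_deadline = None
        return "v1" if self.v1_deadline is not None else "v2"
```

If the text in 7.2.1 intends anything beyond this, that is the part that
needs to be spelled out.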

Section 9 says that the use of the unspecified source
address protects against off-link packets. This isn't true since
there is nothing in any IPv6 specification that says that routers
MUST NOT forward packets with an unspecified source address.
Thus the security considerations section should mention the issue
of how the protocol handles off-link reports with an unspecified
source address. (The fact that such reports are ignored by the router
means that this doesn't add any security hole - but the issue should
be explained and documented. MLD snooping switches might have a
security issue with off-link unspecified source reports though!)

Editorial nits:
---------------

Section 1 says:
   This document specifies Version 2 of MLD.  The previous version of
   MLD became an Internet Standard and is specified in [RFC 2710].  In
MLDv1 is a proposed standard and the above can be read as it being
a full standard. How about dropping "became an Internet Standard and"?

Section 3.1:
   o If the requested filter mode is INCLUDE *and* the requested source
     list is empty, then the entry corresponding to the requested
     interface and multicast address is deleted if present.  If no
     such entry is present, the request is ignored.
I think "has no effect" is more accurate than "is ignored".
For instance, some particular API might choose to return an error
in such a case which doesn't sound like the request would be ignored.
I don't think this specification should preclude there from being such
an API.

Section 3.2 mostly talks about the interface state but it also
has this important note:
   but not on socket s2. Note that MLDv2 messages are not subject to
   source filtering and must always be processed by hosts and routers.
Shouldn't this be specified more visibly somewhere else?

Section 3.2 continues with
   Filtering of packets based upon a socket's multicast reception state
   is a new feature of this service interface.  The previous service
   interface described no filtering based upon multicast listening
   state; rather, a Start Listening operation on a socket simply caused
   the node to start to listen to a multicast address on the given
   interface, and packets sent to that multicast address could be
   delivered to all sockets whether they had started to listen or not.
The above doesn't match reality.
While filtering based on the socket isn't required by any IETF standard
(and I think requiring it is the new thing) many implementations have
some form of filtering. In some cases it is as simple as: unless
the socket has issued at least one add_membership type operation
it will not receive any multicast packets (even if other sockets
have joined the group). In other cases it might be full-fledged
filtering of groups per socket.
A possible fix is to just add "Requiring " before "Filtering of packets ...".

Section 4:
   link-local IPv6 Source Address (or the unspecified address, if the
   node has not yet acquired such an address), an IPv6 Hop Limit of 1,
It is the *interface* and not the *node* that needs to
have a link-local address assigned.
s/node/sending interface/
Same issue in section 4.2.13.

The document seems inconsistent on whether the unspecified address
can be used for v2 queries, but I think the intent is that
they only be allowed on v2 reports. If so it makes sense to
clarify the above sentence and also search for "unspecified"
and make the other text consistent.

Section 4.2.12 says:
       1    MODE_IS_INCLUDE - indicates that the interface has a filter
            mode of INCLUDE for the specified multicast address. The
            Source Address [i] fields in this Multicast Address Record
            contain the interface's source list for the specified
            multicast address, if it is non-empty.
This reads as if MODE_IS_INCLUDE reports are sent even if the
source address list is empty, but that is not the case AFAICT.
Thus it makes sense to clarify this to say that MODE_IS_INCLUDE
is never sent with an empty list.

Section 4.2.12 says:
   o A "Filter Mode Change Record" is sent by a node whenever a local
     invocation of IPv6MulticastListen causes a change of the filter
     mode (i.e., a change from INCLUDE to EXCLUDE, or from EXCLUDE to
     INCLUDE) of the interface-level state entry for a particular
     multicast address. The Record is included in a Report sent from the
     interface on which the change occurred.  The Record Type of a
     Filter Mode Change Record may be one of the following two values:
It makes it clear later, but it would be good to add
"whether or not the source list changes at the same time"
to the above.

Section 4.2.12 says:
   Unrecognized Record Type values MUST be silently ignored.

I assume this means more than just ignoring the type field when it
isn't recognized. The question is whether it means that the
whole report be ignored, or just the multicast address record
with the unrecognized type. Which one is it?
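
The two readings behave differently, as this small Python sketch shows. The
record shape (type, address) and the function name are hypothetical, and the
set of known types is passed in rather than assumed from the draft.

```python
def accepted_records(records, known_types, drop_whole_report):
    """Return the records a router would act on.

    records: list of (record_type, multicast_address) tuples.
    drop_whole_report=True:  one unknown type voids the entire report.
    drop_whole_report=False: only the unknown records are skipped.
    """
    has_unknown = any(rtype not in known_types for rtype, _ in records)
    if has_unknown and drop_whole_report:
        return []
    return [r for r in records if r[0] in known_types]
```

A report carrying one known and one unknown record yields nothing under the
first reading and the known record under the second, so the choice affects
forward compatibility when new record types are defined.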

Section 5 says
   There are two types of events that trigger MLDv2 protocol actions on 
   an interface:
Presumably there is a set of timers that, when firing, trigger
MLD protocol actions as well.
It would make sense to list those timers here.

Changing this means it makes sense to add a section 5.3 about
interface timer firing and 5.4 about multicast address timer firing.

Section 5:
   (Received MLD messages of types other than Query are silently    
   ignored, except as required for interoperation with the earlier 
   version of MLD.)
Replace "earlier version of MLD" with MLDv1 and look
for other occurrences of "version" where the same thing applies.
(I understand that this type of language makes sense for IGMP
since there are two earlier versions.)

Section 5.2:
   When a new Query with the Router Alert option arrives on an
The "router alert" is checked above.
Using the expression "valid query" throughout the section captures
the validity checks at the beginning of the section.

Section 6.2.2 says in the first paragraph
	kept per multicast address per attached link.
but the "per attached link" doesn't appear in section 6.2.3 for the
source timer which looks confusing.
It would be better to say
	kept per multicast address record.

Section 9.1:
Change "MLDv2 receivers" to "MLDv2 listeners".

Section 10.
It would make sense to ask the IANA to assign a *link local scope* multicast
address.
It would make sense to say ICMPv6 instead of ICMP in this section
as well.

The specification potentially introduces new name spaces.
At least the multicast address record type is such a name space.
It would make sense to state this in the IANA considerations section.
(I'm assuming that the assignment of new types would be restricted to standards
action.)

Section A.2 #1 made me believe that routers in this protocol
would track membership per host which they clearly don't do.
And doing it per host doesn't seem to be an improvement.
For instance, asking a host whether it wants group X would 
still require a timeout since the listener does not respond to queries
for groups in which it is not interested.
So this argument is weak at best.
What could be done without host suppression is for the router to
count the number of hosts interested in a particular group
(at least for include mode - haven't thought about exclude mode)
which makes it easier for the router to detect when it is likely
that the last listener has unsubscribed without having to query for
every leave.

Appendix A.2 #2 seems to have an odd conclusion as the last sentence.
MLDv2 mandates compatibility with MLDv1, and MLDv1 has host suppression.
Thus I don't see how the snooping switches can be made simpler by removing
anything at this point in time. At some future date when the v1 compatibility
is removed that might be the case.
A point that is missing is that in the presence of MLD snooping switches
the utility of host suppression is zero. That coupled with #3 and #4
seems to make a strong case for removing host suppression.
But #1 and the current #2 seem weak arguments at best.

Appendix B
Change "65,535 seconds" to either "65,535 milliseconds" or "65 seconds".

   Erik
