RE: [NSIS] AD review: draft-ietf-nsis-ntlp-09 (minor issues and nits)

Hi Magnus,

Many thanks for the detailed review. I think I at least understand
the bulk of the issues you raise, which is a good omen.

This email has some initial comments on the minor issues/nits since
a lot of them can be cleared up fairly easily. Anything that 
generates further discussion I will capture explicitly in our
issue tracker as/when it happens. I'll write separate emails
on the major issues M1-M4.

Cheers,

robert h.

> -----Original Message-----
> From: Magnus Westerlund [mailto:magnus.westerlund@ericsson.com] 
> Sent: 01 June 2006 15:23
> To: nsis@ietf.org
> Subject: [NSIS] AD review: draft-ietf-nsis-ntlp-09
> 
> 
> Hi,
> 
> I have finally finished my review of GIST. I do have a number of 
> comments and question with potentially serious implications. So lets 
> start with them. Please remember that I have only read it 
> once plus some 
> checks of it when writing up things. So there might be cases when a 
> simple reference to where things are defined can resolve the issue.
> 
> I will also be on vacation 3rd-11th of June so do not expect any 
> response from me during this period.
> 
> Major issues
> ------------
> 
> M1. Congestion Control
> M2. Security functions for d-mode 
> M3. NAT traversal for GIST.
> M4. IANA consideration section

These will be covered in separate mails.

> 
> Minor Issues
> ------------
> 
> L1. The lack of a collected ABNF makes it hard to verify. Has the 
> present ABNF been run through any of the syntax checkers available at:
> http://tools.ietf.org/inventory/verif-tools ?

Earlier versions were certainly checked. The latest version passes
Bill Fenner's checker, with the exception that there are two
common-headers, the second in the error message format. That will
be fixed.

> 
> L2. Section 3.3, page 12, second paragraph:
> "  In addition, it should be noted that NAT traversal almost certainly
>     requires transformation of the MRI field in GIST messages (see
>     Section 7.2).  Although the transformation does not have to be
>     defined as part of the standard, the impact on existing GIST-aware
>     NAT implementations should be considered."
> 
> It seems that not defining the NAT behavior for GIST can potentially 
> cause interop problems and make it very hard to use. What is 
> the plans 
> for addressing this? Are there already existing GIST-aware NATs?

See the (separate) answer on M3 (to come).

> 
> L3. Section 4.1.2:
> "Reliability: This attribute may be 'true' or 'false'.  For the case
>        'true', messages MUST be delivered to the signaling 
> application in
>        the peer exactly once or not at all; if there is a 
> chance that the
>        message was not delivered, an error MUST be indicated 
> to the local
>        signaling application identifying the routing 
> information for the
>        message in question."
> 
> How does one evaluate "if there is a chance that the message was not 
> delivered"? I think there is need for some clear definition 
> on when the 
> normative requirement applies.

This really depends on the protocol being used. In the current 
specification, you can only use Forwards-TCP to satisfy the reliable
attribute, so I think there should be a reference to 5.7.2 to say
how a TCP user can detect that a message might not have been delivered.
This then becomes part of the rules for defining a new MA protocol
(if you use it for reliability, indicate how failures are detected).

> 
> L4. Have anyone considered if one can run draft-ietf-pmtud-method-06 
> style of MTU discovery with GIST D-mode?

In theory this would be possible; however, the design rationale for
splitting C/D modes suggests it's the wrong thing to do. The idea is
to make the Query small and ideally robust against PMTU issues (since
you have to do a Query for each flow), and amortise the cost of PMTUD
over the lifetime of a MA (which can support multiple flows). We are
also assuming that solid implementations of PMTUD get re-used by GIST
implementations (because they are built into the MA protocols), rather
than being built into GIST directly, so advances in PMTUD get
incorporated automatically.

> 
> L5. Section 5.1, page 33:
> "It MUST echo the
>     MRI (with inverted direction), SID, and Responder-Cookie if the
>     Response carried one;"
> 
> I don't think "with inverted direction" is really clear here. 
> What does 
> it mean to invert the direction of the MRI?

Change D=0 to D=1 or vice versa. Will clarify.
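
To make the intent concrete, here is a minimal Python sketch of
inverting the direction. The Mri class and its field names are purely
illustrative (they are not the wire format from the draft); the only
point is that the D flag flips and nothing else in the MRI changes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mri:
    """Hypothetical, simplified MRI holding only what the example needs."""
    source: str
    destination: str
    d_flag: int  # 0 = downstream, 1 = upstream


def invert_direction(mri: Mri) -> Mri:
    """Return a copy of the MRI with only the D flag flipped."""
    return replace(mri, d_flag=1 - mri.d_flag)
```

Applying invert_direction twice gives back the original MRI, which is
the property the echoing node relies on.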

> 
> L6. Section 5.2.1 and A.1:
> 
> I have some problems with that Section 5.2.1 first declares 
> that these 
> fields are present, while not defining them fully. Like nr of 
> bits etc. 
> Then section A.1 is also not a complete definition of the 
> fields in the 
> message. I think doing an informative reference in 5.2.1 is okay, and 
> then actually have a complete definition in A.1.

OK

> 
> I would also like to point out that unless you always write out the 
> number of bits a field have, and the fields in the order they 
> appear in 
> the graph you will confuse people reading it with text to speech 
> synthesis, like Sam Hartman.

OK

> 
> Do consider that these comments applies to also other 
> protocol elements 
> defined in appendix A, such as A.3.3.

OK

> 
> L7. Section 5.2.2:
> "If the message is downstream, the IP-TTL MUST be set to the TTL
>           that will be set in the IP header for the message 
> (if this can
>           be determined), or else set to 0."
> 
> How is a message "downstream" I think you need to specify if this 
> relates being sent in the downstream direction.

OK, will clarify.

> 
> L8. Section 5.3.2:
> 
> "   GIST may send messages addressed as {flow sender, flow receiver}
>     which could make their way to the flow receiver even if 
> that receiver
>     were GIST-unaware.  These should be rejected (with an 
> ICMP message)
>     rather than delivered to the user application (which 
> would be unable
>     to use the source address to identify it as not being part of the
>     normal data flow).  Therefore, a "well-known" port is required.
> "
> 
> I am having difficulties understanding how the end-peer could make a 
> difference between GIST messages going the whole way to the 
> end-peer per 
> design and those that should be rejected by the end-peer. 
> Please explain 
> what the intention here is.

OK, an example:
GIST gets assigned the port number 5682.
A pair of endpoints (A, B) set up a media flow A->B. B is
GIST-unaware, and the application gets allocated port 5682 by the
node's OS (just by bad luck).
As part of GIST signalling initiated by A, an intermediate node C
sends a Query which is received by B.

This query now has source IP# of A and destination IP# of B and the 
same destination port as the media flow. So it may well be delivered 
to the media application (and look like garbage); if B checks the
source port, the problem will only arise if C happens to choose the
same source port for GIST as A did for the media flow.

It may be that we don't have to worry about this in real life (i.e.
we accept that occasionally bad things happen). Else it should be
explained better ;-)
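
The same example as a small Python sketch, in case it helps. Everything
here is illustrative: the Datagram class, the reaches_media_app helper
and the port numbers are made up for this mail, and 5682 is just the
number used above, not an assigned value. It models a GIST-unaware
host B that demultiplexes on destination address and port, with an
optional source-port check by the application.

```python
from dataclasses import dataclass

GIST_PORT = 5682  # number from the example above, not an assigned value


@dataclass(frozen=True)
class Datagram:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def reaches_media_app(pkt: Datagram, media_flow: Datagram,
                      check_source_port: bool) -> bool:
    """True if B would hand pkt to the media application.

    The Query carries A's address as source, so a source-address check
    would not help; only the source port can tell the two apart.
    """
    if (pkt.dst_ip, pkt.dst_port) != (media_flow.dst_ip, media_flow.dst_port):
        return False
    if check_source_port and pkt.src_port != media_flow.src_port:
        return False
    return True


media = Datagram("A", 40000, "B", GIST_PORT)   # the A->B media flow
query = Datagram("A", 40001, "B", GIST_PORT)   # Query sent by C, addressed as A->B
```

With these values, reaches_media_app(query, media, False) is True (the
Query lands in the application), while with the source-port check
enabled it is only True if C happened to pick port 40000 as well.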

> 
> L9. Section 5.4.1: DCCP:
> The note on DCCP forgets one important factor. DCCP does not 
> provide any 
> reliability.

It was implicit in the comment on fragmentation ... but we can
make it explicit.

> 
> L10. Section 5.7.1:
> "However, the object
>     contents MUST be retained only for the duration of the 
> Query/Response
>     exchange and any following association setup, and afterwards
>     discarded."
> 
> Is "afterwards" only to referring to completed or failed 
> association setup?

Both. Probably we should define another default time here (cf. M1).

> 
> L11. Section 5.7.1:
> "The responding node MUST verify this to ensure that no bidding-down
>     attack has occurred; see Section 8.6 for further discussion."
> 
> As section 8.6 states this is not sufficient for detecting a 
> on-path man 
> in the middle attack. So I think the wording the the above 
> sentence is 
> to strong.

I hope not: 8.6 is supposed to say that on-path man-in-the-middle 
attacks can be reliably detected (modulo the residual threat listed
in section 5.7). But maybe the correct phraseology would be along
the lines of saying that "*if* a bidding down attack is detected after
checking the SCD, then ..." (i.e. we accept that there may be false
negatives but that the MUST applies to the result of the verification).

> 
> L12. Section 5.7.2:
> 
> What address is used in the TCP connection. The port is specified but 
> not the address.

Destination address: as returned in the Network-Layer-Information in
the Response.
Source address: free.

> 
> L13. Section 5.7.3:
> 
> Why is TLS 1.0 made mandatory and not TLS 1.1?

Earlier, TLS1.1 was the mandatory one; but there was some expert
advice to the contrary; see
http://www1.ietf.org/mail-archive/web/nsis/current/msg05682.html.

> 
> L14. Section 5.8.1.2:
> "In this case, it MUST use the signaling
>     source address for the IP source address in order to 
> receive the ICMP
>     error."
> 
> How can a sender of message sending a query using d-mode with 
> the flow 
> source address determine that it is a to small MTU that causes the 
> messages to be lost, rather than anything else?  I think the 
> use of MUST 
> in this sentence is wrong. Is it something more than a informative 
> statement saying: To be able to receive ICMP messages the 
> message sender 
> needs to use his own address.

OK; I think there is still a MUST here but the MUST applies to any
transmission with DF set (we don't care why DF was set, but if it
was, you MUST use your own source address).

> 
> L15. Section 5.8.1.3:
> 
> Can someone please explain why an upstream request which has 
> traversed a 
> non-GIST aware router can be trusted less than a down stream one?

Routing asymmetry and the (in)ability to apply ingress filtering.
It may be that we don't need to specify this tie-breaker aspect,
however.

> 
> L16. Section 6.4, Rule 4:
> 
>     Rule 4:
> 
>     send MA-Hello message
>     restart NoHello timer
> 
> why is the timeout of a send Hallo message timer reseting the NoHello 
> timer? Shouldn't the NoHello timer be only based on receiving Hello 
> messages from the other side?

I suspect you are right here (should be the SendHello timer).
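
A rough Python sketch of the timer handling as I now think it should
be, assuming the correction is right. The MaTimers class, its method
names and the interval values are hypothetical; the draft does not
define them. The point is only that sending an MA-Hello restarts the
SendHello timer, while NoHello is restarted solely by traffic received
from the peer.

```python
class MaTimers:
    """Hypothetical per-MA timer state, using explicit timestamps."""

    def __init__(self, send_interval: float, no_hello_timeout: float,
                 now: float = 0.0) -> None:
        self.send_interval = send_interval
        self.no_hello_timeout = no_hello_timeout
        self.send_hello_deadline = now + send_interval
        self.no_hello_deadline = now + no_hello_timeout
        self.hellos_sent = 0

    def on_send_hello_timeout(self, now: float) -> None:
        """SendHello fired: send MA-Hello and restart SendHello only."""
        self.hellos_sent += 1  # stands in for actually sending MA-Hello
        self.send_hello_deadline = now + self.send_interval

    def on_hello_received(self, now: float) -> None:
        """Peer was heard from: this is the only event restarting NoHello."""
        self.no_hello_deadline = now + self.no_hello_timeout

    def peer_presumed_gone(self, now: float) -> bool:
        return now >= self.no_hello_deadline
```

With this split, a node that keeps sending Hellos into the void still
notices that the peer has gone once NoHello expires.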

> 
> Also what is the recommended Hello frequency? It would be 
> good to have 
> some discussion on what reasonable values are. This is also 
> part of the 
> congestion discussion for GIST.

Partly true; however, I think the issues here are less critical,
since the Hello is only sent over MAs which are themselves congestion
controlled.

> 
> Section 7.1.2: GIST probing:
> 
> One more missing recommendation on values for protocol functionality.

See separate answer to M1.

> 
> L17. Section 7.1.4:
> 
> Bullet 1: "The signaling
>         application at E1 MAY begin local repair immediately, or MAY
>         propagate a notification upstream to D1 that re-routing has
>         occurred."
> 
> There seems to be a interaction between GIST and the signalling 
> application about the notification on the change. Are there any 
> recommendations on signalling application implementing the necessary 
> semantics to perform that notification?

Yes, but only implicitly and in the NSLP specifications. Possibly there
should be more explicit generic guidance on NSLP design, but it has not
been clear up to now where the correct place to capture this
information is. (That is, to a large extent I guess it would be
preferable if NSLP designers did not have to understand everything in
GIST to be able to design an NSLP properly.)

> 
> L18. Section 7.2, page 73, last paragraph:
> 
>     A NAT will intercept datagram mode messages with the normal
>     encapsulation containing such echoed NAT-Traversal objects.
> 
> It is not obvious that a NAT will do this for d-mode. Isn't a 
> NAT which 
> is not made GIST aware simply going to translate the IP/UDP 
> headers and 
> forward the packet despite the router alert.

Non-GIST aware NATs are out of scope.

> 
> L19. Section 7.2, page 74:
> 
> "Note that Confirm messages are not translated."
> 
> Why isn't them? Also where is that specified? And what is 
> really meant 
> with "not translated"?

See separate answer to M3.

> 
> L20. Section A.2:
> The "r" bits are not explicitly written to fall under the 
> rules in the 
> last paragraph of that section.

True (!)

> 
> L21. Section A.2.1:
> 
> What is the treatment of the "reserverd" AB=11 for a GIST node?

Good question; I suspect this should be an error case.

> 
> L22. Section A.3.1:
> 
> The "MRM-ID" field is not having the same name as in the IANA 
> consideration section.

OK.

> 
> L23. Section A.3.1.2:
> "D - Direction (always 0 for "downstream")"
> 
> The use of "always" in the above make it unclear what is meant.

The case D=1 is not allowed by the specification; we can fix
with a reference to 5.8.2.2.

> 
> L24. Section A.3.8:
> "the number of GIST payloads translated by the NAT"
> 
> The NAT? There might be several NATs on the path and it is 
> not clear how 
> that is handled. One concern would be if different NATs translate 
> different fields.

See separate answer to M3. (There is an answer to this point
but the spec needs an update to include it.)

> 
> L25. Section 4.1
> 
> The Info Count Field and Info fields. I think the rules regarding 
> ordering of these are to weak. As the order and existence of 
> fields are 
> dependent on the error codes and sub-error code,  it is important to 
> clarify that with normative strength. So please state the rules that 
> applies for resolving the additional data fields and for each 
> error code 
> be clearer on the additional data field. Being explicit in 
> stating none 
> for those that have none is a good idea.

The idea is that the "template" for the sections A.4.4.x tells you
this pretty strictly by virtue of the "Additional Info: " line given
for each error condition. But we can try to make that more formal in
some way.

> 
> 
> Nits
> ----
> 
> N1. There is quite a list of abbreviation used in this 
> document without 
> being spelled out at their first usage. Please correct. I 
> have spotted 
> the following ones:
> - NSLPID
> - RAO
> - SCD
> - N-SM
> - C-mode
> - Q-mode

OK.

> 
> N2. Section 3.4, last paragraph:
> 
> "A consequence of this
>     design is that signaling applications should choose SIDs 
> so that they
>     are cryptographically random, and should not use several 
> SIDs for the
>     same flow unless strictly necessary, to avoid additional load from
>     routing state maintenance."
> 
> I think there is need for a reference to a document on how to achieve 
> cryptographically random numbers. Is RFC 4086 a useful ref?

OK.

> 
> N3. section 6.1:
> 
> "   Rule 4 (rx_Data):
> 
>     if (node policy will only process Data messages with matching
>         routing state) then
>       send "No Routing State" error message
>     else
>       pass directly to NSLP"
> 
> It seems to me that the expression within the IF is wrongly 
> defined. I 
> think it should read:
> 
> if (routing state exist) then
>     pass to NSLP
> else
>     Send error msg.

Nope; if routing state already existed, this state machine would
not be invoked (the message goes straight to 6.2/6.3). Some messages
can still be processed even if no routing state exists, which is 
what this event is supposed to capture.
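
As a sketch of the intended split (Python, with made-up function name
and return strings; this is my reading of the text, not normative):
Rule 4 is only reached when no routing state matches, and the node
policy then decides between the error and direct delivery.

```python
def dispatch_data(has_routing_state: bool,
                  policy_requires_routing_state: bool) -> str:
    """Illustrative dispatch of a received Data message."""
    if has_routing_state:
        # Not handled by the 6.1 machine at all.
        return "handled by the 6.2/6.3 state machine"
    # From here on we are in Rule 4 (rx_Data) of section 6.1.
    if policy_requires_routing_state:
        return 'send "No Routing State" error message'
    return "pass directly to NSLP"
```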

> 
> 
> N4. figure 7: The established to established arrow has 
> "[!confirmRequired]" which is a bit confusing on what it 
> applies to of 
> the above cases.

OK.

> 
> 
> N5. Section 7.1.2:
> Please include a reference for OSPF.

OK.

> 
> N6. Section 7.2:
> 
> It uses "public" and "private" when referring to the 
> different sides of 
> a NAT. In BEHAVE WG they are using External and Internal instead. The 
> reason is due to that the external side of a NAT may still be 
> a private 
> address space.

OK.

> 
> N7. Section 11:
> Ref 6 and 8 needs to be updated

Yes for 8, but isn't 6 still the authoritative reference for TLS 1.0?
(So, this depends on the answer to L13.)

> 
> N8. Section A.3.5:
> "MUST not" -> "MUST NOT"

OK

> 
> N9. Section A.4.4.9: subcode 3:
> 
> "Invalid Object" -> "Invalid Object Type"?

OK.

> 
> Cheers
> 
> Magnus Westerlund
> 
> Multimedia Technologies, Ericsson Research EAB/TVA/A
> ----------------------------------------------------------------------
> Ericsson AB                | Phone +46 8 4048287
> Torshamsgatan 23           | Fax   +46 8 7575550
> S-164 80 Stockholm, Sweden | mailto: magnus.westerlund@ericsson.com
> 
> _______________________________________________
> nsis mailing list
> nsis@ietf.org
> https://www1.ietf.org/mailman/listinfo/nsis
> 
