Re: [core] Martin Stiemerling's Discuss on draft-ietf-core-coap-15: (with DISCUSS and COMMENT)

Carsten Bormann <cabo@tzi.org> Thu, 25 April 2013 11:26 UTC

Mime-Version: 1.0 (Mac OS X Mail 6.3 \(1503\))
Content-Type: text/plain; charset="windows-1252"
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <20130424092228.10345.76059.idtracker@ietfa.amsl.com>
Date: Thu, 25 Apr 2013 13:26:36 +0200
Content-Transfer-Encoding: quoted-printable
Message-Id: <8C88F8A7-9B07-4D82-8EA8-89794BD32EFC@tzi.org>
References: <20130424092228.10345.76059.idtracker@ietfa.amsl.com>
To: Martin Stiemerling <martin.stiemerling@neclab.eu>
Cc: draft-ietf-core-coap@tools.ietf.org, core-chairs@tools.ietf.org, The IESG <iesg@ietf.org>, core@ietf.org
Subject: Re: [core] Martin Stiemerling's Discuss on draft-ietf-core-coap-15: (with DISCUSS and COMMENT)
Precedence: list

Hi Martin,

thanks for this detailed review.
I have made some of the changes as simple editorial fixes; these are
marked like [1310] below and can be reviewed in
http://trac.tools.ietf.org/wg/core/trac/changeset/1310
(Overview in http://trac.tools.ietf.org/wg/core/trac/timeline).

Some of the replies are marked "-> Ticket": This means that the
authors think that the change is a good idea, but probably needs a bit
more discussion with the WG, so we will handle this as a ticket.

Regards, Carsten

        ----------------------------------------------------------------------
        DISCUSS:
        ----------------------------------------------------------------------

        A well-written document and I have a few points to discuss. 

        The congestion avoidance mechanisms look ok, but I assume we will get
        feedback from implementers and deployments on the parameters and
        mechanisms. It would be good to get this feedback documented at some
        point. 

Indeed, this will require active attention by the WG.
Fortunately, researchers are looking at this, and I expect
additional results to become available soon.

        Here are the issues (based on my own review and input from Joe Touch and
        Michael Scharf):

        1) IPv6 UDP checksum calculation
        It is not clear whether zero UDP checksums are permitted with CoAP.
        (UDP zero checksums:
        https://datatracker.ietf.org/doc/draft-ietf-6man-udpzero/)
        That should be specified.

        2) Handling of UDP-lite
        Can UDP-lite (RFC 3828) be used in conjunction with CoAP, or is it
        excluded?

Re 1 and 2: We just had a bit of discussion on the WG list, because we
never had considered this.  The consensus seems to be that CoAP will
be used on a wide variety of systems, and neither host support nor
support e.g. in RFC 6282 is available.  (Citing from the discussion:
"They seem to be specialized optimizations that are not well deployed
and somehow seem to add overall deployment complexity and performance
risk to the solution even if they provide some CPU reduction.")

I don't actually think we need to say anything new in the draft
because UDP is distinct from UDP-lite and we are not referencing 3828;
neither do we reference 6936-to-be, so we are stuck with the features
in 0768 and 2460.  (I also believe the discussion in 6936 puts CoAP out
of its own scope.)  But, of course, we would be open to suggested
text.

        3) Fragmentation of messages
        The recommendation in Section 4.6 about the path MTU is generally valid
        only for IPv6. For IPv4, 576 bytes is the safe area to work without
        fragmentation, though in today's WANs 1280 works perfectly, but I am not
        so sure about the networks envisioned for CoAP. These 576 bytes for IPv4
        are mentioned in the implementation note, but deserve text on the same
        level as for IPv6. 

IPv4 simply hasn't received a lot of attention here.  The more
normative text is about message size selection; there should be little
practical difference between IPv4 and IPv6 here.
The 576 byte MRU is more of a theoretical value.  IPv4 implementations
will have to live with IP layer fragmentation for the larger message
sizes just as 6LoWPAN will have to live with adaptation layer fragmentation.
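
To put rough numbers on the two cases, here is a small sketch of the UDP
payload budget for a single unfragmented packet.  It assumes minimum-size
headers (no IPv4 options, no IPv6 extension headers), and the helper name
is made up for illustration; 576 is the datagram size every IPv4 host must
be able to receive, and 1280 is the IPv6 minimum link MTU.

```python
# Header sizes are an assumption: no IPv4 options, no IPv6 extension headers.
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8


def udp_payload_budget(packet_size: int, ip_header: int) -> int:
    """Bytes left for a CoAP message inside one unfragmented IP packet."""
    return packet_size - ip_header - UDP_HEADER


ipv4_budget = udp_payload_budget(576, IPV4_HEADER)   # 548
ipv6_budget = udp_payload_budget(1280, IPV6_HEADER)  # 1232
print(ipv4_budget, ipv6_budget)
```

Any security layer such as DTLS would reduce these budgets further.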

        4) Ensuring no fragmentation with IPv4
        The implementation note in Section 4.6 states that for IPv4 it is 'harder
        to ensure that there is no IP fragmentation'. This neglects the
        possibility of using the Don't Fragment (DF) flag in the IPv4 header and
        also that there is possibly feedback from a node en route that the MTU is
        too big if the DF flag is set, i.e., by means of an ICMP error message. 
        Should there be any recommendation or protocol machinery to deal with
        path probing? E.g., referencing RFC 4821 (PMTUD). 

CoAP is meant to be operable without persistent state between
exchanges.  Normal operation of CoAP in constrained implementations
(if they even implement IPv4) will not use DF.  More advanced
implementations may be able to keep state about peers; it should be
pretty obvious how to do this (and will generally be combined with
establishing congestion control state).  I have added a reference to
RFC 4821 to the discussion of path MTU discovery [1310].

        5) Reaction to network errors that are signalled
        I wonder why the draft is not discussing any reaction to network failures
        signalled through ICMP messages. This relates also to my DISCUSS issue no
        4. 

We didn't find much guidance in existing UDP-based protocols on
handling ICMP messages.  RFC 5405 Section 3.7 is on a level of "can
utilize", and the practical problems of robustness and validation of
messages (including against attacks) make handling ICMP messages in
constrained implementations difficult.  In any case, even advanced
forms of ICMP handling are unlikely to impact CoAP protocol processing
beyond improved local error handling, so we believe the subject is
best left to a point in time when more operational experience is
available.

        6) Idempotency
        The discussion of idempotency is useful, but overlooks message order.
        I.e., the discussion appears to assume that a sequence of the same
        actions has the same effect as a single action, but this is true only if
        different sets of actions (from different sources, or copies of different
        actions from a single source) aren't interleaved. This should be
        addressed. 

The CoAP specification generally does not attempt to explain all the
relevant concepts of the Web, but defers to other specifications.
Section 9.1.2 of RFC2616 contains a discussion about sequences of
idempotent method executions.  Section 9.1 is explicitly referenced
from section 5.1, which is the main section discussing idempotence.
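
The interleaving concern can be illustrated with a toy model.  This is not
CoAP code: a dict stands in for server state, and the put helper is a
hypothetical stand-in for an idempotent method such as PUT.

```python
def put(store: dict, key: str, value: int) -> None:
    """Idempotent on its own: applying it twice in a row changes nothing more."""
    store[key] = value


# Repeating one request back to back is harmless.
once, twice = {}, {}
put(once, "x", 1)
put(twice, "x", 1)
put(twice, "x", 1)

# A late duplicate of the first PUT, arriving after a different PUT, is not.
in_order, interleaved = {}, {}
put(in_order, "x", 1)
put(in_order, "x", 2)
put(interleaved, "x", 1)
put(interleaved, "x", 2)
put(interleaved, "x", 1)  # duplicate of the first request

print(once == twice)          # True
print(in_order, interleaved)  # {'x': 2} {'x': 1}
```

The duplicate changes the final state only because a different request was
interleaved, which is the sequence issue RFC 2616 Section 9.1.2 discusses.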

        7) Protocol reactions to reserved or prohibited values
        Regarding reserved or prohibited values in the IANA section, it would be
        useful to be clear about what happens when those values are seen. I.e.,
        should they be ignored, generate an error, etc.

Good point.  We need to check this in detail.
-> Ticket.

        8) Flow Control/Receiver Buffer
        The protocol does not have any real means for the receiver to control the
        amount of data that is being sent to it. I can understand the attempt to
        provide a simple protocol, but adding a very basic flow control mechanism
        will not prohibitively increase the complexity of the protocol, while
        improving robustness. 
        According to Section 2.1, a node can always return a RST if the message
        cannot be processed for whatever reason. 
        I propose to add an option to the RST message that allows the message
        receiver to state how much data it is willing to accept from a particular
        sender or in general (up to the implementation).  

(RST messages are empty messages and cannot have options.)  CoAP
servers currently perform load shedding by not reacting to an incoming
message at all.  Note that a 5.03 error can also set the Max-Age
option in place of the "Retry-After" known from HTTP (section
5.9.3.4).  There has been discussion on more explicit feedback for
load shedding, e.g.,
draft-greevenbosch-core-minimum-request-interval-00; currently, the WG
feels that finding a good solution (or even understanding the problem
space) for this requires more study (see minutes from Orlando, where
we discussed Bert's draft).

        9) Handling of wrapping message IDs
        According to Section 4.4.:
        "The same Message ID MUST NOT be re-used (in communicating with the same
        endpoint) within the EXCHANGE_LIFETIME (Section 4.8.2)" with
        EXCHANGE_LIFETIME of 247s. 
        By now it is unrealistic that the message ID of 16 bits will wrap around
        in that time frame, but protocols live long and at some later time it can
        be possible. 
        However, the protocol doesn't have any means to detect wrapped message
        IDs.

Indeed, the onus is on the sender to ensure that the Message ID does
not wrap around within EXCHANGE_LIFETIME.  In contrast to, say, the
IPv4 IP ID, the potential problem of Message ID reuse has been
well-highlighted, and it is receiving additional attention in the LWIG
drafts that are starting to provide guidance on CoAP implementation.
Implementations that need more than ~ 250 messages per second (per
peer endpoint) may need to use multiple source endpoints.
We don't think much more can be or should be done here.
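
As a rough check of the "~ 250 messages per second" figure, the sketch
below derives EXCHANGE_LIFETIME and the resulting Message ID budget.  The
parameter values are assumed to be the defaults from Section 4.8 of the
draft and should be checked against the current text; only the 247 s
result is quoted above.

```python
# Assumed default transmission parameters (draft Section 4.8).
ACK_TIMEOUT = 2            # seconds
ACK_RANDOM_FACTOR = 1.5
MAX_RETRANSMIT = 4
MAX_LATENCY = 100          # seconds
PROCESSING_DELAY = ACK_TIMEOUT

# Time from first transmission to last retransmission of a Confirmable message.
MAX_TRANSMIT_SPAN = ACK_TIMEOUT * (2 ** MAX_RETRANSMIT - 1) * ACK_RANDOM_FACTOR
EXCHANGE_LIFETIME = MAX_TRANSMIT_SPAN + 2 * MAX_LATENCY + PROCESSING_DELAY

# A 16-bit Message ID must not repeat towards one peer within EXCHANGE_LIFETIME.
MESSAGE_ID_SPACE = 2 ** 16
max_rate = MESSAGE_ID_SPACE / EXCHANGE_LIFETIME

print(EXCHANGE_LIFETIME, round(max_rate, 1))  # 247.0 265.3
```

With these assumptions the ceiling is about 265 messages per second per
peer endpoint, consistent with the rounded figure above.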

        ----------------------------------------------------------------------
        COMMENT:
        ----------------------------------------------------------------------

        1) Endpoint vs. host 
        This document uses the term "endpoint" to refer to the combination of
        address and port, and possibly also security association, that is local
        to one end of an association. I would have expected the more common term
        "socket", as originated in TCP parlance, to be used instead (even though
        here the term is used in a connectionless context). 

Most implementers have a quite different idea of a "socket", so this
language would be rather confusing for them.  The authors might have
used "transport address", but "endpoint" seemed shorter.

        2) Reaction to network errors due to local link errors
        Link layers can give some hints if the link is up, down, etc.
        Traditionally, this has not been taken into account too much when
        designing transport protocols, but wouldn't it make sense to take it into
        account for CoAP, as it operates much more in constrained environments?

As a quality of implementation issue: certainly.
I also expect this to come up in the LWIG work.
But how would it impact the CoAP specification?

        3) Short messages
        Section 3., paragraph 1:

          CoAP is based on the exchange of short messages which, by default,
          are transported over UDP (i.e. each CoAP message occupies the data
          section of one UDP datagram).  CoAP may also be used over Datagram

        What are short messages in terms of bytes? Is this a hidden protocol
        requirement?

Section 4.6 discusses message sizes and should leave the implementer
with a pretty good idea what message sizes are a good fit for CoAP.
I don't think forward-referencing to 4.6 from section 3 is necessary.

        4) randomization of message IDs

        Section 4.4., paragraph 3:

          Implementation Note:  Several implementation strategies can be
             employed for generating Message IDs.  In the simplest case a CoAP
             endpoint generates Message IDs by keeping a single Message ID
             variable, which is changed each time a new Confirmable or
             Non-confirmable message is sent regardless of the destination
             address or port.  Endpoints dealing with large numbers of
             transactions could keep multiple Message ID variables, for example
             per prefix or destination address (note that some receiving
             endpoints may not be able to distinguish unicast and multicast
             packets addressed to it, so endpoints generating Message IDs need
             to make sure these do not overlap).  The initial variable value
             should be randomized.

         the initial variable SHOULD be randomized, just to avoid blind off
         path attacks, right?

Yes.  We are trying to avoid RFC 2119 language in the implementation notes.
Since this is about a variable that only exists in a specific
implementation strategy, a SHOULD wouldn't work very well, anyway.
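
For illustration, the simplest strategy from the implementation note could
look like the sketch below.  The class and method names are hypothetical,
and incrementing by one is an assumption; the note only says the variable
"is changed" for each new message.

```python
import secrets


class MessageIdGenerator:
    """One Message ID variable shared across all destinations (simplest case)."""

    def __init__(self) -> None:
        # Randomized initial value; Message IDs are 16 bits wide.
        self._next = secrets.randbelow(1 << 16)

    def next_id(self) -> int:
        """Return the current Message ID and advance, wrapping at 16 bits."""
        mid = self._next
        self._next = (self._next + 1) & 0xFFFF
        return mid


gen = MessageIdGenerator()
first = gen.next_id()
second = gen.next_id()
print(first, second)
```

This sketch does not enforce the rule that a Message ID must not be reused
within EXCHANGE_LIFETIME; the sender still has to bound its message rate.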

        5)
        In Section 4.6.:

         larger than an IP fragment result in undesired packet fragmentation.
        should read larger than an 'IP packet' instead of 'IP fragment'.

Indeed, [1311].

        6)
        Section 5.4.1., paragraph 7:

          Critical/Elective rules apply to non-proxying endpoints.  A proxy
          processes options based on Unsafe/Safe classes as defined in
          Section 5.7.

         I suggest moving this statement to the beginning of this subsection,
         as it provides important information that shouldn’t be missed.

Since the entire next subsection also discusses the subject, I think
there is little danger that this will be missed.  (Putting the
exception early confuses the section, so I would like to avoid this
change.)

        7) Dependency between application layer and CoAP
        Section 5.2.2., paragraph 2:

          The server maybe initiates the attempt to obtain the resource
          representation and times out an acknowledgement timer, or it
          immediately sends an acknowledgement knowing in advance that there
          will be no piggy-backed response.  The acknowledgement effectively is
          a promise that the request will be acted upon.

        This may or may not be an issue:
        Assume that the server did send an ACK for a request but never fulfils
        its promise to send any real 'response'. The request/response initiated
        from the client is done on the CoAP level, but not for the application
        on top. 
        Is there any recommendation for the application on top of CoAP how to
        handle such cases?

Generally, we would expect applications to handle this in ways similar
to how they handle other application-layer timeouts.  E.g., many e-mail
and web applications time out requests after a time on the order of a
minute.  We think this is another issue best left for discussion after
some operational experience is available.
