Re: [core] Stephen Farrell's Discuss on draft-ietf-core-coap-15: (with DISCUSS and COMMENT)

Carsten Bormann <cabo@tzi.org> Sat, 27 April 2013 19:33 UTC

From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <20130425012605.7589.40567.idtracker@ietfa.amsl.com>
Date: Sat, 27 Apr 2013 21:33:05 +0200
Message-Id: <A0F7F10A-2FF1-41EA-8BF5-B92BFD8AC067@tzi.org>
References: <20130425012605.7589.40567.idtracker@ietfa.amsl.com>
To: Stephen Farrell <stephen.farrell@cs.tcd.ie>
Cc: draft-ietf-core-coap@tools.ietf.org, core-chairs@tools.ietf.org, The IESG <iesg@ietf.org>, core@ietf.org
Subject: Re: [core] Stephen Farrell's Discuss on draft-ietf-core-coap-15: (with DISCUSS and COMMENT)

Stephen,

thank you for this comprehensive review.
I have made some of the changes as simple editorial fixes; these are
marked like [1316] below and can be reviewed in
http://trac.tools.ietf.org/wg/core/trac/changeset/1316
(Overview in http://trac.tools.ietf.org/wg/core/trac/timeline).

Some of the replies are marked "-> Ticket": This means that the
authors think that the change is a good idea, but probably needs a bit
more discussion with the WG, so we will handle this as a ticket.

Grüße, Carsten


        ----------------------------------------------------------------------
        DISCUSS:
        ----------------------------------------------------------------------


        I will end up balloting yes for this. I think it's good work
        and has lots of implementations. Note also that some of the
        discuss points here should be easily resolved or are just
        checking stuff. (It's in the nature of very, very long
        documents

... that they get longer with each review cycle, and ...

        that the accumulation of such stuff generates more
        discuss points.) Anyway, let's discuss...

        (1) 2.3: What if the URI scheme doesn't have a host or port
        or path or query?  Also in 5.6, 2nd bullet in list: Just to
        note that if you were using ni URIs with CoAP, then you no
        longer need to insist on exactly the same URI (e.g. the
        authority part needn't matter with ni URIs, the alg-val part
        is what counts). That might be true of other schemes too, so
        perhaps this statement is scheme-specific to some extent?

That is a very good point.  I think what happens here is that some
schemes such as the ni scheme open a way of matching that is specific
to that scheme and not available in the default case.  We should add a
statement to this effect to the definition of "Cache-Key" that needs
to be added at the end of this section.

Add at the end of (the introduction to) 5.6:
»The set of options that is used for matching the cache entry is also
collectively referred to as the "Cache-Key".  For URI schemes other
than coap and coaps, matching of those options that constitute the
request URI may be performed under rules specific to the URI scheme.«

-> Ticket

        This is just a discuss point to check that you're ok with
        CoAP being restricted to some URI schemes in this manner,
        the ni URI case is just an example I happen to know fairly
        well:-) So I'll clear this one when told that this is
        considered acceptable but I want to check the general issue
        about uri-scheme dependencies for CoAP. The same point
        occurs in 5.7.1 and maybe elsewhere btw. So basic point is:
        please provide some sensible description of which URI
        schemes can be used with CoAP and which cannot.

The specification defines two URI schemes, coap and coaps.
It is probably a good idea to start some thinking about how RFC 6920
can be used, but that should be a separate specification.
So, I think, the answer is "any URI scheme defined to use the CoAP
protocol".  Do we need to state this explicitly?

        (2) 4.2, "when the timeout is triggered" - what happens with
        sleepy nodes that only wake on external events, e.g. if 2
        timeouts have elapsed whilst asleep? Not sure if
        odd behaviour of that kind could cause much harm, but was it
        considered? This could also affect the definition in 4.8.2
        of MAX_TRANSMIT_SPAN.

The intention is that retransmissions stay in the overall envelope of
MAX_TRANSMIT_SPAN, even if they are not perfectly spaced.
Indeed, this could benefit from some additional discussion.
-> Ticket
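As a rough illustration of the "overall envelope" idea, here is a sketch of how a sleepy node might decide, on waking, what to do with a pending CON. It assumes the default parameters (ACK_TIMEOUT = 2 s, ACK_RANDOM_FACTOR = 1.5, MAX_RETRANSMIT = 4, giving MAX_TRANSMIT_SPAN = 45 s). The struct, the enum and on_wake() are hypothetical. Collapsing several missed timeouts into a single retransmission is only one possible policy, not what the draft specifies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default transmission parameters; ACK_RANDOM_FACTOR 1.5 is kept as 3/2. */
#define ACK_TIMEOUT_MS        2000u
#define ACK_RANDOM_FACTOR_NUM 3u
#define ACK_RANDOM_FACTOR_DEN 2u
#define MAX_RETRANSMIT        4u

/* MAX_TRANSMIT_SPAN = ACK_TIMEOUT * (2^MAX_RETRANSMIT - 1) * ACK_RANDOM_FACTOR */
static uint32_t max_transmit_span_ms(void) {
    return ACK_TIMEOUT_MS * ((1u << MAX_RETRANSMIT) - 1u)
           * ACK_RANDOM_FACTOR_NUM / ACK_RANDOM_FACTOR_DEN;
}

/* Hypothetical per-exchange state kept by a sleepy node. */
struct pending_con {
    uint32_t first_sent_ms;   /* time of the initial transmission */
    uint32_t next_timeout_ms; /* absolute time the current timeout expires */
    uint32_t timeout_ms;      /* current (doubling) timeout interval */
    unsigned retransmits;     /* retransmissions sent so far */
};

enum wake_action { WAIT, RETRANSMIT, GIVE_UP };

/* Called after waking at now_ms. */
static enum wake_action on_wake(struct pending_con *c, uint32_t now_ms) {
    if (now_ms < c->next_timeout_ms)
        return WAIT;                       /* current timeout not yet expired */
    if (c->retransmits >= MAX_RETRANSMIT ||
        now_ms - c->first_sent_ms > max_transmit_span_ms())
        return GIVE_UP;                    /* attempts or envelope exhausted */
    /* Timeouts missed while asleep are collapsed into this one
     * retransmission, and the timeout is doubled only once. */
    c->retransmits++;
    c->timeout_ms *= 2u;
    c->next_timeout_ms = now_ms + c->timeout_ms;
    return RETRANSMIT;
}
```

With this policy, a node that sleeps through several timeouts sends one late retransmission. If it wakes after the 45 s span has passed, it gives up instead of sending a burst.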

        (3) 4.2, last para: this creates an attack that can work
        from off-path - just send loads of faked ACKs with guessed
        Tokens and some of 'em will be accepted with probability
        depending on Token-length and perhaps the quality of the RNG
        used by the sender of the CON. That could cause all sorts of
        "interesting" application layer behaviour. Why is that ok?
        (Or put another way - was this considered and with what
        conclusion?)

Actually, the ACK would be matched on the Message ID (or I could
simply point to the penultimate paragraph of 5.3.1).  This is a
relatively powerful attack that could be added to the list in 11.4.
-> Ticket

        I suspect you need to have text trading off the
        Token length versus use of DTLS or else CoAP may end up too
        insecure for too many uses. (Note: the attack here is made
        worse because the message ID comparison can be skipped.
        Removing that "feature" would help a lot here.) 5.3.1's
        client "may want to use a non-trivial, randomized token"
        doesn't seem to cut it to me. How does this kind of
        interaction map to DTLS sessions/epoch? Basically, I'd like
        to see some RECOMMENDED handling of token length that would
        result in it not being unsafe to connect a CoAP node to the
        Internet. (And please note recent instances where tens of
        thousands of insecure devices have been found via probing
        the IPv4 address space. These are real attacks.)

Well, as I keep saying: deploy BCP 38 (network ingress filtering).
But it seems the discussion at the end of 5.3.1 could make the
security considerations for selecting a token value in NoSec mode more
explicit.  Right now the basic consideration, but not the parameters
that go into it, is discussed in 11.4 as well.
-> Ticket
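One way to make the missing parameters explicit is a rough bound. Assume the client picks a fresh token of n bytes uniformly at random, and an off-path attacker gets k forged responses checked against it before the genuine response arrives. A union bound then caps the chance that any forgery is accepted. This sketch rests on those assumptions only. It ignores the Message ID check, which applies only to piggybacked responses, and it ignores any cost of spoofing the source address.

```latex
P_{\mathrm{forge}} \;\le\; \frac{k}{2^{8n}}
\qquad \text{e.g. } n = 4,\ k = 10^{6}:\ \frac{10^{6}}{2^{32}} \approx 2.3\times 10^{-4};
\qquad n = 8,\ k = 10^{6}:\ \frac{10^{6}}{2^{64}} \approx 5.4\times 10^{-14}
```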

        (4) 4.4, implementation note - this seems unwise since it
        means that once Alice has interacted with Bob, then Bob can
        easily guess the message IDs that Alice will use to talk to
        Charlie.

Not every CoAP client will have the means to keep state per peer server.
(It is not quite "once Alice has interacted", because the initiator of
an exchange chooses the Message ID.  But it is often not too hard to
trick Alice into initiating an exchange.)
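The predictability Stephen describes comes from the simplest strategy: a single Message ID variable shared by all destinations. Below is a minimal sketch contrasting that with per-peer variables seeded at random. The peer table, its fixed size and the caller-supplied seed are illustrative assumptions. The fixed size reflects the point that not every client can afford state per peer.

```c
#include <stdint.h>
#include <string.h>

/* Strategy 1: one counter for all destinations. Cheap, but a peer that
 * sees one of our Message IDs can predict those we send to others. */
static uint16_t global_mid;

static uint16_t next_mid_global(void) {
    return global_mid++;          /* wraps modulo 2^16 */
}

/* Strategy 2 (hypothetical layout): a small table of per-peer counters. */
#define MAX_PEERS 8

struct peer_mid {
    uint8_t addr[16];             /* IPv6 (or IPv4-mapped) address */
    uint16_t port;
    uint16_t next_mid;
    int in_use;
};

static struct peer_mid peers[MAX_PEERS];

/* random_seed must come from a real random source (e.g. a hardware RNG);
 * it is used only when a new peer entry is created. */
static int next_mid_per_peer(const uint8_t addr[16], uint16_t port,
                             uint16_t random_seed, uint16_t *mid_out) {
    for (int i = 0; i < MAX_PEERS; i++)
        if (peers[i].in_use && peers[i].port == port &&
            memcmp(peers[i].addr, addr, 16) == 0) {
            *mid_out = peers[i].next_mid++;
            return 0;
        }
    for (int i = 0; i < MAX_PEERS; i++)
        if (!peers[i].in_use) {
            memcpy(peers[i].addr, addr, 16);
            peers[i].port = port;
            peers[i].next_mid = random_seed;
            peers[i].in_use = 1;
            *mid_out = peers[i].next_mid++;
            return 0;
        }
    return -1;                    /* table full: fall back or evict */
}
```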

        (5) 4.6, last para  - this only applies to insecure uses of
        CoAP, you should point that out

"can" is indeed probably too strong.
-> Ticket

        (6) 6.2 - "the UDP datagrams MUST...use DTLS" is fine but
        maybe not enough, if the request uses DTLS then presumably
        so MUST *all* response messages, and they MUST use the same
        DTLS session? Or perhaps one with the same authenticated
        endpoints. Don't you need to say that? If you don't then
        just sending the request via DTLS but getting (some)
        response messages in clear would seem to be allowed.  I
        think 9.1 might cover all the above, but want to just check.

This is implicit in the fact that all exchanges (except for multicast)
use the same pair of endpoints for both directions, so you simply
can't reply via NoSec to a DTLS message.  It probably doesn't hurt to
make that explicit somewhere.
-> Ticket.

        (7) 9.1.1, 1st para: what is "the server" - is that the
        destination host from the URI? If yes, then fine.  If no,
        then we need to DISCUSS that. 

Yes, it is.

        (8) 9.1.3.3 - "signed by an appropriate chain of trust" is
        an odd phrase - do you mean it MUST be validated as per RFC
        5280 section 6? If so, say so. If not, say what you do mean.
        (But we might need to talk about it in that case,
        depending;-)

We probably should say:
The certificate MUST be validated as appropriate for the security
requirements, using functionality equivalent to the algorithm
specified in RFC5280 Section 6.
-> Ticket.

        (9) 9.1.3.3 - you don't mention certificate status checking.
        I can see why that's hard to impossible in some n/w's but
        entirely ignoring it seems wrong. Perhaps call out the
        vulnerability and point at OCSP stapling as a potential
        solution, but one that requires further work and/or further
        specification?

Yes, we should point this out.
-> Ticket

        (10) 10.1 - what does https mean here? If it means that
        the request/response are in clear between the source and
        proxy and then encrypted then a) you really really need to
        say that clearly and b) why is that even acceptable and c)
        what if the destination resource requires client auth?

Well, it is way worse, because the proxy needs to decide on the
security policy that it wants to apply to the TLS connection
underlying HTTPS.  This kind of proxy function only makes sense if the
proxy has a legitimate role in the trust chain, e.g. as a domain
boundary controller.  In constrained node networks, this is actually a
likely use case.

        It just seems broken to pretend to use https this way. Going
        via a cross-proxy breaks security.  Similarly, what does coaps
        mean in 10.2?

Again, doing this kind of proxying only makes sense if the proxy has a
legitimate role.

        ----------------------------------------------------------------------
        COMMENT:
        ----------------------------------------------------------------------


        general: 112 pages, sheesh;-) Seriously though, there is
        repetition here that'd be better not there and fewer words
        is better. Too late now though.

(See above...)

        abstract: "CoAP easily..." is a bit of sales-talk, better to
        say "CoAP is designed to easily..."

[1316]

        intro, 2nd para: better to not talk about the WG name and
        its work really, but about the resulting protocol

We used the "CoRE" tag in RFC 6690, and would like to continue using
it, for the overall architecture that the CoAP protocol is a part of.
(If that is a problem, please rename all MPLS documents first :-)

        intro, 2nd para: suggest s/limiting the use of/limiting the
        need for/

[1317]

        intro, last para: more sales pitch language

But it's all true and not even exaggerated!

        1.2, critical option - I wondered here if proxies have to
        know these or just client & server.  "Endpoint receiving the
        message" doesn't make that (crystal) clear. "Unsafe Option"
        made me wonder more. (It is clear later.)

Hmm.  See my attempt in [1320].

        2.2: This is the first time Token is used. Might be no harm
        to distinguish that explicitly from Message ID.

[1318]

        2.3: For what "security reason" are proxies useful? 

The domain boundary controller one (see above).

        3: Ver field "MUST set this field to 1" - I guess someone
        might set both bits to 1, so saying '01'B might be better.

[1319]

        Section 3: I didn't see where Message ID wrap-around was
        described. I see Martin has a discuss on that which I
        support.

(See my reply there.)

        3: Message ID - with 16 bits that imposes a rate limit on
        how often you can send. 

The rate limit is ~ 250 messages per second per pair of endpoints with
default parameters.

        I don't think that's described

It will be described in the LWIG documents that are discussing Message
ID management in more detail.

        and I'm curious as to whether it'd set to max goodput for CoAP
        that'd be way less than otherwise possible with e.g. HTTP.

Answer: Yes.  (~ 250 KiB/s with default parameters and large messages;
20 kB/s with more realistic 80-byte messages.)  If you need faster,
please use TCP (you will get better congestion control, too).
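The figures above can be reproduced with a back-of-the-envelope calculation. This sketch makes three assumptions:

- A Message ID may not be reused towards the same endpoint within EXCHANGE_LIFETIME, taken here as 247 s (its default-parameter value in the final RFC 7252).
- Payloads are 1024 bytes ("large") or 80 bytes ("realistic").
- Header overhead is ignored.

```c
#include <stdint.h>

#define MID_SPACE           65536u  /* 16-bit Message ID */
#define EXCHANGE_LIFETIME_S 247u    /* assumed default-parameter value */

/* Messages per second per endpoint pair before a Message ID must be reused. */
static uint32_t max_msgs_per_second(void) {
    return MID_SPACE / EXCHANGE_LIFETIME_S;
}

/* Goodput in bytes per second for a given message rate and payload size. */
static uint32_t goodput_bytes_per_second(uint32_t msgs_per_s, uint32_t payload) {
    return msgs_per_s * payload;
}
```

The exact ceiling comes out at 265 messages per second, slightly above the rounded "~250". The goodput figures quoted in the reply correspond to 250 messages per second: 256000 B/s (250 KiB/s) and 20000 B/s (20 kB/s).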

        3.2: So I can't have an option with a uint value where
        missing != 0? Might be worth saying.

I'm not sure I'm decoding this sentence correctly, but a present
option with a zero-byte uint (value 0) is distinguishable from an
absent option.  E.g., Max-Age has a default value (that is used if the
option is absent) of 60 (seconds).  If the option is present with a
zero-byte value encoding, the default is overridden and Max-Age is set
to 0.
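A minimal sketch of the distinction. A uint option value is decoded as a big-endian integer in which a zero-length value means 0. Absence is a separate state that selects the default of 60 s for Max-Age. The function names and the `present` flag are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_AGE_DEFAULT 60u  /* seconds, used only when the option is absent */

/* Decode a uint option value (big-endian, len <= 4 assumed here).
 * A zero-length value decodes to 0. */
static uint32_t decode_uint(const uint8_t *val, size_t len) {
    uint32_t v = 0;
    for (size_t i = 0; i < len; i++)
        v = (v << 8) | val[i];
    return v;
}

/* Max-Age seen by the recipient: the default if absent, the value if present. */
static uint32_t effective_max_age(bool present, const uint8_t *val, size_t len) {
    return present ? decode_uint(val, len) : MAX_AGE_DEFAULT;
}
```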

        4.1: middle two paragraphs seem like repetition - maybe they
        could be deleted?

In teaching, I love repetition, if it is done at the right place.
This is trying to recap types and codes, and I think it is at the
right place.

        4.2, 1st para, "acknowledge such a message" - do you mean
        all CONs or just an empty CON? splitting up this para into a
        few or using a bullet list or pseudo-code would be better I
        think.

I don't want to hixiefy the spec (turn its prose into step-by-step
algorithms) too much; section 6 is bad enough already.
Just fixed the specific problem: [1321]

        4.2, "a random number between 2 and 3" (replacing names with
        defaults) - ought you recommend some minimum granularity
        just in case some careless developer did something like:

              initialTimeout=ACK_TIMEOUT+
                 rand()%(ACK_TIMEOUT*ACKRANDOM_FACTOR-ACK_TIMEOUT)

I'm not sure there is a good basis for defining a specific minimum
granularity.  This is a good observation for the LWIG implementation
guidance draft, though.
For the specification, I slightly clarified: [1324]
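Computed in whole seconds with the defaults, the range in the snippet above (ACK_TIMEOUT*ACK_RANDOM_FACTOR-ACK_TIMEOUT) is 1. Since rand()%1 is always 0, every node would use exactly ACK_TIMEOUT. The sketch below instead draws the initial timeout from [ACK_TIMEOUT, ACK_TIMEOUT * ACK_RANDOM_FACTOR] at millisecond resolution. rand() is used only to keep the sketch short; a real node would presumably use its own RNG. The millisecond granularity is an illustrative choice, not a recommendation from the spec.

```c
#include <stdint.h>
#include <stdlib.h>

#define ACK_TIMEOUT_MS        2000u
#define ACK_RANDOM_FACTOR_NUM 3u    /* ACK_RANDOM_FACTOR = 1.5 = 3/2 */
#define ACK_RANDOM_FACTOR_DEN 2u

/* Initial timeout in ms over [2000, 3000] with the defaults
 * (uniform up to rand()'s small modulo bias). */
static uint32_t initial_timeout_ms(void) {
    const uint32_t lo = ACK_TIMEOUT_MS;
    const uint32_t hi = ACK_TIMEOUT_MS * ACK_RANDOM_FACTOR_NUM / ACK_RANDOM_FACTOR_DEN;
    return lo + (uint32_t)rand() % (hi - lo + 1u);
}
```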

        4.3, last sentence in parenthesis - I have no idea what that
        means

It means that, since RSTs can refer to both CONs and NONs, you need to
avoid using the same Message ID for a CON and a NON during the time
you might hit a RST.

        4.4, last para: I just wonder if any NAT or v6 transition
        schemes might invalidate this MUST?

This is a matching rule that is enforced at the endpoint that sent a
CON/NON.  If the NAT/transition scheme breaks the property that when
you send a message to IP address X and Port Y, you get back the reply
message from IP address X and Port Y, CoAP is not going to work over
it.  (Neither will DNS or anything else much, so I hope no such NAT or
transition scheme is deployed widely).  By the way, it also means you
fundamentally can't do ACKs to a multicast message, because they won't
match.
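The matching rule can be sketched as a check run when an ACK or RST arrives. The same check also covers the same-endpoint-pair argument made under point (6) above. The message matches a pending CON/NON only if both of these hold:

- It carries the same Message ID.
- It comes back from the exact address and port the original went to, in the same security mode.

The struct and field names are hypothetical. Comparing a DTLS session identifier rather than just the mode would be a stricter variant.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum sec_mode { NOSEC, DTLS };

/* Hypothetical description of one transport endpoint. */
struct endpoint {
    uint8_t addr[16];      /* IPv6 (or IPv4-mapped) address */
    uint16_t port;
    enum sec_mode sec;
};

/* State kept for a CON/NON we sent and may still get an ACK/RST for. */
struct pending {
    struct endpoint dest;  /* where the message was sent */
    uint16_t mid;
};

static bool same_endpoint(const struct endpoint *a, const struct endpoint *b) {
    return a->port == b->port && a->sec == b->sec &&
           memcmp(a->addr, b->addr, sizeof a->addr) == 0;
}

/* An ACK/RST matches only if Message ID and source endpoint both match. */
static bool ack_matches(const struct pending *p, const struct endpoint *src,
                        uint16_t mid) {
    return p->mid == mid && same_endpoint(&p->dest, src);
}
```

A unicast reply to a multicast request can never pass this check, because its source address is not the multicast destination.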

        4.6, 1st sentence: don't get that, maybe better deleted.

This is trying to alert implementers to the fact that there may be
overriding concerns with respect to the choice of a message size that
are not part of the CoAP spec.

        4.8.1, DEFAULT_LEISURE is in table 1 but not discussed until
        section 8, a fwd ref at least would be good

[1322]

        5.2.2, "The server maybe initiates..." seems too casual.

[1323]

        5.3.1, 3rd para - the note about using the same token for
        different source ports seems broken to me. I don't think you
        say anything to the effect that the response has to go to
        that source port.

This is implied in the endpoint concept.  A different port is a
different endpoint is a different client.  (What the introduction to
5.3 is really trying to say is that the token space is per
endpoint-pair.)

        5.4.6, option numbers can be 16 bits long, in that case bit
        7 is not the lsb - does that need fixing? Similarly with
        Figure 11.

Clarified some more: [1325]
(Figure 11 is fine, because this operates on the actual number.)
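Because the characteristics are derived from the option number itself rather than from a bit position in the encoded bytes, they work unchanged for 16-bit option numbers. The masks below follow Figure 11 as it appears in the final RFC 7252; the wrapper functions are only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Option characteristics derived from the option number (any width). */
static bool is_critical(uint16_t onum)     { return (onum & 1) != 0; }
static bool is_unsafe(uint16_t onum)       { return (onum & 2) != 0; }
static bool is_no_cache_key(uint16_t onum) { return (onum & 0x1e) == 0x1c; }
```

For example, If-Match (1) is critical and safe, Uri-Host (3) is critical and unsafe, Max-Age (14) is elective and unsafe, and a hypothetical option 2049 is critical and safe.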

        5.5.2, I buy your argument here about language tagging, but
        what happens at a CoAP->HTTP g/w? Doesn't language tagging
        become an issue there? How's that handled?

This is best compared to the human-readable component of the HTTP
status line, called Reason-Phrase in RFC 2616 and reason-phrase in
section 3.1.2 of draft-ietf-httpbis-p1-messaging-22.txt.
We mainly simplify this from "an encoding that is a superset of
US-ASCII [USASCII]" to "UTF-8".  It is unlikely that constrained node
implementations will need a language-tag here more urgently than HTTP
users.

        5.10.1 - can any of the valid options for Host from 3986 be
        used? e.g. IPv6 addresses as text in square brackets,
        decimal form IPv4 addresses? You do have some guidance later
        but I think that'd be better made more obvious.

Unfortunately, CoAP needs to use URIs pretty much as-is, so it does
import all this complexity.  Note that the default value of Uri-Host
already uses an IP address, so the need for address literals should be
clear from the context.

        5.10.5 - I'm probably just confused by reading so much;-) If
        there are two Max-Age options, which wins? Where's that
        stated in general?

For options that are not specified as repeatable (table 3),
supernumerary option instances are interpreted according to 5.4.5
(which we just fixed slightly in [1308]).
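This sketch assumes the rule as worded in the final RFC 7252 Section 5.4.5: each supernumerary occurrence of a non-repeatable option is treated like an unrecognized option. On that reading the first Max-Age wins and later ones are ignored, since Max-Age is elective (even number). A supernumerary critical option (odd number) such as Uri-Port makes the message unprocessable. The option struct and the short repeatable() list are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct option { uint16_t num; uint32_t value; };

/* Hypothetical: only a few repeatable options are listed here. */
static bool repeatable(uint16_t num) {
    return num == 1 /* If-Match */ || num == 4 /* ETag */ ||
           num == 11 /* Uri-Path */ || num == 15 /* Uri-Query */;
}

/* Options are sorted by number, so repeats are adjacent. A later instance
 * of a non-repeatable critical option means the message must be rejected;
 * later elective instances are simply ignored. */
static bool check_supernumerary(const struct option *opts, size_t n) {
    for (size_t i = 1; i < n; i++)
        if (opts[i].num == opts[i - 1].num && !repeatable(opts[i].num) &&
            (opts[i].num & 1))
            return false;
    return true;
}

/* Value of the first instance of option num; later instances are ignored. */
static bool first_value(const struct option *opts, size_t n, uint16_t num,
                        uint32_t *out) {
    for (size_t i = 0; i < n; i++)
        if (opts[i].num == num) { *out = opts[i].value; return true; }
    return false;
}
```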

        5.10.8.1 - I don't get why it's ok to not say which If-Match
        to pick if more than one matches. Why's that?

There is only one current representation of the target resource.  If
that matches any of the If-Match options, the condition is fulfilled.
Essentially, you can say "if the resource has value a or b, replace it
with the payload c of this message".  Whether it had value a or b does
not matter, because there is only one replacement value c that can be
given.  (If that is not the intention, there is no way to say this
with what we have.)
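The "any of" semantics amounts to a simple membership test: the one current ETag is compared against every If-Match value, and no choice among matches is ever needed. A minimal sketch with a hypothetical etag struct; only ETag values are handled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct etag { uint8_t len; uint8_t bytes[8]; };  /* ETag is 1..8 bytes */

static bool etag_eq(const struct etag *a, const struct etag *b) {
    return a->len == b->len && memcmp(a->bytes, b->bytes, a->len) == 0;
}

/* The condition holds if the current ETag equals ANY If-Match value;
 * which one matched does not affect the (single) replacement payload. */
static bool if_match_ok(const struct etag *current,
                        const struct etag *if_match, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (etag_eq(current, &if_match[i]))
            return true;
    return false;
}
```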

        6.1 - I don't get what you mean by saying that the coap URI
        scheme "supports" /.well-known, maybe that'll be clear in
        section 7. (I don't think it was.)

RFC5785:
   A well-known URI is a URI [RFC3986] whose path component begins with
   the characters "/.well-known/", and whose scheme is "HTTP", "HTTPS",
   or another scheme that has explicitly been specified to use well-
   known URIs.

We thus explicitly specify that usage for the COAP scheme (and by
derivation in 6.2 for COAPS).
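Read this way, "supports /.well-known" reduces to a scheme whitelist plus a path-prefix test. A minimal sketch, assuming the scheme has already been normalized to lowercase; the function name is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* True if (scheme, path) is a well-known URI in the RFC 5785 sense, for
 * schemes specified to use them (http, https, and here coap and coaps).
 * The scheme is assumed to be normalized to lowercase already. */
static bool is_well_known(const char *scheme, const char *path) {
    bool scheme_ok = strcmp(scheme, "http") == 0 ||
                     strcmp(scheme, "https") == 0 ||
                     strcmp(scheme, "coap") == 0 ||
                     strcmp(scheme, "coaps") == 0;
    return scheme_ok && strncmp(path, "/.well-known/", 13) == 0;
}
```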

        6.2 - s/for privacy// - DTLS does authentication, integrity
        and confidentiality, not privacy

Oops.  [1326]

        7.1 - what if I want to only do discovery via DTLS? What
        does "support" mean for port 5683 then?

Well, what links exactly you offer there is up to the server.
But the current text says that if you do any resource discovery at
all, you also MUST provide something via CoAP over UDP on port 5683.
There is a trade-off between security and manageability here.
Using this, I can find out that a node does speak CoAP, and if it
wants to make any (e.g. management) resources widely available, it can
offer them there.
(We can argue how MUSTy that MUST must be.  E.g., RFC 4443 requires
any router to send an ICMPv6 Time Exceeded message with Code 0 in
response to a packet sent to it that is formed in a certain way.  On
the other hand, there is only a "MUST implement" on the ICMP echo
function.)

        7.2 - I didn't really get how this works, but I assume that
        if I re-read RFC6690, then I'd get it. Is that a good
        assumption?

You probably need to read the whole tome on Web linking (starting from
RFC 5988) as well; reading draft-shelby-core-resource-directory-05.txt
won't hurt either.  There is a lot of interest in making this work
well for Smart Object Networks, so I expect documentation to grow in
this space.

        8.2.1, 1st para - this talks about "the cache" but I don't
        think you've (so far) told me that clients that send
        multicast requests have to have a cache for responses. Don't
        you need to?

s/the/a/ [1327]

        8.2.2 - Please make it clear(er) that bormann-coap-misc is
        properly an informative ref. (Assuming it is.)

s/a/one/ [1328]

        section 9, last para: what's that mean? I got the feeling
        the text was trying to hide something from me fwiw.

To the contrary, it is making very explicit some important limitations
on what the current authorization can do.  Essentially, you can only
be secure through a proxy using either object security (end-to-end) or
when the proxy actually is part of your security model.

        6.5 - I think you need a security consideration somewhere
        about comparing coap(s) URIs and the potential for access
        control screw-ups. 

Good point.
-> Ticket

        9.1 - have people implemented the ECDH ciphersuites in CoAP
        testing? Knowing if this is just text or also running code
        might help discuss resolution.

Don Sturek reported on the list that multiple interoperable
implementations exist (and obtained ZigBee IP certification) for
TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8, also using it with (TCP) TLS for
SEP 2.0.  Zach Shelby reports they have the same ciphersuite running
with DTLS/CoAP (CoAPS).

Various open source implementations of CoAPS are ongoing.
Matthias Kovatsch reports on the list:
Californium has a full CoAPS implementation (PSK, ECDHE,
RawPublicKeys, Certificates):
https://github.com/mkovatsc/Californium
We had DTLS interops with Bremen (TinyDTLS), SICS (JCoAP + own DTLS),
and Silicon Labs (own DTLS, unknown CoAP).
PSK worked with all of them.
ECDHE was tested successfully with SICS and Silicon Labs.
Mutual Authentication was working with SICS.
Fragmentation and RawPublicKeys could not be tested for
interoperability with others so far, but it works with Californium
only nodes.

        9.1.3.3 - throwing in RSA as a SHOULD (albeit within a
        section that's a MAY) is odd - if devices can do RSA then
        why not have 'em all do it for the raw public keys and get
        the interop gains that will accrue from that.

Well, this is for the (somewhat high-end) case of cert usage.

WG: Is there a cipher suite that better fits the other ones we use?
E.g., Why didn't we use TLS_ECDHE_PSK_WITH_AES_128_CBC_SHA (RFC 5489)?
I notice that in the transition from -05 to -06, we got rid of
TLS_RSA_WITH_AES_128_CBC_SHA in favor of
TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8, but not of
TLS_RSA_PSK_WITH_AES_128_CBC_SHA.  The changes comment is "DTLS cipher
suites aligned with ZigBee IP, DTLS clarified as default CoAP security
mechanism (#138, #139)".
http://trac.tools.ietf.org/wg/core/trac/ticket/139 mentions
TLS_PSK_WITH_AES_128_CCM_8 and TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8, but
not anything about the cert-with-PSK case.

-> Ticket

        11.5 - this is a bit detailed, wouldn't a reference do
        most of it?

It may be a bit graphic, but I don't know a good reference discussing
CoAP cross-protocol attacks.  Just discussing them in general terms
didn't seem to cut it.  (And the secdir reviewer liked the text :-)

        12.7 - as it turns out I also don't see why this needs
        two ports - the cost is two more bytes for security which
        is significantly-enough less than the current cost (in
        terms of message size) for security. Am I wrong?

On constrained node networks, we are fighting for every byte.  GHC
gives us back most of what DTLS and the relevant ciphersuites waste a
bit carelessly, so we mostly pay for the authenticator, and that is
money (battery) well-spent.  There are indeed good ways to multiplex
on the first byte of the message (we effectively use only 1/4 of the
space), but that would require the cooperation of the TLS WG, and we
removed this feature between coap-06 and coap-07 (we discussed this
again in the Atlanta TSVAREA meeting,
http://www.ietf.org/proceedings/85/minutes/minutes-85-tsvwg).
So for now, the second port is the way we arrived at.
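To illustrate the first-byte remark: with the Ver field fixed at '01'B, a CoAP message always starts with a byte in 0x40..0x7F, one quarter of the byte space. A DTLS record instead starts with its content type (20 to 23 for change_cipher_spec, alert, handshake and application_data). A hypothetical single-port demultiplexer could therefore look like the sketch below. This is the approach that was not pursued, not what the specification does. Treating all of 20..63 as DTLS is an assumption that leaves room for future content types.

```c
#include <stdint.h>

enum demux { DEMUX_DTLS, DEMUX_COAP, DEMUX_UNKNOWN };

/* Hypothetical single-port demultiplexing on the first byte. */
static enum demux classify_first_byte(uint8_t b) {
    if ((b >> 6) == 1)          /* Ver = '01'B: 0x40..0x7F */
        return DEMUX_COAP;
    if (b >= 20 && b <= 63)     /* DTLS ContentType range (20..23 assigned) */
        return DEMUX_DTLS;
    return DEMUX_UNKNOWN;
}
```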

        Please also consider the secdir review [1] (if you've
        not done so already, I do see a shepherd response).

          [1] http://www.ietf.org/mail-archive/web/secdir/current/msg03873.html

The comments from that review should have been covered in -15 already.
