[core] Adam Roach's No Objection on draft-ietf-core-coap-tcp-tls-09: (with COMMENT)

Adam Roach <adam@nostrum.com> Mon, 22 May 2017 20:48 UTC

From: Adam Roach <adam@nostrum.com>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-core-coap-tcp-tls@ietf.org, Jaime Jimenez <jaime.jimenez@ericsson.com>, core-chairs@ietf.org, jaime.jimenez@ericsson.com, core@ietf.org
Message-ID: <149548610222.24921.6807834750685175839.idtracker@ietfa.amsl.com>
Date: Mon, 22 May 2017 13:48:22 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/TqnNNGxma3yWM_ZpbrkPsXg6C2Y>

Adam Roach has entered the following ballot position for
draft-ietf-core-coap-tcp-tls-09: No Objection

The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-core-coap-tcp-tls/



----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I have removed my DISCUSS, but want to be clear that I remain quite
distressed about the design aspects of this document. I have adjusted my
comments below to trim them down to only those issues that remain in -09.
Beyond the comments left over from -08, I am perplexed that the document
provides no concrete mechanism for UDP/TCP failover, no discussion of the
management aspects of configuring between them, and no discussion of
which transport protocol(s) may be considered MTI.

I also wish to highlight a somewhat buried request from my original
comments: I believe this document would be vastly improved by
splitting it into one document that deals with TCP and another that
deals with WebSockets. They are intended for radically different
environments, and a large majority of implementors will care about one
but not the other. Combining into a single document just creates more
work for them.

General — this is a very bespoke approach to what could have been mostly
solved with a single four-byte “length” header; it is complicated on the
wire, and in implementation; and the format variations among CoAP over
UDP, TLS, and WebSockets are going to make gateways much harder to
implement and less efficient (as they will necessarily have to
disassemble messages and rebuild them to change between formats). The
protocol itself mentions gateways in several places, but does not discuss
how they are expected to map among the various flavors of CoAP defined in
this document. Some of the changes seem unnecessary, but it could be that
I’m missing the motivation for them. Ideally, the introduction would work
harder at explaining why CoAP over these transports is as different from
CoAP over UDP as it is, focusing in particular on why the complexity of
having three syntactically incompatible headers is justified by the
benefits provided by such variations.
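
To make the "single four-byte length header" alternative concrete, here
is a minimal sketch. It assumes a 32-bit big-endian length prefix and
carries the CoAP-over-UDP message bytes unchanged; the names frame and
deframe are hypothetical, and nothing here is taken from the draft.

```python
import struct

LENGTH_PREFIX = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def frame(message: bytes) -> bytes:
    """Prefix one unmodified CoAP-over-UDP message with its length."""
    return LENGTH_PREFIX.pack(len(message)) + message


def deframe(buffer: bytearray) -> list[bytes]:
    """Pop every complete message from a stream buffer; keep partial data."""
    messages = []
    while len(buffer) >= LENGTH_PREFIX.size:
        (length,) = LENGTH_PREFIX.unpack_from(buffer, 0)
        end = LENGTH_PREFIX.size + length
        if len(buffer) < end:
            break  # the rest of this message has not arrived yet
        messages.append(bytes(buffer[LENGTH_PREFIX.size:end]))
        del buffer[:end]
    return messages
```

With framing of this kind, a gateway would only add or strip the prefix
and would never need to parse or rebuild the CoAP message itself.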

Additionally, it’s not clear from the introduction what the motivation
for using the mechanisms in this document is as compared to the
techniques described in section 10 (and its subsections) of RFC 7252.
With the exception of subscribing to resource state (which could be
added), it seems that such an approach is significantly easier to
implement and more clearly defined than what is in this document; and it
appears to provide the combined benefits of all four transports discussed
in this document. My concern here is that an explosion of transport
options makes it less likely that a client and server can find two in
common: the limit of the probability of two implementations having a
transport in common as the number of transports approaches infinity is
zero. Due to this likely decrease in interoperability, I’d expect to see
some pretty powerful motivation in here for defining a third, fourth,
fifth, and sixth way to carry CoAP when only TCP is available (I count
RFC 7252 http and https as the first and second ways in this
accounting).
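
The interoperability argument can be illustrated with a toy model. The
sketch below assumes that each of two implementations supports k of n
defined transports, chosen uniformly at random and independently. That
assumption is mine and is deliberately crude (real implementors do not
choose at random), and the function name is hypothetical.

```python
from math import comb


def p_common_transport(n: int, k: int) -> float:
    """Probability that two implementations share at least one transport,
    when each supports k of n transports picked uniformly at random."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    # comb(n - k, k) counts the k-subsets that avoid all of the peer's
    # choices; it is 0 when 2k > n, in which case overlap is certain.
    return 1 - comb(n - k, k) / comb(n, k)
```

For k = 1 this reduces to 1/n, and for any fixed k the value falls toward
zero as n grows, which is the limit described above.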

Specific comments follow.

Section 3.3, paragraph 3 says that an initiator may send messages prior
to receiving the remote side’s CSM, even though the message may be larger
than would be allowed by that CSM.  What should the recipient of an
oversized message do in this case? In fact, I don’t see in here what a
recipient of a message larger than it allowed for in its CSM is supposed
to do in response at *any* stage of the connection. Is it an error? If
so, how do you indicate it? Or is the Max-Message-Size option just a
suggestion for the other side? This definitely needs clarification.
(Aside — it seems odd and somewhat backwards that TCP connections are
provided an affordance for fine-grained control over message sizes, while
UDP communications are not.)

Section 5 and its subsections define a new set of message types,
presumably for use only on connection-oriented protocols, although this
is only implied, and never stated. For example, some implementors may see
CSM, Ping, and Pong as potentially useful in UDP; and, finding no
prohibition in this document against using them, decide to give it a go.
Is that intended? If not, I strongly suggest an explicit prohibition
against using these in UDP contexts.

Section 5.3.2 says that implementations supporting block-wise transfers
SHOULD indicate the Block-wise Transfer Option. I can't figure out why
this is anything other than a "MUST". It seems odd that this document
would define a way to communicate this, and then choose to leave the
communicated options as “YES” and “YOUR GUESS IS AS GOOD AS MINE” rather
than the simpler and more useful “YES” and “NO”.

I find the described use of the Custody Option with Ping and Pong to be
somewhat problematic: it allows the Pong sender to
unilaterally decide to set the Custody Option, and consequently
quarantine the Pong for an arbitrary amount of time while it processes
other operations. This seems impossible to distinguish from a
failure-due-to-timeout from the perspective of the Ping sender. Why not
limit this behavior only to Ping messages that include the Custody
Option?

I am similarly perplexed by the hard-coded “must do ALPN *unless* the
designated port takes the magical value 5684” behavior. I don’t think
I’ve ever seen a protocol that has such variation based on a hard-coded
port number, and it seems unlikely to be deployed correctly (I’m imaging
the frustration of: “I changed both the server and the client
configuration from the default port of 5684 to 49152, and it just stopped
working. Like, literally the *only* way it works is on port 5684. I've
checked firewall settings everywhere and don't see any special handling
for that port -- I just can't figure this out, and it's driving me
crazy.”). Given the nearly universal availability of ALPN in pretty much
all modern TLS libraries, it seems much cleaner to just require ALPN
support and call it done. Or *don’t* require ALPN at all and call it
done. But *changing* protocol behavior based on magic port numbers seems
like it’s going to cause a lot of operational heartburn.
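
A short sketch of the two rules side by side, using Python's standard
ssl module. The port-dependent rule is modeled on my paraphrase above,
not on the draft's exact text; the ALPN identifier "coap" and the helper
names are assumptions for illustration.

```python
import ssl

COAPS_TCP_DEFAULT_PORT = 5684
COAP_ALPN_ID = "coap"  # assumed ALPN protocol identifier, for illustration


def alpn_required(port: int) -> bool:
    """Port-dependent rule as paraphrased above: ALPN unless port is 5684."""
    return port != COAPS_TCP_DEFAULT_PORT


def client_context(port: int, always_alpn: bool) -> ssl.SSLContext:
    """Build a client TLS context under either rule."""
    ctx = ssl.create_default_context()
    if always_alpn or alpn_required(port):
        ctx.set_alpn_protocols([COAP_ALPN_ID])
    return ctx
```

Under the port-dependent rule, moving a working deployment from 5684 to
49152 silently changes what the handshake must carry. With always_alpn
set, the port number has no effect on protocol behavior.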

[I have removed my comments about section 8.1, as I believe EKR is
managing the TLS-related issues for this document]

Although the document clearly expects the use of gateways and proxies
between these connection-oriented usages of CoAP and UDP-based CoAP,
Appendix A seems to omit discussion or consideration of how this
gatewaying can be performed. The following list of problems is
illustrative of this larger issue, but likely not exhaustive. (I'll note
that all of these issues evaporate if you move to a simpler scheme that
merely frames otherwise unmodified UDP CoAP messages)

Section A.1 does not indicate what gateways are supposed to do with
out-of-order notifications. The TCP side requires these to be delivered
in order; so, does this mean that gateways observing a gap in sequence
numbers need to quarantine the newly received message so that it can
deliver the missing one first? Or does it deliver the newly-received
message and then discard the “stale” one when it arrives? I don’t think
that leaving this up to implementations is particularly advisable.
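
As an illustration of the second option (deliver the newest notification
and discard the late one), here is a minimal sketch. It assumes plain
integer comparison of sequence numbers and ignores wraparound; the class
and method names are hypothetical and nothing here comes from the draft.

```python
class DiscardStaleForwarder:
    """Forward the newest notification at once; drop older late arrivals."""

    def __init__(self) -> None:
        self.last_forwarded = None  # highest sequence number sent to TCP side

    def on_udp_notification(self, seq: int, payload: bytes):
        """Return the payload to forward over TCP, or None if it is stale."""
        if self.last_forwarded is not None and seq <= self.last_forwarded:
            return None  # an equal or newer state was already delivered
        self.last_forwarded = seq
        return payload
```

The first option (quarantine) would instead need a buffer and a timeout
for the missing notification, which is exactly the kind of detail that
ought to be specified rather than left to each gateway.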

Section A.3 is a bit more worrisome. I understand the desired
optimization here, but where you reduce traffic in one direction, you run
the risk of exploding it in the other. For example, consider a coap+tcp
client connecting to a gateway that communicates with a CoAP-over-UDP
server. When that client wants to check the health of its observations,
it can send a Ping and receive a Pong that confirms that they are all
alive and well. In order to be able to send a Pong that *means* “all your
observations are alive and well,” the gateway has to verify that all the
observations are alive and well. A simple implementation of a gateway
will likely check on each observed resource individually when it gets a
Ping, and then send a Pong after it hears back about all of them. So, as
a client, I can set up, let’s say, two dozen observations through this
gateway. Then, with each Ping I send, the gateway sends two dozen checks
towards the server. This kind of message amplification attack is an
awesome way to DoS both the gateway and the server. I believe the
document needs a treatment of how UDP/TCP gateways handle notification
health checks, along with techniques for mitigating this specific
attack.
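
The amplification can be counted with a small model. NaiveGateway below
is the simple implementation described above; CachingGateway is one
possible mitigation, in which health results are reused for a fixed
interval. Both classes, the ttl parameter, and the injectable clock are
assumptions of mine and do not appear in the draft.

```python
import time


class NaiveGateway:
    """Checks every observation on every Ping (the amplifying behavior)."""

    def __init__(self, observations):
        self.observations = list(observations)
        self.checks_sent = 0  # upstream health checks sent toward the server

    def on_ping(self) -> None:
        self.checks_sent += len(self.observations)  # one check per resource


class CachingGateway(NaiveGateway):
    """Reuses health results for `ttl` seconds, bounding upstream checks."""

    def __init__(self, observations, ttl: float, clock=time.monotonic):
        super().__init__(observations)
        self.ttl = ttl
        self.clock = clock
        self.last_check = None  # time of the last full upstream check

    def on_ping(self) -> None:
        now = self.clock()
        if self.last_check is None or now - self.last_check >= self.ttl:
            super().on_ping()
            self.last_check = now
        # otherwise the Pong is answered from the cached result
```

With two dozen observations, ten Pings cost the naive gateway 240
upstream checks, while the caching gateway sends 24 within one ttl
window. The cost of caching is that a Pong then only asserts health as
of the last check, so the semantics would still need to be written down.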

Section A.4 talks about the rather different ways of dealing with
unsubscribing from a resource. Presumably, gateways that get a reset to a
notification are expected to synthesize a new GET to deregister on behalf
of the client? Or is it okay if they just pass along the reset, and
expect the server to know that it means the same thing as a
deregistration? Without explicit guidance here, I expect server and
gateway implementors to make different choices and end up with a lack of
interop.
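
A sketch of the first option, in which the gateway synthesizes the
deregistration. It assumes the gateway remembers, for each notification
it forwarded over UDP, the Message ID together with the token and path of
the upstream registration. That bookkeeping, the dict-based message
shape, and the function name are hypothetical.

```python
OBSERVE_OPTION = 6      # Observe option number (RFC 7641)
OBSERVE_DEREGISTER = 1  # Observe value that cancels a registration
GET = 1                 # CoAP method code 0.01


def on_udp_reset(message_id: int, pending: dict):
    """Translate a UDP-side Reset into a TCP-side deregistration request.

    `pending` maps the Message ID of each forwarded notification to the
    (token, uri_path) of the matching upstream registration.
    Returns a dict describing the GET to send, or None if nothing matches.
    """
    entry = pending.pop(message_id, None)
    if entry is None:
        return None  # Reset for something other than a known notification
    token, uri_path = entry
    return {
        "code": GET,
        "token": token,
        "uri_path": uri_path,
        "options": {OBSERVE_OPTION: OBSERVE_DEREGISTER},
    }
```

Whether gateways are expected to do this translation, or to rely on the
server to infer it, is the question the document leaves open.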

From i-d nits (this appears to be in reference to Figure 1):
** There is 1 instance of too long lines in the document, the longest one
being 3 characters in excess of 72.