[core] Comments on draft-ietf-core-coap-09

Cullen Jennings <fluffy@cisco.com> Wed, 18 April 2012 04:58 UTC


Multicast: optional or not?

I can't figure out whether implementations need to implement or use multicast. Some of the discovery text implies that it is needed.



Multicast address

I think we should have IANA allocate default IPv4 and IPv6 addresses to use for multicast.



IPsec

I'm wondering which parts of the text about IPsec really need to be in the draft. I'm having a hard time finding anywhere that it has any normative impact.

Section 1.2, definition of reverse proxy

I think this definition needs to be rewritten. As written, it does not make sense.



Figure in Section 2 (before 2.1): I think it would help if this showed where DTLS fits into the stack.


Proxy terminology: we have two very different types of proxies. One only speaks CoAP, and the other translates between CoAP and HTTP. I would prefer a different term for the translating type, perhaps "translator" or "translator proxy". I understand people don't want to use "gateway", but it is what many people would call a gateway.


Why are there two ways of indicating the number of options? Having two mechanisms (the OC (Option Count) field and the end-of-options marker) just leads to bugs. Is it really worth it to save 1 byte?

I find "MUST fit in a single datagram" confusing. If you want to say the message MUST be less than 65536 bytes, that would be clear, but it is sort of not needed.


Section 3, packet size: I'm a bit skeptical of a default packet size of less than 1280 bytes. That seems pretty big for when nothing else is known. Let's say I have an HTTP-to-CoAP proxy sitting on a 1G interface with jumbo frames enabled, and it is going to send a request to a CoAP node sitting on an 802.15.4 mesh network. Do you really think sending 9k packets is OK? I think we should set the max packet size to something that has a hope of working on constrained networks.
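
To put numbers on this, here is a back-of-the-envelope sketch. It assumes a 1280-byte IPv6 datagram with no extension headers, the 4-byte CoAP fixed header, and a 127-byte maximum 802.15.4 frame; the frame count is only a rough estimate, since real frames lose space to MAC and 6LoWPAN fragmentation headers and gain some back from header compression.

```python
# Figures assumed for this back-of-the-envelope check.
IPV6_MIN_MTU = 1280         # IPv6 minimum link MTU, in bytes
IPV6_HEADER = 40            # fixed IPv6 header, no extension headers
UDP_HEADER = 8
COAP_FIXED_HEADER = 4       # Ver/T/OC/Code/Message ID
IEEE_802_15_4_FRAME = 127   # largest 802.15.4 PHY payload, in bytes


def coap_budget(mtu: int = IPV6_MIN_MTU) -> int:
    """Bytes left for CoAP options and payload in one unfragmented datagram."""
    return mtu - IPV6_HEADER - UDP_HEADER - COAP_FIXED_HEADER


def rough_frames(datagram_bytes: int, frame_bytes: int = IEEE_802_15_4_FRAME) -> int:
    """Rough 802.15.4 frame count for one datagram, pretending each frame
    could carry a full frame_bytes of it (real frames carry less)."""
    return -(-datagram_bytes // frame_bytes)  # ceiling division


print(coap_budget())       # 1228
print(rough_frames(1280))  # 11
```

Even on this optimistic count, a single "default size" datagram turns into a train of about a dozen radio frames, any one of which can be lost.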


Section 3.1.1, generating 4.13: "if the payload was truncated" would be better as "the payload would be truncated".

Section 3.2: the text
   A delta encoding is used between options, with the
   Option Number for each Option calculated as the sum of its Option
   Delta field and the Option Number of the preceding Option in the
   message, if any, or zero otherwise. 

is very hard to follow. Could you rewrite it?
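
One plainer reading of the quoted rule is "the Option Number is the running sum of the Option Delta values". A minimal sketch, assuming the 4-bit delta fields have already been parsed into a list of integers (this is not the wire format, and the end-of-options and fencepost cases are left out):

```python
from itertools import accumulate
from typing import List


def option_numbers(deltas: List[int]) -> List[int]:
    """Recover absolute Option Numbers from parsed Option Delta values.

    Each Option Number is the previous Option Number plus this option's
    delta; for the first option the previous number is taken as zero.
    """
    return list(accumulate(deltas))


# Deltas 1, 4, 0, 3 encode Option Numbers 1, 5, 5, 8 (option 5 repeats).
print(option_numbers([1, 4, 0, 3]))  # [1, 5, 5, 8]
```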


Having an option delta of 15 mean different things depending on whether the Option Count field is 0 or not seems to add complexity and bugs that are not worth the small compression gain.

Table 5.10: I think it would be clearer, and easier to implement with fewer bugs, if options 5, 6, 8, 9, and 15 were moved from type string to type opaque. Is there any good reason they need to be strings?

 
Section 4: clarify whether stop-and-wait is per flow or per destination. Given the later suggestion that every transaction could be on a new port, there is a big difference between the two.




Section 4.3: this section and a few other places use the term "same security context". I know what you mean, but I think we need to be more specific about exactly what this is.

The first paragraphs of Sections 4.4.1 and 4.4.2 seem pretty redundant with information provided earlier; I think you could remove them.


Multicast and DTLS 

As far as I can tell, the multicast mechanisms do not work with DTLS. So in a deployment using DTLS, we need to cover how and when some messages would not use DTLS and what the implications of this would be. If you consider some of the initial use cases for multicast (like discovery, and not having a group of lights "pop on like popcorn"), I think it is possible to meet the use cases with reasonable security risks, but it seems like more is needed in the draft to address the interaction of DTLS and multicast.

Section 4.5: it says that the server SHOULD be aware that a request came in over multicast, but given the later text, it seems that this needs to be a MUST.

Section 4.5, estimating Leisure: I do not see how someone implementing a CoAP stack can figure out S, G, and R. I think we should define a configurable parameter called LEISURE_TIME and set it to 5 seconds. To be clear, I don't think the current algorithm is implementable.
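
A fixed parameter would be trivial to implement. A minimal sketch, assuming LEISURE_TIME is the hypothetical 5-second configurable parameter proposed above (the draft does not define it) and that a server simply picks a uniformly random delay within it before answering a multicast request:

```python
import random

# Hypothetical configurable parameter proposed above; not in the draft.
LEISURE_TIME = 5.0  # seconds


def multicast_response_delay(leisure: float = LEISURE_TIME, rng=random) -> float:
    """Pick a uniformly random delay in [0, leisure] seconds before answering
    a multicast request, so that group members do not all respond at once."""
    if leisure < 0:
        raise ValueError("leisure must be non-negative")
    return rng.uniform(0.0, leisure)


# Usage: a server would schedule its response this many seconds from now.
delay = multicast_response_delay()
```

Unlike the S, G, and R estimate, this needs no knowledge of group size or link rate; the trade-off is that one value has to fit every deployment unless it is reconfigured.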

Section 4.6 - Stop and wait and wait 

Imagine A sends a CON GET request and gets an ACK but no response. How long does it wait for a response? I assume it cannot send any new requests to this destination during this time?

I think we need to mandate an upper bound on the number of parallel connections to a given destination. If we don't, vendor A will do 4, then vendor B will make a "better" product that is faster because it does 10, and pretty soon we will have no congestion control. And this all needs to be a MUST, not a SHOULD.
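
A hard per-destination cap is cheap to enforce. A minimal sketch of such a limiter; the class, its methods, and MAX_OUTSTANDING are hypothetical names, and keying on the destination host rather than (host, port) is an assumption chosen so that a client cannot escape the cap by hopping ports:

```python
from collections import defaultdict


class OutstandingLimiter:
    """Hard cap on outstanding requests per destination host (hypothetical)."""

    MAX_OUTSTANDING = 1  # stop-and-wait: one outstanding request per host

    def __init__(self) -> None:
        self._in_flight = defaultdict(int)

    def try_acquire(self, host: str) -> bool:
        """Return True if a new request to host may be sent now."""
        if self._in_flight[host] >= self.MAX_OUTSTANDING:
            return False  # caller must queue the request
        self._in_flight[host] += 1
        return True

    def release(self, host: str) -> None:
        """Call when the response arrives or the exchange times out."""
        if self._in_flight[host] > 0:
            self._in_flight[host] -= 1


limiter = OutstandingLimiter()
limiter.try_acquire("2001:db8::1")  # True: first request may go
limiter.try_acquire("2001:db8::1")  # False: must wait for release()
```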

Imagine a broken client sends requests at a high rate to a server. Should the server send large responses at the same high rate? 

If a system is receiving multicast requests at a rate of 10 per second, should it send the responses at a rate of 10 per second regardless of congestion to that destination? (Leisure delays these responses but does not change the average rate.)

Have people implemented the congestion control, and what does it look like under failure?


Section 5.2.3 

Sort of wondering if the SHOULD here should be a MAY or a MUST. Can you explain when it would be reasonable not to do this?


Section 5.3, 2nd paragraph from the end, has "An end-point receiving a token". I think "receiving" should be changed to something like "that did not generate".


Last sentence in Section 5.4.1: I'm really not sure what this means. "MUST be ignored by the recipient" is always a bit vague; "treated as unknown options" is likely better. I don't know what it means for an option to have no meaning.

Section 5.4.1, human-readable error messages: I have yet to see a human-readable error message in SIP or HTTP that provided any more useful information than the error code. Sure, I can imagine there might be a case where this is needed, but often it is not. I think the SHOULD in 5.4.1 should be changed to a MAY, with some text saying to do this only if it adds real value.

The text says that when you get a confirmable message with a critical option you don't understand, you reject it with an RST. This seems wrong in multiple ways. The CON/ACK machinery is at a lower layer, while options have to do with the request/response at the upper layer. I think if you get a critical option you don't understand, you have to send an error response regardless of whether the message is confirmable or not. The transfer of the message worked, so sending an RST seems wrong here. The text also contradicts itself in saying they MUST be silently ignored and then going on to say an endpoint MAY send a reset. This text also needs to deal with what happens when the request was multicast: clearly, instantly sending an RST to a multicast message could cause a serious response implosion.
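
A minimal sketch of the rule argued for here, decoupled from CON/NON. It relies on the draft's convention that odd option numbers are critical and even ones elective; the function names, the string results, the use of 4.02 (Bad Option) as the error code, and the choice to send nothing at all for a multicast request are assumptions made for illustration, one possible answer to the multicast question rather than anything the draft specifies.

```python
from typing import Iterable, Set


def is_critical(option_number: int) -> bool:
    """Odd option numbers are critical, even ones elective."""
    return option_number % 2 == 1


def handle_unknown_options(option_numbers: Iterable[int], known: Set[int],
                           via_multicast: bool) -> str:
    """Decide what to do with a request, whether it was CON or NON.

    Unknown elective options are ignored. An unknown critical option gets
    an error response, not an RST, except for multicast requests, where
    staying silent avoids a response implosion.
    """
    unknown_critical = [n for n in option_numbers
                        if n not in known and is_critical(n)]
    if not unknown_critical:
        return "process"
    if via_multicast:
        return "drop"
    return "respond 4.02 Bad Option"


handle_unknown_options([1, 5, 20], known={1, 5}, via_multicast=False)  # "process"
handle_unknown_options([1, 21], known={1, 5}, via_multicast=False)     # error response
```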

Section 5.4.5 has 
   As these option numbers are even, they
   stand for elective options, and unless assigned a meaning, these MUST
   be silently ignored.

I find "unless assigned a meaning" really confusing. They have been assigned a meaning, and it is effectively "no operation"; I think it would be better to phrase it more like that.
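
Seen as plain no-ops, fenceposts are just padding that an encoder inserts whenever a jump between Option Numbers exceeds what the 4-bit delta can express. A minimal sketch under that reading, assuming the draft's rule that empty options numbered as multiples of 14 are no-ops; for simplicity it ignores the special meaning of delta 15 when the Option Count is 15 (the end-of-options case).

```python
from typing import Iterable, List, Tuple

FENCEPOST = 14  # empty options with numbers that are multiples of 14 are no-ops
MAX_DELTA = 15  # largest value of the 4-bit Option Delta field


def deltas_with_fenceposts(option_numbers: Iterable[int]) -> List[Tuple[int, bool]]:
    """Encode Option Numbers as (delta, is_fencepost) pairs, inserting empty
    fencepost options wherever a delta larger than MAX_DELTA would be needed."""
    out: List[Tuple[int, bool]] = []
    previous = 0
    for number in sorted(option_numbers):
        while number - previous > MAX_DELTA:
            # Largest multiple of 14 reachable from previous with one delta;
            # any 15 consecutive integers contain one, so progress is guaranteed.
            fence = (previous + MAX_DELTA) // FENCEPOST * FENCEPOST
            out.append((fence - previous, True))
            previous = fence
        out.append((number - previous, False))
        previous = number
    return out


print(deltas_with_fenceposts([1, 29]))  # [(1, False), (13, True), (15, False)]
```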


Caching

The interaction of security and caching seems fairly vague. Can data retrieved over a secure connection be cached? I certainly hope so, even if it is only cached by the client that did the GET. On the other hand, it cannot be arbitrarily handed to others. I think a bit more detail is needed in this space.

In Section 5.6, it says that caching depends only on the response code, not the method. But this does not actually seem to be the case as we dig deeper. I doubt caching the response to a POST is something we want to do in most cases.

Should the default Max-Age be zero for PUT, POST, and DELETE? Even for GET, most things will have to set it to zero, so it might be better overall to have the default at zero. Thoughts?

The idea that all options have to match is a bit concerning. I'm not sure I would want to change this, but it does worry me that we can't have options that are ignored when matching against the cache. Perhaps we want each new option to define whether it is included in the cache-matching algorithm or not.
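
A minimal sketch of what a per-option flag could look like when a cache builds its key. The IN_CACHE_KEY table stands in for a hypothetical registry column, and "Example-Hint" is an invented option name; options not in the table default to being part of the key, which keeps today's "all options must match" behavior.

```python
from typing import Iterable, Tuple

# Hypothetical registry column: does an option take part in cache matching?
# "Example-Hint" is an invented option name used only for illustration.
IN_CACHE_KEY = {
    "Uri-Host": True,
    "Uri-Port": True,
    "Uri-Path": True,
    "Uri-Query": True,
    "Accept": True,
    "Example-Hint": False,
}


def cache_key(method: str, options: Iterable[Tuple[str, bytes]]) -> tuple:
    """Build a cache key from the method plus every option whose registry
    entry says it matters; option order is preserved (e.g. Uri-Path segments)."""
    relevant = tuple(
        (name, value) for name, value in options if IN_CACHE_KEY.get(name, True)
    )
    return (method, relevant)


# Two requests that differ only in Example-Hint map to the same cache entry.
a = cache_key("GET", [("Uri-Path", b"temp"), ("Example-Hint", b"\x01")])
b = cache_key("GET", [("Uri-Path", b"temp"), ("Example-Hint", b"\x02")])
```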





Proxying

Need explanation of interaction with DTLS. 

When I am using a proxy and sending it a CoAP URI, I find it totally lame that I have to send it in a totally different option than if I were not using a proxy. I think we should change this unless there are some really good reasons for doing it. Even when using an HTTP URI, I'd rather just add a scheme than use Proxy-Uri.

In paragraph 3 of Section 5.7: it should say how the proxy determines that a URI is "recognized as identifying the proxy end-point".

Last paragraph of Section 5.7: I'm worried about the cache lifetime slowly being extended at each step because network latency is not taken into account. If there are 2 seconds of latency on the mesh network, and device A passes something with a Max-Age of 10 seconds to cache B, which later passes it to cache C, it could end up living for multiple seconds longer than it should have.
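
One way to keep the lifetime from growing hop by hop is for each cache to subtract both the time a response has spent with it and an allowance for network latency before passing it on. A minimal sketch; latency_allowance is a hypothetical per-deployment parameter, not something the draft defines, and rounding down means the lifetime can only shrink.

```python
import math


def forwarded_max_age(received_max_age: int, seconds_since_received: float,
                      latency_allowance: float = 0.0) -> int:
    """Max-Age a cache should put on a response it passes on.

    Subtracts the time already spent in this cache plus an assumed
    allowance for upstream latency, rounding down and never going below 0.
    """
    remaining = received_max_age - seconds_since_received - latency_allowance
    return max(0, math.floor(remaining))


# Cache B got Max-Age 10 from A, held it 3 s, and the mesh adds up to 2 s:
forwarded_max_age(10, 3, latency_allowance=2.0)  # 5 (instead of 7)
```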




Section 5.8.2: what response code is returned if a POST, say, both created a new resource and deleted an old one, and perhaps changed a third?

5.8.3: make clear that the Content-Type Option is not required.

In a few places you have "but idempotent", which needs to change to "but is idempotent".


Section 5.9.1.1 has the text
    A cache SHOULD mark any stored response for the
   created resource as not fresh.
It seems like this should be a MUST.


The text 
   When a cache receives a 2.03 (Valid) response, it needs to update the
   stored response with the value of the Max-Age Option included in the
   response (see Section 5.6.2).
should have a MUST in it.

and in 
   However, a cache SHOULD mark any
   stored response for the changed resource as not fresh.
the SHOULD should be a MUST.


The text
   The representation format is
   specified by the media type given in the Content-Type Option.
seems to be repeated all over the place. Can it be said just once?


Caching 4.xx responses. 

It looks to me like a proxy could cache the "Unauthorized" error response. That seems wrong: a different client might be authorized. But this gets back to the lack of clarity about how security will work in the proxy case.

Response codes 4.03 to 4.15: I don't think you can define these by reference to the HTTP spec. The text there does not make sense for CoAP. I think you need to provide the normative definitions in this spec.


Caching something like a 5.04 timeout sounds like a bad idea. What would be the Max-Age of this response? We have lots of responses that will happen in overload conditions that probably should not be cached.


Section 5.10 

Trivial detail, but I'd rather see this as two tables.


Imagine a proxy that does not understand Max-Age. The server sends a response to the proxy with a Max-Age of 10 seconds. The proxy does not understand this option, so it removes it. Now the client that made the request to the proxy thinks the Max-Age of the response is 60 seconds. This seems broken; am I missing something about how this works?



Section 5.10.2

Very confusing things could happen with Uri-Host and NAT. Imagine that one side of the NAT is using 10.0.1.0/24 space and the other side is using 10.0.2.0/24 space. Let's say the NAT maps 10.0.1.100 to 10.0.2.200. If the client includes a Uri-Host option, the far server will think the request is for 10.0.1.100, yet if the client does not include a Uri-Host option, the far-side server will think it is for 10.0.2.200. So we could get into a case where the server rejects requests with 10.0.1.100 in the Uri-Host. I think we need some more care to outline how things are supposed to work and how the code needs to be written to work in the presence of NATs. I will note that most of the IPv4-to-IPv6 transition mechanisms look a lot like NATs with respect to this sort of problem, so we can't just wish this problem away.


Last paragraph of 5.10.2: I suspect you mean Uri-Query can occur zero or more times. The current text says one, not zero.

5.10.3, 2nd paragraph: this should reference Section 6.2, not RFC 3986.

5.10.4: make clear this is optional even when there is a payload carrying content.

5.10.5: make clear the order of preference. Is the first item in the list the most preferred?

5.10.8

Imagine returning two location paths that have different query arguments. How does that work?

Can the Location URI be included if the resource was just changed but not created? If it was created, do we need it to be a MUST rather than a MAY on return?



5.10.9 

Assuming that an empty ETag is a valid ETag, using this as a condition for existence won't work. The ad-hoc nature of this seems just sort of wrong. I'd rather see an If-Exists option for this functionality.

The 3rd-from-last paragraph, starting with "If none of", seems like it could be removed and just made the "otherwise" case of the condition in the paragraph above. This will make it easier to keep track of when 4.12 happens.

The 2nd-from-last paragraph suggests that all other errors take precedence over 4.12. This is very hard to implement in some cases, and I would remove it. Many systems will want to check preconditions and reject on them before doing all the other things that might cause an error.


Section 6.1

Max URI lengths: SIP could never agree on a max length for a URI, so it did not define one. This turned out to be a mistake, because when you write code you have to assume some limit, and everyone assumed different limits. CoAP is meant for small M2M-type apps. My belief is we should set a limit on URI sizes, and that limit should be fairly small, something on the order of, say, 100 bytes. These do not need to be human-readable URLs, and 2^(100*8) is a pretty big number.

Let's ban % encoding in CoAP URIs. I can't imagine any reason we really need it, and it is an endless source of bugs.

Allowing %2F in path elements sounds like another source of endless bugs.
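
Both restrictions would make URI checking a couple of lines of code. A minimal sketch; MAX_URI_BYTES is the hypothetical 100-byte limit proposed above, and measuring length in UTF-8 bytes is an assumption.

```python
MAX_URI_BYTES = 100  # hypothetical limit from the proposal above


def check_coap_uri(uri: str) -> None:
    """Raise ValueError if uri breaks the proposed length or encoding rules."""
    size = len(uri.encode("utf-8"))
    if size > MAX_URI_BYTES:
        raise ValueError(f"URI is {size} bytes; limit is {MAX_URI_BYTES}")
    if "%" in uri:
        # Banning all percent-encoding also rules out %2F inside a path segment.
        raise ValueError("percent-encoding is not allowed")


check_coap_uri("coap://[2001:db8::1]/sensors/temp")  # passes silently
```

A receiver enforcing this never needs a percent-decoder, and "/" always means a segment boundary.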

Where you have "is located at that IP address", given NATs, IP mobility, LISP, and so on, I think saying "can be reached at that IP address" would be more appropriate.

You need to say whether DNS is required or not. Obviously I think the answer needs to be that it is not required, but whatever the case is needs to be made clear.

The ABNF for IPv6 addresses is more complicated than some implementers think. I would like to see a note cautioning them about it and pointing it out in particular.

Given the size limitations, we could define ~ to be an abbreviation for /.well-known/ and ~c to be an abbreviation for /.well-known/core. There is going to be a lot of use of these over time.

For cases where DNS is used, we need to discuss URI host resolution in DNS. What I want here is that SRV is used, as that allows the port, as well as the IP address, to be specified in DNS. There are many parts of HTTP that would be much easier if we could put the port in DNS. The move towards large shared hosting in cloud deployments, and IPv4 address depletion, make this even more important. Note that I think DNS should be optional to implement, but if you do a DNS lookup, it should be an SRV lookup.




Section 6.2: I think it is a super bad idea to not allow caching of coaps data under any circumstances. Suggesting that proxies only work with no security is more or less equivalent to saying you can't use them in lots of deployments.


Section 6.4: I find the use of /url/ really weird.

Section 7.1: SHOULD is totally wimpy here. I think we need to pick MAY or MUST. I will be arguing for MUST.

Section 7.2: I don't think you can mandate that a server MUST listen on the default port. This means two servers behind a NAT can't really work.

It says other endpoints can be hosted "in the dynamic port space". I think this should be changed to "at other ports". It is OK to use non-dynamic ports for many use cases.

I don't care, but should there be a default port in the 6LoWPAN compressed space so you know where to send discovery requests?

I think the DTLS port also MUST be supported in the same way as the non-DTLS port. You need a secure way to do discovery.


Section 8: I tried passing a CoAP URL in the HTTP request line with various libraries, firewalls, and other tools. In theory it might work, but my observation is that in practice it does not. I'm not sure how to fix this. One possible solution is to define a way to translate a well-known HTTP URL into a CoAP URL. For example, a translator proxy that received an HTTP request like

http://www.proxy.com/.wellknonw/core-translate/1.2.3.4_4567/foo/bar?a=3

would translate that to

coap://1.2.3.4:4567/foo/bar?a=3 

If we did something like this, standard web libraries running on things like Google App Engine could make REST calls to the proxy that result in a request to the CoAP device. Perhaps you think this approach is the reverse proxy approach, but one way or another, I can't see how to get the current solution working very well in practice.
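
The translation in the example is mechanical. A minimal sketch, assuming the prefix is meant to be spelled /.well-known/core-translate/ and that host and port are joined by "_" as in the example; neither convention is specified anywhere, and IPv6 literals (which would need brackets) are not handled.

```python
from urllib.parse import urlsplit

# Hypothetical well-known prefix, following the example above.
TRANSLATE_PREFIX = "/.well-known/core-translate/"


def http_to_coap(http_url: str) -> str:
    """Map http://proxy/<prefix>host_port/path?query to coap://host:port/path?query."""
    parts = urlsplit(http_url)
    if not parts.path.startswith(TRANSLATE_PREFIX):
        raise ValueError("not a translation URL")
    rest = parts.path[len(TRANSLATE_PREFIX):]
    authority, _, path = rest.partition("/")
    host, sep, port = authority.rpartition("_")
    if not sep:  # no "_port" suffix: leave the port to the CoAP default
        host, port = authority, ""
    coap = "coap://" + host + (":" + port if port else "") + "/" + path
    if parts.query:
        coap += "?" + parts.query
    return coap


http_to_coap("http://www.proxy.com/.well-known/core-translate/1.2.3.4_4567/foo/bar?a=3")
# -> "coap://1.2.3.4:4567/foo/bar?a=3"
```

Because the CoAP target lives entirely in an ordinary HTTP path, unmodified HTTP clients and middleboxes see nothing unusual.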

Section 8.1.1: in mapping to HTTP ETags, I think the spec needs to deal with the difference between strong and weak ETags.


Section 8.2.4

Let's talk about all this 200, 204, 201 stuff. Do we have concrete use cases where we really need to separate all this? I'm trying to decide if it is just a source of bugs or something useful.

The text in the 2nd paragraph looks wrong. If a POST just modifies an existing resource, but that resource has a URI, it seems like one could return a 200.

Section 9 has 
   It is recommended that an application
   environment use consistent values for these parameters.
I'm confused about what this means. Does it mean one could not deploy future extensions of this spec that dynamically adjust the values if some nodes in the network did not do that?

In saying that 
   The values for RESPONSE_TIMEOUT, RESPONSE_RANDOM_FACTOR, and
   MAX_RETRANSMIT may be configured to values specific to the
   application environment,

I think we need to say more for this to be congestion-safe. For example, we could say a device MUST NOT allow values smaller than 2 seconds, 1.0, and 1 retransmission, respectively.
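
A minimal sketch of how such floors could be enforced, together with the retransmission schedule they feed into. The MIN_* constants are the floors suggested above, not values from the draft; the schedule follows our reading of the draft's backoff (an initial timeout drawn between RESPONSE_TIMEOUT and RESPONSE_TIMEOUT * RESPONSE_RANDOM_FACTOR, doubled on each retransmission) and should be checked against the draft text.

```python
import random
from typing import List

# Floors suggested above; hypothetical, not in the draft.
MIN_RESPONSE_TIMEOUT = 2.0       # seconds
MIN_RESPONSE_RANDOM_FACTOR = 1.0
MIN_MAX_RETRANSMIT = 1


def validate_transmission_params(response_timeout: float, random_factor: float,
                                 max_retransmit: int) -> None:
    """Refuse configurations below the proposed floors."""
    if response_timeout < MIN_RESPONSE_TIMEOUT:
        raise ValueError("RESPONSE_TIMEOUT below 2 seconds")
    if random_factor < MIN_RESPONSE_RANDOM_FACTOR:
        raise ValueError("RESPONSE_RANDOM_FACTOR below 1.0")
    if max_retransmit < MIN_MAX_RETRANSMIT:
        raise ValueError("MAX_RETRANSMIT below 1")


def retransmit_timeouts(response_timeout: float, random_factor: float,
                        max_retransmit: int, rng=random) -> List[float]:
    """Timeouts waited before each retransmission, with exponential backoff."""
    validate_transmission_params(response_timeout, random_factor, max_retransmit)
    first = rng.uniform(response_timeout, response_timeout * random_factor)
    return [first * 2 ** i for i in range(max_retransmit)]


retransmit_timeouts(2.0, 1.0, 4)  # [2.0, 4.0, 8.0, 16.0]
```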


Section 10.1.3 

This section confuses me about what is in the authority part of a CoAP URI. Half of it seems to think it is an EUI-64, and the other half thinks it is an IP address. How does the discovery process learn that mapping, securely, on every device? Let's take a new stab at this section assuming DTLS with certificates and dynamically assigned IP addresses.

 
I think the RSA_PSK mode should be a MAY, not a SHOULD.

My favorite line in the security section is 
	"CoAP servers SHOULD NOT accept multicast requests that can not
        be authenticated.  "
Given that we don't have any mechanism to authenticate multicast requests, this needs some work.

Section 11.2 

In the IANA registry, I'd explicitly say the fenceposts are all multiples of 14 instead of listing 14, 28, 42, ...

I'd reserve a single EXP code point for experimental options, and I would add the following text for this code point:

   The value EXP has been made available for the purposes of
   experimentation.  This value is not meant for vendor specific use of
   any sort and it MUST NOT be used for operational deployments.


Section 11.3

All the entries other than application/link-format are inappropriate for this table. They make sense for rendering to humans in email and HTTP, but in the context of M2M communications they are basically transfer encodings and do not define usable semantics for the data. Using them will not lead to interoperability but to exactly the opposite. We should make it easy for people to register the actual formats they are using and not just call everything text/plain. I've pointed this out in the past; if someone has convincing arguments from the list, I'd be glad to read a pointer to them.

Reserving 201 to 255 for private use does not make any sense. I have no idea what private use is in this context. 

The 0-200 range should be reserved for "IETF Review", not "Expert Review". Putting an expert in the position of having to say "no, I don't think your future format will be used much, so I won't give you a short code" is not a decision we can reasonably ask an expert reviewer to make.

The example should use a media type that is registered.


Section 13.1

I think we need a serious pass to move a bunch of these to informative references.

Appendix B

The Code=1 or Code=69 in the figures seems redundant.


Appendix B


2nd example: we need to say more about where the IP address came from.


Examples 4 and 5 seem like great examples of why we should not allow URIs like this.


Appendix D 

Unclear what the status of this section of the spec is. Are devices supposed to support this?

It seems like we need to specify how an identity is created from a public key for this to be interoperable. I think we should reference the TLS draft for this.



I think a pass should be made through the document, finding every SHOULD and either changing it to a MAY or MUST or, if it is left a SHOULD, clearly explaining in what sort of cases one would do it, when one would not, and the implications of that. As it stands, it will be very hard to decide if two devices will interoperate if you don't know which one did which of the SHOULDs.

There is a lot of duplicated normative text. I understand that a bit of this happens, but take for example the point that
      The response SHOULD include a human-readable error message.
This gets said about five times.