Re: [clue] AD Review: draft-ietf-clue-protocol-13

Adam Roach <adam@nostrum.com> Fri, 03 November 2017 20:11 UTC

Return-Path: <adam@nostrum.com>
X-Original-To: clue@ietfa.amsl.com
Delivered-To: clue@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C5F5313FF79 for <clue@ietfa.amsl.com>; Fri, 3 Nov 2017 13:11:49 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.88
X-Spam-Level:
X-Spam-Status: No, score=-1.88 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, T_SPF_HELO_PERMERROR=0.01, T_SPF_PERMERROR=0.01] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id hdIC_ZCgvT5r for <clue@ietfa.amsl.com>; Fri, 3 Nov 2017 13:11:48 -0700 (PDT)
Received: from nostrum.com (raven-v6.nostrum.com [IPv6:2001:470:d:1130::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 2433A13FF48 for <clue@ietf.org>; Fri, 3 Nov 2017 13:11:48 -0700 (PDT)
Received: from Svantevit.roach.at (cpe-70-122-154-80.tx.res.rr.com [70.122.154.80]) (authenticated bits=0) by nostrum.com (8.15.2/8.15.2) with ESMTPSA id vA3KBk3F056340 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Fri, 3 Nov 2017 15:11:47 -0500 (CDT) (envelope-from adam@nostrum.com)
X-Authentication-Warning: raven.nostrum.com: Host cpe-70-122-154-80.tx.res.rr.com [70.122.154.80] claimed to be Svantevit.roach.at
From: Adam Roach <adam@nostrum.com>
To: Simon Pietro Romano <spromano@unina.it>
Cc: clue@ietf.org, Roberta Presta <roberta.presta@unina.it>
References: <a30828ea-1db8-fccd-9c2b-ddc0a1dcb08d@nostrum.com> <8D2EDAF8-014E-477A-AECD-79D944EA4503@unina.it> <ac30c041-b269-484f-023a-0e8723133a5c@nostrum.com>
Message-ID: <a10cf598-26a0-ddc9-0c7f-6978231d2eff@nostrum.com>
Date: Fri, 03 Nov 2017 15:11:46 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
MIME-Version: 1.0
In-Reply-To: <ac30c041-b269-484f-023a-0e8723133a5c@nostrum.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/clue/fy3X-KMAkwNTZIiBjVxBhmLuJU8>
Subject: Re: [clue] AD Review: draft-ietf-clue-protocol-13
X-BeenThere: clue@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: CLUE - ControLling mUltiple streams for TElepresence <clue.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/clue>, <mailto:clue-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/clue/>
List-Post: <mailto:clue@ietf.org>
List-Help: <mailto:clue-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/clue>, <mailto:clue-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Nov 2017 20:11:50 -0000

Ah, I almost forgot -- please replace the RFC2119 boilerplate in section 
3 with RFC8174 boilerplate. Thanks!

/a

On 11/3/17 3:06 PM, Adam Roach wrote:
> Thanks for your thorough response! I've removed those issues that 
> you've addressed, and included responses to the remaining issues below.
>
> IMPORTANT NOTE TO CLUE WORKING GROUP: There is a discussion regarding 
> retransmission timers that you need to have. See the comments below 
> for details.
>
>
> On 10/6/17 1:37 PM, Simon Pietro Romano wrote:
>>
>>> Title: The rule of thumb is that all but a small handful of 
>>> well-known acronyms need to be expanded in titles and abstracts. I 
>>> recognize that "CLUE" is a bit tortured, as acronyms go, but the 
>>> title of this document is, broadly speaking, opaque. Please change 
>>> it to something meaningful, such as "Protocol for Controlling 
>>> Multiple Streams for Telepresence (CLUE)”
>>>
>> [SPR] Done.
>
> I think this got missed. The header of the current version still reads:
>
> CLUE Working Group R. Presta
> Internet-Draft S. P. Romano
> Intended status: Standards Track University of Napoli
> Expires: April 9, 2018 October 6, 2017
>
>                              CLUE protocol
>                       draft-ietf-clue-protocol-14
>
>>
>>> General: There are three instances of excessively long lines in the 
>>> document.
>>>
>> [SPR] ???
>
> The reformatting of XML in -14 has fixed this issue.
>
>> BLOCKER: General: There are several mentions of timeouts and retry 
>> thresholds in the text and its corresponding state machines; however, 
>> the document neither defines nor cites a document as defining what 
>> these timeout and retry values are. These need to be defined and 
>> described. If the timer and retry scheme allows the two ends of the 
>> connection to have different values for timeouts and number of 
>> retries, then there need to be additional error procedures that allow 
>> the MC and MP state machines to stay in sync (if the timer/retry 
>> values can be different, it's possible for one state machine to 
>> transition to "terminated," while the other is still active, and you 
>> need messaging to clean this up).
>>
>> [SPR] Our idea was to rely on a single, fixed, application-level 
>> (i.e., CLUE-level) timeout value, rather than leveraging a number of 
>> them (T1, T2, T4, plus Timer A through Timer K) as in SIP. The value 
>> of the one and only timer would be the usual RTT estimate (~500ms). 
>> Differently from protocols like SIP, we are also leveraging 
>> retransmit counts to get out of a timeout-driven retransmission loop. 
>> We’re not adding such information to the revised version of the 
>> draft, because we’d first like to get your feedback on the approach. 
>> If you (and the rest of the group) believe this is a feasible 
>> approach, we can expand on that in a further version of the document.
>
> In thinking through what this scheme should look like, it occurs to me 
> that the CLUE messages are defined to be sent over a reliable 
> transport (SCTP), which has its own retransmission timers and eventual 
> timeouts. Implementing a retransmission scheme on top of a reliable 
> transport -- especially one as aggressive as you suggest above -- will 
> put more traffic on the network when congestion occurs rather than less.
>
> So, in the final analysis, I think the action here is to remove 
> retransmission timers and retry counts altogether. If the underlying 
> transport takes longer to detect a failure than is sensible for CLUE 
> (and it likely does), then a supervisory timer that declares the 
> session failed might make sense.
>
>>
>>> Section 5: The XML definition allows zero or more <clueId> elements 
>>> to appear in a message. If more than one is allowed, the document 
>>> should explain how multiple IDs are handled. If they are not, then 
>>> the schema and/or text needs to prohibit having more than one.
>>>
>> [SPR] Actually, the schema states:
>>
>> <xs:element name="clueId" type="xs:string" minOccurs="0”/>
>>
>> This should indicate that clueId can appear either 0 or 1 times in a 
>> valid XML document. The default value for maxOccurs is in fact 1. 
>> Don’t you agree on that?
>
> Ah, you're correct. I'm not used to seeing minOccurs without 
> maxOccurs, but this is entirely fine as it's currently written.
>
>>
>>> BLOCKER: Section 5.1: The description of <supportedVersions> 
>>> describes a scheme in which multiple supported versions can be 
>>> listed; and, if the list is omitted, it implies that only the 
>>> version described in <v> is supported. This text does not define 
>>> (nor does any other text that I can find) what <v> should be set to 
>>> when <supportedVersions> is used. Intuitively, it seems that <v> 
>>> should be set to the largest minor version of the smallest major 
>>> version advertised in <supportedVersions>, but that (or whatever the 
>>> correct answer is) needs to be clearly spelled out.
>>>
>> [SPR] I believe we were giving for granted that “v” should be, in 
>> such case, the highest supported version (i.e., largest major and 
>> larger minor numbers). Now that you make me think around that, I 
>> would simply say that the “v” attribute MUST be set to one of the 
>> versions that the entity in question supports, as per the 
>> <supportedVersions> list. I can, e.g., decide that I send an 
>> ‘options’ with “v=12.23” if the <supportedVersions> list is like in 
>> the following:
>>
>> <version>33.44</version>
>> <version>27.0</version>
>> <version>12.345</version>
>> <version>1.44</version>
>
> I think this formulation is a problem. I assume the intention behind 
> the "v" field is that some implementation that receives a version 
> number in the "v" field with a major number higher than it understands 
> is supposed to close the connection, since it runs a risk of 
> misinterpreting the contents of messages. So, in your example, if you 
> sent "v=12.23" to my implementation that only spoke major version 1, I 
> would reject it, even though you also support major version 1. That's 
> why I was thinking it should contain the smallest major version being 
> advertised.
>
> The minor version is obviously less useful in this context, since 
> they're defined to be backwards and forwards compatible, but it's more 
> useful to know the highest minor version supported than some random 
> minor version, as it indicates the full feature set that is supported. 
> The reason it's less useful is that the value can be parsed out of the 
> <version> list.
>
>>> BLOCKING: The state machines in section 6 and its subsections don't 
>>> have transitions for all possible messages that could arrive in a 
>>> state. This can cause interop issues. Please add text that clearly 
>>> indicates whether such messages do or do not cause a transition. 
>>> (This might be as simple as "messages not shown for a state do not 
>>> cause the state to change," but only if you carefully check that 
>>> this is true -- for example, what should an MP state machine do do 
>>> if it gets a "CONF + ACK" in the state "WAIT FOR CONF"?)
>>>
>> [SPR] In the specific case you mention, I would expect that a 302 is 
>> generated (Invalid Parameter), since the incoming CONF should not 
>> contain the <ack> element. This said, I see your point related to the 
>> potential lack of some edges in the state machine graph. This point 
>> is obviously also related to the “timeout” issue you raised before. 
>> Different ways of managing timeouts will definitely have different 
>> impacts on the current state machines.
>> We will defer the modification of the diagrams in question to the 
>> next revision round. In the meanwhile, we’ll give a thought to 
>> potential further unexpected transactions to be taken into account.
>
> Based on my analysis above, I think the state machines will get a bit 
> simpler, as they won't have retransmission counts any longer.
>
>>
>>> Section 6.2 describes a state machine that starts in a state called 
>>> "WAIT FOR ADV." This state does not appear to be timer-supervised, 
>>> meaning that implementations of this state machine can stay in this 
>>> state literally forever. Is that the intention?
>>>
>> [SPR] The idea would be to open a socket and start listening for 
>> incoming advertisements. Isn’t that OK?
>
> At some point, I think you're going to want to conclude that the CLUE 
> data channel isn't going to get set up correctly, and signal that to 
> the application in some way. This is probably on the order of a minute 
> or so.
>
>>> Section 8 indicates that extensions need to indicate "the standard 
>>> version of the protocol the extension refers to." Given that there 
>>> is compatibility within a major version of a protocol, I think this 
>>> means to say "the major standard version of the protocol that the 
>>> extension refers to.”
>>>
>> [SPR] My extension might be associated with version 3.6 of the 
>> protocol. This means that it will need to be supported by all 
>> versions higher than that (3.7 onwards). Though, this does not entail 
>> that versions 3.5 and lower should understand it. Right?
>
> I'm a bit confused. Is the notion here that extensions, once 
> introduced, always become a mandatory-to-implement part of the 
> protocol? If that's the case, it's not clear why you would need to 
> advertise them explicitly, since support would be implied by the 
> version number itself. (This isn't unheard of: it's how NFS has 
> historically handled version numbers, for example.) However, I didn't 
> get the impression that CLUE protocol extensions were handled in such 
> a way. If this is the intention, there needs to be a lot more text 
> that makes this MTI aspect of extensions clear, and a reconsideration 
> of why you're signaling them individually rather than inferring them 
> from the version number.
>
>>> Section 8 contains schema that contains a <version> element. It 
>>> should be clarified that this is the *protocol* version, not the 
>>> *extension* version (or, if it's the extension version, that needs 
>>> to be spelled out too, but I think you'll need a new namespace for 
>>> that...?)
>>>
>> [SPR] I thought this was clear from the third bullet point in the list:
>>
>> - the standard version of the protocol the extension refers to.
>>
>> Isn’t this enough? In such a case, what would you suggest to add?
>
> Ah, I see. It wasn't clear that these three bullets corresponded to 
> the elements in the XML. Perhaps make this clearer with something like 
> "The values of the fields of the <extension> element take the 
> following values:" (and maybe move it closer to the actual XML).
>
>>
>>> The remainder of this comment is non-blocking: I also find the 
>>> "SHOULD" in this section to be highly perplexing. Can you explain 
>>> the rationale behind requiring schema, but not requiring any 
>>> description of what the schema *means?)
>>>
>> [SPR] It is like coding without adding comments/documentation. 
>> Documentation is most welcome, but not mandatory. Don’t you agree?
>
> I don't. It seems more like committing a header file without the 
> corresponding implementation file. Just because people know what 
> things are called doesn't mean that they can handle them. For interop, 
> you're going to need some description of what the fields mean and how 
> the messages are handled.
>
>>>
>>> The schema in section 9 contains:
>>>
>>> <xs:import namespace="urn:ietf:params:xml:ns:clue-info"
>>> schemaLocation="data-model-schema-17.xsd"/>
>>>
>>> If you do not intend to bake this "-17" into the document (and I 
>>> can't imagine you do), please add an RFC editor note to change it to 
>>> something else upon publication.
>>>
>> [SPR] Done. I am not very familiar with RFC editor notes, indeed. 
>> Here is what we added at the beginning of section 9:
>>
>> "NOTE TO THE RFC-Editor: Please replace "data-model-schema-17.xsd" 
>> with the right schema location for the CLUE data model schema 
>> document (which is still to be defined at the time of this writing) 
>> in this section prior to publication as an RFC.”
>>
>> Does this work for you?
>
> That's perfect. Thanks!
>
>>> Section 10 in general: While it does consume a lot of space, I don't 
>>> think that defining six rather different message types and then 
>>> showing only *one* type in the examples is very illustrative of the 
>>> protocol. I would *STRONGLY* suggest adding at least a response 
>>> message, and ideally the example section should contain at least one 
>>> example for each of the six message types.
>>>
>> [SPR] Will do in the (hopefully final) revision of the document. 
>> Let’s first double check that the proposed modifications are OK.
>
> Sounds good.
>
>>
>>> Section 12.1 has a strange double-double quote around the URN name, 
>>> and section 12.3 repeats this for the MIME type.
>>>
>> [SPR] I don’t understand this. Can you please clarify?
>>
>
> Replace
>
>    ""urn:ietf:params:xml:ns:clue-protocol"".
>
> with
>
>    "urn:ietf:params:xml:ns:clue-protocol".
>
>
> And then replace
>
>    This section registers the ""application/clue+xml"" MIME type.
>
> with
>
>    This section registers the "application/clue+xml" MIME type.
>
>
> Thanks!
>
> /a