Re: [clue] AD Review: draft-ietf-clue-protocol-13

Adam Roach <adam@nostrum.com> Fri, 03 November 2017 20:06 UTC

Return-Path: <adam@nostrum.com>
X-Original-To: clue@ietfa.amsl.com
Delivered-To: clue@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C4E9B13FF77 for <clue@ietfa.amsl.com>; Fri, 3 Nov 2017 13:06:46 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.88
X-Spam-Level:
X-Spam-Status: No, score=-1.88 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, T_SPF_HELO_PERMERROR=0.01, T_SPF_PERMERROR=0.01] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id WVOHCCj0oDq9 for <clue@ietfa.amsl.com>; Fri, 3 Nov 2017 13:06:44 -0700 (PDT)
Received: from nostrum.com (raven-v6.nostrum.com [IPv6:2001:470:d:1130::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 4070213FF76 for <clue@ietf.org>; Fri, 3 Nov 2017 13:06:44 -0700 (PDT)
Received: from Svantevit.roach.at (cpe-70-122-154-80.tx.res.rr.com [70.122.154.80]) (authenticated bits=0) by nostrum.com (8.15.2/8.15.2) with ESMTPSA id vA3K6f6C055804 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO); Fri, 3 Nov 2017 15:06:43 -0500 (CDT) (envelope-from adam@nostrum.com)
X-Authentication-Warning: raven.nostrum.com: Host cpe-70-122-154-80.tx.res.rr.com [70.122.154.80] claimed to be Svantevit.roach.at
To: Simon Pietro Romano <spromano@unina.it>
Cc: clue@ietf.org, Roberta Presta <roberta.presta@unina.it>
References: <a30828ea-1db8-fccd-9c2b-ddc0a1dcb08d@nostrum.com> <8D2EDAF8-014E-477A-AECD-79D944EA4503@unina.it>
From: Adam Roach <adam@nostrum.com>
Message-ID: <ac30c041-b269-484f-023a-0e8723133a5c@nostrum.com>
Date: Fri, 03 Nov 2017 15:06:41 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:52.0) Gecko/20100101 Thunderbird/52.4.0
MIME-Version: 1.0
In-Reply-To: <8D2EDAF8-014E-477A-AECD-79D944EA4503@unina.it>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Archived-At: <https://mailarchive.ietf.org/arch/msg/clue/2b_eceaVAshzcNd-mE1sxCWLQ48>
Subject: Re: [clue] AD Review: draft-ietf-clue-protocol-13
X-BeenThere: clue@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: CLUE - ControLling mUltiple streams for TElepresence <clue.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/clue>, <mailto:clue-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/clue/>
List-Post: <mailto:clue@ietf.org>
List-Help: <mailto:clue-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/clue>, <mailto:clue-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 03 Nov 2017 20:06:47 -0000

Thanks for your thorough response! I've removed those issues that you've 
addressed, and included responses to the remaining issues below.

IMPORTANT NOTE TO CLUE WORKING GROUP: There is a discussion regarding 
retransmission timers that you need to have. See the comments below for 
details.


On 10/6/17 1:37 PM, Simon Pietro Romano wrote:
>
>> Title: The rule of thumb is that all but a small handful of 
>> well-known acronyms need to be expanded in titles and abstracts. I 
>> recognize that "CLUE" is a bit tortured, as acronyms go, but the 
>> title of this document is, broadly speaking, opaque. Please change it 
>> to something meaningful, such as "Protocol for Controlling Multiple 
>> Streams for Telepresence (CLUE)”
>>
> [SPR] Done.

I think this got missed. The header of the current version still reads:

CLUE Working Group R. Presta
Internet-Draft S. P. Romano
Intended status: Standards Track University of Napoli
Expires: April 9, 2018 October 6, 2017

                              CLUE protocol
                       draft-ietf-clue-protocol-14

>
>> General: There are three instances of excessively long lines in the 
>> document.
>>
> [SPR] ???

The reformatting of XML in -14 has fixed this issue.

> BLOCKER: General: There are several mentions of timeouts and retry 
> thresholds in the text and its corresponding state machines; however, 
> the document neither defines nor cites a document as defining what 
> these timeout and retry values are. These need to be defined and 
> described. If the timer and retry scheme allows the two ends of the 
> connection to have different values for timeouts and number of 
> retries, then there need to be additional error procedures that allow 
> the MC and MP state machines to stay in sync (if the timer/retry 
> values can be different, it's possible for one state machine to 
> transition to "terminated," while the other is still active, and you 
> need messaging to clean this up).
>
> [SPR] Our idea was to rely on a single, fixed, application-level 
> (i.e., CLUE-level) timeout value, rather than leveraging a number of 
> them (T1, T2, T4, plus Timer A through Timer K) as in SIP. The value 
> of the one and only timer would be the usual RTT estimate (~500ms). 
> Differently from protocols like SIP, we are also leveraging retransmit 
> counts to get out of a timeout-driven retransmission loop. We’re not 
> adding such information to the revised version of the draft, because 
> we’d first like to get your feedback on the approach. If you (and the 
> rest of the group) believe this is a feasible approach, we can expand 
> on that in a further version of the document.

In thinking through what this scheme should look like, it occurs to me 
that the CLUE messages are defined to be sent over a reliable transport 
(SCTP), which has its own retransmission timers and eventual timeouts. 
Implementing a retransmission scheme on top of a reliable transport -- 
especially one as aggressive as you suggest above -- will put more 
traffic on the network when congestion occurs rather than less.

So, in the final analysis, I think the action here is to remove 
retransmission timers and retry counts altogether. If the underlying 
transport takes longer to detect a failure than is sensible for CLUE 
(and it likely does), then a supervisory timer that declares the session 
failed might make sense.

>
>> Section 5: The XML definition allows zero or more <clueId> elements 
>> to appear in a message. If more than one is allowed, the document 
>> should explain how multiple IDs are handled. If they are not, then 
>> the schema and/or text needs to prohibit having more than one.
>>
> [SPR] Actually, the schema states:
>
> <xs:element name="clueId" type="xs:string" minOccurs="0”/>
>
> This should indicate that clueId can appear either 0 or 1 times in a 
> valid XML document. The default value for maxOccurs is in fact 1. 
> Don’t you agree on that?

Ah, you're correct. I'm not used to seeing minOccurs without maxOccurs, 
but this is entirely fine as it's currently written.

>
>> BLOCKER: Section 5.1: The description of <supportedVersions> 
>> describes a scheme in which multiple supported versions can be 
>> listed; and, if the list is omitted, it implies that only the version 
>> described in <v> is supported. This text does not define (nor does 
>> any other text that I can find) what <v> should be set to when 
>> <supportedVersions> is used. Intuitively, it seems that <v> should be 
>> set to the largest minor version of the smallest major version 
>> advertised in <supportedVersions>, but that (or whatever the correct 
>> answer is) needs to be clearly spelled out.
>>
> [SPR] I believe we were giving for granted that “v” should be, in such 
> case, the highest supported version (i.e., largest major and larger 
> minor numbers). Now that you make me think around that, I would simply 
> say that the “v” attribute MUST be set to one of the versions that the 
> entity in question supports, as per the <supportedVersions> list. I 
> can, e.g., decide that I send an ‘options’ with “v=12.23” if the 
> <supportedVersions> list is like in the following:
>
> <version>33.44</version>
> <version>27.0</version>
> <version>12.345</version>
> <version>1.44</version>

I think this formulation is a problem. I assume the intention behind the 
"v" field is that some implementation that receives a version number in 
the "v" field with a major number higher than it understands is supposed 
to close the connection, since it runs a risk of misinterpreting the 
contents of messages. So, in your example, if you sent "v=12.23" to my 
implementation that only spoke major version 1, I would reject it, even 
though you also support major version 1. That's why I was thinking it 
should contain the smallest major version being advertised.

The minor version is obviously less useful in this context, since 
they're defined to be backwards and forwards compatible, but it's more 
useful to know the highest minor version supported than some random 
minor version, as it indicates the full feature set that is supported. 
The reason it's less useful is that the value can be parsed out of the 
<version> list.

>> BLOCKING: The state machines in section 6 and its subsections don't 
>> have transitions for all possible messages that could arrive in a 
>> state. This can cause interop issues. Please add text that clearly 
>> indicates whether such messages do or do not cause a transition. 
>> (This might be as simple as "messages not shown for a state do not 
>> cause the state to change," but only if you carefully check that this 
>> is true -- for example, what should an MP state machine do do if it 
>> gets a "CONF + ACK" in the state "WAIT FOR CONF"?)
>>
> [SPR] In the specific case you mention, I would expect that a 302 is 
> generated (Invalid Parameter), since the incoming CONF should not 
> contain the <ack> element. This said, I see your point related to the 
> potential lack of some edges in the state machine graph. This point is 
> obviously also related to the “timeout” issue you raised before. 
> Different ways of managing timeouts will definitely have different 
> impacts on the current state machines.
> We will defer the modification of the diagrams in question to the next 
> revision round. In the meanwhile, we’ll give a thought to potential 
> further unexpected transactions to be taken into account.

Based on my analysis above, I think the state machines will get a bit 
simpler, as they won't have retransmission counts any longer.

>
>> Section 6.2 describes a state machine that starts in a state called 
>> "WAIT FOR ADV." This state does not appear to be timer-supervised, 
>> meaning that implementations of this state machine can stay in this 
>> state literally forever. Is that the intention?
>>
> [SPR] The idea would be to open a socket and start listening for 
> incoming advertisements. Isn’t that OK?

At some point, I think you're going to want to conclude that the CLUE 
data channel isn't going to get set up correctly, and signal that to the 
application in some way. This is probably on the order of a minute or so.

>> Section 8 indicates that extensions need to indicate "the standard 
>> version of the protocol the extension refers to." Given that there is 
>> compatibility within a major version of a protocol, I think this 
>> means to say "the major standard version of the protocol that the 
>> extension refers to.”
>>
> [SPR] My extension might be associated with version 3.6 of the 
> protocol. This means that it will need to be supported by all versions 
> higher than that (3.7 onwards). Though, this does not entail that 
> versions 3.5 and lower should understand it. Right?

I'm a bit confused. Is the notion here that extensions, once introduced, 
always become a mandatory-to-implement part of the protocol? If that's 
the case, it's not clear why you would need to advertise them 
explicitly, since support would be implied by the version number itself. 
(This isn't unheard of: it's how NFS has historically handled version 
numbers, for example.) However, I didn't get the impression that CLUE 
protocol extensions were handled in such a way. If this is the 
intention, there needs to be a lot more text that makes this MTI aspect 
of extensions clear, and a reconsideration of why you're signaling them 
individually rather than inferring them from the version number.

>> Section 8 contains schema that contains a <version> element. It 
>> should be clarified that this is the *protocol* version, not the 
>> *extension* version (or, if it's the extension version, that needs to 
>> be spelled out too, but I think you'll need a new namespace for that...?)
>>
> [SPR] I thought this was clear from the third bullet point in the list:
>
> - the standard version of the protocol the extension refers to.
>
> Isn’t this enough? In such a case, what would you suggest to add?

Ah, I see. It wasn't clear that these three bullets corresponded to the 
elements in the XML. Perhaps make this clearer with something like "The 
values of the fields of the <extension> element take the following 
values:" (and maybe move it closer to the actual XML).

>
>> The remainder of this comment is non-blocking: I also find the 
>> "SHOULD" in this section to be highly perplexing. Can you explain the 
>> rationale behind requiring schema, but not requiring any description 
>> of what the schema *means?)
>>
> [SPR] It is like coding without adding comments/documentation. 
> Documentation is most welcome, but not mandatory. Don’t you agree?

I don't. It seems more like committing a header file without the 
corresponding implementation file. Just because people know what things 
are called doesn't mean that they can handle them. For interop, you're 
going to need some description of what the fields mean and how the 
messages are handled.

>>
>> The schema in section 9 contains:
>>
>> <xs:import namespace="urn:ietf:params:xml:ns:clue-info"
>> schemaLocation="data-model-schema-17.xsd"/>
>>
>> If you do not intend to bake this "-17" into the document (and I 
>> can't imagine you do), please add an RFC editor note to change it to 
>> something else upon publication.
>>
> [SPR] Done. I am not very familiar with RFC editor notes, indeed. Here 
> is what we added at the beginning of section 9:
>
> "NOTE TO THE RFC-Editor: Please replace "data-model-schema-17.xsd" 
> with the right schema location for the CLUE data model schema document 
> (which is still to be defined at the time of this writing) in this 
> section prior to publication as an RFC.”
>
> Does this work for you?

That's perfect. Thanks!

>> Section 10 in general: While it does consume a lot of space, I don't 
>> think that defining six rather different message types and then 
>> showing only *one* type in the examples is very illustrative of the 
>> protocol. I would *STRONGLY* suggest adding at least a response 
>> message, and ideally the example section should contain at least one 
>> example for each of the six message types.
>>
> [SPR] Will do in the (hopefully final) revision of the document. Let’s 
> first double check that the proposed modifications are OK.

Sounds good.

>
>> Section 12.1 has a strange double-double quote around the URN name, 
>> and section 12.3 repeats this for the MIME type.
>>
> [SPR] I don’t understand this. Can you please clarify?
>

Replace

    ""urn:ietf:params:xml:ns:clue-protocol"".

with

    "urn:ietf:params:xml:ns:clue-protocol".


And then replace

    This section registers the ""application/clue+xml"" MIME type.

with

    This section registers the "application/clue+xml" MIME type.


Thanks!

/a