Re: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)

Thank you, Med.

Is the presentation about the signal-channel I-D already included in the agenda?
I can spare at least 5min from `2. Controlling Filtering Rules Using DOTS Signal Channel (15 min)` for this discussion.

thanks,
Kaname

On 2019/07/23 22:56, mohamed.boucadair@orange.com wrote:
> Re-,
>
> Thank you, Tiru,
>
> All: FWIW, we prepared a set of slides to present the pending DISCUSS point from Mirja. The slides are available at:
> https://datatracker.ietf.org/meeting/105/materials/slides-105-dots-heartbeat-mechanism-mirjas-discuss-on-the-signal-channel-i-d-00
>
> Cheers,
> Med
>
>> -----Original Message-----
>> From: Konda, Tirumaleswar Reddy [mailto:TirumaleswarReddy_Konda@McAfee.com]
>> Sent: Tuesday, July 23, 2019 08:41
>> To: BOUCADAIR Mohamed TGI/OLN; Benjamin Kaduk; Valery Smyslov
>> Cc: dots-chairs@ietf.org; dots@ietf.org
>> Subject: RE: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
>>
>>> -----Original Message-----
>>> From: mohamed.boucadair@orange.com
>>> <mohamed.boucadair@orange.com>
>>> Sent: Tuesday, July 23, 2019 11:02 AM
>>> To: Konda, Tirumaleswar Reddy
>>> <TirumaleswarReddy_Konda@McAfee.com>; Benjamin Kaduk
>>> <kaduk@mit.edu>; Valery Smyslov <valery@smyslov.net>
>>> Cc: dots-chairs@ietf.org; dots@ietf.org
>>> Subject: RE: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
>>>
>>>
>>>
>>> Hi Tiru, all,
>>>
>>> Please see inline.
>>>
>>> Cheers,
>>> Med
>>>
>>>> -----Original Message-----
>>>> From: Konda, Tirumaleswar Reddy
>>>> [mailto:TirumaleswarReddy_Konda@McAfee.com]
>>>> Sent: Sunday, July 21, 2019 08:52
>>>> To: Benjamin Kaduk; Valery Smyslov
>>>> Cc: dots-chairs@ietf.org; BOUCADAIR Mohamed TGI/OLN; dots@ietf.org
>>>> Subject: RE: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
>>>>
>>>> Hi Ben,
>>>>
>>>> There seem to be several points of confusion regarding the heartbeat
>>>> mechanism. I will try to address all the comments/Discuss from you,
>>>> Mirja and Valery below:
>>>>
>>>> [1] https://tools.ietf.org/html/rfc7252 is specific to UDP transport
>>>> (and does not deal with TCP). Please see the first paragraph in
>>>> https://tools.ietf.org/html/rfc7252#section-3. The message
>>>> transmission parameters (max-retransmit, ack-timeout and
>>>> ack-random-factor) and missing-hb-allowed discussed in DOTS signal
>>>> channel are specific to UDP transport.
>>>>
>>>> [2] CoAP over TCP is discussed in https://tools.ietf.org/html/rfc8323.
>>>> Please see the following differences between CoAP-over-UDP and
>>>> CoAP-over-TCP relevant to our discussion:
>>>>
>>>> a) CoAP ping/pong defined in RFC7252 (uses Empty confirmable message
>>>> and reset) will not work for CoAP-over-TCP. As per
>>>> https://tools.ietf.org/html/rfc8323#section-3.4, Empty messages (Code
>>>> 0.00) can always be sent and MUST be ignored by the recipient.
>>>> CoAP-over-TCP defines its own CoAP ping/pong for connection health
>>>> (see https://tools.ietf.org/html/rfc8323#section-5.4).
>>>>
>>>> b) Confirmable and Non-confirmable message types are specific to UDP,
>>>> and are not supported in CoAP-over-TCP.
>>>>
>>>> [3] For TCP, if no ack is received for a CoAP ping within a specific
>>>> duration, TCP will close the connection, and the DOTS client will have
>>>> to re-establish the TCP connection. missing-hb-allowed is of no use
>>>> for TCP. We are all on the same page for TCP, and the draft can
>>>> probably be updated for better clarity.
>>>>
>>>> [4] Now coming to UDP, please see my responses below:
>>>>
>>>> a) As you already know, DOTS signal channel uses heartbeat exchange in
>>>> both directions, and hence CoAP ping is sent by both DOTS client and
>>>> server.
>>>> b) CoAP ping is a confirmable message and hence is retransmitted with
>>>> exponential back-off; the default value of MAX_RETRANSMIT is 4
>>>> (https://tools.ietf.org/html/rfc7252#section-4.8).
>>>> c) CoAP ping is the only confirmable message exchanged during attack
>>>> (all other messages exchanged during an attack are non-confirmable).
>>>> The specification allows distinct values for message transmission
>>>> parameters and missing-hb-allowed to be used during attack and peace
>>>> times.
>>>> To handle congestion conditions during an attack, the specification
>>>> allows two options:
>>>>
>>>> [Option a] By setting MAX_RETRANSMIT to 1, exponential back-off is
>>>> avoided, and missing-hb-allowed is set to a much higher value (e.g. 20)
>>>> to handle congestion (high packet loss). The draft can be updated to
>>>> explain [Option a] in more detail.
>>>> [Option b] The CoAP MAX_RETRANSMIT default value of 4 is not modified,
>>>> and for example, missing-hb-allowed can be set to 5 (since 4 transmits
>>>> are not sufficient to detect the peer is not alive during congestion).
>>>>
>>> [Med] We can add this text to illustrate the configuration flexibility:
>>>
>>>     The specification allows for a flexible retry configuration when an
>>>     unreliable transport is in use.  For example, a server may be tweaked
>>>     to return a lower 'missing-hb-allowed' (e.g., 5) value but delegate
>>>     the retransmission to the underlying CoAP library by setting 'max-
>>>     retransmit' to a high value (e.g., 3).  The server may also be
>>>     configured to return a 'max-retransmit' set to '1' together with a
>>>     higher 'missing-hb-allowed' value (e.g., 15).
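
To make the trade-off between the two example configurations above concrete, here is a rough Python sketch. It assumes that each unanswered CoAP ping counts as one missed heartbeat, that the CoAP defaults ACK_TIMEOUT = 2 s and ACK_RANDOM_FACTOR = 1.5 from RFC 7252 apply, and that the heartbeat interval between pings is ignored. The function names are hypothetical and do not come from the draft or from any CoAP library.

```python
def max_transmit_wait(max_retransmit, ack_timeout=2.0, ack_random_factor=1.5):
    """Worst-case seconds before one confirmable message is declared failed.

    This is MAX_TRANSMIT_WAIT as defined in RFC 7252, Section 4.8.2.
    """
    return ack_timeout * (2 ** (max_retransmit + 1) - 1) * ack_random_factor


def heartbeat_budget(max_retransmit, missing_hb_allowed):
    """Return (worst-case seconds, datagrams sent) before the peer is
    considered gone.

    Assumption: every failed ping counts as one missed heartbeat and the
    next ping is sent immediately (heartbeat interval ignored).
    """
    seconds = missing_hb_allowed * max_transmit_wait(max_retransmit)
    datagrams = missing_hb_allowed * (max_retransmit + 1)
    return seconds, datagrams


# The two example configurations from the proposed text.
print(heartbeat_budget(max_retransmit=3, missing_hb_allowed=5))
print(heartbeat_budget(max_retransmit=1, missing_hb_allowed=15))
```

Under these assumptions the first configuration sends up to 20 datagrams over at most 225 s, while the second sends up to 30 datagrams over at most 135 s, so the second probes more often and never waits long between transmissions.
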
>> Looks good. Both of these techniques are used by protocols today: DTLS
>> heartbeat uses retransmission and exponential back-off (see
>> https://tools.ietf.org/html/rfc6347#section-4.2.4.1) for liveness checks,
>> and in the STUN usage for consent freshness
>> (https://tools.ietf.org/html/rfc7675), STUN binding requests are sent
>> periodically.
>>
>> Cheers,
>> -Tiru
>>
>>>
>>>> Mirja's Discuss asks us not to rely on the CoAP ping/pong but to
>>>> define the heartbeat in the DOTS layer itself (please see
>>>> https://mailarchive.ietf.org/arch/msg/dots/V6vv28zDpdY5eR_kaB7L-60bhkk)
>>>> and suggests an alternate design using non-confirmable messages. Our
>>>> assessment is that the alternate design won't work; please see my
>>>> response:
>>>> https://mailarchive.ietf.org/arch/msg/dots/QRMfsmhPTFksN6a_nBBKimVx-lM
>>>>
>>>> Cheers,
>>>> -Tiru
>>>>
>>>>> -----Original Message-----
>>>>> From: Dots <dots-bounces@ietf.org> On Behalf Of Benjamin Kaduk
>>>>> Sent: Sunday, July 21, 2019 9:35 AM
>>>>> To: Valery Smyslov <valery@smyslov.net>
>>>>> Cc: dots-chairs@ietf.org; mohamed.boucadair@orange.com;
>>>>> dots@ietf.org
>>>>> Subject: Re: [Dots] Mirja's DISCUSS: Pending Point (AD Help Needed)
>>>>>
>>>>> Hi Valery,
>>>>>
>>>>> On Fri, Jul 19, 2019 at 02:42:50PM +0300, Valery Smyslov wrote:
>>>>>> Hi Med,
>>>>>>
>>>>>> I believe Mirja's main point was that if you use a liveness check
>>>>>> mechanism in the transport layer, then if it reports that the
>>>>>> liveness check fails, it _also_ closes the transport session.
>>>>>>
>>>>>> Quotes from her emails:
>>>>>>
>>>>>> "Yes, as Coap Ping is used, the agent should not only conclude that
>>>>>> the DOTS signal session is disconnected but also the Coap session
>>>>>> and not send any further Coap messages anymore."
>>>>>>
>>>>>> and
>>>>>>
>>>>>> "Actually to my understanding this will not work. Both TCP heartbeat
>>>>>> and Coap Ping are transmitted reliably. If you don’t receive an ack
>>>>>> for these transmissions you are not able to send any additional
>>>>>> messages and can only close the connection."
>>>>>>
>>>>>> I'm not familiar with CoAP, but I suspect she's right about TCP -
>>>>>> if the TCP layer itself doesn't receive an ACK for the sent data
>>>>>> after several retransmissions, the connection is closed.
>>>>>
>>>>> Thanks for this crisp summary (and thanks Med for the detailed
>>>>> writeup as well)!
>>>>>
>>>>>> As far as I understand, the current draft allows the underlying
>>>>>> liveness check to fail and has a parameter to restart this check
>>>>>> several times if this happens. It seems that a new transport
>>>>>> session will be created in this case (at least if TCP is used). In
>>>>>> my reading of the draft this does not seem to have been assumed; it
>>>>>> is assumed that the session remains the same. So, I think that
>>>>>> Mirja's main concern is that it won't work (at least with TCP).
>>>>>
>>>>> My sense is similar; if I could attempt to summarize Mirja's stance,
>>>>> it's that we're invoking a transport-level feature that does its own
>>>>> retransmit and backoff, but then if the transport comes back and says
>>>>> "the peer is gone", we say "but we're under attack, so I don't believe
>>>>> you; try again". This kicks off another independent set of
>>>>> "retransmits" (I know it's not technically the right word) with a
>>>>> fresh exponential backoff. There are two complaints about this:
>>>>>
>>>>> (1) we're changing the transport, since if the transport concludes the
>>>>> peer is gone then the transport "normally" tears down the connection
>>>>> (*) entirely, and
>>>>>
>>>>> (2) the assembly of (exponential backoff 1), (exponential backoff 2),
>>>>> (exponential backoff 3) is strange pacing, and might be better served
>>>>> by a similar number of "retransmits" but with different pacing, since
>>>>> the long delay at the end of each backoff period is not expected to
>>>>> add a huge amount of value in terms of letting congestion ease during
>>>>> attack time, and we would be just as well served by capping the delay
>>>>> between retransmits and having more retransmits.
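
The pacing concern in (2) can be illustrated with a small Python sketch that lists the waiting time after each transmission for both schemes. It is only an approximation: the random factor and the heartbeat interval are ignored, ACK_TIMEOUT = 2 s and MAX_RETRANSMIT = 4 are the RFC 7252 defaults, and the 8 s cap and the function names are hypothetical values chosen for illustration.

```python
def repeated_backoff(ack_timeout, max_retransmit, rounds):
    """Waits (seconds) after each transmission when a full exponential
    backoff is restarted `rounds` times in a row."""
    one_round = [ack_timeout * 2 ** i for i in range(max_retransmit + 1)]
    return one_round * rounds


def capped_backoff(ack_timeout, cap, transmissions):
    """Waits (seconds) after each transmission when the exponential delay
    is capped at `cap` seconds (a hypothetical alternative)."""
    return [min(ack_timeout * 2 ** i, cap) for i in range(transmissions)]


restarted = repeated_backoff(ack_timeout=2, max_retransmit=4, rounds=3)
capped = capped_backoff(ack_timeout=2, cap=8, transmissions=len(restarted))

# Same number of transmissions, very different spacing.
print(restarted, sum(restarted))
print(capped, sum(capped))
```

With these numbers both schemes send 15 datagrams, but the restarted backoff spends 186 s doing so and waits 32 s at the end of every round, whereas the capped variant finishes in 110 s and never waits longer than 8 s.
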
>>>>>
>>>>> The asterisk on (1) is of course because, as is noted later in the
>>>>> thread, only TCP tears down the association when it concludes the peer
>>>>> is gone (assuming I'm reading the right parts of 7252).  Quoting 7252:
>>>>>
>>>>>     If the retransmission counter reaches MAX_RETRANSMIT on a timeout,
>>>>>     or if the endpoint receives a Reset message, then the attempt to
>>>>>     transmit the message is canceled and the application process
>>>>>     informed of failure.  On the other hand, if the endpoint receives
>>>>>     an acknowledgement in time, transmission is considered successful.
>>>>>
>>>>> So all CoAP does is to tell the application "that request didn't
>>>>> work", but CoAP is happy to try additional requests on the connection;
>>>>> the teardown logic is indeed left up to the application.
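
As a minimal sketch of what "teardown logic left up to the application" could look like over UDP, the Python class below counts consecutive failed pings and only declares the session disconnected once a threshold is reached. The class and method names are hypothetical, and treating 'missing-hb-allowed' as the number of consecutive failures that triggers disconnection is an assumption about how the parameter is applied.

```python
class HeartbeatMonitor:
    """Application-level liveness tracking on top of CoAP ping results."""

    def __init__(self, missing_hb_allowed):
        self.missing_hb_allowed = missing_hb_allowed
        self.missed = 0            # consecutive pings without a response
        self.disconnected = False

    def on_ping_result(self, answered):
        """Record one ping outcome; return True once the session is
        considered disconnected.

        `answered` is False when the CoAP layer reports that the
        confirmable message failed after its retransmissions.
        """
        if self.disconnected:
            return True
        if answered:
            self.missed = 0        # any response resets the counter
        else:
            self.missed += 1
            if self.missed >= self.missing_hb_allowed:
                self.disconnected = True
        return self.disconnected


monitor = HeartbeatMonitor(missing_hb_allowed=3)
for outcome in (False, False, True, False, False, False):
    state = monitor.on_ping_result(outcome)
print(state)
```

The CoAP layer keeps accepting new requests after each reported failure; it is the counter in the application that decides when to give up.
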
>>>>>
>>>>> I'm not sure that we've seen much discussion about (2), though (sorry
>>>>> if I missed it) -- why is the repeated backoff-and-restart the right
>>>>> pacing for this purpose?
>>>>>
>>>>> -Ben
>>>>>
>>>>>> I didn't participate in the WG discussion on this, so I don't know
>>>>>> what was discussed regarding this issue. If it was discussed and
>>>>>> the WG has come to the conclusion that this is not an issue, then I
>>>>>> believe more text should be added to the draft so that people
>>>>>> like Mirja, who didn't participate in the discussion, don't have any
>>>>>> concerns while reading the draft.
>>>>>>
>>>>>> Regards,
>>>>>> Valery.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: mohamed.boucadair@orange.com
>>>>>>> <mohamed.boucadair@orange.com>
>>>>>>> Sent: Friday, July 19, 2019 9:57 AM
>>>>>>> To: Benjamin Kaduk (kaduk@mit.edu) <kaduk@mit.edu>;
>>>>>>> dots-chairs@ietf.org; dots@ietf.org
>>>>>>> Subject: Mirja's DISCUSS: Pending Point (AD Help Needed)
>>>>>>>
>>>>>>> Hi Ben, chairs, all,
>>>>>>>
>>>>>>> (restricting the discussion to the AD/chairs/WG)
>>>>>>>
>>>>>>> * Status:
>>>>>>>
>>>>>>> All DISCUSS points from Mirja's review were fixed, except the
>>>>>>> one discussed in this message.
>>>>>>>
>>>>>>> * Pending Point:
>>>>>>>
>>>>>>> Rather than going into much detail, I consider the following to be
>>>>>>> the summary of the remaining DISCUSS point from Mirja:
>>>>>>>
>>>>>>>> I believe there are flaws in the design. First it’s a layer
>>>>>>>> violation, but if more an idealistic concern but usually
>>>>>>>> designing in layers is a good approach. But more importantly,
>>>>>>>> you end up with un-frequent messages which may still terminate
>>>>>>>> the connection at some point, while what you want is to simply
>>>>>>>> send messages frequently in an unreliable fashion but a low
>>>>>>>> rate until the attack is over.
>>>>>>>
>>>>>>> * Discussion:
>>>>>>>
>>>>>>> (1) First of all, let's recall that RFC7252 does not define how
>>>>>>> CoAP ping must be used. It only says:
>>>>>>>
>>>>>>> ==
>>>>>>>        Provoking a Reset message (e.g., by sending an Empty
>>>>>>>        Confirmable message) is also useful as an inexpensive check
>>>>>>>        of the liveness of an endpoint ("CoAP ping").
>>>>>>> ==
>>>>>>>
>>>>>>> How the liveness is assessed is left to applications. So, there
>>>>>>> is ** no layer violation **.
>>>>>>>
>>>>>>> (2) What we need is not (text from Mirja):
>>>>>>>
>>>>>>>> to simply send messages frequently in an unreliable fashion
>>>>>>>> but a low rate until the attack is over
>>>>>>>
>>>>>>> It is actually the other way around. The spec says:
>>>>>>>
>>>>>>>    "... This is particularly useful for DOTS servers that might
>>>>>>>     want to reduce heartbeat frequency or cease heartbeat
>>>>>>>     exchanges when an active DOTS client has not requested
>>>>>>>     mitigation."
>>>>>>>
>>>>>>> What we want can be formalized as:
>>>>>>>   - Taking into account DDoS traffic conditions, a check to
>>>>>>> assess the liveness of the peer DOTS agent + maintain NAT/FW
>>>>>>> state on on-path devices.
>>>>>>>
>>>>>>> A much more elaborate version is documented in SIG-004 of RFC 8612.
>>>>>>>
>>>>>>> * My analysis:
>>>>>>>
>>>>>>> - The intended functionality is naturally provided by existing CoAP
>>>>>>> messages.
>>>>>>> - Informed WG decision: The WG spent a lot of cycles when specifying
>>>>>>> the current behavior to meet the requirements set in RFC8612.
>>>>>>> - Why not an alternative design: We can always define messages
>>>>>>> with duplicated functionality, but that is not a good design
>>>>>>> approach especially when there is no evident benefit.
>>>>>>> - The specification is not broken: it was implemented and tested.
>>>>>>>
>>>>>>> And a logistical comment: this issue fits IMHO under the
>>>>>>> non-discuss criteria in
>>>>>>> https://www.ietf.org/blog/discuss-criteria-iesg-review/#stand-undisc.
>>>>>>>
>>>>>>> * What's Next?
>>>>>>>
>>>>>>> As an editor, I don't think a change is needed but I'd like to
>>>>>>> hear from Ben, chairs, and the WG.
>>>>>>>
>>>>>>> Please share your thoughts and whether you agree/disagree with
>>>>>>> the above analysis.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Med
>>>>> _______________________________________________
>>>>> Dots mailing list
>>>>> Dots@ietf.org
>>>>> https://www.ietf.org/mailman/listinfo/dots