Re: [Dots] [core] WG Last Call on draft-ietf-core-new-block

Christian Amsüss <christian@amsuess.com> Tue, 09 March 2021 16:32 UTC

Date: Tue, 9 Mar 2021 17:31:51 +0100
From: Christian Amsüss <christian@amsuess.com>
To: supjps-ietf@jpshallow.com
Cc: draft-ietf-core-new-block@ietf.org, dots@ietf.org, core@ietf.org
Message-ID: <YEei91E6YoP+wXiI@hephaistos.amsuess.com>
References: <022401d6e440$06763ba0$1362b2e0$@jpshallow.com> <YCxikyadpukaiK5I@hephaistos.amsuess.com> <004601d705f8$acbec250$063c46f0$@jpshallow.com>
In-Reply-To: <004601d705f8$acbec250$063c46f0$@jpshallow.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dots/qIjrIWCyH1vI0FlyugadrGF8mNM>
Subject: Re: [Dots] [core] WG Last Call on draft-ietf-core-new-block

Hello Jon,

I've finally managed to look at the rest of the mail, only answering
where I see value in it (considering the rest done):

> > * CSM option: "There is little, if any, benefit of using these options
> >   with [reliable transports]" ... and reliable transports are the only
> >   thing CSMs are defined for. So why define a CSM if it's not expected
> >   to be useful?
> 
> [Jon] This is here for completeness.  DOTS tries to use UDP, but falls back
> to using TCP if unable to use UDP. 

This document's task is to solve a particular problem. I don't see how
the document becomes better by covering parts just for orthogonality's
sake when there is, to quote, "little, if any, benefit" to it.

There's this saying about things being good when there's nothing to
remove. Keeping provisions for reliable transports in puts needless work
on the reviewers and just slows down this document's progress (if only
because questions like this come up again and again).

Unless, of course, both a) the intention is for the DOTS use case to go
from the current state (AIUI: Block as baseline, with QBlock as an
optimization) to QBlock as mandatory-to-implement *and* b) the bodies
exceed the required-by-the-application Max-Message-Size -- then keeping
reliable transports would make sense.

> > * The 2.31 Continue rules: Why not send it with CON? A 2.31 Continue is
> >   just as compact as an ACK, and can convey to the client that all
> >   packages up to that point have been received, so it doesn't need to
> >   retransmit the earlier ones even though the corresponding ACK may have
> >   gotten lost.
> 
> [Jon] For DOTS, we have to have the entire conversation for the transmission
> of the telemetry data as NON, as there could be a flooded pipe in one
> direction because of a DDoS attack.

Same topic here: Please make a choice between "we specify what we need"
and "we specify the general case".

We can go through how to efficiently use this with CON (both in this
concrete point and in the earlier discussion on mixing NON and CON in ways
that don't apply to DOTS but would be useful). My impression though is
that this would require more effort and time than we have drive for in
this document -- which is fine because it's special purpose. If that's
not done, my opinion is that this document should focus on NON-only
exchanges, and leave its use with confirmable messages and transports
out of scope.

A full blockwise-bis will need to do this, and will profit greatly from
the discussions we've been having here.

> > * The new QB2 M-bit rules depend on MAX_PAYLOADS to be agreed. But that
> >   agreement is SHOULD only, and even that's already stretching what I'd
> >   expect of a configurable CoAP parameter.
> 
> [Jon] Having the flexibility and ability to configure the actual value and
> mutually agree on the new value could be beneficial to DOTS (clients on low
> bandwidth networks may want to tune it down as suggested as per:-
> 
> "If the CoAP peer reports at least one payload has not arrived for each body
> for at least a 24 hour period and it is known that there are no other
> network issues over that period, then the value of MAX_PAYLOADS can be
> reduced by 1 at a time (to a minimum of 1) and the situation re-evaluated
> for another 24 hour period until there is no report of missing payloads
> under normal operating conditions. The newly derived value for MAX_PAYLOADS
> should be used for both ends of this particular CoAP peer link. Note that
> the CoAP peer will not know about the MAX_PAYLOADS change until it is
> reconfigured. As a consequence of not being reconfigured, the peer may
> indicate that there are some missing payloads prior to the actual payload
> being transmitted as all of its MAX_PAYLOADS payloads have not arrived."

A bit of clarification may help here: Is the expectation that the peers
report their stats to some management that then evaluates it and
reconfigures the peers? May they do the stats on their own, unilaterally
change MAX_PAYLOADS and then tell their peer?

Suggested rough wording if that fits your answer: "Endpoints whose
MAX_PAYLOADS are configurable can report their per-peer stats back to
the source of their configuration. If between two endpoints at least one
payload has not [...]. The newly derived value for MAX_PAYLOADS is then
configured for both endpoints."
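
As an illustration of the management-driven reading, here is a minimal
Python sketch of the quoted reduction rule. It is not taken from the
draft: the function name `next_max_payloads`, its parameters, and the
starting value of 10 are assumptions made for the example. It assumes
some management entity evaluates one completed 24-hour window per call
and then configures both endpoints with the result.

```python
def next_max_payloads(current: int, missing_reported: bool,
                      other_network_issues: bool) -> int:
    """Return the MAX_PAYLOADS value to configure for the next 24-hour period.

    Hypothetical helper: evaluates one completed 24-hour observation
    window for a single pair of peers.
    """
    if current < 1:
        raise ValueError("MAX_PAYLOADS must be at least 1")
    # Only reduce when payloads went missing and nothing else explains the loss.
    if missing_reported and not other_network_issues:
        return max(1, current - 1)
    return current


# Example: start at 10 (assumed), three lossy days, then a clean one.
value = 10
for lossy in (True, True, True, False):
    value = next_max_payloads(value, missing_reported=lossy,
                              other_network_issues=False)
print(value)
```

The value only ever moves down by one step per window and never below 1,
which matches the quoted text; whether it may ever be raised again is
not covered by the quote and is left out here.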

> >   It's also rather complicated. How is this better than a simple "M=1
> >   means this block plus as many as you can comfortably send, where M=0
> >   is for this one only"?
> 
> [Jon] We still need to have a concept of 'Continue' to reduce any
> NON_TIMEOUT delays for every MAX_PAYLOADS for handling congestion control.
> 'Continue' is used in multiple places in the draft.

I would have figured that something local can be used here (like "have a
short timeout that's extended and readjusted as new responses come in"),
but with the above management, what's here should be fine. (If CONs were
allowed, there'd be better options that don't introduce the delays --
but see above).
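
For reference, one possible shape of such a local timeout, sketched in
Python. This is only an illustration of the idea in the parenthesis
above, not something the draft specifies: the class name
`AdaptiveTimeout`, the smoothing approach, and all constants are
assumptions.

```python
class AdaptiveTimeout:
    """Timeout that follows the observed gap between incoming responses."""

    def __init__(self, initial: float = 0.5, factor: float = 2.0,
                 alpha: float = 0.25) -> None:
        self.timeout = initial      # seconds to wait for the next response
        self.factor = factor        # safety margin over the smoothed gap
        self.alpha = alpha          # weight given to the newest gap
        self.smoothed_gap = None    # no gap observed yet
        self.last_arrival = None

    def on_response(self, now: float) -> float:
        """Record a response arriving at time `now`; return the new timeout."""
        if self.last_arrival is not None:
            gap = now - self.last_arrival
            if self.smoothed_gap is None:
                self.smoothed_gap = gap
            else:
                # Exponentially weighted moving average of inter-arrival gaps.
                self.smoothed_gap += self.alpha * (gap - self.smoothed_gap)
            self.timeout = self.factor * self.smoothed_gap
        self.last_arrival = now
        return self.timeout
```

The first response leaves the initial timeout in place; every later one
readjusts it to a multiple of the smoothed inter-arrival gap.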

> > * "is meant to prevent amplification attacks": Could you elaborate? A
> >   client permitted the use of QB2 has already some leverage on the server
> >   to start attacks, and would in any case (overlaps or not) only be
> >   permitted MAX_PAYLOADS on a single request by the server, no matter
> >   how it requests more than that.
> 
> [Jon] A single request, containing multiple Q-Block2 with M set and the same
> NUM (0 meaning all blocks is a really bad case) would otherwise (no overlap
> protection) cause MAX_PAYLOADS to be sent, NON_TIMEOUT pause, next set of
> MAX_PAYLOADS, NON_TIMEOUT pause etc. for quite some time.  Request packet of
> 1k+, each Q-Block2 being 2 bytes gives 500+ requests for lots of packets.
> Yes, there are NON_TIMEOUT gaps giving respite against a single request, but
> multiple requests would average things out.

I understand this to imply that if duplicates were allowed to be
requested the server would actually send them multiple times, allowing the
client to create a virtual larger resource it can "pull out" with a
single request as compared to regular requests that "pull out" the
resource at most once. Wouldn't have occurred to me as an implementer
(I'd have only sent the union of the requested sets) -- but fair enough,
thanks.
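
The "union of the requested sets" approach can be sketched as follows.
This is a simplified Python illustration, not the draft's procedure: the
function name `blocks_to_send` is made up, each requested option is
reduced to a plain block number (the M bit is ignored), and the
MAX_PAYLOADS value of 10 is an assumption.

```python
from typing import List


def blocks_to_send(requested_nums: List[int], max_payloads: int) -> List[int]:
    """Collapse duplicate block numbers and cap the burst at max_payloads."""
    unique = sorted(set(requested_nums))
    return unique[:max_payloads]


# A request that names block 0 five hundred times yields a single block.
print(blocks_to_send([0] * 500, max_payloads=10))
# Overlapping requests are served once each, in block order.
print(blocks_to_send([3, 1, 3, 2, 1], max_payloads=10))
```

With this handling, a single request can pull out each block of the
resource at most once, regardless of how many times it is named.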

> > * "then the body response SHOULD be restarted with a different ETag
> >   Option value": That behavior sounds like a recipe for endless running
> >   requests when CON is involved (which, granted, are under flow control,
> >   but still don't do anything useful). Given there is also a
> >   recommendation to keep the being-transmitted version alive, why not
> >   just stop the transmission?  Or send just one final block of the new
> >   value -- and then it's up to the client to decide whether that's a
> >   representation it already knows (and just got a freshness statement
> >   for) or needs to get it as a whole again?
> 
> [Jon] My implementation maintains the cached body as per previous
> RECOMMENDED statement which keeps things simple.  I was then trying to cover
> the non-cached copy case.  I agree with your concern.
> 
> [Jon] From the client's perspective, ETag is opaque with no way of knowing a
> newly seen ETag is earlier or later.  If a NON (old) ETag is out of sequence
> on arrival or there is a CON retransmit with an (old) ETag the client has to
> make a decision on seeing the (old) ETag (which may be for a single packet
> that holds the whole body and hence is not in the client's history).  Does
> the current receipt of multiple payloads get aborted or ... when the (old)
> ETag is seen?

Usually there's the sequence of requests (and thus tokens) by which the
client can tell older from newer; in sending Q-Block that's not given,
thus see below.

> [Jon] The new data length may be less than the previous payload, and so the
> currently being transmitted block is beyond the size of the new data.
> 
> [Jon] The (needs terminating) response can be for something other than a
> GET, making it a bit more problematic for the client to continue -
> especially if there is non-idempotent usage.
> 
> [Jon] Two ways forward here
> 1. Make previous statement a MUST and delete this statement Or 2.
> OLD
> If the server detects part way through a body transfer that the resource
> data has changed and the server is not maintaining a cached copy of the old
> data, then the body response SHOULD be restarted with a different ETag
> Option value. Any subsequent missing block requests MUST be responded to
> using the latest ETag Option value.
> NEW
> If the server detects part way through a body transfer that the resource
> data has changed and the server is not maintaining a cached copy of the old
> data, then the transmission is terminated.  Any subsequent missing block
> requests MUST be responded to using the latest ETag and Size2 Option values
> with the updated data.
> 
> [Jon] Preferences ?

Both (MUST keep around as well as terminating the transmission) are
fine. I'd be leaning towards the server just stopping if (or when) it
doesn't have the cached version around any more, as that mitigates
Slowloris-style DoS attacks. But as said, either is fine -- they
both ensure that no different ETags come on the same token unless
disambiguated by an incrementing Observe option.
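
A minimal Python sketch of the server-side decision, covering both
options under discussion. The `Transfer` record and the
`on_resource_changed` function are hypothetical names used only for
illustration; a real server would hold considerably more state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transfer:
    """State of one in-progress body transfer on a single token."""
    etag: bytes
    cached_body: Optional[bytes]   # None if no copy of the old data is kept
    terminated: bool = False


def on_resource_changed(transfer: Transfer) -> None:
    """React to the resource changing while a body is still being sent."""
    if transfer.cached_body is None:
        # No old representation left: stop instead of switching ETags
        # on the same token.
        transfer.terminated = True
    # Otherwise keep serving the cached body under the original ETag.


with_cache = Transfer(etag=b"v1", cached_body=b"old data")
without_cache = Transfer(etag=b"v1", cached_body=None)
on_resource_changed(with_cache)
on_resource_changed(without_cache)
print(with_cache.terminated, without_cache.terminated)
```

In both branches the ETag seen on the token never changes mid-transfer,
which is the property that matters here.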

> > * "When the next client completes building the body, any existing
> >   partial body transmission to the CoAP server is terminated": Just a
> >   note, you're already using Request-Tag, so you could leave it up to
> >   the proxy to try to run them simultaneously (it obviously can, as it
> >   already got to the point of having both request bodies loaded in
> >   full). The server can then still cancel one or postpone the other.
> 
> [Jon] If the proxy uses multi-plexing client logic to talk to a singular
> server with a common 'CoAP session', it has to determine which body entity
> to terminate - and may choose the wrong one based on timing, whereas it is
> simpler to get the proxy to make the correct decision.

Why does it have to terminate either? It can simultaneously receive two
requests, and then simultaneously relay them.


Best regards
Christian

PS. I'm also around the IETF gather venue for most of the meeting time,
and now have a fresh memory of the state of affairs, so especially for
the last point it may help to do a faster back-and-forth there (although
I should be reasonably responsive with mail now too).

-- 
To use raw power is to make yourself infinitely vulnerable to greater powers.
  -- Bene Gesserit axiom