Re: [TLS] Transport Issues in DTLS 1.3

Mark Allman <mallman@icsi.berkeley.edu> Thu, 25 March 2021 17:30 UTC

From: Mark Allman <mallman@icsi.berkeley.edu>
To: Martin Duke <martin.h.duke@gmail.com>
Cc: draft-ietf-tls-dtls13.all@ietf.org, Lars Eggert <lars@eggert.org>, Gorry Fairhurst <gorry@erg.abdn.ac.uk>, tls@ietf.org
Date: Thu, 25 Mar 2021 13:29:52 -0400
X-Mailer: MailMate (1.13.2r5673)
Message-ID: <A2B835CB-7D03-435F-AD06-9ED16E2661CF@icsi.berkeley.edu>
In-Reply-To: <CAM4esxR3YPoWaxU9B--oaT9r2bh_QBNH=tt0FsiUKaAT=M6_fg@mail.gmail.com>
References: <CAM4esxR3YPoWaxU9B--oaT9r2bh_QBNH=tt0FsiUKaAT=M6_fg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=_MailMate_96179E7B-C484-4155-85FB-7BA6D2A6D613_="; micalg="pgp-sha1"; protocol="application/pgp-signature"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tls/3fqAgj2JJyzJ-mcvSiUcS7iyv4g>
X-Mailman-Approved-At: Thu, 01 Apr 2021 10:23:14 -0700
Subject: Re: [TLS] Transport Issues in DTLS 1.3

Folks-

I am appending what I sent Martin & Gorry this morning.  I looked
quite quickly as Martin was looking for quick input.  I am happy to
iterate if things aren't all that understandable.

allman


[Quick hit.]

I agree with Martin's DISCUSS and Gorry's notes.  A couple more
things here ...

> This spec looks a mess!

A generous reading is that this is most of the problem.  I think
maybe there is some intent in here that isn't stated very well.  It
needs to be explicit and less sloppy.

A few specific things (in addition to what Gorry said, which I
absolutely agree with):

  - "Though timer values are the choice of the implementation,
    mishandling of the timer can lead to serious congestion
    problems"

    + Gorry flagged this and I am flagging it again.  If this is
      something that can lead to serious problems, let's not just
      leave it to "choice of the implementation".  Especially if we
      have some idea how to make it less problematic.

  - "Implementations SHOULD use an initial timer value of 100 msec
    (the minimum defined in RFC 6298 [RFC6298])"

    + I wrote RFC 6298 and I have no idea where this is coming from!

    + Even if this value of 100msec is OK for DTLS it shouldn't lean
      on RFC 6298 because RFC 6298 doesn't say that is OK.  I.e.,
      the parenthetical is objectively wrong.

    + RFC 6298 says the INITIAL RTO should be 1sec (point (2.1) in
      section 2).  RFC 8961 affirms this: it also says the INITIAL
      RTO should be 1sec (requirement (1) in section 4).

  - "Note that a 100 msec timer is recommended rather than the
    3-second RFC 6298 default in order to improve latency for
    time-sensitive applications."

    + Again, this mis-states RFC 6298, which says the initial RTO is
      1sec (not 3sec).  (Previous to RFC 6298 the initial RTO was
      3sec, which is probably where the notion comes from.  Most of
      the purpose of RFC 6298 was to drop the initial RTO to 1sec.)

    + This is a statement of desire, not any sort of principled
      justification for using 100msec.  At the least this should be
      much better argued.

    + To me 100msec feels much too close to the RTT of some network
      paths to be appropriate here.  To be clear, deviations from
      RFC 8961 that gather consensus are fine, but you should say
      why that deviation is OK.  And, to my mind, the further you
      deviate the more you need to say.  I.e., dropping from 1sec
      to 900msec may not be that big of an issue.  But, dropping to
      1/10-th of the guideline and to something pretty close to
      RTTs that are not rare should require some care and some
      discussion, IMO.

    + And, I am not trying to be a picky protocol lawyer and say
      this document "didn't check the RFC 8085 / RFC 8961 box".
      Rather, RFC 8085 & 8961 say things for a reason and I don't
      think we should implicitly ignore them because they come from
      experience on how to do these sorts of things.

  - "The retransmit timer expires: the implementation transitions to
    the SENDING state, where it retransmits the flight, resets the
    retransmit timer, and returns to the WAITING state."

    + Maybe this is spec sloppiness, but boy does it sound like the
      recipe TCP used before VJCC to collapse the network.  I.e.,
      expire and retransmit the window.  Rinse and repeat.  It may
      be that the intention is for backoff to be involved.  But,
      that isn't what it says.

  - "When they have received part of a flight and do not immediately
    receive the rest of the flight (which may be in the same UDP
    datagram). A reasonable approach here is to set a timer for 1/4 the
    current retransmit timer value when the first record in the flight
    is received and then send an ACK when that timer expires."

    + Where does 1/4 come from?  Why is it "reasonable"?  This just
      feels like a complete WAG that was pulled out of the air.
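
To make the timer points above concrete, here is a minimal Python sketch
of an RFC 6298-style retransmission timer.  It is an illustration only,
not text from the DTLS draft or from either RFC.  The names
(RetransmitTimer, timeouts_before_reply) are made up for this example,
the clock-granularity term G from RFC 6298 is omitted, and the helper
assumes a loss-free path where the reply arrives exactly one RTT after
the first transmission.

```python
# Sketch of an RFC 6298-style retransmission timer.
# All names here are illustrative, not taken from any RFC or from the draft.

INITIAL_RTO = 1.0  # seconds; RFC 6298 rule (2.1), RFC 8961 requirement (1)
MIN_RTO = 1.0      # seconds; RFC 6298 rule (2.4) rounds a computed RTO up to 1 sec
MAX_RTO = 60.0     # seconds; RFC 6298 rule (2.5) allows a cap of at least 60 sec
ALPHA = 1 / 8      # SRTT gain
BETA = 1 / 4       # RTTVAR gain
K = 4              # variance multiplier


class RetransmitTimer:
    def __init__(self, initial_rto=INITIAL_RTO, min_rto=MIN_RTO):
        self.srtt = None      # smoothed RTT, unknown until the first sample
        self.rttvar = None    # RTT variation estimate
        self.min_rto = min_rto
        self.rto = initial_rto

    def on_rtt_sample(self, r):
        """Update the estimator with an RTT measurement r (in seconds)."""
        if self.srtt is None:
            # First measurement, rule (2.2).
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Subsequent measurements, rule (2.3): RTTVAR is updated first,
            # using the old SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        # Simplification: the clock granularity term G is left out.
        rto = self.srtt + K * self.rttvar
        self.rto = min(MAX_RTO, max(self.min_rto, rto))

    def on_timeout(self):
        """Back off the timer on expiry, rule (5.5)."""
        self.rto = min(MAX_RTO, self.rto * 2)


def timeouts_before_reply(initial_rto, path_rtt):
    """Count timer expirations (each one a retransmission) that fire before
    a reply arrives path_rtt seconds after the first send, with no loss."""
    timer = RetransmitTimer(initial_rto=initial_rto)
    elapsed = 0.0
    expirations = 0
    while elapsed + timer.rto < path_rtt:
        elapsed += timer.rto
        timer.on_timeout()
        expirations += 1
    return expirations
```

With these assumptions, timeouts_before_reply(0.1, 0.25) returns 1: on a
path with a 250msec RTT, a 100msec initial timer retransmits the flight
once even though nothing was lost.  timeouts_before_reply(1.0, 0.25)
returns 0.  The on_timeout() doubling is the backoff step that the
quoted state-machine text does not mention.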

And, +1 on all the flight size stuff Martin mentioned.

allman