Re: [Tsv-art] Tsvart early review of draft-ietf-ipsecme-iptfs-03

Joseph Touch <> Fri, 04 December 2020 16:42 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id 4019E3A0E07; Fri, 4 Dec 2020 08:42:49 -0800 (PST)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -1.318
X-Spam-Status: No, score=-1.318 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=0.001, SPF_HELO_NONE=0.001, SPF_NEUTRAL=0.779, URIBL_BLOCKED=0.001] autolearn=no autolearn_force=no
Authentication-Results: (amavisd-new); dkim=pass (2048-bit key)
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id w-wMynYG1ahQ; Fri, 4 Dec 2020 08:42:46 -0800 (PST)
Received: from ( []) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by (Postfix) with ESMTPS id 5DB2E3A0DFA; Fri, 4 Dec 2020 08:42:46 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;; s=default; h=To:References:Message-Id:Cc:Date:In-Reply-To: From:Subject:Mime-Version:Content-Type:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id: List-Help:List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive; bh=3tiLLRfadGXBvz/zVGj83kNY2hb6V4wQRrBs2yRFCLE=; b=Q2d9w9RYRCQfNymce8/ZmFzio /EPAIRPHi1lt4KQdKQwnvcnHB5qCrSIExYqx21p8jN3dLmFuHHOH/87Xd2h+pcGlPeZhRwbxxPOfj 1/d93jaXvtN1C0AiG0du1tFF3ZGhsr3rhPgi+/n0RQYVIrBeMoRFiXxrzx0qUn+eHmRQUsSq1IN+r VLB6ZHmrXUW1EDxuKK74RO26ywh2rHJkcT0KlD2Eg+OGwUi+BkOux7aCuvmIB4O4XEblgAKIB0ITK q3QXdq1JY55k+2EOeLfnR0bOElq0tdyciTG6ZON19ptNWKbNKKHPTElMlvcTodHc46frsdPL6HY8e Z2iaEVtfQ==;
Received: from ([]:57535 helo=[]) by with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.93) (envelope-from <>) id 1klEAK-000ryS-2R; Fri, 04 Dec 2020 11:42:45 -0500
Content-Type: multipart/alternative; boundary="Apple-Mail=_8DAA3C60-7DBA-4A6C-85BC-B2FBED6C9E8A"
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.\))
From: Joseph Touch <>
In-Reply-To: <>
Date: Fri, 4 Dec 2020 08:42:38 -0800
Cc:, tsv-art <>,
Message-Id: <>
References: <> <>
To: Christian Hopps <>
X-Mailer: Apple Mail (2.3654.
X-OutGoing-Spam-Status: No, score=-0.5
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname -
X-AntiAbuse: Original Domain -
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain -
X-Get-Message-Sender-Via: authenticated_id:
X-From-Rewrite: unmodified, already matched
Archived-At: <>
Subject: Re: [Tsv-art] Tsvart early review of draft-ietf-ipsecme-iptfs-03
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Transport Area Review Team <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Fri, 04 Dec 2020 16:42:49 -0000

Hi, Christian,

> On Dec 4, 2020, at 12:14 AM, Christian Hopps <> wrote:
> Hi Joseph,
> First, thanks for your review. This is an initial reply to clarify a couple points [inline].
>> On Dec 4, 2020, at 1:26 AM, Joseph Touch via Datatracker <> wrote:
>> Reviewer: Joseph Touch
>> Review result: Not Ready
>> This documents most significant transport issue is the use of path MTU
>> discovery. Sec 2 recommends the use of path MTU discovery [RFC1191] [RFC8201],
>> but this is known to be problematic black holing where ICMPs are blocked or
>> filtered (which should be noted everywhere PMTU is suggested, including Sec
>> 4.2). It would be more appropriate to incorporate PLPMTUD [RFC4821] [RFC8899],
>> especially in conjunction with congestion control, as both assume bidirectional
>> stateful ingress/egress coordination. This is particularly important given the
>> sending side’s prerogative to change the size of the tunnel EMTU_S (see below).
> Thanks, I will look into adding text on this. An implementation can use receipt of the time echo in the CC header as an ACK of receipt of a tunnel packet that included that time val so probing using this method could be suggested as well. For non-CC the user is expected to be in control of the path domain (otherwise it wouldn't be able to correctly set the fixed send rate) so no PMTU (or simple PMTU if they really wish to use it -- unlikely) should suffice.

Agreed. It would just be useful to not imply “just do PMTU” because we know it isn’t a solution.

>> The most significant concern is not at the transport layer – it is the
>> assumption of in-order delivery and insufficient consideration of the impact of
>> loss at the network layer. Loss could orphan received fragments – for which a
>> timeout should be recommended. Loss or reordering could generate faulty
>> reassembly at the egress – which is actually very likely here, e.g., when a
>> short packet is followed by a sequence of packets that are exactly the tunnel
>> MAP (as defined in draft-ietf-intarea-tunnels). As noted below, persistent
>> fragmentation could make this situation worse, esp. in the presence of frequent
>> reordering or loss.
> In-order deliver is not assumed. Correct reordering is required, and is done on ingress by IPTFS through use of the ESP sequence number. IPsec/ESP includes a sequence number and a window for anti-replay protection. This also provides for loss detection (as the sliding window moves) allowing for reaping of orphaned fragments.
> Perhaps we should add a bit of text highlight this though?

That would help. I checked and reordering doesn’t appear to be mentioned at all.

It will be important to also include limits on reordering - how many packets to hold back before allowing a drop in the sequence and what to do if a packet shows up late. I.e., this is a bit more complex than just saying “reordering is required”.

> (FWIW my test suite for our implementation of IPTFS includes a fairly comprehensive set of tests of out-of-order delivery including drops. :)
>> Other significant issues, largely at the network/tunnel layer:
>> There is no clear utility in having the blockoffset point past the end of the
>> current packet. It serves – at best – as only a partially useful check on the
>> next packet. I.e., if the two blockoffsets disagree, presumably a packet is
>> lost – but if they agree, it cannot be concluded that a packet is not lost. It
>> is sufficient that it points to the end of the tunnel packet.
> No harm either though, right? Having implemented this, I can tell you that it does help detect bugs in ones code while in development. :)

It’s useful as a check, but it’s important to explain what the check means.

There are four cases (BO = blockoffsets, seq = ESP sequence number)

                              ESP - no gap         ESP - gap
BO - aligned          OK                         BO-FP-err
BO - misaligned    SN-FP-err              Typ-err

Typ-err is the typical error when a packet is lost, because that should result in both an ESP seqno gap and blockoffset misalignment.

SN-FP-err is when there’s no ESP seq no gap but the blockoffsets are misaligned. This should only ever happen if the seqno rolls around and you missed it, which you should have some other mechanism to prevent (i.e., drop old packets when they’re half the sequence space old - which is easy to predict because you know the tunnel packet generation rate).

BO-FP-err is when there’s an ESP seq no gap but the blockoffsets are aligned. This is just a false positive (thus “FP” in my notation). S

In any of the error cases, you do not reassemble.

So checking blockoffset alignment only helps you know whether your sequence number check and timeout discard is working correctly.

Protocol implementation correctness should be checked by test vectors, not during normal code operation IMO. At best, it’s unnecessary complexity. At worst, it gives you a false sense of wanting to reassemble when you shouldn’t.

>> Although the mechanism allows for padding, it appears too aggressive in trying
>> to be work-conserving. Consider a tunnel that could support 1000B payloads; if
>> a stream of packets comes in as 200B, 1000B, 1000B, 1000B, etc. (more 1000B
>> packets), EVERY 1000B would be fragmented, which means loss of a single tunnel
>> packet would cause two inner packets to be incomplete. The protocol should
>> allow padding in some cases to avoid fragmentation, e.g., every 100th packet it
>> might allow insertion of padding rather splitting a packet across the stream.
>> The heuristic shown here is just an example.
> This is allowed by the protocol. There is no restriction specified on when an implementation can insert padding in the tunnel packet stream. In fact unless the tunnel is being 100% utilized the expectation is that it will be sending extra padding often -- that being one key to the traffic flow security.
> Your suggestion I think is that it could be beneficial to consider reducing the available bandwidth by forcing padding to align packets even if in the presence of an offered load >= 100% of the tunnel capacity .

Not just then, but also when offered load is <100% too. Even in those cases, it’s possible that it might be better to add padding than to try to pack. Think of it this way -
- if the remaining space in a tunnel packet is small
- if the next packet would fit in the tunnel without splitting it up if it started the next packet
- if the amount of wasted space doesn’t make the ingress incoming packet queue backup too much
- pad out the current tunnel packet and start a new one

I.e., this can all be based on a queue on the ingress. When the queue backs up, push towards work-conserving. Otherwise, push for split-reduction.

The depth of the queue should be configurable - and a property of the tunnel, because it adds delay. 

> Again, this is certainly allowed, but when and if to do this seems to be a great topic for research, and would be outside the scope of this initial protocol specification I think.
> Do you think we should add text explicitly mentioning that one could do this if one wanted to though?

I’d suggest mentioning that it’s allowed, describing a possible way to do it, and explaining its cost (esp. the added delay). It may be “research” to get it right, but having something like that as part of the system may be important to having it work over lossy paths.

>> Fig 1 shows the blocksize as always occurring in the latter half of a word; is
>> this always the case? If so, it would be useful to indicate the left side of
>> that word as “ESP – con’t”. If not, other examples should be provided.
> The "..." is just omitting the details of other IPTFS fields which are normatively specified in section 6. The intent was to give an overview of what is being discussed in the text below, but not normatively define the header format.

That’s not clear. You should put something in that area to say “IPFTS Sec 6 HDR” or something.

>> At the end of Sec 2.2, the blockoffset helps recover the next inner packet, but
>> the term “full” in that sentence implies the inner packet is entirely inside
>> that outer packet (which it may not be).
> will remove "full".

Thanks - that’s what I was thinking would suffice.

>> The  option of congestion control and the claim of “unidirectional” should be
>> discussed more consistently (i.e., mention the need for bidirectional channels
>> when mentioning thec claim of unidirectionality).
> Will do.
>> Like all tunnels, this approach needs to more clearly indicate a number of
>> different MTU values and how they interact [draft-ietf-intarea-tunnels], both
>> for the underlying network and the tunnel provided: -       EMTU_S -
>> EMTU_R -       Path MTU -       Link MTU
>> It may also be useful to use the terms developed in that doc, e.g., tunnel link
>> packet (the packet that traverses a tunnel) as well as inner vs. outer
>> fragmentation, tunnel maximum atomic packet, etc.
>> There are a number of other tunnel considerations that should be addressed,
>> again as discussed in draft-ietf-intarea-tunnels. These include: -       The
>> impact of tunnel traversal on the inner hopcount/TTL (there should be none, as
>> per that doc; hopcount should be adjusted by routers, not tunnels) -
>> Impact of errors in the tunnel on the ingress -       Specification of the
>> EMTU_R of the tunnel itself, and how that is determined -       What the
>> ingress should do if an incoming packet exceeds EMTU_R -       It should be
>> noted that this is a unicast tunnel -       What you expect if there are
>> multiple paths between the tunnel ingress and egress -       Does the tunnel
>> itself have a flow ID? (if the outer packet is IPv6) If so, is it fixed or
>> intended to vary arbitrarily (and if so, how)? -       If the outer packet is
>> IPv4, do you expect to use DF=1? If not, how are you handling ID issues in RFC
>> 6864? If so, it might be useful to add a reminder that the ID can be anything
>> (including constant, which may be useful in avoiding a covert channel).
> Will go back through the document keeping this in mind.

I’ll issue an update to that doc shortly. It’s on my list to do a deep pass to get that one out the door - it’s been a long time coming, but it’s already impacted the terminology of a number of other docs that have been published.

FWIW, I’d be glad to provide feedback on an update, esp. on that last issue.