[pmtud] pmtud ietf57 vienna minutes, version 0
Matthew J Zekauskas <matt@internet2.edu> Thu, 21 August 2003 22:07 UTC
From: Matthew J Zekauskas <matt@internet2.edu>
To: pmtud@ietf.org
Cc: Matt Zekauskas <matt@internet2.edu>, Matt Mathis <mathis@psc.edu>
Message-ID: <27046290.1061489137@localhost>
X-Mailer: Mulberry/2.2.1 (Win32)
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
X-Virus-Scanned: by mail.internet2.edu virus scanner
Subject: [pmtud] pmtud ietf57 vienna minutes, version 0
Sender: pmtud-admin@ietf.org
Errors-To: pmtud-admin@ietf.org
X-BeenThere: pmtud@ietf.org
X-Mailman-Version: 2.0.12
Precedence: bulk
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/pmtud>, <mailto:pmtud-request@ietf.org?subject=unsubscribe>
List-Id: Path Maximum Transmission Unit Discovery <pmtud.ietf.org>
List-Post: <mailto:pmtud@ietf.org>
List-Help: <mailto:pmtud-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/pmtud>, <mailto:pmtud-request@ietf.org?subject=subscribe>
List-Archive: <https://www1.ietf.org/mail-archive/working-groups/pmtud/>
Date: Thu, 21 Aug 2003 18:05:37 -0400
Due to a misunderstanding on my part, these are a little late... Let us know if you have comments (stuff with XXX, in particular, indicates areas where none of the note-takers captured the conversation). (I expect these were actually due last week, although I haven't had any of the usual reminder mail from the proceedings editor; reading sooner rather than later would be helpful, however.)

Matt M's original slides are here:
http://www.psc.edu/~mathis/papers/pmtud200307/

A hacked version of the pdf file to try and get a clearer xplot slide is here:
<http://people.internet2.edu/~matt/pmtud/meetings/ietf57/pmtud200307-mz.pdf>

==================================================================
Path Maximum Transmission Unit Discovery Pre-WG (pmtud)
Thursday, 17-July-2003 from 13:00 to 15:00
=======================================================

The meeting was chaired by Matt Mathis and Matt Zekauskas. Al Morton, Itojun Hagino, and Matt Z. took notes, which were assembled into these minutes by the chairs.

Agenda
------
* Preliminaries: Blue sheets, Note takers, etc
* WG Status
* Short history and work to date
* Robustness Issues
* Other Stakeholders
* Plans

Matt Mathis led off the meeting presenting the new co-chair, the agenda, the changes to the proposed charter, and the aggressive milestones. The group status is that some parts of the administrative preparation did not get done, but IESG has approved, hence "Pre WG". This will be a fast development; silence will be taken as acceptance (at the start, as sections are integrated). PMTUD is a re-activation of the path MTU WG, which was a very similar effort. To participate, you must subscribe to the IETF list (pmtud@ietf.org). Matt Zekauskas volunteered to co-chair the group. The charter was broadened so as not to restrict it to a single method. Milestones are aggressive, and the need for an implementation to test is clear. No one disagreed, nor were there suggestions for other methods to study.
Matt Mathis will make another editorial pass at the document (which was a rewrite, starting from RFC 1981, instead of an update to the previous BOF input). Sections will be added based on mailing list comments and any input from stakeholder communities.

Matt Mathis sketched the previous algorithm and noted some of the problems. He then sketched the new algorithm, noting that there is just a small amount of MUST/SHOULD language: namely, under what circumstances losses can be ignored as a congestion signal. The rest is heuristics; it doesn't need to be the same for every application, and permits vendor diversity.

Question: when can the algorithm be used by TCP? Just after the 3-way handshake, or before real communication? Matt responded that it uses live payload data, and the draft has a recommendation not to attempt the algorithm unless the congestion window is at least twenty packets, so the connection is well established before the algorithm starts. Thus, this could slow down tiny files -- the exact algorithm is a heuristic, so you could choose to perform it differently. There are tradeoffs.

The request for collaborators in the IPsec & security area led to a big discussion on tunnel issues. People were positive about the method, but there are corner cases to consider. Input has been promised to the mailing list. (In the IPv4 world, lack of PMTUD is noted as a major problem with IPsec VPNs and providing services.) This started around slide 11, "Plans for the Next Draft".

One of the collaborating groups would be folks from the multicast area; one possibility is a generalization of the algorithm for reliable delivery. This would avoid the ICMP implosion problem that arises if the current MTU discovery technique is used. Someone noted that the behavior as specified in IPv4 and IPv6 was different: in v6 you respond, in v4 you don't.
[XXX]

Another collaborating group would be IPsec; currently the security architecture document has major sections dealing with the interaction of MTU discovery and IPsec (because tunnels are created); the new technique might obsolete many of those sections.

One audience member [Itojun, I think -XXX] noted that the interaction between IPsec and TCP depends on whether the TCP stack is aware of IPsec. If the TCP stack does not take care of the IPsec header size, the algorithm would need to be revised. Matt M. responded that the detail in the draft needs to be resolved in a consistent way. You can count the IPsec header as part of the IP header or TCP header. The really nasty cases involve additional layers, for example IPsec on a VPN, where ICMP messages could go back to the wrong place.

Michael Richardson expanded on this as an IPsec implementer. The worst case is common at meetings such as this -- you have a corporate address on your laptop, and a VPN back to the corporate space, so all traffic goes back to HQ. Try to visit a bank, and they have ICMP filters. Your gateway is sending out ICMP messages to the bank, and they drop them. This proposed algorithm should work really well. Many times VPNs are blamed, when the problem is really a bad ICMP filter. However, there is a problem: if you raise the MTU and the tunnels do not toss large messages but fragment them anyway, you will end up always fragmenting.

Someone working on a Linux version mentioned that he did not honor the DF bit; having a poor(er) performing implementation was better than one that didn't work at all. ("Poor performance is better than no security.") Perhaps there could be a heuristic that worked for a short term solution so these mechanisms don't interact badly... the endpoint would need to be updated for this algorithm, so IPsec tunneling could be updated at the same time. This behavior is often a kernel option, too.
Perhaps you could fragment into the tunnel, but retain the DF bit, and if set don't do anything weird.

Itojun related KAME experience; there they ended up not setting the DF bit on the output header when IPsec tunnels are created. Another point was that IPv6 (on IPv4?) tunnels have the same issue. IPv6 tunnels should have an MTU of 1280 by default so a minimum MTU can be maintained.

Matt M. mentioned that he's aware that a large number of tunneling implementations don't copy the DF bit from the inner packet to the outer header. He's not yet sure if the document needs a specific section covering tunnels and tunnel migration; an intermediate ground that works is to let tunnels behave this way in the interim, and discourage a mode where end systems ignore "can't fragment" messages.

Someone else pointed out that mobility might add additional headers, so a tunnel MTU of 1280 might not be enough. Itojun stated that he was thinking of configured tunnels and not mobile IPv6. If you send a packet with mobile headers, the TCP stack needs to be aware of the size of the mobile IP headers and reduce the MSS appropriately -- maintain the total MTU size. Another person agreed that 1280 is particularly bad; 1380 would be better.

Matt Mathis noted that there was definitely a subgroup interested in considering tunneling and MTU discovery; he encouraged folks to join the mailing list and contribute the various circumstances where there are potential problems.

Lars Eggert mentioned that RFC 2003 specifies some things related to MTU discovery, and RFC 2401 specifically prohibits some of the mechanisms in RFC 2003 for security reasons.

As to other transport protocols, Matt Z. reported that he had quickly skimmed the SCTP and DCCP documents, and that SCTP looked possible, but DCCP says specifically that the MTU can't be raised. No one that claimed to be an SCTP expert was in the room (or at least didn't comment negatively on the applicability to SCTP).
Eddie Kohler noted that this behavior was revised in the DCCP WG meeting this week. Matt Z. prompted him to send some DCCP text.

Matt M. emphasized that the point of getting a draft done early is to encourage implementation as soon as possible. The algorithm will use specific details of other protocols, and we're dependent on the uniformity of implementation of certain features. We need to learn what implementations really do; ideally get a custom implementation run on servers and real field data to feed back into the document.

Matt M also noted two cases that he's worried about (although these are just examples; others are encouraged to consider other cases, or report back implementation experience).

First, what happens if a path is striped across multiple links, and the MTU is not the same across the stripes? You can require that the MTU is not raised until a certain number of segments are received successfully. You need to understand the interaction between random losses and whether the MTU is or is not raised.

Second, what happens if there is a parametric failure -- when raising the MTU causes the error rate to increase? An actual case is one particular 10G GBIC; it was error free with 1500 byte packets, but not with 9000 byte packets. There is an opportunity for different heuristics here, for example use a smaller MSS if you cannot fill a window. For hard, repeated timeouts the first thing you want to do is reduce congestion variables, then reduce the MTU. At some point you want to restart the checks to increase the MTU.

There are other possible protocol interactions, too: for example, SCTP can use multiple endpoints. What if it changes addresses, and the new path has a smaller MTU?

Michael felt that it was important to focus where the production environments hurt most with the current MTU scheme. Matt M noted that different things hurt in different environments. Michael expanded that the most frequent case will likely be large port 80 responses to a client.
And it's the client that would decide that the path is stupid or broken, and that another A record should be tried. The web server is getting the timeouts, not the client. This won't deploy if we can't solve the web server case.

Matt M noted another case he had thought about, but not seen: what happens if raising the MTU causes link stability problems (as opposed to hard failures) -- say the link "goes away" for 10 seconds and then returns. He's thought about using a state machine to catch this case... the link is broken, and we don't necessarily want to fix it with an MTU discovery algorithm.

On ignoring DF bits, so that a tunnel fragments large packets: Matt M contended it was worse for a 1500 tunnel to fragment a 9000 packet than for a tunnel to fragment a 1500 packet by the tunnel overhead. Michael didn't understand this at first; Matt explained that with many fragments the odds are greater that you lose a fragment, and hence the whole packet, than if there are only two fragments.

In thinking about other stakeholders, one person felt that the algorithm would work for RTP over UDP; there you have some extensions to share information -- reliable instrumentation of losses. [XXX]

Itojun said that we should contact rrs@cisco.com for SCTP.

In the multicast case, it was argued that this algorithm might cause an ACK implosion that is worse than an ICMP-message-too-big implosion.

[something about DSS/MPP? bridges & MSS flapping hack that I didn't catch. delete if you don't know anything about it. mr: dss/mpp bridges doing is MSS flapping hack. maybe should write it down and say please don't do it, do it instead XXX]

Someone commented that the document as written is very TCP specific. The algorithm should be better separated from actual deployment. Matt M said that's the intention.

Another question was what, exactly, is the definition of MTU? End-to-end or link-specific?
Matt M said that we were talking about the IP MTU when using a particular link layer; how the IETF uses it, not what hardware specifications say.

It was also argued that IPv6 might not need this at all; the algorithm could arguably make performance worse (since the MSS size would ramp up instead of being decided once by ICMP-message-too-big). However, the new algorithm would protect against implementation or configuration bugs and also work in the cases where L2 MTUs were different on a switch. One audience member said that this should be documented in detail. Since v6 has no DF bit, people that block ICMPv6 won't get connectivity, so people won't block it.

Matt M noted that some stacks have IPv4 mimic IPv6 -- they always set DF, even on fragments, and attempt to fragment only at the endpoints. However, there's no requirement that routers send the too big messages in v4, but there is a requirement in v6.

Another comment was that the MTU in Router Advertisement messages should solve this problem. If operational experience says this isn't happening, it should be reflected to the v6 working group.

Matt M said that in all cases the problems with path MTU discovery are bugs. There is a large set of problems.
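
A rough illustration of the fragmentation argument recorded above (a 1500-byte tunnel fragmenting a 9000-byte packet versus a tunnel fragmenting a 1500-byte packet by its own overhead). This is only a sketch under stated assumptions, not something presented at the meeting: it assumes IPv4 with a 20-byte header and no options, independent fragment losses, a hypothetical 20 bytes of tunnel overhead, and a hypothetical 1% per-fragment loss rate. The function names are made up for this example.

```python
import math

IP_HEADER = 20  # assumption: IPv4 header without options


def fragments_needed(packet_len, link_mtu, ip_header=IP_HEADER):
    """Number of IPv4 fragments needed to carry a packet of packet_len
    bytes (including its IP header) over a link with the given MTU."""
    payload = packet_len - ip_header
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (link_mtu - ip_header) // 8 * 8
    return math.ceil(payload / per_fragment)


def packet_loss_probability(n_fragments, fragment_loss):
    """Probability the whole packet is lost, assuming each fragment is
    lost independently and losing any one fragment loses the packet."""
    return 1.0 - (1.0 - fragment_loss) ** n_fragments


if __name__ == "__main__":
    loss = 0.01           # hypothetical per-fragment loss rate
    tunnel_overhead = 20  # hypothetical encapsulation overhead in bytes

    # Case 1: a 9000-byte packet squeezed through a 1500-byte tunnel.
    big = fragments_needed(9000, 1500)
    # Case 2: a 1500-byte packet that exceeds the tunnel MTU only by the overhead.
    small = fragments_needed(1500, 1500 - tunnel_overhead)

    for label, n in (("9000 over 1500", big), ("1500 over tunnel", small)):
        p = packet_loss_probability(n, loss)
        print(f"{label}: {n} fragments, packet loss probability {p:.4f}")
```

Under these assumptions the first case needs several times as many fragments as the second, so the chance of losing the whole packet is correspondingly higher, which is the point Matt M made.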