Re: [tcpm] WGLC for draft-ietf-tcpm-1323bis

Mark Allman <mallman@icir.org> Fri, 10 May 2013 15:37 UTC

To: Pasi Sarolahti <pasi.sarolahti@iki.fi>
From: Mark Allman <mallman@icir.org>
In-Reply-To: <FCF05C2E-7414-4F1E-B63C-EFC5C94812E4@iki.fi>
Organization: International Computer Science Institute (ICSI)
Song-of-the-Day: Ain't Even Done With the Night
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="--------ma5147-1"; micalg="pgp-sha1"; protocol="application/pgp-signature"
Date: Fri, 10 May 2013 11:37:00 -0400
Sender: mallman@icir.org
Message-Id: <20130510153700.ED45811C3C9C@lawyers.icir.org>
Cc: "tcpm (tcpm@ietf.org)" <tcpm@ietf.org>
Subject: Re: [tcpm] WGLC for draft-ietf-tcpm-1323bis
Precedence: list
Reply-To: mallman@icir.org

I adamantly do not support 1323bis being published in its current
state.

  - My overall notion is that bis documents should take into account
    what has been learned since the initial publication.  Sometimes
    we learn the documents are unclear.  Sometimes we learn about
    refinements to techniques that can make things better in some
    fashion.  Sometimes we learn that additional mechanisms would be
    handy.  Sometimes we learn that what we previously wrote is just
    wrong.  We should take all this into account when we update
    documents.

  - 1323bis does not take into account much of what has been learned
    since the publication of 1323.  Sure, it fixes a few small
    things, but it just completely ignores the big things.  There
    are forests.  And, there are trees.  And, 1323bis is concerned
    with bark.

  - 1.1, item (1): This does not say *why* the window is a
    "fundamental performance problem".  I mean, I know it is, but it
    seems the document should at least sketch and cite some notion
    that the maximum window and the RTT constrain the performance.
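
    A minimal sketch of the constraint in question: a sender can have
    at most one window of data in flight per round trip, so throughput
    is bounded by window / RTT.  The function name and the constant
    are illustrative assumptions, not text from the draft.

```python
# Largest window expressible in the unscaled 16-bit Window field.
UNSCALED_MAX_WINDOW = 65535


def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput in bits/s: one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds
```

    With the unscaled maximum window and a 100 ms RTT this bound is
    roughly 5.2 Mbit/s, no matter how fast the underlying path is.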

  - 1.1, item (2): Citing 6675 is fine.  That is how to use SACKs.
    But, seemingly if you want to cite *SACK* you should also cite
    2018 and 2883 as these define SACK.

  - 1.1, item (3): This is not a "fundamental performance problem".
    This should be removed from the list (see below).

  - 1.3 states "it is not expected that most TCP implementations
    will properly handle unknown options".  Why do we have to frame
    this as an expectation when there is empirical evidence that
    this is the case?  Again, bis documents should be open to what
    we have *learned*.

    At least these two papers speak directly to the issue:

      Alberto Medina, Mark Allman, Sally Floyd.  Measuring
      Interactions Between Transport Protocols and Middleboxes.  ACM
      SIGCOMM/USENIX Internet Measurement Conference.  October 2004.

      Alberto Medina, Mark Allman, Sally Floyd.  Measuring the
      Evolution of Transport Protocols in the Internet.  ACM
      Computer Communication Review, 35(2), April 2005.

    There are probably others.  (It is not my intention to flog my
    papers, I just readily know of them.)

  - 1.3: "We recognize there is a tradeoff between the bandwidth
    saved by reducing unnecessary retransmission timeouts, and the
    extra header bandwidth used by this option.  It is required that
    this TCP option will be sent on non-<SYN> segments only after an
    exchange of options on the <SYN> segments has indicated that
    both sides understand this extension."

    Two issues:

      - First, you **never** establish that RTTM "reduc[es]
        unnecessary retransmission timeouts".  You do not cite
        anything that says that.  You do not present data that says
        that.  You are simply waving your hands and hoping that more
        RTT samples means a better RTO and that leads to fewer
        spurious retransmissions.

      - Second, I find the above ambiguous.  When I read it I
        wondered if this was wiggle room to let TCPs not send TS on
        all segments after the 3WHS.  Later in the document you
        unequivocally say that TS must be sent on all segments once
        it has been negotiated.  I think you want "will be sent on
        ALL non-<SYN>" in the above to leave no doubt what you mean.

  - 2.1--2.3: The whole discussion of the WS option is a bit hard to
    follow.  

    The first sentence of the section says: "The window scale
    extension expands the definition of the TCP window to 32 bits
    and then uses a scale factor to carry this 32-bit value in the
    16-bit Window field of the TCP header".

    That is not true because the option dictates you can scale the
    window by at most 15 bits, i.e., to an overall window size of 31
    bits.

    As much is admitted later.

    And, then it is further refined such that the window must
    actually be *less* than 2^31 and so the max WS factor should be
    14 to yield a max window of 2^30.

    The discussion of why this is and all is fine and I have no
    problem with it.  But, don't say the window can be 32 bits or 31
    bits when it can't be.  It just leads to confusion.

  - 2.2: It is confusing that you introduce a "scale factor" that is
    different from the encoded "window scale".  I.e., a "scale
    factor" of 1 seemingly means the window size is not scaled at
    all.  But, this requires a "window scale" of zero.  The
    discussion could just be cleaned up to get rid of the "scale
    factor" notion.  Or, **at least** you could define the scale
    factor as two raised to the power of the window scale.

  - 2.2: "A Window Scale option in a segment without a SYN bit
    SHOULD be ignored."  Why is this?  Why not "MUST be ignored"?
    Is there some case where the TCP should really pay attention to
    it?  I can't think of one as things are presently defined.

  - In 2.3 you note (in a somewhat tortured way) that if the WS
    arrives as 15 then it should be assumed to be 14.  You might
    sketch why this is safe.  I.e., even if the other side is
    offering a window of 2^31 bytes you'll only use half that and so
    all will be fine.
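
    A minimal sketch of the shift-count handling discussed in the 2.1
    through 2.3 comments above, assuming the window scale is treated
    as a left-shift count applied to the 16-bit Window field.  The
    names are hypothetical and not taken from the draft.

```python
MAX_SHIFT = 14  # largest usable shift count; keeps the window below 2**30


def effective_shift(received_shift: int) -> int:
    """Clamp a received Window Scale shift count to MAX_SHIFT."""
    if not 0 <= received_shift <= 255:
        raise ValueError("shift count must fit in one octet")
    return min(received_shift, MAX_SHIFT)


def scaled_window(window_field: int, received_shift: int) -> int:
    """Window in bytes implied by a 16-bit Window field and a shift count."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    return window_field << effective_shift(received_shift)
```

    A shift of zero leaves the window unscaled.  If the peer really
    meant 15, the clamped result is half of what it offered, so the
    sender can only err toward sending less than the peer allows.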

  - 2.3: You should cite something for congestion avoidance and slow
    start.  I usually cite both Jac88 and the RFC.  But, either is
    fine.

  - 3.1: "Many TCP implementations base their RTT measurements upon
    a sample of one segment per window or less.  While this yields
    an adequate approximation to the RTT for small windows, it
    results in an unacceptably poor RTT estimate for a LFN."

    Do you have evidence of this?  We have evidence you're wrong:

      Mark Allman, Vern Paxson.  On Estimating End-to-End Network
      Path Properties.  Proceedings of the ACM SIGCOMM Technical
      Symposium, Cambridge, MA, September 1999.

    That shows that the number of samples per RTT is pretty
    immaterial to the effectiveness of the RTO.

  - 3.1: "If we look at RTT estimation as a signal processing
    problem (which it is), a data signal at some frequency, the
    packet rate, is being sampled at a lower frequency, the window
    rate.  This lower sampling frequency violates Nyquist's criteria
    and may therefore introduce "aliasing" artifacts into the
    estimated RTT [Hamming77]."

    This is hand waving.  At best.

    I buy that the RTT is a signal.  The mistake above is trying to
    tightly couple the frequency of that signal to the "packet
    rate".  If that were true then the conclusion would be that
    *any* RTT measurement strategy that relied only on TCP packets
    themselves would in fact violate Nyquist.  But, why should I
    think it's true?  It might be true for one flow over a dumbbell
    where every packet directly and materially influences the RTT.
    But, over a network with even a little statmux it quickly
    becomes clear that the frequency of the RTT process is not
    dependent on the packet rate of any particular source and so
    the above reasoning is just not sound.

    Or, to put it a different way: when the math and the world
    disagree, the world is right.  And the world says something
    different from what the document says:

      Mark Allman, Vern Paxson.  On Estimating End-to-End Network
      Path Properties.  Proceedings of the ACM SIGCOMM Technical
      Symposium, Cambridge, MA, September 1999.

  - 3.1: "RTT estimator".  Note, TCP does not have an "RTT
    estimator".  Scrub this from the document.  We have an "RTO
    estimator".  These are different things.  Confusing them is a
    fundamental mistake.

  - 3.1: "it becomes effectively impossible to obtain a valid RTT
    measurement".  This is FUD.  The RTO backoff [RFC6298] stays in
    place until you can take an unambiguous RTT sample.  So, this
    statement is not true unless you get to the point when you
    cannot get a single packet through the network per maximum RTO.
    And, in that case, RTTM isn't going to save you.  Please remove
    this notion.

  - 3.1: "It is vitally important to use the RTTM mechanism with big
    windows; otherwise, the door is opened to some dangerous
    instabilities due to aliasing."

    Show me.  Where is the evidence?  Cite something.  This document
    makes a whole lot of statements that seem to come from thin air.
    In contrast to when 1323 was published, we have studied this
    stuff for quite a long time now and we have a better idea how
    these things work.  So, you should be able to readily establish
    such statements or you should not make them.  (Or, **at least**
    explain how you think this "dangerous instability" is going to
    come to pass.)

    This is a whole additional level of ridiculous.  It's one thing
    to say you get a more accurate RTO with more RTT samples.  You'd
    be wrong, but at least all we're talking about is a little more
    or a little less time waiting on retransmissions.  But, to say
    this leads to *instability* is not right.  The backoff in the
    RTO backstops busted RTO (congestion) decisions.  Even if the
    RTO is quite wrong the backoff will kick in to prevent
    "dangerous instabilities".
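
    A minimal sketch of the RTO computation and backoff in RFC 6298,
    to make the backstop argument concrete.  The class name, method
    names, and clock granularity G are assumptions for illustration.
    The gains (1/8 and 1/4), K = 4, the 1 second floor, and the 60
    second cap (the smallest cap RFC 6298 permits) follow the RFC.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8    # gain for SRTT
BETA = 1 / 4     # gain for RTTVAR
K = 4            # variance multiplier
G = 0.001        # clock granularity in seconds (assumed)
MIN_RTO = 1.0    # seconds
MAX_RTO = 60.0   # seconds


@dataclass
class RtoEstimator:
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = 1.0  # initial RTO before any sample

    def on_rtt_sample(self, r: float) -> None:
        """Fold in one unambiguous RTT sample; this also ends any backoff."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        rto = self.srtt + max(G, K * self.rttvar)
        self.rto = min(MAX_RTO, max(MIN_RTO, rto))

    def on_timeout(self) -> None:
        """Retransmission timer expired: double the RTO, up to the cap."""
        self.rto = min(MAX_RTO, self.rto * 2)
```

    However poor the estimate, each expiry doubles the timer, so a
    badly chosen RTO costs some waiting time rather than feeding an
    unbounded stream of retransmissions into the network.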

  - End of 3.3 on RTT sample weighting factors.

    (1) The problem with the history being truncated when using RTTM
        was independently highlighted by Ludwig and Floyd.  We
        should at least have the common courtesy to cite Sally's
        note to e2e and Reiner's paper.

    (2) Instead of some nebulous reference to RTO algorithms you
        could point to the standard one [RFC6298].  I fully
        understand that some implementations don't use it, but it
        would at least serve as a concrete example of an algorithm
        that has this problem.

    (3) ARE YOU KIDDING ME?  

        This document goes through a tortured and wrong analysis
        telling me how "vitally important" RTTM is to address a
        "fundamental performance problem" of TCP over LFNs and how
        my performance is going to be "unacceptably poor" without
        it. 

        And, then we have this pesky problem that the history in the
        RTO depends on the window size when you actually do the
        thing you suggest we do (use RTTM).

        So, the first way to address this problem with the RTO is:

          "an implementation could choose to just use one sample per
          RTT to update the RTO estimator"

        Thanks for the whiplash!  Holy shit.  Really?! 

        Which one is it?  If I take this advice then aren't I in
        massive violation of Nyquist and causing dangerous
        instability to the network?

        This negates whole tracts of the damn draft.  Be wrong if
        you want, but at least have the decency to stay wrong.

  - Again, bis documents should reflect our understanding of the
    world.  RTTM does not *hurt* anything.  It just doesn't *help*
    anything either.  We should be honest enough to take this into
    account in our documents.

    RTTM should not be deprecated.  It should be a MAY.

    RTTM should not be discussed with breathless bullshit about hand
    wavy math and un-demonstrated stability issues and whatnot.  

    We should say that RTTM is absolutely within compliance of the
    spec and that it will not hurt your RTO.

    We should also say that RTTM is unlikely to help your RTO.

    We should leave it to implementers to decide if RTTM is useful
    for their purposes.

    We should specify a way to vary the gains in the standard RTO
    algorithm based on the current cwnd.

    And, we should absolutely state that there are other uses for
    the timestamp option (like Eifel, like PAWS) and there is
    nothing wrong with the *option* for that purpose.
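
    A minimal sketch of one possible way to vary the gains with the
    window, so that the history kept by the RTO estimator does not
    shrink as the number of samples per RTT grows.  This is an
    assumption for illustration, not a scheme given in this message:
    the function names are hypothetical, and the sample count assumes
    one ACK (hence one sample) per two full-sized segments.

```python
import math

ALPHA = 1 / 8  # standard SRTT gain
BETA = 1 / 4   # standard RTTVAR gain


def expected_samples(cwnd_bytes: int, smss_bytes: int) -> int:
    """Assumed RTT samples per round trip: one per two segments, at least 1."""
    if cwnd_bytes <= 0 or smss_bytes <= 0:
        raise ValueError("cwnd_bytes and smss_bytes must be positive")
    return max(1, math.ceil(cwnd_bytes / (2 * smss_bytes)))


def scaled_gains(cwnd_bytes: int, smss_bytes: int) -> tuple:
    """Divide the standard gains by the expected samples per round trip."""
    n = expected_samples(cwnd_bytes, smss_bytes)
    return ALPHA / n, BETA / n
```

    With a one-segment window the gains are unchanged; with a larger
    window each sample counts for proportionally less.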

  - 3.4, (A): Why are we discussing this in terms of the "Kth"
    segment?  Delayed ACKs per the standard is "2nd".  Why do we
    have to make the discussion in terms of some theory rather than
    in terms of what we have specified?

  - 3.2 insinuates that you should not include a timestamp on an
    RST: "TSopt MUST be sent in every non-<RST> segment", implying
    it should not be sent on an <RST> (or you'd have just said
    "every segment").  But, then 4.2 goes on to (rightly IMO)
    develop why we should include it on <RST> segments.  This
    inconsistency needs to be fixed.

Sorry ... there is just no way this is close to ready, IMO.

allman
