[tcpm] Benjamin Kaduk's Discuss on draft-ietf-tcpm-rfc793bis-25: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Thu, 23 September 2021 03:34 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-tcpm-rfc793bis@ietf.org, tcpm-chairs@ietf.org, tcpm@ietf.org, Michael Scharf <michael.scharf@hs-esslingen.de>, michael.scharf@hs-esslingen.de
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <163236803976.28405.5643771942452620510@ietfa.amsl.com>
Date: Wed, 22 Sep 2021 20:34:00 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/I2NXzl_-7q9vZHaXI6YqYI7sSG4>
Subject: [tcpm] Benjamin Kaduk's Discuss on draft-ietf-tcpm-rfc793bis-25: (with DISCUSS and COMMENT)

Benjamin Kaduk has entered the following ballot position for
draft-ietf-tcpm-rfc793bis-25: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-tcpm-rfc793bis/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

Many thanks for taking on the task of producing a roll-up update for the
core TCP specification!  I am sure it was a lot of work, but I am happy
to see it done.

That said, I do have a few points that I would like to have a bit more
discussion on before the document is published; I'm happy to see that
Warren already linked to
https://www.ietf.org/blog/handling-iesg-ballot-positions/ on the topic
of what a DISCUSS position can (and cannot) mean.

(1) We incorporate some long-standing enhancements that improve the
security and robustness of TCP (in particular, random ISN and protection
against off-path in-window attacks come to mind), but only at SHOULD or
MAY requirements level.

For example, we currently say:

   A TCP implementation MUST use the above type of "clock" for clock-
   driven selection of initial sequence numbers (MUST-8), and SHOULD
   generate its Initial Sequence Numbers with the expression:

   ISN = M + F(localip, localport, remoteip, remoteport, secretkey)

and:

         +  RFC 5961 [37] section 5 describes a potential blind data
            injection attack, and mitigation that implementations MAY
            choose to include (MAY-12).  TCP stacks that implement
            RFC 5961 MUST add an input check that the ACK value is
            [...]

What prevents us from making a MUST-level requirement for randomized
ISNs?  Is it just the fact that it was only a SHOULD in RFC 6528 and a
perception that promoting to a MUST would be incompatible with retaining
Internet Standard status?

Likewise, what prevents using stronger normative language (e.g., MUST)
for the RFC 5961 protections?

It seems to me that these mechanisms are of general applicability and
provide significant value for use of TCP on the internet, even though
they are not fully robust and do not use cryptographic mechanisms.  If
there are scenarios where their use is harmful or even just not
applicable, that seems like an exceptional case that should get
documented so as to strengthen the general recommendation for the
non-exception cases.


(2) I think this is just a process question to ensure that the IESG
knows what we are approving at Internet Standard maturity, though it
is certainly possible that I misunderstand the situation.

In Section 3.7.3 we see the normative statement (SHLD-6) that "when the
effective MTU of an interface varies packet-to-packet, TCP
implementations SHOULD use the smallest effective MTU of the interface
to calculate the value to advertise in the MSS option".  This seems to
originate in RFC 6691 (being obsoleted by this document), but RFC 6691
is only an Informational document and has not had an opportunity to
"accumulate experience at Proposed Standard before progressing", to
paraphrase RFC 6410.

Similarly, Section 3.9.2 has "Generally, an application SHOULD
NOT change the DiffServ field value during the course of a connection
(SHLD-23)."  This is a bit harder to track down, as the DiffServ field
was not always known by that name.  I actually failed to find a directly
analogous previous statement of this guidance (presumably my error), and
thus don't know if it had any experience at the PS level or not.

RFC 6410 seems pretty clear that some revisions are okay in Internet
Standards without such "bake time" at PS, but it does seem like
something that should be done consciously rather than by accident.

(3) This is also a process point for explicit consideration by the IESG.

Appendix A.2 appears to discuss a few (rare) scenarios in which the
technical mechanisms of this document fail catastrophically (e.g.,
getting stuck in a SYN|ACK loop and failing to complete the handshake).
Does this meet the "resolved known design choices" and "no known
technical omission" bar required by RFC 2026 even for *proposed*
standard?

(Note that RFC 2026 explicitly says that the IESG may waive this
requirement, at least for PS.)


(AFAICT one such scenario is reported at
https://www.rfc-editor.org/errata_search.php?eid=3305 , which the change
log for this document calls out as "not applicable due to other
changes"; I am not sure which "other changes" are intended, for this
case.)

(4) Another point mostly just to get explicit IESG acknowledgment
(elevating one of Lars' comments to DISCUSS level, essentially).

As the changelog (and gen-art reviewer!) notes:

   Early in the process of updating RFC 793, Scott Brim mentioned that
   this should include a PERPASS/privacy review.  This may be something
   for the chairs or AD to request during WGLC or IETF LC.

I don't see any evidence to suggest that such a review actually
occurred.  Do we want to seek out such a targeted review before
progressing?


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Thank you for the editorial changes so that we now talk about "a TCP
implementation" or a "remote TCP peer" rather than just "a TCP" or
"a remote TCP"!

Abstract

                                                 It also updates RFC
   5961 by adding a small clarification in reset handling while in the
   SYN-RECEIVED state.  [...]

I'm not sure I found what this clarification was; is SYN-RECEIVED the
correct state?  The ad-hoc diff I constructed between RFC 793 and this
document shows identical text for the "If the RST bit is set" case when
currently in the SYN-RECEIVED state.

Section 3.1

   Options: [TCP Option]; Options#Size == (DOffset-5)*32; present
   only when DOffset > 5.

My (later) nit-level notation comment aside, the given expression does
not seem to convey the size occupied by the options, but rather the
combined size of the options and the padding.
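
To make the distinction concrete, here is a minimal sketch (Python, not
taken from the draft) that splits the space implied by the Data Offset
into octets actually used by options and octets of trailing padding.
The function names are hypothetical, and only the three option kinds
named in the draft text (End of Option List, No-Operation, Maximum
Segment Size) are exercised.

```python
from typing import Tuple


def options_area_bits(doffset: int) -> int:
    """Bits that follow the fixed 20-octet header: (DOffset-5)*32."""
    if not 5 <= doffset <= 15:
        raise ValueError("Data Offset is a 4-bit field and must be at least 5")
    return (doffset - 5) * 32


def split_options_and_padding(area: bytes) -> Tuple[int, int]:
    """Return (octets used by options, octets of padding) for an options area."""
    i = 0
    while i < len(area):
        kind = area[i]
        if kind == 0:            # End of Option List: one octet, the rest is padding
            i += 1
            break
        if kind == 1:            # No-Operation: a single octet
            i += 1
            continue
        if i + 1 >= len(area):
            raise ValueError("truncated option")
        length = area[i + 1]     # kind octet, length octet, then length-2 data octets
        if length < 2 or i + length > len(area):
            raise ValueError("bad option length")
        i += length
    return i, len(area) - i


if __name__ == "__main__":
    # MSS option (kind 2, length 4, value 1460), End of Option List, 3 pad octets.
    area = bytes([2, 4, 0x05, 0xB4, 0, 0, 0, 0])
    print(options_area_bits(7) // 8, split_options_and_padding(area))
```

With DOffset = 7 the expression yields 64 bits (8 octets), of which only
5 octets are options in this example; the remaining 3 are padding.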

Section 3.2

   A TCP Option is one of: an End of Option List Option, a No-Operation
   Option, or a Maximum Segment Size Option.

The IANA registry lists some thirty-odd option kinds, so this sentence
just seems false without some additional qualifier ("defined by this
specification", etc.)

Section 3.4

   In response to sending data the TCP endpoint will receive
   acknowledgments.  The following comparisons are needed to process the
   acknowledgments.
      [...]
      SEG.SEQ = first sequence number of a segment

      SEG.LEN = the number of octets occupied by the data in the segment
      (counting SYN and FIN)

      SEG.SEQ+SEG.LEN-1 = last sequence number of a segment

It seems to me that this information from the incoming segment is not
part of processing the *acknowledgment*, but rather part of processing
the data received in that segment (a procedure discussed a few
paragraphs later).

             This clock is a 32-bit counter that typically increments at
   least once every roughly 4 microseconds, [...]
   Maximum Segment Lifetime (MSL), generated ISNs will be unique, since
   it cycles approximately every 4.55 hours, which is much longer than
   the MSL.

Once we put in the "at least" we allow arbitrarily faster clock updates,
and that puts the "approximately every 4.55 hours" estimate in question.
Very fast clock updates would cycle correspondingly faster.  Do we need
to place a lower limit on the clock update interval?  (On first look, it
seems like we might not, since the keyed PRF F() is providing most of
the protection from off-path guessing, and an attacker can always use
direct connections to estimate the clock cycle interval.  OTOH, if it
cycles so fast that it repeats within O(MSL), that might be problematic.)
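
The arithmetic behind this concern can be sketched as follows (Python).
The 120-second MSL is an assumption (the two-minute value used by
RFC 793), and the function names are hypothetical.  By this arithmetic
an exact 4-microsecond tick wraps in about 4.77 hours, the 4.55-hour
figure corresponds to a tick of 2**-18 seconds (about 3.81
microseconds), and a much faster tick shortens the cycle
proportionally.

```python
COUNTER_SPACE = 2**32
MSL_SECONDS = 120.0  # assumption: the 2-minute MSL from RFC 793


def cycle_seconds(tick_seconds: float) -> float:
    """Time for a 32-bit counter to wrap when it increments once per tick."""
    return COUNTER_SPACE * tick_seconds


def cycle_hours(tick_seconds: float) -> float:
    """Same cycle time, expressed in hours."""
    return cycle_seconds(tick_seconds) / 3600.0


def wraps_within_msl(tick_seconds: float, msl_seconds: float = MSL_SECONDS) -> bool:
    """True when the counter repeats a value within one MSL."""
    return cycle_seconds(tick_seconds) <= msl_seconds


if __name__ == "__main__":
    for label, tick in [("4 us", 4e-6), ("2**-18 s", 2.0**-18), ("10 ns", 10e-9)]:
        print(f"{label:>9}: {cycle_hours(tick):8.4f} h, "
              f"wraps within MSL: {wraps_within_msl(tick)}")
```

Under these assumptions the cycle only drops to one MSL once the tick
interval reaches roughly 28 nanoseconds (120 / 2**32 seconds).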

   parameters and some secret data.  For discussion of the selection of
   a specific hash algorithm and management of the secret key data,
   please see Section 3 of [41].

The guidance in the referenced document seems a bit dated (it indicates
that MD5 is probably still okay for this purpose).  While the known
attacks on MD5 do not directly translate into an attack on ISN
generation, collisions can be found on as little as 64 bytes of input,
and all of the straightforward ways to use pure MD5 as a keyed hash for
this purpose have some undesirable properties.  I'm happy to note that
FreeBSD is using siphash for this purpose, which should be more than
adequate.  I expect that Linux and other major TCP stacks are already
doing something similar, so the guidance to use MD5 may be dated in
practice as well as in utility.

I don't have a great proposal for where to put some updated guidance
(unless there's already some work underway in tcpm?); it is probably not
appropriate to put it inline here, so either an appendix or a separate
document seems plausible.
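
For concreteness, a minimal sketch (Python) of the quoted expression
ISN = M + F(localip, localport, remoteip, remoteport, secretkey) with a
non-MD5 keyed hash.  Everything specific here is an assumption made for
illustration: HMAC-SHA-256 from the standard library stands in for the
PRF, time.monotonic() stands in for the 4-microsecond timer M, and the
function names are hypothetical.  It is not what any particular stack
does.

```python
import hashlib
import hmac
import ipaddress
import os
import struct
import time

SEQ_SPACE = 2**32


def clock_m(now=None) -> int:
    """M: a 32-bit timer that ticks once per 4 microseconds (assumed source)."""
    if now is None:
        now = time.monotonic()
    return int(now * 250_000) % SEQ_SPACE


def f_offset(localip, localport, remoteip, remoteport, secretkey: bytes) -> int:
    """F(): keyed hash of the connection identifiers, truncated to 32 bits."""
    msg = (ipaddress.ip_address(localip).packed + struct.pack("!H", localport)
           + ipaddress.ip_address(remoteip).packed + struct.pack("!H", remoteport))
    digest = hmac.new(secretkey, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def isn(localip, localport, remoteip, remoteport, secretkey: bytes, now=None) -> int:
    """ISN = M + F(...), taken modulo the 32-bit sequence space."""
    offset = f_offset(localip, localport, remoteip, remoteport, secretkey)
    return (clock_m(now) + offset) % SEQ_SPACE


if __name__ == "__main__":
    key = os.urandom(16)  # per-boot secret in this sketch
    print(isn("192.0.2.1", 49152, "198.51.100.2", 443, key))
```

The per-connection offset is fixed for a given key, so successive ISNs
for the same four-tuple still advance with the clock.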

Section 3.7.1

   The MSS value to be sent in an MSS option should be equal to the
   effective MTU minus the fixed IP and TCP headers.  By ignoring both
   IP and TCP options when calculating the value for the MSS option, if
   there are any IP or TCP options to be sent in a packet, then the
   sender must decrease the size of the TCP data accordingly.  RFC 6691
   [42] discusses this in greater detail.

I note that RFC 6691 is obsoleted by this document; it seems to me that
if we think there is useful content still in that document, we should
include such content in this document instead of referring to a document
we are calling obsolete.  (This is not the only place we do so, to be
clear, but I will try to mention it just once.  I do see the note that
we only claim to incorporate the normative portions of most of the
obsoleted specs, leaving the informational content alone.)
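
The quoted rule reduces to simple arithmetic, sketched here in Python.
The 1500-octet MTU and the 12 octets of TCP options are hypothetical
example values, as are the function names; the 20- and 40-octet
figures are the fixed IPv4 and IPv6 header sizes.

```python
IPV4_FIXED_HEADER = 20
IPV6_FIXED_HEADER = 40
TCP_FIXED_HEADER = 20


def mss_to_advertise(effective_mtu: int, ip_fixed_header: int = IPV4_FIXED_HEADER) -> int:
    """MSS option value: effective MTU minus the fixed IP and TCP headers."""
    return effective_mtu - ip_fixed_header - TCP_FIXED_HEADER


def max_data_per_segment(peer_mss: int, ip_option_octets: int = 0,
                         tcp_option_octets: int = 0) -> int:
    """Data the sender may put in one segment once options are accounted for."""
    return peer_mss - ip_option_octets - tcp_option_octets


if __name__ == "__main__":
    mss = mss_to_advertise(1500)                            # IPv4 example
    print(mss, mss_to_advertise(1500, IPV6_FIXED_HEADER))
    print(max_data_per_segment(mss, tcp_option_octets=12))  # options shrink the data
```

The advertised value ignores options; the sender is the one that
subtracts them when it builds each segment.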

Section 3.8.4

   An implementation SHOULD send a keep-alive segment with no data
   (SHLD-12); however, it MAY be configurable to send a keep-alive
   segment containing one garbage octet (MAY-6), for compatibility with
   erroneous TCP implementations.

Such TCP implementations were already misbehaving in 1989 when
RFC 1122 was published -- do we have a sense for whether they are still
around to any significant degree?

Section 3.8.5

   As a result of implementation differences and middlebox interactions,
   new applications SHOULD NOT employ the TCP urgent mechanism (SHLD-
   13).  However, TCP implementations MUST still include support for the
   urgent mechanism (MUST-30).  Details can be found in RFC 6093 [38].

This "SHOULD NOT employ" has been in force for over a decade (RFC 6093
is dated January 2011).  How long do we have to wait until there are
sufficiently few implementations employing the urgent mechanism that it
no longer needs to be implemented?

Section 3.9.2.3

      An incoming SYN with an invalid source address MUST be ignored
      either by TCP or by the IP layer (MUST-63) (Section 3.2.1.3 of
      [18]).

Requirements of the form "A or B must do X" that are ambiguous about
whether A or B takes the action leave the risk that both will expect the
other party to take the action, and the action will fail to occur.  If
we're in a position to specifically require one (or both!) to check,
that leads to a more robust and verifiable system.  (I assume we're not
in such a position, but it can't hurt to check.)

Section 4

   Destination Address
           The network layer address of the remote endpoint.
   [...]
   Source Address
           The network layer address of the sending endpoint.

These definitions don't seem to work in the context of a receiver
validating the TCP checksum, where the destination address is the local
endpoint's address and the source address is the remote endpoint's
address.  (I note that these definitions are different from what RFC 793
itself used.)
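
A minimal sketch (Python) of receiver-side checksum validation over the
IPv4 pseudo-header, to show which address plays which role: the source
address is taken from the arriving packet (the remote endpoint) and the
destination address is the local one.  The function names are
hypothetical.  Note that the one's-complement sum does not depend on
word order, so the sketch illustrates the naming of the roles rather
than a numerical difference.

```python
import ipaddress
import struct

TCP_PROTOCOL = 6


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement of the one's-complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src: str, dst: str, tcp_length: int) -> bytes:
    """IPv4 pseudo-header: source, destination, zero, protocol, TCP length."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, TCP_PROTOCOL, tcp_length))


def checksum_ok(segment: bytes, packet_src: str, packet_dst: str) -> bool:
    """Receiver view: packet_src is the remote peer, packet_dst is local."""
    covered = pseudo_header(packet_src, packet_dst, len(segment)) + segment
    return internet_checksum(covered) == 0


if __name__ == "__main__":
    # Build a segment with a zero checksum field, then fill the field in.
    hdr = struct.pack("!HHIIBBHHH", 12345, 80, 1, 0, 5 << 4, 0x18, 65535, 0, 0)
    seg = hdr + b"hi!"
    csum = internet_checksum(pseudo_header("192.0.2.1", "198.51.100.2", len(seg)) + seg)
    seg = seg[:16] + struct.pack("!H", csum) + seg[18:]
    print(checksum_ok(seg, "192.0.2.1", "198.51.100.2"))
```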

   receive window
           This represents the sequence numbers the local (receiving)
           TCP endpoint is willing to receive.  Thus, the local TCP
           endpoint considers that segments overlapping the range
           RCV.NXT to RCV.NXT + RCV.WND - 1 carry acceptable data or
           control.  Segments containing sequence numbers entirely
           outside of this range are considered duplicates and
           discarded.

They could be duplicates, or injection attacks (when the sequence
numbers in the segment are too large).
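
A minimal sketch (Python) of the window test described in the quoted
definition, using the four-case acceptability check from RFC 793 and
arithmetic modulo 2**32.  The outside_window_side helper is a
hypothetical addition that only labels which side of the window a
rejected segment starts on; it is not part of any specification.

```python
MOD = 2**32


def in_window(seq: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """True when RCV.NXT =< seq < RCV.NXT+RCV.WND, modulo 2**32."""
    return (seq - rcv_nxt) % MOD < rcv_wnd


def segment_acceptable(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> bool:
    """Acceptability test: first or last octet must fall in the receive window."""
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt
        return in_window(seg_seq, rcv_nxt, rcv_wnd)
    if rcv_wnd == 0:
        return False
    last = (seg_seq + seg_len - 1) % MOD
    return in_window(seg_seq, rcv_nxt, rcv_wnd) or in_window(last, rcv_nxt, rcv_wnd)


def outside_window_side(seg_seq: int, seg_len: int, rcv_nxt: int, rcv_wnd: int) -> str:
    """Hypothetical label: does a rejected segment start before or beyond the window?"""
    if segment_acceptable(seg_seq, seg_len, rcv_nxt, rcv_wnd):
        return "acceptable"
    behind = (seg_seq - rcv_nxt) % MOD >= 2**31
    return "before window" if behind else "at or beyond window"


if __name__ == "__main__":
    print(outside_window_side(900, 50, 1000, 500))    # starts below RCV.NXT
    print(outside_window_side(5000, 10, 1000, 500))   # starts past the right edge
```

Only the first of the two rejected segments above looks like an old
duplicate; the second has sequence numbers that are too large.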

Section 5

   The collection of applicable RFC Errata that have been reported and
   either accepted or held for an update to RFC 793 were incorporated
   (Errata IDs: 573, 574, 700, 701, 1283, 1561, 1562, 1564, 1565, 1571,
   1572, 2296, 2297, 2298, 2748, 2749, 2934, 3213, 3300, 3301, 6222).
   Some errata were not applicable due to other changes (Errata IDs:
   572, 575, 1569, 3305, 3602).

I think that EID 1565 belongs in the "not applicable due to other
changes" list, since the text it attempts to modify involves the
now-removed discussion of the IP "precedence" field.

Similarly, EID 2296 also affected text about precedence and security
that is no longer present in a recognizable form.

   The more secure Initial Sequence Number generation algorithm from RFC
   6528 was incorporated.  See RFC 6528 for discussion of the attacks
   that this mitigates, as well as advice on selecting PRF algorithms
   and managing secret key data.

(As I mentioned up in §3.4, that guidance is no longer current.)

Section 9.1

It's not clear to me that RFC 2675 ([5]) needs to be classified as
normative.

The guidance at
https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/
would suggest that RFC 5961 ([37]) should be classified as normative,
since we replicate its MUST-level requirements with the condition that
"TCP stacks that implement RFC 5961 MUST [...]", which would appear to
make that behavior an "optional feature".

Appendix A.1.2

   The IP security option (IPSO) and compartment defined in [1] was
   refined in RFC 1038 that was later obsoleted by RFC 1108.  The
   Commercial IP Security Option (CIPSO) is defined in FIPS-188, and is
   supported by some vendors and operating systems.  RFC 1108 is now

Should we mention that FIPS-188 is archived and withdrawn by NIST?
(I also didn't find much to define the actual IP option in the PDF I
found,
https://csrc.nist.gov/csrc/media/publications/fips/188/archive/1994-09-06/documents/fips188.pdf,
but I didn't look very hard.)

Appendix A.3

It's fascinating to me that the preferred reference for this modified
Nagle algorithm is an Internet-Draft from 1999, vs something more
recent.

Appendix B

     Every 2nd full-sized segment or 2*RMSS ACK'd | SHLD-19|x| | | | |

This 'x' seems to be in the "MUST" column, not the "SHOULD" column.

   Time Stamp support                             | MAY-10 | | |x| | |

How do we square timestamp support being a "MAY" with SHLD-4,
SHOULD-level guidance to use timestamps to reduce TIME-WAIT?

   Time Exceeded => tell ALP, don't abort         | MUST-56| | | | |x|
   Param Problem => tell ALP, don't abort         | MUST-56| | | | |x|

Is there a double negative between "don't abort" and the 'x' being in
the "MUST NOT" column?

NITS

I made essentially no attempt to de-duplicate the nit-level remarks
against the ballot positions from other ADs (in contrast to the other
comments, where I made some modest effort to de-duplicate).  My
apologies for the extra work to ignore the already-fixed items.

Section 1

   For several decades, RFC 793 plus a number of other documents have
   combined to serve as the core specification for TCP [48].  Over time,
   a number of errata have been filed against RFC 793, as well as
   deficiencies in security, performance, and many other aspects.  The

A naive parse would say that this means a number of errata have been
filed against deficiencies.  I suspect the transition between errata and
deficiencies should refer to deficiencies having been "discovered" or
similar.

   The purpose of this document is to bring together all of the IETF
   Standards Track changes that have been made to the base TCP
   functional specification and unify them into an update of RFC 793.

It's a little surprising to see this described as an "update of RFC 793"
(vs. a "replacement of" or "updated version of") since the relationship
is Obsoletes, not Updates.  I might even consider "into a single
consolidated specification".

Section 3.1

   Options: [TCP Option]; Options#Size == (DOffset-5)*32; present
   only when DOffset > 5.

The "Options#Size" notation seems confusing and is not using any
convention I'm aware of.  It does not appear in RFC 793 or any other RFC
that I can find, either.

Section 3.3.1

   maintenance of a TCP connection requires the remembering of several
   variables.  We conceive of these variables being stored in a

"the remembering of" is a fairly awkward phrase, where something like
just "remembering" or "maintaining state for" would flow more naturally.

Section 3.4

   It is essential to remember that the actual sequence number space is
   finite, though very large.  This space ranges from 0 to 2**32 - 1.

The sense of scale in the broader ecosystem may have evolved out from
under us; QUIC's 62-bit sequence space might be more along the lines of
"very large" these days, with a 32-bit space being merely "large".

   A connection is defined by a pair of sockets.  Connections can be

This is the first instance of the word "socket" in this document.
RFC 793 used the term much more prevalently, but this update has
(beneficially, IMO) moved away from that approach in favor of discussing
IP addresses and port numbers.  Might such a change be appropriate here
as well?  Regardless, we should probably have some introduction to what
we mean by "socket" if we are to retain any uses of the term, IMO, more
than just the glossary entry.

   verify this SYN.  The three way handshake and the advantages of a
   clock-driven scheme are discussed in [68].

I don't have access to the reference, but it's not clear from just its
abstract whether "advantages of" or "advantages over" a clock-driven
scheme is the intended meaning.

   explanation for this specification is given.  TCP implementors may
   violate the "quiet time" restriction, but only at the risk of causing
   some old data to be accepted as new or new data rejected as old
   duplicated by some receivers in the internet system.

Maybe "old duplicated data"?  The current phrasing feels like it's
missing a word.

                                Hosts that prefer to avoid waiting are
   willing to risk possible confusion of old and new packets at a given
   destination may choose not to wait for the "quiet time".

I think this needs an "and", for "prefer to avoid waiting and are
willing to risk".

   To summarize: every segment emitted occupies one or more sequence
   numbers in the sequence space, the numbers occupied by a segment are
   "busy" or "in use" until MSL seconds have passed, upon rebooting a
   block of space-time is occupied by the octets and SYN or FIN flags of
   the last emitted segment, if a new connection is started too soon and
   uses any of the sequence numbers in the space-time footprint of the
   last segment of the previous connection incarnation, there is a
   potential sequence number overlap area that could cause confusion at
   the receiver.

This list seems to be missing an "and".
(Also, is it really only the last emitted segment that could cause
problems?)

Section 3.5

                                                             It is the
   implementation of a trade-off between memory and messages to provide
   information for this checking.

I'm not sure this reads well; is "the implementation of" needed?

      If an incoming segment has a security level, or compartment that
      does not exactly match the level and compartment requested for the
      connection, a reset is sent and the connection goes to the CLOSED
      state.  The reset takes its sequence number from the ACK field of

The comma in the first line is no longer needed (it was part of the list
when precedence was still part of the list).

Section 3.6

      In this case, a FIN segment can be constructed and placed on the
      outgoing segment queue.  No further SENDs from the user will be
      accepted by the TCP implementation, and it enters the FIN-WAIT-1
      state.  RECEIVEs are allowed in this state.  All segments
      preceding and including FIN will be retransmitted until
      acknowledged.  When the other TCP peer has both acknowledged the
      FIN and sent a FIN of its own, the first TCP peer can ACK this
      FIN.  Note that a TCP endpoint receiving a FIN will ACK but not
      send its own FIN until its user has CLOSED the connection also.

Naming the two peers (e.g., A and B) can help avoid awkward grammatical
constructions like "can ACK this FIN" and improve clarity.

Section 3.8

   segments may arrive due to network or TCP retransmission.  As
   discussed in the section on sequence numbers the TCP implementation
   performs certain tests on the sequence and acknowledgment numbers in
   the segments to verify their acceptability.

Comma after "sequence numbers".

Section 3.8.6.2.2

   Note that the general effect of this algorithm is to advance RCV.WND
   in increments of Eff.snd.MSS (for realistic receive buffers:
   Eff.snd.MSS < RCV.BUFF/2).  Note also that the receiver must use its
   own Eff.snd.MSS, assuming it is the same as the sender's.

I think the last sentence would be clearer if it were something like
"making the assumption that it is the same" or "on the assumption that
it is the same".
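
For readers without the algorithm at hand, a minimal sketch (Python) of
the receiver-side rule the quoted note refers to, as recalled from
RFC 1122 Section 4.2.3.3 rather than quoted from the draft: the
advertised window stays put until the free space not yet advertised
reaches min(Fr * RCV.BUFF, Eff.snd.MSS), with Fr = 1/2.  The function
name and the example numbers are hypothetical.

```python
def updated_rcv_wnd(rcv_buff: int, rcv_user: int, rcv_wnd: int,
                    eff_snd_mss: int, fr: float = 0.5) -> int:
    """Return the window to advertise after the user has consumed some data.

    rcv_buff: total receive buffer space
    rcv_user: received octets not yet consumed by the user
    rcv_wnd:  window currently advertised
    """
    unadvertised = rcv_buff - rcv_user - rcv_wnd
    if unadvertised >= min(fr * rcv_buff, eff_snd_mss):
        return rcv_buff - rcv_user      # open the window to all free space
    return rcv_wnd                      # too small a change: keep the window as is


if __name__ == "__main__":
    # 65536-octet buffer, 1460-octet Eff.snd.MSS, 10000 octets advertised.
    print(updated_rcv_wnd(65536, 55000, 10000, 1460))  # only 536 octets freed
    print(updated_rcv_wnd(65536, 54000, 10000, 1460))  # 1536 octets freed
```

With a buffer much larger than twice Eff.snd.MSS, the threshold is
Eff.snd.MSS itself, which is why the window advances in roughly
MSS-sized steps.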

Section 3.8.6.3

   Note that there are several current practices that further lead to a
   reduced number of ACKs, including generic receive offload (GRO), ACK
   compression, and ACK decimation [26].

Reference [26] seems reasonable for ACK decimation and ACK compression,
but doesn't seem to cover GRO at all.

Section 3.9.1

         If the PUSH flag is set, the application intends the data to be
         transmitted promptly to the receiver, and the PUSH bit will be
         set in the last TCP segment created from the buffer.  When an
         application issues a series of SEND calls without setting the
         PUSH flag, the TCP implementation MAY aggregate the data
         internally without sending it (MAY-16).

There's a dedicated paragraph a few paragraphs later for when the PUSH
flag is not set; the last sentence might flow better there.

         Some TCP implementations have included a FLUSH call, which will
         empty the TCP send queue of any data that the user has issued
         SEND calls but is still to the right of the current send
         window.  That is, it flushes as much queued send data as

I think "has issued SEND calls for" (add "for").

Section 3.9.2

   When received options are passed up to TCP from the IP layer, TCP
   implementations MUST ignore options that it does not understand
   (MUST-50).

singular/plural mismatch (it/implementations)

Section 3.9.2.2

   Soft Errors
     For ICMP these include: Destination Unreachable -- codes 0, 1, 5,
     Time Exceeded -- codes 0, 1, and Parameter Problem.

     For ICMPv6 these include: Destination Unreachable -- codes 0 and 3,
     Time Exceeded -- codes 0, 1, and Parameter Problem -- codes 0, 1,
     2.

     Since these Unreachable messages indicate soft error conditions,

I'm not entirely sure that I'd classify "parameter problem" as an
"unreachable" message per se.

Section 3.10

   Please note in the following that all arithmetic on sequence numbers,
   acknowledgment numbers, windows, et cetera, is modulo 2**32 the size
   of the sequence number space.  Also note that "=<" means less than or

Some punctuation around "the size of the sequence number space" seems in
order.

   equal to (modulo 2**32).

[In formal mathematics this "less than or equal to, modulo N" operator
is not defined.  But it's probably okay in this context.]
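
One common way to give "=<" a working meaning in a circular space is to
compare the signed distance between the two values, sketched below in
Python.  This is an illustration, not text from the draft: the helper
names are hypothetical, and the result is only meaningful when the two
sequence numbers are within 2**31 of each other.

```python
MOD = 2**32
HALF = 2**31


def seq_diff(a: int, b: int) -> int:
    """Signed distance from b to a in 32-bit sequence space (-2**31 .. 2**31-1)."""
    d = (a - b) % MOD
    return d - MOD if d >= HALF else d


def seq_lt(a: int, b: int) -> bool:
    """a < b, modulo 2**32."""
    return seq_diff(a, b) < 0


def seq_leq(a: int, b: int) -> bool:
    """a =< b, modulo 2**32."""
    return seq_diff(a, b) <= 0


if __name__ == "__main__":
    # 0xFFFFFFF0 precedes 0x00000010 once wraparound is taken into account.
    print(seq_lt(0xFFFFFFF0, 0x00000010), seq_lt(0x00000010, 0xFFFFFFF0))
```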

Section 3.10.1

         the parameters of the incoming SYN segment.  Verify the
         security and DiffServ value requested are allowed for this
         user, if not return "error: precedence not allowed" or "error:
         security/compartment not allowed."  If passive enter the LISTEN

It's surprising for the error string to mention "precedence" when the
predicate is DiffServ value.

         with "error: insufficient resources".  If Foreign socket was
         not specified, then return "error: remote socket unspecified".

I suspect s/Foreign/remote/ was intended.  (Also occurs later, but I
will just note it once here.)

Section 3.10.3

      -  Since the remote side has already sent FIN, RECEIVEs must be
         satisfied by data already on hand, but not yet delivered to the
         user.  If no text is awaiting delivery, the RECEIVE will get a
         "error: connection closing" response.  Otherwise, any remaining
         text can be used to satisfy the RECEIVE.

I think s/text/data/ should be applied on the last line (since it was
already applied on the second line).

Section 3.10.7.4

   o  Segments are processed in sequence.  Initial tests on
      arrival are used to discard old duplicates, but further
      processing is done in SEG.SEQ order.  If a segment's
      contents straddle the boundary between old and new, only the
      new parts should be processed.

Maybe s/should be/are/?  There's not really optionality about it...

            *  If this connection was initiated with a passive OPEN
               (i.e., came from the LISTEN state), then return this
               connection to LISTEN state and return.  The user need
               not be informed.  If this connection was initiated
               with an active OPEN (i.e., came from SYN-SENT state)
               then the connection was refused, signal the user
               "connection refused".  In either case, all segments on
               the retransmission queue should be removed.  And in

IIUC, what's described here as "removed" is described elsewhere as
"flushed"; it would be good to use consistent terminology when possible.

         +  Once in the ESTABLISHED state, it is possible to deliver
            segment text to user RECEIVE buffers.  Text from segments
            can be moved into buffers until either the buffer is full
            or the segment is empty.  If the segment empties and
            [...]

As above, it seems like (case-insensitive) s/text/data/ would improve
consistency.

Section 4

   internet datagram
           The unit of data exchanged between an internet module and the
           higher level protocol together with the internet header.

"exchanged between an internet module and the higher level protocol"
sounds like a local operation; I would have expected the definition of
an *internet* datagram to involve transfer over the (inter)network.

   segment length
           The amount of sequence number space occupied by a segment,
           including any controls that occupy sequence space.

Should we say that this is a field in the segment header?

   URG
           A control bit (urgent), occupying no sequence space, used to
           indicate that the receiving user should be notified to do
           urgent processing as long as there is data to be consumed
           with sequence numbers less than the value indicated in the
           urgent pointer.

To me, "value indicated in" is synonymous with "value contained in",
which is problematic here since the urgent field is only 16 bits and
sequence numbers are 32 bits.  "indicated by" would be an improvement,
though of course if we're willing to spend more words we can increase
clarity further.
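
The size mismatch is easy to see in a small sketch (Python): the 16-bit
urgent pointer is an offset from the segment's sequence number, and the
32-bit value it indicates has to be computed from both.  The function
name is hypothetical.

```python
MOD = 2**32


def urgent_sequence(seg_seq: int, seg_up: int) -> int:
    """32-bit sequence number indicated by a 16-bit urgent pointer offset."""
    if not 0 <= seg_up <= 0xFFFF:
        raise ValueError("the urgent pointer is a 16-bit field")
    return (seg_seq + seg_up) % MOD


if __name__ == "__main__":
    # The offset is added to SEG.SEQ and wraps with the sequence space.
    print(hex(urgent_sequence(0xFFFFFFF0, 0x20)))
```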

Appendix A.1

   RFC 793 requires checking the IP security compartment and precedence
   on incoming TCP segments for consistency within a connection, and

I think the past tense "required" would be more appropriate upon
publication of this document as an RFC obsoleting RFC 793.