Re: [rddp] RE:I-D ACTION:draft-williams-iwarp-ift-00.txt

"Giora Biran" <GBIRAN@il.ibm.com> Sat, 09 November 2002 01:22 UTC

Subject: Re: [rddp] RE:I-D ACTION:draft-williams-iwarp-ift-00.txt
To: Vadim Makhervaks <VADIK@il.ibm.com>
Cc: "Williams, Jim" <Jim.Williams@emulex.com>, Julian Satran <Julian_Satran@il.ibm.com>, "'Culley, Paul'" <Paul.Culley@hp.com>, rddp@ietf.org, rddp-admin@ietf.org, tsvwg@ietf.org
X-Mailer: Lotus Notes Release 5.0.9a January 7, 2002
Message-ID: <OF623E3AB5.2E791CCC-ON42256C6B.00175B8B-42256C6B.005BB059@telaviv.ibm.com>
From: Giora Biran <GBIRAN@il.ibm.com>
Date: Fri, 08 Nov 2002 18:41:29 +0200
List-Id: IETF Remote Direct Data Placement (rddp) WG <rddp.ietf.org>

Let me add some comments regarding the complexity in a hardware
implementation as I see it:

- The CRC check appears to be separate from the TCP/IP checksum check and
to require additional buffering. It is true that when each TCP segment
carries a single, complete DDP segment, both checks can be done at the
same time using the same buffer. But that does not work when a
resegmenting middlebox is in the path.
The bottom line is that a CRC on the DDP header makes the hardware
implementation more efficient by reducing the buffer size and simplifying
the flow of direct placement.

- Allowing resegmenting middleboxes creates much more complicated cases
than packing itself. All of these cases, even those that seem rare, have
to be handled by the RNIC (preferably in hardware). Therefore I do not see
a big implementation impact from allowing packing. Packing might add
complexity on the transmit side, but the receive side has to be capable of
handling packing anyway (it can be caused by a resegmenting middlebox).
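
The buffering argument in the first point can be illustrated with a minimal
sketch. Everything in it is an assumption made for illustration: the
two-field header (payload length, placement offset), the position of the
two CRCs, and the use of zlib.crc32 as a stand-in for whatever CRC the
protocol would pick. The sketch validates the header as soon as it has
arrived and then copies payload bytes to their destination chunk by chunk,
so the staging buffer stays small even when the segment is delivered in
pieces, as a resegmenting middlebox might deliver it.

```python
import struct
import zlib

HDR = struct.Struct("!II")  # hypothetical header: payload length, placement offset
CRC = struct.Struct("!I")   # zlib.crc32 is a stand-in, not the protocol's CRC


def build_segment(offset: int, payload: bytes) -> bytes:
    """Illustrative layout: header, header CRC, payload, payload CRC."""
    hdr = HDR.pack(len(payload), offset)
    return hdr + CRC.pack(zlib.crc32(hdr)) + payload + CRC.pack(zlib.crc32(payload))


def receive(chunks, dest: bytearray) -> int:
    """Place one segment that arrives in arbitrary chunks.

    Returns the peak number of bytes held in the staging buffer.
    """
    need = HDR.size + CRC.size
    buf = bytearray()
    peak = length = offset = placed = running = 0
    header_ok = False
    for chunk in chunks:
        buf += chunk
        peak = max(peak, len(buf))
        if not header_ok and len(buf) >= need:
            length, offset = HDR.unpack_from(buf, 0)
            (hdr_crc,) = CRC.unpack_from(buf, HDR.size)
            if zlib.crc32(bytes(buf[:HDR.size])) != hdr_crc:
                raise ValueError("header CRC mismatch")
            header_ok = True  # length and offset can now be trusted
            del buf[:need]
        if header_ok and placed < length:
            # Copy whatever payload has arrived straight to its destination.
            take = min(len(buf), length - placed)
            dest[offset + placed:offset + placed + take] = buf[:take]
            running = zlib.crc32(bytes(buf[:take]), running)
            placed += take
            del buf[:take]
    if not header_ok or placed < length or len(buf) < CRC.size:
        raise ValueError("segment incomplete")
    if running != CRC.unpack_from(buf, 0)[0]:
        raise ValueError("payload CRC mismatch")
    return peak
```

Note that this sketch writes payload into the destination before the
payload CRC has been checked. A real design would have to decide how to
handle that, for example by not signalling completion until the check
passes. That choice is an assumption here, not something stated in this
thread.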



Giora


Vadim Makhervaks   07/11/2002 00:52

To:    "Williams, Jim" <Jim.Williams@emulex.com>
cc:    Giora Biran/Haifa/IBM@IBMIL, Julian Satran/Haifa/IBM@IBMIL,
       "'Culley, Paul'" <Paul.Culley@hp.com>, rddp@ietf.org,
       rddp-admin@ietf.org, tsvwg@ietf.org
From:  Vadim Makhervaks/Haifa/IBM@IBMIL
Subject:    Re: [rddp] RE:I-D ACTION:draft-williams-iwarp-ift-00.txt
       (Document link: Giora Biran)

I'll try to put forward some pros for markers.

First of all, I need to admit that I'm not a fan of markers. They are not
easy to implement, but right now I do not see any other solution for
the problem that I'm going to present below.


First, a few words about the host interface and its ability to handle
bursts from the wire. I believe that the DDP concept is 10Gb oriented, and
thus the host interface to consider is not PCI-X, but QDR-PCIX, RapidIO,
a direct interface to the memory controller, etc. It is not clear that such
an interface would have any problem handling inbound traffic at wire speed.
The traffic should be mostly write operations on the bus, and assuming an
efficient bridge/MC implementation, we should not have big problems with
bursts of writes.

Even assuming that an RNIC vendor decides to pass the received stream
through on-board/on-chip memory before placing it in host memory, markers
would still be very helpful. I'll give just one example to emphasize this:
without markers, the location of the next DDP segment is identified by the
Length field in the header of the previous DDP segment. If a DDP header is
lost, then without markers the RNIC has to wait until that header is
received before it can place the data in the destination buffers on the
host. However, if markers are supplied by the transmitter, the receiver has
several choices: a) use a fast path (cut-through) and place received
segments directly in the destination buffers; b) pass the received stream
through memory, but still be able to place the payload in the destination
buffers out of order; c) use classic reassembly buffers, and place the data
in the destination buffers in order.
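
The difference between following Length fields and using markers can be
sketched as follows. This is a simplified model, not the MPA wire format:
the 4-byte length header, the 4-byte marker holding the distance back to
the start of the enclosing segment, the padding to 4 bytes, and the names
frame, read_segment and resync are all assumptions made for illustration.
Only the 512-byte interval is taken from this thread.

```python
import struct

INTERVAL = 512               # marker spacing in stream bytes
WORD = struct.Struct("!I")   # hypothetical 4-byte header and marker format


def frame(payloads):
    """Emit [length][payload][pad] segments with a marker every INTERVAL bytes.

    Each marker holds the distance back to the start of the segment it is in.
    """
    out = bytearray()
    for p in payloads:
        start = len(out)
        body = WORD.pack(len(p)) + p + b"\0" * (-len(p) % 4)
        for i in range(0, len(body), 4):
            if len(out) % INTERVAL == 0:
                out += WORD.pack(len(out) - start)
            out += body[i:i + 4]
    return bytes(out)


def read_words(stream, pos, nwords):
    """Read nwords 4-byte words from pos, skipping markers."""
    data = bytearray()
    for _ in range(nwords):
        if pos % INTERVAL == 0:
            pos += 4  # skip the marker
        data += stream[pos:pos + 4]
        pos += 4
    return bytes(data), pos


def read_segment(stream, start):
    """Return (payload, start of next segment) using the Length field."""
    hdr, pos = read_words(stream, start, 1)
    (length,) = WORD.unpack(hdr)
    body, pos = read_words(stream, pos, (length + 3) // 4)
    return body[:length], pos


def resync(stream, pos):
    """After a gap ending at pos, use markers to find the next segment start."""
    m = -(-pos // INTERVAL) * INTERVAL  # first marker position at or after pos
    while m + 4 <= len(stream):
        (back,) = WORD.unpack_from(stream, m)
        if m - back >= pos:
            return m - back
        m += INTERVAL
    return None
```

In this model a receiver that has lost the bytes holding one segment header
cannot continue with read_segment alone, because the chain of Length fields
is broken. With resync it can skip ahead to the next segment whose start a
marker points to and keep placing data, which corresponds to options a) and
b) above.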

So why should we force the RNIC vendor to choose option c), rather than
leave this open as another feature for competition?


Vadim




"Williams, Jim" <Jim.Williams@emulex.com>   06/11/02 11:07 PM
Sent by: rddp-admin@ietf.org

To:       "'Culley, Paul'" <Paul.Culley@hp.com>, Vadim Makhervaks/Haifa/IBM@IBMIL
cc:       rddp@ietf.org, tsvwg@ietf.org, Giora Biran/Haifa/IBM@IBMIL,
          Julian Satran/Haifa/IBM@IBMIL
Subject:  [rddp] RE:I-D ACTION:draft-williams-iwarp-ift-00.txt





> -----Original Message-----
> From: Culley, Paul [mailto:Paul.Culley@hp.com]
>
> You have done an excellent analysis of the differences and
> issues at hand.

I would echo that sentiment.

> -----Original Message-----
> From: Vadim Makhervaks [mailto:VADIK@il.ibm.com]


> I am probably late to this discussion, but I only recently went through
> the differences between both drafts.
> It seems (to me) that the changes proposed by Jim are not tightly
> coupled, and I would propose to discuss them separately:

Generally true.  There is some coupling, though, which I note.

> ddp header CRC: Sounds like a reasonable approach to me. I can imagine
> why such a CRC would help an RNIC implementation. On the other hand, I
> don't see any harm in adding such a CRC to the DDP segment, besides
> another word of overhead per DDP segment. I also think that such a CRC
> should protect the DDP header only, and that RDMA extensions (for Read
> Request and Terminate) should be considered payload. Anyway, this
> disclaimer is outside the scope of the MPA definition, and is probably
> not very relevant to this discussion.
>
> <prc> As discussed in prior emails, the MPA authors (after
> much discussion) felt that this is not necessary.
>   You need a separate header CRC if designing a "Flow through NIC":
>            * But the need for an elasticity buffer anyway makes
>              "flow through" an unneeded extra complexity.
>            * But NIC designers want to check the L2 CRC and IP checksum
>              first anyway, which requires the whole packet.
>   You need a separate header CRC if placing partial FPDUs (can place
>   the 1st part without the whole FPDU):
>            * As long as alignment is correct, this is not needed.
>              o Alignment is expected to be the usual case in the
>                data center, at least, and common elsewhere.
>              o If lost alignment is an uncommon case, the extra wire
>                overhead and extra complexity of saving headers while
>                placing data don't seem worth the small buffer saving.
> <prc>

I agree that a single CRC makes sense if alignment is always
guaranteed.  The IFT proposal does not guarantee alignment,
therefore two CRCs make sense.  SCTP does guarantee alignment,
so its single CRC makes sense.

Standard TCP does not provide alignment.  TSVWG is
considering an experimental variant of TCP that
does provide alignment.  But the RDDP work group
charter is clear that RDDP must be standardized on
standard TCP.  Hence the motivation for the IFT proposal.

Also, the benefit of the second CRC for the unaligned
case seems much greater than the cost of an extra
CRC in the aligned case.


> padding: It's definitely easier to implement CRC logic when everything
> is word aligned.

This is not an issue for which I have a whole lot of energy.
But my experience and that of the other hardware designers
I have worked with directly contradict the above.
Unaligned CRCs are trivial to implement.  Logic necessary
to insert padding is not trivial.  Either way works.
My vote is no padding, but I'm not terribly concerned
about conceding this point if a majority feel otherwise.

> It's not too much overhead: up to 3 bytes per DDP segment. I also agree
> that this is not a MUST requirement, and an RNIC vendor should be able
> to handle unaligned CRC generation/validation.
> <prc> Padding also eliminates the problem of how to deal with
> markers that fall in the middle of the CRC.  If everything is
> aligned on 4 byte breaks, then markers also end up on the
> same breaks (tied into the 512 byte marker interval).
> <prc>

True.  IFT does not propose markers, so this point is moot
for IFT.
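
For readers following the padding argument quoted above, the arithmetic can
be sketched as below. The model is an assumption made for illustration: a
segment is just payload, optional padding, and a 4-byte CRC, with a 4-byte
marker inserted at every multiple of 512 bytes in the stream. The function
names pad_len and count_split_crcs are hypothetical.

```python
def pad_len(n: int) -> int:
    """Bytes of padding needed to round n up to a multiple of 4 (0 to 3)."""
    return -n % 4


def count_split_crcs(payload_lens, padded, interval=512, marker_len=4):
    """Count CRC fields that a marker lands strictly inside of."""
    pos = 0      # position in the stream, counting marker bytes too
    split = 0
    for n in payload_lens:
        body = n + (pad_len(n) if padded else 0)
        for i in range(body + 4):        # the last 4 bytes are the CRC
            if pos % interval == 0:      # a marker is inserted here
                if i > body:             # some CRC bytes are already out
                    split += 1
                pos += marker_len
            pos += 1
    return split
```

With a single 506-byte payload, the unpadded CRC straddles stream position
512 and the marker lands inside it, while the padded version puts the
marker exactly in front of the CRC. More generally, when every field is a
multiple of 4 bytes, no marker in this model can split a CRC.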

> <prc> [...] I was personally of the opinion that we should
> allow or even require packing, but was in a minority opinion ;-(.
> <prc>

I suspect that packing may ultimately be required by the IETF.
Either that, or a detailed explanation
of why network traffic patterns are not negatively affected.

> markers: We may argue whether this feature is helpful or not, and both
> sides could give plenty of examples showing the pros and cons of this
> feature.

I volunteer for the cons. :)

There are some basic architectural issues needing
to be resolved as to whether it is permissible for
DDP to do out of order placement when running over
an ordered transport.  I personally am not religious
about this either way.  But unless this can be resolved,
markers make no sense whatsoever.

I do believe the benefits of out of order placement
are sufficiently questionable that the simplicity
of the in order model should not be compromised
if at all avoidable.

I am a bit religious in feeling that markers are
an ugly, complicated, and technically embarrassing
way to solve the problem of alignment.

It has been clearly stated that the primary problem
the RDDP group needs to solve is layering RDDP
on standard TCP, and layering on an experimental
aligned version of TCP is secondary.  Markers
appear to be an effort to optimize for
the secondary case at the expense of the primary
case.

I believe that alignment can be done without
markers and without complicating or burdening
the primary objective of standard TCP.  I will
attempt to explore this in future drafts.

draft-ietf-tsvwg-tcp-ulp-frame-01.txt (expired)
is an example of such an approach.  This type
of approach is far more consistent with the
RDDP group charter of treating standard TCP
as the primary objective, and an experimental
aligning version of TCP as a secondary objective.



_______________________________________________
rddp mailing list
rddp@ietf.org
https://www1.ietf.org/mailman/listinfo/rddp






