Re: [rddp] RE:I-D ACTION:draft-williams-iwarp-ift-00.txt

"Vadim Makhervaks" <VADIK@il.ibm.com> Wed, 06 November 2002 22:53 UTC

Subject: Re: [rddp] RE:I-D ACTION:draft-williams-iwarp-ift-00.txt
To: "Williams, Jim" <Jim.Williams@emulex.com>
Cc: Giora Biran <GBIRAN@il.ibm.com>, Julian Satran <Julian_Satran@il.ibm.com>, "'Culley, Paul'" <Paul.Culley@hp.com>, rddp@ietf.org, rddp-admin@ietf.org, tsvwg@ietf.org
X-Mailer: Lotus Notes Release 5.0.8 June 18, 2001
Message-ID: <OF5E28E10F.DDAEE059-ONC2256C69.007C01E1@telaviv.ibm.com>
From: Vadim Makhervaks <VADIK@il.ibm.com>
Date: Thu, 07 Nov 2002 00:52:10 +0200

I'll try to present some arguments in favor of markers.

First of all, I have to admit that I'm not a fan of markers; they are not
easy to implement. But right now I do not see any other solution to
the problem that I'm going to present below.


First, a few words about the host interface and its ability to handle bursts
from the wire. I believe the DDP concept is oriented toward 10 Gb/s, and thus
the host interface that should be considered is not PCI-X, but QDR-PCIX,
RapidIO, a direct interface to the memory controller, etc. It's not clear that
such an interface would have any problem handling inbound traffic at wire
speed. It should be mostly write operations on the bus, and assuming an
efficient bridge/MC implementation, we should not have big problems with
bursts of writes.

Even assuming that an RNIC vendor decides to pass the received stream through
on-board/on-chip memory before placing it in host memory, markers would
still be very helpful. I'll give just one example to emphasize this:
Without markers, the location of the next DDP segment is identified by the
Length field in the header of the previous DDP segment. If a DDP header is
lost, then without markers the RNIC has to wait until it receives that
header before it can place the data in the destination buffers on the
host. However, if markers are supplied by the transmitter, the receiver has
several choices: a) use a fast path (cut-through) and place received segments
directly in the destination buffers; b) pass the received stream through
memory, but still be able to place the payload in the destination buffers
out of order; c) use classic reassembly buffers, and place the data in the
destination buffers in order.

So why should we force the RNIC vendor to choose option c), rather than
leave this open as another feature for competition?
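
A minimal sketch of the resynchronization argument above, in Python. The framing here is hypothetical and is not the wire format of either draft: each segment is a 2-byte length followed by the payload, and a 4-byte marker is inserted at every 512-byte stream position (the interval mentioned elsewhere in this thread). Each marker is assumed to hold the distance back to the start of the segment it falls in, with 0 meaning "a segment starts right after this marker". All function names are made up for illustration.

```python
import struct

MARKER_INTERVAL = 512      # stream bytes between markers (from the thread)
MARKER_SIZE = 4            # assumed marker width
HDR = struct.Struct("!H")  # hypothetical 2-byte length header


def build_stream(payloads):
    """Frame payloads as [len][payload] and insert back-pointer markers."""
    out = bytearray()
    for p in payloads:
        seg_start = len(out)
        for b in HDR.pack(len(p)) + p:
            if len(out) % MARKER_INTERVAL == 0:
                # Distance from this marker back to the current segment start.
                out += struct.pack("!I", len(out) - seg_start)
            out.append(b)
    return bytes(out)


def read_bytes(stream, pos, n):
    """Read n non-marker bytes starting at offset pos; return (data, new_pos)."""
    data = bytearray()
    while len(data) < n:
        if pos % MARKER_INTERVAL == 0:
            pos += MARKER_SIZE  # skip over a marker
        data.append(stream[pos])
        pos += 1
    return bytes(data), pos


def read_segment(stream, pos):
    """Parse one segment whose header starts at pos; return (payload, next_pos)."""
    hdr, pos = read_bytes(stream, pos, HDR.size)
    (length,) = HDR.unpack(hdr)
    return read_bytes(stream, pos, length)


def resync(stream, first_byte_seen):
    """Use markers to find the first segment start at or after first_byte_seen.

    Models a receiver that lost everything before first_byte_seen, including
    a header, and so cannot follow the chain of Length fields.
    """
    marker_pos = -(-first_byte_seen // MARKER_INTERVAL) * MARKER_INTERVAL
    while marker_pos + MARKER_SIZE <= len(stream):
        (back,) = struct.unpack_from("!I", stream, marker_pos)
        start = marker_pos + MARKER_SIZE if back == 0 else marker_pos - back
        if start >= first_byte_seen:
            return start
        marker_pos += MARKER_INTERVAL
    return None


if __name__ == "__main__":
    payloads = [bytes([i]) * 300 for i in range(5)]
    stream = build_stream(payloads)
    start = resync(stream, 400)  # bytes 0..399 lost, including one header
    payload, _ = read_segment(stream, start)
    print(start, payload == payloads[3])
```

Without the markers, a receiver in this situation has no way to find any later segment boundary until the missing header arrives; with them, it can resume placement at the next segment the markers identify.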


Vadim




From:     "Williams, Jim" <Jim.Williams@emulex.com>
Sent by:  rddp-admin@ietf.org
Date:     06/11/02 11:07 PM
To:       "'Culley, Paul'" <Paul.Culley@hp.com>, Vadim Makhervaks/Haifa/IBM@IBMIL
cc:       rddp@ietf.org, tsvwg@ietf.org, Giora Biran/Haifa/IBM@IBMIL, Julian Satran/Haifa/IBM@IBMIL
Subject:  [rddp] RE:I-D ACTION:draft-williams-iwarp-ift-00.txt





> -----Original Message-----
> From: Culley, Paul [mailto:Paul.Culley@hp.com]
>
> You have done an excellent analysis of the differences and
> issues at hand.

I would echo that sentiment.

> -----Original Message-----
> From: Vadim Makhervaks [mailto:VADIK@il.ibm.com]


> I'm probably late to this discussion, but I only recently went through the
> differences between the two drafts.
> It seems (to me) that the changes proposed by Jim are not tightly coupled,
> and I would propose to discuss them separately:

Generally true.  There is some coupling, though, which I note.

> ddp header CRC: Sounds like a reasonable approach to me. I can imagine why
> such a CRC would help an RNIC implementation. On the other hand, I don't
> see any harm in adding such a CRC to the DDP segment, besides another word
> of overhead per DDP segment. I also think that such a CRC should protect
> the DDP header only, and the RDMA extensions (for Read Request and
> Terminate) should be considered payload. Anyway, this point is outside
> the scope of the MPA definition, and is probably not very relevant to this
> discussion.
>
> <prc> As discussed in prior emails, the MPA authors (after
> much discussion) felt that this is not necessary.
>   You need a separate header CRC if designing a "Flow through NIC":
>     * But the need for an elasticity buffer anyway makes "flow through"
>       an unneeded extra complexity.
>     * But NIC designers want to check the L2 CRC and IP checksum first
>       anyway, which requires the whole packet.
>   You need a separate header CRC if placing partial FPDUs (can place the
>   first part without the whole FPDU):
>     * As long as alignment is correct, this is not needed.
>       o Alignment is expected to be the usual case in the data center,
>         at least, and common elsewhere.
>       o If lost alignment is an uncommon case, the extra wire overhead
>         and the extra complexity of saving headers while placing data
>         don't seem worth the small buffer saving.
> <prc>

I agree that a single CRC makes sense if alignment is always
guaranteed.  The IFT proposal does not guarantee alignment,
therefore two CRCs make sense.  SCTP does guarantee alignment,
so its single CRC makes sense.

Standard TCP does not provide alignment.  TSVWG is
considering an experimental variant of TCP that
does provide alignment.  But the RDDP work group
charter is clear that RDDP must be standardized on
standard TCP.  Hence the motivation for the IFT proposal.

Also, the benefit of the second CRC for the unaligned
case seems much greater than the cost of an extra
CRC in the aligned case.
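
A minimal sketch of the two-CRC idea, in Python. The layout is hypothetical and is not the IFT or MPA wire format: a 2-byte length, a 4-byte CRC over that header, the payload, and a 4-byte CRC over the payload. zlib.crc32 is used only as a stand-in for whatever CRC the drafts specify. The point it illustrates is that a receiver can trust the length as soon as the first few bytes arrive, without waiting for the whole segment.

```python
import struct
import zlib

# Hypothetical layout, for illustration only:
#   [2-byte length][4-byte header CRC][payload][4-byte payload CRC]
HDR = struct.Struct("!H")
CRC = struct.Struct("!I")
HDR_TOTAL = HDR.size + CRC.size


def frame(payload: bytes) -> bytes:
    """Build one segment with separate header and payload CRCs."""
    hdr = HDR.pack(len(payload))
    return (hdr + CRC.pack(zlib.crc32(hdr))
            + payload + CRC.pack(zlib.crc32(payload)))


def checked_length(buf: bytes):
    """Return the payload length if the header CRC verifies, else None.

    Needs only the first HDR_TOTAL bytes of the segment.
    """
    if len(buf) < HDR_TOTAL:
        return None
    hdr = buf[:HDR.size]
    (crc,) = CRC.unpack_from(buf, HDR.size)
    if zlib.crc32(hdr) != crc:
        return None
    return HDR.unpack(hdr)[0]


def payload_ok(buf: bytes) -> bool:
    """Verify the payload CRC; requires the complete segment."""
    length = checked_length(buf)
    if length is None or len(buf) < HDR_TOTAL + length + CRC.size:
        return False
    payload = buf[HDR_TOTAL:HDR_TOTAL + length]
    (crc,) = CRC.unpack_from(buf, HDR_TOTAL + length)
    return zlib.crc32(payload) == crc


if __name__ == "__main__":
    seg = frame(b"x" * 1000)
    # Only the first 6 bytes have arrived: the length is already verifiable.
    print(checked_length(seg[:HDR_TOTAL]))
    # The payload check has to wait for the rest of the segment.
    print(payload_ok(seg[:500]), payload_ok(seg))
```

With a single CRC covering header and payload together, the length in a partially received segment could not be validated until the final bytes arrived, which is the unaligned case discussed above.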


> padding: It's definitely easier to implement CRC logic when everything is
> word aligned.

This is not an issue for which I have a whole lot of energy.
But my experience, and that of the other hardware designers
I have worked with, directly contradicts the above.
Unaligned CRCs are trivial to implement.  Logic necessary
to insert padding is not trivial.  Either way works.
My vote is no padding, but I'm not terribly concerned
about conceding this point if a majority feel otherwise.

> It's not too much overhead - up to 3 bytes per DDP segment. I also agree
> that this is not a MUST requirement, and an RNIC vendor should be able to
> handle unaligned CRC generation/validation.
> <prc> Padding also eliminates the problem of how to deal with
> markers that fall in the middle of the CRC.  If everything is
> aligned on 4 byte breaks, then markers also end up on the
> same breaks (tied into the 512 byte marker interval).
> <prc>

True.  IFT does not propose markers, so this point is moot
for IFT.
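
The arithmetic behind the padding point can be sketched in a few lines of Python. The 512-byte marker interval is the one mentioned in the quoted text; the 4-byte CRC width and the helper names are assumptions for illustration. Since 512 is a multiple of 4, a CRC that starts on a 4-byte boundary can never have a marker position fall inside it.

```python
MARKER_INTERVAL = 512  # marker spacing mentioned in the thread
WORD = 4               # alignment unit; the CRC is assumed to be 4 bytes wide


def pad_len(n: int) -> int:
    """Bytes of padding needed to round n up to a multiple of WORD (0 to 3)."""
    return (-n) % WORD


def marker_splits_crc(crc_offset: int) -> bool:
    """True if a marker position falls strictly inside the 4-byte CRC
    that starts at stream offset crc_offset."""
    next_marker = -(-crc_offset // MARKER_INTERVAL) * MARKER_INTERVAL
    return crc_offset < next_marker < crc_offset + WORD


if __name__ == "__main__":
    for n in (0, 1, 5, 8):
        print(n, pad_len(n))
    # An unaligned CRC at offset 510 straddles the marker position 512.
    print(marker_splits_crc(510))
    # A word-aligned CRC never does.
    print(any(marker_splits_crc(off) for off in range(0, 4096, WORD)))
```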

> <prc> [...] I was personally of the opinion that we should
> allow or even require packing, but was in a minority opinion ;-(.
> <prc>

I suspect that packing may ultimately be required by the IETF.
Either that, or some detailed explaining
as to why network traffic patterns are not negatively affected.

> markers: We may argue whether this feature is helpful or not, and both
> sides could give plenty of examples showing the pros and cons of this
> feature.

I volunteer for the cons. :)

There are some basic architectural issues needing
to be resolved as to whether it is permissible for
DDP to do out of order placement when running over
an ordered transport.  I personally am not religious
about this either way.  But unless this can be resolved,
markers make no sense whatsoever.

I do believe the benefits of out of order placement
are sufficiently questionable that the simplicity
of the in order model should not be compromised
if at all avoidable.

I am a bit religious in feeling that markers are
an ugly, complicated, and technically embarrassing
way to solve the problem of alignment.

It has been clearly stated that the primary problem
the RDDP group needs to solve is layering RDDP
on standard TCP, and layering on an experimental
aligned version of TCP is secondary.  Markers
appear to be an effort to optimize for
the secondary case at the expense of the primary
case.

I believe that alignment can be done without
markers and without complicating or burdening
the primary objective of standard TCP.  I will
attempt to explore this in future drafts.

draft-ietf-tsvwg-tcp-ulp-frame-01.txt (expired),
is an example of such an approach.  This type
of approach is far more consistent with the
RDDP group charter of treating standard TCP
as the primary objective, and an experimental
aligning version of TCP as a secondary objective.



_______________________________________________
rddp mailing list
rddp@ietf.org
https://www1.ietf.org/mailman/listinfo/rddp



