Re: [storm] comments on

"Sharp, Robert O" <> Wed, 04 May 2011 22:06 UTC

From: "Sharp, Robert O" <>
To: arkady kanevsky <>, "" <>
Date: Wed, 4 May 2011 15:06:18 -0700
Subject: Re: [storm] comments on
List-Id: Storage Maintenance WG <>

Hi Arkady,

Thanks for the comments!  Responses below...


> -----Original Message-----
> From: arkady kanevsky []
> Sent: Tuesday, May 03, 2011 11:42 AM
> To:; Sharp, Robert O
> Subject: comments on
> ext-00
> Here are a few comments on the draft:
> In Glossary it is stated:
> Atomic Operation - is an operation that results in an execution of a
>    64-bit operation at a specific address on a remote node. The
>    consumer can use atomic operations to read, modify and write at the
>    destination address while at the same time guarantee that no other
>    read or write operation will occur across any other RDMAP/DDP
>    Streams on an RNIC at the Data Sink.
> later in section 5 it is stated:
> The
>    operations atomically read, modify and write back the contents of
>    the destination address and guarantee that atomic operations on this
>    address by other Queue Pairs (QPs) on the same RNIC do not occur
>    between the read and the write.
> in
> 5.3. Atomicity Guarantees
>    Atomicity of the RMW on the responder's node by the Atomic Operation
>    SHALL be assured in the presence of concurrent atomic accesses by
>    other QPs on the same RNIC.
> The three statements are very different. The cases that need to be
> handled are:
> other operation of the same QP, the same RNIC, other devices, local ULP
> access.

The intention of the draft is that atomicity is guaranteed across the same or
different RDMAP/DDP Streams on a single RNIC.  The atomicity guarantees
apply to local or remote ULP access as long as both access the memory
referenced by the atomic operation strictly through the same RNIC.
Since this is a wire protocol specification, the description of ULP access
is considered out of scope.  There is clearly an inconsistency in terminology
where QP was used.  One clarification that we will make is to replace
the term QP with RDMAP/DDP Stream throughout the document.  I believe that
the sections in the draft are consistent with this description, but if you
see something specific that could be made clearer, we'll be happy to
address further input.  We can certainly add a statement saying that
there are no guarantees between different RNICs or other devices if that
would help.
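
To make the intended scope concrete, here is a toy model (a sketch only, not
text from the draft).  It assumes a single lock stands in for whatever
serialization one RNIC applies to atomic operations arriving on any of its
streams.  The names ModelRnic, fetch_add and cmp_swap are invented for this
illustration, and the model says nothing about accesses that bypass the RNIC.

```python
import threading

MASK64 = (1 << 64) - 1  # atomic operands are 64 bits wide


class ModelRnic:
    """Toy model: one lock represents per-RNIC serialization of atomics."""

    def __init__(self):
        self._lock = threading.Lock()
        self._mem = {}  # address -> 64-bit value (hypothetical memory)

    def fetch_add(self, addr, add):
        # Read, modify and write back with no other atomic in between.
        with self._lock:
            old = self._mem.get(addr, 0)
            self._mem[addr] = (old + add) & MASK64
            return old

    def cmp_swap(self, addr, compare, swap):
        # Write the swap value only if the current value matches.
        with self._lock:
            old = self._mem.get(addr, 0)
            if old == compare:
                self._mem[addr] = swap & MASK64
            return old

    def read(self, addr):
        with self._lock:
            return self._mem.get(addr, 0)


def run_streams(rnic, addr, streams=4, ops_per_stream=1000):
    """Each thread plays the role of one RDMAP/DDP Stream on the same RNIC."""
    def worker():
        for _ in range(ops_per_stream):
            rnic.fetch_add(addr, 1)

    threads = [threading.Thread(target=worker) for _ in range(streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rnic.read(addr)


rnic = ModelRnic()
total = run_streams(rnic, 0x1000)
```

Because every update goes through the same model RNIC, no increment is lost
regardless of which stream issued it.  Two separate ModelRnic objects
touching the same memory would share no lock, which mirrors the "no
guarantees between different RNICs" statement above.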

> 2. It is stated:
> The discovery of whether the
>    atomic operations are implemented or not is outside the scope of
>    this specification and it should be handled by the ULPs or
>    applications.
> Is this really the best way? Should the connection setup extension
> draft which I submitted be extended to exchange this info also? That is,
> which version of RDMA protocol the remote side supports or requests to
> support. The reason I am asking is that it is not clear to me what option
> the ULP has if atomic or immediate data is not supported.

For this draft the statement is appropriate; however, we would certainly
be interested in building better support into MPA to exchange
the enhanced capabilities between peers at connection setup.  I assume
that this would be a future revision to the MPA draft at this point in
time.  Do you agree?

> 3. What is the Endian(ness) format of Add Data for fetchadd? ditto for
> cmpswp.

The wire format (Big Endian) and the conversion to the appropriate 
Endian(ness) for the host is specified in sections 5.1.1 and 5.1.3.
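
As a rough illustration of the byte-order point (a sketch, not the draft's
definition of the message layout), a 64-bit operand can be packed in Big
Endian order for the wire and converted back to a host integer on receipt.
The function names encode_operand and decode_operand are hypothetical.

```python
import struct


def encode_operand(value: int) -> bytes:
    """Pack a 64-bit unsigned operand in Big Endian (network) byte order."""
    return struct.pack(">Q", value & 0xFFFFFFFFFFFFFFFF)


def decode_operand(wire: bytes) -> int:
    """Convert 8 Big Endian bytes from the wire into a host integer."""
    return struct.unpack(">Q", wire)[0]


wire = encode_operand(1)
print(wire.hex())  # 0000000000000001
```

The ">Q" format fixes the wire order independently of the host, so the same
code applies on Little Endian and Big Endian machines.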

> 4. Is there a separate memory registration (Stag creation) for atomic
> operation? Or any registered memory can be used for atomic op?

In practice, applications will likely use separate STags for atomic
operations, but there is not a requirement to do so.  Any registered
memory can be used for atomic operations.

> 5. How does the requester know that Stag+remote tagged offset
> corresponds to a naturally aligned buffer address?

Applications must exchange STags and tagged offsets in order to expose
the target of an atomic operation in an application-specific way.  It is
the responsibility of the application that is exposing the memory to
ensure that the buffer is naturally aligned.  The requester must trust
the peer to have properly aligned an exposed target of an atomic
operation.  If the target exposed an improperly aligned STag and tagged
offset, an error will be surfaced by the RNIC.
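
For illustration only, the alignment condition for a 64-bit target can be
written as a simple check.  This sketch assumes the STag and tagged offset
have already been resolved to a target address; the helper name
is_naturally_aligned is hypothetical and how an RNIC performs the check is
implementation specific.

```python
ATOMIC_OPERAND_BYTES = 8  # 64-bit atomic operand


def is_naturally_aligned(address: int, size: int = ATOMIC_OPERAND_BYTES) -> bool:
    """True when the address is a multiple of the operand size."""
    return address % size == 0


aligned = is_naturally_aligned(0x1000)     # multiple of 8
misaligned = is_naturally_aligned(0x1004)  # 4 bytes past an 8-byte boundary
```

An application exposing memory could apply the same test to its buffer
address before advertising the STag and tagged offset to its peer.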

> 6. "The Swap atomic operation result is unknown when the buffer address
> is not naturally aligned." feels a bit loose. I would assume that RNIC
> will at least not change any data outside memory specified by Stag.
> Also in 5.2.1 #4 states:
> At the Remote Peer, when an invalid Atomic Operation Request
>       Message is delivered to the Remote Peer's RDMAP layer, an error
>       is surfaced.
> It sounds like breaking natural alignment is not an error. That is it
> will break the connection.

There is room for improvement here as you have pointed out.  We will add
a better description that indicates that the responder RNIC will prevent 
atomic operation access to a memory location that is not naturally aligned.  
This will be an error that is always surfaced.  The result will be a 
terminate message back to the requester.  We will add a description of
the terminate semantics to the draft.

> Also:
> 6. At the Remote Peer, when an invalid Atomic Operation Response
>       Message is delivered to the Remote Peer's RDMAP layer, an error
>       is surfaced.
> Does the error carry the request identifier?

The terminate message generated will carry the RDMAP header of the 
offending atomic operation request which includes the request identifier.
This will be called out in the new description of the terminate semantics.

> 7. Any guidance for sizing ORD/IRD for atomic ops? Does this doc
> require my doc with ORD/IRD negotiation at connect time?

Sizing ORD/IRD is out of scope for a wire-format-level draft like this one,
since the application is responsible for determining acceptable requirements.
As with the new capabilities related to this draft, it would be good to work
this in with new MPA enhancements.

> 8. If the same memory has two Stag (overlapping memory regions) is
> atomic op ordering only preserved for a single QP/connection? 

STags do not have an impact on the atomic guarantees. Atomicity is
guaranteed within the scope of an RNIC across all QPs supported by the
RNIC without respect to the STag used to reference a memory location 
accessed by the atomic operation.

> What
> happens if natural alignment is less than 64-bit? Is there any order
> guaranteed per Stag (if two QPs use the same Stag for atomic or RDMA
> ops)?

Operations that violate the natural alignment assumption will be terminated
by the responder before the operation is performed per the changes 
discussed above.

> 9. It is strange that for atomic op we are using 64-bit terminology but
> for immediate data we use 8-byte terminology.

We intentionally used separate terminology.  We used the "byte"
terminology to discuss data payload and the "bit" terminology to discuss
fields within the data payload.  I believe that we were consistent, but if
you see something that should be clarified, we'll be happy to make the
change.

> 10.Suggest to specify which queues and buffers immediate data message
> uses?

The queue number (qn=0) and data buffers (untagged buffers) are specified
in section 6.3.  If additional clarifications would help, please let us know.

> Arkady
> --
> Cheers,
> Arkady Kanevsky