Re: [nfsv4] draft-ietf-nfsv4-rpcrdma-bidirection-03 review

I wrote:
> but I have a problem with writing the spec as if
> such situations don't exist.

I've been thinking more about this.  I think the problem lies in what I
mean by "the spec".  If that means "rfc5666bis", then such situations
cannot exist.  rfc5666bis seems to deal only with forward-direction
operation.  In such an environment, if you are a client and you receive
a message, it has to be a reply.  Similarly, if you are a server and you
receive a message, it has to be a request.  So there is no occasion for
the kind of ambiguity I'm concerned about.

On the other hand, if "the spec" means "the specification of
RPC-over-RDMA Version One", in which multiple directions of operation
can be supported, the ambiguity is unavoidable.  If you (client or
server) receive a message, you do not know a priori whether it is a
request or a reply, and there are situations in which the receiver
cannot determine, given a malformed message, whether it has received a
malformed request or a malformed reply.
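
To make the distinction concrete, here is a minimal sketch of how a
receiver might try to classify an incoming message.  It assumes the
four fixed header words of the Version One XDR (rdma_xid, rdma_vers,
rdma_credit, rdma_proc) followed, for RDMA_MSG, by three chunk lists and
the RPC message.  The function name, the forward_only_role parameter,
and the restriction to empty chunk lists are my own simplifications for
illustration, not anything taken from an implementation.

```python
import struct
from enum import Enum
from typing import Optional

# rdma_proc values from the RPC-over-RDMA Version One XDR
RDMA_MSG, RDMA_NOMSG, RDMA_MSGP, RDMA_DONE, RDMA_ERROR = range(5)
# msg_type values from the RPC message header
RPC_CALL, RPC_REPLY = 0, 1


class Direction(Enum):
    REQUEST = "request"
    REPLY = "reply"
    UNKNOWN = "unknown"


def classify_direction(buf: bytes,
                       forward_only_role: Optional[str] = None) -> Direction:
    """Guess whether a received transport message is a request or a reply.

    forward_only_role is a hypothetical knob: "client" or "server" models
    forward-only operation, where the role alone decides the answer.
    None models bidirectional operation, where the message must be parsed.
    """
    if forward_only_role == "client":
        return Direction.REPLY
    if forward_only_role == "server":
        return Direction.REQUEST

    if len(buf) < 16:
        return Direction.UNKNOWN  # cannot even read the fixed header words
    xid, vers, credit, proc = struct.unpack_from(">IIII", buf, 0)

    if vers != 1:
        # The XDR of other versions is unknown to a Version One receiver.
        return Direction.UNKNOWN
    if proc != RDMA_MSG:
        # RDMA_ERROR has no RPC message; this sketch does not follow the
        # chunks of RDMA_NOMSG; anything else is treated as unparseable.
        return Direction.UNKNOWN

    # Simplification: only handle three empty chunk lists (one zero word each).
    if len(buf) < 16 + 12 + 8:
        return Direction.UNKNOWN
    if struct.unpack_from(">III", buf, 16) != (0, 0, 0):
        return Direction.UNKNOWN

    # The RPC message starts with its own XID, then the call/reply field.
    _rpc_xid, msg_type = struct.unpack_from(">II", buf, 28)
    if msg_type == RPC_CALL:
        return Direction.REQUEST
    if msg_type == RPC_REPLY:
        return Direction.REPLY
    return Direction.UNKNOWN
```

In this sketch, the forward-only cases never return UNKNOWN, while the
bidirectional path returns UNKNOWN for an unrecognized version and for
RDMA_ERROR, which are the cases I am concerned about.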

On Mon, May 23, 2016 at 12:23 PM, David Noveck <davenoveck@gmail.com> wrote:

> > If the requester sent version 1 but doesn't
> > recognize version 1 in replies, something is very wrong.
>
> It certainly would be, but if you want to say that you only
> send this when you get a request, it is unclear how one might
> determine this.  If you support Version One, and you get a
> version field which is not one, you have no idea what the XDR
> for that version looks like and thus you have no way to find out
> whether the message contains a request or reply.
>
> In practical terms, if you are a server, then you might well assume that
> the first message you receive is likely to be a request, but future
> versions might add initialization/setup transactions that you are not
> prepared to parse.  Maybe Version Two needs to be modified so
> that clients always send a (possibly NULL) request before using
> extensions.
>
> > Essentially I'm saying that RDMA_ERROR is always a REPLY.
> > That is the way current implementations treat it, thus we
> > should document that.
>
> OK, but there are going to be situations in which the receiver cannot
> determine whether it has received a request or a reply.  I don't know
> exactly how to deal with those situations (you might say he MAY send
> RDMA_ERROR), but I have a problem with writing the spec as if
> such situations don't exist.
>
> On Mon, May 23, 2016 at 11:39 AM, Chuck Lever <chuck.lever@oracle.com>
> wrote:
>
>>
>> > On May 23, 2016, at 11:00 AM, David Noveck <davenoveck@gmail.com> wrote:
>> >
>> > I think the better alternative is to document the ambiguity and ask
>> > people to live with it.  Version One's error handling is broken and it
>> > doesn't seem as though it can be fixed without changing the XDR, and
>> > we've decided that all such changes are to bump the version number.
>>
>> I'm not proposing an XDR change, and there would be no
>> behavior change required of current implementations. No
>> current implementation sends RDMA_ERROR in response to
>> a bogus reply message. This is strictly documentation
>> of existing implementation behavior.
>>
>>
>> > The idea is to ask the sender not to set the credit value and the
>> > receiver not to interpret it.
>>
>> The sender is going to set the credit value no matter
>> what. There's nothing that says otherwise. (You haven't
>> mentioned the sender side change before: that _would_ be
>> a change to existing implementations, and to rfc5666bis).
>>
>>
>> > The problem with restricting when this is sent is that it assumes that,
>> > if you receive a message, you can tell if it is a request or a reply.
>> > For a well-formed message that can be determined, but the idea here is
>> > we are dealing with a message that is so messed up that you can't even
>> > XDR the transport header.
>>
>> Not necessarily. The RDMA_ERROR message is used most
>> typically in two important cases where XDR is parseable:
>>
>> 1. To report an RPC-over-RDMA protocol version mismatch
>>
>> 2. When a backward direction request has non-empty chunk
>> lists and the responder has no support for chunks
>>
>> The case where an actual parsing problem occurs is useful
>> mostly during early development of an implementation. It
>> happens in practice only when there is a catastrophic
>> fabric or RNIC failure that corrupts RDMA Send data
>> content.
>>
>>
>> > If we now look at section 5.5.2 and look at the examples there:
>> >
>> > - an invalid value in the rdma_proc field
>> >     Without that, you can't even parse the header, so you don't know
>> >     where to look for the call/reply indication.
>> > - an RDMA_NOMSG message that has no chunk lists
>> >     Here, there is no payload to look at.
>> > - or the contents of the rdma_xid field might not match the contents
>> >   of the XID field in the accompanying RPC message.
>> >     In this case, you can determine which you have.
>>
>> How should a requester behave when it gets such a
>> malformed reply message? Sending an RDMA_ERROR message
>> to the responder seems ineffective. The requesters I'm
>> familiar with drop such garbage replies and report an
>> error, which seems reasonable to me.
>>
>> Responders are required to copy the rdma_vers field to
>> replies. If the requester sent version 1 but doesn't
>> recognize version 1 in replies, something is very wrong.
>>
>> The XID field in an RDMA_ERROR message is valuable: when
>> sent from a responder, it tells the requester that the
>> responder was not able to process that XID. The requester
>> is then free to try again or terminate the transaction.
>> I believe making that field unambiguous is a good thing,
>> and worth making the proposed change.
>>
>> Essentially I'm saying that RDMA_ERROR is always a REPLY.
>> That is the way current implementations treat it, thus we
>> should document that.
>>
>> This also disambiguates the credit value, but that's
>> much less important. I don't have a strong opinion about
>> ignoring that field on receipt of an RDMA_ERROR.
>>
>>
>> > On Mon, May 23, 2016 at 10:10 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> >
>> > > On May 20, 2016, at 3:15 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
>> > >
>> > >
>> > >> On May 20, 2016, at 2:51 PM, Karen <karen.deitke@oracle.com> wrote:
>> > >>
>> > >>
>> > >>
>> > >> On 5/20/16 12:44 PM, Chuck Lever wrote:
>> > >>>> On May 20, 2016, at 2:30 PM, Karen <karen.deitke@oracle.com> wrote:
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> On 5/20/16 10:41 AM, Chuck Lever wrote:
>> > >>>>>>
>> > >>>>>> 4.1
>> > >>>>>>
>> > >>>>>> "When message direction is not fully determined by context"
>> > >>>>>>
>> > >>>>>> "fully determined by context" is confusing; what does that
>> > >>>>>> really mean? I think this means when the rdma header does not
>> > >>>>>> directly indicate if the message is a call or reply, but the
>> > >>>>>> wording is confusing.
>> > >>>>> That means that in some cases the receiver can guess
>> > >>>>> accurately which direction the message was going, based
>> > >>>>> on the context of the operation, even without having
>> > >>>>> an RPC message payload.
>> > >>>>>
>> > >>>>> I'm open to suggestions.
>> > >>>>>
>> > >>>>> I think some prefer that the document simply state that
>> > >>>>> direction is always unknown in cases where an RPC
>> > >>>>> message payload is not present. That kind of opens a
>> > >>>>> can of worms with RDMA_ERROR, which is needed to report
>> > >>>>> that the client does not support backward direction
>> > >>>>> operation.
>> > >>>> I don't think that it can always absolutely be clear which
>> > >>>> direction without the RPC header. Or am I misunderstanding RPC
>> > >>>> payload? I'm taking that to mean the RPC header, but maybe you are
>> > >>>> referring to the NFS data in the rpc payload?
>> > >>> "Context" here means that if the receiver can tell
>> > >>> by other means (like, there are no other outstanding
>> > >>> operations). So no, it's not always going to be clear.
>> > >>> In those cases, direction is not known.
>> > >> I don't think there is ever a 100% way to know the direction without
>> > >> the rpc header's call or reply field.
>> > >
>> > > I don't think what I wrote contradicts that statement, but
>> > > it does allow latitude for innovation to close this hole
>> > > in other ways.
>> > >
>> > >
>> > >> Even if there is an outstanding request, and it is for the xid that
>> > >> is defined in the rdma header, being that the xid is not unique
>> > >> between client and server, there still exists, though extremely
>> > >> small, a possibility that this is a new request that just so happens
>> > >> to have the same xid and is not actually the reply to an outstanding
>> > >> operation.
>> > >
>> > > The only case in RPC-over-RDMA Version One where this is
>> > > a concern is RDMA_ERROR. The other two valid procs are
>> > > RDMA_NOMSG and RDMA_MSG, and both of those have RPC
>> > > message payloads.
>> > >
>> > > When a client receives an RDMA_ERROR, that normally means
>> > > one of its forward requests had a problem.
>> > >
>> > > Typical RPC-over-RDMA Version One client implementations
>> > > don't send an error to a server due to a problem with a
>> > > forward reply. We can probably say the same about a
>> > > server sending an RDMA_ERROR to a client in response
>> > > to a bad backward reply. (And, rfc5666bis could be
>> > > enhanced to suggest or require that requesters don't
>> > > send RDMA_ERROR in response to a bogus reply).
>> >
>> > OK, this idea is growing on me.
>> >
>> > What do people think of updating rfc5666bis to restrict
>> > RDMA_ERROR to be sent only by responders? This would be
>> > for RPC-over-RDMA V1 only, of course.
>> >
>> > The reason to do this would be to eliminate the ambiguity
>> > of the meaning of the XID and credit value, due to the
>> > absence of an RPC message payload with a direction field
>> > in it.
>> >
>> > I'm not aware of any current requester implementation
>> > that can send an RDMA_ERROR message from its reply
>> > handler.
>> >
>> >
>> > > So the only instance where a server might receive an
>> > > RDMA_ERROR message is when it has sent a backward
>> > > request that the client did not like.
>> > >
>> > > Thus: context indicates what direction that RDMA_ERROR
>> > > message is going.
>> > >
>> > >
>> > > --
>> > > Chuck Lever
>> > >
>> > >
>> > >
>> > > _______________________________________________
>> > > nfsv4 mailing list
>> > > nfsv4@ietf.org
>> > > https://www.ietf.org/mailman/listinfo/nfsv4
>> >
>> > --
>> > Chuck Lever
>> >
>>
>> --
>> Chuck Lever
>>
>>
>>
>>
>