Re: [nfsv4] New Version Notification for draft-dnoveck-nfsv4-rpcrdma-xcharext-02.txt

Chuck Lever <chuck.lever@oracle.com> Fri, 26 August 2016 15:39 UTC

To: David Noveck <davenoveck@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/1rfr2W24KWNJWZlAWwk25l630FU>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>

Hiya Dave-

> On Aug 26, 2016, at 5:08 AM, David Noveck <davenoveck@gmail.com> wrote:
> 
> > So, there is a protocol mechanism for reporting partial or full
> > completion of a property change request, yet no mechanism for
> > indicating which change request it is the receiver is completing.
> > But multiple property change requests are allowed to be in flight
> > concurrently.
> 
> The protocol allows it, but particular implementations may well
> restrict things so that they have only one in flight for each property
> at any particular time.
> 
> > Without an XID, the only way this might work is by serializing
> > updates to each property.
> 
> It is one way.  I don't think it is the only way.  The requester, if it
> is interested in knowing when there are no outstanding requests
> for a property, can increment on requests and decrement on messages
> that indicate completion.
> 
> > And, if the sender can send an "I'm done" update followed by any
> > number of "Oops, I changed this again" updates, 
> 
> The updates indicate that there has been an unsolicited change in
> the property:  no "Oops" and no "again".   
> 
> Whether such changes happen or not depends on the
> implementation.  In any case, I would expect them to be
> rare.
> 
> The proposed protocol provides a way for the peer to be
> informed of such changes.   I don't see how that undercuts
> the facilities that allow the completion of requested changes
> to be reported asynchronously. 
> 
> > I don't see the
> > point of a receiver ever remembering that it is waiting for a
> > pending change, or the protocol distinguishing between unsolicited
> > and pendclr.
> 
> You seem to be saying the potential existence of unsolicited changes
> some time after the completion of a requested change somehow
> makes the notion of completion of the requested change useless.

No, I'm saying that pendclr adds no value; the receiver knows
everything it needs to know by looking only at the new value.

I don't have a problem with UPDXCH as a mechanism for unsolicited
change notifications, though I'm not sure we have a use case
where the same kind of notification cannot be done with REQXCH.


> I'm not sure why you believe that.

I believe that, because I can't for the life of me think of a use
case where a receiver can make use of the information in pendclr.
(Yes, I probably lack quite a bit of imagination). Can you please
provide one?

It's difficult, as a reviewer of a proposed protocol, to understand
why we are reserving these 32 bits in the UPDXCH message without
having at least a little clue about why they are there and how a
receiver would make use of them.

It's especially perplexing to me to see a facility for asynchronous
notification of completion when there is no real call-reply transaction
relationship between REQXCH and REPXCH, and there is no timeout when
sending REQXCH messages. A key question for me is "How is [UPDXCH
with pendclr set] different than a delayed REPXCH response?" If they
are semantically the same, then why have both?

If the document moves forward with pendclr, I think implementers need
more guidance about how to use this field. Otherwise, someone is going
to build an implementation that depends on pendclr, and some other
implementation that implemented the protocol blindly (i.e., without any
need for the pendclr value) is not going to get it right (or will take
shortcuts, or whatever), and there will be an interoperability problem:
just add boiling water and fluff with fork.

In other words, I think there is useful feedback from an implementation
that does not need pendclr. Perhaps only interop testing can sort it
out.


> The protocol allows unsolicited changes
> but that doesn't mean they are going to be happening all that frequently.

A sane implementation of the currently proposed initial properties
probably won't use UPDXCH at all. I'm trying to explore the corners
here and play devil's advocate; if an implementer is allowed to use
UPDXCH in a crazy way, we should try to anticipate that and either
prohibit it or make it safe.


> > it just needs to know the current value of the property.
> 
> It certainly needs to know that.  Your implementation might
> not be interested in anything else, but the protocol provides it
> because it is easy for the sender to provide and useful in some
> cases.

To interoperate among implementations that depend on it and
ones that don't need it, though, I think the document needs
more precise language about how pendclr is to be used.


> > If two or more ULPs are using the same connection, they might
> > send a sequence of possibly contradictory property change
> > requests.
> 
> I don't understand why one might assume that ULPs
> are sending property change requests.  ULPs are RPC
> protocols.  They send RPC requests and receive replies.

(Much) earlier you suggested that a ULP might want to select its
own receive buffer size, for example. If two ULPs share the same
connection, then they could choose to adjust the connection in
contradictory ways. Perhaps you are taking that off the table
now, which is fair enough.


> > Also, there's no co-ordination between the two senders on the
> > same connection; or an administrator might adjust these
> > properties.
> 
> Any co-ordination requirement that a transport implementation
> might impose to deal with the case of multiple ULPs would be
> out of scope in this document.

My remark here was not about two ULPs. The two directions of
transmission on a connection are not co-ordinated in any way. An
unsolicited UPDXCH can pass a REQXCH going in the opposite direction.

IMO the document has to acknowledge that property value changes
have to be done carefully and with co-ordination with the regular
operation of the transport to avoid hazardous oscillations of
values. The PROP protocol mechanism does not provide any kind of
serialization on its own.


> I don't see how you decide there is "no co-ordination".  If you build
> an implementation and give multiple ULPs the ability to request and 
> make such changes independently, you are asking for (or begging for) 
> trouble.  As an implementer, the choice regarding co-ordination (or not)
> is up to you.
> 
> I think the entities making such changes are most likely to reflect 
> administrative requests or internal transport optimization functions.
> 
> I can't see why one would allow the ULP to do this itself.

Yes, I'm talking about administrative requests or optimization
efforts, here.


> > I posit that the receiver cares only about the current effective
> > value of these properties. 
> 
> That's one of the things he cares about, but I believe it is not
> the only one.
> 
> > pendclr is completely unreliable.
> 
> If you were to always set it to false, as you suggest below that you 
> might, you can make it unreliable, but there is no good reason to do that.
> 
> > Can you provide a real world example of how pendclr might be used?
> > Because it appears to me to convey no actionable information, and
> > I don't see why I shouldn't always set it to false.
> 
> 
> Let's consider that a client implementation encounters a server
> which has a very large receive buffer size and feels that it would prefer more 
> smaller buffers.

More likely, either the client or server is low on resources, and
wishes to reclaim receive buffer space. But, OK.


> It then requests a smaller receive buffer size.  As part of doing that it
> decides to not send requests or responses that are larger than this
> anticipated size limit.  I think the spec already discusses this but the
> basic point is that by making this request the sender is licensing the
> peer to reduce the buffer size.  If this request is completed, the sender
> will make that new size permanent (or at least until it hears about a
> change).  On the other hand, if the peer rejects the request (either
> fairly soon via RESPXCH, or after a while via UPDXCH), the sender would
> then adopt the current value, since the possibility of the peer
> eventually satisfying the original request is now gone.  On the other
> hand, an update not including pendclr should not have that effect.  As
> long as the request is pending, the sender has to be prepared for its
> peer to adopt the lower value that it has requested.

I don't see how the REQuester's behavior would be different if pendclr
was false. It can't assume the requested setting takes effect until
it gets a REPXCH or UPDXCH with the new value, whether or not pendclr
is true. Nothing in the spec (that I recall) says otherwise, i.e., that
pendclr must be true before a new value is trusted. Did I miss
something? And if that is the case, then what's the value of sending
an unsolicited UPDXCH?

There's nothing that prevents the REP side from sending "yes, I took
the new setting (pendclr true)" and then "oops, no I went back to the
old one (pendclr false)". That's not something an implementer might
do on purpose, but it sure can happen by accident (poorly-timed
administrator action, say).
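
To make that sequence concrete, here is a minimal Python sketch of a
requester that keeps only the most recently reported value. The Notice
model, the field layout, and every name below are hypothetical, not
taken from the draft. The only point is that the requester ends up in
the same state whether or not it ever looks at pendclr.

```python
from dataclasses import dataclass


@dataclass
class Notice:
    """Hypothetical model of one REPXCH/UPDXCH report for a single property."""
    value: int
    pendclr: bool


class Requester:
    """Tracks the peer's current effective value; pendclr is logged, never consulted."""

    def __init__(self, initial):
        self.effective = initial
        self.seen_pendclr = []

    def on_notice(self, notice):
        self.seen_pendclr.append(notice.pendclr)
        self.effective = notice.value  # the last report wins


def replay(initial, notices):
    """Feed a sequence of notices to a fresh requester and return its final value."""
    requester = Requester(initial)
    for notice in notices:
        requester.on_notice(notice)
    return requester.effective


# "Yes, I took the new setting" followed by "oops, I went back to the old one".
accept_then_revert = [Notice(4096, True), Notice(8192, False)]
# The same values with every pendclr flag inverted.
flags_inverted = [Notice(n.value, not n.pendclr) for n in accept_then_revert]

final_value = replay(8192, accept_then_revert)
final_value_inverted = replay(8192, flags_inverted)
```

Both replays finish at the old value, which is why I keep asking what a
receiver would do differently with the flag.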


> > It's very easy to imagine an implementer deciding that UPDXCH is
> > better for his or her client or server than FIRSTPROP, and noting
> > stridently that the document does not prohibit or even warn against
> > this design.
> 
> > IMO the document needs to make an explicit requirement not to
> > behave this way, or state that it is acceptable though not preferred.
> 
> Are you sure that it is a "requirement" and not a "REQUIREMENT" that you
> are looking for?
> 
> I'm OK with saying you shouldn't do that, but I have a problem saying it
> is "acceptable".  I think it is noxious, although there is no "NOXIOUS".
> 
> Maybe the right thing is to state the obvious, that the function of UPDXCH is to
> convey changes, making its use to present values at connection time dubious,
> while the function of ROPT_FIRSTPROP is to present the value at initial connection.

You could state that UPDXCH and REQXCH MUST NOT (or SHOULD NOT) be used
before FIRSTPROP has been exchanged.

Or even a lowercase "should not". I would go with MUST NOT: then receivers
can depend on seeing FIRSTPROP before seeing other changes.
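
As a rough illustration of what a MUST NOT would let a receiver rely on,
here is a small Python sketch. The class, the string message kinds, the
property name, and the choice to raise an error on a violation are all
assumptions made for illustration; the draft does not say how a receiver
should react.

```python
class ProtocolError(Exception):
    """Raised when a peer violates the assumed ordering rule."""


class PropertyReceiver:
    """Hypothetical receiver that insists on FIRSTPROP before any change traffic."""

    def __init__(self):
        self.first_prop_seen = False
        self.values = {}    # property name -> last value reported by the peer
        self.requests = []  # REQXCH payloads awaiting a local decision

    def on_message(self, kind, props):
        if kind == "FIRSTPROP":
            self.first_prop_seen = True
            self.values.update(props)
        elif kind in ("UPDXCH", "REQXCH"):
            if not self.first_prop_seen:
                raise ProtocolError(kind + " received before FIRSTPROP")
            if kind == "UPDXCH":
                self.values.update(props)    # peer reports a changed value
            else:
                self.requests.append(props)  # peer asks us to change a value
        else:
            raise ProtocolError("unknown message kind: " + kind)
```

With that rule in place, the receiver never has to guess whether an
early UPDXCH is standing in for the initial property exchange.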


> > For a client to reduce its receive buffers in mid-flight, all it
> > needs to do is send Reply chunks more often. That will work OK
> > if the server uses the Reply chunk whenever one is provided.
> > (Not all servers behave nicely in this regard, though).
> 
> > For a server to reduce its receive buffers in mid-flight, the
> > server needs to notify the client of the change in receive
> > buffer size, the client would need to acknowledge the change,
> > and only then can the server reduce its receive buffer size.
> > How might that be done with the protocol proposed in
> > rpcrdma-xcharext ?
> 
> I don't think unsolicited reduction in receive buffer size (as opposed to the
> requested reduction discussed above) can be made to work.

It could, perhaps, if the property had two values: one was the send
size limit, and the other was the receive buffer size. The send size
limit is not necessarily associated with a physical buffer size, if
Sends are done with a gather vector, so it's relatively
straightforward to change it.

One peer could send a REQXCH to reduce the sender's send size limit.
If the other peer says "OK" then it is safe to reduce the receive
buffer size.

In fact, IMO both sides do want to know the send size limit of their
peer so they can trim receive buffers to a size that would be used.
It might also help a requester determine when a responder needs a
Reply chunk.
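
A minimal Python sketch of that two-value idea follows. Everything in it
is hypothetical: the Peer class, the method names, and the "floor"
policy are mine, and a direct method call stands in for the REQXCH and
its response. It only shows the ordering: the receive buffer shrinks
after the peer has agreed to a lower send size limit, never before.

```python
class Peer:
    """Hypothetical endpoint with a send size limit and a receive buffer size."""

    def __init__(self, send_limit, recv_buffer):
        self.send_limit = send_limit
        self.recv_buffer = recv_buffer

    def handle_reduce_send_limit(self, requested, floor=1024):
        """Responder side: agree unless the request is below our own floor."""
        if requested < floor:
            return False
        self.send_limit = min(self.send_limit, requested)
        return True


def shrink_recv_buffer(local, remote, new_size):
    """Requester side: shrink only after the peer agrees to send no more than new_size."""
    if remote.handle_reduce_send_limit(new_size):
        local.recv_buffer = new_size
        return True
    return False  # request refused: leave the receive buffer alone


server = Peer(send_limit=8192, recv_buffer=8192)
client = Peer(send_limit=8192, recv_buffer=8192)
accepted = shrink_recv_buffer(server, client, 4096)
refused = shrink_recv_buffer(server, client, 512)
```

In both outcomes the client's send size limit never exceeds the server's
receive buffer size, which is the property we need to preserve.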


> Maybe I should add an explicit statement
> to that effect.

That would be implementation guidance at best, and might preclude
some future innovation that allows unsolicited receive buffer size
changes.

Should similar statements be made about the other initial properties
proposed in rpcrdma-xcharext? 

--
Chuck Lever