Re: [nfsv4] I-D Action: draft-ietf-nfsv4-rfc5667bis-04.txt

David Noveck <davenoveck@gmail.com> Wed, 01 February 2017 01:00 UTC

In-Reply-To: <9FBFE5CC-2EF7-4B91-A213-3CE2690F6A4B@oracle.com>
References: <148495844040.13416.10356809202500126242.idtracker@ietfa.amsl.com> <338b603b-8be3-f7f3-d7e0-021d8185f8ec@oracle.com> <E0D9D91E-9245-4846-842A-1F75A9A8D4A4@oracle.com> <CADaq8jctE3vv2J5jyVHFv-991iN3rHku5JEdcZKZ69=V34cttQ@mail.gmail.com> <711FEE13-D5CF-4793-B8B9-C682972B338D@oracle.com> <CADaq8jcwTEG8TCZ43tf6FmHGd2bhG42ZWCOop5RNKYRmriQbKg@mail.gmail.com> <9FBFE5CC-2EF7-4B91-A213-3CE2690F6A4B@oracle.com>
From: David Noveck <davenoveck@gmail.com>
Date: Tue, 31 Jan 2017 19:59:58 -0500
Message-ID: <CADaq8jd+vRmXbu5=hmFtzbf47iuiMLoTmY15RDNags1-TmrAoQ@mail.gmail.com>
To: Chuck Lever <chuck.lever@oracle.com>
Content-Type: multipart/alternative; boundary="001a113deff4586cbf05476d91a6"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/-GYIKrkIEzH7brtXTjTyC1umuOE>
Cc: "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] I-D Action: draft-ietf-nfsv4-rfc5667bis-04.txt

> I've attempted to make the language more generic, but I
> regard that as a constructive but ultimately unattainable
> goal because it requires predicting the future with a high
> degree of accuracy.

I don't agree.  I think we need to use more abstraction in
stating the requirements than rfc5667 did.  The NFS specs
work with filesystems that were not known when the
specs were written.  Only when we extend the
abstraction, as we did with xattrs, do we have to
write a new extension document.

> At some point we have to stop and say it is good enough.

We agree on that but disagree about what that point is.


> > The concept of an
> > extensible version 2 transport would become unworkable in that
> > case.

> IMO that's overstating it. We have a lot to learn about how to
> extend RPC-over-RDMA.

> The extension rules in rpcrdma-version-two
> have to state how extension specifications deal with adding
> new ULB requirements.

I don't think extensions need to add ULB requirements.  I do think we
need some text in rpcrdma-version-two that explains what the extensions
have to say if they propose a different implementation of the abstractions
referred to by ULB(s).

I'll draft some proposed text in this regard and run it by you.

> Right now, we don't know what requirements
> even existing proposed extensions might add. For instance:

> - Do we really believe that rfc5666bis DDP eligibility is an
> appropriate rule to apply, without any change, to send-based DDP?

I believe it is.  rtrext is written using that concept which
you created and required that ULBs define.  That gives us a good
basis to make ULBs more generic.  I just want to take
advantage of it sooner than you do.

> I don't think that's a foregone conclusion.

Perhaps not but I believe it is the case.

> - Can you explain here why you believe that message continuation
> will have an impact on the way ULBs are written?

Right now rfc5667 says that if a message is longer than the
size of the receiver's buffer it must go in a reply chunk.  That
isn't compatible with message continuation.

Instead, it should say that if a message is longer than the longest
message that a transport can handle using sends, it should be
transferred using explicit RDMA operations.

That would leave it up to the transport to define the longest message
that can be transferred by sends and how longer messages are transferred
using explicit RDMA operations, which rfc5666bis already does.
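
As a minimal sketch of that division of labor, the ULB-side rule could be
written purely in terms of a limit that each transport supplies. Everything
named here (the Transport fields, the max_continuation factor, the function
names) is an illustrative assumption and is not defined in rfc5666bis or
rpcrdma-version-two. The size arithmetic is also simplified: it ignores
per-Send header overhead.

```python
from dataclasses import dataclass
from enum import Enum


class Transfer(Enum):
    SEND = "send"                    # message conveyed entirely by Send operations
    EXPLICIT_RDMA = "explicit_rdma"  # message conveyed by explicit RDMA operations


@dataclass(frozen=True)
class Transport:
    """Hypothetical transport description; field names are illustrative only."""
    receive_buffer_size: int   # bytes available in one receive buffer
    max_continuation: int = 1  # Sends one message may span (1 = no continuation)

    def max_send_message(self) -> int:
        # The transport, not the ULB, defines the longest message it can
        # carry using Sends. Header overhead is ignored in this sketch.
        return self.receive_buffer_size * self.max_continuation


def choose_transfer(message_len: int, transport: Transport) -> Transfer:
    """ULB-side rule that never mentions the receive buffer size directly."""
    if message_len <= transport.max_send_message():
        return Transfer.SEND
    return Transfer.EXPLICIT_RDMA


# A transport without continuation and one with a (made-up) 4-Send continuation.
no_continuation = Transport(receive_buffer_size=4096)
with_continuation = Transport(receive_buffer_size=4096, max_continuation=4)
```

With the rule stated this way, a 6000-byte message would need explicit RDMA
on the first transport but could go by sends on the second, and the ULB text
would not have to change between them.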

> The point being that we have a set of ULB composition rules in
> rfc5666bis. You seem to be adding new ones that we haven't all
> agreed on, and then dinging me (repeatedly) for ignoring them.

I haven't done that.  If it seems to you that I have, it would help
if you specified one of these new rules.


> I'm much more comfortable closing the book on rfc5667bis with the
> set of transports we have in RFCs now, and then opening another
> update when we have real documented transport changes to address.

If you want to close the book on rfc5667bis, I think I can put
off what I was planning to do.

> Otherwise rfc5667bis will never be finished.

I think you are unduly pessimistic, but I think we can accommodate your
desire to close on rfc5667bis.  I'll send separate mail on the retry issue.

> Let's design for what we have,

OK.  rfc5667bis will address what we have, limited to protocols for
which there are implementations, and working group documents that are
far advanced.

Another document will address what we have in the sense that there are
existing specifications for the feature.  In other words, it will handle
version-two and rtrext.  I think it will handle a lot more, although you
might not agree.

> clear the table, and move on. There's
> no reason to believe that will create an unending sequence of updates
> to the NFS ULB. There will be no more ULB updates than there are
> extensions, and quite likely fewer.

Exactly.  I think we can get by with exactly one.

So what I expect to do is to submit draft-dnoveck-nfsv4-nfsulb,
illustrating my approach to making the ULBs for NFS transport-generic.
Since it is based on rfc5667bis, I will, with your permission, list you
as a co-author.

On Tue, Jan 31, 2017 at 6:02 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

>
> > On Jan 31, 2017, at 2:54 PM, David Noveck <davenoveck@gmail.com> wrote:
> >
> > > Thus I'm considering removing the retry language from rfc5667bis.
> >
> > I don't think that should be done.  Let me explain why.
> >
> > > First, the mandate I recall receiving in Dallas for rfc5667bis was:
> > >
> > > 1. Document existing implementations
> > > 2. Fix mistakes and omissions
> > > 3. Extend the NFS ULB to properly cover NFSv4.1 and NFSv4.2
> >
> > And those have been done :-)
> >
> > > Since then we have added:
> >
> > > 4. Make it align with new language in rfc5666bis
> >
> > One part of which is to require that you have a reliable means of
> > bounding reply sizes.  Allowing retry is a means to accomplish
> > that requirement without essentially waving the problem away
> > and guessing/hoping that it will not be bothersome.   Even if this
> > is not a problem in practical terms, I don't see how we can present
> > an rfc5667bis that doesn't meet the requirements in rfc5666bis.
> >
> > > And snuck in:
> >
> > I don't think there was anything sneaky here.  The "mandate"
> > you mention was not a legislative act or executive order.
> > It was a plan to go forward with and we should be able
> > to make changes if necessary.
>
> > > 5. Make it align with the language of rpcrdma-version-two and
> > > rpcrdma-rtrext (which I'm also struggling with, but that's a
> > > separate topic)
> >
> > I think the goal is not to align it with the particulars of any
> > particular version or extension but to make it generic, so
> > that the transport can be changed without continually revising
> > the ULB for new versions and extensions.
>
> I've attempted to make the language more generic, but I
> regard that as a constructive but ultimately unattainable
> goal because it requires predicting the future with a high
> degree of accuracy.
>
> At some point we have to stop and say it is good enough.
>
>
> > The concept of an
> > extensible version 2 transport would become unworkable in that
> > case.
>
> IMO that's overstating it. We have a lot to learn about how to
> extend RPC-over-RDMA. The extension rules in rpcrdma-version-two
> have to state how extension specifications deal with adding
> new ULB requirements. Right now, we don't know what requirements
> even existing proposed extensions might add. For instance:
>
> - Do we really believe that rfc5666bis DDP eligibility is an
> appropriate rule to apply, without any change, to send-based DDP?
> I don't think that's a foregone conclusion.
>
> - Can you explain here why you believe that message continuation
> will have an impact on the way ULBs are written?
>
> The point being that we have a set of ULB composition rules in
> rfc5666bis. You seem to be adding new ones that we haven't all
> agreed on, and then dinging me (repeatedly) for ignoring them.
>
>
> > As an example, the current rfc5667bis, in numerous places,
> > assumes that no message continuation feature is available.  We have
> > suffered, for a long time, from the lack of this feature which SMB
> > Direct has had for a long time.  Even apart from the controversial
> > concept of send-based DDP, I don't see how we can have an rfc5667bis
> > that essentially forecloses a message continuation extension.
>
> I'm much more comfortable closing the book on rfc5667bis with the
> set of transports we have in RFCs now, and then opening another
> update when we have real documented transport changes to address.
> Otherwise rfc5667bis will never be finished.
>
> Let's design for what we have, clear the table, and move on. There's
> no reason to believe that will create an unending sequence of updates
> to the NFS ULB. There will be no more ULB updates than there are
> extensions, and quite likely fewer.
>
>
> > > As I see it, "retrying short Reply chunks" is a new and untried
> > > piece of protocol. It attempts to accommodate a future,
> > > unimplemented transport or ULP, and therefore it lies outside the
> > > mandated scope of rfc5667bis IMO.
> >
> > But that position would essentially tie us to Version One forever,
> > which, aside from the particulars of the discussion in Dallas, is not
> > where we want to be.
>
> It doesn't tie anyone to anything. The WG is allowed to open
> updates of existing RFCs as many times as they like. I believe
> there is adequate process here to deal properly with it.
>
>
> > I think Version One can accommodate retry, even though it
> > has some weaknesses with regard to non-idempotent operations.
>
> ERR_CHUNK can be returned for a lot of reasons. As we learned
> when trying to add extensibility to V1, that could make it
> inadequate for this purpose.
>
> If short Reply chunk retry is mentioned in rfc5667bis, I prefer
> that it is discussed in the following terms:
>
> 1. It is an optional remedy, written as implementation advice.
>
> 2. The choice to retry is at the discretion of the ULP on the
> requester (not the transport)
>
> 3. Retry is done via a distinct RPC (fresh XID)
>
> 4. Short Reply chunk retry is permitted only for NFSACL GETACL
> and NFSv4.0 GETATTR
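
The four terms above can be sketched as a requester-side decision function.
This is only an illustration under stated assumptions: the Call type, the
XID allocator, and the doubling of the Reply chunk size are hypothetical and
are not taken from rfc5667bis or any implementation. All XIDs are assumed to
come from the same allocator, so a retry never reuses the failed XID.

```python
import itertools
from dataclasses import dataclass, replace
from typing import Optional

# Term 4: the only procedures for which the proposal would permit a retry.
RETRY_ELIGIBLE = frozenset({("NFSACL", "GETACL"), ("NFSv4.0", "GETATTR")})

_xid_counter = itertools.count(1)  # stand-in XID allocator (assumption)


def next_xid() -> int:
    return next(_xid_counter)


@dataclass(frozen=True)
class Call:
    """Hypothetical record of one RPC as the requester's ULP sees it."""
    protocol: str
    procedure: str
    xid: int
    reply_chunk_len: int


def plan_retry(failed: Call, ulp_wants_retry: bool,
               growth: int = 2) -> Optional[Call]:
    """Return a new Call to send, or None if the RPC should simply fail."""
    # Terms 1 and 2: retry is optional and is the ULP's decision.
    if not ulp_wants_retry:
        return None
    # Term 4: only the listed procedures are eligible.
    if (failed.protocol, failed.procedure) not in RETRY_ELIGIBLE:
        return None
    # Term 3: the retry is a distinct RPC with a fresh XID. The growth
    # factor for the larger Reply chunk is an arbitrary choice here.
    return replace(failed, xid=next_xid(),
                   reply_chunk_len=failed.reply_chunk_len * growth)
```

Returning None in every other case matches the default behavior discussed in
this thread, where a short Reply chunk simply terminates the RPC.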
>
>
> > It is true that version Two will probably have better, more complete
> > support.
>
> Possibly all that is necessary is an unambiguous error code.
>
>
> > It may be that there is something in rfc566bis as currently written, that
> > makes this difficult to do at this late stage.  I could live with that,
> > but given that rfc5667bis needs to be made transport-generic, I think
> > it should definitely allow this in future transport Versions, if those
> > versions support it, and not foreclose it, just as it should not foreclose
> > message continuation.
>
> Another way to "allow retry in future versions" is not to
> mention it at all, which is what I proposed in the previous
> e-mail.
>
>
> > > Second, it's unnecessary in the real world:
> >
> > I agree that it is very unlikely to be used and that many implementations
> > may choose not to implement it.
> >
> > > I'm not aware of a single instance in the field where a server has
> > > had to reject a client's request because of a short Reply chunk.
> > > It's certainly never occurred during my testing, but lots of other
> > > issues have.
> >
> > OK, but the problem is that, without it, when and if this happens,
> > we have a situation in which it is clear there is a bug, but it cannot be
> > determined whether the fault is due to the requester or the responder.
>
> The responder reports, via a local system log, that the requester
> is in error. The Solaris NFS/RDMA server implementation reports
> similar errors this way all the time, in addition to sending
> GARBAGE_ARGS replies, so I know this works.
>
> It can be done with V1 implementations, and it can be done even
> if requesters do not support retrying.
>
>
> > In order to allow interoperable implementations, this kind of situation
> > must be avoided.
>
> > If the spec allows retry but the requester implementation does not
> > provide it, then it has a bug and it can decide it is willing to live
> > with a small possibility of failure.
> >
> > > Third, the places where retry might be useful are all legacy
> > > protocols:
> >
> > If you use the term "legacy protocols" as it is used in rfc5667bis,
> > this is not so.
>
> To be clear, I meant legacy in the sense that the book is closed
> on NFSACL and NFSv4.0.
>
>
> > Also, rfc5666bis requires reliable reply estimation of all ULPs,
> > rather than only of non-legacy ones.
>
> The intent of that requirement is to ensure interoperation.
>
> You haven't explained why "failing a request that is not formed
> according to the rules" is not sufficient to establish that
> implementations interoperate successfully.
>
> If the rfc5666bis requirement is too stern or hazy or in some
> other way not sensible, we should fix it.
>
> But I don't see anything that states a ULB is prevented from
> identifying gray areas or possible cases that can't be handled.
>
>
> > On Tue, Jan 31, 2017 at 12:04 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> >
> > > On Jan 31, 2017, at 7:11 AM, David Noveck <davenoveck@gmail.com> wrote:
> > >
> > > > Without an RPC Reply message, however, the client matches the XID
> > > > in the ERR_CHUNK message to a previous call and that will have the
> > > > matching SEQUENCE operation.
> > >
> > > Right.
> >
> > As I clicked Send yesterday, I blinked and "retried" became "retired".
> > Although I'm a fan of "retired", a different term might be less likely
> > to confuse readers in this context. Suggestions welcome.
> >
> > So now I read this text
> >
> > >    In addition, within the error response, the requester does not have
> > >    the result of the execution of the SEQUENCE operation, which
> > >    identifies the session, slot, and sequence id for the request which
> > >    has failed.  The xid associated with the request, obtained from the
> > >    rdma_xid field of the RDMA_ERROR or RDMA_MSG message, must be used
> > >    to determine the session and slot for the request which failed, and
> > >    the slot must be properly retired.  If this is not done, the slot
> > >    could be rendered permanently unavailable.
> >
> > to mean "No short Reply chunk retries are permitted when a session is
> > in use." I'm much more comfortable with this requirement.
> >
> > And, now there is one less use case for retrying a short Reply chunk.
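
The bookkeeping the quoted draft text calls for can be sketched as an
XID-indexed table on the requester. This is a hedged illustration only: the
SlotTracker class and its method names are hypothetical, "retiring" is
modeled simply as releasing the slot for reuse, and sequence id handling is
deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

SlotKey = Tuple[str, int]  # (session id, slot number); representation assumed


@dataclass
class SlotTracker:
    """Hypothetical requester-side record of which XID occupies which slot."""
    in_flight: Dict[int, SlotKey] = field(default_factory=dict)  # xid -> slot
    busy: Set[SlotKey] = field(default_factory=set)

    def send(self, xid: int, session: str, slot: int) -> None:
        key = (session, slot)
        if key in self.busy:
            raise RuntimeError("slot already has a request outstanding")
        self.busy.add(key)
        self.in_flight[xid] = key

    def on_transport_error(self, rdma_xid: int) -> SlotKey:
        # The error response carries no SEQUENCE result, so the XID is the
        # only handle for finding the session and slot of the failed request.
        key = self.in_flight.pop(rdma_xid)
        # Release the slot so it is not left permanently unavailable.
        self.busy.discard(key)
        return key
```

Without the XID lookup, the slot would stay marked busy after the failure,
which is the "permanently unavailable" outcome the text warns about.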
> >
> >
> > > > That makes this rather a layering violation,
> > >
> > > It looks to me like this layering violation is not serious and I
> > > think it is unavoidable.
> >
> > Let's consider how the layers might work together in this case.
> >
> > Sizing a Reply chunk is done by the ULP. If the Reply chunk turns out
> > to be too small, it is the ULP, not the transport, that will have to
> > replace that chunk with a larger Reply chunk.
> >
> > The ULP therefore must be fully aware that the previous attempt
> > failed, and why. It also has to be capable of deciding whether or not
> > a retry is advisable, given other factors such as responder reply
> > caching.
> >
> > Since the ULP is driving the retry, given the separation of duties
> > between a ULP and the RPC layer, it is possible that a new XID will
> > be used for the retried operation. The responder does not have a way
> > of recognizing the link between the failed XID and the new one.
> >
> > To address this, then the ULP would have to indicate what XID should
> > be used in the retry's RPC (and RPC-over-RDMA) header.
> >
> >
> > > > and perhaps a reason why
> > > > retransmitting with a larger Reply chunk might be a cure worse than
> > > > the disease.
> > >
> > > Note that the layering violation does not arise when retrying the
> > > operation with a larger reply chunk.
> > >
> > > The case when this layering violation/misdemeanor occurs is when
> > > the operation is failed and the slot needs to be made available
> > > again.  Retrying the operation with a larger reply chunk would make
> > > this situation less likely to occur. However, since 4.1 has a reliable
> > > means of limiting reply size (unlike v4.0), it appears that this is
> > > beside the point for session-based minor versions of nfsv4.
> > >
> > > Note that the disease here is the fact that v4.0 (and some auxiliary
> > > protocols) does not have a reliable means of determining reply size
> > > limits.  We can't cure that disease as these protocols are unchangeable.
> > >
> > > Allowing retry with a larger reply chunk is not a cure but it is a
> > > treatment which ameliorates the problem.  As far as I can see, it is
> > > a safe and effective treatment.
> >
> > The safe and most conservative course is to terminate the RPC, in all
> > cases. Some ULPs might be able to tolerate retry, others might not.
> >
> > Each ULP has to make that decision on a case-by-case basis. The
> > transport by itself cannot arbitrarily perform a retry, it is now
> > clear.
> >
> > Thus I'm considering removing the retry language from rfc5667bis.
> >
> > First, the mandate I recall receiving in Dallas for rfc5667bis was:
> >
> > 1. Document existing implementations
> > 2. Fix mistakes and omissions
> > 3. Extend the NFS ULB to properly cover NFSv4.1 and NFSv4.2
> >
> > Since then we have added:
> >
> > 4. Make it align with new language in rfc5666bis
> >
> > And snuck in:
> >
> > 5. Make it align with the language of rpcrdma-version-two and
> > rpcrdma-rtrext (which I'm also struggling with, but that's a
> > separate topic)
> >
> > As I see it, "retrying short Reply chunks" is a new and untried
> > piece of protocol. It attempts to accommodate a future,
> > unimplemented transport or ULP, and therefore it lies outside the
> > mandated scope of rfc5667bis IMO.
> >
> > Second, it's unnecessary in the real world:
> >
> > I'm not aware of a single instance in the field where a server has
> > had to reject a client's request because of a short Reply chunk.
> > It's certainly never occurred during my testing, but lots of other
> > issues have.
> >
> > Third, the places where retry might be useful are all legacy
> > protocols:
> >
> > As you pointed out, NFSv4.0 and NFSACL appear to be the two areas
> > where we have concerns. The problem is largely addressed by NFSv4.1
> > and newer minor versions, and even there, retrying is not permitted.
> > The future does not need this new behavior, apparently.
> >
> > What retrying amounts to, in this context, is adding a workaround
> > in the transport protocol for bugs in the Upper Layer Protocols and
> > their implementations.
> >
> >
> > > > I'm beginning to believe that making this situation always a
> > > > permanent error, as rfc5666bis does, is a better protocol choice.
> > >
> > > I don't see it that way.  It leaves us with an rfc5666bis requirement
> > > for ULBs that we would be unable to satisfy for a number of ULPs dealt
> > > with in rfc5667bis.
> >
> > This is a hole that can be closed simply by prescribing how each
> > implementation must behave when a Reply chunk is short. That has
> > already been done in rfc5666bis: the RPC fails. This prevents a
> > transport deadlock and connection loss, and indicates a ULP
> > implementation bug.
> >
> > Why does rfc5667bis need to go further? The consequences of a
> > shortage are clear, and no longer catastrophic to other RPCs.
> >
> >
> > --
> > Chuck Lever
> >
> >
> >
> >
>
> --
> Chuck Lever
>
>
>
>