Re: [nfsv4] 4.0 trunking

David Noveck <davenoveck@gmail.com> Wed, 21 September 2016 00:55 UTC

In-Reply-To: <20160920213931.GC12789@fieldses.org>
References: <20160907212039.GA6847@fieldses.org> <CADaq8jfiRU7DTRYXGHZvMALAZWeRjhqcpo8Si3_diMt_5dNSMw@mail.gmail.com> <20160908010532.GA10658@fieldses.org> <CADaq8jcnananUPDHH4Vzhv93JTxZegsZLMCtZWFD-keHheKvHA@mail.gmail.com> <20160910200355.GA30688@fieldses.org> <BBB2EBDC-6F05-44C1-B45A-C84C24A9AD7F@netapp.com> <20160920213931.GC12789@fieldses.org>
From: David Noveck <davenoveck@gmail.com>
Date: Tue, 20 Sep 2016 20:55:31 -0400
Message-ID: <CADaq8jeqr9WPeBeV0ruxwy+5omfgHhJFjAR=tCGL3N-Yjt0PuQ@mail.gmail.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Content-Type: multipart/alternative; boundary="94eb2c096f3e90b7ca053cfa00ff"
Archived-At: <https://mailarchive.ietf.org/arch/msg/nfsv4/kOwJZW62thrHD1W7hSKDwyhVxVw>
Cc: "Adamson, Andy" <William.Adamson@netapp.com>, "nfsv4@ietf.org" <nfsv4@ietf.org>
Subject: Re: [nfsv4] 4.0 trunking
X-BeenThere: nfsv4@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
List-Id: NFSv4 Working Group <nfsv4.ietf.org>

> The thing is, though, that in that case we already have our answer at
> step 3: if X1 pointed to the same server as X2, then the SETCLIENTID
> to X1 would have to return a distinct verifier.

True, but the converse doesn't hold.  If you get a distinct verifier,
it could be because X1 and X2 are the same server, but the other
possibility is that X1 and X2 are different servers that simply happen
to use different verifiers.

But I think you are on to something.  Say the verifiers returned for X1
and X2 are V1 and V2 respectively.  If the SETCLIENTID_CONFIRM to X2
using V2 succeeds, you can go back and try another confirm to X1 using
V1.  If X1 and X2 are distinct servers, that should still succeed.  On
the other hand, if they are the same server, the creation of V2 will
have replaced V1, so V1 will have become invalid and the confirm will
fail.

> We can stop there.

Or somewhere.

On Tue, Sep 20, 2016 at 5:39 PM, J. Bruce Fields <bfields@fieldses.org>
wrote:

> On Mon, Sep 12, 2016 at 12:59:05PM +0000, Adamson, Andy wrote:
> > IMHO we should not worry too much about NFSv4.0 trunking as NFSv4.1+
> > solves this issue. Trunking is simply one more reason to move from
> > NFSv4.0 to NFSv4.1.
>
> I tend to agree, but since it got implemented and was causing a real
> bug, it was hard to ignore...
>
> After a patch (55b9df93ddd6 "NFSv4/v4.1: Verify the client owner id
> during trunking detection", if anyone cares) the damage seems restricted
> to the non-default "migration" case, making this less of a concern.
>
> I wonder if there's an easy fix, though: if I'm reading rfc 7931 right,
> it works (slightly simplified) like this:
>
>         1. a SETCLIENTID to IP address X1 returns a clientid.  You
>         confirm it and carry on.
>
>         2.  Some time later a SETCLIENTID to IP address X2 returns the
>         same clientid.  To check whether X1 and X2 actually point to the
>         same server:
>
>         3. Send a callback-changing SETCLIENTID to X1.  Get back
>         (clientid, verifier).
>
>         4. Confirm it with a SETCLIENTID_CONFIRM to X2.
>
> If the SETCLIENTID_CONFIRM succeeds then they're the same server.
> Servers are required to return an error (STALE_CLIENTID, I think?) if
> the verifier in the SETCLIENTID_CONFIRM arguments doesn't match.
>
> But, if the SETCLIENTID at step 2 just happened to return the same
> verifier as the SETCLIENTID at step 3, then the SETCLIENTID_CONFIRM at
> step 4 will succeed even if the two servers are different.  Hence the
> bug.
>
> The thing is, though, that in that case we already have our answer at
> step 3: if X1 pointed to the same server as X2, then the SETCLIENTID to
> X1 would have to return a distinct verifier.  We can stop there.
>
> With that addition, don't we get the right answer every time, without
> any unjustified assumptions about the randomness of servers' clientid
> and verifier generation?  Or am I missing something?
>
> --b.
>
> >
> > —>Andy
> >
> >
> > > On Sep 10, 2016, at 4:03 PM, J. Bruce Fields <bfields@fieldses.org>
> > > wrote:
> > >
> > > On Sat, Sep 10, 2016 at 03:38:29PM -0400, David Noveck wrote:
> > >> Thanks for pointing this out.
> > >>
> > >> Before I address the details of this, let me state my overall
> > >> position:
> > >>
> > >>   - I think it is correct that RFC7931 overstates the degree of
> > >>   assurance that one might reasonably have regarding the
> > >>   unlikelihood of spurious collision of clientid4's and verifiers.
> > >>   - Nevertheless, I don't think the situation is as bleak as you
> > >>   paint it, with regard to the Linux server approach to these
> > >>   matters that you describe.  This is basically because I don't see
> > >>   the issue of the correlation between clientid4's and verifiers
> > >>   the same way that you do.  See below.
> > >>   - I think it is possible to address the issue satisfactorily in
> > >>   the context of an RFC 7931 erratum and will start working on that.
> > >>   - Given the weakness of the RFC 3530/7530 requirements in this
> > >>   area, we may need (yet another) RFC updating 7530 at some point.
> > >>   - I see that as a longer-term effort, since the practices that
> > >>   you describe will not result in large numbers of spurious
> > >>   collisions and clients can adapt the algorithm to require
> > >>   additional verification.
> > >>   - If there are servers out there whose practices are
> > >>   significantly more troublesome than the ones you describe, we
> > >>   need to find out soon.  Given that many of those responsible for
> > >>   v4.0 implementations may not be reading this list, I suggest we
> > >>   discuss this at the October Bakeathon.
> > >>
> > >>
> > >>> The Linux server is generating clientid as a (boot time, counter)
> > >>> pair.
> > >>
> > >> I'm assuming the boot time is in the form of a 32-bit unsigned
> > >> number of seconds after some fixed date/time.  Correct?
> > >>
> > >> If that is the case, duplicates would require that:
> > >> 1. Two servers be booted during the same second (e.g. when power
> > >> came back on after a disruption).
> > >
> > > Right, or routine maintenance, or whatever.
> > >
> > >> 2. Each of the servers has received the same number of client
> > >> SETCLIENTIDs (mod 4 billion).
> > >>
> > >> Clearly this is possible although I would say that it is unlikely.
> > >> Nevertheless, the "highly unlikely" in RFC7931 is overstating it.
> > >
> > > It turns out that if you have a hundred servers that get rebooted
> > > simultaneously for a kernel update or some similar routine maintenance,
> > > this happens every time.  (I'm a step removed from the original
> > > case here, but I believe this is more-or-less accurate and not
> > > hypothetical.)
> > >
> > > Pretty sure you can easily hit this without that big a setup, too.
> > >
> > >> The case that would be more worrisome would be one in which a
> > >> server simply used a 64-bit word of byte-addressable persistent
> > >> memory (e.g. NVDIMM, 3D XPoint) to maintain a permanent counter,
> > >> dispensing with the boot time.
> > >
> > > That'd be something worth looking into in cases where the users
> > > have the right hardware (not always).  The workaround I'm going
> > > with for now is initializing the counter part to something random.
> > > (Random numbers may not be completely reliable at boot time either,
> > > but I suspect they'll be good enough here...).
> > >
> > >> That is allowable as RFC7530 stands.  I don't expect that this is
> > >> a current problem but, given where persistent memory is going
> > >> (i.e. cheaper and more common), we could see this in the future.
> > >>
> > >> Clearly, spurious clientid4 collisions are possible, which is the
> > >> point of the whole algorithm.
> > >>
> > >>> Ditto for the verifier--it's a (boot time, counter) pair,
> > >>
> > >> I presume the same boot time format is used.  Correct?
> > >>
> > >>> with the second part coming from a different counter, but
> > >>> normally I think they'll get incremented at the same time,
> > >>
> > >> It is true that they will both get incremented in many common (i.e.
> > >> "normal") situations.
> > >>
> > >> However, this common case is not the only case.
> > >>
> > >> In particular, in the case in which one or more clients are
> > >> following the guidance in RFC7931 and there are trunked server
> > >> addresses, you will have occasions in which the verifier counter
> > >> will get incremented and the clientid4 counter will not:
> > >> 1.  If the SETCLIENTID is used to modify the callback information
> > >> 2.  If two different SETCLIENTID operations are done by the same
> > >> client
> > >
> > > Sure.
> > >
> > >>> so clientid and verifier are highly correlated.
> > >>
> > >> I don't think so.  The boot time portions are clearly the same,
> > >> but given the likely occurrence of cases 1. and 2., these counters
> > >> will typically have different values.  They may be correlated in
> > >> the sense that the difference between the two values is likely not
> > >> to be large, but that does not mean that the probability of them
> > >> having the same value is necessarily substantial.
> > >>
> > >> Once there are any occasions in which the verifiers are
> > >> incremented and the clientid's are not, these values are
> > >> essentially uncorrelated.  The only way that they can collide at
> > >> the same time the clientids collide is when the number of
> > >> instances of the cases 1. and 2. above is a multiple of 2^32.
> > >
> > > I'm not following.  I think you may be confusing clientid-verifier
> > > collisions with verifier-verifier (from different server) collisions.
> > > The latter is what matters.
> > >
> > > The confusion may be my fault.  I should have said something like:
> > > chance of a collision between two clientid's is correlated with chance
> > > of a collision between two verifiers given out by the same two servers.
> > > So adding in a verifier comparison doesn't decrease the probability as
> > > much as you'd expect.
> > >
> > >> While I think that RFC7931's "vanishingly small" is not correct, I
> > >> would argue that this brings us into "highly unlikely" territory.
> > >
> > > So, unfortunately, we appear to have an actual real-life case
> > > where this happens all the time.
> > >
> > >> Also, the existing text does suggest that you can repeat the
> > >> procedure any number of times to reduce the likelihood of a
> > >> spurious determination that two addresses are trunked.
> > >
> > > My intuition is that that would mean a lot of effort for a
> > > disappointing reduction in the probability of a bug, but I haven't
> > > done any experiments or thought this through much.
> > >
> > > --b.
> > >
> > >>> So, we can mitigate this by adding some randomness, OK.
> > >>
> > >> One version of that might be to start the counter fields with the
> > >> nanoseconds portion of the boot time.
> > >>
> > >> Alternatively, you might keep the current counters, each starting
> > >> at zero, but take advantage of the fact that clientid and verifier
> > >> are used together.  To do that, the timestamp in the clientid4
> > >> might be boot time in seconds, while the one in the verifier would
> > >> be the nanoseconds within the second during which the boot
> > >> occurred.  That would reduce the frequency of verifier collisions,
> > >> but this might not be necessary if, as I expect, the algorithm
> > >> will usually result in distinct verifiers anyway.
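
As a toy illustration of the seconds/nanoseconds split being suggested
in the quoted paragraph (invented code, not the Linux server's actual
implementation):

```python
import time


class IdSource:
    """Sketch: clientid4 carries the boot time in seconds, the verifier
    carries the nanoseconds within that boot second, and each keeps its
    own 32-bit counter starting at zero."""

    def __init__(self, boot_ns=None):
        boot_ns = time.time_ns() if boot_ns is None else boot_ns
        self.boot_seconds = (boot_ns // 1_000_000_000) & 0xFFFFFFFF
        self.boot_nanos = boot_ns % 1_000_000_000  # sub-second part
        self.clientid_counter = 0
        self.verifier_counter = 0

    def new_clientid(self):
        cid = (self.boot_seconds << 32) | (self.clientid_counter & 0xFFFFFFFF)
        self.clientid_counter += 1
        return cid

    def new_verifier(self):
        v = (self.boot_nanos << 32) | (self.verifier_counter & 0xFFFFFFFF)
        self.verifier_counter += 1
        return v
```

Two servers booted within the same second would still collide on
clientid4 (which the detection algorithm tolerates), but their
verifiers would differ unless they also booted in the same nanosecond,
which is where the claimed reduction in verifier-collision frequency
comes from.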
> > >>
> > >>> But that's a new requirement not apparent from 3530,
> > >>
> > >> I think the 7931 text can be patched up via an erratum, but if
> > >> not we will have to consider how and whether to address the issue
> > >> in 7530.  Unfortunately, that would be kind of big for an erratum.
> > >>
> > >> If no existing servers have a substantial issue, then we do have
> > >> the option of doing a longer-term RFC updating 7530.  This could
> > >> be directed to the (probably now) hypothetical case of a server
> > >> not using boot time and maintaining 64-bit global persistent
> > >> counts.
> > >>
> > >> Such servers would have a higher probability of clientid4 conflict and
> > >> verifier conflict than the ones you mention, while staying within the
> > >> rfc7530 requirements.
> > >>
> > >>> and I wouldn't be surprised if other servers have similar issues.
> > >>
> > >> Those who read this list have been notified of the issue by your
> > >> message.
> > >>
> > >> The problem is with server implementers who don't read this list.
> > >> In theory, errata should take care of that, but aside from the
> > >> difficulty of arriving at a suitable change and getting it
> > >> through, many v4.0 implementers and maintainers may not be reading
> > >> this list or paying much attention to errata either.  Perhaps we
> > >> can discuss this in Westford, where many implementers who might
> > >> not be all that aware of stuff on the working group list would be
> > >> present.  That's also one way to see if there are existing servers
> > >> where this is a big problem.
> > >
> > > OK!
> > >
> > > Assuming nobody's doing anything very complicated--it's probably also
> > > not difficult to guess algorithms in testing.  Something we can do in
> > > Westford, or maybe before if anyone has access to a variety of servers.
> > >
> > > --b.
> > >
> > >>
> > >>
> > >> On Wed, Sep 7, 2016 at 9:05 PM, J. Bruce Fields
> > >> <bfields@fieldses.org> wrote:
> > >>
> > >>> On Wed, Sep 07, 2016 at 08:14:58PM -0400, David Noveck wrote:
> > >>>>> But I can't find any further discussion of how a client might
> > >>>>> make that determination.  Am I overlooking it?
> > >>>>
> > >>>> It's actually in section 5.8 of RFC7931.
> > >>>
> > >>> Oops, thanks!
> > >>>
> > >>> Looking at that, I think none of this is true:
> > >>>
> > >>>        Note that the NFSv4.0 specification requires the server
> > >>>        to make sure that such verifiers are very unlikely to be
> > >>>        regenerated.
> > >>>
> > >>> Verifiers given out by one server shouldn't be reused, sure, but
> > >>> that's quite different from the claim that collisions *between*
> > >>> servers are unlikely.
> > >>>
> > >>>        Given that it is already highly unlikely that the clientid4 XC
> > >>>        is duplicated by distinct servers,
> > >>>
> > >>> Why is that highly unlikely?
> > >>>
> > >>>        the probability that SCn is
> > >>>        duplicated as well has to be considered vanishingly small.
> > >>>
> > >>> There's no reason to believe the probability of a verifier
> > >>> collision is uncorrelated with the probability of a clientid
> > >>> collision.
> > >>>
> > >>> The Linux server is generating clientid as a (boot time, counter)
> > >>> pair.  So collision between servers started at the same time
> > >>> (probably not that unusual) is possible.
> > >>>
> > >>> Ditto for the verifier--it's a (boot time, counter) pair, with the
> > >>> second part coming from a different counter, but normally I think
> > >>> they'll get incremented at the same time, so clientid and
> > >>> verifier are highly correlated.
> > >>>
> > >>> So, we can mitigate this by adding some randomness, OK.  But
> > >>> that's a new requirement not apparent from 3530, and I wouldn't
> > >>> be surprised if other servers have similar issues.
> > >>>
> > >>> --b.
> > >>>
> > >>>> I've only been keeping draft-ietf-nfsv4-migration-issues alive
> > >>>> because of the section dealing with issues relating to v4.1.
> > >>>> Otherwise, I would have let the thing expire.  The next time I
> > >>>> update this, I'll probably collapse sections 4 and 5 to a short
> > >>>> section saying that all the v4.0 issues were addressed by
> > >>>> publication of RFC7931.
> > >>>>
> > >>>>> (It appears that the Linux client is trying to do that by
> > >>>>> sending a setclientid_confirm to server1 using the
> > >>>>> (clientid,verifier) returned from a setclientid reply from
> > >>>>> server2.  That doesn't look correct to me.)
> > >>>>
> > >>>> It seems kind of weird, but the idea is that if you get the same
> > >>>> clientid from two server IP addresses, they are probably
> > >>>> connected to the same server (i.e. are trunked), though there is
> > >>>> a chance that having the same clientid is a coincidence.  The
> > >>>> idea is that if these are the same server, using the verifier
> > >>>> will work, but if they are different servers it won't work and
> > >>>> will be harmless.
> > >>>>
> > >>>>
> > >>>> On Wed, Sep 7, 2016 at 5:20 PM, J. Bruce Fields
> > >>>> <bfields@fieldses.org> wrote:
> > >>>>
> > >>>>> In
> > >>>>> https://tools.ietf.org/html/draft-ietf-nfsv4-migration-issues-10#section-5.4.2
> > >>>>>
> > >>>>>        In the face of possible trunking of server IP addresses,
> > >>>>>        the client will use the receipt of the same clientid4
> > >>>>>        from multiple IP addresses as an indication that the two
> > >>>>>        IP addresses may be trunked, and proceed to determine,
> > >>>>>        from the observed server behavior, whether the two
> > >>>>>        addresses are in fact trunked.
> > >>>>>
> > >>>>> But I can't find any further discussion of how a client might
> > >>>>> make that determination.  Am I overlooking it?
> > >>>>>
> > >>>>> (It appears that the Linux client is trying to do that by
> > >>>>> sending a setclientid_confirm to server1 using the
> > >>>>> (clientid,verifier) returned from a setclientid reply from
> > >>>>> server2.  That doesn't look correct to me.)
> > >>>>>
> > >>>>> --b.
> > >>>>>
> > >>>
> > >
> > > _______________________________________________
> > > nfsv4 mailing list
> > > nfsv4@ietf.org
> > > https://www.ietf.org/mailman/listinfo/nfsv4
> >
>
>