git.proxmox.com Git - mirror_ubuntu-focal-kernel.git/log

nfsd4: implement new 4.1 open reclaim types

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: remove unneeded CLAIM_DELEGATE_CUR workaround

0c12eaffdf09466f36a9ffe970dda8f4aeb6efc0 "nfsd: don't break lease on
CLAIM_DELEGATE_CUR" was a temporary workaround for a problem fixed
properly in the vfs layer by 778fc546f749c588aa2f6cd50215d2715c374252
"locks: fix tracking of inprogress lease breaks", so we can revert that
change (but keeping some minor cleanup from that commit).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: warn on open failure after create

If we create the object and then return failure to the client, we're
left with an unexpected file in the filesystem.

I'm trying to eliminate such cases but not 100% sure I have so an
assertion might be helpful for now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: preallocate open stateid in process_open1()

As with the nfs4_file, we'd prefer to find out about any failure before
creating a new file rather than after.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: do idr preallocation with stateid allocation

Move idr preallocation out of stateid initialization, into stateid
allocation, so that we no longer have to handle any errors from the
former.

This is a little subtle due to the way the idr code manages these
preallocated items--document that in comments.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: preallocate nfs4_file in process_open1()

Creating a new file is an irrevocable step--once it's visible in the
filesystem, other processes may have seen it and done something with it,
and unlinking it wouldn't simply undo the effects of the create.

Therefore, in the case where OPEN creates a new file, we shouldn't do
the create until we know that the rest of the OPEN processing will
succeed.

For example, we should preallocate a struct file in case we need it
until waiting to allocate it till process_open2(), which is already too
late.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: clean up open owners on OPEN failure

If process_open1() creates a new open owner, but the open later fails,
the current code will leave the open owner around. It won't be on the
close_lru list, and the client isn't expected to send a CLOSE, so it
will hang around as long as the client does.

Similarly, if process_open1() removes an existing open owner from the
close lru, anticipating that an open owner that previously had no
associated stateid's now will, but the open subsequently fails, then
we'll again be left with the same leak.

Fix both problems.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify process_open1 logic

No change in behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: make is_open_owner boolean

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: centralize renew_client() calls

There doesn't seem to be any harm to renewing the client a bit earlier,
when it is looked up. That saves us from having to sprinkle
renew_client calls over quite so many places.

Also remove a redundant comment and do a little cleanup.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: typo logical vs bitwise negate

This should be a bitwise negate here. It silences a Sparse warning:
fs/nfsd/nfs4xdr.c:693:16: warning: dubious: x & !y

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfs: fix bug about IPv6 address scope checking

The result from ipv6_addr_scope() is a set of flags, not a single value,
so we can't just compare the result with IPV6_ADDR_SCOPE_LINKLOCAL.

This patch fixs the problem, and checks for unequal addresses before
scope_id.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: more robust ignoring of WANT bits in OPEN

Mask out the WANT bits right at the start instead of on each use.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: move name-length checks to xdr

Again, these checks are better in the xdr code.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: move access/deny validity checks to xdr code

I'd rather put more of these sorts of checks into standardized xdr
decoders for the various types rather than have them cluttering up the
core logic in nfs4proc.c and nfs4state.c.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: ignore WANT bits in open downgrade

We don't use WANT bits yet--and sending them can probably trigger a
BUG() further down.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: add MODULE_ALIAS to match the filesystem name

sunrpc implements the rpc_pipefs filesystem type.
Add the alias to have the module requested automatically by the kernel
when the filesystem is mounted.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup state.h comments

These comments are mostly out of date.

Reported-by: Bryan Schumaker <bjschuma@netapp.com>

nfsd4: clean up downgrading code

In response to some review comments, get rid of the somewhat obscure
for-loop with bitops, and improve a comment.

Reported-by: Steve Dickson <steved@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: fix state lock usage in LOCKU

In commit 5ec094c1096ab3bb795651855d53f18daa26afde "nfsd4: extend state
lock over seqid replay logic" I modified the exit logic of all the
seqid-based procedures except nfsd4_locku().  Fix the oversight.

The result of the bug was a double-unlock while handling the LOCKU
procedure, and a warning like:

[  142.150014] WARNING: at kernel/mutex-debug.c:78 debug_mutex_unlock+0xda/0xe0()
...
[  142.152927] Pid: 742, comm: nfsd Not tainted 3.1.0-rc1-SLIM+ #9
[  142.152927] Call Trace:
[  142.152927]  [<ffffffff8105fa4f>] warn_slowpath_common+0x7f/0xc0
[  142.152927]  [<ffffffff8105faaa>] warn_slowpath_null+0x1a/0x20
[  142.152927]  [<ffffffff810960ca>] debug_mutex_unlock+0xda/0xe0
[  142.152927]  [<ffffffff813e4200>] __mutex_unlock_slowpath+0x80/0x140
[  142.152927]  [<ffffffff813e42ce>] mutex_unlock+0xe/0x10
[  142.152927]  [<ffffffffa03bd3f5>] nfs4_lock_state+0x35/0x40 [nfsd]
[  142.152927]  [<ffffffffa03b0b71>] nfsd4_proc_compound+0x2a1/0x690
[nfsd]
[  142.152927]  [<ffffffffa039f9fb>] nfsd_dispatch+0xeb/0x230 [nfsd]
[  142.152927]  [<ffffffffa02b1055>] svc_process_common+0x345/0x690
[sunrpc]
[  142.152927]  [<ffffffff81058d10>] ? try_to_wake_up+0x280/0x280
[  142.152927]  [<ffffffffa02b16e2>] svc_process+0x102/0x150 [sunrpc]
[  142.152927]  [<ffffffffa039f0bd>] nfsd+0xbd/0x160 [nfsd]
[  142.152927]  [<ffffffffa039f000>] ? 0xffffffffa039efff
[  142.152927]  [<ffffffff8108230c>] kthread+0x8c/0xa0
[  142.152927]  [<ffffffff813e8694>] kernel_thread_helper+0x4/0x10
[  142.152927]  [<ffffffff81082280>] ? kthread_worker_fn+0x190/0x190
[  142.152927]  [<ffffffff813e8690>] ? gs_change+0x13/0x13

Reported-by: Bryan Schumaker <bjschuma@netapp.com>
Tested-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: look up stateid's per clientid

Use a separate stateid idr per client, and lookup a stateid by first
finding the client, then looking up the stateid relative to that client.

Also some minor refactoring.

This allows us to improve error returns: we can return expired when the
clientid is not found and bad_stateid when the clientid is found but not
the stateid, as opposed to returning expired for both cases.

I hope this will also help to replace the state lock mostly by a
per-client lock, but that hasn't been done yet.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: assume test_stateid always has session

Test_stateid is 4.1-only and only allowed after a sequence operation, so
this check is unnecessary.

Cc: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: use idr for stateid's

The idr system is designed exactly for generating id and looking up
integer id's. Thanks to Trond for pointing it out.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: move client * to nfs4_stateid, add init_stid helper

This will be convenient.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

leases: split up generic_setlease into lock/unlock cases

Eventually we should probably do the same thing to the file operations
as well.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: make op_cacheresult another flag

I'm not sure why I used a new field for this originally.

Also, the differences between some of these flags are a little subtle;
add some comments to explain.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: fix open downgrade, again

Yet another open-management regression:

- nfs4_file_downgrade() doesn't remove the BOTH access bit on
  downgrade, so the server's idea of the stateid's access gets
  out of sync with the client's.  If we want to keep an O_RDWR
  open in this case, we should do that in the file_put_access
  logic rather than here.
- We forgot to convert v4 access to an open mode here.

This logic has proven too hard to get right.  In the future we may
consider:
- reexamining the lock/openowner relationship (locks probably
  don't really need to take their own references here).
- adding open upgrade/downgrade support to the vfs.
- removing the atomic operations.  They're redundant as long as
  this is all under some other lock.

Also, maybe some kind of additional static checking would help catch
O_/NFS4_SHARE_ACCESS confusion.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: hash closed stateid's like any other

Look up closed stateid's in the stateid hash like any other stateid
rather than searching the close lru.

This is simpler, and fixes a bug: currently we handle only the case of a
close that is the last close for a given stateowner, but not the case of
a close for a stateowner that still has active opens on other files.
Thus in a case like:

open(owner, file1)
open(owner, file2)
close(owner, file2)
close(owner, file2)

the final close won't be recognized as a retransmission.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: construct stateid from clientid and counter

Including the full clientid in the on-the-wire stateid allows more
reliable detection of bad vs. expired stateid's, simplifies code, and
ensures we won't reuse the opaque part of the stateid (as we currently
do when the same openowner closes and reopens the same file).

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify free_stateid

We no longer need is_deleg_stateid, for example.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: match close replays on stateid, not open owner id

Keep around an unhashed copy of the final stateid after the last close
using an openowner, and when identifying a replay, match against that
stateid instead of just against the open owner id. Free it the next
time the seqid is bumped or the stateowner is destroyed.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: replace oo_confirmed by flag bit

I want at least one more bit here. So, let's haul out the caps lock key
and add a flags field.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd41: try to check reply size before operation

For checking the size of reply before calling a operation,
we need try to get maxsize of the operation's reply.

v3: using new method as Bruce said,

"we could handle operations in two different ways:

- For operations that actually change something (write, rename,
  open, close, ...), do it the way we're doing it now: be
  very careful to estimate the size of the response before even
  processing the operation.
- For operations that don't change anything (read, getattr, ...)
  just go ahead and do the operation.  If you realize after the
  fact that the response is too large, then return the error at
  that point.

  So we'd add another flag to op_flags: say, OP_MODIFIES_SOMETHING.  And for
  operations with OP_MODIFIES_SOMETHING set, we'd do the first thing.  For
  operations without it set, we'd do the second."

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
[bfields@redhat.com: crash, don't attempt to handle, undefined op_rsize_bop]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

SUNRPC: compare scopeid for link-local addresses

For ipv6 link-local addresses, sunrpc do not compare those scope id.
This patch let sunrpc compares scope id only on link-local addresses.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

SUNRPC: Replace svc_addr_u by sockaddr_storage

For IPv6 local address, lockd can not callback to client for
missing scope id when binding address at inet6_bind:

324       if (addr_type & IPV6_ADDR_LINKLOCAL) {
325               if (addr_len >= sizeof(struct sockaddr_in6) &&
326                   addr->sin6_scope_id) {
327                       /* Override any existing binding, if another one
328                        * is supplied by user.
329                        */
330                       sk->sk_bound_dev_if = addr->sin6_scope_id;
331               }
332
333               /* Binding to link-local address requires an interface */
334               if (!sk->sk_bound_dev_if) {
335                       err = -EINVAL;
336                       goto out_unlock;
337               }

Replacing svc_addr_u by sockaddr_storage, let rqstp->rq_daddr contains more info
besides address.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

NFSD: Add a cache for fs_locations information

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: since this is server-side, use nfsd4_ prefix instead of nfs4_ prefix. ]
[ cel: implement S_ISVTX filter in bfields-normal form ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

NFSD: Remove the ex_pathname field from struct svc_export

There are no more users...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

NFSD: Cleanup for nfsd4_path()

The current code is sort of hackish in that it assumes a referral is always
matched to an export. When we add support for junctions that may not be the
case.
We can replace nfsd4_path() with a function that encodes the components
directly from the dentries. Since nfsd4_path is currently the only user of
the 'ex_pathname' field in struct svc_export, this has the added benefit
of allowing us to get rid of that.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: better stateid hashing

First, we shouldn't care here about the structure of the opaque part of
the stateid. Second, this hash is really dumb. (I'm not sure the
replacement is much better, though--to look at it another patch.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: use deleg changes to cleanup preprocess_stateid_op

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: fix test_stateid for delegation stateid's

Test_stateid should handle delegation stateid's as well.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: hash deleg stateid's like any other

It's simpler to look up delegation stateid's in the same hash table as
any other stateid.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: share common stid-hashing helper function

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: add common dl_stid field to delegation

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: move some of nfs4_stateid into a separate structure

We want delegations to share more with open/lock stateid's, so first
we'll pull out some of the common stuff we want to share.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: remove redundant stateid initialization

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: rename init_stateid

Note this is actually open-stateid specific.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: pass around typemask instead of flags

We're only using those flags to choose lock or open stateid's at this
point.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: split preprocess_seqid, cleanup

Move most of this into helper functions. Also move the non-CONFIRM case
into caller, providing a helper function for that purpose.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: split up find_stateid

Minor cleanup.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: rearrange to avoid a forward reference

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: split out some free_generic_stateid code

We'll use this elsewhere.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: split stateowners into open and lockowners

The stateowner has some fields that only make sense for openowners, and
some that only make sense for lockowners, and I find it a lot clearer if
those are separated out.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: move CLOSE_STATE special case to caller

Move the CLOSE_STATE case into the unique caller that cares about it
rather than putting it in preprocess_seqid_op.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: move double-confirm test to open_confirm

I don't see the point of having this check in nfs4_preprocess_seqid_op()
when it's only needed by the one caller.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify check_open logic

Sometimes the single-exit style is good, sometimes it's unnecessarily
convoluted....

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: share common seqid checks

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: eliminate unused lt_stateowner

This is used only as a local variable.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: drop most stateowner refcounting

Maybe we'll bring it back some day, but we don't have much real use for
it now.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: eliminate impossible open replay case

If open fails with any error other than nfserr_replay_me, then the main
nfsd4_proc_compound() loop continues unconditionally to
nfsd4_encode_operation(), which will always call encode_seqid_op_tail.
Thus the condition we check for here does not occur.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: extend state lock over seqid replay logic

There are currently a couple races in the seqid replay code: a
retransmission could come while we're still encoding the original reply,
or a new seqid-mutating call could come as we're encoding a replay.

So, extend the state lock over the encoding (both encoding of a replayed
reply and caching of the original encoded reply).

I really hate doing this, and previously added the stateowner
reference-counting code to avoid it (which was insufficient)--but I
don't see a less complicated alternative at the moment.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup seqid op stateowner usage

Now that the replay owner is in the cstate we can remove it from a lot
of other individual operations and further simplify
nfs4_preprocess_seqid_op().

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: centralize handling of replay owners

Set the stateowner associated with a replay in one spot in
nfs4_preprocess_seqid_op() and keep it in cstate. This allows removing
a few lines of boilerplate from all the nfs4_preprocess_seqid_op()
callers.

Also turn ENCODE_SEQID_OP_TAIL into a function while we're here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: make delegation stateid's seqid start at 1

Thanks to Casey for reminding me that 5661 gives a special meaning to a
value of 0 in the stateid's seqid field, so all stateid's should start
out with si_generation 1. We were doing that in the open and lock
cases for minorversion 1, but not for the delegation stateid, and not
for openstateid's with v4.0.

It doesn't *really* matter much for v4.0 or for delegation stateid's
(which never get the seqid field incremented), but we may as well do the
same for all of them.

Reported-by: Casey Bodley <cbodley@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify stateid generation code, fix wraparound

Follow the recommendation from rfc3530bis for stateid generation number
wraparound, simplify some code, and fix or remove incorrect comments.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: consolidate lock & open stateid tables

There's no reason to have two separate hash tables for open and lock
stateid's.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify distinguishing lock & open stateid's

The trick free_stateid is using is a little cheesy, and we'll have more
uses for this field later.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: remove typoed replay field

Wow, I wonder how long that typo's been there.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: fix off-by-one-error in SEQUENCE reply

The values here represent highest slotid numbers. Since slotid's are
numbered starting from zero, the highest should be one less than the
number of slots.

Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: remove include/linux/nfsd/syscall.h

We don't need this any more.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: remove redundant is_open_owner check

When called with OPEN_STATE, preprocess_seqid_op only returns an open
stateid, hence only an open owner.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: get lock checks out of preprocess_seqid_op

We've got some lock-specific code here in nfs4_preprocess_seqid_op which
is only used by nfsd4_lock(). Move it to the caller.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify lock openmode check

Note that the special handling for the lock stateid case is already done
by nfs4_check_openmode() (as of 02921914170e3b7fea1cd82dac9713685d2de5e2
"nfsd4: fix openmode checking on IO using lock stateid") so we no longer
need these two cases in the caller.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup and consolidate seqid_mutating_err

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: remove HAS_SESSION

This flag doesn't really buy us anything.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: cleanup lock/stateowner initialization

Share some common code, stop doing silly things like initializing a list
head immediately before adding it to a list, etc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: name openowner data structures more clearly

These appear to be generic (for both open and lock owners), but they're
actually just for open owners. This has confused me more than once.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: replace some macros by functions

For all the usual reasons. (Type safety, readability.)

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: stop using nfserr_resource for transitory errors

The server is returning nfserr_resource for both permanent errors and
for errors (like allocation failures) that might be resolved by retrying
later. Save nfserr_resource for the former and use delay/jukebox for
the latter.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: fix failure to end nfsd4 grace period

Even if we fail to write a recovery record, we should still mark the
client as having acquired its first state. Otherwise we leave 4.1
clients with indefinite ERR_GRACE returns.

However, an inability to write stable storage records may cause failures
of reboot recovery, and the problem should still be brought to the
server administrator's attention.

So, make sure the error is logged.

These errors shouldn't normally be triggered on a corectly functioning
server--this isn't a case where a misconfigured client could spam the
logs.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: simplify recovery dir setting

Move around some of this code, simplify a bit.

Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: prettify NFSD_MAY_* flag definitions

Acked-by: Jim Rees <rees@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: permit read opens of executable-only files

A client that wants to execute a file must be able to read it. Read
opens over nfs are therefore implicitly allowed for executable files
even when those files are not readable.

NFSv2/v3 get this right by using a passed-in NFSD_MAY_OWNER_OVERRIDE on
read requests, but NFSv4 has gotten this wrong ever since
dc730e173785e29b297aa605786c94adaffe2544 "nfsd4: fix owner-override on
open", when we realized that the file owner shouldn't override
permissions on non-reclaim NFSv4 opens.

So we can't use NFSD_MAY_OWNER_OVERRIDE to tell nfsd_permission to allow
reads of executable files.

So, do the same thing we do whenever we encounter another weird NFS
permission nit: define yet another NFSD_MAY_* flag.

The industry's future standardization on 128-bit processors will be
motivated primarily by the need for integers with enough bits for all
the NFSD_MAY_* flags.

Reported-by: Leonardo Borda <leonardoborda@gmail.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

Remove include/linux/nfsd/const.h

Userspace shouldn't have a use for these constants. Nothing here is
used outside fs/nfsd.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: remove unused defines

At least one of these is actually wrong anyway.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: it's OK to return nfserr_symlink

The nfsd4 code has a bunch of special exceptions for error returns which
map nfserr_symlink to other errors.

In fact, the spec makes it clear that nfserr_symlink is to be preferred
over less specific errors where possible.

The patch that introduced it back in 2.6.4 is "kNFSd: correct symlink
related error returns.", which claims that these special exceptions are
represent an NFSv4 break from v2/v3 tradition--when in fact the symlink
error was introduced with v4.

I suspect what happened was pynfs tests were written that were overly
faithful to the (known-incomplete) rfc3530 error return lists, and then
code was fixed up mindlessly to make the tests pass.

Delete these unnecessary exceptions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: fix incorrect comment in nfsd4_set_nfs4_acl

Zero means "I don't care what kind of file this is". And that's
probably what we want--acls are also settable at least on directories,
and if the filesystem doesn't want them on other objects, leave it to it
to complain.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: clean up nfsd_mode_check()

Add some more comments, simplify logic, do & S_IFMT just once, name
"type" more helpfully.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: open-code special directory-hardlink check

We allow the fh_verify caller to specify that any object *except* those
of a given type is allowed, by passing a negative type. But only one
caller actually uses it. Open-code that check in the one caller.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: clean up S_IS -> NF4 file type mapping

A slightly unconventional approach to make the code more compact I could
live with, but let's give the poor reader *some* chance.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: use better NUMA affinities

Use NUMA aware allocations to reduce latencies and increase throughput.

sunrpc kthreads can use kthread_create_on_node() if pool_mode is
"percpu" or "pernode", and svc_prepare_thread()/svc_init_buffer() can
also take into account NUMA node affinity for memory allocations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: "J. Bruce Fields" <bfields@fieldses.org>
CC: Neil Brown <neilb@suse.de>
CC: David Miller <davem@davemloft.net>
Reviewed-by: Greg Banks <gnb@fastmail.fm>
[bfields@redhat.com: fix up caller nfs41_callback_up]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

locks: setlease cleanup

There's an incorrect comment here. Also clean up the logic: the
"rdlease" and "wrlease" locals are confusingly named, and don't really
add anything since we can make a decision as soon as we hit one of these
cases.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

locks: fix tracking of inprogress lease breaks

We currently use a bit in fl_flags to record whether a lease is being
broken, and set fl_type to the type (RDLCK or UNLCK) that it will
eventually have. This means that once the lease break starts, we forget
what the lease's type *used* to be. Breaking a read lease will then
result in blocking read opens, even though there's no conflict--because
the lease type is now F_UNLCK and we can no longer tell whether it was
previously a read or write lease.

So, instead keep fl_type as the original type (the type which we
enforce), and keep track of whether we're unlocking or merely
downgrading by replacing the single FL_INPROGRESS flag by
FL_UNLOCK_PENDING and FL_DOWNGRADE_PENDING flags.

To get this right we also need to track separate downgrade and break
times, to handle the case where a write-leased file gets conflicting
opens first for read, then later for write.

(I first considered just eliminating the downgrade behavior
completely--nfsv4 doesn't need it, and nobody as far as I can tell
actually uses it currently--but Jeremy Allison tells me that Windows
oplocks do behave this way, so Samba will probably use this some day.)

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

locks: move F_INPROGRESS from fl_type to fl_flags field

F_INPROGRESS isn't exposed to userspace. To me it makes more sense in
fl_flags....

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

locks: minor lease cleanup

Use a helper function, to simplify upcoming changes.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: return nfserr_symlink on v4 OPEN of non-regular file

Without this, an attempt to open a device special file without first
stat'ing it will fail.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: fix seqid_mutating_error

The set of errors here does *not* agree with the set of errors specified
in the rfc!

While we're there, turn this macros into a function, for the usual
reasons, and move it to the one place where it's actually used.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: Remove check for a 32-bit cookie in nfsd4_readdir()

Fan Yong <yong.fan@whamcloud.com> noticed setting
FMODE_32bithash wouldn't work with nfsd v4, as
nfsd4_readdir() checks for 32 bit cookies. However, according to RFC 3530
cookies have a 64 bit type and cookies are also defined as u64 in
'struct nfsd4_readdir'. So remove the test for >32-bit values.

Cc: stable@kernel.org
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

Linux 3.1-rc1

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Fix build with DEBUG_PAGEALLOC enabled.