OSDN Git Service

tomoyo/tomoyo-test1.git
Trond Myklebust [Sat, 11 Apr 2020 15:37:18 +0000 (11:37 -0400)]
pNFS: Fix RCU lock leakage

Another brown paper bag moment. pnfs_alloc_ds_commits_list() is leaking
the RCU lock.
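
This is a generic illustration of the bug class, not the actual pnfs_alloc_ds_commits_list() code: every exit from a read-side critical section must drop the lock. The sketch uses hypothetical sketch_read_lock()/sketch_read_unlock() helpers. A depth counter stands in for rcu_read_lock()/rcu_read_unlock(), so a leaked lock shows up as a nonzero depth.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a read-side critical section. The depth
 * counter only exists so a leaked lock is easy to detect here; real
 * RCU read locks are not implemented like this. */
static int sketch_read_lock_depth;

static void sketch_read_lock(void)
{
	sketch_read_lock_depth++;
}

static void sketch_read_unlock(void)
{
	sketch_read_lock_depth--;
}

/* Search under the lock. Breaking out of the loop, rather than
 * returning from inside it, guarantees every exit passes through
 * sketch_read_unlock(). */
static const int *find_under_lock(const int *arr, size_t n, int key)
{
	const int *found = NULL;

	sketch_read_lock();
	for (size_t i = 0; i < n; i++) {
		if (arr[i] == key) {
			found = &arr[i];
			break;	/* "return &arr[i];" here would leak the lock */
		}
	}
	sketch_read_unlock();
	return found;
}
```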

Fixes: a9901899b649 ("pNFS: Add infrastructure for cleaning up per-layout commit structures")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Mon, 6 Apr 2020 17:39:29 +0000 (13:39 -0400)]
NFS: Clean up process of marking inode stale.

Instead of the various open coded calls to set the NFS_INO_STALE bit
and call nfs_zap_caches(), consolidate them into a single function
nfs_set_inode_stale().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Sat, 4 Apr 2020 23:52:21 +0000 (19:52 -0400)]
SUNRPC: Don't start a timer on an already queued rpc task

Move the test for whether a task is already queued to prevent
corruption of the timer list in __rpc_sleep_on_priority_timeout().
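
This minimal sketch shows why the ordering matters. It assumes a simplified intrusive list in place of the kernel's timer list and its queued-task check. struct task and task_start_timer() are hypothetical names. Linking a node that is already on a list corrupts the list, so the "already queued" test has to come first.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, loosely modeled on the
 * kernel's struct list_head (simplified, for illustration only). */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static size_t list_count(const struct node *head)
{
	size_t count = 0;

	for (const struct node *p = head->next; p != head; p = p->next)
		count++;
	return count;
}

/* Hypothetical task: its timer entry is unlinked while next == NULL. */
struct task {
	struct node timer_link;
};

/* Adding a node that is already linked would corrupt the list, so the
 * "already queued" test must come before any timer-list manipulation. */
static bool task_start_timer(struct task *t, struct node *timer_list)
{
	if (t->timer_link.next != NULL)
		return false;	/* already queued: leave the list alone */
	list_add_tail(&t->timer_link, timer_list);
	return true;
}
```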

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 2 Apr 2020 19:37:02 +0000 (15:37 -0400)]
NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()

When we're sending a layoutreturn, ensure that we reference the
layout cred atomically with the copy of the stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 2 Apr 2020 19:47:08 +0000 (15:47 -0400)]
NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()

Ensure that the dereference of the layout cred is atomic with the
stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 2 Apr 2020 19:27:09 +0000 (15:27 -0400)]
NFS: Beware when dereferencing the delegation cred

When we look up the delegation cred, we are usually doing so in
conjunction with a read of the stateid, and we want to ensure
that the lookup is atomic with that read.
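
This is a userspace sketch of the idea. It uses a pthread mutex in place of whatever locking the kernel actually uses, and struct delegation, delegation_snapshot() and delegation_update() are hypothetical. Reader and writer both touch the stateid and the cred under the same lock, so a reader never sees a mismatched pair.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical, simplified delegation: a stateid and a credential
 * pointer that another thread may replace together. */
struct delegation {
	pthread_mutex_t lock;
	unsigned char stateid[16];
	const void *cred;
};

/* Copy the stateid and pick up the cred under the same lock, so the
 * caller never pairs a new stateid with an old cred (or vice versa).
 * A real implementation would also take a reference on the cred. */
static const void *delegation_snapshot(struct delegation *d,
				       unsigned char stateid_out[16])
{
	const void *cred;

	pthread_mutex_lock(&d->lock);
	memcpy(stateid_out, d->stateid, sizeof(d->stateid));
	cred = d->cred;
	pthread_mutex_unlock(&d->lock);
	return cred;
}

/* Writer side: update both fields under the same lock. */
static void delegation_update(struct delegation *d,
			      const unsigned char stateid[16],
			      const void *cred)
{
	pthread_mutex_lock(&d->lock);
	memcpy(d->stateid, stateid, sizeof(d->stateid));
	d->cred = cred;
	pthread_mutex_unlock(&d->lock);
}
```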

Fixes: 57f188e04773 ("NFSv4: nfs_update_inplace_delegation() should update delegation cred")
[sfr@canb.auug.org.au: Fixed up borken Fixes: line from Trond :-)]
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 2 Apr 2020 16:37:25 +0000 (12:37 -0400)]
NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout

Setting nfs_mountpoint_expiry_timeout to a negative value stops
mountpoint expiration, while setting it to a positive value restarts
the scheduler.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 2 Apr 2020 14:34:36 +0000 (10:34 -0400)]
NFS: finish_automount() requires us to hold 2 refs to the mount record

We must not return from nfs_d_automount() without holding 2 references
to the mount record. Doing so will trigger the BUG() in finish_automount().
Also ensure that we don't try to reschedule the automount timer with
a negative or zero timeout value.

Fixes: 22a1ae9a93fb ("NFS: If nfs_mountpoint_expiry_timeout < 0, do not expire submounts")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Scott Mayhew [Thu, 2 Apr 2020 21:20:44 +0000 (17:20 -0400)]
NFS: Fix a few constant_table array definitions

nfs_vers_tokens, nfs_xprt_protocol_tokens, and nfs_secflavor_tokens were
all missing an empty item at the end of the array, allowing
lookup_constant() to potentially walk off the end and trigger an oops.
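
This sketch shows the failure mode using a hypothetical struct name_value rather than the kernel's struct constant_table. The lookup loop relies on a terminating entry with a NULL name. Without that entry, the loop keeps reading past the end of the array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value table entry; not the kernel's actual
 * struct constant_table definition. */
struct name_value {
	const char *name;
	int value;
};

/* Stops at the first entry whose name is NULL, so every table MUST
 * end with an empty sentinel entry or the loop runs off the end. */
static int lookup_value(const struct name_value *tbl, const char *name,
			int not_found)
{
	for (; tbl->name != NULL; tbl++)
		if (strcmp(tbl->name, name) == 0)
			return tbl->value;
	return not_found;
}

/* Hypothetical token table with arbitrary values, correctly terminated. */
static const struct name_value xprt_tokens[] = {
	{ "tcp",  1 },
	{ "udp",  2 },
	{ "rdma", 3 },
	{ NULL,   0 },	/* the sentinel the fixed tables were missing */
};
```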

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: e38bb238ed8c ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
Cc: stable@vger.kernel.org # v5.6
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Tue, 31 Mar 2020 00:57:49 +0000 (20:57 -0400)]
NFS: Try to join page groups before an O_DIRECT retransmission

If we have to retransmit requests, try to join their page groups
first.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Mon, 30 Mar 2020 16:40:47 +0000 (12:40 -0400)]
NFS: Refactor nfs_lock_and_join_requests()

Refactor nfs_lock_and_join_requests() in order to separate out the
subrequest merging into its own function nfs_lock_and_join_group()
that can be used by O_DIRECT.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Tue, 31 Mar 2020 22:27:26 +0000 (18:27 -0400)]
NFS: Reverse the submission order of requests in __nfs_pageio_add_request()

If we have to split the request up into subrequests, we have to submit
the request pointed to by the function call parameter last, in case
there is an error or other issue that causes us to exit before the
last request is submitted. The reason is that the caller is expected
to perform cleanup in those cases.
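
This sketch illustrates the ordering rule with a hypothetical submit callback and integer request ids, where 0 is the caller's original request. If a subrequest fails, the function returns before the original request is submitted. That leaves the original request for the caller to clean up. The recorder type is only there for demonstration.

```c
/* Hypothetical submit callback: returns 0 on success, nonzero on error. */
typedef int (*submit_fn)(int req_id, void *ctx);

/* Submit the split-off subrequests (ids 1..nr-1) first and the
 * caller's original request (id 0) last. On error we return before
 * id 0 is submitted, so the caller still owns it and can clean up. */
static int submit_split_request(int nr, submit_fn submit, void *ctx)
{
	for (int i = 1; i < nr; i++) {
		int err = submit(i, ctx);
		if (err)
			return err;
	}
	return submit(0, ctx);
}

/* Illustrative callback: records submission order and fails on a
 * chosen id (fail_id < 0 means never fail). */
struct recorder {
	int order[16];
	int count;
	int fail_id;
};

static int record_submit(int req_id, void *ctx)
{
	struct recorder *rec = ctx;

	if (req_id == rec->fail_id)
		return -1;
	if (rec->count < 16)
		rec->order[rec->count++] = req_id;
	return 0;
}
```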

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Mon, 30 Mar 2020 15:12:16 +0000 (11:12 -0400)]
NFS: Clean up nfs_lock_and_join_requests()

Clean up nfs_lock_and_join_requests() to simplify the calculation
of the range covered by the page group, taking into account the
presence of mirrors.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Mon, 30 Mar 2020 00:03:33 +0000 (20:03 -0400)]
NFS: Remove the redundant function nfs_pgio_has_mirroring()

We need to trust that desc->pg_mirror_idx is set correctly, whether
or not mirroring is enabled.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Mon, 30 Mar 2020 00:06:45 +0000 (20:06 -0400)]
NFS: Fix memory leaks in nfs_pageio_stop_mirroring()

If we just set the mirror count to 1 without first clearing out
the mirrors, we can leak queued up requests.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Tue, 31 Mar 2020 00:13:48 +0000 (20:13 -0400)]
NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()

nfs_direct_write_scan_commit_list() will lock the request and bump
the reference count, but we also need to account for the reference
that was taken when we initially added the request to the commit list.

Fixes: fb5f7f20cdb9 ("NFS: commit errors should be fatal")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Sun, 29 Mar 2020 23:55:05 +0000 (19:55 -0400)]
NFS: Fix use-after-free issues in nfs_pageio_add_request()

We need to ensure that we create the mirror requests before calling
nfs_pageio_add_request_mirror() on the request we are adding.
Otherwise, we can end up with a use-after-free if the call to
nfs_pageio_add_request_mirror() triggers I/O.

Fixes: c917cfaf9bbe ("NFS: Fix up NFS I/O subrequest creation")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Wed, 1 Apr 2020 17:04:49 +0000 (13:04 -0400)]
NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()

When a subrequest is being detached from the subgroup, we want to
ensure that it is not holding the group lock, or in the process
of waiting for the group lock.

Fixes: 5b2b5187fa85 ("NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Wed, 1 Apr 2020 14:07:16 +0000 (10:07 -0400)]
NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()

When we detach a subrequest from the list, we must also release the
reference it holds to the parent.

Fixes: 5b2b5187fa85 ("NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Sat, 28 Mar 2020 16:01:17 +0000 (12:01 -0400)]
Merge tag 'nfs-rdma-for-5.7-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

NFSoRDMA Client Updates for Linux 5.7

New Features:
- Allow one active connection and several zombie connections to prevent
  blocking if the remote server is unresponsive.

Bugfixes and Cleanups:
- Enhance MR-related trace points
- Refactor connection set-up and disconnect functions
- Make Protection Domains per-connection instead of per-transport
- Merge struct rpcrdma_ia into rpcrdma_ep

Trond Myklebust [Sat, 28 Mar 2020 15:39:29 +0000 (11:39 -0400)]
NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()

If the FLUSH_SYNC flag is set, nfs_initiate_pgio() will currently
wait for completion, and then return the status of the I/O operation.
What we actually want to report in nfs_pageio_doio() is whether or
not the RPC call was launched successfully, whereas the actual I/O
status is intended to be handled in the reply callbacks.

Since FLUSH_SYNC is never set by any of the callers anyway, let's
just remove that code altogether.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Mon, 23 Mar 2020 19:18:12 +0000 (15:18 -0400)]
pNFS/flexfiles: Specify the layout segment range in LAYOUTGET

Move from requesting only full file layout segments, to requesting
layout segments that match our I/O size. This means the server is
still free to return a full file layout, but we will no longer
error out if it does not.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Mon, 23 Mar 2020 18:33:11 +0000 (14:33 -0400)]
pNFS/flexfiles: remove requirement for whole file layouts

Remove the requirement that the server always sends whole file
layouts.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Mon, 23 Mar 2020 18:48:23 +0000 (14:48 -0400)]
pNFS/flexfiles: Check the layout segment range before doing I/O

When starting to read or write with a layout segment, check that the
range matches our request.
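
This sketch shows such a range check. struct byte_range and segment_covers() are hypothetical names. It also assumes that an all-ones length means "to end of file". The I/O is allowed only if it lies entirely inside the segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed "to end of file" length marker for this sketch. */
#define LENGTH_TO_EOF UINT64_MAX

struct byte_range {
	uint64_t offset;
	uint64_t length;
};

/* Exclusive end of a range, saturating at UINT64_MAX. */
static uint64_t range_end(const struct byte_range *r)
{
	if (r->length == LENGTH_TO_EOF || r->offset > UINT64_MAX - r->length)
		return UINT64_MAX;
	return r->offset + r->length;
}

/* True if the I/O range lies entirely within the layout segment. */
static bool segment_covers(const struct byte_range *seg,
			   const struct byte_range *io)
{
	return io->offset >= seg->offset && range_end(io) <= range_end(seg);
}
```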

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Mon, 23 Mar 2020 19:40:20 +0000 (15:40 -0400)]
pNFS/flexfile: Don't merge layout segments if the mirrors don't match

Check that the number of mirrors and the mirror information match
before deciding to merge layout segments in pNFS/flexfiles.
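
This is a simplified sketch of the comparison. struct mirror_info and struct segment are hypothetical. They keep only a data-server address and a file handle per mirror, which is far less than the real flexfiles mirror information.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified mirror description. */
struct mirror_info {
	const char *ds_addr;	/* data server address */
	const char *fh;		/* file handle on that server */
};

struct segment {
	size_t mirror_count;
	const struct mirror_info *mirrors;
};

/* Merging is only allowed when both segments have the same number of
 * mirrors and every mirror entry matches, position by position. */
static bool mirrors_match(const struct segment *a, const struct segment *b)
{
	if (a->mirror_count != b->mirror_count)
		return false;
	for (size_t i = 0; i < a->mirror_count; i++) {
		if (strcmp(a->mirrors[i].ds_addr, b->mirrors[i].ds_addr) != 0 ||
		    strcmp(a->mirrors[i].fh, b->mirrors[i].fh) != 0)
			return false;
	}
	return true;
}
```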

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Sun, 22 Mar 2020 20:08:55 +0000 (16:08 -0400)]
NFS/pNFS: Fix pnfs_layout_mark_request_commit() invalid layout segment handling

Fix up pnfs_layout_mark_request_commit() to always reschedule the write
if the layout segment is invalid. Also do some minor cleanup.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Sun, 22 Mar 2020 18:47:38 +0000 (14:47 -0400)]
NFS/pNFS: Simplify bucket layout segment reference counting

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Sat, 21 Mar 2020 15:13:05 +0000 (11:13 -0400)]
NFS/pNFS: Clean up pNFS commit operations

Move the pNFS commit related operations into a separate structure
that can be carried by the pnfs_ds_commit_info.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Sat, 21 Mar 2020 13:50:05 +0000 (09:50 -0400)]
NFS: Remove bucket array from struct pnfs_ds_commit_info

Remove the unused bucket array in struct pnfs_ds_commit_info.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Fri, 20 Mar 2020 23:24:19 +0000 (19:24 -0400)]
NFS/pNFS: Add a helper pnfs_generic_search_commit_reqs()

Lift filelayout_search_commit_reqs() into the generic pnfs/nfs code,
and add support for commit arrays.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Fri, 20 Mar 2020 22:34:33 +0000 (18:34 -0400)]
pNFS: Enable per-layout segment commit structures

Enable adding and lookup of per-layout segment commits in filelayout
and flexfilelayout.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Fri, 20 Mar 2020 20:04:06 +0000 (16:04 -0400)]
pNFS: Add infrastructure for cleaning up per-layout commit structures

Ensure that both the file and flexfiles layout types clean up when
freeing the layout segments.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Fri, 20 Mar 2020 23:06:48 +0000 (19:06 -0400)]
NFS/pNFS: Support commit arrays in nfs_clear_pnfs_ds_commit_verifiers()

Add support for scanning the full list of per-layout segment commit
arrays to nfs_clear_pnfs_ds_commit_verifiers().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Sat, 21 Mar 2020 13:27:46 +0000 (09:27 -0400)]
NFS: Fix O_DIRECT commit verifier handling

Instead of trying to save the commit verifiers and checking them against
previous writes, adopt the same strategy as for buffered writes: just
check the verifiers at commit time.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Sat, 21 Mar 2020 13:36:13 +0000 (09:36 -0400)]
NFS: commit errors should be fatal

Fix the O_DIRECT code to avoid retries if the COMMIT fails with a fatal
error.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Fri, 20 Mar 2020 21:08:02 +0000 (17:08 -0400)]
NFS/pNFS: Allow O_DIRECT to release the DS commitinfo

Add a pNFS callback to allow the O_DIRECT code to release the DS
commitinfo when freeing the dreq.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 19 Mar 2020 23:31:00 +0000 (19:31 -0400)]
pNFS: Support per-layout segment commits in pnfs_generic_commit_pagelist()

Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_commit_pagelist().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 19 Mar 2020 19:48:42 +0000 (15:48 -0400)]
pNFS: Support per-layout segment commits in pnfs_generic_recover_commit_reqs()

Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_recover_commit_reqs().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 19 Mar 2020 17:41:08 +0000 (13:41 -0400)]
NFSv4/pNFS: Scan the full list of commit arrays when committing

Add support for scanning the full list of per-layout segment commit
arrays to pnfs_generic_scan_commit_lists().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 19 Mar 2020 17:36:36 +0000 (13:36 -0400)]
NFSv4/pnfs: Support a list of commit arrays in struct pnfs_ds_commit_info

When we have multiple layout segments with different lists of mirrored
data, we need to track the commits on a per layout segment basis.
This patch adds a list to support this tracking in struct
pnfs_ds_commit_info.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Chuck Lever [Fri, 21 Feb 2020 22:01:05 +0000 (17:01 -0500)]
xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt

Change the rpcrdma_xprt_disconnect() function so that it no longer
waits for the DISCONNECTED event.  This prevents blocking if the
remote is unresponsive.

In rpcrdma_xprt_disconnect(), the transport's rpcrdma_ep is
detached. Upon return from rpcrdma_xprt_disconnect(), the transport
(r_xprt) is ready immediately for a new connection.

The RDMA_CM_DEVICE_REMOVAL and RDMA_CM_DISCONNECTED events are now
handled almost identically.

However, because the lifetimes of rpcrdma_xprt structures and
rpcrdma_ep structures are now independent, creating an rpcrdma_ep
needs to take a module ref count. The ep now owns most of the
hardware resources for a transport.

Also, a kref is needed to ensure that rpcrdma_ep sticks around
long enough for the cm_event_handler to finish.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 21 Feb 2020 22:01:00 +0000 (17:01 -0500)]
xprtrdma: Extract sockaddr from struct rdma_cm_id

rpcrdma_cm_event_handler() is always passed an @id pointer that is
valid. However, in a subsequent patch, we won't be able to extract
an r_xprt in every case. So instead of using the r_xprt's
presentation address strings, extract them from struct rdma_cm_id.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 21 Feb 2020 22:00:54 +0000 (17:00 -0500)]
xprtrdma: Merge struct rpcrdma_ia into struct rpcrdma_ep

I eventually want to allocate rpcrdma_ep separately from struct
rpcrdma_xprt so that on occasion there can be more than one ep per
xprt.

The new struct rpcrdma_ep will contain all the fields currently in
rpcrdma_ia and in rpcrdma_ep. These are all the device and CM settings
for the connection, in addition to per-connection settings
negotiated with the remote.

Take this opportunity to rename the existing ep fields from rep_* to
re_* to disambiguate these from struct rpcrdma_rep.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 21 Feb 2020 22:00:49 +0000 (17:00 -0500)]
xprtrdma: Disconnect on flushed completion

Completion errors after a disconnect often occur much sooner than a
CM_DISCONNECT event. Use this to try to detect connection loss more
quickly.

Note that other kernel ULPs do take care to disconnect explicitly
when a WR is flushed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 21 Feb 2020 22:00:44 +0000 (17:00 -0500)]
xprtrdma: Remove rpcrdma_ia::ri_flags

Clean up:
The upper layer serializes calls to xprt_rdma_close, so there is no
need for an atomic bit operation, saving 8 bytes in rpcrdma_ia.

This enables merging rpcrdma_ia_remove directly into the disconnect
logic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 21 Feb 2020 22:00:38 +0000 (17:00 -0500)]
xprtrdma: Invoke rpcrdma_ia_open in the connect worker

Move rdma_cm_id creation into rpcrdma_ep_create() so that it is now
responsible for allocating all per-connection hardware resources.

With this clean-up, all three arms of the switch statement in
rpcrdma_ep_connect are exactly the same now, thus the switch can be
removed.

Because device removal behaves a little differently than
disconnection, there is a little more work to be done before
rpcrdma_ep_destroy() can release the connection's rdma_cm_id. So
it is not quite symmetrical with rpcrdma_ep_create() yet.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 21 Feb 2020 22:00:33 +0000 (17:00 -0500)]
xprtrdma: Allocate Protection Domain in rpcrdma_ep_create()

Make a Protection Domain (PD) a per-connection resource rather than
a per-transport resource. In other words, when the connection
terminates, the PD is destroyed.

Thus there is one less HW resource that remains allocated to a
transport after a connection is closed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 21 Feb 2020 22:00:28 +0000 (17:00 -0500)]
xprtrdma: Refactor rpcrdma_ep_connect() and rpcrdma_ep_disconnect()

Clean up: Simplify the synopses of functions in the connect and
disconnect paths in preparation for combining the struct rpcrdma_ia and
struct rpcrdma_ep structures.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 21 Feb 2020 22:00:23 +0000 (17:00 -0500)]
xprtrdma: Clean up the post_send path

Clean up: Simplify the synopses of functions in the post_send path
by combining the struct rpcrdma_ia and struct rpcrdma_ep arguments.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 21 Feb 2020 22:00:17 +0000 (17:00 -0500)]
xprtrdma: Refactor frwr_init_mr()

Clean up: prepare for combining the rpcrdma_ia and rpcrdma_ep
structures. Take the opportunity to rename the function to be
consistent with the "subsystem _ object _ verb" naming scheme.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Fri, 21 Feb 2020 22:00:12 +0000 (17:00 -0500)]
xprtrdma: Invoke rpcrdma_ep_create() in the connect worker

Refactor rpcrdma_ep_create(), rpcrdma_ep_disconnect(), and
rpcrdma_ep_destroy().

rpcrdma_ep_create will be invoked at connect time instead of at
transport set-up time. It will be responsible for allocating per-
connection resources. In this patch it allocates the CQs and
creates a QP. More to come.

rpcrdma_ep_destroy() is the inverse functionality that is
invoked at disconnect time. It will be responsible for releasing
the CQs and QP.

These changes should be safe because connect and disconnect are
both guaranteed to be serialized by the transport send lock.

This takes us another step closer to resolving the address and route
only at connect time so that connection failover to another device
will work correctly.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Chuck Lever [Wed, 12 Feb 2020 16:12:35 +0000 (11:12 -0500)]
xprtrdma: Enhance MR-related trace points

Two changes:
- Show the number of SG entries that were mapped. This helps debug
  DMA-related problems.
- Record the MR's resource ID instead of its memory address. This
  groups each MR with its associated rdma-tool output, and reduces
  needless exposure of memory addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Trond Myklebust [Wed, 18 Mar 2020 21:22:47 +0000 (17:22 -0400)]
pNFS: Add a helper to allocate the array of buckets

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 19 Mar 2020 21:29:12 +0000 (17:29 -0400)]
NFS/pNFS: Refactor pnfs_generic_commit_pagelist()

Refactor pnfs_generic_commit_pagelist() to simplify the conversion
to layout segment based commit lists.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Trond Myklebust [Thu, 19 Mar 2020 14:13:05 +0000 (10:13 -0400)]
pNFS/flexfiles: Simplify allocation of the mirror array

Just allocate the array at the end of the layout segment structure,
instead of allocating it as a separate array of pointers.
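
This sketch shows the allocation pattern using a C99 flexible array member. The names are hypothetical. It also assumes that the trailing array holds mirror pointers. The point is that one calloc() and one free() cover both the segment and its array.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror type; its contents don't matter here. */
struct mirror;

/* The array of mirror pointers is a flexible array member at the end
 * of the segment, so one allocation (and one free()) covers both. */
struct layout_segment {
	size_t mirror_count;
	struct mirror *mirrors[];
};

static struct layout_segment *segment_alloc(size_t mirror_count)
{
	struct layout_segment *seg;

	/* Guard against overflow in the size computation. */
	if (mirror_count > (SIZE_MAX - sizeof(*seg)) / sizeof(seg->mirrors[0]))
		return NULL;
	seg = calloc(1, sizeof(*seg) + mirror_count * sizeof(seg->mirrors[0]));
	if (seg)
		seg->mirror_count = mirror_count;
	return seg;
}
```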

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Olga Kornievskaia [Thu, 26 Mar 2020 14:24:51 +0000 (10:24 -0400)]
SUNRPC: fix krb5p mount to provide large enough buffer in rq_rcvsize

Commit 2c94b8eca1a2 ("SUNRPC: Use au_rslack when computing reply
buffer size") changed how "req->rq_rcvsize" is calculated. It used to
use the au_cslack value, which was nice and large, and was changed to
use the au_rslack value, which turns out to be too small.

Since 5.1, a v3 mount with sec=krb5p fails against an Ontap server
because the client's receive buffer is too small.

For gss krb5p, we need to account for the mic token in the verifier,
and for the wrap token in the wrapped reply data.

RFC 4121 defines:
mic token
         Octet no Name       Description
         --------------------------------------------------------------
         0..1     TOK_ID     Identification field.  Tokens emitted by
                             GSS_GetMIC() contain the hex value 04 04
                             expressed in big-endian order in this
                             field.
         2        Flags      Attributes field, as described in section
                             4.2.2.
         3..7     Filler     Contains five octets of hex value FF.
         8..15    SND_SEQ    Sequence number field in clear text,
                             expressed in big-endian order.
         16..last SGN_CKSUM  Checksum of the "to-be-signed" data and
                             octet 0..15, as described in section 4.2.4.

That's 16 bytes (GSS_KRB5_TOK_HDR_LEN) plus the checksum.

wrap token
          Octet no Name      Description
         --------------------------------------------------------------
          0..1     TOK_ID    Identification field.  Tokens emitted by
                             GSS_Wrap() contain the hex value 05 04
                             expressed in big-endian order in this
                             field.
          2        Flags     Attributes field, as described in section
                             4.2.2.
          3        Filler    Contains the hex value FF.
          4..5     EC        Contains the "extra count" field, in big-
                             endian order as described in section 4.2.3.
          6..7     RRC       Contains the "right rotation count" in big-
                             endian order, as described in section
                             4.2.5.
          8..15    SND_SEQ   Sequence number field in clear text,
                             expressed in big-endian order.
          16..last Data      Encrypted data for Wrap tokens with
                             confidentiality, or plaintext data followed
                             by the checksum for Wrap tokens without
                             confidentiality, as described in section
                             4.2.4.

Also 16 bytes of header (GSS_KRB5_TOK_HDR_LEN), plus the encrypted data
and checksum (and other things, such as padding).

RFC 3961 defines known cksum sizes:
   Checksum type              sumtype        checksum         section or
                                value            size         reference
   ---------------------------------------------------------------------
   CRC32                            1               4           6.1.3
   rsa-md4                          2              16           6.1.2
   rsa-md4-des                      3              24           6.2.5
   des-mac                          4              16           6.2.7
   des-mac-k                        5               8           6.2.8
   rsa-md4-des-k                    6              16           6.2.6
   rsa-md5                          7              16           6.1.1
   rsa-md5-des                      8              24           6.2.4
   rsa-md5-des3                     9              24             ??
   sha1 (unkeyed)                  10              20             ??
   hmac-sha1-des3-kd               12              20            6.3
   hmac-sha1-des3                  13              20             ??
   sha1 (unkeyed)                  14              20             ??
   hmac-sha1-96-aes128             15              20         [KRB5-AES]
   hmac-sha1-96-aes256             16              20         [KRB5-AES]
   [reserved]                  0x8003               ?         [GSS-KRB5]

The Linux kernel now mainly supports types 15 and 16, so the maximum
checksum size is 20 bytes (GSS_KRB5_MAX_CKSUM_LEN).

Re-use the already existing define GSS_KRB5_MAX_SLACK_NEEDED, which is
used for encoding the gss_wrap tokens (the same tokens are used in the reply).
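
This small sketch works through the arithmetic above. It takes a subset of the RFC 3961 rows, finds the maximum checksum size for a set of supported sumtypes, and adds the header length to get a lower bound on per-token overhead. The helper names are hypothetical. Padding and the encrypted payload are deliberately not counted, so this does not reproduce GSS_KRB5_MAX_SLACK_NEEDED.

```c
#include <stddef.h>

#define TOK_HDR_LEN 16	/* GSS_KRB5_TOK_HDR_LEN, per the text above */

/* A few (sumtype, checksum size) rows from the RFC 3961 table above. */
static const struct { int sumtype; size_t size; } cksum_sizes[] = {
	{ 1, 4 }, { 3, 24 }, { 7, 16 }, { 12, 20 }, { 15, 20 }, { 16, 20 },
};

/* Largest checksum among the given sumtypes (0 if none are known). */
static size_t max_cksum_len(const int *types, size_t ntypes)
{
	size_t max = 0;
	size_t nrows = sizeof(cksum_sizes) / sizeof(cksum_sizes[0]);

	for (size_t i = 0; i < ntypes; i++)
		for (size_t j = 0; j < nrows; j++)
			if (cksum_sizes[j].sumtype == types[i] &&
			    cksum_sizes[j].size > max)
				max = cksum_sizes[j].size;
	return max;
}

/* Header plus checksum: a lower bound on per-token overhead. Wrap
 * tokens also carry padding and the encrypted payload, not counted. */
static size_t token_overhead(size_t cksum_len)
{
	return TOK_HDR_LEN + cksum_len;
}
```

With only types 15 and 16 supported, the maximum checksum is 20 bytes and the header-plus-checksum overhead is 36 bytes per token. Including a DES-based type such as 3 would raise the maximum to 24.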

Fixes: 2c94b8eca1a2 ("SUNRPC: Use au_rslack when computing reply buffer size")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Petr Vorel [Tue, 24 Mar 2020 20:08:49 +0000 (21:08 +0100)]
NFS: Don't specify NFS version in "UDP not supported" error

UDP was originally disabled for NFSv4 in commit 6da1a034362f. Later,
commit b24ee6c64ca7 disabled UDP by default for all NFS versions via
NFS_DISABLE_UDP_SUPPORT=y. Therefore, remove "v4" from the error message.

Fixes: b24ee6c64ca7 ("NFS: allow deprecation of NFS UDP protocol")

Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Liwei Song [Wed, 25 Mar 2020 03:50:13 +0000 (11:50 +0800)]
nfsroot: set tcp as the default transport protocol

UDP is disabled by default since commit b24ee6c64ca7 ("NFS: allow
deprecation of NFS UDP protocol"), but the default mount option is
still udp. Change it to tcp to avoid the "Unsupported transport
protocol udp" error if no protocol is specified when mounting NFS.

Fixes: b24ee6c64ca7 ("NFS: allow deprecation of NFS UDP protocol")
Signed-off-by: Liwei Song <liwei.song@windriver.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Misono Tomohiro [Wed, 28 Aug 2019 08:01:22 +0000 (17:01 +0900)]
NFS: direct.c: Fix memory leak of dreq when nfs_get_lock_context fails

When dreq is allocated by nfs_direct_req_alloc(), dreq->kref is
initialized to 2. Therefore we need to call nfs_direct_req_release()
twice to release the allocated dreq. Usually it is called in
nfs_file_direct_{read, write}() and nfs_direct_complete().

However, the current code only calls nfs_direct_req_release() once if
nfs_get_lock_context() fails in nfs_file_direct_{read, write}(), so
that case results in a memory leak.

Fix this by adding the missing call.
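The reference-counting rule described above can be sketched in userspace C. This is a minimal model under stated assumptions: struct direct_req, direct_req_alloc(), direct_req_release() and the live_reqs counter are hypothetical stand-ins, not the kernel's struct nfs_direct_req or struct kref API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model of the dreq lifetime. */
struct direct_req {
	int refcount;          /* stands in for dreq->kref */
};

static int live_reqs;      /* allocations that have not been freed yet */

static struct direct_req *direct_req_alloc(void)
{
	struct direct_req *dreq = malloc(sizeof(*dreq));

	if (!dreq)
		return NULL;
	/* One reference for the submitter, one for the completion path. */
	dreq->refcount = 2;
	live_reqs++;
	return dreq;
}

static void direct_req_release(struct direct_req *dreq)
{
	assert(dreq->refcount > 0);
	if (--dreq->refcount == 0) {
		free(dreq);
		live_reqs--;
	}
}

/* Models the error path taken when getting the lock context fails:
 * completion never runs, so both references must be dropped here.
 * Returns -1 to model the error return. */
static int direct_io_lock_context_fails(bool fixed)
{
	struct direct_req *dreq = direct_req_alloc();

	if (!dreq)
		return -1;
	direct_req_release(dreq);         /* the release that already existed */
	if (fixed)
		direct_req_release(dreq); /* the missing second release */
	return -1;
}
```

With fixed set to false, the object is left with a reference count of 1 and is never freed, which is the leak the patch addresses.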

Signed-off-by: Misono Tomohiro <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agonfs: Fix up documentation in nfs_follow_referral() and nfs_do_submount()
Trond Myklebust [Mon, 16 Mar 2020 15:37:31 +0000 (11:37 -0400)]
nfs: Fix up documentation in nfs_follow_referral() and nfs_do_submount()

Fallout from the mount patches.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoSUNRPC: Trim stack utilization in the wrap and unwrap paths
Chuck Lever [Wed, 11 Mar 2020 15:21:17 +0000 (11:21 -0400)]
SUNRPC: Trim stack utilization in the wrap and unwrap paths

By preventing compiler inlining of the integrity and privacy
helpers, stack utilization for the common case (authentication only)
goes way down.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoSUNRPC: Remove xdr_buf_read_mic()
Chuck Lever [Wed, 11 Mar 2020 15:21:12 +0000 (11:21 -0400)]
SUNRPC: Remove xdr_buf_read_mic()

Clean up: this function is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agosunrpc: Fix gss_unwrap_resp_integ() again
Chuck Lever [Wed, 11 Mar 2020 15:21:07 +0000 (11:21 -0400)]
sunrpc: Fix gss_unwrap_resp_integ() again

xdr_buf_read_mic() tries to find unused contiguous space in a
received xdr_buf in order to linearize the checksum for the call
to gss_verify_mic. However, the corner cases in this code are
numerous and we seem to keep missing them. I've just hit yet
another buffer overrun related to it.

This overrun is at the end of xdr_buf_read_mic():

        if (buf->tail[0].iov_len != 0)
                mic->data = buf->tail[0].iov_base + buf->tail[0].iov_len;
        else
                mic->data = buf->head[0].iov_base + buf->head[0].iov_len;
        __read_bytes_from_xdr_buf(&subbuf, mic->data, mic->len);
        return 0;

This logic assumes the transport has set the length of the tail
based on the size of the received message. base + len is then
supposed to be off the end of the message but still within the
actual buffer.

In fact, the length of the tail is set by the upper layer when the
Call is encoded so that the end of the tail is actually the end of
the allocated buffer itself. This causes the logic above to set
mic->data to point past the end of the receive buffer.

The "mic->data = head" arm of this if statement is no less fragile.

As near as I can tell, this has been a problem forever. I'm not sure
that minimizing au_rslack recently changed this pathology much.

So instead, let's use a more straightforward approach: kmalloc a
separate buffer to linearize the checksum. This is similar to
how gss_validate() currently works.
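A minimal userspace sketch of the separate-buffer approach, assuming a simplified three-segment message (head, page data, tail) in place of struct xdr_buf. The names msg_buf, read_bytes() and read_mic() are hypothetical, and malloc() stands in for kmalloc(). The point is that the checksum is copied segment by segment into memory the code owns, rather than into whatever space appears to follow the received data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct xdr_buf (hypothetical). */
struct seg {
	const uint8_t *base;
	size_t len;
};

struct msg_buf {
	struct seg segs[3];    /* head, page data, tail */
};

/* Copy len bytes starting at logical offset off, crossing segment
 * boundaries as needed. Returns 0 on success, -1 if out of range. */
static int read_bytes(const struct msg_buf *buf, size_t off,
		      uint8_t *dst, size_t len)
{
	for (int i = 0; i < 3 && len > 0; i++) {
		const struct seg *s = &buf->segs[i];

		if (off >= s->len) {
			off -= s->len;
			continue;
		}
		size_t n = s->len - off;
		if (n > len)
			n = len;
		memcpy(dst, s->base + off, n);
		dst += n;
		len -= n;
		off = 0;
	}
	return len == 0 ? 0 : -1;
}

/* Read an XDR opaque (4-byte big-endian length, then data) at off into
 * a freshly allocated buffer. On success the caller frees *out. */
static int read_mic(const struct msg_buf *buf, size_t off,
		    uint8_t **out, uint32_t *out_len)
{
	uint8_t be[4];

	if (read_bytes(buf, off, be, 4))
		return -1;
	uint32_t len = (uint32_t)be[0] << 24 | (uint32_t)be[1] << 16 |
		       (uint32_t)be[2] << 8 | be[3];
	uint8_t *p = malloc(len ? len : 1);
	if (!p)
		return -1;
	if (read_bytes(buf, off + 4, p, len)) {
		free(p);     /* length field claims more data than we have */
		return -1;
	}
	*out = p;
	*out_len = len;
	return 0;
}
```

Because the length is checked against the data actually present, a checksum that straddles the head, pages and tail is reassembled correctly, and a bogus length fails cleanly instead of overrunning a buffer.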

Coming back to this code, I had some trouble understanding what
was going on. So I've cleaned up the variable naming and added
a few comments that point back to the XDR definition in RFC 2203
to help guide future spelunkers, including myself.

As an added clean up, the functionality that was in
xdr_buf_read_mic() is folded directly into gss_unwrap_resp_integ(),
as that is its only caller.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agonfs: Replace zero-length array with flexible-array member
Gustavo A. R. Silva [Mon, 9 Mar 2020 18:24:42 +0000 (13:24 -0500)]
nfs: Replace zero-length array with flexible-array member

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
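A short illustration of allocating a structure with a flexible array member. The struct foo/struct boo layout follows the example above; the count field and foo_alloc() are hypothetical additions for this sketch. Because sizeof(struct foo) does not cover array[], the allocation size must add the element storage explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct boo {
	int value;
};

/* C99 flexible array member: it must be the last member. */
struct foo {
	int stuff;
	size_t count;
	struct boo array[];
};

/* sizeof(*f) excludes array[], so room for n elements is added
 * explicitly. sizeof does not evaluate f, so using it here is safe. */
static struct foo *foo_alloc(size_t n)
{
	struct foo *f = malloc(sizeof(*f) + n * sizeof(f->array[0]));

	if (!f)
		return NULL;
	f->stuff = 0;
	f->count = n;
	for (size_t i = 0; i < n; i++)
		f->array[i].value = (int)i;
	return f;
}
```

Moving array[] anywhere but last in struct foo makes the compiler reject the declaration, which is the protection the conversion is after.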

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4.2: error out when relink swapfile
Murphy Zhou [Fri, 14 Feb 2020 14:34:09 +0000 (22:34 +0800)]
NFSv4.2: error out when relink swapfile

This fixes xfstests generic/356 failure on NFSv4.2.

Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFS:remove redundant call to nfs_do_access
Zhouyi Zhou [Fri, 6 Mar 2020 03:45:26 +0000 (03:45 +0000)]
NFS:remove redundant call to nfs_do_access

In function nfs_permission:
1. The rcu_read_lock and rcu_read_unlock around nfs_do_access are
unnecessary because the RCU-protected data structure is already
protected within the subsidiary function nfs_access_get_cached_rcu. No
other data structure needs rcu_read_lock in nfs_do_access.

2. Calling nfs_do_access once is enough, because:
2-1. When mask has the MAY_NOT_BLOCK bit set, the second call to
nfs_do_access will not happen.

2-2. When mask does not have the MAY_NOT_BLOCK bit set, the second
call to nfs_do_access will happen if res == -ECHILD, which means the
first nfs_do_access returned at the if (!may_block) statement. The
second call goes through the same procedure once again, except that it
continues past if (!may_block). But that work can be performed by a
single call to nfs_do_access without mangling the mask flag.
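The control-flow argument in point 2 can be checked with a small model. Everything here is hypothetical: do_access() stands in for nfs_do_access(), a single boolean stands in for the access cache, and MAY_NOT_BLOCK is given an illustrative value. The model only shows that, when blocking is allowed, one call reaches the same result as the old try-then-retry sequence with one fewer cache lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAY_NOT_BLOCK 0x80   /* illustrative value for this model */

/* Hypothetical access check: a cache in front of a blocking server call. */
static bool cache_hit;
static int cache_lookups, server_calls;

static int do_access(int mask)
{
	bool may_block = !(mask & MAY_NOT_BLOCK);

	cache_lookups++;
	if (cache_hit)
		return 0;
	if (!may_block)
		return -ECHILD;   /* cannot sleep: tell the caller to retry */
	server_calls++;       /* slow path: ask the server */
	cache_hit = true;
	return 0;
}

/* Old pattern: a forced non-blocking call, then a retry on -ECHILD. */
static int permission_old(int mask)
{
	int res = do_access(mask | MAY_NOT_BLOCK);

	if (res == -ECHILD && !(mask & MAY_NOT_BLOCK))
		res = do_access(mask);
	return res;
}

/* New pattern: do_access() already tries the cache first and only
 * falls through to the server when blocking is allowed. */
static int permission_new(int mask)
{
	return do_access(mask);
}
```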

Tested on x86_64.
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoSUNRPC: remove redundant assignments to variable status
Colin Ian King [Fri, 28 Feb 2020 23:44:14 +0000 (23:44 +0000)]
SUNRPC: remove redundant assignments to variable status

The variable status is being initialized with a value that is never
read and it is being updated later with a new value.  The initialization
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4: Add support for CB_RECALL_ANY for flexfiles layouts
Trond Myklebust [Tue, 18 Feb 2020 20:58:31 +0000 (15:58 -0500)]
NFSv4: Add support for CB_RECALL_ANY for flexfiles layouts

When we receive a CB_RECALL_ANY that asks us to return flexfiles
layouts, we iterate through all the layouts and look at whether or
not there are active open file descriptors that might need them
for I/O. If there are no such descriptors, we return the layouts.
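The recall policy described above amounts to a filter over the client's layouts. A minimal sketch, assuming a hypothetical struct layout that records how many open file descriptors might still need it; setting returned stands in for actually sending a LAYOUTRETURN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified layout record. */
struct layout {
	int open_fds;      /* open descriptors that may still do I/O */
	bool returned;
};

/* Return every layout that no open file descriptor might need.
 * Returns the number of layouts returned by this pass. */
static size_t recall_any(struct layout *layouts, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (layouts[i].returned || layouts[i].open_fds > 0)
			continue;
		layouts[i].returned = true;   /* stands in for LAYOUTRETURN */
		count++;
	}
	return count;
}
```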

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4: Clean up nfs_delegation_reap_expired()
Trond Myklebust [Thu, 27 Feb 2020 14:15:19 +0000 (09:15 -0500)]
NFSv4: Clean up nfs_delegation_reap_expired()

Convert to use nfs_client_for_each_server() for efficiency.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4: Clean up nfs_delegation_reap_unclaimed()
Trond Myklebust [Thu, 27 Feb 2020 14:08:25 +0000 (09:08 -0500)]
NFSv4: Clean up nfs_delegation_reap_unclaimed()

Convert nfs_delegation_reap_unclaimed() to use nfs_client_for_each_server()
for efficiency.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4: Clean up nfs_client_return_marked_delegations()
Trond Myklebust [Thu, 27 Feb 2020 13:29:02 +0000 (08:29 -0500)]
NFSv4: Clean up nfs_client_return_marked_delegations()

Convert it to use the nfs_client_for_each_server() helper, and
make it more efficient by skipping delegations for inodes we
know are in the process of being freed. Also improve the efficiency
of the cursor by skipping delegations that are being freed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFS: Add a helper nfs_client_for_each_server()
Trond Myklebust [Thu, 27 Feb 2020 00:16:09 +0000 (19:16 -0500)]
NFS: Add a helper nfs_client_for_each_server()

Add a helper nfs_client_for_each_server() to iterate through all the
filesystems that are attached to a struct nfs_client, and apply
a function to all the active ones.
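A sketch of the callback-iterator shape such a helper typically has. struct client, struct server, the active flag and the stop-on-nonzero-return convention are assumptions of this sketch; the real helper walks the nfs_client's attached filesystems under the appropriate locking, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct nfs_client / struct nfs_server. */
struct server {
	bool active;       /* e.g. the filesystem is still mounted */
	int counter;
};

struct client {
	struct server *servers;
	size_t nservers;
};

/* Apply fn to each active server. Stop at the first nonzero return
 * value and pass it back to the caller (a convention assumed here). */
static int client_for_each_server(struct client *clp,
				  int (*fn)(struct server *, void *),
				  void *data)
{
	for (size_t i = 0; i < clp->nservers; i++) {
		struct server *s = &clp->servers[i];

		if (!s->active)
			continue;
		int ret = fn(s, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: add *data to each server's counter. */
static int bump(struct server *s, void *data)
{
	s->counter += *(int *)data;
	return 0;
}
```

Callers such as the delegation reapers then only supply the per-server work as a callback, instead of each open-coding the walk.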

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4/pnfs: Clean up nfs_layout_find_inode()
Trond Myklebust [Thu, 27 Feb 2020 16:24:06 +0000 (11:24 -0500)]
NFSv4/pnfs: Clean up nfs_layout_find_inode()

Now that we can rely on just the rcu_read_lock(), remove the
clp->cl_lock and clean up.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4: Ensure layout headers are RCU safe
Trond Myklebust [Tue, 18 Feb 2020 22:14:40 +0000 (17:14 -0500)]
NFSv4: Ensure layout headers are RCU safe

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4/pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
Trond Myklebust [Thu, 27 Feb 2020 16:01:12 +0000 (11:01 -0500)]
NFSv4/pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()

Make sure to test the stateid for validity so that we catch instances
where the server may have been reusing stateids in
nfs_layout_find_inode_by_stateid().

Fixes: 7b410d9ce460 ("pNFS: Delay getting the layout header in CB_LAYOUTRECALL handlers")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agopNFS/flexfiles: Report DELAY and GRACE errors from the DS to the server
Trond Myklebust [Mon, 10 Feb 2020 19:45:34 +0000 (14:45 -0500)]
pNFS/flexfiles: Report DELAY and GRACE errors from the DS to the server

Ensure that if the DS is returning too many DELAY and GRACE errors, we
also report that to the MDS through the layouterror mechanism.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFS: Limit the size of the access cache by default
Trond Myklebust [Sat, 8 Feb 2020 14:14:11 +0000 (09:14 -0500)]
NFS: Limit the size of the access cache by default

Currently, we have no real limit on the access cache size (we set it
to ULONG_MAX). That can lead to credentials getting pinned for a
very long time on lots of files if you have a system with a lot of
memory.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFS: Avoid referencing the cred twice in async rename/unlink
Trond Myklebust [Sat, 8 Feb 2020 00:44:33 +0000 (19:44 -0500)]
NFS: Avoid referencing the cred twice in async rename/unlink

In both async rename and async unlink, we already take a reference to
the cred in the call arguments, so there is no need to take another.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4: Avoid unnecessary credential references in layoutget
Trond Myklebust [Sat, 8 Feb 2020 00:40:14 +0000 (19:40 -0500)]
NFSv4: Avoid unnecessary credential references in layoutget

Layoutget is just using the credential attached to the open context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O
Trond Myklebust [Sat, 8 Feb 2020 00:38:12 +0000 (19:38 -0500)]
NFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O

Avoid unnecessary references to the cred when we have already referenced
it through the open context or the open owner.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFS: Assume cred is pinned by open context in I/O requests
Trond Myklebust [Sat, 8 Feb 2020 00:25:56 +0000 (19:25 -0500)]
NFS: Assume cred is pinned by open context in I/O requests

In read/write/commit, we should be able to assume that the cred is
pinned by the open context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoSUNRPC: Don't take a reference to the cred on synchronous tasks
Trond Myklebust [Sat, 8 Feb 2020 00:16:34 +0000 (19:16 -0500)]
SUNRPC: Don't take a reference to the cred on synchronous tasks

If the RPC call is synchronous, assume the cred is already pinned
by the caller.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoSUNRPC: Add a flag to avoid reference counts on credentials
Trond Myklebust [Sat, 8 Feb 2020 00:11:12 +0000 (19:11 -0500)]
SUNRPC: Add a flag to avoid reference counts on credentials

Add a flag to signal to the RPC layer that the credential is already
pinned for the duration of the RPC call.
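A minimal sketch of the idea, assuming a hypothetical TASK_CRED_NOREF flag and a toy struct cred with a plain integer usage count (the kernel's actual flag name and cred API are not reproduced here): when the caller guarantees the cred stays pinned, the task neither takes nor drops a reference.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted credential (hypothetical). */
struct cred {
	int usage;
};

#define TASK_CRED_NOREF 0x1   /* hypothetical flag for this sketch */

struct task {
	unsigned int flags;
	struct cred *cred;
};

static struct cred *get_cred(struct cred *c)
{
	c->usage++;
	return c;
}

static void put_cred(struct cred *c)
{
	assert(c->usage > 0);
	c->usage--;
}

static void task_set_cred(struct task *t, struct cred *c, unsigned int flags)
{
	t->flags = flags;
	/* With the flag, the caller promises the cred stays pinned for the
	 * whole call, so no reference is taken here. */
	t->cred = (flags & TASK_CRED_NOREF) ? c : get_cred(c);
}

static void task_release_cred(struct task *t)
{
	if (!(t->flags & TASK_CRED_NOREF))
		put_cred(t->cred);
	t->cred = NULL;
}
```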

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFS: alloc_nfs_open_context() must use the file cred when available
Trond Myklebust [Sat, 8 Feb 2020 00:32:49 +0000 (19:32 -0500)]
NFS: alloc_nfs_open_context() must use the file cred when available

If we're creating a nfs_open_context() for a specific file pointer,
we must use the cred assigned to that file.

Fixes: a52458b48af1 ("NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'.")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFS: Ensure we time out if a delegreturn does not complete
Trond Myklebust [Fri, 20 Dec 2019 15:43:37 +0000 (10:43 -0500)]
NFS: Ensure we time out if a delegreturn does not complete

We can't allow delegreturn to hold up nfs4_evict_inode() forever,
since that can cause the memory shrinkers to block. This patch
therefore ensures that we eventually time out, and complete the
reclaim of the inode.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4/pnfs: pnfs_set_layout_stateid() should update the layout cred
Trond Myklebust [Mon, 27 Jan 2020 18:07:26 +0000 (13:07 -0500)]
NFSv4/pnfs: pnfs_set_layout_stateid() should update the layout cred

If the cred assigned to the layout that we're updating differs from
the one used to retrieve the new layout segment, then we need to
update the layout plh_lc_cred field.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFSv4: nfs_update_inplace_delegation() should update delegation cred
Trond Myklebust [Mon, 27 Jan 2020 17:44:41 +0000 (12:44 -0500)]
NFSv4: nfs_update_inplace_delegation() should update delegation cred

If the cred assigned to the delegation that we're updating differs
from the one we're updating it to, then we need to update that field
too.
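Both this fix and the preceding layout-cred fix follow the same pattern: if the stored cred differs from the new one, take a reference on the new cred, store it, and drop the old reference. A minimal sketch with a toy refcounted struct cred; update_cred() is a hypothetical name, and the sketch ignores the locking a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>

/* Toy refcounted credential (hypothetical). */
struct cred {
	int usage;
};

static struct cred *get_cred(struct cred *c)
{
	c->usage++;
	return c;
}

static void put_cred(struct cred *c)
{
	if (!c)
		return;
	assert(c->usage > 0);
	c->usage--;
}

/* Replace the cred stored in *slot when the new one differs: take the
 * new reference before dropping the old one. */
static void update_cred(struct cred **slot, struct cred *new_cred)
{
	struct cred *old = *slot;

	if (old == new_cred)
		return;
	*slot = get_cred(new_cred);
	put_cred(old);
}
```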

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoNFS: Use the 64-bit server readdir cookies when possible
Trond Myklebust [Mon, 3 Feb 2020 19:49:33 +0000 (14:49 -0500)]
NFS: Use the 64-bit server readdir cookies when possible

When we're running on a 64-bit architecture and are not running in
32-bit compatibility mode, it is better to use the 64-bit readdir
cookies that are supplied by the server. Doing so improves the accuracy
of telldir()/seekdir(), particularly when the directory is changing,
for instance, when doing 'rm -rf'.

We still fall back to using the 32-bit offsets on 32-bit architectures
and when in compatibility mode.
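A sketch of the selection logic, under stated assumptions: can_use_64bit_cookies() approximates "64-bit architecture, not in compatibility mode" with sizeof(long) == 8 plus a caller-supplied compat flag, and dir_position() is a hypothetical helper. The kernel uses its own checks for this, which are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Approximation: LP64 and the caller is not a 32-bit compat task. */
static bool can_use_64bit_cookies(bool compat_task)
{
	return sizeof(long) == 8 && !compat_task;
}

/* Position reported to telldir(): the server's cookie unchanged when
 * 64-bit offsets are usable, otherwise a 32-bit entry index. */
static uint64_t dir_position(bool use_64bit_cookies, uint64_t server_cookie,
			     uint32_t entry_index)
{
	return use_64bit_cookies ? server_cookie : (uint64_t)entry_index;
}
```

Handing back the server's cookie means seekdir() resumes exactly where the server says, instead of counting entries in a directory whose contents may have shifted.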

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
4 years agoLinux 5.6-rc6 v5.6-rc6
Linus Torvalds [Sun, 15 Mar 2020 22:01:23 +0000 (15:01 -0700)]
Linux 5.6-rc6

4 years agoMerge tag 'irq-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 15 Mar 2020 20:15:16 +0000 (13:15 -0700)]
Merge tag 'irq-urgent-2020-03-15' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "A single commit to handle an erratum in Cavium ThunderX to prevent
  access to GIC registers which are broken in the implementation"

* tag 'irq-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Workaround Cavium erratum 38539 when reading GICD_TYPER2

4 years agoMerge tag 'locking-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 15 Mar 2020 19:55:52 +0000 (12:55 -0700)]
Merge tag 'locking-urgent-2020-03-15' of git://git./linux/kernel/git/tip/tip

Pull futex fix from Thomas Gleixner:
 "Fix for yet another subtle futex issue.

  The futex code used ihold() to prevent inodes from vanishing, but
  ihold() does not guarantee inode persistence. Replace the inode
  pointer with a per boot, machine wide, unique inode identifier.

  The second commit fixes the breakage of the hash mechanism which
  causes a 100% performance regression"

* tag 'locking-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Unbreak futex hashing
  futex: Fix inode life-time issue
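One way to build a per-boot, machine-wide unique identifier is a global 64-bit counter assigned lazily with compare-and-exchange. This C11 sketch assumes a hypothetical struct inode_model and is not taken from the futex code; it only shows that, unlike a pointer, a value from a monotonically increasing 64-bit counter is not handed out again after the inode is freed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t next_seq;   /* machine-wide counter, starts at 0 */

struct inode_model {
	_Atomic uint64_t seq;           /* 0 means "not assigned yet" */
};

/* Lazily assign a unique identifier. If two threads race, the
 * compare-exchange lets exactly one value win and both return it. */
static uint64_t inode_sequence(struct inode_model *inode)
{
	uint64_t old = atomic_load(&inode->seq);

	if (old)
		return old;
	uint64_t candidate = atomic_fetch_add(&next_seq, 1) + 1; /* never 0 */
	uint64_t expected = 0;
	if (atomic_compare_exchange_strong(&inode->seq, &expected, candidate))
		return candidate;
	return expected;   /* another thread installed its value first */
}
```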

4 years agoMerge tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 15 Mar 2020 19:52:56 +0000 (12:52 -0700)]
Merge tag 'x86-urgent-2020-03-15' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "Two fixes for x86:

   - Map EFI runtime service data as encrypted when SEV is enabled.

     Otherwise e.g. SMBIOS data cannot be properly decoded by dmidecode.

   - Remove the warning in the vector management code which triggered
     when a managed interrupt affinity changed outside of a CPU hotplug
     operation.

     The warning was correct until the recent core code change that
     introduced a CPU isolation feature which needs to migrate managed
     interrupts away from online CPUs under certain conditions to
     achieve the isolation"

* tag 'x86-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vector: Remove warning on managed interrupt migration
  x86/ioremap: Map EFI runtime services data as encrypted for SEV

4 years agoMerge tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 15 Mar 2020 19:50:15 +0000 (12:50 -0700)]
Merge tag 'perf-urgent-2020-03-15' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Thomas Gleixner:
 "A pile of perf fixes:

  Kernel side:

   - AMD uncore driver: Replace the open coded sanity check with the
     core variant, which provides the correct error code and also leaves
     a hint in dmesg

  Tooling:

   - Fix the stdio input handling with glibc versions >= 2.28

   - Unbreak the futex-wake benchmark which was reduced to 0 test
     threads due to the conversion to cpumaps

   - Initialize sigaction structs before invoking sys_sigaction()

   - Plug the mapfile memory leak in perf jevents

   - Fix off by one relative directory includes

   - Fix an undefined string comparison in perf diff"

* tag 'perf-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/amd/uncore: Replace manual sampling check with CAP_NO_INTERRUPT flag
  tools: Fix off-by 1 relative directory includes
  perf jevents: Fix leak of mapfile memory
  perf bench: Clear struct sigaction before sigaction() syscall
  perf bench futex-wake: Restore thread count default to online CPU count
  perf top: Fix stdio interface input handling with glibc 2.28+
  perf diff: Fix undefined string comparision spotted by clang's -Wstring-compare
  perf symbols: Don't try to find a vmlinux file when looking for kernel modules
  perf bench: Share some global variables to fix build with gcc 10
  perf parse-events: Use asprintf() instead of strncpy() to read tracepoint files
  perf env: Do not return pointers to local variables
  perf tests bp_account: Make global variable static

4 years agoMerge tag 'timers-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 15 Mar 2020 19:48:21 +0000 (12:48 -0700)]
Merge tag 'timers-urgent-2020-03-15' of git://git./linux/kernel/git/tip/tip

Pull timer fix from Thomas Gleixner:
 "A single fix adding the missing time namespace adjustment in
  sys/sysinfo which caused sys/sysinfo to be inconsistent with
  /proc/uptime when read from a task inside a time namespace"

* tag 'timers-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sys/sysinfo: Respect boottime inside time namespace

4 years agoMerge tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 15 Mar 2020 19:44:23 +0000 (12:44 -0700)]
Merge tag 'ras-urgent-2020-03-15' of git://git./linux/kernel/git/tip/tip

Pull RAS fixes from Thomas Gleixner:
 "Two RAS related fixes:

   - Shut down the per CPU thermal throttling poll work properly when a
     CPU goes offline.

     The missing shutdown caused the poll work to be migrated to an
     unbound worker which triggered warnings about the usage of
     smp_processor_id() in preemptible context

   - Fix the PPIN feature initialization which failed to enable the
     functionality when PPIN_CTL was enabled but the MSR was locked
     against updates"

* tag 'ras-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mce: Fix logic and comments around MSR_PPIN_CTL
  x86/mce/therm_throt: Undo thermal polling properly on CPU offline

4 years agoMerge tag 'efi-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 15 Mar 2020 19:42:03 +0000 (12:42 -0700)]
Merge tag 'efi-urgent-2020-03-15' of git://git./linux/kernel/git/tip/tip

Pull EFI fixes from Thomas Gleixner:
 "Two EFI fixes:

   - Prevent a race and buffer overflow in the sysfs efivars interface
     which causes kernel memory corruption.

   - Add the missing NULL pointer checks in efivar_store_raw()"

* tag 'efi-urgent-2020-03-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: Add a sanity check to efivar_store_raw()
  efi: Fix a race and a buffer overflow while reading efivars via sysfs

4 years agoMerge tag 'iommu-fixes-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 15 Mar 2020 19:37:10 +0000 (12:37 -0700)]
Merge tag 'iommu-fixes-v5.6-rc5' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:

 - Intel VT-d fixes:
    - RCU list handling fixes
    - Replace WARN_TAINT with pr_warn + add_taint for reporting firmware
      issues
    - DebugFS fixes
    - Fix for hugepage handling in iova_to_phys implementation
    - Fix for handling VMD devices, which have a domain number which
      doesn't fit into 16 bits
    - Warning message fix

 - MSI allocation fix for iommu-dma code

 - Sign-extension fix for io page-table code

 - Fix for AMD-Vi to properly update the is-running bit when AVIC is
   used

* tag 'iommu-fixes-v5.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Populate debugfs if IOMMUs are detected
  iommu/amd: Fix IOMMU AVIC not properly update the is_run bit in IRTE
  iommu/vt-d: Ignore devices with out-of-spec domain number
  iommu/vt-d: Fix the wrong printing in RHSA parsing
  iommu/vt-d: Fix debugfs register reads
  iommu/vt-d: quirk_ioat_snb_local_iommu: replace WARN_TAINT with pr_warn + add_taint
  iommu/vt-d: dmar_parse_one_rmrr: replace WARN_TAINT with pr_warn + add_taint
  iommu/vt-d: dmar: replace WARN_TAINT with pr_warn + add_taint
  iommu/vt-d: Silence RCU-list debugging warnings
  iommu/vt-d: Fix RCU-list bugs in intel_iommu_init()
  iommu/dma: Fix MSI reservation allocation
  iommu/io-pgtable-arm: Fix IOVA validation for 32-bit
  iommu/vt-d: Fix a bug in intel_iommu_iova_to_phys() for huge page
  iommu/vt-d: Fix RCU list debugging warnings

4 years agoMerge tag 'irqchip-fixes-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Thomas Gleixner [Sun, 15 Mar 2020 09:53:11 +0000 (10:53 +0100)]
Merge tag 'irqchip-fixes-5.6-2' of git://git./linux/kernel/git/maz/arm-platforms into irq/urgent

Pull irqchip fixes from Marc Zyngier:

- Add workaround for Cavium/Marvell ThunderX unimplemented GIC registers

4 years agoMerge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa...
Linus Torvalds [Sat, 14 Mar 2020 22:53:48 +0000 (15:53 -0700)]
Merge branch 'i2c/for-current' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "I2C has quite a few regression fixes this time.

  One is also related to watchdogs; we have proper acks from Guenter for
  them"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: acpi: put device when verifying client fails
  misc: eeprom: at24: fix regulator underflow
  i2c: gpio: suppress error on probe defer
  macintosh: windfarm: fix MODINFO regression
  i2c: designware-pci: Fix BUG_ON during device removal
  i2c: i801: Do not add ICH_RES_IO_SMI for the iTCO_wdt device
  watchdog: iTCO_wdt: Make ICH_RES_IO_SMI optional
  watchdog: iTCO_wdt: Export vendorsupport

4 years agoMerge tag 'arc-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Sat, 14 Mar 2020 22:49:09 +0000 (15:49 -0700)]
Merge tag 'arc-5.6-rc6' of git://git./linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - Fix __ALIGN_STR and __ALIGN to not use default junk padding

 - Misc Kconfig cleanups, header updates

* tag 'arc-5.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: define __ALIGN_STR and __ALIGN symbols for ARC
  ARC: show_regs: reduce lines of output
  ARC: Replace <linux/clk-provider.h> by <linux/of_clk.h>
  ARC: fpu: fix randconfig build error reported by 0-day test service
  ARC: fix some Kconfig typos
  ARC: Cleanup old Kconfig IO scheduler options