OSDN Git Service

tomoyo/tomoyo-test1.git
3 years agosvcrdma: Retain the page backing rq_res.head[0].iov_base
Chuck Lever [Mon, 1 Feb 2021 20:16:57 +0000 (15:16 -0500)]
svcrdma: Retain the page backing rq_res.head[0].iov_base

svc_rdma_sendto() now waits for the NIC hardware to finish with
the pages backing rq_res. We still have to release the page array
in some cases, but now it's always safe to immediately re-use the
page backing rq_res's head buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agosvcrdma: Remove unused sc_pages field
Chuck Lever [Thu, 28 Jan 2021 21:47:56 +0000 (16:47 -0500)]
svcrdma: Remove unused sc_pages field

Clean up. This significantly reduces the size of struct
svc_rdma_send_ctxt.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agosvcrdma: Normalize Send page handling
Chuck Lever [Wed, 13 Jan 2021 18:57:18 +0000 (13:57 -0500)]
svcrdma: Normalize Send page handling

Currently svc_rdma_sendto() migrates xdr_buf pages into a separate
page list and NULLs out a bunch of entries in rq_pages while the
pages are under I/O. The Send completion handler then frees those
pages later.

Instead, let's wait for the Send completion, then handle page
releasing in the nfsd thread. I'd like to avoid the cost of 250+
put_page() calls in the Send completion handler, which is single-
threaded.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agosvcrdma: Add a "deferred close" helper
Chuck Lever [Sat, 20 Feb 2021 23:53:40 +0000 (18:53 -0500)]
svcrdma: Add a "deferred close" helper

Refactor a bit of commonly used logic so that every site that wants
a close deferred to an nfsd thread does all the right things
(set_bit(XPT_CLOSE) then enqueue).

Also, once XPT_CLOSE is set on a transport, it is never cleared. If
XPT_CLOSE is already set, then the close is already being handled
and the enqueue can be skipped.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agosvcrdma: Maintain a Receive water mark
Chuck Lever [Thu, 11 Mar 2021 23:32:30 +0000 (18:32 -0500)]
svcrdma: Maintain a Receive water mark

Post more Receives when the number of pending Receives drops below
a water mark. The batch mechanism is disabled if the underlying
device cannot support a reasonably-sized Receive Queue.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agosvcrdma: Use svc_rdma_refresh_recvs() in wc_receive
Chuck Lever [Thu, 11 Mar 2021 21:15:22 +0000 (16:15 -0500)]
svcrdma: Use svc_rdma_refresh_recvs() in wc_receive

Replace svc_rdma_post_recv() with the new batch receive mechanism.
For the moment it is posting just a single Receive WR at a time,
so no change in behavior is expected.

Since svc_rdma_wc_receive() was the last call site for
svc_rdma_post_recv(), it is removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agosvcrdma: Add a batch Receive posting mechanism
Chuck Lever [Thu, 11 Mar 2021 18:54:34 +0000 (13:54 -0500)]
svcrdma: Add a batch Receive posting mechanism

Introduce a server-side mechanism similar to commit e340c2d6ef2a
("xprtrdma: Reduce the doorbell rate (Receive)") to post Receive
WRs in batch. Its first consumer is svc_rdma_post_recvs(), which
posts the initial set of Receive WRs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agosvcrdma: Remove stale comment for svc_rdma_wc_receive()
Chuck Lever [Thu, 11 Mar 2021 18:49:25 +0000 (13:49 -0500)]
svcrdma: Remove stale comment for svc_rdma_wc_receive()

xprt pinning was removed in commit 365e9992b90f ("svcrdma: Remove
transport reference counting"), but this comment was not updated
to reflect that change.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agosvcrdma: Provide an explanatory comment in CMA event handler
Chuck Lever [Mon, 1 Mar 2021 18:34:38 +0000 (13:34 -0500)]
svcrdma: Provide an explanatory comment in CMA event handler

Clean up: explain why svc_xprt_enqueue() is invoked in the event
handler even though no xpt_flags bits are toggled here.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agosvcrdma: RPCDBG_FACILITY is no longer used
Chuck Lever [Sun, 21 Feb 2021 00:11:55 +0000 (19:11 -0500)]
svcrdma: RPCDBG_FACILITY is no longer used

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agonfsd: report client confirmation status in "info" file
NeilBrown [Fri, 19 Mar 2021 22:38:04 +0000 (09:38 +1100)]
nfsd: report client confirmation status in "info" file

mountd can now monitor clients appearing and disappearing in
/proc/fs/nfsd/clients, and will log these events, in liu of the logging
of mount/unmount events for NFSv3.

Currently it cannot distinguish between unconfirmed clients (which might
be transient and totally uninteresting) and confirmed clients.

So add a "status: " line which reports either "confirmed" or
"unconfirmed", and use fsnotify to report that the info file
has been modified.

This requires a bit of infrastructure to keep the dentry for the "info"
file.  There is no need to take a counted reference as the dentry must
remain around until the client is removed.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agonfsd: don't ignore high bits of copy count
J. Bruce Fields [Fri, 19 Mar 2021 00:03:22 +0000 (20:03 -0400)]
nfsd: don't ignore high bits of copy count

Note size_t is 32-bit on a 32-bit architecture, but cp_count is defined
by the protocol to be 64 bit, so we could be turning a large copy into a
0-length copy here.

Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agonfsd: COPY with length 0 should copy to end of file
J. Bruce Fields [Fri, 19 Mar 2021 00:03:23 +0000 (20:03 -0400)]
nfsd: COPY with length 0 should copy to end of file

>From https://tools.ietf.org/html/rfc7862#page-65

A count of 0 (zero) requests that all bytes from ca_src_offset
through EOF be copied to the destination.

Reported-by: <radchenkoy@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agonfsd: Fix typo "accesible"
Ricardo Ribalda [Thu, 18 Mar 2021 20:22:21 +0000 (21:22 +0100)]
nfsd: Fix typo "accesible"

Trivial fix.

Cc: linux-nfs@vger.kernel.org
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agonfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted
Trond Myklebust [Sat, 13 Mar 2021 21:08:47 +0000 (16:08 -0500)]
nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted

In order to ensure that knfsd threads don't linger once the nfsd
pseudofs is unmounted (e.g. when the container is killed) we let
nfsd_umount() shut down those threads and wait for them to exit.

This also should ensure that we don't need to do a kernel mount of
the pseudofs, since the thread lifetime is now limited by the
lifetime of the filesystem.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agonfsd: Log client tracking type log message as info instead of warning
Paul Menzel [Fri, 12 Mar 2021 21:03:00 +0000 (22:03 +0100)]
nfsd: Log client tracking type log message as info instead of warning

`printk()`, by default, uses the log level warning, which leaves the
user reading

    NFSD: Using UMH upcall client tracking operations.

wondering what to do about it (`dmesg --level=warn`).

Several client tracking methods are tried, and expected to fail. That’s
why a message is printed only on success. It might be interesting for
users to know the chosen method, so use info-level instead of
debug-level.

Cc: linux-nfs@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agonfsd: helper for laundromat expiry calculations
J. Bruce Fields [Tue, 2 Mar 2021 15:46:23 +0000 (10:46 -0500)]
nfsd: helper for laundromat expiry calculations

We do this same logic repeatedly, and it's easy to get the sense of the
comparison wrong.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Clean up NFSDDBG_FACILITY macro
Chuck Lever [Fri, 5 Mar 2021 19:22:32 +0000 (14:22 -0500)]
NFSD: Clean up NFSDDBG_FACILITY macro

These are no longer needed because there are no dprintk() call sites
in these files.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Add a tracepoint to record directory entry encoding
Chuck Lever [Fri, 5 Mar 2021 18:57:40 +0000 (13:57 -0500)]
NFSD: Add a tracepoint to record directory entry encoding

Enable watching the progress of directory encoding to capture the
timing of any issues with reading or encoding a directory. The
new tracepoint captures dirent encoding for all NFS versions.

For example, here's what a few NFSv4 directory entries might look
like:

nfsd-989   [002]   468.596265: nfsd_dirent:          fh_hash=0x5d162594 ino=2 name=.
nfsd-989   [002]   468.596267: nfsd_dirent:          fh_hash=0x5d162594 ino=1 name=..
nfsd-989   [002]   468.596299: nfsd_dirent:          fh_hash=0x5d162594 ino=3827 name=zlib.c
nfsd-989   [002]   468.596325: nfsd_dirent:          fh_hash=0x5d162594 ino=3811 name=xdiff
nfsd-989   [002]   468.596351: nfsd_dirent:          fh_hash=0x5d162594 ino=3810 name=xdiff-interface.h
nfsd-989   [002]   468.596377: nfsd_dirent:          fh_hash=0x5d162594 ino=3809 name=xdiff-interface.c

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Clean up after updating NFSv3 ACL encoders
Chuck Lever [Sun, 15 Nov 2020 20:09:16 +0000 (15:09 -0500)]
NFSD: Clean up after updating NFSv3 ACL encoders

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 SETACL result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 21:21:24 +0000 (16:21 -0500)]
NFSD: Update the NFSv3 SETACL result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 21:11:42 +0000 (16:11 -0500)]
NFSD: Update the NFSv3 GETACL result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Clean up after updating NFSv2 ACL encoders
Chuck Lever [Sun, 15 Nov 2020 19:31:42 +0000 (14:31 -0500)]
NFSD: Clean up after updating NFSv2 ACL encoders

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 ACL ACCESS result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 19:52:09 +0000 (14:52 -0500)]
NFSD: Update the NFSv2 ACL ACCESS result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 ACL GETATTR result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 19:49:57 +0000 (14:49 -0500)]
NFSD: Update the NFSv2 ACL GETATTR result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 SETACL result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 19:47:56 +0000 (14:47 -0500)]
NFSD: Update the NFSv2 SETACL result encoder to use struct xdr_stream

The SETACL result encoder is exactly the same as the NFSv2
attrstatres decoder.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream
Chuck Lever [Wed, 18 Nov 2020 19:38:47 +0000 (14:38 -0500)]
NFSD: Update the NFSv2 GETACL result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Add an xdr_stream-based encoder for NFSv2/3 ACLs
Chuck Lever [Wed, 18 Nov 2020 19:55:05 +0000 (14:55 -0500)]
NFSD: Add an xdr_stream-based encoder for NFSv2/3 ACLs

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Remove unused NFSv2 directory entry encoders
Chuck Lever [Sun, 15 Nov 2020 19:30:13 +0000 (14:30 -0500)]
NFSD: Remove unused NFSv2 directory entry encoders

Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream
Chuck Lever [Sat, 14 Nov 2020 18:45:35 +0000 (13:45 -0500)]
NFSD: Update the NFSv2 READDIR entry encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 READDIR result encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 20:49:01 +0000 (16:49 -0400)]
NFSD: Update the NFSv2 READDIR result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Count bytes instead of pages in the NFSv2 READDIR encoder
Chuck Lever [Fri, 13 Nov 2020 21:57:44 +0000 (16:57 -0500)]
NFSD: Count bytes instead of pages in the NFSv2 READDIR encoder

Clean up: Counting the bytes used by each returned directory entry
seems less brittle to me than trying to measure consumed pages after
the fact.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Add a helper that encodes NFSv3 directory offset cookies
Chuck Lever [Fri, 13 Nov 2020 21:53:17 +0000 (16:53 -0500)]
NFSD: Add a helper that encodes NFSv3 directory offset cookies

Refactor: Add helper function similar to nfs3svc_encode_cookie3().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 STATFS result encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 23:01:38 +0000 (19:01 -0400)]
NFSD: Update the NFSv2 STATFS result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 READ result encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 20:40:11 +0000 (16:40 -0400)]
NFSD: Update the NFSv2 READ result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 READLINK result encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 19:41:09 +0000 (15:41 -0400)]
NFSD: Update the NFSv2 READLINK result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 diropres encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 20:44:16 +0000 (16:44 -0400)]
NFSD: Update the NFSv2 diropres encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 attrstat encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 19:28:59 +0000 (15:28 -0400)]
NFSD: Update the NFSv2 attrstat encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv2 stat encoder to use struct xdr_stream
Chuck Lever [Fri, 23 Oct 2020 15:08:02 +0000 (11:08 -0400)]
NFSD: Update the NFSv2 stat encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Reduce svc_rqst::rq_pages churn during READDIR operations
Chuck Lever [Fri, 15 Jan 2021 14:28:44 +0000 (09:28 -0500)]
NFSD: Reduce svc_rqst::rq_pages churn during READDIR operations

During NFSv2 and NFSv3 READDIR/PLUS operations, NFSD advances
rq_next_page to the full size of the client-requested buffer, then
releases all those pages at the end of the request. The next request
to use that nfsd thread has to refill the pages.

NFSD does this even when the dirlist in the reply is small. With
NFSv3 clients that send READDIR operations with large buffer sizes,
that can be 256 put_page/alloc_page pairs per READDIR request, even
though those pages often remain unused.

We can save some work by not releasing dirlist buffer pages that
were not used to form the READDIR Reply. I've left the NFSv2 code
alone since there are never more than three pages involved in an
NFSv2 READDIR Reply.

Eventually we should nail down why these pages need to be released
at all in order to avoid allocating and releasing pages
unnecessarily.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Remove unused NFSv3 directory entry encoders
Chuck Lever [Fri, 13 Nov 2020 16:27:13 +0000 (11:27 -0500)]
NFSD: Remove unused NFSv3 directory entry encoders

Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 23:46:58 +0000 (19:46 -0400)]
NFSD: Update NFSv3 READDIR entry encoders to use struct xdr_stream

The benefit of the xdr_stream helpers is that they transparently
handle encoding an XDR data item that crosses page boundaries.
Most of the open-coded logic to do that here can be eliminated.

A sub-buffer and sub-stream are set up as a sink buffer for the
directory entry encoder. As an entry is encoded, it is added to
the end of the content in this buffer/stream. The total length of
the directory list is tracked in the buffer's @len field.

When it comes time to encode the Reply, the sub-buffer is merged
into rq_res's page array at the correct place using
xdr_write_pages().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 READDIR3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 23:31:48 +0000 (19:31 -0400)]
NFSD: Update the NFSv3 READDIR3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Count bytes instead of pages in the NFSv3 READDIR encoder
Chuck Lever [Mon, 9 Nov 2020 18:13:21 +0000 (13:13 -0500)]
NFSD: Count bytes instead of pages in the NFSv3 READDIR encoder

Clean up: Counting the bytes used by each returned directory entry
seems less brittle to me than trying to measure consumed pages after
the fact.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Add a helper that encodes NFSv3 directory offset cookies
Chuck Lever [Tue, 10 Nov 2020 14:57:14 +0000 (09:57 -0500)]
NFSD: Add a helper that encodes NFSv3 directory offset cookies

Refactor: De-duplicate identical code that handles encoding of
directory offset cookies across page boundaries.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 COMMIT3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:35:46 +0000 (15:35 -0400)]
NFSD: Update the NFSv3 COMMIT3res encoder to use struct xdr_stream

As an additional clean up, encode_wcc_data() is removed because it
is now no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream
Chuck Lever [Fri, 6 Nov 2020 18:15:09 +0000 (13:15 -0500)]
NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 FSINFO3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 17:42:13 +0000 (13:42 -0400)]
NFSD: Update the NFSv3 FSINFO3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 FSSTAT3res encoder to use struct xdr_stream
Chuck Lever [Fri, 6 Nov 2020 18:08:45 +0000 (13:08 -0500)]
NFSD: Update the NFSv3 FSSTAT3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 LINK3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:08:29 +0000 (15:08 -0400)]
NFSD: Update the NFSv3 LINK3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 RENAMEv3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:33:05 +0000 (15:33 -0400)]
NFSD: Update the NFSv3 RENAMEv3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 CREATE family of encoders to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:27:23 +0000 (15:27 -0400)]
NFSD: Update the NFSv3 CREATE family of encoders to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 WRITE3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:26:31 +0000 (15:26 -0400)]
NFSD: Update the NFSv3 WRITE3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 READ3res encode to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:23:50 +0000 (15:23 -0400)]
NFSD: Update the NFSv3 READ3res encode to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 READLINK3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:18:40 +0000 (15:18 -0400)]
NFSD: Update the NFSv3 READLINK3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 wccstat result encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 19:12:38 +0000 (15:12 -0400)]
NFSD: Update the NFSv3 wccstat result encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 LOOKUP3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 18:46:58 +0000 (14:46 -0400)]
NFSD: Update the NFSv3 LOOKUP3res encoder to use struct xdr_stream

Also, clean up: Rename the encoder function to match the name of
the result structure in RFC 1813, consistent with other encoder
function names in nfs3xdr.c. "diropres" is an NFSv2 thingie.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the NFSv3 ACCESS3res encoder to use struct xdr_stream
Chuck Lever [Thu, 22 Oct 2020 17:56:58 +0000 (13:56 -0400)]
NFSD: Update the NFSv3 ACCESS3res encoder to use struct xdr_stream

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Update the GETATTR3res encoder to use struct xdr_stream
Chuck Lever [Wed, 21 Oct 2020 15:58:41 +0000 (11:58 -0400)]
NFSD: Update the GETATTR3res encoder to use struct xdr_stream

As an additional clean up, some renaming is done to more closely
reflect the data type and variable names used in the NFSv3 XDR
definition provided in RFC 1813. "attrstat" is an NFSv2 thingie.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoNFSD: Extract the svcxdr_init_encode() helper
Chuck Lever [Tue, 27 Oct 2020 19:53:42 +0000 (15:53 -0400)]
NFSD: Extract the svcxdr_init_encode() helper

NFSD initializes an encode xdr_stream only after the RPC layer has
already inserted the RPC Reply header. Thus it behaves differently
than xdr_init_encode does, which assumes the passed-in xdr_buf is
entirely devoid of content.

nfs4proc.c has this server-side stream initialization helper, but
it is visible only to the NFSv4 code. Move this helper to a place
that can be accessed by NFSv2 and NFSv3 server XDR functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
3 years agoLinux 5.12-rc4 v5.12-rc4
Linus Torvalds [Sun, 21 Mar 2021 21:56:43 +0000 (14:56 -0700)]
Linux 5.12-rc4

3 years agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Mar 2021 21:06:10 +0000 (14:06 -0700)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Miscellaneous ext4 bug fixes for v5.12"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: initialize ret to suppress smatch warning
  ext4: stop inode update before return
  ext4: fix rename whiteout with fast commit
  ext4: fix timer use-after-free on failed mount
  ext4: fix potential error in ext4_do_update_inode
  ext4: do not try to set xattr into ea_inode if value is empty
  ext4: do not iput inode under running transaction in ext4_rename()
  ext4: find old entry again if failed to rename whiteout
  ext4: fix error handling in ext4_end_enable_verity()
  ext4: fix bh ref count on error paths
  fs/ext4: fix integer overflow in s_log_groups_per_flex
  ext4: add reclaim checks to xattr code
  ext4: shrink race window in ext4_should_retry_alloc()

3 years agoMerge tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block
Linus Torvalds [Sun, 21 Mar 2021 19:25:54 +0000 (12:25 -0700)]
Merge tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block

Pull io_uring followup fixes from Jens Axboe:

 - The SIGSTOP change from Eric, so we properly ignore that for
   PF_IO_WORKER threads.

 - Disallow sending signals to PF_IO_WORKER threads in general, we're
   not interested in having them funnel back to the io_uring owning
   task.

 - Stable fix from Stefan, ensuring we properly break links for short
   send/sendmsg recv/recvmsg if MSG_WAITALL is set.

 - Catch and loop when needing to run task_work before a PF_IO_WORKER
   threads goes to sleep.

* tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block:
  io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
  io-wq: ensure task is running before processing task_work
  signal: don't allow STOP on PF_IO_WORKER threads
  signal: don't allow sending any signals to PF_IO_WORKER threads

3 years agoMerge tag 'staging-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sun, 21 Mar 2021 18:54:04 +0000 (11:54 -0700)]
Merge tag 'staging-5.12-rc4' of git://git./linux/kernel/git/gregkh/staging

Pull staging and IIO driver fixes from Greg KH:
 "Some small staging and IIO driver fixes:

   - MAINTAINERS changes for the move of the staging mailing list

   - comedi driver fixes to get request_irq() to work correctly

   - counter driver fixes for reported issues with iio devices

   - tiny iio driver fixes for reported issues.

  All of these have been in linux-next with no reported problems"

* tag 'staging-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: vt665x: fix alignment constraints
  staging: comedi: cb_pcidas64: fix request_irq() warn
  staging: comedi: cb_pcidas: fix request_irq() warn
  MAINTAINERS: move the staging subsystem to lists.linux.dev
  MAINTAINERS: move some real subsystems off of the staging mailing list
  iio: gyro: mpu3050: Fix error handling in mpu3050_trigger_handler
  iio: hid-sensor-temperature: Fix issues of timestamp channel
  iio: hid-sensor-humidity: Fix alignment issue of timestamp channel
  counter: stm32-timer-cnt: fix ceiling miss-alignment with reload register
  counter: stm32-timer-cnt: fix ceiling write max value
  counter: stm32-timer-cnt: Report count function when SLAVE_MODE_DISABLED
  iio: adc: ab8500-gpadc: Fix off by 10 to 3
  iio:adc:stm32-adc: Add HAS_IOMEM dependency
  iio: adis16400: Fix an error code in adis16400_initial_setup()
  iio: adc: adi-axi-adc: add proper Kconfig dependencies
  iio: adc: ad7949: fix wrong ADC result due to incorrect bit mask
  iio: hid-sensor-prox: Fix scale not correct issue
  iio:adc:qcom-spmi-vadc: add default scale to LR_MUX2_BAT_ID channel

3 years agoMerge tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Linus Torvalds [Sun, 21 Mar 2021 18:49:16 +0000 (11:49 -0700)]
Merge tag 'usb-5.12-rc4' of git://git./linux/kernel/git/gregkh/usb

Pull USB and Thunderbolt driver fixes from Greg KH:
 "Here are some small Thunderbolt and USB driver fixes for some reported
  issues:

   - thunderbolt fixes for minor problems

   - typec fixes for power issues

   - usb-storage quirk addition

   - usbip bugfix

   - dwc3 bugfix when stopping transfers

   - cdnsp bugfix for isoc transfers

   - gadget use-after-free fix

  All have been in linux-next this week with no reported issues"

* tag 'usb-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: tcpm: Skip sink_cap query only when VDM sm is busy
  usb: dwc3: gadget: Prevent EP queuing while stopping transfers
  usb: typec: tcpm: Invoke power_supply_changed for tcpm-source-psy-
  usb: typec: Remove vdo[3] part of tps6598x_rx_identity_reg struct
  usb-storage: Add quirk to defeat Kindle's automatic unload
  usb: gadget: configfs: Fix KASAN use-after-free
  usbip: Fix incorrect double assignment to udc->ud.tcp_rx
  usb: cdnsp: Fixes incorrect value in ISOC TRB
  thunderbolt: Increase runtime PM reference count on DP tunnel discovery
  thunderbolt: Initialize HopID IDAs in tb_switch_alloc()

3 years agoMerge tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Mar 2021 18:34:24 +0000 (11:34 -0700)]
Merge tag 'irq-urgent-2021-03-21' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Ingo Molnar:
 "A change to robustify force-threaded IRQ handlers to always disable
  interrupts, plus a DocBook fix.

  The force-threaded IRQ handler change has been accelerated from the
  normal schedule of such a change to keep the bad pattern/workaround of
  spin_lock_irqsave() in handlers or IRQF_NOTHREAD as a kludge from
  spreading"

* tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Disable interrupts for force threaded handlers
  genirq/irq_sim: Fix typos in kernel doc (fnode -> fwnode)

3 years agoMerge tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Mar 2021 18:26:21 +0000 (11:26 -0700)]
Merge tag 'perf-urgent-2021-03-21' of git://git./linux/kernel/git/tip/tip

Pull perf fixes from Ingo Molnar:
 "Boundary condition fixes for bugs unearthed by the perf fuzzer"

* tag 'perf-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Fix unchecked MSR access error caused by VLBR_EVENT
  perf/x86/intel: Fix a crash caused by zero PEBS status

3 years agoMerge tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 21 Mar 2021 18:19:29 +0000 (11:19 -0700)]
Merge tag 'locking-urgent-2021-03-21' of git://git./linux/kernel/git/tip/tip

Pull locking fixes from Ingo Molnar:

 - Get static calls & modules right. Hopefully.

 - WW mutex fixes

* tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  static_call: Fix static_call_update() sanity check
  static_call: Align static_call_is_init() patching condition
  static_call: Fix static_call_set_init()
  locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini()
  locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling

3 years agoMerge tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 21 Mar 2021 18:11:22 +0000 (11:11 -0700)]
Merge tag 'efi-urgent-2021-03-21' of git://git./linux/kernel/git/tip/tip

Pull EFI fixes from Ingo Molnar:

 - another missing RT_PROP table related fix, to ensure that the
   efivarfs pseudo filesystem fails gracefully if variable services
   are unsupported

 - use the correct alignment for literal EFI GUIDs

 - fix a use after unmap issue in the memreserve code

* tag 'efi-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi: use 32-bit alignment for efi_guid_t literals
  firmware/efi: Fix a use after bug in efi_mem_reserve_persistent
  efivars: respect EFI_UNSUPPORTED return from firmware

3 years agoMerge tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 21 Mar 2021 18:04:20 +0000 (11:04 -0700)]
Merge tag 'x86_urgent_for_v5.12-rc4' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "The freshest pile of shiny x86 fixes for 5.12:

   - Add the arch-specific mapping between physical and logical CPUs to
     fix devicetree-node lookups

   - Restore the IRQ2 ignore logic

   - Fix get_nr_restart_syscall() to return the correct restart syscall
     number. Split in a 4-patches set to avoid kABI breakage when
     backporting to dead kernels"

* tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic/of: Fix CPU devicetree-node lookups
  x86/ioapic: Ignore IRQ2 again
  x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART
  x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall()
  x86: Move TS_COMPAT back to asm/thread_info.h
  kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()

3 years agoMerge tag 'powerpc-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 21 Mar 2021 17:57:35 +0000 (10:57 -0700)]
Merge tag 'powerpc-5.12-4' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix a possible stack corruption and subsequent DLPAR failure in the
   rpadlpar_io PCI hotplug driver

 - Two build fixes for uncommon configurations

Thanks to Christophe Leroy and Tyrel Datwyler.

* tag 'powerpc-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  PCI: rpadlpar: Fix potential drc_name corruption in store functions
  powerpc: Force inlining of cpu_has_feature() to avoid build failure
  powerpc/vdso32: Add missing _restgpr_31_x to fix build failure

3 years agoio_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL
Stefan Metzmacher [Sat, 20 Mar 2021 19:33:36 +0000 (20:33 +0100)]
io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL

Without that it's not safe to use them in a linked combination with
others.

Now combinations like IORING_OP_SENDMSG followed by IORING_OP_SPLICE
should be possible.

We already handle short reads and writes for the following opcodes:

- IORING_OP_READV
- IORING_OP_READ_FIXED
- IORING_OP_READ
- IORING_OP_WRITEV
- IORING_OP_WRITE_FIXED
- IORING_OP_WRITE
- IORING_OP_SPLICE
- IORING_OP_TEE

Now we have it for these as well:

- IORING_OP_SENDMSG
- IORING_OP_SEND
- IORING_OP_RECVMSG
- IORING_OP_RECV

For IORING_OP_RECVMSG we also check for the MSG_TRUNC and MSG_CTRUNC
flags in order to call req_set_fail_links().

There might be applications arround depending on the behavior
that even short send[msg]()/recv[msg]() retuns continue an
IOSQE_IO_LINK chain.

It's very unlikely that such applications pass in MSG_WAITALL,
which is only defined in 'man 2 recvmsg', but not in 'man 2 sendmsg'.

It's expected that the low level sock_sendmsg() call just ignores
MSG_WAITALL, as MSG_ZEROCOPY is also ignored without explicitly set
SO_ZEROCOPY.

We also expect the caller to know about the implicit truncation to
MAX_RW_COUNT, which we don't detect.

cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/c4e1a4cc0d905314f4d5dc567e65a7b09621aab3.1615908477.git.metze@samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoio-wq: ensure task is running before processing task_work
Jens Axboe [Sun, 21 Mar 2021 13:06:56 +0000 (07:06 -0600)]
io-wq: ensure task is running before processing task_work

Mark the current task as running if we need to run task_work from the
io-wq threads as part of work handling. If that is the case, then return
as such so that the caller can appropriately loop back and reset if it
was part of a going-to-sleep flush.

Fixes: 3bfe6106693b ("io-wq: fork worker threads from original task")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agosignal: don't allow STOP on PF_IO_WORKER threads
Eric W. Biederman [Sun, 21 Mar 2021 15:37:48 +0000 (09:37 -0600)]
signal: don't allow STOP on PF_IO_WORKER threads

Just like we don't allow normal signals to IO threads, don't deliver a
STOP to a task that has PF_IO_WORKER set. The IO threads don't take
signals in general, and have no means of flushing out a stop either.

Longer term, we may want to look into allowing stop of these threads,
as it relates to eg process freezing. For now, this prevents a spin
issue if a SIGSTOP is delivered to the parent task.

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
3 years agosignal: don't allow sending any signals to PF_IO_WORKER threads
Jens Axboe [Sat, 20 Mar 2021 01:25:13 +0000 (19:25 -0600)]
signal: don't allow sending any signals to PF_IO_WORKER threads

They don't take signals individually, and even if they share signals with
the parent task, don't allow them to be delivered through the worker
thread. Linux does allow this kind of behavior for regular threads, but
it's really a compatability thing that we need not care about for the IO
threads.

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
3 years agoext4: initialize ret to suppress smatch warning
Theodore Ts'o [Sun, 21 Mar 2021 04:45:37 +0000 (00:45 -0400)]
ext4: initialize ret to suppress smatch warning

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: stop inode update before return
Pan Bian [Sun, 17 Jan 2021 08:57:32 +0000 (00:57 -0800)]
ext4: stop inode update before return

The inode update should be stopped before returing the error code.

Signed-off-by: Pan Bian <bianpan2016@163.com>
Link: https://lore.kernel.org/r/20210117085732.93788-1-bianpan2016@163.com
Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: fix rename whiteout with fast commit
Harshad Shirwadkar [Tue, 16 Mar 2021 22:19:21 +0000 (15:19 -0700)]
ext4: fix rename whiteout with fast commit

This patch adds rename whiteout support in fast commits. Note that the
whiteout object that gets created is actually char device. Which
imples, the function ext4_inode_journal_mode(struct inode *inode)
would return "JOURNAL_DATA" for this inode. This has a consequence in
fast commit code that it will make creation of the whiteout object a
fast-commit ineligible behavior and thus will fall back to full
commits. With this patch, this can be observed by running fast commits
with rename whiteout and seeing the stats generated by ext4_fc_stats
tracepoint as follows:

ext4_fc_stats: dev 254:32 fc ineligible reasons:
XATTR:0, CROSS_RENAME:0, JOURNAL_FLAG_CHANGE:0, NO_MEM:0, SWAP_BOOT:0,
RESIZE:0, RENAME_DIR:0, FALLOC_RANGE:0, INODE_JOURNAL_DATA:16;
num_commits:6, ineligible: 6, numblks: 3

So in short, this patch guarantees that in case of rename whiteout, we
fall back to full commits.

Amir mentioned that instead of creating a new whiteout object for
every rename, we can create a static whiteout object with irrelevant
nlink. That will make fast commits to not fall back to full
commit. But until this happens, this patch will ensure correctness by
falling back to full commits.

Fixes: 8016e29f4362 ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20210316221921.1124955-1-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: fix timer use-after-free on failed mount
Jan Kara [Mon, 15 Mar 2021 16:59:06 +0000 (17:59 +0100)]
ext4: fix timer use-after-free on failed mount

When filesystem mount fails because of corrupted filesystem we first
cancel the s_err_report timer reminding fs errors every day and only
then we flush s_error_work. However s_error_work may report another fs
error and re-arm timer thus resulting in timer use-after-free. Fix the
problem by first flushing the work and only after that canceling the
s_err_report timer.

Reported-by: syzbot+628472a2aac693ab0fcd@syzkaller.appspotmail.com
Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210315165906.2175-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: fix potential error in ext4_do_update_inode
Shijie Luo [Fri, 12 Mar 2021 06:50:51 +0000 (01:50 -0500)]
ext4: fix potential error in ext4_do_update_inode

If set_large_file = 1 and errors occur in ext4_handle_dirty_metadata(),
the error code will be overridden, go to out_brelse to avoid this
situation.

Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Link: https://lore.kernel.org/r/20210312065051.36314-1-luoshijie1@huawei.com
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: do not try to set xattr into ea_inode if value is empty
zhangyi (F) [Fri, 5 Mar 2021 12:05:08 +0000 (20:05 +0800)]
ext4: do not try to set xattr into ea_inode if value is empty

Syzbot report a warning that ext4 may create an empty ea_inode if set
an empty extent attribute to a file on the file system which is no free
blocks left.

  WARNING: CPU: 6 PID: 10667 at fs/ext4/xattr.c:1640 ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640
  ...
  Call trace:
   ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640
   ext4_xattr_block_set+0x1d0/0x1b1c fs/ext4/xattr.c:1942
   ext4_xattr_set_handle+0x8a0/0xf1c fs/ext4/xattr.c:2390
   ext4_xattr_set+0x120/0x1f0 fs/ext4/xattr.c:2491
   ext4_xattr_trusted_set+0x48/0x5c fs/ext4/xattr_trusted.c:37
   __vfs_setxattr+0x208/0x23c fs/xattr.c:177
  ...

Now, ext4 try to store extent attribute into an external inode if
ext4_xattr_block_set() return -ENOSPC, but for the case of store an
empty extent attribute, store the extent entry into the extent
attribute block is enough. A simple reproduce below.

  fallocate test.img -l 1M
  mkfs.ext4 -F -b 2048 -O ea_inode test.img
  mount test.img /mnt
  dd if=/dev/zero of=/mnt/foo bs=2048 count=500
  setfattr -n "user.test" /mnt/foo

Reported-by: syzbot+98b881fdd8ebf45ab4ae@syzkaller.appspotmail.com
Fixes: 9c6e7853c531 ("ext4: reserve space for xattr entries/names")
Cc: stable@kernel.org
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210305120508.298465-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: do not iput inode under running transaction in ext4_rename()
zhangyi (F) [Wed, 3 Mar 2021 13:17:03 +0000 (21:17 +0800)]
ext4: do not iput inode under running transaction in ext4_rename()

In ext4_rename(), when RENAME_WHITEOUT failed to add new entry into
directory, it ends up dropping new created whiteout inode under the
running transaction. After commit <9b88f9fb0d2> ("ext4: Do not iput inode
under running transaction"), we follow the assumptions that evict() does
not get called from a transaction context but in ext4_rename() it breaks
this suggestion. Although it's not a real problem, better to obey it, so
this patch add inode to orphan list and stop transaction before final
iput().

Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210303131703.330415-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agoext4: find old entry again if failed to rename whiteout
zhangyi (F) [Wed, 3 Mar 2021 13:17:02 +0000 (21:17 +0800)]
ext4: find old entry again if failed to rename whiteout

If we failed to add new entry on rename whiteout, we cannot reset the
old->de entry directly, because the old->de could have moved from under
us during make indexed dir. So find the old entry again before reset is
needed, otherwise it may corrupt the filesystem as below.

  /dev/sda: Entry '00000001' in ??? (12) has deleted/unused inode 15. CLEARED.
  /dev/sda: Unattached inode 75
  /dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

Fixes: 6b4b8e6b4ad ("ext4: fix bug for rename with RENAME_WHITEOUT")
Cc: stable@vger.kernel.org
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20210303131703.330415-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
3 years agogenirq: Disable interrupts for force threaded handlers
Thomas Gleixner [Wed, 17 Mar 2021 14:38:52 +0000 (15:38 +0100)]
genirq: Disable interrupts for force threaded handlers

With interrupt force threading all device interrupt handlers are invoked
from kernel threads. Contrary to hard interrupt context the invocation only
disables bottom halfs, but not interrupts. This was an oversight back then
because any code like this will have an issue:

thread(irq_A)
  irq_handler(A)
    spin_lock(&foo->lock);

interrupt(irq_B)
  irq_handler(B)
    spin_lock(&foo->lock);

This has been triggered with networking (NAPI vs. hrtimers) and console
drivers where printk() happens from an interrupt which interrupted the
force threaded handler.

Now people noticed and started to change the spin_lock() in the handler to
spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the
interrupt request which in turn breaks RT.

Fix the root cause and not the symptom and disable interrupts before
invoking the force threaded handler which preserves the regular semantics
and the usefulness of the interrupt force threading as a general debugging
tool.

For not RT this is not changing much, except that during the execution of
the threaded handler interrupts are delayed until the handler
returns. Vs. scheduling and softirq processing there is no difference.

For RT kernels there is no issue.

Fixes: 8d32a307e4fa ("genirq: Provide forced interrupt threading")
Reported-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Johan Hovold <johan@kernel.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de
3 years agoMerge tag 'riscv-for-linus-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 20 Mar 2021 18:01:54 +0000 (11:01 -0700)]
Merge tag 'riscv-for-linus-5.12-rc4' of git://git./linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "A handful of fixes for 5.12:

   - fix the SBI remote fence numbers for hypervisor fences, which had
     been transcribed in the wrong order in Linux. These fences are only
     used with the KVM patches applied.

   - fix a whole host of build warnings, these should have no functional
     change.

   - fix init_resources() to prevent an off-by-one error from causing an
     out-of-bounds array reference. This was manifesting during boot on
     vexriscv.

   - ensure the KASAN mappings are visible before proceeding to use
     them"

* tag 'riscv-for-linus-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Correct SPARSEMEM configuration
  RISC-V: kasan: Declare kasan_shallow_populate() static
  riscv: Ensure page table writes are flushed when initializing KASAN vmalloc
  RISC-V: Fix out-of-bounds accesses in init_resources()
  riscv: Fix compilation error with Canaan SoC
  ftrace: Fix spelling mistake "disabed" -> "disabled"
  riscv: fix bugon.cocci warnings
  riscv: process: Fix no prototype for arch_dup_task_struct
  riscv: ftrace: Use ftrace_get_regs helper
  riscv: process: Fix no prototype for show_regs
  riscv: syscall_table: Reduce W=1 compilation warnings noise
  riscv: time: Fix no prototype for time_init
  riscv: ptrace: Fix no prototype warnings
  riscv: sbi: Fix comment of __sbi_set_timer_v01
  riscv: irq: Fix no prototype warning
  riscv: traps: Fix no prototype warnings
  RISC-V: correct enum sbi_ext_rfence_fid

3 years agoMerge tag '5.12-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sat, 20 Mar 2021 18:00:25 +0000 (11:00 -0700)]
Merge tag '5.12-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Five cifs/smb3 fixes - three for stable, including an important ACL
  fix and security signature fix"

* tag '5.12-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix allocation size on newly created files
  cifs: warn and fail if trying to use rootfs without the config option
  fs/cifs/: fix misspellings using codespell tool
  cifs: Fix preauth hash corruption
  cifs: update new ACE pointer after populate_new_aces.

3 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 20 Mar 2021 17:57:10 +0000 (10:57 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Eight fixes, all in drivers, all fairly minor either being fixes in
  error legs, memory leaks on teardown, context errors or semantic
  problems"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: mpt3sas: Do not use GFP_KERNEL in atomic context
  scsi: ufs: ufs-mediatek: Correct operator & -> &&
  scsi: sd_zbc: Update write pointer offset cache
  scsi: lpfc: Fix some error codes in debugfs
  scsi: qla2xxx: Fix broken #endif placement
  scsi: st: Fix a use after free in st_open()
  scsi: myrs: Fix a double free in myrs_cleanup()
  scsi: ibmvfc: Free channel_setup_buf during device tear down

3 years agoMerge tag 'zonefs-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Sat, 20 Mar 2021 00:32:30 +0000 (17:32 -0700)]
Merge tag 'zonefs-5.12-rc4' of git://git./linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:

 - fix inode write open reference count (Chao)

 - Fix wrong write offset for asynchronous O_APPEND writes (me)

 - Prevent use of sequential zone file as swap files (me)

* tag 'zonefs-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: fix to update .i_wr_refcnt correctly in zonefs_open_zone()
  zonefs: Fix O_APPEND async write handling
  zonefs: prevent use of seq files as swap file

3 years agoMerge tag 'block-5.12-2021-03-19' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 20 Mar 2021 00:07:10 +0000 (17:07 -0700)]
Merge tag 'block-5.12-2021-03-19' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Just an NVMe pull request this week:

   - fix tag allocation for keep alive

   - fix a unit mismatch for the Write Zeroes limits

   - various TCP transport fixes (Sagi Grimberg, Elad Grupi)

   - fix iosqes and iocqes validation for discovery controllers (Sagi Grimberg)"

* tag 'block-5.12-2021-03-19' of git://git.kernel.dk/linux-block:
  nvmet-tcp: fix kmap leak when data digest in use
  nvmet: don't check iosqes,iocqes for discovery controllers
  nvme-rdma: fix possible hang when failing to set io queues
  nvme-tcp: fix possible hang when failing to set io queues
  nvme-tcp: fix misuse of __smp_processor_id with preemption enabled
  nvme-tcp: fix a NULL deref when receiving a 0-length r2t PDU
  nvme: fix Write Zeroes limitations
  nvme: allocate the keep alive request using BLK_MQ_REQ_NOWAIT
  nvme: merge nvme_keep_alive into nvme_keep_alive_work
  nvme-fabrics: only reserve a single tag

3 years agoMerge tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 20 Mar 2021 00:01:09 +0000 (17:01 -0700)]
Merge tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Quieter week this time, which was both expected and desired. About
  half of the below is fixes for this release, the other half are just
  fixes in general. In detail:

   - Fix the freezing of IO threads, by making the freezer not send them
     fake signals. Make them freezable by default.

   - Like we did for personalities, move the buffer IDR to xarray. Kills
     some code and avoids a use-after-free on teardown.

   - SQPOLL cleanups and fixes (Pavel)

   - Fix linked timeout race (Pavel)

   - Fix potential completion post use-after-free (Pavel)

   - Cleanup and move internal structures outside of general kernel view
     (Stefan)

   - Use MSG_SIGNAL for send/recv from io_uring (Stefan)"

* tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block:
  io_uring: don't leak creds on SQO attach error
  io_uring: use typesafe pointers in io_uring_task
  io_uring: remove structures from include/linux/io_uring.h
  io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
  io_uring: fix sqpoll cancellation via task_work
  io_uring: add generic callback_head helpers
  io_uring: fix concurrent parking
  io_uring: halt SQO submission on ctx exit
  io_uring: replace sqd rw_semaphore with mutex
  io_uring: fix complete_post use ctx after free
  io_uring: fix ->flags races by linked timeouts
  io_uring: convert io_buffer_idr to XArray
  io_uring: allow IO worker threads to be frozen
  kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing

3 years agox86/apic/of: Fix CPU devicetree-node lookups
Johan Hovold [Fri, 12 Mar 2021 09:20:33 +0000 (10:20 +0100)]
x86/apic/of: Fix CPU devicetree-node lookups

Architectures that describe the CPU topology in devicetree and do not have
an identity mapping between physical and logical CPU ids must override the
default implementation of arch_match_cpu_phys_id().

Failing to do so breaks CPU devicetree-node lookups using of_get_cpu_node()
and of_cpu_device_node_get() which several drivers rely on. It also causes
the CPU struct devices exported through sysfs to point to the wrong
devicetree nodes.

On x86, CPUs are described in devicetree using their APIC ids and those
do not generally coincide with the logical ids, even if CPU0 typically
uses APIC id 0.

Add the missing implementation of arch_match_cpu_phys_id() so that CPU-node
lookups work also with SMP.

Apart from fixing the broken sysfs devicetree-node links this likely does
not affect current users of mainline kernels on x86.

Fixes: 4e07db9c8db8 ("x86/devicetree: Use CPU description from Device Tree")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210312092033.26317-1-johan@kernel.org
3 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Fri, 19 Mar 2021 21:10:07 +0000 (14:10 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Fixes for kvm on x86:

   - new selftests

   - fixes for migration with HyperV re-enlightenment enabled

   - fix RCU/SRCU usage

   - fixes for local_irq_restore misuse false positive"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  documentation/kvm: additional explanations on KVM_SET_BOOT_CPU_ID
  x86/kvm: Fix broken irq restoration in kvm_wait
  KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUs
  KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish
  selftests: kvm: add set_boot_cpu_id test
  selftests: kvm: add _vm_ioctl
  selftests: kvm: add get_msr_index_features
  selftests: kvm: Add basic Hyper-V clocksources tests
  KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment
  KVM: x86: hyper-v: Track Hyper-V TSC page status
  KVM: x86: hyper-v: Prevent using not-yet-updated TSC page by secondary CPUs
  KVM: x86: hyper-v: Limit guest to writing zero to HV_X64_MSR_TSC_EMULATION_STATUS
  KVM: x86/mmu: Store the address space ID in the TDP iterator
  KVM: x86/mmu: Factor out tdp_iter_return_to_root
  KVM: x86/mmu: Fix RCU usage when atomically zapping SPTEs
  KVM: x86/mmu: Fix RCU usage in handle_removed_tdp_mmu_page

3 years agoMerge tag 'gpio-fixes-for-v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Fri, 19 Mar 2021 21:07:19 +0000 (14:07 -0700)]
Merge tag 'gpio-fixes-for-v5.12-rc4' of git://git./linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:
 "Two fixes for the GPIO subsystem. Both address issues in the core GPIO
  code:

   - fix the return value in error path in gpiolib_dev_init()

   - fix the 'gpio-line-names' property handling correctly this time"

* tag 'gpio-fixes-for-v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpiolib: Assign fwnode to parent's if no primary one provided
  gpiolib: Fix error return code in gpiolib_dev_init()

3 years agoMerge tag 's390-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Fri, 19 Mar 2021 18:39:28 +0000 (11:39 -0700)]
Merge tag 's390-5.12-4' of git://git./linux/kernel/git/s390/linux

Pull s390 updates from Heiko Carstens:

 - disable preemption when accessing local per-cpu variables in the new
   counter set driver

 - fix by a factor of four increased steal time due to missing
   cputime_to_nsecs() conversion

 - fix PCI device structure leak

* tag 's390-5.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/pci: fix leak of PCI device structure
  s390/vtime: fix increased steal time accounting
  s390/cpumf: disable preemption when accessing per-cpu variable

3 years agoMerge tag 'trace-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Fri, 19 Mar 2021 17:06:30 +0000 (10:06 -0700)]
Merge tag 'trace-v5.12-rc3' of git://git./linux/kernel/git/rostedt/linux-trace

Pull workqueue tracing fix from Steven Rostedt:
 "Fix workqueue trace event unsafe string reference

  After adding a verifier to test all strings printed in trace events to
  make sure they either point to a string on the ring buffer, or to read
  only core kernel memory, it triggered on a workqueue trace event. The
  trace event workqueue_queue_work references the allocated name of the
  workqueue in the output. If the workqueue is freed before the trace is
  read, then the trace will dereference freed memory.

  Update the trace event to use the __string(), __assign_str(), and
  __get_str() helpers to handle such cases"

* tag 'trace-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  workqueue/tracing: Copy workqueue name to buffer in trace event

3 years agoMerge tag 'pm-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 19 Mar 2021 17:00:10 +0000 (10:00 -0700)]
Merge tag 'pm-5.12-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "Revert two problematic commits.

  Specifics:

   - Revert ACPI PM commit that attempted to improve reboot handling on
     some systems, but it caused other systems to panic() during reboot
     (Josef Bacik)

   - Revert PM-runtime commit that attempted to improve the handling of
     suppliers during PM-runtime suspend of a consumer device, but it
     introduced a race condition potentially leading to unexpected
     behavior (Rafael Wysocki)"

* tag 'pm-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  Revert "PM: runtime: Update device status before letting suppliers suspend"
  Revert "PM: ACPI: reboot: Use S5 for reboot"

3 years agoMerge tag 'iommu-fixes-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 19 Mar 2021 16:56:04 +0000 (09:56 -0700)]
Merge tag 'iommu-fixes-v5.12-rc3' of git://git./linux/kernel/git/joro/iommu

Pull iommu fixes from Joerg Roedel:

 - Three AMD IOMMU patches to fix a boot crash on AMD Stoney systems and
   every other AMD IOMMU system booted with 'amd_iommu=off'.

   This is a v5.11 regression.

 - A Fix for the Tegra IOMMU driver to make sure it detects all IOMMUs

* tag 'iommu-fixes-v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/tegra-smmu: Make tegra_smmu_probe_device() to handle all IOMMU phandles
  iommu/amd: Keep track of amd_iommu_irq_remap state
  iommu/amd: Don't call early_amd_iommu_init() when AMD IOMMU is disabled
  iommu/amd: Move Stoney Ridge check to detect_ivrs()

3 years agoMerge tag 'sound-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 19 Mar 2021 16:53:32 +0000 (09:53 -0700)]
Merge tag 'sound-5.12-rc4' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "The majority of changes are various ASoC device/platform-specific
  small fixes (including a removal of stale file) while the only common
  change is a clk management fix in ASoC simple-card driver.

  The rest are the usual HD-audio quirks"

* tag 'sound-5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (44 commits)
  ALSA: usb-audio: Fix unintentional sign extension issue
  ALSA: hda/realtek: fix mute/micmute LEDs for HP 850 G8
  ASoC: dt-bindings: fsl_spdif: Add compatible string for new platforms
  ASoC: rt711: add snd_soc_component remove callback
  ASoC: rt5659: Update MCLK rate in set_sysclk()
  ASoC: simple-card-utils: Do not handle device clock
  ALSA: hda/realtek: fix mute/micmute LEDs for HP 440 G8
  ALSA: hda/realtek: fix mute/micmute LEDs for HP 840 G8
  ALSA: hda/realtek: apply pin quirk for XiaomiNotebook Pro
  ALSA: hda/realtek: Apply headset-mic quirks for Xiaomi Redmibook Air
  ASoC: mediatek: mt8192: fix tdm out data is valid on rising edge
  ALSA: dice: fix null pointer dereference when node is disconnected
  ALSA: hda: generic: Fix the micmute led init state
  ASoC: qcom: lpass-cpu: Fix lpass dai ids parse
  spi: cadence: set cqspi to the driver_data field of struct device
  ASoC: SOF: intel: fix wrong poll bits in dsp power down
  ASoC: codecs: wcd934x: add a sanity check in set channel map
  ASoC: qcom: sdm845: Fix array out of range on rx slim channels
  ASoC: qcom: sdm845: Fix array out of bounds access
  ASoC: remove remnants of sirf prima/atlas audio codec
  ...

3 years agocifs: fix allocation size on newly created files
Steve French [Fri, 19 Mar 2021 05:05:48 +0000 (00:05 -0500)]
cifs: fix allocation size on newly created files

Applications that create and extend and write to a file do not
expect to see 0 allocation size.  When file is extended,
set its allocation size to a plausible value until we have a
chance to query the server for it.  When the file is cached
this will prevent showing an impossible number of allocated
blocks (like 0).  This fixes e.g. xfstests 614 which does

    1) create a file and set its size to 64K
    2) mmap write 64K to the file
    3) stat -c %b for the file (to query the number of allocated blocks)

It was failing because we returned 0 blocks.  Even though we would
return the correct cached file size, we returned an impossible
allocation size.

Signed-off-by: Steve French <stfrench@microsoft.com>
CC: <stable@vger.kernel.org>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
3 years agoMerge branch 'pm-core'
Rafael J. Wysocki [Fri, 19 Mar 2021 15:38:45 +0000 (16:38 +0100)]
Merge branch 'pm-core'

* pm-core:
  Revert "PM: runtime: Update device status before letting suppliers suspend"