OSDN Git Service

tomoyo/tomoyo-test1.git
4 years agoxfs: mark XLOG_FORCED_SHUTDOWN as unlikely
Christoph Hellwig [Thu, 12 Mar 2020 23:52:49 +0000 (16:52 -0700)]
xfs: mark XLOG_FORCED_SHUTDOWN as unlikely

A shutdown log is a slow failure path.  Add an unlikely annotation to
it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: make the btree ag cursor private union anonymous
Dave Chinner [Wed, 11 Mar 2020 00:57:51 +0000 (17:57 -0700)]
xfs: make the btree ag cursor private union anonymous

This is much less widely used than the bc_private union was, so this
is done as a single patch. The named union xfs_btree_cur_private
goes away and is embedded into the struct xfs_btree_cur_ag as an
anonymous union, and the code is modified via this script:

$ sed -i 's/priv\.\([abt|refc]\)/\1/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: make the btree cursor union members named structure
Dave Chinner [Wed, 11 Mar 2020 00:57:07 +0000 (17:57 -0700)]
xfs: make the btree cursor union members named structure

we need to name the btree cursor private structures to be able
to pull them out of the deeply nested structure definition they are
in now.

Based on code extracted from a patchset by Darrick Wong.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: make btree cursor private union anonymous
Dave Chinner [Wed, 11 Mar 2020 00:54:38 +0000 (17:54 -0700)]
xfs: make btree cursor private union anonymous

Rename the union and it's internal structures to the new name and
remove the temporary defines that facilitated the change.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: rename btree cursor private btree member flags
Dave Chinner [Wed, 11 Mar 2020 00:54:38 +0000 (17:54 -0700)]
xfs: rename btree cursor private btree member flags

BPRV is not longer appropriate because bc_private is going away.
Script:

$ sed -i 's/BTCUR_BPRV/BTCUR_BMBT/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

With manual cleanup to the definitions in fs/xfs/libxfs/xfs_btree.h

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change "BC_BT" to "BTCUR_BMBT", fix subject line typo]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: convert btree cursor inode-private member names
Dave Chinner [Wed, 11 Mar 2020 00:52:53 +0000 (17:52 -0700)]
xfs: convert btree cursor inode-private member names

bc_private.b -> bc_ino conversion via script:

$ sed -i 's/bc_private\.b/bc_ino/g' fs/xfs/*[ch] fs/xfs/*/*[ch]

And then revert the change to the bc_ino #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: tweak the subject line slightly]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: convert btree cursor ag-private member name
Dave Chinner [Wed, 11 Mar 2020 00:51:15 +0000 (17:51 -0700)]
xfs: convert btree cursor ag-private member name

bc_private.a -> bc_ag conversion via script:

`sed -i 's/bc_private\.a/bc_ag/g' fs/xfs/*[ch] fs/xfs/*/*[ch]`

And then revert the change to the bc_ag #define in
fs/xfs/libxfs/xfs_btree.h manually.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: introduce new private btree cursor names
Dave Chinner [Wed, 11 Mar 2020 00:50:41 +0000 (17:50 -0700)]
xfs: introduce new private btree cursor names

Just the defines of the new names - the conversion will be in
scripted commits after this.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change "bc_bt" to "bc_ino"]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
4 years agoxfs: fix regression in "cleanup xfs_dir2_block_getdents"
Tommi Rantala [Thu, 12 Mar 2020 14:43:53 +0000 (07:43 -0700)]
xfs: fix regression in "cleanup xfs_dir2_block_getdents"

Commit 263dde869bd09 ("xfs: cleanup xfs_dir2_block_getdents") introduced
a getdents regression, when it converted the pointer arithmetics to
offset calculations: offset is updated in the loop already for the next
iteration, but the updated offset value is used incorrectly in two
places, where we should have used the not-yet-updated value.

This caused for example "git clean -ffdx" failures to cleanup certain
directory structures when running in a container.

Fix the regression by making sure we use proper offset in the loop body.
Thanks to Christoph Hellwig for suggestion how to best fix the code.

Cc: Christoph Hellwig <hch@lst.de>
Fixes: 263dde869bd09 ("xfs: cleanup xfs_dir2_block_getdents")
Signed-off-by: Tommi Rantala <tommi.t.rantala@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoxfs: Use scnprintf() for avoiding potential buffer overflow
Takashi Iwai [Wed, 11 Mar 2020 18:15:35 +0000 (11:15 -0700)]
xfs: Use scnprintf() for avoiding potential buffer overflow

Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit.  Fix it by replacing with scnprintf().

Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: mark extended attr corrupt when lookup-by-hash fails
Darrick J. Wong [Wed, 11 Mar 2020 17:38:09 +0000 (10:38 -0700)]
xfs: mark extended attr corrupt when lookup-by-hash fails

In xchk_xattr_listent, we attempt to validate the extended attribute
hash structures by performing a attr lookup by (hashed) name.  If the
lookup returns ENODATA, that means that the hash information is corrupt.
The _process_error functions don't catch this, so we have to add that
explicitly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoxfs: mark dir corrupt when lookup-by-hash fails
Darrick J. Wong [Wed, 11 Mar 2020 17:37:57 +0000 (10:37 -0700)]
xfs: mark dir corrupt when lookup-by-hash fails

In xchk_dir_actor, we attempt to validate the directory hash structures
by performing a directory entry lookup by (hashed) name.  If the lookup
returns ENOENT, that means that the hash information is corrupt.  The
_process_error functions don't catch this, so we have to add that
explicitly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoxfs: check owner of dir3 blocks
Darrick J. Wong [Wed, 11 Mar 2020 17:37:57 +0000 (10:37 -0700)]
xfs: check owner of dir3 blocks

Check the owner field of dir3 block headers.  If it's corrupt, release
the buffer and return EFSCORRUPTED.  All callers handle this properly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoxfs: check owner of dir3 data blocks
Darrick J. Wong [Wed, 11 Mar 2020 17:37:56 +0000 (10:37 -0700)]
xfs: check owner of dir3 data blocks

Check the owner field of dir3 data block headers.  If it's corrupt,
release the buffer and return EFSCORRUPTED.  All callers handle this
properly.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoxfs: check owner of dir3 free blocks
Darrick J. Wong [Wed, 11 Mar 2020 17:37:56 +0000 (10:37 -0700)]
xfs: check owner of dir3 free blocks

Check the owner field of dir3 free block headers and reject the metadata
if there's something wrong with it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoxfs: don't ever return a stale pointer from __xfs_dir3_free_read
Darrick J. Wong [Wed, 11 Mar 2020 17:37:55 +0000 (10:37 -0700)]
xfs: don't ever return a stale pointer from __xfs_dir3_free_read

If we decide that a directory free block is corrupt, we must take care
not to leak a buffer pointer to the caller.  After xfs_trans_brelse
returns, the buffer can be freed or reused, which means that we have to
set *bpp back to NULL.

Callers are supposed to notice the nonzero return value and not use the
buffer pointer, but we should code more defensively, even if all current
callers handle this situation correctly.

Fixes: de14c5f541e7 ("xfs: verify free block header fields")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoxfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails
Darrick J. Wong [Wed, 11 Mar 2020 17:37:55 +0000 (10:37 -0700)]
xfs: fix buffer corruption reporting when xfs_dir3_free_header_check fails

xfs_verifier_error is supposed to be called on a corrupt metadata buffer
from within a buffer verifier function, whereas xfs_buf_mark_corrupt
is the function to be called when a piece of code has read a buffer and
catches something that a read verifier cannot.  The first function sets
b_error anticipating that the low level buffer handling code will see
the nonzero b_error and clear XBF_DONE on the buffer, whereas the second
function does not.

Since xfs_dir3_free_header_check examines fields in the dir free block
header that require more context than can be provided to read verifiers,
we must call xfs_buf_mark_corrupt when it finds a problem.

Switching the calls has a secondary effect that we no longer corrupt the
buffer state by setting b_error and leaving XBF_DONE set.  When /that/
happens, we'll trip over various state assertions (most commonly the
b_error check in xfs_buf_reverify) on a subsequent attempt to read the
buffer.

Fixes: bc1a09b8e334bf5f ("xfs: refactor verifier callers to print address of failing check")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoxfs: xfs_buf_corruption_error should take __this_address
Darrick J. Wong [Wed, 11 Mar 2020 17:37:54 +0000 (10:37 -0700)]
xfs: xfs_buf_corruption_error should take __this_address

Add a xfs_failaddr_t parameter to this function so that callers can
potentially pass in (and therefore report) the exact point in the code
where we decided that a metadata buffer was corrupt.  This enables us to
wire it up to checking functions that have to run outside of verifiers.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoxfs: add a function to deal with corrupt buffers post-verifiers
Darrick J. Wong [Wed, 11 Mar 2020 17:37:54 +0000 (10:37 -0700)]
xfs: add a function to deal with corrupt buffers post-verifiers

Add a helper function to get rid of buffers that we have decided are
corrupt after the verifiers have run.  This function is intended to
handle metadata checks that can't happen in the verifiers, such as
inter-block relationship checking.  Note that we now mark the buffer
stale so that it will not end up on any LRU and will be purged on
release.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
4 years agoxfs: fix xfs_rmap_has_other_keys usage of ECANCELED
Darrick J. Wong [Wed, 11 Mar 2020 17:37:53 +0000 (10:37 -0700)]
xfs: fix xfs_rmap_has_other_keys usage of ECANCELED

In e7ee96dfb8c26, we converted all ITER_ABORT users to use ECANCELED
instead, but we forgot to teach xfs_rmap_has_other_keys not to return
that magic value to callers.  Fix it now by using ECANCELED both to
abort the iteration and to signal that we found another reverse mapping.
This enables us to drop the separate boolean flag.

Fixes: e7ee96dfb8c26 ("xfs: remove all *_ITER_ABORT values")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 years agoxfs: fix use-after-free when aborting corrupt attr inactivation
Darrick J. Wong [Wed, 11 Mar 2020 17:37:53 +0000 (10:37 -0700)]
xfs: fix use-after-free when aborting corrupt attr inactivation

Log the corrupt buffer before we release the buffer.

Fixes: a5155b870d687 ("xfs: always log corruption errors")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 years agoxfs: remove XFS_BUF_TO_SBP
Christoph Hellwig [Tue, 10 Mar 2020 15:57:30 +0000 (08:57 -0700)]
xfs: remove XFS_BUF_TO_SBP

Just dereference bp->b_addr directly and make the code a little
simpler and more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove XFS_BUF_TO_AGF
Christoph Hellwig [Tue, 10 Mar 2020 15:57:29 +0000 (08:57 -0700)]
xfs: remove XFS_BUF_TO_AGF

Just dereference bp->b_addr directly and make the code a little
simpler and more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove XFS_BUF_TO_AGI
Christoph Hellwig [Tue, 10 Mar 2020 15:57:29 +0000 (08:57 -0700)]
xfs: remove XFS_BUF_TO_AGI

Just dereference bp->b_addr directly and make the code a little
simpler and more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the xfs_agfl_t typedef
Christoph Hellwig [Tue, 10 Mar 2020 15:57:28 +0000 (08:57 -0700)]
xfs: remove the xfs_agfl_t typedef

There is just a single user left, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the agfl_bno member from struct xfs_agfl
Christoph Hellwig [Tue, 10 Mar 2020 15:57:28 +0000 (08:57 -0700)]
xfs: remove the agfl_bno member from struct xfs_agfl

struct xfs_agfl is a header in front of the AGFL entries that exists
for CRC enabled file systems.  For not CRC enabled file systems the AGFL
is simply a list of agbno.  Make the CRC case similar to that by just
using the list behind the new header.  This indirectly solves a problem
with modern gcc versions that warn about taking addresses of packed
structures (and we have to pack the AGFL given that gcc rounds up
structure sizes).  Also replace the helper macro to get from a buffer
with an inline function in xfs_alloc.h to make the code easier to
read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: clear PF_MEMALLOC before exiting xfsaild thread
Eric Biggers [Tue, 10 Mar 2020 15:57:27 +0000 (08:57 -0700)]
xfs: clear PF_MEMALLOC before exiting xfsaild thread

Leaving PF_MEMALLOC set when exiting a kthread causes it to remain set
during do_exit().  That can confuse things.  In particular, if BSD
process accounting is enabled, then do_exit() writes data to an
accounting file.  If that file has FS_SYNC_FL set, then this write
occurs synchronously and can misbehave if PF_MEMALLOC is set.

For example, if the accounting file is located on an XFS filesystem,
then a WARN_ON_ONCE() in iomap_do_writepage() is triggered and the data
doesn't get written when it should.  Or if the accounting file is
located on an ext4 filesystem without a journal, then a WARN_ON_ONCE()
in ext4_write_inode() is triggered and the inode doesn't get written.

Fix this in xfsaild() by using the helper functions to save and restore
PF_MEMALLOC.

This can be reproduced as follows in the kvm-xfstests test appliance
modified to add the 'acct' Debian package, and with kvm-xfstests's
recommended kconfig modified to add CONFIG_BSD_PROCESS_ACCT=y:

        mkfs.xfs -f /dev/vdb
        mount /vdb
        touch /vdb/file
        chattr +S /vdb/file
        accton /vdb/file
        mkfs.xfs -f /dev/vdc
        mount /vdc
        umount /vdc

It causes:
WARNING: CPU: 1 PID: 336 at fs/iomap/buffered-io.c:1534
CPU: 1 PID: 336 Comm: xfsaild/vdc Not tainted 5.6.0-rc5 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
RIP: 0010:iomap_do_writepage+0x16b/0x1f0 fs/iomap/buffered-io.c:1534
[...]
Call Trace:
 write_cache_pages+0x189/0x4d0 mm/page-writeback.c:2238
 iomap_writepages+0x1c/0x33 fs/iomap/buffered-io.c:1642
 xfs_vm_writepages+0x65/0x90 fs/xfs/xfs_aops.c:578
 do_writepages+0x41/0xe0 mm/page-writeback.c:2344
 __filemap_fdatawrite_range+0xd2/0x120 mm/filemap.c:421
 file_write_and_wait_range+0x71/0xc0 mm/filemap.c:760
 xfs_file_fsync+0x7a/0x2b0 fs/xfs/xfs_file.c:114
 generic_write_sync include/linux/fs.h:2867 [inline]
 xfs_file_buffered_aio_write+0x379/0x3b0 fs/xfs/xfs_file.c:691
 call_write_iter include/linux/fs.h:1901 [inline]
 new_sync_write+0x130/0x1d0 fs/read_write.c:483
 __kernel_write+0x54/0xe0 fs/read_write.c:515
 do_acct_process+0x122/0x170 kernel/acct.c:522
 slow_acct_process kernel/acct.c:581 [inline]
 acct_process+0x1d4/0x27c kernel/acct.c:607
 do_exit+0x83d/0xbc0 kernel/exit.c:791
 kthread+0xf1/0x140 kernel/kthread.c:257
 ret_from_fork+0x27/0x50 arch/x86/entry/entry_64.S:352

This bug was originally reported by syzbot at
https://lore.kernel.org/r/0000000000000e7156059f751d7b@google.com.

Reported-by: syzbot+1f9dc49e8de2582d90c2@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 years agoxfs: switch xfs_attrmulti_attr_get to lazy attr buffer allocation
Christoph Hellwig [Thu, 27 Feb 2020 01:30:45 +0000 (17:30 -0800)]
xfs: switch xfs_attrmulti_attr_get to lazy attr buffer allocation

Let the low-level attr code only allocate the needed buffer size
for xfs_attrmulti_attr_get instead of allocating the upper bound
at the top of the call chain.

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: only allocate the buffer size actually needed in __xfs_set_acl
Christoph Hellwig [Thu, 27 Feb 2020 01:30:44 +0000 (17:30 -0800)]
xfs: only allocate the buffer size actually needed in __xfs_set_acl

No need to allocate the max size if we can just allocate the easily
known actual ACL size.

Suggested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: clean up bufsize alignment in xfs_ioc_attr_list
Christoph Hellwig [Thu, 27 Feb 2020 01:30:44 +0000 (17:30 -0800)]
xfs: clean up bufsize alignment in xfs_ioc_attr_list

Use the round_down macro, and use the size of the uint32 type we
use in the callback that fills the buffer to make the code a little
more clear - the size of it is always the same as int for platforms
that Linux runs on.

Suggested-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: embedded the attrlist cursor into struct xfs_attr_list_context
Christoph Hellwig [Thu, 27 Feb 2020 01:30:43 +0000 (17:30 -0800)]
xfs: embedded the attrlist cursor into struct xfs_attr_list_context

The attrlist cursor only exists as part of an attr list context, so
embedd the structure instead of pointing to it.  Also give it a proper
xfs_ prefix and remove the obsolete typedef.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove XFS_DA_OP_INCOMPLETE
Christoph Hellwig [Thu, 27 Feb 2020 01:30:43 +0000 (17:30 -0800)]
xfs: remove XFS_DA_OP_INCOMPLETE

Now that we use the on-disk flags field also for the interface to the
lower level attr routines we can use the XFS_ATTR_INCOMPLETE definition
from the on-disk format directly instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: clean up the attr flag confusion
Christoph Hellwig [Thu, 27 Feb 2020 01:30:42 +0000 (17:30 -0800)]
xfs: clean up the attr flag confusion

The ATTR_* flags have a long IRIX history, where they a userspace
interface, the on-disk format and an internal interface.  We've split
out the on-disk interface to the XFS_ATTR_* values, but despite (or
because?) of that the flag have still been a mess.  Switch the
internal interface to pass the on-disk XFS_ATTR_* flags for the
namespace and the Linux XATTR_* flags for the actual flags instead.
The ATTR_* values that are actually used are move to xfs_fs.h with a
new XFS_IOC_* prefix to not conflict with the userspace version that
has the same name and must have the same value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: clean up the ATTR_REPLACE checks
Christoph Hellwig [Thu, 27 Feb 2020 01:30:42 +0000 (17:30 -0800)]
xfs: clean up the ATTR_REPLACE checks

Remove superflous braces, elses after return statements and use a goto
label to merge common error handling.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: improve xfs_forget_acl
Christoph Hellwig [Thu, 27 Feb 2020 01:30:41 +0000 (17:30 -0800)]
xfs: improve xfs_forget_acl

Move the function to xfs_acl.c and provide a proper stub for the
!CONFIG_XFS_POSIX_ACL case.  Lift the flags check to the caller as it
nicely fits in there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: lift cursor copy in/out into xfs_ioc_attr_list
Christoph Hellwig [Thu, 27 Feb 2020 01:30:41 +0000 (17:30 -0800)]
xfs: lift cursor copy in/out into xfs_ioc_attr_list

Lift the common code to copy the cursor from and to user space into
xfs_ioc_attr_list.  Note that this means we copy in twice now as
the cursor is in the middle of the conaining structure, but we never
touch the memory for the original copy.  Doing so keeps the cursor
handling isolated in the common helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: lift buffer allocation into xfs_ioc_attr_list
Christoph Hellwig [Thu, 27 Feb 2020 01:30:40 +0000 (17:30 -0800)]
xfs: lift buffer allocation into xfs_ioc_attr_list

Lift the buffer allocation from the two callers into xfs_ioc_attr_list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: lift common checks into xfs_ioc_attr_list
Christoph Hellwig [Thu, 27 Feb 2020 01:30:40 +0000 (17:30 -0800)]
xfs: lift common checks into xfs_ioc_attr_list

Lift the flags and bufsize checks from both callers into the common code
in xfs_ioc_attr_list.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: rename xfs_attr_list_int to xfs_attr_list
Christoph Hellwig [Thu, 27 Feb 2020 01:30:39 +0000 (17:30 -0800)]
xfs: rename xfs_attr_list_int to xfs_attr_list

The version taking the context structure is the main interface to list
attributes, so drop the _int postfix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: move the legacy xfs_attr_list to xfs_ioctl.c
Christoph Hellwig [Thu, 27 Feb 2020 01:30:38 +0000 (17:30 -0800)]
xfs: move the legacy xfs_attr_list to xfs_ioctl.c

The old xfs_attr_list code is only used by the attrlist by handle
ioctl.  Move it to xfs_ioctl.c with its user.  Also move the
attrlist and attrlist_ent structure to xfs_fs.h, as they are exposed
user ABIs.  They are used through libattr headers with the same name
by at least xfsdump.  Also document this relation so that it doesn't
require a research project to figure out.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: open code ATTR_ENTSIZE
Christoph Hellwig [Thu, 27 Feb 2020 01:30:38 +0000 (17:30 -0800)]
xfs: open code ATTR_ENTSIZE

Replace a single use macro containing open-coded variants of
standard helpers with direct calls to the standard helpers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the unused ATTR_ENTRY macro
Christoph Hellwig [Thu, 27 Feb 2020 01:30:37 +0000 (17:30 -0800)]
xfs: remove the unused ATTR_ENTRY macro

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: cleanup struct xfs_attr_list_context
Christoph Hellwig [Thu, 27 Feb 2020 01:30:37 +0000 (17:30 -0800)]
xfs: cleanup struct xfs_attr_list_context

Replace the alist char pointer with a void buffer given that different
callers use it in different ways.  Use the chance to remove the typedef
and reduce the indentation of the struct definition so that it doesn't
overflow 80 char lines all over.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: factor out a xfs_attr_match helper
Christoph Hellwig [Thu, 27 Feb 2020 01:30:36 +0000 (17:30 -0800)]
xfs: factor out a xfs_attr_match helper

Factor out a helper that compares an on-disk attr vs the name, length and
flags specified in struct xfs_da_args.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: replace ATTR_KERNOTIME with XFS_DA_OP_NOTIME
Christoph Hellwig [Thu, 27 Feb 2020 01:30:36 +0000 (17:30 -0800)]
xfs: replace ATTR_KERNOTIME with XFS_DA_OP_NOTIME

op_flags with the XFS_DA_OP_* flags is the usual place for in-kernel
only flags, so move the notime flag there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove ATTR_ALLOC and XFS_DA_OP_ALLOCVAL
Christoph Hellwig [Thu, 27 Feb 2020 01:30:35 +0000 (17:30 -0800)]
xfs: remove ATTR_ALLOC and XFS_DA_OP_ALLOCVAL

Use a NULL args->value as the indicator to lazily allocate a buffer
instead, and let the caller always free args->value instead of
duplicating the cleanup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove ATTR_KERNOVAL
Christoph Hellwig [Thu, 27 Feb 2020 01:30:35 +0000 (17:30 -0800)]
xfs: remove ATTR_KERNOVAL

We can just pass down the Linux convention of a zero valuelen to just
query for the existance of an attribute to the low-level code instead.
The use in the legacy xfs_attr_list code only used by the ioctl
interface was already dead code, as the callers check that the flag
is not present.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the xfs_inode argument to xfs_attr_get_ilocked
Christoph Hellwig [Thu, 27 Feb 2020 01:30:34 +0000 (17:30 -0800)]
xfs: remove the xfs_inode argument to xfs_attr_get_ilocked

The inode can easily be derived from the args structure.  Also
don't bother with else statements after early returns.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: pass an initialized xfs_da_args to xfs_attr_get
Christoph Hellwig [Thu, 27 Feb 2020 01:30:34 +0000 (17:30 -0800)]
xfs: pass an initialized xfs_da_args to xfs_attr_get

Instead of converting from one style of arguments to another in
xfs_attr_set, pass the structure from higher up in the call chain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: pass an initialized xfs_da_args structure to xfs_attr_set
Christoph Hellwig [Thu, 27 Feb 2020 01:30:33 +0000 (17:30 -0800)]
xfs: pass an initialized xfs_da_args structure to xfs_attr_set

Instead of converting from one style of arguments to another in
xfs_attr_set, pass the structure from higher up in the call chain.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: turn xfs_da_args.value into a void pointer
Christoph Hellwig [Thu, 27 Feb 2020 01:30:33 +0000 (17:30 -0800)]
xfs: turn xfs_da_args.value into a void pointer

The xattr values are blobs and should not be typed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the MAXNAMELEN check from xfs_attr_args_init
Christoph Hellwig [Thu, 27 Feb 2020 01:30:32 +0000 (17:30 -0800)]
xfs: remove the MAXNAMELEN check from xfs_attr_args_init

All the callers already check the length when allocating the
in-kernel xattrs buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the name == NULL check from xfs_attr_args_init
Christoph Hellwig [Thu, 27 Feb 2020 01:30:32 +0000 (17:30 -0800)]
xfs: remove the name == NULL check from xfs_attr_args_init

All callers provide a valid name pointer, remove the redundant check.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: factor out a helper for a single XFS_IOC_ATTRMULTI_BY_HANDLE op
Christoph Hellwig [Thu, 27 Feb 2020 01:30:31 +0000 (17:30 -0800)]
xfs: factor out a helper for a single XFS_IOC_ATTRMULTI_BY_HANDLE op

Add a new helper to handle a single attr multi ioctl operation that
can be shared between the native and compat ioctl implementation.

There is a slight change in behaviour in that we don't break out of the
loop when copying in the attribute name fails.  The previous behaviour
was rather inconsistent here as it continued for any other kind of
error, and that we don't clear the flags in the structure returned
to userspace, a behavior only introduced as a bug fix in the last
merge window.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: use strndup_user in XFS_IOC_ATTRMULTI_BY_HANDLE
Christoph Hellwig [Thu, 27 Feb 2020 01:30:31 +0000 (17:30 -0800)]
xfs: use strndup_user in XFS_IOC_ATTRMULTI_BY_HANDLE

Simplify the user copy code by using strndup_user.  This means that we
now do one memory allocation per operation instead of one per ioctl,
but memory allocations are cheap compared to the actual file system
operations.  Also the error for an invalid path is now EINVAL or EFAULT
instead of the previous odd and undocumented ERANGE.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: merge xfs_attrmulti_attr_remove into xfs_attrmulti_attr_set
Christoph Hellwig [Thu, 27 Feb 2020 01:30:30 +0000 (17:30 -0800)]
xfs: merge xfs_attrmulti_attr_remove into xfs_attrmulti_attr_set

Merge the ioctl handlers just like the low-level xfs_attr_set function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: merge xfs_attr_remove into xfs_attr_set
Christoph Hellwig [Thu, 27 Feb 2020 01:30:29 +0000 (17:30 -0800)]
xfs: merge xfs_attr_remove into xfs_attr_set

The Linux xattr and acl APIs use a single call for set and remove.
Modify the high-level XFS API to match that and let xfs_attr_set handle
removing attributes as well.  With a little bit of reordering this
removes a lot of code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the ATTR_INCOMPLETE flag
Christoph Hellwig [Thu, 27 Feb 2020 01:30:29 +0000 (17:30 -0800)]
xfs: remove the ATTR_INCOMPLETE flag

Replace the ATTR_INCOMPLETE flag with a new boolean field in struct
xfs_attr_list_context.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: reject invalid flags combinations in XFS_IOC_ATTRLIST_BY_HANDLE
Christoph Hellwig [Thu, 27 Feb 2020 01:30:28 +0000 (17:30 -0800)]
xfs: reject invalid flags combinations in XFS_IOC_ATTRLIST_BY_HANDLE

While the flags field in the ABI and the on-disk format allows for
multiple namespace flags, an attribute can only exist in a single
namespace at a time. Hence asking to list attributes that exist
in multiple namespaces simultaneously is a logically invalid
request and will return no results. Reject this case early with
-EINVAL.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Rajendra <chandanrlinux@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: rework collapse range into an atomic operation
Brian Foster [Wed, 26 Feb 2020 17:43:16 +0000 (09:43 -0800)]
xfs: rework collapse range into an atomic operation

The collapse range operation uses a unique transaction and ilock
cycle for the hole punch and each extent shift iteration of the
overall operation. While the hole punch is safe as a separate
operation due to the iolock, cycling the ilock after each extent
shift is risky w.r.t. concurrent operations, similar to insert range.

To avoid this problem, make collapse range atomic with respect to
ilock. Hold the ilock across the entire operation, replace the
individual transactions with a single rolling transaction sequence
and finish dfops on each iteration to perform pending frees and roll
the transaction. Remove the unnecessary quota reservation as
collapse range can only ever merge extents (and thus remove extent
records and potentially free bmap blocks). The dfops call
automatically relogs the inode to keep it moving in the log. This
guarantees that nothing else can change the extent mapping of an
inode while a collapse range operation is in progress.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: rework insert range into an atomic operation
Brian Foster [Wed, 26 Feb 2020 17:43:16 +0000 (09:43 -0800)]
xfs: rework insert range into an atomic operation

The insert range operation uses a unique transaction and ilock cycle
for the extent split and each extent shift iteration of the overall
operation. While this works, it is risks racing with other
operations in subtle ways such as COW writeback modifying an extent
tree in the middle of a shift operation.

To avoid this problem, make insert range atomic with respect to
ilock. Hold the ilock across the entire operation, replace the
individual transactions with a single rolling transaction sequence
and relog the inode to keep it moving in the log. This guarantees
that nothing else can change the extent mapping of an inode while
an insert range operation is in progress.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: open code insert range extent split helper
Brian Foster [Wed, 26 Feb 2020 17:43:15 +0000 (09:43 -0800)]
xfs: open code insert range extent split helper

The insert range operation currently splits the extent at the target
offset in a separate transaction and lock cycle from the one that
shifts extents. In preparation for reworking insert range into an
atomic operation, lift the code into the caller so it can be easily
condensed to a single rolling transaction and lock cycle and
eliminate the helper. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Collins <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: Add missing annotation to xfs_ail_check()
Jules Irenge [Wed, 26 Feb 2020 17:37:15 +0000 (09:37 -0800)]
xfs: Add missing annotation to xfs_ail_check()

Sparse reports a warning at xfs_ail_check()

warning: context imbalance in xfs_ail_check() - unexpected unlock

The root cause is the missing annotation at xfs_ail_check()

Add the missing __must_hold(&ailp->ail_lock) annotation

Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: fix an undefined behaviour in _da3_path_shift
Qian Cai [Wed, 26 Feb 2020 17:36:44 +0000 (09:36 -0800)]
xfs: fix an undefined behaviour in _da3_path_shift

In xfs_da3_path_shift() "blk" can be assigned to state->path.blk[-1] if
state->path.active is 1 (which is a valid state) when it tries to add an
entry to a single dir leaf block and then to shift forward to see if
there's a sibling block that would be a better place to put the new
entry. This causes a UBSAN warning given negative array indices are
undefined behavior in C. In practice the warning is entirely harmless
given that "blk" is never dereferenced in this case, but it is still
better to fix up the warning and slightly improve the code.

 UBSAN: Undefined behaviour in fs/xfs/libxfs/xfs_da_btree.c:1989:14
 index -1 is out of range for type 'xfs_da_state_blk_t [5]'
 Call trace:
  dump_backtrace+0x0/0x2c8
  show_stack+0x20/0x2c
  dump_stack+0xe8/0x150
  __ubsan_handle_out_of_bounds+0xe4/0xfc
  xfs_da3_path_shift+0x860/0x86c [xfs]
  xfs_da3_node_lookup_int+0x7c8/0x934 [xfs]
  xfs_dir2_node_addname+0x2c8/0xcd0 [xfs]
  xfs_dir_createname+0x348/0x38c [xfs]
  xfs_create+0x6b0/0x8b4 [xfs]
  xfs_generic_create+0x12c/0x1f8 [xfs]
  xfs_vn_mknod+0x3c/0x4c [xfs]
  xfs_vn_create+0x34/0x44 [xfs]
  do_last+0xd4c/0x10c8
  path_openat+0xbc/0x2f4
  do_filp_open+0x74/0xf4
  do_sys_openat2+0x98/0x180
  __arm64_sys_openat+0xf8/0x170
  do_el0_svc+0x170/0x240
  el0_sync_handler+0x150/0x250
  el0_sync+0x164/0x180

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: ratelimit xfs_discard_page messages
Christoph Hellwig [Fri, 21 Feb 2020 15:34:48 +0000 (07:34 -0800)]
xfs: ratelimit xfs_discard_page messages

Use printk_ratelimit() to limit the amount of messages printed from
xfs_discard_page.  Without that a failing device causes a large
number of errors that doesn't really help debugging the underling
issue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: ratelimit xfs_buf_ioerror_alert messages
Christoph Hellwig [Fri, 21 Feb 2020 15:34:47 +0000 (07:34 -0800)]
xfs: ratelimit xfs_buf_ioerror_alert messages

Use printk_ratelimit() to limit the amount of messages printed from
xfs_buf_ioerror_alert.  Without that a failing device causes a large
number of errors that doesn't really help debugging the underling
issue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the kuid/kgid conversion wrappers
Christoph Hellwig [Fri, 21 Feb 2020 16:31:27 +0000 (08:31 -0800)]
xfs: remove the kuid/kgid conversion wrappers

Remove the XFS wrappers for converting from and to the kuid/kgid types.
Mostly this means switching to VFS i_{u,g}id_{read,write} helpers, but
in a few spots the calls to the conversion functions is open coded.
To match the use of sb->s_user_ns in the helpers and other file systems,
sb->s_user_ns is also used in the quota code.  The ACL code already does
the conversion in a grotty layering violation in the VFS xattr code,
so it keeps using init_user_ns for the identity mapping.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: remove the icdinode di_uid/di_gid members
Christoph Hellwig [Fri, 21 Feb 2020 16:31:27 +0000 (08:31 -0800)]
xfs: remove the icdinode di_uid/di_gid members

Use the Linux inode i_uid/i_gid members everywhere and just convert
from/to the scalar value when reading or writing the on-disk inode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: ensure that the inode uid/gid match values match the icdinode ones
Christoph Hellwig [Fri, 21 Feb 2020 16:31:26 +0000 (08:31 -0800)]
xfs: ensure that the inode uid/gid match values match the icdinode ones

Instead of only synchronizing the uid/gid values in xfs_setup_inode,
ensure that they always match to prepare for removing the icdinode
fields.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: improve error message when we can't allocate memory for xfs_buf
Darrick J. Wong [Fri, 21 Feb 2020 15:40:44 +0000 (07:40 -0800)]
xfs: improve error message when we can't allocate memory for xfs_buf

If xfs_buf_get_map can't allocate enough memory for the buffer it's
trying to create, it'll cough up an error about not being able to
allocate "pagesn".  That's not particularly helpful (and if we're really
out of memory the message is very spammy) so change the message to tell
us how many pages were actually requested, and ratelimit it too.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
4 years agoxfs: add agf freeblocks verify in xfs_agf_verify
Zheng Bin [Fri, 21 Feb 2020 15:38:20 +0000 (07:38 -0800)]
xfs: add agf freeblocks verify in xfs_agf_verify

We recently used fuzz(hydra) to test XFS and automatically generate
tmp.img(XFS v5 format, but some metadata is wrong)

xfs_repair information(just one AG):
agf_freeblks 0, counted 3224 in ag 0
agf_longest 536874136, counted 3224 in ag 0
sb_fdblocks 613, counted 3228

Test as follows:
mount tmp.img tmpdir
cp file1M tmpdir
sync

In 4.19-stable, sync will stuck, the reason is:
xfs_mountfs
  xfs_check_summary_counts
    if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
       XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
       !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
return 0;  -->just return, incore sb_fdblocks still be 613
    xfs_initialize_perag_data

cp file1M tmpdir -->ok(write file to pagecache)
sync -->stuck(write pagecache to disk)
xfs_map_blocks
  xfs_iomap_write_allocate
    while (count_fsb != 0) {
      nimaps = 0;
      while (nimaps == 0) { --> endless loop
         nimaps = 1;
         xfs_bmapi_write(..., &nimaps) --> nimaps becomes 0 again
xfs_bmapi_write
  xfs_bmap_alloc
    xfs_bmap_btalloc
      xfs_alloc_vextent
        xfs_alloc_fix_freelist
          xfs_alloc_space_available -->fail(agf_freeblks is 0)

In linux-next, sync not stuck, cause commit c2b3164320b5 ("xfs:
use the latest extent at writeback delalloc conversion time") remove
the above while, dmesg is as follows:
[   55.250114] XFS (loop0): page discard on page ffffea0008bc7380, inode 0x1b0c, offset 0.

Users do not know why this page is discard, the better soultion is:
1. Like xfs_repair, make sure sb_fdblocks is equal to counted
(xfs_initialize_perag_data did this, who is not called at this mount)
2. Add agf verify, if fail, will tell users to repair

This patch use the second soultion.

Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Ren Xudong <renxudong1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoxfs: fix iclog release error check race with shutdown
Brian Foster [Fri, 21 Feb 2020 15:36:24 +0000 (07:36 -0800)]
xfs: fix iclog release error check race with shutdown

Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
l_icloglock held"), xlog_state_release_iclog() always performed a
locked check of the iclog error state before proceeding into the
sync state processing code. As of this commit, part of
xlog_state_release_iclog() was open-coded into
xfs_log_release_iclog() and as a result the locked error state check
was lost.

The lockless check still exists, but this doesn't account for the
possibility of a race with a shutdown being performed by another
task causing the iclog state to change while the original task waits
on ->l_icloglock. This has reproduced very rarely via generic/475
and manifests as an assert failure in __xlog_state_release_iclog()
due to an unexpected iclog state.

Restore the locked error state check in xlog_state_release_iclog()
to ensure that an iclog state update via shutdown doesn't race with
the iclog release state processing code.

Fixes: df732b29c807 ("xfs: call xlog_state_release_iclog with l_icloglock held")
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
4 years agoLinux 5.6-rc4 v5.6-rc4
Linus Torvalds [Sun, 1 Mar 2020 22:38:46 +0000 (16:38 -0600)]
Linux 5.6-rc4

4 years agoMerge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 1 Mar 2020 22:35:08 +0000 (16:35 -0600)]
Merge tag 'ext4_for_linus_stable' of git://git./linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Two more bug fixes (including a regression) for 5.6"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()
  jbd2: fix data races at struct journal_head

4 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Sun, 1 Mar 2020 21:16:35 +0000 (15:16 -0600)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "More bugfixes, including a few remaining "make W=1" issues such as too
  large frame sizes on some configurations.

  On the ARM side, the compiler was messing up shadow stacks between EL1
  and EL2 code, which is easily fixed with __always_inline"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: check descriptor table exits on instruction emulation
  kvm: x86: Limit the number of "kvm: disabled by bios" messages
  KVM: x86: avoid useless copy of cpufreq policy
  KVM: allow disabling -Werror
  KVM: x86: allow compiling as non-module with W=1
  KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
  KVM: Introduce pv check helpers
  KVM: let declaration of kvm_get_running_vcpus match implementation
  KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
  arm64: Ask the compiler to __always_inline functions used by KVM at HYP
  KVM: arm64: Define our own swab32() to avoid a uapi static inline
  KVM: arm64: Ask the compiler to __always_inline functions used at HYP
  kvm: arm/arm64: Fold VHE entry/exit work into kvm_vcpu_run_vhe()
  KVM: arm/arm64: Fix up includes for trace.h

4 years agoKVM: VMX: check descriptor table exits on instruction emulation
Oliver Upton [Sat, 29 Feb 2020 19:30:14 +0000 (11:30 -0800)]
KVM: VMX: check descriptor table exits on instruction emulation

KVM emulates UMIP on hardware that doesn't support it by setting the
'descriptor table exiting' VM-execution control and performing
instruction emulation. When running nested, this emulation is broken as
KVM refuses to emulate L2 instructions by default.

Correct this regression by allowing the emulation of descriptor table
instructions if L1 hasn't requested 'descriptor table exiting'.

Fixes: 07721feee46b ("KVM: nVMX: Don't emulate instructions in guest mode")
Reported-by: Jan Kiszka <jan.kiszka@web.de>
Cc: stable@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMerge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 1 Mar 2020 01:16:46 +0000 (19:16 -0600)]
Merge branch 'i2c/for-current-fixed' of git://git./linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
 "I2C has three driver bugfixes for you. We agreed on the Mac regression
  to go in via I2C"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  macintosh: therm_windtunnel: fix regression when instantiating devices
  i2c: altera: Fix potential integer overflow
  i2c: jz4780: silence log flood on txabrt

4 years agoext4: potential crash on allocation error in ext4_alloc_flex_bg_array()
Dan Carpenter [Fri, 28 Feb 2020 09:22:56 +0000 (12:22 +0300)]
ext4: potential crash on allocation error in ext4_alloc_flex_bg_array()

If sbi->s_flex_groups_allocated is zero and the first allocation fails
then this code will crash.  The problem is that "i--" will set "i" to
-1 but when we compare "i >= sbi->s_flex_groups_allocated" then the -1
is type promoted to unsigned and becomes UINT_MAX.  Since UINT_MAX
is more than zero, the condition is true so we call kvfree(new_groups[-1]).
The loop will carry on freeing invalid memory until it crashes.

Fixes: 7c990728b99e ("ext4: fix potential race between s_flex_groups online resizing and access")
Reviewed-by: Suraj Jitindar Singh <surajjs@amazon.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20200228092142.7irbc44yaz3by7nb@kili.mountain
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
4 years agomacintosh: therm_windtunnel: fix regression when instantiating devices
Wolfram Sang [Tue, 25 Feb 2020 14:12:29 +0000 (15:12 +0100)]
macintosh: therm_windtunnel: fix regression when instantiating devices

Removing attach_adapter from this driver caused a regression for at
least some machines. Those machines had the sensors described in their
DT, too, so they didn't need manual creation of the sensor devices. The
old code worked, though, because manual creation came first. Creation of
DT devices then failed later and caused error logs, but the sensors
worked nonetheless because of the manually created devices.

When removing attach_adaper, manual creation now comes later and loses
the race. The sensor devices were already registered via DT, yet with
another binding, so the driver could not be bound to it.

This fix refactors the code to remove the race and only manually creates
devices if there are no DT nodes present. Also, the DT binding is updated
to match both, the DT and manually created devices. Because we don't
know which device creation will be used at runtime, the code to start
the kthread is moved to do_probe() which will be called by both methods.

Fixes: 3e7bed52719d ("macintosh: therm_windtunnel: drop using attach_adapter")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201723
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Tested-by: Erhard Furtner <erhard_f@mailbox.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: stable@kernel.org # v4.19+
4 years agojbd2: fix data races at struct journal_head
Qian Cai [Sat, 22 Feb 2020 04:31:11 +0000 (23:31 -0500)]
jbd2: fix data races at struct journal_head

journal_head::b_transaction and journal_head::b_next_transaction could
be accessed concurrently as noticed by KCSAN,

 LTP: starting fsync04
 /dev/zero: Can't open blockdev
 EXT4-fs (loop0): mounting ext3 file system using the ext4 subsystem
 EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
 ==================================================================
 BUG: KCSAN: data-race in __jbd2_journal_refile_buffer [jbd2] / jbd2_write_access_granted [jbd2]

 write to 0xffff99f9b1bd0e30 of 8 bytes by task 25721 on cpu 70:
  __jbd2_journal_refile_buffer+0xdd/0x210 [jbd2]
  __jbd2_journal_refile_buffer at fs/jbd2/transaction.c:2569
  jbd2_journal_commit_transaction+0x2d15/0x3f20 [jbd2]
  (inlined by) jbd2_journal_commit_transaction at fs/jbd2/commit.c:1034
  kjournald2+0x13b/0x450 [jbd2]
  kthread+0x1cd/0x1f0
  ret_from_fork+0x27/0x50

 read to 0xffff99f9b1bd0e30 of 8 bytes by task 25724 on cpu 68:
  jbd2_write_access_granted+0x1b2/0x250 [jbd2]
  jbd2_write_access_granted at fs/jbd2/transaction.c:1155
  jbd2_journal_get_write_access+0x2c/0x60 [jbd2]
  __ext4_journal_get_write_access+0x50/0x90 [ext4]
  ext4_mb_mark_diskspace_used+0x158/0x620 [ext4]
  ext4_mb_new_blocks+0x54f/0xca0 [ext4]
  ext4_ind_map_blocks+0xc79/0x1b40 [ext4]
  ext4_map_blocks+0x3b4/0x950 [ext4]
  _ext4_get_block+0xfc/0x270 [ext4]
  ext4_get_block+0x3b/0x50 [ext4]
  __block_write_begin_int+0x22e/0xae0
  __block_write_begin+0x39/0x50
  ext4_write_begin+0x388/0xb50 [ext4]
  generic_perform_write+0x15d/0x290
  ext4_buffered_write_iter+0x11f/0x210 [ext4]
  ext4_file_write_iter+0xce/0x9e0 [ext4]
  new_sync_write+0x29c/0x3b0
  __vfs_write+0x92/0xa0
  vfs_write+0x103/0x260
  ksys_write+0x9d/0x130
  __x64_sys_write+0x4c/0x60
  do_syscall_64+0x91/0xb05
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 5 locks held by fsync04/25724:
  #0: ffff99f9911093f8 (sb_writers#13){.+.+}, at: vfs_write+0x21c/0x260
  #1: ffff99f9db4c0348 (&sb->s_type->i_mutex_key#15){+.+.}, at: ext4_buffered_write_iter+0x65/0x210 [ext4]
  #2: ffff99f5e7dfcf58 (jbd2_handle){++++}, at: start_this_handle+0x1c1/0x9d0 [jbd2]
  #3: ffff99f9db4c0168 (&ei->i_data_sem){++++}, at: ext4_map_blocks+0x176/0x950 [ext4]
  #4: ffffffff99086b40 (rcu_read_lock){....}, at: jbd2_write_access_granted+0x4e/0x250 [jbd2]
 irq event stamp: 1407125
 hardirqs last  enabled at (1407125): [<ffffffff980da9b7>] __find_get_block+0x107/0x790
 hardirqs last disabled at (1407124): [<ffffffff980da8f9>] __find_get_block+0x49/0x790
 softirqs last  enabled at (1405528): [<ffffffff98a0034c>] __do_softirq+0x34c/0x57c
 softirqs last disabled at (1405521): [<ffffffff97cc67a2>] irq_exit+0xa2/0xc0

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 68 PID: 25724 Comm: fsync04 Tainted: G L 5.6.0-rc2-next-20200221+ #7
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

The plain reads are outside of jh->b_state_lock critical section which result
in data races. Fix them by adding pairs of READ|WRITE_ONCE().

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/r/20200222043111.2227-1-cai@lca.pw
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
4 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Sat, 29 Feb 2020 15:58:47 +0000 (09:58 -0600)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Four small fixes.

  Three are in drivers for fairly obvious bugs. The fourth is a set of
  regressions introduced by the compat_ioctl changes because some of the
  compat updates wrongly replaced .ioctl instead of .compat_ioctl"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: compat_ioctl: cdrom: Replace .ioctl with .compat_ioctl in four appropriate places
  scsi: zfcp: fix wrong data and display format of SFP+ temperature
  scsi: sd_sbc: Fix sd_zbc_report_zones()
  scsi: libfc: free response frame from GPN_ID

4 years agoMerge tag 'pci-v5.6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Linus Torvalds [Fri, 28 Feb 2020 19:51:53 +0000 (11:51 -0800)]
Merge tag 'pci-v5.6-fixes-2' of git://git./linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Fix build issue on 32-bit ARM with old compilers (Marek Szyprowski)

 - Update MAINTAINERS for recent Cadence driver file move (Lukas
   Bulwahn)

* tag 'pci-v5.6-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  MAINTAINERS: Correct Cadence PCI driver path
  PCI: brcmstb: Fix build on 32bit ARM platforms with older compilers

4 years agoMerge tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 28 Feb 2020 19:43:30 +0000 (11:43 -0800)]
Merge tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:

 - Passthrough insertion fix (Ming)

 - Kill off some unused arguments (John)

 - blktrace RCU fix (Jan)

 - Dead fields removal for null_blk (Dongli)

 - NVMe polled IO fix (Bijan)

* tag 'block-5.6-2020-02-28' of git://git.kernel.dk/linux-block:
  nvme-pci: Hold cq_poll_lock while completing CQEs
  blk-mq: Remove some unused function arguments
  null_blk: remove unused fields in 'nullb_cmd'
  blktrace: Protect q->blk_trace with RCU
  blk-mq: insert passthrough request into hctx->dispatch directly

4 years agoMerge tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-block
Linus Torvalds [Fri, 28 Feb 2020 19:39:14 +0000 (11:39 -0800)]
Merge tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:

 - Fix for a race with IOPOLL used with SQPOLL (Xiaoguang)

 - Only show ->fdinfo if procfs is enabled (Tobias)

 - Fix for a chain with multiple personalities in the SQEs

 - Fix for a missing free of personality idr on exit

 - Removal of the spin-for-work optimization

 - Fix for next work lookup on request completion

 - Fix for non-vec read/write result progation in case of links

 - Fix for a fileset references on switch

 - Fix for a recvmsg/sendmsg 32-bit compatability mode

* tag 'io_uring-5.6-2020-02-28' of git://git.kernel.dk/linux-block:
  io_uring: fix 32-bit compatability with sendmsg/recvmsg
  io_uring: define and set show_fdinfo only if procfs is enabled
  io_uring: drop file set ref put/get on switch
  io_uring: import_single_range() returns 0/-ERROR
  io_uring: pick up link work on submit reference drop
  io-wq: ensure work->task_pid is cleared on init
  io-wq: remove spin-for-work optimization
  io_uring: fix poll_list race for SETUP_IOPOLL|SETUP_SQPOLL
  io_uring: fix personality idr leak
  io_uring: handle multiple personalities in link chains

4 years agoMerge branch 'nvme-5.6-rc4' of git://git.infradead.org/nvme into block-5.6
Jens Axboe [Fri, 28 Feb 2020 17:02:36 +0000 (10:02 -0700)]
Merge branch 'nvme-5.6-rc4' of git://git.infradead.org/nvme into block-5.6

Pull NVMe fix from Keith.

* 'nvme-5.6-rc4' of git://git.infradead.org/nvme:
  nvme-pci: Hold cq_poll_lock while completing CQEs

4 years agoMerge tag 'acpi-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 28 Feb 2020 17:02:18 +0000 (09:02 -0800)]
Merge tag 'acpi-5.6-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "Fix a couple of configuration issues in the ACPI watchdog (WDAT)
  driver (Mika Westerberg) and make it possible to disable that driver
  at boot time in case it still does not work as expected (Jean
  Delvare)"

* tag 'acpi-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: watchdog: Set default timeout in probe
  ACPI: watchdog: Fix gas->access_width usage
  ACPICA: Introduce ACPI_ACCESS_BYTE_WIDTH() macro
  ACPI: watchdog: Allow disabling WDAT at boot

4 years agoMerge tag 'pm-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 28 Feb 2020 16:49:52 +0000 (08:49 -0800)]
Merge tag 'pm-5.6-rc4' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "Fix a recent cpufreq initialization regression (Rafael Wysocki),
  revert a devfreq commit that made incompatible changes and broke user
  land on some systems (Orson Zhai), drop a stale reference to a
  document that has gone away recently (Jonathan Neuschäfer), and fix a
  typo in a hibernation code comment (Alexandre Belloni)"

* tag 'pm-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: Fix policy initialization for internal governor drivers
  Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"
  PM / hibernate: fix typo "reserverd_size" -> "reserved_size"
  Documentation: power: Drop reference to interface.rst

4 years agoMerge tag 'zonefs-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal...
Linus Torvalds [Fri, 28 Feb 2020 16:34:47 +0000 (08:34 -0800)]
Merge tag 'zonefs-5.6-rc4' of git://git./linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:
 "Two fixes in here:

   - Revert the initial decision to silently ignore IOCB_NOWAIT for
     asynchronous direct IOs to sequential zone files. Instead, return
     an error to the user to signal that the feature is not supported
     (from Christoph)

   - A fix to zonefs Kconfig to select FS_IOMAP to avoid build failures
     if no other file system already selected this option (from
     Johannes)"

* tag 'zonefs-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: select FS_IOMAP
  zonefs: fix IOCB_NOWAIT handling

4 years agoMerge tag 'kvmarm-fixes-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmar...
Paolo Bonzini [Fri, 28 Feb 2020 10:46:59 +0000 (11:46 +0100)]
Merge tag 'kvmarm-fixes-5.6-1' of git://git./linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm fixes for 5.6, take #1

- Fix compilation on 32bit
- Move  VHE guest entry/exit into the VHE-specific entry code
- Make sure all functions called by the non-VHE HYP code is tagged as __always_inline

4 years agokvm: x86: Limit the number of "kvm: disabled by bios" messages
Erwan Velu [Thu, 27 Feb 2020 18:00:46 +0000 (19:00 +0100)]
kvm: x86: Limit the number of "kvm: disabled by bios" messages

In older version of systemd(219), at boot time, udevadm is called with :
/usr/bin/udevadm trigger --type=devices --action=add"

This program generates an echo "add" in /sys/devices/system/cpu/cpu<x>/uevent,
leading to the "kvm: disabled by bios" message in case of your Bios disabled
the virtualization extensions.

On a modern system running up to 256 CPU threads, this pollutes the Kernel logs.

This patch offers to ratelimit this message to avoid any userspace program triggering
this uevent printing this message too often.

This patch is only a workaround but greatly reduce the pollution without
breaking the current behavior of printing a message if some try to instantiate
KVM on a system that doesn't support it.

Note that recent versions of systemd (>239) do not have trigger this behavior.

This patch will be useful at least for some using older systemd with recent Kernels.

Signed-off-by: Erwan Velu <e.velu@criteo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMerge branches 'pm-sleep' and 'pm-devfreq'
Rafael J. Wysocki [Fri, 28 Feb 2020 10:00:50 +0000 (11:00 +0100)]
Merge branches 'pm-sleep' and 'pm-devfreq'

* pm-sleep:
  PM / hibernate: fix typo "reserverd_size" -> "reserved_size"
  Documentation: power: Drop reference to interface.rst

* pm-devfreq:
  Revert "PM / devfreq: Modify the device name as devfreq(X) for sysfs"

4 years agoKVM: x86: avoid useless copy of cpufreq policy
Paolo Bonzini [Fri, 28 Feb 2020 09:49:10 +0000 (10:49 +0100)]
KVM: x86: avoid useless copy of cpufreq policy

struct cpufreq_policy is quite big and it is not a good idea
to allocate one on the stack.  Just use cpufreq_cpu_get and
cpufreq_cpu_put which is even simpler.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: allow disabling -Werror
Paolo Bonzini [Fri, 28 Feb 2020 09:42:31 +0000 (10:42 +0100)]
KVM: allow disabling -Werror

Restrict -Werror to well-tested configurations and allow disabling it
via Kconfig.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: x86: allow compiling as non-module with W=1
Valdis Klētnieks [Fri, 28 Feb 2020 02:49:52 +0000 (21:49 -0500)]
KVM: x86: allow compiling as non-module with W=1

Compile error with CONFIG_KVM_INTEL=y and W=1:

  CC      arch/x86/kvm/vmx/vmx.o
arch/x86/kvm/vmx/vmx.c:68:32: error: 'vmx_cpu_id' defined but not used [-Werror=unused-const-variable=]
   68 | static const struct x86_cpu_id vmx_cpu_id[] = {
      |                                ^~~~~~~~~~
cc1: all warnings being treated as errors

When building with =y, the MODULE_DEVICE_TABLE macro doesn't generate a
reference to the structure (or any code at all).  This makes W=1 compiles
unhappy.

Wrap both in a #ifdef to avoid the issue.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
[Do the same for CONFIG_KVM_AMD. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis
Wanpeng Li [Tue, 18 Feb 2020 01:08:24 +0000 (09:08 +0800)]
KVM: Pre-allocate 1 cpumask variable per cpu for both pv tlb and pv ipis

Nick Desaulniers Reported:

  When building with:
  $ make CC=clang arch/x86/ CFLAGS=-Wframe-larger-than=1000
  The following warning is observed:
  arch/x86/kernel/kvm.c:494:13: warning: stack frame size of 1064 bytes in
  function 'kvm_send_ipi_mask_allbutself' [-Wframe-larger-than=]
  static void kvm_send_ipi_mask_allbutself(const struct cpumask *mask, int
  vector)
              ^
  Debugging with:
  https://github.com/ClangBuiltLinux/frame-larger-than
  via:
  $ python3 frame_larger_than.py arch/x86/kernel/kvm.o \
    kvm_send_ipi_mask_allbutself
  points to the stack allocated `struct cpumask newmask` in
  `kvm_send_ipi_mask_allbutself`. The size of a `struct cpumask` is
  potentially large, as it's CONFIG_NR_CPUS divided by BITS_PER_LONG for
  the target architecture. CONFIG_NR_CPUS for X86_64 can be as high as
  8192, making a single instance of a `struct cpumask` 1024 B.

This patch fixes it by pre-allocate 1 cpumask variable per cpu and use it for
both pv tlb and pv ipis..

Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: Introduce pv check helpers
Wanpeng Li [Tue, 18 Feb 2020 01:08:23 +0000 (09:08 +0800)]
KVM: Introduce pv check helpers

Introduce some pv check helpers for consistency.

Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: let declaration of kvm_get_running_vcpus match implementation
Christian Borntraeger [Fri, 28 Feb 2020 08:49:41 +0000 (09:49 +0100)]
KVM: let declaration of kvm_get_running_vcpus match implementation

Sparse notices that declaration and implementation do not match:
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17: warning: incorrect type in return expression (different address spaces)
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17:    expected struct kvm_vcpu [noderef] <asn:3> **
arch/s390/kvm/../../../virt/kvm/kvm_main.c:4435:17:    got struct kvm_vcpu *[noderef] <asn:3> *

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoKVM: SVM: allocate AVIC data structures based on kvm_amd module parameter
Paolo Bonzini [Tue, 25 Feb 2020 07:54:26 +0000 (08:54 +0100)]
KVM: SVM: allocate AVIC data structures based on kvm_amd module parameter

Even if APICv is disabled at startup, the backing page and ir_list need
to be initialized in case they are needed later.  The only case in
which this can be skipped is for userspace irqchip, and that must be
done because avic_init_backing_page dereferences vcpu->arch.apic
(which is NULL for userspace irqchip).

Tested-by: rmuncrief@humanavance.com
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=206579
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
4 years agoMerge tag 'drm-fixes-2020-02-28' of git://anongit.freedesktop.org/drm/drm
Linus Torvalds [Fri, 28 Feb 2020 05:52:18 +0000 (21:52 -0800)]
Merge tag 'drm-fixes-2020-02-28' of git://anongit.freedesktop.org/drm/drm

Pull drm fixes from Dave Airlie:
 "Just some fixes for this week: amdgpu, radeon and i915.

  The main i915 one is a regression Gen7 (Ivybridge/Haswell), this moves
  them back from trying to use the full-ppgtt support to the aliasing
  version it used to use due to gpu hangs. Otherwise it's pretty quiet.

  amdgpu:
   - Drop DRIVER_USE_AGP
   - Fix memory leak in GPU reset
   - Resume fix for raven

  radeon:
   - Drop DRIVER_USE_AGP

  i915:
   - downgrade gen7 back to aliasing-ppgtt to avoid GPU hangs
   - shrinker fix
   - pmu leak and double free fixes
   - gvt user after free and virtual display reset fixes
   - randconfig build fix"

* tag 'drm-fixes-2020-02-28' of git://anongit.freedesktop.org/drm/drm:
  drm/radeon: Inline drm_get_pci_dev
  drm/amdgpu: Drop DRIVER_USE_AGP
  drm/i915: Avoid recursing onto active vma from the shrinker
  drm/i915/pmu: Avoid using globals for PMU events
  drm/i915/pmu: Avoid using globals for CPU hotplug state
  drm/i915/gtt: Downgrade gen7 (ivb, byt, hsw) back to aliasing-ppgtt
  drm/i915: fix header test with GCOV
  amdgpu/gmc_v9: save/restore sdpif regs during S3
  drm/amdgpu: fix memory leak during TDR test(v2)
  drm/i915/gvt: Fix orphan vgpu dmabuf_objs' lifetime
  drm/i915/gvt: Separate display reset from ALL_ENGINES reset

4 years agoMerge tag 'drm-intel-fixes-2020-02-27' of git://anongit.freedesktop.org/drm/drm-intel...
Dave Airlie [Fri, 28 Feb 2020 02:40:41 +0000 (12:40 +1000)]
Merge tag 'drm-intel-fixes-2020-02-27' of git://anongit.freedesktop.org/drm/drm-intel into drm-fixes

drm/i915 fixes for v5.6-rc4:
- downgrade gen7 back to aliasing-ppgtt to avoid GPU hangs
- shrinker fix
- pmu leak and double free fixes
- gvt user after free and virtual display reset fixes
- randconfig build fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/874kvcsh00.fsf@intel.com