OSDN Git Service

android-x86/kernel.git
6 years agoproc: Make inline name size calculation automatic
David Howells [Wed, 13 Jun 2018 18:43:19 +0000 (19:43 +0100)]
proc: Make inline name size calculation automatic

Make calculation of the size of the inline name in struct proc_dir_entry
automatic, rather than having to manually encode the numbers and failing to
allow for lockdep.

Require a minimum inline name size of 33+1 to allow for names that look
like two hex numbers with a dash between.

Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agoMerge tag 'afs-fixes-20180514' into afs-proc
Al Viro [Sat, 2 Jun 2018 22:08:11 +0000 (18:08 -0400)]
Merge tag 'afs-fixes-20180514' into afs-proc

backmerge AFS fixes that went into mainline and deal with
the conflict in fs/afs/fsclient.c

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agoafs: Implement network namespacing
David Howells [Fri, 18 May 2018 10:46:15 +0000 (11:46 +0100)]
afs: Implement network namespacing

Implement network namespacing within AFS, but don't yet let mounts occur
outside the init namespace.  An additional patch will be required propagate
the network namespace across automounts.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Mark afs_net::ws_cell as __rcu and set using rcu functions
David Howells [Wed, 23 May 2018 10:51:29 +0000 (11:51 +0100)]
afs: Mark afs_net::ws_cell as __rcu and set using rcu functions

The afs_net::ws_cell member is sometimes used under RCU conditions from
within an seq-readlock.  It isn't, however, marked __rcu and it isn't set
using the proper RCU barrier-imposing functions.

Fix this by annotating it with __rcu and using appropriate barriers to
make sure accesses are correctly ordered.

Without this, the code can produce the following warning:

>> fs/afs/proc.c:151:24: sparse: incompatible types in comparison expression (different address spaces)

Fixes: f044c8847bb6 ("afs: Lay the groundwork for supporting network namespaces")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()
David Howells [Wed, 23 May 2018 10:32:06 +0000 (11:32 +0100)]
afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus()

Sparse doesn't appear able to handle the conditionally-taken locks in
xdr_decode_AFSFetchStatus(), even though the lock and unlock are both
contingent on the same unvarying function argument.

Deal with this by interpolating a wrapper function that takes the lock if
needed and calls xdr_decode_AFSFetchStatus() on two separate branches, one
with the lock held and one without.

This allows Sparse to work out the locking.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoproc: Add a way to make network proc files writable
David Howells [Fri, 18 May 2018 10:46:15 +0000 (11:46 +0100)]
proc: Add a way to make network proc files writable

Provide two extra functions, proc_create_net_data_write() and
proc_create_net_single_write() that act like their non-write versions but
also set a write method in the proc_dir_entry struct.

An internal simple write function is provided that will copy its buffer and
hand it to the pde->write() method if available (or give an error if not).
The buffer may be modified by the write method.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Rearrange fs/afs/proc.c to remove remaining predeclarations.
David Howells [Fri, 18 May 2018 10:46:15 +0000 (11:46 +0100)]
afs: Rearrange fs/afs/proc.c to remove remaining predeclarations.

Rearrange fs/afs/proc.c to get rid of all the remaining predeclarations.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Rearrange fs/afs/proc.c to move the show routines up
David Howells [Fri, 18 May 2018 10:46:14 +0000 (11:46 +0100)]
afs: Rearrange fs/afs/proc.c to move the show routines up

Rearrange fs/afs/proc.c to move the show routines up to the top of each
block so the order is show, iteration, ops, file ops, fops.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Rearrange fs/afs/proc.c by moving fops and open functions down
David Howells [Fri, 18 May 2018 10:46:14 +0000 (11:46 +0100)]
afs: Rearrange fs/afs/proc.c by moving fops and open functions down

Rearrange fs/afs/proc.c by moving fops and open functions down so as to
remove predeclarations.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Move /proc management functions to the end of the file
David Howells [Fri, 18 May 2018 10:46:14 +0000 (11:46 +0100)]
afs: Move /proc management functions to the end of the file

In fs/afs/proc.c, move functions that create and remove /proc files to the
end of the source file as a first stage in getting rid of all the forward
declarations.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoisdn/gigaset: add back gigaset_procinfo assignment
Christoph Hellwig [Thu, 17 May 2018 06:45:15 +0000 (08:45 +0200)]
isdn/gigaset: add back gigaset_procinfo assignment

Fixes: 2cd1f0ddbb56 ("isdn: replace ->proc_fops with ->proc_show")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agoproc: update SIZEOF_PDE_INLINE_NAME for the new pde fields
Christoph Hellwig [Tue, 15 May 2018 14:06:04 +0000 (16:06 +0200)]
proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields

This makes Alexey happy and Al groan.  Based on a patch from
Alexey Dobriyan.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agotty: replace ->proc_fops with ->proc_show
Christoph Hellwig [Fri, 13 Apr 2018 19:04:45 +0000 (21:04 +0200)]
tty: replace ->proc_fops with ->proc_show

Just set up the show callback in the tty_operations, and use
proc_create_single_data to create the file without additional
boilerplace code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoide: replace ->proc_fops with ->proc_show
Christoph Hellwig [Fri, 13 Apr 2018 19:25:54 +0000 (21:25 +0200)]
ide: replace ->proc_fops with ->proc_show

Just set up the show callback in the tty_operations, and use
proc_create_single_data to create the file without additional
boilerplace code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoide: remove ide_driver_proc_write
Christoph Hellwig [Fri, 13 Apr 2018 19:22:21 +0000 (21:22 +0200)]
ide: remove ide_driver_proc_write

The driver proc file hasn't been writeable for a long time, so this is
just dead code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
6 years agoisdn: replace ->proc_fops with ->proc_show
Christoph Hellwig [Wed, 11 Apr 2018 16:39:29 +0000 (18:39 +0200)]
isdn: replace ->proc_fops with ->proc_show

And switch to proc_create_single_data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoatm: switch to proc_create_seq_private
Christoph Hellwig [Wed, 18 Apr 2018 04:09:21 +0000 (06:09 +0200)]
atm: switch to proc_create_seq_private

And remove proc boilerplate code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoatm: simplify procfs code
Christoph Hellwig [Sun, 15 Apr 2018 08:53:36 +0000 (10:53 +0200)]
atm: simplify procfs code

Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agobluetooth: switch to proc_create_seq_data
Christoph Hellwig [Sun, 15 Apr 2018 08:27:22 +0000 (10:27 +0200)]
bluetooth: switch to proc_create_seq_data

And use proc private data directly instead of doing a detour
through seq->private and private state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agonetfilter/x_tables: switch to proc_create_seq_private
Christoph Hellwig [Sun, 15 Apr 2018 08:36:56 +0000 (10:36 +0200)]
netfilter/x_tables: switch to proc_create_seq_private

And remove proc boilerplate code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agonetfilter/xt_hashlimit: switch to proc_create_{seq,single}_data
Christoph Hellwig [Sun, 15 Apr 2018 08:22:23 +0000 (10:22 +0200)]
netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data

And use proc private data directly instead of doing a detour
through seq->private.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoneigh: switch to proc_create_seq_data
Christoph Hellwig [Sun, 15 Apr 2018 08:16:41 +0000 (10:16 +0200)]
neigh: switch to proc_create_seq_data

And use proc private data directly instead of doing a detour
through seq->private.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agohostap: switch to proc_create_{seq,single}_data
Christoph Hellwig [Wed, 11 Apr 2018 09:05:22 +0000 (11:05 +0200)]
hostap: switch to proc_create_{seq,single}_data

And use proc private data directly instead of doing a detour
through seq->private.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agobonding: switch to proc_create_seq_data
Christoph Hellwig [Wed, 11 Apr 2018 09:00:32 +0000 (11:00 +0200)]
bonding: switch to proc_create_seq_data

And use proc private data directly instead of doing a detour
through seq->private.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agortc/proc: switch to proc_create_single_data
Christoph Hellwig [Wed, 11 Apr 2018 16:22:20 +0000 (18:22 +0200)]
rtc/proc: switch to proc_create_single_data

And stop trying to get a reference on the submodule, procfs code deals
with release after an unloaded module and thus removed proc entry.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
6 years agodrbd: switch to proc_create_single
Christoph Hellwig [Wed, 11 Apr 2018 14:46:11 +0000 (16:46 +0200)]
drbd: switch to proc_create_single

And stop messing with try_module_get on THIS_MODULE, which doesn't make
any sense here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoresource: switch to proc_create_seq_data
Christoph Hellwig [Wed, 11 Apr 2018 09:52:39 +0000 (11:52 +0200)]
resource: switch to proc_create_seq_data

And use the root resource directly from the proc private data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agostaging/rtl8192u: simplify procfs code
Christoph Hellwig [Wed, 11 Apr 2018 16:16:54 +0000 (18:16 +0200)]
staging/rtl8192u: simplify procfs code

Unwind the registration loop into individual calls.  Switch to use
proc_create_single where applicable.

Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agojfs: simplify procfs code
Christoph Hellwig [Wed, 11 Apr 2018 14:51:18 +0000 (16:51 +0200)]
jfs: simplify procfs code

Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoext4: simplify procfs code
Christoph Hellwig [Wed, 11 Apr 2018 09:37:23 +0000 (11:37 +0200)]
ext4: simplify procfs code

Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoafs: simplify procfs code
Christoph Hellwig [Fri, 13 Apr 2018 18:45:09 +0000 (20:45 +0200)]
afs: simplify procfs code

Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agosg: simplify procfs code
Christoph Hellwig [Wed, 11 Apr 2018 09:26:22 +0000 (11:26 +0200)]
sg: simplify procfs code

Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agomegaraid: simplify procfs code
Christoph Hellwig [Fri, 13 Apr 2018 18:57:56 +0000 (20:57 +0200)]
megaraid: simplify procfs code

Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_single.

Also don't bother handling proc_create* failures - the driver works
perfectly fine without the proc files, and the cleanup will handle
missing files gracefully.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agosgi-gru: simplify procfs code
Christoph Hellwig [Tue, 10 Apr 2018 15:47:15 +0000 (17:47 +0200)]
sgi-gru: simplify procfs code

Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoacpi/battery: simplify procfs code
Christoph Hellwig [Wed, 11 Apr 2018 14:27:14 +0000 (16:27 +0200)]
acpi/battery: simplify procfs code

Use remove_proc_subtree to remove the whole subtree on cleanup, and
unwind the registration loop into individual calls.  Switch to use
proc_create_seq where applicable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
6 years agoproc: introduce proc_create_net_single
Christoph Hellwig [Fri, 13 Apr 2018 18:38:35 +0000 (20:38 +0200)]
proc: introduce proc_create_net_single

Variant of proc_create_data that directly take a seq_file show
callback and deals with network namespaces in ->open and ->release.
All callers of proc_create + single_open_net converted over, and
single_{open,release}_net are removed entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoproc: introduce proc_create_net{,_data}
Christoph Hellwig [Tue, 10 Apr 2018 17:42:55 +0000 (19:42 +0200)]
proc: introduce proc_create_net{,_data}

Variants of proc_create{,_data} that directly take a struct seq_operations
and deal with network namespaces in ->open and ->release.  All callers of
proc_create + seq_open_net converted over, and seq_{open,release}_net are
removed entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agonet: move seq_file_single_net to <linux/seq_file_net.h>
Christoph Hellwig [Wed, 11 Apr 2018 10:32:45 +0000 (12:32 +0200)]
net: move seq_file_single_net to <linux/seq_file_net.h>

This helper deals with single_{open,release}_net internals and thus
belongs here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agonetfilter/x_tables: simplify ѕeq_file code
Christoph Hellwig [Tue, 10 Apr 2018 19:59:39 +0000 (21:59 +0200)]
netfilter/x_tables: simplify ѕeq_file code

Just use the address family from the proc private data instead of copying
it into per-file data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agonet/kcm: simplify proc registration
Christoph Hellwig [Tue, 10 Apr 2018 17:53:58 +0000 (19:53 +0200)]
net/kcm: simplify proc registration

Remove a couple indirections to make the code look like most other
protocols.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoipv6/flowlabel: simplify pid namespace lookup
Christoph Hellwig [Wed, 11 Apr 2018 08:01:30 +0000 (10:01 +0200)]
ipv6/flowlabel: simplify pid namespace lookup

The code should be using the pid namespace from the procfs mount
instead of trying to look it up during open.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoipv{4,6}/raw: simplify ѕeq_file code
Christoph Hellwig [Tue, 10 Apr 2018 20:08:28 +0000 (22:08 +0200)]
ipv{4,6}/raw: simplify ѕeq_file code

Pass the hashtable to the proc private data instead of copying
it into the per-file private data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoipv{4,6}/ping: simplify proc file creation
Christoph Hellwig [Tue, 10 Apr 2018 18:04:20 +0000 (20:04 +0200)]
ipv{4,6}/ping: simplify proc file creation

Remove the pointless ping_seq_afinfo indirection and make the code look
like most other protocols.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoipv{4,6}/tcp: simplify procfs registration
Christoph Hellwig [Wed, 11 Apr 2018 07:31:28 +0000 (09:31 +0200)]
ipv{4,6}/tcp: simplify procfs registration

Avoid most of the afinfo indirections and just call the proc helpers
directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoipv{4,6}/udp{,lite}: simplify proc registration
Christoph Hellwig [Tue, 10 Apr 2018 19:31:50 +0000 (21:31 +0200)]
ipv{4,6}/udp{,lite}: simplify proc registration

Remove a couple indirections to make the code look like most other
protocols.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoproc: introduce proc_create_single{,_data}
Christoph Hellwig [Tue, 15 May 2018 13:57:23 +0000 (15:57 +0200)]
proc: introduce proc_create_single{,_data}

Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduces the boilerplate code in the callers.

All trivial callers converted over.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoproc: introduce proc_create_seq_private
Christoph Hellwig [Tue, 24 Apr 2018 15:05:17 +0000 (17:05 +0200)]
proc: introduce proc_create_seq_private

Variant of proc_create_data that directly take a struct seq_operations
argument + a private state size and drastically reduces the boilerplate
code in the callers.

All trivial callers converted over.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoproc: introduce proc_create_seq{,_data}
Christoph Hellwig [Fri, 13 Apr 2018 17:44:18 +0000 (19:44 +0200)]
proc: introduce proc_create_seq{,_data}

Variants of proc_create{,_data} that directly take a struct seq_operations
argument and drastically reduces the boilerplate code in the callers.

All trivial callers converted over.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoproc: add a proc_create_reg helper
Christoph Hellwig [Tue, 24 Apr 2018 15:08:36 +0000 (17:08 +0200)]
proc: add a proc_create_reg helper

Common code for creating a regular file.  Factor out of proc_create_data, to
be reused by other functions soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoproc: simplify proc_register calling conventions
Christoph Hellwig [Tue, 24 Apr 2018 15:00:52 +0000 (17:00 +0200)]
proc: simplify proc_register calling conventions

Return registered entry on success, return NULL on failure and free the
passed in entry.  Also expose it in internal.h as we'll start using it
in proc_net.c soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoproc: don't detour through seq->private to get the inode
Christoph Hellwig [Wed, 16 May 2018 05:21:53 +0000 (07:21 +0200)]
proc: don't detour through seq->private to get the inode

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoproc: introduce a proc_pid_ns helper
Christoph Hellwig [Wed, 16 May 2018 05:19:01 +0000 (07:19 +0200)]
proc: introduce a proc_pid_ns helper

Factor out retrieving the per-sb pid namespaces from the sb private data
into an easier to understand helper.

Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoafs: Fix the non-encryption of calls
David Howells [Thu, 10 May 2018 22:10:40 +0000 (23:10 +0100)]
afs: Fix the non-encryption of calls

Some AFS servers refuse to accept unencrypted traffic, so can't be accessed
with kAFS.  Set the AF_RXRPC security level to encrypt client calls to deal
with this.

Note that incoming service calls are set by the remote client and so aren't
affected by this.

This requires an AF_RXRPC patch to pass the value set by setsockopt to calls
begun by the kernel.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix CB.CallBack handling
David Howells [Fri, 11 May 2018 23:28:58 +0000 (00:28 +0100)]
afs: Fix CB.CallBack handling

The handling of CB.CallBack messages sent by the fileserver to the client
is broken in that they are currently being processed after the reply has
been transmitted.

This is not what the fileserver expects, however.  It holds up change
visibility until the reply comes so as to maintain cache coherency, and so
expects the client to have to refetch the state on the affected files.

Fix CB.CallBack handling to perform the callback break before sending the
reply.

The fileserver is free to hold up status fetches issued by other threads on
the same client that occur in reponse to the callback until any pending
changes have been committed.

Fixes: d001648ec7cf ("rxrpc: Don't expose skbs to in-kernel users [ver #2]")
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix whole-volume callback handling
David Howells [Sat, 12 May 2018 21:31:33 +0000 (22:31 +0100)]
afs: Fix whole-volume callback handling

It's possible for an AFS file server to issue a whole-volume notification
that callbacks on all the vnodes in the file have been broken.  This is
done for R/O and backup volumes (which don't have per-file callbacks) and
for things like a volume being taken offline.

Fix callback handling to detect whole-volume notifications, to track it
across operations and to check it during inode validation.

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix afs_find_server search loop
Marc Dionne [Sat, 12 May 2018 00:35:06 +0000 (21:35 -0300)]
afs: Fix afs_find_server search loop

The code that looks up servers by addresses makes the assumption
that the list of addresses for a server is sorted.  It exits the
loop if it finds that the target address is larger than the
current candidate.  As the list is not currently sorted, this
can lead to a failure to find a matching server, which can cause
callbacks from that server to be ignored.

Remove the early exit case so that the complete list is searched.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix the handling of an unfound server in CM operations
David Howells [Fri, 11 May 2018 22:45:40 +0000 (23:45 +0100)]
afs: Fix the handling of an unfound server in CM operations

If the client cache manager operations that need the server record
(CB.Callback, CB.InitCallBackState, and CB.InitCallBackState3) can't find
the server record, they abort the call from the file server with
RX_CALL_DEAD when they should return okay.

Fixes: c35eccb1f614 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Add a tracepoint to record callbacks from unlisted servers
David Howells [Fri, 11 May 2018 21:59:42 +0000 (22:59 +0100)]
afs: Add a tracepoint to record callbacks from unlisted servers

Add a tracepoint to record callbacks from servers for which we don't have a
record.

Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix the handling of CB.InitCallBackState3 to find the server by UUID
David Howells [Fri, 11 May 2018 22:21:35 +0000 (23:21 +0100)]
afs: Fix the handling of CB.InitCallBackState3 to find the server by UUID

Fix the handling of the CB.InitCallBackState3 service call to find the
record of a server that we're using by looking it up by the UUID passed as
the parameter rather than by its address (of which it might have many, and
which may change).

Fixes: c35eccb1f614 ("[AFS]: Implement the CB.InitCallBackState3 operation.")
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix VNOVOL handling in address rotation
David Howells [Fri, 11 May 2018 21:55:59 +0000 (22:55 +0100)]
afs: Fix VNOVOL handling in address rotation

If a volume location record lists multiple file servers for a volume, then
it's possible that due to a misconfiguration or a changing configuration
that one of the file servers doesn't know about it yet and will abort
VNOVOL.  Currently, the rotation algorithm will stop with EREMOTEIO.

Fix this by moving on to try the next server if VNOVOL is returned.  Once
all the servers have been tried and the record rechecked, the algorithm
will stop with EREMOTEIO or ENOMEDIUM.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility
David Howells [Thu, 10 May 2018 20:51:47 +0000 (21:51 +0100)]
afs: Fix AFSFetchStatus decoder to provide OpenAFS compatibility

The OpenAFS server's RXAFS_InlineBulkStatus implementation has a bug
whereby if an error occurs on one of the vnodes being queried, then the
errorCode field is set correctly in the corresponding status, but the
interfaceVersion field is left unset.

Fix kAFS to deal with this by evaluating the AFSFetchStatus blob against
the following cases when called from FS.InlineBulkStatus delivery:

 (1) If InterfaceVersion == 0 then:

     (a) If errorCode != 0 then it indicates the abort code for the
         corresponding vnode.

     (b) If errorCode == 0 then the status record is invalid.

 (2) If InterfaceVersion == 1 then:

     (a) If errorCode != 0 then it indicates the abort code for the
         corresponding vnode.

     (b) If errorCode == 0 then the status record is valid and can be
       parsed.

 (3) If InterfaceVersion is anything else then the status record is
     invalid.

Fixes: dd9fbcb8e103 ("afs: Rearrange status mapping")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agonet/can: single_open_net needs to be paired with single_release_net
Christoph Hellwig [Wed, 11 Apr 2018 10:16:13 +0000 (12:16 +0200)]
net/can: single_open_net needs to be paired with single_release_net

Otherwise we will leak a reference to the network namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
6 years agoafs: Fix server rotation's handling of fileserver probe failure
David Howells [Thu, 10 May 2018 13:22:38 +0000 (14:22 +0100)]
afs: Fix server rotation's handling of fileserver probe failure

The server rotation algorithm just gives up if it fails to probe a
fileserver.  Fix this by rotating to the next fileserver instead.

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix refcounting in callback registration
David Howells [Thu, 10 May 2018 07:43:04 +0000 (08:43 +0100)]
afs: Fix refcounting in callback registration

The refcounting on afs_cb_interest struct objects in
afs_register_server_cb_interest() is wrong as it uses the server list
entry's call back interest pointer without regard for the fact that it
might be replaced at any time and the object thrown away.

Fix this by:

 (1) Put a lock on the afs_server_list struct that can be used to
     mediate access to the callback interest pointers in the servers array.

 (2) Keep a ref on the callback interest that we get from the entry.

 (3) Dropping the old reference held by vnode->cb_interest if we replace
     the pointer.

Fixes: c435ee34551e ("afs: Overhaul the callback handling")
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix giving up callbacks on server destruction
David Howells [Thu, 10 May 2018 13:12:50 +0000 (14:12 +0100)]
afs: Fix giving up callbacks on server destruction

When a server record is destroyed, we want to send a message to the server
telling it that we're giving up all the callbacks it has promised us.

Apply two fixes to this:

 (1) Only send the FS.GiveUpAllCallBacks message if we actually got a
     callback from that server.  We assume this to be the case if we
     performed at least one successful FS operation on that server.

 (2) Send it to the address last used for that server rather than always
     picking the first address in the list (which might be unreachable).

Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix address list parsing
David Howells [Wed, 9 May 2018 21:03:18 +0000 (22:03 +0100)]
afs: Fix address list parsing

The parsing of port specifiers in the address list obtained from the DNS
resolution upcall doesn't work as in4_pton() and in6_pton() will fail on
encountering an unexpected delimiter (in this case, the '+' marking the
port number).  However, in*_pton() can't be given multiple specifiers.

Fix this by finding the delimiter in advance and not relying on in*_pton()
to find the end of the address for us.

Fixes: 8b2a464ced77 ("afs: Add an address list concept")
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoafs: Fix directory page locking
David Howells [Fri, 27 Apr 2018 19:46:22 +0000 (20:46 +0100)]
afs: Fix directory page locking

The afs directory loading code (primarily afs_read_dir()) locks all the
pages that hold a directory's content blob to defend against
getdents/getdents races and getdents/lookup races where the competitors
issue conflicting reads on the same data.  As the reads will complete
consecutively, they may retrieve different versions of the data and
one may overwrite the data that the other is busy parsing.

Fix this by not locking the pages at all, but rather by turning the
validation lock into an rwsem and getting an exclusive lock on it whilst
reading the data or validating the attributes and a shared lock whilst
parsing the data.  Sharing the attribute validation lock should be fine as
the data fetch will retrieve the attributes also.

The individual page locks aren't needed at all as the only place they're
being used is to serialise data loading.

Without this patch, the:

  if (!test_bit(AFS_VNODE_DIR_VALID, &dvnode->flags)) {
...
}

part of afs_read_dir() may be skipped, leaving the pages unlocked when we
hit the success: clause - in which case we try to unlock the not-locked
pages, leading to the following oops:

  page:ffffe38b405b4300 count:3 mapcount:0 mapping:ffff98156c83a978 index:0x0
  flags: 0xfffe000001004(referenced|private)
  raw: 000fffe000001004 ffff98156c83a978 0000000000000000 00000003ffffffff
  raw: dead000000000100 dead000000000200 0000000000000001 ffff98156b27c000
  page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
  page->mem_cgroup:ffff98156b27c000
  ------------[ cut here ]------------
  kernel BUG at mm/filemap.c:1205!
  ...
  RIP: 0010:unlock_page+0x43/0x50
  ...
  Call Trace:
   afs_dir_iterate+0x789/0x8f0 [kafs]
   ? _cond_resched+0x15/0x30
   ? kmem_cache_alloc_trace+0x166/0x1d0
   ? afs_do_lookup+0x69/0x490 [kafs]
   ? afs_do_lookup+0x101/0x490 [kafs]
   ? key_default_cmp+0x20/0x20
   ? request_key+0x3c/0x80
   ? afs_lookup+0xf1/0x340 [kafs]
   ? __lookup_slow+0x97/0x150
   ? lookup_slow+0x35/0x50
   ? walk_component+0x1bf/0x490
   ? path_lookupat.isra.52+0x75/0x200
   ? filename_lookup.part.66+0xa0/0x170
   ? afs_end_vnode_operation+0x41/0x60 [kafs]
   ? __check_object_size+0x9c/0x171
   ? strncpy_from_user+0x4a/0x170
   ? vfs_statx+0x73/0xe0
   ? __do_sys_newlstat+0x39/0x70
   ? __x64_sys_getdents+0xc9/0x140
   ? __x64_sys_getdents+0x140/0x140
   ? do_syscall_64+0x5b/0x160
   ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: f3ddee8dc4e2 ("afs: Fix directory handling")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
6 years agoLinux 4.17-rc5
Linus Torvalds [Sun, 13 May 2018 23:15:17 +0000 (16:15 -0700)]
Linux 4.17-rc5

6 years agoMerge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 13 May 2018 17:53:08 +0000 (10:53 -0700)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86/pti updates from Thomas Gleixner:
 "A mixed bag of fixes and updates for the ghosts which are hunting us.

  The scheduler fixes have been pulled into that branch to avoid
  conflicts.

   - A set of fixes to address a khread_parkme() race which caused lost
     wakeups and loss of state.

   - A deadlock fix for stop_machine() solved by moving the wakeups
     outside of the stopper_lock held region.

   - A set of Spectre V1 array access restrictions. The possible
     problematic spots were discuvered by Dan Carpenters new checks in
     smatch.

   - Removal of an unused file which was forgotten when the rest of that
     functionality was removed"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vdso: Remove unused file
  perf/x86/cstate: Fix possible Spectre-v1 indexing for pkg_msr
  perf/x86/msr: Fix possible Spectre-v1 indexing in the MSR driver
  perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map()
  perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_*
  perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[]
  sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
  sched/core: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]
  sched/core: Introduce set_special_state()
  kthread, sched/wait: Fix kthread_parkme() completion issue
  kthread, sched/wait: Fix kthread_parkme() wait-loop
  sched/fair: Fix the update of blocked load when newly idle
  stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock

6 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 13 May 2018 17:46:53 +0000 (10:46 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fix from Thomas Gleixner:
 "Revert the new NUMA aware placement approach which turned out to
  create more problems than it solved"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"

6 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 13 May 2018 17:44:32 +0000 (10:44 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull perf tooling fixes from Thomas Gleixner:
 "Another small set of perf tooling fixes and updates:

   - Revert "perf pmu: Fix pmu events parsing rule", as it broke Intel
     PT event description parsing (Arnaldo Carvalho de Melo)

   - Sync x86's cpufeatures.h and kvm UAPI headers with the kernel
     sources, suppressing the ABI drift warnings (Arnaldo Carvalho de
     Melo)

   - Remove duplicated entry for westmereep-dp in Intel's mapfile.csv
     (William Cohen)

   - Fix typo in 'perf bench numa' options description (Yisheng Xie)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  Revert "perf pmu: Fix pmu events parsing rule"
  tools headers kvm: Sync ARM UAPI headers with the kernel sources
  tools headers kvm: Sync uapi/linux/kvm.h with the kernel sources
  tools headers: Sync x86 cpufeatures.h with the kernel sources
  perf vendor events intel: Remove duplicated entry for westmereep-dp in mapfile.csv
  perf bench numa: Fix typo in options

6 years agoMerge tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping
Linus Torvalds [Sun, 13 May 2018 17:28:53 +0000 (10:28 -0700)]
Merge tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fix from Christoph Hellwig:
 "Just one little fix from Jean to avoid a harmless but very annoying
  warning, especially for the drm code"

* tag 'dma-mapping-4.17-5' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: silent unwanted warning "buffer is full"

6 years agoMerge tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Sun, 13 May 2018 01:49:53 +0000 (18:49 -0700)]
Merge tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "Some small SMB3 fixes for 4.17-rc5, some for stable"

* tag '4.17-rc4-SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb3: directory sync should not return an error
  cifs: smb2ops: Fix listxattr() when there are no EAs
  cifs: smbd: Enable signing with smbdirect
  cifs: Allocate validate negotiation request through kmalloc

6 years agoMerge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Linus Torvalds [Sat, 12 May 2018 17:58:57 +0000 (10:58 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/rzhang/linux

Pull thermal fixes from Zhang Rui:

 - fix NULL pointer dereference on module load/probe for int3403_thermal
   driver

 - fix an emergency shutdown issue on exynos thermal driver

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  thermal: exynos: Propagate error value from tmu_read()
  thermal: exynos: Reading temperature makes sense only when TMU is turned on
  thermal: int3403_thermal: Fix NULL pointer deref on module load / probe

6 years agoMerge tag 'for-linus-20180511' of git://git.kernel.dk/linux-block
Linus Torvalds [Sat, 12 May 2018 17:55:48 +0000 (10:55 -0700)]
Merge tag 'for-linus-20180511' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Just a few NVMe fixes this round - one fixing a use-after-free, one
  fixes the return value after controller reset, and the last one fixes
  an issue where some drives will spuriously EIO. We should get these
  into 4.17"

* tag 'for-linus-20180511' of git://git.kernel.dk/linux-block:
  nvme: add quirk to force medium priority for SQ creation
  nvme: Fix sync controller reset return
  nvme: fix use-after-free in nvme_free_ns_head

6 years agoswiotlb: silent unwanted warning "buffer is full"
Jean Delvare [Sat, 12 May 2018 09:57:37 +0000 (11:57 +0200)]
swiotlb: silent unwanted warning "buffer is full"

If DMA_ATTR_NO_WARN is passed to swiotlb_alloc_buffer(), it should be
passed further down to swiotlb_tbl_map_single(). Otherwise we escape
half of the warnings but still log the other half.

This is one of the multiple causes of spurious warnings reported at:
https://bugs.freedesktop.org/show_bug.cgi?id=104082

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 0176adb00406 ("swiotlb: refactor coherent buffer allocation")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christian König <christian.koenig@amd.com>
Cc: Michel Dänzer <michel@daenzer.net>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: stable@vger.kernel.org # v4.16
6 years agoRevert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_af...
Mel Gorman [Wed, 9 May 2018 16:31:15 +0000 (17:31 +0100)]
Revert "sched/numa: Delay retrying placement for automatic NUMA balance after wake_affine()"

This reverts commit 7347fc87dfe6b7315e74310ee1243dc222c68086.

Srikar Dronamra pointed out that while the commit in question did show
a performance improvement on ppc64, it did so at the cost of disabling
active CPU migration by automatic NUMA balancing which was not the intent.
The issue was that a serious flaw in the logic failed to ever active balance
if SD_WAKE_AFFINE was disabled on scheduler domains. Even when it's enabled,
the logic is still bizarre and against the original intent.

Investigation showed that fixing the patch in either the way he suggested,
using the correct comparison for jiffies values or introducing a new
numa_migrate_deferred variable in task_struct all perform similarly to a
revert with a mix of gains and losses depending on the workload, machine
and socket count.

The original intent of the commit was to handle a problem whereby
wake_affine, idle balancing and automatic NUMA balancing disagree on the
appropriate placement for a task. This was particularly true for cases where
a single task was a massive waker of tasks but where wake_wide logic did
not apply.  This was particularly noticeable when a futex (a barrier) woke
all worker threads and tried pulling the wakees to the waker nodes. In that
specific case, it could be handled by tuning MPI or openMP appropriately,
but the behavior is not illogical and was worth attempting to fix. However,
the approach was wrong. Given that we're at rc4 and a fix is not obvious,
it's better to play safe, revert this commit and retry later.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: ggherdovich@suse.cz
Cc: hpa@zytor.com
Cc: matt@codeblueprint.co.uk
Cc: mpe@ellerman.id.au
Link: http://lkml.kernel.org/r/20180509163115.6fnnyeg4vdm2ct4v@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
6 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 12 May 2018 01:04:12 +0000 (18:04 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
 "13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  rbtree: include rcu.h
  scripts/faddr2line: fix error when addr2line output contains discriminator
  ocfs2: take inode cluster lock before moving reflinked inode from orphan dir
  mm, oom: fix concurrent munlock and oom reaper unmap, v3
  mm: migrate: fix double call of radix_tree_replace_slot()
  proc/kcore: don't bounds check against address 0
  mm: don't show nr_indirectly_reclaimable in /proc/vmstat
  mm: sections are not offlined during memory hotremove
  z3fold: fix reclaim lock-ups
  init: fix false positives in W+X checking
  lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit()
  KASAN: prohibit KASAN+STRUCTLEAK combination
  MAINTAINERS: update Shuah's email address

6 years agorbtree: include rcu.h
Sebastian Andrzej Siewior [Fri, 11 May 2018 23:02:14 +0000 (16:02 -0700)]
rbtree: include rcu.h

Since commit c1adf20052d8 ("Introduce rb_replace_node_rcu()")
rbtree_augmented.h uses RCU related data structures but does not include
the header file.  It works as long as it gets somehow included before
that and fails otherwise.

Link: http://lkml.kernel.org/r/20180504103159.19938-1-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoscripts/faddr2line: fix error when addr2line output contains discriminator
Changbin Du [Fri, 11 May 2018 23:02:11 +0000 (16:02 -0700)]
scripts/faddr2line: fix error when addr2line output contains discriminator

When addr2line output contains discriminator, the current awk script
cannot parse it.  This patch fixes it by extracting key words using
regex which is more reliable.

  $ scripts/faddr2line vmlinux tlb_flush_mmu_free+0x26
  tlb_flush_mmu_free+0x26/0x50:
  tlb_flush_mmu_free at mm/memory.c:258 (discriminator 3)
  scripts/faddr2line: eval: line 173: unexpected EOF while looking for matching `)'

Link: http://lkml.kernel.org/r/1525323379-25193-1-git-send-email-changbin.du@intel.com
Fixes: 6870c0165feaa5 ("scripts/faddr2line: show the code context")
Signed-off-by: Changbin Du <changbin.du@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: NeilBrown <neilb@suse.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoocfs2: take inode cluster lock before moving reflinked inode from orphan dir
Ashish Samant [Fri, 11 May 2018 23:02:07 +0000 (16:02 -0700)]
ocfs2: take inode cluster lock before moving reflinked inode from orphan dir

While reflinking an inode, we create a new inode in orphan directory,
then take EX lock on it, reflink the original inode to orphan inode and
release EX lock.  Once the lock is released another node could request
it in EX mode from ocfs2_recover_orphans() which causes downconvert of
the lock, on this node, to NL mode.

Later we attempt to initialize security acl for the orphan inode and
move it to the reflink destination.  However, while doing this we dont
take EX lock on the inode.  This could potentially cause problems
because we could be starting transaction, accessing journal and
modifying metadata of the inode while holding NL lock and with another
node holding EX lock on the inode.

Fix this by taking orphan inode cluster lock in EX mode before
initializing security and moving orphan inode to reflink destination.
Use the __tracker variant while taking inode lock to avoid recursive
locking in the ocfs2_init_security_and_acl() call chain.

Link: http://lkml.kernel.org/r/1523475107-7639-1-git-send-email-ashish.samant@oracle.com
Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Acked-by: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm, oom: fix concurrent munlock and oom reaper unmap, v3
David Rientjes [Fri, 11 May 2018 23:02:04 +0000 (16:02 -0700)]
mm, oom: fix concurrent munlock and oom reaper unmap, v3

Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.

This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma.  Since munlock_vma_pages_range() depends
on clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.

This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup().  If the pmd
is zapped by the oom reaper during follow_page_mask() after the check
for pmd_none() is bypassed, this ends up deferencing a NULL ptl or a
kernel oops.

Fix this by manually freeing all possible memory from the mm before
doing the munlock and then setting MMF_OOM_SKIP.  The oom reaper can not
run on the mm anymore so the munlock is safe to do in exit_mmap().  It
also matches the logic that the oom reaper currently uses for
determining when to set MMF_OOM_SKIP itself, so there's no new risk of
excessive oom killing.

This issue fixes CVE-2018-1000200.

Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804241526320.238665@chino.kir.corp.google.com
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes@google.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm: migrate: fix double call of radix_tree_replace_slot()
Naoya Horiguchi [Fri, 11 May 2018 23:02:00 +0000 (16:02 -0700)]
mm: migrate: fix double call of radix_tree_replace_slot()

radix_tree_replace_slot() is called twice for head page, it's obviously
a bug.  Let's fix it.

Link: http://lkml.kernel.org/r/20180423072101.GA12157@hori1.linux.bs1.fc.nec.co.jp
Fixes: e71769ae5260 ("mm: enable thp migration for shmem thp")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reported-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <zi.yan@sent.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoproc/kcore: don't bounds check against address 0
Laura Abbott [Fri, 11 May 2018 23:01:57 +0000 (16:01 -0700)]
proc/kcore: don't bounds check against address 0

The existing kcore code checks for bad addresses against __va(0) with
the assumption that this is the lowest address on the system.  This may
not hold true on some systems (e.g.  arm64) and produce overflows and
crashes.  Switch to using other functions to validate the address range.

It's currently only seen on arm64 and it's not clear if anyone wants to
use that particular combination on a stable release.  So this is not
urgent for stable.

Link: http://lkml.kernel.org/r/20180501201143.15121-1-labbott@redhat.com
Signed-off-by: Laura Abbott <labbott@redhat.com>
Tested-by: Dave Anderson <anderson@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>a
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm: don't show nr_indirectly_reclaimable in /proc/vmstat
Roman Gushchin [Fri, 11 May 2018 23:01:53 +0000 (16:01 -0700)]
mm: don't show nr_indirectly_reclaimable in /proc/vmstat

Don't show nr_indirectly_reclaimable in /proc/vmstat, because there is
no need to export this vm counter to userspace, and some changes are
expected in reclaimable object accounting, which can alter this counter.

Link: http://lkml.kernel.org/r/20180425191422.9159-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agomm: sections are not offlined during memory hotremove
Pavel Tatashin [Fri, 11 May 2018 23:01:50 +0000 (16:01 -0700)]
mm: sections are not offlined during memory hotremove

Memory hotplug and hotremove operate with per-block granularity.  If the
machine has a large amount of memory (more than 64G), the size of a
memory block can span multiple sections.  By mistake, during hotremove
we set only the first section to offline state.

The bug was discovered because kernel selftest started to fail:
  https://lkml.kernel.org/r/20180423011247.GK5563@yexl-desktop

After commit, "mm/memory_hotplug: optimize probe routine".  But, the bug
is older than this commit.  In this optimization we also added a check
for sections to be in a proper state during hotplug operation.

Link: http://lkml.kernel.org/r/20180427145257.15222-1-pasha.tatashin@oracle.com
Fixes: 2d070eab2e82 ("mm: consider zone which is not fully populated to have holes")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoz3fold: fix reclaim lock-ups
Vitaly Wool [Fri, 11 May 2018 23:01:46 +0000 (16:01 -0700)]
z3fold: fix reclaim lock-ups

Do not try to optimize in-page object layout while the page is under
reclaim.  This fixes lock-ups on reclaim and improves reclaim
performance at the same time.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20180430125800.444cae9706489f412ad12621@gmail.com
Signed-off-by: Vitaly Wool <vitaly.vul@sony.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: <Oleksiy.Avramchenko@sony.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoinit: fix false positives in W+X checking
Jeffrey Hugo [Fri, 11 May 2018 23:01:42 +0000 (16:01 -0700)]
init: fix false positives in W+X checking

load_module() creates W+X mappings via __vmalloc_node_range() (from
layout_and_allocate()->move_module()->module_alloc()) by using
PAGE_KERNEL_EXEC.  These mappings are later cleaned up via
"call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().

This is a problem because call_rcu_sched() queues work, which can be run
after debug_checkwx() is run, resulting in a race condition.  If hit,
the race results in a nasty splat about insecure W+X mappings, which
results in a poor user experience as these are not the mappings that
debug_checkwx() is intended to catch.

This issue is observed on multiple arm64 platforms, and has been
artificially triggered on an x86 platform.

Address the race by flushing the queued work before running the
arch-defined mark_rodata_ro() which then calls debug_checkwx().

Link: http://lkml.kernel.org/r/1525103946-29526-1-git-send-email-jhugo@codeaurora.org
Fixes: e1a58320a38d ("x86/mm: Warn on W^X mappings")
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
Reported-by: Timur Tabi <timur@codeaurora.org>
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agolib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit()
Yury Norov [Fri, 11 May 2018 23:01:39 +0000 (16:01 -0700)]
lib/find_bit_benchmark.c: avoid soft lockup in test_find_first_bit()

test_find_first_bit() is intentionally sub-optimal, and may cause soft
lockup due to long time of run on some systems.  So decrease length of
bitmap to traverse to avoid lockup.

With the change below, time of test execution doesn't exceed 0.2 seconds
on my testing system.

Link: http://lkml.kernel.org/r/20180420171949.15710-1-ynorov@caviumnetworks.com
Fixes: 4441fca0a27f5 ("lib: test module for find_*_bit() functions")
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoKASAN: prohibit KASAN+STRUCTLEAK combination
Dmitry Vyukov [Fri, 11 May 2018 23:01:35 +0000 (16:01 -0700)]
KASAN: prohibit KASAN+STRUCTLEAK combination

Currently STRUCTLEAK inserts initialization out of live scope of variables
from KASAN point of view.  This leads to KASAN false positive reports.
Prohibit this combination for now.

Link: http://lkml.kernel.org/r/20180419172451.104700-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoMAINTAINERS: update Shuah's email address
Shuah Khan (Samsung OSG) [Fri, 11 May 2018 23:01:32 +0000 (16:01 -0700)]
MAINTAINERS: update Shuah's email address

Update email address in MAINTAINERS file due to IT infrastructure changes
at Samsung.

Link: http://lkml.kernel.org/r/20180501212815.25911-1-shuah@kernel.org
Signed-off-by: Shuah Khan (Samsung OSG) <shuah@kernel.org>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Fri, 11 May 2018 21:14:46 +0000 (14:14 -0700)]
Merge git://git./linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Verify lengths of keys provided by the user is AF_KEY, from Kevin
    Easton.

 2) Add device ID for BCM89610 PHY. Thanks to Bhadram Varka.

 3) Add Spectre guards to some ATM code, courtesy of Gustavo A. R.
    Silva.

 4) Fix infinite loop in NSH protocol code. To Eric Dumazet we are most
    grateful for this fix.

 5) Line up /proc/net/netlink headers properly. This fix from YU Bo, we
    do appreciate.

 6) Use after free in TLS code. Once again we are blessed by the
    honorable Eric Dumazet with this fix.

 7) Fix regression in TLS code causing stalls on partial TLS records.
    This fix is bestowed upon us by Andrew Tomt.

 8) Deal with too small MTUs properly in LLC code, another great gift
    from Eric Dumazet.

 9) Handle cached route flushing properly wrt. MTU locking in ipv4, to
    Hangbin Liu we give thanks for this.

10) Fix regression in SO_BINDTODEVIC handling wrt. UDP socket demux.
    Paolo Abeni, he gave us this.

11) Range check coalescing parameters in mlx4 driver, thank you Moshe
    Shemesh.

12) Some ipv6 ICMP error handling fixes in rxrpc, from our good brother
    David Howells.

13) Fix kexec on mlx5 by freeing IRQs in shutdown path. Daniel Juergens,
    you're the best!

14) Don't send bonding RLB updates to invalid MAC addresses. Debabrata
    Benerjee saved us!

15) Uh oh, we were leaking in udp_sendmsg and ping_v4_sendmsg. The ship
    is now water tight, thanks to Andrey Ignatov.

16) IPSEC memory leak in ixgbe from Colin Ian King, man we've got holes
    everywhere!

17) Fix error path in tcf_proto_create, Jiri Pirko what would we do
    without you!

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
  net sched actions: fix refcnt leak in skbmod
  net: sched: fix error path in tcf_proto_create() when modules are not configured
  net sched actions: fix invalid pointer dereferencing if skbedit flags missing
  ixgbe: fix memory leak on ipsec allocation
  ixgbevf: fix ixgbevf_xmit_frame()'s return type
  ixgbe: return error on unsupported SFP module when resetting
  ice: Set rq_last_status when cleaning rq
  ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg
  mlxsw: core: Fix an error handling path in 'mlxsw_core_bus_device_register()'
  bonding: send learning packets for vlans on slave
  bonding: do not allow rlb updates to invalid mac
  net/mlx5e: Err if asked to offload TC match on frag being first
  net/mlx5: E-Switch, Include VF RDMA stats in vport statistics
  net/mlx5: Free IRQs in shutdown path
  rxrpc: Trace UDP transmission failure
  rxrpc: Add a tracepoint to log ICMP/ICMP6 and error messages
  rxrpc: Fix the min security level for kernel calls
  rxrpc: Fix error reception on AF_INET6 sockets
  rxrpc: Fix missing start of call timeout
  qed: fix spelling mistake: "taskelt" -> "tasklet"
  ...

6 years agoMerge tag 'nfs-for-4.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Linus Torvalds [Fri, 11 May 2018 20:56:43 +0000 (13:56 -0700)]
Merge tag 'nfs-for-4.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "These patches fix both a possible corruption during NFSoRDMA MR
  recovery, and a sunrpc tracepoint crash.

  Additionally, Trond has a new email address to put in the MAINTAINERS
  file"

* tag 'nfs-for-4.17-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  Change Trond's email address in MAINTAINERS
  sunrpc: Fix latency trace point crashes
  xprtrdma: Fix list corruption / DMAR errors during MR recovery

6 years agonet sched actions: fix refcnt leak in skbmod
Roman Mashak [Fri, 11 May 2018 18:35:33 +0000 (14:35 -0400)]
net sched actions: fix refcnt leak in skbmod

When application fails to pass flags in netlink TLV when replacing
existing skbmod action, the kernel will leak refcnt:

$ tc actions get action skbmod index 1
total acts 0

        action order 0: skbmod pipe set smac 00:11:22:33:44:55
         index 1 ref 1 bind 0

For example, at this point a buggy application replaces the action with
index 1 with new smac 00:aa:22:33:44:55, it fails because of zero flags,
however refcnt gets bumped:

$ tc actions get actions skbmod index 1
total acts 0

        action order 0: skbmod pipe set smac 00:11:22:33:44:55
         index 1 ref 2 bind 0
$

Tha patch fixes this by calling tcf_idr_release() on existing actions.

Fixes: 86da71b57383d ("net_sched: Introduce skbmod action")
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'ceph-for-4.17-rc5' of git://github.com/ceph/ceph-client
Linus Torvalds [Fri, 11 May 2018 20:36:06 +0000 (13:36 -0700)]
Merge tag 'ceph-for-4.17-rc5' of git://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "These patches fix two long-standing bugs in the DIO code path, one of
  which is a crash trivially triggerable with splice()"

* tag 'ceph-for-4.17-rc5' of git://github.com/ceph/ceph-client:
  ceph: fix iov_iter issues in ceph_direct_read_write()
  libceph: add osd_req_op_extent_osd_data_bvecs()
  ceph: fix rsize/wsize capping in ceph_direct_read_write()

6 years agonet: sched: fix error path in tcf_proto_create() when modules are not configured
Jiri Pirko [Fri, 11 May 2018 15:45:32 +0000 (17:45 +0200)]
net: sched: fix error path in tcf_proto_create() when modules are not configured

In case modules are not configured, error out when tp->ops is null
and prevent later null pointer dereference.

Fixes: 33a48927c193 ("sched: push TC filter protocol creation into a separate function")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'sh-for-4.17-fixes' of git://git.libc.org/linux-sh
Linus Torvalds [Fri, 11 May 2018 20:14:24 +0000 (13:14 -0700)]
Merge tag 'sh-for-4.17-fixes' of git://git.libc.org/linux-sh

Pull arch/sh fixes from Rich Felker:
 "Fixes for critical regressions and a build failure.

  The regressions were introduced in 4.15 and 4.17-rc1 and prevented
  booting on affected systems"

* tag 'sh-for-4.17-fixes' of git://git.libc.org/linux-sh:
  sh: switch to NO_BOOTMEM
  sh: mm: Fix unprotected access to struct device
  sh: fix build failure for J2 cpu with SMP disabled

6 years agoMerge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Linus Torvalds [Fri, 11 May 2018 20:09:04 +0000 (13:09 -0700)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "There's a small memblock accounting problem when freeing the initrd
  and a Spectre-v2 mitigation for NVIDIA Denver CPUs which just requires
  a match on the CPU ID register.

  Summary:

   - Mitigate Spectre-v2 for NVIDIA Denver CPUs

   - Free memblocks corresponding to freed initrd area"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: capabilities: Add NVIDIA Denver CPU to bp_harden list
  arm64: Add MIDR encoding for NVIDIA CPUs
  arm64: To remove initrd reserved area entry from memblock

6 years agoMerge tag 'powerpc-4.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Fri, 11 May 2018 20:07:22 +0000 (13:07 -0700)]
Merge tag 'powerpc-4.17-5' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "One fix for an actual regression, the change to the SYSCALL_DEFINE
  wrapper broke FTRACE_SYSCALLS for us due to a name mismatch. There's
  also another commit to the same code to make sure we match all our
  syscalls with various prefixes.

  And then just one minor build fix, and the removal of an unused
  variable that was removed and then snuck back in due to some rebasing.

  Thanks to: Naveen N. Rao"

* tag 'powerpc-4.17-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/pseries: Fix CONFIG_NUMA=n build
  powerpc/trace/syscalls: Update syscall name matching logic to account for ppc_ prefix
  powerpc/trace/syscalls: Update syscall name matching logic
  powerpc/64: Remove unused paca->soft_enabled

6 years agoMerge tag 'trace-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt...
Linus Torvalds [Fri, 11 May 2018 20:04:35 +0000 (13:04 -0700)]
Merge tag 'trace-v4.17-rc4' of git://git./linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "Working on some new updates to trace filtering, I noticed that the
  regex_match_front() test was updated to be limited to the size of the
  pattern instead of the full test string.

  But as the test string is not guaranteed to be nul terminated, it
  still needs to consider the size of the test string"

* tag 'trace-v4.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix regex_match_front() to not over compare the test string