OSDN Git Service

uclinux-h8/linux.git
6 years agoMerge branch 'tipc-Better-check-user-provided-attributes'
David S. Miller [Mon, 16 Apr 2018 22:08:18 +0000 (18:08 -0400)]
Merge branch 'tipc-Better-check-user-provided-attributes'

Eric Dumazet says:

====================
tipc: Better check user provided attributes

syzbot reported a crash in __tipc_nl_net_set()

While fixing it, I also had to fix an old bug involving TIPC_NLA_NET_ADDR
====================

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: fix possible crash in __tipc_nl_net_set()
Eric Dumazet [Mon, 16 Apr 2018 15:29:43 +0000 (08:29 -0700)]
tipc: fix possible crash in __tipc_nl_net_set()

syzbot reported a crash in __tipc_nl_net_set() caused by NULL dereference.

We need to check that both TIPC_NLA_NET_NODEID and TIPC_NLA_NET_NODEID_W1
are present.

We also need to make sure userland provided u64 attributes.

Fixes: d50ccc2d3909 ("tipc: add 128-bit node identifier")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: add policy for TIPC_NLA_NET_ADDR
Eric Dumazet [Mon, 16 Apr 2018 15:29:42 +0000 (08:29 -0700)]
tipc: add policy for TIPC_NLA_NET_ADDR

Before syzbot/KMSAN bites, add the missing policy for TIPC_NLA_NET_ADDR

Fixes: 27c21416727a ("tipc: add net set to new netlink api")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'parisc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller...
Linus Torvalds [Mon, 16 Apr 2018 21:07:39 +0000 (14:07 -0700)]
Merge branch 'parisc-4.17-3' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc build fix from Helge Deller:
 "Fix build error because of missing binfmt_elf32.o file which is still
  mentioned in the Makefile"

* 'parisc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix missing binfmt_elf32.o build error

6 years agoMIPS: memset.S: EVA & fault support for small_memset
Matt Redfearn [Thu, 29 Mar 2018 09:28:23 +0000 (10:28 +0100)]
MIPS: memset.S: EVA & fault support for small_memset

The MIPS kernel memset / bzero implementation includes a small_memset
branch which is used when the region to be set is smaller than a long (4
bytes on 32bit, 8 bytes on 64bit). The current small_memset
implementation uses a simple store byte loop to write the destination.
There are 2 issues with this implementation:

1. When EVA mode is active, user and kernel address spaces may overlap.
Currently the use of the sb instruction means kernel mode addressing is
always used and an intended write to userspace may actually overwrite
some critical kernel data.

2. If the write triggers a page fault, for example by calling
__clear_user(NULL, 2), instead of gracefully handling the fault, an OOPS
is triggered.

Fix these issues by replacing the sb instruction with the EX() macro,
which will emit EVA compatible instuctions as required. Additionally
implement a fault fixup for small_memset which sets a2 to the number of
bytes that could not be cleared (as defined by __clear_user).

Reported-by: Chuanhua Lei <chuanhua.lei@intel.com>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/18975/
Signed-off-by: James Hogan <jhogan@kernel.org>
6 years agoMerge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Mon, 16 Apr 2018 19:44:03 +0000 (12:44 -0700)]
Merge branch 'timers-core-for-linus' of git://git./linux/kernel/git/tip/tip

Pull missed timer updates from Thomas Gleixner:
 "This is a branch which got forgotten during the merge window, but it
  contains only fixes and hardware enablement. No fundamental changes.

   - Various fixes for the imx-tpm clocksource driver

   - A new timer driver for the NCPM7xx SoC family"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/imx-tpm: Add different counter width support
  clocksource/drivers/imx-tpm: Correct some registers operation flow
  clocksource/drivers/imx-tpm: Fix typo of clock name
  dt-bindings: timer: tpm: fix typo of clock name
  clocksource/drivers/npcm: Add NPCM7xx timer driver
  dt-binding: timer: document NPCM7xx timer DT bindings

6 years agoeCryptfs: don't pass up plaintext names when using filename encryption
Tyler Hicks [Wed, 28 Mar 2018 23:41:52 +0000 (23:41 +0000)]
eCryptfs: don't pass up plaintext names when using filename encryption

Both ecryptfs_filldir() and ecryptfs_readlink_lower() use
ecryptfs_decode_and_decrypt_filename() to translate lower filenames to
upper filenames. The function correctly passes up lower filenames,
unchanged, when filename encryption isn't in use. However, it was also
passing up lower filenames when the filename wasn't encrypted or
when decryption failed. Since 88ae4ab9802e, eCryptfs refuses to lookup
lower plaintext names when filename encryption is enabled so this
resulted in a situation where userspace would see lower plaintext
filenames in calls to getdents(2) but then not be able to lookup those
filenames.

An example of this can be seen when enabling filename encryption on an
eCryptfs mount at the root directory of an Ext4 filesystem:

$ ls -1i /lower
12 ECRYPTFS_FNEK_ENCRYPTED.FWYZD8TcW.5FV-TKTEYOHsheiHX9a-w.NURCCYIMjI8pn5BDB9-h3fXwrE--
11 lost+found
$ ls -1i /upper
ls: cannot access '/upper/lost+found': No such file or directory
 ? lost+found
12 test

With this change, the lower lost+found dentry is ignored:

$ ls -1i /lower
12 ECRYPTFS_FNEK_ENCRYPTED.FWYZD8TcW.5FV-TKTEYOHsheiHX9a-w.NURCCYIMjI8pn5BDB9-h3fXwrE--
11 lost+found
$ ls -1i /upper
12 test

Additionally, some potentially noisy error/info messages in the related
code paths are turned into debug messages so that the logs can't be
easily filled.

Fixes: 88ae4ab9802e ("ecryptfs_lookup(): try either only encrypted or plaintext name")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
6 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Linus Torvalds [Mon, 16 Apr 2018 18:24:28 +0000 (11:24 -0700)]
Merge tag 'for-linus' of git://git./virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bug fixes, plus a new test case and the associated infrastructure for
  writing nested virtualization tests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: selftests: add vmx_tsc_adjust_test
  kvm: x86: move MSR_IA32_TSC handling to x86.c
  X86/KVM: Properly update 'tsc_offset' to represent the running guest
  kvm: selftests: add -std=gnu99 cflags
  x86: Add check for APIC access address for vmentry of L2 guests
  KVM: X86: fix incorrect reference of trace_kvm_pi_irte_update
  X86/KVM: Do not allow DISABLE_EXITS_MWAIT when LAPIC ARAT is not available
  kvm: selftests: fix spelling mistake: "divisable" and "divisible"
  X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted

6 years agox86/ldt: Fix support_pte_mask filtering in map_ldt_struct()
Joerg Roedel [Mon, 16 Apr 2018 09:43:57 +0000 (11:43 +0200)]
x86/ldt: Fix support_pte_mask filtering in map_ldt_struct()

The |= operator will let us end up with an invalid PTE. Use
the correct &= instead.

[ The bug was also independently reported by Shuah Khan ]

Fixes: fb43d6cb91ef ('x86/mm: Do not auto-massage page protections')
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agonet: Fix one possible memleak in ip_setup_cork
Gao Feng [Mon, 16 Apr 2018 02:16:45 +0000 (10:16 +0800)]
net: Fix one possible memleak in ip_setup_cork

It would allocate memory in this function when the cork->opt is NULL. But
the memory isn't freed if failed in the latter rt check, and return error
directly. It causes the memleak if its caller is ip_make_skb which also
doesn't free the cork->opt when meet a error.

Now move the rt check ahead to avoid the memleak.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agokvm: selftests: add vmx_tsc_adjust_test
Paolo Bonzini [Tue, 27 Mar 2018 20:46:11 +0000 (22:46 +0200)]
kvm: selftests: add vmx_tsc_adjust_test

The test checks the behavior of setting MSR_IA32_TSC in a nested guest,
and the TSC_OFFSET VMCS field in general.  It also introduces the testing
infrastructure for Intel nested virtualization.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agokvm: x86: move MSR_IA32_TSC handling to x86.c
Paolo Bonzini [Fri, 13 Apr 2018 09:38:35 +0000 (11:38 +0200)]
kvm: x86: move MSR_IA32_TSC handling to x86.c

This is not specific to Intel/AMD anymore.  The TSC offset is available
in vcpu->arch.tsc_offset.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agoX86/KVM: Properly update 'tsc_offset' to represent the running guest
KarimAllah Ahmed [Sat, 14 Apr 2018 03:10:52 +0000 (05:10 +0200)]
X86/KVM: Properly update 'tsc_offset' to represent the running guest

Update 'tsc_offset' on vmentry/vmexit of L2 guests to ensure that it always
captures the TSC_OFFSET of the running guest whether it is the L1 or L2
guest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
[AMD changes, fix update_ia32_tsc_adjust_msr. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
6 years agonetfilter: nf_tables: free set name in error path
Florian Westphal [Tue, 10 Apr 2018 07:00:24 +0000 (09:00 +0200)]
netfilter: nf_tables: free set name in error path

set->name must be free'd here in case ops->init fails.

Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: can't fail after linking rule into active rule list
Florian Westphal [Tue, 10 Apr 2018 07:30:27 +0000 (09:30 +0200)]
netfilter: nf_tables: can't fail after linking rule into active rule list

rules in nftables a free'd using kfree, but protected by rcu, i.e. we
must wait for a grace period to elapse.

Normal removal patch does this, but nf_tables_newrule() doesn't obey
this rule during error handling.

It calls nft_trans_rule_add() *after* linking rule, and, if that
fails to allocate memory, it unlinks the rule and then kfree() it --
this is unsafe.

Switch order -- first add rule to transaction list, THEN link it
to public list.

Note: nft_trans_rule_add() uses GFP_KERNEL; it will not fail so this
is not a problem in practice (spotted only during code review).

Fixes: 0628b123c96d12 ("netfilter: nfnetlink: add batch support and use it from nf_tables")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: fix CONFIG_NF_REJECT_IPV6=m link error
Arnd Bergmann [Mon, 9 Apr 2018 10:53:12 +0000 (12:53 +0200)]
netfilter: fix CONFIG_NF_REJECT_IPV6=m link error

We get a new link error with CONFIG_NFT_REJECT_INET=y and CONFIG_NF_REJECT_IPV6=m
after larger parts of the nftables modules are linked together:

net/netfilter/nft_reject_inet.o: In function `nft_reject_inet_eval':
nft_reject_inet.c:(.text+0x17c): undefined reference to `nf_send_unreach6'
nft_reject_inet.c:(.text+0x190): undefined reference to `nf_send_reset6'

The problem is that with NF_TABLES_INET set, we implicitly try to use
the ipv6 version as well for NFT_REJECT, but when CONFIG_IPV6 is set to
a loadable module, it's impossible to reach that.

The best workaround I found is to express the above as a Kconfig
dependency, forcing NFT_REJECT itself to be 'm' in that particular
configuration.

Fixes: 02c7b25e5f54 ("netfilter: nf_tables: build-in filter chain type")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: conntrack: silent a memory leak warning
Cong Wang [Fri, 30 Mar 2018 20:22:06 +0000 (13:22 -0700)]
netfilter: conntrack: silent a memory leak warning

The following memory leak is false postive:

unreferenced object 0xffff8f37f156fb38 (size 128):
  comm "softirq", pid 0, jiffies 4294899665 (age 11.292s)
  hex dump (first 32 bytes):
    6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
    00 00 00 00 30 00 20 00 48 6b 6b 6b 6b 6b 6b 6b  ....0. .Hkkkkkkk
  backtrace:
    [<000000004fda266a>] __kmalloc_track_caller+0x10d/0x141
    [<000000007b0a7e3c>] __krealloc+0x45/0x62
    [<00000000d08e0bfb>] nf_ct_ext_add+0xdc/0x133
    [<0000000099b47fd8>] init_conntrack+0x1b1/0x392
    [<0000000086dc36ec>] nf_conntrack_in+0x1ee/0x34b
    [<00000000940592de>] nf_hook_slow+0x36/0x95
    [<00000000d1bd4da7>] nf_hook.constprop.43+0x1c3/0x1dd
    [<00000000c3673266>] __ip_local_out+0xae/0xb4
    [<000000003e4192a6>] ip_local_out+0x17/0x33
    [<00000000b64356de>] igmp_ifc_timer_expire+0x23e/0x26f
    [<000000006a8f3032>] call_timer_fn+0x14c/0x2a5
    [<00000000650c1725>] __run_timers.part.34+0x150/0x182
    [<0000000090e6946e>] run_timer_softirq+0x2a/0x4c
    [<000000004d1e7293>] __do_softirq+0x1d1/0x3c2
    [<000000004643557d>] irq_exit+0x53/0xa2
    [<0000000029ddee8f>] smp_apic_timer_interrupt+0x22a/0x235

because __krealloc() is not supposed to release the old
memory and it is released later via kfree_rcu(). Since this is
the only external user of __krealloc(), just mark it as not leak
here.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonet: af_packet: fix race in PACKET_{R|T}X_RING
Eric Dumazet [Mon, 16 Apr 2018 00:52:04 +0000 (17:52 -0700)]
net: af_packet: fix race in PACKET_{R|T}X_RING

In order to remove the race caught by syzbot [1], we need
to lock the socket before using po->tp_version as this could
change under us otherwise.

This means lock_sock() and release_sock() must be done by
packet_set_ring() callers.

[1] :
BUG: KMSAN: uninit-value in packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
CPU: 0 PID: 20195 Comm: syzkaller707632 Not tainted 4.16.0+ #83
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:17 [inline]
 dump_stack+0x185/0x1d0 lib/dump_stack.c:53
 kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
 packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
 packet_setsockopt+0x12c6/0x5a90 net/packet/af_packet.c:3662
 SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
 SyS_setsockopt+0x76/0xa0 net/socket.c:1828
 do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x3d/0xa2
RIP: 0033:0x449099
RSP: 002b:00007f42b5307ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 000000000070003c RCX: 0000000000449099
RDX: 0000000000000005 RSI: 0000000000000107 RDI: 0000000000000003
RBP: 0000000000700038 R08: 000000000000001c R09: 0000000000000000
R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000080eecf R14: 00007f42b53089c0 R15: 0000000000000001

Local variable description: ----req_u@packet_setsockopt
Variable was created at:
 packet_setsockopt+0x13f/0x5a90 net/packet/af_packet.c:3612
 SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849

Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoibmvnic: Clear pending interrupt after device reset
Thomas Falcon [Sun, 15 Apr 2018 23:53:36 +0000 (18:53 -0500)]
ibmvnic: Clear pending interrupt after device reset

Due to a firmware bug, the hypervisor can send an interrupt to a
transmit or receive queue just prior to a partition migration, not
allowing the device enough time to handle it and send an EOI. When
the partition migrates, the interrupt is lost but an "EOI-pending"
flag for the interrupt line is still set in firmware. No further
interrupts will be sent until that flag is cleared, effectively
freezing that queue. To workaround this, the driver will disable the
hardware interrupt and send an H_EOI signal prior to re-enabling it.
This will flush the pending EOI and allow the driver to continue
operation.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotcp: clear tp->packets_out when purging write queue
Soheil Hassas Yeganeh [Sun, 15 Apr 2018 00:44:46 +0000 (20:44 -0400)]
tcp: clear tp->packets_out when purging write queue

Clear tp->packets_out when purging the write queue, otherwise
tcp_rearm_rto() mistakenly assumes TCP write queue is not empty.
This results in NULL pointer dereference.

Also, remove the redundant `tp->packets_out = 0` from
tcp_disconnect(), since tcp_disconnect() calls
tcp_write_queue_purge().

Fixes: a27fd7a8ed38 (tcp: purge write queue upon RST)
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Reported-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Tested-by: Sami Farin <hvtaifwkbgefbaei@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoteam: avoid adding twice the same option to the event list
Paolo Abeni [Fri, 13 Apr 2018 11:59:25 +0000 (13:59 +0200)]
team: avoid adding twice the same option to the event list

When parsing the options provided by the user space,
team_nl_cmd_options_set() insert them in a temporary list to send
multiple events with a single message.
While each option's attribute is correctly validated, the code does
not check for duplicate entries before inserting into the event
list.

Exploiting the above, the syzbot was able to trigger the following
splat:

kernel BUG at lib/list_debug.c:31!
invalid opcode: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4466 Comm: syzkaller556835 Not tainted 4.16.0+ #17
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:__list_add_valid+0xaa/0xb0 lib/list_debug.c:29
RSP: 0018:ffff8801b04bf248 EFLAGS: 00010286
RAX: 0000000000000058 RBX: ffff8801c8fc7a90 RCX: 0000000000000000
RDX: 0000000000000058 RSI: ffffffff815fbf41 RDI: ffffed0036097e3f
RBP: ffff8801b04bf260 R08: ffff8801b0b2a700 R09: ffffed003b604f90
R10: ffffed003b604f90 R11: ffff8801db027c87 R12: ffff8801c8fc7a90
R13: ffff8801c8fc7a90 R14: dffffc0000000000 R15: 0000000000000000
FS:  0000000000b98880(0000) GS:ffff8801db000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000043fc30 CR3: 00000001afe8e000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  __list_add include/linux/list.h:60 [inline]
  list_add include/linux/list.h:79 [inline]
  team_nl_cmd_options_set+0x9ff/0x12b0 drivers/net/team/team.c:2571
  genl_family_rcv_msg+0x889/0x1120 net/netlink/genetlink.c:599
  genl_rcv_msg+0xc6/0x170 net/netlink/genetlink.c:624
  netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
  genl_rcv+0x28/0x40 net/netlink/genetlink.c:635
  netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
  netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
  netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
  sock_sendmsg_nosec net/socket.c:629 [inline]
  sock_sendmsg+0xd5/0x120 net/socket.c:639
  ___sys_sendmsg+0x805/0x940 net/socket.c:2117
  __sys_sendmsg+0x115/0x270 net/socket.c:2155
  SYSC_sendmsg net/socket.c:2164 [inline]
  SyS_sendmsg+0x29/0x30 net/socket.c:2162
  do_syscall_64+0x29e/0x9d0 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x4458b9
RSP: 002b:00007ffd1d4a7278 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 000000000000001b RCX: 00000000004458b9
RDX: 0000000000000010 RSI: 0000000020000d00 RDI: 0000000000000004
RBP: 00000000004a74ed R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000213 R12: 00007ffd1d4a7348
R13: 0000000000402a60 R14: 0000000000000000 R15: 0000000000000000
Code: 75 e8 eb a9 48 89 f7 48 89 75 e8 e8 d1 85 7b fe 48 8b 75 e8 eb bb 48
89 f2 48 89 d9 4c 89 e6 48 c7 c7 a0 84 d8 87 e8 ea 67 28 fe <0f> 0b 0f 1f
40 00 48 b8 00 00 00 00 00 fc ff df 55 48 89 e5 41
RIP: __list_add_valid+0xaa/0xb0 lib/list_debug.c:29 RSP: ffff8801b04bf248

This changeset addresses the avoiding list_add() if the current
option is already present in the event list.

Reported-and-tested-by: syzbot+4d4af685432dc0e56c91@syzkaller.appspotmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fixes: 2fcdb2c9e659 ("team: allow to send multiple set events in one message")
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agopowerpc/lib: Fix off-by-one in alternate feature patching
Michael Ellerman [Mon, 16 Apr 2018 13:25:19 +0000 (23:25 +1000)]
powerpc/lib: Fix off-by-one in alternate feature patching

When we patch an alternate feature section, we have to adjust any
relative branches that branch out of the alternate section.

But currently we have a bug if we have a branch that points to past
the last instruction of the alternate section, eg:

  FTR_SECTION_ELSE
  1:     b       2f
         or      6,6,6
  2:
  ALT_FTR_SECTION_END(...)
         nop

This will result in a relative branch at 1 with a target that equals
the end of the alternate section.

That branch does not need adjusting when it's moved to the non-else
location. Currently we do adjust it, resulting in a branch that goes
off into the link-time location of the else section, which is junk.

The fix is to not patch branches that have a target == end of the
alternate section.

Fixes: d20fe50a7b3c ("KVM: PPC: Book3S HV: Branch inside feature section")
Fixes: 9b1a735de64c ("powerpc: Add logic to patch alternative feature sections")
Cc: stable@vger.kernel.org # v2.6.27+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
6 years agoxen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_reg_add
Jia-Ju Bai [Wed, 11 Apr 2018 01:15:31 +0000 (09:15 +0800)]
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_reg_add

pcistub_reg_add() is never called in atomic context.

pcistub_reg_add() is only called by pcistub_quirk_add, which is
only set in DRIVER_ATTR().

Despite never getting called from atomic context,
pcistub_reg_add() calls kzalloc() with GFP_ATOMIC,
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
which can sleep and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
6 years agoxen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in xen_pcibk_config_quirks_init
Jia-Ju Bai [Mon, 9 Apr 2018 15:04:25 +0000 (23:04 +0800)]
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in xen_pcibk_config_quirks_init

xen_pcibk_config_quirks_init() is never called in atomic context.

The call chains ending up at xen_pcibk_config_quirks_init() are:
[1] xen_pcibk_config_quirks_init() <- xen_pcibk_config_init_dev() <-
pcistub_init_device() <- pcistub_seize() <- pcistub_probe()
[2] xen_pcibk_config_quirks_init() <- xen_pcibk_config_init_dev() <-
pcistub_init_device() <- pcistub_init_devices_late() <-
xen_pcibk_init()
pcistub_probe() is only set as ".probe" in struct pci_driver.
xen_pcibk_init() is is only set as a parameter of module_init().
These functions are not called in atomic context.

Despite never getting called from atomic context,
xen_pcibk_config_quirks_init() calls kzalloc() with GFP_ATOMIC,
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
which can sleep and improve the possibility of sucessful allocation.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
6 years agoxen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_device_alloc
Jia-Ju Bai [Mon, 9 Apr 2018 15:04:12 +0000 (23:04 +0800)]
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_device_alloc

pcistub_device_alloc() is never called in atomic context.

The call chain ending up at pcistub_device_alloc() is:
[1] pcistub_device_alloc() <- pcistub_seize() <- pcistub_probe()
pcistub_probe() is only set as ".probe" in struct pci_driver.
This function is not called in atomic context.

Despite never getting called from atomic context,
pcistub_device_alloc() calls kzalloc() with GFP_ATOMIC,
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
which can sleep and improve the possibility of sucessful allocation.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
6 years agoxen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_init_device
Jia-Ju Bai [Mon, 9 Apr 2018 15:03:53 +0000 (23:03 +0800)]
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_init_device

pcistub_init_device() is never called in atomic context.

The call chain ending up at pcistub_init_device() is:
[1] pcistub_init_device() <- pcistub_seize() <- pcistub_probe()
[2] pcistub_init_device() <- pcistub_init_devices_late() <-
xen_pcibk_init()
pcistub_probe() is only set as ".probe" in struct pci_driver.
xen_pcibk_init() is is only set as a parameter of module_init().
These functions are not called in atomic context.

Despite never getting called from atomic context,
pcistub_init_device() calls kzalloc() with GFP_ATOMIC,
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
which can sleep and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
6 years agoxen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_probe
Jia-Ju Bai [Mon, 9 Apr 2018 15:03:36 +0000 (23:03 +0800)]
xen: xen-pciback: Replace GFP_ATOMIC with GFP_KERNEL in pcistub_probe

pcistub_probe() is never called in atomic context.
This function is only set as ".probe" in struct pci_driver.

Despite never getting called from atomic context,
pcistub_probe() calls kmalloc() with GFP_ATOMIC,
which does not sleep for allocation.
GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
which can sleep and improve the possibility of sucessful allocation.

This is found by a static analysis tool named DCNS written by myself.
And I also manually check it.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
6 years agonet: mvpp2: Fix TCAM filter reserved range
Maxime Chevallier [Mon, 16 Apr 2018 08:07:23 +0000 (10:07 +0200)]
net: mvpp2: Fix TCAM filter reserved range

Marvell's PPv2 controller has a Packet Header parser, which uses a
fixed-size TCAM array of filter entries.

The mvpp2 driver reserves some ranges among the 256 TCAM entries to
perform MAC and VID filtering. The rest of the TCAM ids are freely usable
for other features, such as IPv4 proto matching.

This commit fixes the MVPP2_PE_LAST_FREE_TID define that sets the end of
the "free range", which included the MAC range. This could therefore allow
some other features to use entries dedicated to MAC filtering,
lowering the number of unicast/multicast addresses that could be allowed
before switching to promiscuous mode.

Fixes: 10fea26ce2aa ("net: mvpp2: Add support for unicast filtering")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoRevert "macsec: missing dev_put() on error in macsec_newlink()"
Dan Carpenter [Mon, 16 Apr 2018 10:17:50 +0000 (13:17 +0300)]
Revert "macsec: missing dev_put() on error in macsec_newlink()"

This patch is just wrong, sorry.  I was trying to fix a static checker
warning and misread the code.  The reference taken in macsec_newlink()
is released in macsec_free_netdev() when the netdevice is destroyed.

This reverts commit 5dcd8400884cc4a043a6d4617e042489e5d566a9.

Reported-by: Laura Abbott <labbott@redhat.com>
Fixes: 5dcd8400884c ("macsec: missing dev_put() on error in macsec_newlink()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMIPS: dts: Boston: Fix PCI bus dtc warnings:
Matt Redfearn [Fri, 13 Apr 2018 08:50:44 +0000 (09:50 +0100)]
MIPS: dts: Boston: Fix PCI bus dtc warnings:

dtc recently (v1.4.4-8-g756ffc4f52f6) added PCI bus checks. Fix the
warnings now emitted:

arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@10000000: missing bus-range for PCI bridge
arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@12000000: missing bus-range for PCI bridge
arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@14000000: missing bus-range for PCI bridge

Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-mips@linux-mips.org
Cc: devicetree@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/19070/
Signed-off-by: James Hogan <jhogan@kernel.org>
6 years agos390: rename default_defconfig to debug_defconfig
Heiko Carstens [Thu, 12 Apr 2018 09:01:07 +0000 (11:01 +0200)]
s390: rename default_defconfig to debug_defconfig

The name debug_defconfig reflects what the config is actually good
for and should be less confusing.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390: remove gcov defconfig
Heiko Carstens [Thu, 12 Apr 2018 09:00:31 +0000 (11:00 +0200)]
s390: remove gcov defconfig

This config is not needed anymore.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390: update defconfig
Martin Schwidefsky [Mon, 20 Nov 2017 07:48:02 +0000 (08:48 +0100)]
s390: update defconfig

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agofs: ext2: Adding new return type vm_fault_t
Souptick Joarder [Sat, 14 Apr 2018 19:33:42 +0000 (01:03 +0530)]
fs: ext2: Adding new return type vm_fault_t

Use new return type vm_fault_t for page_mkwrite,
pfn_mkwrite and fault handler.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Jan Kara <jack@suse.cz>
6 years agoisofs: fix potential memory leak in mount option parsing
Chengguang Xu [Sat, 14 Apr 2018 12:16:06 +0000 (20:16 +0800)]
isofs: fix potential memory leak in mount option parsing

When specifying string type mount option (e.g., iocharset)
several times in a mount, current option parsing may
cause memory leak. Hence, call kfree for previous one
in this case. Meanwhile, check memory allocation result
for it.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Jan Kara <jack@suse.cz>
6 years agorbd: notrim map option
Ilya Dryomov [Fri, 23 Mar 2018 05:14:47 +0000 (06:14 +0100)]
rbd: notrim map option

Add an option to turn off discard and write zeroes offload support to
avoid deprovisioning a fully provisioned image.  When enabled, discard
requests will fail with -EOPNOTSUPP, write zeroes requests will fall
back to manually zeroing.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Hitoshi Kamei <hitoshi.kamei.xm@hitachi.com>
6 years agorbd: adjust queue limits for "fancy" striping
Ilya Dryomov [Mon, 16 Apr 2018 07:32:18 +0000 (09:32 +0200)]
rbd: adjust queue limits for "fancy" striping

In order to take full advantage of merging in ceph_file_to_extents(),
allow object set sized I/Os.  If the layout is not "fancy", an object
set consists of just one object.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
6 years agorbd: avoid Wreturn-type warnings
Arnd Bergmann [Wed, 4 Apr 2018 12:53:39 +0000 (14:53 +0200)]
rbd: avoid Wreturn-type warnings

In some configurations gcc cannot see that rbd_assert(0) leads to an
unreachable code path:

  drivers/block/rbd.c: In function 'rbd_img_is_write':
  drivers/block/rbd.c:1397:1: error: control reaches end of non-void function [-Werror=return-type]
  drivers/block/rbd.c: In function '__rbd_obj_handle_request':
  drivers/block/rbd.c:2499:1: error: control reaches end of non-void function [-Werror=return-type]
  drivers/block/rbd.c: In function 'rbd_obj_handle_write':
  drivers/block/rbd.c:2471:1: error: control reaches end of non-void function [-Werror=return-type]

As the rbd_assert() here shows has no extra information beyond the verbose
BUG(), we can simply use BUG() directly in its place.  This is reliably
detected as not returning on any architecture, since it doesn't depend
on the unlikely() comparison that confused gcc.

Fixes: 3da691bf4366 ("rbd: new request handling code")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
6 years agoceph: always update atime/mtime/ctime for new inode
Yan, Zheng [Mon, 26 Mar 2018 08:46:39 +0000 (16:46 +0800)]
ceph: always update atime/mtime/ctime for new inode

For new inode, atime/mtime/ctime are uninitialized.  Don't compare
against them.

Cc: stable@kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
6 years agorbd: support timeout in rbd_wait_state_locked()
Dongsheng Yang [Mon, 26 Mar 2018 14:22:55 +0000 (10:22 -0400)]
rbd: support timeout in rbd_wait_state_locked()

currently, the rbd_wait_state_locked() will wait forever if we
can't get our state locked. Example:

rbd map --exclusive test1  --> /dev/rbd0
rbd map test1  --> /dev/rbd1
dd if=/dev/zero of=/dev/rbd1 bs=1M count=1 --> IO blocked

To avoid this problem, this patch introduce a timeout design
in rbd_wait_state_locked(). Then rbd_wait_state_locked() will
return error when we reach a timeout.

This patch allow user to set the lock_timeout in rbd mapping.

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
6 years agorbd: refactor rbd_wait_state_locked()
Ilya Dryomov [Wed, 4 Apr 2018 08:15:38 +0000 (10:15 +0200)]
rbd: refactor rbd_wait_state_locked()

In preparation for lock_timeout option, make rbd_wait_state_locked()
return error codes.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
6 years agos390: add support for IBM z14 Model ZR1
Heiko Carstens [Fri, 13 Apr 2018 12:04:24 +0000 (14:04 +0200)]
s390: add support for IBM z14 Model ZR1

Just add the new machine type number to the two places that matter.

Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390: remove couple of duplicate includes
Vasily Gorbik [Fri, 13 Apr 2018 08:57:27 +0000 (10:57 +0200)]
s390: remove couple of duplicate includes

Removing couple of duplicate includes, found by "make includecheck".
That leaves 1 duplicate include in arch/s390/kernel/entry.S, which is
there for a reason (it includes generated asm/syscall_table.h twice).

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/boot: remove unused COMPILE_VERSION and ccflags-y
Vasily Gorbik [Wed, 11 Apr 2018 08:24:29 +0000 (10:24 +0200)]
s390/boot: remove unused COMPILE_VERSION and ccflags-y

ccflags-y has no effect (no code is built in that directory,
arch/s390/boot/compressed/Makefile defines its own KBUILD_CFLAGS).
Removing ccflags-y together with COMPILE_VERSION.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/nospec: include cpu.h
Sebastian Ott [Tue, 10 Apr 2018 10:39:34 +0000 (12:39 +0200)]
s390/nospec: include cpu.h

Fix the following sparse warnings:
symbol 'cpu_show_spectre_v1' was not declared. Should it be static?
symbol 'cpu_show_spectre_v2' was not declared. Should it be static?

Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/decompressor: Ignore file vmlinux.bin.full
Thomas Richter [Thu, 12 Apr 2018 07:42:48 +0000 (08:42 +0100)]
s390/decompressor: Ignore file vmlinux.bin.full

Commit 81796a3c6a4a ("s390/decompressor: trim uncompressed
image head during the build") introduced a new
file named vmlinux.bin.full in directory
arch/s390/boot/compressed.

Add this file to the list of ignored files so it does
not show up on git status.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/kexec_file: add generated files to .gitignore
Heiko Carstens [Thu, 12 Apr 2018 11:45:52 +0000 (13:45 +0200)]
s390/kexec_file: add generated files to .gitignore

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/Kconfig: Move kexec config options to "Processor type and features"
Philipp Rudo [Tue, 27 Mar 2018 11:14:12 +0000 (13:14 +0200)]
s390/Kconfig: Move kexec config options to "Processor type and features"

The config options for kexec are currently not under any menu directory. Up
until now this was not a problem as standard kexec is always compiled in
and thus does not create a menu entry. This changed when kexec_file_load
was enabled. Its config option requires a menu entry which, when added
beneath standard kexec option, appears on the main directory above "General
Setup". Thus move the whole block further down such that the entry in now
in "Processor type and features".

While at it also update the help text for kexec file.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/kexec_file: Add ELF loader
Philipp Rudo [Mon, 11 Sep 2017 13:15:29 +0000 (15:15 +0200)]
s390/kexec_file: Add ELF loader

Add an ELF loader for kexec_file. The main task here is to do proper sanity
checks on the ELF file. Basically all other functionality was already
implemented for the image loader.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/kexec_file: Add crash support to image loader
Philipp Rudo [Tue, 5 Sep 2017 09:55:23 +0000 (11:55 +0200)]
s390/kexec_file: Add crash support to image loader

Add support to load a crash kernel to the image loader. This requires
extending the purgatory.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/kexec_file: Add image loader
Philipp Rudo [Wed, 30 Aug 2017 12:03:38 +0000 (14:03 +0200)]
s390/kexec_file: Add image loader

Add an image loader for kexec_file_load. For simplicity first skip crash
support. The functions defined in machine_kexec_file will later be shared
with the ELF loader.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/kexec_file: Add kexec_file_load system call
Philipp Rudo [Mon, 19 Jun 2017 08:45:33 +0000 (10:45 +0200)]
s390/kexec_file: Add kexec_file_load system call

This patch adds the kexec_file_load system call to s390 as well as the arch
specific functions common code requires to work. Loaders for the different
file types will be added later.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/kexec_file: Add purgatory
Philipp Rudo [Mon, 28 Aug 2017 13:32:36 +0000 (15:32 +0200)]
s390/kexec_file: Add purgatory

The common code expects the architecture to have a purgatory that runs
between the two kernels. Add it now. For simplicity first skip crash
support.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/kexec_file: Prepare setup.h for kexec_file_load
Philipp Rudo [Tue, 27 Jun 2017 10:44:11 +0000 (12:44 +0200)]
s390/kexec_file: Prepare setup.h for kexec_file_load

kexec_file_load needs to prepare the new kernels before they are loaded.
For that it has to know the offsets in head.S, e.g. to register the new
command line. Unfortunately there are no macros right now defining those
offsets. Define them now.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/smsgiucv: disable SMSG on module unload
Martin Schwidefsky [Tue, 3 Apr 2018 09:08:52 +0000 (11:08 +0200)]
s390/smsgiucv: disable SMSG on module unload

The module exit function of the smsgiucv module uses the incorrect CP
command to disable SMSG messages. The correct command is "SET SMSG OFF".
Use it.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agos390/sclp: avoid potential usage of uninitialized value
Vasily Gorbik [Fri, 13 Apr 2018 16:22:14 +0000 (18:22 +0200)]
s390/sclp: avoid potential usage of uninitialized value

sclp_early_printk could be used before .bss section is zeroed
(i.e. from als.c during the decompressor phase), therefore values used
by sclp_early_printk should be located in the .data section.

Another reason for that is to avoid potential initrd corruption, if some
code in future would use sclp_early_printk before initrd is moved from
possibly overlapping with .bss section region to a safe location.

Fixes: 0b0d1173d8ae ("s390/sclp: 32 bit event mask compatibility mode")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
6 years agomm,vmscan: Allow preallocating memory for register_shrinker().
Tetsuo Handa [Wed, 4 Apr 2018 10:53:07 +0000 (19:53 +0900)]
mm,vmscan: Allow preallocating memory for register_shrinker().

syzbot is catching so many bugs triggered by commit 9ee332d99e4d5a97
("sget(): handle failures of register_shrinker()"). That commit expected
that calling kill_sb() from deactivate_locked_super() without successful
fill_super() is safe, but the reality was different; some callers assign
attributes which are needed for kill_sb() after sget() succeeds.

For example, [1] is a report where sb->s_mode (which seems to be either
FMODE_READ | FMODE_EXCL | FMODE_WRITE or FMODE_READ | FMODE_EXCL) is not
assigned unless sget() succeeds. But it does not worth complicate sget()
so that register_shrinker() failure path can safely call
kill_block_super() via kill_sb(). Making alloc_super() fail if memory
allocation for register_shrinker() failed is much simpler. Let's avoid
calling deactivate_locked_super() from sget_userns() by preallocating
memory for the shrinker and making register_shrinker() in sget_userns()
never fail.

[1] https://syzkaller.appspot.com/bug?id=588996a25a2587be2e3a54e8646728fb9cae44e7

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+5a170e19c963a2e0df79@syzkaller.appspotmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agorpc_pipefs: fix double-dput()
Al Viro [Tue, 3 Apr 2018 05:15:46 +0000 (01:15 -0400)]
rpc_pipefs: fix double-dput()

if we ever hit rpc_gssd_dummy_depopulate() dentry passed to
it has refcount equal to 1.  __rpc_rmpipe() drops it and
dput() done after that hits an already freed dentry.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agoorangefs_kill_sb(): deal with allocation failures
Al Viro [Tue, 3 Apr 2018 04:13:17 +0000 (00:13 -0400)]
orangefs_kill_sb(): deal with allocation failures

orangefs_fill_sb() might've failed to allocate ORANGEFS_SB(s); don't
oops in that case.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agojffs2_kill_sb(): deal with failed allocations
Al Viro [Tue, 3 Apr 2018 03:56:44 +0000 (23:56 -0400)]
jffs2_kill_sb(): deal with failed allocations

jffs2_fill_super() might fail to allocate jffs2_sb_info;
jffs2_kill_sb() must survive that.

Cc: stable@kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agohypfs_kill_super(): deal with failed allocations
Al Viro [Tue, 3 Apr 2018 03:50:31 +0000 (23:50 -0400)]
hypfs_kill_super(): deal with failed allocations

hypfs_fill_super() might fail to allocate sbi; hypfs_kill_super()
should not oops on that.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6 years agoLinux 4.17-rc1 v4.17-rc1
Linus Torvalds [Mon, 16 Apr 2018 01:24:20 +0000 (18:24 -0700)]
Linux 4.17-rc1

6 years agoMerge tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave...
Linus Torvalds [Mon, 16 Apr 2018 01:08:35 +0000 (18:08 -0700)]
Merge tag 'for-4.17-part2-tag' of git://git./linux/kernel/git/kdave/linux

Pull more btrfs updates from David Sterba:
 "We have queued a few more fixes (error handling, log replay,
  softlockup) and the rest is SPDX updates that touche almost all files
  so the diffstat is long"

* tag 'for-4.17-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: Only check first key for committed tree blocks
  btrfs: add SPDX header to Kconfig
  btrfs: replace GPL boilerplate by SPDX -- sources
  btrfs: replace GPL boilerplate by SPDX -- headers
  Btrfs: fix loss of prealloc extents past i_size after fsync log replay
  Btrfs: clean up resources during umount after trans is aborted
  btrfs: Fix possible softlock on single core machines
  Btrfs: bail out on error during replay_dir_deletes
  Btrfs: fix NULL pointer dereference in log_dir_items

6 years agoMerge tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Mon, 16 Apr 2018 01:06:22 +0000 (18:06 -0700)]
Merge tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
 "SMB3 fixes, a few for stable, and some important cleanup work from
  Ronnie of the smb3 transport code"

* tag '4.17-rc1SMB3-Fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: change validate_buf to validate_iov
  cifs: remove rfc1002 hardcoded constants from cifs_discard_remaining_data()
  cifs: Change SMB2_open to return an iov for the error parameter
  cifs: add resp_buf_size to the mid_q_entry structure
  smb3.11: replace a 4 with server->vals->header_preamble_size
  cifs: replace a 4 with server->vals->header_preamble_size
  cifs: add pdu_size to the TCP_Server_Info structure
  SMB311: Improve checking of negotiate security contexts
  SMB3: Fix length checking of SMB3.11 negotiate request
  CIFS: add ONCE flag for cifs_dbg type
  cifs: Use ULL suffix for 64-bit constant
  SMB3: Log at least once if tree connect fails during reconnect
  cifs: smb2pdu: Fix potential NULL pointer dereference

6 years agofilter.txt: update 'tools/net/' to 'tools/bpf/'
Wang Sheng-Hui [Sun, 15 Apr 2018 08:07:12 +0000 (16:07 +0800)]
filter.txt: update 'tools/net/' to 'tools/bpf/'

The tools are located at tootls/bpf/ instead of tools/net/.
Update the filter.txt doc.

Signed-off-by: Wang Sheng-Hui <shhuiw@foxmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Mon, 16 Apr 2018 00:24:12 +0000 (17:24 -0700)]
Merge tag 'scsi-fixes' of git://git./linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of minor (and safe changes) that didn't make the initial
  pull request plus some bug fixes.

  The status handling code is actually a running regression from the
  previous merge window which had an incomplete fix (now reverted) and
  most of the remaining bug fixes are for problems older than the
  current merge window"

[ Side note: this merge also takes the base kernel git repository to 6+
  million objects for the first time. Technically we hit it a couple of
  merges ago already if you count all the tag objects, but now it
  reaches 6M+ objects reachable from HEAD.

  I was joking around that that's when I should switch to 5.0, because
  3.0 happened at the 2M mark, and 4.0 happened at 4M objects. But
  probably not, even if numerology is about as good a reason as any.

                                                              - Linus ]

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: devinfo: Add Microsoft iSCSI target to 1024 sector blacklist
  scsi: cxgb4i: silence overflow warning in t4_uld_rx_handler()
  scsi: dpt_i2o: Use after free in I2ORESETCMD ioctl
  scsi: core: Make scsi_result_to_blk_status() recognize CONDITION MET
  scsi: core: Rename __scsi_error_from_host_byte() into scsi_result_to_blk_status()
  Revert "scsi: core: return BLK_STS_OK for DID_OK in __scsi_error_from_host_byte()"
  scsi: aacraid: Insure command thread is not recursively stopped
  scsi: qla2xxx: Correct setting of SAM_STAT_CHECK_CONDITION
  scsi: qla2xxx: correctly shift host byte
  scsi: qla2xxx: Fix race condition between iocb timeout and initialisation
  scsi: qla2xxx: Avoid double completion of abort command
  scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure
  scsi: scsi_dh: Don't look for NULL devices handlers by name
  scsi: core: remove redundant assignment to shost->use_blk_mq

6 years agoMerge tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy...
Linus Torvalds [Mon, 16 Apr 2018 00:21:30 +0000 (17:21 -0700)]
Merge tag 'kbuild-v4.17-2' of git://git./linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - pass HOSTLDFLAGS when compiling single .c host programs

 - build genksyms lexer and parser files instead of using shipped
   versions

 - rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency

 - let the top .gitignore globally ignore artifacts generated by flex,
   bison, and asn1_compiler

 - let the top Makefile globally clean artifacts generated by flex,
   bison, and asn1_compiler

 - use safer .SECONDARY marker instead of .PRECIOUS to prevent
   intermediate files from being removed

 - support -fmacro-prefix-map option to make __FILE__ a relative path

 - fix # escaping to prepare for the future GNU Make release

 - clean up deb-pkg by using debian tools instead of handrolled
   source/changes generation

 - improve rpm-pkg portability by supporting kernel-install as a
   fallback of new-kernel-pkg

 - extend Kconfig listnewconfig target to provide more information

* tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: extend output of 'listnewconfig'
  kbuild: rpm-pkg: use kernel-install as a fallback for new-kernel-pkg
  Kbuild: fix # escaping in .cmd files for future Make
  kbuild: deb-pkg: split generating packaging and build
  kbuild: use -fmacro-prefix-map to make __FILE__ a relative path
  kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers
  kbuild: rename *-asn1.[ch] to *.asn1.[ch]
  kbuild: clean up *-asn1.[ch] patterns from top-level Makefile
  .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore
  kbuild: add %.dtb.S and %.dtb to 'targets' automatically
  kbuild: add %.lex.c and %.tab.[ch] to 'targets' automatically
  genksyms: generate lexer and parser during build instead of shipping
  kbuild: clean up *.lex.c and *.tab.[ch] patterns from top-level Makefile
  .gitignore: move *.lex.c *.tab.[ch] patterns to the top-level .gitignore
  kbuild: use HOSTLDFLAGS for single .c executables

6 years agoMerge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 15 Apr 2018 23:12:35 +0000 (16:12 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Thomas Gleixner:
 "A set of fixes and updates for x86:

   - Address a swiotlb regression which was caused by the recent DMA
     rework and made driver fail because dma_direct_supported() returned
     false

   - Fix a signedness bug in the APIC ID validation which caused invalid
     APIC IDs to be detected as valid thereby bloating the CPU possible
     space.

   - Fix inconsisten config dependcy/select magic for the MFD_CS5535
     driver.

   - Fix a corruption of the physical address space bits when encryption
     has reduced the address space and late cpuinfo updates overwrite
     the reduced bit information with the original value.

   - Dominiks syscall rework which consolidates the architecture
     specific syscall functions so all syscalls can be wrapped with the
     same macros. This allows to switch x86/64 to struct pt_regs based
     syscalls. Extend the clearing of user space controlled registers in
     the entry patch to the lower registers"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix signedness bug in APIC ID validity checks
  x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
  x86/olpc: Fix inconsistent MFD_CS5535 configuration
  swiotlb: Use dma_direct_supported() for swiotlb_ops
  syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
  syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
  syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
  syscalls/core, syscalls/x86: Clean up syscall stub naming convention
  syscalls/x86: Extend register clearing on syscall entry to lower registers
  syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
  syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
  syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
  syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
  syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
  x86/syscalls: Don't pointlessly reload the system call number
  x86/mm: Fix documentation of module mapping range with 4-level paging
  x86/cpuid: Switch to 'static const' specifier

6 years agoMerge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 15 Apr 2018 20:35:29 +0000 (13:35 -0700)]
Merge branch 'x86-pti-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 pti updates from Thomas Gleixner:
 "Another series of PTI related changes:

   - Remove the manual stack switch for user entries from the idtentry
     code. This debloats entry by 5k+ bytes of text.

   - Use the proper types for the asm/bootparam.h defines to prevent
     user space compile errors.

   - Use PAGE_GLOBAL for !PCID systems to gain back performance

   - Prevent setting of huge PUD/PMD entries when the entries are not
     leaf entries otherwise the entries to which the PUD/PMD points to
     and are populated get lost"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
  x86/pti: Leave kernel text global for !PCID
  x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image
  x86/pti: Enable global pages for shared areas
  x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
  x86/mm: Comment _PAGE_GLOBAL mystery
  x86/mm: Remove extra filtering in pageattr code
  x86/mm: Do not auto-massage page protections
  x86/espfix: Document use of _PAGE_GLOBAL
  x86/mm: Introduce "default" kernel PTE mask
  x86/mm: Undo double _PAGE_PSE clearing
  x86/mm: Factor out pageattr _PAGE_GLOBAL setting
  x86/entry/64: Drop idtentry's manual stack switch for user entries
  x86/uapi: Fix asm/bootparam.h userspace compilation errors

6 years agoMerge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 15 Apr 2018 19:43:30 +0000 (12:43 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull scheduler fixes from Thomas Gleixner:
 "A few scheduler fixes:

   - Prevent a bogus warning vs. runqueue clock update flags in
     do_sched_rt_period_timer()

   - Simplify the helper functions which handle requests for skipping
     the runqueue clock updat.

   - Do not unlock the tunables mutex in the error path of the cpu
     frequency scheduler utils. Its not held.

   - Enforce proper alignement for 'struct util_est' in sched_avg to
     prevent a misalignment fault on IA64"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/core: Force proper alignment of 'struct util_est'
  sched/core: Simplify helpers for rq clock update skip requests
  sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
  sched/cpufreq/schedutil: Fix error path mutex unlock

6 years agoMerge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 15 Apr 2018 19:36:31 +0000 (12:36 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull more perf updates from Thomas Gleixner:
 "A rather large set of perf updates:

  Kernel:

   - Fix various initialization issues

   - Prevent creating [ku]probes for not CAP_SYS_ADMIN users

  Tooling:

   - Show only failing syscalls with 'perf trace --failure' (Arnaldo
     Carvalho de Melo)

            e.g: See what 'openat' syscalls are failing:

        # perf trace --failure -e openat
         762.323 ( 0.007 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video2) = -1 ENOENT No such file or directory
         <SNIP N /dev/videoN open attempts... sigh, where is that improvised camera lid?!? >
         790.228 ( 0.008 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video63) = -1 ENOENT No such file or directory
        ^C#

   - Show information about the event (freq, nr_samples, total
     period/nr_events) in the annotate --tui and --stdio2 'perf
     annotate' output, similar to the first line in the 'perf report
     --tui', but just for the samples for a the annotated symbol
     (Arnaldo Carvalho de Melo)

   - Introduce 'perf version --build-options' to show what features were
     linked, aliased as well as a shorter 'perf -vv' (Jin Yao)

   - Add a "dso_size" sort order (Kim Phillips)

   - Remove redundant ')' in the tracepoint output in 'perf trace'
     (Changbin Du)

   - Synchronize x86's cpufeatures.h, no effect on toolss (Arnaldo
     Carvalho de Melo)

   - Show group details on the title line in the annotate browser and
     'perf annotate --stdio2' output, so that the per-event columns can
     have headers (Arnaldo Carvalho de Melo)

   - Fixup vertical line separating metrics from instructions and
     cleaning unused lines at the bottom, both in the annotate TUI
     browser (Arnaldo Carvalho de Melo)

   - Remove duplicated 'samples' in lost samples warning in
     'perf report' (Arnaldo Carvalho de Melo)

   - Synchronize i915_drm.h, silencing the perf build process,
     automagically adding support for the new DRM_I915_QUERY ioctl
     (Arnaldo Carvalho de Melo)

   - Make auxtrace_queues__add_buffer() allocate struct buffer, from a
     patchkit already applied (Adrian Hunter)

   - Fix the --stdio2/TUI annotate output to include group details, be
     it for a recorded '{a,b,f}' explicit event group or when forcing
     group display using 'perf report --group' for a set of events not
     recorded as a group (Arnaldo Carvalho de Melo)

   - Fix display artifacts in the ui browser (base class for the
     annotate and main report/top TUI browser) related to the extra
     title lines work (Arnaldo Carvalho de Melo)

   - perf auxtrace refactorings, leftovers from a previously partially
     processed patchset (Adrian Hunter)

   - Fix the builtin clang build (Sandipan Das, Arnaldo Carvalho de
     Melo)

   - Synchronize i915_drm.h, silencing a perf build warning and in the
     process automagically adding support for a new ioctl command
     (Arnaldo Carvalho de Melo)

   - Fix a strncpy issue in uprobe tracing"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  perf/core: Need CAP_SYS_ADMIN to create k/uprobe with perf_event_open()
  tracing/uprobe_event: Fix strncpy corner case
  perf/core: Fix perf_uprobe_init()
  perf/core: Fix perf_kprobe_init()
  perf/core: Fix use-after-free in uprobe_perf_close()
  perf tests clang: Fix function name for clang IR test
  perf clang: Add support for recent clang versions
  perf tools: Fix perf builds with clang support
  perf tools: No need to include namespaces.h in util.h
  perf hists browser: Remove leftover from row returned from refresh
  perf hists browser: Show extra_title_lines in the 'D' debug hotkey
  perf auxtrace: Make auxtrace_queues__add_buffer() do CPU filtering
  tools headers uapi: Synchronize i915_drm.h
  perf report: Remove duplicated 'samples' in lost samples warning
  perf ui browser: Fixup cleaning unused lines at the bottom
  perf annotate browser: Fixup vertical line separating metrics from instructions
  perf annotate: Show group details on the title line
  perf auxtrace: Make auxtrace_queues__add_buffer() allocate struct buffer
  perf/x86/intel: Move regs->flags EXACT bit init
  perf trace: Remove redundant ')'
  ...

6 years agoMerge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 15 Apr 2018 19:32:06 +0000 (12:32 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 EFI bootup fixlet from Thomas Gleixner:
 "A single fix for an early boot warning caused by invoking
  this_cpu_has() before SMP initialization"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()

6 years agoMerge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 15 Apr 2018 19:29:46 +0000 (12:29 -0700)]
Merge branch 'irq-core-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq affinity fixes from Thomas Gleixner:

  - Fix error path handling in the affinity spreading code

  - Make affinity spreading smarter to avoid issues on systems which
    claim to have hotpluggable CPUs while in fact they can't hotplug
    anything.

    So instead of trying to spread the vectors (and thereby the
    associated device queues) to all possibe CPUs, spread them on all
    present CPUs first. If there are left over vectors after that first
    step they are spread among the possible, but not present CPUs which
    keeps the code backwards compatible for virtual decives and NVME
    which allocate a queue per possible CPU, but makes the spreading
    smarter for devices which have less queues than possible or present
    CPUs.

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/affinity: Spread irq vectors among present CPUs as far as possible
  genirq/affinity: Allow irq spreading from a given starting point
  genirq/affinity: Move actual irq vector spreading into a helper function
  genirq/affinity: Rename *node_to_possible_cpumask as *node_to_cpumask
  genirq/affinity: Don't return with empty affinity masks on error

6 years agoMerge tag 'for-linus' of git://github.com/openrisc/linux
Linus Torvalds [Sun, 15 Apr 2018 19:27:58 +0000 (12:27 -0700)]
Merge tag 'for-linus' of git://github.com/openrisc/linux

Pull OpenRISC fixlet from Stafford Horne:
 "Just one small thing here, it came in a while back but I didnt have
  anything in my 4.16 queue, still its the only thing for 4.17 so
  sending it alone.

  Small cleanup: remove unused __ARCH_HAVE_MMU define"

* tag 'for-linus' of git://github.com/openrisc/linux:
  openrisc: remove unused __ARCH_HAVE_MMU define

6 years agoMerge tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc...
Linus Torvalds [Sun, 15 Apr 2018 18:57:12 +0000 (11:57 -0700)]
Merge tag 'powerpc-4.17-2' of git://git./linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix crashes when loading modules built with a different
   CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic.

 - Fix busy loops in the OPAL NVRAM driver if we get certain error
   conditions from firmware.

 - Remove tlbie trace points from KVM code that's called in real mode,
   because it causes crashes.

 - Fix checkstops caused by invalid tlbiel on Power9 Radix.

 - Ensure the set of CPU features we "know" are always enabled is
   actually the minimal set when we build with support for firmware
   supplied CPU features.

Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.

* tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
  powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
  KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
  powerpc/8xx: Fix build with hugetlbfs enabled
  powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
  powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
  powerpc/fscr: Enable interrupts earlier before calling get_user()
  powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
  powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic

6 years agoMerge branch 'sfc-ARFS-fixes'
David S. Miller [Sat, 14 Apr 2018 19:39:53 +0000 (15:39 -0400)]
Merge branch 'sfc-ARFS-fixes'

Edward Cree says:

====================
sfc: ARFS fixes

Three issues introduced by my recent asynchronous filter handling changes:
1. The old filter_rfs_insert would replace a matching filter of equal
   priority; we need to pass the appropriate argument to filter_insert to
   make it do the same.
2. We're lying to the kernel with our return value from ndo_rx_flow_steer,
   so we need to lie consistently when calling rps_may_expire_flow.  This
   is only a partial fix, as the lie still prevents us from steering
   multiple flows with the same ID to different queues; a proper fix that
   stops us lying at all will hopefully follow later.
3. It's possible to cause the kernel to hammer ndo_rx_flow_steer very
   hard, so make sure we don't build up too huge a backlog of workitems.

Possibly it would be better to fix #3 on the kernel side; I have a patch
 which I think does that but it's not a regression in 4.17 so isn't 'net'
 material.
There's also the issue that we come up in the bad configuration that
 triggers #3 by default, but that too is a problem for another time.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosfc: limit ARFS workitems in flight per channel
Edward Cree [Fri, 13 Apr 2018 18:18:09 +0000 (19:18 +0100)]
sfc: limit ARFS workitems in flight per channel

A misconfigured system (e.g. with all interrupts affinitised to all CPUs)
 may produce a storm of ARFS steering events.  With the existing sfc ARFS
 implementation, that could create a backlog of workitems that grinds the
 system to a halt.  To prevent this, limit the number of workitems that
 may be in flight for a given SFC device to 8 (EFX_RPS_MAX_IN_FLIGHT), and
 return EBUSY from our ndo_rx_flow_steer method if the limit is reached.
Given this limit, also store the workitems in an array of slots within the
 struct efx_nic, rather than dynamically allocating for each request.
The limit should not negatively impact performance, because it is only
 likely to be hit in cases where ARFS will be ineffective anyway.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosfc: pass the correctly bogus filter_id to rps_may_expire_flow()
Edward Cree [Fri, 13 Apr 2018 18:17:49 +0000 (19:17 +0100)]
sfc: pass the correctly bogus filter_id to rps_may_expire_flow()

When we inserted an ARFS filter for ndo_rx_flow_steer(), we didn't know
 what the filter ID would be, so we just returned 0.  Thus, we must also
 pass 0 as the filter ID when calling rps_may_expire_flow() for it, and
 rely on the flow_id to identify what we're talking about.

Fixes: 3af0f34290f6 ("sfc: replace asynchronous filter operations")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosfc: insert ARFS filters with replace_equal=true
Edward Cree [Fri, 13 Apr 2018 18:17:22 +0000 (19:17 +0100)]
sfc: insert ARFS filters with replace_equal=true

Necessary to allow redirecting a flow when the application moves.

Fixes: 3af0f34290f6 ("sfc: replace asynchronous filter operations")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'akpm' (patches from Andrew)
Linus Torvalds [Sat, 14 Apr 2018 15:50:50 +0000 (08:50 -0700)]
Merge branch 'akpm' (patches from Andrew)

Merge yet more updates from Andrew Morton:

 - various hotfixes

 - kexec_file updates and feature work

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
  kernel/kexec_file.c: move purgatories sha256 to common code
  kernel/kexec_file.c: allow archs to set purgatory load address
  kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load
  kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs
  kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs
  kernel/kexec_file.c: split up __kexec_load_puragory
  kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*
  kernel/kexec_file.c: search symbols in read-only kexec_purgatory
  kernel/kexec_file.c: make purgatory_info->ehdr const
  kernel/kexec_file.c: remove checks in kexec_purgatory_load
  include/linux/kexec.h: silence compile warnings
  kexec_file, x86: move re-factored code to generic side
  x86: kexec_file: clean up prepare_elf64_headers()
  x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer
  x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()
  x86: kexec_file: purge system-ram walking from prepare_elf64_headers()
  kexec_file,x86,powerpc: factor out kexec_file_ops functions
  kexec_file: make use of purgatory optional
  proc: revalidate misc dentries
  mm, slab: reschedule cache_reap() on the same CPU
  ...

6 years agoparisc: Fix missing binfmt_elf32.o build error
Helge Deller [Fri, 13 Apr 2018 19:54:37 +0000 (21:54 +0200)]
parisc: Fix missing binfmt_elf32.o build error

Commit 71d577db01a5 ("parisc: Switch to generic COMPAT_BINFMT_ELF")
removed the binfmt_elf32.c source file, but missed to drop the object
file from the list of object files the Makefile, which then results in a
build error.

Fixes: 71d577db01a5 ("parisc: Switch to generic COMPAT_BINFMT_ELF")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
6 years agokernel/kexec_file.c: move purgatories sha256 to common code
Philipp Rudo [Fri, 13 Apr 2018 22:36:46 +0000 (15:36 -0700)]
kernel/kexec_file.c: move purgatories sha256 to common code

The code to verify the new kernels sha digest is applicable for all
architectures.  Move it to common code.

One problem is the string.c implementation on x86.  Currently sha256
includes x86/boot/string.h which defines memcpy and memset to be gcc
builtins.  By moving the sha256 implementation to common code and
changing the include to linux/string.h both functions are no longer
defined.  Thus definitions have to be provided in x86/purgatory/string.c

Link: http://lkml.kernel.org/r/20180321112751.22196-12-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/kexec_file.c: allow archs to set purgatory load address
Philipp Rudo [Fri, 13 Apr 2018 22:36:43 +0000 (15:36 -0700)]
kernel/kexec_file.c: allow archs to set purgatory load address

For s390 new kernels are loaded to fixed addresses in memory before they
are booted.  With the current code this is a problem as it assumes the
kernel will be loaded to an 'arbitrary' address.  In particular,
kexec_locate_mem_hole searches for a large enough memory region and sets
the load address (kexec_bufer->mem) to it.

Luckily there is a simple workaround for this problem.  By returning 1
in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off.  This
allows the architecture to set kbuf->mem by hand.  While the trick works
fine for the kernel it does not for the purgatory as here the
architectures don't have access to its kexec_buffer.

Give architectures access to the purgatories kexec_buffer by changing
kexec_load_purgatory to take a pointer to it.  With this change
architectures have access to the buffer and can edit it as they need.

A nice side effect of this change is that we can get rid of the
purgatory_info->purgatory_load_address field.  As now the information
stored there can directly be accessed from kbuf->mem.

Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load
Philipp Rudo [Fri, 13 Apr 2018 22:36:39 +0000 (15:36 -0700)]
kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load

The current code uses the sh_offset field in purgatory_info->sechdrs to
store a pointer to the current load address of the section.  Depending
whether the section will be loaded or not this is either a pointer into
purgatory_info->purgatory_buf or kexec_purgatory.  This is not only a
violation of the ELF standard but also makes the code very hard to
understand as you cannot tell if the memory you are using is read-only
or not.

Remove this misuse and store the offset of the section in
pugaroty_info->purgatory_buf in sh_offset.

Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs
Philipp Rudo [Fri, 13 Apr 2018 22:36:35 +0000 (15:36 -0700)]
kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs

The main loop currently uses quite a lot of variables to update the
section headers.  Some of them are unnecessary.  So clean them up a
little.

Link: http://lkml.kernel.org/r/20180321112751.22196-9-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs
Philipp Rudo [Fri, 13 Apr 2018 22:36:32 +0000 (15:36 -0700)]
kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs

To update the entry point there is an extra loop over all section
headers although this can be done in the main loop.  So move it there
and eliminate the extra loop and variable to store the 'entry section
index'.

Also, in the main loop, move the usual case, i.e.  non-bss section, out
of the extra if-block.

Link: http://lkml.kernel.org/r/20180321112751.22196-8-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/kexec_file.c: split up __kexec_load_puragory
Philipp Rudo [Fri, 13 Apr 2018 22:36:28 +0000 (15:36 -0700)]
kernel/kexec_file.c: split up __kexec_load_puragory

When inspecting __kexec_load_purgatory you find that it has two tasks

1) setting up the kexec_buffer for the new kernel and,
2) setting up pi->sechdrs for the final load address.

The two tasks are independent of each other.  To improve readability
split up __kexec_load_purgatory into two functions, one for each task,
and call them directly from kexec_load_purgatory.

Link: http://lkml.kernel.org/r/20180321112751.22196-7-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*
Philipp Rudo [Fri, 13 Apr 2018 22:36:24 +0000 (15:36 -0700)]
kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*

When the relocations are applied to the purgatory only the section the
relocations are applied to is writable.  The other sections, i.e.  the
symtab and .rel/.rela, are in read-only kexec_purgatory.  Highlight this
by marking the corresponding variables as 'const'.

While at it also change the signatures of arch_kexec_apply_relocations* to
take section pointers instead of just the index of the relocation section.
This removes the second lookup and sanity check of the sections in arch
code.

Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/kexec_file.c: search symbols in read-only kexec_purgatory
Philipp Rudo [Fri, 13 Apr 2018 22:36:21 +0000 (15:36 -0700)]
kernel/kexec_file.c: search symbols in read-only kexec_purgatory

The stripped purgatory does not contain a symtab.  So when looking for
symbols this is done in read-only kexec_purgatory.  Highlight this by
marking the corresponding variables as 'const'.

Link: http://lkml.kernel.org/r/20180321112751.22196-5-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/kexec_file.c: make purgatory_info->ehdr const
Philipp Rudo [Fri, 13 Apr 2018 22:36:17 +0000 (15:36 -0700)]
kernel/kexec_file.c: make purgatory_info->ehdr const

The kexec_purgatory buffer is read-only.  Thus all pointers into
kexec_purgatory are read-only, too.  Point this out by explicitly
marking purgatory_info->ehdr as 'const' and update the comments in
purgatory_info.

Link: http://lkml.kernel.org/r/20180321112751.22196-4-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokernel/kexec_file.c: remove checks in kexec_purgatory_load
Philipp Rudo [Fri, 13 Apr 2018 22:36:13 +0000 (15:36 -0700)]
kernel/kexec_file.c: remove checks in kexec_purgatory_load

Before the purgatory is loaded several checks are done whether the ELF
file in kexec_purgatory is valid or not.  These checks are incomplete.
For example they don't check for the total size of the sections defined
in the section header table or if the entry point actually points into
the purgatory.

On the other hand the purgatory, although an ELF file on its own, is
part of the kernel.  Thus not trusting the purgatory means not trusting
the kernel build itself.

So remove all validity checks on the purgatory and just trust the kernel
build.

Link: http://lkml.kernel.org/r/20180321112751.22196-3-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoinclude/linux/kexec.h: silence compile warnings
Philipp Rudo [Fri, 13 Apr 2018 22:36:10 +0000 (15:36 -0700)]
include/linux/kexec.h: silence compile warnings

Patch series "kexec_file: Clean up purgatory load", v2.

Following the discussion with Dave and AKASHI, here are the common code
patches extracted from my recent patch set (Add kexec_file_load support
to s390) [1].  The patches were extracted to allow upstream integration
together with AKASHI's common code patches before the arch code gets
adjusted to the new base.

The reason for this series is to prepare common code for adding
kexec_file_load to s390 as well as cleaning up the mis-use of the
sh_offset field during purgatory load.  In detail this series contains:

Patch #1&2: Minor cleanups/fixes.

Patch #3-9: Clean up the purgatory load/relocation code.  Especially
remove the mis-use of the purgatory_info->sechdrs->sh_offset field,
currently holding a pointer into either kexec_purgatory (ro) or
purgatory_buf (rw) depending on the section.  With these patches the
section address will be calculated verbosely and sh_offset will contain
the offset of the section in the stripped purgatory binary
(purgatory_buf).

Patch #10: Allows architectures to set the purgatory load address.  This
patch is important for s390 as the kernel and purgatory have to be
loaded to fixed addresses.  In current code this is impossible as the
purgatory load is opaque to the architecture.

Patch #11: Moves x86 purgatories sha implementation to common lib/
directory to allow reuse in other architectures.

This patch (of 11)

When building the kernel with CONFIG_KEXEC_FILE enabled gcc prints a
compile warning multiple times.

  In file included from <path>/linux/init/initramfs.c:526:0:
  <path>/include/linux/kexec.h:120:9: warning: `struct kimage' declared inside parameter list [enabled by default]
           unsigned long cmdline_len);
           ^

This is because the typedefs for kexec_file_load uses struct kimage
before it is declared.  Fix this by simply forward declaring struct
kimage.

Link: http://lkml.kernel.org/r/20180321112751.22196-2-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokexec_file, x86: move re-factored code to generic side
AKASHI Takahiro [Fri, 13 Apr 2018 22:36:06 +0000 (15:36 -0700)]
kexec_file, x86: move re-factored code to generic side

In the previous patches, commonly-used routines, exclude_mem_range() and
prepare_elf64_headers(), were carved out.  Now place them in kexec
common code.  A prefix "crash_" is given to each of their names to avoid
possible name collisions.

Link: http://lkml.kernel.org/r/20180306102303.9063-8-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agox86: kexec_file: clean up prepare_elf64_headers()
AKASHI Takahiro [Fri, 13 Apr 2018 22:36:03 +0000 (15:36 -0700)]
x86: kexec_file: clean up prepare_elf64_headers()

Removing bufp variable in prepare_elf64_headers() makes the code simpler
and more understandable.

Link: http://lkml.kernel.org/r/20180306102303.9063-7-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agox86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer
AKASHI Takahiro [Fri, 13 Apr 2018 22:35:59 +0000 (15:35 -0700)]
x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer

While CRASH_MAX_RANGES (== 16) seems to be good enough, fixed-number
array is not a good idea in general.

In this patch, size of crash_mem buffer is calculated as before and the
buffer is now dynamically allocated.  This change also allows removing
crash_elf_data structure.

Link: http://lkml.kernel.org/r/20180306102303.9063-6-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agox86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()
AKASHI Takahiro [Fri, 13 Apr 2018 22:35:56 +0000 (15:35 -0700)]
x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()

The code guarded by CONFIG_X86_64 is necessary on some architectures
which have a dedicated kernel mapping outside of linear memory mapping.
(arm64 is among those.)

In this patch, an additional argument, kernel_map, is added to enable/
disable the code removing #ifdef.

Link: http://lkml.kernel.org/r/20180306102303.9063-5-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agox86: kexec_file: purge system-ram walking from prepare_elf64_headers()
AKASHI Takahiro [Fri, 13 Apr 2018 22:35:53 +0000 (15:35 -0700)]
x86: kexec_file: purge system-ram walking from prepare_elf64_headers()

While prepare_elf64_headers() in x86 looks pretty generic for other
architectures' use, it contains some code which tries to list crash
memory regions by walking through system resources, which is not always
architecture agnostic.  To make this function more generic, the related
code should be purged.

In this patch, prepare_elf64_headers() simply scans crash_mem buffer
passed and add all the listed regions to elf header as a PT_LOAD
segment.  So walk_system_ram_res(prepare_elf64_headers_callback) have
been moved forward before prepare_elf64_headers() where the callback,
prepare_elf64_headers_callback(), is now responsible for filling up
crash_mem buffer.

Meanwhile exclude_elf_header_ranges() used to be called every time in
this callback it is rather redundant and now called only once in
prepare_elf_headers() as well.

Link: http://lkml.kernel.org/r/20180306102303.9063-4-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokexec_file,x86,powerpc: factor out kexec_file_ops functions
AKASHI Takahiro [Fri, 13 Apr 2018 22:35:49 +0000 (15:35 -0700)]
kexec_file,x86,powerpc: factor out kexec_file_ops functions

As arch_kexec_kernel_image_{probe,load}(),
arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig()
are almost duplicated among architectures, they can be commonalized with
an architecture-defined kexec_file_ops array.  So let's factor them out.

Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agokexec_file: make use of purgatory optional
AKASHI Takahiro [Fri, 13 Apr 2018 22:35:45 +0000 (15:35 -0700)]
kexec_file: make use of purgatory optional

Patch series "kexec_file, x86, powerpc: refactoring for other
architecutres", v2.

This is a preparatory patchset for adding kexec_file support on arm64.

It was originally included in a arm64 patch set[1], but Philipp is also
working on their kexec_file support on s390[2] and some changes are now
conflicting.

So these common parts were extracted and put into a separate patch set
for better integration.  What's more, my original patch#4 was split into
a few small chunks for easier review after Dave's comment.

As such, the resulting code is basically identical with my original, and
the only *visible* differences are:

 - renaming of _kexec_kernel_image_probe() and  _kimage_file_post_load_cleanup()

 - change one of types of arguments at prepare_elf64_headers()

Those, unfortunately, require a couple of trivial changes on the rest
(#1, #6 to #13) of my arm64 kexec_file patch set[1].

Patch #1 allows making a use of purgatory optional, particularly useful
for arm64.

Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load,
verify_sig}() and arch_kimage_file_post_load_cleanup() across
architectures.

Patches #3-#7 are also intended to generalize parse_elf64_headers(),
along with exclude_mem_range(), to be made best re-use of.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html
[2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html

This patch (of 7):

On arm64, crash dump kernel's usable memory is protected by *unmapping*
it from kernel virtual space unlike other architectures where the region
is just made read-only.  It is highly unlikely that the region is
accidentally corrupted and this observation rationalizes that digest
check code can also be dropped from purgatory.  The resulting code is so
simple as it doesn't require a bit ugly re-linking/relocation stuff,
i.e.  arch_kexec_apply_relocations_add().

Please see:

   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html

All that the purgatory does is to shuffle arguments and jump into a new
kernel, while we still need to have some space for a hash value
(purgatory_sha256_digest) which is never checked against.

As such, it doesn't make sense to have trampline code between old kernel
and new kernel on arm64.

This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and
allows related code to be compiled in only if necessary.

[takahiro.akashi@linaro.org: fix trivial screwup]
Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org
Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6 years agoproc: revalidate misc dentries
Alexey Dobriyan [Fri, 13 Apr 2018 22:35:42 +0000 (15:35 -0700)]
proc: revalidate misc dentries

If module removes proc directory while another process pins it by
chdir'ing to it, then subsequent recreation of proc entry and all
entries down the tree will not be visible to any process until pinning
process unchdir from directory and unpins everything.

Steps to reproduce:

proc_mkdir("aaa", NULL);
proc_create("aaa/bbb", ...);

chdir("/proc/aaa");

remove_proc_entry("aaa/bbb", NULL);
remove_proc_entry("aaa", NULL);

proc_mkdir("aaa", NULL);
# inaccessible because "aaa" dentry still points
# to the original "aaa".
proc_create("aaa/bbb", ...);

Fix is to implement ->d_revalidate and ->d_delete.

Link: http://lkml.kernel.org/r/20180312201938.GA4871@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>