OSDN Git Service
Andrea Parri [Tue, 20 Feb 2018 18:45:56 +0000 (19:45 +0100)]
locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
[ Upstream commit
cb13b424e986aed68d74cbaec3449ea23c50e167 ]
Continuing along with the fight against smp_read_barrier_depends() [1]
(or rather, against its improper use), add an unconditional barrier to
cmpxchg. This guarantees that dependency ordering is preserved when a
dependency is headed by an unsuccessful cmpxchg. As it turns out, the
change could enable further simplification of LKMM as proposed in [2].
[1] https://marc.info/?l=linux-kernel&m=
150884953419377&w=2
https://marc.info/?l=linux-kernel&m=
150884946319353&w=2
https://marc.info/?l=linux-kernel&m=
151215810824468&w=2
https://marc.info/?l=linux-kernel&m=
151215816324484&w=2
[2] https://marc.info/?l=linux-kernel&m=
151881978314872&w=2
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-alpha@vger.kernel.org
Link: http://lkml.kernel.org/r/1519152356-4804-1-git-send-email-parri.andrea@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andreas Kemnade [Tue, 20 Feb 2018 13:30:10 +0000 (07:30 -0600)]
usb: musb: fix enumeration after resume
[ Upstream commit
17539f2f4f0b7fa906b508765c8ada07a1e45f52 ]
On dm3730 there are enumeration problems after resume.
Investigation led to the cause that the MUSB_POWER_SOFTCONN
bit is not set. If it was set before suspend (because it
was enabled via musb_pullup()), it is set in
musb_restore_context() so the pullup is enabled. But then
musb_start() is called which overwrites MUSB_POWER and
therefore disables MUSB_POWER_SOFTCONN, so no pullup is
enabled and the device is not enumerated.
So let's do a subset of what musb_start() does
in the same way as musb_suspend() does it. Platform-specific
stuff it still called as there might be some phy-related stuff
which needs to be enabled.
Also interrupts are enabled, as it was the original idea
of calling musb_start() in musb_resume() according to
Commit
6fc6f4b87cb3 ("usb: musb: Disable interrupts on suspend,
enable them on resume")
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Bin Liu <b-liu@ti.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wolfram Sang [Mon, 5 Feb 2018 20:09:59 +0000 (21:09 +0100)]
drm/exynos: fix comparison to bitshift when dealing with a mask
[ Upstream commit
1293b6191010672c0c9dacae8f71c6f3e4d70cbe ]
Due to a typo, the mask was destroyed by a comparison instead of a bit
shift.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yufen Yu [Tue, 6 Feb 2018 09:39:15 +0000 (17:39 +0800)]
md raid10: fix NULL deference in handle_write_completed()
[ Upstream commit
01a69cab01c184d3786af09e9339311123d63d22 ]
In the case of 'recover', an r10bio with R10BIO_WriteError &
R10BIO_IsRecover will be progressed by handle_write_completed().
This function traverses all r10bio->devs[copies].
If devs[m].repl_bio != NULL, it thinks conf->mirrors[dev].replacement
is also not NULL. However, this is not always true.
When there is an rdev of raid10 has replacement, then each r10bio
->devs[m].repl_bio != NULL in conf->r10buf_pool. However, in 'recover',
even if corresponded replacement is NULL, it doesn't clear r10bio
->devs[m].repl_bio, resulting in replacement NULL deference.
This bug was introduced when replacement support for raid10 was
added in Linux 3.3.
As NeilBrown suggested:
Elsewhere the determination of "is this device part of the
resync/recovery" is made by resting bio->bi_end_io.
If this is end_sync_write, then we tried to write here.
If it is NULL, then we didn't try to write.
Fixes:
9ad1aefc8ae8 ("md/raid10: Handle replacement devices during resync.")
Cc: stable (V3.3+)
Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Felix Fietkau [Sat, 10 Feb 2018 12:20:34 +0000 (13:20 +0100)]
mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
[ Upstream commit
651b9920d7a694ffb1f885aef2bbb068a25d9d66 ]
This ensures that mac80211 allocated management frames are properly
aligned, which makes copying them more efficient.
For instance, mt76 uses iowrite32_copy to copy beacon frames to beacon
template memory on the chip.
Misaligned 32-bit accesses cause CPU exceptions on MIPS and should be
avoided.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kees Cook [Wed, 14 Feb 2018 23:45:07 +0000 (15:45 -0800)]
NFC: llcp: Limit size of SDP URI
[ Upstream commit
fe9c842695e26d8116b61b80bfb905356f07834b ]
The tlv_len is u8, so we need to limit the size of the SDP URI. Enforce
this both in the NLA policy and in the code that performs the allocation
and copy, to avoid writing past the end of the allocated buffer.
Fixes:
d9b8d8e19b073 ("NFC: llcp: Service Name Lookup netlink interface")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geert Uytterhoeven [Tue, 2 Jan 2018 15:25:35 +0000 (16:25 +0100)]
ARM: OMAP1: clock: Fix debugfs_create_*() usage
[ Upstream commit
8cbbf1745dcde7ba7e423dc70619d223de90fd43 ]
When exposing data access through debugfs, the correct
debugfs_create_*() functions must be used, depending on data type.
Remove all casts from data pointers passed to debugfs_create_*()
functions, as such casts prevent the compiler from flagging bugs.
Correct all wrong usage:
- clk.rate is unsigned long, not u32,
- clk.flags is u8, not u32, which exposed the successive
clk.rate_offset and clk.src_offset fields.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tony Lindgren [Fri, 9 Feb 2018 16:15:53 +0000 (08:15 -0800)]
ARM: OMAP3: Fix prm wake interrupt for resume
[ Upstream commit
d3be6d2a08bd26580562d9714d3d97ea9ba22c73 ]
For platform_suspend_ops, the finish call is too late to re-enable wake
irqs and we need re-enable wake irqs on wake call instead.
Otherwise noirq resume for devices has already happened. And then
dev_pm_disarm_wake_irq() has already disabled the dedicated wake irqs
when the interrupt triggers and the wake irq is never handled.
For devices that are already in PM runtime suspended state when we
enter suspend this means that a possible wake irq will never trigger.
And this can lead into a situation where a device has a pending padconf
wake irq, and the device will stay unresponsive to any further wake
irqs.
This issue can be easily reproduced by setting serial console log level
to zero, letting the serial console idle, and suspend the system from
an ssh terminal. Then try to wake up the system by typing to the serial
console.
Note that this affects only omap3 PRM interrupt as that's currently
the only omap variant that does anything in omap_pm_wake().
In general, for the wake irqs to work, the interrupt must have either
IRQF_NO_SUSPEND or IRQF_EARLY_RESUME set for it to trigger before
dev_pm_disarm_wake_irq() disables the wake irqs.
Reported-by: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Qi Hou [Thu, 11 Jan 2018 04:54:43 +0000 (12:54 +0800)]
ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
[ Upstream commit
db35340c536f1af0108ec9a0b2126a05d358d14a ]
When more than one GP timers are used as kernel system timers and the
corresponding nodes in device-tree are marked with the same "disabled"
property, then the "attr" field of the property will be initialized
more than once as the property being added to sys file system via
__of_add_property_sysfs().
In __of_add_property_sysfs(), the "name" field of pp->attr.attr is set
directly to the return value of safe_name(), without taking care of
whether it's already a valid pointer to a memory block. If it is, its
old value will always be overwritten by the new one and the memory block
allocated before will a "ghost", then a kmemleak happened.
That the same "disabled" property being added to different nodes of device
tree would cause that kind of kmemleak overhead, at least once.
To fix it, allocate the property dynamically, and delete static one.
Signed-off-by: Qi Hou <qi.hou@windriver.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Manish Rangankar [Mon, 12 Feb 2018 06:48:41 +0000 (22:48 -0800)]
scsi: qla4xxx: skip error recovery in case of register disconnect.
[ Upstream commit
1bc5ad3a6acdcf56f83272f2de1cd2389ea9e9e2 ]
A system crashes when continuously removing/re-adding the storage
controller.
Signed-off-by: Manish Rangankar <manish.rangankar@cavium.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Meelis Roos [Fri, 9 Feb 2018 06:57:44 +0000 (08:57 +0200)]
scsi: aacraid: fix shutdown crash when init fails
[ Upstream commit
00c20cdc79259c6c5bf978b21af96c2d3edb646d ]
When aacraid init fails with "AAC0: adapter self-test failed.", shutdown
leads to UBSAN warning and then oops:
[154316.118423] ================================================================================
[154316.118508] UBSAN: Undefined behaviour in drivers/scsi/scsi_lib.c:2328:27
[154316.118566] member access within null pointer of type 'struct Scsi_Host'
[154316.118631] CPU: 2 PID: 14530 Comm: reboot Tainted: G W 4.15.0-dirty #89
[154316.118701] Hardware name: Hewlett Packard HP NetServer/HP System Board, BIOS 4.06.46 PW 06/25/2003
[154316.118774] Call Trace:
[154316.118848] dump_stack+0x48/0x65
[154316.118916] ubsan_epilogue+0xe/0x40
[154316.118976] __ubsan_handle_type_mismatch+0xfb/0x180
[154316.119043] scsi_block_requests+0x20/0x30
[154316.119135] aac_shutdown+0x18/0x40 [aacraid]
[154316.119196] pci_device_shutdown+0x33/0x50
[154316.119269] device_shutdown+0x18a/0x390
[...]
[154316.123435] BUG: unable to handle kernel NULL pointer dereference at
000000f4
[154316.123515] IP: scsi_block_requests+0xa/0x30
This is because aac_shutdown() does
struct Scsi_Host *shost = pci_get_drvdata(dev);
scsi_block_requests(shost);
and that assumes shost has been assigned with pci_set_drvdata().
However, pci_set_drvdata(pdev, shost) is done in aac_probe_one() far
after bailing out with error from calling the init function
((*aac_drivers[index].init)(aac)), and when the init function fails, no
error is returned from aac_probe_one() so PCI layer assumes there is
driver attached, and tries to shut it down later.
Fix it by returning error from aac_probe_one() when card-specific init
function fails.
This fixes reboot on my HP NetRAID-4M with dead battery.
Signed-off-by: Meelis Roos <mroos@linux.ee>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Kelley (EOSG) [Wed, 24 Jan 2018 22:49:57 +0000 (22:49 +0000)]
scsi: storvsc: Increase cmd_per_lun for higher speed devices
[ Upstream commit
cabe92a55e3a12005a4ac4d3954c9a174b0efe2a ]
Increase cmd_per_lun to allow more I/Os in progress per device,
particularly for NVMe's. The Hyper-V host side can handle the higher
count with no issues.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anders Roxell [Tue, 6 Feb 2018 22:20:44 +0000 (16:20 -0600)]
selftests: memfd: add config fragment for fuse
[ Upstream commit
9a606f8d55cfc932ec02172aaed4124fdc150047 ]
The memfd test requires to insert the fuse module (CONFIG_FUSE_FS).
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vardan Mikayelyan [Tue, 16 Jan 2018 12:04:24 +0000 (16:04 +0400)]
usb: dwc2: Fix dwc2_hsotg_core_init_disconnected()
[ Upstream commit
755d739534f998d92e348fba8ffb0478416576e7 ]
We should call dwc2_hsotg_enqueue_setup() after properly
setting lx_state. Because it may cause error-out from
dwc2_hsotg_enqueue_setup() due to wrong value in lx_state.
Issue can be reproduced by loading driver while connected
A-Connector (start in A-HOST mode) then disconnect A-Connector
to switch to B-DEVICE.
Acked-by: John Youn <johnyoun@synopsys.com>
Signed-off-by: Vardan Mikayelyan <mvardan@synopsys.com>
Signed-off-by: Grigor Tovmasyan <tovmasya@synopsys.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stefan Agner [Sun, 11 Feb 2018 23:14:42 +0000 (00:14 +0100)]
usb: gadget: fsl_udc_core: fix ep valid checks
[ Upstream commit
20c63f4089cceab803438c383631963e34c4d8e5 ]
Clang reports the following warning:
drivers/usb/gadget/udc/fsl_udc_core.c:1312:10: warning: address of array
'ep->name' will always evaluate to 'true' [-Wpointer-bool-conversion]
if (ep->name)
~~ ~~~~^~~~
It seems that the authors intention was to check if the ep has been
configured through struct_ep_setup. Check whether struct usb_ep name
pointer has been set instead.
Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
John Keeping [Fri, 12 Jan 2018 18:43:32 +0000 (18:43 +0000)]
usb: gadget: f_uac2: fix bFirstInterface in composite gadget
[ Upstream commit
8813a59ed892305b5ac1b5b901740b1ad4b5fefa ]
If there are multiple functions associated with a configuration, then
the UAC2 interfaces may not start at zero. Set the correct first
interface number in the association descriptor so that the audio
interfaces are enumerated correctly in this case.
Reviewed-by: Krzysztof Opasiak <k.opasiak@samsung.com>
Signed-off-by: John Keeping <john@metanate.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ulf Magnusson [Mon, 5 Feb 2018 01:21:31 +0000 (02:21 +0100)]
ARC: Fix malformed ARC_EMUL_UNALIGNED default
[ Upstream commit
827cc2fa024dd6517d62de7a44c7b42f32af371b ]
'default N' should be 'default n', though they happen to have the same
effect here, due to undefined symbols (N in this case) evaluating to n
in a tristate sense.
Remove the default from ARC_EMUL_UNALIGNED instead of changing it. bool
and tristate symbols implicitly default to n.
Discovered with the
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ulfalizer_Kconfiglib_blob_master_examples_list-5Fundefined.py&d=DwIBAg&c=DPL6_X_6JkXFx7AXWqB0tg&r=c14YS-cH-kdhTOW89KozFhBtBJgs1zXscZojEZQ0THs&m=WxxD8ozR7QQUVzNCBksiznaisBGO_crN7PBOvAoju8s&s=1LmxsNqxwT-7wcInVpZ6Z1J27duZKSoyKxHIJclXU_M&e=
script.
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bart Van Assche [Thu, 25 Jan 2018 16:24:29 +0000 (08:24 -0800)]
scsi: qla2xxx: Avoid triggering undefined behavior in qla2x00_mbx_completion()
[ Upstream commit
c02189e12ce3bf3808cb880569d3b10249f50bd9 ]
A left shift must shift less than the bit width of the left argument.
Avoid triggering undefined behavior if ha->mbx_count == 32.
This patch avoids that UBSAN reports the following complaint:
UBSAN: Undefined behaviour in drivers/scsi/qla2xxx/qla_isr.c:275:14
shift exponent 32 is too large for 32-bit type 'int'
Call Trace:
dump_stack+0x4e/0x6c
ubsan_epilogue+0xd/0x3b
__ubsan_handle_shift_out_of_bounds+0x112/0x14c
qla2x00_mbx_completion+0x1c5/0x25d [qla2xxx]
qla2300_intr_handler+0x1ea/0x3bb [qla2xxx]
qla2x00_mailbox_command+0x77b/0x139a [qla2xxx]
qla2x00_mbx_reg_test+0x83/0x114 [qla2xxx]
qla2x00_chip_diag+0x354/0x45f [qla2xxx]
qla2x00_initialize_adapter+0x2c2/0xa4e [qla2xxx]
qla2x00_probe_one+0x1681/0x392e [qla2xxx]
pci_device_probe+0x10b/0x1f1
driver_probe_device+0x21f/0x3a4
__driver_attach+0xa9/0xe1
bus_for_each_dev+0x6e/0xb5
driver_attach+0x22/0x3c
bus_add_driver+0x1d1/0x2ae
driver_register+0x78/0x130
__pci_register_driver+0x75/0xa8
qla2x00_module_init+0x21b/0x267 [qla2xxx]
do_one_initcall+0x5a/0x1e2
do_init_module+0x9d/0x285
load_module+0x20db/0x38e3
SYSC_finit_module+0xa8/0xbc
SyS_finit_module+0x9/0xb
do_syscall_64+0x77/0x271
entry_SYSCALL64_slow_path+0x25/0x25
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Thu, 25 Jan 2018 14:27:27 +0000 (17:27 +0300)]
scsi: mptfusion: Add bounds check in mptctl_hp_targetinfo()
[ Upstream commit
a7043e9529f3c367cc4d82997e00be034cbe57ca ]
My static checker complains about an out of bounds read:
drivers/message/fusion/mptctl.c:2786 mptctl_hp_targetinfo()
error: buffer overflow 'hd->sel_timeout' 255 <= u32max.
It's true that we probably should have a bounds check here.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Thu, 25 Jan 2018 14:13:40 +0000 (17:13 +0300)]
scsi: sym53c8xx_2: iterator underflow in sym_getsync()
[ Upstream commit
e6f791d95313c85f3dd4a26141e28e50ae9aa0ae ]
We wanted to exit the loop with "div" set to zero, but instead, if we
don't hit the break then "div" is -1 when we finish the loop. It leads
to an array underflow a few lines later.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chad Dupuis [Wed, 24 Jan 2018 16:07:06 +0000 (08:07 -0800)]
scsi: bnx2fc: Fix check in SCSI completion handler for timed out request
[ Upstream commit
ecf7ff49945f5741fa1da112f994939f942031d3 ]
When a request times out we set the io_req flag BNX2FC_FLAG_IO_COMPL so
that if a subsequent completion comes in on that task ID we will ignore
it. The issue is that in the check for this flag there is a missing
return so we will continue to process a request which may have already
been returned to the ownership of the SCSI layer. This can cause
unpredictable results.
Solution is to add in the missing return.
[mkp: typo plus title shortening]
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sujit Reddy Thumma [Wed, 24 Jan 2018 04:22:35 +0000 (09:52 +0530)]
scsi: ufs: Enable quirk to ignore sending WRITE_SAME command
[ Upstream commit
84af7e8b895088d89f246d6b0f82717fafdebf61 ]
WRITE_SAME command is not supported by UFS. Enable a quirk for the upper
level drivers to not send WRITE SAME command.
[mkp: botched patch, applied by hand]
Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark Salter [Fri, 2 Feb 2018 14:20:29 +0000 (09:20 -0500)]
irqchip/gic-v3: Change pr_debug message to pr_devel
[ Upstream commit
b6dd4d83dc2f78cebc9a7e6e7e4bc2be4d29b94d ]
The pr_debug() in gic-v3 gic_send_sgi() can trigger a circular locking
warning:
GICv3: CPU10: ICC_SGI1R_EL1
5000400
======================================================
WARNING: possible circular locking dependency detected
4.15.0+ #1 Tainted: G W
------------------------------------------------------
dynamic_debug01/1873 is trying to acquire lock:
((console_sem).lock){-...}, at: [<
0000000099c891ec>] down_trylock+0x20/0x4c
but task is already holding lock:
(&rq->lock){-.-.}, at: [<
00000000842e1587>] __task_rq_lock+0x54/0xdc
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&rq->lock){-.-.}:
__lock_acquire+0x3b4/0x6e0
lock_acquire+0xf4/0x2a8
_raw_spin_lock+0x4c/0x60
task_fork_fair+0x3c/0x148
sched_fork+0x10c/0x214
copy_process.isra.32.part.33+0x4e8/0x14f0
_do_fork+0xe8/0x78c
kernel_thread+0x48/0x54
rest_init+0x34/0x2a4
start_kernel+0x45c/0x488
-> #1 (&p->pi_lock){-.-.}:
__lock_acquire+0x3b4/0x6e0
lock_acquire+0xf4/0x2a8
_raw_spin_lock_irqsave+0x58/0x70
try_to_wake_up+0x48/0x600
wake_up_process+0x28/0x34
__up.isra.0+0x60/0x6c
up+0x60/0x68
__up_console_sem+0x4c/0x7c
console_unlock+0x328/0x634
vprintk_emit+0x25c/0x390
dev_vprintk_emit+0xc4/0x1fc
dev_printk_emit+0x88/0xa8
__dev_printk+0x58/0x9c
_dev_info+0x84/0xa8
usb_new_device+0x100/0x474
hub_port_connect+0x280/0x92c
hub_event+0x740/0xa84
process_one_work+0x240/0x70c
worker_thread+0x60/0x400
kthread+0x110/0x13c
ret_from_fork+0x10/0x18
-> #0 ((console_sem).lock){-...}:
validate_chain.isra.34+0x6e4/0xa20
__lock_acquire+0x3b4/0x6e0
lock_acquire+0xf4/0x2a8
_raw_spin_lock_irqsave+0x58/0x70
down_trylock+0x20/0x4c
__down_trylock_console_sem+0x3c/0x9c
console_trylock+0x20/0xb0
vprintk_emit+0x254/0x390
vprintk_default+0x58/0x90
vprintk_func+0xbc/0x164
printk+0x80/0xa0
__dynamic_pr_debug+0x84/0xac
gic_raise_softirq+0x184/0x18c
smp_cross_call+0xac/0x218
smp_send_reschedule+0x3c/0x48
resched_curr+0x60/0x9c
check_preempt_curr+0x70/0xdc
wake_up_new_task+0x310/0x470
_do_fork+0x188/0x78c
SyS_clone+0x44/0x50
__sys_trace_return+0x0/0x4
other info that might help us debug this:
Chain exists of:
(console_sem).lock --> &p->pi_lock --> &rq->lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&rq->lock);
lock(&p->pi_lock);
lock(&rq->lock);
lock((console_sem).lock);
*** DEADLOCK ***
2 locks held by dynamic_debug01/1873:
#0: (&p->pi_lock){-.-.}, at: [<
000000001366df53>] wake_up_new_task+0x40/0x470
#1: (&rq->lock){-.-.}, at: [<
00000000842e1587>] __task_rq_lock+0x54/0xdc
stack backtrace:
CPU: 10 PID: 1873 Comm: dynamic_debug01 Tainted: G W 4.15.0+ #1
Hardware name: GIGABYTE R120-T34-00/MT30-GS2-00, BIOS T48 10/02/2017
Call trace:
dump_backtrace+0x0/0x188
show_stack+0x24/0x2c
dump_stack+0xa4/0xe0
print_circular_bug.isra.31+0x29c/0x2b8
check_prev_add.constprop.39+0x6c8/0x6dc
validate_chain.isra.34+0x6e4/0xa20
__lock_acquire+0x3b4/0x6e0
lock_acquire+0xf4/0x2a8
_raw_spin_lock_irqsave+0x58/0x70
down_trylock+0x20/0x4c
__down_trylock_console_sem+0x3c/0x9c
console_trylock+0x20/0xb0
vprintk_emit+0x254/0x390
vprintk_default+0x58/0x90
vprintk_func+0xbc/0x164
printk+0x80/0xa0
__dynamic_pr_debug+0x84/0xac
gic_raise_softirq+0x184/0x18c
smp_cross_call+0xac/0x218
smp_send_reschedule+0x3c/0x48
resched_curr+0x60/0x9c
check_preempt_curr+0x70/0xdc
wake_up_new_task+0x310/0x470
_do_fork+0x188/0x78c
SyS_clone+0x44/0x50
__sys_trace_return+0x0/0x4
GICv3: CPU0: ICC_SGI1R_EL1 12000
This could be fixed with printk_deferred() but that might lessen its
usefulness for debugging. So change it to pr_devel to keep it out of
production kernels. Developers working on gic-v3 can enable it as
needed in their kernels.
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Will Deacon [Tue, 13 Feb 2018 13:22:57 +0000 (13:22 +0000)]
locking/qspinlock: Ensure node->count is updated before initialising node
[ Upstream commit
11dc13224c975efcec96647a4768a6f1bb7a19a8 ]
When queuing on the qspinlock, the count field for the current CPU's head
node is incremented. This needn't be atomic because locking in e.g. IRQ
context is balanced and so an IRQ will return with node->count as it
found it.
However, the compiler could in theory reorder the initialisation of
node[idx] before the increment of the head node->count, causing an
IRQ to overwrite the initialised node and potentially corrupt the lock
state.
Avoid the potential for this harmful compiler reordering by placing a
barrier() between the increment of the head node->count and the subsequent
node initialisation.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1518528177-19169-3-git-send-email-will.deacon@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jesper Dangaard Brouer [Thu, 8 Feb 2018 11:48:32 +0000 (12:48 +0100)]
tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
[ Upstream commit
e3d91b0ca523d53158f435a3e13df7f0cb360ea2 ]
V3: More generic skipping of relo-section (suggested by Daniel)
If clang >= 4.0.1 is missing the option '-target bpf', it will cause
llc/llvm to create two ELF sections for "Exception Frames", with
section names '.eh_frame' and '.rel.eh_frame'.
The BPF ELF loader library libbpf fails when loading files with these
sections. The other in-kernel BPF ELF loader in samples/bpf/bpf_load.c,
handle this gracefully. And iproute2 loader also seems to work with these
"eh" sections.
The issue in libbpf is caused by bpf_object__elf_collect() skipping
some sections, and later when performing relocation it will be
pointing to a skipped section, as these sections cannot be found by
bpf_object__find_prog_by_idx() in bpf_object__collect_reloc().
This is a general issue that also occurs for other sections, like
debug sections which are also skipped and can have relo section.
As suggested by Daniel. To avoid keeping state about all skipped
sections, instead perform a direct qlookup in the ELF object. Lookup
the section that the relo-section points to and check if it contains
executable machine instructions (denoted by the sh_flags
SHF_EXECINSTR). Use this check to also skip irrelevant relo-sections.
Note, for samples/bpf/ the '-target bpf' parameter to clang cannot be used
due to incompatibility with asm embedded headers, that some of the samples
include. This is explained in more details by Yonghong Song in bpf_devel_QA.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tang Junhui [Wed, 7 Feb 2018 19:41:45 +0000 (11:41 -0800)]
bcache: return attach error when no cache set exist
[ Upstream commit
7f4fc93d4713394ee8f1cd44c238e046e11b4f15 ]
I attach a back-end device to a cache set, and the cache set is not
registered yet, this back-end device did not attach successfully, and no
error returned:
[root]# echo
87859280-fec6-4bcc-
20df7ca8f86b > /sys/block/sde/bcache/attach
[root]#
In sysfs_attach(), the return value "v" is initialized to "size" in
the beginning, and if no cache set exist in bch_cache_sets, the "v" value
would not change any more, and return to sysfs, sysfs regard it as success
since the "size" is a positive number.
This patch fixes this issue by assigning "v" with "-ENOENT" in the
initialization.
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tang Junhui [Wed, 7 Feb 2018 19:41:46 +0000 (11:41 -0800)]
bcache: fix for data collapse after re-attaching an attached device
[ Upstream commit
73ac105be390c1de42a2f21643c9778a5e002930 ]
back-end device sdm has already attached a cache_set with ID
f67ebe1f-f8bc-4d73-bfe5-
9dc88607f119, then try to attach with
another cache set, and it returns with an error:
[root]# cd /sys/block/sdm/bcache
[root]# echo
5ccd0a63-148e-48b8-afa2-
aca9cbd6279f > attach
-bash: echo: write error: Invalid argument
After that, execute a command to modify the label of bcache
device:
[root]# echo data_disk1 > label
Then we reboot the system, when the system power on, the back-end
device can not attach to cache_set, a messages show in the log:
Feb 5 12:05:52 ceph152 kernel: [922385.508498] bcache:
bch_cached_dev_attach() couldn't find uuid for sdm in set
In sysfs_attach(), dc->sb.set_uuid was assigned to the value
which input through sysfs, no matter whether it is success
or not in bch_cached_dev_attach(). For example, If the back-end
device has already attached to an cache set, bch_cached_dev_attach()
would fail, but dc->sb.set_uuid was changed. Then modify the
label of bcache device, it will call bch_write_bdev_super(),
which would write the dc->sb.set_uuid to the super block, so we
record a wrong cache set ID in the super block, after the system
reboot, the cache set couldn't find the uuid of the back-end
device, so the bcache device couldn't exist and use any more.
In this patch, we don't assigned cache set ID to dc->sb.set_uuid
in sysfs_attach() directly, but input it into bch_cached_dev_attach(),
and assigned dc->sb.set_uuid to the cache set ID after the back-end
device attached to the cache set successful.
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tang Junhui [Wed, 7 Feb 2018 19:41:43 +0000 (11:41 -0800)]
bcache: fix for allocator and register thread race
[ Upstream commit
682811b3ce1a5a4e20d700939a9042f01dbc66c4 ]
After long time running of random small IO writing,
I reboot the machine, and after the machine power on,
I found bcache got stuck, the stack is:
[root@ceph153 ~]# cat /proc/2510/task/*/stack
[<
ffffffffa06b2455>] closure_sync+0x25/0x90 [bcache]
[<
ffffffffa06b6be8>] bch_journal+0x118/0x2b0 [bcache]
[<
ffffffffa06b6dc7>] bch_journal_meta+0x47/0x70 [bcache]
[<
ffffffffa06be8f7>] bch_prio_write+0x237/0x340 [bcache]
[<
ffffffffa06a8018>] bch_allocator_thread+0x3c8/0x3d0 [bcache]
[<
ffffffff810a631f>] kthread+0xcf/0xe0
[<
ffffffff8164c318>] ret_from_fork+0x58/0x90
[<
ffffffffffffffff>] 0xffffffffffffffff
[root@ceph153 ~]# cat /proc/2038/task/*/stack
[<
ffffffffa06b1abd>] __bch_btree_map_nodes+0x12d/0x150 [bcache]
[<
ffffffffa06b1bd1>] bch_btree_insert+0xf1/0x170 [bcache]
[<
ffffffffa06b637f>] bch_journal_replay+0x13f/0x230 [bcache]
[<
ffffffffa06c75fe>] run_cache_set+0x79a/0x7c2 [bcache]
[<
ffffffffa06c0cf8>] register_bcache+0xd48/0x1310 [bcache]
[<
ffffffff812f702f>] kobj_attr_store+0xf/0x20
[<
ffffffff8125b216>] sysfs_write_file+0xc6/0x140
[<
ffffffff811dfbfd>] vfs_write+0xbd/0x1e0
[<
ffffffff811e069f>] SyS_write+0x7f/0xe0
[<
ffffffff8164c3c9>] system_call_fastpath+0x16/0x1
The stack shows the register thread and allocator thread
were getting stuck when registering cache device.
I reboot the machine several times, the issue always
exsit in this machine.
I debug the code, and found the call trace as bellow:
register_bcache()
==>run_cache_set()
==>bch_journal_replay()
==>bch_btree_insert()
==>__bch_btree_map_nodes()
==>btree_insert_fn()
==>btree_split() //node need split
==>btree_check_reserve()
In btree_check_reserve(), It will check if there is enough buckets
of RESERVE_BTREE type, since allocator thread did not work yet, so
no buckets of RESERVE_BTREE type allocated, so the register thread
waits on c->btree_cache_wait, and goes to sleep.
Then the allocator thread initialized, the call trace is bellow:
bch_allocator_thread()
==>bch_prio_write()
==>bch_journal_meta()
==>bch_journal()
==>journal_wait_for_write()
In journal_wait_for_write(), It will check if journal is full by
journal_full(), but the long time random small IO writing
causes the exhaustion of journal buckets(journal.blocks_free=0),
In order to release the journal buckets,
the allocator calls btree_flush_write() to flush keys to
btree nodes, and waits on c->journal.wait until btree nodes writing
over or there has already some journal buckets space, then the
allocator thread goes to sleep. but in btree_flush_write(), since
bch_journal_replay() is not finished, so no btree nodes have journal
(condition "if (btree_current_write(b)->journal)" never satisfied),
so we got no btree node to flush, no journal bucket released,
and allocator sleep all the times.
Through the above analysis, we can see that:
1) Register thread wait for allocator thread to allocate buckets of
RESERVE_BTREE type;
2) Alloctor thread wait for register thread to replay journal, so it
can flush btree nodes and get journal bucket.
then they are all got stuck by waiting for each other.
Hua Rui provided a patch for me, by allocating some buckets of
RESERVE_BTREE type in advance, so the register thread can get bucket
when btree node splitting and no need to waiting for the allocator
thread. I tested it, it has effect, and register thread run a step
forward, but finally are still got stuck, the reason is only 8 bucket
of RESERVE_BTREE type were allocated, and in bch_journal_replay(),
after 2 btree nodes splitting, only 4 bucket of RESERVE_BTREE type left,
then btree_check_reserve() is not satisfied anymore, so it goes to sleep
again, and in the same time, alloctor thread did not flush enough btree
nodes to release a journal bucket, so they all got stuck again.
So we need to allocate more buckets of RESERVE_BTREE type in advance,
but how much is enough? By experience and test, I think it should be
as much as journal buckets. Then I modify the code as this patch,
and test in the machine, and it works.
This patch modified base on Hua Rui’s patch, and allocate more buckets
of RESERVE_BTREE type in advance to avoid register thread and allocate
thread going to wait for each other.
[patch v2] ca->sb.njournal_buckets would be 0 in the first time after
cache creation, and no journal exists, so just 8 btree buckets is OK.
Signed-off-by: Hua Rui <huarui.dev@gmail.com>
Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Coly Li [Wed, 7 Feb 2018 19:41:41 +0000 (11:41 -0800)]
bcache: properly set task state in bch_writeback_thread()
[ Upstream commit
99361bbf26337186f02561109c17a4c4b1a7536a ]
Kernel thread routine bch_writeback_thread() has the following code block,
447 down_write(&dc->writeback_lock);
448~450 if (check conditions) {
451 up_write(&dc->writeback_lock);
452 set_current_state(TASK_INTERRUPTIBLE);
453
454 if (kthread_should_stop())
455 return 0;
456
457 schedule();
458 continue;
459 }
If condition check is true, its task state is set to TASK_INTERRUPTIBLE
and call schedule() to wait for others to wake up it.
There are 2 issues in current code,
1, Task state is set to TASK_INTERRUPTIBLE after the condition checks, if
another process changes the condition and call wake_up_process(dc->
writeback_thread), then at line 452 task state is set back to
TASK_INTERRUPTIBLE, the writeback kernel thread will lose a chance to be
waken up.
2, At line 454 if kthread_should_stop() is true, writeback kernel thread
will return to kernel/kthread.c:kthread() with TASK_INTERRUPTIBLE and
call do_exit(). It is not good to enter do_exit() with task state
TASK_INTERRUPTIBLE, in following code path might_sleep() is called and a
warning message is reported by __might_sleep(): "WARNING: do not call
blocking ops when !TASK_RUNNING; state=1 set at [xxxx]".
For the first issue, task state should be set before condition checks.
Ineed because dc->writeback_lock is required when modifying all the
conditions, calling set_current_state() inside code block where dc->
writeback_lock is hold is safe. But this is quite implicit, so I still move
set_current_state() before all the condition checks.
For the second issue, frankley speaking it does not hurt when kernel thread
exits with TASK_INTERRUPTIBLE state, but this warning message scares users,
makes them feel there might be something risky with bcache and hurt their
data. Setting task state to TASK_RUNNING before returning fixes this
problem.
In alloc.c:allocator_wait(), there is also a similar issue, and is also
fixed in this patch.
Changelog:
v3: merge two similar fixes into one patch
v2: fix the race issue in v1 patch.
v1: initial buggy fix.
Signed-off-by: Coly Li <colyli@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Michael Lyle <mlyle@lyle.org>
Cc: Michael Lyle <mlyle@lyle.org>
Cc: Junhui Tang <tang.junhui@zte.com.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arnd Bergmann [Fri, 2 Feb 2018 15:48:47 +0000 (16:48 +0100)]
cifs: silence compiler warnings showing up with gcc-8.0.0
[ Upstream commit
ade7db991b47ab3016a414468164f4966bd08202 ]
This bug was fixed before, but came up again with the latest
compiler in another function:
fs/cifs/cifssmb.c: In function 'CIFSSMBSetEA':
fs/cifs/cifssmb.c:6362:3: error: 'strncpy' offset 8 is out of the bounds [0, 4] [-Werror=array-bounds]
strncpy(parm_data->list[0].name, ea_name, name_len);
Let's apply the same fix that was used for the other instances.
Fixes:
b2a3ad9ca502 ("cifs: silence compiler warnings showing up with gcc-4.7.0")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Dobriyan [Tue, 6 Feb 2018 23:36:59 +0000 (15:36 -0800)]
proc: fix /proc/*/map_files lookup
[ Upstream commit
ac7f1061c2c11bb8936b1b6a94cdb48de732f7a4 ]
Current code does:
if (sscanf(dentry->d_name.name, "%lx-%lx", start, end) != 2)
However sscanf() is broken garbage.
It silently accepts whitespace between format specifiers
(did you know that?).
It silently accepts valid strings which result in integer overflow.
Do not use sscanf() for any even remotely reliable parsing code.
OK
# readlink '/proc/1/map_files/
55a23af39000-
55a23b05b000'
/lib/systemd/systemd
broken
# readlink '/proc/1/map_files/
55a23af39000-
55a23b05b000'
/lib/systemd/systemd
broken
# readlink '/proc/1/map_files/
55a23af39000-
55a23b05b000 '
/lib/systemd/systemd
very broken
# readlink '/proc/1/map_files/
1000000000000000055a23af39000-
55a23b05b000'
/lib/systemd/systemd
Andrei said:
: This patch breaks criu. It was a bug in criu. And this bug is on a minor
: path, which works when memfd_create() isn't available. It is a reason why
: I ask to not backport this patch to stable kernels.
:
: In CRIU this bug can be triggered, only if this patch will be backported
: to a kernel which version is lower than v3.16.
Link: http://lkml.kernel.org/r/20171120212706.GA14325@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Will Deacon [Wed, 31 Jan 2018 12:12:20 +0000 (12:12 +0000)]
arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics
[ Upstream commit
202fb4ef81e3ec765c23bd1e6746a5c25b797d0e ]
If the spinlock "next" ticket wraps around between the initial LDR
and the cmpxchg in the LSE version of spin_trylock, then we can erroneously
think that we have successfuly acquired the lock because we only check
whether the next ticket return by the cmpxchg is equal to the owner ticket
in our updated lock word.
This patch fixes the issue by performing a full 32-bit check of the lock
word when trying to determine whether or not the CASA instruction updated
memory.
Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Guanglei Li [Tue, 6 Feb 2018 02:43:21 +0000 (10:43 +0800)]
RDS: IB: Fix null pointer issue
[ Upstream commit
2c0aa08631b86a4678dbc93b9caa5248014b4458 ]
Scenario:
1. Port down and do fail over
2. Ap do rds_bind syscall
PID: 47039 TASK:
ffff89887e2fe640 CPU: 47 COMMAND: "kworker/u:6"
#0 [
ffff898e35f159f0] machine_kexec at
ffffffff8103abf9
#1 [
ffff898e35f15a60] crash_kexec at
ffffffff810b96e3
#2 [
ffff898e35f15b30] oops_end at
ffffffff8150f518
#3 [
ffff898e35f15b60] no_context at
ffffffff8104854c
#4 [
ffff898e35f15ba0] __bad_area_nosemaphore at
ffffffff81048675
#5 [
ffff898e35f15bf0] bad_area_nosemaphore at
ffffffff810487d3
#6 [
ffff898e35f15c00] do_page_fault at
ffffffff815120b8
#7 [
ffff898e35f15d10] page_fault at
ffffffff8150ea95
[exception RIP: unknown or invalid address]
RIP:
0000000000000000 RSP:
ffff898e35f15dc8 RFLAGS:
00010282
RAX:
00000000fffffffe RBX:
ffff889b77f6fc00 RCX:
ffffffff81c99d88
RDX:
0000000000000000 RSI:
ffff896019ee08e8 RDI:
ffff889b77f6fc00
RBP:
ffff898e35f15df0 R8:
ffff896019ee08c8 R9:
0000000000000000
R10:
0000000000000400 R11:
0000000000000000 R12:
ffff896019ee08c0
R13:
ffff889b77f6fe68 R14:
ffffffff81c99d80 R15:
ffffffffa022a1e0
ORIG_RAX:
ffffffffffffffff CS: 0010 SS: 0018
#8 [
ffff898e35f15dc8] cma_ndev_work_handler at
ffffffffa022a228 [rdma_cm]
#9 [
ffff898e35f15df8] process_one_work at
ffffffff8108a7c6
#10 [
ffff898e35f15e58] worker_thread at
ffffffff8108bda0
#11 [
ffff898e35f15ee8] kthread at
ffffffff81090fe6
PID: 45659 TASK:
ffff880d313d2500 CPU: 31 COMMAND: "oracle_45659_ap"
#0 [
ffff881024ccfc98] __schedule at
ffffffff8150bac4
#1 [
ffff881024ccfd40] schedule at
ffffffff8150c2cf
#2 [
ffff881024ccfd50] __mutex_lock_slowpath at
ffffffff8150cee7
#3 [
ffff881024ccfdc0] mutex_lock at
ffffffff8150cdeb
#4 [
ffff881024ccfde0] rdma_destroy_id at
ffffffffa022a027 [rdma_cm]
#5 [
ffff881024ccfe10] rds_ib_laddr_check at
ffffffffa0357857 [rds_rdma]
#6 [
ffff881024ccfe50] rds_trans_get_preferred at
ffffffffa0324c2a [rds]
#7 [
ffff881024ccfe80] rds_bind at
ffffffffa031d690 [rds]
#8 [
ffff881024ccfeb0] sys_bind at
ffffffff8142a670
PID: 45659 PID: 47039
rds_ib_laddr_check
/* create id_priv with a null event_handler */
rdma_create_id
rdma_bind_addr
cma_acquire_dev
/* add id_priv to cma_dev->id_list */
cma_attach_to_dev
cma_ndev_work_handler
/* event_hanlder is null */
id_priv->id.event_handler
Signed-off-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Honglei Wang <honglei.wang@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yanjun Zhu <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ross Lagerwall [Thu, 11 Jan 2018 09:36:37 +0000 (09:36 +0000)]
xen/grant-table: Use put_page instead of free_page
[ Upstream commit
3ac7292a25db1c607a50752055a18aba32ac2176 ]
The page given to gnttab_end_foreign_access() to free could be a
compound page so use put_page() instead of free_page() since it can
handle both compound and single pages correctly.
This bug was discovered when migrating a Xen VM with several VIFs and
CONFIG_DEBUG_VM enabled. It hits a BUG usually after fewer than 10
iterations. All netfront devices disconnect from the backend during a
suspend/resume and this will call gnttab_end_foreign_access() if a
netfront queue has an outstanding skb. The mismatch between calling
get_page() and free_page() on a compound page causes a reference
counting error which is detected when DEBUG_VM is enabled.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ross Lagerwall [Thu, 11 Jan 2018 09:36:38 +0000 (09:36 +0000)]
xen-netfront: Fix race between device setup and open
[ Upstream commit
f599c64fdf7d9c108e8717fb04bc41c680120da4 ]
When a netfront device is set up it registers a netdev fairly early on,
before it has set up the queues and is actually usable. A userspace tool
like NetworkManager will immediately try to open it and access its state
as soon as it appears. The bug can be reproduced by hotplugging VIFs
until the VM runs out of grant refs. It registers the netdev but fails
to set up any queues (since there are no more grant refs). In the
meantime, NetworkManager opens the device and the kernel crashes trying
to access the queues (of which there are none).
Fix this in two ways:
* For initial setup, register the netdev much later, after the queues
are setup. This avoids the race entirely.
* During a suspend/resume cycle, the frontend reconnects to the backend
and the queues are recreated. It is possible (though highly unlikely) to
race with something opening the device and accessing the queues after
they have been destroyed but before they have been recreated. Extend the
region covered by the rtnl semaphore to protect against this race. There
is a possibility that we fail to recreate the queues so check for this
in the open function.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Matt Redfearn [Mon, 29 Jan 2018 11:26:45 +0000 (11:26 +0000)]
MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS
[ Upstream commit
0cde5b44a30f1daaef1c34e08191239dc63271c4 ]
When commit
b27311e1cace ("MIPS: TXx9: Add RBTX4939 board support")
added board support for the RBTX4939, it added a call to
led_classdev_register even if the LED class is built as a module.
Built-in arch code cannot call module code directly like this. Commit
b33b44073734 ("MIPS: TXX9: use IS_ENABLED() macro") subsequently
changed the inclusion of this code to a single check that
CONFIG_LEDS_CLASS is either builtin or a module, but the same issue
remains.
This leads to MIPS allmodconfig builds failing when CONFIG_MACH_TX49XX=y
is set:
arch/mips/txx9/rbtx4939/setup.o: In function `rbtx4939_led_probe':
setup.c:(.init.text+0xc0): undefined reference to `of_led_classdev_register'
make: *** [Makefile:999: vmlinux] Error 1
Fix this by using the IS_BUILTIN() macro instead.
Fixes:
b27311e1cace ("MIPS: TXx9: Add RBTX4939 board support")
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Reviewed-by: James Hogan <jhogan@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/18544/
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yonghong Song [Sat, 3 Feb 2018 06:37:15 +0000 (22:37 -0800)]
bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
[ Upstream commit
09584b406742413ac4c8d7e030374d4daa045b69 ]
With CONFIG_BPF_JIT_ALWAYS_ON is defined in the config file,
tools/testing/selftests/bpf/test_kmod.sh failed like below:
[root@localhost bpf]# ./test_kmod.sh
sysctl: setting key "net.core.bpf_jit_enable": Invalid argument
[ JIT enabled:0 hardened:0 ]
[ 132.175681] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
[ 132.458834] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
[ JIT enabled:1 hardened:0 ]
[ 133.456025] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
[ 133.730935] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
[ JIT enabled:1 hardened:1 ]
[ 134.769730] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
[ 135.050864] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
[ JIT enabled:1 hardened:2 ]
[ 136.442882] test_bpf: #297 BPF_MAXINSNS: Jump, gap, jump, ... FAIL to prog_create err=-524 len=4096
[ 136.821810] test_bpf: Summary: 348 PASSED, 1 FAILED, [340/340 JIT'ed]
[root@localhost bpf]#
The test_kmod.sh load/remove test_bpf.ko multiple times with different
settings for sysctl net.core.bpf_jit_{enable,harden}. The failed test #297
of test_bpf.ko is designed such that JIT always fails.
Commit
290af86629b2 (bpf: introduce BPF_JIT_ALWAYS_ON config)
introduced the following tightening logic:
...
if (!bpf_prog_is_dev_bound(fp->aux)) {
fp = bpf_int_jit_compile(fp);
#ifdef CONFIG_BPF_JIT_ALWAYS_ON
if (!fp->jited) {
*err = -ENOTSUPP;
return fp;
}
#endif
...
With this logic, Test #297 always gets return value -ENOTSUPP
when CONFIG_BPF_JIT_ALWAYS_ON is defined, causing the test failure.
This patch fixed the failure by marking Test #297 as expected failure
when CONFIG_BPF_JIT_ALWAYS_ON is defined.
Fixes:
290af86629b2 (bpf: introduce BPF_JIT_ALWAYS_ON config)
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chen Yu [Mon, 29 Jan 2018 02:26:46 +0000 (10:26 +0800)]
ACPI: processor_perflib: Do not send _PPC change notification if not ready
[ Upstream commit
ba1edb9a5125a617d612f98eead14b9b84e75c3a ]
The following warning was triggered after resumed from S3 -
if all the nonboot CPUs were put offline before suspend:
[ 1840.329515] unchecked MSR access error: RDMSR from 0x771 at rIP: 0xffffffff86061e3a (native_read_msr+0xa/0x30)
[ 1840.329516] Call Trace:
[ 1840.329521] __rdmsr_on_cpu+0x33/0x50
[ 1840.329525] generic_exec_single+0x81/0xb0
[ 1840.329527] smp_call_function_single+0xd2/0x100
[ 1840.329530] ? acpi_ds_result_pop+0xdd/0xf2
[ 1840.329532] ? acpi_ds_create_operand+0x215/0x23c
[ 1840.329534] rdmsrl_on_cpu+0x57/0x80
[ 1840.329536] ? cpumask_next+0x1b/0x20
[ 1840.329538] ? rdmsrl_on_cpu+0x57/0x80
[ 1840.329541] intel_pstate_update_perf_limits+0xf3/0x220
[ 1840.329544] ? notifier_call_chain+0x4a/0x70
[ 1840.329546] intel_pstate_set_policy+0x4e/0x150
[ 1840.329548] cpufreq_set_policy+0xcd/0x2f0
[ 1840.329550] cpufreq_update_policy+0xb2/0x130
[ 1840.329552] ? cpufreq_update_policy+0x130/0x130
[ 1840.329556] acpi_processor_ppc_has_changed+0x65/0x80
[ 1840.329558] acpi_processor_notify+0x80/0x100
[ 1840.329561] acpi_ev_notify_dispatch+0x44/0x5c
[ 1840.329563] acpi_os_execute_deferred+0x14/0x20
[ 1840.329565] process_one_work+0x193/0x3c0
[ 1840.329567] worker_thread+0x35/0x3b0
[ 1840.329569] kthread+0x125/0x140
[ 1840.329571] ? process_one_work+0x3c0/0x3c0
[ 1840.329572] ? kthread_park+0x60/0x60
[ 1840.329575] ? do_syscall_64+0x67/0x180
[ 1840.329577] ret_from_fork+0x25/0x30
[ 1840.329585] unchecked MSR access error: WRMSR to 0x774 (tried to write 0x0000000000000000) at rIP: 0xffffffff86061f78 (native_write_msr+0x8/0x30)
[ 1840.329586] Call Trace:
[ 1840.329587] __wrmsr_on_cpu+0x37/0x40
[ 1840.329589] generic_exec_single+0x81/0xb0
[ 1840.329592] smp_call_function_single+0xd2/0x100
[ 1840.329594] ? acpi_ds_create_operand+0x215/0x23c
[ 1840.329595] ? cpumask_next+0x1b/0x20
[ 1840.329597] wrmsrl_on_cpu+0x57/0x70
[ 1840.329598] ? rdmsrl_on_cpu+0x57/0x80
[ 1840.329599] ? wrmsrl_on_cpu+0x57/0x70
[ 1840.329602] intel_pstate_hwp_set+0xd3/0x150
[ 1840.329604] intel_pstate_set_policy+0x119/0x150
[ 1840.329606] cpufreq_set_policy+0xcd/0x2f0
[ 1840.329607] cpufreq_update_policy+0xb2/0x130
[ 1840.329610] ? cpufreq_update_policy+0x130/0x130
[ 1840.329613] acpi_processor_ppc_has_changed+0x65/0x80
[ 1840.329615] acpi_processor_notify+0x80/0x100
[ 1840.329617] acpi_ev_notify_dispatch+0x44/0x5c
[ 1840.329619] acpi_os_execute_deferred+0x14/0x20
[ 1840.329620] process_one_work+0x193/0x3c0
[ 1840.329622] worker_thread+0x35/0x3b0
[ 1840.329624] kthread+0x125/0x140
[ 1840.329625] ? process_one_work+0x3c0/0x3c0
[ 1840.329626] ? kthread_park+0x60/0x60
[ 1840.329628] ? do_syscall_64+0x67/0x180
[ 1840.329631] ret_from_fork+0x25/0x30
This is because if there's only one online CPU, the MSR_PM_ENABLE
(package wide)can not be enabled after resumed, due to
intel_pstate_hwp_enable() will only be invoked on AP's online
process after resumed - if there's no AP online, the HWP remains
disabled after resumed (BIOS has disabled it in S3). Then if
there comes a _PPC change notification which touches HWP register
during this stage, the warning is triggered.
Since we don't call acpi_processor_register_performance() when
HWP is enabled, the pr->performance will be NULL. When this is
NULL we don't need to do _PPC change notification.
Reported-by: Doug Smythies <dsmythies@telus.net>
Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Yu Chen <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jean Delvare [Sat, 3 Feb 2018 10:25:20 +0000 (11:25 +0100)]
firmware: dmi_scan: Fix handling of empty DMI strings
[ Upstream commit
a7770ae194569e96a93c48aceb304edded9cc648 ]
The handling of empty DMI strings looks quite broken to me:
* Strings from 1 to 7 spaces are not considered empty.
* True empty DMI strings (string index set to 0) are not considered
empty, and result in allocating a 0-char string.
* Strings with invalid index also result in allocating a 0-char
string.
* Strings starting with 8 spaces are all considered empty, even if
non-space characters follow (sounds like a weird thing to do, but
I have actually seen occurrences of this in DMI tables before.)
* Strings which are considered empty are reported as 8 spaces,
instead of being actually empty.
Some of these issues are the result of an off-by-one error in memcmp,
the rest is incorrect by design.
So let's get it square: missing strings and strings made of only
spaces, regardless of their length, should be treated as empty and
no memory should be allocated for them. All other strings are
non-empty and should be allocated.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes:
79da4721117f ("x86: fix DMI out of memory problems")
Cc: Parag Warudkar <parag.warudkar@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arnd Bergmann [Fri, 2 Feb 2018 14:56:18 +0000 (15:56 +0100)]
x86/power: Fix swsusp_arch_resume prototype
[ Upstream commit
328008a72d38b5bde6491e463405c34a81a65d3e ]
The declaration for swsusp_arch_resume marks it as 'asmlinkage', but the
definition in x86-32 does not, and it fails to include the header with the
declaration. This leads to a warning when building with
link-time-optimizations:
kernel/power/power.h:108:23: error: type of 'swsusp_arch_resume' does not match original declaration [-Werror=lto-type-mismatch]
extern asmlinkage int swsusp_arch_resume(void);
^
arch/x86/power/hibernate_32.c:148:0: note: 'swsusp_arch_resume' was previously declared here
int swsusp_arch_resume(void)
This moves the declaration into a globally visible header file and fixes up
both x86 definitions to match it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Len Brown <len.brown@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: linux-pm@vger.kernel.org
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Link: https://lkml.kernel.org/r/20180202145634.200291-2-arnd@arndb.de
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alex Estrin [Thu, 1 Feb 2018 18:55:41 +0000 (10:55 -0800)]
IB/ipoib: Fix for potential no-carrier state
[ Upstream commit
1029361084d18cc270f64dfd39529fafa10cfe01 ]
On reboot SM can program port pkey table before ipoib registered its
event handler, which could result in missing pkey event and leave root
interface with initial pkey value from index 0.
Since OPA port starts with invalid pkey in index 0, root interface will
fail to initialize and stay down with no-carrier flag.
For IB ipoib interface may end up with pkey different from value
opensm put in pkey table idx 0, resulting in connectivity issues
(different mcast groups, for example).
Close the window by calling event handler after registration
to make sure ipoib pkey is in sync with port pkey table.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mel Gorman [Thu, 1 Feb 2018 00:19:52 +0000 (16:19 -0800)]
mm: pin address_space before dereferencing it while isolating an LRU page
[ Upstream commit
69d763fc6d3aee787a3e8c8c35092b4f4960fa5d ]
Minchan Kim asked the following question -- what locks protects
address_space destroying when race happens between inode trauncation and
__isolate_lru_page? Jan Kara clarified by describing the race as follows
CPU1 CPU2
truncate(inode) __isolate_lru_page()
...
truncate_inode_page(mapping, page);
delete_from_page_cache(page)
spin_lock_irqsave(&mapping->tree_lock, flags);
__delete_from_page_cache(page, NULL)
page_cache_tree_delete(..)
... mapping = page_mapping(page);
page->mapping = NULL;
...
spin_unlock_irqrestore(&mapping->tree_lock, flags);
page_cache_free_page(mapping, page)
put_page(page)
if (put_page_testzero(page)) -> false
- inode now has no pages and can be freed including embedded address_space
if (mapping && !mapping->a_ops->migratepage)
- we've dereferenced mapping which is potentially already free.
The race is theoretically possible but unlikely. Before the
delete_from_page_cache, truncate_cleanup_page is called so the page is
likely to be !PageDirty or PageWriteback which gets skipped by the only
caller that checks the mappping in __isolate_lru_page. Even if the race
occurs, a substantial amount of work has to happen during a tiny window
with no preemption but it could potentially be done using a virtual
machine to artifically slow one CPU or halt it during the critical
window.
This patch should eliminate the race with truncation by try-locking the
page before derefencing mapping and aborting if the lock was not
acquired. There was a suggestion from Huang Ying to use RCU as a
side-effect to prevent mapping being freed. However, I do not like the
solution as it's an unconventional means of preserving a mapping and
it's not a context where rcu_read_lock is obviously protecting rcu data.
Link: http://lkml.kernel.org/r/20180104102512.2qos3h5vqzeisrek@techsingularity.net
Fixes:
c82449352854 ("mm: compaction: make isolate_lru_page() filter-aware again")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kirill A. Shutemov [Thu, 1 Feb 2018 00:17:43 +0000 (16:17 -0800)]
asm-generic: provide generic_pmdp_establish()
[ Upstream commit
c58f0bb77ed8bf93dfdde762b01cb67eebbdfc29 ]
Patch series "Do not lose dirty bit on THP pages", v4.
Vlastimil noted that pmdp_invalidate() is not atomic and we can lose
dirty and access bits if CPU sets them after pmdp dereference, but
before set_pmd_at().
The bug can lead to data loss, but the race window is tiny and I haven't
seen any reports that suggested that it happens in reality. So I don't
think it worth sending it to stable.
Unfortunately, there's no way to address the issue in a generic way. We
need to fix all architectures that support THP one-by-one.
All architectures that have THP supported have to provide atomic
pmdp_invalidate() that returns previous value.
If generic implementation of pmdp_invalidate() is used, architecture
needs to provide atomic pmdp_estabish().
pmdp_estabish() is not used out-side generic implementation of
pmdp_invalidate() so far, but I think this can change in the future.
This patch (of 12):
This is an implementation of pmdp_establish() that is only suitable for
an architecture that doesn't have hardware dirty/accessed bits. In this
case we can't race with CPU which sets these bits and non-atomic
approach is fine.
Link: http://lkml.kernel.org/r/20171213105756.69879-2-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Daney <david.daney@cavium.com>
Cc: David Miller <davem@davemloft.net>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nitin Gupta <nitin.m.gupta@oracle.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yisheng Xie [Thu, 1 Feb 2018 00:16:15 +0000 (16:16 -0800)]
mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
[ Upstream commit
0486a38bcc4749808edbc848f1bcf232042770fc ]
As in manpage of migrate_pages, the errno should be set to EINVAL when
none of the node IDs specified by new_nodes are on-line and allowed by
the process's current cpuset context, or none of the specified nodes
contain memory. However, when test by following case:
new_nodes = 0;
old_nodes = 0xf;
ret = migrate_pages(pid, old_nodes, new_nodes, MAX);
The ret will be 0 and no errno is set. As the new_nodes is empty, we
should expect EINVAL as documented.
To fix the case like above, this patch check whether target nodes AND
current task_nodes is empty, and then check whether AND
node_states[N_MEMORY] is empty.
Link: http://lkml.kernel.org/r/1510882624-44342-4-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Chris Salls <salls@cs.ucsb.edu>
Cc: Christopher Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tan Xiaojun <tanxiaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yisheng Xie [Thu, 1 Feb 2018 00:16:11 +0000 (16:16 -0800)]
mm/mempolicy: fix the check of nodemask from user
[ Upstream commit
56521e7a02b7b84a5e72691a1fb15570e6055545 ]
As Xiaojun reported the ltp of migrate_pages01 will fail on arm64 system
which has 4 nodes[0...3], all have memory and CONFIG_NODES_SHIFT=2:
migrate_pages01 0 TINFO : test_invalid_nodes
migrate_pages01 14 TFAIL : migrate_pages_common.c:45: unexpected failure - returned value = 0, expected: -1
migrate_pages01 15 TFAIL : migrate_pages_common.c:55: call succeeded unexpectedly
In this case the test_invalid_nodes of migrate_pages01 will call:
SYSC_migrate_pages as:
migrate_pages(0, , {0x0000000000000001}, 64, , {0x0000000000000010}, 64) = 0
The new nodes specifies one or more node IDs that are greater than the
maximum supported node ID, however, the errno is not set to EINVAL as
expected.
As man pages of set_mempolicy[1], mbind[2], and migrate_pages[3]
mentioned, when nodemask specifies one or more node IDs that are greater
than the maximum supported node ID, the errno should set to EINVAL.
However, get_nodes only check whether the part of bits
[BITS_PER_LONG*BITS_TO_LONGS(MAX_NUMNODES), maxnode) is zero or not, and
remain [MAX_NUMNODES, BITS_PER_LONG*BITS_TO_LONGS(MAX_NUMNODES)
unchecked.
This patch is to check the bits of [MAX_NUMNODES, maxnode) in get_nodes
to let migrate_pages set the errno to EINVAL when nodemask specifies one
or more node IDs that are greater than the maximum supported node ID,
which follows the manpage's guide.
[1] http://man7.org/linux/man-pages/man2/set_mempolicy.2.html
[2] http://man7.org/linux/man-pages/man2/mbind.2.html
[3] http://man7.org/linux/man-pages/man2/migrate_pages.2.html
Link: http://lkml.kernel.org/r/1510882624-44342-3-git-send-email-xieyisheng1@huawei.com
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Reported-by: Tan Xiaojun <tanxiaojun@huawei.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Chris Salls <salls@cs.ucsb.edu>
Cc: Christopher Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
piaojun [Thu, 1 Feb 2018 00:15:32 +0000 (16:15 -0800)]
ocfs2: return error when we attempt to access a dirty bh in jbd2
[ Upstream commit
d984187e3a1ad7d12447a7ab2c43ce3717a2b5b3 ]
We should not reuse the dirty bh in jbd2 directly due to the following
situation:
1. When removing extent rec, we will dirty the bhs of extent rec and
truncate log at the same time, and hand them over to jbd2.
2. The bhs are submitted to jbd2 area successfully.
3. The write-back thread of device help flush the bhs to disk but
encounter write error due to abnormal storage link.
4. After a while the storage link become normal. Truncate log flush
worker triggered by the next space reclaiming found the dirty bh of
truncate log and clear its 'BH_Write_EIO' and then set it uptodate in
__ocfs2_journal_access():
ocfs2_truncate_log_worker
ocfs2_flush_truncate_log
__ocfs2_flush_truncate_log
ocfs2_replay_truncate_records
ocfs2_journal_access_di
__ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata.
5. Then jbd2 will flush the bh of truncate log to disk, but the bh of
extent rec is still in error state, and unfortunately nobody will
take care of it.
6. At last the space of extent rec was not reduced, but truncate log
flush worker have given it back to globalalloc. That will cause
duplicate cluster problem which could be identified by fsck.ocfs2.
Sadly we can hardly revert this but set fs read-only in case of ruining
atomicity and consistency of space reclaim.
Link: http://lkml.kernel.org/r/5A6E8092.8090701@huawei.com
Fixes:
acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access")
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
piaojun [Thu, 1 Feb 2018 00:14:59 +0000 (16:14 -0800)]
ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
[ Upstream commit
16c8d569f5704a84164f30ff01b29879f3438065 ]
The race between *set_acl and *get_acl will cause getting incomplete
xattr data as below:
processA processB
ocfs2_set_acl
ocfs2_xattr_set
__ocfs2_xattr_set_handle
ocfs2_get_acl_nolock
ocfs2_xattr_get_nolock:
processB may get incomplete xattr data if processA hasn't set_acl done.
So we should use 'ip_xattr_sem' to protect getting extended attribute in
ocfs2_get_acl_nolock(), as other processes could be changing it
concurrently.
Link: http://lkml.kernel.org/r/5A5DDCFF.7030001@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
piaojun [Thu, 1 Feb 2018 00:14:44 +0000 (16:14 -0800)]
ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid
[ Upstream commit
025bcbde3634b2c9b316f227fed13ad6ad6817fb ]
If metadata is corrupted such as 'invalid inode block', we will get
failed by calling 'mount()' and then set filesystem readonly as below:
ocfs2_mount
ocfs2_initialize_super
ocfs2_init_global_system_inodes
ocfs2_iget
ocfs2_read_locked_inode
ocfs2_validate_inode_block
ocfs2_error
ocfs2_handle_error
ocfs2_set_ro_flag(osb, 0); // set readonly
In this situation we need return -EROFS to 'mount.ocfs2', so that user
can fix it by fsck. And then mount again. In addition, 'mount.ocfs2'
should be updated correspondingly as it only return 1 for all errno.
And I will post a patch for 'mount.ocfs2' too.
Link: http://lkml.kernel.org/r/5A4302FA.2010606@huawei.com
Signed-off-by: Jun Piao <piaojun@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Reviewed-by: Gang He <ghe@suse.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Logan Gunthorpe [Mon, 18 Dec 2017 18:25:05 +0000 (11:25 -0700)]
ntb_transport: Fix bug with max_mw_size parameter
[ Upstream commit
cbd27448faff4843ac4b66cc71445a10623ff48d ]
When using the max_mw_size parameter of ntb_transport to limit the size of
the Memory windows, communication cannot be established and the queues
freeze.
This is because the mw_size that's reported to the peer is correctly
limited but the size used locally is not. So the MW is initialized
with a buffer smaller than the window but the TX side is using the
full window. This means the TX side will be writing to a region of the
window that points nowhere.
This is easily fixed by applying the same limit to tx_size in
ntb_transport_init_queue().
Fixes:
e26a5843f7f5 ("NTB: Split ntb_hw_intel and ntb_transport drivers")
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Acked-by: Allen Hubbe <Allen.Hubbe@dell.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Leon Romanovsky [Sun, 28 Jan 2018 09:25:30 +0000 (11:25 +0200)]
RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
[ Upstream commit
b081808a66345ba725b77ecd8d759bee874cd937 ]
Failure in XRCD FW deallocation command leaves memory leaked and
returns error to the user which he can't do anything about it.
This patch changes behavior to always free memory and always return
success to the user.
Fixes:
e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Bringmann [Tue, 28 Nov 2017 22:58:40 +0000 (16:58 -0600)]
powerpc/numa: Ensure nodes initialized for hotplug
[ Upstream commit
ea05ba7c559c8e5a5946c3a94a2a266e9a6680a6 ]
This patch fixes some problems encountered at runtime with
configurations that support memory-less nodes, or that hot-add CPUs
into nodes that are memoryless during system execution after boot. The
problems of interest include:
* Nodes known to powerpc to be memoryless at boot, but to have CPUs in
them are allowed to be 'possible' and 'online'. Memory allocations
for those nodes are taken from another node that does have memory
until and if memory is hot-added to the node.
* Nodes which have no resources assigned at boot, but which may still
be referenced subsequently by affinity or associativity attributes,
are kept in the list of 'possible' nodes for powerpc. Hot-add of
memory or CPUs to the system can reference these nodes and bring
them online instead of redirecting the references to one of the set
of nodes known to have memory at boot.
Note that this software operates under the context of CPU hotplug. We
are not doing memory hotplug in this code, but rather updating the
kernel's CPU topology (i.e. arch_update_cpu_topology /
numa_update_cpu_topology). We are initializing a node that may be used
by CPUs or memory before it can be referenced as invalid by a CPU
hotplug operation. CPU hotplug operations are protected by a range of
APIs including cpu_maps_update_begin/cpu_maps_update_done,
cpus_read/write_lock / cpus_read/write_unlock, device locks, and more.
Memory hotplug operations, including try_online_node, are protected by
mem_hotplug_begin/mem_hotplug_done, device locks, and more. In the
case of CPUs being hot-added to a previously memoryless node, the
try_online_node operation occurs wholly within the CPU locks with no
overlap. Using HMC hot-add/hot-remove operations, we have been able to
add and remove CPUs to any possible node without failures. HMC
operations involve a degree self-serialization, though.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Bringmann [Tue, 28 Nov 2017 22:58:36 +0000 (16:58 -0600)]
powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
[ Upstream commit
a346137e9142b039fd13af2e59696e3d40c487ef ]
On powerpc systems which allow 'hot-add' of CPU or memory resources,
it may occur that the new resources are to be inserted into nodes that
were not used for these resources at bootup. In the kernel, any node
that is used must be defined and initialized. These empty nodes may
occur when,
* Dedicated vs. shared resources. Shared resources require information
such as the VPHN hcall for CPU assignment to nodes. Associativity
decisions made based on dedicated resource rules, such as
associativity properties in the device tree, may vary from decisions
made using the values returned by the VPHN hcall.
* memoryless nodes at boot. Nodes need to be defined as 'possible' at
boot for operation with other code modules. Previously, the powerpc
code would limit the set of possible nodes to those which have
memory assigned at boot, and were thus online. Subsequent add/remove
of CPUs or memory would only work with this subset of possible
nodes.
* memoryless nodes with CPUs at boot. Due to the previous restriction
on nodes, nodes that had CPUs but no memory were being collapsed
into other nodes that did have memory at boot. In practice this
meant that the node assignment presented by the runtime kernel
differed from the affinity and associativity attributes presented by
the device tree or VPHN hcalls. Nodes that might be known to the
pHyp were not 'possible' in the runtime kernel because they did not
have memory at boot.
This patch ensures that sufficient nodes are defined to support
configuration requirements after boot, as well as at boot. This patch
set fixes a couple of problems.
* Nodes known to powerpc to be memoryless at boot, but to have CPUs in
them are allowed to be 'possible' and 'online'. Memory allocations
for those nodes are taken from another node that does have memory
until and if memory is hot-added to the node. * Nodes which have no
resources assigned at boot, but which may still be referenced
subsequently by affinity or associativity attributes, are kept in
the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs
to the system can reference these nodes and bring them online
instead of redirecting to one of the set of nodes that were known to
have memory at boot.
This patch extracts the value of the lowest domain level (number of
allocable resources) from the device tree property
"ibm,max-associativity-domains" to use as the maximum number of nodes
to setup as possibly available in the system. This new setting will
override the instruction:
nodes_and(node_possible_map, node_possible_map, node_online_map);
presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
If the "ibm,max-associativity-domains" property is not present at
boot, no operation will be performed to define or enable additional
nodes, or enable the above 'nodes_and()'.
Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jake Daryll Obina [Thu, 21 Sep 2017 16:00:14 +0000 (00:00 +0800)]
jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
[ Upstream commit
5bdd0c6f89fba430e18d636493398389dadc3b17 ]
If jffs2_iget() fails for a newly-allocated inode, jffs2_do_clear_inode()
can get called twice in the error handling path, the first call in
jffs2_iget() itself and the second through iget_failed(). This can result
to a use-after-free error in the second jffs2_do_clear_inode() call, such
as shown by the oops below wherein the second jffs2_do_clear_inode() call
was trying to free node fragments that were already freed in the first
jffs2_do_clear_inode() call.
[ 78.178860] jffs2: error: (1904) jffs2_do_read_inode_internal: CRC failed for read_inode of inode 24 at physical location 0x1fc00c
[ 78.178914] Unable to handle kernel paging request at virtual address
6b6b6b6b6b6b6b7b
[ 78.185871] pgd =
ffffffc03a567000
[ 78.188794] [
6b6b6b6b6b6b6b7b] *pgd=
0000000000000000, *pud=
0000000000000000
[ 78.194968] Internal error: Oops:
96000004 [#1] PREEMPT SMP
...
[ 78.513147] PC is at rb_first_postorder+0xc/0x28
[ 78.516503] LR is at jffs2_kill_fragtree+0x28/0x90 [jffs2]
[ 78.520672] pc : [<
ffffff8008323d28>] lr : [<
ffffff8000eb1cc8>] pstate:
60000105
[ 78.526757] sp :
ffffff800cea38f0
[ 78.528753] x29:
ffffff800cea38f0 x28:
ffffffc01f3f8e80
[ 78.532754] x27:
0000000000000000 x26:
ffffff800cea3c70
[ 78.536756] x25:
00000000dc67c8ae x24:
ffffffc033d6945d
[ 78.540759] x23:
ffffffc036811740 x22:
ffffff800891a5b8
[ 78.544760] x21:
0000000000000000 x20:
0000000000000000
[ 78.548762] x19:
ffffffc037d48910 x18:
ffffff800891a588
[ 78.552764] x17:
0000000000000800 x16:
0000000000000c00
[ 78.556766] x15:
0000000000000010 x14:
6f2065646f6e695f
[ 78.560767] x13:
6461657220726f66 x12:
2064656c69616620
[ 78.564769] x11:
435243203a6c616e x10:
7265746e695f6564
[ 78.568771] x9 :
6f6e695f64616572 x8 :
ffffffc037974038
[ 78.572774] x7 :
bbbbbbbbbbbbbbbb x6 :
0000000000000008
[ 78.576775] x5 :
002f91d85bd44a2f x4 :
0000000000000000
[ 78.580777] x3 :
0000000000000000 x2 :
000000403755e000
[ 78.584779] x1 :
6b6b6b6b6b6b6b6b x0 :
6b6b6b6b6b6b6b6b
...
[ 79.038551] [<
ffffff8008323d28>] rb_first_postorder+0xc/0x28
[ 79.042962] [<
ffffff8000eb5578>] jffs2_do_clear_inode+0x88/0x100 [jffs2]
[ 79.048395] [<
ffffff8000eb9ddc>] jffs2_evict_inode+0x3c/0x48 [jffs2]
[ 79.053443] [<
ffffff8008201ca8>] evict+0xb0/0x168
[ 79.056835] [<
ffffff8008202650>] iput+0x1c0/0x200
[ 79.060228] [<
ffffff800820408c>] iget_failed+0x30/0x3c
[ 79.064097] [<
ffffff8000eba0c0>] jffs2_iget+0x2d8/0x360 [jffs2]
[ 79.068740] [<
ffffff8000eb0a60>] jffs2_lookup+0xe8/0x130 [jffs2]
[ 79.073434] [<
ffffff80081f1a28>] lookup_slow+0x118/0x190
[ 79.077435] [<
ffffff80081f4708>] walk_component+0xfc/0x28c
[ 79.081610] [<
ffffff80081f4dd0>] path_lookupat+0x84/0x108
[ 79.085699] [<
ffffff80081f5578>] filename_lookup+0x88/0x100
[ 79.089960] [<
ffffff80081f572c>] user_path_at_empty+0x58/0x6c
[ 79.094396] [<
ffffff80081ebe14>] vfs_statx+0xa4/0x114
[ 79.098138] [<
ffffff80081ec44c>] SyS_newfstatat+0x58/0x98
[ 79.102227] [<
ffffff800808354c>] __sys_trace_return+0x0/0x4
[ 79.106489] Code:
d65f03c0 f9400001 b40000e1 aa0103e0 (
f9400821)
The jffs2_do_clear_inode() call in jffs2_iget() is unnecessary since
iget_failed() will eventually call jffs2_do_clear_inode() if needed, so
just remove it.
Fixes:
5451f79f5f81 ("iget: stop JFFS2 from using iget() and read_inode()")
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Jake Daryll Obina <jake.obina@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Wed, 10 Jan 2018 09:39:03 +0000 (12:39 +0300)]
HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
[ Upstream commit
7ad81482cad67cbe1ec808490d1ddfc420c42008 ]
We get the "new_profile_index" value from the mouse device when we're
handling raw events. Smatch taints it as untrusted data and complains
that we need a bounds check. This seems like a reasonable warning
otherwise there is a small read beyond the end of the array.
Fixes:
0e70f97f257e ("HID: roccat: Add support for Kova[+] mouse")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Silvan Jegen <s.jegen@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arnd Bergmann [Thu, 18 Jan 2018 13:16:38 +0000 (14:16 +0100)]
scsi: fas216: fix sense buffer initialization
[ Upstream commit
96d5eaa9bb74d299508d811d865c2c41b38b0301 ]
While testing with the ARM specific memset() macro removed, I ran into a
compiler warning that shows an old bug:
drivers/scsi/arm/fas216.c: In function 'fas216_rq_sns_done':
drivers/scsi/arm/fas216.c:2014:40: error: argument to 'sizeof' in 'memset' call is the same expression as the destination; did you mean to provide an explicit length? [-Werror=sizeof-pointer-memaccess]
It turns out that the definition of the scsi_cmd structure changed back
in linux-2.6.25, so now we clear only four bytes (sizeof(pointer))
instead of 96 (SCSI_SENSE_BUFFERSIZE). I did not check whether we
actually need to initialize the buffer here, but it's clear that if we
do it, we should use the correct size.
Fixes:
de25deb18016 ("[SCSI] use dynamically allocated sense buffer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Liu Bo [Tue, 2 Jan 2018 20:36:41 +0000 (13:36 -0700)]
Btrfs: fix scrub to repair raid6 corruption
[ Upstream commit
762221f095e3932669093466aaf4b85ed9ad2ac1 ]
The raid6 corruption is that,
suppose that all disks can be read without problems and if the content
that was read out doesn't match its checksum, currently for raid6
btrfs at most retries twice,
- the 1st retry is to rebuild with all other stripes, it'll eventually
be a raid5 xor rebuild,
- if the 1st fails, the 2nd retry will deliberately fail parity p so
that it will do raid6 style rebuild,
however, the chances are that another non-parity stripe content also
has something corrupted, so that the above retries are not able to
return correct content.
We've fixed normal reads to rebuild raid6 correctly with more retries
in Patch "Btrfs: make raid6 rebuild retry more"[1], this is to fix
scrub to do the exactly same rebuild process.
[1]: https://patchwork.kernel.org/patch/
10091755/
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nikolay Borisov [Tue, 12 Dec 2017 09:14:49 +0000 (11:14 +0200)]
btrfs: Fix out of bounds access in btrfs_search_slot
[ Upstream commit
9ea2c7c9da13c9073e371c046cbbc45481ecb459 ]
When modifying a tree where the root is at BTRFS_MAX_LEVEL - 1 then
the level variable is going to be 7 (this is the max height of the
tree). On the other hand btrfs_cow_block is always called with
"level + 1" as an index into the nodes and slots arrays. This leads to
an out of bounds access. Admittdely this will be benign since an OOB
access of the nodes array will likely read the 0th element from the
slots array, which in this case is going to be 0 (since we start CoW at
the top of the tree). The OOB access into the slots array in turn will
read the 0th and 1st values of the locks array, which would both be 0
at the time. However, this benign behavior relies on the fact that the
path being passed hasn't been initialised, if it has already been used to
query a btree then it could potentially have populated the nodes/slots arrays.
Fix it by explicitly checking if we are at level 7 (the maximum allowed
index in nodes/slots arrays) and explicitly call the CoW routine with
NULL for parent's node/slot.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Fixes-coverity-id: 711515
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Liu Bo [Wed, 15 Nov 2017 23:10:28 +0000 (16:10 -0700)]
Btrfs: set plug for fsync
[ Upstream commit
343e4fc1c60971b0734de26dbbd475d433950982 ]
Setting plug can merge adjacent IOs before dispatching IOs to the disk
driver.
Without plug, it'd not be a problem for single disk usecases, but for
multiple disks using raid profile, a large IO can be split to several
IOs of stripe length, and plug can be helpful to bring them together
for each disk so that we can save several disk access.
Moreover, fsync issues synchronous writes, so plug can really take
effect.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Wei Yongjun [Thu, 18 Jan 2018 01:43:19 +0000 (01:43 +0000)]
ipmi/powernv: Fix error return code in ipmi_powernv_probe()
[ Upstream commit
e749d328b0b450aa78d562fa26a0cd8872325dd9 ]
Fix to return a negative error code from the request_irq() error
handling case instead of 0, as done elsewhere in this function.
Fixes:
dce143c3381c ("ipmi/powernv: Convert to irq event interface")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
weiyongjun (A) [Thu, 18 Jan 2018 02:23:34 +0000 (02:23 +0000)]
mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()
[ Upstream commit
0ddcff49b672239dda94d70d0fcf50317a9f4b51 ]
'hwname' is malloced in hwsim_new_radio_nl() and should be freed
before leaving from the error handling cases, otherwise it will cause
memory leak.
Fixes:
ff4dd73dd2b4 ("mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ulf Magnusson [Sun, 8 Oct 2017 17:35:45 +0000 (19:35 +0200)]
kconfig: Fix expr_free() E_NOT leak
[ Upstream commit
5b1374b3b3c2fc4f63a398adfa446fb8eff791a4 ]
Only the E_NOT operand and not the E_NOT node itself was freed, due to
accidentally returning too early in expr_free(). Outline of leak:
switch (e->type) {
...
case E_NOT:
expr_free(e->left.expr);
return;
...
}
*Never reached, 'e' leaked*
free(e);
Fix by changing the 'return' to a 'break'.
Summary from Valgrind on 'menuconfig' (ARCH=x86) before the fix:
LEAK SUMMARY:
definitely lost: 44,448 bytes in 1,852 blocks
...
Summary after the fix:
LEAK SUMMARY:
definitely lost: 1,608 bytes in 67 blocks
...
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ulf Magnusson [Sun, 8 Oct 2017 17:35:44 +0000 (19:35 +0200)]
kconfig: Fix automatic menu creation mem leak
[ Upstream commit
ae7440ef0c8013d68c00dad6900e7cce5311bb1c ]
expr_trans_compare() always allocates and returns a new expression,
giving the following leak outline:
...
*Allocate*
basedep = expr_trans_compare(basedep, E_UNEQUAL, &symbol_no);
...
for (menu = parent->next; menu; menu = menu->next) {
...
*Copy*
dep2 = expr_copy(basedep);
...
*Free copy*
expr_free(dep2);
}
*basedep lost!*
Fix by freeing 'basedep' after the loop.
Summary from Valgrind on 'menuconfig' (ARCH=x86) before the fix:
LEAK SUMMARY:
definitely lost: 344,376 bytes in 14,349 blocks
...
Summary after the fix:
LEAK SUMMARY:
definitely lost: 44,448 bytes in 1,852 blocks
...
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ulf Magnusson [Sun, 8 Oct 2017 17:11:21 +0000 (19:11 +0200)]
kconfig: Don't leak main menus during parsing
[ Upstream commit
0724a7c32a54e3e50d28e19e30c59014f61d4e2c ]
If a 'mainmenu' entry appeared in the Kconfig files, two things would
leak:
- The 'struct property' allocated for the default "Linux Kernel
Configuration" prompt.
- The string for the T_WORD/T_WORD_QUOTE prompt after the
T_MAINMENU token, allocated on the heap in zconf.l.
To fix it, introduce a new 'no_mainmenu_stmt' nonterminal that matches
if there's no 'mainmenu' and adds the default prompt. That means the
prompt only gets allocated once regardless of whether there's a
'mainmenu' statement or not, and managing it becomes simple.
Summary from Valgrind on 'menuconfig' (ARCH=x86) before the fix:
LEAK SUMMARY:
definitely lost: 344,568 bytes in 14,352 blocks
...
Summary after the fix:
LEAK SUMMARY:
definitely lost: 344,440 bytes in 14,350 blocks
...
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Guenter Roeck [Sun, 24 Dec 2017 21:04:07 +0000 (13:04 -0800)]
watchdog: sp5100_tco: Fix watchdog disable bit
[ Upstream commit
f541c09ebfc61697b586b38c9ebaf4b70defb278 ]
According to all published information, the watchdog disable bit for SB800
compatible controllers is bit 1 of PM register 0x48, not bit 2. For the
most part that doesn't matter in practice, since the bit has to be cleared
to enable watchdog address decoding, which is the default setting, but it
still needs to be fixed.
Cc: Zoltán Böszörményi <zboszor@pr.hu>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Chochol [Fri, 5 Jan 2018 07:39:12 +0000 (08:39 +0100)]
nfs: Do not convert nfs_idmap_cache_timeout to jiffies
[ Upstream commit
cbebc6ef4fc830f4040d4140bf53484812d5d5d9 ]
Since commit
57e62324e469 ("NFS: Store the legacy idmapper result in the
keyring") nfs_idmap_cache_timeout changed units from jiffies to seconds.
Unfortunately sysctl interface was not updated accordingly.
As a effect updating /proc/sys/fs/nfs/idmap_cache_timeout with some
value will incorrectly multiply this value by HZ.
Also reading /proc/sys/fs/nfs/idmap_cache_timeout will show real value
divided by HZ.
Fixes:
57e62324e469 ("NFS: Store the legacy idmapper result in the keyring")
Signed-off-by: Jan Chochol <jan@chochol.info>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mulhern [Mon, 27 Nov 2017 15:02:39 +0000 (10:02 -0500)]
dm thin: fix documentation relative to low water mark threshold
[ Upstream commit
9b28a1102efc75d81298198166ead87d643a29ce ]
Fixes:
1. The use of "exceeds" when the opposite of exceeds, falls below,
was meant.
2. Properly speaking, a table can not exceed a threshold.
It emphasizes the important point, which is that it is the userspace
daemon's responsibility to check for low free space when a device
is resumed, since it won't get a special event indicating low free
space in that situation.
Signed-off-by: mulhern <amulhern@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steven Rostedt (VMware) [Fri, 12 Jan 2018 00:47:51 +0000 (19:47 -0500)]
tools lib traceevent: Fix get_field_str() for dynamic strings
[ Upstream commit
d777f8de99b05d399c0e4e51cdce016f26bd971b ]
If a field is a dynamic string, get_field_str() returned just the
offset/size value and not the string. Have it parse the offset/size
correctly to return the actual string. Otherwise filtering fails when
trying to filter fields that are dynamic strings.
Reported-by: Gopanapalli Pradeep <prap_hai@yahoo.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20180112004823.146333275@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Arnaldo Carvalho de Melo [Mon, 15 Jan 2018 14:07:58 +0000 (11:07 -0300)]
perf callchain: Fix attr.sample_max_stack setting
[ Upstream commit
249d98e567e25dd03e015e2d31e1b7b9648f34df ]
When setting the "dwarf" unwinder for a specific event and not
specifying the max-stack, the attr.sample_max_stack ended up using an
uninitialized callchain_param.max_stack, fix it by using designated
initializers for that callchain_param variable, zeroing all non
explicitely initialized struct members.
Here is what happened:
# perf trace -vv --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192
perf_event_attr:
type 2
size 112
config 0x730
{ sample_period, sample_freq } 1
sample_type IP|TID|TIME|ADDR|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER|DATA_SRC
exclude_callchain_user 1
{ wakeup_events, wakeup_watermark } 1
sample_regs_user 0xff0fff
sample_stack_user 8192
sample_max_stack 50656
sys_perf_event_open failed, error -75
Value too large for defined data type
# perf trace -vv --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
callchain: type DWARF
callchain: stack dump size 8192
perf_event_attr:
type 2
size 112
config 0x730
sample_type IP|TID|TIME|ADDR|CALLCHAIN|CPU|PERIOD|RAW|REGS_USER|STACK_USER|DATA_SRC
exclude_callchain_user 1
sample_regs_user 0xff0fff
sample_stack_user 8192
sample_max_stack 30448
sys_perf_event_open failed, error -75
Value too large for defined data type
#
Now the attr.sample_max_stack is set to zero and the above works as
expected:
# perf trace --no-syscalls --max-stack 4 -e probe_libc:inet_pton/call-graph=dwarf/ ping -6 -c 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.072 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.072/0.072/0.072/0.000 ms
0.000 probe_libc:inet_pton:(
7feb7a998350))
__inet_pton (inlined)
gaih_inet.constprop.7 (/usr/lib64/libc-2.26.so)
__GI_getaddrinfo (inlined)
[0xffffaa39b6108f3f] (/usr/bin/ping)
#
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hendrick Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Richter <tmricht@linux.vnet.ibm.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-is9tramondqa9jlxxsgcm9iz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Steven Rostedt (VMware) [Fri, 12 Jan 2018 00:47:45 +0000 (19:47 -0500)]
tools lib traceevent: Simplify pointer print logic and fix %pF
[ Upstream commit
38d70b7ca1769f26c0b79f3c08ff2cc949712b59 ]
When processing %pX in pretty_print(), simplify the logic slightly by
incrementing the ptr to the format string if isalnum(ptr[1]) is true.
This follows the logic a bit more closely to what is in the kernel.
Also, this fixes a small bug where %pF was not giving the offset of the
function.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20180112004822.260262257@goodmis.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alex Williamson [Tue, 16 Jan 2018 17:05:26 +0000 (10:05 -0700)]
PCI: Add function 1 DMA alias quirk for Marvell 9128
[ Upstream commit
aa008206634363ef800fbd5f0262016c9ff81dea ]
The Marvell 9128 is the original device generating bug 42679, from which
many other Marvell DMA alias quirks have been sourced, but we didn't have
positive confirmation of the fix on 9128 until now.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=42679
Link: https://www.spinics.net/lists/kvm/msg161459.html
Reported-by: Binarus <lists@binarus.de>
Tested-by: Binarus <lists@binarus.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anna-Maria Gleixner [Thu, 21 Dec 2017 10:41:37 +0000 (11:41 +0100)]
tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
[ Upstream commit
91633eed73a3ac37aaece5c8c1f93a18bae616a9 ]
So far only CLOCK_MONOTONIC and CLOCK_REALTIME were taken into account as
well as HRTIMER_MODE_ABS/REL in the hrtimer_init tracepoint. The query for
detecting the ABS or REL timer modes is not valid anymore, it got broken
by the introduction of HRTIMER_MODE_PINNED.
HRTIMER_MODE_PINNED is not evaluated in the hrtimer_init() call, but for the
sake of completeness print all given modes.
Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Link: http://lkml.kernel.org/r/20171221104205.7269-9-anna-maria@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Paolo Bonzini [Thu, 26 Oct 2017 13:45:47 +0000 (15:45 +0200)]
kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
[ Upstream commit
51776043afa415435c7e4636204fbe4f7edc4501 ]
This ioctl is obsolete (it was used by Xenner as far as I know) but
still let's not break it gratuitously... Its handler is copying
directly into struct kvm. Go through a bounce buffer instead, with
the added benefit that we can actually do something useful with the
flags argument---the previous code was exiting with -EINVAL but still
doing the copy.
This technically is a userspace ABI breakage, but since no one should be
using the ioctl, it's a good occasion to see if someone actually
complains.
Cc: kernel-hardening@lists.openwall.com
Cc: Kees Cook <keescook@chromium.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Mon, 15 Jan 2018 08:08:38 +0000 (11:08 +0300)]
ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()
[ Upstream commit
123af9043e93cb6f235207d260d50f832cdb5439 ]
The loop timeout doesn't work because it's a post op and ends with "tmo"
set to -1. I changed it from a post-op to a pre-op and I changed the
initial the starting value from 5 to 6 so we still iterate 5 times. I
left the other as it was because it's a large number.
Fixes:
b3c70c9ea62a ("ASoC: Alchemy AC97C/I2SC audio support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Mon, 15 Jan 2018 09:44:35 +0000 (10:44 +0100)]
ALSA: hda - Use IS_REACHABLE() for dependency on input
[ Upstream commit
c469652bb5e8fb715db7d152f46d33b3740c9b87 ]
The commit
ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek
HD-audio codec") introduced the reverse-selection of CONFIG_INPUT for
Realtek codec in order to avoid the mess with dependency between
built-in and modules. Later on, we obtained IS_REACHABLE() macro
exactly for this kind of problems, and now we can remove th INPUT
selection in Kconfig and put IS_REACHABLE(INPUT) to the appropriate
places in the code, so that the driver doesn't need to select other
subsystem forcibly.
Fixes:
ffcd28d88e4f ("ALSA: hda - Select INPUT for Realtek HD-audio codec")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # and build-tested
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
NeilBrown [Tue, 12 Dec 2017 22:57:09 +0000 (09:57 +1100)]
NFSv4: always set NFS_LOCK_LOST when a lock is lost.
[ Upstream commit
dce2630c7da73b0634686bca557cc8945cc450c8 ]
There are 2 comments in the NFSv4 code which suggest that
SIGLOST should possibly be sent to a process. In these
cases a lock has been lost.
The current practice is to set NFS_LOCK_LOST so that
read/write returns EIO when a lock is lost.
So change these comments to code when sets NFS_LOCK_LOST.
One case is when lock recovery after apparent server restart
fails with NFS4ERR_DENIED, NFS4ERR_RECLAIM_BAD, or
NFS4ERRO_RECLAIM_CONFLICT. The other case is when a lock
attempt as part of lease recovery fails with NFS4ERR_DENIED.
In an ideal world, these should not happen. However I have
a packet trace showing an NFSv4.1 session getting
NFS4ERR_BADSESSION after an extended network parition. The
NFSv4.1 client treats this like server reboot until/unless
it get NFS4ERR_NO_GRACE, in which case it switches over to
"nograce" recovery mode. In this network trace, the client
attempts to recover a lock and the server (incorrectly)
reports NFS4ERR_DENIED rather than NFS4ERR_NO_GRACE. This
leads to the ineffective comment and the client then
continues to write using the OPEN stateid.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Hector Martin [Fri, 3 Nov 2017 11:28:57 +0000 (20:28 +0900)]
firewire-ohci: work around oversized DMA reads on JMicron controllers
[ Upstream commit
188775181bc05f29372b305ef96485840e351fde ]
At least some JMicron controllers issue buggy oversized DMA reads when
fetching context descriptors, always fetching 0x20 bytes at once for
descriptors which are only 0x10 bytes long. This is often harmless, but
can cause page faults on modern systems with IOMMUs:
DMAR: [DMA Read] Request device [05:00.0] fault addr
fff56000 [fault reason 06] PTE Read access is not set
firewire_ohci 0000:05:00.0: DMA context IT0 has stopped, error code: evt_descriptor_read
This works around the problem by always leaving 0x10 padding bytes at
the end of descriptor buffer pages, which should be harmless to do
unconditionally for controllers in case others have the same behavior.
Signed-off-by: Hector Martin <marcan@marcan.st>
Reviewed-by: Clemens Ladisch <clemens@ladisch.de>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Fri, 4 May 2018 12:23:01 +0000 (08:23 -0400)]
do d_instantiate/unlock_new_inode combinations safely
commit
1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream.
For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex. Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.
Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode(). All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.
Cc: stable@kernel.org # 2.6.29 and later
Tested-by: Mike Marshall <hubcap@omnibond.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Brian Foster [Wed, 25 Jan 2017 15:53:43 +0000 (07:53 -0800)]
xfs: remove racy hasattr check from attr ops
commit
5a93790d4e2df73e30c965ec6e49be82fc3ccfce upstream.
xfs_attr_[get|remove]() have unlocked attribute fork checks to optimize
away a lock cycle in cases where the fork does not exist or is otherwise
empty. This check is not safe, however, because an attribute fork short
form to extent format conversion includes a transient state that causes
the xfs_inode_hasattr() check to fail. Specifically,
xfs_attr_shortform_to_leaf() creates an empty extent format attribute
fork and then adds the existing shortform attributes to it.
This means that lookup of an existing xattr can spuriously return
-ENOATTR when racing against a setxattr that causes the associated
format conversion. This was originally reproduced by an untar on a
particularly configured glusterfs volume, but can also be reproduced on
demand with properly crafted xattr requests.
The format conversion occurs under the exclusive ilock. xfs_attr_get()
and xfs_attr_remove() already have the proper locking and checks further
down in the functions to handle this situation correctly. Drop the
unlocked checks to avoid the spurious failure and rely on the existing
logic.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Daniel Sangorrin <daniel.sangorrin@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
zhongjiang [Mon, 10 Jul 2017 22:52:57 +0000 (15:52 -0700)]
kernel/signal.c: avoid undefined behaviour in kill_something_info
commit
4ea77014af0d6205b05503d1c7aac6eace11d473 upstream.
When running kill(
72057458746458112, 0) in userspace I hit the following
issue.
UBSAN: Undefined behaviour in kernel/signal.c:1462:11
negation of -
2147483648 cannot be represented in type 'int':
CPU: 226 PID: 9849 Comm: test Tainted: G B ---- ------- 3.10.0-327.53.58.70.x86_64_ubsan+ #116
Hardware name: Huawei Technologies Co., Ltd. RH8100 V3/BC61PBIA, BIOS BLHSV028 11/11/2014
Call Trace:
dump_stack+0x19/0x1b
ubsan_epilogue+0xd/0x50
__ubsan_handle_negate_overflow+0x109/0x14e
SYSC_kill+0x43e/0x4d0
SyS_kill+0xe/0x10
system_call_fastpath+0x16/0x1b
Add code to avoid the UBSAN detection.
[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/1496670008-59084-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhongjiang <zhongjiang@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gustavo A. R. Silva [Fri, 25 May 2018 21:47:57 +0000 (14:47 -0700)]
kernel/sys.c: fix potential Spectre v1 issue
commit
23d6aef74da86a33fa6bb75f79565e0a16ee97c2 upstream.
`resource' can be controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue 'get_current()->signal->rlim' (local cap)
Fix this by sanitizing *resource* before using it to index
current->signal->rlim
Notice that given that speculation windows are large, the policy is to
kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=
152449131114778&w=2
Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David Hildenbrand [Fri, 25 May 2018 21:48:11 +0000 (14:48 -0700)]
kasan: fix memory hotplug during boot
commit
3f1959721558a976aaf9c2024d5bc884e6411bf7 upstream.
Using module_init() is wrong. E.g. ACPI adds and onlines memory before
our memory notifier gets registered.
This makes sure that ACPI memory detected during boot up will not result
in a kernel crash.
Easily reproducible with QEMU, just specify a DIMM when starting up.
Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com
Fixes:
786a8959912e ("kasan: disable memory hotplug")
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davidlohr Bueso [Fri, 25 May 2018 21:47:30 +0000 (14:47 -0700)]
ipc/shm: fix shmat() nil address after round-down when remapping
commit
8f89c007b6dec16a1793cb88de88fcc02117bbbc upstream.
shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
fact the very first thing we check for. Andrea reported that for
SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
but we need to check again if the address was rounded down to nil. As
of this patch, such cases will return -EINVAL.
Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Davidlohr Bueso [Fri, 25 May 2018 21:47:27 +0000 (14:47 -0700)]
Revert "ipc/shm: Fix shmat mmap nil-page protection"
commit
a73ab244f0dad8fffb3291b905f73e2d3eaa7c00 upstream.
Patch series "ipc/shm: shmat() fixes around nil-page".
These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.
The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED. This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.
I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour. Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be
modified to just test for SHM_RND|SHM_REMAP).
[1] lkml.kernel.org/r/
20180430172152.nfa564pvgpk3ut7p@linux-n805
This patch (of 2):
Commit
95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
worked on the idea that we should not be mapping as root addr=0 and
MAP_FIXED. However, it was reported that this scenario is in fact
valid, thus making the patch both bogus and breaks userspace as well.
For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
initialization[1].
[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes:
95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Joe Jin [Thu, 17 May 2018 19:33:28 +0000 (12:33 -0700)]
xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
commit
4855c92dbb7b3b85c23e88ab7ca04f99b9677b41 upstream.
When run raidconfig from Dom0 we found that the Xen DMA heap is reduced,
but Dom Heap is increased by the same size. Tracing raidconfig we found
that the related ioctl() in megaraid_sas will call dma_alloc_coherent()
to apply memory. If the memory allocated by Dom0 is not in the DMA area,
it will exchange memory with Xen to meet the requiment. Later drivers
call dma_free_coherent() to free the memory, on xen_swiotlb_free_coherent()
the check condition (dev_addr + size - 1 <= dma_mask) is always false,
it prevents calling xen_destroy_contiguous_region() to return the memory
to the Xen DMA heap.
This issue introduced by commit
6810df88dcfc2 "xen-swiotlb: When doing
coherent alloc/dealloc check before swizzling the MFNs.".
Signed-off-by: Joe Jin <joe.jin@oracle.com>
Tested-by: John Sobecki <john.sobecki@oracle.com>
Reviewed-by: Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sudip Mukherjee [Sat, 19 May 2018 21:29:36 +0000 (22:29 +0100)]
libata: blacklist Micron 500IT SSD with MU01 firmware
commit
136d769e0b3475d71350aa3648a116a6ee7a8f6c upstream.
While whitelisting Micron M500DC drives, the tweaked blacklist entry
enabled queued TRIM from M500IT variants also. But these do not support
queued TRIM. And while using those SSDs with the latest kernel we have
seen errors and even the partition table getting corrupted.
Some part from the dmesg:
[ 6.727384] ata1.00: ATA-9: Micron_M500IT_MTFDDAK060MBD, MU01, max UDMA/133
[ 6.727390] ata1.00:
117231408 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 6.741026] ata1.00: supports DRM functions and may not be fully accessible
[ 6.759887] ata1.00: configured for UDMA/133
[ 6.762256] scsi 0:0:0:0: Direct-Access ATA Micron_M500IT_MT MU01 PQ: 0 ANSI: 5
and then for the error:
[ 120.860334] ata1.00: exception Emask 0x1 SAct 0x7ffc0007 SErr 0x0 action 0x6 frozen
[ 120.860338] ata1.00: irq_stat 0x40000008
[ 120.860342] ata1.00: failed command: SEND FPDMA QUEUED
[ 120.860351] ata1.00: cmd 64/01:00:00:00:00/00:00:00:00:00/a0 tag 0 ncq dma 512 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x5 (timeout)
[ 120.860353] ata1.00: status: { DRDY }
[ 120.860543] ata1: hard resetting link
[ 121.166128] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 121.166376] ata1.00: supports DRM functions and may not be fully accessible
[ 121.186238] ata1.00: supports DRM functions and may not be fully accessible
[ 121.204445] ata1.00: configured for UDMA/133
[ 121.204454] ata1.00: device reported invalid CHS sector 0
[ 121.204541] sd 0:0:0:0: [sda] tag#18 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08
[ 121.204546] sd 0:0:0:0: [sda] tag#18 Sense Key : 0x5 [current]
[ 121.204550] sd 0:0:0:0: [sda] tag#18 ASC=0x21 ASCQ=0x4
[ 121.204555] sd 0:0:0:0: [sda] tag#18 CDB: opcode=0x93 93 08 00 00 00 00 00 04 28 80 00 00 00 30 00 00
[ 121.204559] print_req_error: I/O error, dev sda, sector 272512
After few reboots with these errors, and the SSD is corrupted.
After blacklisting it, the errors are not seen and the SSD does not get
corrupted any more.
Fixes:
243918be6393 ("libata: Do not blacklist Micron M500DC")
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tejun Heo [Tue, 8 May 2018 21:21:56 +0000 (14:21 -0700)]
libata: Blacklist some Sandisk SSDs for NCQ
commit
322579dcc865b94b47345ad1b6002ad167f85405 upstream.
Sandisk SSDs SD7SN6S256G and SD8SN8U256G are regularly locking up
regularly under sustained moderate load with NCQ enabled. Blacklist
for now.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Corneliu Doban [Fri, 18 May 2018 22:03:56 +0000 (15:03 -0700)]
mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
commit
5f651b870485ee60f5abbbd85195a6852978894a upstream.
When the host controller accepts only 32bit writes, the value of the
16bit TRANSFER_MODE register, that has the same 32bit address as the
16bit COMMAND register, needs to be saved and it will be written
in a 32bit write together with the command as this will trigger the
host to send the command on the SD interface.
When sending the tuning command, TRANSFER_MODE is written and then
sdhci_set_transfer_mode reads it back to clear AUTO_CMD12 bit and
write it again resulting in wrong value to be written because the
initial write value was saved in a shadow and the read-back returned
a wrong value, from the register.
Fix sdhci_iproc_readw to return the saved value of TRANSFER_MODE
when a saved value exist.
Same fix for read of BLOCK_SIZE and BLOCK_COUNT registers, that are
saved for a different reason, although a scenario that will cause the
mentioned problem on this registers is not probable.
Fixes:
b580c52d58d9 ("mmc: sdhci-iproc: add IPROC SDHCI driver")
Signed-off-by: Corneliu Doban <corneliu.doban@broadcom.com>
Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ben Hutchings [Thu, 17 May 2018 21:34:39 +0000 (22:34 +0100)]
ALSA: timer: Fix pause event notification
commit
3ae180972564846e6d794e3615e1ab0a1e6c4ef9 upstream.
Commit
f65e0d299807 ("ALSA: timer: Call notifier in the same spinlock")
combined the start/continue and stop/pause functions, and in doing so
changed the event code for the pause case to SNDRV_TIMER_EVENT_CONTINUE.
Change it back to SNDRV_TIMER_EVENT_PAUSE.
Fixes:
f65e0d299807 ("ALSA: timer: Call notifier in the same spinlock")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Sun, 20 May 2018 20:46:23 +0000 (16:46 -0400)]
aio: fix io_destroy(2) vs. lookup_ioctx() race
commit
baf10564fbb66ea222cae66fbff11c444590ffd9 upstream.
kill_ioctx() used to have an explicit RCU delay between removing the
reference from ->ioctx_table and percpu_ref_kill() dropping the refcount.
At some point that delay had been removed, on the theory that
percpu_ref_kill() itself contained an RCU delay. Unfortunately, that was
the wrong kind of RCU delay and it didn't care about rcu_read_lock() used
by lookup_ioctx(). As the result, we could get ctx freed right under
lookup_ioctx(). Tejun has fixed that in
a6d7cff472e ("fs/aio: Add explicit
RCU grace period when freeing kioctx"); however, that fix is not enough.
Suppose io_destroy() from one thread races with e.g. io_setup() from another;
CPU1 removes the reference from current->mm->ioctx_table[...] just as CPU2
has picked it (under rcu_read_lock()). Then CPU1 proceeds to drop the
refcount, getting it to 0 and triggering a call of free_ioctx_users(),
which proceeds to drop the secondary refcount and once that reaches zero
calls free_ioctx_reqs(). That does
INIT_RCU_WORK(&ctx->free_rwork, free_ioctx);
queue_rcu_work(system_wq, &ctx->free_rwork);
and schedules freeing the whole thing after RCU delay.
In the meanwhile CPU2 has gotten around to percpu_ref_get(), bumping the
refcount from 0 to 1 and returned the reference to io_setup().
Tejun's fix (that queue_rcu_work() in there) guarantees that ctx won't get
freed until after percpu_ref_get(). Sure, we'd increment the counter before
ctx can be freed. Now we are out of rcu_read_lock() and there's nothing to
stop freeing of the whole thing. Unfortunately, CPU2 assumes that since it
has grabbed the reference, ctx is *NOT* going away until it gets around to
dropping that reference.
The fix is obvious - use percpu_ref_tryget_live() and treat failure as miss.
It's not costlier than what we currently do in normal case, it's safe to
call since freeing *is* delayed and it closes the race window - either
lookup_ioctx() comes before percpu_ref_kill() (in which case ctx->users
won't reach 0 until the caller of lookup_ioctx() drops it) or lookup_ioctx()
fails, ctx->users is unaffected and caller of lookup_ioctx() doesn't see
the object in question at all.
Cc: stable@kernel.org
Fixes:
a6d7cff472e "fs/aio: Add explicit RCU grace period when freeing kioctx"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Sun, 6 May 2018 16:15:20 +0000 (12:15 -0400)]
affs_lookup(): close a race with affs_remove_link()
commit
30da870ce4a4e007c901858a96e9e394a1daa74a upstream.
we unlock the directory hash too early - if we are looking at secondary
link and primary (in another directory) gets removed just as we unlock,
we could have the old primary moved in place of the secondary, leaving
us to look into freed entry (and leaving our dentry with ->d_fsdata
pointing to a freed entry).
Cc: stable@vger.kernel.org # 2.4.4+
Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Colin Ian King [Mon, 14 May 2018 17:23:50 +0000 (18:23 +0100)]
KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
commit
ba3696e94d9d590d9a7e55f68e81c25dba515191 upstream.
Trivial fix to spelling mistake in debugfs_entries text.
Fixes:
669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: kernel-janitors@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Maciej W. Rozycki [Mon, 14 May 2018 15:49:43 +0000 (16:49 +0100)]
MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
commit
9a3a92ccfe3620743d4ae57c987dc8e9c5f88996 upstream.
Check the TIF_32BIT_FPREGS task setting of the tracee rather than the
tracer in determining the layout of floating-point general registers in
the floating-point context, correcting access to odd-numbered registers
for o32 tracees where the setting disagrees between the two processes.
Fixes:
597ce1723e0f ("MIPS: Support for 64-bit FP with O32 binaries")
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.14+
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Maciej W. Rozycki [Mon, 30 Apr 2018 14:56:47 +0000 (15:56 +0100)]
MIPS: ptrace: Expose FIR register through FP regset
commit
71e909c0cdad28a1df1fa14442929e68615dee45 upstream.
Correct commit
7aeb753b5353 ("MIPS: Implement task_user_regset_view.")
and expose the FIR register using the unused 4 bytes at the end of the
NT_PRFPREG regset. Without that register included clients cannot use
the PTRACE_GETREGSET request to retrieve the complete FPU register set
and have to resort to one of the older interfaces, either PTRACE_PEEKUSR
or PTRACE_GETFPREGS, to retrieve the missing piece of data. Also the
register is irreversibly missing from core dumps.
This register is architecturally hardwired and read-only so the write
path does not matter. Ignore data supplied on writes then.
Fixes:
7aeb753b5353 ("MIPS: Implement task_user_regset_view.")
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.13+
Patchwork: https://patchwork.linux-mips.org/patch/19273/
Signed-off-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Sat, 26 May 2018 06:49:01 +0000 (08:49 +0200)]
Linux 4.4.133
Tetsuo Handa [Wed, 9 May 2018 10:42:20 +0000 (19:42 +0900)]
x86/kexec: Avoid double free_page() upon do_kexec_load() failure
commit
a466ef76b815b86748d9870ef2a430af7b39c710 upstream.
>From
ff82bedd3e12f0d3353282054ae48c3bd8c72012 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 9 May 2018 12:12:39 +0900
Subject: x86/kexec: Avoid double free_page() upon do_kexec_load() failure
syzbot is reporting crashes after memory allocation failure inside
do_kexec_load() [1]. This is because free_transition_pgtable() is called
by both init_transition_pgtable() and machine_kexec_cleanup() when memory
allocation failed inside init_transition_pgtable().
Regarding 32bit code, machine_kexec_free_page_tables() is called by both
machine_kexec_alloc_page_tables() and machine_kexec_cleanup() when memory
allocation failed inside machine_kexec_alloc_page_tables().
Fix this by leaving the error handling to machine_kexec_cleanup()
(and optionally setting NULL after free_page()).
[1] https://syzkaller.appspot.com/bug?id=
91e52396168cf2bdd572fe1e1bc0bc645c1c6b40
Fixes:
f5deb79679af6eb4 ("x86: kexec: Use one page table in x86_64 machine_kexec")
Fixes:
92be3d6bdf2cb349 ("kexec/i386: allocate page table pages dynamically")
Reported-by: syzbot <syzbot+d96f60296ef613fe1d69@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: prudo@linux.vnet.ibm.com
Cc: Huang Ying <ying.huang@intel.com>
Cc: syzkaller-bugs@googlegroups.com
Cc: takahiro.akashi@linaro.org
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: akpm@linux-foundation.org
Cc: dyoung@redhat.com
Cc: kirill.shutemov@linux.intel.com
Link: https://lkml.kernel.org/r/201805091942.DGG12448.tMFVFSJFQOOLHO@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tetsuo Handa [Fri, 18 May 2018 23:09:16 +0000 (16:09 -0700)]
hfsplus: stop workqueue when fill_super() failed
commit
66072c29328717072fd84aaff3e070e3f008ba77 upstream.
syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1]. This
is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync().
As far as I can see, it is hfsplus_mark_mdb_dirty() from
hfsplus_new_inode() in hfsplus_fill_super() that calls
queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does
not fail if queue_delayed_work() was called, and the out_put_hidden_dir
label is the appropriate location to call cancel_delayed_work_sync().
[1] https://syzkaller.appspot.com/bug?id=
a66f45e96fdbeb76b796bf46eb25ea878c42a6c9
Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johannes Berg [Tue, 3 Apr 2018 12:33:49 +0000 (14:33 +0200)]
cfg80211: limit wiphy names to 128 bytes
commit
a7cfebcb7594a24609268f91299ab85ba064bf82 upstream.
There's currently no limit on wiphy names, other than netlink
message size and memory limitations, but that causes issues when,
for example, the wiphy name is used in a uevent, e.g. in rfkill
where we use the same name for the rfkill instance, and then the
buffer there is "only" 2k for the environment variables.
This was reported by syzkaller, which used a 4k name.
Limit the name to something reasonable, I randomly picked 128.
Reported-by: syzbot+230d9e642a85d3fec29c@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geert Uytterhoeven [Thu, 18 Feb 2016 16:06:30 +0000 (17:06 +0100)]
gpio: rcar: Add Runtime PM handling for interrupts
commit
b26a719bdba9aa926ceaadecc66e07623d2b8a53 upstream.
The R-Car GPIO driver handles Runtime PM for requested GPIOs only.
When using a GPIO purely as an interrupt source, no Runtime PM handling
is done, and the GPIO module's clock may not be enabled.
To fix this:
- Add .irq_request_resources() and .irq_release_resources() callbacks
to handle Runtime PM when an interrupt is requested,
- Add irq_bus_lock() and sync_unlock() callbacks to handle Runtime PM
when e.g. disabling/enabling an interrupt, or configuring the
interrupt type.
Fixes:
d5c3d84657db57bd "net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS"
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
[fabrizio: cherry-pick to v4.4.y. Use container_of instead of
gpiochip_get_data.]
Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Reviewed-by: Biju Das <biju.das@bp.renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
John Stultz [Thu, 8 Jun 2017 23:44:21 +0000 (16:44 -0700)]
time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
commit
3d88d56c5873f6eebe23e05c3da701960146b801 upstream.
Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in their vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.
This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.
Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Daniel Mentz <danielmentz@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: "stable #4 . 8+" <stable@vger.kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[fabrizio: cherry-pick to 4.4. Kept cycle_t type for function
logarithmic_accumulation local variable "interval". Dropped
casting of "interval" variable]
Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Signed-off-by: Biju Das <biju.das@bp.renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Vinod Koul [Tue, 12 Apr 2016 15:37:06 +0000 (21:07 +0530)]
dmaengine: ensure dmaengine helpers check valid callback
commit
757d12e5849be549076901b0d33c60d5f360269c upstream.
dmaengine has various device callbacks and exposes helper
functions to invoke these. These helpers should check if channel,
device and callback is valid or not before invoking them.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com>
Signed-off-by: Jianming Qiao <jianming.qiao@bp.renesas.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>