OSDN Git Service

uclinux-h8/linux.git
6 years agoskd: Report completion mismatches once
Bart Van Assche [Wed, 23 Aug 2017 17:56:30 +0000 (10:56 -0700)]
skd: Report completion mismatches once

This patch removes one debug statement but otherwise does not change
any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: Warn if blk_queue_rq_timed_out() is called for a blk-mq queue
Bart Van Assche [Wed, 23 Aug 2017 17:56:29 +0000 (10:56 -0700)]
block: Warn if blk_queue_rq_timed_out() is called for a blk-mq queue

The timeout handler set by blk_queue_rq_timed_out() is only used
in single queue mode. Calling this function for blk-mq drivers is
wrong. Hence issue a warning if this function is called by a blk-mq
driver.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonullb: badbblocks support
Shaohua Li [Mon, 14 Aug 2017 22:05:00 +0000 (15:05 -0700)]
nullb: badbblocks support

Sometime disk could have tracks broken and data there is inaccessable,
but data in other parts can be accessed in normal way. MD RAID supports
such disks. But we don't have a good way to test it, because we can't
control which part of a physical disk is bad. For a virtual disk, this
can be easily controlled.

This patch adds a new 'badblock' attribute. Configure it in this way:
echo "+1-100" > xxx/badblock, this will make sector [1-100] as bad
blocks.
echo "-20-30" > xxx/badblock, this will make sector [20-30] good

If badblocks are accessed, the nullb disk will return IO error. Other
parts of the disk can accessed in normal way.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonullb: emulate cache
Shaohua Li [Mon, 14 Aug 2017 22:04:59 +0000 (15:04 -0700)]
nullb: emulate cache

Software must flush disk cache to guarantee data safety. To check if
software correctly does disk cache flush, we must know the behavior of
disk. But physical disk behavior is uncontrollable. Even software
doesn't do the flush, the disk probably does the flush. This patch tries
to emulate a cache in the test disk.

All write will go to a cache first, when the cache is full, we then
flush some data to disk storage. A flush request will flush all data of
the cache to disk storage. A FUA write will write to memory store
directly and revalidate data in cache. If there is a power failure (by
writing to power attribute, 'echo 0 > disk_name/power'), we discard all
data in the cache, but preserve the data in disk storage. Later we can
power on the disk again as usual (write 1 to 'power' attribute), then we
can check data integrity and very if software does everything correctly.

A new attribute 'cache_size' (in MB) is added to configure cache size.

Based on original patch from Kyungchan Koh

Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonullb: bandwidth control
Shaohua Li [Mon, 14 Aug 2017 22:04:58 +0000 (15:04 -0700)]
nullb: bandwidth control

In test, we usually expect controllable disk speed. For example, in a
raid array, we'd like some disks are fast and some are slow. MD RAID
actually has a feature for this. To test the feature, we'd like to make
the disk run in specific speed.

block throttling probably can be used for this purpose, but it requires
cgroup setup. Here we just implement a simple throttling mechanism in
the driver. There is slight fluctuation in the mechanism, but it's good
enough for test.

To configure the bandwidth cap, user sets the 'mbps' attribute. mbps is
MB/s.

Based on original patch from Kyungchan Koh

Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonullb: support discard
Shaohua Li [Mon, 14 Aug 2017 22:04:57 +0000 (15:04 -0700)]
nullb: support discard

discard makes sense for memory backed disk. And also it's useful to test
if upper layer supports dicard correctly.

User configures 'discard' attribute to enable/disable dicard support.

Based on original patch from Kyungchan Koh

Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonullb: support memory backed store
Shaohua Li [Mon, 14 Aug 2017 22:04:56 +0000 (15:04 -0700)]
nullb: support memory backed store

This adds memory backed store in nullb.

User configure 'memory_backed' attribute for this. By default, nullb
disk doesn't use memory backed store.

Based on original patch from Kyungchan Koh

Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonullb: use ida to manage index
Shaohua Li [Mon, 14 Aug 2017 22:04:55 +0000 (15:04 -0700)]
nullb: use ida to manage index

We now dynamically create disks. Managing the disk index with ida to
avoid bump up the index too much.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonullb: add interface to power on disk
Shaohua Li [Mon, 14 Aug 2017 22:04:54 +0000 (15:04 -0700)]
nullb: add interface to power on disk

The device created in nullb configfs interface isn't power on by
default. After user configures the device, user can do 'echo 1 >
xxx/nullb/device_name/power' to power on the device, which will create a
disk. the xxx/nullb/device_name/index is the disk index, so if the index
is 2, the new created disk should be named as /dev/nullb2. Note, the
'index' is only valid after disk is power on.

'echo 0 > xxx/nullb/device_name/power' will remove the disk. Note, this
doesn't remove the device. To remove the device, user should do 'rmdir
xxx/nullb/device_name'. Removing the device will remove the disk too.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonullb: add configfs interface
Shaohua Li [Mon, 14 Aug 2017 22:04:53 +0000 (15:04 -0700)]
nullb: add configfs interface

Add configfs interface for nullb. configfs interface is more flexible
and easy to configure in a per-disk basis.

Configuration is something like this:
mount -t configfs none /mnt

Checking which features the driver supports:
cat /mnt/nullb/features

The 'features' attribute is for future extension. We probably will add
new features into the driver, userspace can check this attribute to find
the supported features.

Create/remove a device:
mkdir/rmdir /mnt/nullb/a

Then configure the device by setting attributes under /mnt/nullb/a, most
of nullb supported module parameters are converted to attributes:
size; /* device size in MB */
completion_nsec; /* time in ns to complete a request */
submit_queues; /* number of submission queues */
home_node; /* home node for the device */
queue_mode; /* block interface */
blocksize; /* block size */
irqmode; /* IRQ completion handler */
hw_queue_depth; /* queue depth */
use_lightnvm; /* register as a LightNVM device */
blocking; /* blocking blk-mq device */
use_per_node_hctx; /* use per-node allocation for hardware context */

Note, creating a device doesn't create a disk immediately. Creating a
disk is done in two phases: create a device and then power on the
device. Next patch will introduce device power on.

Based on original patch from Kyungchan Koh

Signed-off-by: Kyungchan Koh <kkc6196@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonullb: factor disk parameters
Shaohua Li [Mon, 14 Aug 2017 22:04:52 +0000 (15:04 -0700)]
nullb: factor disk parameters

When we switch to configfs interface, each disk could have different
configuration. To prepare for the change, we move most disk setting to a
separate data structure. The existing module parameter interface is
kept. The 'nr_devices' and 'shared_tags' don't make sense for per-disk
setting, so they are remained as global settings.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: error pointer dereference in skd_cons_disk()
Dan Carpenter [Wed, 23 Aug 2017 11:20:57 +0000 (14:20 +0300)]
skd: error pointer dereference in skd_cons_disk()

My initial impulse was to check for IS_ERR_OR_NULL() but when I looked
at this code a bit more closely, we should only need to check for
IS_ERR().

The blk_mq_alloc_tag_set() returns negative error codes and zero on
success so we can just do an "if (rc) goto err_out;".  It's better to
preserve the error code anyhow.  The blk_mq_init_queue() returns error
pointers on failure, it never returns NULL.  We can also remove the
"q = NULL;" at the start because that's no longer needed.

Fixes: ca33dd92968b ("skd: Convert to blk-mq")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Uninitialized variable in skd_isr_completion_posted()
Dan Carpenter [Wed, 23 Aug 2017 10:44:20 +0000 (13:44 +0300)]
skd: Uninitialized variable in skd_isr_completion_posted()

Someone got too agressive about removing initializations and
accidentally removed the "rc = 0;" which is required.

Fixes: c830da8cbc7b ("skd: Remove superfluous initializations from skd_isr_completion_posted()")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove driver version information
Bart Van Assche [Fri, 18 Aug 2017 15:22:28 +0000 (08:22 -0700)]
skd: Remove driver version information

Remove the driver version information because this information
is not useful in an upstream kernel driver.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Bump driver version
Bart Van Assche [Thu, 17 Aug 2017 20:13:38 +0000 (13:13 -0700)]
skd: Bump driver version

Bump the driver version. Remove the build ID because build IDs do
not make sense for an upstream kernel driver. Keep the driver
version in the module information but do not report it during every
load, unload or probe.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Optimize locking
Bart Van Assche [Thu, 17 Aug 2017 20:13:37 +0000 (13:13 -0700)]
skd: Optimize locking

Only take skdev->lock if necessary.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove several local variables
Bart Van Assche [Thu, 17 Aug 2017 20:13:36 +0000 (13:13 -0700)]
skd: Remove several local variables

This patch does not change any functionality but makes the code
more brief.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Reduce memory usage
Bart Van Assche [Thu, 17 Aug 2017 20:13:35 +0000 (13:13 -0700)]
skd: Reduce memory usage

Every single coherent DMA memory buffer occupies at least one page.
Reduce memory usage by switching from coherent buffers to streaming
DMA for I/O requests (struct skd_fitmsg_context) and S/G-lists
(struct fit_sg_descriptor[]).

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove skd_device.in_flight
Bart Van Assche [Thu, 17 Aug 2017 20:13:34 +0000 (13:13 -0700)]
skd: Remove skd_device.in_flight

Since skd_device.in_flight is only used to display the number of
in-flight requests in debug messages, remove that member and
introduce skd_in_flight(). That last function relies on the block
layer to determine the number of in flight requests.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Switch to block layer timeout mechanism
Bart Van Assche [Thu, 17 Aug 2017 20:13:33 +0000 (13:13 -0700)]
skd: Switch to block layer timeout mechanism

Remove the timeout slot variables and rely on the block layer to
detect request timeouts.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Convert to blk-mq
Bart Van Assche [Thu, 17 Aug 2017 20:13:32 +0000 (13:13 -0700)]
skd: Convert to blk-mq

Introduce a tag set and a blk_mq_ops structure. Set .cmd_size such
that struct request and struct skd_request_context are allocated
through a single allocation. Remove the skd_request_context.req
pointer. Make queue starting asynchronous such that this can occur
safely from interrupt context. Use locking to protect skdev->skmsg
and *skdev->skmsg against concurrent access from concurrent
.queue_rq() calls. Introduce the functions skd_init_request() and
skd_exit_request() to set up / clean up the per-request S/G-list.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Coalesce struct request and struct skd_request_context
Bart Van Assche [Thu, 17 Aug 2017 20:13:31 +0000 (13:13 -0700)]
skd: Coalesce struct request and struct skd_request_context

Set request_queue.cmd_size, introduce skd_init_rq() and skd_exit_rq()
and remove skd_device.skreq_table.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Move skd_free_sg_list() up
Bart Van Assche [Thu, 17 Aug 2017 20:13:30 +0000 (13:13 -0700)]
skd: Move skd_free_sg_list() up

Issue a warning if a NULL argument is passed to skd_free_sg_list().
Move this function up to make the blk-mq conversion patch easier
to read.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Split skd_recover_requests()
Bart Van Assche [Thu, 17 Aug 2017 20:13:29 +0000 (13:13 -0700)]
skd: Split skd_recover_requests()

This patch does not change any functionality but makes the blk-mq
conversion patch easier to read.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Introduce skd_process_request()
Bart Van Assche [Thu, 17 Aug 2017 20:13:28 +0000 (13:13 -0700)]
skd: Introduce skd_process_request()

The only functional change in this patch is that the skd_fitmsg_context
in which requests are accumulated is changed from a local variable into
a member of struct skd_device. This patch will make the blk-mq conversion
easier.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Convert several per-device scalar variables into atomics
Bart Van Assche [Thu, 17 Aug 2017 20:13:27 +0000 (13:13 -0700)]
skd: Convert several per-device scalar variables into atomics

Convert the per-device scalar variables that are protected by the
queue lock into atomics such that it becomes safe to access these
variables without holding the queue lock.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Enable request tags for the block layer queue
Bart Van Assche [Thu, 17 Aug 2017 20:13:26 +0000 (13:13 -0700)]
skd: Enable request tags for the block layer queue

Use the request tag when allocating a skd_fitmsg_context or
skd_request_context such that the lists used to track free elements
can be eliminated. Swap the skd_end_request() and skd_release_req()
calls to avoid triggering a use-after-free. Remove
skd_fitmsg_context.state and .outstanding because FIT messages are
shared among requests and because updating a FIT message after a
request has finished whould trigger a use-after-free.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Initialize skd_special_context.req.n_sg to one
Bart Van Assche [Thu, 17 Aug 2017 20:13:25 +0000 (13:13 -0700)]
skd: Initialize skd_special_context.req.n_sg to one

The debug code in skd_send_special_fitmsg() assumes that req.n_sg
represents the number of S/G descriptors. However, skd_construct()
initializes that member variable to zero. Set req.n_sg to one such
that the debugging code in skd_send_special_fitmsg() works as
expected.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove dead code
Bart Van Assche [Thu, 17 Aug 2017 20:13:24 +0000 (13:13 -0700)]
skd: Remove dead code

Removing the SG IO code also removed the code that sets
SKD_REQ_STATE_ABORTED. Hence also remove the code that checks for
this state.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove SG IO support
Bart Van Assche [Thu, 17 Aug 2017 20:13:23 +0000 (13:13 -0700)]
skd: Remove SG IO support

The skd SG IO support duplicates the functionality of the bsg driver.
Hence remove it.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Convert explicit skd_request_fn() calls
Bart Van Assche [Thu, 17 Aug 2017 20:13:22 +0000 (13:13 -0700)]
skd: Convert explicit skd_request_fn() calls

This will make it easier to convert this driver to the blk-mq
approach. This patch also reduces interrupt latency by moving
skd_request_fn() calls out of the skd_isr() interrupt.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Rework request failing code path
Bart Van Assche [Thu, 17 Aug 2017 20:13:21 +0000 (13:13 -0700)]
skd: Rework request failing code path

Move the skd_fail_all_pending() call out of skd_request_fn_not_online()
such that this function can be reused in the blk-mq code path.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Move a function definition
Bart Van Assche [Thu, 17 Aug 2017 20:13:20 +0000 (13:13 -0700)]
skd: Move a function definition

This patch does not change any functionality but makes the next
patch in this series easier to read.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskb: Use symbolic names for SCSI opcodes
Bart Van Assche [Thu, 17 Aug 2017 20:13:19 +0000 (13:13 -0700)]
skb: Use symbolic names for SCSI opcodes

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Use kcalloc() instead of kzalloc() with multiply
Bart Van Assche [Thu, 17 Aug 2017 20:13:18 +0000 (13:13 -0700)]
skd: Use kcalloc() instead of kzalloc() with multiply

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove superfluous occurrences of the 'volatile' keyword
Bart Van Assche [Thu, 17 Aug 2017 20:13:17 +0000 (13:13 -0700)]
skd: Remove superfluous occurrences of the 'volatile' keyword

mem_map[i] is accessed through readl() / writel() hence declaring
mem_map as volatile is not necessary.

Remove the volatile declarations from struct fit_completion_entry_v1
pointers and struct fit_comp_error_info since reading these structures
multiple times is safe.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove a redundant init_timer() call
Bart Van Assche [Thu, 17 Aug 2017 20:13:16 +0000 (13:13 -0700)]
skd: Remove a redundant init_timer() call

Since setup_timer() invokes init_timer(), invoking init_timer()
just before setup_timer() is redundant. Hence remove the init_timer()
call.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Use for_each_sg()
Bart Van Assche [Thu, 17 Aug 2017 20:13:15 +0000 (13:13 -0700)]
skd: Use for_each_sg()

This change makes skd_preop_sg_list() support chained sg-lists.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Drop second argument of skd_recover_requests()
Bart Van Assche [Thu, 17 Aug 2017 20:13:14 +0000 (13:13 -0700)]
skd: Drop second argument of skd_recover_requests()

Since all callers pass zero as second argument to skd_recover_requests(),
drop that second argument.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove superfluous initializations from skd_isr_completion_posted()
Bart Van Assche [Thu, 17 Aug 2017 20:13:13 +0000 (13:13 -0700)]
skd: Remove superfluous initializations from skd_isr_completion_posted()

The value of skcmp, cmp_cntxt etc. is overwritten during every
loop iteration and is not used after the loop has finished. Hence
initializing these variables outside the loop is not necessary.
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Simplify the code for handling data direction
Bart Van Assche [Thu, 17 Aug 2017 20:13:12 +0000 (13:13 -0700)]
skd: Simplify the code for handling data direction

Use DMA_FROM_DEVICE and DMA_TO_DEVICE directly instead of
introducing driver-private constants with the same numerical
value.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Use ARRAY_SIZE() where appropriate
Bart Van Assche [Thu, 17 Aug 2017 20:13:11 +0000 (13:13 -0700)]
skd: Use ARRAY_SIZE() where appropriate

Use ARRAY_SIZE() instead of open-coding it. This patch does not
change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Make the skd_isr() code more brief
Bart Van Assche [Thu, 17 Aug 2017 20:13:10 +0000 (13:13 -0700)]
skd: Make the skd_isr() code more brief

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Use __packed only when needed
Bart Van Assche [Thu, 17 Aug 2017 20:13:09 +0000 (13:13 -0700)]
skd: Use __packed only when needed

Since needless use of __packed slows down access to data structures,
only use __packed when needed.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Check structure sizes at build time
Bart Van Assche [Thu, 17 Aug 2017 20:13:08 +0000 (13:13 -0700)]
skd: Check structure sizes at build time

This patch will help to verify the changes made by the next patch.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Use a structure instead of hardcoding structure offsets
Bart Van Assche [Thu, 17 Aug 2017 20:13:07 +0000 (13:13 -0700)]
skd: Use a structure instead of hardcoding structure offsets

This change makes the source code easier to read.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Simplify the code for allocating DMA message buffers
Bart Van Assche [Thu, 17 Aug 2017 20:13:06 +0000 (13:13 -0700)]
skd: Simplify the code for allocating DMA message buffers

dma_alloc_coherent() guarantees alignment on a page boundary so
no explicit alignment is needed to align on a 64 byte boundary.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Simplify the code for deciding whether or not to send a FIT msg
Bart Van Assche [Thu, 17 Aug 2017 20:13:05 +0000 (13:13 -0700)]
skd: Simplify the code for deciding whether or not to send a FIT msg

Due to the previous patch it is guaranteed that the FIT msg contains
at least one request after the for-loop has finished. Use this to
simplify the code.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Reorder the code in skd_process_request()
Bart Van Assche [Thu, 17 Aug 2017 20:13:04 +0000 (13:13 -0700)]
skd: Reorder the code in skd_process_request()

Prepare the S/G-list before allocating a FIT msg such that the FIT
msg always contains at least one request after the for-loop is
finished.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Fix size argument in skd_free_skcomp()
Bart Van Assche [Thu, 17 Aug 2017 20:13:03 +0000 (13:13 -0700)]
skd: Fix size argument in skd_free_skcomp()

Pass the correct size to pci_free_consistent() in skd_free_skcomp().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Introduce SKD_SKCOMP_SIZE
Bart Van Assche [Thu, 17 Aug 2017 20:13:02 +0000 (13:13 -0700)]
skd: Introduce SKD_SKCOMP_SIZE

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Introduce the symbolic constant SKD_MAX_REQ_PER_MSG
Bart Van Assche [Thu, 17 Aug 2017 20:13:01 +0000 (13:13 -0700)]
skd: Introduce the symbolic constant SKD_MAX_REQ_PER_MSG

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Document locking assumptions
Bart Van Assche [Thu, 17 Aug 2017 20:13:00 +0000 (13:13 -0700)]
skd: Document locking assumptions

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Fix endianness annotations
Bart Van Assche [Thu, 17 Aug 2017 20:12:59 +0000 (13:12 -0700)]
skd: Fix endianness annotations

Ensure that sparse does not report any warnings when building the
skd driver with sparse verification enabled (C=1 or C=2).

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Switch from the pr_*() to the dev_*() logging functions
Bart Van Assche [Thu, 17 Aug 2017 20:12:58 +0000 (13:12 -0700)]
skd: Switch from the pr_*() to the dev_*() logging functions

Use dev_err() and dev_info() instead of pr_err() and pr_info().
Since dev_dbg() is able to report file name and line number
information, remove __FILE__ and __LINE__ from the dev_dbg() calls.
Remove the struct skd_device members and the function (skd_name())
that became superfluous due to these changes.

This patch removes the device name and serial number from log
statements. An example of the old log line format:

(skd0:STM000196603:[0000:00:09.0]): Driver state STARTING(3)=>ONLINE(4)

An example of the new log line format:

skd:0000:00:09.0: Driver state STARTING(3)=>ONLINE(4)

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove useless barrier() calls
Bart Van Assche [Thu, 17 Aug 2017 20:12:57 +0000 (13:12 -0700)]
skd: Remove useless barrier() calls

The purpose of barrier() is to prevent reordering by the compiler.
Since the compiler does not reorder calls to non-pure functions,
remove the barrier() calls from skd_reg_{read,write}{32,64}().

Since pr_debug() is able to report file name and line number
information, remove __FILE__ and __LINE__ from the pr_debug() calls.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove a set-but-not-used variable from struct skd_device
Bart Van Assche [Thu, 17 Aug 2017 20:12:56 +0000 (13:12 -0700)]
skd: Remove a set-but-not-used variable from struct skd_device

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove set-but-not-used local variables
Bart Van Assche [Thu, 17 Aug 2017 20:12:55 +0000 (13:12 -0700)]
skd: Remove set-but-not-used local variables

These variables have been detected by building with W=1. Declare
'acc' as __maybe_unused because most access_ok() implementations
ignore their first argument. This patch does not change any
functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Fix a function name in a comment
Bart Van Assche [Thu, 17 Aug 2017 20:12:54 +0000 (13:12 -0700)]
skd: Fix a function name in a comment

There is no function skd_completion_posted_isr() in the skd driver
but there is a function called skd_isr_completion_posted(). Fix
the function name in the comment.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Fix spelling in a source code comment
Bart Van Assche [Thu, 17 Aug 2017 20:12:53 +0000 (13:12 -0700)]
skd: Fix spelling in a source code comment

Change "ptimal" into "optimal" and remove the misleading reference
to sysfs.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Avoid that gcc 7 warns about fall-through when building with W=1
Bart Van Assche [Thu, 17 Aug 2017 20:12:52 +0000 (13:12 -0700)]
skd: Avoid that gcc 7 warns about fall-through when building with W=1

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove unnecessary blank lines
Bart Van Assche [Thu, 17 Aug 2017 20:12:51 +0000 (13:12 -0700)]
skd: Remove unnecessary blank lines

This patch does not change any functionality but makes the skd
driver source code more uniform with that of other kernel drivers.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove ESXi code
Bart Van Assche [Thu, 17 Aug 2017 20:12:50 +0000 (13:12 -0700)]
skd: Remove ESXi code

Since the code guarded by #ifdef SKD_VMK_POLL_HANDLER / #endif
is never built on Linux systems, remove it.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Remove unneeded #include directives
Bart Van Assche [Thu, 17 Aug 2017 20:12:49 +0000 (13:12 -0700)]
skd: Remove unneeded #include directives

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Update maintainer information
Bart Van Assche [Thu, 17 Aug 2017 20:12:48 +0000 (13:12 -0700)]
skd: Update maintainer information

E-mails sent to support@stec-inc.com bounce. Hence remove that
e-mail address from the driver. Add an entry to the MAINTAINERS
file instead.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Switch to GPLv2
Bart Van Assche [Thu, 17 Aug 2017 20:12:47 +0000 (13:12 -0700)]
skd: Switch to GPLv2

This change does not affect any skd driver version derived from a
dual licensed code base but makes all code derived from future
upstream skd driver versions GPLv2 only.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Submit requests to firmware before triggering the doorbell
Bart Van Assche [Thu, 17 Aug 2017 20:12:46 +0000 (13:12 -0700)]
skd: Submit requests to firmware before triggering the doorbell

Ensure that the members of struct skd_msg_buf have been transferred
to the PCIe adapter before the doorbell is triggered. This patch
avoids that I/O fails sporadically and that the following error
message is reported:

(skd0:STM000196603:[0000:00:09.0]): Completion mismatch comp_id=0x0000 skreq=0x0400 new=0x0000

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoskd: Avoid that module unloading triggers a use-after-free
Bart Van Assche [Thu, 17 Aug 2017 20:12:45 +0000 (13:12 -0700)]
skd: Avoid that module unloading triggers a use-after-free

Since put_disk() triggers a disk_release() call and since that
last function calls blk_put_queue() if disk->queue != NULL, clear
the disk->queue pointer before calling put_disk(). This avoids
that unloading the skd kernel module triggers the following
use-after-free:

WARNING: CPU: 8 PID: 297 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80
refcount_t: underflow; use-after-free.
CPU: 8 PID: 297 Comm: kworker/8:1 Not tainted 4.11.10-300.fc26.x86_64 #1
Workqueue: events work_for_cpu_fn
Call Trace:
 dump_stack+0x63/0x84
 __warn+0xcb/0xf0
 warn_slowpath_fmt+0x5a/0x80
 refcount_sub_and_test+0x70/0x80
 refcount_dec_and_test+0x11/0x20
 kobject_put+0x1f/0x50
 blk_put_queue+0x15/0x20
 disk_release+0xae/0xf0
 device_release+0x32/0x90
 kobject_release+0x67/0x170
 kobject_put+0x2b/0x50
 put_disk+0x17/0x20
 skd_destruct+0x5c/0x890 [skd]
 skd_pci_probe+0x124d/0x13a0 [skd]
 local_pci_probe+0x42/0xa0
 work_for_cpu_fn+0x14/0x20
 process_one_work+0x19e/0x470
 worker_thread+0x1dc/0x4a0
 kthread+0x125/0x140
 ret_from_fork+0x25/0x30

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: Relax a check in blk_start_queue()
Bart Van Assche [Thu, 17 Aug 2017 20:12:44 +0000 (13:12 -0700)]
block: Relax a check in blk_start_queue()

Calling blk_start_queue() from interrupt context with the queue
lock held and without disabling IRQs, as the skd driver does, is
safe. This patch avoids that loading the skd driver triggers the
following warning:

WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0
RIP: 0010:blk_start_queue+0x84/0xa0
Call Trace:
 skd_unquiesce_dev+0x12a/0x1d0 [skd]
 skd_complete_internal+0x1e7/0x5a0 [skd]
 skd_complete_other+0xc2/0xd0 [skd]
 skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd]
 skd_isr+0x14f/0x180 [skd]
 irq_forced_thread_fn+0x2a/0x70
 irq_thread+0x144/0x1a0
 kthread+0x125/0x140
 ret_from_fork+0x2a/0x40

Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Andrew Morton <akpm@osdl.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoxen-blkfront: Avoid that gcc 7 warns about fall-through when building with W=1
Bart Van Assche [Thu, 17 Aug 2017 23:23:11 +0000 (16:23 -0700)]
xen-blkfront: Avoid that gcc 7 warns about fall-through when building with W=1

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monn303251 <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoxen-blkback: Avoid that gcc 7 warns about fall-through when building with W=1
Bart Van Assche [Thu, 17 Aug 2017 23:23:10 +0000 (16:23 -0700)]
xen-blkback: Avoid that gcc 7 warns about fall-through when building with W=1

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monn303251 <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoxen-blkback: Fix indentation
Bart Van Assche [Thu, 17 Aug 2017 23:23:09 +0000 (16:23 -0700)]
xen-blkback: Fix indentation

Avoid that smatch reports the following warning when building with
C=2 CHECK="smatch -p=kernel":

drivers/block/xen-blkback/blkback.c:710 xen_blkbk_unmap_prepare() warn: inconsistent indenting

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roger Pau Monn303251 <roger.pau@citrix.com>
Cc: xen-devel@lists.xenproject.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agovirtio_blk: Use blk_rq_is_scsi()
Bart Van Assche [Thu, 17 Aug 2017 23:23:08 +0000 (16:23 -0700)]
virtio_blk: Use blk_rq_is_scsi()

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoide-floppy: Use blk_rq_is_scsi()
Bart Van Assche [Thu, 17 Aug 2017 23:23:07 +0000 (16:23 -0700)]
ide-floppy: Use blk_rq_is_scsi()

This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: linux-ide@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agogenhd: Annotate all part and part_tbl pointer dereferences
Bart Van Assche [Thu, 17 Aug 2017 23:23:06 +0000 (16:23 -0700)]
genhd: Annotate all part and part_tbl pointer dereferences

Annotate gendisk.part_tbl and disk_part_tbl.part dereferences with
rcu_dereference_protected(). This patch does not change the behavior
of the modified code but ensures that sparse does not complain about
disk->part_tbl manipulations nor about part_tbl->part accesses.
Additionally, improve documentation of the locking requirements of
the modified functions.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq-debugfs: Declare a local symbol static
Bart Van Assche [Thu, 17 Aug 2017 23:23:04 +0000 (16:23 -0700)]
blk-mq-debugfs: Declare a local symbol static

This was detected by sparse.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: Make blk_mq_reinit_tagset() calls easier to read
Bart Van Assche [Thu, 17 Aug 2017 23:23:03 +0000 (16:23 -0700)]
blk-mq: Make blk_mq_reinit_tagset() calls easier to read

Since blk_mq_ops.reinit_request is only called from inside
blk_mq_reinit_tagset(), make this function pointer an argument of
blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops.
This patch does not change any functionality but makes
blk_mq_reinit_tagset() calls easier to read and to analyze.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: James Smart <james.smart@broadcom.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: Unexport blk_queue_end_tag()
Bart Van Assche [Thu, 17 Aug 2017 23:23:01 +0000 (16:23 -0700)]
block: Unexport blk_queue_end_tag()

This function is only used inside the block layer core. Hence
unexport it.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: Fix two comments that refer to .queue_rq() return values
Bart Van Assche [Thu, 17 Aug 2017 23:23:00 +0000 (16:23 -0700)]
block: Fix two comments that refer to .queue_rq() return values

Since patch "blk-mq: switch .queue_rq return value to blk_status_t"
.queue_rq() returns a BLK_STS_* value instead of a BLK_MQ_RQ_*
value. Hence refer to the former in comments about .queue_rq()
return values.

Fixes: commit 39a70c76b89b ("blk-mq: clarify dispatch may not be drained/blocked by stopping queue")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonbd: change the default nbd partitions
Josef Bacik [Mon, 14 Aug 2017 18:56:16 +0000 (18:56 +0000)]
nbd: change the default nbd partitions

There's no reason to have partitions disabled for nbd by default, it costs us
nothing to have it enabled and is just confusing/obnoxious to users who try to
use partitions with nbd.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonbd: allow device creation at a specific index
Josef Bacik [Mon, 14 Aug 2017 18:25:33 +0000 (18:25 +0000)]
nbd: allow device creation at a specific index

If users really want to use a particular index for their nbd device and it
doesn't already exist there's no reason we can't just create it for them.  Do
this instead of erroring out.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoloop: fix to a race condition due to the early registration of device
Anton Volkov [Mon, 7 Aug 2017 12:37:50 +0000 (15:37 +0300)]
loop: fix to a race condition due to the early registration of device

The early device registration made possible a race leading to allocations
of disks with wrong minors.

This patch moves the device registration further down the loop_init
function to make the race infeasible.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Anton Volkov <avolkov@ispras.ru>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agocfq: Give a chance for arming slice idle timer in case of group_idle
Ritesh Harjani [Wed, 9 Aug 2017 12:58:32 +0000 (18:28 +0530)]
cfq: Give a chance for arming slice idle timer in case of group_idle

In below scenario blkio cgroup does not work as per their assigned
weights :-
1. When the underlying device is nonrotational with a single HW queue
with depth of >= CFQ_HW_QUEUE_MIN
2. When the use case is forming two blkio cgroups cg1(weight 1000) &
cg2(wight 100) and two processes(file1 and file2) doing sync IO in
their respective blkio cgroups.

For above usecase result of fio (without this patch):-
file1: (groupid=0, jobs=1): err= 0: pid=685: Thu Jan  1 19:41:49 1970
  write: IOPS=1315, BW=41.1MiB/s (43.1MB/s)(1024MiB/24906msec)
<...>
file2: (groupid=0, jobs=1): err= 0: pid=686: Thu Jan  1 19:41:49 1970
  write: IOPS=1295, BW=40.5MiB/s (42.5MB/s)(1024MiB/25293msec)
<...>
// both the process BW is equal even though they belong to diff.
cgroups with weight of 1000(cg1) and 100(cg2)

In above case (for non rotational NCQ devices),
as soon as the request from cg1 is completed and even
though it is provided with higher set_slice=10, because of CFQ
algorithm when the driver tries to fetch the request, CFQ expires
this group without providing any idle time nor weight priority
and schedules another cfq group (in this case cg2).
And thus both cfq groups(cg1 & cg2) keep alternating to get the
disk time and hence loses the cgroup weight based scheduling.

Below patch gives a chance to cfq algorithm (cfq_arm_slice_timer)
to arm the slice timer in case group_idle is enabled.
In case if group_idle is also not required (including for nonrotational
NCQ drives), we need to explicitly set group_idle = 0 from sysfs for
such cases.

With this patch result of fio(for above usecase) :-
file1: (groupid=0, jobs=1): err= 0: pid=690: Thu Jan  1 00:06:08 1970
  write: IOPS=1706, BW=53.3MiB/s (55.9MB/s)(1024MiB/19197msec)
<..>
file2: (groupid=0, jobs=1): err= 0: pid=691: Thu Jan  1 00:06:08 1970
  write: IOPS=1043, BW=32.6MiB/s (34.2MB/s)(1024MiB/31401msec)
<..>
// In this processes BW is as per their respective cgroups weight.

Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock, bfq: boost throughput with flash-based non-queueing devices
Paolo Valente [Fri, 4 Aug 2017 05:35:11 +0000 (07:35 +0200)]
block, bfq: boost throughput with flash-based non-queueing devices

When a queue associated with a process remains empty, there are cases
where throughput gets boosted if the device is idled to await the
arrival of a new I/O request for that queue. Currently, BFQ assumes
that one of these cases is when the device has no internal queueing
(regardless of the properties of the I/O being served). Unfortunately,
this condition has proved to be too general. So, this commit refines it
as "the device has no internal queueing and is rotational".

This refinement provides a significant throughput boost with random
I/O, on flash-based storage without internal queueing. For example, on
a HiKey board, throughput increases by up to 125%, growing, e.g., from
6.9MB/s to 15.6MB/s with two or three random readers in parallel.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock,bfq: refactor device-idling logic
Paolo Valente [Fri, 4 Aug 2017 05:35:10 +0000 (07:35 +0200)]
block,bfq: refactor device-idling logic

The logic that decides whether to idle the device is scattered across
three functions. Almost all of the logic is in the function
bfq_bfqq_may_idle, but (1) part of the decision is made in
bfq_update_idle_window, and (2) the function bfq_bfqq_must_idle may
switch off idling regardless of the output of bfq_bfqq_may_idle. In
addition, both bfq_update_idle_window and bfq_bfqq_must_idle make
their decisions as a function of parameters that are used, for similar
purposes, also in bfq_bfqq_may_idle. This commit addresses these
issues by moving all the logic into bfq_bfqq_may_idle.

Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: remove unused syncfull/asyncfull queue flags
Jens Axboe [Thu, 10 Aug 2017 14:25:38 +0000 (08:25 -0600)]
block: remove unused syncfull/asyncfull queue flags

We haven't used these in years, but somehow the definitions still
remained. Kill them, and renumber the QUEUE_FLAG_ space. We had
a hole in the beginning of the space, too.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: enable checking two part inflight counts at the same time
Jens Axboe [Tue, 8 Aug 2017 23:53:33 +0000 (17:53 -0600)]
blk-mq: enable checking two part inflight counts at the same time

Modify blk_mq_in_flight() to count both a partition and root at
the same time. Then we only have to call it once, instead of
potentially looping the tags twice.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: provide internal in-flight variant
Jens Axboe [Tue, 8 Aug 2017 23:51:45 +0000 (17:51 -0600)]
blk-mq: provide internal in-flight variant

We don't have to inc/dec some counter, since we can just
iterate the tags. That makes inc/dec a noop, but means we
have to iterate busy tags to get an in-flight count.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: make part_in_flight() take an array of two ints
Jens Axboe [Tue, 8 Aug 2017 23:49:47 +0000 (17:49 -0600)]
block: make part_in_flight() take an array of two ints

Instead of returning the count that matches the partition, pass
in an array of two ints. Index 0 will be filled with the inflight
count for the partition in question, and index 1 will filled
with the root inflight count, if the partition passed in is not the
root.

This is in preparation for being able to calculate both in one
go.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: pass in queue to inflight accounting
Jens Axboe [Sat, 1 Jul 2017 03:55:08 +0000 (21:55 -0600)]
block: pass in queue to inflight accounting

No functional change in this patch, just in preparation for
basing the inflight mechanism on the queue in question.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq-tag: check for NULL rq when iterating tags
Jens Axboe [Fri, 4 Aug 2017 19:37:03 +0000 (13:37 -0600)]
blk-mq-tag: check for NULL rq when iterating tags

Since we introduced blk-mq-sched, the tags->rqs[] array has been
dynamically assigned. So we need to check for NULL when iterating,
since there's a window of time where the bit is set, but we haven't
dynamically assigned the tags->rqs[] array position yet.

This is perfectly safe, since the memory backing of the request is
never going away while the device is alive.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agodm-crypt: don't mess with BIP_BLOCK_INTEGRITY
Christoph Hellwig [Wed, 9 Aug 2017 15:48:09 +0000 (17:48 +0200)]
dm-crypt: don't mess with BIP_BLOCK_INTEGRITY

This flag is never set right after calling bio_integrity_alloc,
so don't clear it and confuse the reader.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agobio-integrity: move the bio integrity profile check earlier in bio_integrity_prep
Christoph Hellwig [Wed, 9 Aug 2017 15:48:08 +0000 (17:48 +0200)]
bio-integrity: move the bio integrity profile check earlier in bio_integrity_prep

This makes the code more obvious, and moves the most likely branch first
in the function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonull_blk: make sure submit_queues > 0
weiping zhang [Wed, 2 Aug 2017 16:27:37 +0000 (00:27 +0800)]
null_blk: make sure submit_queues > 0

set submit_queues to 1 by default, and make sure it's value > 0.

Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agonull_blk: simplify logic for use_per_node_hctx
weiping zhang [Wed, 2 Aug 2017 16:26:39 +0000 (00:26 +0800)]
null_blk: simplify logic for use_per_node_hctx

make sure submit_queues equal nr_online_nodes.

Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: Add comment to submit_bio_wait()
Jan Kara [Wed, 2 Aug 2017 08:25:21 +0000 (10:25 +0200)]
block: Add comment to submit_bio_wait()

submit_bio_wait() does not consume bio reference. Add comment about
that.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: add warning to __blk_mq_run_hw_queue() for ints disabled
Jens Axboe [Tue, 1 Aug 2017 15:28:24 +0000 (09:28 -0600)]
blk-mq: add warning to __blk_mq_run_hw_queue() for ints disabled

We recently had a bug in the IPR SCSI driver, where it would end up
making the SCSI mid layer run the mq hardware queue with interrupts
disabled. This isn't legal, since the software queue locking relies
on never being grabbed from interrupt context. Additionally, drivers
that set BLK_MQ_F_BLOCKING may schedule from this context.

Add a WARN_ON_ONCE() to catch bad users up front.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblk-mq: blk_mq_requeue_work() doesn't need to save IRQ flags
Jens Axboe [Thu, 27 Jul 2017 14:03:57 +0000 (08:03 -0600)]
blk-mq: blk_mq_requeue_work() doesn't need to save IRQ flags

We know we're in process context, so don't bother using the
IRQ safe versions of the spin lock.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: DAC960: shut up format-overflow warning
Arnd Bergmann [Fri, 14 Jul 2017 12:07:11 +0000 (14:07 +0200)]
block: DAC960: shut up format-overflow warning

gcc-7 points out that a large controller number would overflow the
string length for the procfs name and the firmware version string:

drivers/block/DAC960.c: In function 'DAC960_Probe':
drivers/block/DAC960.c:6591:38: warning: 'sprintf' may write a terminating nul past the end of the destination [-Wformat-overflow=]
drivers/block/DAC960.c: In function 'DAC960_V1_ReadControllerConfiguration':
drivers/block/DAC960.c:1681:40: error: '%02d' directive writing between 2 and 3 bytes into a region of size between 2 and 5 [-Werror=format-overflow=]
drivers/block/DAC960.c:1681:40: note: directive argument in the range [0, 255]
drivers/block/DAC960.c:1681:3: note: 'sprintf' output between 10 and 14 bytes into a destination of size 12

Both of these seem appropriately sized, and using snprintf()
instead of sprintf() improves this by ensuring that even
incorrect data won't cause undefined behavior here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 years agoblock: use standard blktrace API to output cgroup info for debug notes
Shaohua Li [Wed, 12 Jul 2017 18:49:56 +0000 (11:49 -0700)]
block: use standard blktrace API to output cgroup info for debug notes

Currently cfq/bfq/blk-throttle output cgroup info in trace in their own
way. Now we have standard blktrace API for this, so convert them to use
it.

Note, this changes the behavior a little bit. cgroup info isn't output
by default, we only do this with 'blk_cgroup' option enabled. cgroup
info isn't output as a string by default too, we only do this with
'blk_cgname' option enabled. Also cgroup info is output in different
position of the note string. I think these behavior changes aren't a big
issue (actually we make trace data shorter which is good), since the
blktrace note is solely for debugging.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>