OSDN Git Service

qmiga/qemu.git
5 years ago qcow2: Increase the default upper limit on the L2 cache size
Leonid Bloch [Wed, 26 Sep 2018 16:04:44 +0000 (19:04 +0300)]
qcow2: Increase the default upper limit on the L2 cache size

The upper limit on the L2 cache size is increased from 1 MB to 32 MB
on Linux platforms, and to 8 MB on other platforms (this difference is
caused by the ability to set intervals for cache cleaning on Linux
platforms only).

This is done in order to allow default full coverage with the L2 cache
for images of up to 256 GB in size (was 8 GB). Note, that only the
needed amount to cover the full image is allocated. The value which is
changed here is just the upper limit on the L2 cache size, beyond which
it will not grow, even if the size of the image will require it to.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago qcow2: Assign the L2 cache relatively to the image size
Leonid Bloch [Wed, 26 Sep 2018 16:04:43 +0000 (19:04 +0300)]
qcow2: Assign the L2 cache relatively to the image size

Sufficient L2 cache can noticeably improve the performance when using
large images with frequent I/O.

Previously, unless 'cache-size' was specified and was large enough, the
L2 cache was set to a certain size without taking the virtual image size
into account.

Now, the L2 cache assignment is aware of the virtual size of the image,
and will cover the entire image, unless the cache size needed for that is
larger than a certain maximum. This maximum is set to 1 MB by default
(enough to cover an 8 GB image with the default cluster size) but can
be increased or decreased using the 'l2-cache-size' option. This option
was previously documented as the *maximum* L2 cache size, and this patch
makes it behave as such, instead of as a constant size. Also, the
existing option 'cache-size' can limit the sum of both L2 and refcount
caches, as previously.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago qcow2: Avoid duplication in setting the refcount cache size
Leonid Bloch [Wed, 26 Sep 2018 16:04:42 +0000 (19:04 +0300)]
qcow2: Avoid duplication in setting the refcount cache size

The refcount cache size does not need to be set to its minimum value in
read_cache_sizes(), as it is set to at least its minimum value in
qcow2_update_options_prepare().

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago qcow2: Make sizes more humanly readable
Leonid Bloch [Wed, 26 Sep 2018 16:04:41 +0000 (19:04 +0300)]
qcow2: Make sizes more humanly readable

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago include: Add a lookup table of sizes
Leonid Bloch [Wed, 26 Sep 2018 16:04:40 +0000 (19:04 +0300)]
include: Add a lookup table of sizes

Adding a lookup table for the powers of two, with the appropriate size
prefixes. This is needed when a size has to be stringified, in which
case something like '(1 * KiB)' would become a literal '(1 * (1L << 10))'
string. Powers of two are used very often for sizes, so such a table
will also make it easier and more intuitive to write them.

This table is generated using the following AWK script:

    BEGIN {
        suffix="KMGTPE";
        for(i=10; i<64; i++) {
            val=2**i;
            s=substr(suffix, int(i/10), 1);
            n=2**(i%10);
            pad=21-int(log(n)/log(10));
            printf("#define S_%d%siB %*d\n", n, s, pad, val);
        }
    }

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago qcow2: Options' documentation fixes
Leonid Bloch [Wed, 26 Sep 2018 16:04:39 +0000 (19:04 +0300)]
qcow2: Options' documentation fixes

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago block: Allow changing 'detect-zeroes' on reopen
Alberto Garcia [Thu, 6 Sep 2018 09:37:09 +0000 (12:37 +0300)]
block: Allow changing 'detect-zeroes' on reopen

'detect-zeroes' is one of the basic BlockdevOptions available for all
drivers, but it's not handled by bdrv_reopen_prepare(), so any attempt
to change it results in an error:

   (qemu) qemu-io virtio0 "reopen -o detect-zeroes=on"
   Cannot change the option 'detect-zeroes'

Since there's no reason why we shouldn't allow changing it and the
implementation is simple let's just do it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago block: Allow changing 'discard' on reopen
Alberto Garcia [Thu, 6 Sep 2018 09:37:08 +0000 (12:37 +0300)]
block: Allow changing 'discard' on reopen

'discard' is one of the basic BlockdevOptions available for all
drivers, but it's not handled by bdrv_reopen_prepare() so any attempt
to change it results in an error:

   (qemu) qemu-io virtio0 "reopen -o discard=on"
   Cannot change the option 'discard'

Since there's no reason why we shouldn't allow changing it and the
implementation is simple let's just do it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago file-posix: Forbid trying to change unsupported options during reopen
Alberto Garcia [Thu, 6 Sep 2018 09:37:07 +0000 (12:37 +0300)]
file-posix: Forbid trying to change unsupported options during reopen

The file-posix code is used for the "file", "host_device" and
"host_cdrom" drivers, and it allows reopening images. However the only
option that is actually processed is "x-check-cache-dropped", and
changes in all other options (e.g. "filename") are silently ignored:

   (qemu) qemu-io virtio0 "reopen -o file.filename=no-such-file"

While we could allow changing some of the other options, let's keep
things as they are for now but return an error if the user tries to
change any of them.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago block: Forbid trying to change unsupported options during reopen
Alberto Garcia [Thu, 6 Sep 2018 09:37:06 +0000 (12:37 +0300)]
block: Forbid trying to change unsupported options during reopen

The bdrv_reopen_prepare() function checks all options passed to each
BlockDriverState (in the reopen_state->options QDict) and makes all
necessary preparations to apply the option changes requested by the
user.

Options are removed from the QDict as they are processed, so at the
end of bdrv_reopen_prepare() only the options that can't be changed
are left. Then a loop goes over all remaining options and verifies
that the old and new values are identical, returning an error if
they're not.

The problem is that at the moment there are options that are removed
from the QDict although they can't be changed. The consequence of this
is any modification to any of those options is silently ignored:

   (qemu) qemu-io virtio0 "reopen -o discard=on"

This happens when all options from bdrv_runtime_opts are removed
from the QDict but then only a few of them are processed. Since
it's especially important that "node-name" and "driver" are not
changed, the code puts them back into the QDict so they are checked
at the end of the function. Instead of putting only those two options
back into the QDict, this patch puts all unprocessed options using
qemu_opts_to_qdict().

update_flags_from_options() also needs to be modified to prevent
BDRV_OPT_CACHE_NO_FLUSH, BDRV_OPT_CACHE_DIRECT and BDRV_OPT_READ_ONLY
from going back to the QDict.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
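
An added model of the check described in the commit message above (not QEMU code, and not part of the patch): options are absorbed from the request, some are applied, and whatever is put back must match the current value. The dictionary type, the option subset, the sample values and all function names are assumptions made for the illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Simplified model, not QEMU code: a small string dictionary stands in for
 * the QDict; the option set and values are constructed for the demo. */
#define MAX_OPTS 8
typedef struct { const char *key, *value; } Opt;
typedef struct { Opt items[MAX_OPTS]; size_t n; } OptDict;

static int dict_find(const OptDict *d, const char *key) {
    for (size_t i = 0; i < d->n; i++)
        if (strcmp(d->items[i].key, key) == 0) return (int)i;
    return -1;
}

static const char *dict_get(const OptDict *d, const char *key) {
    int i = dict_find(d, key);
    return i < 0 ? NULL : d->items[i].value;
}

static void dict_del(OptDict *d, const char *key) {
    int i = dict_find(d, key);
    if (i >= 0) d->items[i] = d->items[--d->n];
}

static void dict_put(OptDict *d, const char *key, const char *value) {
    dict_del(d, key);                    /* replace any existing entry */
    if (d->n < MAX_OPTS) d->items[d->n++] = (Opt){ key, value };
}

/* Options the generic layer absorbs from the request before processing. */
static const char *const runtime_opts[] = {
    "node-name", "driver", "discard", "detect-zeroes", "read-only",
};
#define N_RUNTIME (sizeof(runtime_opts) / sizeof(runtime_opts[0]))

/* The only absorbed option this model actually applies on reopen. */
static bool is_processed(const char *k) { return !strcmp(k, "read-only"); }

/* The two options the old code put back for the final check. */
static bool is_identity(const char *k) {
    return !strcmp(k, "node-name") || !strcmp(k, "driver");
}

/* Returns 0 if accepted, -1 if an unchangeable option differs (*rejected
 * names it). put_back_all=false is the old behaviour, true is the fix. */
static int reopen_prepare(const OptDict *current, OptDict requested,
                          bool put_back_all, const char **rejected) {
    OptDict absorbed = { .n = 0 };
    *rejected = NULL;
    /* 1. Absorb every runtime option out of the request. */
    for (size_t i = 0; i < N_RUNTIME; i++) {
        const char *v = dict_get(&requested, runtime_opts[i]);
        if (v) {
            dict_put(&absorbed, runtime_opts[i], v);
            dict_del(&requested, runtime_opts[i]);
        }
    }
    /* 2. Return unprocessed options so the final check can see them. */
    for (size_t i = 0; i < absorbed.n; i++) {
        const Opt *o = &absorbed.items[i];
        if (!is_processed(o->key) && (put_back_all || is_identity(o->key))) {
            dict_put(&requested, o->key, o->value);
        }
    }
    /* 3. Anything still in the request must equal the current value. */
    for (size_t i = 0; i < requested.n; i++) {
        const char *old = dict_get(current, requested.items[i].key);
        if (!old || strcmp(old, requested.items[i].value) != 0) {
            *rejected = requested.items[i].key;
            return -1;
        }
    }
    return 0;
}

/* Reopen a constructed node with one option overridden. */
static int demo_reopen(const char *key, const char *value,
                       bool put_back_all, const char **rejected) {
    OptDict current = { .n = 5, .items = {
        { "node-name", "hd0" }, { "driver", "qcow2" }, { "discard", "ignore" },
        { "detect-zeroes", "off" }, { "read-only", "off" } } };
    OptDict requested = current;       /* old options carried over ... */
    dict_put(&requested, key, value);  /* ... then the user's override */
    return reopen_prepare(&current, requested, put_back_all, rejected);
}

int main(void) {
    const char *r;
    int old_rc = demo_reopen("discard", "on", false, &r);
    int new_rc = demo_reopen("discard", "on", true, &r);
    printf("old=%d new=%d rejected=%s\n", old_rc, new_rc, r ? r : "-");
    return 0;
}
```

With `put_back_all` false the `discard=on` request is accepted and dropped without effect; with it true the same request is rejected and the option is named.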
5 years ago block: Allow child references on reopen
Alberto Garcia [Thu, 6 Sep 2018 09:37:05 +0000 (12:37 +0300)]
block: Allow child references on reopen

In the previous patches we removed all child references from
bs->{options,explicit_options} because keeping them is useless and
wrong.

Because of this, any attempt to reopen a BlockDriverState using a
child reference as one of its options would result in a failure,
because bdrv_reopen_prepare() would detect that there's a new option
(the child reference) that wasn't present in bs->options.

But passing child references on reopen can be useful. It's a way to
specify a BDS's child without having to pass recursively all of the
child's options, and if the reference points to a different BDS then
this can allow us to replace the child.

However, replacing the child is something that needs to be implemented
case by case and only when it makes sense. For now, this patch allows
passing a child reference as long as it points to the current child of
the BlockDriverState.

It's also important to remember that, as a consequence of the
previous patches, this child reference will be removed from
bs->{options,explicit_options} after the reopening has been completed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago block: Don't look for child references in append_open_options()
Alberto Garcia [Thu, 6 Sep 2018 09:37:04 +0000 (12:37 +0300)]
block: Don't look for child references in append_open_options()

In the previous patch we removed child references from bs->options, so
there's no need to look for them here anymore.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago block: Remove child references from bs->{options,explicit_options}
Alberto Garcia [Thu, 6 Sep 2018 09:37:03 +0000 (12:37 +0300)]
block: Remove child references from bs->{options,explicit_options}

Block drivers allow opening their children using a reference to an
existing BlockDriverState. These references remain stored in the
'options' and 'explicit_options' QDicts, but we don't need to keep
them once everything is open.

What is more important, these values can become wrong if the children
change:

    $ qemu-img create -f qcow2 hd0.qcow2 10M
    $ qemu-img create -f qcow2 hd1.qcow2 10M
    $ qemu-img create -f qcow2 hd2.qcow2 10M
    $ $QEMU -drive if=none,file=hd0.qcow2,node-name=hd0 \
            -drive if=none,file=hd1.qcow2,node-name=hd1,backing=hd0 \
            -drive file=hd2.qcow2,node-name=hd2,backing=hd1

After this hd2 has hd1 as its backing file. Now let's remove it using
block_stream:

    (qemu) block_stream hd2 0 hd0.qcow2

Now hd0 is the backing file of hd2, but hd2's options QDicts still
contain backing=hd1.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago file-posix: x-check-cache-dropped should default to false on reopen
Alberto Garcia [Thu, 6 Sep 2018 09:37:02 +0000 (12:37 +0300)]
file-posix: x-check-cache-dropped should default to false on reopen

The default value of x-check-cache-dropped is false. There's no reason
to use the previous value as a default in raw_reopen_prepare() because
bdrv_reopen_queue_child() already takes care of putting the old
options in the BDRVReopenState.options QDict.

If x-check-cache-dropped was previously set but is now missing from
the reopen QDict then it should be reset to false.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago qemu-io: Fix writethrough check in reopen
Alberto Garcia [Thu, 6 Sep 2018 09:37:01 +0000 (12:37 +0300)]
qemu-io: Fix writethrough check in reopen

"qemu-io reopen" doesn't allow changing the writethrough setting of
the cache, but the check is wrong, causing an error even on a simple
reopen with the default parameters:

   $ qemu-img create -f qcow2 hd.qcow2 1M
   $ qemu-system-x86_64 -monitor stdio -drive if=virtio,file=hd.qcow2
   (qemu) qemu-io virtio0 reopen
   Cannot change cache.writeback: Device attached

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago file-posix: Include filename in locking error message
Fam Zheng [Tue, 25 Sep 2018 05:05:01 +0000 (13:05 +0800)]
file-posix: Include filename in locking error message

Image locking errors happening at device initialization time don't say
which file cannot be locked, for instance,

    -device scsi-disk,drive=drive-1: Failed to get shared "write" lock
    Is another process using the image?

could refer to either the overlay image or its backing image.

Hoist the error_append_hint to the caller of raw_check_lock_bytes where
file name is known, and include it in the error hint.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
5 years ago Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180926' into staging
Peter Maydell [Fri, 28 Sep 2018 17:56:09 +0000 (18:56 +0100)]
Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180926' into staging

Queued tcg patches

# gpg: Signature made Wed 26 Sep 2018 19:27:22 BST
# gpg:                using RSA key 64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20180926:
  tcg/i386: fix vector operations on 32-bit hosts
  qht-bench: add -p flag to precompute hash values
  qht: constify arguments to some internal functions
  qht: constify qht_statistics_init
  qht: constify qht_lookup
  qht: fix comment in qht_bucket_remove_entry
  qht: drop ht argument from qht iterators
  test-qht: speed up + test qht_resize
  test-qht: test deletion of the last entry in a bucket
  test-qht: test removal of non-existent entries
  test-qht: test qht_iter_remove
  qht: add qht_iter_remove
  qht: remove unused map param from qht_remove__locked

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20180926a' into...
Peter Maydell [Fri, 28 Sep 2018 16:07:23 +0000 (17:07 +0100)]
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20180926a' into staging

Migration pull 2018-09-26

This supersedes Juan's pull from the 13th

# gpg: Signature made Wed 26 Sep 2018 18:07:30 BST
# gpg:                using RSA key 0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20180926a:
  migration/ram.c: Avoid taking address of fields in packed MultiFDInit_t struct
  migration: fix the compression code
  migration: fix QEMUFile leak
  tests/migration: Speed up the test on ppc64
  migration: cleanup in error paths in loadvm
  migration/postcopy: Clear have_listen_thread
  tests/migration: Add migration-test header file
  tests/migration: Support cross compilation in generating boot header file
  tests/migration: Convert x86 boot block compilation script into Makefile
  migration: use save_page_use_compression in flush_compressed_data
  migration: show the statistics of compression
  migration: do not flush_compressed_data at the end of iteration
  Add a hint message to loadvm and exits on failure
  migration: handle the error condition properly
  migration: fix calculating xbzrle_counters.cache_miss_rate
  migration/rdma: Fix uninitialised rdma_return_path

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20180926' into staging
Peter Maydell [Fri, 28 Sep 2018 13:17:12 +0000 (14:17 +0100)]
Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20180926' into staging

pull-seccomp-20180926

# gpg: Signature made Wed 26 Sep 2018 14:20:06 BST
# gpg:                using RSA key DF32E7C0F0FFF9A2
# gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <otubo@redhat.com>"
# Primary key fingerprint: D67E 1B50 9374 86B4 0723  DBAB DF32 E7C0 F0FF F9A2

* remotes/otubo/tags/pull-seccomp-20180926:
  seccomp: check TSYNC host capability

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago Merge remote-tracking branch 'remotes/famz/tags/staging-pull-request' into staging
Peter Maydell [Fri, 28 Sep 2018 12:35:26 +0000 (13:35 +0100)]
Merge remote-tracking branch 'remotes/famz/tags/staging-pull-request' into staging

Block and testing patches

- Paolo's AIO fixes.
- VMDK streamOptimized corner case fix
- VM testing improvement on -cpu

# gpg: Signature made Wed 26 Sep 2018 03:54:08 BST
# gpg:                using RSA key CA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/staging-pull-request:
  vmdk: align end of file to a sector boundary
  tests/vm: Use -cpu max rather than -cpu host
  aio-posix: do skip system call if ctx->notifier polling succeeds
  aio-posix: compute timeout before polling
  aio-posix: fix concurrent access to poll_disable_cnt

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-3.1-pull-request...
Peter Maydell [Fri, 28 Sep 2018 10:22:36 +0000 (11:22 +0100)]
Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-3.1-pull-request' into staging

- some fixes for setrlimit() and write()
- fixes ELF loader when host page size is greater than target page size
- add SO_LINGER to getsockopt()/setsockopt()
- move TargetFdTrans from syscall.c
  v2: add "#include <linux/netlink.h>" in linux-user/fd-trans.c

# gpg: Signature made Tue 25 Sep 2018 21:51:13 BST
# gpg:                using RSA key F30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>"
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>"
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/linux-user-for-3.1-pull-request:
  linux-user: do setrlimit selectively
  linux-user: write(fd, NULL, 0) parity with linux's treatment of same
  linux-user: elf: mmap all the target-pages of hostpage for data segment
  linux-user: add SO_LINGER to {g,s}etsockopt
  linux-user: move TargetFdTrans functions to their own file

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago migration/ram.c: Avoid taking address of fields in packed MultiFDInit_t struct
Peter Maydell [Tue, 25 Sep 2018 16:19:24 +0000 (17:19 +0100)]
migration/ram.c: Avoid taking address of fields in packed MultiFDInit_t struct

Taking the address of a field in a packed struct is a bad idea, because
it might not be actually aligned enough for that pointer type (and
thus cause a crash on dereference on some host architectures). Newer
versions of clang warn about this:

migration/ram.c:651:19: warning: taking address of packed member 'magic' of class or structure 'MultiFDInit_t' may result in an unaligned pointer value [-Waddress-of-packed-member]
migration/ram.c:652:19: warning: taking address of packed member 'version' of class or structure 'MultiFDInit_t' may result in an unaligned pointer value [-Waddress-of-packed-member]
migration/ram.c:737:19: warning: taking address of packed member 'magic' of class or structure 'MultiFDPacket_t' may result in an unaligned pointer value [-Waddress-of-packed-member]
migration/ram.c:745:19: warning: taking address of packed member 'version' of class or structure 'MultiFDPacket_t' may result in an unaligned pointer value [-Waddress-of-packed-member]
migration/ram.c:755:19: warning: taking address of packed member 'size' of class or structure 'MultiFDPacket_t' may result in an unaligned pointer value [-Waddress-of-packed-member]

Avoid the bug by not using the "modify in place" byteswapping
functions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180925161924.7832-1-peter.maydell@linaro.org>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
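
An added illustration of the approach named in the commit message above (not code from QEMU or from the patch): a hypothetical packed header is byteswapped by value, so no pointer to a packed member is ever formed. The struct `DemoInit`, its layout, the magic value and the `demo_*` helpers are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wire header (not QEMU's MultiFDInit_t). Being packed, the
 * struct has alignment 1, and the leading byte puts magic at offset 1. */
typedef struct {
    uint8_t  id;
    uint32_t magic;     /* big-endian on the wire */
    uint32_t version;   /* big-endian on the wire */
} __attribute__((packed)) DemoInit;

#define DEMO_MAGIC 0x11223344u   /* constructed value */

/* By-value conversion: takes a copy of the big-endian field and returns the
 * host-order value. Works on either host endianness. */
static uint32_t demo_be32_to_cpu(uint32_t be)
{
    uint8_t b[4];

    memcpy(b, &be, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/*
 * The pattern the compiler warns about would look like this:
 *
 *     void demo_be32_to_cpus(uint32_t *p) { *p = demo_be32_to_cpu(*p); }
 *     demo_be32_to_cpus(&hdr.magic);    // uint32_t * into a packed struct
 *
 * The callee dereferences a plain uint32_t * and may assume 4-byte
 * alignment, which the packed member does not guarantee.
 */

/* Parse a received header. Returns 0 on success, -1 on short input or bad
 * magic. Fields are read and written through the struct itself, so the
 * compiler emits alignment-safe accesses. */
static int demo_parse_init(const uint8_t *buf, size_t len,
                           uint32_t *magic, uint32_t *version)
{
    DemoInit hdr;

    if (len < sizeof(hdr)) {
        return -1;
    }
    memcpy(&hdr, buf, sizeof(hdr));

    hdr.magic = demo_be32_to_cpu(hdr.magic);
    hdr.version = demo_be32_to_cpu(hdr.version);

    if (hdr.magic != DEMO_MAGIC) {
        return -1;
    }
    *magic = hdr.magic;
    *version = hdr.version;
    return 0;
}

/* Constructed sample: one leading pad byte so the header starts at an odd
 * offset, then id=7, magic, version=1. */
static const uint8_t demo_wire[] = {
    0xAA,
    0x07,
    0x11, 0x22, 0x33, 0x44,
    0x00, 0x00, 0x00, 0x01,
};

int main(void)
{
    uint32_t magic = 0, version = 0;
    int rc = demo_parse_init(demo_wire + 1, sizeof(demo_wire) - 1,
                             &magic, &version);

    printf("rc=%d magic=0x%08x version=%u\n",
           rc, (unsigned)magic, (unsigned)version);
    return rc == 0 ? 0 : 1;
}
```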
5 years ago migration: fix the compression code
Fei Li [Tue, 25 Sep 2018 09:14:40 +0000 (17:14 +0800)]
migration: fix the compression code

Add judgement in compress_threads_save_cleanup() to check whether the
static CompressParam *comp_param has been allocated. If not, just
return; or else segmentation fault will occur when using the NULL
comp_param's parameters.  One test case that can reproduce this is: set
the compression on and migrate to a wrong nonexistent host IP address.

Our current code does not judge before handling comp_param[idx]'s quit
and cond that whether they have been initialized. If not initialized,
"qemu_mutex_lock_impl: Assertion `mutex->initialized' failed." will
occur. Fix this by squashing the terminate_compression_threads() into
compress_threads_save_cleanup() and employing the existing judgement
condition.  One test case that can reproduce this error is: set the
compression on and fail to fully set up the default eight compression
threads in compress_threads_save_setup().

Signed-off-by: Fei Li <fli@suse.com>
Message-Id: <20180925091440.18910-1-fli@suse.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago migration: fix QEMUFile leak
Marc-André Lureau [Tue, 25 Sep 2018 09:22:45 +0000 (13:22 +0400)]
migration: fix QEMUFile leak

Spotted by ASAN while running:

$ tests/migration-test -p /x86_64/migration/postcopy/recovery

=================================================================
==18034==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 33864 byte(s) in 1 object(s) allocated from:
    #0 0x7f3da7f31e50 in calloc (/lib64/libasan.so.5+0xeee50)
    #1 0x7f3da644441d in g_malloc0 (/lib64/libglib-2.0.so.0+0x5241d)
    #2 0x55af9db15440 in qemu_fopen_channel_input /home/elmarco/src/qemu/migration/qemu-file-channel.c:183
    #3 0x55af9db15413 in channel_get_output_return_path /home/elmarco/src/qemu/migration/qemu-file-channel.c:159
    #4 0x55af9db0d4ac in qemu_file_get_return_path /home/elmarco/src/qemu/migration/qemu-file.c:78
    #5 0x55af9dad5e4f in open_return_path_on_source /home/elmarco/src/qemu/migration/migration.c:2295
    #6 0x55af9dadb3bf in migrate_fd_connect /home/elmarco/src/qemu/migration/migration.c:3111
    #7 0x55af9dae1bf3 in migration_channel_connect /home/elmarco/src/qemu/migration/channel.c:91
    #8 0x55af9daddeca in socket_outgoing_migration /home/elmarco/src/qemu/migration/socket.c:108
    #9 0x55af9e13d3db in qio_task_complete /home/elmarco/src/qemu/io/task.c:158
    #10 0x55af9e13ca03 in qio_task_thread_result /home/elmarco/src/qemu/io/task.c:89
    #11 0x7f3da643b1ca in g_idle_dispatch gmain.c:5535

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180925092245.29565-1-marcandre.lureau@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago tests/migration: Speed up the test on ppc64
Thomas Huth [Mon, 17 Sep 2018 17:12:10 +0000 (19:12 +0200)]
tests/migration: Speed up the test on ppc64

The SLOF boot process is always quite slow ... but we can speed it up
a little bit by specifying "-nodefaults" and by using the "nvramrc"
variable instead of "boot-command" (since "nvramrc" is evaluated earlier
in the SLOF boot process than "boot-command").

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1537204330-16076-1-git-send-email-thuth@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago migration: cleanup in error paths in loadvm
Dr. David Alan Gilbert [Fri, 14 Sep 2018 17:04:30 +0000 (18:04 +0100)]
migration: cleanup in error paths in loadvm

There's a couple of error paths in qemu_loadvm_state
which happen early on but after we've initialised the
load state; that needs to be cleaned up otherwise
we can hit asserts if the state gets reinitialised later.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180914170430.54271-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago migration/postcopy: Clear have_listen_thread
Dr. David Alan Gilbert [Fri, 14 Sep 2018 17:04:29 +0000 (18:04 +0100)]
migration/postcopy: Clear have_listen_thread

Clear have_listen_thread when we exit the thread.
The fallout from this was that various things thought there was
an ongoing postcopy after the postcopy had finished.

The case that failed was postcopy->savevm->loadvm.

This corresponds to RH bug https://bugzilla.redhat.com/show_bug.cgi?id=1608765

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180914170430.54271-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago tcg/i386: fix vector operations on 32-bit hosts
Roman Kapl [Fri, 24 Aug 2018 13:17:34 +0000 (15:17 +0200)]
tcg/i386: fix vector operations on 32-bit hosts

The TCG backend uses LOWREGMASK to get the low 3 bits of register numbers.
This was defined as no-op for 32-bit x86, with the assumption that we have
eight registers anyway. This assumption is not true once we have xmm regs.

Since LOWREGMASK was a no-op, xmm register indices were wrong in opcodes
and have overflown into other opcode fields, wreaking havoc.

To trigger these problems, you can try running the "movi d8, #0x0" AArch64
instruction on 32-bit x86. "vpxor %xmm0, %xmm0, %xmm0" should be generated,
but instead TCG generated "vpxor %xmm0, %xmm0, %xmm2".

Fixes: 770c2fc7bb ("Add vector operations")
Signed-off-by: Roman Kapl <rka@sysgo.com>
Message-Id: <20180824131734.18557-1-rka@sysgo.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago qht-bench: add -p flag to precompute hash values
Emilio G. Cota [Mon, 10 Sep 2018 18:31:54 +0000 (14:31 -0400)]
qht-bench: add -p flag to precompute hash values

Precomputing the hash values allows us to perform more frequent
accesses to the hash table, thereby reaching higher throughputs.

We keep the old behaviour by default, since (1) we might confuse
users if they measured a speedup without changing anything in
the QHT implementation, and (2) benchmarking the hash function
"on line" is also valuable.

Before:
$ taskset -c 0 tests/qht-bench -n 1
 Throughput:        38.18 MT/s

After:
$ taskset -c 0 tests/qht-bench -n 1
 Throughput:        38.16 MT/s

After (with precomputing):
$ taskset -c 0 tests/qht-bench -n 1 -p
 Throughput:        50.87 MT/s

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago qht: constify arguments to some internal functions
Emilio G. Cota [Mon, 10 Sep 2018 17:48:39 +0000 (13:48 -0400)]
qht: constify arguments to some internal functions

These functions do not modify their @ht or @bucket arguments.
Constify those arguments.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago qht: constify qht_statistics_init
Emilio G. Cota [Mon, 10 Sep 2018 17:43:06 +0000 (13:43 -0400)]
qht: constify qht_statistics_init

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago qht: constify qht_lookup
Emilio G. Cota [Mon, 10 Sep 2018 17:40:07 +0000 (13:40 -0400)]
qht: constify qht_lookup

seqlock_read_begin takes a const param since c04649eeea
("seqlock: constify seqlock_read_begin", 2018-08-23), so
we can constify the entire lookup.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago qht: fix comment in qht_bucket_remove_entry
Emilio G. Cota [Mon, 10 Sep 2018 17:23:42 +0000 (13:23 -0400)]
qht: fix comment in qht_bucket_remove_entry

Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago qht: drop ht argument from qht iterators
Emilio G. Cota [Mon, 10 Sep 2018 17:06:12 +0000 (13:06 -0400)]
qht: drop ht argument from qht iterators

Accessing the HT from an iterator results almost always
in a deadlock. Given that only one qht-internal function
uses this argument, drop it from the interface.

Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago test-qht: speed up + test qht_resize
Emilio G. Cota [Fri, 17 Aug 2018 22:15:52 +0000 (18:15 -0400)]
test-qht: speed up + test qht_resize

Perform first the tests that exercise code paths that are
easier to hit at small table sizes, and then resize the table
to speed up subsequent tests. If this resize is not too large,
we can make the test faster with no code coverage loss.

- With gcov enabled:

Before: 20.568s, 90.28% qht.c coverage
After:   5.168s, 93.06% qht.c coverage

The coverage increase is entirely due to calling qht_resize,
which we weren't calling before. Note that the code paths
that remain to be tested are either error handling or
can only occur when several threads are accessing the
hash table concurrently (e.g. seqlock retry, trylock fail).

- Without gcov:

Before: 1.987s
After:  0.528s

The speedup is almost the same as with gcov, although the
"before" run is a lot faster.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago test-qht: test deletion of the last entry in a bucket
Emilio G. Cota [Fri, 17 Aug 2018 22:15:33 +0000 (18:15 -0400)]
test-qht: test deletion of the last entry in a bucket

This improves coverage by one (!) LoC in qht.c, bringing the
coverage rate up from 90.00% to 90.28%.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago test-qht: test removal of non-existent entries
Emilio G. Cota [Wed, 15 Aug 2018 21:56:26 +0000 (17:56 -0400)]
test-qht: test removal of non-existent entries

This improves qht.c code coverage from 89.44% to 90.00%.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago test-qht: test qht_iter_remove
Emilio G. Cota [Wed, 15 Aug 2018 21:08:37 +0000 (17:08 -0400)]
test-qht: test qht_iter_remove

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago qht: add qht_iter_remove
Emilio G. Cota [Wed, 15 Aug 2018 19:00:48 +0000 (15:00 -0400)]
qht: add qht_iter_remove

This currently has no users, but the use case is so common that I
think we must support it.

Note that without the appended we cannot safely remove a set of
elements; a 2-step approach (i.e. qht_iter first, keep track of
the to-be-deleted elements, and then a bunch of qht_remove calls)
would be racy, since between the iteration and the removals other
threads might insert additional elements.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago qht: remove unused map param from qht_remove__locked
Emilio G. Cota [Wed, 15 Aug 2018 21:04:56 +0000 (17:04 -0400)]
qht: remove unused map param from qht_remove__locked

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
5 years ago seccomp: check TSYNC host capability
Marc-André Lureau [Thu, 30 Aug 2018 14:33:48 +0000 (16:33 +0200)]
seccomp: check TSYNC host capability

Remove -sandbox option if the host is not capable of TSYNC, since the
sandbox will fail at setup time otherwise. This will help libvirt, for
ex, to figure out if -sandbox will work.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Acked-by: Eduardo Otubo <otubo@redhat.com>
5 years ago tests/migration: Add migration-test header file
Wei Huang [Wed, 5 Sep 2018 19:15:33 +0000 (15:15 -0400)]
tests/migration: Add migration-test header file

This patch moves the settings related migration-test from the
migration-test.c file to a new header file.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Wei Huang <wei@redhat.com>
Message-Id: <1536174934-26022-4-git-send-email-wei@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago tests/migration: Support cross compilation in generating boot header file
Wei Huang [Wed, 5 Sep 2018 19:15:32 +0000 (15:15 -0400)]
tests/migration: Support cross compilation in generating boot header file

Recently a new configure option, CROSS_CC_GUEST, was added to
$(TARGET)-softmmu/config-target.mak to support TCG-related tests. This
patch tries to leverage this option to support cross compilation when the
migration boot block file is being re-generated:

 * The x86 related files are moved to a new sub-dir (named ./i386).
 * A new top-layer Makefile is created in tests/migration/ directory.
   This Makefile searches and parses CROSS_CC_GUEST to generate CROSS_PREFIX.
   The CROSS_PREFIX, if available, is then passed to migration/$ARCH/Makefile.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Wei Huang <wei@redhat.com>
Message-Id: <1536174934-26022-3-git-send-email-wei@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago tests/migration: Convert x86 boot block compilation script into Makefile
Wei Huang [Wed, 5 Sep 2018 19:15:31 +0000 (15:15 -0400)]
tests/migration: Convert x86 boot block compilation script into Makefile

The x86 boot block header currently is generated with a shell script.
To better support other CPUs (e.g. aarch64), we convert the script
into Makefile. This allows us to 1) support cross-compilation easily,
and 2) avoid creating a script file for every architecture.

Note that, in the new design, the cross compiler prefix can be specified by
setting the CROSS_PREFIX in "make" command. Also to allow gcc pre-processor
to include the C-style file correctly, it also renames the
x86-a-b-bootblock.s file extension from .s to .S.

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Wei Huang <wei@redhat.com>
Message-Id: <1536174934-26022-2-git-send-email-wei@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago migration: use save_page_use_compression in flush_compressed_data
Xiao Guangrong [Thu, 6 Sep 2018 07:01:01 +0000 (15:01 +0800)]
migration: use save_page_use_compression in flush_compressed_data

It avoids touching compression locks if xbzrle and compression
are both enabled

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20180906070101.27280-4-xiaoguangrong@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago migration: show the statistics of compression
Xiao Guangrong [Thu, 6 Sep 2018 07:01:00 +0000 (15:01 +0800)]
migration: show the statistics of compression

Currently, it includes:
pages: amount of pages compressed and transferred to the target VM
busy: amount of count that no free thread to compress data
busy-rate: rate of thread busy
compressed-size: amount of bytes after compression
compression-rate: rate of compressed size

Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180906070101.27280-3-xiaoguangrong@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago migration: do not flush_compressed_data at the end of iteration
Xiao Guangrong [Thu, 6 Sep 2018 07:00:59 +0000 (15:00 +0800)]
migration: do not flush_compressed_data at the end of iteration

flush_compressed_data() needs to wait for all compression threads to
finish their work, after that all threads are free until the
migration feeds new requests to them, reducing its call can improve
the throughput and use CPU resource more effectively

We do not need to flush all threads at the end of iteration, the
data can be kept locally until the memory block is changed or
memory migration starts over in that case we will meet a dirtied
page which may still exist in the compression threads' ring

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20180906070101.27280-2-xiaoguangrong@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago Add a hint message to loadvm and exits on failure
Jose Ricardo Ziviani [Mon, 3 Sep 2018 16:26:13 +0000 (13:26 -0300)]
Add a hint message to loadvm and exits on failure

This patch adds a small hint for the failure case of the load snapshot
process. It may be useful for users to remember that the VM
configuration has changed between the save and load processes.

(qemu) loadvm vm-20180903083641
Unknown savevm section or instance 'cpu_common' 4.
Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices
Error -22 while loading VM state
(qemu) device_add host-spapr-cpu-core,core-id=4
(qemu) loadvm vm-20180903083641
(qemu) c
(qemu) info status
VM status: running

It also exits Qemu if the snapshot cannot be loaded before reaching the
main loop (-loadvm in the command line).

$ qemu-system-ppc64 ... -loadvm vm-20180903083641
qemu-system-ppc64: Unknown savevm section or instance 'cpu_common' 4.
Make sure that your current VM setup matches your saved VM setup, including any hotplugged devices
qemu-system-ppc64: Error -22 while loading VM state
$

Signed-off-by: Jose Ricardo Ziviani <joserz@linux.ibm.com>
Message-Id: <20180903162613.15877-1-joserz@linux.ibm.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago migration: handle the error condition properly
Xiao Guangrong [Mon, 3 Sep 2018 09:26:44 +0000 (17:26 +0800)]
migration: handle the error condition properly

ram_find_and_save_block() can return negative if any error happens,
however, it is completely ignored in current code

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20180903092644.25812-5-xiaoguangrong@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago migration: fix calculating xbzrle_counters.cache_miss_rate
Xiao Guangrong [Mon, 3 Sep 2018 09:26:42 +0000 (17:26 +0800)]
migration: fix calculating xbzrle_counters.cache_miss_rate

As Peter pointed out:
| - xbzrle_counters.cache_miss is done in save_xbzrle_page(), so it's
|   per-guest-page granularity
|
| - RAMState.iterations is done for each ram_find_and_save_block(), so
|   it's per-host-page granularity
|
| An example is that when we migrate a 2M huge page in the guest, we
| will only increase the RAMState.iterations by 1 (since
| ram_find_and_save_block() will be called once), but we might increase
| xbzrle_counters.cache_miss for 2M/4K=512 times (we'll call
| save_xbzrle_page() that many times) if all the pages got cache miss.
| Then IMHO the cache miss rate will be 512/1=51200% (while it should
| actually be just 100% cache miss).

And he also suggested as xbzrle_counters.cache_miss_rate is the only
user of rs->iterations we can adapt it to count target guest page
numbers

After that, rename 'iterations' to 'target_page_count' to better reflect
its meaning

Suggested-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180903092644.25812-3-xiaoguangrong@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago migration/rdma: Fix uninitialised rdma_return_path
Dr. David Alan Gilbert [Thu, 30 Aug 2018 17:36:57 +0000 (18:36 +0100)]
migration/rdma: Fix uninitialised rdma_return_path

Clang correctly errors out moaning that rdma_return_path
is used uninitialised in the earlier error paths.
Make it NULL so that the error path ignores it.

Fixes: 55cc1b5937a8e709e4c102e74b206281073aab82
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reported-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20180830173657.22939-1-dgilbert@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
5 years ago vmdk: align end of file to a sector boundary
yuchenlin [Thu, 13 Sep 2018 08:29:52 +0000 (16:29 +0800)]
vmdk: align end of file to a sector boundary

There is a rare case in which the size of the last compressed cluster
is larger than the cluster size, which will cause the file to
not be aligned at the sector boundary.

There are three reasons to do it. First, if vmdk doesn't align at
the sector boundary, there may be many undefined behaviors,
such as, in vbox it will show VMDK: Compressed image is corrupted
'syno-vm-disk1.vmdk' (VERR_ZIP_CORRUPTED) when we try to import an
ova with unaligned vmdk. Second, all the cluster_sector is aligned
to sector, the last one should be like this, too. Third, it eases
reading with sector based I/Os.

Signed-off-by: yuchenlin <yuchenlin@synology.com>
Message-Id: <20180913082952.3675-1-yuchenlin@synology.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
5 years ago tests/vm: Use -cpu max rather than -cpu host
Peter Maydell [Mon, 20 Aug 2018 15:55:54 +0000 (16:55 +0100)]
tests/vm: Use -cpu max rather than -cpu host

-cpu max works with any accelerator, so we don't need
to use it only conditionally if not using KVM. Just use
it all the time.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20180820155554.23476-1-peter.maydell@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
5 years ago aio-posix: do skip system call if ctx->notifier polling succeeds
Paolo Bonzini [Wed, 12 Sep 2018 17:10:40 +0000 (19:10 +0200)]
aio-posix: do skip system call if ctx->notifier polling succeeds

Commit 70232b5253 ("aio-posix: Don't count ctx->notifier as progress when
2018-08-15), by not reporting progress, causes aio_poll to execute the
system call when polling succeeds because of ctx->notifier.  This introduces
latency before the call to aio_bh_poll() and negates the advantages of
polling, unfortunately.

The fix builds on the previous patch, separating the effect of polling on
the timeout from the progress reported to aio_poll().  ctx->notifier
does zero the timeout, causing the caller to skip the system call,
but it does not report progress, so that the bug fix of commit 70232b5253
still stands.

Fixes: 70232b5253a3c4e03ed1ac47ef9246a8ac66c6fa
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-4-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
5 years ago aio-posix: compute timeout before polling
Paolo Bonzini [Wed, 12 Sep 2018 17:10:39 +0000 (19:10 +0200)]
aio-posix: compute timeout before polling

This is a preparation for the next patch, and also a very small
optimization.  Compute the timeout only once, before invoking
try_poll_mode, and adjust it in run_poll_handlers.  The adjustment
is the polling time when polling fails, or zero (non-blocking) if
polling succeeds.

Fixes: 70232b5253a3c4e03ed1ac47ef9246a8ac66c6fa
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-3-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
5 years ago aio-posix: fix concurrent access to poll_disable_cnt
Paolo Bonzini [Wed, 12 Sep 2018 17:10:38 +0000 (19:10 +0200)]
aio-posix: fix concurrent access to poll_disable_cnt

It is valid for an aio_set_fd_handler to happen concurrently with
aio_poll.  In that case, poll_disable_cnt can change under the heels
of aio_poll, and the assertion on poll_disable_cnt can fail in
run_poll_handlers.

Therefore, this patch simply checks the counter on every polling
iteration.  There are no particular needs for ordering, since the
polling loop is terminated anyway by aio_notify at the end of
aio_set_fd_handler.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180912171040.1732-2-pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
5 years ago linux-user: do setrlimit selectively
Max Filippov [Mon, 17 Sep 2018 18:13:14 +0000 (11:13 -0700)]
linux-user: do setrlimit selectively

setrlimit guest calls that affect memory resources
(RLIMIT_{AS,DATA,STACK}) may interfere with QEMU internal memory
management. They may result in QEMU lockup because mprotect call in
page_unprotect would fail with ENOMEM error code, causing infinite loop
of SIGSEGV. E.g. it happens when running libstdc++ testsuite for xtensa
target on x86_64 host.

Don't call host setrlimit for memory-related resources.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Message-Id: <20180917181314.22551-1-jcmvbkbc@gmail.com>
[lv: rebase on master]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
5 years ago linux-user: write(fd, NULL, 0) parity with linux's treatment of same
Tony Garnock-Jones [Sat, 8 Sep 2018 18:22:05 +0000 (19:22 +0100)]
linux-user: write(fd, NULL, 0) parity with linux's treatment of same

Bring linux-user write(2) handling into line with linux for the case
of a 0-byte write with a NULL buffer. Based on a patch originally
written by Zhuowei Zhang.

Addresses https://bugs.launchpad.net/qemu/+bug/1716292.

From Zhuowei Zhang's patch (https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg08073.html):

    Linux returns success for the special case of calling write with a
    zero-length NULL buffer: compiling and running

    int main() {
       ssize_t ret = write(STDOUT_FILENO, NULL, 0);
       fprintf(stderr, "write returned %ld\n", ret);
       return 0;
    }

    gives "write returned 0" when run directly, but "write returned
    -1" in QEMU.

    This commit checks for this situation and returns success if
    found.

Subsequent discussion raised the following questions (and my answers):

 - Q. Should TARGET_NR_read pass through to safe_read in this
      situation too?
   A. I'm wary of changing unrelated code to the specific problem I'm
      addressing. TARGET_NR_read is already consistent with Linux for
      this case.

 - Q. Do pread64/pwrite64 need to be changed similarly?
   A. Experiment suggests not: both linux and linux-user yield -1 for
      NULL 0-length reads/writes.

Signed-off-by: Tony Garnock-Jones <tonygarnockjones@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20180908182205.GB409@mornington.dcs.gla.ac.uk>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
5 years ago linux-user: elf: mmap all the target-pages of hostpage for data segment
Shivaprasad G Bhat [Wed, 29 Aug 2018 09:23:27 +0000 (04:23 -0500)]
linux-user: elf: mmap all the target-pages of hostpage for data segment

If the hostpage size is greater than the TARGET_PAGESIZE, the
target-pages of size TARGET_PAGESIZE are marked valid only till the
length requested during the elfload. The glibc attempts to consume unused
space in the last page of data segment(__libc_memalign() in
elf/dl-minimal.c). If PT_LOAD p_align is greater than or
equal to hostpage size, the GLRO(dl_pagesize) is actually the host pagesize
as set in the auxiliary vectors. So, there is no explicit mmap request for
the remaining target-pages on the last hostpage. The glibc assumes that
particular space as available and subsequent attempts to use
those addresses lead to crash as the target_mmap has not marked them valid
for those target-pages.

The issue is seen when trying to chroot to 16.04-x86_64 ubuntu on a PPC64
host where the fork fails to access the thread_id as it is allocated on a
page not marked valid. The recent glibc doesn't have checks for thread-id in
fork, but the issue can manifest somewhere else, nonetheless.

The fix here is to map all the target-pages of the hostpage during the
elfload if the p_align is greater than or equal to hostpage size, for
data segment to allow the glibc for proper consumption.

Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <153553435604.51992.5640085189104207249.stgit@lep8c.aus.stglabs.ibm.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
5 years ago linux-user: add SO_LINGER to {g,s}etsockopt
Carlo Marcelo Arenas Belón [Fri, 24 Aug 2018 08:56:01 +0000 (01:56 -0700)]
linux-user: add SO_LINGER to {g,s}etsockopt

Original implementation for setsockopt by Chen Gang[1]; all bugs mine,
including removing assignment for optname which hopefully makes the
logic easier to follow and moving some variables to make the code
more self-contained.

[1] http://patchwork.ozlabs.org/patch/565659/

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Co-Authored-By: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20180824085601.6259-1-carenas@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
5 years ago linux-user: move TargetFdTrans functions to their own file
Laurent Vivier [Thu, 23 Aug 2018 22:22:15 +0000 (00:22 +0200)]
linux-user: move TargetFdTrans functions to their own file

This will make it easier to move syscall functions out of syscall.c

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20180823222215.13781-1-laurent@vivier.eu>

5 years ago Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2018-09-25' into...
Peter Maydell [Tue, 25 Sep 2018 17:09:52 +0000 (18:09 +0100)]
Merge remote-tracking branch 'remotes/huth-gitlab/tags/pull-request-2018-09-25' into staging

- Deprecate the usage of a network backend via "name" instead of "id"
- Deprecate the "enforce-config-section" machine parameter
- Re-enable the wdt_ib700, endianness and vmxnet3 qtests
- Some trivial fixes and doc update patches that crossed my way

# gpg: Signature made Tue 25 Sep 2018 16:58:42 BST
# gpg:                using RSA key 2ED9D774FE702DB5
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>"
# gpg:                 aka "Thomas Huth <thuth@redhat.com>"
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>"
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>"
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* remotes/huth-gitlab/tags/pull-request-2018-09-25:
  Revert "check: Move VMXNET3 test to common"
  Revert "check: Move endianess test to common"
  Revert "check: Move wdt_ib700 test to common"
  tests/migration: Speed up the test on ppc64
  hw/qdev-core: Fix description of instance_init
  qdev: fix a typo in comment
  docs: Fix some typos (most found by codespell)
  trivial: Make bios files and source files non-executable
  memfd: fix possible usage of the uninitialized file descriptor
  hw/core/machine: Officially deprecate the enforce-config-section parameter
  net/slirp: Deprecate the [hub_id name] parameter tuple
  net: Deprecate the "name" parameter of -net
  Makefile: Add missing dependency for qemu-deprecated.texi

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago Merge remote-tracking branch 'remotes/xanclic/tags/pull-block-2018-09-25' into staging
Peter Maydell [Tue, 25 Sep 2018 15:47:35 +0000 (16:47 +0100)]
Merge remote-tracking branch 'remotes/xanclic/tags/pull-block-2018-09-25' into staging

Block layer patches:
- Drain fixes
- node-name parameters for block-commit
- Refactor block jobs to use transactional callbacks for exiting

# gpg: Signature made Tue 25 Sep 2018 16:12:44 BST
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* remotes/xanclic/tags/pull-block-2018-09-25: (42 commits)
  test-bdrv-drain: Test draining job source child and parent
  block: Use a single global AioWait
  test-bdrv-drain: Fix outdated comments
  test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort
  job: Avoid deadlocks in job_completed_txn_abort()
  test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
  block: Remove aio_poll() in bdrv_drain_poll variants
  blockjob: Lie better in child_job_drained_poll()
  block-backend: Decrease in_flight only after callback
  block-backend: Fix potential double blk_delete()
  block-backend: Add .drained_poll callback
  block: Add missing locking in bdrv_co_drain_bh_cb()
  test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
  job: Use AIO_WAIT_WHILE() in job_finish_sync()
  test-blockjob: Acquire AioContext around job_cancel_sync()
  test-bdrv-drain: Drain with block jobs in an I/O thread
  aio-wait: Increase num_waiters even in home thread
  blockjob: Wake up BDS when job becomes idle
  job: Fix missing locking due to mismerge
  job: Fix nested aio_poll() hanging in job_txn_apply
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago Revert "check: Move VMXNET3 test to common"
Thomas Huth [Mon, 24 Sep 2018 07:28:23 +0000 (09:28 +0200)]
Revert "check: Move VMXNET3 test to common"

This reverts commit 7a066770f53c198014add869696427f81d67e9c2.

The patch did not work as expected: The vmxnet3 test is currently
not run at all anymore.

Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago Revert "check: Move endianess test to common"
Thomas Huth [Mon, 24 Sep 2018 07:27:40 +0000 (09:27 +0200)]
Revert "check: Move endianess test to common"

This reverts commit 669cc7100065c690cb7b4f3da5cfc471d1ed4740.

The patch did not work as expected: The endianness test is currently
not run at all anymore.

Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago Revert "check: Move wdt_ib700 test to common"
Thomas Huth [Mon, 24 Sep 2018 07:26:19 +0000 (09:26 +0200)]
Revert "check: Move wdt_ib700 test to common"

This reverts commit ee1f6c812b3240420dff07a3860060b7d4abfe09.

The patch did not work as expected: The wdt_ib700 test is currently
not run at all anymore.

Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago tests/migration: Speed up the test on ppc64
Thomas Huth [Mon, 17 Sep 2018 14:24:55 +0000 (16:24 +0200)]
tests/migration: Speed up the test on ppc64

The SLOF boot process is always quite slow ... but we can speed it up
a little bit by specifying "-nodefaults" and by using the "nvramrc"
variable instead of "boot-command" (since "nvramrc" is evaluated earlier
in the SLOF boot process than "boot-command").

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago hw/qdev-core: Fix description of instance_init
Thomas Huth [Mon, 10 Sep 2018 07:46:26 +0000 (09:46 +0200)]
hw/qdev-core: Fix description of instance_init

The part of the documentation of DeviceClass that talks about instance_init
is partly wrong: instance_init() functions must not abort or exit, since
the function is also called during introspection of the device already.
So if a device calls exit() during its instance_init() function, QEMU
terminates unexpectedly if somebody tries to just have a look at the
interfaces from the device with "device_add xyz,help" or with the
"device-list-properties" QOM command. This should never happen.

Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago qdev: fix a typo in comment
Li Qiang [Wed, 5 Sep 2018 06:49:01 +0000 (23:49 -0700)]
qdev: fix a typo in comment

Found by reading code.

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago docs: Fix some typos (most found by codespell)
Stefan Weil [Fri, 13 Jul 2018 12:17:27 +0000 (14:17 +0200)]
docs: Fix some typos (most found by codespell)

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago trivial: Make bios files and source files non-executable
Thomas Huth [Mon, 7 May 2018 14:32:34 +0000 (16:32 +0200)]
trivial: Make bios files and source files non-executable

These files can not be executed on the host, so they should not be
marked as executable.

Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago memfd: fix possible usage of the uninitialized file descriptor
Dima Stepanov [Wed, 13 Jun 2018 08:19:54 +0000 (11:19 +0300)]
memfd: fix possible usage of the uninitialized file descriptor

The qemu_memfd_alloc_check() routine allocates the fd variable on stack.
This variable is initialized inside the qemu_memfd_alloc() function.
There are several cases when *fd will be left uninitialized which can
lead to the unexpected close() in the qemu_memfd_free() call.

Set file descriptor to -1 before calling the qemu_memfd_alloc routine.

Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
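
The failure mode described in the memfd commit message above, as an added example that is not QEMU code: `demo_alloc`, `demo_free` and `demo_alloc_check` are hypothetical stand-ins for the routines it names, and the stale descriptor value is passed in explicitly rather than read from an uninitialised stack slot, so the behaviour is deterministic.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical allocator: reports a descriptor through *fd on success,
 * but its error returns never write to *fd. */
static void *demo_alloc(size_t size, int *fd)
{
    int p[2];
    void *ptr;

    if (size == 0) {
        return NULL;             /* *fd left exactly as the caller had it */
    }
    if (pipe(p) != 0) {
        return NULL;             /* same: *fd untouched */
    }
    close(p[0]);
    ptr = malloc(size);
    if (ptr == NULL) {
        close(p[1]);
        return NULL;
    }
    *fd = p[1];
    return ptr;
}

/* Cleanup that trusts fd: any value other than -1 is closed. */
static void demo_free(void *ptr, int fd)
{
    free(ptr);
    if (fd != -1) {
        close(fd);
    }
}

/* initial_fd models what the stack slot holds before the allocator runs:
 * -1 is the fixed code, anything else is leftover stack content. */
static int demo_alloc_check(int initial_fd, size_t size)
{
    int fd = initial_fd;
    void *ptr = demo_alloc(size, &fd);
    int ok = (ptr != NULL);

    demo_free(ptr, fd);
    return ok;
}

static int is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Returns 1 if an unrelated open descriptor survives a failed allocation,
 * 0 if the cleanup path closed it, -1 if the setup itself failed. */
static int victim_survives(int initialise_to_minus_one)
{
    int p[2];
    int victim, alive;

    if (pipe(p) != 0) {
        return -1;
    }
    victim = p[1];               /* descriptor owned by some other code */

    /* size 0 forces the early error return in demo_alloc() */
    demo_alloc_check(initialise_to_minus_one ? -1 : victim, 0);

    alive = is_open(victim);
    if (alive) {
        close(victim);
    }
    close(p[0]);
    return alive;
}

int main(void)
{
    printf("stale value in fd : victim still open = %d\n", victim_survives(0));
    printf("fd preset to -1   : victim still open = %d\n", victim_survives(1));
    return 0;
}
```

The same check applies to any out-parameter holding a resource handle: either the callee writes a sentinel on every return path, or the caller sets it before the call.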
5 years ago hw/core/machine: Officially deprecate the enforce-config-section parameter
Thomas Huth [Thu, 20 Sep 2018 07:22:07 +0000 (09:22 +0200)]
hw/core/machine: Officially deprecate the enforce-config-section parameter

Commit 16f7244842b5135543ef068a1adafd94c6965953 added this parameter
to the documentation, including a note that it is deprecated. But it
has never been added to the "Deprecated features" appendix, which is
our official way to deprecate legacy parameters. So let's do this now.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago net/slirp: Deprecate the [hub_id name] parameter tuple
Thomas Huth [Thu, 20 Sep 2018 08:22:27 +0000 (10:22 +0200)]
net/slirp: Deprecate the [hub_id name] parameter tuple

The "name" in the [hub_id name] parameter tuple is the same as a
"netdev_id" (which should be unique), so specifying the hub_id here
is just redundant (it was likely just necessary in the past when
the network subsystem was still using "vlans" only and when it did
not use unique "id"s yet).

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago net: Deprecate the "name" parameter of -net
Thomas Huth [Thu, 20 Sep 2018 08:14:08 +0000 (10:14 +0200)]
net: Deprecate the "name" parameter of -net

In early times, network backends were specified by a "vlan" and "name"
tuple. With the introduction of netdevs, the "name" was replaced by an
"id" (which is supposed to be unique), but the "name" parameter stayed
as an alias which could be used instead of "id". Unfortunately, we miss
the duplication check for "name":

 $ qemu-system-x86_64 -net user,name=n1 -net user,name=n1

... starts without an error, while "id" correctly complains:

 $ qemu-system-x86_64 -net user,id=n1 -net user,id=n1
 qemu-system-x86_64: -net user,id=n1: Duplicate ID 'n1' for net

Instead of trying to fix the code for the legacy "name" parameter, let's
rather get rid of this old interface and deprecate the "name" parameter
now - this will also be less confusing for the users in the long run.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
5 years ago Makefile: Add missing dependency for qemu-deprecated.texi
Thomas Huth [Thu, 20 Sep 2018 07:42:37 +0000 (09:42 +0200)]
Makefile: Add missing dependency for qemu-deprecated.texi

Make sure that the docs get correctly regenerated when the
file qemu-deprecated.texi has been changed.

Fixes: 44c67847e32c91a6071fb0440c357b9489f08bc6
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
(cherry picked from commit f99ce85279178385f204a52236f855c879c29cdc)

5 years ago Merge remote-tracking branch 'remotes/dgilbert/tags/pull-hmp-20180925' into staging
Peter Maydell [Tue, 25 Sep 2018 14:49:25 +0000 (15:49 +0100)]
Merge remote-tracking branch 'remotes/dgilbert/tags/pull-hmp-20180925' into staging

HMP pull 2018-09-25

# gpg: Signature made Tue 25 Sep 2018 15:11:09 BST
# gpg:                using RSA key 0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-hmp-20180925:
  qmp, hmp: add PCI subsystem id and vendor id to PCI info
  hmp: fix migrate status timer leak
  monitor: print message when using 'help' with an unknown command

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180925-1' into...
Peter Maydell [Tue, 25 Sep 2018 14:24:04 +0000 (15:24 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180925-1' into staging

target-arm queue:
 * target/arm: Fix cpu_get_tb_cpu_state() for non-SVE CPUs
 * hw/arm/exynos4210: fix Exynos4210 UART support
 * hw/arm/virt-acpi-build: Add a check for memory-less NUMA nodes
 * arm: Add BBC micro:bit machine
 * aspeed/i2c: Fix interrupt handling bugs
 * hw/arm/smmu-common: Fix the name of the iommu memory regions
 * hw/arm/smmuv3: fix eventq recording and IRQ triggerring
 * hw/intc/arm_gic: Document QEMU interface
 * hw/intc/arm_gic: Drop GIC_BASE_IRQ macro
 * hw/net/pcnet-pci: Convert away from old_mmio accessors
 * hw/timer/cmsdk-apb-dualtimer: Add missing 'break' statements
 * aspeed/timer: fix compile breakage with clang 3.4.2
 * hw/arm/aspeed: change the FMC flash model of the AST2500 evb
 * hw/arm/aspeed: Minor code cleanups
 * target/arm: Start AArch32 CPUs with EL2 but not EL3 in Hyp mode

# gpg: Signature made Tue 25 Sep 2018 15:23:11 BST
# gpg:                using RSA key 3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20180925-1: (21 commits)
  target/arm: Start AArch32 CPUs with EL2 but not EL3 in Hyp mode
  aspeed/smc: fix some alignment issues
  hw/arm/aspeed: Add an Aspeed machine class
  hw/arm/aspeed: change the FMC flash model of the AST2500 evb
  aspeed/timer: fix compile breakage with clang 3.4.2
  hw/timer/cmsdk-apb-dualtimer: Add missing 'break' statements
  hw/net/pcnet-pci: Unify pcnet_ioport_read/write and pcnet_mmio_read/write
  hw/net/pcnet-pci: Convert away from old_mmio accessors
  hw/intc/arm_gic: Drop GIC_BASE_IRQ macro
  hw/intc/arm_gic: Document QEMU interface
  hw/arm/smmuv3: fix eventq recording and IRQ triggerring
  hw/arm/smmu-common: Fix the name of the iommu memory regions
  aspeed/i2c: Fix receive done interrupt handling
  aspeed/i2c: Handle receive command in separate function
  aspeed/i2c: interrupts should be cleared by software only
  arm: Add BBC micro:bit machine
  arm: Add Nordic Semiconductor nRF51 SoC
  MAINTAINERS: Add NRF51 entry
  hw/arm/virt-acpi-build: Add a check for memory-less NUMA nodes
  hw/arm/exynos4210: fix Exynos4210 UART support
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago target/arm: Start AArch32 CPUs with EL2 but not EL3 in Hyp mode
Peter Maydell [Tue, 25 Sep 2018 13:02:33 +0000 (14:02 +0100)]
target/arm: Start AArch32 CPUs with EL2 but not EL3 in Hyp mode

The ARMv8 architecture defines that an AArch32 CPU starts
in SVC mode, unless EL2 is the highest available EL, in
which case it starts in Hyp mode. (In ARMv7 a CPU with EL2
but not EL3 was not a valid configuration, but we don't
specifically reject this if the user asks for one.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20180823135047.16525-1-peter.maydell@linaro.org

5 years ago aspeed/smc: fix some alignment issues
Cédric Le Goater [Tue, 25 Sep 2018 13:02:33 +0000 (14:02 +0100)]
aspeed/smc: fix some alignment issues

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180921161939.822-6-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago hw/arm/aspeed: Add an Aspeed machine class
Cédric Le Goater [Tue, 25 Sep 2018 13:02:33 +0000 (14:02 +0100)]
hw/arm/aspeed: Add an Aspeed machine class

The code looks better, it removes duplicated lines and it will ease
the introduction of common properties for the Aspeed machines.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180921161939.822-4-clg@kaod.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago hw/arm/aspeed: change the FMC flash model of the AST2500 evb
Cédric Le Goater [Tue, 25 Sep 2018 13:02:33 +0000 (14:02 +0100)]
hw/arm/aspeed: change the FMC flash model of the AST2500 evb

The AST2500 evb is shipped with a W25Q256 which has a non volatile bit
to make the chip operate in 4 Byte address mode at power up. This
should be an interesting feature to model as it will exercise a bit
more the SMC controllers and MMIO execution at boot time.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20180921161939.822-3-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago aspeed/timer: fix compile breakage with clang 3.4.2
Cédric Le Goater [Tue, 25 Sep 2018 13:02:33 +0000 (14:02 +0100)]
aspeed/timer: fix compile breakage with clang 3.4.2

In file included from /home/thuth/devel/qemu/hw/timer/aspeed_timer.c:16:
/home/thuth/devel/qemu/include/hw/misc/aspeed_scu.h:37:3: error:
redefinition of typedef 'AspeedSCUState' is a C11 feature
      [-Werror,-Wtypedef-redefinition]
} AspeedSCUState;
  ^
/home/thuth/devel/qemu/include/hw/timer/aspeed_timer.h:27:31: note:
previous definition is here
typedef struct AspeedSCUState AspeedSCUState;

Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180921161939.822-2-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago hw/timer/cmsdk-apb-dualtimer: Add missing 'break' statements
Peter Maydell [Tue, 25 Sep 2018 13:02:32 +0000 (14:02 +0100)]
hw/timer/cmsdk-apb-dualtimer: Add missing 'break' statements

Add 'break' statements missing from a switch in the APB dual-timer
write function. Spotted by Coverity as CID 1395626 and 1395633.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180924123122.14549-1-peter.maydell@linaro.org

5 years ago hw/net/pcnet-pci: Unify pcnet_ioport_read/write and pcnet_mmio_read/write
Peter Maydell [Tue, 25 Sep 2018 13:02:32 +0000 (14:02 +0100)]
hw/net/pcnet-pci: Unify pcnet_ioport_read/write and pcnet_mmio_read/write

The only difference between our implementation of the pcnet ioport
accessors and the mmio accessors is that the former check BCR_DWIO to
see what access widths are permitted for addresses in the aprom range
(0x0..0xf). In fact our failure to do this in the mmio accessors
is a bug (one which was fixed for the ioport accessors in
commit 7ba79741970 in 2011).

The data sheet for the Am79C970A does not describe the DWIO
bit as only applying for I/O space mapped I/O resources and
not memory mapped I/O resources, and our MMIO accessors already
honour DWIO for accesses in the 0x10..0x1f range (since the
pcnet_ioport_{read,write}{w,l} functions check it).

The data sheet for the later but compatible Am79C976 is clearer:
it states specifically "DWIO mode applies to both I/O- and
memory-mapped accesses." This seems to be reasonable evidence
in favour of interpreting the Am79C970A spec as being the same.

(NB: Linux's pcnet driver only supports I/O accesses, so the
MMIO access part of this device is probably untested anyway.)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago hw/net/pcnet-pci: Convert away from old_mmio accessors
Peter Maydell [Tue, 25 Sep 2018 13:02:32 +0000 (14:02 +0100)]
hw/net/pcnet-pci: Convert away from old_mmio accessors

Convert the pcnet-pci device away from using the old_mmio
MemoryRegionOps accessor functions.

This commit is a no-behaviour-change API conversion.
(Since PCNET_PNPMMIO_SIZE is 0x20, the old "addr & 0x10"
check and the new "addr < 0x10" check are exact opposites;
the new code is phrased to be parallel with the
pcnet_io_read/write functions.)

I have left a TODO comment marker because the similarity
between the MMIO and IO accessor behaviour is suspicious
and they could be combined, but this will be left to a
different patch.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago hw/intc/arm_gic: Drop GIC_BASE_IRQ macro
Peter Maydell [Tue, 25 Sep 2018 13:02:32 +0000 (14:02 +0100)]
hw/intc/arm_gic: Drop GIC_BASE_IRQ macro

The GIC_BASE_IRQ macro is a leftover from when we shared code
between the GICv2 and the v7M NVIC. Since the NVIC is now
split off, GIC_BASE_IRQ is always 0, and we can just delete it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20180824161819.11085-1-peter.maydell@linaro.org

5 years ago hw/intc/arm_gic: Document QEMU interface
Peter Maydell [Tue, 25 Sep 2018 13:02:32 +0000 (14:02 +0100)]
hw/intc/arm_gic: Document QEMU interface

The GICv2's QEMU interface (sysbus MMIO regions, IRQs,
etc) is now quite complicated with the addition of the
virtualization extensions. Add a comment in the header
file which documents it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Luc Michel <luc.michel@greensocs.com>
Message-id: 20180823103818.31189-1-peter.maydell@linaro.org

5 years ago hw/arm/smmuv3: fix eventq recording and IRQ triggerring
Eric Auger [Tue, 25 Sep 2018 13:02:32 +0000 (14:02 +0100)]
hw/arm/smmuv3: fix eventq recording and IRQ triggerring

The event queue management is broken today. Event records
are not properly written as EVT_SET_* macro was not updating
the actual event record. Also the event queue interrupt
is not correctly triggered.

Fixes: bb981004eaf4 ("hw/arm/smmuv3: Event queue recording helper")
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20180921070138.10114-3-eric.auger@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
5 years ago Merge remote-tracking branch 'kevin/tags/for-upstream' into block
Max Reitz [Tue, 25 Sep 2018 14:12:41 +0000 (16:12 +0200)]
Merge remote-tracking branch 'kevin/tags/for-upstream' into block

Block layer patches:

- Fix some jobs/drain/aio_poll related hangs
- commit: Add top-node/base-node options
- linux-aio: Fix locking for qemu_laio_process_completions()
- Fix use after free error in bdrv_open_inherit

# gpg: Signature made Tue Sep 25 15:54:01 2018 CEST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* kevin/tags/for-upstream: (26 commits)
  test-bdrv-drain: Test draining job source child and parent
  block: Use a single global AioWait
  test-bdrv-drain: Fix outdated comments
  test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort
  job: Avoid deadlocks in job_completed_txn_abort()
  test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
  block: Remove aio_poll() in bdrv_drain_poll variants
  blockjob: Lie better in child_job_drained_poll()
  block-backend: Decrease in_flight only after callback
  block-backend: Fix potential double blk_delete()
  block-backend: Add .drained_poll callback
  block: Add missing locking in bdrv_co_drain_bh_cb()
  test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
  job: Use AIO_WAIT_WHILE() in job_finish_sync()
  test-blockjob: Acquire AioContext around job_cancel_sync()
  test-bdrv-drain: Drain with block jobs in an I/O thread
  aio-wait: Increase num_waiters even in home thread
  blockjob: Wake up BDS when job becomes idle
  job: Fix missing locking due to mismerge
  job: Fix nested aio_poll() hanging in job_txn_apply
  ...

Signed-off-by: Max Reitz <mreitz@redhat.com>
5 years ago test-bdrv-drain: Test draining job source child and parent
Kevin Wolf [Thu, 20 Sep 2018 15:39:13 +0000 (17:39 +0200)]
test-bdrv-drain: Test draining job source child and parent

For the block job drain test, don't only test draining the source and
the target node, but create a backing chain for the source
(source_backing <- source <- source_overlay) and test draining each of
the nodes in it.

When using iothreads, the source node (and therefore the job) is in a
different AioContext than the drain, which happens from the main
thread. This way, the main thread waits in AIO_WAIT_WHILE() for the
iothread to make progress and aio_wait_kick() is required to notify it.
The test validates that calling bdrv_wakeup() for a child or a parent
node will actually notify AIO_WAIT_WHILE() instead of letting it hang.

Increase the sleep time a bit (to 1 ms) because the test case is racy
and with the shorter sleep, it didn't reproduce the bug it is supposed
to test for me under 'rr record -n'.

This was because bdrv_drain_invoke_entry() (in the main thread) was only
called after the job had already reached the pause point, so we got a
bdrv_dec_in_flight() from the main thread and the additional
aio_wait_kick() when the job becomes idle (that we really wanted to test
here) wasn't even necessary any more to make progress.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
5 years ago block: Use a single global AioWait
Kevin Wolf [Tue, 18 Sep 2018 15:09:16 +0000 (17:09 +0200)]
block: Use a single global AioWait

When draining a block node, we recurse to its parent and for subtree
drains also to its children. A single AIO_WAIT_WHILE() is then used to
wait for bdrv_drain_poll() to become true, which depends on all of the
nodes we recursed to. However, if the respective child or parent becomes
quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent
is checked, while AIO_WAIT_WHILE() depends on the AioWait of the
original node.

Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE().

This may mean that the draining thread gets a few more unnecessary
wakeups because an unrelated operation got completed, but we already
wake it up when something _could_ have changed rather than only if it
has certainly changed.

Apart from that, drain is a slow path anyway. In theory it would be
possible to use wakeups more selectively and still correctly, but the
gains are likely not worth the additional complexity. In fact, this
patch is a nice simplification for some places in the code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
5 years ago test-bdrv-drain: Fix outdated comments
Kevin Wolf [Thu, 20 Sep 2018 13:51:21 +0000 (15:51 +0200)]
test-bdrv-drain: Fix outdated comments

Commit 89bd030533e changed the test case from using job_sleep_ns() to
using qemu_co_sleep_ns() instead. Also, block_job_sleep_ns() became
job_sleep_ns() in commit 5d43e86e11f.

In both cases, some comments in the test case were not updated. Do that
now.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
5 years ago test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort
Kevin Wolf [Thu, 13 Sep 2018 12:39:14 +0000 (14:39 +0200)]
test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort

This adds tests for calling AIO_WAIT_WHILE() in the .commit and .abort
callbacks. Both reasons why .abort could be called for a single job are
tested: Either .run or .prepare could return an error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
5 years ago job: Avoid deadlocks in job_completed_txn_abort()
Kevin Wolf [Thu, 13 Sep 2018 12:35:27 +0000 (14:35 +0200)]
job: Avoid deadlocks in job_completed_txn_abort()

Amongst others, job_finalize_single() calls the .prepare/.commit/.abort
callbacks of the individual job driver. Recently, their use was adapted
for all block jobs so that they involve code calling AIO_WAIT_WHILE()
now. Such code must be called under the AioContext lock for the
respective job, but without holding any other AioContext lock.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
5 years ago test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
Kevin Wolf [Thu, 6 Sep 2018 10:28:10 +0000 (12:28 +0200)]
test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()

This is a regression test for a deadlock that could occur in callbacks
called from the aio_poll() in bdrv_drain_poll_top_level(). The
AioContext lock wasn't released and therefore would be taken a second
time in the callback. This would cause a possible AIO_WAIT_WHILE() in
the callback to hang.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
5 years ago block: Remove aio_poll() in bdrv_drain_poll variants
Kevin Wolf [Fri, 17 Aug 2018 12:58:49 +0000 (14:58 +0200)]
block: Remove aio_poll() in bdrv_drain_poll variants

bdrv_drain_poll_top_level() was buggy because it didn't release the
AioContext lock of the node to be drained before calling aio_poll().
This way, callbacks called by aio_poll() would possibly take the lock a
second time and run into a deadlock with a nested AIO_WAIT_WHILE() call.

However, it turns out that the aio_poll() call isn't actually needed any
more. It was introduced in commit 91af091f923, which is effectively
reverted by this patch. The cases it was supposed to fix are now covered
by bdrv_drain_poll(), which waits for block jobs to reach a quiescent
state.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
5 years ago blockjob: Lie better in child_job_drained_poll()
Kevin Wolf [Fri, 7 Sep 2018 13:31:22 +0000 (15:31 +0200)]
blockjob: Lie better in child_job_drained_poll()

Block jobs claim in .drained_poll() that they are in a quiescent state
as soon as job->deferred_to_main_loop is true. This is obviously wrong,
they still have a completion BH to run. We only get away with this
because commit 91af091f923 added an unconditional aio_poll(false) to the
drain functions, but this is bypassing the regular drain mechanisms.

However, just removing this and telling that the job is still active
doesn't work either: The completion callbacks themselves call drain
functions (directly, or indirectly with bdrv_reopen), so they would
deadlock then.

As a better lie, tell that the job is active as long as the BH is
pending, but falsely call it quiescent from the point in the BH when the
completion callback is called. At this point, nested drain calls won't
deadlock because they ignore the job, and outer drains will wait for the
job to really reach a quiescent state because the callback is already
running.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
5 years ago block-backend: Decrease in_flight only after callback
Kevin Wolf [Thu, 6 Sep 2018 15:47:22 +0000 (17:47 +0200)]
block-backend: Decrease in_flight only after callback

Request callbacks can do pretty much anything, including operations that
will yield from the coroutine (such as draining the backend). In that
case, a decreased in_flight would be visible to other code and could
lead to a drain completing while the callback hasn't actually completed
yet.

Note that reordering these operations forbids calling drain directly
inside an AIO callback. As Paolo explains, indirectly calling it is
okay:

- Calling it through a coroutine is okay, because then
  bdrv_drained_begin() goes through bdrv_co_yield_to_drain() and you
  have in_flight=2 when bdrv_co_yield_to_drain() yields, then soon
  in_flight=1 when the aio_co_wake() in the AIO callback completes, then
  in_flight=0 after the bottom half starts.

- Calling it through a bottom half would be okay too, as long as the AIO
  callback remembers to do inc_in_flight/dec_in_flight just like
  bdrv_co_yield_to_drain() and bdrv_co_drain_bh_cb() do

A few more important cases that come to mind:

- A coroutine that yields because of I/O is okay, with a sequence
  similar to bdrv_co_yield_to_drain().

- A coroutine that yields with no I/O pending will correctly decrease
  in_flight to zero before yielding.

- Calling more AIO from the callback won't overflow the counter just
  because of mutual recursion, because AIO functions always yield at
  least once before invoking the callback.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
5 years ago block-backend: Fix potential double blk_delete()
Kevin Wolf [Fri, 7 Sep 2018 11:45:54 +0000 (13:45 +0200)]
block-backend: Fix potential double blk_delete()

blk_unref() first decreases the refcount of the BlockBackend and calls
blk_delete() if the refcount reaches zero. Requests can still be in
flight at this point, they are only drained during blk_delete():

At this point, arbitrary callbacks can run. If any callback takes a
temporary BlockBackend reference, it will first increase the refcount to
1 and then decrease it to 0 again, triggering another blk_delete(). This
will cause a use-after-free crash in the outer blk_delete().

Fix it by draining the BlockBackend before decreasing the refcount to 0.
Assert in blk_ref() that it never takes the first refcount (which would
mean that the BlockBackend is already being deleted).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
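
The re-entrancy described in the "Fix potential double blk_delete()" message above, modelled as a small stand-alone C program. This is an added example, not QEMU's code: `Backend` and the `backend_*` functions are hypothetical stand-ins for the BlockBackend, blk_ref(), blk_unref() and blk_delete(), and a `destroyed` flag replaces the real free so the second delete can be observed without undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model; all names are hypothetical stand-ins, not QEMU's API. */
typedef struct Backend Backend;
struct Backend {
    int refcnt;
    int in_flight;            /* requests whose callbacks run when drained */
    void (*cb)(Backend *b);   /* request completion callback */
    bool fixed;               /* false: old ordering, true: drain-first ordering */
    int delete_calls;         /* times the destructor was entered */
    bool destroyed;           /* stands in for free(); nothing is really freed */
    bool use_after_delete;    /* outer destructor resumed on a destroyed object */
};

static void backend_unref(Backend *b);

static void backend_ref(Backend *b)
{
    if (b->fixed) {
        assert(b->refcnt > 0);   /* never take the first reference again */
    }
    b->refcnt++;
}

static void backend_drain(Backend *b)
{
    while (b->in_flight > 0) {
        b->in_flight--;
        b->cb(b);                /* arbitrary code, may ref/unref the backend */
    }
}

static void backend_delete(Backend *b)
{
    b->delete_calls++;
    backend_drain(b);            /* old ordering: callbacks see refcnt == 0 */
    if (b->destroyed) {
        b->use_after_delete = true;   /* a nested delete got here first */
    }
    b->destroyed = true;
}

static void backend_unref(Backend *b)
{
    if (b->fixed) {
        if (b->refcnt > 1) { b->refcnt--; return; }
        backend_drain(b);        /* callbacks run while refcnt is still 1 */
        assert(b->refcnt == 1);  /* temporary references were all dropped */
        b->refcnt = 0;
        backend_delete(b);
    } else if (--b->refcnt == 0) {
        backend_delete(b);
    }
}

static void cb_temp_ref(Backend *b)
{
    backend_ref(b);              /* old ordering: 0 -> 1 */
    backend_unref(b);            /* old ordering: 1 -> 0, nested delete */
}

static Backend run_scenario(bool fixed)
{
    Backend b = { .refcnt = 1, .in_flight = 1, .cb = cb_temp_ref, .fixed = fixed };
    backend_unref(&b);           /* drop the last reference with a request pending */
    return b;
}

int main(void)
{
    Backend before = run_scenario(false), after = run_scenario(true);
    printf("old ordering:   deletes=%d use_after_delete=%d\n",
           before.delete_calls, before.use_after_delete);
    printf("drain first:    deletes=%d use_after_delete=%d\n",
           after.delete_calls, after.use_after_delete);
    return 0;
}
```
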