OSDN Git Service

qmiga/qemu.git
7 years ago virtio-9p: message header is 7-byte long
Greg Kurz [Thu, 29 Jun 2017 13:11:50 +0000 (15:11 +0200)]
virtio-9p: message header is 7-byte long

The 9p spec at http://man.cat-v.org/plan_9/5/intro reads:

 "Each 9P message begins with a four-byte size field specifying
  the length in bytes of the complete message including the four
  bytes of the size field itself.  The next byte is the message
  type, one of the constants in the enumeration in the include
  file <fcall.h>.  The next two bytes are an identifying tag,
  described below."

i.e., each message starts with a 7-byte header.
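
The layout quoted above can be sketched as a small decoder. This is only an
illustration, not the code from the patch: the names P9_HDR_SIZE,
struct p9_header and p9_parse_header() are hypothetical, and the little-endian
byte order is taken from the same intro(5) manual page rather than from this
commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* size[4] + type[1] + tag[2]; the macro name is an assumption. */
#define P9_HDR_SIZE 7

struct p9_header {
    uint32_t size;  /* length of the whole message, header included */
    uint8_t  type;  /* message type */
    uint16_t tag;   /* identifying tag */
};

/*
 * Decode the 7-byte header from a raw buffer (little-endian integers).
 * Returns 0 on success, -1 if the buffer is shorter than a header or
 * the advertised size cannot even cover the header itself.
 */
static int p9_parse_header(const uint8_t *buf, size_t len,
                           struct p9_header *hdr)
{
    assert(buf != NULL && hdr != NULL);
    if (len < P9_HDR_SIZE) {
        return -1;
    }
    hdr->size = (uint32_t)buf[0] |
                (uint32_t)buf[1] << 8 |
                (uint32_t)buf[2] << 16 |
                (uint32_t)buf[3] << 24;
    hdr->type = buf[4];
    hdr->tag = (uint16_t)(buf[5] | (buf[6] << 8));
    if (hdr->size < P9_HDR_SIZE) {
        return -1;
    }
    return 0;
}
```

A reply buffer would need the same minimum of 7 bytes before any payload is
written, which is the kind of check the second bullet below refers to.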

The core 9P code already assumes this pretty much everywhere. This patch
does the following:
- makes the assumption explicit in the common 9p.h header, since it isn't
  related to the transport
- open codes the header size in handle_9p_output() and hardens the sanity
  check on the space needed for the reply message

Signed-off-by: Greg Kurz <groug@kaod.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago virtio-9p: record element after sanity checks
Greg Kurz [Thu, 29 Jun 2017 13:11:50 +0000 (15:11 +0200)]
virtio-9p: record element after sanity checks

If the guest sends a malformed request, we end up with a dangling pointer
in V9fsVirtioState. This doesn't seem to cause any bug, but let's remove
this side effect anyway.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago 9pfs: replace g_malloc()+memcpy() with g_memdup()
Marc-André Lureau [Thu, 29 Jun 2017 13:11:50 +0000 (15:11 +0200)]
9pfs: replace g_malloc()+memcpy() with g_memdup()

I found these patterns by grepping the source tree. I don't have a
coccinelle script for it!

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
7 years ago 9pfs: local: Add support for custom fmode/dmode in 9ps mapped security modes
Tobias Schramm [Thu, 29 Jun 2017 13:11:50 +0000 (15:11 +0200)]
9pfs: local: Add support for custom fmode/dmode in 9ps mapped security modes

In mapped security modes, files are created with very restrictive
permissions (600 for files and 700 for directories). This makes
file sharing between virtual machines and users on the host rather
complicated. Imagine, e.g., a group of users who need to access data
produced by processes on a virtual machine. Giving those users access
to the data will be difficult since the group access mode is always 0.

This patch makes the default mode for both files and directories
configurable. Existing setups that don't know about the new parameters
keep using the current secure behavior.

Signed-off-by: Tobias Schramm <tobleminer@gmail.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
7 years ago 9pfs: local: remove: use correct path component
Bruce Rogers [Thu, 29 Jun 2017 13:11:50 +0000 (15:11 +0200)]
9pfs: local: remove: use correct path component

Commit a0e640a8 introduced a path processing error.
Pass fstatat() the dirpath-based path component instead
of the entire path.

Signed-off-by: Bruce Rogers <brogers@suse.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
7 years ago Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20170627-tag' into staging
Peter Maydell [Thu, 29 Jun 2017 10:45:01 +0000 (11:45 +0100)]
Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20170627-tag' into staging

Xen 2017/06/27

# gpg: Signature made Tue 27 Jun 2017 23:02:43 BST
# gpg:                using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
# gpg:                 aka "Stefano Stabellini <sstabellini@kernel.org>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3  0AEA 894F 8F48 70E1 AE90

* remotes/sstabellini/tags/xen-20170627-tag:
  xen-disk: add support for multi-page shared rings
  xen-disk: only advertize feature-persistent if grant copy is not available
  xen/disk: don't leak stack data via response ring

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years ago linux-user: Put PPC AT_IGNOREPPC auxv entries in the right place
Peter Maydell [Tue, 27 Jun 2017 16:49:58 +0000 (17:49 +0100)]
linux-user: Put PPC AT_IGNOREPPC auxv entries in the right place

The 32-bit PPC auxv is a bit complicated because in the
mists of time it used to be 16-byte aligned rather than directly
after the environment. Older glibc versions had code to
try to probe for whether it needed alignment or not:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/powerpc/dl-sysdep.c;hb=e84eabb3871c9b39e59323bf3f6b98c2ca9d1cd0
and the kernel has code which puts some magic entries at
the bottom to ensure that the alignment probe fails:
http://elixir.free-electrons.com/linux/latest/source/arch/powerpc/include/asm/elf.h#L158

QEMU has similar code too, but it was broken by commit
7c4ee5bcc82e64, which changed elfload.c from filling in
the auxv starting at the highest address and working down
to starting at the lowest address and working up. This
means that the ARCH_DLINFO hook must now be invoked first
rather than last, and the entries in it for PPC must
be reversed so that the magic AT_IGNOREPPC entries come
at the lowest address in the auxv as they should.

The effect of this was that when running a guest binary that
used an old glibc with the alignment probing, the guest ld.so
code would segfault if the size of the guest environment and
argv happened to put the auxv at an address that triggered
the alignment code in the guest glibc.
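
The ordering requirement can be sketched as follows. This is a simplified
illustration and not the elfload.c code: struct auxv_entry and
fill_ppc_dlinfo() are hypothetical, and the AT_* values are the ones from the
Linux PowerPC headers. The point is only that, when filling from the lowest
address upwards, the two AT_IGNOREPPC entries have to be written first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AT_DCACHEBSIZE 19
#define AT_ICACHEBSIZE 20
#define AT_UCACHEBSIZE 21
#define AT_IGNOREPPC   22

struct auxv_entry {
    uint32_t type;
    uint32_t val;
};

/*
 * Write the PPC-specific entries starting at auxv[0], i.e. at the
 * lowest address. Returns the number of entries written.
 */
static size_t fill_ppc_dlinfo(struct auxv_entry *auxv, size_t cap,
                              uint32_t dcache_bsize, uint32_t icache_bsize)
{
    size_t n = 0;

    assert(auxv != NULL && cap >= 5);
    /* Magic entries first, so the old glibc alignment probe fails. */
    auxv[n++] = (struct auxv_entry){ AT_IGNOREPPC, AT_IGNOREPPC };
    auxv[n++] = (struct auxv_entry){ AT_IGNOREPPC, AT_IGNOREPPC };
    auxv[n++] = (struct auxv_entry){ AT_DCACHEBSIZE, dcache_bsize };
    auxv[n++] = (struct auxv_entry){ AT_ICACHEBSIZE, icache_bsize };
    auxv[n++] = (struct auxv_entry){ AT_UCACHEBSIZE, 0 };
    return n;
}
```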

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Tested-by: Richard Henderson <rth@twiddle.net>
Message-id: 1498582198-6649-1-git-send-email-peter.maydell@linaro.org

7 years ago xen-disk: add support for multi-page shared rings
Paul Durrant [Wed, 21 Jun 2017 12:52:48 +0000 (08:52 -0400)]
xen-disk: add support for multi-page shared rings

The blkif protocol has had provision for negotiation of multi-page shared
rings for some time now and many guest OSes have support in their frontend
drivers.

This patch makes the necessary modifications to xen-disk to support a shared
ring of up to order 4 (i.e. 16 pages).
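
For reference, the relation between ring order and page count is a power of
two. A minimal sketch, with hypothetical names:

```c
#include <assert.h>

#define MAX_RING_PAGE_ORDER 4u  /* upper bound stated in the message */

/* Number of pages in a ring of the given order: 2^order. */
static unsigned ring_pages(unsigned order)
{
    assert(order <= MAX_RING_PAGE_ORDER);
    return 1u << order;
}

/* A frontend-requested order is acceptable only up to the maximum. */
static int ring_order_valid(unsigned order)
{
    return order <= MAX_RING_PAGE_ORDER;
}
```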

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago xen-disk: only advertize feature-persistent if grant copy is not available
Paul Durrant [Wed, 21 Jun 2017 12:52:47 +0000 (08:52 -0400)]
xen-disk: only advertize feature-persistent if grant copy is not available

If grant copy is available then it will always be used in preference to
persistent maps. In this case feature-persistent should not be advertized
to the frontend, otherwise it may needlessly copy data into persistently
granted buffers.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago xen/disk: don't leak stack data via response ring
Stefano Stabellini [Tue, 27 Jun 2017 21:45:34 +0000 (14:45 -0700)]
xen/disk: don't leak stack data via response ring

Rather than constructing a local structure instance on the stack, fill
the fields directly on the shared ring, just like other (Linux)
backends do. Build on the fact that all response structure flavors are
actually identical (aside from alignment and padding at the end).

This is XSA-216.

Reported-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
7 years ago Merge remote-tracking branch 'remotes/edgar/tags/edgar/mmio-exec-v2.for-upstream...
Peter Maydell [Tue, 27 Jun 2017 15:56:55 +0000 (16:56 +0100)]
Merge remote-tracking branch 'remotes/edgar/tags/edgar/mmio-exec-v2.for-upstream' into staging

edgar/mmio-exec-v2.for-upstream

# gpg: Signature made Tue 27 Jun 2017 16:22:30 BST
# gpg:                using RSA key 0x29C596780F6BCA83
# gpg: Good signature from "Edgar E. Iglesias (Xilinx key) <edgar.iglesias@xilinx.com>"
# gpg:                 aka "Edgar E. Iglesias <edgar.iglesias@gmail.com>"
# Primary key fingerprint: AC44 FEDC 14F7 F1EB EDBF  4151 29C5 9678 0F6B CA83

* remotes/edgar/tags/edgar/mmio-exec-v2.for-upstream:
  xilinx_spips: allow mmio execution
  exec: allow to get a pointer for some mmio memory region
  introduce mmio_interface
  qdev: add MemoryRegion property
  cputlb: fix the way get_page_addr_code fills the tlb
  cputlb: move get_page_addr_code
  cputlb: cleanup get_page_addr_code to use VICTIM_TLB_HIT

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years ago xilinx_spips: allow mmio execution
KONRAD Frederic [Thu, 20 Oct 2016 09:09:53 +0000 (11:09 +0200)]
xilinx_spips: allow mmio execution

This allows execution from the lqspi area.

When request_ptr is called, the device loads 1024 bytes from the SPI device.
The guest can then execute this code.

Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7 years ago exec: allow to get a pointer for some mmio memory region
KONRAD Frederic [Wed, 19 Oct 2016 13:06:49 +0000 (15:06 +0200)]
exec: allow to get a pointer for some mmio memory region

This introduces a special callback which allows code to be run from some MMIO
devices.

A SysBusDevice with a MemoryRegion which implements the request_ptr callback will
be notified when the guest tries to execute code from its offset. It can then,
e.g., pre-load some code from an SPI device or request a pointer from an
external simulator.

When the pointer or the data in it is no longer valid, the device has to
invalidate it.

Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7 years ago introduce mmio_interface
KONRAD Frederic [Thu, 16 Feb 2017 09:27:00 +0000 (10:27 +0100)]
introduce mmio_interface

This introduces the mmio_interface object, which contains a MemoryRegion
and can be hotplugged/hotunplugged.

Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7 years ago qdev: add MemoryRegion property
KONRAD Frederic [Thu, 16 Feb 2017 14:06:24 +0000 (15:06 +0100)]
qdev: add MemoryRegion property

We need to pass a pointer to a MemoryRegion for mmio_interface.
So this just adds that.

Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7 years ago cputlb: fix the way get_page_addr_code fills the tlb
KONRAD Frederic [Fri, 3 Feb 2017 15:32:12 +0000 (16:32 +0100)]
cputlb: fix the way get_page_addr_code fills the tlb

get_page_addr_code(..) does a cpu_ldub_code to fill the TLB.
This can lead to side effects if a device is mapped at this address.

So this patch replaces the cpu_ldub_code call with a tlb_fill.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7 years ago cputlb: move get_page_addr_code
KONRAD Frederic [Fri, 3 Feb 2017 15:29:50 +0000 (16:29 +0100)]
cputlb: move get_page_addr_code

This just moves the code before VICTIM_TLB_HIT macro definition
so we can use it.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7 years ago cputlb: cleanup get_page_addr_code to use VICTIM_TLB_HIT
KONRAD Frederic [Fri, 3 Feb 2017 15:27:49 +0000 (16:27 +0100)]
cputlb: cleanup get_page_addr_code to use VICTIM_TLB_HIT

This replaces the env1 and page_index variables with env and index
so we can use the VICTIM_TLB_HIT macro later.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
7 years ago Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Peter Maydell [Mon, 26 Jun 2017 14:38:29 +0000 (15:38 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Mon 26 Jun 2017 14:07:32 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (60 commits)
  qemu-img: don't shadow opts variable in img_dd()
  block: Do not strcmp() with NULL uri->scheme
  blkverify: Catch bs->exact_filename overflow
  blkdebug: Catch bs->exact_filename overflow
  fix: avoid an infinite loop or a dangling pointer problem in img_commit
  block: change variable names in BlockDriverState
  block: Remove bdrv_aio_readv/writev/flush()
  qed: Use bdrv_co_* for coroutine_fns
  qed: Add coroutine_fn to I/O path functions
  qed: Use a coroutine for need_check_timer
  qed: Simplify request handling
  qed: Use CoQueue for serialising allocations
  qed: Implement .bdrv_co_readv/writev
  qed: Remove recursion in qed_aio_next_io()
  qed: Remove ret argument from qed_aio_next_io()
  qed: Add return value to qed_aio_read/write_data()
  qed: Add return value to qed_aio_write_inplace/alloc()
  qed: Add return value to qed_aio_write_cow()
  qed: Add return value to qed_aio_write_main()
  qed: Add return value to qed_aio_write_l2_update()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years ago Merge remote-tracking branch 'mreitz/tags/pull-block-2017-06-26' into queue-block
Kevin Wolf [Mon, 26 Jun 2017 12:57:27 +0000 (14:57 +0200)]
Merge remote-tracking branch 'mreitz/tags/pull-block-2017-06-26' into queue-block

Block patches for the block queue

# gpg: Signature made Mon Jun 26 14:56:24 2017 CEST
# gpg:                using RSA key 0xF407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2017-06-26:
  qemu-img: don't shadow opts variable in img_dd()
  block: Do not strcmp() with NULL uri->scheme
  blkverify: Catch bs->exact_filename overflow
  blkdebug: Catch bs->exact_filename overflow
  fix: avoid an infinite loop or a dangling pointer problem in img_commit
  block: change variable names in BlockDriverState

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years ago qemu-img: don't shadow opts variable in img_dd()
Stefan Hajnoczi [Mon, 19 Jun 2017 15:00:02 +0000 (16:00 +0100)]
qemu-img: don't shadow opts variable in img_dd()

It's confusing when two different variables have the same name in one
function.

Cc: Reda Sallahi <fullmanet@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20170619150002.3033-1-stefanha@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years ago block: Do not strcmp() with NULL uri->scheme
Max Reitz [Tue, 13 Jun 2017 20:57:26 +0000 (22:57 +0200)]
block: Do not strcmp() with NULL uri->scheme

uri_parse(...)->scheme may be NULL. In fact, probably every field may be
NULL, and the callers do test this for all of the other fields but not
for scheme (except for block/gluster.c; block/vxhs.c does not access
that field at all).

We can easily fix this by using g_strcmp0() instead of strcmp().
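
The difference can be sketched without GLib. strcmp0_sketch() below is a
hypothetical plain-C stand-in that mirrors the documented NULL handling of
g_strcmp0() (NULL compares equal to NULL and sorts before any non-NULL
string); it is not the GLib implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* NULL-safe string comparison in the spirit of g_strcmp0(). */
static int strcmp0_sketch(const char *a, const char *b)
{
    if (a == NULL) {
        return b == NULL ? 0 : -1;
    }
    if (b == NULL) {
        return 1;
    }
    return strcmp(a, b);
}

/* Example check on a possibly missing URI scheme. */
static int scheme_is(const char *scheme, const char *expected)
{
    return strcmp0_sketch(scheme, expected) == 0;
}
```

With plain strcmp(), the scheme_is(NULL, "ssh") case would dereference a NULL
pointer instead of simply returning false.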

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170613205726.13544-1-mreitz@redhat.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years ago blkverify: Catch bs->exact_filename overflow
Max Reitz [Tue, 13 Jun 2017 17:20:06 +0000 (19:20 +0200)]
blkverify: Catch bs->exact_filename overflow

The bs->exact_filename field may not be sufficient to store the full
blkverify node filename. In this case, we should not generate a filename
at all instead of an unusable one.
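
One way to implement such a check, sketched here under the assumption that
the name is built with snprintf() (the function build_name() and the format
string are illustrative only): snprintf() returns the length the full string
would have had, so a result that does not fit can be detected and discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Build "blkverify:<raw>:<test>" into out. If the result would be
 * truncated, leave out empty rather than store an unusable name.
 */
static void build_name(char *out, size_t out_size,
                       const char *raw, const char *test)
{
    int ret;

    assert(out != NULL && out_size > 0);
    ret = snprintf(out, out_size, "blkverify:%s:%s", raw, test);
    if (ret < 0 || (size_t)ret >= out_size) {
        out[0] = '\0';
    }
}
```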

Cc: qemu-stable@nongnu.org
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170613172006.19685-3-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years ago blkdebug: Catch bs->exact_filename overflow
Max Reitz [Tue, 13 Jun 2017 17:20:05 +0000 (19:20 +0200)]
blkdebug: Catch bs->exact_filename overflow

The bs->exact_filename field may not be sufficient to store the full
blkdebug node filename. In this case, we should not generate a filename
at all instead of an unusable one.

Cc: qemu-stable@nongnu.org
Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170613172006.19685-2-mreitz@redhat.com
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years ago fix: avoid an infinite loop or a dangling pointer problem in img_commit
sochin.jiang [Thu, 15 Jun 2017 06:47:33 +0000 (14:47 +0800)]
fix: avoid an infinite loop or a dangling pointer problem in img_commit

img_commit could fall into an infinite loop calling run_block_job() if
its blockjob fails on any I/O error. Fix this already known problem.

Signed-off-by: sochin.jiang <sochin.jiang@huawei.com>
Message-id: 1497509253-28941-1-git-send-email-sochin.jiang@huawei.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years ago block: change variable names in BlockDriverState
Manos Pitsidianakis [Fri, 9 Jun 2017 10:18:08 +0000 (13:18 +0300)]
block: change variable names in BlockDriverState

Change the 'int count' parameter in *pwrite_zeros, *pdiscard related
functions (and some others) to 'int bytes', as they both refer to bytes.
This helps with code legibility.

Signed-off-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Message-id: 20170609101808.13506-1-el13635@mail.ntua.gr
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years ago block: Remove bdrv_aio_readv/writev/flush()
Kevin Wolf [Fri, 18 Nov 2016 15:47:54 +0000 (16:47 +0100)]
block: Remove bdrv_aio_readv/writev/flush()

These functions are unused now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Use bdrv_co_* for coroutine_fns
Kevin Wolf [Fri, 16 Jun 2017 12:43:19 +0000 (14:43 +0200)]
qed: Use bdrv_co_* for coroutine_fns

All functions that are marked coroutine_fn can directly call the
bdrv_co_* version of functions instead of going through the wrapper.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Add coroutine_fn to I/O path functions
Kevin Wolf [Mon, 12 Jun 2017 09:12:41 +0000 (11:12 +0200)]
qed: Add coroutine_fn to I/O path functions

Now that we stay in coroutine context for the whole request when doing
reads or writes, we can add coroutine_fn annotations to many functions
that can do I/O or yield directly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Use a coroutine for need_check_timer
Kevin Wolf [Fri, 18 Nov 2016 15:04:59 +0000 (16:04 +0100)]
qed: Use a coroutine for need_check_timer

This fixes the last place where we degraded from AIO to actual blocking
synchronous I/O requests. Putting it into a coroutine means that instead
of blocking, the coroutine simply yields while doing I/O.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Simplify request handling
Kevin Wolf [Fri, 18 Nov 2016 13:47:36 +0000 (14:47 +0100)]
qed: Simplify request handling

Now that we process a request in the same coroutine from beginning to
end and don't drop out of it any more, we can look like a proper
coroutine-based driver and simply call qed_aio_next_io() and get a
return value from it instead of spawning an additional coroutine that
reenters the parent when it's done.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Use CoQueue for serialising allocations
Kevin Wolf [Fri, 18 Nov 2016 14:32:17 +0000 (15:32 +0100)]
qed: Use CoQueue for serialising allocations

Now that we're running in coroutine context, the ad-hoc serialisation
code (which drops a request that has to wait out of coroutine context)
can be replaced by a CoQueue.

This means that when we resume a serialised request, it is running in
coroutine context again and its I/O isn't blocking any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Implement .bdrv_co_readv/writev
Kevin Wolf [Mon, 14 Nov 2016 13:20:00 +0000 (14:20 +0100)]
qed: Implement .bdrv_co_readv/writev

Most of the qed code is now synchronous and matches the coroutine model.
One notable exception is the serialisation between requests which can
still schedule a callback. Before we can replace this with coroutine
locks, let's convert the driver's external interfaces to the coroutine
versions.

We need to be careful to handle both requests that call the completion
callback directly from the calling coroutine (i.e. fully synchronous
code) and requests that involve some callback, so that we need to yield
and wait for the completion callback coming from outside the coroutine.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Manos Pitsidianakis <el13635@mail.ntua.gr>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Remove recursion in qed_aio_next_io()
Kevin Wolf [Fri, 18 Nov 2016 13:16:42 +0000 (14:16 +0100)]
qed: Remove recursion in qed_aio_next_io()

Instead of calling itself recursively as the last thing, just convert
qed_aio_next_io() into a loop.
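
The transformation is the usual tail-call-to-loop rewrite. A generic sketch
follows; struct request, do_chunk() and both next_io_*() functions are
hypothetical stand-ins and not the qed code.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 4096u  /* hypothetical per-iteration transfer size */

struct request {
    size_t   cur_pos;  /* next byte to process */
    size_t   end_pos;  /* one past the last byte */
    unsigned steps;    /* number of chunks processed so far */
};

/* Process one chunk and advance the position (stand-in for real I/O). */
static void do_chunk(struct request *r)
{
    size_t left = r->end_pos - r->cur_pos;

    r->cur_pos += left < CHUNK ? left : CHUNK;
    r->steps++;
}

/* Before: the function calls itself as its last action. */
static void next_io_recursive(struct request *r)
{
    assert(r != NULL);
    if (r->cur_pos >= r->end_pos) {
        return;
    }
    do_chunk(r);
    next_io_recursive(r);
}

/* After: the same control flow expressed as a loop. */
static void next_io_loop(struct request *r)
{
    assert(r != NULL);
    while (r->cur_pos < r->end_pos) {
        do_chunk(r);
    }
}
```

Moving the body into a loop indents it by one level, which is why most of
such a diff consists of whitespace changes.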

This patch is best reviewed with 'git show -w' because most of it is
just whitespace changes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Remove ret argument from qed_aio_next_io()
Kevin Wolf [Fri, 18 Nov 2016 12:40:13 +0000 (13:40 +0100)]
qed: Remove ret argument from qed_aio_next_io()

All callers pass ret = 0, so we can just remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Add return value to qed_aio_read/write_data()
Kevin Wolf [Thu, 17 Nov 2016 14:40:41 +0000 (15:40 +0100)]
qed: Add return value to qed_aio_read/write_data()

Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
just return an error code and let the caller handle it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Add return value to qed_aio_write_inplace/alloc()
Kevin Wolf [Thu, 17 Nov 2016 14:40:41 +0000 (15:40 +0100)]
qed: Add return value to qed_aio_write_inplace/alloc()

Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
just return an error code and let the caller handle it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Add return value to qed_aio_write_cow()
Kevin Wolf [Thu, 17 Nov 2016 14:40:41 +0000 (15:40 +0100)]
qed: Add return value to qed_aio_write_cow()

Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
just return an error code and let the caller handle it.

While refactoring qed_aio_write_alloc() to accommodate the change,
qed_aio_write_zero_cluster() ended up with a single line, so I chose to
inline that line and remove the function completely.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Add return value to qed_aio_write_main()
Kevin Wolf [Thu, 17 Nov 2016 14:40:41 +0000 (15:40 +0100)]
qed: Add return value to qed_aio_write_main()

Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
just return an error code and let the caller handle it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Add return value to qed_aio_write_l2_update()
Kevin Wolf [Thu, 17 Nov 2016 14:40:41 +0000 (15:40 +0100)]
qed: Add return value to qed_aio_write_l2_update()

Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
just return an error code and let the caller handle it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Add return value to qed_aio_write_l1_update()
Kevin Wolf [Thu, 17 Nov 2016 14:40:41 +0000 (15:40 +0100)]
qed: Add return value to qed_aio_write_l1_update()

Don't recurse into qed_aio_next_io() and qed_aio_complete() here, but
just return an error code and let the caller handle it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Inline qed_commit_l2_update()
Kevin Wolf [Thu, 17 Nov 2016 11:51:21 +0000 (12:51 +0100)]
qed: Inline qed_commit_l2_update()

qed_commit_l2_update() is unconditionally called at the end of
qed_aio_write_l1_update(). Inline it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Make qed_aio_write_main() synchronous
Kevin Wolf [Mon, 14 Nov 2016 13:56:32 +0000 (14:56 +0100)]
qed: Make qed_aio_write_main() synchronous

Note that this code is generally not running in coroutine context, so
this is an actual blocking synchronous operation. We'll fix this in a
moment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Make qed_aio_read_data() synchronous
Kevin Wolf [Mon, 14 Nov 2016 13:56:32 +0000 (14:56 +0100)]
qed: Make qed_aio_read_data() synchronous

Note that this code is generally not running in coroutine context, so
this is an actual blocking synchronous operation. We'll fix this in a
moment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Remove callback from qed_write_table()
Kevin Wolf [Tue, 15 Nov 2016 10:14:01 +0000 (11:14 +0100)]
qed: Remove callback from qed_write_table()

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Remove GenericCB
Kevin Wolf [Fri, 18 Nov 2016 16:16:24 +0000 (17:16 +0100)]
qed: Remove GenericCB

The GenericCB infrastructure isn't used any more. Remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Make qed_write_table() synchronous
Kevin Wolf [Mon, 14 Nov 2016 13:56:32 +0000 (14:56 +0100)]
qed: Make qed_write_table() synchronous

Note that this code is generally not running in coroutine context, so
this is an actual blocking synchronous operation. We'll fix this in a
moment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Remove callback from qed_write_header()
Kevin Wolf [Tue, 15 Nov 2016 10:14:01 +0000 (11:14 +0100)]
qed: Remove callback from qed_write_header()

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Make qed_write_header() synchronous
Kevin Wolf [Mon, 14 Nov 2016 13:56:32 +0000 (14:56 +0100)]
qed: Make qed_write_header() synchronous

Note that this code is generally not running in coroutine context, so
this is an actual blocking synchronous operation. We'll fix this in a
moment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Remove callback from qed_copy_from_backing_file()
Kevin Wolf [Tue, 15 Nov 2016 10:14:01 +0000 (11:14 +0100)]
qed: Remove callback from qed_copy_from_backing_file()

With this change, qed_aio_write_prefill() and qed_aio_write_postfill()
collapse into a single function. This is reflected by a rename of the
combined function to qed_aio_write_cow().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Make qed_copy_from_backing_file() synchronous
Kevin Wolf [Mon, 14 Nov 2016 13:56:32 +0000 (14:56 +0100)]
qed: Make qed_copy_from_backing_file() synchronous

Note that this code is generally not running in coroutine context, so
this is an actual blocking synchronous operation. We'll fix this in a
moment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Make qed_read_backing_file() synchronous
Kevin Wolf [Mon, 14 Nov 2016 13:56:32 +0000 (14:56 +0100)]
qed: Make qed_read_backing_file() synchronous

Note that this code is generally not running in coroutine context, so
this is an actual blocking synchronous operation. We'll fix this in a
moment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Remove callback from qed_find_cluster()
Kevin Wolf [Mon, 14 Nov 2016 15:56:10 +0000 (16:56 +0100)]
qed: Remove callback from qed_find_cluster()

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Remove callback from qed_read_l2_table()
Kevin Wolf [Mon, 14 Nov 2016 15:26:14 +0000 (16:26 +0100)]
qed: Remove callback from qed_read_l2_table()

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Remove callback from qed_read_table()
Kevin Wolf [Mon, 14 Nov 2016 15:08:44 +0000 (16:08 +0100)]
qed: Remove callback from qed_read_table()

Instead of passing the return value to a callback, return it to the
caller so that the callback can be inlined there.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Make qed_read_table() synchronous
Kevin Wolf [Mon, 14 Nov 2016 13:56:32 +0000 (14:56 +0100)]
qed: Make qed_read_table() synchronous

Note that this code is generally not running in coroutine context, so
this is an actual blocking synchronous operation. We'll fix this in a
moment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago qed: Use bottom half to resume waiting requests
Kevin Wolf [Wed, 16 Nov 2016 16:31:14 +0000 (17:31 +0100)]
qed: Use bottom half to resume waiting requests

The qed driver serialises allocating write requests. When the active
allocation is finished, the AIO callback is called, but after this, the
next allocating request is immediately processed instead of leaving the
coroutine. Resuming another allocation request in the same request
coroutine means that the request now runs in the wrong coroutine.

The following is one of the possible effects of this: The completed
request will generally reenter its request coroutine in a bottom half,
expecting that it completes the request in bdrv_driver_pwritev().
However, if the second request actually yielded before leaving the
coroutine, the reused request coroutine is in an entirely different
place and is reentered prematurely. Not a good idea.

Let's make sure that we exit the coroutine after completing the first
request by resuming the next allocating request only with a bottom
half.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agoqcow2: Use offset_into_cluster() and offset_to_l2_index()
Alberto Garcia [Tue, 20 Jun 2017 13:01:36 +0000 (16:01 +0300)]
qcow2: Use offset_into_cluster() and offset_to_l2_index()

We already have functions for doing these calculations, so let's use
them instead of doing everything by hand. This makes the code a bit
more readable.
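
As a rough illustration of the calculations these helpers wrap, here is
a minimal sketch. ToyQcow2State and the toy_* names are hypothetical
stand-ins with only the fields needed here; the real helpers operate on
the qcow2 driver state, whose layout is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver state; only the fields used here. */
typedef struct ToyQcow2State {
    int cluster_bits;      /* log2 of the cluster size */
    int64_t cluster_size;  /* 1 << cluster_bits */
    int l2_size;           /* entries per L2 table, assumed a power of two */
} ToyQcow2State;

/* Byte offset of 'offset' within the cluster that contains it. */
static inline int64_t toy_offset_into_cluster(const ToyQcow2State *s,
                                              int64_t offset)
{
    assert(offset >= 0);
    return offset & (s->cluster_size - 1);
}

/* Index of the L2 entry that maps the cluster containing 'offset'. */
static inline int toy_offset_to_l2_index(const ToyQcow2State *s,
                                         int64_t offset)
{
    assert(offset >= 0);
    return (int)((offset >> s->cluster_bits) & (s->l2_size - 1));
}
```

Both helpers rely on the sizes being powers of two, so masking replaces
a modulo operation.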

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqcow2: Merge the writing of the COW regions with the guest data
Alberto Garcia [Mon, 19 Jun 2017 13:40:08 +0000 (16:40 +0300)]
qcow2: Merge the writing of the COW regions with the guest data

If the guest tries to write data that results in the allocation of a
new cluster, instead of writing the guest data first and then the data
from the COW regions, write everything together using one single I/O
operation.
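
The idea of issuing one vectored write can be sketched with a plain
POSIX struct iovec. This is only an illustration: QEMU uses its own
QEMUIOVector type, and toy_build_merged_iov() and toy_iov_total() are
hypothetical helpers, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Fill 'iov' (room for 3 entries) with the non-empty pieces in on-disk
 * order: head COW data, guest data, tail COW data.
 * Returns the number of entries used.
 */
static int toy_build_merged_iov(struct iovec *iov,
                                void *head, size_t head_len,
                                void *data, size_t data_len,
                                void *tail, size_t tail_len)
{
    int n = 0;

    assert(data_len > 0);
    if (head_len) {
        iov[n].iov_base = head;
        iov[n].iov_len = head_len;
        n++;
    }
    iov[n].iov_base = data;
    iov[n].iov_len = data_len;
    n++;
    if (tail_len) {
        iov[n].iov_base = tail;
        iov[n].iov_len = tail_len;
        n++;
    }
    return n;
}

/* Total number of bytes described by the first 'n' entries. */
static size_t toy_iov_total(const struct iovec *iov, int n)
{
    size_t total = 0;

    for (int i = 0; i < n; i++) {
        total += iov[i].iov_len;
    }
    return total;
}
```

The resulting array could then be handed to a single vectored write
call such as writev() instead of three separate writes.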

This can improve the write performance by 25% or more, depending on
several factors such as the media type, the cluster size and the I/O
request size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()
Alberto Garcia [Mon, 19 Jun 2017 13:40:07 +0000 (16:40 +0300)]
qcow2: Pass a QEMUIOVector to do_perform_cow_{read,write}()

Instead of passing a single buffer pointer to do_perform_cow_write(),
pass a QEMUIOVector. This will allow us to merge the write requests
for the COW regions and the actual data into a single one.

Although do_perform_cow_read() does not strictly need to change its
API, we're doing it here as well for consistency.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqcow2: Allow reading both COW regions with only one request
Alberto Garcia [Mon, 19 Jun 2017 13:40:06 +0000 (16:40 +0300)]
qcow2: Allow reading both COW regions with only one request

Reading both COW regions requires two separate requests, but it's
perfectly possible to merge them and perform only one. This generally
improves performance, particularly on rotating disk drives. The
downside is that the data in the middle region is read but discarded.

This patch takes a conservative approach and only merges reads when
the size of the middle region is <= 16KB.
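
The merge decision described above could look roughly like the sketch
below. ToyCowRegion and toy_should_merge_reads() are hypothetical names;
only the 16KB threshold comes from this commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold from the commit message: merge when the gap is <= 16KB. */
#define TOY_MAX_MERGE_GAP (16 * 1024)

/* Hypothetical region: offset from the first touched cluster, and length. */
typedef struct ToyCowRegion {
    unsigned offset;
    unsigned nb_bytes;
} ToyCowRegion;

/*
 * Decide whether the start and end COW regions should be read with a
 * single request. 'start' must lie entirely before 'end'.
 */
static bool toy_should_merge_reads(const ToyCowRegion *start,
                                   const ToyCowRegion *end)
{
    unsigned start_end = start->offset + start->nb_bytes;

    assert(start_end <= end->offset);
    if (start->nb_bytes == 0 || end->nb_bytes == 0) {
        return false;  /* at most one region to read, nothing to merge */
    }
    /* The gap is the middle region that would be read and discarded. */
    return end->offset - start_end <= TOY_MAX_MERGE_GAP;
}
```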

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqcow2: Split do_perform_cow() into _read(), _encrypt() and _write()
Alberto Garcia [Mon, 19 Jun 2017 13:40:05 +0000 (16:40 +0300)]
qcow2: Split do_perform_cow() into _read(), _encrypt() and _write()

This patch splits do_perform_cow() into three separate functions to
read, encrypt and write the COW regions.

perform_cow() can now read both regions first, then encrypt them and
finally write them to disk. The memory allocation is also done in
this function now, using one single buffer large enough to hold both
regions.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqcow2: Make perform_cow() call do_perform_cow() twice
Alberto Garcia [Mon, 19 Jun 2017 13:40:04 +0000 (16:40 +0300)]
qcow2: Make perform_cow() call do_perform_cow() twice

Instead of calling perform_cow() twice with a different COW region
each time, call it just once and make perform_cow() handle both
regions.

This patch simply moves code around. The next one will do the actual
reordering of the COW operations.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqcow2: Use unsigned int for both members of Qcow2COWRegion
Alberto Garcia [Mon, 19 Jun 2017 13:40:03 +0000 (16:40 +0300)]
qcow2: Use unsigned int for both members of Qcow2COWRegion

Qcow2COWRegion has two attributes:

- The offset of the COW region from the start of the first cluster
  touched by the I/O request. Since it's always going to be positive
  and the maximum request size is at most INT_MAX, we can use a
  regular unsigned int to store this offset.

- The size of the COW region in bytes. This is guaranteed to be >= 0,
  so we should use an unsigned type instead.

In x86_64 this reduces the size of Qcow2COWRegion from 16 to 8 bytes.
It will also help keep some assertions simpler now that we know that
there are no negative numbers.

The prototype of do_perform_cow() is also updated to reflect these
changes.
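
The size reduction can be checked with a small sketch. The old layout
below (a 64-bit offset followed by an int) is an assumption that is
consistent with the 16-byte figure above; the struct and function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed previous layout: 64-bit offset plus int size. */
typedef struct ToyCowRegionOld {
    uint64_t offset;
    int nb_bytes;      /* followed by padding on ABIs that align to 8 */
} ToyCowRegionOld;

/* Layout described in the commit: both members are unsigned int. */
typedef struct ToyCowRegionNew {
    unsigned offset;
    unsigned nb_bytes;
} ToyCowRegionNew;

/* Two members of the same type need no padding between or after them. */
static_assert(sizeof(ToyCowRegionNew) == 2 * sizeof(unsigned),
              "no padding expected in the new layout");

/* Print both sizes; the exact numbers depend on the ABI. */
static void toy_print_region_sizes(void)
{
    printf("old: %zu bytes, new: %zu bytes\n",
           sizeof(ToyCowRegionOld), sizeof(ToyCowRegionNew));
}
```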

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqcow2: Remove unused Error variable in do_perform_cow()
Alberto Garcia [Mon, 19 Jun 2017 13:40:02 +0000 (16:40 +0300)]
qcow2: Remove unused Error variable in do_perform_cow()

We are using the return value of qcow2_encrypt_sectors() to detect
problems but we are throwing away the returned Error since we have no
way to report it to the user. Therefore we can simply get rid of the
local Error variable and pass NULL instead.

Alternatively we could try to figure out a way to pass the original
error instead of simply returning -EIO, but that would be more
invasive, so let's keep the current approach.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agonvme: Add support for Read Data and Write Data in CMBs.
Stephen Bates [Tue, 13 Jun 2017 10:08:35 +0000 (04:08 -0600)]
nvme: Add support for Read Data and Write Data in CMBs.

Add the ability for the NVMe model to support both the RDS and WDS
modes in the Controller Memory Buffer.

Although not currently supported in the upstream Linux kernel, a fork
with support exists [1] and user-space test programs that build on
this also exist [2].

Useful for testing CMB functionality in preparation for real
CMB-enabled NVMe devices (coming soon).

[1] https://github.com/sbates130272/linux-p2pmem
[2] https://github.com/sbates130272/p2pmem-test

Signed-off-by: Stephen Bates <sbates@raithlin.com>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqemu-iotests: 068: test iothread mode
Stefan Hajnoczi [Thu, 15 Jun 2017 16:38:13 +0000 (17:38 +0100)]
qemu-iotests: 068: test iothread mode

Perform the savevm/loadvm test with both iothread on and off.  This
covers the recently found savevm/loadvm hang when iothread is enabled.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqemu-iotests: 068: use -drive/-device instead of -hda
Stefan Hajnoczi [Thu, 15 Jun 2017 16:38:12 +0000 (17:38 +0100)]
qemu-iotests: 068: use -drive/-device instead of -hda

The legacy -hda option does not support -drive/-device parameters.  They
will be required by the next patch that extends this test case.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqemu-iotests: 068: extract _qemu() function
Stefan Hajnoczi [Thu, 15 Jun 2017 16:38:11 +0000 (17:38 +0100)]
qemu-iotests: 068: extract _qemu() function

Avoid duplicating the QEMU command-line.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agomigration: hold AioContext lock for loadvm qemu_fclose()
Stefan Hajnoczi [Thu, 15 Jun 2017 16:38:10 +0000 (17:38 +0100)]
migration: hold AioContext lock for loadvm qemu_fclose()

migration_incoming_state_destroy() uses qemu_fclose() on the vmstate
file.  Make sure to call it inside an AioContext acquire/release region.

This fixes a 'qemu: qemu_mutex_unlock: Operation not permitted' abort
in loadvm.

This patch closes the vmstate file before ending the drained region.
Previously we closed the vmstate file after ending the drained region.
The order does not matter.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agothrottle: Update throttle-groups.c documentation
Alberto Garcia [Tue, 13 Jun 2017 21:16:12 +0000 (00:16 +0300)]
throttle: Update throttle-groups.c documentation

There used to be throttle_timers_{detach,attach}_aio_context() calls
in bdrv_set_aio_context(), but since 7ca7f0f6db1fedd28d490795d778cf239
they are now in blk_set_aio_context().

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agodoc: Document driver-specific -blockdev options
Kevin Wolf [Thu, 22 Sep 2016 15:24:38 +0000 (17:24 +0200)]
doc: Document driver-specific -blockdev options

This documents the driver-specific options for the raw, qcow2 and file
block drivers for the man page. For everything else, we refer to the
QAPI documentation.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
7 years agodoc: Document generic -blockdev options
Kevin Wolf [Thu, 22 Sep 2016 14:53:24 +0000 (16:53 +0200)]
doc: Document generic -blockdev options

This adds documentation for the -blockdev options that apply to all
nodes independent of the block driver used.

All options that are shared by -blockdev and -drive are now explained in
the section for -blockdev. The documentation of -drive mentions that all
-blockdev options are accepted as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
7 years agomigration: use bdrv_drain_all_begin/end() instead of bdrv_drain_all()
Stefan Hajnoczi [Mon, 22 May 2017 13:57:04 +0000 (14:57 +0100)]
migration: use bdrv_drain_all_begin/end() instead of bdrv_drain_all()

blk/bdrv_drain_all() only takes effect for a single instant and then
resumes block jobs, guest devices, and other external clients like the
NBD server.  This can be handy when performing a synchronous drain
before terminating the program, for example.

Monitor commands usually need to quiesce I/O across an entire code
region so blk/bdrv_drain_all() is not suitable.  They must use
bdrv_drain_all_begin/end() to mark the region.  This prevents new I/O
requests from slipping in or worse - block jobs completing and modifying
the graph.

I audited other blk/bdrv_drain_all() callers but did not find anything
that needs a similar fix.  This patch fixes the savevm/loadvm commands.
Although I haven't encountered a real-world issue, this makes the code
safer.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agomigration: avoid recursive AioContext locking in save_vmstate()
Stefan Hajnoczi [Mon, 22 May 2017 13:57:03 +0000 (14:57 +0100)]
migration: avoid recursive AioContext locking in save_vmstate()

AioContext was designed to allow nested acquire/release calls.  It uses
a recursive mutex so callers don't need to worry about nesting...or so
we thought.

BDRV_POLL_WHILE() is used to wait for block I/O requests.  It releases
the AioContext temporarily around aio_poll().  This gives IOThreads a
chance to acquire the AioContext to process I/O completions.

It turns out that recursive locking and BDRV_POLL_WHILE() don't mix.
BDRV_POLL_WHILE() only releases the AioContext once, so the IOThread
will not be able to acquire the AioContext if it was acquired
multiple times.

Instead of trying to release AioContext n times in BDRV_POLL_WHILE(),
this patch simply avoids nested locking in save_vmstate().  It's the
simplest fix and we should step back to consider the big picture with
all the recent changes to block layer threading.
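
The interaction between nested acquisition and a single release can be
modelled with a toy, single-threaded sketch. Everything here
(ToyRecursiveLock and the toy_* functions) is hypothetical and only
tracks the owner's nesting depth; it is not QEMU's AioContext API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a recursive lock: only the owner's depth is tracked. */
typedef struct ToyRecursiveLock {
    int depth;  /* how many times the owner has acquired the lock */
} ToyRecursiveLock;

static void toy_acquire(ToyRecursiveLock *l)
{
    l->depth++;
}

static void toy_release(ToyRecursiveLock *l)
{
    assert(l->depth > 0);
    l->depth--;
}

/* Another thread can take the lock only when the owner's depth is zero. */
static bool toy_other_thread_can_acquire(const ToyRecursiveLock *l)
{
    return l->depth == 0;
}

/*
 * Models a poll loop that drops the lock exactly once around the wait.
 * Returns whether the other thread could have made progress meanwhile.
 */
static bool toy_poll_releasing_once(ToyRecursiveLock *l)
{
    bool progress;

    toy_release(l);
    progress = toy_other_thread_can_acquire(l);
    toy_acquire(l);
    return progress;
}
```

With a single acquisition the model reports progress; after a nested
acquisition it does not, which corresponds to the hang described here.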

This patch is the final fix to solve 'savevm' hanging with -object
iothread.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoblock: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()
Stefan Hajnoczi [Mon, 22 May 2017 13:57:02 +0000 (14:57 +0100)]
block: use BDRV_POLL_WHILE() in bdrv_rw_vmstate()

Calling aio_poll() directly may have been fine previously, but this is
the future, man!  The difference between an aio_poll() loop and
BDRV_POLL_WHILE() is that BDRV_POLL_WHILE() releases the AioContext
around aio_poll().

This allows the IOThread to run fd handlers or BHs to complete the
request.  Failure to release the AioContext causes deadlocks.

Using BDRV_POLL_WHILE() partially fixes a 'savevm' hang with -object
iothread.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoblock: count bdrv_co_rw_vmstate() requests
Stefan Hajnoczi [Mon, 22 May 2017 13:57:01 +0000 (14:57 +0100)]
block: count bdrv_co_rw_vmstate() requests

Call bdrv_inc/dec_in_flight() for vmstate reads/writes.  This seems
unnecessary at first glance because vmstate reads/writes are done
synchronously while the guest is stopped.  But we need the bdrv_wakeup()
in bdrv_dec_in_flight() so the main loop sees request completion.
Besides, it's cleaner to count vmstate reads/writes like ordinary
read/write requests.

The bdrv_wakeup() partially fixes a 'savevm' hang with -object iothread.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoqemu-iotests: Test exiting qemu with running job
Kevin Wolf [Fri, 9 Jun 2017 11:37:01 +0000 (13:37 +0200)]
qemu-iotests: Test exiting qemu with running job

When qemu is exited, all running jobs should be cancelled successfully.
This adds a test for this for all types of block jobs that currently
exist in qemu.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
7 years agoqemu-iotests: Allow starting new qemu after cleanup
Kevin Wolf [Fri, 9 Jun 2017 11:32:48 +0000 (13:32 +0200)]
qemu-iotests: Allow starting new qemu after cleanup

After _cleanup_qemu(), test cases should be able to start the next qemu
process and call _cleanup_qemu() for that one as well. For this to work
cleanly, we need to improve the cleanup so that the second invocation
doesn't try to kill the qemu instances from the first invocation a
second time (which would result in error messages).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
7 years agocommit: Fix completion with extra reference
Kevin Wolf [Fri, 9 Jun 2017 11:29:36 +0000 (13:29 +0200)]
commit: Fix completion with extra reference

commit_complete() can't assume that after its block_job_completed() the
job is actually immediately freed; someone else may still be holding
references. In this case, the op blockers on the intermediate nodes make
the graph reconfiguration in the completion code fail.

Call block_job_remove_all_bdrv() manually so that we know for sure that
any blockers on intermediate nodes are given up.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
7 years agoconfigure: Define NCURSES_WIDECHAR if we're using curses
Peter Maydell [Fri, 2 Jun 2017 14:35:38 +0000 (15:35 +0100)]
configure: Define NCURSES_WIDECHAR if we're using curses

We want the wide character functions from the ncurses header.
Unfortunately it doesn't provide them by default, but only
if either:
 * NCURSES_WIDECHAR is defined (for ncurses 20111030 and up)
 * _XOPEN_SOURCE/_XOPEN_SOURCE_EXTENDED are suitably defined

So far we have been implicitly relying on the latter, because
for GNU libc when we define _GNU_SOURCE this causes libc
to define the _XOPEN_SOURCE macros for us. Unfortunately
this doesn't work on all libcs, because some (like OSX and
musl libc) do not define _XOPEN_SOURCE when _GNU_SOURCE
is defined.

We can't fix this by defining _XOPEN_SOURCE ourselves, because
that also means "and don't provide any functions that aren't in
that standard", and not all libcs provide any way to override
that to also get the non-standard functions. In particular
FreeBSD has no such mechanism, and OSX's _DARWIN_C_SOURCE
doesn't reenable everything (for instance getpagesize()
is still not prototyped if _DARWIN_C_SOURCE and _XOPEN_SOURCE
are both defined).

So we have to define NCURSES_WIDECHAR. (This will only work
if your ncurses is at least 20111030, as older versions
don't honour this macro.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Laszlo Ersek <lersek@redhat.com>
Message-id: 1496414138-7622-1-git-send-email-peter.maydell@linaro.org

7 years agoMerge remote-tracking branch 'remotes/rth/tags/pull-s390-20170623' into staging
Peter Maydell [Fri, 23 Jun 2017 17:11:48 +0000 (18:11 +0100)]
Merge remote-tracking branch 'remotes/rth/tags/pull-s390-20170623' into staging

Queued target/s390x patches

# gpg: Signature made Fri 23 Jun 2017 17:18:24 BST
# gpg:                using RSA key 0xAD1270CC4DD0279B
# gpg: Good signature from "Richard Henderson <rth7680@gmail.com>"
# gpg:                 aka "Richard Henderson <rth@redhat.com>"
# gpg:                 aka "Richard Henderson <rth@twiddle.net>"
# Primary key fingerprint: 9CB1 8DDA F8E8 49AD 2AFC  16A4 AD12 70CC 4DD0 279B

* remotes/rth/tags/pull-s390-20170623:
  target/s390x: Implement idte instruction
  target/s390x: Improve heuristic for ipte
  target/s390x: Indicate and check for local tlb clearing
  target/s390x: Clean up TB flag bits
  target/s390x: Finish implementing ETF2-ENH
  target/s390x: Mark STFLE_49 facility as available
  target/s390x: Implement processor-assist insn
  target/s390x: Implement execution-hint insns
  target/s390x: Mark STFLE_53 facility as available
  target/s390x: Implement load-and-zero-rightmost-byte insns
  target/s390x: Implement load-on-condition-2 insns
  target/s390x: Mark FPSEH facility as available
  target/s390x: implement mvcos instruction
  target/s390x: change PSW_SHIFT_KEY
  target/s390x: Map existing FAC_* names to S390_FEAT_* names

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agotarget/s390x: Implement idte instruction
David Hildenbrand [Thu, 22 Jun 2017 09:41:51 +0000 (11:41 +0200)]
target/s390x: Implement idte instruction

Let's keep it very simple for now and flush the complete TLB. We
currently can't find the right entries in our TLB; to do so, we would
have to store the used tables for each element.

As we now fully implement the DAT-enhancement facility, we can allow
it to be enabled for the qemu CPU model.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170622094151.28633-4-david@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Improve heuristic for ipte
David Hildenbrand [Thu, 22 Jun 2017 09:41:50 +0000 (11:41 +0200)]
target/s390x: Improve heuristic for ipte

If only the page index is set, most likely we don't have a valid
virtual address. Let's do a full tlb flush for that case.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170622094151.28633-3-david@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Indicate and check for local tlb clearing
David Hildenbrand [Thu, 22 Jun 2017 09:41:49 +0000 (11:41 +0200)]
target/s390x: Indicate and check for local tlb clearing

Let's allow local TLB clearing to be enabled for the qemu cpu model
and emulate it correctly.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170622094151.28633-2-david@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Clean up TB flag bits
Richard Henderson [Mon, 19 Jun 2017 04:11:48 +0000 (21:11 -0700)]
target/s390x: Clean up TB flag bits

Most of the PSW bits that were being copied into TB->flags
are not relevant to translation.  Removing those that are
unnecessary reduces the amount of translation required.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Finish implementing ETF2-ENH
Richard Henderson [Sat, 17 Jun 2017 00:37:59 +0000 (17:37 -0700)]
target/s390x: Finish implementing ETF2-ENH

Missed the proper alignment in TRTO/TRTT, and ignoring the M3
field for all TRXX insns without ETF2-ENH.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Mark STFLE_49 facility as available
Richard Henderson [Sat, 17 Jun 2017 00:16:24 +0000 (17:16 -0700)]
target/s390x: Mark STFLE_49 facility as available

This facility bit includes execution-hint, load-and-trap,
miscellaneous-instruction-extensions and processor-assist.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Implement processor-assist insn
Richard Henderson [Sat, 17 Jun 2017 00:15:39 +0000 (17:15 -0700)]
target/s390x: Implement processor-assist insn

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Implement execution-hint insns
Richard Henderson [Sat, 17 Jun 2017 00:05:50 +0000 (17:05 -0700)]
target/s390x: Implement execution-hint insns

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Mark STFLE_53 facility as available
Richard Henderson [Fri, 16 Jun 2017 23:49:28 +0000 (16:49 -0700)]
target/s390x: Mark STFLE_53 facility as available

This facility bit includes load-on-condition-2 and
load-and-zero-rightmost-byte.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Implement load-and-zero-rightmost-byte insns
Richard Henderson [Fri, 16 Jun 2017 23:47:51 +0000 (16:47 -0700)]
target/s390x: Implement load-and-zero-rightmost-byte insns

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Implement load-on-condition-2 insns
Richard Henderson [Fri, 16 Jun 2017 23:35:34 +0000 (16:35 -0700)]
target/s390x: Implement load-on-condition-2 insns

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Mark FPSEH facility as available
Richard Henderson [Thu, 15 Jun 2017 21:14:06 +0000 (14:14 -0700)]
target/s390x: Mark FPSEH facility as available

This facility bit includes DFP-rounding, FPR-GR-transfer,
FPS-sign-handling, and IEEE-exception-simulation.  We do
support all of these.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: implement mvcos instruction
David Hildenbrand [Wed, 14 Jun 2017 13:38:19 +0000 (15:38 +0200)]
target/s390x: implement mvcos instruction

This adds support for the MOVE WITH OPTIONAL SPECIFICATIONS (MVCOS)
instruction. It can be enabled for the qemu cpu model using

qemu-system-s390x ... -cpu qemu,mvcos=on ...

This allows booting a Linux kernel that uses it for uaccess.

We are missing (as for most other parts) low address protection checks,
PSW key / storage key checks and support for AR-mode.

We fake an ADDRESSING exception when called from problem state (which
seems to rely on PSW key checks to be in place) and if AR-mode is used.
User mode will always see a PRIVILEGED exception.

This patch is based on an original patch by Miroslav Benes (thanks!).

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170614133819.18480-3-david@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: change PSW_SHIFT_KEY
David Hildenbrand [Wed, 14 Jun 2017 13:38:18 +0000 (15:38 +0200)]
target/s390x: change PSW_SHIFT_KEY

Such shifts are usually used to easily extract the PSW KEY from the PSW
mask, so let's avoid the confusing offset of 4.
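
A shift that lines up with the key field makes extraction a plain
mask-and-shift, as in the sketch below. The constant values are
assumptions based on the PSW key occupying four bits of the 64-bit PSW
mask; the TOY_* and toy_* names are hypothetical and not taken from the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: the PSW key is a 4-bit field of the 64-bit PSW mask. */
#define TOY_PSW_MASK_KEY  0x00F0000000000000ULL
#define TOY_PSW_SHIFT_KEY 52

/* Extract the 4-bit PSW key from a PSW mask. */
static inline unsigned toy_psw_key(uint64_t psw_mask)
{
    return (unsigned)((psw_mask & TOY_PSW_MASK_KEY) >> TOY_PSW_SHIFT_KEY);
}

/* Return 'psw_mask' with its key field replaced by 'key'. */
static inline uint64_t toy_psw_set_key(uint64_t psw_mask, unsigned key)
{
    assert(key <= 0xF);
    return (psw_mask & ~TOY_PSW_MASK_KEY) |
           ((uint64_t)key << TOY_PSW_SHIFT_KEY);
}
```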

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20170614133819.18480-2-david@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agotarget/s390x: Map existing FAC_* names to S390_FEAT_* names
Richard Henderson [Thu, 15 Jun 2017 02:24:16 +0000 (19:24 -0700)]
target/s390x: Map existing FAC_* names to S390_FEAT_* names

The FAC_ names were placeholders prior to the introduction
of the current facility modeling.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
7 years agoMerge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20170622' into staging
Peter Maydell [Fri, 23 Jun 2017 15:19:04 +0000 (16:19 +0100)]
Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20170622' into staging

pull-seccomp-20170622

# gpg: Signature made Thu 22 Jun 2017 09:01:01 BST
# gpg:                using RSA key 0xDF32E7C0F0FFF9A2
# gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <otubo@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: D67E 1B50 9374 86B4 0723  DBAB DF32 E7C0 F0FF F9A2

* remotes/otubo/tags/pull-seccomp-20170622:
  MAINTAINERS: seccomp: change email contact for Eduardo Otubo

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoMerge remote-tracking branch 'remotes/kraxel/tags/queue/misc-pull-request' into staging
Peter Maydell [Fri, 23 Jun 2017 14:40:09 +0000 (15:40 +0100)]
Merge remote-tracking branch 'remotes/kraxel/tags/queue/misc-pull-request' into staging

# gpg: Signature made Fri 23 Jun 2017 13:48:04 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/queue/misc-pull-request:
  applesmc: fix port i/o access width
  applesmc: implement error status port
  applesmc: cosmetic whitespace and indentation cleanup

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoui/cocoa.m: add Speed menu
John Arbuckle [Wed, 14 Jun 2017 03:17:38 +0000 (23:17 -0400)]
ui/cocoa.m: add Speed menu

Programs running inside of QEMU can sometimes use more CPU time than is really
needed. To solve this problem, we just need to throttle the virtual CPU. This
feature will stop laptops from burning up.

This patch adds a menu called Speed that has menu items from 100% to 1% that
represent the speed options. 100% is full speed and 1% is slowest.

Signed-off-by: John Arbuckle <programmingkidx@gmail.com>
Message-id: D6FAAABF-064D-49C0-B572-C73679F34052@gmail.com
[PMM: Moved "mark 100% menu item as checked initially" code to
 after menu item is allocated, not before it]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>