git.osdn.net Git - qmiga/qemu.git/log

qmp: Include "reserve" property of memory backends

Let's include the new property. Instead of relying on CONFIG_LINUX,
let's try to unconditionally grab the property and treat errors as
"does not exist".

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-15-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hmp: Print "share" property of memory backends with "info memdev"

Let's print the property.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-14-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qmp: Include "share" property of memory backends

Let's include the property, which can be helpful when debugging,
for example, to spot misuse of MAP_PRIVATE which can result in some ugly
corner cases (e.g., double-memory consumption on shmem).

Use the same description we also use for describing the property.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-13-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qmp: Clarify memory backend properties returned via query-memdev

We return information on the currently configured memory backends and
don't configure them, so decribe what the currently set properties
express.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-12-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hostmem: Wire up RAM_NORESERVE via "reserve" property

Let's provide a way to control the use of RAM_NORESERVE via memory
backends using the "reserve" property which defaults to true (old
behavior).

Only Linux currently supports clearing the flag (and support is checked at
runtime, depending on the setting of "/proc/sys/vm/overcommit_memory").
Windows and other POSIX systems will bail out with "reserve=false".

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM. This essentially allows
avoiding to set "/proc/sys/vm/overcommit_memory == 0") when using
virtio-mem and also supporting hugetlbfs in the future.

As really only Linux implements RAM_NORESERVE right now, let's expose
the property only with CONFIG_LINUX. Setting the property to "false"
will then only fail in corner cases -- for example on very old kernels
or when memory overcommit was completely disabled by the admin.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-11-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util/mmap-alloc: Support RAM_NORESERVE via MAP_NORESERVE under Linux

Let's support RAM_NORESERVE via MAP_NORESERVE on Linux. The flag has no
effect on most shared mappings - except for hugetlbfs and anonymous memory.

Linux man page:
  "MAP_NORESERVE: Do not reserve swap space for this mapping. When swap
  space is reserved, one has the guarantee that it is possible to modify
  the mapping. When swap space is not reserved one might get SIGSEGV
  upon a write if no physical memory is available. See also the discussion
  of the file /proc/sys/vm/overcommit_memory in proc(5). In kernels before
  2.6, this flag had effect only for private writable mappings."

Note that the "guarantee" part is wrong with memory overcommit in Linux.

Also, in Linux hugetlbfs is treated differently - we configure reservation
of huge pages from the pool, not reservation of swap space (huge pages
cannot be swapped).

The rough behavior is [1]:
a) !Hugetlbfs:

  1) Without MAP_NORESERVE *or* with memory overcommit under Linux
     disabled ("/proc/sys/vm/overcommit_memory == 2"), the following
     accounting/reservation happens:
      For a file backed map
       SHARED or READ-only - 0 cost (the file is the map not swap)
       PRIVATE WRITABLE - size of mapping per instance

      For an anonymous or /dev/zero map
       SHARED   - size of mapping
       PRIVATE READ-only - 0 cost (but of little use)
       PRIVATE WRITABLE - size of mapping per instance

  2) With MAP_NORESERVE, no accounting/reservation happens.

b) Hugetlbfs:

  1) Without MAP_NORESERVE, huge pages are reserved.

  2) With MAP_NORESERVE, no huge pages are reserved.

Note: With "/proc/sys/vm/overcommit_memory == 0", we were already able
to configure it for !hugetlbfs globally; this toggle now allows
configuring it more fine-grained, not for the whole system.

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM.

[1] https://www.kernel.org/doc/Documentation/vm/overcommit-accounting

Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-10-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()

Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The
new flag has the following semantics:

"
RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge
pages if applicable) is skipped: will bail out if not supported. When not
set, the OS will do the reservation, if supported for the memory type.
"

Allow passing it into:
- memory_region_init_ram_nomigrate()
- memory_region_init_resizeable_ram()
- memory_region_init_ram_from_file()

... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag.
Bail out if the flag is not supported, which is the case right now for
both, POSIX and win32. We will add Linux support next and allow specifying
RAM_NORESERVE via memory backends.

The target use case is virtio-mem, which dynamically exposes memory
inside a large, sparse memory area to the VM.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-9-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap()

Let's pass flags instead of bools to prepare for passing other flags and
update the documentation of qemu_ram_mmap(). Introduce new QEMU_MAP_
flags that abstract the mmap() PROT_ and MAP_ flag handling and simplify
it.

We expose only flags that are currently supported by qemu_ram_mmap().
Maybe, we'll see qemu_mmap() in the future as well that can implement these
flags.

Note: We don't use MAP_ flags as some flags (e.g., MAP_SYNC) are only
defined for some systems and we want to always be able to identify
these flags reliably inside qemu_ram_mmap() -- for example, to properly
warn when some future flags are not available or effective on a system.
Also, this way we can simplify PROT_ handling as well.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-8-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

softmmu/memory: Pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal()

Let's pass ram_flags to qemu_ram_alloc() and qemu_ram_alloc_internal(),
preparing for passing additional flags.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-7-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

softmmu/memory: Pass ram_flags to memory_region_init_ram_shared_nomigrate()

Let's forward ram_flags instead, renaming
memory_region_init_ram_shared_nomigrate() into
memory_region_init_ram_flags_nomigrate().

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-6-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

softmmu/memory: Pass ram_flags to qemu_ram_alloc_from_fd()

Let's pass in ram flags just like we do with qemu_ram_alloc_from_file(),
to clean up and prepare for more flags.

Simplify the documentation of passed ram flags: Looking at our
documentation of RAM_SHARED and RAM_PMEM is sufficient, no need to be
repetitive.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-5-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util/mmap-alloc: Factor out activating of memory to mmap_activate()

We want to activate memory within a reserved memory region, to make it
accessible. Let's factor that out.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-4-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util/mmap-alloc: Factor out reserving of a memory region to mmap_reserve()

We want to reserve a memory region without actually populating memory.
Let's factor that out.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-3-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util/mmap-alloc: Factor out calculation of the pagesize for the guard page

Let's factor out calculating the size of the guard page and rename the
variable to make it clearer that this pagesize only applies to the
guard page.

Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core
Cc: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210510114328.21835-2-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

softmmu/physmem: Fix qemu_ram_remap() to handle shared anonymous memory

RAM_SHARED now also properly indicates shared anonymous memory. Let's check
that flag for anonymous memory as well, to restore the proper mapping.

Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram")
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210406080126.24010-4-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

softmmu/physmem: Fix ram_block_discard_range() to handle shared anonymous memory

We can create shared anonymous memory via
"-object memory-backend-ram,share=on,..."
which is, for example, required by PVRDMA for mremap() to work.

Shared anonymous memory is weird, though. Instead of MADV_DONTNEED, we
have to use MADV_REMOVE: MADV_DONTNEED will only remove / zap all
relevant page table entries of the current process, the backend storage
will not get removed, resulting in no reduced memory consumption and
a repopulation of previous content on next access.

Shared anonymous memory is internally really just shmem, but without a
fd exposed. As we cannot use fallocate() without the fd to discard the
backing storage, MADV_REMOVE gets the same job done without a fd as
documented in "man 2 madvise". Removing backing storage implicitly
invalidates all page table entries with relevant mappings - an additional
MADV_DONTNEED is not required.

Fixes: 06329ccecfa0 ("mem: add share parameter to memory-backend-ram")
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210406080126.24010-3-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

softmmu/physmem: Mark shared anonymous memory RAM_SHARED

Let's drop the "shared" parameter from ram_block_add() and properly
store it in the flags of the ram block instead, such that
qemu_ram_is_shared() properly succeeds on all ram blocks that were mapped
MAP_SHARED.

We'll use this information next to fix some cases with shared anonymous
memory.

Reviewed-by: Igor Kotrasinski <i.kotrasinsk@partner.samsung.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20210406080126.24010-2-david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

esp: store lun coming from the MESSAGE OUT phase

The LUN is selected with an IDENTIFY message, and persists
until the next message out phase. Instead of passing it to
do_busid_cmd, store it in ESPState. Because do_cmd can simply
skip the message out phase if cmdfifo_cdb_offset is zero, it
can now be used for the S without ATN cases as well.

Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

esp: fix migration version check in esp_is_version_5()

Commit 4e78f3bf35 "esp: defer command completion interrupt on incoming data
transfers" added a version check for use with VMSTATE_*_TEST macros to allow
migration from older QEMU versions. Unfortunately the version check fails to
work in its current form since if the VMStateDescription version_id is
incremented, the test returns false and so the fields are not included in the
outgoing migration stream.

Change the version check to use >= rather == to ensure that migration works
correctly when the ESPState VMStateDescription has version_id > 5.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Fixes: 4e78f3bf35 ("esp: defer command completion interrupt on incoming data transfers")
Message-Id: <20210613102614.5438-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

esp: correctly accumulate extended messages for PDMA

Commit 799d90d818 "esp: transition to message out phase after SATN and stop
command" added logic to correctly handle extended messages for DMA requests
but not for PDMA requests.

Apply the same logic in esp_do_dma() to do_dma_pdma_cb() so that extended
messages terminated with a PDMA request are accumulated correctly. This allows
the ESP device to respond correctly to the SDTR negotiation initiated by the
NetBSD ESP driver without causing errors and timeouts on boot.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20210519100803.10293-6-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

esp: revert 75ef849696 "esp: correctly fill bus id with requested lun"

This commit from nearly 10 years ago is now broken due to the improvements
in esp emulation (or perhaps was never correct).  It shows up as a bug
in detecting the CDROM drive under MacOS. The error is caused by the
MacOS CDROM driver sending this CDB with an "S without ATN" command and
without DMA:

    0x12 0x00 0x00 0x00 0x05 0x00 (INQUIRY)

This is a valid INQUIRY command, however with this logic present the 3rd
byte (0x0) is copied over the 1st byte (0x12) which silently converts the
INQUIRY command to a TEST UNIT READY command before passing it to the
QEMU SCSI layer.  Since the TEST UNIT READY command has a zero length
response the MacOS CDROM driver never receives a response and assumes
the CDROM is not present.

The logic was to ignore the IDENTIFY byte and copy the LUN over from
the CDB, which did store the LUN in bits 5-7 of the second byte in
olden times.  This however is all obsolete, so just drop the code.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20210519100803.10293-5-mark.cave-ayland@ilande.co.uk>
[Tweaked commit message. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

esp: ensure PDMA write transfers are flushed from the FIFO to the target immediately

After each PDMA write transfer the MacOS CDROM driver waits until the FIFO is empty
(i.e. its contents have been written out to the SCSI bus) by polling the FIFO count
register until it reads 0. This doesn't work with the current PDMA write
implementation which waits until either the FIFO is full or the transfer is complete
before invoking the PDMA callback to process the FIFO contents.

Change the PDMA write transfer logic so that the PDMA callback is invoked after each
PDMA write to transfer the FIFO contents to the target buffer immediately, and hence
avoid getting stuck in the FIFO count register polling loop.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20210519100803.10293-4-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

esp: handle non-DMA transfers from the target one byte at a time

The initial implementation of non-DMA transfers was based upon analysis of traces
from the MacOS toolbox ROM for handling unaligned reads but missed one key
aspect - during a non-DMA transfer from the target, the bus service interrupt
should be raised for every single byte received from the bus and not just at either
the end of the transfer or when the FIFO is full.

Adjust the non-DMA code accordingly so that esp_do_nodma() is called for every byte
received from the target. This also includes special handling for managing the change
from DATA IN to STATUS phase as this needs to occur when the final byte is read out
from the FIFO, and not at the end of the transfer of the last byte into the FIFO.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20210519100803.10293-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

esp: allow non-DMA callback in esp_transfer_data() initial transfer

The current implementation only resumes DMA transfers when incoming data is
received from the target device, but this is also required for non-DMA transfers
with the next set of non-DMA changes.

Rather than duplicate the DMA/non-DMA dispatch logic in the initial transfer
section, update the code so that the initial transfer section can just
fallthrough to the main DMA/non-DMA dispatch logic.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20210519100803.10293-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

esp: only set ESP_RSEQ at the start of the select sequence

When processing a command to select a target and send a CDB, the ESP device
maintains a sequence step register so that if an error occurs the host can
determine which part of the selection/CDB submission sequence failed.

The old Linux 2.6 driver is really pedantic here: it checks the sequence step
register even if a command succeeds and complains loudly on the console if the
sequence step register doesn't match the expected bus phase and interrupt flags.

This reason this mismatch occurs is because the ESP emulation currently doesn't
update the bus phase until the next TI (Transfer Information) command and so the
cleared sequence step register is considered invalid for the stale bus phase.

Normally this isn't an issue as the host only checks the sequence step register
if an error occurs but the old Linux 2.6 driver does this in several places
causing a large stream of "esp0: STEP_ASEL for tgt 0" messages to appear on the
console during the boot process.

Fix this by not clearing the sequence step register when reading the interrupt
register and clearing the DMA status, so the guest sees a valid sequence step
and bus phase combination at the end of the command phase. No other change is
required since the sequence step register is correctly updated throughout the
selection/CDB submission sequence once one of the select commands is issued.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Fixes: 1b9e48a5bd ("esp: implement non-DMA transfers in PDMA mode")
Message-Id: <20210518212511.21688-3-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

esp: only assert INTR_DC interrupt flag if selection fails

The datasheet sequence tables confirm that when a target selection fails, only
the INTR_DC interrupt flag should be asserted.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Fixes: cf47a41e05 ("esp: latch individual bits in ESP_RINTR register")
Message-Id: <20210518212511.21688-2-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

runstate: Initialize Error * to NULL

Based on the description of error_setg(), the local variable err in
qemu_init_subsystems() should be initialized to NULL.

Fixes: efd7ab22fb ("vl: extract qemu_init_subsystems")
Cc: qemu-stable@nongnu.org
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
Message-Id: <20210610131729.3906565-1-liangpeng10@huawei.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

block/scsi: correctly emulate the VPD block limits page

When the device doesn't support the VPD block limits page, we emulate it even
for SCSI passthrough.

As a part of the emulation we need to add it to the 'Supported VPD Pages'

The code that does this adds it to the page, but it doesn't increase the length
of the data to be copied to the guest, thus the guest never sees the VPD block
limits page as supported.

Bump the transfer size by 1 in this case.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201217165612.942849-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qemu-config: use qemu_opts_from_qdict

Using qemu_opts_absorb_qdict, and then checking for any leftover options,
is redundant because there is already a function that does the same,
qemu_opts_from_qdict. qemu_opts_from_qdict consumes the whole dictionary
and therefore can just return an error message if an option fails to validate.

This also fixes a bug, because the "id" entry was retrieved in
qemu_config_do_parse and then left there by qemu_opts_absorb_qdict.
As a result, it was reported as an unrecognized option.

Reported-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Fixes: 3770141139 ("qemu-config: parse configuration files to a QDict")
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

vl: Fix an assert failure in error path

Based on the description of error_setg(), the local variable err in
qemu_maybe_daemonize() should be initialized to NULL.

Without fix, the uninitialized *errp triggers assert failure which
doesn't show much valuable information.

Before the fix:
qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion `*errp == NULL' failed.

After fix:
qemu-system-x86_64: cannot create PID file: Cannot open pid file: Permission denied

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20210610084741.456260-1-zhenzhong.duan@intel.com>
Cc: qemu-stable@nongnu.org
Fixes: 0546c0609c ("vl: split various early command line options to a separate function", 2020-12-10)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

vnc: avoid deprecation warnings for SASL on OS X

Apple has deprecated sasl.h functions in OS X 10.11. Therefore,
all files that use SASL API need to disable -Wdeprecated-declarations.
Remove the only use that is outside vnc-auth-sasl.c and add the
relevant #pragma GCC diagnostic there.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20210604120915.286195-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge remote-tracking branch 'remotes/berrange-gitlab/tags/misc-fixes-pull-request' into staging

Merge misc patches

# gpg: Signature made Mon 14 Jun 2021 15:14:48 BST
# gpg:                using RSA key DAF3A6FDB26B62912D0E8E3FBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" [full]
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>" [full]
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange-gitlab/tags/misc-fixes-pull-request:
  usb/dev-mtp: use GDateTime for formatting timestamp for objects
  block: use GDateTime for formatting timestamp when dumping snapshot info
  migration: use GDateTime for formatting timestamp in snapshot names
  block: remove duplicate trace.h include
  block: add trace point when fdatasync fails
  block: preserve errno from fdatasync failures
  softmmu: add trace point when bdrv_flush_all fails
  migration: add trace point when vm_stop_force_state fails
  sasl: remove comment about obsolete kerberos versions
  docs: recommend SCRAM-SHA-256 SASL mech instead of SHA-1 variant
  docs: document usage of the authorization framework
  docs: document how to pass secret data to QEMU
  docs: add table of contents to QAPI references

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

usb/dev-mtp: use GDateTime for formatting timestamp for objects

The GDateTime APIs provided by GLib avoid portability pitfalls, such
as some platforms where 'struct timeval.tv_sec' field is still 'long'
instead of 'time_t'. When combined with automatic cleanup, GDateTime
often results in simpler code too.

Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

block: use GDateTime for formatting timestamp when dumping snapshot info

The GDateTime APIs provided by GLib avoid portability pitfalls, such
as some platforms where 'struct timeval.tv_sec' field is still 'long'
instead of 'time_t'. When combined with automatic cleanup, GDateTime
often results in simpler code too.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

migration: use GDateTime for formatting timestamp in snapshot names

The GDateTime APIs provided by GLib avoid portability pitfalls, such
as some platforms where 'struct timeval.tv_sec' field is still 'long'
instead of 'time_t'. When combined with automatic cleanup, GDateTime
often results in simpler code too.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

block: remove duplicate trace.h include

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

block: add trace point when fdatasync fails

A flush failure is a critical failure scenario for some operations.
For example, it will prevent migration from completing, as it will
make vm_stop() report an error. Thus it is important to have a
trace point present for debugging.

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

block: preserve errno from fdatasync failures

When fdatasync() fails on a file backend we set a flag that
short-circuits any future attempts to call fdatasync(). The
first failure returns the true errno, but the later short-
circuited calls return a generic EIO. The latter is unhelpful
because fdatasync() can return a variety of errnos, including
EACCESS.

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

softmmu: add trace point when bdrv_flush_all fails

The VM stop process has to flush outstanding I/O and this is a critical
failure scenario that is hard to diagnose. Add a probe point that
records the flush return code.

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

migration: add trace point when vm_stop_force_state fails

This is a critical failure scenario for migration that is hard to
diagnose from existing probes. Most likely it is caused by an error
from bdrv_flush(), but we're not logging the errno anywhere, hence
this new probe.

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

sasl: remove comment about obsolete kerberos versions

This is not relevant to any OS distro that QEMU currently targets.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

docs: recommend SCRAM-SHA-256 SASL mech instead of SHA-1 variant

The SHA-256 variant better meats modern security expectations.
Also warn that the password file is storing entries in clear
text.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

docs: document usage of the authorization framework

The authorization framework provides a way to control access to network
services after a client has been authenticated. This documents how to
actually use it.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

docs: document how to pass secret data to QEMU

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

docs: add table of contents to QAPI references

The QAPI reference docs for the guest agent, storage daemon and QMP are
all rather long and hard to navigate unless you already know the name of
the command and can do full text search for it.

A table of contents in each doc will help people locate stuff much more
easily.

Reviewed-by: Connor Kuehl <ckuehl@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20210613' into staging

Clean up code_gen_buffer allocation.
Add tcg_remove_ops_after.
Fix tcg_constant_* documentation.
Improve TB chaining documentation.
Fix float32_exp2.
Fix arm tcg_out_op function signature.

# gpg: Signature made Mon 14 Jun 2021 02:12:35 BST
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth-gitlab/tags/pull-tcg-20210613: (34 commits)
  docs/devel: Explain in more detail the TB chaining mechanisms
  softfloat: Fix tp init in float32_exp2
  tcg/arm: Fix tcg_out_op function signature
  tcg: Fix documentation for tcg_constant_* vs tcg_temp_free_*
  tcg: Introduce tcg_remove_ops_after
  tcg: Move tcg_init_ctx and tcg_ctx from accel/tcg/
  tcg: When allocating for !splitwx, begin with PROT_NONE
  tcg: Merge buffer protection and guard page protection
  tcg: Round the tb_size default from qemu_get_host_physmem
  util/osdep: Add qemu_mprotect_rw
  tcg: Sink qemu_madvise call to common code
  tcg: Return the map protection from alloc_code_gen_buffer
  tcg: Allocate code_gen_buffer into struct tcg_region_state
  tcg: Move in_code_gen_buffer and tests to region.c
  tcg: Tidy split_cross_256mb
  tcg: Tidy tcg_n_regions
  tcg: Rename region.start to region.after_prologue
  tcg: Replace region.end with region.total_size
  tcg: Move MAX_CODE_GEN_BUFFER_SIZE to tcg-target.h
  tcg: Introduce tcg_max_ctxs
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

docs/devel: Explain in more detail the TB chaining mechanisms

Signed-off-by: Luis Pires <luis.pires@eldorado.org.br>
Message-Id: <20210601125143.191165-1-luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

softfloat: Fix tp init in float32_exp2

Typo in the conversion to FloatParts64.

Fixes: 572c4d862ff2
Fixes: Coverity CID 1457457
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210607223812.110596-1-richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/arm: Fix tcg_out_op function signature

Commit 5e8892db93 fixed several function signatures but tcg_out_op for
arm is missing. This patch fixes it as well.

Signed-off-by: Jose R. Ziviani <jziviani@suse.de>
Message-Id: <20210610224450.23425-1-jziviani@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Fix documentation for tcg_constant_* vs tcg_temp_free_*

At some point during the development of tcg_constant_*, I changed
my mind about whether such temps should be able to be passed to
tcg_temp_free_*. The final version committed allows this, but the
commentary was not updated to match.

Fixes: c0522136adf
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Introduce tcg_remove_ops_after

Introduce a function to remove everything emitted
since a given point.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Move tcg_init_ctx and tcg_ctx from accel/tcg/

These variables belong to the jit side, not the user side.

Since tcg_init_ctx is no longer used outside of tcg/, move
the declaration to tcg-internal.h.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Suggested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: When allocating for !splitwx, begin with PROT_NONE

There's a change in mprotect() behaviour [1] in the latest macOS
on M1 and it's not yet clear if it's going to be fixed by Apple.

In this case, instead of changing permissions of N guard pages,
we change permissions of N rwx regions. The same number of
syscalls are required either way.

[1] https://gist.github.com/hikalium/75ae822466ee4da13cbbe486498a191f

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Merge buffer protection and guard page protection

Do not handle protections on a case-by-case basis in the
various alloc_code_gen_buffer instances; do it within a
single loop in tcg_region_init.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Round the tb_size default from qemu_get_host_physmem

If qemu_get_host_physmem returns an odd number of pages,
then physmem / 8 will not be a multiple of the page size.

The following was observed on a gitlab runner:

ERROR qtest-arm/boot-serial-test - Bail out!
ERROR:../util/osdep.c:80:qemu_mprotect__osdep: \
assertion failed: (!(size & ~qemu_real_host_page_mask))

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

util/osdep: Add qemu_mprotect_rw

For --enable-tcg-interpreter on Windows, we will need this.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Sink qemu_madvise call to common code

Move the call out of the N versions of alloc_code_gen_buffer
and into tcg_region_init.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Return the map protection from alloc_code_gen_buffer

Change the interface from a boolean error indication to a
negative error vs a non-negative protection. For the moment
this is only interface change, not making use of the new data.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Allocate code_gen_buffer into struct tcg_region_state

Do not mess around with setting values within tcg_init_ctx.
Put the values into 'region' directly, which is where they
will live for the lifetime of the program.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Move in_code_gen_buffer and tests to region.c

Shortly, the full code_gen_buffer will only be visible
to region.c, so move in_code_gen_buffer out-of-line.

Move the debugging versions of tcg_splitwx_to_{rx,rw}
to region.c as well, so that the compiler gets to see
the implementation of in_code_gen_buffer.

This leaves exactly one use of in_code_gen_buffer outside
of region.c, in cpu_restore_state. Which, being on the
exception path, is not performance critical.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Tidy split_cross_256mb

Return output buffer and size via output pointer arguments,
rather than returning size via tcg_ctx->code_gen_buffer_size.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Tidy tcg_n_regions

Compute the value using straight division and bounds,
rather than a loop. Pass in tb_size rather than reading
from tcg_init_ctx.code_gen_buffer_size,

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Rename region.start to region.after_prologue

Give the field a name reflecting its actual meaning.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Replace region.end with region.total_size

A size is easier to work with than an end point,
particularly during initial buffer allocation.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Move MAX_CODE_GEN_BUFFER_SIZE to tcg-target.h

Remove the ifdef ladder and move each define into the
appropriate header file.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Introduce tcg_max_ctxs

Finish the divorce of tcg/ from hw/, and do not take
the max cpu value from MachineState; just remember what
we were passed in tcg_init.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel/tcg: Pass down max_cpus to tcg_init

Start removing the include of hw/boards.h from tcg/.
Pass down the max_cpus value from tcg_init_machine,
where we have the MachineState already.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel/tcg: Use MiB in tcg_init_machine

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel/tcg: Merge tcg_exec_init into tcg_init_machine

There is only one caller, and shortly we will need access
to the MachineState, which tcg_init_machine already has.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Create tcg_init

Perform both tcg_context_init and tcg_region_init.
Do not leave this split to the caller.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel/tcg: Rename tcg_init to tcg_init_machine

We shortly want to use tcg_init for something else.
Since the hook is called init_machine, match that.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel/tcg: Move alloc_code_gen_buffer to tcg/region.c

Buffer management is integral to tcg. Do not leave the allocation
to code outside of tcg/. This is code movement, with further
cleanups to follow.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

accel/tcg: Inline cpu_gen_init

It consists of one function call and has only one caller.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Split out region.c

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Split out tcg_region_prologue_set

This has only one user, but will make more sense after some
code motion.

Always leave the tcg_init_ctx initialized to the first region,
in preparation for tcg_prologue_init(). This also requires
that we don't re-allocate the region for the first cpu, lest
we hit the assertion for total number of regions allocated .

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Split out tcg_region_initial_alloc

This has only one user, and currently needs an ifdef,
but will make more sense after some code motion.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Remove error return from tcg_region_initial_alloc__locked

All callers immediately assert on error, so move the assert
into the function itself.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Re-order tcg_region_init vs tcg_prologue_init

Instead of delaying tcg_region_init until after tcg_prologue_init
is complete, do tcg_region_init first and let tcg_prologue_init
shrink the first region by the size of the generated prologue.

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

meson: Split out fpu/meson.build

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

meson: Split out tcg/meson.build

Reviewed-by: Luis Pires <luis.pires@eldorado.org.br>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Fri 11 Jun 2021 03:54:51 BST
# gpg:                using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  Fixed calculation error of pkt->header_size in fill_pkt_tcp_info()
  Add the function of colo_compare_cleanup
  Add a function named packet_new_nocopy for COLO.
  Remove migrate_set_block_enabled in checkpoint
  Optimize the function of filter_send
  Fix the qemu crash when guest shutdown during checkpoint
  Remove some duplicate trace code.
  netdev: add more commands to preconfig mode
  vhost-vdpa: remove the unused vhost_vdpa_get_acked_features()
  vhost-vdpa: don't initialize backend_features
  vhost-vdpa: map virtqueue notification area if possible
  vhost-vdpa: skip ram device from the IOTLB mapping

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Fixed calculation error of pkt->header_size in fill_pkt_tcp_info()

The data pointer has skipped vnet_hdr_len in the function of
parse_packet_early().So, we can not subtract vnet_hdr_len again
when calculating pkt->header_size in fill_pkt_tcp_info(). Otherwise,
it will cause network packet comparsion errors and greatly increase
the frequency of checkpoints.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>

Add the function of colo_compare_cleanup

This patch fixes the following:
    #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
    #1  0x00007f6ae4559859 in __GI_abort () at abort.c:79
    #2  0x0000559aaa386720 in error_exit (err=16, msg=0x559aaa5973d0 <__func__.16227> "qemu_mutex_destroy") at util/qemu-thread-posix.c:36
    #3  0x0000559aaa3868c5 in qemu_mutex_destroy (mutex=0x559aabffe828) at util/qemu-thread-posix.c:69
    #4  0x0000559aaa2f93a8 in char_finalize (obj=0x559aabffe800) at chardev/char.c:285
    #5  0x0000559aaa23318a in object_deinit (obj=0x559aabffe800, type=0x559aabfd7d20) at qom/object.c:606
    #6  0x0000559aaa2331b8 in object_deinit (obj=0x559aabffe800, type=0x559aabfd9060) at qom/object.c:610
    #7  0x0000559aaa233200 in object_finalize (data=0x559aabffe800) at qom/object.c:620
    #8  0x0000559aaa234202 in object_unref (obj=0x559aabffe800) at qom/object.c:1074
    #9  0x0000559aaa2356b6 in object_finalize_child_property (obj=0x559aac0dac10, name=0x559aac778760 "compare0-0", opaque=0x559aabffe800) at qom/object.c:1584
    #10 0x0000559aaa232f70 in object_property_del_all (obj=0x559aac0dac10) at qom/object.c:557
    #11 0x0000559aaa2331ed in object_finalize (data=0x559aac0dac10) at qom/object.c:619
    #12 0x0000559aaa234202 in object_unref (obj=0x559aac0dac10) at qom/object.c:1074
    #13 0x0000559aaa2356b6 in object_finalize_child_property (obj=0x559aac0c75c0, name=0x559aac0dadc0 "chardevs", opaque=0x559aac0dac10) at qom/object.c:1584
    #14 0x0000559aaa233071 in object_property_del_child (obj=0x559aac0c75c0, child=0x559aac0dac10, errp=0x0) at qom/object.c:580
    #15 0x0000559aaa233155 in object_unparent (obj=0x559aac0dac10) at qom/object.c:599
    #16 0x0000559aaa2fb721 in qemu_chr_cleanup () at chardev/char.c:1159
    #17 0x0000559aa9f9b110 in main (argc=54, argv=0x7ffeb62fa998, envp=0x7ffeb62fab50) at vl.c:4539

When chardev is cleaned up, chr_write_lock needs to be destroyed. But
the colo-compare module is not cleaned up normally before it when the
guest poweroff. It is holding chr_write_lock at this time. This will
cause qemu crash.So we add the function of colo_compare_cleanup() before
qemu_chr_cleanup() to fix the bug.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

Add a function named packet_new_nocopy for COLO.

Use the packet_new_nocopy instead of packet_new in the
filter-rewriter module. There will be one less memory
copy in the processing of each network packet.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

Remove migrate_set_block_enabled in checkpoint

We can detect disk migration in migrate_prepare, if disk migration
is enabled in COLO mode, we can directly report an error.and there
is no need to disable block migration at every checkpoint.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>

Optimize the function of filter_send

The iov_size has been calculated in filter_send(). we can directly
return the size.In this way, this is no need to repeat calculations
in filter_redirector_receive_iov();

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

Fix the qemu crash when guest shutdown during checkpoint

This patch fixes the following:
qemu-system-x86_64: invalid runstate transition: 'colo' ->'shutdown'
Aborted (core dumped)

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

Remove some duplicate trace code.

There is the same trace code in the colo_compare_packet_payload.

Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

netdev: add more commands to preconfig mode

Creating and destroying network backend does not require a fully
constructed machine. Allow the related monitor commands to run before
machine initialization has concluded.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

vhost-vdpa: remove the unused vhost_vdpa_get_acked_features()

No user for this helper, let's remove it.

Signed-off-by: Jason Wang <jasowang@redhat.com>

vhost-vdpa: don't initialize backend_features

We used to initialize backend_features during vhost_vdpa_init()
regardless whether or not it was supported by vhost. This will lead
the unsupported features like VIRTIO_F_IN_ORDER to be included and set
to the vhost-vdpa during vhost_dev_start. Because the
VIRTIO_F_IN_ORDER is not supported by vhost-vdpa so it won't be
advertised to guest which will break the datapath.

Fix this by not initializing the backend_features, so the
acked_features could be built only from guest features via
vhost_net_ack_features().

Fixes: 108a64818e69b ("vhost-vdpa: introduce vhost-vdpa backend")
Cc: qemu-stable@nongnu.org
Cc: Gautam Dawar <gdawar@xilinx.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

vhost-vdpa: map virtqueue notification area if possible

This patch implements the vq notification mapping support for
vhost-vDPA. This is simply done by using mmap()/munmap() for the
vhost-vDPA fd during device start/stop. For the device without
notification mapping support, we fall back to eventfd based
notification gracefully.

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

vhost-vdpa: skip ram device from the IOTLB mapping

vDPA is not tie to any specific hardware, for safety and simplicity,
vhost-vDPA doesn't allow MMIO area to be mapped via IOTLB. Only the
doorbell could be mapped via mmap(). So this patch exclude skip the
ram device from the IOTLB mapping.

Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

Merge remote-tracking branch 'remotes/dgilbert-gitlab/tags/pull-migration-20210609a' into staging

Migration pull for 2021-06-09

Yank crash fix from Leo
RDMA fix from Li
mptcp support from me
dirty-rate changes from Hyman and Peter

(Note I've switched to the gitlab I've been using for virtiofs pulls)

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
# gpg: Signature made Wed 09 Jun 2021 15:40:01 BST
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert-gitlab/tags/pull-migration-20210609a:
  hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds
  migration/dirtyrate: make sample page count configurable
  sockets: Support multipath TCP
  migration/socket: Close the listener at the end
  migration: Add cleanup hook for inwards migration
  io/net-listener: Call the notifier during finalize
  channel-socket: Only set CLOEXEC if we have space for fds
  migration/rdma: Fix cm event use after free
  yank: Unregister function when using TLS migration

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hmp: Add "calc_dirty_rate" and "info dirty_rate" cmds

These two commands are missing when adding the QMP sister commands.
Add them, so developers can play with them easier.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Message-Id: <4cc0039fc3ad6145136770cf3b0f056c09a2910b.1623027729.git.huangy81@chinatelecom.cn>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration/dirtyrate: make sample page count configurable

introduce optional sample-pages argument in calc-dirty-rate,
making sample page count per GB configurable so that more
accurate dirtyrate can be calculated.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Message-Id: <3103453a3b2796f929269c99a6ad81a9a7f1f405.1623027729.git.huangy81@chinatelecom.cn>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Wrapped a couple of long lines

sockets: Support multipath TCP

Multipath TCP allows combining multiple interfaces/routes into a single
socket, with very little work for the user/admin.

It's enabled by 'mptcp' on most socket addresses:

./qemu-system-x86_64 -nographic -incoming tcp:0:4444,mptcp

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20210421112834.107651-6-dgilbert@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration/socket: Close the listener at the end

Delay closing the listener until the cleanup hook at the end; mptcp
needs the listener to stay open while the other paths come in.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210421112834.107651-5-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Add cleanup hook for inwards migration

Add a cleanup hook for incoming migration that gets called
at the end as a way for a transport to allow cleanup.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210421112834.107651-4-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

io/net-listener: Call the notifier during finalize

Call the notifier during finalize; it's currently only called
if we change it, which is not the intent.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20210421112834.107651-3-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>