git.osdn.net Git - qmiga/qemu.git/log

ipmi:smbus: Add a check around a memcpy

In one case:

  memcpy(sid->inmsg + sid->inlen, buf, len);

if len == 0 then sid->inmsg + sig->inlen can point to one past the inmsg
array if the array is full.  We have to allow len == 0 due to some
vagueness in the spec, but we don't have to call memcpy.

Found by Coverity.  This is not a problem in practice, but the results
are technically (maybe) undefined.  So make Coverity happy.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-ppc-20220728' of https://gitlab.com/danielhb/qemu into staging

ppc patch queue for 2022-07-28:

Short queue with 2 Coverity fixes and one fix of the
'wait' insns that is causing hangs if the guest kernel uses
the most up to date wait opcode.

- target/ppc:
  - implement new wait variants to fix guest hang when using the new opcode
- ppc440_uc: initialize length passed to cpu_physical_memory_map()
- spapr_nvdimm: check if spapr_drc_index() returns NULL

# -----BEGIN PGP SIGNATURE-----
#
# iHUEABYKAB0WIQQX6/+ZI9AYAK8oOBk82cqW3gMxZAUCYuK8VgAKCRA82cqW3gMx
# ZOc7AQDPMsFY9NHNqJ3O0MiX4Qoy8IGUreZ9dzZSS3zT1nxtEAD+Lwl0/aGO+dk+
# +NiIO80A5Agy/0g8PHie4qR3EqHEnwA=
# =Q4eR
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 28 Jul 2022 09:41:58 AM PDT
# gpg:                using EDDSA key 17EBFF9923D01800AF2838193CD9CA96DE033164
# gpg: Good signature from "Daniel Henrique Barboza <danielhb413@gmail.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 17EB FF99 23D0 1800 AF28  3819 3CD9 CA96 DE03 3164

* tag 'pull-ppc-20220728' of https://gitlab.com/danielhb/qemu:
  target/ppc: Implement new wait variants
  hw/ppc/ppc440_uc: Initialize length passed to cpu_physical_memory_map()
  hw/ppc: check if spapr_drc_index() returns NULL in spapr_nvdimm.c

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

target/ppc: Implement new wait variants

ISA v2.06 adds new variations of wait, specified by the WC field. These
are not all compatible with the prior wait implementation, because they
add additional conditions that cause the processor to resume, which can
cause software to hang or run very slowly.

At this moment, with the current wait implementation and a pseries guest
using mainline kernel with new wait upcodes [1], QEMU hangs during boot if
more than one CPU is present:

qemu-system-ppc64 -M pseries,x-vof=on -cpu POWER10 -smp 2 -nographic
-kernel zImage.pseries -no-reboot

QEMU will exit (as there's no filesystem) if the test "passes", or hang
during boot if it hits the bug.

ISA v3.0 changed the wait opcode and removed the new variants (retaining
the WC field but making non-zero values reserved).

ISA v3.1 added new WC values to the new wait opcode, and added a PL
field.

This patch implements the new wait encoding and supports WC variants
with no-op implementations, which provides basic correctness as
explained in comments.

[1] https://lore.kernel.org/all/20220720132132.903462-1-npiggin@gmail.com/

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Tested-by: Joel Stanley <joel@jms.id.au>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Message-Id: <20220720133352.904263-1-npiggin@gmail.com>
[danielhb: added information about the bug being fixed]
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

hw/ppc/ppc440_uc: Initialize length passed to cpu_physical_memory_map()

In dcr_write_dma(), there is code that uses cpu_physical_memory_map()
to implement a DMA transfer. That function takes a 'plen' argument,
which points to a hwaddr which is used for both input and output: the
caller must set it to the size of the range it wants to map, and on
return it is updated to the actual length mapped. The dcr_write_dma()
code fails to initialize rlen and wlen, so will end up mapping an
unpredictable amount of memory.

Initialize the length values correctly, and check that we managed to
map the entire range before using the fast-path memmove().

This was spotted by Coverity, which points out that we never
initialized the variables before using them.

Fixes: Coverity CID 1487137, 1487150
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220726182341.1888115-2-peter.maydell@linaro.org>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

hw/ppc: check if spapr_drc_index() returns NULL in spapr_nvdimm.c

spapr_nvdimm_flush_completion_cb() and flush_worker_cb() are using the
DRC object returned by spapr_drc_index() without checking it for NULL.
In this case we would be dereferencing a NULL pointer when doing
SPAPR_NVDIMM(drc->dev) and PC_DIMM(drc->dev).

This can happen if, during a scm_flush(), the DRC object is wrongly
freed/released (e.g. a bug in another part of the code).
spapr_drc_index() would then return NULL in the callbacks.

Fixes: Coverity CID 1487108, 1487178
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20220409200856.283076-2-danielhb413@gmail.com>
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Merge tag 'pull-riscv-to-apply-20220728' of github.com:alistair23/qemu into staging

Sixth RISC-V PR for QEMU 7.1

This is a PR to go in for RC1. It fixes a segfault that occurs
when using multiple sockets on the RISC-V virt board. It also
includes a small fix to allow both Zmmul and M extensions.

* Allow both Zmmul and M extension
* Fix multi-socket plic configuraiton

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEE9sSsRtSTSGjTuM6PIeENKd+XcFQFAmLh33AACgkQIeENKd+X
# cFROBQf/QFxHsIX9clpAkHmK220efQ3rjHZtdCqQoCeRZp2EytFS9KZ6iae/BM9r
# 3Z8cZci38kxjqTzsYJLj46yNO3AxHoFsDH41yWTMOsxjVWVlno/06R/C1B4Ek37N
# kZXWKHzqfQvZRJIUAjKfVxaLtw9xRI9xYqWxVngdYSoW3HWHHz5UmA6fFoJ29QiZ
# SKEgxhakrqhvN9GMm1aWGkLN10uD5lFWOBMYdqMVcWq48XSP3Df5FU2Xk0sfegXq
# EqbIYKJL/Q6koyvmdpQz7VmtMAGjMTcmozEH8oN/MuCk7MCLmbloWVl+LF39SeTH
# 3amapiJBtYBOwaNZUpb5TZkv/bEDIw==
# =ip1R
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 27 Jul 2022 05:59:28 PM PDT
# gpg:                using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: F6C4 AC46 D493 4868 D3B8  CE8F 21E1 0D29 DF97 7054

* tag 'pull-riscv-to-apply-20220728' of github.com:alistair23/qemu:
  hw/intc: sifive_plic: Fix multi-socket plic configuraiton
  RISC-V: Allow both Zmmul and M

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'pull-block-2022-07-27' of https://gitlab.com/vsementsov/qemu into staging

Block: fix parallels block driver

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCgAdFiEEi5wmzbL9FHyIDoahVh8kwfGfefsFAmLhi0cACgkQVh8kwfGf
# efs1Dg/7BwbaJu5uZEGhz3+KBRPk5kdYKX60bOojac27pBTVo4OyiP7QFzBt4c6+
# 4yftT2vD7yTyzKANlmIYBvmjoIEw6eB09gJ5/mnUKgxTAS+thKo0e2v1zdncliy+
# h9SEYRT3RhlePJYSssZx8lW4gfCG2JZi5xSjfqbG50X7I8RgDtMmcj7EUwkvCkaI
# WL3iZIuYPxkfFwbQ/6xVmwc6uE97tWom9Z0iyEgFIhtFGlrgV3zJrDJ2CbOXIbi+
# 9c2j4zmnMUZLwtdT2CFwyvO03iU8eMJxqnt4aSyByOAd/rqko+ugHeE53eZkND0Q
# ci4bFq9XjgxOSsIqHXemIEUnuExhMuw5i7dtwR8w7K5Kwc88/44GTUgCZrPnBLx2
# smGX0g7BiCpNYXA8DkquOsUQf8cS67M3rjdTB6SiMo0KuQHe5O0RDQAwu7f+hnTw
# vEyo8dk4xGqUvqYcOpLLBHDis1lghWwseC5gB/M6Q+KqvDF4WDpIWwPLfR1phJ0L
# kA1M9QO+NAcUtLEuT7N22QU8LMTxAX/hSYpR5Jrt5g3R26h7w7VZEvJbpQaytXTY
# VhDVWAGg9Xn3oxGTEyVqGP3Avik9OeoK9gCFiIyTEOL1jfEXqOsX8V6QkpaKP6c+
# WXWiWfV9A9D7O556Z92hUeDuWhQKb2w1dry2e7DPeSWiUmgHtyY=
# =QfFj
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 27 Jul 2022 12:00:23 PM PDT
# gpg:                using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB
# gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>" [unknown]
# gpg:                 aka "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8B9C 26CD B2FD 147C 880E  86A1 561F 24C1 F19F 79FB

* tag 'pull-block-2022-07-27' of https://gitlab.com/vsementsov/qemu:
  iotests/131: Add parallels regression test
  block/parallels: Fix buffer-based write call

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

hw/intc: sifive_plic: Fix multi-socket plic configuraiton

Since commit 40244040a7ac, multi-socket configuration with plic is
broken as the hartid for second socket is calculated incorrectly.
The hartid stored in addr_config already includes the offset
for the base hartid for that socket. Adding it again would lead
to segfault while creating the plic device for the virt machine.
qdev_connect_gpio_out was also invoked with incorrect number of gpio
lines.

Fixes: 40244040a7ac (hw/intc: sifive_plic: Avoid overflowing the addr_config buffer)

Signed-off-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20220723090335.671105-1-atishp@rivosinc.com>
[ Changes by AF:
- Change the qdev_connect_gpio_out() numbering
]
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

RISC-V: Allow both Zmmul and M

We got to talking about how Zmmul and M interact with each other
https://github.com/riscv/riscv-isa-manual/issues/869 , and it turns out
that QEMU's behavior is slightly wrong: having Zmmul and M is a legal
combination, it just means that the multiplication instructions are
supported even when M is disabled at runtime via misa.

This just stops overriding M from Zmmul, with that the other checks for
the multiplication instructions work as per the ISA.

Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20220714180033.22385-1-palmer@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>

Update version for v7.1.0-rc0 release

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'for_upstream' of git://git./virt/kvm/mst/qemu into staging

pc,virtio: fixes

Several fixes. From now on, regression fixes only.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmLgQr8PHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpGUUIAKtNhrnKopGm4LlRpx8zN3Jc1Jo0nb648gaM
# Oyi+Pl8+hpESUhaWN10XDk38/QuPQfIFeR2ZhfYjFTRlZE+n3X9LVlwL8ejjP8KH
# AcWm78Ff/SLA45aMKMmw74pvEDNsoPYTp7TrfeIej5ub8BIXr8+8pqDdIR9WwtWO
# PbhLNXkTT2yLEs6jCVT4/dyh7zivSkrY7G/RVmtUaFe3PgY8fdW2z3+Txz7UIMgw
# CQoGuAucCO5ToBbs2CbT0V5yxY6G5VO6Qd8g0PzDW4M6GsY/Xr5QCnyJe0jTW0d6
# Dcc7UZFAzGNzyQCxHCic9xwTO+ZcJPJlH5TwknunxOb9xwCx4Qs=
# =zN41
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 26 Jul 2022 12:38:39 PM PDT
# gpg:                using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg:                issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [undefined]
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of git://git.kernel.org/pub/scm/virt/kvm/mst/qemu:
  hw/virtio/virtio-iommu: Enforce power-of-two notify for both MAP and UNMAP
  i386/pc: restrict AMD only enforcing of 1Tb hole to new machine type
  i386/pc: relocate 4g start to 1T where applicable
  i386/pc: bounds check phys-bits against max used GPA
  i386/pc: factor out device_memory base/size to helper
  i386/pc: handle unitialized mr in pc_get_cxl_range_end()
  i386/pc: factor out cxl range start to helper
  i386/pc: factor out cxl range end to helper
  i386/pc: factor out above-4g end to an helper
  i386/pc: pass pci_hole64_size to pc_memory_init()
  i386/pc: create pci-host qdev prior to pc_memory_init()
  hw/i386: add 4g boundary start to X86MachineState
  hw/cxl: Fix size of constant in interleave granularity function.
  hw/i386/pc: Always place CXL Memory Regions after device_memory
  hw/machine: Clear out left over CXL related pointer from move of state handling to machines.
  acpi/nvdimm: Define trace events for NVDIMM and substitute nvdimm_debug()

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

hw/virtio/virtio-iommu: Enforce power-of-two notify for both MAP and UNMAP

Currently we only enforce power-of-two mappings (required by the QEMU
notifier) for UNMAP requests. A MAP request not aligned on a
power-of-two may be successfully handled by VFIO, and then the
corresponding UNMAP notify will fail because it will attempt to split
that mapping. Ensure MAP and UNMAP notifications are consistent.

Fixes: dde3f08b5cab ("virtio-iommu: Handle non power of 2 range invalidations")
Reported-by: Tina Zhang <tina.zhang@intel.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20220718135636.338264-1-jean-philippe@linaro.org>
Tested-by: Tina Zhang <tina.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

iotests/131: Add parallels regression test

Test an allocating write to a parallels image that has a backing node.
Before HEAD^, doing so used to give me a failed assertion (when the
backing node contains only `42` bytes; the results varies with the value
chosen, for `0` bytes, for example, all I get is EIO).

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20220714132801.72464-3-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

block/parallels: Fix buffer-based write call

Commit a4072543ccdddbd241d5962d9237b8b41fd006bf has changed the I/O here
from working on a local one-element I/O vector to just using the buffer
directly (using the bdrv_co_pread()/bdrv_co_pwrite() helper functions
introduced shortly before).

However, it only changed the bdrv_co_preadv() call to bdrv_co_pread() -
the subsequent bdrv_co_pwritev() call stayed this way, and so still
expects a QEMUIOVector pointer instead of a plain buffer. We must
change that to be a bdrv_co_pwrite() call.

Fixes: a4072543ccdddbd241d5962d ("block/parallels: use buffer-based io")
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20220714132801.72464-2-hreitz@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Merge tag 'pull-target-arm-20220726' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
* Update Coverity component definitions
* target/arm: Add MO_128 entry to pred_esz_masks[]
* configure: Fix portability issues
* hw/display/bcm2835_fb: Fix framebuffer allocation address

# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmLgBfkZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3vFdD/wLVC2gJ4Uxt2Ri5vutF6fl
# RKTNiIhcE/XQNUogQiVIERYJJ9CUOALtg3q/SPCItq0nFgNG4h+sB7Ms+VcYVmNd
# iphbYBF4nFXYsAGlYIiAPU4I5SVnL4ORLMovRmlqMGYO/xlWe4LMIIOI+Iky4z9G
# pgho7n0yuKNPwikFdX1nKH2lYvoh9pn/p8buwre4qg6z/p4XssV295NAWeGvynab
# Sj9cmBvQC9ijKADvWXrfaGbHWQCAOwjRI7su/Ky0QGHjEprBpyCC8QtKEPP0flTh
# ffWCPX/pATwkbOH6m7rVFhIpI0r+6UQaDX/5SWruMNRto6WocNbX3JYT4XzdNln9
# nkVTgqn5PTzfd801RmfhJ/iGV2zf3ZE/Entj3n1RrpxI1gb56Q2tFghJNVgnL4Mq
# eBeODhPUJRqOd2dIcFKQbRhQs4Uaonu4V6QM+F7SekdV7VbU5VbJzB/9IvCkpNJo
# TqHDLp3makEabonal2gucmhxon7+C+4NXv+YMzTQbG2g/lVa4kmXehEA5BDcFScE
# XYKBEXkWsabV2IRVaZybu+0qkD+2PNtWQP3iAqOX8RPCGKieu4fbDTbzaPJAPNTb
# OBgDnzO3tukwI1upHQDIuO06poGfwMjJGKR4IZgCphTzNO7AtzUBFR96wmoaJGfq
# t7VO2lnKf5tGPifFTi/egg==
# =SWMq
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 26 Jul 2022 08:19:21 AM PDT
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [full]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [full]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [full]

* tag 'pull-target-arm-20220726' of https://git.linaro.org/people/pmaydell/qemu-arm:
  hw/display/bcm2835_fb: Fix framebuffer allocation address
  configure: Avoid '==' bashism
  configure: Drop dead code attempting to use -msmall-data on alpha hosts
  configure: Don't use bash-specific string-replacement syntax
  configure: Add braces to clarify intent of $emu[[:space:]]
  configure: Add missing POSIX-required space
  target/arm: Add MO_128 entry to pred_esz_masks[]
  scripts/coverity-scan/COMPONENTS.md: Update slirp component info
  scripts/coverity-scan/COMPONENTS.md: Add loongarch component

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

i386/pc: restrict AMD only enforcing of 1Tb hole to new machine type

The added enforcing is only relevant in the case of AMD where the
range right before the 1TB is restricted and cannot be DMA mapped
by the kernel consequently leading to IOMMU INVALID_DEVICE_REQUEST
or possibly other kinds of IOMMU events in the AMD IOMMU.

Although, there's a case where it may make sense to disable the
IOVA relocation/validation when migrating from a
non-amd-1tb-aware qemu to one that supports it.

Relocating RAM regions to after the 1Tb hole has consequences for
guest ABI because we are changing the memory mapping, so make
sure that only new machine enforce but not older ones.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-12-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

i386/pc: relocate 4g start to 1T where applicable

It is assumed that the whole GPA space is available to be DMA
addressable, within a given address space limit, except for a
tiny region before the 4G. Since Linux v5.4, VFIO validates
whether the selected GPA is indeed valid i.e. not reserved by
IOMMU on behalf of some specific devices or platform-defined
restrictions, and thus failing the ioctl(VFIO_DMA_MAP) with
-EINVAL.

AMD systems with an IOMMU are examples of such platforms and
particularly may only have these ranges as allowed:

        0000000000000000 - 00000000fedfffff (0      .. 3.982G)
        00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
        0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb[*])

We already account for the 4G hole, albeit if the guest is big
enough we will fail to allocate a guest with  >1010G due to the
~12G hole at the 1Tb boundary, reserved for HyperTransport (HT).

[*] there is another reserved region unrelated to HT that exists
in the 256T boundary in Fam 17h according to Errata #1286,
documeted also in "Open-Source Register Reference for AMD Family
17h Processors (PUB)"

When creating the region above 4G, take into account that on AMD
platforms the HyperTransport range is reserved and hence it
cannot be used either as GPAs. On those cases rather than
establishing the start of ram-above-4g to be 4G, relocate instead
to 1Tb. See AMD IOMMU spec, section 2.1.2 "IOMMU Logical
Topology", for more information on the underlying restriction of
IOVAs.

After accounting for the 1Tb hole on AMD hosts, mtree should
look like:

0000000000000000-000000007fffffff (prio 0, i/o):
         alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff
0000010000000000-000001ff7fffffff (prio 0, i/o):
        alias ram-above-4g @pc.ram 0000000080000000-000000ffffffffff

If the relocation is done or the address space covers it, we
also add the the reserved HT e820 range as reserved.

Default phys-bits on Qemu is TCG_PHYS_ADDR_BITS (40) which is enough
to address 1Tb (0xff ffff ffff). On AMD platforms, if a
ram-above-4g relocation is attempted and the CPU wasn't configured
with a big enough phys-bits, an error message will be printed
due to the maxphysaddr vs maxusedaddr check previously added.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-11-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

i386/pc: bounds check phys-bits against max used GPA

Calculate max *used* GPA against the CPU maximum possible address
and error out if the former surprasses the latter. This ensures
max used GPA is reacheable by configured phys-bits. Default phys-bits
on Qemu is TCG_PHYS_ADDR_BITS (40) which is enough for the CPU to
address 1Tb (0xff ffff ffff) or 1010G (0xfc ffff ffff) in AMD hosts
with IOMMU.

This is preparation for AMD guests with >1010G, where it will want relocate
ram-above-4g to be after 1Tb instead of 4G.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-10-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

i386/pc: factor out device_memory base/size to helper

Move obtaining hole64_start from device_memory memory region base/size
into an helper alongside correspondent getters in pc_memory_init() when
the hotplug range is unitialized. While doing that remove the memory
region based logic from this newly added helper.

This is the final step that allows pc_pci_hole64_start() to be callable
at the beginning of pc_memory_init() before any memory regions are
initialized.

Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-9-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

i386/pc: handle unitialized mr in pc_get_cxl_range_end()

Remove pc_get_cxl_range_end() dependency on the CXL memory region,
and replace with one that does not require the CXL host_mr to determine
the start of CXL start.

This in preparation to allow pc_pci_hole64_start() to be called early
in pc_memory_init(), handle CXL memory region end when its underlying
memory region isn't yet initialized.

Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Message-Id: <20220719170014.27028-8-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>

i386/pc: factor out cxl range start to helper

Factor out the calculation of the base address of the memory region.
It will be used later on for the cxl range end counterpart calculation
and as well in pc_memory_init() CXL memory region initialization, thus
avoiding duplication.

Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-7-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

i386/pc: factor out cxl range end to helper

Move calculation of CXL memory region end to separate helper.

This is in preparation to a future change that removes CXL range
dependency on the CXL memory region, with the goal of allowing
pc_pci_hole64_start() to be called before any memory region are
initialized.

Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-6-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

i386/pc: factor out above-4g end to an helper

There's a couple of places that seem to duplicate this calculation
of RAM size above the 4G boundary. Move all those to a helper function.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-5-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

i386/pc: pass pci_hole64_size to pc_memory_init()

Use the pre-initialized pci-host qdev and fetch the
pci-hole64-size into pc_memory_init() newly added argument.
Use PCI_HOST_PROP_PCI_HOLE64_SIZE pci-host property for
fetching pci-hole64-size.

This is in preparation to determine that host-phys-bits are
enough and for pci-hole64-size to be considered to relocate
ram-above-4g to be at 1T (on AMD platforms).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-4-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

i386/pc: create pci-host qdev prior to pc_memory_init()

At the start of pc_memory_init() we usually pass a range of
0..UINT64_MAX as pci_memory, when really its 2G (i440fx) or
32G (q35). To get the real user value, we need to get pci-host
passed property for default pci_hole64_size. Thus to get that,
create the qdev prior to memory init to better make estimations
on max used/phys addr.

This is in preparation to determine that host-phys-bits are
enough and also for pci-hole64-size to be considered to relocate
ram-above-4g to be at 1T (on AMD platforms).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-3-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/i386: add 4g boundary start to X86MachineState

Rather than hardcoding the 4G boundary everywhere, introduce a
X86MachineState field @above_4g_mem_start and use it
accordingly.

This is in preparation for relocating ram-above-4g to be
dynamically start at 1T on AMD platforms.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20220719170014.27028-2-joao.m.martins@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl: Fix size of constant in interleave granularity function.

Whilst the interleave granularity is always small enough that this isn't
a real problem (much less than 4GiB) let's change the constant
to ULL to fix the coverity warning.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Fixes: 829de299d1 ("hw/cxl/component: Add utils for interleave parameter encoding/decoding")
Fixes: Coverity CID 1488868
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220701132300.2264-4-Jonathan.Cameron@huawei.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/i386/pc: Always place CXL Memory Regions after device_memory

Previously broken_reserved_end was taken into account, but Igor Mammedov
identified that this could lead to a clash between potential RAM being
mapped in the region and CXL usage. Hence always add the size of the
device_memory memory region. This only affects the case where the
broken_reserved_end flag was set.

Fixes: 6e4e3ae936e6 ("hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)")
Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220701132300.2264-3-Jonathan.Cameron@huawei.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/machine: Clear out left over CXL related pointer from move of state handling to machines.

This got left behind in the move of the CXL setup code from core
files to the machines that support it.

Link: https://gitlab.com/qemu-project/qemu/-/commit/1ebf9001fb2701e3c00b401334c8f3900a46adaa
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220701132300.2264-2-Jonathan.Cameron@huawei.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

acpi/nvdimm: Define trace events for NVDIMM and substitute nvdimm_debug()

Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Reviewed-by: Jingqi Liu <jingqi.liu@intel.com>
Message-Id: <20220704085852.330005-1-robert.hu@linux.intel.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/display/bcm2835_fb: Fix framebuffer allocation address

This patch fixes the dedicated framebuffer mailbox interface by
removing an unneeded offset. This means that we pick the framebuffer
address in the same way that we do if the guest code uses the buffer
allocate mechanism of the bcm2835_property interface (case
0x00040001: /* Allocate buffer */ in bcm2835_property.c).

The documentation of this mailbox interface doesn't say anything
about using parts of the request buffer address to affect the
chosen framebuffer address:
https://github.com/raspberrypi/firmware/wiki/Mailbox-framebuffer-interface

Some baremetal applications like the Screen01/Screen02 examples from
Baking Pi tutorial[1] didn't work before this patch.

[1] https://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/screen01.html

Signed-off-by: Alan Jian <alanjian85@outlook.com>
Message-id: 20220725145838.8412-1-alanjian85@outlook.com
[PMM: tweaked commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

configure: Avoid '==' bashism

The '==' operator to test is a bashism; the standard way to copmare
strings is '='. This causes dash to complain:

../../configure: 681: test: linux: unexpected operator

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20220720152631.450903-6-peter.maydell@linaro.org

configure: Drop dead code attempting to use -msmall-data on alpha hosts

In commit 823eb013452e93d we moved the setting of ARCH from configure
to meson.build, but we accidentally left behind one attempt to use
$ARCH in configure, which was trying to add -msmall-data to the
compiler flags on Alpha hosts. Since ARCH is now never set, the test
always fails and we never add the flag.

There isn't actually any need to use this compiler flag on Alpha:
the original intent was that it would allow us to simplify our TCG
codegen on that platform, but we never actually made the TCG changes
that would rely on -msmall-data.

Drop the effectively-dead code from configure, as we don't need it.

This was spotted by shellcheck:

In ./configure line 2254:
case "$ARCH" in
^---^ SC2153: Possible misspelling: ARCH may not be assigned, but arch is.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20220720152631.450903-5-peter.maydell@linaro.org

configure: Don't use bash-specific string-replacement syntax

The variable string-replacement syntax ${var/old/new} is a bashism
(though it is also supported by some other shells), and for instance
does not work with the NetBSD /bin/sh, which complains:
../src/configure: 687: Syntax error: Bad substitution

Replace it with a more portable sed-based approach, similar to
what we already do in quote_sh().

Note that shellcheck also diagnoses this:

In ./configure line 687:
    e=${e/'\'/'\\'}
      ^-----------^ SC2039: In POSIX sh, string replacement is undefined.
           ^-- SC1003: Want to escape a single quote? echo 'This is how it'\''s done'.
                ^-- SC1003: Want to escape a single quote? echo 'This is how it'\''s done'.

In ./configure line 688:
    e=${e/\"/'\"'}
      ^----------^ SC2039: In POSIX sh, string replacement is undefined.

Fixes: 8154f5e64b0cf ("meson: Prefix each element of firmware path")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Tested-by: Thomas Huth <thuth@redhat.com>
Message-id: 20220720152631.450903-4-peter.maydell@linaro.org

configure: Add braces to clarify intent of $emu[[:space:]]

In shell script syntax, $var[something] is not special for variable
expansion: $var is expanded.  However, as it can look as if it were
intended to be an array element access (the correct syntax for which
is ${var[something]}), shellcheck recommends using explicit braces
around ${var} to clarify the intended expansion.

This fixes the warning:

In ./configure line 2346:
        if "$target_ld" -verbose 2>&1 | grep -q "^[[:space:]]*$emu[[:space:]]*$"; then
                                                              ^-- SC1087: Use braces when expanding arrays, e.g. ${array[idx]} (or ${var}[.. to quiet).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20220720152631.450903-3-peter.maydell@linaro.org

configure: Add missing POSIX-required space

In commit 7d7dbf9dc15be6e1 we added a line to the configure script
which is not valid POSIX shell syntax, because it is missing a space
after a '!' character. shellcheck diagnoses this:

if !(GIT="$git" "$source_path/scripts/git-submodule.sh" "$git_submodules_action" "$git_submodules"); then
^-- SC1035: You are missing a required space after the !.

and the OpenBSD shell will not correctly handle this without the space.

Fixes: 7d7dbf9dc15be6e1 ("configure: replace --enable/disable-git-update with --with-git-submodules")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20220720152631.450903-2-peter.maydell@linaro.org

target/arm: Add MO_128 entry to pred_esz_masks[]

In commit 7390e0e9ab8475, we added support for SME loads and stores.
Unlike SVE loads and stores, these include handling of 128-bit
elements. The SME load/store functions call down into the existing
sve_cont_ldst_elements() function, which uses the element size MO_*
value as an index into the pred_esz_masks[] array. Because this code
path now has to handle MO_128, we need to add an extra element to the
array.

This bug was spotted by Coverity because it meant we were reading off
the end of the array.

Resolves: Coverity CID 1490539, 1490541, 1490543, 1490544, 1490545,
1490546, 1490548, 1490549, 1490550, 1490551, 1490555, 1490557,
1490558, 1490560, 1490561, 1490563
Fixes: 7390e0e9ab8475 ("target/arm: Implement SME LD1, ST1")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220718100144.3248052-1-peter.maydell@linaro.org

scripts/coverity-scan/COMPONENTS.md: Update slirp component info

Update the regex for the slirp component now that it lives
solely inside /slirp/, and note that it should be ignored in
Coverity analysis (because it's a separate upstream project
now, and they run Coverity on it themselves).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20220718142310.16013-3-peter.maydell@linaro.org

scripts/coverity-scan/COMPONENTS.md: Add loongarch component

Add the component regex for the new loongarch target.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 20220718142310.16013-2-peter.maydell@linaro.org

Merge tag 'linux-user-for-7.1-pull-request' of https://gitlab.com/laurent_vivier/qemu into staging

linux-user pull request 20220726

# gpg: Signature made Tue 26 Jul 2022 10:44:29 BST
# gpg:                using RSA key CD2F75DDC8E3A4DC2E4F5173F30C38BD3F2FBE3C
# gpg:                issuer "laurent@vivier.eu"
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>" [full]
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>" [full]
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>" [full]
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* tag 'linux-user-for-7.1-pull-request' of https://gitlab.com/laurent_vivier/qemu:
  linux-user: Use target abi_int type for pipefd[1] in pipe()
  linux-user: Unconditionally use pipe2() syscall
  linux-user/hppa: Fix segfaults on page zero

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging

# gpg: Signature made Tue 26 Jul 2022 09:47:24 BST
# gpg:                using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* tag 'net-pull-request' of https://github.com/jasowang/qemu:
  vdpa: Fix memory listener deletions of iova tree
  vhost: Get vring base from vq, not svq
  e1000e: Fix possible interrupt loss when using MSI

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

vdpa: Fix memory listener deletions of iova tree

vhost_vdpa_listener_region_del is always deleting the first iova entry
of the tree, since it's using the needle iova instead of the result's
one.

This was detected using a vga virtual device in the VM using vdpa SVQ.
It makes some extra memory adding and deleting, so the wrong one was
mapped / unmapped. This was undetected before since all the memory was
mappend and unmapped totally without that device, but other conditions
could trigger it too:

* mem_region was with .iova = 0, .translated_addr = (correct GPA).
* iova_tree_find_iova returned right result, but does not update
  mem_region.
* iova_tree_remove always removed region with .iova = 0. Right iova were
  sent to the device.
* Next map will fill the first region with .iova = 0, causing a mapping
  with the same iova and device complains, if the next action is a map.
* Next unmap will cause to try to unmap again iova = 0, causing the
  device to complain that no region was mapped at iova = 0.

Fixes: 34e3c94edaef ("vdpa: Add custom IOTLB translations to SVQ")
Reported-by: Lei Yang <leiyang@redhat.com>
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

vhost: Get vring base from vq, not svq

The SVQ vring used idx usually match with the guest visible one, as long
as all the guest buffers (GPA) maps to exactly one buffer within qemu's
VA. However, as we can see in virtqueue_map_desc, a single guest buffer
could map to many buffers in SVQ vring.

Also, its also a mistake to rewind them at the source of migration.
Since VirtQueue is able to migrate the inflight descriptors, its
responsability of the destination to perform the rewind just in case it
cannot report the inflight descriptors to the device.

This makes easier to migrate between backends or to recover them in
vhost devices that support set in flight descriptors.

Fixes: 6d0b22266633 ("vdpa: Adapt vhost_vdpa_get_vring_base to SVQ")
Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

e1000e: Fix possible interrupt loss when using MSI

Commit "e1000e: Prevent MSI/MSI-X storms" introduced msi_causes_pending
to prevent interrupt storms problem. It was tested with MSI-X.

In case of MSI, the guest can rely solely on interrupts to clear ICR.
Upon clearing all pending interrupts, msi_causes_pending gets cleared.
However, when e1000e_itr_should_postpone() in e1000e_send_msi() returns
true, MSI never gets fired by e1000e_intrmgr_on_throttling_timer()
because msi_causes_pending is still set. This results in interrupt loss.

To prevent this, we need to clear msi_causes_pending when MSI is going
to get fired by the throttling timer. The guest can then receive
interrupts eventually.

Signed-off-by: Ake Koomsin <ake@igel.co.jp>
Signed-off-by: Jason Wang <jasowang@redhat.com>

Merge tag 'for-upstream2' of https://gitlab.com/bonzini/qemu into staging

* Bug fixes
* Pass random seed to x86 and other FDT platforms

# gpg: Signature made Fri 22 Jul 2022 18:26:45 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream2' of https://gitlab.com/bonzini/qemu:
  hw/i386: pass RNG seed via setup_data entry
  hw/rx: pass random seed to fdt
  hw/mips: boston: pass random seed to fdt
  hw/nios2: virt: pass random seed to fdt
  oss-fuzz: ensure base_copy is a generic-fuzzer
  oss-fuzz: remove binaries from qemu-bundle tree
  accel/kvm: Avoid Coverity warning in query_stats()
  docs: Add caveats for Windows as the build platform

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

linux-user: Use target abi_int type for pipefd[1] in pipe()

When writing back the fd[1] pipe file handle to emulated userspace
memory, use sizeof(abi_int) as offset insted of the hosts's int type.
There is no functional change in this patch.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <YtQ3Id6z8slpVr7r@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

linux-user: Unconditionally use pipe2() syscall

The pipe2() syscall is available on all Linux platforms since kernel
2.6.27, so use it unconditionally to emulate pipe() and pipe2().

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <YtbZ2ojisTnzxN9Y@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

linux-user/hppa: Fix segfaults on page zero

This program:

    int main(void) { asm("bv %r0(%r0)"); return 0; }

produces on real hppa hardware the expected segfault:

    SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x3} ---
    killed by SIGSEGV +++
    Segmentation fault

But when run on linux-user you get instead internal qemu errors:

ERROR: linux-user/hppa/cpu_loop.c:172:cpu_loop: code should not be reached
Bail out! ERROR: linux-user/hppa/cpu_loop.c:172:cpu_loop: code should not be reached
ERROR: accel/tcg/cpu-exec.c:933:cpu_exec: assertion failed: (cpu == current_cpu)
Bail out! ERROR: accel/tcg/cpu-exec.c:933:cpu_exec: assertion failed: (cpu == current_cpu)

Fix it by adding the missing case for the EXCP_IMP trap in
cpu_loop() and raise a segfault.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <YtWNC56seiV6VenA@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>

hw/i386: pass RNG seed via setup_data entry

Tiny machines optimized for fast boot time generally don't use EFI,
which means a random seed has to be supplied some other way. For this
purpose, Linux (â‰¥5.20) supports passing a seed in the setup_data table
with SETUP_RNG_SEED, specially intended for hypervisors, kexec, and
specialized bootloaders. The linked commit shows the upstream kernel
implementation.

At Paolo's request, we don't pass these to versioned machine types â‰¤7.0.

Link: https://git.kernel.org/tip/tip/c/68b8e9713c8
Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Eduardo Habkost <eduardo@habkost.net>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220721125636.446842-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw/rx: pass random seed to fdt

If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This FDT node is part of the DT specification.

Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220719122033.135902-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw/mips: boston: pass random seed to fdt

If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This FDT node is part of the DT specification.

I'd do the same for other MIPS platforms but boston is the only one that
seems to use FDT.

Cc: Paul Burton <paulburton@kernel.org>
Cc: Aleksandar Rikalo <aleksandar.rikalo@syrmia.com>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220719120843.134392-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw/nios2: virt: pass random seed to fdt

If the FDT contains /chosen/rng-seed, then the Linux RNG will use it to
initialize early. Set this using the usual guest random number
generation function. This FDT node is part of the DT specification.

Cc: Chris Wulff <crwulff@gmail.com>
Cc: Marek Vasut <marex@denx.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20220719120113.118034-1-Jason@zx2c4.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

oss-fuzz: ensure base_copy is a generic-fuzzer

Depending on how the target list is sorted in by qemu, the first target
(used as the base copy of the fuzzer, to which all others are linked)
might not be a generic-fuzzer. Since we are trying to only use
generic-fuzz, on oss-fuzz, fix that, to ensure the base copy is a
generic-fuzzer.

Signed-off-by: Alexander Bulekov <alxndr@bu.edu>
Message-Id: <20220720180946.2264253-1-alxndr@bu.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

oss-fuzz: remove binaries from qemu-bundle tree

oss-fuzz is finding possible fuzzing targets even under qemu-bundle/.../bin, but they
cannot be used because the required shared libraries are missing. Since the
fuzzing targets are already placed manually in $OUT, the bindir and libexecdir
subtrees are not needed; remove them.

Cc: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

accel/kvm: Avoid Coverity warning in query_stats()

Coverity complains that there is a codepath in the query_stats()
function where it can leak the memory pointed to by stats_list. This
can only happen if the caller passes something other than
STATS_TARGET_VM or STATS_TARGET_VCPU as the 'target', which no
callsite does. Enforce this assumption using g_assert_not_reached(),
so that if we have a future bug we hit the assert rather than
silently leaking memory.

Resolves: Coverity CID 1490140
Fixes: cc01a3f4cadd91e6 ("kvm: Support for querying fd-based stats")
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220719134853.327059-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

docs: Add caveats for Windows as the build platform

Commit cf60ccc3306c ("cutils: Introduce bundle mechanism") introduced
a Python script to populate a bundle directory using os.symlink() to
point to the binaries in the pc-bios directory of the source tree.
Commit 882084a04ae9 ("datadir: Use bundle mechanism") removed previous
logic in pc-bios/meson.build to create a link/copy of pc-bios binaries
in the build tree so os.symlink() is the way to go.

However os.symlink() may fail [1] on Windows if an unprivileged Windows
user started the QEMU build process, which results in QEMU executables
generated in the build tree not able to load the default BIOS/firmware
images due to symbolic links not present in the bundle directory.

This commits updates the documentation by adding such caveats for users
who want to build QEMU on the Windows platform.

[1] https://docs.python.org/3/library/os.html#os.symlink

Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com>
Message-Id: <20220719135014.764981-1-bmeng.cn@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* Boolean statistics for KVM
* Fix build on Haiku

# gpg: Signature made Tue 19 Jul 2022 10:32:34 BST
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
  util: Fix broken build on Haiku
  kvm: add support for boolean statistics
  monitor: add support for boolean statistics

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-migration-20220720c' of https://gitlab.com/dagrh/qemu into staging

Migration pull 2022-07-20

This replaces yesterdays pull and:
  a) Fixes some test build errors without TLS
  b) Reenabled the zlib acceleration on s390
     now that we have Ilya's fix

  Hyman's dirty page rate limit set
  Ilya's fix for zlib vs migration
  Peter's postcopy-preempt
  Cleanup from Dan
  zero-copy tidy ups from Leo
  multifd doc fix from Juan
  Revert disable of zlib acceleration on s390x

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
# gpg: Signature made Wed 20 Jul 2022 12:18:56 BST
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* tag 'pull-migration-20220720c' of https://gitlab.com/dagrh/qemu: (30 commits)
  Revert "gitlab: disable accelerated zlib for s390x"
  migration: Avoid false-positive on non-supported scenarios for zero-copy-send
  multifd: Document the locking of MultiFD{Send/Recv}Params
  migration/multifd: Report to user when zerocopy not working
  Add dirty-sync-missed-zero-copy migration stat
  QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent
  migration: remove unreachable code after reading data
  tests: Add postcopy preempt tests
  tests: Add postcopy tls recovery migration test
  tests: Add postcopy tls migration test
  tests: Move MigrateCommon upper
  migration: Respect postcopy request order in preemption mode
  migration: Enable TLS for preempt channel
  migration: Export tls-[creds|hostname|authz] params to cmdline too
  migration: Add helpers to detect TLS capability
  migration: Add property x-postcopy-preempt-break-huge
  migration: Create the postcopy preempt channel asynchronously
  migration: Postcopy recover with preempt enabled
  migration: Postcopy preemption enablement
  migration: Postcopy preemption preparation on channel creation
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging

# gpg: Signature made Wed 20 Jul 2022 09:58:47 BST
# gpg:                using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* tag 'net-pull-request' of https://github.com/jasowang/qemu: (25 commits)
  net/colo.c: fix segmentation fault when packet is not parsed correctly
  net/colo.c: No need to track conn_list for filter-rewriter
  net/colo: Fix a "double free" crash to clear the conn_list
  softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH
  vdpa: Add x-svq to NetdevVhostVDPAOptions
  vdpa: Add device migration blocker
  vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  vdpa: Buffer CVQ support on shadow virtqueue
  vdpa: manual forward CVQ buffers
  vhost-net-vdpa: add stubs for when no virtio-net device is present
  vdpa: Export vhost_vdpa_dma_map and unmap calls
  vhost: Add svq avail_handler callback
  vhost: add vhost_svq_poll
  vhost: Expose vhost_svq_add
  vhost: add vhost_svq_push_elem
  vhost: Track number of descs in SVQDescState
  vhost: Add SVQDescState
  vhost: Decouple vhost_svq_add from VirtQueueElement
  vhost: Check for queue full at vhost_svq_add
  vhost: Move vhost_svq_kick call to vhost_svq_add
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-request-2022-07-20' of https://gitlab.com/thuth/qemu into staging

* Fixes for s390x floating point vector instructions

# gpg: Signature made Wed 20 Jul 2022 08:14:50 BST
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2022-07-20' of https://gitlab.com/thuth/qemu:
  tests/tcg/s390x: test signed vfmin/vfmax
  target/s390x: fix NaN propagation rules
  target/s390x: fix handling of zeroes in vfmin/vfmax

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# fpu/softfloat-specialize.c.inc

Revert "gitlab: disable accelerated zlib for s390x"

This reverts commit 309df6acb29346f89e1ee542b1986f60cab12b87.
With Ilya's 'multifd: Copy pages before compressing them with zlib'
in the latest migration series, this shouldn't be a problem any more.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>

migration: Avoid false-positive on non-supported scenarios for zero-copy-send

Migration with zero-copy-send currently has it's limitations, as it can't
be used with TLS nor any kind of compression. In such scenarios, it should
output errors during parameter / capability setting.

But currently there are some ways of setting this not-supported scenarios
without printing the error message:

!) For 'compression' capability, it works by enabling it together with
zero-copy-send. This happens because the validity test for zero-copy uses
the helper unction migrate_use_compression(), which check for compression
presence in s->enabled_capabilities[MIGRATION_CAPABILITY_COMPRESS].

The point here is: the validity test happens before the capability gets
enabled. If all of them get enabled together, this test will not return
error.

In order to fix that, replace migrate_use_compression() by directly testing
the cap_list parameter migrate_caps_check().

2) For features enabled by parameters such as TLS & 'multifd_compression',
there was also a possibility of setting non-supported scenarios: setting
zero-copy-send first, then setting the unsupported parameter.

In order to fix that, also add a check for parameters conflicting with
zero-copy-send on migrate_params_check().

3) XBZRLE is also a compression capability, so it makes sense to also add
it to the list of capabilities which are not supported with zero-copy-send.

Fixes: 1abaec9a1b2c ("migration: Change zero_copy_send from migration parameter to migration capability")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Message-Id: <20220719122345.253713-1-leobras@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

multifd: Document the locking of MultiFD{Send/Recv}Params

Reorder the structures so we can know if the fields are:
- Read only
- Their own locking (i.e. sems)
- Protected by 'mutex'
- Only for the multifd channel

Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220531104318.7494-2-quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Typo fixes from Chen Zhang

migration/multifd: Report to user when zerocopy not working

Some errors, like the lack of Scatter-Gather support by the network
interface(NETIF_F_SG) may cause sendmsg(...,MSG_ZEROCOPY) to fail on using
zero-copy, which causes it to fall back to the default copying mechanism.

After each full dirty-bitmap scan there should be a zero-copy flush
happening, which checks for errors each of the previous calls to
sendmsg(...,MSG_ZEROCOPY). If all of them failed to use zero-copy, then
increment dirty_sync_missed_zero_copy migration stat to let the user know
about it.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220711211112.18951-4-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Add dirty-sync-missed-zero-copy migration stat

Signed-off-by: Leonardo Bras <leobras@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220711211112.18951-3-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

QIOChannelSocket: Fix zero-copy flush returning code 1 when nothing sent

If flush is called when no buffer was sent with MSG_ZEROCOPY, it currently
returns 1. This return code should be used only when Linux fails to use
MSG_ZEROCOPY on a lot of sendmsg().

Fix this by returning early from flush if no sendmsg(...,MSG_ZEROCOPY)
was attempted.

Fixes: 2bc58ffc2926 ("QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX")
Signed-off-by: Leonardo Bras <leobras@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220711211112.18951-2-leobras@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: remove unreachable code after reading data

The code calls qio_channel_read() in a loop when it reports
QIO_CHANNEL_ERR_BLOCK. This code is reported when errno==EAGAIN.

As such the later block of code will always hit the 'errno != EAGAIN'
condition, making the final 'else' unreachable.

Fixes: Coverity CID 1490203
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220627135318.156121-1-berrange@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

tests: Add postcopy preempt tests

Four tests are added for preempt mode:

  - Postcopy plain
  - Postcopy recovery
  - Postcopy tls
  - Postcopy tls+recovery

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185530.27801-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  dgilbert: Manual merge

tests: Add postcopy tls recovery migration test

It's easy to build this upon the postcopy tls test. Rename the old
postcopy recovery test to postcopy/recovery/plain.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185527.27747-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Manual merge

tests: Add postcopy tls migration test

We just added TLS tests for precopy but not postcopy. Add the
corresponding test for vanilla postcopy.

Rename the vanilla postcopy to "postcopy/plain" because all postcopy tests
will only use unix sockets as channel.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185525.27692-1-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: Manual merge

tests: Move MigrateCommon upper

So that it can be used in postcopy tests too soon.

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185522.27638-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Respect postcopy request order in preemption mode

With preemption mode on, when we see a postcopy request that was requesting
for exactly the page that we have preempted before (so we've partially sent
the page already via PRECOPY channel and it got preempted by another
postcopy request), currently we drop the request so that after all the
other postcopy requests are serviced then we'll go back to precopy stream
and start to handle that.

We dropped the request because we can't send it via postcopy channel since
the precopy channel already contains partial of the data, and we can only
send a huge page via one channel as a whole.  We can't split a huge page
into two channels.

That's a very corner case and that works, but there's a change on the order
of postcopy requests that we handle since we're postponing this (unlucky)
postcopy request to be later than the other queued postcopy requests.  The
problem is there's a possibility that when the guest was very busy, the
postcopy queue can be always non-empty, it means this dropped request will
never be handled until the end of postcopy migration. So, there's a chance
that there's one dest QEMU vcpu thread waiting for a page fault for an
extremely long time just because it's unluckily accessing the specific page
that was preempted before.

The worst case time it needs can be as long as the whole postcopy migration
procedure.  It's extremely unlikely to happen, but when it happens it's not
good.

The root cause of this problem is because we treat pss->postcopy_requested
variable as with two meanings bound together, as the variable shows:

  1. Whether this page request is urgent, and,
  2. Which channel we should use for this page request.

With the old code, when we set postcopy_requested it means either both (1)
and (2) are true, or both (1) and (2) are false.  We can never have (1)
and (2) to have different values.

However it doesn't necessarily need to be like that.  It's very legal that
there's one request that has (1) very high urgency, but (2) we'd like to
use the precopy channel.  Just like the corner case we were discussing
above.

To differenciate the two meanings better, introduce a new field called
postcopy_target_channel, showing which channel we should use for this page
request, so as to cover the old meaning (2) only.  Then we leave the
postcopy_requested variable to stand only for meaning (1), which is the
urgency of this page request.

With this change, we can easily boost priority of a preempted precopy page
as long as we know that page is also requested as a postcopy page.  So with
the new approach in get_queued_page() instead of dropping that request, we
send it right away with the precopy channel so we get back the ordering of
the page faults just like how they're requested on dest.

Reported-by: Manish Mishra <manish.mishra@nutanix.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185520.27583-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Enable TLS for preempt channel

This patch is based on the async preempt channel creation. It continues
wiring up the new channel with TLS handshake to destionation when enabled.

Note that only the src QEMU needs such operation; the dest QEMU does not
need any change for TLS support due to the fact that all channels are
established synchronously there, so all the TLS magic is already properly
handled by migration_tls_channel_process_incoming().

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185518.27529-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Export tls-[creds|hostname|authz] params to cmdline too

It's useful for specifying tls credentials all in the cmdline (along with
the -object tls-creds-*), especially for debugging purpose.

The trick here is we must remember to not free these fields again in the
finalize() function of migration object, otherwise it'll cause double-free.

The thing is when destroying an object, we'll first destroy the properties
that bound to the object, then the object itself.  To be explicit, when
destroy the object in object_finalize() we have such sequence of
operations:

    object_property_del_all(obj);
    object_deinit(obj, ti);

So after this change the two fields are properly released already even
before reaching the finalize() function but in object_property_del_all(),
hence we don't need to free them anymore in finalize() or it's double-free.

This also fixes a trivial memory leak for tls-authz as we forgot to free it
before this patch.

Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185515.27475-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Add helpers to detect TLS capability

Add migrate_channel_requires_tls() to detect whether the specific channel
requires TLS, leveraging the recently introduced migrate_use_tls(). No
functional change intended.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185513.27421-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Add property x-postcopy-preempt-break-huge

Add a property field that can conditionally disable the "break sending huge
page" behavior in postcopy preemption. By default it's enabled.

It should only be used for debugging purposes, and we should never remove
the "x-" prefix.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185511.27366-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Create the postcopy preempt channel asynchronously

This patch allows the postcopy preempt channel to be created
asynchronously.  The benefit is that when the connection is slow, we won't
take the BQL (and potentially block all things like QMP) for a long time
without releasing.

A function postcopy_preempt_wait_channel() is introduced, allowing the
migration thread to be able to wait on the channel creation.  The channel
is always created by the main thread, in which we'll kick a new semaphore
to tell the migration thread that the channel has created.

We'll need to wait for the new channel in two places: (1) when there's a
new postcopy migration that is starting, or (2) when there's a postcopy
migration to resume.

For the start of migration, we don't need to wait for this channel until
when we want to start postcopy, aka, postcopy_start().  We'll fail the
migration if we found that the channel creation failed (which should
probably not happen at all in 99% of the cases, because the main channel is
using the same network topology).

For a postcopy recovery, we'll need to wait in postcopy_pause().  In that
case if the channel creation failed, we can't fail the migration or we'll
crash the VM, instead we keep in PAUSED state, waiting for yet another
recovery.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Manish Mishra <manish.mishra@nutanix.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185509.27311-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Postcopy recover with preempt enabled

To allow postcopy recovery, the ram fast load (preempt-only) dest QEMU thread
needs similar handling on fault tolerance.  When ram_load_postcopy() fails,
instead of stopping the thread it halts with a semaphore, preparing to be
kicked again when recovery is detected.

A mutex is introduced to make sure there's no concurrent operation upon the
socket.  To make it simple, the fast ram load thread will take the mutex during
its whole procedure, and only release it if it's paused.  The fast-path socket
will be properly released by the main loading thread safely when there's
network failures during postcopy with that mutex held.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185506.27257-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Postcopy preemption enablement

This patch enables postcopy-preempt feature.

It contains two major changes to the migration logic:

(1) Postcopy requests are now sent via a different socket from precopy
    background migration stream, so as to be isolated from very high page
    request delays.

(2) For huge page enabled hosts: when there's postcopy requests, they can now
    intercept a partial sending of huge host pages on src QEMU.

After this patch, we'll live migrate a VM with two channels for postcopy: (1)
PRECOPY channel, which is the default channel that transfers background pages;
and (2) POSTCOPY channel, which only transfers requested pages.

There's no strict rule of which channel to use, e.g., if a requested page is
already being transferred on precopy channel, then we will keep using the same
precopy channel to transfer the page even if it's explicitly requested.  In 99%
of the cases we'll prioritize the channels so we send requested page via the
postcopy channel as long as possible.

On the source QEMU, when we found a postcopy request, we'll interrupt the
PRECOPY channel sending process and quickly switch to the POSTCOPY channel.
After we serviced all the high priority postcopy pages, we'll switch back to
PRECOPY channel so that we'll continue to send the interrupted huge page again.
There's no new thread introduced on src QEMU.

On the destination QEMU, one new thread is introduced to receive page data from
the postcopy specific socket (done in the preparation patch).

This patch has a side effect: after sending postcopy pages, previously we'll
assume the guest will access follow up pages so we'll keep sending from there.
Now it's changed.  Instead of going on with a postcopy requested page, we'll go
back and continue sending the precopy huge page (which can be intercepted by a
postcopy request so the huge page can be sent partially before).

Whether that's a problem is debatable, because "assuming the guest will
continue to access the next page" may not really suite when huge pages are
used, especially if the huge page is large (e.g. 1GB pages).  So that locality
hint is much meaningless if huge pages are used.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185504.27203-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Postcopy preemption preparation on channel creation

Create a new socket for postcopy to be prepared to send postcopy requested
pages via this specific channel, so as to not get blocked by precopy pages.

A new thread is also created on dest qemu to receive data from this new channel
based on the ram_load_postcopy() routine.

The ram_load_postcopy(POSTCOPY) branch and the thread has not started to
function, and that'll be done in follow up patches.

Cleanup the new sockets on both src/dst QEMUs, meanwhile look after the new
thread too to make sure it'll be recycled properly.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185502.27149-1-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
dgilbert: With Peter's fix to quieten compiler warning on
start_migration

migration: Add postcopy-preempt capability

Firstly, postcopy already preempts precopy due to the fact that we do
unqueue_page() first before looking into dirty bits.

However that's not enough, e.g., when there're host huge page enabled, when
sending a precopy huge page, a postcopy request needs to wait until the whole
huge page that is sending to finish. That could introduce quite some delay,
the bigger the huge page is the larger delay it'll bring.

This patch adds a new capability to allow postcopy requests to preempt existing
precopy page during sending a huge page, so that postcopy requests can be
serviced even faster.

Meanwhile to send it even faster, bypass the precopy stream by providing a
standalone postcopy socket for sending requested pages.

Since the new behavior will not be compatible with the old behavior, this will
not be the default, it's enabled only when the new capability is set on both
src/dst QEMUs.

This patch only adds the capability itself, the logic will be added in follow
up patches.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20220707185342.26794-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

multifd: Copy pages before compressing them with zlib

zlib_send_prepare() compresses pages of a running VM. zlib does not
make any thread-safety guarantees with respect to changing deflate()
input concurrently with deflate() [1].

One can observe problems due to this with the IBM zEnterprise Data
Compression accelerator capable zlib [2]. When the hardware
acceleration is enabled, migration/multifd/tcp/plain/zlib test fails
intermittently [3] due to sliding window corruption. The accelerator's
architecture explicitly discourages concurrent accesses [4]:

    Page 26-57, "Other Conditions":

    As observed by this CPU, other CPUs, and channel
    programs, references to the parameter block, first,
    second, and third operands may be multiple-access
    references, accesses to these storage locations are
    not necessarily block-concurrent, and the sequence
    of these accesses or references is undefined.

Mark Adler pointed out that vanilla zlib performs double fetches under
certain circumstances as well [5], therefore we need to copy data
before passing it to deflate().

[1] https://zlib.net/manual.html
[2] https://github.com/madler/zlib/pull/410
[3] https://lists.nongnu.org/archive/html/qemu-devel/2022-03/msg03988.html
[4] http://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf
[5] https://lists.gnu.org/archive/html/qemu-devel/2022-07/msg00889.html

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20220705203559.2960949-1-iii@linux.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

tests: Add dirty page rate limit test

Add dirty page rate limit test if kernel support dirty ring,

The following qmp commands are covered by this test case:
"calc-dirty-rate", "query-dirty-rate", "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit" and "query-vcpu-dirty-limit".

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <eed5b847a6ef0a9c02a36383dbdd7db367dd1e7e.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

softmmu/dirtylimit: Implement dirty page rate limit

Implement dirtyrate calculation periodically basing on
dirty-ring and throttle virtual CPU until it reachs the quota
dirty page rate given by user.

Introduce qmp commands "set-vcpu-dirty-limit",
"cancel-vcpu-dirty-limit", "query-vcpu-dirty-limit"
to enable, disable, query dirty page limit for virtual CPU.

Meanwhile, introduce corresponding hmp commands
"set_vcpu_dirty_limit", "cancel_vcpu_dirty_limit",
"info vcpu_dirty_limit" so the feature can be more usable.

"query-vcpu-dirty-limit" success depends on enabling dirty
page rate limit, so just add it to the list of skipped
command to ensure qmp-cmd-test run successfully.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <4143f26706d413dd29db0b672fe58b3d3fbe34bc.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

softmmu/dirtylimit: Implement virtual CPU throttle

Setup a negative feedback system when vCPU thread
handling KVM_EXIT_DIRTY_RING_FULL exit by introducing
throttle_us_per_full field in struct CPUState. Sleep
throttle_us_per_full microseconds to throttle vCPU
if dirtylimit is in service.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <977e808e03a1cef5151cae75984658b6821be618.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function

Introduce kvm_dirty_ring_size util function to help calculate
dirty ring ful time.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <f9ce1f550bfc0e3a1f711e17b1dbc8f701700e56.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically

Introduce the third method GLOBAL_DIRTY_LIMIT of dirty
tracking for calculate dirtyrate periodly for dirty page
rate limit.

Add dirtylimit.c to implement dirtyrate calculation periodly,
which will be used for dirty page rate limit.

Add dirtylimit.h to export util functions for dirty page rate
limit implementation.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <5d0d641bffcb9b1c4cc3e323b6dfecb36050d948.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration/dirtyrate: Refactor dirty page rate calculation

abstract out dirty log change logic into function
global_dirty_log_change.

abstract out dirty page rate calculation logic via
dirty-ring into function vcpu_calculate_dirtyrate.

abstract out mathematical dirty page rate calculation
into do_calculate_dirtyrate, decouple it from DirtyStat.

rename set_sample_page_period to dirty_stat_wait, which
is well-understood and will be reused in dirtylimit.

handle cpu hotplug/unplug scenario during measurement of
dirty page rate.

export util functions outside migration.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <7b6f6f4748d5b3d017b31a0429e630229ae97538.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

cpus: Introduce cpu_list_generation_id

Introduce cpu_list_generation_id to track cpu list generation so
that cpu hotplug/unplug can be detected during measurement of
dirty page rate.

cpu_list_generation_id could be used to detect changes of cpu
list, which is prepared for dirty page rate measurement.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <06e1f1362b2501a471dce796abb065b04f320fa5.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping

Add a non-required argument 'CPUState' to kvm_dirty_ring_reap so
that it can cover single vcpu dirty-ring-reaping scenario.

Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <c32001242875e83b0d9f78f396fe2dcd380ba9e8.1656177590.git.huangy81@chinatelecom.cn>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Merge tag 'pull-hex-20220719-1' of https://github.com/quic/qemu into staging

Recall that the semantics of a Hexagon mem_noshuf packet are that the
store effectively happens before the load.  There are two bug fixes
in this series.

# gpg: Signature made Tue 19 Jul 2022 22:25:19 BST
# gpg:                using RSA key 3635C788CE62B91FD4C59AB47B0244FB12DE4422
# gpg: Good signature from "Taylor Simpson (Rock on) <tsimpson@quicinc.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 3635 C788 CE62 B91F D4C5  9AB4 7B02 44FB 12DE 4422

* tag 'pull-hex-20220719-1' of https://github.com/quic/qemu:
  Hexagon (target/hexagon) fix bug in mem_noshuf load exception
  Hexagon (target/hexagon) fix store w/mem_noshuf & predicated load

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

net/colo.c: fix segmentation fault when packet is not parsed correctly

When COLO use only one vnet_hdr_support parameter between
filter-redirector and filter-mirror(or colo-compare), COLO will crash
with segmentation fault. Back track as follow:

Thread 1 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
296         uint16_t proto = be16_to_cpu(PKT_GET_ETH_HDR(p)->h_proto);
(gdb) bt
0  0x0000555555cb200b in eth_get_l2_hdr_length (p=0x0)
    at /home/tao/project/COLO/colo-qemu/include/net/eth.h:296
1  0x0000555555cb22b4 in parse_packet_early (pkt=0x555556a44840) at
net/colo.c:49
2  0x0000555555cb2b91 in is_tcp_packet (pkt=0x555556a44840) at
net/filter-rewriter.c:63

So wrong vnet_hdr_len will cause pkt->data become NULL. Add check to
raise error and add trace-events to track vnet_hdr_len.

Signed-off-by: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

net/colo.c: No need to track conn_list for filter-rewriter

Filter-rewriter no need to track connection in conn_list.
This patch fix the glib g_queue_is_empty assertion when COLO guest
keep a lot of network connection.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

net/colo: Fix a "double free" crash to clear the conn_list

We notice the QEMU may crash when the guest has too many
incoming network connections with the following log:

15197@1593578622.668573:colo_proxy_main : colo proxy connection hashtable full, clear it
free(): invalid pointer
[1]    15195 abort (core dumped)  qemu-system-x86_64 ....

This is because we create the s->connection_track_table with
g_hash_table_new_full() which is defined as:

GHashTable * g_hash_table_new_full (GHashFunc hash_func,
                       GEqualFunc key_equal_func,
                       GDestroyNotify key_destroy_func,
                       GDestroyNotify value_destroy_func);

The fourth parameter connection_destroy() will be called to free the
memory allocated for all 'Connection' values in the hashtable when
we call g_hash_table_remove_all() in the connection_hashtable_reset().

But both connection_track_table and conn_list reference to the same
conn instance. It will trigger double free in conn_list clear. So this
patch remove free action on hash table side to avoid double free the
conn.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

softmmu/runstate.c: add RunStateTransition support form COLO to PRELAUNCH

If the checkpoint occurs when the guest finishes restarting
but has not started running, the runstate_set() may reject
the transition from COLO to PRELAUNCH with the crash log:

{"timestamp": {"seconds": 1593484591, "microseconds": 26605},\
"event": "RESET", "data": {"guest": true, "reason": "guest-reset"}}
qemu-system-x86_64: invalid runstate transition: 'colo' -> 'prelaunch'

Long-term testing says that it's pretty safe.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

vdpa: Add x-svq to NetdevVhostVDPAOptions

Finally offering the possibility to enable SVQ from the command line.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

vdpa: Add device migration blocker

Since the vhost-vdpa device is exposing _F_LOG, adding a migration blocker if
it uses CVQ.

However, qemu is able to migrate simple devices with no CVQ as long as
they use SVQ. To allow it, add a placeholder error to vhost_vdpa, and
only add to vhost_dev when used. vhost_dev machinery place the migration
blocker if needed.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs

To know the device features is needed for CVQ SVQ, so SVQ knows if it
can handle all commands or not. Extract from
vhost_vdpa_get_max_queue_pairs so we can reuse it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

vdpa: Buffer CVQ support on shadow virtqueue

Introduce the control virtqueue support for vDPA shadow virtqueue. This
is needed for advanced networking features like rx filtering.

Virtio-net control VQ copies the descriptors to qemu's VA, so we avoid
TOCTOU with the guest's or device's memory every time there is a device
model change. Otherwise, the guest could change the memory content in
the time between qemu and the device read it.

To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR is
implemented. If the virtio-net driver changes MAC the virtio-net device
model will be updated with the new one, and a rx filtering change event
will be raised.

More cvq commands could be added here straightforwardly but they have
not been tested.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

vdpa: manual forward CVQ buffers

Do a simple forwarding of CVQ buffers, the same work SVQ could do but
through callbacks. No functional change intended.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>