git.osdn.net Git - qmiga/qemu.git/log

Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging

Block layer patches

- Code cleanups around block graph modification
- Simplify drain
- coroutine_fn correctness fixes, including splitting generated
  coroutine wrappers into co_wrapper (to be called only from
  non-coroutine context) and co_wrapper_mixed (both coroutine and
  non-coroutine context)
- Introduce a block graph rwlock

# gpg: Signature made Thu 15 Dec 2022 15:08:34 GMT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (50 commits)
  block: GRAPH_RDLOCK for functions only called by co_wrappers
  block: use co_wrapper_mixed_bdrv_rdlock in functions taking the rdlock
  block-coroutine-wrapper.py: introduce annotations that take the graph rdlock
  Mark assert_bdrv_graph_readable/writable() GRAPH_RD/WRLOCK
  graph-lock: TSA annotations for lock/unlock functions
  block: assert that graph read and writes are performed correctly
  block: remove unnecessary assert_bdrv_graph_writable()
  block: wrlock in bdrv_replace_child_noperm
  block: Fix locking in external_snapshot_prepare()
  test-bdrv-drain: Fix incorrrect drain assumptions
  clang-tsa: Add macros for shared locks
  clang-tsa: Add TSA_ASSERT() macro
  Import clang-tsa.h
  async: Register/unregister aiocontext in graph lock list
  graph-lock: Implement guard macros
  graph-lock: Introduce a lock to protect block graph operations
  block: Factor out bdrv_drain_all_begin_nopoll()
  block/dirty-bitmap: convert coroutine-only functions to co_wrapper
  block: convert bdrv_create to co_wrapper
  block-coroutine-wrapper.py: support also basic return types
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-request-2022-12-15' of https://gitlab.com/thuth/qemu into staging

* s390x PCI fixes and improvements (for the ISM device)
* Fix emulated MVCP and MVCS s390x instructions
* Clean-ups for the e1000e qtest
* Enable qtests on Windows
* Update FreeBSD CI to version 12.4
* Check --disable-tcg for ppc64 in the CI
* Improve scripts/make-releases a little bit
* Many other misc small clean-ups and fixes here and there

# gpg: Signature made Thu 15 Dec 2022 15:05:44 GMT
# gpg:                using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5
# gpg:                issuer "thuth@redhat.com"
# gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full]
# gpg:                 aka "Thomas Huth <thuth@redhat.com>" [full]
# gpg:                 aka "Thomas Huth <huth@tuxfamily.org>" [full]
# gpg:                 aka "Thomas Huth <th.huth@posteo.de>" [unknown]
# Primary key fingerprint: 27B8 8847 EEE0 2501 18F3  EAB9 2ED9 D774 FE70 2DB5

* tag 'pull-request-2022-12-15' of https://gitlab.com/thuth/qemu: (23 commits)
  tests/qtest/vhost-user-blk-test: don't abort all qtests on missing envar
  .gitlab/issue_templates: Move suggestions into comments
  gitlab-ci: Check building ppc64 without TCG
  FreeBSD: Upgrade to 12.4 release
  tests/qtest: Enable qtest build on Windows
  .gitlab-ci.d/windows.yml: Exclude qTests from 64-bit CI job for now
  .gitlab-ci.d/windows.yml: Keep 64-bit and 32-bit build scripts consistent
  .gitlab-ci.d/windows.yml: Unify the prerequisite packages
  tests/qtest/libqos/e1000e: Correctly group register accesses
  tests/qtest/e1000e-test: De-duplicate constants
  tests/qtest/libqos/e1000e: Remove "other" interrupts
  hw: Include the VMWare devices only in the x86 targets
  MAINTAINERS: Add documentation files to the corresponding sections
  util/oslib-win32: Remove obsolete reference to g_poll code
  util/qemu-config: Fix "query-command-line-options" to provide the right values
  scripts/make-release: Only clone single branches to speed up the script
  scripts/make-release: Add a simple help text for the script
  monitor/misc: Remove superfluous include statements
  target/s390x: The MVCP and MVCS instructions are not privileged
  target/s390x/tcg/mem_helper: Test the right bits in psw_key_valid()
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge tag 'pull-target-arm-20221215-1' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
* hw/arm/virt: Add properties to allow more granular
   configuration of use of highmem space
* target/arm: Add Cortex-A55 CPU
* hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement
* Implement FEAT_EVT
* Some 3-phase-reset conversions for Arm GIC, SMMU
* hw/arm/boot: set initrd with #address-cells type in fdt
* hw/misc: Move some arm-related files from specific_ss into softmmu_ss
* Restrict arm_cpu_exec_interrupt() to TCG accelerator

# gpg: Signature made Thu 15 Dec 2022 17:38:36 GMT
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# gpg:                 aka "Peter Maydell <peter@archaic.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* tag 'pull-target-arm-20221215-1' of https://git.linaro.org/people/pmaydell/qemu-arm: (28 commits)
  target/arm: Restrict arm_cpu_exec_interrupt() to TCG accelerator
  hw/misc: Move some arm-related files from specific_ss into softmmu_ss
  hw/arm/boot: set initrd with #address-cells type in fdt
  hw/intc: Convert TYPE_KVM_ARM_ITS to 3-phase reset
  hw/intc: Convert TYPE_ARM_GICV3_ITS to 3-phase reset
  hw/intc: Convert TYPE_ARM_GICV3_ITS_COMMON to 3-phase reset
  hw/intc: Convert TYPE_KVM_ARM_GICV3 to 3-phase reset
  hw/intc: Convert TYPE_ARM_GICV3_COMMON to 3-phase reset
  hw/intc: Convert TYPE_ARM_GIC_KVM to 3-phase reset
  hw/intc: Convert TYPE_ARM_GIC_COMMON to 3-phase reset
  hw/arm: Convert TYPE_ARM_SMMUV3 to 3-phase reset
  hw/arm: Convert TYPE_ARM_SMMU to 3-phase reset
  target/arm: Report FEAT_EVT for TCG '-cpu max'
  target/arm: Implement HCR_EL2.TID4 traps
  target/arm: Implement HCR_EL2.TICAB,TOCU traps
  target/arm: Implement HCR_EL2.TTLBOS traps
  target/arm: Implement HCR_EL2.TTLBIS traps
  target/arm: Allow relevant HCR bits to be written for FEAT_EVT
  hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement
  target/arm: Add Cortex-A55 CPU
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Restrict arm_cpu_exec_interrupt() to TCG accelerator

When building with --disable-tcg on Darwin we get:

  target/arm/cpu.c:725:16: error: incomplete definition of type 'struct TCGCPUOps'
    cc->tcg_ops->do_interrupt(cs);
    ~~~~~~~~~~~^

Commit 083afd18a9 ("target/arm: Restrict cpu_exec_interrupt()
handler to sysemu") limited this block to system emulation,
but neglected to also limit it to TCG.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Message-id: 20221209110823.59495-1-philmd@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/misc: Move some arm-related files from specific_ss into softmmu_ss

The header target/arm/kvm-consts.h checks CONFIG_KVM which is marked as
poisoned in common code, so the files that include this header have to
be added to specific_ss and recompiled for each, qemu-system-arm and
qemu-system-aarch64. However, since the kvm headers are only optionally
used in kvm-constants.h for some sanity checks, we can additionally
check the NEED_CPU_H macro first to avoid the poisoned CONFIG_KVM macro,
so kvm-constants.h can also be used from "common" files (without the
sanity checks - which should be OK since they are still done from other
target-specific files instead). This way, and by adjusting some other
include statements in the related files here and there, we can move some
files from specific_ss into softmmu_ss, so that they only need to be
compiled once during the build process.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221202154023.293614-1-thuth@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

block: GRAPH_RDLOCK for functions only called by co_wrappers

The generated coroutine wrappers already take care to take the lock in
the non-coroutine path, and assume that the lock is already taken in the
coroutine path.

The only thing we need to do for the wrapped function is adding the
GRAPH_RDLOCK annotation. Doing so also allows us to mark the
corresponding callbacks in BlockDriver as GRAPH_RDLOCK_PTR.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-19-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: use co_wrapper_mixed_bdrv_rdlock in functions taking the rdlock

Take the rdlock already, before we add the assertions.

All these functions either read the graph recursively, or call
BlockDriver callbacks that will eventually need to be protected by the
graph rdlock.

Do it now to all functions together, because many of these recursively
call each other.

For example, bdrv_co_truncate calls BlockDriver->bdrv_co_truncate, and
some driver callbacks implement their own .bdrv_co_truncate by calling
bdrv_flush inside. So if bdrv_flush asserts but bdrv_truncate does not
take the rdlock yet, the assertion will always fail.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-18-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-coroutine-wrapper.py: introduce annotations that take the graph rdlock

Add co_wrapper_bdrv_rdlock and co_wrapper_mixed_bdrv_rdlock option to
the block-coroutine-wrapper.py script.

This "_bdrv_rdlock" option takes and releases the graph rdlock when a
coroutine function is created.

This means that when used together with "_mixed", the function marked
with co_wrapper_mixed_bdrv_rdlock will support both coroutine and
non-coroutine case, and in the latter case it will create a coroutine
that takes and releases the rdlock. When called from a coroutine, the
caller must already hold the graph lock.

Example:
void co_wrapper_mixed_bdrv_rdlock bdrv_f1();

Becomes

static void bdrv_co_enter_f1()
{
    bdrv_graph_co_rdlock();
    bdrv_co_function();
    bdrv_graph_co_rdunlock();
}

void bdrv_f1()
{
    if (qemu_in_coroutine) {
        assume_graph_lock();
        bdrv_co_function();
    } else {
        qemu_co_enter(bdrv_co_enter_f1);
        ...
    }
}

When used alone, the function will not work in coroutine context, and
when called in non-coroutine context it will create a new coroutine that
takes care of taking and releasing the rdlock automatically.

Example:
void co_wrapper_bdrv_rdlock bdrv_f1();

Becomes

static void bdrv_co_enter_f1()
{
    bdrv_graph_co_rdlock();
    bdrv_co_function();
    bdrv_graph_co_rdunlock();
}

void bdrv_f1()
{
    assert(!qemu_in_coroutine());
    qemu_co_enter(bdrv_co_enter_f1);
    ...
}

About their usage:
- co_wrapper does not take the rdlock, so it can be used also outside
  the block layer.
- co_wrapper_mixed will be used by many blk_* functions, since the
  coroutine function needs to call blk_wait_while_drained() and
  the rdlock *must* be taken afterwards, otherwise it's a deadlock.
  In the future this annotation will go away, and blk_* will use
  co_wrapper directly.
- co_wrapper_bdrv_rdlock will be used by BlockDriver callbacks, ideally
  by all of them in the future.
- co_wrapper_mixed_bdrv_rdlock will be used by the remaining functions
  that are still called by coroutine and non-coroutine context. In the
  future this annotation will go away, as we will split such mixed
  functions.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-17-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Mark assert_bdrv_graph_readable/writable() GRAPH_RD/WRLOCK

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-16-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

graph-lock: TSA annotations for lock/unlock functions

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-15-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: assert that graph read and writes are performed correctly

Remove the old assert_bdrv_graph_writable, and replace it with
the new version using graph-lock API.

See the function documentation for more information.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-14-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: remove unnecessary assert_bdrv_graph_writable()

We don't protect bdrv->aio_context with the graph rwlock,
so these assertions are not needed

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-13-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: wrlock in bdrv_replace_child_noperm

Protect the main function where graph is modified.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-12-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Fix locking in external_snapshot_prepare()

bdrv_img_create() polls internally (when calling bdrv_create(), which is
a co_wrapper), so it can't be called while holding the lock of any
AioContext except the current one without causing deadlocks. Drop the
lock around the call in external_snapshot_prepare().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-11-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

test-bdrv-drain: Fix incorrrect drain assumptions

The test case assumes that a drain only happens in one specific place
where it drains explicitly. This assumption happened to hold true until
now, but block layer functions may drain interally (any graph
modifications are going to do that through bdrv_graph_wrlock()), so this
is incorrect. Make sure that the test code in .drained_begin only runs
where we actually want it to run.

When scheduling a BH from .drained_begin, we also need to increase the
in_flight counter to make sure that the operation is actually completed
in time before the node that it works on goes away.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-10-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

clang-tsa: Add macros for shared locks

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-8-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

clang-tsa: Add TSA_ASSERT() macro

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-7-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Import clang-tsa.h

This defines macros that allow clang to perform Thread Safety Analysis
based on function and variable annotations that specify the locking
rules. On non-clang compilers, the annotations are ignored.

Imported tsa.h from the original repository with the pthread_mutex_t
wrapper removed:

https://github.com/jhi/clang-thread-safety-analysis-for-c.git

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-6-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

async: Register/unregister aiocontext in graph lock list

Add/remove the AioContext in aio_context_list in graph-lock.c when it is
created/destroyed. This allows using the graph locking operations from
this AioContext.

In order to allow linking util/async.c with binaries that don't include
the block layer, introduce stubs for (un)register_aiocontext().

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-5-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

graph-lock: Implement guard macros

Similar to the implementation in lockable.h, implement macros to
automatically take and release the rdlock.

Create the empty GraphLockable and GraphLockableMainloop structs only to
use it as a type for G_DEFINE_AUTOPTR_CLEANUP_FUNC.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-4-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

graph-lock: Introduce a lock to protect block graph operations

Block layer graph operations are always run under BQL in the main loop.
This is proved by the assertion qemu_in_main_thread() and its wrapper
macro GLOBAL_STATE_CODE.

However, there are also concurrent coroutines running in other iothreads
that always try to traverse the graph. Currently this is protected
(among various other things) by the AioContext lock, but once this is
removed, we need to make sure that reads do not happen while modifying
the graph.

We distinguish between writer (main loop, under BQL) that modifies the
graph, and readers (all other coroutines running in various AioContext),
that go through the graph edges, reading ->parents and->children.

The writer (main loop) has "exclusive" access, so it first waits for any
current read to finish, and then prevents incoming ones from entering
while it has the exclusive access.

The readers (coroutines in multiple AioContext) are free to access the
graph as long the writer is not modifying the graph. In case it is, they
go in a CoQueue and sleep until the writer is done.

If a coroutine changes AioContext, the counter in the original and new
AioContext are left intact, since the writer does not care where the
reader is, but only if there is one.

As a result, some AioContexts might have a negative reader count, to
balance the positive count of the AioContext that took the lock. This
also means that when an AioContext is deleted it may have a nonzero
reader count. In that case we transfer the count to a global shared
counter so that the writer is always aware of all readers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-3-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Factor out bdrv_drain_all_begin_nopoll()

Provide a separate function that just quiesces the users of a node to
prevent new requests from coming in, but without waiting for the already
in-flight I/O to complete.

This function can be used in contexts where polling is not allowed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-2-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/dirty-bitmap: convert coroutine-only functions to co_wrapper

bdrv_can_store_new_dirty_bitmap and bdrv_remove_persistent_dirty_bitmap
check if they are running in a coroutine, directly calling the
coroutine callback if it's the case.
Except that no coroutine calls such functions, therefore that check
can be removed, and function creation can be offloaded to
c_w.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-15-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: convert bdrv_create to co_wrapper

This function is never called in coroutine context, therefore
instead of manually creating a new coroutine, delegate it to the
block-coroutine-wrapper script, defining it as co_wrapper.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-14-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-coroutine-wrapper.py: support also basic return types

Extend the regex to cover also return type, pointers included.
This implies that the value returned by the function cannot be
a simple "int" anymore, but the custom return type.
Therefore remove poll_state->ret and instead use a per-function
custom "ret" field.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-13-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-coroutine-wrapper.py: support functions without bs arg

Right now, we take the first parameter of the function to get the
BlockDriverState to pass to bdrv_poll_co(), that internally calls
functions that figure in which aiocontext the coroutine should run.

However, it is useless to pass a bs just to get its own AioContext,
so instead pass it directly, and default to the main loop if no
BlockDriverState is passed as parameter.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-12-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-coroutine-wrapper.py: introduce co_wrapper

This new annotation starts just a function wrapper that creates
a new coroutine. It assumes the caller is not a coroutine.
It will be the default annotation to be used in the future.

This is much better as c_w_mixed, because it is clear if the caller
is a coroutine or not, and provides the advantage of automating
the code creation. In the future all c_w_mixed functions will be
substituted by co_wrapper.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-11-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: rename generated_co_wrapper in co_wrapper_mixed

In preparation to the incoming new function specifiers,
rename g_c_w with a more meaningful name and document it.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-10-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: bdrv_create_file is a coroutine_fn

It is always called in coroutine_fn callbacks, therefore
it can directly call bdrv_co_create().

Rename it to bdrv_co_create_file too.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-9-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: distinguish between bdrv_create running in coroutine and not

Call two different functions depending on whether bdrv_create
is in coroutine or not, following the same pattern as
generated_co_wrapper functions.

This allows to also call the coroutine function directly,
without using CreateCo or relying in bdrv_create().

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-8-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: avoid duplicating filename string in bdrv_create

We know that the string will stay around until the function
returns, and the parameter of drv->bdrv_co_create_opts is const char*,
so it must not be modified either.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-7-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/vmdk: add coroutine_fn annotations

These functions end up calling bdrv_create() implemented as generated_co_wrapper
functions.
In addition, they also happen to be always called in coroutine context,
meaning all callers are coroutine_fn.
This means that the g_c_w function will enter the qemu_in_coroutine()
case and eventually suspend (or in other words call qemu_coroutine_yield()).
Therefore we can mark such functions coroutine_fn too.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-6-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-backend: replace bdrv_*_above with blk_*_above

Avoid mixing bdrv_* functions with blk_*, so create blk_* counterparts
for bdrv_block_status_above and bdrv_is_allocated_above.

Note that since blk_co_block_status_above only calls the g_c_w function
bdrv_common_block_status_above and is marked as coroutine_fn, call
directly bdrv_co_common_block_status_above() to avoid using a g_c_w.
Same applies to blk_co_is_allocated_above.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-5-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

nbd/server.c: add coroutine_fn annotations

These functions end up calling bdrv_*() implemented as generated_co_wrapper
functions.
In addition, they also happen to be always called in coroutine context,
meaning all callers are coroutine_fn.
This means that the g_c_w function will enter the qemu_in_coroutine()
case and eventually suspend (or in other words call qemu_coroutine_yield()).
Therefore we can mark such functions coroutine_fn too.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-4-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-copy: add coroutine_fn annotations

These functions end up calling bdrv_common_block_status_above(), a
generated_co_wrapper function.
In addition, they also happen to be always called in coroutine context,
meaning all callers are coroutine_fn.
This means that the g_c_w function will enter the qemu_in_coroutine()
case and eventually suspend (or in other words call qemu_coroutine_yield()).
Therefore we can mark such functions coroutine_fn too.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-3-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-io: introduce coroutine_fn duplicates for bdrv_common_block_status_above callers

bdrv_common_block_status_above() is a g_c_w, and it is being called by
many "wrapper" functions like bdrv_is_allocated(),
bdrv_is_allocated_above() and bdrv_block_status_above().

Because we want to eventually split the coroutine from non-coroutine
case in g_c_w, create duplicate wrappers that take care of directly
calling the same coroutine functions called in the g_c_w.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-Id: <20221128142337.657646-2-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Remove poll parameter from bdrv_parent_drained_begin_single()

All callers of bdrv_parent_drained_begin_single() pass poll=false now,
so we don't need the parameter any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221118174110.55183-16-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Don't poll in bdrv_replace_child_noperm()

In order to make sure that bdrv_replace_child_noperm() doesn't have to
poll any more, get rid of the bdrv_parent_drained_begin_single() call.

This is possible now because we can require that the parent is already
drained through the child in question when the function is called and we
don't call the parent drain callbacks more than once.

The additional drain calls needed in callers cause the test case to run
its code in the drain handler too early (bdrv_attach_child() drains
now), so modify it to only enable the code after the test setup has
completed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221118174110.55183-15-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Drop out of coroutine in bdrv_do_drained_begin_quiesce()

The next patch adds a parent drain to bdrv_attach_child_common(), which
shouldn't be, but is currently called from coroutines in some cases (e.g.
.bdrv_co_create implementations generally open new nodes). Therefore,
the assertion that we're not in a coroutine doesn't hold true any more.

We could just remove the assertion because there is nothing in the
function that should be in conflict with running in a coroutine, but
just to be on the safe side, we can reverse the caller relationship
between bdrv_do_drained_begin() and bdrv_do_drained_begin_quiesce() so
that the latter also just drops out of coroutine context and we can
still be certain in the future that any drain code doesn't run in
coroutines.

As a nice side effect, the structure of bdrv_do_drained_begin() is now
symmetrical with bdrv_do_drained_end().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221118174110.55183-14-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Remove ignore_bds_parents parameter from drain_begin/end.

ignore_bds_parents is now ignored during drain_begin and drain_end, so
we can just remove it there. It is still a valid optimisation for
drain_all in bdrv_drained_poll(), so leave it around there.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221118174110.55183-13-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Call drain callbacks only once

We only need to call both the BlockDriver's callback and the parent
callbacks when going from undrained to drained or vice versa. A second
drain section doesn't make a difference for the driver or the parent,
they weren't supposed to send new requests before and after the second
drain.

One thing that gets in the way is the 'ignore_bds_parents' parameter in
bdrv_do_drained_begin_quiesce() and bdrv_do_drained_end(): It means that
bdrv_drain_all_begin() increases bs->quiesce_counter, but does not
quiesce the parent through BdrvChildClass callbacks. If an additional
drain section is started now, bs->quiesce_counter will be non-zero, but
we would still need to quiesce the parent through BdrvChildClass in
order to keep things consistent (and unquiesce it on the matching
bdrv_drained_end(), even though the counter would not reach 0 yet as
long as the bdrv_drain_all() section is still active).

Instead of keeping track of this, let's just get rid of the parameter.
It was introduced in commit 6cd5c9d7b2d as an optimisation so that
during bdrv_drain_all(), we wouldn't recursively drain all parents up to
the root for each node, resulting in quadratic complexity. As it happens,
calling the callbacks only once solves the same problem, so as of this
patch, we'll still have O(n) complexity and ignore_bds_parents is not
needed any more.

This patch only ignores the 'ignore_bds_parents' parameter. It will be
removed in a separate patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221118174110.55183-12-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Remove subtree drains

Subtree drains are not used any more. Remove them.

After this, BdrvChildClass.attach/detach() don't poll any more.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221118174110.55183-11-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

stream: Replace subtree drain with a single node drain

The subtree drain was introduced in commit b1e1af394d9 as a way to avoid
graph changes between finding the base node and changing the block graph
as necessary on completion of the image streaming job.

The block graph could change between these two points because
bdrv_set_backing_hd() first drains the parent node, which involved
polling and can do anything.

Subtree draining was an imperfect way to make this less likely (because
with it, fewer callbacks are called during this window). Everyone agreed
that it's not really the right solution, and it was only committed as a
stopgap solution.

This replaces the subtree drain with a solution that simply drains the
parent node before we try to find the base node, and then call a version
of bdrv_set_backing_hd() that doesn't drain, but just asserts that the
parent node is already drained.

This way, any graph changes caused by draining happen before we start
looking at the graph and things stay consistent between finding the base
node and changing the graph.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221118174110.55183-10-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Don't use subtree drains in bdrv_drop_intermediate()

Instead of using a subtree drain from the top node (which also drains
child nodes of base that we're not even interested in), use a normal
drain for base, which automatically drains all of the parents, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221118174110.55183-9-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Drain individual nodes during reopen

bdrv_reopen() and friends use subtree drains as a lazy way of covering
all the nodes they touch. Turns out that this lazy way is a lot more
complicated than just draining the nodes individually, even not
accounting for the additional complexity in the drain mechanism itself.

Simplify the code by switching to draining the individual nodes that are
already managed in the BlockReopenQueue anyway.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221118174110.55183-8-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Fix locking for bdrv_reopen_queue_child()

Callers don't agree whether bdrv_reopen_queue_child() should be called
with the AioContext lock held or not. Standardise on holding the lock
(as done by QMP blockdev-reopen and the replication block driver) and
fix bdrv_reopen() to do the same.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221118174110.55183-7-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Inline bdrv_drain_invoke()

bdrv_drain_invoke() has now two entirely separate cases that share no
code any more and are selected depending on a bool parameter. Each case
has only one caller. Just inline the function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221118174110.55183-6-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Remove drained_end_counter

drained_end_counter is unused now, nobody changes its value any more. It
can be removed.

In cases where we had two almost identical functions that only differed
in whether the caller passes drained_end_counter, or whether they would
poll for a local drained_end_counter to reach 0, these become a single
function.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20221118174110.55183-5-kwolf@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Revert .bdrv_drained_begin/end to non-coroutine_fn

Polling during bdrv_drained_end() can be problematic (and in the future,
we may get cases for bdrv_drained_begin() where polling is forbidden,
and we don't care about already in-flight requests, but just want to
prevent new requests from arriving).

The .bdrv_drained_begin/end callbacks running in a coroutine is the only
reason why we have to do this polling, so make them non-coroutine
callbacks again. None of the callers actually yield any more.

This means that bdrv_drained_end() effectively doesn't poll any more,
even if AIO_WAIT_WHILE() loops are still there (their condition is false
from the beginning). This is generally not a problem, but in
test-bdrv-drain, some additional explicit aio_poll() calls need to be
added because the test case wants to verify the final state after BHs
have executed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221118174110.55183-4-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

test-bdrv-drain: Don't yield in .bdrv_co_drained_begin/end()

We want to change .bdrv_co_drained_begin/end() back to be non-coroutine
callbacks, so in preparation, avoid yielding in their implementation.

This does almost the same as the existing logic in bdrv_drain_invoke(),
by creating and entering coroutines internally. However, since the test
case is by far the heaviest user of coroutine code in drain callbacks,
it is preferable to have the complexity in the test case rather than the
drain core, which is already complicated enough without this.

The behaviour for bdrv_drain_begin() is unchanged because we increase
bs->in_flight and this is still polled. However, bdrv_drain_end()
doesn't wait for the spawned coroutine to complete any more. This is
fine, we don't rely on bdrv_drain_end() restarting all operations
immediately before the next aio_poll().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221118174110.55183-3-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qed: Don't yield in bdrv_qed_co_drain_begin()

We want to change .bdrv_co_drained_begin() back to be a non-coroutine
callback, so in preparation, avoid yielding in its implementation.

Because we increase bs->in_flight and bdrv_drained_begin() polls, the
behaviour is unchanged.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221118174110.55183-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: refactor bdrv_list_refresh_perms to allow any list of nodes

We are going to increase usage of collecting nodes in a list to then
update, and calling bdrv_topological_dfs() each time is not convenient,
and not correct as we are going to interleave graph modifying with
filling the node list.

So, let's switch to a function that takes any list of nodes, adds all
their subtrees and do topological sort. And finally, refresh
permissions.

While being here, make the function public, as we'll want to use it
from blockdev.c in near future.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221107163558.618889-5-vsementsov@yandex-team.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: bdrv_refresh_perms(): allow external tran

Allow passing external Transaction pointer, stop creating extra
Transaction objects.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221107163558.618889-4-vsementsov@yandex-team.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: drop bdrv_remove_filter_or_cow_child

Drop this simple wrapper used only in one place. We have too many graph
modifying functions even without it.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221107163558.618889-3-vsementsov@yandex-team.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Inline bdrv_detach_child()

The only caller is bdrv_root_unref_child(), let's just do the logic
directly in it. It simplifies further conversion of
bdrv_root_unref_child() to transaction actions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20221107163558.618889-2-vsementsov@yandex-team.ru>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge tag 'next-8.0-pull-request' of https://gitlab.com/juan.quintela/qemu into staging

Migration patches for 8.0

Hi

This are the patches that I had to drop form the last PULL request because they werent fixes:
- AVX2 is dropped, intel posted a fix, I have to redo it
- Fix for out of order channels is out
  Daniel nacked it and I need to redo it

# gpg: Signature made Thu 15 Dec 2022 09:38:29 GMT
# gpg:                using RSA key 1899FF8EDEBF58CCEE034B82F487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full]
# gpg:                 aka "Juan Quintela <quintela@trasno.org>" [full]
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* tag 'next-8.0-pull-request' of https://gitlab.com/juan.quintela/qemu:
  migration: Drop rs->f
  migration: Remove old preempt code around state maintainance
  migration: Send requested page directly in rp-return thread
  migration: Move last_sent_block into PageSearchStatus
  migration: Make PageSearchStatus part of RAMState
  migration: Add pss_init()
  migration: Introduce pss_channel
  migration: Teach PSS about host page
  migration: Use atomic ops properly for page accountings
  migration: Yield bitmap_mutex properly when sending/sleeping
  migration: Remove RAMState.f references in compression code
  migration: Trivial cleanup save_page_header() on same block check
  migration: Cleanup xbzrle zero page cache update logic
  migration: Add postcopy_preempt_active()
  migration: Take bitmap mutex when completing ram migration
  migration: Export ram_release_page()
  migration: Export ram_transferred_ram()
  multifd: Create page_count fields into both MultiFD{Recv,Send}Params
  multifd: Create page_size fields into both MultiFD{Recv,Send}Params

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tests/qtest/vhost-user-blk-test: don't abort all qtests on missing envar

This test requires environment variable QTEST_QEMU_STORAGE_DAEMON_BINARY
to be defined for running. If not, it would immediately abort all qtests
and prevent other, unrelated tests from running.

To fix that, just skip vhost-user-blk-test instead and log a message
about missing environment variable.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Message-Id: <E1oybRD-0005D5-5r@lizzy.crudebyte.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

.gitlab/issue_templates: Move suggestions into comments

Many users forget to remove the suggestions from the bug template
when creating a new issue. So when searching for strings like "s390x"
or "Windows", you get a lot of unrelated issues in the results.
Thus let's move the suggestions into HTML comments - so they will
still show up in the markdown when editing the bug, while being
hidden/ignored in the final text or in the search queries.

Message-Id: <20221201133756.77216-1-thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

gitlab-ci: Check building ppc64 without TCG

Building QEMU for ppc64 hosts with --disable-tcg used to break a couple
of times in the past, see e.g. commit a01b64cee7 ("target/ppc: Put do_rfi
under a TCG-only block") or commit 049b4ad669 ("target/ppc: Fix build
warnings when building with 'disable-tcg'"), so we should test this in
our CI to avoid such regressions.

Message-Id: <20221208101527.36873-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

FreeBSD: Upgrade to 12.4 release

Upgrade to 12.4 release

Signed-off-by: Brad Smith <brad@comstyle.com>
Message-Id: <Y5GJpW/1s+NEah98@humpty.home.comstyle.com>
Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed by: Warner Losh <imp@bsdimp.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest: Enable qtest build on Windows

Now that we have fixed various test case issues as seen when running
on Windows, let's enable the qtest build on Windows.

Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20221125114100.3184790-4-bmeng.cn@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

.gitlab-ci.d/windows.yml: Exclude qTests from 64-bit CI job for now

qTests don't run successfully with "--without-default-devices",
so let's exclude the qtests from CI for now.

Suggested-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20221125114100.3184790-3-bmeng.cn@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

.gitlab-ci.d/windows.yml: Keep 64-bit and 32-bit build scripts consistent

At present the build scripts of 32-bit and 64-bit are inconsistent.
Let's keep them consistent for easier maintenance.

While we are here, add some comments to explain that for the 64-bit
job, "--without-default-devices" is a must have, at least for now.

Signed-off-by: Bin Meng <bin.meng@windriver.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20221125114100.3184790-2-bmeng.cn@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

.gitlab-ci.d/windows.yml: Unify the prerequisite packages

At present the prerequisite packages for 64-bit and 32-bit builds
are slightly different. Let's use the same packages for both for
easier maintenance in the future.

Signed-off-by: Bin Meng <bin.meng@windriver.com>
Message-Id: <20221125114100.3184790-1-bmeng.cn@gmail.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest/libqos/e1000e: Correctly group register accesses

Add a newline after E1000_TCTL write and make it clear that E1000_TCTL
write is what enabling transmit.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20221110114549.66081-1-akihiko.odaki@daynix.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest/e1000e-test: De-duplicate constants

De-duplicate constants found in e1000e_send_verify() and
e1000e_receive_verify() to avoid mismatch and improve readability.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20221110114426.65951-1-akihiko.odaki@daynix.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

tests/qtest/libqos/e1000e: Remove "other" interrupts

The "other" kind of interrupts are not used in the tests.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20221110114045.65544-1-akihiko.odaki@daynix.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

hw: Include the VMWare devices only in the x86 targets

It seems a little bit weird that the para-virtualized x86 VMWare
devices "vmware-svga" and "vmxnet3" also show up in non-x86 targets.
They are likely pretty useless there (since the guest OSes likely
do not have any drivers for those enabled), so let's change this and
only enable those devices by default for the classical x86 targets.

Message-Id: <20221213095144.42355-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

MAINTAINERS: Add documentation files to the corresponding sections

A lot of files in the docs directory do not have a maintainer according to
our MAINTAINERS file, though they can be clearly associated with one of the
sections in there. Add the files now so that our scripts/get_maintainer.pl
script can output the right maintainer for them.

Message-Id: <20221212174841.201003-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

util/oslib-win32: Remove obsolete reference to g_poll code

The comment about g_poll is not required here anymore since
the corresponding code has been removed a while ago already.

Fixes: b4c6036faa ("configure: bump min required glib version to 2.56")
Message-Id: <20221208133257.95673-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

util/qemu-config: Fix "query-command-line-options" to provide the right values

The "query-command-line-options" command uses a hand-crafted list
of options that should be returned for the "machine" parameter.
This is pretty much out of sync with reality, for example settings
like "kvm_shadow_mem" or "accel" are not parameters for the machine
anymore. Also, there is no distinction between the targets here, so
e.g. the s390x-specific values like "loadparm" in this list also
show up with the other targets like x86_64.

Let's fix this now by geting rid of the hand-crafted list and by
querying the properties of the machine classes instead to assemble
the list.

Fixes: 0a7cf217d8 ("fix regression of qmp_query_command_line_options")
Message-Id: <20221111141323.246267-1-thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

scripts/make-release: Only clone single branches to speed up the script

Using --single-branch and --depth 1 here helps to speed up the process
a little bit and helps to save some networking bandwidth.

Message-Id: <20221128092555.37102-3-thuth@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

scripts/make-release: Add a simple help text for the script

Print a simple help text if the script has been called with the
wrong amount of parameters.

Message-Id: <20221128092555.37102-2-thuth@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Thomas Huth <thuth@redhat.com>

monitor/misc: Remove superfluous include statements

These #includes are not required anymore (the likely got superfluous
with commit da76ee76f7 - "hmp-commands-info: move info_cmds content
out of monitor.c").

Message-Id: <20221128133514.220919-1-thuth@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

target/s390x: The MVCP and MVCS instructions are not privileged

The "MOVE TO PRIMARY/SECONDARY" instructions can also be called
from problem state. We just should properly check whether the
secondary-space access key is valid here, too, and inject a
privileged program exception if it is invalid.

Message-Id: <20221205125852.81848-1-thuth@redhat.com>
Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

target/s390x/tcg/mem_helper: Test the right bits in psw_key_valid()

The PSW key mask is a 16 bit field, and the psw_key variable is
in the range from 0 to 15, so it does not make sense to use
"0x80 >> psw_key" for testing the bits here. We should use 0x8000
instead.

Message-Id: <20221205142043.95185-1-thuth@redhat.com>
Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

s390x/pci: reset ISM passthrough devices on shutdown and system reset

ISM device firmware stores unique state information that can
can cause a wholesale unmap of the associated IOMMU (e.g. when
we get a termination signal for QEMU) to trigger firmware errors
because firmware believes we are attempting to invalidate entries
that are still in-use by the guest OS (when in fact that guest is
in the process of being terminated or rebooted).
To alleviate this, register both a shutdown notifier (for unexpected
termination cases e.g. virsh destroy) as well as a reset callback
(for cases like guest OS reboot). For each of these scenarios, trigger
PCI device reset; this is enough to indicate to firmware that the IOMMU
is no longer in-use by the guest OS, making it safe to invalidate any
associated IOMMU entries.

Fixes: 15d0e7942d3b ("s390x/pci: don't fence interpreted devices without MSI-X")
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20221209195700.263824-1-mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
[thuth: Adjusted the hunk in s390-pci-vfio.c due to different context]
Signed-off-by: Thomas Huth <thuth@redhat.com>

s390x/pci: shrink DMA aperture to be bound by vfio DMA limit

Currently, s390x-pci performs accounting against the vfio DMA
limit and triggers the guest to clean up mappings when the limit
is reached. Let's go a step further and also limit the size of
the supported DMA aperture reported to the guest based upon the
initial vfio DMA limit reported for the container (if less than
than the size reported by the firmware/host zPCI layer). This
avoids processing sections of the guest DMA table during global
refresh that, for common use cases, will never be used anway, and
makes exhausting the vfio DMA limit due to mismatch between guest
aperture size and host limit far less likely and more indicitive
of an error.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20221028194758.204007-4-mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

s390x/pci: coalesce unmap operations

Currently, each unmapped page is handled as an individual iommu
region notification. Attempt to group contiguous unmap operations
into fewer notifications to reduce overhead.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Message-Id: <20221028194758.204007-3-mjrosato@linux.ibm.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>

hw/arm/boot: set initrd with #address-cells type in fdt

We use 32bit value for linux,initrd-[start/end], when we have
loader_start > 4GB, there will be a wrong initrd_start passed
to the kernel, and the kernel will report the following warning.

[    0.000000] ------------[ cut here ]------------
[    0.000000] initrd not fully accessible via the linear mapping -- please check your bootloader ...
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/arm64/mm/init.c:355 arm64_memblock_init+0x158/0x244
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W          6.1.0-rc3-13250-g30a0b95b1335-dirty #28
[    0.000000] Hardware name: Horizon Sigi Virtual development board (DT)
[    0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : arm64_memblock_init+0x158/0x244
[    0.000000] lr : arm64_memblock_init+0x158/0x244
[    0.000000] sp : ffff800009273df0
[    0.000000] x29: ffff800009273df0 x28: 0000001000cc0010 x27: 0000800000000000
[    0.000000] x26: 000000000050a3e2 x25: ffff800008b46000 x24: ffff800008b46000
[    0.000000] x23: ffff800008a53000 x22: ffff800009420000 x21: ffff800008a53000
[    0.000000] x20: 0000000004000000 x19: 0000000004000000 x18: 00000000ffff1020
[    0.000000] x17: 6568632065736165 x16: 6c70202d2d20676e x15: 697070616d207261
[    0.000000] x14: 656e696c20656874 x13: 0a2e2e2e20726564 x12: 0000000000000000
[    0.000000] x11: 0000000000000000 x10: 00000000ffffffff x9 : 0000000000000000
[    0.000000] x8 : 0000000000000000 x7 : 796c6c756620746f x6 : 6e20647274696e69
[    0.000000] x5 : ffff8000093c7c47 x4 : ffff800008a2102f x3 : ffff800009273a88
[    0.000000] x2 : 80000000fffff038 x1 : 00000000000000c0 x0 : 0000000000000056
[    0.000000] Call trace:
[    0.000000]  arm64_memblock_init+0x158/0x244
[    0.000000]  setup_arch+0x164/0x1cc
[    0.000000]  start_kernel+0x94/0x4ac
[    0.000000]  __primary_switched+0xb4/0xbc
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000001000000000-0x0000001007ffffff]

This doesn't affect any machine types we currently support, because
for all of our machine types the RAM starts well below the 4GB
mark, but it does demonstrate that we're not currently writing
the device-tree properties quite as intended.

To fix it, we can change it to write these values to the dtb using a
type width matching #address-cells.  This is the intended size for
these dtb properties, and is how u-boot, for instance, writes them,
although in practice the Linux kernel will cope with them being any
width as long as they're big enough to fit the value.

Signed-off-by: Schspa Shi <schspa@gmail.com>
Message-id: 20221129160724.75667-1-schspa@gmail.com
[PMM: tweaked commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/intc: Convert TYPE_KVM_ARM_ITS to 3-phase reset

Convert the TYPE_KVM_ARM_ITS device to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-10-peter.maydell@linaro.org

hw/intc: Convert TYPE_ARM_GICV3_ITS to 3-phase reset

Convert the TYPE_ARM_GICV3_ITS device to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-9-peter.maydell@linaro.org

hw/intc: Convert TYPE_ARM_GICV3_ITS_COMMON to 3-phase reset

Convert the TYPE_ARM_GICV3_ITS_COMMON parent class to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221109161444.3397405-8-peter.maydell@linaro.org

hw/intc: Convert TYPE_KVM_ARM_GICV3 to 3-phase reset

Convert the TYPE_KVM_ARM_GICV3 device to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-7-peter.maydell@linaro.org

hw/intc: Convert TYPE_ARM_GICV3_COMMON to 3-phase reset

Convert the TYPE_ARM_GICV3_COMMON parent class to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-6-peter.maydell@linaro.org

hw/intc: Convert TYPE_ARM_GIC_KVM to 3-phase reset

Now we have converted TYPE_ARM_GIC_COMMON, we can convert the
TYPE_ARM_GIC_KVM subclass to 3-phase reset.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-5-peter.maydell@linaro.org

hw/intc: Convert TYPE_ARM_GIC_COMMON to 3-phase reset

Convert the TYPE_ARM_GIC_COMMON device to 3-phase reset. This is a
simple no-behaviour-change conversion.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20221109161444.3397405-4-peter.maydell@linaro.org

hw/arm: Convert TYPE_ARM_SMMUV3 to 3-phase reset

Convert the TYPE_ARM_SMMUV3 device to 3-phase reset. The legacy
reset method doesn't do anything that's invalid in the hold phase, so
the conversion only requires changing it to a hold phase method, and
using the 3-phase versions of the "save the parent reset method and
chain to it" code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: 20221109161444.3397405-3-peter.maydell@linaro.org

hw/arm: Convert TYPE_ARM_SMMU to 3-phase reset

Convert the TYPE_ARM_SMMU device to 3-phase reset. The legacy method
doesn't do anything that's invalid in the hold phase, so the
conversion is simple and not a behaviour change.

Note that we must convert this base class before we can convert the
TYPE_ARM_SMMUV3 subclass -- transitional support in Resettable
handles "chain to parent class reset" when the base class is 3-phase
and the subclass is still using legacy reset, but not the other way
around.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20221109161444.3397405-2-peter.maydell@linaro.org

target/arm: Report FEAT_EVT for TCG '-cpu max'

Update the ID registers for TCG's '-cpu max' to report the
FEAT_EVT Enhanced Virtualization Traps support.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Implement HCR_EL2.TID4 traps

For FEAT_EVT, the HCR_EL2.TID4 trap allows trapping of the cache ID
registers CCSIDR_EL1, CCSIDR2_EL1, CLIDR_EL1 and CSSELR_EL1 (and
their AArch32 equivalents). This is a subset of the registers
trapped by HCR_EL2.TID2, which includes all of these and also the
CTR_EL0 register.

Our implementation already uses a separate access function for
CTR_EL0 (ctr_el0_access()), so all of the registers currently using
access_aa64_tid2() should also be checking TID4. Make that function
check both TID2 and TID4, and rename it appropriately.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Implement HCR_EL2.TICAB,TOCU traps

For FEAT_EVT, the HCR_EL2.TICAB bit allows trapping of the ICIALLUIS
and IC IALLUIS cache maintenance instructions.

The HCR_EL2.TOCU bit traps all the other cache maintenance
instructions that operate to the point of unification:
AArch64 IC IVAU, IC IALLU, DC CVAU
AArch32 ICIMVAU, ICIALLU, DCCMVAU

The two trap bits between them cover all of the cache maintenance
instructions which must also check the HCR_TPU flag. Turn the old
aa64_cacheop_pou_access() function into a helper function which takes
the set of HCR_EL2 flags to check as an argument, and call it from
new access_ticab() and access_tocu() functions as appropriate for
each cache op.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Implement HCR_EL2.TTLBOS traps

For FEAT_EVT, the HCR_EL2.TTLBOS bit allows trapping on EL1
use of TLB maintenance instructions that operate on the
outer shareable domain:

TLBI VMALLE1OS, TLBI VAE1OS, TLBI ASIDE1OS,TLBI VAAE1OS,
TLBI VALE1OS, TLBI VAALE1OS, TLBI RVAE1OS, TLBI RVAAE1OS,
TLBI RVALE1OS, and TLBI RVAALE1OS.

(There are no AArch32 outer-shareable TLB maintenance ops.)

Implement the trapping.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Implement HCR_EL2.TTLBIS traps

For FEAT_EVT, the HCR_EL2.TTLBIS bit allows trapping on EL1 use of
TLB maintenance instructions that operate on the inner shareable
domain:

AArch64:
TLBI VMALLE1IS, TLBI VAE1IS, TLBI ASIDE1IS, TLBI VAAE1IS,
TLBI VALE1IS, TLBI VAALE1IS, TLBI RVAE1IS, TLBI RVAAE1IS,
TLBI RVALE1IS, and TLBI RVAALE1IS.

AArch32:
TLBIALLIS, TLBIMVAIS, TLBIASIDIS, TLBIMVAAIS, TLBIMVALIS,
and TLBIMVAALIS.

Add the trapping support.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

target/arm: Allow relevant HCR bits to be written for FEAT_EVT

FEAT_EVT adds five new bits to the HCR_EL2 register: TTLBIS, TTLBOS,
TICAB, TOCU and TID4. These allow the guest to enable trapping of
various EL1 instructions to EL2. In this commit, add the necessary
code to allow the guest to set these bits if the feature is present;
because the bit is always zero when the feature isn't present we
won't need to use explicit feature checks in the "trap on condition"
tests in the following commits.

Note that although full implementation of the feature (mandatory from
Armv8.5 onward) requires all five trap bits, the ID registers permit
a value indicating that only TICAB, TOCU and TID4 are implemented,
which might be the case for CPUs between Armv8.2 and Armv8.5.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

hw/intc/arm_gicv3: Fix GICD_TYPER ITLinesNumber advertisement

The ARM GICv3 TRM describes that the ITLinesNumber field of GICD_TYPER
register:

"indicates the maximum SPI INTID that the GIC implementation supports"

As SPI #0 is absolute IRQ #32, the max SPI INTID should have accounted
for the internal 16x SGI's and 16x PPI's.  However, the original GICv3
model subtracted off the SGI/PPI.  Cosmetically this can be seen at OS
boot (Linux) showing 32 shy of what should be there, i.e.:

    [    0.000000] GICv3: 224 SPIs implemented

Though in hw/arm/virt.c, the machine is configured for 256 SPI's.  ARM
virt machine likely doesn't have a problem with this because the upper
32 IRQ's don't actually have anything meaningful wired. But, this does
become a functional issue on a custom use case which wants to make use
of these IRQ's.  Additionally, boot code (i.e. TF-A) will only init up
to the number (blocks of 32) that it believes to actually be there.

Signed-off-by: Luke Starrett <lukes@xsightlabs.com>
Message-id: AM9P193MB168473D99B761E204E032095D40D9@AM9P193MB1684.EURP193.PROD.OUTLOOK.COM
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Add Cortex-A55 CPU

The Cortex-A55 is one of the newer armv8.2+ CPUs; in particular
it supports the Privileged Access Never (PAN) feature. Add
a model of this CPU, so you can use a CPU type on the virt
board that models a specific real hardware CPU, rather than
having to use the QEMU-specific "max" CPU type.

Signed-off-by: Timofey Kutergin <tkutergin@gmail.com>
Message-id: 20221121150819.2782817-1-tkutergin@gmail.com
[PMM: tweaked commit message]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm/virt: build SMBIOS 19 table

Use the base_memmap to build the SMBIOS 19 table which provides the address
mapping for a Physical Memory Array (from spec [1] chapter 7.20).

This was present on i386 from commit c97294ec1b9e36887e119589d456557d72ab37b5
("SMBIOS: Build aggregate smbios tables and entry point").

[1] https://www.dmtf.org/sites/default/files/standards/documents/DSP0134_3.5.0.pdf

The absence of this table is a breach of the specs and is
detected by the FirmwareTestSuite (FWTS), but it doesn't
cause any known problems for guest OSes.

Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
Message-id: 1668789029-5432-1-git-send-email-mihai.carabas@oracle.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm/virt: Add properties to disable high memory regions

The 3 high memory regions are usually enabled by default, but they may
be not used. For example, VIRT_HIGH_GIC_REDIST2 isn't needed by GICv2.
This leads to waste in the PA space.

Add properties ("highmem-redists", "highmem-ecam", "highmem-mmio") to
allow users selectively disable them if needed. After that, the high
memory region for GICv3 or GICv4 redistributor can be disabled by user,
the number of maximal supported CPUs needs to be calculated based on
'vms->highmem_redists'. The follow-up error message is also improved
to indicate if the high memory region for GICv3 and GICv4 has been
enabled or not.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20221029224307.138822-8-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm/virt: Add 'compact-highmem' property

After the improvement to high memory region address assignment is
applied, the memory layout can be changed, introducing possible
migration breakage. For example, VIRT_HIGH_PCIE_MMIO memory region
is disabled or enabled when the optimization is applied or not, with
the following configuration. The configuration is only achievable by
modifying the source code until more properties are added to allow
users selectively disable those high memory regions.

  pa_bits              = 40;
  vms->highmem_redists = false;
  vms->highmem_ecam    = false;
  vms->highmem_mmio    = true;

  # qemu-system-aarch64 -accel kvm -cpu host    \
    -machine virt-7.2,compact-highmem={on, off} \
    -m 4G,maxmem=511G -monitor stdio

  Region             compact-highmem=off         compact-highmem=on
  ----------------------------------------------------------------
  MEM                [1GB         512GB]        [1GB         512GB]
  HIGH_GIC_REDISTS2  [512GB       512GB+64MB]   [disabled]
  HIGH_PCIE_ECAM     [512GB+256MB 512GB+512MB]  [disabled]
  HIGH_PCIE_MMIO     [disabled]                 [512GB       1TB]

In order to keep backwords compatibility, we need to disable the
optimization on machine, which is virt-7.1 or ealier than it. It
means the optimization is enabled by default from virt-7.2. Besides,
'compact-highmem' property is added so that the optimization can be
explicitly enabled or disabled on all machine types by users.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-id: 20221029224307.138822-7-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>