OSDN Git Service

tomoyo/tomoyo-test1.git
2 years agoKVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis
Ben Gardon [Mon, 13 Jun 2022 21:25:21 +0000 (21:25 +0000)]
KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis

In some cases, the NX hugepage mitigation for iTLB multihit is not
needed for all guests on a host. Allow disabling the mitigation on a
per-VM basis to avoid the performance hit of NX hugepages on trusted
workloads.

In order to disable NX hugepages on a VM, ensure that the userspace
actor has permission to reboot the system. Since disabling NX hugepages
would allow a guest to crash the system, it is similar to reboot
permissions.

Ideally, KVM would require userspace to prove it has access to KVM's
nx_huge_pages module param, e.g. so that userspace can opt out without
needing full reboot permissions.  But getting access to the module param
file info is difficult because it is buried in layers of sysfs and module
glue. Requiring CAP_SYS_BOOT is sufficient for all known use cases.

Suggested-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-9-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86: Fix errant brace in KVM capability handling
Ben Gardon [Mon, 13 Jun 2022 21:25:20 +0000 (21:25 +0000)]
KVM: x86: Fix errant brace in KVM capability handling

The braces around the KVM_CAP_XSAVE2 block also surround the
KVM_CAP_PMU_CAPABILITY block, likely the result of a merge issue. Simply
move the curly brace back to where it belongs.

Fixes: ba7bb663f5547 ("KVM: x86: Provide per VM capability for disabling PMU virtualization")

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-8-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Read binary stat data in lib
Ben Gardon [Mon, 13 Jun 2022 21:25:18 +0000 (21:25 +0000)]
KVM: selftests: Read binary stat data in lib

Move the code to read the binary stats data to the KVM selftests
library. It will be re-used by other tests to check KVM behavior.

Also opportunistically remove an unnecessary calculation with
"size_data" in stats_test.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-6-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Clean up coding style in binary stats test
Sean Christopherson [Mon, 13 Jun 2022 21:25:17 +0000 (21:25 +0000)]
KVM: selftests: Clean up coding style in binary stats test

Fix a variety of code style violations and/or inconsistencies in the
binary stats test.  The 80 char limit is a soft limit and can and should
be ignored/violated if doing so improves the overall code readability.

Specifically, provide consistent indentation and don't split expressions
at arbitrary points just to honor the 80 char limit.

Opportunistically expand/add comments to call out the more subtle aspects
of the code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-5-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Read binary stats desc in lib
Ben Gardon [Mon, 13 Jun 2022 21:25:16 +0000 (21:25 +0000)]
KVM: selftests: Read binary stats desc in lib

Move the code to read the binary stats descriptors to the KVM selftests
library. It will be re-used by other tests to check KVM behavior.

No functional change intended.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-4-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Read binary stats header in lib
Ben Gardon [Mon, 13 Jun 2022 21:25:15 +0000 (21:25 +0000)]
KVM: selftests: Read binary stats header in lib

Move the code to read the binary stats header to the KVM selftests
library. It will be re-used by other tests to check KVM behavior.

No functional change intended.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-3-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Remove dynamic memory allocation for stats header
Ben Gardon [Mon, 13 Jun 2022 21:25:14 +0000 (21:25 +0000)]
KVM: selftests: Remove dynamic memory allocation for stats header

There's no need to allocate dynamic memory for the stats header since
its size is known at compile time.

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-2-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Add MONITOR/MWAIT quirk test
Sean Christopherson [Wed, 8 Jun 2022 22:45:16 +0000 (22:45 +0000)]
KVM: selftests: Add MONITOR/MWAIT quirk test

Add a test to verify the "MONITOR/MWAIT never fault" quirk, and as a
bonus, also verify the related "MISC_ENABLES ignores ENABLE_MWAIT" quirk.

If the "never fault" quirk is enabled, MONITOR/MWAIT should always be
emulated as NOPs, even if they're reported as disabled in guest CPUID.
Use the MISC_ENABLES quirk to coerce KVM into toggling the MWAIT CPUID
enable, as KVM now disallows manually toggling CPUID bits after running
the vCPU.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220608224516.3788274-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Use exception fixup for #UD/#GP Hyper-V MSR/hcall tests
Sean Christopherson [Wed, 8 Jun 2022 22:45:15 +0000 (22:45 +0000)]
KVM: selftests: Use exception fixup for #UD/#GP Hyper-V MSR/hcall tests

Use exception fixup to verify VMCALL/RDMSR/WRMSR fault as expected in the
Hyper-V Features test.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220608224516.3788274-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Mostly fix broken Hyper-V Features test
Sean Christopherson [Wed, 8 Jun 2022 22:45:14 +0000 (22:45 +0000)]
KVM: selftests: Mostly fix broken Hyper-V Features test

Explicitly do all setup at every stage of the Hyper-V Features test, e.g.
set the MSR/hypercall, enable capabilities, etc...  Now that the VM is
recreated for every stage, values that are written into the VM's address
space, i.e. shared with the guest, are reset between sub-tests, as are
any capabilities, etc...

Fix the hypercall params as well, which were broken in the same rework.
The "hcall" struct/pointer needs to point at the hcall_params object, not
the set of hypercall pages.

The goofs were hidden by the test's dubious behavior of using '0' to
signal "done", i.e. the MSR test ran exactly one sub-test, and the
hypercall test was a gigantic nop.

Fixes: 6c1186430a80 ("KVM: selftests: Avoid KVM_SET_CPUID2 after KVM_RUN in hyperv_features test")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220608224516.3788274-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Add x86-64 support for exception fixup
Sean Christopherson [Wed, 8 Jun 2022 22:45:13 +0000 (22:45 +0000)]
KVM: selftests: Add x86-64 support for exception fixup

Add x86-64 support for exception fixup on single instructions, without
forcing tests to install their own fault handlers.  Use registers r9-r11
to flag the instruction as "safe" and pass fixup/vector information,
i.e. introduce yet another flavor of fixup (versus the kernel's in-memory
tables and KUT's per-CPU area) to take advantage of KVM sefltests being
64-bit only.

Using only registers avoids the need to allocate fixup tables, ensure
FS or GS base is valid for the guest, ensure memory is mapped into the
guest, etc..., and also reduces the potential for recursive faults due to
accessing memory.

Providing exception fixup trivializes tests that just want to verify that
an instruction faults, e.g. no need to track start/end using global
labels, no need to install a dedicated handler, etc...

Deliberately do not support #DE in exception fixup so that the fixup glue
doesn't need to account for a fault with vector == 0, i.e. the vector can
also indicate that a fault occurred.  KVM injects #DE only for esoteric
emulation scenarios, i.e. there's very, very little value in testing #DE.
Force any test that wants to generate #DEs to install its own handler(s).

Use kvm_pv_test as a guinea pig for the new fixup, as it has a very
straightforward use case of wanting to verify that RDMSR and WRMSR fault.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220608224516.3788274-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior
Sean Christopherson [Wed, 8 Jun 2022 22:45:12 +0000 (22:45 +0000)]
KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior

Add a quirk for KVM's behavior of emulating intercepted MONITOR/MWAIT
instructions a NOPs regardless of whether or not they are supported in
guest CPUID.  KVM's current behavior was likely motiviated by a certain
fruity operating system that expects MONITOR/MWAIT to be supported
unconditionally and blindly executes MONITOR/MWAIT without first checking
CPUID.  And because KVM does NOT advertise MONITOR/MWAIT to userspace,
that's effectively the default setup for any VMM that regurgitates
KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID2.

Note, this quirk interacts with KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT.  The
behavior is actually desirable, as userspace VMMs that want to
unconditionally hide MONITOR/MWAIT from the guest can leave the
MISC_ENABLE quirk enabled.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220608224516.3788274-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86: Ignore benign host writes to "unsupported" F15H_PERF_CTL MSRs
Sean Christopherson [Sat, 11 Jun 2022 00:57:55 +0000 (00:57 +0000)]
KVM: x86: Ignore benign host writes to "unsupported" F15H_PERF_CTL MSRs

Ignore host userspace writes of '0' to F15H_PERF_CTL MSRs KVM reports
in the MSR-to-save list, but the MSRs are ultimately unsupported.  All
MSRs in said list must be writable by userspace, e.g. if userspace sends
the list back at KVM without filtering out the MSRs it doesn't need.

Note, reads of said MSRs already have the desired behavior.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86: Ignore benign host accesses to "unsupported" PEBS and BTS MSRs
Sean Christopherson [Sat, 11 Jun 2022 00:57:54 +0000 (00:57 +0000)]
KVM: x86: Ignore benign host accesses to "unsupported" PEBS and BTS MSRs

Ignore host userspace reads and writes of '0' to PEBS and BTS MSRs that
KVM reports in the MSR-to-save list, but the MSRs are ultimately
unsupported.  All MSRs in said list must be writable by userspace, e.g.
if userspace sends the list back at KVM without filtering out the MSRs it
doesn't need.

Fixes: 8183a538cd95 ("KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS")
Fixes: 902caeb6841a ("KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS")
Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: VMX: Use vcpu_get_perf_capabilities() to get guest-visible value
Sean Christopherson [Sat, 11 Jun 2022 00:57:53 +0000 (00:57 +0000)]
KVM: VMX: Use vcpu_get_perf_capabilities() to get guest-visible value

Use vcpu_get_perf_capabilities() when querying MSR_IA32_PERF_CAPABILITIES
from the guest's perspective, e.g. to update the vPMU and to determine
which MSRs exist.  If userspace ignores MSR_IA32_PERF_CAPABILITIES but
clear X86_FEATURE_PDCM, the guest should see '0'.

Fixes: 902caeb6841a ("KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS")
Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoRevert "KVM: x86: always allow host-initiated writes to PMU MSRs"
Sean Christopherson [Sat, 11 Jun 2022 00:57:52 +0000 (00:57 +0000)]
Revert "KVM: x86: always allow host-initiated writes to PMU MSRs"

Revert the hack to allow host-initiated accesses to all "PMU" MSRs,
as intel_is_valid_msr() returns true for _all_ MSRs, regardless of whether
or not it has a snowball's chance in hell of actually being a PMU MSR.

That mostly gets papered over by the actual get/set helpers only handling
MSRs that they knows about, except there's the minor detail that
kvm_pmu_{g,s}et_msr() eat reads and writes when the PMU is disabled.
I.e. KVM will happy allow reads and writes to _any_ MSR if the PMU is
disabled, either via module param or capability.

This reverts commit d1c88a4020567ba4da52f778bcd9619d87e4ea75.

Fixes: d1c88a402056 ("KVM: x86: always allow host-initiated writes to PMU MSRs")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoRevert "KVM: x86/pmu: Accept 0 for absent PMU MSRs when host-initiated if !enable_pmu"
Sean Christopherson [Sat, 11 Jun 2022 00:57:51 +0000 (00:57 +0000)]
Revert "KVM: x86/pmu: Accept 0 for absent PMU MSRs when host-initiated if !enable_pmu"

Eating reads and writes to all "PMU" MSRs when there is no PMU is wildly
broken as it results in allowing accesses to _any_ MSR on Intel CPUs
as intel_is_valid_msr() returns true for all host_initiated accesses.

A revert of commit d1c88a402056 ("KVM: x86: always allow host-initiated
writes to PMU MSRs") will soon follow.

This reverts commit 8e6a58e28b34e8d247e772159b8fa8f6bae39192.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: VMX: Give host userspace full control of MSR_IA32_PERF_CAPABILITIES
Sean Christopherson [Sat, 11 Jun 2022 00:57:50 +0000 (00:57 +0000)]
KVM: VMX: Give host userspace full control of MSR_IA32_PERF_CAPABILITIES

Do not clear manipulate MSR_IA32_PERF_CAPABILITIES in intel_pmu_refresh(),
i.e. give userspace full control over capability/read-only MSRs.  KVM is
not a babysitter, it is userspace's responsiblity to provide a valid and
coherent vCPU model.

Attempting to "help" the guest by forcing a consistent model creates edge
cases, and ironicially leads to inconsistent behavior.

Example #1:  KVM doesn't do intel_pmu_refresh() when userspace writes
the MSR.

Example #2: KVM doesn't clear the bits when the PMU is disabled, or when
there's no architectural PMU.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220611005755.753273-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86: Give host userspace full control of MSR_IA32_MISC_ENABLES
Sean Christopherson [Sat, 11 Jun 2022 00:57:49 +0000 (00:57 +0000)]
KVM: x86: Give host userspace full control of MSR_IA32_MISC_ENABLES

Give userspace full control of the read-only bits in MISC_ENABLES, i.e.
do not modify bits on PMU refresh and do not preserve existing bits when
userspace writes MISC_ENABLES.  With a few exceptions where KVM doesn't
expose the necessary controls to userspace _and_ there is a clear cut
association with CPUID, e.g. reserved CR4 bits, KVM does not own the vCPU
and should not manipulate the vCPU model on behalf of "dummy user space".

The argument that KVM is doing userspace a favor because "the order of
setting vPMU capabilities and MSR_IA32_MISC_ENABLE is not strictly
guaranteed" is specious, as attempting to configure MSRs on behalf of
userspace inevitably leads to edge cases precisely because KVM does not
prescribe a specific order of initialization.

Example #1: intel_pmu_refresh() consumes and modifies the vCPU's
MSR_IA32_PERF_CAPABILITIES, and so assumes userspace initializes config
MSRs before setting the guest CPUID model.  If userspace sets CPUID
first, then KVM will mark PEBS as available when arch.perf_capabilities
is initialized with a non-zero PEBS format, thus creating a bad vCPU
model if userspace later disables PEBS by writing PERF_CAPABILITIES.

Example #2: intel_pmu_refresh() does not clear PERF_CAP_PEBS_MASK in
MSR_IA32_PERF_CAPABILITIES if there is no vPMU, making KVM inconsistent
in its desire to be consistent.

Example #3: intel_pmu_refresh() does not clear MSR_IA32_MISC_ENABLE_EMON
if KVM_SET_CPUID2 is called multiple times, first with a vPMU, then
without a vPMU.  While slightly contrived, it's plausible a VMM could
reflect KVM's default vCPU and then operate on KVM's copy of CPUID to
later clear the vPMU settings, e.g. see KVM's selftests.

Example #4: Enumerating an Intel vCPU on an AMD host will not call into
intel_pmu_refresh() at any point, and so the BTS and PEBS "unavailable"
bits will be left clear, without any way for userspace to set them.

Keep the "R" behavior of the bit 7, "EMON available", for the guest.
Unlike the BTS and PEBS bits, which are fully "RO", the EMON bit can be
written with a different value, but that new value is ignored.

Cc: Like Xu <likexu@tencent.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Message-Id: <20220611005755.753273-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agox86: kvm: remove NULL check before kfree
Dongliang Mu [Tue, 14 Jun 2022 13:34:58 +0000 (21:34 +0800)]
x86: kvm: remove NULL check before kfree

kfree can handle NULL pointer as its argument.
According to coccinelle isnullfree check, remove NULL check
before kfree operation.

Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Message-Id: <20220614133458.147314-1-dzm91@hust.edu.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: Do not zero initialize 'pfn' in hva_to_pfn()
Sean Christopherson [Fri, 29 Apr 2022 01:04:07 +0000 (01:04 +0000)]
KVM: Do not zero initialize 'pfn' in hva_to_pfn()

Drop the unnecessary initialization of the local 'pfn' variable in
hva_to_pfn().  First and foremost, '0' is not an invalid pfn, it's a
perfectly valid pfn on most architectures.  I.e. if hva_to_pfn() were to
return an "uninitializd" pfn, it would actually be interpeted as a legal
pfn by most callers.

Second, hva_to_pfn() can't return an uninitialized pfn as hva_to_pfn()
explicitly sets pfn to an error value (or returns an error value directly)
if a helper returns failure, and all helpers set the pfn on success.

The zeroing of 'pfn' was introduced by commit 2fc843117d64 ("KVM:
reorganize hva_to_pfn"), probably to avoid "uninitialized variable"
warnings on statements that return pfn.  However, no compiler seems
to produce them, making the initialization unnecessary.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220429010416.2788472-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86/mmu: Shove refcounted page dependency into host_pfn_mapping_level()
Sean Christopherson [Fri, 29 Apr 2022 01:04:16 +0000 (01:04 +0000)]
KVM: x86/mmu: Shove refcounted page dependency into host_pfn_mapping_level()

Move the check that restricts mapping huge pages into the guest to pfns
that are backed by refcounted 'struct page' memory into the helper that
actually "requires" a 'struct page', host_pfn_mapping_level().  In
addition to deduplicating code, moving the check to the helper eliminates
the subtle requirement that the caller check that the incoming pfn is
backed by a refcounted struct page, and as an added bonus avoids an extra
pfn_to_page() lookup.

Note, the is_error_noslot_pfn() check in kvm_mmu_hugepage_adjust() needs
to stay where it is, as it guards against dereferencing a NULL memslot in
the kvm_slot_dirty_track_enabled() that follows.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220429010416.2788472-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: Rename/refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page()
Sean Christopherson [Fri, 29 Apr 2022 01:04:15 +0000 (01:04 +0000)]
KVM: Rename/refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page()

Rename and refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page()
to better reflect what KVM is actually checking, and to eliminate extra
pfn_to_page() lookups.  The kvm_release_pfn_*() an kvm_try_get_pfn()
helpers in particular benefit from "refouncted" nomenclature, as it's not
all that obvious why KVM needs to get/put refcounts for some PG_reserved
pages (ZERO_PAGE and ZONE_DEVICE).

Add a comment to call out that the list of exceptions to PG_reserved is
all but guaranteed to be incomplete.  The list has mostly been compiled
by people throwing noodles at KVM and finding out they stick a little too
well, e.g. the ZERO_PAGE's refcount overflowed and ZONE_DEVICE pages
didn't get freed.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220429010416.2788472-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: Take a 'struct page', not a pfn in kvm_is_zone_device_page()
Sean Christopherson [Fri, 29 Apr 2022 01:04:14 +0000 (01:04 +0000)]
KVM: Take a 'struct page', not a pfn in kvm_is_zone_device_page()

Operate on a 'struct page' instead of a pfn when checking if a page is a
ZONE_DEVICE page, and rename the helper accordingly.  Generally speaking,
KVM doesn't actually care about ZONE_DEVICE memory, i.e. shouldn't do
anything special for ZONE_DEVICE memory.  Rather, KVM wants to treat
ZONE_DEVICE memory like regular memory, and the need to identify
ZONE_DEVICE memory only arises as an exception to PG_reserved pages. In
other words, KVM should only ever check for ZONE_DEVICE memory after KVM
has already verified that there is a struct page associated with the pfn.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220429010416.2788472-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()
Sean Christopherson [Fri, 29 Apr 2022 01:04:13 +0000 (01:04 +0000)]
KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()

Drop helpers to convert a gfn/gpa to a 'struct page' in the context of a
vCPU.  KVM doesn't require that guests be backed by 'struct page' memory,
thus any use of helpers that assume 'struct page' is bound to be flawed,
as was the case for the recently removed last user in x86's nested VMX.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220429010416.2788472-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: Don't WARN if kvm_pfn_to_page() encounters a "reserved" pfn
Sean Christopherson [Fri, 29 Apr 2022 01:04:12 +0000 (01:04 +0000)]
KVM: Don't WARN if kvm_pfn_to_page() encounters a "reserved" pfn

Drop a WARN_ON() if kvm_pfn_to_page() encounters a "reserved" pfn, which
in this context means a struct page that has PG_reserved but is not a/the
ZERO_PAGE and is not a ZONE_DEVICE page.  The usage, via gfn_to_page(),
in x86 is safe as gfn_to_page() is used only to retrieve a page from
KVM-controlled memslot, but the usage in PPC and s390 operates on
arbitrary gfns and thus memslots that can be backed by incompatible
memory.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220429010416.2788472-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: nVMX: Use kvm_vcpu_map() to get/pin vmcs12's APIC-access page
Sean Christopherson [Fri, 29 Apr 2022 01:04:11 +0000 (01:04 +0000)]
KVM: nVMX: Use kvm_vcpu_map() to get/pin vmcs12's APIC-access page

Use kvm_vcpu_map() to get/pin the backing for vmcs12's APIC-access page,
there's no reason it has to be restricted to 'struct page' backing.  The
APIC-access page actually doesn't need to be backed by anything, which is
ironically why it got left behind by the series which introduced
kvm_vcpu_map()[1]; the plan was to shove a dummy pfn into vmcs02[2], but
that code never got merged.

Switching the APIC-access page to kvm_vcpu_map() doesn't preclude using a
magic pfn in the future, and will allow a future patch to drop
kvm_vcpu_gpa_to_page().

[1] https://lore.kernel.org/all/1547026933-31226-1-git-send-email-karahmed@amazon.de
[2] https://lore.kernel.org/lkml/1543845551-4403-1-git-send-email-karahmed@amazon.de

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220429010416.2788472-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: Avoid pfn_to_page() and vice versa when releasing pages
Sean Christopherson [Fri, 29 Apr 2022 01:04:10 +0000 (01:04 +0000)]
KVM: Avoid pfn_to_page() and vice versa when releasing pages

Invert the order of KVM's page/pfn release helpers so that the "inner"
helper operates on a page instead of a pfn.  As pointed out by Linus[*],
converting between struct page and a pfn isn't necessarily cheap, and
that's not even counting the overhead of is_error_noslot_pfn() and
kvm_is_reserved_pfn().  Even if the checks were dirt cheap, there's no
reason to convert from a page to a pfn and back to a page, just to mark
the page dirty/accessed or to put a reference to the page.

Opportunistically drop a stale declaration of kvm_set_page_accessed()
from kvm_host.h (there was no implementation).

No functional change intended.

[*] https://lore.kernel.org/all/CAHk-=wifQimj2d6npq-wCi5onYPjzQg4vyO4tFcPJJZr268cRw@mail.gmail.com

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220429010416.2788472-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: Don't set Accessed/Dirty bits for ZERO_PAGE
Sean Christopherson [Fri, 29 Apr 2022 01:04:09 +0000 (01:04 +0000)]
KVM: Don't set Accessed/Dirty bits for ZERO_PAGE

Don't set Accessed/Dirty bits for a struct page with PG_reserved set,
i.e. don't set A/D bits for the ZERO_PAGE.  The ZERO_PAGE (or pages
depending on the architecture) should obviously never be written, and
similarly there's no point in marking it accessed as the page will never
be swapped out or reclaimed.  The comment in page-flags.h is quite clear
that PG_reserved pages should be managed only by their owner, and
strictly following that mandate also simplifies KVM's logic.

Fixes: 7df003c85218 ("KVM: fix overflow of zero page refcount with ksm running")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220429010416.2788472-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: Drop bogus "pfn != 0" guard from kvm_release_pfn()
Sean Christopherson [Fri, 29 Apr 2022 01:04:08 +0000 (01:04 +0000)]
KVM: Drop bogus "pfn != 0" guard from kvm_release_pfn()

Remove a check from kvm_release_pfn() to bail if the provided @pfn is
zero.  Zero is a perfectly valid pfn on most architectures, and should
not be used to indicate an error or an invalid pfn.  The bogus check was
added by commit 917248144db5 ("x86/kvm: Cache gfn to pfn translation"),
which also did the bad thing of zeroing the pfn and gfn to mark a cache
invalid.  Thankfully, that bad behavior was axed by commit 357a18ad230f
("KVM: Kill kvm_map_gfn() / kvm_unmap_gfn() and gfn_to_pfn_cache").

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220429010416.2788472-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86/mmu: Use common logic for computing the 32/64-bit base PA mask
Sean Christopherson [Tue, 14 Jun 2022 23:33:28 +0000 (23:33 +0000)]
KVM: x86/mmu: Use common logic for computing the 32/64-bit base PA mask

Use common logic for computing PT_BASE_ADDR_MASK for 32-bit, 64-bit, and
EPT paging.  Both PAGE_MASK and the new-common logic are supsersets of
what is actually needed for 32-bit paging.  PAGE_MASK sets bits 63:12 and
the former GUEST_PT64_BASE_ADDR_MASK sets bits 51:12, so regardless of
which value is used, the result will always be bits 31:12.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614233328.3896033-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86/mmu: Truncate paging32's PT_BASE_ADDR_MASK to 32 bits
Sean Christopherson [Tue, 14 Jun 2022 23:33:27 +0000 (23:33 +0000)]
KVM: x86/mmu: Truncate paging32's PT_BASE_ADDR_MASK to 32 bits

Truncate paging32's PT_BASE_ADDR_MASK to a pt_element_t, i.e. to 32 bits.
Ignoring PSE huge pages, the mask is only used in conjunction with gPTEs,
which are 32 bits, and so the address is limited to bits 31:12.

PSE huge pages encoded PA bits 39:32 in PTE bits 20:13, i.e. need custom
logic to handle their funky encoding regardless of PT_BASE_ADDR_MASK.

Note, PT_LVL_OFFSET_MASK is somewhat confusing in that it computes the
offset of the _gfn_, not of the gpa, i.e. not having bits 63:32 set in
PT_BASE_ADDR_MASK is again correct.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614233328.3896033-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86/mmu: Use common macros to compute 32/64-bit paging masks
Paolo Bonzini [Wed, 15 Jun 2022 14:15:56 +0000 (10:15 -0400)]
KVM: x86/mmu: Use common macros to compute 32/64-bit paging masks

Dedup the code for generating (most of) the per-type PT_* masks in
paging_tmpl.h.  The relevant macros only vary based on the number of bits
per level, and that smidge of info is already provided in a common form
as PT_LEVEL_BITS.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614233328.3896033-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86/mmu: Use separate namespaces for guest PTEs and shadow PTEs
Sean Christopherson [Tue, 14 Jun 2022 23:33:25 +0000 (23:33 +0000)]
KVM: x86/mmu: Use separate namespaces for guest PTEs and shadow PTEs

Separate the macros for KVM's shadow PTEs (SPTE) from guest 64-bit PTEs
(PT64).  SPTE and PT64 are _mostly_ the same, but the few differences are
quite critical, e.g. *_BASE_ADDR_MASK must differentiate between host and
guest physical address spaces, and SPTE_PERM_MASK (was PT64_PERM_MASK) is
very much specific to SPTEs.

Opportunistically (and temporarily) move most guest macros into paging.h
to clearly associate them with shadow paging, and to ensure that they're
not used as of this commit.  A future patch will eliminate them entirely.

Sadly, PT32_LEVEL_BITS is left behind in mmu_internal.h because it's
needed for the quadrant calculation in kvm_mmu_get_page().  The quadrant
calculation is hot enough (when using shadow paging with 32-bit guests)
that adding a per-context helper is undesirable, and burying the
computation in paging_tmpl.h with a forward declaration isn't exactly an
improvement.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614233328.3896033-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86/mmu: Dedup macros for computing various page table masks
Sean Christopherson [Tue, 14 Jun 2022 23:33:24 +0000 (23:33 +0000)]
KVM: x86/mmu: Dedup macros for computing various page table masks

Provide common helper macros to generate various masks, shifts, etc...
for 32-bit vs. 64-bit page tables.  Only the inputs differ, the actual
calculations are identical.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614233328.3896033-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86/mmu: Bury 32-bit PSE paging helpers in paging_tmpl.h
Sean Christopherson [Tue, 14 Jun 2022 23:33:23 +0000 (23:33 +0000)]
KVM: x86/mmu: Bury 32-bit PSE paging helpers in paging_tmpl.h

Move a handful of one-off macros and helpers for 32-bit PSE paging into
paging_tmpl.h and hide them behind "PTTYPE == 32".  Under no circumstance
should anything but 32-bit shadow paging care about PSE paging.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614233328.3896033-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: VMX: Refactor 32-bit PSE PT creation to avoid using MMU macro
Sean Christopherson [Tue, 14 Jun 2022 23:33:22 +0000 (23:33 +0000)]
KVM: VMX: Refactor 32-bit PSE PT creation to avoid using MMU macro

Compute the number of PTEs to be filled for the 32-bit PSE page tables
using the page size and the size of each entry.  While using the MMU's
PT32_ENT_PER_PAGE macro is arguably better in isolation, removing VMX's
usage will allow a future namespacing cleanup to move the guest page
table macros into paging_tmpl.h, out of the reach of code that isn't
directly related to shadow paging.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614233328.3896033-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86: Use lapic_in_kernel() to query in-kernel APIC in APICv helper
Sean Christopherson [Tue, 14 Jun 2022 23:05:48 +0000 (23:05 +0000)]
KVM: x86: Use lapic_in_kernel() to query in-kernel APIC in APICv helper

Use lapic_in_kernel() in kvm_vcpu_apicv_active() to take advantage of the
kvm_has_noapic_vcpu static branch.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614230548.3852141-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86: Move "apicv_active" into "struct kvm_lapic"
Sean Christopherson [Tue, 14 Jun 2022 23:05:47 +0000 (23:05 +0000)]
KVM: x86: Move "apicv_active" into "struct kvm_lapic"

Move the per-vCPU apicv_active flag into KVM's local APIC instance.
APICv is fully dependent on an in-kernel local APIC, but that's not at
all clear when reading the current code due to the flag being stored in
the generic kvm_vcpu_arch struct.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614230548.3852141-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86: Check for in-kernel xAPIC when querying APICv for directed yield
Sean Christopherson [Tue, 14 Jun 2022 23:05:46 +0000 (23:05 +0000)]
KVM: x86: Check for in-kernel xAPIC when querying APICv for directed yield

Use kvm_vcpu_apicv_active() to check if APICv is active when seeing if a
vCPU is a candidate for directed yield due to a pending ACPIv interrupt.
This will allow moving apicv_active into kvm_lapic without introducing a
potential NULL pointer deref (kvm_vcpu_apicv_active() effectively adds a
pre-check on the vCPU having an in-kernel APIC).

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614230548.3852141-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86: Drop @vcpu parameter from kvm_x86_ops.hwapic_isr_update()
Sean Christopherson [Tue, 14 Jun 2022 23:05:45 +0000 (23:05 +0000)]
KVM: x86: Drop @vcpu parameter from kvm_x86_ops.hwapic_isr_update()

Drop the unused @vcpu parameter from hwapic_isr_update().  AMD/AVIC is
unlikely to implement the helper, and VMX/APICv doesn't need the vCPU as
it operates on the current VMCS.  The result is somewhat odd, but allows
for a decent amount of (future) cleanup in the APIC code.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614230548.3852141-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: SVM: Drop unused AVIC / kvm_x86_ops declarations
Sean Christopherson [Tue, 14 Jun 2022 23:05:44 +0000 (23:05 +0000)]
KVM: SVM: Drop unused AVIC / kvm_x86_ops declarations

Drop a handful of unused AVIC function declarations whose implementations
were removed during the conversion to optional static calls.

No functional change intended.

Fixes: abb6d479e226 ("KVM: x86: make several APIC virtualization callbacks optional")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614230548.3852141-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: nVMX: Update vmcs12 on BNDCFGS write, not at vmcs02=>vmcs12 sync
Sean Christopherson [Tue, 14 Jun 2022 21:58:31 +0000 (21:58 +0000)]
KVM: nVMX: Update vmcs12 on BNDCFGS write, not at vmcs02=>vmcs12 sync

Update vmcs12->guest_bndcfgs on intercepted writes to BNDCFGS from L2
instead of waiting until vmcs02 is synchronized to vmcs12.  KVM always
intercepts BNDCFGS accesses, so the only way the value in vmcs02 can
change is via KVM's explicit VMWRITE during emulation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614215831.3762138-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: nVMX: Save BNDCFGS to vmcs12 iff relevant controls are exposed to L1
Sean Christopherson [Tue, 14 Jun 2022 21:58:30 +0000 (21:58 +0000)]
KVM: nVMX: Save BNDCFGS to vmcs12 iff relevant controls are exposed to L1

Save BNDCFGS to vmcs12 (from vmcs02) if and only if at least of one of
the load-on-entry or clear-on-exit fields for BNDCFGS is enumerated as an
allowed-1 bit in vmcs12.  Skipping the field avoids an unnecessary VMREAD
when MPX is supported but not exposed to L1.

Per Intel's SDM:

  If the processor supports either the 1-setting of the "load IA32_BNDCFGS"
  VM-entry control or that of the "clear IA32_BNDCFGS" VM-exit control, the
  contents of the IA32_BNDCFGS MSR are saved into the corresponding field.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614215831.3762138-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: nVMX: Rename nested.vmcs01_* fields to nested.pre_vmenter_*
Sean Christopherson [Tue, 14 Jun 2022 21:58:29 +0000 (21:58 +0000)]
KVM: nVMX: Rename nested.vmcs01_* fields to nested.pre_vmenter_*

Rename the fields in struct nested_vmx used to snapshot pre-VM-Enter
values to reflect that they can hold L2's values when restoring nested
state, e.g. if userspace restores MSRs before nested state.  As crazy as
it seems, restoring MSRs before nested state actually works (because KVM
goes out if it's way to make it work), even though the initial MSR writes
will hit vmcs01 despite holding L2 values.

Add a related comment to vmx_enter_smm() to call out that using the
common VM-Exit and VM-Enter helpers to emulate SMI and RSM is wrong and
broken.  The few MSRs that have snapshots _could_ be fixed by taking a
snapshot prior to the forced VM-Exit instead of at forced VM-Enter, but
that's just the tip of the iceberg as the rather long list of MSRs that
aren't snapshotted (hello, VM-Exit MSR load list) can't be handled this
way.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614215831.3762138-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: nVMX: Snapshot pre-VM-Enter DEBUGCTL for !nested_run_pending case
Sean Christopherson [Tue, 14 Jun 2022 21:58:28 +0000 (21:58 +0000)]
KVM: nVMX: Snapshot pre-VM-Enter DEBUGCTL for !nested_run_pending case

If a nested run isn't pending, snapshot vmcs01.GUEST_IA32_DEBUGCTL
irrespective of whether or not VM_ENTRY_LOAD_DEBUG_CONTROLS is set in
vmcs12.  When restoring nested state, e.g. after migration, without a
nested run pending, prepare_vmcs02() will propagate
nested.vmcs01_debugctl to vmcs02, i.e. will load garbage/zeros into
vmcs02.GUEST_IA32_DEBUGCTL.

If userspace restores nested state before MSRs, then loading garbage is a
non-issue as loading DEBUGCTL will also update vmcs02.  But if usersepace
restores MSRs first, then KVM is responsible for propagating L2's value,
which is actually thrown into vmcs01, into vmcs02.

Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state
is all kinds of bizarre and ideally would not be supported.  Sadly, some
VMMs do exactly that and rely on KVM to make things work.

Note, there's still a lurking SMM bug, as propagating vmcs01's DEBUGCTL
to vmcs02 across RSM may corrupt L2's DEBUGCTL.  But KVM's entire VMX+SMM
emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the
"default treatment of SMIs", i.e. when not using an SMI Transfer Monitor.

Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com
Fixes: 8fcc4b5923af ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614215831.3762138-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: nVMX: Snapshot pre-VM-Enter BNDCFGS for !nested_run_pending case
Sean Christopherson [Tue, 14 Jun 2022 21:58:27 +0000 (21:58 +0000)]
KVM: nVMX: Snapshot pre-VM-Enter BNDCFGS for !nested_run_pending case

If a nested run isn't pending, snapshot vmcs01.GUEST_BNDCFGS irrespective
of whether or not VM_ENTRY_LOAD_BNDCFGS is set in vmcs12.  When restoring
nested state, e.g. after migration, without a nested run pending,
prepare_vmcs02() will propagate nested.vmcs01_guest_bndcfgs to vmcs02,
i.e. will load garbage/zeros into vmcs02.GUEST_BNDCFGS.

If userspace restores nested state before MSRs, then loading garbage is a
non-issue as loading BNDCFGS will also update vmcs02.  But if usersepace
restores MSRs first, then KVM is responsible for propagating L2's value,
which is actually thrown into vmcs01, into vmcs02.

Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state
is all kinds of bizarre and ideally would not be supported.  Sadly, some
VMMs do exactly that and rely on KVM to make things work.

Note, there's still a lurking SMM bug, as propagating vmcs01.GUEST_BNDFGS
to vmcs02 across RSM may corrupt L2's BNDCFGS.  But KVM's entire VMX+SMM
emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the
"default treatment of SMIs", i.e. when not using an SMI Transfer Monitor.

Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com
Fixes: 62cf9bd8118c ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS")
Cc: stable@vger.kernel.org
Cc: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220614215831.3762138-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86/mmu: Use try_cmpxchg64 in fast_pf_fix_direct_spte
Uros Bizjak [Fri, 20 May 2022 14:46:35 +0000 (16:46 +0200)]
KVM: x86/mmu: Use try_cmpxchg64 in fast_pf_fix_direct_spte

Use try_cmpxchg64 instead of cmpxchg64 (*ptr, old, new) != old in
fast_pf_fix_direct_spte. cmpxchg returns success in ZF flag, so this
change saves a compare after cmpxchg (and related move instruction
in front of cmpxchg).

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Message-Id: <20220520144635.63134-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: VMX: Use try_cmpxchg64 in pi_try_set_control
Uros Bizjak [Fri, 20 May 2022 14:37:37 +0000 (16:37 +0200)]
KVM: VMX: Use try_cmpxchg64 in pi_try_set_control

Use try_cmpxchg64 instead of cmpxchg64 (*ptr, old, new) != old
in pi_try_set_control.  cmpxchg returns success in ZF flag, so this
change saves a compare after cmpxchg (and related move instruction
in front of cmpxchg):

  b9:   88 44 24 60             mov    %al,0x60(%rsp)
  bd:   48 89 c8                mov    %rcx,%rax
  c0:   c6 44 24 62 f2          movb   $0xf2,0x62(%rsp)
  c5:   48 8b 74 24 60          mov    0x60(%rsp),%rsi
  ca:   f0 49 0f b1 34 24       lock cmpxchg %rsi,(%r12)
  d0:   48 39 c1                cmp    %rax,%rcx
  d3:   75 cf                   jne    a4 <vmx_vcpu_pi_load+0xa4>

patched:

  c1:   88 54 24 60             mov    %dl,0x60(%rsp)
  c5:   c6 44 24 62 f2          movb   $0xf2,0x62(%rsp)
  ca:   48 8b 54 24 60          mov    0x60(%rsp),%rdx
  cf:   f0 48 0f b1 13          lock cmpxchg %rdx,(%rbx)
  d4:   75 d5                   jne    ab <vmx_vcpu_pi_load+0xab>

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Reported-by: kernel test robot <lkp@intel.com>
Message-Id: <20220520143737.62513-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86/mmu: Use try_cmpxchg64 in tdp_mmu_set_spte_atomic
Uros Bizjak [Wed, 18 May 2022 13:51:11 +0000 (15:51 +0200)]
KVM: x86/mmu: Use try_cmpxchg64 in tdp_mmu_set_spte_atomic

Use try_cmpxchg64 instead of cmpxchg64 (*ptr, old, new) != old in
tdp_mmu_set_spte_atomic.  cmpxchg returns success in ZF flag, so this
change saves a compare after cmpxchg (and related move instruction
in front of cmpxchg). Also, remove explicit assignment to iter->old_spte
when cmpxchg fails, this is what try_cmpxchg does implicitly.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Message-Id: <20220518135111.3535-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: VMX: Skip filter updates for MSRs that KVM is already intercepting
Sean Christopherson [Fri, 10 Jun 2022 21:41:40 +0000 (21:41 +0000)]
KVM: VMX: Skip filter updates for MSRs that KVM is already intercepting

When handling userspace MSR filter updates, recompute interception for
possible passthrough MSRs if and only if KVM wants to disabled
interception.  If KVM wants to intercept accesses, i.e. the associated
bit is set in vmx->shadow_msr_intercept, then there's no need to set the
intercept again as KVM will intercept the MSR regardless of userspace's
wants.

No functional change intended, the call to vmx_enable_intercept_for_msr()
really is just a gigantic nop.

Suggested-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220610214140.612025-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: x86/mmu: Drop unused CMPXCHG macro from paging_tmpl.h
Sean Christopherson [Mon, 13 Jun 2022 22:57:16 +0000 (22:57 +0000)]
KVM: x86/mmu: Drop unused CMPXCHG macro from paging_tmpl.h

Drop the CMPXCHG macro from paging_tmpl.h, it's no longer used now that
KVM uses a common uaccess helper to do 8-byte CMPXCHG.

Fixes: f122dfe44768 ("KVM: x86: Use __try_cmpxchg_user() to update guest PTE A/D bits")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220613225723.2734132-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: X86/SVM: Use root_level in svm_load_mmu_pgd()
Lai Jiangshan [Sun, 5 Jun 2022 06:34:17 +0000 (14:34 +0800)]
KVM: X86/SVM: Use root_level in svm_load_mmu_pgd()

Use root_level in svm_load_mmu_pg() rather that looking up the root
level in vcpu->arch.mmu->root_role.level. svm_load_mmu_pgd() has only
one caller, kvm_mmu_load_pgd(), which always passes
vcpu->arch.mmu->root_role.level as root_level.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220605063417.308311-7-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: X86/MMU: Remove useless mmu_topup_memory_caches() in kvm_mmu_pte_write()
Lai Jiangshan [Sun, 5 Jun 2022 06:34:16 +0000 (14:34 +0800)]
KVM: X86/MMU: Remove useless mmu_topup_memory_caches() in kvm_mmu_pte_write()

Since the commit c5e2184d1544("KVM: x86/mmu: Remove the defunct
update_pte() paging hook"), kvm_mmu_pte_write() no longer uses the rmap
cache.

So remove mmu_topup_memory_caches() in it.

Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220605063417.308311-6-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: Rename ack_flush() to ack_kick()
Lai Jiangshan [Sun, 5 Jun 2022 06:34:15 +0000 (14:34 +0800)]
KVM: Rename ack_flush() to ack_kick()

Make it use the same verb as in kvm_kick_many_cpus().

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220605063417.308311-5-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: X86/MMU: Remove unused PT32_DIR_BASE_ADDR_MASK from mmu.c
Lai Jiangshan [Sun, 5 Jun 2022 06:34:13 +0000 (14:34 +0800)]
KVM: X86/MMU: Remove unused PT32_DIR_BASE_ADDR_MASK from mmu.c

It is unused.

Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220605063417.308311-3-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: s390: selftests: Fix memop extension capability check
Janis Schoetterl-Glausch [Tue, 14 Jun 2022 16:26:35 +0000 (18:26 +0200)]
KVM: s390: selftests: Fix memop extension capability check

Fix the inverted logic of the memop extension capability check.

Fixes: 97da92c0ff92 ("KVM: s390: selftests: Use TAP interface in the memop test")
Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Message-Id: <20220614162635.3445019-1-scgl@linux.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: SVM: Hide SEV migration lockdep goo behind CONFIG_PROVE_LOCKING
Sean Christopherson [Mon, 13 Jun 2022 21:42:37 +0000 (21:42 +0000)]
KVM: SVM: Hide SEV migration lockdep goo behind CONFIG_PROVE_LOCKING

Wrap the manipulation of @role and the manual mutex_{release,acquire}()
invocations in CONFIG_PROVE_LOCKING=y to squash a clang-15 warning.  When
building with -Wunused-but-set-parameter and CONFIG_DEBUG_LOCK_ALLOC=n,
clang-15 seees there's no usage of @role in mutex_lock_killable_nested()
and yells.  PROVE_LOCKING selects DEBUG_LOCK_ALLOC, and the only reason
KVM manipulates @role is to make PROVE_LOCKING happy.

To avoid true ugliness, use "i" and "j" to detect the first pass in the
loops; the "idx" field that's used by kvm_for_each_vcpu() is guaranteed
to be '0' on the first pass as it's simply the first entry in the vCPUs
XArray, which is fully KVM controlled.  kvm_for_each_vcpu() passes '0'
for xa_for_each_range()'s "start", and xa_for_each_range() will not enter
the loop if there's no entry at '0'.

Fixes: 0c2c7c069285 ("KVM: SEV: Mark nested locking of vcpu->lock")
Reported-by: kernel test robot <lkp@intel.com>
Cc: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220613214237.2538266-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: SEV: fix misplaced closing parenthesis
Paolo Bonzini [Wed, 15 Jun 2022 12:03:53 +0000 (08:03 -0400)]
KVM: SEV: fix misplaced closing parenthesis

This caused a warning on 32-bit systems, but undoubtedly would have acted
funny on 64-bit as well.

The fix was applied directly on merge in 5.19, see commit 24625f7d91fb ("Merge
tag for-linus of git://git.kernel.org/pub/scm/virt/kvm/kvm").

Fixes: 3743c2f02517 ("KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Remove the mismatched parameter comments
Shaoqin Huang [Tue, 14 Jun 2022 22:41:19 +0000 (16:41 -0600)]
KVM: selftests: Remove the mismatched parameter comments

There are some parameter being removed in function but the parameter
comments still exist, so remove them.

Signed-off-by: Shaoqin Huang <shaoqin.huang@intel.com>
Message-Id: <20220614224126.211054-1-shaoqin.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Use kvm_has_cap(), not kvm_check_cap(), where possible
Sean Christopherson [Mon, 13 Jun 2022 16:19:42 +0000 (16:19 +0000)]
KVM: selftests: Use kvm_has_cap(), not kvm_check_cap(), where possible

Replace calls to kvm_check_cap() that treat its return as a boolean with
calls to kvm_has_cap().  Several instances of kvm_check_cap() were missed
when kvm_has_cap() was introduced.

Reported-by: Andrew Jones <drjones@redhat.com>
Fixes: 3ea9b809650b ("KVM: selftests: Add kvm_has_cap() to provide syntactic sugar")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220613161942.1586791-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Drop a duplicate TEST_ASSERT() in vm_nr_pages_required()
Sean Christopherson [Mon, 13 Jun 2022 16:19:41 +0000 (16:19 +0000)]
KVM: selftests: Drop a duplicate TEST_ASSERT() in vm_nr_pages_required()

Remove a duplicate TEST_ASSERT() on the number of runnable vCPUs in
vm_nr_pages_required() that snuck in during a rebase gone bad.

Reported-by: Andrew Jones <drjones@redhat.com>
Fixes: 6e1d13bf3815 ("KVM: selftests: Move per-VM/per-vCPU nr pages calculation to __vm_create()")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220613161942.1586791-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Call a dummy helper in VM/vCPU ioctls() to enforce type
Sean Christopherson [Mon, 13 Jun 2022 16:19:40 +0000 (16:19 +0000)]
KVM: selftests: Call a dummy helper in VM/vCPU ioctls() to enforce type

Replace the goofy static_assert on the size of the @vm/@vcpu parameters
with a call to a dummy helper, i.e. let the compiler naturally complain
about an incompatible type instead of homebrewing a poor replacement.

Reported-by: Andrew Jones <drjones@redhat.com>
Fixes: fcba483e8246 ("KVM: selftests: Sanity check input to ioctls() at build time")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220613161942.1586791-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Add a missing apostrophe in comment to show ownership
Sean Christopherson [Mon, 13 Jun 2022 16:19:39 +0000 (16:19 +0000)]
KVM: selftests: Add a missing apostrophe in comment to show ownership

Add an apostrophe in a comment about it being the caller's, not callers,
responsibility to free an object.

Reported-by: Andrew Jones <drjones@redhat.com>
Fixes: 768e9a61856b ("KVM: selftests: Purge vm+vcpu_id == vcpu silliness")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220613161942.1586791-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: kvm_binary_stats_test: Fix index expressions
Andrew Jones [Tue, 14 Jun 2022 08:10:41 +0000 (10:10 +0200)]
KVM: selftests: kvm_binary_stats_test: Fix index expressions

kvm_binary_stats_test accepts two arguments, the number of vms
and number of vcpus. If these inputs are not equal then the
test would likely crash for one reason or another due to using
miscalculated indices for the vcpus array. Fix the index
expressions by swapping the use of i and j.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <20220614081041.2571511-1-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Sanity check input to ioctls() at build time
Sean Christopherson [Wed, 1 Jun 2022 18:01:58 +0000 (11:01 -0700)]
KVM: selftests: Sanity check input to ioctls() at build time

Add a static assert to the KVM/VM/vCPU ioctl() helpers to verify that the
size of the argument provided matches the expected size of the IOCTL.
Because ioctl() ultimately takes a "void *", it's all too easy to pass in
garbage and not detect the error until runtime.  E.g. while working on a
CPUID rework, selftests happily compiled when vcpu_set_cpuid()
unintentionally passed the cpuid() function as the parameter to ioctl()
(a local "cpuid" parameter was removed, but its use was not replaced with
"vcpu->cpuid" as intended).

Tweak a variety of benign issues that aren't compatible with the sanity
check, e.g. passing a non-pointer for ioctls().

Note, static_assert() requires a string on older versions of GCC.  Feed
it an empty string to make the compiler happy.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Use TAP-friendly ksft_exit_skip() in __TEST_REQUIRE
Sean Christopherson [Fri, 10 Jun 2022 00:03:19 +0000 (17:03 -0700)]
KVM: selftests: Use TAP-friendly ksft_exit_skip() in __TEST_REQUIRE

Use the TAP-friendly ksft_exit_skip() instead of KVM's custom print_skip()
when skipping a test via __TEST_REQUIRE.  KVM's "skipping test" has no
known benefit, whereas some setups rely on TAP output.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Add TEST_REQUIRE macros to reduce skipping copy+paste
Sean Christopherson [Fri, 27 May 2022 23:24:02 +0000 (16:24 -0700)]
KVM: selftests: Add TEST_REQUIRE macros to reduce skipping copy+paste

Add TEST_REQUIRE() and __TEST_REQUIRE() to replace the myriad open coded
instances of selftests exiting with KSFT_SKIP after printing an
informational message.  In addition to reducing the amount of boilerplate
code in selftests, the UPPERCASE macro names make it easier to visually
identify a test's requirements.

Convert usage that erroneously uses something other than print_skip()
and/or "exits" with '0' or some other non-KSFT_SKIP value.

Intentionally drop a kvm_vm_free() in aarch64/debug-exceptions.c as part
of the conversion.  All memory and file descriptors are freed on process
exit, so the explicit free is superfluous.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Add kvm_has_cap() to provide syntactic sugar
Sean Christopherson [Fri, 27 May 2022 22:13:03 +0000 (15:13 -0700)]
KVM: selftests: Add kvm_has_cap() to provide syntactic sugar

Add kvm_has_cap() to wrap kvm_check_cap() and return a bool for the use
cases where the caller only wants check if a capability is supported,
i.e. doesn't care about the value beyond whether or not it's non-zero.
The "check" terminology is somewhat ambiguous as the non-boolean return
suggests that '0' might mean "success", i.e. suggests that the ioctl uses
the 0/-errno pattern.  Provide a wrapper instead of trying to find a new
name for the raw helper; the "check" terminology is derived from the name
of the ioctl, so using e.g. "get" isn't a clear win.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Return an 'unsigned int' from kvm_check_cap()
Sean Christopherson [Fri, 27 May 2022 22:09:52 +0000 (15:09 -0700)]
KVM: selftests: Return an 'unsigned int' from kvm_check_cap()

Return an 'unsigned int' instead of a signed 'int' from kvm_check_cap(),
to make it more obvious that kvm_check_cap() can never return a negative
value due to its assertion that the return is ">= 0".

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Drop DEFAULT_GUEST_PHY_PAGES, open code the magic number
Sean Christopherson [Tue, 3 May 2022 22:26:02 +0000 (15:26 -0700)]
KVM: selftests: Drop DEFAULT_GUEST_PHY_PAGES, open code the magic number

Remove DEFAULT_GUEST_PHY_PAGES and open code the magic number (with a
comment) in vm_nr_pages_required().  Exposing DEFAULT_GUEST_PHY_PAGES to
tests was a symptom of the VM creation APIs not cleanly supporting tests
that create runnable vCPUs, but can't do so immediately.  Now that tests
don't have to manually compute the amount of memory needed for basic
operation, make it harder for tests to do things that should be handled
by the framework, i.e. force developers to improve the framework instead
of hacking around flaws in individual tests.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Trust that MAXPHYADDR > memslot0 in vmx_apic_access_test
Sean Christopherson [Tue, 3 May 2022 21:48:59 +0000 (14:48 -0700)]
KVM: selftests: Trust that MAXPHYADDR > memslot0 in vmx_apic_access_test

Use vm->max_gfn to compute the highest gpa in vmx_apic_access_test, and
blindly trust that the highest gfn/gpa will be well above the memory
carved out for memslot0.  The existing check is beyond paranoid; KVM
doesn't support CPUs with host.MAXPHYADDR < 32, and the selftests are all
kinds of hosed if memslot0 overlaps the local xAPIC, which resides above
"lower" (below 4gb) DRAM.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Move per-VM/per-vCPU nr pages calculation to __vm_create()
Sean Christopherson [Tue, 3 May 2022 16:52:48 +0000 (09:52 -0700)]
KVM: selftests: Move per-VM/per-vCPU nr pages calculation to __vm_create()

Handle all memslot0 size adjustments in __vm_create().  Currently, the
adjustments reside in __vm_create_with_vcpus(), which means tests that
call vm_create() or __vm_create() directly are left to their own devices.
Some tests just pass DEFAULT_GUEST_PHY_PAGES and don't bother with any
adjustments, while others mimic the per-vCPU calculations.

For vm_create(), and thus __vm_create(), take the number of vCPUs that
will be runnable to calculate that number of per-vCPU pages needed for
memslot0.  To give readers a hint that neither vm_create() nor
__vm_create() create vCPUs, name the parameter @nr_runnable_vcpus instead
of @nr_vcpus.  That also gives readers a hint as to why tests that create
larger numbers of vCPUs but never actually run those vCPUs can skip
straight to the vm_create_barebones() variant.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Drop @num_percpu_pages from __vm_create_with_vcpus()
Sean Christopherson [Tue, 3 May 2022 00:39:47 +0000 (17:39 -0700)]
KVM: selftests: Drop @num_percpu_pages from __vm_create_with_vcpus()

Drop @num_percpu_pages from __vm_create_with_vcpus(), all callers pass
'0' and there's unlikely to be a test that allocates just enough memory
that it needs a per-CPU allocation, but not so much that it won't just do
its own memory management.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Drop @slot0_mem_pages from __vm_create_with_vcpus()
Sean Christopherson [Tue, 3 May 2022 00:25:17 +0000 (17:25 -0700)]
KVM: selftests: Drop @slot0_mem_pages from __vm_create_with_vcpus()

All callers of __vm_create_with_vcpus() pass DEFAULT_GUEST_PHY_PAGES for
@slot_mem_pages; drop the param and just hardcode the "default" as the
base number of pages for slot0.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Open code and drop 'struct kvm_vm' accessors
Sean Christopherson [Thu, 17 Feb 2022 00:51:20 +0000 (16:51 -0800)]
KVM: selftests: Open code and drop 'struct kvm_vm' accessors

Drop a variety of 'struct kvm_vm' accessors that wrap a single variable
now that tests can simply reference the variable directly.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Remove vcpu_state() helper
Sean Christopherson [Thu, 17 Feb 2022 00:48:13 +0000 (16:48 -0800)]
KVM: selftests: Remove vcpu_state() helper

Drop vcpu_state() now that all tests reference vcpu->run directly.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Drop vcpu_get(), rename vcpu_find() => vcpu_exists()
Sean Christopherson [Thu, 17 Feb 2022 00:46:46 +0000 (16:46 -0800)]
KVM: selftests: Drop vcpu_get(), rename vcpu_find() => vcpu_exists()

Drop vcpu_get() and rename vcpu_find() to vcpu_exists() to make it that
much harder for a test to give meaning to a vCPU ID.  I.e. force tests to
capture a vCPU when the vCPU is created.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Purge vm+vcpu_id == vcpu silliness
Sean Christopherson [Thu, 2 Jun 2022 20:41:33 +0000 (13:41 -0700)]
KVM: selftests: Purge vm+vcpu_id == vcpu silliness

Take a vCPU directly instead of a VM+vcpu pair in all vCPU-scoped helpers
and ioctls.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Require vCPU output array when creating VM with vCPUs
Sean Christopherson [Wed, 16 Feb 2022 21:53:23 +0000 (13:53 -0800)]
KVM: selftests: Require vCPU output array when creating VM with vCPUs

Require the caller of __vm_create_with_vcpus() to provide a non-NULL
array of vCPUs now that all callers do so.  It's extremely unlikely a
test will have a legitimate use case for creating a VM with vCPUs without
wanting to do something with those vCPUs, and if there is such a use case,
requiring that one-off test to provide a dummy array is a minor
annoyance.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Remove vcpu_get() usage from dirty_log_test
Sean Christopherson [Thu, 17 Feb 2022 00:44:34 +0000 (16:44 -0800)]
KVM: selftests: Remove vcpu_get() usage from dirty_log_test

Grab the vCPU from vm_vcpu_add() directly instead of doing vcpu_get()
after the fact.  This will allow removing vcpu_get() entirely.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Stop conflating vCPU index and ID in perf tests
Sean Christopherson [Wed, 16 Feb 2022 21:38:12 +0000 (13:38 -0800)]
KVM: selftests: Stop conflating vCPU index and ID in perf tests

Track vCPUs by their 'struct kvm_vcpu' object, and stop assuming that a
vCPU's ID is the same as its index when referencing a vCPU's metadata.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Stop hardcoding vCPU IDs in vcpu_width_config
Sean Christopherson [Wed, 20 Apr 2022 19:15:50 +0000 (12:15 -0700)]
KVM: selftests: Stop hardcoding vCPU IDs in vcpu_width_config

In preparation for taking a vCPU pointer in vCPU-scoped functions, grab
the vCPU(s) created by __vm_vcpu_add() and use the ID from the vCPU
object instead of hardcoding the ID in ioctl() invocations.

Rename init1/init2 => init0/init1 to avoid having odd/confusing code
where vcpu0 consumes init1 and vcpu1 consumes init2.

Note, this change could easily be done when the functions are converted
in the future, and/or the vcpu{0,1} vs. init{1,2} discrepancy could be
ignored, but then there would be no opportunity to poke fun at the
1-based counting scheme.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert get-reg-list away from its "VCPU_ID"
Sean Christopherson [Fri, 18 Feb 2022 01:01:58 +0000 (17:01 -0800)]
KVM: selftests: Convert get-reg-list away from its "VCPU_ID"

Track the vCPU's 'struct kvm_vcpu' object in get-reg-list instead of
hardcoding '0' everywhere.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert kvm_binary_stats_test away from vCPU IDs
Sean Christopherson [Thu, 17 Feb 2022 00:16:32 +0000 (16:16 -0800)]
KVM: selftests: Convert kvm_binary_stats_test away from vCPU IDs

Track vCPUs by their 'struct kvm_vcpu' object in kvm_binary_stats_test,
not by their ID.  The per-vCPU helpers will soon take a vCPU instead of a
VM+vcpu_id pair.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert kvm_page_table_test away from reliance on vcpu_id
Sean Christopherson [Wed, 16 Feb 2022 21:06:18 +0000 (13:06 -0800)]
KVM: selftests: Convert kvm_page_table_test away from reliance on vcpu_id

Reference vCPUs by their 'struct kvm_vcpu' object in kvm_page_table_test
instead of by their ID.  This moves selftests one step closer towards
taking a 'struct kvm_vcpu *' instead of VM+vcpu_id for vCPU helpers.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Drop @vcpuids param from VM creators
Sean Christopherson [Wed, 16 Feb 2022 20:49:13 +0000 (12:49 -0800)]
KVM: selftests: Drop @vcpuids param from VM creators

Drop the @vcpuids parameter from VM creators now that there are no users.
Allowing tests to specify IDs was a gigantic mistake as it resulted in
tests with arbitrary and ultimately meaningless IDs that differed only
because the author used test X intead of test Y as the source for
copy+paste (the de facto standard way to create a KVM selftest).

Except for literally two tests, x86's set_boot_cpu_id and s390's resets,
tests do not and should not care about the vCPU ID.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Drop vm_create_default* helpers
Sean Christopherson [Wed, 16 Feb 2022 20:45:22 +0000 (12:45 -0800)]
KVM: selftests: Drop vm_create_default* helpers

Drop all vm_create_default*() helpers, the "default" naming turned out to
terrible as wasn't extensible (hard to have multiple defaults), was a lie
(half the settings were default, half weren't), and failed to capture
relationships between helpers, e.g. compared with the kernel's standard
underscores pattern.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Use vm_create_with_vcpus() in max_guest_memory_test
Sean Christopherson [Tue, 19 Apr 2022 18:35:28 +0000 (11:35 -0700)]
KVM: selftests: Use vm_create_with_vcpus() in max_guest_memory_test

Use vm_create_with_vcpus() in max_guest_memory_test and reference vCPUs
by their 'struct kvm_vcpu' object instead of their ID.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Use vm_create() in tsc_scaling_sync
Sean Christopherson [Tue, 19 Apr 2022 00:35:33 +0000 (17:35 -0700)]
KVM: selftests: Use vm_create() in tsc_scaling_sync

Use vm_create() instead of vm_create_default_with_vcpus() in
tsc_scaling_sync.  The existing call doesn't create any vCPUs, and the
guest_code() entry point is set when vm_vcpu_add_default() is invoked.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert tprot away from VCPU_ID
Sean Christopherson [Thu, 21 Apr 2022 15:34:06 +0000 (08:34 -0700)]
KVM: selftests: Convert tprot away from VCPU_ID

Convert tprot to use vm_create_with_vcpus() and pass around a
'struct kvm_vcpu' object instead of passing around vCPU IDs.  Note, this is
a "functional" change in the sense that the test now creates a vCPU with
vcpu_id==0 instead of vcpu_id==1.  The non-zero VCPU_ID was 100% arbitrary
and added little to no validation coverage.  If testing non-zero vCPU IDs
is desirable for generic tests, that can be done in the future by tweaking
the VM creation helpers.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert s390x/diag318_test_handler away from VCPU_ID
Sean Christopherson [Wed, 16 Feb 2022 20:40:18 +0000 (12:40 -0800)]
KVM: selftests: Convert s390x/diag318_test_handler away from VCPU_ID

Convert diag318_test_handler to use vm_create_with_vcpus() and pass around a
'struct kvm_vcpu' object instead of passing around vCPU IDs.  Note, this is
a "functional" change in the sense that the test now creates a vCPU with
vcpu_id==0 instead of vcpu_id==6.  The non-zero VCPU_ID was 100% arbitrary
and added little to no validation coverage.  If testing non-zero vCPU IDs
is desirable for generic tests, that can be done in the future by tweaking
the VM creation helpers.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert memop away from VCPU_ID
Sean Christopherson [Wed, 16 Feb 2022 20:39:34 +0000 (12:39 -0800)]
KVM: selftests: Convert memop away from VCPU_ID

Pass around a 'struct kvm_vcpu' object instead of a vCPU ID in s390's
memop test.  Pass NULL for the vCPU instead of a magic '-1' ID to
indicate that an ioctl/test should be done at VM scope.

Rename "struct test_vcpu vcpu" to "struct test_info info" in order to
avoid naming collisions (this is the bulk of the diff :-( ).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert s390's "resets" test away from VCPU_ID
Sean Christopherson [Wed, 16 Feb 2022 20:38:26 +0000 (12:38 -0800)]
KVM: selftests: Convert s390's "resets" test away from VCPU_ID

Pass around a 'struct kvm_vcpu' object in the "resets" test instead of
referencing the vCPU by the global VCPU_ID.  Rename the #define for the
vCPU's ID to ARBITRARY_NON_ZERO_VCPU_ID to make it more obvious that (a)
the value matters but (b) is otherwise arbitrary.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert sync_regs_test away from VCPU_ID
Sean Christopherson [Wed, 16 Feb 2022 20:20:41 +0000 (12:20 -0800)]
KVM: selftests: Convert sync_regs_test away from VCPU_ID

Convert sync_regs_test to use vm_create_with_vcpus() and pass around a
'struct kvm_vcpu' object instead of passing around vCPU IDs.  Note, this
is a "functional" change in the sense that the test now creates a vCPU
with vcpu_id==0 instead of vcpu_id==5.  The non-zero VCPU_ID was 100%
arbitrary and added little to no validation coverage.  If testing
non-zero vCPU IDs is desirable for generic tests, that can be done in the
future by tweaking the VM creation helpers.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert xapic_ipi_test away from *_VCPU_ID
Sean Christopherson [Wed, 16 Feb 2022 20:10:40 +0000 (12:10 -0800)]
KVM: selftests: Convert xapic_ipi_test away from *_VCPU_ID

Convert vm_create_with_one_vcpu to use vm_create_with_vcpus() and pass
around 'struct kvm_vcpu' objects instead of passing around vCPU IDs.
Don't bother with macros for the HALTER versus SENDER indices, the vast
majority of references don't differentiate between the vCPU roles, and
the code that does either has a comment or an explicit reference to the
role, e.g. to halter_guest_code() or sender_guest_code().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert hypercalls test away from vm_create_default()
Sean Christopherson [Thu, 2 Jun 2022 00:32:52 +0000 (17:32 -0700)]
KVM: selftests: Convert hypercalls test away from vm_create_default()

Use a combination of vm_create(), vm_create_with_vcpus(), and
    vm_vcpu_add() to convert vgic_init from vm_create_default_with_vcpus(),
    and away from referncing vCPUs by ID.

Thus continues the march toward total annihilation of "default" helpers.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Sync stage before VM is freed in hypercalls test
Sean Christopherson [Thu, 2 Jun 2022 00:27:51 +0000 (17:27 -0700)]
KVM: selftests: Sync stage before VM is freed in hypercalls test

Sync the next stage using the VM before said VM is potentially freed by
the TEST_STAGE_HVC_IFACE_FEAT_DISABLED stage.

Opportunistically take a double pointer in anticipation of also having to
set the new vCPU pointer once the test stops hardcoding '0' everywhere.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Consolidate KVM_{G,S}ET_ONE_REG helpers
Sean Christopherson [Thu, 2 Jun 2022 00:16:11 +0000 (17:16 -0700)]
KVM: selftests: Consolidate KVM_{G,S}ET_ONE_REG helpers

Rework vcpu_{g,s}et_reg() to provide the APIs that tests actually want to
use, and drop the three "one-off" implementations that cropped up due to
the poor API.

Ignore the handful of direct KVM_{G,S}ET_ONE_REG calls that don't fit the
APIs for one reason or another.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2 years agoKVM: selftests: Convert vgic_init away from vm_create_default_with_vcpus()
Sean Christopherson [Wed, 16 Feb 2022 19:57:41 +0000 (11:57 -0800)]
KVM: selftests: Convert vgic_init away from vm_create_default_with_vcpus()

Use a combination of vm_create(), vm_create_with_vcpus(), and
vm_vcpu_add() to convert vgic_init from vm_create_default_with_vcpus(),
and away from referncing vCPUs by ID.

Thus continues the march toward total annihilation of "default" helpers.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>