git.osdn.net Git - qmiga/qemu.git/log

PPC: KVM: Fix BAT put

In the sregs API, upper and lower 32bit segments of the BAT registers
are swapped when doing a set. Since we need to support old kernels out
there, don't bother to fix it in the kernel, but instead work around
the problem in QEMU by swapping on put.

Signed-off-by: Alexander Graf <agraf@suse.de>

PPC: e500: Only expose even TLB sizes in initial TLB

When booting our e500 machine, we automatically generate a big TLB entry
in TLB1 that covers all of the code we need to run in there until the guest
can handle its TLB on its own.

However, e500v2 can only handle MAS1.0 sizes. However, we keep our TLB
information in MAS2.0 layout, which means we have twice as many TLB sizes
to choose from. That also means we can run into a situation where we try
to add a TLB size that could not fit into the MAS1.0 size bits.

Fix it by making sure we always have the lower bit set to 0. That way we
are always guaranteed to have MAS1.0 compatible TLB size information.

Signed-off-by: Alexander Graf <agraf@suse.de>

ppc/pseries: Reset VPA registration on CPU reset

The ppc specific CPU state contains several variables which track the
VPA, SLB shadow and dispatch trace log. These are structures shared
between OS and hypervisor that are used on the pseries machine to track
various per-CPU quantities.

The address of these structures needs to be registered by the guest on each
boot, however currently this registration is not cleared when we reset the
cpu. This patch corrects this bug.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Don't test for MSR_PR for hypercalls under KVM

PAPR hypercalls should only be invoked from the guest kernel, not guest
user programs, that is, with MSR[PR]=0. Currently we check this in
spapr_hypercall, returning H_PRIVILEGE if MSR[PR]=1.

However, under KVM the state of MSR[PR] is already checked by the host
kernel before passing the hypercall to qemu, making this check redundant.
Worse, however, we don't generally synchronize KVM and qemu state on the
hypercall path, meaning that qemu could incorrectly reject a hypercall
because it has a stale MSR value.

This patch fixes the problem by moving the privilege test exclusively to
the TCG hypercall path.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
CC: qemu-stable@nongnu.org
Signed-off-by: Alexander Graf <agraf@suse.de>

PPC: e500: calculate initrd_base like dt_base

While investigating dtb pad issues, I noticed that initrd_base wasn't taking
loadaddr into account the way dt_base was. This seems wrong.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

PPC: e500: increase DTC_LOAD_PAD

An allowance of 5 MiB for BSS is not enough for Linux kernels with certain
debug options enabled (not sure exactly which one caused it, but I'd guess
lockdep).  The kernel I ran into this with had a BSS of around 6.4 MB.

Unfortunately, uImage does not give us enough information to determine the
actual BSS size.  Increase the allowance to 18 MiB to give us plenty of
room.  Eventually this should be more intelligent, possibly packing
initrd+dtb at the end of guest RAM.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

device tree: simplify dumpdtb code

As per Peter's suggestion, we can use glib to write out a buffer in whole to
a file, simplifying the code dramatically.

Signed-off-by: Alexander Graf <agraf@suse.de>

fdt: move dumpdtb interpretation code to device_tree.c

The dumpdtb code can be useful in more places than just for e500. Move it
to a generic place.

Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: Remove unused power_mode field from cpu state

CPUPPCState includes a variable 'power_mode' which is used nowhere. This
patch removes it. This includes saving a dummy zero in its place during
vmsave, to avoid breaking the save format.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Set hash table size based on RAM size

Currently the pseries machine code always attempts to set the size of the
guests's hash page table to 16MB.  However, because of the way the POWER
MMU works, a suitable hash page table size should really depend on memory
size.  16MB will be excessive for guests with <1GB and RAM, and may not be
enough for guests with >2GB of RAM (depending on guest page size and
other factors).

The usual given rule of thumb is that the hash table should be 1/64 of
the size of memory, but in fact the Linux guests we are aiming at don't
really need that much.  This patch, therefore, changes the hash table
allocation code to aim for 1/128 of the size of RAM (rounding up).  When
using KVM, this size may still be adjusted by the host kernel if it is
unable to allocate a suitable (contiguous) table.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Remove unnecessary locking from PAPR hash table hcalls

In the paravirtualized environment provided by PAPR, there is a standard
locking scheme so that hypercalls updating the hash page table from
different guest threads don't corrupt the haah table state.  We implement
this HVLOCK bit in out page table hypercalls.  However, it is not necessary
in our case, since the hypercalls all run in the qemu environment under the
big qemu lock.

Therefore, this patch removes the locking code.  This has the additional
advantage of freeing up a hash PTE bit which will be useful for migration
support.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

ppc405_uc: Fix buffer overflow

Report from smatch:

ppc405_uc.c:209 dcr_read_pob(12) error: buffer overflow 'pob->besr' 2 <= 2
ppc405_uc.c:232 dcr_write_pob(12) error: buffer overflow 'pob->besr' 2 <= 2

The old code reads and writes besr[POB0_BESR1 - POB0_BESR0] or besr[2]
which is one too much.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: KVM: Fix some kernel version edge cases for kvmppc_reset_htab()

The kvmppc_reset_htab() function invokes the KVM_PPC_ALLOCATE_HTAB vm ioctl
to request KVM to allocate and reset a hash page table for the guest - it
returns the size of hash table allocated, or 0 to indicate that qemu needs
to allocate the hash table itself.  In practice qemu needs to allocate the
htab for full emulation and with Book3sPR KVM, but the kernel has to
allocate it for Book3sHV KVM (the hash table needs to be physically
contiguous in that case).

Unfortunately, the logic in this function is incorrect for some existing
kernels.  Specifically:
  * at least some PR KVM versions advertise the relevant capability but
don't actually implement the ioctl(), returning ENOTTY.
  * For old kernels which don't have the capability, we currently return 0.
This is correct for PV KVM, where we need to allocate the htab, but not for
HV KVM - kernels of this era always allocate a 16MB hash table per guest.

This patch corrects both of these edge cases.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Fix semantics of RTAS int-on, int-off and set-xive functions

Currently the ibm,int-on and ibm,int-off RTAS functions are implemented as
no-ops.  This is because when implemented as specified in PAPR they caused
Linux (which calls both int-on/off and set-xive) to end up with interrupts
masked when they should not be.  Since Linux's set-xive calls make the
int-on/off calls redundant, making them nops worked around the problem.

In fact, the problem was caused because there was a subtle bug in set-xive,
PAPR specifies that as well as updating the current priority, it also needs
to update the saved priority used by int-on/off.  With this bug fixed the
problem goes away.  This patch implements this more correct fix.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Rework implementation of TCE bypass

On the pseries machine the IOMMU (aka TCE tables) is always active for all
PCI and VIO devices.  Mostly to simplify the SLOF firmware, we implement an
extension which allows the IOMMU to be temporarily disabled for certain
devices.

Currently this is implemented by setting the device's DMAContext pointer to
NULL (thus reverting to qemu's default no-IOMMU DMA behaviour), then
replacing it when bypass mode is disabled.

This approach causes a bunch of complications though.  It complexifies the
management of the DMAContext lifetimes, it's problematic for savevm/loadvm,
and it means that while bypass is active we have nowhere to store the
device's LIOBN (Logical IO Bus Number, used to identify DMA address
spaces).  At present we regenerate the LIOBN from other address information
but this restricts how we can allocate LIOBNs.

This patch gives up on this approach, replacing it with the much simpler
one of having a 'bypass' boolean flag in the TCE state structure.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Remove never used flags field from spapr vio devices

The general device state structure for PAPR VIO emulated devices includes a
'flags' field which was never used. This patch removes it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Remove XICS irq type enum type

Currently the XICS interrupt controller emulation uses a custom enum to
specify whether a given interrupt is level-sensitive or message-triggered.
This enum makes life awkward for saving the state, and isn't particularly
useful since there are only two possibilities. This patch replaces the
enum with a simple bool.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Remove C bitfields from xics code

The XICS interrupt controller emulation uses some C bitfield variables in
its internal state structure. This makes like awkward for saving the state
because we don't have easy VMSTATE helpers for bitfields.

This patch removes the bitfields, instead using explicit bit masking in a
single status variable.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Small cleanup to H_CEDE implementation

The H_CEDE hypercall implementation for the pseries machine doesn't trigger
quite the right path in the main cpu exec loop. We should set exit_request
to pop up one extra level and recheck state, and we should set the
exception_index to EXCP_HLT (H_CEDE is roughly equivalent to the hlt
instruction on x86).

In practice, this doesn't really matter except for KVM, and KVM implements
H_CEDE internally so we never hit this code path. But we might as well
get it right, just in case it matters some day.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Fix XICS reset

The XICS interrupt controller used on the pseries machine currently has no
reset handler. We can get away with this under some circumstances, but
it's not correct, and can cause failures if the XICS happens to be in the
wrong state at the time of reset.

This patch adds a hook to properly reset the XICS state.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Reset emulated PCI TCE tables on system reset

The emulated PCI host bridge on the pseries machine incorporates an IOMMU
(PAPR TCE table).  Currently the mappings in this IOMMU are not cleared
when we reset the system.  This patch fixes this bug.  To do this it adds
a new reset function to the IOMMU emulation code.  The VIO devices already
reset their TCE tables, but they do so by destroying and re-creating their
DMA context.  This doesn't work for the PCI host bridge, because the
infrastructure for PCI IOMMUs has already copied/cached the DMA pointer
context into the subordinate PCI device structures.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Clear TCE and signal state when resetting PAPR VIO devices

When we reset the system, the reset method for VIO bus devices resets
the state of their request queue (if present) as it should.  However
it was not resetting the state of their TCE table (DMA translation) if
present.  It was also not resetting the state of the per-device signal
mask set with H_VIO_SIGNAL.  This patch corrects both bugs, and also
removes some small code duplication in the reset paths.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Add support for new KVM hash table control call

This adds support for then new "reset htab" ioctl which allows qemu
to properly cleanup the MMU hash table when the guest is reset. With
the corresponding kernel support, reset of a guest now works properly.

This also paves the way for indicating a different size hash table
to the kernel and for the kernel to be able to impose limits on
the requested size.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Use new method to correct reset sequence

A number of things need to occur during reset of the PAPR
paravirtualized platform in a specific order.  For example, the hash
table needs to be cleared before the CPUs are reset, so that they
initialize their register state correctly, and the CPUs need to have
their main reset called before we set up the entry point state on the
boot cpu.  We also need to have the main qdev reset happen before the
creation and installation of the device tree for the new boot, because
we need the state of the devices settled to correctly construct the
device tree.

We currently do the pseries once-per-reset initializations done from a
reset handler.  However we can't adequately control when this handler
is called during the reset - in particular we can't guarantee it
happens after all the qdev resets (since qdevs might be registered
after the machine init function has executed).

This patch uses the new QEMUMachine reset method to to fix this
problem, ensuring the various order dependent reset steps happen in
the correct order.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>

pseries: Fix and cleanup CPU initialization and reset

The current pseries machine init function iterates over the CPUs at several
points, doing various bits of initialization.  This is messy; these can
and should be merged into a single iteration doing all the necessary per
cpu initialization.  Worse, some of these initializations were setting up
state which should be set on every reset, not just at machine init time.
A few of the initializations simply weren't necessary at all.

This patch, therefore, moves those things that need to be to the
per-cpu reset handler, and combines the remainder into two loops over
the cpus (which also creates them).  The second loop is for setting up
hash table information, and will be removed in a subsequent patch also
making other fixes to the hash table setup.

This exposes a bug in our start-cpu RTAS routine (called by the guest to
start up CPUs other than CPU0) under kvm.  Previously, this function did
not make a call to ensure that it's changes to the new cpu's state were
pushed into KVM in-kernel state.  We sort-of got away with this because
some of the initializations had already placed the secondary CPUs into the
right starting state for the sorts of Linux guests we've been running.

Nonetheless the start-cpu RTAS call's behaviour was not correct and could
easily have been broken by guest changes.  This patch also fixes it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>

ppc: Make kvm_arch_put_registers() put *all* the registers

At least when invoked with high enough 'level' arguments,
kvm_arch_put_registers() is supposed to copy essentially all the cpu state
as encoded in qemu's internal structures into the kvm state.  Currently
the ppc version does not do this - it never calls KVM_SET_SREGS, for
example, and therefore never sets the SDR1 and various other important
though rarely changed registers.

Instead, the code paths which need to set these registers need to
explicitly make (conditional) kvm calls which transfer the changes to kvm.
This breaks the usual model of handling state updates in qemu, where code
just changes the internal model and has it flushed out to kvm automatically
at some later point.

This patch fixes this for Book S ppc CPUs by adding a suitable call to
KVM_SET_SREGS and als to KVM_SET_ONE_REG to set the HIOR (the only register
that is set with that call so far).  This lets us remove the hacks to
explicitly set these registers from the kvmppc_set_papr() function.

The problem still exists for Book E CPUs (which use a different version of
the kvm_sregs structure).  But fixing that has some complications of its
own so can be left to another day.

Lkewise, there is still some ugly code for setting the PVR through special
calls to SET_SREGS which is left in for now.  The PVR needs to be set
especially early because it can affect what other features are available
on the CPU, so I need to do more thinking to see if it can be integrated
into the normal paths or not.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: get rid of the HANDLE_NAN{1, 2, 3} macros

We can finally get rid of the ugly HANDLE_NAN{1,2,3} macros.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: use the softfloat float32_muladd function

Use the new softfloat float32_muladd() function to implement the vmaddfp
and vnmsubfp instructions. As a bonus we can get rid of the call to the
HANDLE_NAN3 macro, as the NaN handling is directly done at the softfloat
level.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: use the softfloat min/max functions

Use the new softfloat float32_min() and float32_max() to implement the
vminfp and vmaxfp instructions. As a bonus we can get rid of the call to
the HANDLE_NAN2 macro, as the NaN handling is directly done at the
softfloat level.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: simplify NaN propagation for vector functions

Commit e024e881bb1a8b5085026589360d26ed97acdd64 provided a pickNaN()
function for PowerPC, implementing the correct NaN propagation rules.
Therefore there is no need to test the operands manually, we can rely
on the softfloat code to do that.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

MAINTAINERS: Document virtex_ml507 machine

Place it in alphabetical order, there is a separate section for sharing
ppc4xx devices now.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

MAINTAINERS: Document Bamboo machine and ppc4xx devices

Place it in alphabetical order and add new Devices section ppc4xx to
share file rules with 405 and virtex_ml507.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>

MAINTAINERS: Downgrade ppc405 to Odd Fixes

As requested by Alex.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Alexander Graf <agraf@suse.de>

MAINTAINERS: Document e500 machines and devices

Signed-off-by: Andreas Färber <afaerber@suse.de>
Cc: Alexander Graf <agraf@suse.de>
Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>

MAINTAINERS: Document sPAPR (pSeries) machine

Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

fpu/softfloat.c: Return correctly signed values from uint64_to_float32

The uint64_to_float32() conversion function was incorrectly always
returning numbers with the sign bit set (ie negative numbers). Correct
this so we return positive numbers instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

fpu/softfloat.c: Remove pointless shift of always-zero value

In float16_to_float32, when returning an infinity, just pass zero
as the mantissa argument to packFloat32(), rather than shifting
a value which we know must be zero.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

vfio_pci: fix build on 32-bit systems

We cannot cast directly from pointer to uint64.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alex Barcelo <abarcelo@ac.upc.edu>
Reported-by: Alex Barcelo <abarcelo@ac.upc.edu>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

vfio: Enable vfio-pci and mark supported

Enabled for all softmmu guests supporting PCI on Linux hosts. Note
that currently only x86 hosts have the kernel side VFIO IOMMU support
for this. PPC (g3beige) is the only non-x86 guest known to work.
ARM (veratile) hangs in firmware, others untested.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

vfio: vfio-pci device assignment driver

This adds the core of the QEMU VFIO-based PCI device assignment driver.
To make use of this driver, enable CONFIG_VFIO, CONFIG_VFIO_IOMMU_TYPE1,
and CONFIG_VFIO_PCI in your host Linux kernel config.  Load the vfio-pci
module.  To assign device 0000:05:00.0 to a guest, do the following:

for dev in $(ls /sys/bus/pci/devices/0000:05:00.0/iommu_group/devices); do
    vendor=$(cat /sys/bus/pci/devices/$dev/vendor)
    device=$(cat /sys/bus/pci/devices/$dev/device)
    if [ -e /sys/bus/pci/devices/$dev/driver ]; then
        echo $dev > /sys/bus/pci/devices/$dev/driver/unbind
    fi
    echo $vendor $device > /sys/bus/pci/drivers/vfio-pci/new_id
done

See Documentation/vfio.txt in the Linux kernel tree for further
description of IOMMU groups and VFIO.

Then launch qemu including the option:

-device vfio-pci,host=0000:05:00.0

Legacy PCI interrupts (INTx) currently makes use of a kludge where we
trap BAR accesses and assume the access is in response to an interrupt,
therefore de-asserting and unmasking the interrupt.  It's not quite as
targetted as using the EOI for this, but it's self contained and seems
to work across all architectures.  The side-effect is a significant
performance slow-down for device in INTx mode.  Some devices, like
graphics cards, don't really use their interrupt, so this can be turned
off with the x-intx=off option, which disables INTx alltogether.  This
should be considered an experimental option until we refine this code.
Both MSI and MSI-X are supported and avoid these issues.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Update Linux kernel headers

Based on Linux as of 1a95620.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Update kernel header script to include vfio

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

x86: Implement SMEP and SMAP

This patch implements Supervisor Mode Execution Prevention (SMEP) and
Supervisor Mode Access Prevention (SMAP) for x86.  The purpose of the
patch, obviously, is to help kernel developers debug the support for
those features.

A fair bit of the code relates to the handling of CPUID features.  The
CPUID code probably would get greatly simplified if all the feature
bit words were unified into a single vector object, but in the
interest of producing a minimal patch for SMEP/SMAP, and because I had
very limited time for this project, I followed the existing style.

[ v2: don't change the definition of the qemu64 CPU shorthand, since
  that breaks loading old snapshots.  Per Anthony Liguori this can be
  fixed once the CPU feature set is snapshot.

  Change the coding style slightly to conform to checkpatch.pl. ]

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

i386: -cpu help: remove reference to specific CPUID leaves/registers

The -cpu configuration interface is based on a list of feature names or
properties, on a single namespace, so there's no need to mention on
which CPUID leaf/register each flag is located.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Don Slutz <Don@CloudSwitch.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

i386: cpu: eliminate duplicate feature names

Instead of having duplicate feature names on the ext2_feature array for
the AMD feature bit aliases, we keep the feature names only on the
feature_name[] array, and copy the corresponding bits to
cpuid_ext2_features in case the CPU vendor is AMD.

This will:

- Make sure we don't set the feature bit aliases on Intel CPUs;
- Make it easier to convert feature bits to CPU properties, as now we
have a single bit on the x86_def_t struct for each CPU feature.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Don Slutz <Don@CloudSwitch.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

i386: cpu: replace EXT2_FEATURE_MASK with CPUID_EXT2_AMD_ALIASES

Both constants have the same value, but CPUID_EXT2_AMD_ALIASES is
defined without using magic numbers.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Don Slutz <Don@CloudSwitch.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

i386: kvm: use a #define for the set of alias feature bits

Instea of using a hardcoded hex constant, define CPUID_EXT2_AMD_ALIASES
as the set of CPUID[8000_0001].EDX bits that on AMD are the same as the
bits of CPUID[1].EDX.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-By: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Don Slutz <Don@CloudSwitch.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

i386: kvm: bit 10 of CPUID[8000_0001].EDX is reserved

Bit 10 of CPUID[8000_0001].EDX is not defined as an alias of
CPUID[1].EDX[10], so do not duplicate it on
kvm_arch_get_supported_cpuid().

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-By: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Don Slutz <Don@CloudSwitch.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

Merge branch 'arm-devs.for-upstream' of git://git.linaro.org/people/pmaydell/qemu-arm

* 'arm-devs.for-upstream' of git://git.linaro.org/people/pmaydell/qemu-arm:
  Versatile Express: Add modelling of NOR flash
  Versatile Express: Fix NOR flash 0 address and remove flash alias
  hw/armv7m_nvic: Correctly register GIC region when setting up NVIC
  pl190: fix read of VECTADDR

target-s390x: Tidy cpu_dump_state

The blank lines inside the single dump make it difficult for the
eye to pick out the block. Worse, with interior newlines, but
no blank line following, the PSW line appears to belong to the
next dump block.

Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

target-s390x: Avoid double CPU_LOG_TB_CPU

This is already handled generically in cpu_exec.

Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

target-s390x: Use CPU_LOG_INT

Three places in the interrupt code did we not honor the mask.

Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

target-unicore32: Call tcg_gen_debug_insn_start

Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

target-s390x: Call tcg_gen_debug_insn_start

Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

target-m68k: Call tcg_gen_debug_insn_start

Cc: Paul Brook <paul@codesourcery.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Emit debug_insn for CPU_LOG_TB_OP_OPT as well.

For all targets that currently call tcg_gen_debug_insn_start,
add CPU_LOG_TB_OP_OPT to the condition that gates it.

This is useful for comparing optimization dumps, when the
pre-optimization dump is merely noise.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tci: Fix for AREG0 free mode

Support for helper functions with 5 arguments was missing
in the code generator and in the interpreter.

There is no need to pass the constant TCG_AREG0 from the
code generator to the interpreter. Remove that code for
the INDEX_op_qemu_st* opcodes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Versatile Express: Add modelling of NOR flash

This patch adds modelling of the two NOR flash banks found on the
Versatile Express motherboard. Tested with U-Boot running on an emulated
Versatile Express, with either A9 or A15 CoreTile.

Signed-off-by: Francesco Lavra <francescolavra.fl@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Versatile Express: Fix NOR flash 0 address and remove flash alias

In the A series memory map (implemented in the Cortex A15 CoreTile), the
first NOR flash bank (flash 0) is mapped to address 0x08000000, while
address 0x00000000 can be configured as alias to either the first or the
second flash bank. This patch fixes the definition of flash 0 address,
and for simplicity removes the alias definition.

Signed-off-by: Francesco Lavra <francescolavra.fl@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/armv7m_nvic: Correctly register GIC region when setting up NVIC

When setting up the NVIC memory regions the memory range
0x100..0xcff is aliased to an IO memory region that belongs
to the ARM GIC. This aliased region should be added to the
NVIC memory container, but the actual GIC IO memory region
was being added instead. This mixup was causing the wrong
IO memory access functions to be called when accessing parts
of the NVIC memory.

Signed-off-by: Meador Inge <meadori@codesourcery.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

pl190: fix read of VECTADDR

Reading VECTADDR was causing us to set the current priority to
the wrong value, the most obvious effect of which was that we
would return the vector for the wrong interrupt as the result
of the read.

Signed-off-by: Brendan Fennell <bfennell@skynet.ie>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

add a boot parameter to set reboot timeout

Added an option to let qemu transfer a configuration file to bios,
"etc/boot-fail-wait", which could be specified by command
-boot reboot-timeout=T
T have a max value of 0xffff, unit is ms.

With this option, guest will wait for a given time if not find
bootabled device, then reboot. If reboot-timeout is '-1', guest
will not reboot, qemu passes '-1' to bios by default.

This feature need the new seabios's support.

Seabios pulls the value from the fwcfg "file" interface, this
interface is used because SeaBIOS needs a reliable way of
obtaining a name, value size, and value. It in no way requires
that there be a real file on the user's host machine.

Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Clear handler only for valid fd

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Fix address handling in inet_nonblocking_connect

getaddrinfo can give us a list of addresses, but we only try to
connect to the first one. If that fails we never proceed to
the next one. This is common on desktop setups that often have ipv6
configured but not actually working.

To fix this make inet_connect_nonblocking retry connection with a different
address.
callers on inet_nonblocking_connect register a callback function that will
be called when connect opertion completes, in case of failure the fd will have
a negative value

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Separate inet_connect into inet_connect (blocking) and inet_nonblocking_connect

No need to add non blocking parameters to the blocking inet_connect
add block parameter for inet_connect_opts instead of using QemuOpt "block".

Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Refactor inet_connect_opts function

refactor address resolution code to fix nonblocking connect
remove getnameinfo call

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Amos Kong <akong@redhat.com>
Signed-off-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

configure: Allow builds without any system or user emulation

The old code aborted configure when no emulation target was selected.
Even after removing the 'exit 1', it tried to read from STDIN
when QEMU was configured with

configure' '--disable-user' '--disable-system'

This is fixed here.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

ivshmem: add 64bit option

This patch adds a "use64" property which will make the ivshmem driver
register a 64bit memory bar when set, so you have something to play with
when testing 64bit pci bits.  It also allows to have quite big shared
memory regions, like this:

[root@fedora ~]# lspci -vs1:1
01:01.0 RAM memory: Red Hat, Inc Device 1110
        Subsystem: Red Hat, Inc Device 1100
        Physical Slot: 1-1
        Flags: fast devsel
        Memory at fd400000 (32-bit, non-prefetchable) [disabled] [size=256]
        Memory at 8040000000 (64-bit, prefetchable) [size=1G]

[ v5: rebase, update compat property for post-1.2 merge ]
[ v4: rebase & adapt to latest master again ]
[ v3: rebase & adapt to latest master ]
[ v2: default to on as suggested by avi,
      turn off for pc-$old using compat property ]

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Tested-by: Cam Macdonell <cam@cs.ualberta.ca>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

compat: turn off msi/msix on xhci for old machine types

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

add pc-1.3 machine type

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Cleanup unused global var qemu_system_powerdown

All deps that used global qemu_system_powerdown var are now converted
to notifiers, so remove it.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

target-sparc: use notifier for signaling guest system_powerdown command

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

target-arm: use notifier for signaling guest system_powerdown command

Acked-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

acpi: use notifier for signaling guest system_powerdown command

In addition, there is no need to allocate an extra irq just for
rising SCI in irq handler. Just rise SCI right from notifier
handler instead.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Introduce powerdown_notifiers

Notifier will be used for signaling powerdown request to guest in
a more general way and intended to replace very specific
qemu_irq_rise(qemu_system_powerdown) and will allow to remove global
variable qemu_system_powerdown.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Merge remote-tracking branch 'origin/master' into staging

* origin/master:
  tcg/i386: fix build with -march < i686
  tcg: Streamline movcond_i64 using movcond_i32
  tcg: Streamline movcond_i64 using 32-bit arithmetic
  tcg: Sanity check goto_tb input
  tcg: Sanity check deposit inputs
  tcg: Add tcg_debug_assert
  tcg: Implement concat*_i64 with deposit_i64
  tcg: Emit XORI as NOT for appropriate constants
  tcg: Optimize initial inputs for ori_i64
  tcg: Emit ANDI as EXTU for appropriate constants
  tcg: Adjust descriptions of *cond opcodes
  tcg/mips: fix MIPS32(R2) detection

tcg/i386: fix build with -march < i686

The movcond_i32 op has to be protected with TCG_TARGET_HAS_movcond_i32
to fix the build with -march < i686.

Thanks to Richard Henderson for the hint.

Reported-by: Alex Barcelo <abarcelo@ac.upc.edu>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg: Streamline movcond_i64 using movcond_i32

When movcond_i32 is available we can further reduce the generated
op count from 12 to 6, and the generated code size on i686 from
88 to 74 bytes.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg: Streamline movcond_i64 using 32-bit arithmetic

Avoiding 64-bit arithmetic (outside of the compare) reduces the
generated op count from 15 to 12, and the generated code size on
i686 from 105 to 88 bytes.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg: Sanity check goto_tb input

Checking that we don't try for idx != [01] is trivial. Checking
that we don't issue more than one of any index requires a tad
more data and some ifdefs protecting that new variable.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg: Sanity check deposit inputs

Given these are constants, checking once here means everything
after can assume they're correct.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg: Add tcg_debug_assert

Like the C assert macro, except only enabled for CONFIG_DEBUG_TCG,
and without having to set _NDEBUG and disable all other asserts at
the same time.

The use of __builtin_unreachable (when available) gives the compiler
the same information, which may (or may not) help it optimize better.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg: Implement concat*_i64 with deposit_i64

For tcg_gen_concat_i32_i64 we only use deposit if the host supports it.
For tcg_gen_concat32_i64 even if the host does not, as we get identical
code before and after.

Note that this relies on the ANDI -> EXTU patch for the identity claim.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg: Emit XORI as NOT for appropriate constants

Note that xori_i64 failed to perform even the minimal
optimizations promised by the README.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg: Optimize initial inputs for ori_i64

Copy the same optimizations from ori_i32.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg: Emit ANDI as EXTU for appropriate constants

Note that andi_i64 failed to perform even the minimal
optimizations promised by the README.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg: Adjust descriptions of *cond opcodes

The README file documented the operand ordering of the tcg_gen_*
functions. Since we're documenting opcodes here, use the true
operand ordering.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Cc: malc <av1474@comtv.ru>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg/mips: fix MIPS32(R2) detection

Fix the MIPS32(R2) cpu detection so that it also works with
-march=octeon. Thanks to Andrew Pinski for the hint.

Cc: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Merge remote-tracking branch 'kwolf/for-anthony' into staging

* kwolf/for-anthony:
  block: remove keep_read_only flag from BlockDriverState struct
  block: convert bdrv_commit() to use bdrv_reopen()
  block: vpc image file reopen
  block: vdi image file reopen
  block: vmdk image file reopen
  block: qcow image file reopen
  block: qcow2 image file reopen
  block: qed image file reopen
  block: raw image file reopen
  block: raw-posix image file reopen
  block: purge s->aligned_buf and s->aligned_buf_size from raw-posix.c
  block: use BDRV_O_NOCACHE instead of s->aligned_buf in raw-posix.c
  block: do not parse BDRV_O_CACHE_WB in block drivers
  block: move open flag parsing in raw block drivers to helper functions
  block: move aio initialization into a helper function
  block: Framework for reopening files safely
  block: make bdrv_set_enable_write_cache() modify open_flags
  block: correctly set the keep_read_only flag
  blockdev: preserve readonly and snapshot states across media changes

Merge remote-tracking branch 'stefanha/trivial-patches' into staging

* stefanha/trivial-patches:
  w32: Always use standard instead of native format strings
  net/socket: Fix compiler warning (regression for MinGW)
  linux-user: Remove redundant null check and replace free by g_free
  qemu-timer: simplify qemu_run_timers
  TextConsole: saturate escape parameter in TTY_STATE_CSI
  curses: don't initialize curses when qemu is daemonized
  dtrace backend: add function to reserved words
  pflash_cfi01: Fix warning caused by unreachable code
  ioh3420: Remove unreachable code
  lm4549: Fix buffer overflow
  cadence_uart: Fix buffer overflow
  qemu-sockets: Fix potential memory leak
  qemu-ga: Remove unreachable code after g_error
  target-i386: Allow tsc-frequency to be larger then 2.147G

Merge remote-tracking branch 'afaerber/qom-cpu' into staging

* afaerber/qom-cpu:
  target-alpha: Initialize env->cpu_model_str
  target-i386: Drop unused setscalar() macro
  target-i386: Kill cpudef config section support
  target-i386: x86_cpudef_setup() coding style change
  Eliminate cpus-x86_64.conf file
  target-i386: Move CPU models from cpus-x86_64.conf to C
  target-i386: Add missing CPUID_* constants
  Drop cpu_list_id macro
  target-i386: Fold -cpu ?cpuid, ?model output into -cpu help, drop ?dump
  MAINTAINERS: Add entry for QOM CPU

Merge remote-tracking branch 'bonzini/scsi-next' into staging

* bonzini/scsi-next:
  SCSI: Standard INQUIRY data should report HiSup flag as set.
  scsi-disk: use scsi_data_cdb_length
  scsi: introduce scsi_cdb_length and scsi_data_cdb_length
  scsi-disk: fix check for out-of-range LBA
  scsi-disk: introduce check_lba_range
  iSCSI: We dont need to explicitely call qemu_notify_event() any more
  iSCSI: We need to support SG_IO also from iscsi_ioctl()

Merge remote-tracking branch 'bonzini/nbd-next' into staging

* bonzini/nbd-next:
  nbd: add nbd_export_get_blockdev
  nbd: negotiate with named exports
  nbd: register named exports
  qemu-nbd: rewrite termination conditions to use a state machine
  nbd: add notification for closing an NBDExport
  nbd: track clients into NBDExport
  nbd: add reference counting to NBDExport
  nbd: do not leak nbd_trip coroutines when a connection is torn down
  nbd: make refcount interface public
  nbd: do not close BlockDriverState in nbd_export_close
  nbd: pass NBDClient to nbd_send_negotiate
  nbd: add more constants

block: remove keep_read_only flag from BlockDriverState struct

The keep_read_only flag is no longer used, in favor of the bdrv
flag BDRV_O_ALLOW_RDWR.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: convert bdrv_commit() to use bdrv_reopen()

Currently, bdrv_commit() reopens images r/w itself, via risky
_delete() and _open() calls. Use the new safe method for drive reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: vpc image file reopen

There is currently nothing that needs to be done for VPC image
file reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: vdi image file reopen

There is currently nothing that needs to be done for VDI reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: vmdk image file reopen

This patch supports reopen for VMDK image files. VMDK extents are added
to the existing reopen queue, so that the transactional model of reopen
is maintained with multiple image files.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: qcow image file reopen

These are the stubs for the file reopen drivers for the qcow format.

There is currently nothing that needs to be done by the qcow driver
in reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: qcow2 image file reopen

These are the stubs for the file reopen drivers for the qcow2 format.

There is currently nothing that needs to be done by the qcow2 driver
in reopen.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>