git.osdn.net Git - qmiga/qemu.git/log

target/ppc: Simplify cpu valid check in ppc_cpu_realize

The #if isn't necessary, because there's a suitable one inside
ppc_cpu_is_valid(). We've already filtered for suitable cpu models in the
functions that search and register them. So by the time we get to realize
having an invalid one indicates a code error, not a user error, so an
assert() is more appropriate than error_setg().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

target/ppc: Standardize instance_init and realize function names

Because of the various hooks called some variant on "init" - and the rather
greater number that used to exist, I'm always wondering when a function
called simply "*_init" or "*_initfn" will be called.

To make it easier on myself, and maybe others, rename the instance_init
hooks for ppc cpus to *_instance_init(). While we're at it rename the
realize time hooks to *_realize() (from *_realizefn()) which seems to be
the more common current convention.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>

spapr: drop useless dynamic sysbus device sanity check

Since commit 7da79a167aa11, the machine class init function registers
dynamic sysbus device types it supports. Passing an unsupported device
type on the command line causes QEMU to exit with an error message
just after machine init.

It is hence not needed to do the same sanity check at machine reset.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Revert "spapr: Don't allow memory hotplug to memory less nodes"

This reverts commit b556854bd8524c26b8be98ab1bfdf0826831e793.

Leave change @node type from uint32_t to to int from reverted commit
because node < 0 is always false.

Note that implementing capability or some trick to detect if guest
kernel does not support hot-add to memory: this returns previous
behavour where memory added to first non-empty node.

Signed-off-by: Serhii Popovych <spopovyc@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: drop useless sanity check in spapr_irq_alloc*()

Both spapr_irq_alloc() and spapr_irq_alloc_block() have an errp
parameter, but they don't use it if XICS hasn't been initialized
yet.

This is doubly wrong:

- all callers do pass a non-null Error **, ie, they expect an error
to be propagated in case of failure

- XICS obviously needs to be initialized before anything starts allocating
IRQs

So this patch turns the check into an assert.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Add host_memory_backend_pagesize() helper

There are a couple places (one generic, one target specific) where we need
to get the host page size associated with a particular memory backend. I
have some upcoming code which will add another place which wants this. So,
for convenience, add a helper function to calculate this.

host_memory_backend_pagesize() returns the host pagesize for a given
HostMemoryBackend object.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

Make qemu_mempath_getpagesize() accept NULL

qemu_mempath_getpagesize() gets the effective (host side) page size for
a block of memory backed by an mmap()ed file on the host. It requires
the mem_path parameter to be non-NULL.

This ends up meaning all the callers need a different case for handling
anonymous memory (for memory-backend-ram or default memory with -mem-path
is not specified).

We can make all those callers a little simpler by having
qemu_mempath_getpagesize() accept NULL, and treat that as the anonymous
memory case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

spapr: Introduce pseries-2.13 machine type

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: Fix reserved bit mask of dstst instruction

According to the Vector/SIMD extension documentation bit 6 that is
currently masked is valid (listed as transient bit) but bits 7 and 8
should be reserved instead. Fix the mask to match this.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc: Fix size of ppc64 xer register

The normal gdb definition of the XER registers is only 32 bit,
and that's what the current version of power64-core.xml also
says (seems copied from gdb's).  But qemu's idea of the XER register
is target_ulong (in CPUPPCState, ppc_gdb_register_len and
ppc_cpu_gdb_read_register)

That mismatch leads to the following message when attaching
with gdb:

  Truncated register 32 in remote 'g' packet

(and following on that qemu stops responding).  The simple fix is
to say the truth in the .xml file.  But the better fix is to
actually make it 32bit on the wire, as old gdbs don't support
XML files for describing registers.  Also the XER state in qemu
doesn't seem to use the high 32 bits, so sending it off to gdb
doesn't seem worthwhile.

Signed-off-by: Michael Matz <matz@suse.de>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: rename UNINState to UNINHostState

The existing UNINState actually represents the PCI/AGP host bridge stage so
rename it accordingly.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: move PCI IO (ISA) memory region into the uninorth device

Do this for both the uninorth main and uninorth u3 AGP buses, using the main
PCI bus for each machine (this ensures the IO addresses still match those
used by OpenBIOS).

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: use object link to pass OpenPIC object to uninorth

Now that the OpenPIC is wired up via the board, we can now remove our temporary
PIC qdev pointer property and replace it with an object link instead.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: remove obsolete pci_pmac_u3_init() function

Instead wire up the PCI/AGP host bridges in mac_newworld.c. Now this is complete
it is possible to move the initialisation of the PCI hole alias into
pci_u3_agp_init().

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: remove obsolete pci_pmac_init() function

Instead wire up the PCI/AGP host bridges in mac_newworld.c. Now this is complete
it is possible to move the initialisation of the PCI hole alias into
pci_unin_main_init().

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: enable internal PCI host bridge

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: fix PCI and AGP bus mixup

Somewhere in the history of time, the initialisation of the PCI buses for the
AGP and PCI host bridges got mixed up in that the PCI host bridge was
creating an instance of the AGP PCI bus, and the AGP PCI bus was missing.

Swap the PCI host bridge over to use the correct PCI bus (including setting
the kMacRISCPCIAddressSelect register used by MacOS X) and add the missing
reference to the AGP PCI bus.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: move PCI host bridge bus initialisation into device realize

Since the IO address space is fixed to use the standard system IO address
space then we can also use the opportunity to remove the address_space_io
parameter from pci_pmac_init() and pci_pmac_u3_init().

Note we also move the default mac99 PCI bus to the end of the initialisation
list so that it becomes the default destination for any devices specified
via -device without an explicit PCI bus provided.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: introduce temporary pic_irqs device property

This is in preparation for moving the PCI bus wiring inside the uninorth
host bridge devices. In the future it will be possible to remove this once the
PICs have been switched to use qdev GPIOs.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: move PCI mmio memory region initialisation into init function

Whilst we are here, rename the memory regions to better reflect whether they
belong to either a PCI or an AGP bus.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

mac_oldworld: move wiring of macio IRQs to macio_oldworld_realize()

Since the macio device has a link to the PIC device, we can now wire up the
IRQs directly via qdev GPIOs rather than having to use an intermediate array.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

mac_oldworld: remove pics IRQ array and wire up macio to heathrow directly

Introduce constants for the pre-defined Old World IRQs to help keep things
readable.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

grackle: move PCI IO (ISA) memory region into the grackle device

This simplifies the Old World machine to simply mapping the ISA memory region
into the main address space.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

grackle: remove deprecated pci_grackle_init() function

Instead wire up the grackle device inside the Mac Old World machine.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

grackle: general tidy-up and QOMify

This is the first step towards removing the old-style pci_grackle_init()
function. Following on from the previous commit we can now pass the heathrow
device as an object link and wire up the heathrow IRQs via qdev GPIOs.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

heathrow: remove obsolete heathow_init() function

Instead wire up heathrow to the CPU and grackle PCI host using qdev GPIOs.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: alter pci_pmac_init() and pci_pmac_u3_init() to return uninorth device

This is in preparation for moving the device wiring into the New World machine.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: move uninorth definitions into uninorth.h

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
[dwg: Added hw/hw.h #include as suggested by Philippe Mathieu-Daudé]
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: remove stray PCIBus realize from mac_newworld.c

After QOMification this is clearly no longer needed (and possibly hasn't been
for some time).

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: QOMify PCI and AGP host bridges

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: remove second set of uninorth token registers

Commit 593c181160: "PPC: Newworld: Add second uninorth control register set"
added a second set of uninorth registers at 0xf3000000.

Testing MacOS 9.2 to MacOS X 10.4 reveals no accesses to this address and I
can't find any reference to it in Apple's Core99.cpp source so I'm assuming
that this was the result of another bug that has now been fixed.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

uninorth: trivial style fixups

This makes sure we keep patchew/checkpatch happy during the remainder of this
patchset.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Merge remote-tracking branch 'remotes/iwj/tags/for-upstream.depriv-2' into staging

xen: xen-domid-restrict improvements

# gpg: Signature made Thu 26 Apr 2018 19:11:22 BST
# gpg:                using RSA key E3E3392348B50D39
# gpg: Good signature from "Ian Jackson (new general purpose key) <ijackson@chiark.greenend.org.uk>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 559A E46C 2D6B 6D32 65E7  CBA1 E3E3 3923 48B5 0D39

* remotes/iwj/tags/for-upstream.depriv-2:
  configure: do_compiler: Dump some extra info under bash
  os-posix: cleanup: Replace perror with error_report
  os-posix: cleanup: Replace fprintf with error_report in remaining call sites
  xen: Expect xenstore write to fail when restricted
  xen: Remove now-obsolete xen_xc_domain_add_to_physmap
  xen: Use newly added dmops for mapping VGA memory
  os-posix: Provide new -runas <uid>:<gid> facility
  os-posix: cleanup: Replace fprintfs with error_report in change_process_uid
  xen: destroy_hvm_domain: Try xendevicemodel_shutdown
  xen: move xc_interface compatibility fallback further up the file
  xen: destroy_hvm_domain: Move reason into a variable
  xen: defer call to xen_restrict until just before os_setup_post
  xen: restrict: use xentoolcore_restrict_all
  xen: link against xentoolcore
  AccelClass: Introduce accel_setup_post
  checkpatch: Add xendevicemodel_handle to the list of types

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

configure: do_compiler: Dump some extra info under bash

This makes it much easier to find a particular thing in config.log.

We have to use the ${BASH_LINENO[*]} syntax which is a syntax error in
other shells, so test what shell we are running and use eval.

The extra output is only printed if configure is run with bash. On
systems where /bin/sh is not bash, it is necessary to say bash
./configure to get the extra debug info in the log.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Kent R. Spillner <kspillner@acm.org>
CC: Janosch Frank <frankja@linux.vnet.ibm.com>
CC: Thomas Huth <thuth@redhat.com>
CC: Peter Maydell <peter.maydell@linaro.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Thomas Huth <thuth@redhat.com>

os-posix: cleanup: Replace perror with error_report

perror() is defined to fprintf(stderr,...). HACKING says
fprintf(stderr,...) is wrong. So perror() is too.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Daniel P. Berrange <berrange@redhat.com>
CC: Michael Tokarev <mjt@tls.msk.ru>
CC: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>

os-posix: cleanup: Replace fprintf with error_report in remaining call sites

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Daniel P. Berrange <berrange@redhat.com>
CC: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>

xen: Expect xenstore write to fail when restricted

Saving the current state to xenstore may fail when running restricted
(in particular, after a migration). Therefore, don't report the error or
exit when running restricted. Toolstacks that want to allow running
QEMU restricted should instead make use of QMP events to listen for
state changes.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>

xen: Remove now-obsolete xen_xc_domain_add_to_physmap

The last user was just removed; remove this function, accordingly.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>

xen: Use newly added dmops for mapping VGA memory

Xen unstable (to be in 4.11) has two new dmops, relocate_memory and
pin_memory_cacheattr. Use these to set up the VGA memory, replacing the
previous calls to libxc. This allows the VGA console to work properly
when QEMU is running restricted (-xen-domid-restrict).

Wrapper functions are provided to allow QEMU to work with older versions
of Xen.

Tweak the error handling while making this change:
* Report pin_memory_cacheattr errors.
* Report errors even when DEBUG_HVM is not set. This is useful for
trying to understand why VGA is not working, since otherwise it just
fails silently.
* Fix the return values when an error occurs. The functions now
consistently return -1 and set errno.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>

os-posix: Provide new -runas <uid>:<gid> facility

This allows the caller to specify a uid and gid to use, even if there
is no corresponding password entry. This will be useful in certain
Xen configurations.

We don't support just -runas <uid> because: (i) deprivileging without
calling setgroups would be ineffective (ii) given only a uid we don't
know what gid we ought to use (since uids may eppear in multiple
passwd file entries with different gids).

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Daniel P. Berrange <berrange@redhat.com>
CC: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

os-posix: cleanup: Replace fprintfs with error_report in change_process_uid

I'm going to be editing this function and it makes sense to clean up
this style problem in advance.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Daniel P. Berrange <berrange@redhat.com>
CC: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>

xen: destroy_hvm_domain: Try xendevicemodel_shutdown

xc_interface_open etc. is not going to work if we have dropped
privilege, but xendevicemodel_shutdown will if everything is new
enough.

xendevicemodel_shutdown is only availabe in Xen 4.10 and later, so
provide a stub for earlier versions.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

xen: move xc_interface compatibility fallback further up the file

We are going to want to use the dummy xendevicemodel_handle type in
new stub functions in the CONFIG_XEN_CTRL_INTERFACE_VERSION < 41000
section. So we need to provide that definition, or (as applicable)
include the appropriate header, earlier in the file.

(Ideally the newer compatibility layers would be at the bottom of the
file, so that they can naturally benefit from the compatibility layers
for earlier version. But that's rather too much for this series.)

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen: destroy_hvm_domain: Move reason into a variable

We are going to want to reuse this.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen: defer call to xen_restrict until just before os_setup_post

We need to restrict *all* the control fds that qemu opens.  Looking in
/proc/PID/fd shows there are many; their allocation seems scattered
throughout Xen support code in qemu.

We must postpone the restrict call until roughly the same time as qemu
changes its uid, chroots (if applicable), and so on.

There doesn't seem to be an appropriate hook already.  The RunState
change hook fires at different times depending on exactly what mode
qemu is operating in.

And it appears that no-one but the Xen code wants a hook at this phase
of execution.  So, introduce a bare call to a new function
xen_setup_post, just before os_setup_post.  Also provide the
appropriate stub for when Xen compilation is disabled.

We do the restriction before rather than after os_setup_post, because
xen_restrict may need to open /dev/null, and os_setup_post might have
called chroot.

Currently this does not work with migration, because when running as
the Xen device model qemu needs to signal to the toolstack that it is
ready.  It currently does this using xenstore, and for incoming
migration (but not for ordinary startup) that happens after
os_setup_post.

It is correct that this happens late: we want the incoming migration
stream to be processed by a restricted qemu.  The fix for this will be
to do the startup notification a different way, without using
xenstore.  (QMP is probably a reasonable choice.)

So for now this restriction feature cannot be used in conjunction with
migration.  (Note that this is not a regression in this patch, because
previously the -xen-restrict-domid call was, in fact, simply
ineffective!)  We will revisit this in the Xen 4.11 release cycle.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Paolo Bonzini <pbonzini@redhat.com> (maintainer:X86)
CC: Richard Henderson <rth@twiddle.net> (maintainer:X86)
CC: Eduardo Habkost <ehabkost@redhat.com> (maintainer:X86)
CC: Michael S. Tsirkin <mst@redhat.com> (supporter:PC)
Acked-by: Anthony PERARD <anthony.perard@citrix.com>

xen: restrict: use xentoolcore_restrict_all

And insist that it works.

Drop individual use of xendevicemodel_restrict and
xenforeignmemory_restrict. These are not actually effective in this
version of qemu, because qemu has a large number of fds open onto
various Xen control devices.

The restriction arrangements are still not right, because the
restriction needs to be done very late - after qemu has opened all of
its control fds.

xentoolcore_restrict_all and xentoolcore.h are available in Xen 4.10
and later, only. Provide a compatibility stub. And drop the
compatibility stubs for the old functions.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen: link against xentoolcore

Xen libraries in 4.10 include a new xentoolcore library. This
contains the xentoolcore_restrict_all function which we are about to
want to use.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

AccelClass: Introduce accel_setup_post

This is called just before os_setup_post. Currently none of the
accelerators provide this hook, but the Xen one is going to provide
one in a moment.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

checkpatch: Add xendevicemodel_handle to the list of types

This avoids checkpatch misparsing (as statements) long function
definitions or declarations, which sometimes start with constructs
like this:

  static inline int xendevicemodel_relocate_memory(
      xendevicemodel_handle *dmod, domid_t domid, ...

The type xendevicemodel_handle does not conform to Qemu CODING_STYLE,
which would suggest CamelCase.  However, it is a type defined by the
Xen Project in xen.git.  It would be possible to introduce a typedef
to allow the qemu code to refer to it by a differently-spelled name,
but that would obfuscate more than it would clarify.

CC: Eric Blake <eblake@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

vl.c: new function serial_max_hds()

Create a new function serial_max_hds() which returns the number of
serial ports defined by the user. This is needed only by spapr.

This allows us to remove the MAX_SERIAL_PORTS define.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-14-peter.maydell@linaro.org

vl.c: Remove compile time limit on number of serial ports

Instead of having a fixed sized global serial_hds[] array,
use a local dynamically reallocated one, so we don't have
a compile time limit on how many serial ports a system has.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-13-peter.maydell@linaro.org

superio: Don't use MAX_SERIAL_PORTS for serial port limit

The superio device has a limit on the number of serial
ports it supports which is really only there because
it has a fixed-size array serial[]. This limit isn't
related particularly to the global MAX_SERIAL_PORTS limit,
so use a different #define for it.

(In practice the users of superio only ever want 2 serial ports.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-12-peter.maydell@linaro.org

serial-isa: Use MAX_ISA_SERIAL_PORTS instead of MAX_SERIAL_PORTS

The ISA serial port handling in serial-isa.c imposes a limit
of 4 serial ports. This is because we only know of 4 IO port
and IRQ settings for them, and is unrelated to the generic
MAX_SERIAL_PORTS limit, though they happen to both be set at
4 currently.

Use a new MAX_ISA_SERIAL_PORTS wherever that is the correct
limit to be checking against.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-11-peter.maydell@linaro.org

hw/char/exynos4210_uart.c: Remove unneeded handling of NULL chardev

The handling of NULL chardevs in exynos4210_uart_create() is now
all unnecessary: we don't need to create 'null' chardevs, and we
don't need to enforce a bounds check on serial_hd().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20180420145249.32435-10-peter.maydell@linaro.org

Remove checks on MAX_SERIAL_PORTS that are just bounds checks

Remove checks on MAX_SERIAL_PORTS that were just checking whether
they were within bounds for the serial_hds[] array and falling
back to NULL if not. This isn't needed with the serial_hd()
function, which returns NULL for all indexes beyond what the
user set up.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-9-peter.maydell@linaro.org

Change references to serial_hds[] to serial_hd()

Change all the uses of serial_hds[] to go via the new
serial_hd() function. Code change produced with:
find hw -name '*.[ch]' | xargs sed -i -e 's/serial_hds\[$[^]]*$\]/serial_hd(\1)/g'

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20180420145249.32435-8-peter.maydell@linaro.org

vl.c: Provide accessor function serial_hd() for serial_hds[] array

Provide an accessor function serial_hd() to return the Chardev
(if any) associated with the numbered serial port. This will
be used to replace direct accesses to the serial_hds[] array,
so that calling code doesn't need to care about the size of
that array.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-7-peter.maydell@linaro.org

hw/xtensa/xtfpga.c: Don't create "null" chardevs for serial devices

Following commit 12051d82f004024, UART devices should handle
being passed a NULL pointer chardev, so we don't need to
create "null" backends in board code. Remove the code that
does this and updates serial_hds[].

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-6-peter.maydell@linaro.org

hw/mips/mips_malta: Don't create "null" chardevs for serial devices

Following commit 12051d82f004024, UART devices should handle
being passed a NULL pointer chardev, so we don't need to
create "null" backends in board code. Remove the code that
does this and updates serial_hds[].

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-5-peter.maydell@linaro.org

hw/mips/boston.c: Don't create "null" chardevs for serial devices

Following commit 12051d82f004024, UART devices should handle
being passed a NULL pointer chardev, so we don't need to
create "null" backends in board code. Remove the code that
does this and updates serial_hds[].

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420145249.32435-4-peter.maydell@linaro.org

hw/arm/fsl-imx*: Don't create "null" chardevs for serial devices

Following commit 12051d82f004024, UART devices should handle
being passed a NULL pointer chardev, so we don't need to
create "null" backends in board code. Remove the code that
does this and updates serial_hds[].

(fsl-imx7.c was already written this way.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Message-id: 20180420145249.32435-3-peter.maydell@linaro.org

hw/char/serial: Allow disconnected chardevs

Currently the serial.c realize code has an explicit check that it is not
connected to a disconnected backend (ie one with a NULL chardev).
This isn't what we want -- you should be able to create a serial device
even if it isn't attached to anything. Remove the check.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Thomas Huth <thuth@redhat.com>
Message-id: 20180420145249.32435-2-peter.maydell@linaro.org

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180426' into staging

target-arm queue:
* xilinx_spips: Correct SNOOP_NONE state when flushing the txfifo
* timer/aspeed: fix vmstate version id
* hw/arm/aspeed_soc: don't use vmstate_register_ram_global for SRAM
* hw/arm/aspeed: don't make 'boot_rom' region 'nomigrate'
* hw/arm/highbank: don't make sysram 'nomigrate'
* hw/arm/raspi: Don't bother setting default_cpu_type
* PMU emulation: some minor bugfixes and preparation for
   support of other events than just the cycle counter
* target/arm: Use v7m_stack_read() for reading the frame signature
* target/arm: Remove stale TODO comment
* arm: always start from first_cpu when registering loader cpu reset callback
* device_tree: Increase FDT_MAX_SIZE to 1 MiB

# gpg: Signature made Thu 26 Apr 2018 11:46:31 BST
# gpg:                using RSA key 3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20180426:
  xilinx_spips: Correct SNOOP_NONE state when flushing the txfifo
  timer/aspeed: fix vmstate version id
  hw/arm/aspeed_soc: don't use vmstate_register_ram_global for SRAM
  hw/arm/aspeed: don't make 'boot_rom' region 'nomigrate'
  hw/arm/highbank: don't make sysram 'nomigrate'
  hw/arm/raspi: Don't bother setting default_cpu_type
  target/arm: Make PMOVSCLR and PMUSERENR 64 bits wide
  target/arm: Fix bitmask for PMCCFILTR writes
  target/arm: Allow EL change hooks to do IO
  target/arm: Add pre-EL change hooks
  target/arm: Support multiple EL change hooks
  target/arm: Fetch GICv3 state directly from CPUARMState
  target/arm: Mask PMU register writes based on PMCR_EL0.N
  target/arm: Treat PMCCNTR as alias of PMCCNTR_EL0
  target/arm: Check PMCNTEN for whether PMCCNTR is enabled
  target/arm: Use v7m_stack_read() for reading the frame signature
  target/arm: Remove stale TODO comment
  arm: always start from first_cpu when registering loader cpu reset callback
  device_tree: Increase FDT_MAX_SIZE to 1 MiB

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Open 2.13 development tree

Unfortunately I forgot to do this before applying the merge
in commit 8e383d19b44863556, so that commit will incorrectly
claim to be 2.12 even though it isn't in the official 2.12
release. Oops.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

xilinx_spips: Correct SNOOP_NONE state when flushing the txfifo

SNOOP_NONE state handle is moved above in the if ladder, as it's same
as SNOOP_STRIPPING during data cycles.

Signed-off-by: Sai Pavan Boddu <saipava@xilinx.com>
Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com>
Message-id: 1524119244-1240-1-git-send-email-saipava@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

timer/aspeed: fix vmstate version id

commit 1d3e65aa7ac5 ("hw/timer: Add value matching support to
aspeed_timer") increased the vmstate version of aspeed.timer because
the state had changed, but it also bumped the version of the
VMSTATE_STRUCT_ARRAY under the aspeed.timerctrl which did not need to.

Change back this version to fix migration.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20180423101433.17759-1-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm/aspeed_soc: don't use vmstate_register_ram_global for SRAM

Currently we use vmstate_register_ram_global() for the SRAM;
this is not a good idea for devices, because it means that
you can only ever create one instance of the device, as
the second instance would get a RAM block name clash.
Instead, use memory_region_init_ram(), which automatically
registers the RAM block with a local-to-the-device name.

Note that this would be a cross-version migration compatibility break
for the "palmetto-bmc", "ast2500-evb" and "romulus-bmc" machines,
but migration is currently broken for them.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20180420124835.7268-4-peter.maydell@linaro.org

hw/arm/aspeed: don't make 'boot_rom' region 'nomigrate'

Currently we use memory_region_init_ram_nomigrate() to create
the "aspeed.boot_rom" memory region, and we don't manually
register it with vmstate_register_ram(). This currently
means that its contents are migrated but as a ram block
whose name is the empty string; in future it may mean they
are not migrated at all. Use memory_region_init_ram() instead.

Note that would be a cross-version migration compatibility break
for the "palmetto-bmc", "ast2500-evb" and "romulus-bmc" machines,
but migration is currently broken for them.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Tested-by: Cédric Le Goater <clg@kaod.org>
Message-id: 20180420124835.7268-3-peter.maydell@linaro.org

hw/arm/highbank: don't make sysram 'nomigrate'

Currently we use memory_region_init_ram_nomigrate() to create
the "highbank.sysram" memory region, and we don't manually
register it with vmstate_register_ram(). This currently
means that its contents are migrated but as a ram block
whose name is the empty string; in future it may mean they
are not migrated at all. Use memory_region_init_ram() instead.

Note that this is a cross-version migration compatibility
break for the "highbank" and "midway" machines.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180420124835.7268-2-peter.maydell@linaro.org

hw/arm/raspi: Don't bother setting default_cpu_type

In commit 210f47840dd62, we changed the bcm2836 SoC object to
always create a CPU of the correct type for that SoC model. This
makes the default_cpu_type settings in the MachineClass structs
for the raspi2 and raspi3 boards redundant. We didn't change
those at the time because it would have meant a temporary
regression in a corner case of error handling if the user
requested a non-existing CPU type. The -cpu parse handling
changes in 2278b93941d42c3 mean that it no longer implicitly
depends on default_cpu_type for this to work, so we can now
delete the redundant default_cpu_type fields.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180420155547.9497-1-peter.maydell@linaro.org

target/arm: Make PMOVSCLR and PMUSERENR 64 bits wide

This is a bug fix to ensure 64-bit reads of these registers don't read
adjacent data.

Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Message-id: 1523997485-1905-13-git-send-email-alindsay@codeaurora.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Fix bitmask for PMCCFILTR writes

It was shifted to the left one bit too few.

Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1523997485-1905-10-git-send-email-alindsay@codeaurora.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Allow EL change hooks to do IO

During code generation, surround CPSR writes and exception returns which
call the EL change hooks with gen_io_start/end. The immediate need is
for the PMU to access the clock and icount during EL change to support
mode filtering.

Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Message-id: 1523997485-1905-9-git-send-email-alindsay@codeaurora.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Add pre-EL change hooks

Because the design of the PMU requires that the counter values be
converted between their delta and guest-visible forms for mode
filtering, an additional hook which occurs before the EL is changed is
necessary.

Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Message-id: 1523997485-1905-8-git-send-email-alindsay@codeaurora.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Support multiple EL change hooks

Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Message-id: 1523997485-1905-7-git-send-email-alindsay@codeaurora.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Fetch GICv3 state directly from CPUARMState

This eliminates the need for fetching it from el_change_hook_opaque, and
allows for supporting multiple el_change_hooks without having to hack
something together to find the registered opaque belonging to GICv3.

Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1523997485-1905-6-git-send-email-alindsay@codeaurora.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Mask PMU register writes based on PMCR_EL0.N

This is in preparation for enabling counters other than PMCCNTR

Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1523997485-1905-5-git-send-email-alindsay@codeaurora.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Treat PMCCNTR as alias of PMCCNTR_EL0

They share the same underlying state

Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1523997485-1905-3-git-send-email-alindsay@codeaurora.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Check PMCNTEN for whether PMCCNTR is enabled

Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1523997485-1905-2-git-send-email-alindsay@codeaurora.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Use v7m_stack_read() for reading the frame signature

In commit 95695effe8caa552b8f2 we changed the v7M/v8M stack
pop code to use a new v7m_stack_read() function that checks
whether the read should fail due to an MPU or bus abort.
We missed one call though, the one which reads the signature
word for the callee-saved register part of the frame.

Correct the omission.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180419142106.9694-1-peter.maydell@linaro.org

target/arm: Remove stale TODO comment

Remove a stale TODO comment -- we have now made the arm_ldl_ptw()
and arm_ldq_ptw() functions propagate physical memory read errors
out to their callers.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180419142151.9862-1-peter.maydell@linaro.org

arm: always start from first_cpu when registering loader cpu reset callback

if arm_load_kernel() were passed non first_cpu, QEMU would end up
with partially set do_cpu_reset() callback leaving some CPUs without it.

Make sure that do_cpu_reset() is registered for all CPUs by enumerating
CPUs from first_cpu.

(In practice every board that we have was passing us the first CPU
as the boot CPU, either directly or indirectly, so this wasn't
causing incorrect behaviour.)

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: added a note that this isn't a behaviour change]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

device_tree: Increase FDT_MAX_SIZE to 1 MiB

It is not uncommon for a contemporary FDT to be larger than 64 KiB,
leading to failures loading the device tree from sysfs:

qemu-system-aarch64: qemu_fdt_setprop: Couldn't set ...: FDT_ERR_NOSPACE

Hence increase the limit to 1 MiB, like on PPC.

For reference, the largest arm64 DTB created from the Linux sources is
ca. 75 KiB large (100 KiB when built with symbols/fixup support).

Cc: qemu-stable@nongnu.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Message-id: 1523541337-23919-1-git-send-email-geert+renesas@glider.be
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20180425a' into staging

Migration pull for 2.13

Alexey Perevalov postcopy blocktime statistics
Xiao Guangrong's compression performance improvements

# gpg: Signature made Wed 25 Apr 2018 20:21:13 BST
# gpg:                using RSA key 0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20180425a:
  migration: remove ram_save_compressed_page()
  migration: introduce save_normal_page()
  migration: move calling save_zero_page to the common place
  migration: move calling control_save_page to the common place
  migration: move some code to ram_save_host_page
  migration: introduce control_save_page()
  migration: detect compression and decompression errors
  migration: stop decompression to allocate and free memory frequently
  migration: stop compression to allocate and free memory frequently
  migration: stop compressing page in migration thread
  migration: add postcopy total blocktime into query-migrate
  migration: add blocktime calculation into migration-test
  migration: postcopy_blocktime documentation
  migration: calculate vCPU blocktime on dst side
  migration: add postcopy blocktime ctx into MigrationIncomingState
  migration: introduce postcopy-blocktime capability

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

migration: remove ram_save_compressed_page()

Now, we can reuse the path in ram_save_page() to post the page out
as normal, then the only thing remained in ram_save_compressed_page()
is compression that we can move it out to the caller

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-11-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: introduce save_normal_page()

It directly sends the page to the stream neither checking zero nor
using xbzrle or compression

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-10-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: move calling save_zero_page to the common place

save_zero_page() is always our first approach to try, move it to
the common place before calling ram_save_compressed_page
and ram_save_page

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-9-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: move calling control_save_page to the common place

The function is called by both ram_save_page and ram_save_target_page,
so move it to the common caller to cleanup the code

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-8-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: move some code to ram_save_host_page

Move some code from ram_save_target_page() to ram_save_host_page()
to make it be more readable for latter patches that dramatically
clean ram_save_target_page() up

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-7-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: introduce control_save_page()

Abstract the common function control_save_page() to cleanup the code,
no logic is changed

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-6-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: detect compression and decompression errors

Currently the page being compressed is allowed to be updated by
the VM on the source QEMU, correspondingly the destination QEMU
just ignores the decompression error. However, we completely miss
the chance to catch real errors, then the VM is corrupted silently

To make the migration more robuster, we copy the page to a buffer
first to avoid it being written by VM, then detect and handle the
errors of both compression and decompression errors properly

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-5-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: stop decompression to allocate and free memory frequently

Current code uses uncompress() to decompress memory which manages
memory internally, that causes huge memory is allocated and freed
very frequently, more worse, frequently returning memory to kernel
will flush TLBs

So, we maintain the memory by ourselves and reuse it for each
decompression

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-4-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: stop compression to allocate and free memory frequently

Current code uses compress2() to compress memory which manages memory
internally, that causes huge memory is allocated and freed very
frequently

More worse, frequently returning memory to kernel will flush TLBs
and trigger invalidation callbacks on mmu-notification which
interacts with KVM MMU, that dramatically reduce the performance
of VM

So, we maintain the memory by ourselves and reuse it for each
compression

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-3-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: stop compressing page in migration thread

As compression is a heavy work, do not do it in migration thread,
instead, we post it out as a normal page

Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-2-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: add postcopy total blocktime into query-migrate

Postcopy total blocktime is available on destination side only.
But query-migrate was possible only for source. This patch
adds ability to call query-migrate on destination.
To be able to see postcopy blocktime, need to request postcopy-blocktime
capability.

The query-migrate command will show following sample result:
{"return":
    "postcopy-vcpu-blocktime": [115, 100],
    "status": "completed",
    "postcopy-blocktime": 100
}}

postcopy_vcpu_blocktime contains list, where the first item is the first
vCPU in QEMU.

This patch has a drawback, it combines states of incoming and
outgoing migration. Ongoing migration state will overwrite incoming
state. Looks like better to separate query-migrate for incoming and
outgoing migration or add parameter to indicate type of migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-7-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: add blocktime calculation into migration-test

This patch just requests blocktime calculation,
and check it in case when UFFD_FEATURE_THREAD_ID feature is set
on the host.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-6-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: postcopy_blocktime documentation

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-5-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: calculate vCPU blocktime on dst side

This patch provides blocktime calculation per vCPU,
as a summary and as a overlapped value for all vCPUs.

This approach was suggested by Peter Xu, as an improvements of
previous approch where QEMU kept tree with faulted page address and cpus bitmask
in it. Now QEMU is keeping array with faulted page address as value and vCPU
as index. It helps to find proper vCPU at UFFD_COPY time. Also it keeps
list for blocktime per vCPU (could be traced with page_fault_addr)

Blocktime will not calculated if postcopy_blocktime field of
MigrationIncomingState wasn't initialized.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-4-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: add postcopy blocktime ctx into MigrationIncomingState

This patch adds request to kernel space for UFFD_FEATURE_THREAD_ID, in
case this feature is provided by kernel.

PostcopyBlocktimeContext is encapsulated inside postcopy-ram.c,
due to it being a postcopy-only feature.
Also it defines PostcopyBlocktimeContext's instance live time.
Information from PostcopyBlocktimeContext instance will be provided
much after postcopy migration end, instance of PostcopyBlocktimeContext
will live till QEMU exit, but part of it (vcpu_addr,
page_fault_vcpu_time) used only during calculation, will be released
when postcopy ended or failed.

To enable postcopy blocktime calculation on destination, need to
request proper compatibility (Patch for documentation will be at the
tail of the patch set).

As an example following command enable that capability, assume QEMU was
started with
-chardev socket,id=charmonitor,path=/var/lib/migrate-vm-monitor.sock
option to control it

[root@host]#printf "{\"execute\" : \"qmp_capabilities\"}\r\n \
{\"execute\": \"migrate-set-capabilities\" , \"arguments\": {
\"capabilities\": [ { \"capability\": \"postcopy-blocktime\", \"state\":
true } ] } }" | nc -U /var/lib/migrate-vm-monitor.sock

Or just with HMP
(qemu) migrate_set_capability postcopy-blocktime on

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-3-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: introduce postcopy-blocktime capability

Right now it could be used on destination side to
enable vCPU blocktime calculation for postcopy live migration.
vCPU blocktime - it's time since vCPU thread was put into
interruptible sleep, till memory page was copied and thread awake.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <1521742647-25550-2-git-send-email-a.perevalov@samsung.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>