OSDN Git Service

qmiga/qemu.git
6 years agotcg: consolidate TB lookups in tb_lookup__cpu_state
Emilio G. Cota [Tue, 11 Jul 2017 21:33:33 +0000 (17:33 -0400)]
tcg: consolidate TB lookups in tb_lookup__cpu_state

This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.

Note that in this commit we also fix a race, described by Richard Henderson
during review. Think of this scenario with threads A and B:

   (A) Lookup succeeds for TB in hash without tb_lock
        (B) Sets the TB's tb->invalid flag
        (B) Removes the TB from tb_htable
        (B) Clears all CPU's tb_jmp_cache
   (A) Store TB into local tb_jmp_cache

Given that order of events, (A) will keep executing that invalid TB until
another flush of its tb_jmp_cache happens, which in theory might never happen.
We can fix this by checking the tb->invalid flag every time we look up a TB
from tb_jmp_cache, so that in the above scenario, next time we try to find
that TB in tb_jmp_cache, we won't, and will therefore be forced to look it
up in tb_htable.

Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:

 Performance counter stats for 'taskset -c 0 qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=jessie.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):

Before:
      18714.917392 task-clock                #    0.952 CPUs utilized            ( +-  0.95% )
            23,142 context-switches          #    0.001 M/sec                    ( +-  0.50% )
                 1 CPU-migrations            #    0.000 M/sec
            10,558 page-faults               #    0.001 M/sec                    ( +-  0.95% )
    53,957,727,252 cycles                    #    2.883 GHz                      ( +-  0.91% ) [83.33%]
    24,440,599,852 stalled-cycles-frontend   #   45.30% frontend cycles idle     ( +-  1.20% ) [83.33%]
    16,495,714,424 stalled-cycles-backend    #   30.57% backend  cycles idle     ( +-  0.95% ) [66.66%]
    76,267,572,582 instructions              #    1.41  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  0.87% ) [83.34%]
    12,692,186,323 branches                  #  678.186 M/sec                    ( +-  0.92% ) [83.35%]
       263,486,879 branch-misses             #    2.08% of all branches          ( +-  0.73% ) [83.34%]

      19.648474449 seconds time elapsed                                          ( +-  0.82% )

After, w/ inline (this patch):
      18471.376627 task-clock                #    0.955 CPUs utilized            ( +-  0.96% )
            23,048 context-switches          #    0.001 M/sec                    ( +-  0.48% )
                 1 CPU-migrations            #    0.000 M/sec
            10,708 page-faults               #    0.001 M/sec                    ( +-  0.81% )
    53,208,990,796 cycles                    #    2.881 GHz                      ( +-  0.98% ) [83.34%]
    23,941,071,673 stalled-cycles-frontend   #   44.99% frontend cycles idle     ( +-  0.95% ) [83.34%]
    16,161,773,848 stalled-cycles-backend    #   30.37% backend  cycles idle     ( +-  0.76% ) [66.67%]
    75,786,269,766 instructions              #    1.42  insns per cycle
                                             #    0.32  stalled cycles per insn  ( +-  1.24% ) [83.34%]
    12,573,617,143 branches                  #  680.708 M/sec                    ( +-  1.34% ) [83.33%]
       260,235,550 branch-misses             #    2.07% of all branches          ( +-  0.66% ) [83.33%]

      19.340502161 seconds time elapsed                                          ( +-  0.56% )

After, w/o inline:
      18791.253967 task-clock                #    0.954 CPUs utilized            ( +-  0.78% )
            23,230 context-switches          #    0.001 M/sec                    ( +-  0.42% )
                 1 CPU-migrations            #    0.000 M/sec
            10,563 page-faults               #    0.001 M/sec                    ( +-  1.27% )
    54,168,674,622 cycles                    #    2.883 GHz                      ( +-  0.80% ) [83.34%]
    24,244,712,629 stalled-cycles-frontend   #   44.76% frontend cycles idle     ( +-  1.37% ) [83.33%]
    16,288,648,572 stalled-cycles-backend    #   30.07% backend  cycles idle     ( +-  0.95% ) [66.66%]
    77,659,755,503 instructions              #    1.43  insns per cycle
                                             #    0.31  stalled cycles per insn  ( +-  0.97% ) [83.34%]
    12,922,780,045 branches                  #  687.702 M/sec                    ( +-  1.06% ) [83.34%]
       261,962,386 branch-misses             #    2.03% of all branches          ( +-  0.71% ) [83.35%]

      19.700174670 seconds time elapsed                                          ( +-  0.56% )

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6 years agotcg: remove addr argument from lookup_tb_ptr
Emilio G. Cota [Tue, 11 Jul 2017 21:06:48 +0000 (17:06 -0400)]
tcg: remove addr argument from lookup_tb_ptr

It is unlikely that we will ever want to call this helper passing
an argument other than the current PC. So just remove the argument,
and use the pc we already get from cpu_get_tb_cpu_state.

This change paves the way to having a common "tb_lookup" function.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6 years agotcg/mips: constify tcg_target_callee_save_regs
Emilio G. Cota [Wed, 5 Jul 2017 22:13:07 +0000 (18:13 -0400)]
tcg/mips: constify tcg_target_callee_save_regs

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6 years agotcg/i386: constify tcg_target_callee_save_regs
Emilio G. Cota [Wed, 5 Jul 2017 22:12:56 +0000 (18:12 -0400)]
tcg/i386: constify tcg_target_callee_save_regs

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6 years agocpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find
Emilio G. Cota [Wed, 12 Jul 2017 18:29:26 +0000 (14:29 -0400)]
cpu-exec: rename have_tb_lock to acquired_tb_lock in tb_find

Reusing the have_tb_lock name, which is also defined in translate-all.c,
makes code reviewing unnecessarily harder.

Avoid potential confusion by renaming the local have_tb_lock variable
to something else.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6 years agotranslate-all: make have_tb_lock static
Emilio G. Cota [Fri, 7 Jul 2017 01:28:52 +0000 (21:28 -0400)]
translate-all: make have_tb_lock static

It is only used by this object, and it's not exported to any other.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6 years agoexec-all: fix typos in TranslationBlock's documentation
Emilio G. Cota [Fri, 23 Jun 2017 23:43:01 +0000 (19:43 -0400)]
exec-all: fix typos in TranslationBlock's documentation

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6 years agotcg: fix corruption of code_time profiling counter upon tb_flush
Emilio G. Cota [Fri, 7 Jul 2017 22:22:49 +0000 (18:22 -0400)]
tcg: fix corruption of code_time profiling counter upon tb_flush

Whenever there is an overflow in code_gen_buffer (e.g. we run out
of space in it and have to flush it), the code_time profiling counter
ends up with an invalid value (that is, code_time -= profile_getclock(),
without later on getting += profile_getclock() due to the goto).

Fix it by using the ti variable, so that we only update code_time
when there is no overflow. Note that in case there is an overflow
we fail to account for the elapsed coding time, but this is quite rare
so we can probably live with it.

"info jit" before/after, roughly at the same time during debian-arm bootup:

- before:
Statistics:
TB flush count      1
TB invalidate count 4665
TLB flush count     998
JIT cycles          -615191529184601 (-256329.804 s at 2.4 GHz)
translated TBs      302310 (aborted=0 0.0%)
avg ops/TB          48.4 max=438
deleted ops/TB      8.54
avg temps/TB        32.31 max=38
avg host code/TB    361.5
avg search data/TB  24.5
cycles/op           -42014693.0
cycles/in byte      -121444900.2
cycles/out byte     -5629031.1
cycles/search byte     -83114481.0
  gen_interm time   -0.0%
  gen_code time     100.0%
optim./code time    -0.0%
liveness/code time  -0.0%
cpu_restore count   6236
  avg cycles        110.4

- after:
Statistics:
TB flush count      1
TB invalidate count 4665
TLB flush count     1010
JIT cycles          1996899624 (0.832 s at 2.4 GHz)
translated TBs      297961 (aborted=0 0.0%)
avg ops/TB          48.5 max=438
deleted ops/TB      8.56
avg temps/TB        32.31 max=38
avg host code/TB    361.8
avg search data/TB  24.5
cycles/op           138.2
cycles/in byte      398.4
cycles/out byte     18.5
cycles/search byte     273.1
  gen_interm time   14.0%
  gen_code time     86.0%
optim./code time    19.4%
liveness/code time  10.3%
cpu_restore count   6372
  avg cycles        111.0

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6 years agocputlb: bring back tlb_flush_count under !TLB_DEBUG
Emilio G. Cota [Thu, 6 Jul 2017 18:42:26 +0000 (14:42 -0400)]
cputlb: bring back tlb_flush_count under !TLB_DEBUG

Commit f0aff0f124 ("cputlb: add assert_cpu_is_self checks") buried
the increment of tlb_flush_count under TLB_DEBUG. This results in
"info jit" always (mis)reporting 0 TLB flushes when !TLB_DEBUG.

Besides, under MTTCG tlb_flush_count is updated by several threads,
so in order not to lose counts we'd either have to use atomic ops
or distribute the counter, which is more scalable.

This patch does the latter by embedding tlb_flush_count in CPUArchState.
The global count is then easily obtained by iterating over the CPU list.

Note that this change also requires updating the accessors to
tlb_flush_count to use atomic_read/set whenever there may be conflicting
accesses (as defined in C11) to it.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
6 years agoMerge remote-tracking branch 'remotes/ehabkost/tags/x86-and-machine-pull-request...
Peter Maydell [Tue, 10 Oct 2017 12:25:46 +0000 (13:25 +0100)]
Merge remote-tracking branch 'remotes/ehabkost/tags/x86-and-machine-pull-request' into staging

x86 and machine queue, 2017-10-09

Includes x86, QOM, CPU, and option/config parsing patches.

Highlights:
* Deprecation of -nodefconfig option;
* MachineClass::valid_cpu_types field.

# gpg: Signature made Tue 10 Oct 2017 03:31:33 BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-and-machine-pull-request:
  x86: Correct translation of some rdgsbase and wrgsbase encodings
  vl: exit if maxcpus is negative
  qom: update doc comment for type_register[_static]()
  config: qemu_config_parse() return number of config groups
  qemu-options: Deprecate -nodefconfig
  vl: Eliminate defconfig variable
  machine: Add a valid_cpu_types property
  qom/cpu: move cpu_model null check to cpu_class_by_name()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6 years agox86: Correct translation of some rdgsbase and wrgsbase encodings
Todd Eisenberger [Thu, 28 Sep 2017 17:17:06 +0000 (10:17 -0700)]
x86: Correct translation of some rdgsbase and wrgsbase encodings

It looks like there was a transcription error when writing this code
initially.  The code previously only decoded src or dst of rax.  This
resolves
https://bugs.launchpad.net/qemu/+bug/1719984.

Signed-off-by: Todd Eisenberger <teisenbe@google.com>
Message-Id: <CAP26EVRNVb=Mq=O3s51w7fDhGVmf-e3XFFA73MRzc5b4qKBA4g@mail.gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
6 years agovl: exit if maxcpus is negative
Seeteena Thoufeek [Mon, 4 Sep 2017 07:43:51 +0000 (13:13 +0530)]
vl: exit if maxcpus is negative

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---Steps to Reproduce---

When passed a negative number to 'maxcpus' parameter, Qemu aborts
with a core dump.

Run the following command with maxcpus argument as negative number

ppc64-softmmu/qemu-system-ppc64 --nographic -vga none -machine
pseries,accel=kvm,kvm-type=HV -m size=200g -device virtio-blk-pci,
drive=rootdisk -drive file=/home/images/pegas-1.0-ppc64le.qcow2,
if=none,cache=none,id=rootdisk,format=qcow2 -monitor telnet
:127.0.0.1:1234,server,nowait -net nic,model=virtio -net
user -redir tcp:2000::22 -device nec-usb-xhci -smp 8,cores=1,
threads=1,maxcpus=-12

(process:12149): GLib-ERROR **: gmem.c:130: failed to allocate
 18446744073709550568 bytes

Trace/breakpoint trap

Reported-by: R.Nageswara Sastry <rnsastry@linux.vnet.ibm.com>
Signed-off-by: Seeteena Thoufeek <s1seetee@linux.vnet.ibm.com>
Message-Id: <1504511031-26834-1-git-send-email-s1seetee@linux.vnet.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
6 years agoqom: update doc comment for type_register[_static]()
Igor Mammedov [Wed, 4 Oct 2017 10:08:00 +0000 (12:08 +0200)]
qom: update doc comment for type_register[_static]()

type_register()/type_register_static() functions in current impl.
can't fail returning 0, also none of the users check for error
so update doc comment to reflect current behaviour.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1507111682-66171-2-git-send-email-imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
6 years agoconfig: qemu_config_parse() return number of config groups
Eduardo Habkost [Wed, 4 Oct 2017 02:50:42 +0000 (23:50 -0300)]
config: qemu_config_parse() return number of config groups

Change qemu_config_parse() to return the number of config groups
in success and -EINVAL on error. This will allow callers of
qemu_config_parse() to check if something was really loaded from
the config file.

All existing callers of qemu_config_parse() and
qemu_read_config_file() only check if the return value was
negative, so the change shouldn't affect them.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004025043.3788-2-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
6 years agoqemu-options: Deprecate -nodefconfig
Eduardo Habkost [Wed, 4 Oct 2017 03:00:25 +0000 (00:00 -0300)]
qemu-options: Deprecate -nodefconfig

Since 2012 (commit ba6212d8 "Eliminate cpus-x86_64.conf file") we
have no default config files that would be disabled using
-nodefconfig.  Update documentation and document -nodefconfig as
deprecated.

Cc: Markus Armbruster <armbru@redhat.com>
Acked-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004030025.7866-3-ehabkost@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
6 years agovl: Eliminate defconfig variable
Eduardo Habkost [Wed, 4 Oct 2017 03:00:24 +0000 (00:00 -0300)]
vl: Eliminate defconfig variable

Both -nodefconfig and -no-user-config options do the same thing
today, we only need one variable to keep track of them.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20171004030025.7866-2-ehabkost@redhat.com>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
6 years agomachine: Add a valid_cpu_types property
Alistair Francis [Tue, 3 Oct 2017 20:05:09 +0000 (13:05 -0700)]
machine: Add a valid_cpu_types property

This patch add a MachineClass element that can be set in the machine C
code to specify a list of supported CPU types. If the supported CPU
types are specified the user enter CPU (by -cpu at runtime) is checked
against the supported types and QEMU exits if they aren't supported.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-Id: <b8474e9d2e0a219d9bac901342f983b13d009301.1507059418.git.alistair.francis@xilinx.com>
[ehabkost: removed assert(), rewrote comment]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
6 years agoqom/cpu: move cpu_model null check to cpu_class_by_name()
Philippe Mathieu-Daudé [Sun, 17 Sep 2017 23:28:42 +0000 (20:28 -0300)]
qom/cpu: move cpu_model null check to cpu_class_by_name()

and clean every implementation.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20170917232842.14544-1-f4bug@amsat.org>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Artyom Tarasenko <atar4qemu@gmail.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
6 years agoMerge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Peter Maydell [Fri, 6 Oct 2017 16:43:02 +0000 (17:43 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Fri 06 Oct 2017 16:52:59 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (54 commits)
  block/mirror: check backing in bdrv_mirror_top_flush
  qcow2: truncate the tail of the image file after shrinking the image
  qcow2: fix return error code in qcow2_truncate()
  iotests: Fix 195 if IMGFMT is part of TEST_DIR
  block/mirror: check backing in bdrv_mirror_top_refresh_filename
  block: support passthrough of BDRV_REQ_FUA in crypto driver
  block: convert qcrypto_block_encrypt|decrypt to take bytes offset
  block: convert crypto driver to bdrv_co_preadv|pwritev
  block: fix data type casting for crypto payload offset
  crypto: expose encryption sector size in APIs
  block: use 1 MB bounce buffers for crypto instead of 16KB
  iotests: Add test 197 for covering copy-on-read
  block: Perform copy-on-read in loop
  block: Add blkdebug hook for copy-on-read
  iotests: Restore stty settings on completion
  block: Uniform handling of 0-length bdrv_get_block_status()
  qemu-io: Add -C for opening with copy-on-read
  commit: Remove overlay_bs
  qemu-iotests: Test commit block job where top has two parents
  qemu-iotests: Allow QMP pretty printing in common.qemu
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6 years agoMerge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20171006' into...
Peter Maydell [Fri, 6 Oct 2017 16:00:42 +0000 (17:00 +0100)]
Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20171006' into staging

target-arm:
 * v8M: more preparatory work
 * nvic: reset properly rather than leaving the nvic in a weird state
 * xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false
 * sd: fix out-of-bounds check for multi block reads
 * arm: Fix SMC reporting to EL2 when QEMU provides PSCI

# gpg: Signature made Fri 06 Oct 2017 16:58:15 BST
# gpg:                using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20171006:
  nvic: Add missing code for writing SHCSR.HARDFAULTPENDED bit
  target/arm: Factor out "get mmuidx for specified security state"
  target/arm: Fix calculation of secure mm_idx values
  target/arm: Implement security attribute lookups for memory accesses
  nvic: Implement Security Attribution Unit registers
  target/arm: Add v8M support to exception entry code
  target/arm: Add support for restoring v8M additional state context
  target/arm: Update excret sanity checks for v8M
  target/arm: Add new-in-v8M SFSR and SFAR
  target/arm: Don't warn about exception return with PC low bit set for v8M
  target/arm: Warn about restoring to unaligned stack
  target/arm: Check for xPSR mismatch usage faults earlier for v8M
  target/arm: Restore SPSEL to correct CONTROL register on exception return
  target/arm: Restore security state on exception return
  target/arm: Prepare for CONTROL.SPSEL being nonzero in Handler mode
  target/arm: Don't switch to target stack early in v7M exception return
  nvic: Clear the vector arrays and prigroup on reset
  hw/arm/xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false
  hw/sd: fix out-of-bounds check for multi block reads
  arm: Fix SMC reporting to EL2 when QEMU provides PSCI

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6 years agonvic: Add missing code for writing SHCSR.HARDFAULTPENDED bit
Peter Maydell [Fri, 6 Oct 2017 15:46:49 +0000 (16:46 +0100)]
nvic: Add missing code for writing SHCSR.HARDFAULTPENDED bit

When we added support for the new SHCSR bits in v8M in commit
437d59c17e9 the code to support writing to the new HARDFAULTPENDED
bit was accidentally only added for non-secure writes; the
secure banked version of the bit should also be writable.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-21-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Factor out "get mmuidx for specified security state"
Peter Maydell [Fri, 6 Oct 2017 15:46:49 +0000 (16:46 +0100)]
target/arm: Factor out "get mmuidx for specified security state"

For the SG instruction and secure function return we are going
to want to do memory accesses using the MMU index of the CPU
in secure state, even though the CPU is currently in non-secure
state. Write arm_v7m_mmu_idx_for_secstate() to do this job,
and use it in cpu_mmu_index().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-17-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Fix calculation of secure mm_idx values
Peter Maydell [Fri, 6 Oct 2017 15:46:49 +0000 (16:46 +0100)]
target/arm: Fix calculation of secure mm_idx values

In cpu_mmu_index() we try to do this:
        if (env->v7m.secure) {
            mmu_idx += ARMMMUIdx_MSUser;
        }
but it will give the wrong answer, because ARMMMUIdx_MSUser
includes the 0x40 ARM_MMU_IDX_M field, and so does the
mmu_idx we're adding to, and we'll end up with 0x8n rather
than 0x4n. This error is then nullified by the call to
arm_to_core_mmu_idx() which masks out the high part, but
we're about to factor out the code that calculates the
ARMMMUIdx values so it can be used without passing it through
arm_to_core_mmu_idx(), so fix this bug first.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-16-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Implement security attribute lookups for memory accesses
Peter Maydell [Fri, 6 Oct 2017 15:46:49 +0000 (16:46 +0100)]
target/arm: Implement security attribute lookups for memory accesses

Implement the security attribute lookups for memory accesses
in the get_phys_addr() functions, causing these to generate
various kinds of SecureFault for bad accesses.

The major subtlety in this code relates to handling of the
case when the security attributes the SAU assigns to the
address don't match the current security state of the CPU.

In the ARM ARM pseudocode for validating instruction
accesses, the security attributes of the address determine
whether the Secure or NonSecure MPU state is used. At face
value, handling this would require us to encode the relevant
bits of state into mmu_idx for both S and NS at once, which
would result in our needing 16 mmu indexes. Fortunately we
don't actually need to do this because a mismatch between
address attributes and CPU state means either:
 * some kind of fault (usually a SecureFault, but in theory
   perhaps a UserFault for unaligned access to Device memory)
 * execution of the SG instruction in NS state from a
   Secure & NonSecure code region

The purpose of SG is simply to flip the CPU into Secure
state, so we can handle it by emulating execution of that
instruction directly in arm_v7m_cpu_do_interrupt(), which
means we can treat all the mismatch cases as "throw an
exception" and we don't need to encode the state of the
other MPU bank into our mmu_idx values.

This commit doesn't include the actual emulation of SG;
it also doesn't include implementation of the IDAU, which
is a per-board way to specify hard-coded memory attributes
for addresses, which override the CPU-internal SAU if they
specify a more secure setting than the SAU is programmed to.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-15-git-send-email-peter.maydell@linaro.org

6 years agonvic: Implement Security Attribution Unit registers
Peter Maydell [Fri, 6 Oct 2017 15:46:49 +0000 (16:46 +0100)]
nvic: Implement Security Attribution Unit registers

Implement the register interface for the SAU: SAU_CTRL,
SAU_TYPE, SAU_RNR, SAU_RBAR and SAU_RLAR. None of the
actual behaviour is implemented here; registers just
read back as written.

When the CPU definition for Cortex-M33 is eventually
added, its initfn will set cpu->sau_sregion, in the same
way that we currently set cpu->pmsav7_dregion for the
M3 and M4.

Number of SAU regions is typically a configurable
CPU parameter, but this patch doesn't provide a
QEMU CPU property for it. We can easily add one when
we have a board that requires it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-14-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Add v8M support to exception entry code
Peter Maydell [Fri, 6 Oct 2017 15:46:49 +0000 (16:46 +0100)]
target/arm: Add v8M support to exception entry code

Add support for v8M and in particular the security extension
to the exception entry code. This requires changes to:
 * calculation of the exception-return magic LR value
 * push the callee-saves registers in certain cases
 * clear registers when taking non-secure exceptions to avoid
   leaking information from the interrupted secure code
 * switch to the correct security state on entry
 * use the vector table for the security state we're targeting

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-13-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Add support for restoring v8M additional state context
Peter Maydell [Fri, 6 Oct 2017 15:46:48 +0000 (16:46 +0100)]
target/arm: Add support for restoring v8M additional state context

For v8M, exceptions from Secure to Non-Secure state will save
callee-saved registers to the exception frame as well as the
caller-saved registers. Add support for unstacking these
registers in exception exit when necessary.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-12-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Update excret sanity checks for v8M
Peter Maydell [Fri, 6 Oct 2017 15:46:48 +0000 (16:46 +0100)]
target/arm: Update excret sanity checks for v8M

In v8M, more bits are defined in the exception-return magic
values; update the code that checks these so we accept
the v8M values when the CPU permits them.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-11-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Add new-in-v8M SFSR and SFAR
Peter Maydell [Fri, 6 Oct 2017 15:46:48 +0000 (16:46 +0100)]
target/arm: Add new-in-v8M SFSR and SFAR

Add the new M profile Secure Fault Status Register
and Secure Fault Address Register.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-10-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Don't warn about exception return with PC low bit set for v8M
Peter Maydell [Fri, 6 Oct 2017 15:46:48 +0000 (16:46 +0100)]
target/arm: Don't warn about exception return with PC low bit set for v8M

In the v8M architecture, return from an exception to a PC which
has bit 0 set is not UNPREDICTABLE; it is defined that bit 0
is discarded [R_HRJH]. Restrict our complaint about this to v7M.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-9-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Warn about restoring to unaligned stack
Peter Maydell [Fri, 6 Oct 2017 15:46:48 +0000 (16:46 +0100)]
target/arm: Warn about restoring to unaligned stack

Attempting to do an exception return with an exception frame that
is not 8-aligned is UNPREDICTABLE in v8M; warn about this.
(It is not UNPREDICTABLE in v7M, and our implementation can
handle the merely-4-aligned case fine, so we don't need to
do anything except warn.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-8-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Check for xPSR mismatch usage faults earlier for v8M
Peter Maydell [Fri, 6 Oct 2017 15:46:48 +0000 (16:46 +0100)]
target/arm: Check for xPSR mismatch usage faults earlier for v8M

ARM v8M specifies that the INVPC usage fault for mismatched
xPSR exception field and handler mode bit should be checked
before updating the PSR and SP, so that the fault is taken
with the existing stack frame rather than by pushing a new one.
Perform this check in the right place for v8M.

Since v7M specifies in its pseudocode that this usage fault
check should happen later, we have to retain the original
code for that check rather than being able to merge the two.
(The distinction is architecturally visible but only in
very obscure corner cases like attempting an invalid exception
return with an exception frame in read only memory.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-7-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Restore SPSEL to correct CONTROL register on exception return
Peter Maydell [Fri, 6 Oct 2017 15:46:48 +0000 (16:46 +0100)]
target/arm: Restore SPSEL to correct CONTROL register on exception return

On exception return for v8M, the SPSEL bit in the EXC_RETURN magic
value should be restored to the SPSEL bit in the CONTROL register
banked specified by the EXC_RETURN.ES bit.

Add write_v7m_control_spsel_for_secstate() which behaves like
write_v7m_control_spsel() but allows the caller to specify which
CONTROL bank to use, reimplement write_v7m_control_spsel() in
terms of it, and use it in exception return.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-6-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Restore security state on exception return
Peter Maydell [Fri, 6 Oct 2017 15:46:47 +0000 (16:46 +0100)]
target/arm: Restore security state on exception return

Now that we can handle the CONTROL.SPSEL bit not necessarily being
in sync with the current stack pointer, we can restore the correct
security state on exception return. This happens before we start
to read registers off the stack frame, but after we have taken
possible usage faults for bad exception return magic values and
updated CONTROL.SPSEL.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-5-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Prepare for CONTROL.SPSEL being nonzero in Handler mode
Peter Maydell [Fri, 6 Oct 2017 15:46:47 +0000 (16:46 +0100)]
target/arm: Prepare for CONTROL.SPSEL being nonzero in Handler mode

In the v7M architecture, there is an invariant that if the CPU is
in Handler mode then the CONTROL.SPSEL bit cannot be nonzero.
This in turn means that the current stack pointer is always
indicated by CONTROL.SPSEL, even though Handler mode always uses
the Main stack pointer.

In v8M, this invariant is removed, and CONTROL.SPSEL may now
be nonzero in Handler mode (though Handler mode still always
uses the Main stack pointer). In preparation for this change,
change how we handle this bit: rename switch_v7m_sp() to
the now more accurate write_v7m_control_spsel(), and make it
check both the handler mode state and the SPSEL bit.

Note that this implicitly changes the point at which we switch
active SP on exception exit from before we pop the exception
frame to after it.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-4-git-send-email-peter.maydell@linaro.org

6 years agotarget/arm: Don't switch to target stack early in v7M exception return
Peter Maydell [Fri, 6 Oct 2017 15:46:47 +0000 (16:46 +0100)]
target/arm: Don't switch to target stack early in v7M exception return

Currently our M profile exception return code switches to the
target stack pointer relatively early in the process, before
it tries to pop the exception frame off the stack. This is
awkward for v8M for two reasons:
 * in v8M the process vs main stack pointer is not selected
   purely by the value of CONTROL.SPSEL, so updating SPSEL
   and relying on that to switch to the right stack pointer
   won't work
 * the stack we should be reading the stack frame from and
   the stack we will eventually switch to might not be the
   same if the guest is doing strange things

Change our exception return code to use a 'frame pointer'
to read the exception frame rather than assuming that we
can switch the live stack pointer this early.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 1506092407-26985-3-git-send-email-peter.maydell@linaro.org

6 years agonvic: Clear the vector arrays and prigroup on reset
Peter Maydell [Fri, 6 Oct 2017 15:46:47 +0000 (16:46 +0100)]
nvic: Clear the vector arrays and prigroup on reset

Reset for devices does not include an automatic clear of the
device state (unlike CPU state, where most of the state
structure is cleared to zero). Add some missing initialization
of NVIC state that meant that the device was left in the wrong
state if the guest did a warm reset.

(In particular, since we were resetting the computed state like
s->exception_prio but not all the state it was computed
from like s->vectors[x].active, the NVIC wound up in an
inconsistent state that could later trigger assertion failures.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1506092407-26985-2-git-send-email-peter.maydell@linaro.org

6 years agohw/arm/xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false
Thomas Huth [Fri, 6 Oct 2017 15:46:47 +0000 (16:46 +0100)]
hw/arm/xlnx-zynqmp: Mark the "xlnx, zynqmp" device with user_creatable = false

The device uses serial_hds in its realize function and thus can't be
used twice. Apart from that, the comma in its name makes it quite hard
to use for the user anyway, since a comma is normally used to separate
the device name from its properties when using the "-device" parameter
or the "device_add" HMP command.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 1506441116-16627-1-git-send-email-thuth@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6 years agohw/sd: fix out-of-bounds check for multi block reads
Michael Olbrich [Fri, 6 Oct 2017 15:46:47 +0000 (16:46 +0100)]
hw/sd: fix out-of-bounds check for multi block reads

The current code checks if the next block exceeds the size of the card.
This generates an error while reading the last block of the card.
Do the out-of-bounds check when starting to read a new block to fix this.

This issue became visible with increased error checking in Linux 4.13.

Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Olbrich <m.olbrich@pengutronix.de>
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 20170916091611.10241-1-m.olbrich@pengutronix.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6 years agoarm: Fix SMC reporting to EL2 when QEMU provides PSCI
Jan Kiszka [Fri, 6 Oct 2017 15:46:47 +0000 (16:46 +0100)]
arm: Fix SMC reporting to EL2 when QEMU provides PSCI

This properly forwards SMC events to EL2 when PSCI is provided by QEMU
itself and, thus, ARM_FEATURE_EL3 is off.

Found and tested with the Jailhouse hypervisor. Solution based on
suggestions by Peter Maydell.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-id: 4f243068-aaea-776f-d18f-f9e05e7be9cd@siemens.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6 years agoMerge remote-tracking branch 'mreitz/tags/pull-block-2017-10-06' into queue-block
Kevin Wolf [Fri, 6 Oct 2017 14:32:08 +0000 (16:32 +0200)]
Merge remote-tracking branch 'mreitz/tags/pull-block-2017-10-06' into queue-block

Block patches

# gpg: Signature made Fri Oct  6 16:30:57 2017 CEST
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2017-10-06:
  block/mirror: check backing in bdrv_mirror_top_flush
  qcow2: truncate the tail of the image file after shrinking the image
  qcow2: fix return error code in qcow2_truncate()
  iotests: Fix 195 if IMGFMT is part of TEST_DIR
  block/mirror: check backing in bdrv_mirror_top_refresh_filename
  block: support passthrough of BDRV_REQ_FUA in crypto driver
  block: convert qcrypto_block_encrypt|decrypt to take bytes offset
  block: convert crypto driver to bdrv_co_preadv|pwritev
  block: fix data type casting for crypto payload offset
  crypto: expose encryption sector size in APIs
  block: use 1 MB bounce buffers for crypto instead of 16KB

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoblock/mirror: check backing in bdrv_mirror_top_flush
Vladimir Sementsov-Ogievskiy [Fri, 29 Sep 2017 15:22:55 +0000 (18:22 +0300)]
block/mirror: check backing in bdrv_mirror_top_flush

Backing may be zero after failed bdrv_append in mirror_start_job,
which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170929152255.5431-1-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agoqcow2: truncate the tail of the image file after shrinking the image
Pavel Butsykin [Fri, 29 Sep 2017 12:16:13 +0000 (15:16 +0300)]
qcow2: truncate the tail of the image file after shrinking the image

Now after shrinking the image, at the end of the image file, there might be a
tail that probably will never be used. So we can find the last used cluster and
cut the tail.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20170929121613.25997-3-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agoqcow2: fix return error code in qcow2_truncate()
Pavel Butsykin [Fri, 29 Sep 2017 12:16:12 +0000 (15:16 +0300)]
qcow2: fix return error code in qcow2_truncate()

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170929121613.25997-2-pbutsykin@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agoiotests: Fix 195 if IMGFMT is part of TEST_DIR
Max Reitz [Wed, 27 Sep 2017 21:13:34 +0000 (23:13 +0200)]
iotests: Fix 195 if IMGFMT is part of TEST_DIR

do_run_qemu() in iotest 195 first applies _filter_imgfmt when printing
qemu's command line and _filter_testdir only afterwards.  Therefore, if
the image format is part of the test directory path, _filter_testdir
will no longer apply and the actual output will differ from the
reference output even in case of success.

For example, TEST_DIR might be "/tmp/test-qcow2", in which case
_filter_imgfmt first transforms this to "/tmp/test-IMGFMT" which is no
longer recognized as the TEST_DIR by _filter_testdir.

Fix this by not applying _filter_imgfmt in do_run_qemu() but in
run_qemu() instead, and only after _filter_testdir.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20170927211334.3988-1-mreitz@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agoblock/mirror: check backing in bdrv_mirror_top_refresh_filename
Vladimir Sementsov-Ogievskiy [Thu, 28 Sep 2017 12:03:00 +0000 (15:03 +0300)]
block/mirror: check backing in bdrv_mirror_top_refresh_filename

Backing may be zero after failed bdrv_attach_child in
bdrv_set_backing_hd, which leads to SIGSEGV.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20170928120300.58164-1-vsementsov@virtuozzo.com
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agoblock: support passthrough of BDRV_REQ_FUA in crypto driver
Daniel P. Berrange [Wed, 27 Sep 2017 12:53:40 +0000 (13:53 +0100)]
block: support passthrough of BDRV_REQ_FUA in crypto driver

The BDRV_REQ_FUA flag can trivially be allowed in the crypt driver
as a passthrough to the underlying block driver.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-7-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agoblock: convert qcrypto_block_encrypt|decrypt to take bytes offset
Daniel P. Berrange [Wed, 27 Sep 2017 12:53:39 +0000 (13:53 +0100)]
block: convert qcrypto_block_encrypt|decrypt to take bytes offset

Instead of sector offset, take the bytes offset when encrypting
or decrypting data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-6-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agoblock: convert crypto driver to bdrv_co_preadv|pwritev
Daniel P. Berrange [Wed, 27 Sep 2017 12:53:38 +0000 (13:53 +0100)]
block: convert crypto driver to bdrv_co_preadv|pwritev

Make the crypto driver implement the bdrv_co_preadv|pwritev
callbacks, and also use bdrv_co_preadv|pwritev for I/O
with the protocol driver beneath. This replaces sector based
I/O with byte based I/O, and allows us to stop assuming the
physical sector size matches the encryption sector size.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-5-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agoblock: fix data type casting for crypto payload offset
Daniel P. Berrange [Wed, 27 Sep 2017 12:53:37 +0000 (13:53 +0100)]
block: fix data type casting for crypto payload offset

The crypto APIs report the offset of the data payload as an uint64_t
type, but the block driver is casting to size_t or ssize_t which will
potentially truncate.

Most of the block APIs use int64_t for offsets meanwhile, so even if
using uint64_t in the crypto block driver we are still at risk of
truncation.

Change the block crypto driver to use uint64_t, but add asserts that
the value is less than INT64_MAX.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-4-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agocrypto: expose encryption sector size in APIs
Daniel P. Berrange [Wed, 27 Sep 2017 12:53:36 +0000 (13:53 +0100)]
crypto: expose encryption sector size in APIs

While current encryption schemes all have a fixed sector size of
512 bytes, this is not guaranteed to be the case in future. Expose
the sector size in the APIs so the block layer can remove assumptions
about fixed 512 byte sectors.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-3-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agoblock: use 1 MB bounce buffers for crypto instead of 16KB
Daniel P. Berrange [Wed, 27 Sep 2017 12:53:35 +0000 (13:53 +0100)]
block: use 1 MB bounce buffers for crypto instead of 16KB

Using 16KB bounce buffers creates a significant performance
penalty for I/O to encrypted volumes on storage which high
I/O latency (rotating rust & network drives), because it
triggers lots of fairly small I/O operations.

On tests with rotating rust, and cache=none|directsync,
write speed increased from 2MiB/s to 32MiB/s, on a par
with that achieved by the in-kernel luks driver. With
other cache modes the in-kernel driver is still notably
faster because it is able to report completion of the
I/O request before any encryption is done, while the
in-QEMU driver must encrypt the data before completion.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20170927125340.12360-2-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
6 years agoiotests: Add test 197 for covering copy-on-read
Eric Blake [Thu, 5 Oct 2017 19:02:48 +0000 (14:02 -0500)]
iotests: Add test 197 for covering copy-on-read

Add a test for qcow2 copy-on-read behavior, including exposure
for the just-fixed bugs.

The copy-on-read behavior is always to a qcow2 image, but the
test is careful to allow running with most image protocol/format
combos as the backing file being copied from (luks being the
exception, as it is harder to pass the right secret to all the
right places).  In fact, for './check nbd', this appears to be
the first time we've had a qcow2 image wrapping NBD, requiring
an additional line in _filter_img_create to match the similar
line in _filter_img_info.

Invoking blkdebug to prove we don't write too much took some
effort to get working; and it requires that $TEST_WRAP (based
on $TEST_DIR) not be subject to word splitting.  We may decide
later to have the entire iotests suite use relative rather than
absolute names, to avoid problems inherited by the absolute
name of $PWD or $TEST_DIR, at which point the sanity check in
this commit could be simplified.

This test requires at least 2G of consecutive memory to succeed;
as such, it is prone to spurious failures, particularly on
32-bit machines under load.  This situation is detected and
triggers an early exit to skip the test, rather than a failure.
To manually provoke this setup on a beefier machine, I used:
  $ (ulimit -S -v 1000000; ./check -qcow2 197)

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoblock: Perform copy-on-read in loop
Eric Blake [Thu, 5 Oct 2017 19:02:47 +0000 (14:02 -0500)]
block: Perform copy-on-read in loop

Improve our braindead copy-on-read implementation.  Pre-patch,
we have multiple issues:
- we create a bounce buffer and perform a write for the entire
request, even if the active image already has 99% of the
clusters occupied, and really only needs to copy-on-read the
remaining 1% of the clusters
- our bounce buffer was as large as the read request, and can
needlessly exhaust our memory by using double the memory of
the request size (the original request plus our bounce buffer),
rather than a capped maximum overhead beyond the original
- if a driver has a max_transfer limit, we are bypassing the
normal code in bdrv_aligned_preadv() that fragments to that
limit, and instead attempt to read the entire buffer from the
driver in one go, which some drivers may assert on
- a client can request a large request of nearly 2G such that
rounding the request out to cluster boundaries results in a
byte count larger than 2G.  While this cannot exceed 32 bits,
it DOES have some follow-on problems:
-- the call to bdrv_driver_pread() can assert for exceeding
BDRV_REQUEST_MAX_BYTES, if the driver is old and lacks
.bdrv_co_preadv
-- if the buffer is all zeroes, the subsequent call to
bdrv_co_do_pwrite_zeroes is a no-op due to a negative size,
which means we did not actually copy on read

Fix all of these issues by breaking up the action into a loop,
where each iteration is capped to sane limits.  Also, querying
the allocation status allows us to optimize: when data is
already present in the active layer, we don't need to bounce.

Note that the code has a telling comment that copy-on-read
should probably be a filter driver rather than a bolt-on hack
in io.c; but that remains a task for another day.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoblock: Add blkdebug hook for copy-on-read
Eric Blake [Thu, 5 Oct 2017 19:02:46 +0000 (14:02 -0500)]
block: Add blkdebug hook for copy-on-read

Make it possible to inject errors on writes performed during a
read operation due to copy-on-read semantics.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoiotests: Restore stty settings on completion
Eric Blake [Thu, 5 Oct 2017 19:02:45 +0000 (14:02 -0500)]
iotests: Restore stty settings on completion

Executing qemu with a terminal as stdin will temporarily alter stty
settings on that terminal (for example, disabling echo), because of
how we run both the monitor and any multiplexing with guest input.
Normally, qemu restores the original settings on exit; but if an
iotest triggers qemu to abort in the middle, we can be left with
the altered terminal setup.  This can make life very annoying when
debugging an iotest failure (not everyone remembers the trick of
blind-typing 'stty sane' without echo, and some people prefer
terminal settings that are slightly different than the defaults
picked by 'stty sane').

It is possible to avoid qemu corrupting the terminal by not passing
a terminal to qemu's stdin in the first place (as in, use
'./check ... </dev/null'), but that's extra typing to have to
remember.  But running 'exec </dev/null' in the harness seems like
it might be too heavy of a hammer.  So I instead went the the
solution of saving and restoring the stty settings, only when the
harness detects that it is run interactively.

I tested this patch by forcing an allocation failure (I can't
guarantee that this particular limit will work on all setups, but
it shows the idea):
 $ (ulimit -S -v 500000; ./check -qcow2 1)

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoblock: Uniform handling of 0-length bdrv_get_block_status()
Eric Blake [Thu, 5 Oct 2017 19:02:44 +0000 (14:02 -0500)]
block: Uniform handling of 0-length bdrv_get_block_status()

Handle a 0-length block status request up front, with a uniform
return value claiming the area is not allocated.

Most callers don't pass a length of 0 to bdrv_get_block_status()
and friends; but it definitely happens with a 0-length read when
copy-on-read is enabled.  While we could audit all callers to
ensure that they never make a 0-length request, and then assert
that fact, it was just as easy to fix things to always report
success (as long as the callers are careful to not go into an
infinite loop).  However, we had inconsistent behavior on whether
the status is reported as allocated or defers to the backing
layer, depending on what callbacks the driver implements, and
possibly wasting quite a few CPU cycles to get to that answer.
Consistently reporting unallocated up front doesn't really hurt
anything, and makes it easier both for callers (0-length requests
now have well-defined behavior) and for drivers (drivers don't
have to deal with 0-length requests).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-io: Add -C for opening with copy-on-read
Eric Blake [Thu, 5 Oct 2017 19:02:43 +0000 (14:02 -0500)]
qemu-io: Add -C for opening with copy-on-read

Make it easier to enable copy-on-read during iotests, by
exposing a new bool option to main and open.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agocommit: Remove overlay_bs
Kevin Wolf [Tue, 27 Jun 2017 18:36:18 +0000 (20:36 +0200)]
commit: Remove overlay_bs

We don't need to make any assumptions about the graph layout above the
top node of the commit operation any more. Remove the use of
bdrv_find_overlay() and related variables from the commit job code.

bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so
we can just drop it.

The overlay node was previously added to the block job to get a
BLK_PERM_GRAPH_MOD. We really need to respect those permissions in
bdrv_drop_intermediate() now, but as long as we haven't figured out yet
how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO
comment there.

With this change, it is now possible to perform another block job on an
overlay node without conflicts. qemu-iotests 030 is changed accordingly.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
6 years agoqemu-iotests: Test commit block job where top has two parents
Kevin Wolf [Thu, 29 Jun 2017 17:56:48 +0000 (19:56 +0200)]
qemu-iotests: Test commit block job where top has two parents

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-iotests: Allow QMP pretty printing in common.qemu
Kevin Wolf [Thu, 29 Jun 2017 16:54:24 +0000 (18:54 +0200)]
qemu-iotests: Allow QMP pretty printing in common.qemu

QMP responses to certain commands can become quite long, which doesn't
only make reading them hard, but also means that the maximum line length
in patch emails can be exceeded. Allow tests to switch to QMP pretty
printing, which results in more, but shorter lines.

We also need to make sure to keep indentation in the response for this
to work as expected.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
6 years agocommit: Support multiple roots above top node
Kevin Wolf [Tue, 19 Sep 2017 14:22:54 +0000 (16:22 +0200)]
commit: Support multiple roots above top node

This changes the commit block job to support operation in a graph where
there is more than a single active layer that references the top node.

This involves inserting the commit filter node not only on the path
between the given active node and the top node, but between the top node
and all of its parents.

On completion, bdrv_drop_intermediate() must consider all parents for
updating the backing file link. These parents may be backing files
themselves and as such read-only; reopen them temporarily if necessary.
Previously this was achieved by the bdrv_reopen() calls in the commit
block job that made overlay_bs read-write for the whole duration of the
block job, even though write access is only needed on completion.

Now that we consider all parents, overlay_bs is meaningless. It is left
in place in this commit, but we'll remove it soon.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoblock: Introduce BdrvChildRole.update_filename
Kevin Wolf [Thu, 29 Jun 2017 17:32:21 +0000 (19:32 +0200)]
block: Introduce BdrvChildRole.update_filename

There is no good reason for bdrv_drop_intermediate() to know the active
layer above the subchain it is operating on - even more so, because
the assumption that there is a single active layer above it is not
generally true.

In order to prepare removal of the active parameter, use a BdrvChildRole
callback to update the backing file string in the overlay image instead
of directly calling bdrv_change_backing_file().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
6 years agoqemu-iotests: merge "check" and "common"
Paolo Bonzini [Tue, 12 Sep 2017 14:44:59 +0000 (16:44 +0200)]
qemu-iotests: merge "check" and "common"

"check" is full of qemu-iotests--specific details.  Separating it
from "common" does not make much sense anymore.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-iotests: get rid of $iam
Paolo Bonzini [Tue, 12 Sep 2017 14:44:58 +0000 (16:44 +0200)]
qemu-iotests: get rid of $iam

The variable is almost unused, and one of the two uses is actually
uninitialized.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-iotests: fix uninitialized variable
Paolo Bonzini [Tue, 12 Sep 2017 14:44:57 +0000 (16:44 +0200)]
qemu-iotests: fix uninitialized variable

The variable is used in "common" but defined only after the file
is sourced.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-iotests: disintegrate more parts of common.config
Paolo Bonzini [Tue, 12 Sep 2017 14:44:56 +0000 (16:44 +0200)]
qemu-iotests: disintegrate more parts of common.config

Split "check" parts from tests part.

For the directory setup, the actual computation of directories goes
in "check", while the sanity checks go in the tests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-iotests: do not include common.rc in "check"
Paolo Bonzini [Tue, 12 Sep 2017 14:44:55 +0000 (16:44 +0200)]
qemu-iotests: do not include common.rc in "check"

It only provides functions used by the test programs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-iotests: limit non-_PROG-suffixed variables to common.rc
Paolo Bonzini [Tue, 12 Sep 2017 14:44:54 +0000 (16:44 +0200)]
qemu-iotests: limit non-_PROG-suffixed variables to common.rc

These are never used by "check", with one exception that does not need
$QEMU_OPTIONS.  Keep them in common.rc, which will be soon included only
by the tests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-iotests: cleanup and fix search for programs
Paolo Bonzini [Tue, 12 Sep 2017 14:44:53 +0000 (16:44 +0200)]
qemu-iotests: cleanup and fix search for programs

Instead of ./check failing when a binary is missing, we try each test
case now and each one fails with tons of test case diffs.  Also, all the
variables were initialized by "check" prior to "common" being sourced,
and then (uselessly) checked for emptiness again in "check".

Centralize the search for programs in "common" (which will soon be
one with "check"), including the "realpath" invocation which can be done
just once in "check" rather than in the tests.

For qnio_server, move the detection to "common", simplifying
set_prog_path to stop handling the unused second argument, and
embedding the "realpath" pass.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-iotests: move "check" code out of common.rc
Paolo Bonzini [Tue, 12 Sep 2017 14:44:52 +0000 (16:44 +0200)]
qemu-iotests: move "check" code out of common.rc

Some functions in common.rc are never used by the tests.  Move
them out of that file and into common, which is already included
only by "check".

Code that actually *is* common to "check" and tests can be placed in
common.config.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-iotests: get rid of AWK_PROG
Paolo Bonzini [Tue, 12 Sep 2017 14:44:51 +0000 (16:44 +0200)]
qemu-iotests: get rid of AWK_PROG

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqemu-iotests: remove dead code
Paolo Bonzini [Tue, 12 Sep 2017 14:44:50 +0000 (16:44 +0200)]
qemu-iotests: remove dead code

This includes shell function, shell variables and command line options
(randomize.awk does not exist).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agohw/block/onenand: Remove dead code block
Thomas Huth [Tue, 3 Oct 2017 09:57:24 +0000 (11:57 +0200)]
hw/block/onenand: Remove dead code block

The condition of the for-loop makes sure that b is always smaller
than s->blocks, so the "if (b >= s->blocks)" statement is completely
superfluous here.

Buglink: https://bugs.launchpad.net/qemu/+bug/1715007
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Convert internal hbitmap size/granularity
Eric Blake [Mon, 25 Sep 2017 14:55:26 +0000 (09:55 -0500)]
dirty-bitmap: Convert internal hbitmap size/granularity

Now that all callers are using byte-based interfaces, there's no
reason for our internal hbitmap to remain with sector-based
granularity.  It also simplifies our internal scaling, since we
already know that hbitmap widens requests out to granularity
boundaries.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Switch bdrv_set_dirty() to bytes
Eric Blake [Mon, 25 Sep 2017 14:55:25 +0000 (09:55 -0500)]
dirty-bitmap: Switch bdrv_set_dirty() to bytes

Both callers already had bytes available, but were scaling to
sectors.  Move the scaling to internal code.  In the case of
bdrv_aligned_pwritev(), we are now passing the exact offset
rather than a rounded sector-aligned value, but that's okay
as long as dirty bitmap widens start/bytes to granularity
boundaries.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqcow2: Switch store_bitmap_data() to byte-based iteration
Eric Blake [Mon, 25 Sep 2017 14:55:24 +0000 (09:55 -0500)]
qcow2: Switch store_bitmap_data() to byte-based iteration

Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

iotests 165 was rather weak - on a default 64k-cluster image, where
bitmap granularity also defaults to 64k bytes, a single cluster of
the bitmap table thus covers (64*1024*8) bits which each cover 64k
bytes, or 32G of image space.  But the test only uses a 1G image,
so it cannot trigger any more than one loop of the code in
store_bitmap_data(); and it was writing to the first cluster.  In
order to test that we are properly aligning which portions of the
bitmap are being written to the file, we really want to test a case
where the first dirty bit returned by bdrv_dirty_iter_next() is not
aligned to the start of a cluster, which we can do by modifying the
test to write data that doesn't happen to fall in the first cluster
of the image.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqcow2: Switch load_bitmap_data() to byte-based iteration
Eric Blake [Mon, 25 Sep 2017 14:55:23 +0000 (09:55 -0500)]
qcow2: Switch load_bitmap_data() to byte-based iteration

Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqcow2: Switch qcow2_measure() to byte-based iteration
Eric Blake [Mon, 25 Sep 2017 14:55:22 +0000 (09:55 -0500)]
qcow2: Switch qcow2_measure() to byte-based iteration

This is new code, but it is easier to read if it makes passes over
the image using bytes rather than sectors (and will get easier in
the future when bdrv_get_block_status is converted to byte-based).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agomirror: Switch mirror_dirty_init() to byte-based iteration
Eric Blake [Mon, 25 Sep 2017 14:55:21 +0000 (09:55 -0500)]
mirror: Switch mirror_dirty_init() to byte-based iteration

Now that we have adjusted the majority of the calls this function
makes to be byte-based, it is easier to read the code if it makes
passes over the image using bytes rather than sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes
Eric Blake [Mon, 25 Sep 2017 14:55:20 +0000 (09:55 -0500)]
dirty-bitmap: Change bdrv_[re]set_dirty_bitmap() to use bytes

Some of the callers were already scaling bytes to sectors; others
can be easily converted to pass byte offsets, all in our shift
towards a consistent byte interface everywhere.  Making the change
will also make it easier to write the hold-out callers to use byte
rather than sectors for their iterations; it also makes it easier
for a future dirty-bitmap patch to offload scaling over to the
internal hbitmap.  Although all callers happen to pass
sector-aligned values, make the internal scaling robust to any
sub-sector requests.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Change bdrv_get_dirty_locked() to take bytes
Eric Blake [Mon, 25 Sep 2017 14:55:19 +0000 (09:55 -0500)]
dirty-bitmap: Change bdrv_get_dirty_locked() to take bytes

Half the callers were already scaling bytes to sectors; the other
half can eventually be simplified to use byte iteration.  Both
callers were already using the result as a bool, so make that
explicit.  Making the change also makes it easier for a future
dirty-bitmap patch to offload scaling over to the internal hbitmap.

Remember, asking whether a byte is dirty is effectively asking
whether the entire granularity containing the byte is dirty, since
we only track dirtiness by granularity.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Change bdrv_get_dirty_count() to report bytes
Eric Blake [Mon, 25 Sep 2017 14:55:18 +0000 (09:55 -0500)]
dirty-bitmap: Change bdrv_get_dirty_count() to report bytes

Thanks to recent cleanups, all callers were scaling a return value
of sectors into bytes; do the scaling internally instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset
Eric Blake [Mon, 25 Sep 2017 14:55:17 +0000 (09:55 -0500)]
dirty-bitmap: Change bdrv_dirty_iter_next() to report byte offset

Thanks to recent cleanups, most callers were scaling a return value
of sectors into bytes (the exception, in qcow2-bitmap, will be
converted to byte-based iteration later).  Update the interface to
do the scaling internally instead.

In qcow2-bitmap, the code was specifically checking for an error
return of -1.  To avoid a regression, we either have to make sure
we continue to return -1 (rather than a scaled -512) on error, or
we have to fix the caller to treat all negative values as error
rather than just one magic value.  It's easy enough to make both
changes at the same time, even though either one in isolation
would work.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Set iterator start by offset, not sector
Eric Blake [Mon, 25 Sep 2017 14:55:16 +0000 (09:55 -0500)]
dirty-bitmap: Set iterator start by offset, not sector

All callers to bdrv_dirty_iter_new() passed 0 for their initial
starting point, drop that parameter.

Most callers to bdrv_set_dirty_iter() were scaling a byte offset to
a sector number; the exception qcow2-bitmap will be converted later
to use byte rather than sector iteration.  Move the scaling to occur
internally to dirty bitmap code instead, so that callers now pass
in bytes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqcow2: Switch sectors_covered_by_bitmap_cluster() to byte-based
Eric Blake [Mon, 25 Sep 2017 14:55:15 +0000 (09:55 -0500)]
qcow2: Switch sectors_covered_by_bitmap_cluster() to byte-based

We are gradually converting to byte-based interfaces, as they are
easier to reason about than sector-based.  Change the qcow2 bitmap
helper function sectors_covered_by_bitmap_cluster(), renaming it
to bytes_covered_by_bitmap_cluster() in the process.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Change bdrv_dirty_bitmap_*serialize*() to take bytes
Eric Blake [Mon, 25 Sep 2017 14:55:14 +0000 (09:55 -0500)]
dirty-bitmap: Change bdrv_dirty_bitmap_*serialize*() to take bytes

Right now, the dirty-bitmap code exposes the fact that we use
a scale of sector granularity in the underlying hbitmap to anything
that wants to serialize a dirty bitmap.  It's nicer to uniformly
expose bytes as our dirty-bitmap interface, matching the previous
change to bitmap size.  The only caller to serialization is currently
qcow2-cluster.c, which becomes a bit more verbose because it is still
tracking sectors for other reasons, but a later patch will fix that
to more uniformly use byte offsets everywhere.  Likewise, within
dirty-bitmap, we have to add more assertions that we are not
truncating incorrectly, which can go away once the internal hbitmap
is byte-based rather than sector-based.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Track bitmap size by bytes
Eric Blake [Mon, 25 Sep 2017 14:55:13 +0000 (09:55 -0500)]
dirty-bitmap: Track bitmap size by bytes

We are still using an internal hbitmap that tracks a size in sectors,
with the granularity scaled down accordingly, because it lets us
use a shortcut for our iterators which are currently sector-based.
But there's no reason we can't track the dirty bitmap size in bytes,
since it is (mostly) an internal-only variable (remember, the size
is how many bytes are covered by the bitmap, not how many bytes the
bitmap occupies).  A later cleanup will convert dirty bitmap
internals to be entirely byte-based, eliminating the intermediate
sector rounding added here; and technically, since bdrv_getlength()
already rounds up to sectors, our use of DIV_ROUND_UP is more for
theoretical completeness than for any actual rounding.

Use is_power_of_2() while at it, instead of open-coding that.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Change bdrv_dirty_bitmap_size() to report bytes
Eric Blake [Mon, 25 Sep 2017 14:55:12 +0000 (09:55 -0500)]
dirty-bitmap: Change bdrv_dirty_bitmap_size() to report bytes

We're already reporting bytes for bdrv_dirty_bitmap_granularity();
mixing bytes and sectors in our return values is a recipe for
confusion.  A later cleanup will convert dirty bitmap internals
to be entirely byte-based, but in the meantime, we should report
the bitmap size in bytes.

The only external caller in qcow2-bitmap.c is temporarily more verbose
(because it is still using sector-based math), but will later be
switched to track progress by bytes instead of sectors.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Avoid size query failure during truncate
Eric Blake [Mon, 25 Sep 2017 14:55:11 +0000 (09:55 -0500)]
dirty-bitmap: Avoid size query failure during truncate

We've previously fixed several places where we failed to account
for possible errors from bdrv_nb_sectors().  Fix another one by
making bdrv_dirty_bitmap_truncate() take the new size from the
caller instead of querying itself; then adjust the sole caller
bdrv_truncate() to pass the size just determined by a successful
resize, or to reuse the size given to the original truncate
operation when refresh_total_sectors() was not able to confirm the
actual size (the two sizes can potentially differ according to
rounding constraints), thus avoiding sizing the bitmaps to -1.
This also fixes a bug where not all failure paths in
bdrv_truncate() would set errp.

Note that bdrv_truncate() is still a bit awkward.  We may want
to revisit it later and clean up things to better guarantee that
a resize attempt either fails cleanly up front, or cannot fail
after guest-visible changes have been made (if temporary changes
are made, then they need to be cleanly rolled back).  But that
is a task for another day; for now, the goal is the bare minimum
fix to ensure that just bdrv_dirty_bitmap_truncate() cannot fail.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agodirty-bitmap: Drop unused functions
Eric Blake [Mon, 25 Sep 2017 14:55:10 +0000 (09:55 -0500)]
dirty-bitmap: Drop unused functions

We had several functions that no one is currently using, and which
use sector-based interfaces.  I'm trying to convert towards byte-based
interfaces, so it's easier to just drop the unused functions:

bdrv_dirty_bitmap_get_meta
bdrv_dirty_bitmap_get_meta_locked
bdrv_dirty_bitmap_reset_meta
bdrv_dirty_bitmap_meta_granularity

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoqcow2: Ensure bitmap serialization is aligned
Eric Blake [Mon, 25 Sep 2017 14:55:09 +0000 (09:55 -0500)]
qcow2: Ensure bitmap serialization is aligned

When subdividing a bitmap serialization, the code in hbitmap.c
enforces that start/count parameters are aligned (except that
count can end early at end-of-bitmap).  We exposed this required
alignment through bdrv_dirty_bitmap_serialization_align(), but
forgot to actually check that we comply with it.

Fortunately, qcow2 is never dividing bitmap serialization smaller
than one cluster (which is a minimum of 512 bytes); so we are
always compliant with the serialization alignment (which insists
that we partition at least 64 bits per chunk) because we are doing
at least 4k bits per chunk.

Still, it's safer to add an assertion (for the unlikely case that
we'd ever support a cluster smaller than 512 bytes, or if the
hbitmap implementation changes what it considers to be aligned),
rather than leaving bdrv_dirty_bitmap_serialization_align()
without a caller.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agohbitmap: Rename serialization_granularity to serialization_align
Eric Blake [Mon, 25 Sep 2017 14:55:08 +0000 (09:55 -0500)]
hbitmap: Rename serialization_granularity to serialization_align

The only client of hbitmap_serialization_granularity() is dirty-bitmap's
bdrv_dirty_bitmap_serialization_align().  Keeping the two names consistent
is worthwhile, and the shorter name is more representative of what the
function returns (the required alignment to be used for start/count of
other serialization functions, where violating the alignment causes
assertion failures).

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoblock: Make bdrv_img_create() size selection easier to read
Eric Blake [Mon, 25 Sep 2017 14:55:07 +0000 (09:55 -0500)]
block: Make bdrv_img_create() size selection easier to read

All callers of bdrv_img_create() pass in a size, or -1 to read the
size from the backing file.  We then set that size as the QemuOpt
default, which means we will reuse that default rather than the
final parameter to qemu_opt_get_size() several lines later.  But
it is rather confusing to read subsequent checks of 'size == -1'
when it looks (without seeing the full context) like size defaults
to 0; it also doesn't help that a size of 0 is valid (for some
formats).

Rework the logic to make things more legible.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoblock: Typo fix in copy_on_readv()
Eric Blake [Thu, 28 Sep 2017 19:13:25 +0000 (14:13 -0500)]
block: Typo fix in copy_on_readv()

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
6 years agoMerge remote-tracking branch 'remotes/cohuck/tags/s390x-20171006' into staging
Peter Maydell [Fri, 6 Oct 2017 12:19:02 +0000 (13:19 +0100)]
Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20171006' into staging

s390x changes:
- support for IDA (indirect addressing in ccws) via ccw data stream
- support for extended TOD-Clock (z14 feature)
- various fixes and improvements all over the place

# gpg: Signature made Fri 06 Oct 2017 10:52:22 BST
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg:                 aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg:                 aka "Cornelia Huck <cohuck@kernel.org>"
# gpg:                 aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20171006: (33 commits)
  hw/s390x: Mark the "sclpquiesce" device with user_creatable = false
  s390x/tcg: initialize machine check queue
  s390x/sclp: mark sclp-cpu-hotplug as non-usercreatable
  s390x/sclp: Mark the sclp device with user_creatable = false
  s390/kvm: make TOD setting failures fatal for migration
  s390/kvm: Support for get/set of extended TOD-Clock for guest
  s390x/css: fix css migration compat handling
  s390x: sort some devices into categories
  s390x/tcg: make STFL store into the lowcore
  s390x: introduce and use S390_MAX_CPUS
  target/s390x: get rid of next_core_id
  s390x/cpumodel: fix max STFL(E) bit number
  s390x: raise CPU hotplug irq after really hotplugged
  MAINTAINERS: use KVM s390x maintainers for kvm-stubs.c and kvm_s390x.h
  s390x/3270: handle writes of arbitrary length
  s390x/3270: IDA support for 3270 via CcwDataStream
  Revert "s390x/ccw: create s390 phb conditionally"
  s390x/tcg: make idte/ipte use the new _real mmu
  s390x/tcg: make testblock use the new _real mmu
  s390x/tcg: make stora(g) use the new _real mmu
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
6 years agohw/s390x: Mark the "sclpquiesce" device with user_creatable = false
Thomas Huth [Thu, 5 Oct 2017 08:45:05 +0000 (10:45 +0200)]
hw/s390x: Mark the "sclpquiesce" device with user_creatable = false

The "sclpquiesce" device is just an internal device that should not be
created by the user directly. Though it currently does not seem to cause
any obvious trouble when the user instantiates an additional device, let's
better mark it with user_creatable = false to avoid unexpected behavior,
e.g. because the quiesce notifier gets registered multiple times.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1507193105-15627-1-git-send-email-thuth@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
6 years agos390x/tcg: initialize machine check queue
Cornelia Huck [Fri, 29 Sep 2017 14:49:59 +0000 (16:49 +0200)]
s390x/tcg: initialize machine check queue

Just as for external interrupts and I/O interrupts, we need to
initialize mchk_index during cpu reset.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
6 years agos390x/sclp: mark sclp-cpu-hotplug as non-usercreatable
Cornelia Huck [Wed, 4 Oct 2017 15:34:23 +0000 (17:34 +0200)]
s390x/sclp: mark sclp-cpu-hotplug as non-usercreatable

A TYPE_SCLP_CPU_HOTPLUG device for handling cpu hotplug events
is already created by the sclp event facility. Adding a second
TYPE_SCLP_CPU_HOTPLUG device via -device sclp-cpu-hotplug creates
an ambiguity in raise_irq_cpu_hotplug(), leading to a crash once
a cpu is hotplugged.

To fix this, disallow creating a sclp-cpu-hotplug device manually.

Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
6 years agos390x/sclp: Mark the sclp device with user_creatable = false
Thomas Huth [Wed, 4 Oct 2017 13:53:19 +0000 (15:53 +0200)]
s390x/sclp: Mark the sclp device with user_creatable = false

The "sclp" device is just an internal device that can not be instantiated
by the users. If they try to use it, they only get a simple error message:

$ qemu-system-s390x -nographic -device sclp
qemu-system-s390x: Option '-device s390-sclp-event-facility' cannot be
handled by this machine

Since sclp_init() tries to create a TYPE_SCLP_EVENT_FACILITY which is
a non-pluggable sysbus device, there is really no way that the "sclp"
device can be used by the user, so let's set the user_creatable = false
accordingly.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1507125199-22562-1-git-send-email-thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>