git.proxmox.com Git - mirror

block: fix QEMU crash with scsi-hd and drive_del

Removing a drive with drive_del while it is being used to run an I/O
intensive workload can cause QEMU to crash.

An AIO flush can yield at some point:

blk_aio_flush_entry()
blk_co_flush(blk)
  bdrv_co_flush(blk->root->bs)
   ...
    qemu_coroutine_yield()

and let the HMP command to run, free blk->root and give control
back to the AIO flush:

    hmp_drive_del()
     blk_remove_bs()
      bdrv_root_unref_child(blk->root)
       child_bs = blk->root->bs
       bdrv_detach_child(blk->root)
        bdrv_replace_child(blk->root, NULL)
         blk->root->bs = NULL
        g_free(blk->root) <============== blk->root becomes stale
       bdrv_unref(child_bs)
        bdrv_delete(child_bs)
         bdrv_close()
          bdrv_drained_begin()
           bdrv_do_drained_begin()
            bdrv_drain_recurse()
             aio_poll()
              ...
              qemu_coroutine_switch()

and the AIO flush completion ends up dereferencing blk->root:

  blk_aio_complete()
   scsi_aio_complete()
    blk_get_aio_context(blk)
     bs = blk_bs(blk)
ie, bs = blk->root ? blk->root->bs : NULL
            ^^^^^
            stale

The problem is that we should avoid making block driver graph
changes while we have in-flight requests. Let's drain all I/O
for this BB before calling bdrv_root_unref_child().

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

test-bdrv-drain: Test graph changes in drain_all section

This tests both adding and remove a node between bdrv_drain_all_begin()
and bdrv_drain_all_end(), and enabled the existing detach test for
drain_all.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Allow graph changes in bdrv_drain_all_begin/end sections

bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and
did a subtree drain for each of them. This works fine as long as the
graph is static, but sadly, reality looks different.

If the graph changes so that root nodes are added or removed, we would
have to compensate for this. bdrv_next() returns each root node only
once even if it's the root node for multiple BlockBackends or for a
monitor-owned block driver tree, which would only complicate things.

The much easier and more obviously correct way is to fundamentally
change the way the functions work: Iterate over all BlockDriverStates,
no matter who owns them, and drain them individually. Compensation is
only necessary when a new BDS is created inside a drain_all section.
Removal of a BDS doesn't require any action because it's gone afterwards
anyway.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: ignore_bds_parents parameter for drain functions

In the future, bdrv_drained_all_begin/end() will drain all invidiual
nodes separately rather than whole subtrees. This means that we don't
want to propagate the drain to all parents any more: If the parent is a
BDS, it will already be drained separately. Recursing to all parents is
unnecessary work and would make it an O(n²) operation.

Prepare the drain function for the changed drain_all by adding an
ignore_bds_parents parameter to the internal implementation that
prevents the propagation of the drain to BDS parents. We still (have to)
propagate it to non-BDS parents like BlockBackends or Jobs because those
are not drained separately.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Move bdrv_drain_all_begin() out of coroutine context

Before we can introduce a single polling loop for all nodes in
bdrv_drain_all_begin(), we must make sure to run it outside of coroutine
context like we already do for bdrv_do_drained_begin().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Allow AIO_WAIT_WHILE with NULL ctx

bdrv_drain_all() wants to have a single polling loop for draining the
in-flight requests of all nodes. This means that the AIO_WAIT_WHILE()
condition relies on activity in multiple AioContexts, which is polled
from the mainloop context. We must therefore call AIO_WAIT_WHILE() from
the mainloop thread and use the AioWait notification mechanism.

Just randomly picking the AioContext of any non-mainloop thread would
work, but instead of bothering to find such a context in the caller, we
can just as well accept NULL for ctx.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll

This adds a test case that goes wrong if bdrv_drain_invoke() calls
aio_poll().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Defer .bdrv_drain_begin callback to polling phase

We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
done with propagating the drain through the graph and are doing the
single final BDRV_POLL_WHILE().

Just schedule the coroutine with the callback and increase bs->in_flight
to make sure that the polling phase will wait for it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

test-bdrv-drain: Graph change through parent callback

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Don't poll in parent drain callbacks

bdrv_do_drained_begin() is only safe if we have a single
BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow
that parent callbacks introduce a nested polling loop that could cause
graph changes while we're traversing the graph.

Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single
node without waiting for its requests to complete. These requests will
be waited for in the BDRV_POLL_WHILE() call down the call chain.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

test-bdrv-drain: Test node deletion in subtree recursion

If bdrv_do_drained_begin() polls during its subtree recursion, the graph
can change and mess up the bs->children iteration. Test that this
doesn't happen.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Drain recursively with a single BDRV_POLL_WHILE()

Anything can happen inside BDRV_POLL_WHILE(), including graph
changes that may interfere with its callers (e.g. child list iteration
in recursive callers of bdrv_do_drained_begin).

Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the
end of bdrv_do_drained_begin() to avoid such effects. The recursion
happens now inside the loop condition. As the graph can only change
between bdrv_drain_poll() calls, but not inside of it, doing the
recursion here is safe.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

test-bdrv-drain: Add test for node deletion

This patch adds two bdrv-drain tests for what happens if some BDS goes
away during the drainage.

The basic idea is that you have a parent BDS with some child nodes.
Then, you drain one of the children. Because of that, the party who
actually owns the parent decides to (A) delete it, or (B) detach all its
children from it -- both while the child is still being drained.

A real-world case where this can happen is the mirror block job, which
may exit if you drain one of its children.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Remove bdrv_drain_recurse()

For bdrv_drain(), recursively waiting for child node requests is
pointless because we didn't quiesce their parents, so new requests could
come in anyway. Letting the function work only on a single node makes it
more consistent.

For subtree drains and drain_all, we already have the recursion in
bdrv_do_drained_begin(), so the extra recursion doesn't add anything
either.

Remove the useless code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Really pause block jobs on drain

We already requested that block jobs be paused in .bdrv_drained_begin,
but no guarantee was made that the job was actually inactive at the
point where bdrv_drained_begin() returned.

This introduces a new callback BdrvChildRole.bdrv_drained_poll() and
uses it to make bdrv_drain_poll() consider block jobs using the node to
be drained.

For the test case to work as expected, we have to switch from
block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
considered active and must be waited for when draining the node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()

Commit 91af091f923 added an additional aio_poll() to BDRV_POLL_WHILE()
in order to make sure that all pending BHs are executed on drain. This
was the wrong place to make the fix, as it is useless overhead for all
other users of the macro and unnecessarily complicates the mechanism.

This patch effectively reverts said commit (the context has changed a
bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the
loop condition for drain.

The effect is probably hard to measure in any real-world use case
because actual I/O will dominate, but if I run only the initialisation
part of 'qemu-img convert' where it calls bdrv_block_status() for the
whole image to find out how much data there is copy, this phase actually
needs only roughly half the time after this patch.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now

Since we use bdrv_do_drained_begin/end() for bdrv_drain_all_begin/end(),
coroutine context is automatically left with a BH, preventing the
deadlocks that made bdrv_drain_all*() unsafe in coroutine context. Now
that we even removed the old polling code as dead code, it's obvious
that it's compatible now.

Enable the coroutine test cases for bdrv_drain_all().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Don't manually poll in bdrv_drain_all()

All involved nodes are already idle, we called bdrv_do_drain_begin() on
them.

The comment in the code suggested that this was not correct because the
completion of a request on one node could spawn a new request on a
different node (which might have been drained before, so we wouldn't
drain the new request). In reality, new requests to different nodes
aren't spawned out of nothing, but only in the context of a parent
request, and they aren't submitted to random nodes, but only to child
nodes. As long as we still poll for the completion of the parent request
(which we do), draining each root node separately is good enough.

Remove the additional polling code from bdrv_drain_all_begin() and
replace it with an assertion that all nodes are already idle after we
drained them separately.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Remove 'recursive' parameter from bdrv_drain_invoke()

All callers pass false for the 'recursive' parameter now. Remove it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Use bdrv_do_drain_begin/end in bdrv_drain_all()

bdrv_do_drain_begin/end() implement already everything that
bdrv_drain_all_begin/end() need and currently still do manually: Disable
external events, call parent drain callbacks, call block driver
callbacks.

It also does two more things:

The first is incrementing bs->quiesce_counter. bdrv_drain_all() already
stood out in the test case by behaving different from the other drain
variants. Adding this is not only safe, but in fact a bug fix.

The second is calling bdrv_drain_recurse(). We already do that later in
the same function in a loop, so basically doing an early first iteration
doesn't hurt.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

test-bdrv-drain: bdrv_drain() works with cross-AioContext events

As long as nobody keeps the other I/O thread from working, there is no
reason why bdrv_drain() wouldn't work with cross-AioContext events. The
key is that the root request we're waiting for is in the AioContext
we're polling (which it always is for bdrv_drain()) so that aio_poll()
is woken up in the end.

Add a test case that shows that it works. Remove the comment in
bdrv_drain() that claims otherwise.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20180615a' into staging

Migration pull 2018-06-15

# gpg: Signature made Fri 15 Jun 2018 16:13:17 BST
# gpg:                using RSA key 0516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20180615a:
  migration: calculate expected_downtime with ram_bytes_remaining()
  migration/postcopy: Wake rate limit sleep on postcopy request
  migration: Wake rate limiting for urgent requests
  migration/postcopy: Add max-postcopy-bandwidth parameter
  migration: introduce migration_update_rates
  migration: fix counting xbzrle cache_miss_rate
  migration/block-dirty-bitmap: fix dirty_bitmap_load
  migration: Poison ramblock loops in migration
  migration: Fixes for non-migratable RAMBlocks
  typedefs: add QJSON

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/edgar/tags/edgar/xilinx-next-2018-06-15.for-upstream' into staging

xilinx-next-2018-06-15.for-upstream

# gpg: Signature made Fri 15 Jun 2018 15:32:47 BST
# gpg:                using RSA key 29C596780F6BCA83
# gpg: Good signature from "Edgar E. Iglesias (Xilinx key) <edgar.iglesias@xilinx.com>"
# gpg:                 aka "Edgar E. Iglesias <edgar.iglesias@gmail.com>"
# Primary key fingerprint: AC44 FEDC 14F7 F1EB EDBF  4151 29C5 9678 0F6B CA83

* remotes/edgar/tags/edgar/xilinx-next-2018-06-15.for-upstream:
  target-microblaze: Rework NOP/zero instruction handling
  target-microblaze: mmu: Correct masking of output addresses

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Fix options that work only with -drive or -blockdev, but not with
  both, because of QDict type confusion
- rbd: Add options 'auth-client-required' and 'key-secret'
- Remove deprecated -drive options serial/addr/cyls/heads/secs/trans
- rbd, iscsi: Remove deprecated 'filename' option
- Fix 'qemu-img map' crash with unaligned image size
- Improve QMP documentation for jobs

# gpg: Signature made Fri 15 Jun 2018 15:20:03 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (26 commits)
  block: Remove dead deprecation warning code
  block: Remove deprecated -drive option serial
  block: Remove deprecated -drive option addr
  block: Remove deprecated -drive geometry options
  rbd: New parameter key-secret
  rbd: New parameter auth-client-required
  block: Fix -blockdev / blockdev-add for empty objects and arrays
  check-block-qdict: Cover flattening of empty lists and dictionaries
  check-block-qdict: Rename qdict_flatten()'s variables for clarity
  block-qdict: Simplify qdict_is_list() some
  block-qdict: Clean up qdict_crumple() a bit
  block-qdict: Tweak qdict_flatten_qdict(), qdict_flatten_qlist()
  block-qdict: Simplify qdict_flatten_qdict()
  block: Make remaining uses of qobject input visitor more robust
  block: Factor out qobject_input_visitor_new_flat_confused()
  block: Clean up a misuse of qobject_to() in .bdrv_co_create_opts()
  block: Fix -drive for certain non-string scalars
  block: Fix -blockdev for certain non-string scalars
  qobject: Move block-specific qdict code to block-qdict.c
  block: Add block-specific QDict header
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20180615' into staging

target-arm and miscellaneous queue:
* fix KVM state save/restore for GICv3 priority registers for high IRQ numbers
* hw/arm/mps2-tz: Put ethernet controller behind PPC
* hw/sh/sh7750: Convert away from old_mmio
* hw/m68k/mcf5206: Convert away from old_mmio
* hw/block/pflash_cfi02: Convert away from old_mmio
* hw/watchdog/wdt_i6300esb: Convert away from old_mmio
* hw/input/pckbd: Convert away from old_mmio
* hw/char/parallel: Convert away from old_mmio
* armv7m: refactor to get rid of armv7m_init() function
* arm: Don't crash if user tries to use a Cortex-M CPU without an NVIC
* hw/core/or-irq: Support more than 16 inputs to an OR gate
* cpu-defs.h: Document CPUIOTLBEntry 'addr' field
* cputlb: Pass cpu_transaction_failed() the correct physaddr
* CODING_STYLE: Define our preferred form for multiline comments
* Add and use new stn_*_p() and ldn_*_p() memory access functions
* target/arm: More parts of the upcoming SVE support
* aspeed_scu: Implement RNG register
* m25p80: add support for two bytes WRSR for Macronix chips
* exec.c: Handle IOMMUs being in the path of TCG CPU memory accesses
* target/arm: Allow ARMv6-M Thumb2 instructions

# gpg: Signature made Fri 15 Jun 2018 15:24:03 BST
# gpg:                using RSA key 3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20180615: (43 commits)
  target/arm: Allow ARMv6-M Thumb2 instructions
  exec.c: Handle IOMMUs in address_space_translate_for_iotlb()
  iommu: Add IOMMU index argument to translate method
  iommu: Add IOMMU index argument to notifier APIs
  iommu: Add IOMMU index concept to IOMMU API
  m25p80: add support for two bytes WRSR for Macronix chips
  aspeed_scu: Implement RNG register
  target/arm: Implement SVE Floating Point Arithmetic - Unpredicated Group
  target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group
  target/arm: Implement FDUP/DUP
  target/arm: Implement SVE Integer Compare - Scalars Group
  target/arm: Implement SVE Predicate Count Group
  target/arm: Implement SVE Partition Break Group
  target/arm: Implement SVE Integer Compare - Immediate Group
  target/arm: Implement SVE Integer Compare - Vectors Group
  target/arm: Implement SVE Select Vectors Group
  target/arm: Implement SVE vector splice (predicated)
  target/arm: Implement SVE reverse within elements
  target/arm: Implement SVE copy to vector (predicated)
  target/arm: Implement SVE conditionally broadcast/extract element
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Allow ARMv6-M Thumb2 instructions

ARMv6-M supports 6 Thumb2 instructions. This patch checks for these
instructions and allows their execution.
Like Thumb2 cores, ARMv6-M always interprets BL instruction as 32-bit.

This patch is required for future Cortex-M0 support.

Signed-off-by: Julia Suvorova <jusual@mail.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180612204632.28780-1-jusual@mail.ru
[PMM: move armv6m_insn[] and armv6m_mask[] closer to
point of use, and mark 'const'. Check for M-and-not-v7
rather than M-and-6.]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

exec.c: Handle IOMMUs in address_space_translate_for_iotlb()

Currently we don't support board configurations that put an IOMMU
in the path of the CPU's memory transactions, and instead just
assert() if the memory region fonud in address_space_translate_for_iotlb()
is an IOMMUMemoryRegion.

Remove this limitation by having the function handle IOMMUs.
This is mostly straightforward, but we must make sure we have
a notifier registered for every IOMMU that a transaction has
passed through, so that we can flush the TLB appropriately
when any of the IOMMUs change their mappings.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-5-peter.maydell@linaro.org

iommu: Add IOMMU index argument to translate method

Add an IOMMU index argument to the translate method of
IOMMUs. Since all of our current IOMMU implementations
support only a single IOMMU index, this has no effect
on the behaviour.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-4-peter.maydell@linaro.org

iommu: Add IOMMU index argument to notifier APIs

Add support for multiple IOMMU indexes to the IOMMU notifier APIs.
When initializing a notifier with iommu_notifier_init(), the caller
must pass the IOMMU index that it is interested in. When a change
happens, the IOMMU implementation must pass
memory_region_notify_iommu() the IOMMU index that has changed and
that notifiers must be called for.

IOMMUs which support only a single index don't need to change.
Callers which only really support working with IOMMUs with a single
index can use the result of passing MEMTXATTRS_UNSPECIFIED to
memory_region_iommu_attrs_to_index().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-3-peter.maydell@linaro.org

iommu: Add IOMMU index concept to IOMMU API

If an IOMMU supports mappings that care about the memory
transaction attributes, then it no longer has a unique
address -> output mapping, but more than one. We can
represent these using an IOMMU index, analogous to TCG's
mmu indexes.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-2-peter.maydell@linaro.org

m25p80: add support for two bytes WRSR for Macronix chips

On Macronix chips, two bytes can written to the WRSR. First byte will
configure the status register and the second the configuration
register. It is important to save the configuration value as it
contains the dummy cycle setting when using dual or quad IO mode.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

aspeed_scu: Implement RNG register

The ASPEED SoCs contain a single register that returns random data when
read. This models that register so that guests can use it.

The random number data register has a corresponding control register,
however it returns data regardless of the state of the enabled bit, so
the model follows this behaviour.

When the qcrypto call fails we exit as the guest uses the random number
device to feed it's entropy pool, which is used for cryptographic
purposes.

Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Message-id: 20180613114836.9265-1-joel@jms.id.au
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Floating Point Arithmetic - Unpredicated Group

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-19-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-18-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement FDUP/DUP

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Integer Compare - Scalars Group

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-16-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Predicate Count Group

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-15-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Partition Break Group

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Integer Compare - Immediate Group

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Integer Compare - Vectors Group

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Select Vectors Group

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE vector splice (predicated)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-10-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE reverse within elements

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-9-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE copy to vector (predicated)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-8-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE conditionally broadcast/extract element

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-7-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE compress active elements

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Permute - Interleaving Group

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Permute - Predicates Group

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement SVE Permute - Unpredicated Group

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Extend vec_reg_offset to larger sizes

Rearrange the arithmetic so that we are agnostic about the total size
of the vector and the size of the element. This will allow us to index
up to the 32nd byte and with 16-byte elements.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180613015641.5667-2-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

exec.c: Use stn_p() and ldn_p() instead of explicit switches

Now we have stn_p() and ldn_p() we can use them in various
functions in exec.c that used to have their own switch-on-size code.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180611171007.4165-4-peter.maydell@linaro.org

exec.c: Don't accidentally sign-extend 4-byte loads in subpage_read()

In subpage_read() we perform a load of the data into a local buffer
which we then access using ldub_p(), lduw_p(), ldl_p() or ldq_p()
depending on its size, storing the result into the uint64_t *data.
Since ldl_p() returns an 'int', this means that for the 4-byte
case we will sign-extend the data, whereas for 1 and 2 byte
reads we zero-extend it.

This ought not to matter since the caller will likely ignore values in
the high bytes of the data, but add a cast so that we're consistent.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180611171007.4165-3-peter.maydell@linaro.org

bswap: Add new stn_*_p() and ldn_*_p() memory access functions

There's a common pattern in QEMU where a function needs to perform
a data load or store of an N byte integer in a particular endianness.
At the moment this is handled by doing a switch() on the size and
calling the appropriate ld*_p or st*_p function for each size.

Provide a new family of functions ldn_*_p() and stn_*_p() which
take the size as an argument and do the switch() themselves.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180611171007.4165-2-peter.maydell@linaro.org

CODING_STYLE: Define our preferred form for multiline comments

The codebase has a bit of a mix of different multiline
comment styles. State a preference for the Linux kernel
style:
    /*
     * Star on the left for each line.
     * Leading slash-star and trailing star-slash
     * each go on a line of their own.
     */

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180611141716.3813-1-peter.maydell@linaro.org

cputlb: Pass cpu_transaction_failed() the correct physaddr

The API for cpu_transaction_failed() says that it takes the physical
address for the failed transaction. However we were actually passing
it the offset within the target MemoryRegion. We don't currently
have any target CPU implementations of this hook that require the
physical address; fix this bug so we don't get confused if we ever
do add one.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180611125633.32755-3-peter.maydell@linaro.org

cpu-defs.h: Document CPUIOTLBEntry 'addr' field

The 'addr' field in the CPUIOTLBEntry struct has a rather non-obvious
use; add a comment documenting it (reverse-engineered from what
the code that sets it is doing).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20180611125633.32755-2-peter.maydell@linaro.org

hw/core/or-irq: Support more than 16 inputs to an OR gate

For the IoTKit MPC support, we need to wire together the
interrupt outputs of 17 MPCs; this exceeds the current
value of MAX_OR_LINES. Increase MAX_OR_LINES to 32 (which
should be enough for anyone).

The tricky part is retaining the migration compatibility for
existing OR gates; we add a subsection which is only used
for larger OR gates, and define it such that we can freely
increase MAX_OR_LINES in future (or even move to a dynamically
allocated levels[] array without an upper size limit) without
breaking compatibility.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20180604152941.20374-10-peter.maydell@linaro.org

arm: Don't crash if user tries to use a Cortex-M CPU without an NVIC

The Cortex-M CPU and its NVIC are two intimately intertwined parts of
the same hardware; it is not possible to use one without the other.
Unfortunately a lot of our board models don't do any sanity checking
on the CPU type the user asks for, so a command line like
qemu-system-arm -M versatilepb -cpu cortex-m3
will create an M3 without an NVIC, and coredump immediately.
In the other direction, trying a non-M-profile CPU in an M-profile
board won't blow up, but doesn't do anything useful either:
qemu-system-arm -M lm3s6965evb -cpu arm926

Add some checking in the NVIC and CPU realize functions that the
user isn't trying to use an NVIC without an M-profile CPU or
an M-profile CPU without an NVIC, so we can produce a helpful
error message rather than a core dump.

Fixes: https://bugs.launchpad.net/qemu/+bug/1766896
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180601160355.15393-1-peter.maydell@linaro.org

hw/arm/armv7m: Remove unused armv7m_init() function

Remove the now-unused armv7m_init() function. This was a legacy from
before we properly QOMified ARMv7M, and it has some flaws:

* it combines work that needs to be done by an SoC object (creating
   and initializing the TYPE_ARMV7M object) with work that needs to
   be done by the board model (setting the system up to load the ELF
   file specified with -kernel)
* TYPE_ARMV7M creation failure is fatal, but an SoC object wants to
   arrange to propagate the failure outward
* it uses allocate-and-create via qdev_create() whereas the current
   preferred style for SoC objects is to do creation in-place

Board and SoC models can instead do the two jobs this function
was doing themselves, in the right places and with whatever their
preferred style/error handling is.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601144328.23817-3-peter.maydell@linaro.org

stellaris: Stop using armv7m_init()

The stellaris board is still using the legacy armv7m_init() function,
which predates conversion of the ARMv7M into a proper QOM container
object. Make the board code directly create the ARMv7M object instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20180601144328.23817-2-peter.maydell@linaro.org

hw/char/parallel: Convert away from old_mmio

Convert the parallel device away from using the old_mmio field
of MemoryRegionOps. This change only affects the memory-mapped
variant, which is used by the MIPS Jazz boards 'magnum' and 'pica61'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180601141223.26630-7-peter.maydell@linaro.org

hw/input/pckbd: Convert away from old_mmio

Convert the pckbd device away from using the old_mmio field
of MemoryRegionOps. This change only affects the memory-mapped
variant of the i8042, which is used by the Unicore32 'puv3'
board and the MIPS Jazz boards 'magnum' and 'pica61'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180601141223.26630-6-peter.maydell@linaro.org

hw/watchdog/wdt_i6300esb: Convert away from old_mmio

Convert the wdt_i6300esb device away from using the old_mmio field
of MemoryRegionOps.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180601141223.26630-5-peter.maydell@linaro.org

hw/block/pflash_cfi02: Convert away from old_mmio

Convert the pflash_cfi02 device away from using the old_mmio field
of MemoryRegionOps.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Acked-by: Max Reitz <mreitz@redhat.com>
Message-id: 20180601141223.26630-4-peter.maydell@linaro.org

hw/m68k/mcf5206: Convert away from old_mmio

Convert the mcf5206 device away from using the old_mmio field
of MemoryRegionOps. This device is used by the an5206 board.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Thomas Huth <huth@tuxfamily.org>
Message-id: 20180601141223.26630-3-peter.maydell@linaro.org

hw/sh/sh7750: Convert away from old_mmio

Convert the sh7750 device away from using the old_mmio field
of MemoryRegionOps. This device is used by the sh4 r2d board.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20180601141223.26630-2-peter.maydell@linaro.org

hw/arm/mps2-tz: Put ethernet controller behind PPC

The ethernet controller in the AN505 MPC FPGA image is behind
the same AHB Peripheral Protection Controller that handles
the graphics and GPIOs. (In the documentation this is clear
in the block diagram but the ethernet controller was omitted
from the table listing devices connected to the PPC.)
The ethernet sits behind AHB PPCEXP0 interface 5. We had
incorrectly claimed that this was a "gpio4", but there are
only 4 GPIOs in this image.

Correct the QEMU model to match the hardware.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20180515171446.10834-1-peter.maydell@linaro.org

arm_gicv3_kvm: kvm_dist_get/put_priority: skip the registers banked by GICR_IPRIORITYR

While for_each_dist_irq_reg loop starts from GIC_INTERNAL, it forgot to
offset the date array and index. This will overlap the GICR registers
value and leave the last GIC_INTERNAL irq's registers out of update.

Fixes: 367b9f527becdd20ddf116e17a3c0c2bbc486920
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Shannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

migration: calculate expected_downtime with ram_bytes_remaining()

expected_downtime value is not accurate with dirty_pages_rate * page_size,
using ram_bytes_remaining() would yeild it resonable.

consider to read the remaining ram just after having updated the dirty
pages count later migration_bitmap_sync_range() in migration_bitmap_sync()
and reuse the `remaining` field in ram_counters to hold ram_bytes_remaining()
for calculating expected_downtime.

Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <20180612085009.17594-2-bala24@linux.vnet.ibm.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration/postcopy: Wake rate limit sleep on postcopy request

Use the 'urgent request' mechanism added in the previous patch
for entries added to the postcopy request queue for RAM. Ignore
the rate limiting while we have requests.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180613102642.23995-4-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Wake rate limiting for urgent requests

Rate limiting sleeps the migration thread for a while when it runs
out of bandwidth; but sometimes we want to wake up to get on with
something more urgent (like a postcopy request). Here we use
a semaphore with a timedwait instead of a simple sleep; Incrementing
the sempahore will wake it up sooner. Anything that consumes
these urgent events must decrement the sempahore.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180613102642.23995-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration/postcopy: Add max-postcopy-bandwidth parameter

Limit the background transfer bandwidth during the postcopy
phase to the value set on this new parameter. The default, 0,
corresponds to the existing behaviour which is unlimited bandwidth.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180613102642.23995-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: introduce migration_update_rates

It is used to slightly clean the code up, no logic is changed

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180604095520.8563-5-xiaoguangrong@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: fix counting xbzrle cache_miss_rate

Sync up xbzrle_cache_miss_prev only after migration iteration goes
forward

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180604095520.8563-4-xiaoguangrong@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration/block-dirty-bitmap: fix dirty_bitmap_load

dirty_bitmap_load_header return code is obtained but not handled. Fix
this.

Bug was introduced in b35ebdf076d697bc
"migration: add postcopy migration of dirty bitmaps" with the whole
function.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180530112424.204835-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Poison ramblock loops in migration

The migration code should be using the
RAMBLOCK_FOREACH_MIGRATABLE and qemu_ram_foreach_block_migratable
not the all-block versions; poison them so that we can't accidentally
use them.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180605162545.80778-3-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Fixes for non-migratable RAMBlocks

There are still a few cases where migration code is using the macros
and functions that do all RAMBlocks rather than just the migratable
blocks; fix those up.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180605162545.80778-2-dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

typedefs: add QJSON

Since commit 83ee768d6247b, we now have two places that define the
QJSON type:

$ git grep 'typedef struct QJSON QJSON'
include/migration/vmstate.h:typedef struct QJSON QJSON;
migration/qjson.h:typedef struct QJSON QJSON;

This breaks docker-test-build@centos6:

In file included from /tmp/qemu-test/src/migration/savevm.c:59:
/tmp/qemu-test/src/migration/qjson.h:16: error: redefinition of typedef
'QJSON'
/tmp/qemu-test/src/include/migration/vmstate.h:30: note: previous
declaration of 'QJSON' was here
make: *** [migration/savevm.o] Error 1

This happens because CentOS 6 has an old GCC 4.4.7. Even if redefining
a typedef with the same type is permitted since GCC 4.6, unless -pedantic
is passed, we don't really need to do that on purpose. Let's have a
single definition in <qemu/typedefs.h> instead.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <152844714981.11789.3657734445739553287.stgit@bahia.lan>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

block: Remove dead deprecation warning code

We removed all options from the 'deprecated' array, so the code is dead
and can be removed as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

block: Remove deprecated -drive option serial

The -drive option serial was deprecated in QEMU 2.10. It's time to
remove it.

Tests need to be updated to set the serial number with -global instead
of using the -drive option.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>

block: Remove deprecated -drive option addr

The -drive option addr was deprecated in QEMU 2.10. It's time to remove
it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>

block: Remove deprecated -drive geometry options

The -drive options cyls, heads, secs and trans were deprecated in
QEMU 2.10. It's time to remove them.

hd-geo-test tested both the old version with geometry options in -drive
and the new one with -device. Therefore the code using -drive doesn't
have to be replaced there, we just need to remove the -drive test cases.
This in turn allows some simplification of the code.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

rbd: New parameter key-secret

Legacy -drive supports "password-secret" parameter that isn't
available with -blockdev / blockdev-add.  That's because we backed out
our first try to provide it there due to interface design doubts, in
commit 577d8c9a811, v2.9.0.

This is the second try.  It brings back the parameter, except it's
named "key-secret" now.

Let's review our reasons for backing out the first try, as stated in
the commit message:

    * BlockdevOptionsRbd member @password-secret isn't actually a
      password, it's a key generated by Ceph.

Addressed by the rename.

    * We're not sure where member @password-secret belongs (see the
      previous commit).

See previous commit.

    * How @password-secret interacts with settings from a configuration
      file specified with @conf is undocumented.

Not actually true, the documentation for @conf says "Values in the
configuration file will be overridden by options specified via QAPI",
and we've tested this.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

rbd: New parameter auth-client-required

Parameter auth-client-required lets you configure authentication
methods.  We tried to provide that in v2.9.0, but backed out due to
interface design doubts (commit 464444fcc16).

This commit is similar to what we backed out, but simpler: we use a
list of enumeration values instead of a list of objects with a member
of enumeration type.

Let's review our reasons for backing out the first try, as stated in
the commit message:

    * The implementation uses deprecated rados_conf_set() key
      "auth_supported".  No biggie.

Fixed: we use "auth-client-required".

    * The implementation makes -drive silently ignore invalid parameters
      "auth" and "auth-supported.*.X" where X isn't "auth".  Fixable (in
      fact I'm going to fix similar bugs around parameter server), so
      again no biggie.

That fix is commit 2836284db60.  This commit doesn't bring the bugs
back.

    * BlockdevOptionsRbd member @password-secret applies only to
      authentication method cephx.  Should it be a variant member of
      RbdAuthMethod?

We've had time to ponder, and we decided to stick to the way Ceph
configuration works: the key configured separately, and silently
ignored if the authentication method doesn't use it.

    * BlockdevOptionsRbd member @user could apply to both methods cephx
      and none, but I'm not sure it's actually used with none.  If it
      isn't, should it be a variant member of RbdAuthMethod?

Likewise.

    * The client offers a *set* of authentication methods, not a list.
      Should the methods be optional members of BlockdevOptionsRbd instead
      of members of list @auth-supported?  The latter begs the question
      what multiple entries for the same method mean.  Trivial question
      now that RbdAuthMethod contains nothing but @type, but less so when
      RbdAuthMethod acquires other members, such the ones discussed above.

Again, we decided to stick to the way Ceph configuration works, except
we make auth-client-required a list of enumeration values instead of a
string containing keywords separated by delimiters.

    * How BlockdevOptionsRbd member @auth-supported interacts with
      settings from a configuration file specified with @conf is
      undocumented.  I suspect it's untested, too.

Not actually true, the documentation for @conf says "Values in the
configuration file will be overridden by options specified via QAPI",
and we've tested this.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Fix -blockdev / blockdev-add for empty objects and arrays

-blockdev and blockdev-add silently ignore empty objects and arrays in
their argument.  That's because qmp_blockdev_add() converts the
argument to a flat QDict, and qdict_flatten() eats empty QDict and
QList members.  For instance, we ignore an empty BlockdevOptions
member @cache.  No real harm, as absent means the same as empty there.

Thus, the flaw puts an artificial restriction on the QAPI schema: we
can't have potentially empty objects and arrays within
BlockdevOptions, except when they're optional and "empty" has the same
meaning as "absent".

Our QAPI schema satisfies this restriction (I checked), but it's a
trap for the unwary, and a temptation to employ awkward workarounds
for the wary.  Let's get rid of it.

Change qdict_flatten() and qdict_crumple() to treat empty dictionaries
and lists exactly like scalars.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

check-block-qdict: Cover flattening of empty lists and dictionaries

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

check-block-qdict: Rename qdict_flatten()'s variables for clarity

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-qdict: Simplify qdict_is_list() some

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-qdict: Clean up qdict_crumple() a bit

When you mix scalar and non-scalar keys, whether you get an "already
set as scalar" or an "already set as dict" error depends on qdict
iteration order.  Neither message makes much sense.  Replace by
""Cannot mix scalar and non-scalar keys".  This is similar to the
message we get for mixing list and non-list keys.

I find qdict_crumple()'s first loop hard to understand.  Rearrange it
and add a comment.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-qdict: Tweak qdict_flatten_qdict(), qdict_flatten_qlist()

qdict_flatten_qdict() skips copying scalars from @qdict to @target
when the two are the same. Fair enough, but it uses a non-obvious
test for "same". Replace it by the obvious one. While there, improve
comments.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-qdict: Simplify qdict_flatten_qdict()

There's no need to restart the loop. We don't elsewhere, e.g. in
qdict_extract_subqdict(), qdict_join() and qemu_opts_absorb_qdict().
Simplify accordingly.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Make remaining uses of qobject input visitor more robust

Remaining uses of qobject_input_visitor_new_keyval() in the block
subsystem:

* block_crypto_open_opts_init()
  Currently doesn't visit any non-string scalars, thus safe.  It's
  called from
  - block_crypto_open_luks()
    Creates the QDict with qemu_opts_to_qdict_filtered(), which
    creates only string scalars, but has a TODO asking for other types.
  - qcow_open()
  - qcow2_open(), qcow2_co_invalidate_cache(), qcow2_reopen_prepare()

* block_crypto_create_opts_init(), called from
  - block_crypto_co_create_opts_luks()
    Also creates the QDict with qemu_opts_to_qdict_filtered().

* vdi_co_create_opts()
  Also creates the QDict with qemu_opts_to_qdict_filtered().

Replace these uses by qobject_input_visitor_new_flat_confused() for
robustness.  This adds crumpling.  Right now, that's a no-op, but if
we ever extend these things in non-flat ways, crumpling will be
needed.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Factor out qobject_input_visitor_new_flat_confused()

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Clean up a misuse of qobject_to() in .bdrv_co_create_opts()

The following pattern occurs in the .bdrv_co_create_opts() methods of
parallels, qcow, qcow2, qed, vhdx and vpc:

    qobj = qdict_crumple_for_keyval_qiv(qdict, errp);
    qobject_unref(qdict);
    qdict = qobject_to(QDict, qobj);
    if (qdict == NULL) {
         ret = -EINVAL;
         goto done;
    }

    v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
    [...]
    ret = 0;
done:
    qobject_unref(qdict);
    [...]
    return ret;

If qobject_to() fails, we return failure without setting errp.  That's
wrong.  As far as I can tell, it cannot fail here.  Clean it up
anyway, by removing the useless conversion.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Fix -drive for certain non-string scalars

The previous commit fixed -blockdev breakage due to misuse of the
qobject input visitor's keyval flavor in bdrv_file_open().  The commit
message explain why using the plain flavor would be just as wrong; it
would break -drive.  Turns out we break it in three places:
nbd_open(), sd_open() and ssh_file_open().  They are even marked
FIXME.  Example breakage:

    $ qemu-system-x86 -drive node-name=n1,driver=nbd,server.type=inet,server.host=localhost,server.port=1234,server.numeric=off
    qemu-system-x86: -drive node-name=n1,driver=nbd,server.type=inet,server.host=localhost,server.port=1234,server.numeric=off: Invalid parameter type for 'numeric', expected: boolean

Fix it the same way: replace qdict_crumple() by
qdict_crumple_for_keyval_qiv(), and switch from plain to the keyval
flavor.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Fix -blockdev for certain non-string scalars

Configuration flows through the block subsystem in a rather peculiar
way.  Configuration made with -drive enters it as QemuOpts.
Configuration made with -blockdev / blockdev-add enters it as QAPI
type BlockdevOptions.  The block subsystem uses QDict, QemuOpts and
QAPI types internally.  The precise flow is next to impossible to
explain (I tried for this commit message, but gave up after wasting
several hours).  What I can explain is a flaw in the BlockDriver
interface that leads to this bug:

    $ qemu-system-x86_64 -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234
    qemu-system-x86_64: -blockdev node-name=n1,driver=nfs,server.type=inet,server.host=localhost,path=/foo/bar,user=1234: Internal error: parameter user invalid

QMP blockdev-add is broken the same way.

Here's what happens.  The block layer passes configuration represented
as flat QDict (with dotted keys) to BlockDriver methods
.bdrv_file_open().  The QDict's members are typed according to the
QAPI schema.

nfs_file_open() converts it to QAPI type BlockdevOptionsNfs, with
qdict_crumple() and a qobject input visitor.

This visitor comes in two flavors.  The plain flavor requires scalars
to be typed according to the QAPI schema.  That's the case here.  The
keyval flavor requires string scalars.  That's not the case here.
nfs_file_open() uses the latter, and promptly falls apart for members
@user, @group, @tcp-syn-count, @readahead-size, @page-cache-size,
@debug.

Switching to the plain flavor would fix -blockdev, but break -drive,
because there the scalars arrive in nfs_file_open() as strings.

The proper fix would be to replace the QDict by QAPI type
BlockdevOptions in the BlockDriver interface.  Sadly, that's beyond my
reach right now.

Next best would be to fix the block layer to always pass correctly
typed QDicts to the BlockDriver methods.  Also beyond my reach.

What I can do is throw another hack onto the pile: have
nfs_file_open() convert all members to string, so use of the keyval
flavor actually works, by replacing qdict_crumple() by new function
qdict_crumple_for_keyval_qiv().

The pattern "pass result of qdict_crumple() to
qobject_input_visitor_new_keyval()" occurs several times more:

* qemu_rbd_open()

  Same issue as nfs_file_open(), but since BlockdevOptionsRbd has only
  string members, its only a latent bug.  Fix it anyway.

* parallels_co_create_opts(), qcow_co_create_opts(),
  qcow2_co_create_opts(), bdrv_qed_co_create_opts(),
  sd_co_create_opts(), vhdx_co_create_opts(), vpc_co_create_opts()

  These work, because they create the QDict with
  qemu_opts_to_qdict_filtered(), which creates only string scalars.
  The function sports a TODO comment asking for better typing; that's
  going to be fun.  Use qdict_crumple_for_keyval_qiv() to be safe.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qobject: Move block-specific qdict code to block-qdict.c

Pure code motion, except for two brace placements and a comment
tweaked to appease checkpatch.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Add block-specific QDict header

There are numerous QDict functions that have been introduced for and are
used only by the block layer. Move their declarations into an own
header file to reflect that.

While qdict_extract_subqdict() is in fact used outside of the block
layer (in util/qemu-config.c), it is still a function related very
closely to how the block layer works with nested QDicts, namely by
sometimes flattening them. Therefore, its declaration is put into this
header as well and util/qemu-config.c includes it with a comment stating
exactly which function it needs.

Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20180509165530.29561-7-mreitz@redhat.com>
[Copyright note tweaked, superfluous includes dropped]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iscsi: Drop deprecated -drive parameter "filename"

Parameter "filename" is deprecated since commit 5c3ad1a6a8f, v2.10.0.
Time to get rid of it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

rbd: Drop deprecated -drive parameter "filename"

Parameter "filename" is deprecated since commit 91589d9e5ca, v2.10.0.
Time to get rid of it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>