git.proxmox.com Git - mirror

block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper

Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_pwritev().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

block: Make bdrv_pread() a bdrv_prwv_co() wrapper

Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_preadv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

block: Change coroutine wrapper to byte granularity

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

block: Assert serialisation assumptions in pwritev

If a request calls wait_serialising_requests() and actually has to wait
in this function (i.e. a coroutine yield), other requests can run and
previously read data (like the head or tail buffer) could become
outdated. In this case, we would have to restart from the beginning to
read in the updated data.

However, we're lucky and don't actually need to do that: A request can
only wait in the first call of wait_serialising_requests() because we
mark it as serialising before that call, so any later requests would
wait. So as we don't wait in practice, we don't have to reload the data.

This is an important assumption that may not be broken or data
corruption will happen. Document it with some assertions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

block: Align requests in bdrv_co_do_pwritev()

This patch changes bdrv_co_do_pwritev() to actually be what its name
promises. If requests aren't properly aligned, it performs a RMW.

Requests touching the same block are serialised against the RMW request.
Further optimisation of this is possible by differentiating types of
requests (concurrent reads should actually be okay here).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Allow wait_serialising_requests() at any point

We can only have a single wait_serialising_requests() call per request
because otherwise we can run into deadlocks where requests are waiting
for each other. The same is true when wait_serialising_requests() is not
at the very beginning of a request, so that other requests can be issued
between the start of the tracking and wait_serialising_requests().

Fix this by changing wait_serialising_requests() to ignore requests that
are already (directly or indirectly) waiting for the calling request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Make overlap range for serialisation dynamic

Copy on Read wants to serialise with all requests touching the same
cluster, so wait_serialising_requests() rounded to cluster boundaries.
Other users like alignment RMW will have different requirements, though
(requests touching the same sector), so make it dynamic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Generalise and optimise COR serialisation

Change the API so that specific requests can be marked serialising. Only
these requests are checked for overlaps then.

This means that during a Copy on Read operation, not all requests
overlapping other requests are serialised any more, but only those that
actually overlap with the specific COR request.

Also remove COR from function and variable names because this
functionality can be useful in other contexts.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Make zero-after-EOF work with larger alignment

Odd file sizes could make bdrv_aligned_preadv() shorten the request in
non-aligned ways. Fix it by rounding to the required alignment instead
of 512 bytes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Allow waiting for overlapping requests between begin/end

Previously, it was not possible to use wait_for_overlapping_requests()
between tracked_request_begin()/end() because it would wait for itself.

Ignore the current request in the overlap check and run more of the
bdrv_co_do_preadv/pwritev code with a BdrvTrackedRequest present.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Switch BdrvTrackedRequest to byte granularity

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Introduce bdrv_co_do_pwritev()

This is going to become the bdrv_co_do_preadv() equivalent for writes.
In this patch, however, just a function taking byte offsets is created,
it doesn't align anything yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: write: Handle COR dependency after I/O throttling

First waiting for all COR requests to complete and calling the
throttling function afterwards means that the request could be delayed
and we still need to wait for the COR request even if it was issued only
after the throttled write request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Introduce bdrv_aligned_pwritev()

This separates the part of bdrv_co_do_writev() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Introduce bdrv_co_do_preadv()

Similar to bdrv_pread(), which aligns byte-aligned request to 512 byte
sectors, bdrv_co_do_preadv() takes a byte-aligned request and aligns it
to the alignment specified in bs->request_alignment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Introduce bdrv_aligned_preadv()

This separates the part of bdrv_co_do_readv() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

raw: Probe required direct I/O alignment

Add a bs->request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.

While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

block: rename buffer_alignment to guest_block_size

The alignment field is now set to the value that is promised to the
guest, rather than required by the host. The next patches will make
QEMU aware of the host-provided values, so make this clear.

The alignment is also not about memory buffers, but about the sectors on
the disk, change the documentation of the field.

At this point, the field is set by the device emulation, but completely
ignored by the block layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Don't use guest sector size for qemu_blockalign()

bs->buffer_alignment is set by the device emulation and contains the
logical block size of the guest device. This isn't something that the
block layer should know, and even less something to use for determining
the right alignment of buffers to be used for the host.

The new BlockLimits field opt_mem_alignment tells the qemu block layer
the optimal alignment to be used so that no bounce buffer must be used
in the driver.

This patch may change the buffer alignment from 4k to 512 for all
callers that used qemu_blockalign() with the top-level image format
BlockDriverState. The value was never propagated to other levels in the
tree, so in particular raw-posix never required anything else than 512.

While on disks with 4k sectors direct I/O requires a 4k alignment,
memory may still be okay when aligned to 512 byte boundaries. This is
what must have happened in practice, because otherwise this would
already have failed earlier. Therefore I don't expect regressions even
with this intermediate state. Later, raw-posix can implement the hook
and expose a different memory alignment requirement.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

block: Detect unaligned length in bdrv_qiov_is_aligned()

For an O_DIRECT request to succeed, it's not only necessary that all
base addresses in the qiov are aligned, but also that each length in it
is aligned.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

qemu_memalign: Allow small alignments

The functions used by qemu_memalign() require an alignment that is at
least sizeof(void*). Adjust it if it is too small.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>

block: Update BlockLimits when they might have changed

When reopening with different flags, or when backing files disappear
from the chain, the limits may change. Make sure they get updated in
these cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>

block: Inherit opt_transfer_length

When there is a format driver between the backend, it's not guaranteed
that exposing the opt_transfer_length for the format driver results in
the optimal requests (because of fragmentation etc.), but it can't make
things worse, so let's just do it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>

block: Move initialisation of BlockLimits to bdrv_refresh_limits()

This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: Fix bdrv_commit return value

bdrv_commit() could return 0 or 1 on success, depending on whether or
not the last sector was allocated in the overlay and whether the overlay
format had a .bdrv_make_empty callback.

Most callers ignored it, but qemu-img commit would print an error
message while the operation actually succeeded.

Also clean up the handling of I/O errors to return the real error code
instead of -EIO.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

block: update block commit documentation regarding image truncation

This updates the documentation for commiting snapshot images.
Specifically, this highlights what happens when the base image
is either smaller or larger than the snapshot image being committed.

In the case of the base image being smaller, it is resized to the
larger size of the snapshot image. In the case of the base image
being larger, it is not resized automatically, but once the commit
has completed it is safe for the user to truncate the base image.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: resize backing image during active layer commit, if needed

If the top image to commit is the active layer, and also larger than
the base image, then an I/O error will likely be returned during
block-commit.

For instance, if we have a base image with a virtual size 10G, and a
active layer image of size 20G, then committing the snapshot via
'block-commit' will likely fail.

This will automatically attempt to resize the base image, if the
active layer image to be committed is larger.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: resize backing file image during offline commit, if necessary

Currently, if an image file is logically larger than its backing file,
committing it via 'qemu-img commit' will fail.

For instance, if we have a base image with a virtual size 10G, and a
snapshot image of size 20G, then committing the snapshot offline with
'qemu-img commit' will likely fail.

This will automatically attempt to resize the base image, if the
snapshot image to be committed is larger.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/curl: Implement the libcurl timer callback interface

libcurl versions 7.16.0 and later have a timer callback interface which
must be implemented in order for libcurl to make forward progress (it
will sometimes rely on being called back on the timeout if there are
no file descriptors registered). Implement the callback, and use a
QEMU AIO timer to ensure we prod libcurl again when it asks us to.

Based on Peter's original patch plus my fix to add curl_multi_timeout_do.
Should compile just fine even on older versions of libcurl.

I also tried copy-on-read and streaming:

    $ ./qemu-img create -f qcow2 -o \
         backing_file=http://download.fedoraproject.org/pub/fedora/linux/releases/20/Live/x86_64/Fedora-Live-Desktop-x86_64-20-1.iso \
         foo.qcow2 1G
    $ x86_64-softmmu/qemu-system-x86_64 \
         -drive if=none,file=foo.qcow2,copy-on-read=on,id=cd \
         -device ide-cd,drive=cd --enable-kvm -m 1024

Direct http usage is probably too slow, but with copy-on-read ultimately
the image does boot!

After some time, streaming gets canceled by an EIO, which needs further
investigation.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qmp: Allow to take external snapshots on bs graphs node.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qmp: Allow block_resize to manipulate bs graph nodes.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Create authorizations mechanism for external snapshot and resize.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qmp: Allow to change password on named block driver states.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
There was two candidate ways to implement named node manipulation:

1)
{ 'command': 'block_passwd', 'data': {'*device': 'str',
                                      '*node-name': 'str', 'password': 'str'}
}

2)

{ 'command': 'block_passwd', 'data': {'device': 'str',
                                      '*device-is-node': 'bool',
                                      'password': 'str'} }

Luiz proposed 1 and says 2 was an abuse of the QMP interface and proposed to
rewrite the QMP block interface for 2.0.

Luiz does not like in 1 the fact that 2 fields are optional but one of them must
be specified leading to an abuse of the QMP semantic.

Kevin argumented that 2 what a clear abuse of the device field and would not be
practical when reading fast some log file because the user would read "device"
and think that a device is manipulated when it's in fact a node name.
Documentation of 1 make it pretty clear what to do for the user.

Kevin argued that all bs are node including devices ones so 2 does not make
sense.

Kevin also argued that rewriting the QMP block interface would not make disapear
the current one.

Kevin pushed the argument that making the QAPI generator compatible with the
semantic of the operation would need a rewrite that no one has done yet.

A vote has been done on the list to elect the version to use and 1 won.

For reference the complete thread is:
"[Qemu-devel] [PATCH V4 4/7] qmp: Allow to change password on names block driver
states."

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qmp: Add QMP query-named-block-nodes to list the named BlockDriverState nodes.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Allow the user to define "node-name" option both on command line and QMP.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Add bs->node_name to hold the name of a bs node of the bs graph.

Add the minimum of code to prepare for the following patches.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qapi: Add "backing" to BlockStats

Currently there is no way to query BlockStats of the backing chain. This
adds "backing" field into BlockStats to make it possible.

The comment of "parent" is reworded.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

vmdk: Fix format specific information (create type) for streamOptimized

Previously the field is wrong:

    $ ./qemu-img create -f vmdk -o subformat=streamOptimized /tmp/a.vmdk 1G

    $ ./qemu-img info /tmp/a.vmdk
    image: /tmp/a.vmdk
    file format: vmdk
    virtual size: 1.0G (1073741824 bytes)
    disk size: 12K
    Format specific information:
        cid: 1390460459
        parent cid: 4294967295
>>>     create type: monolithicSparse
        <snip>

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

drive mirror:fix memory leak

In the function mirror_iteration() -> qemu_iovec_init(),
it allocates memory for op->qiov.iov, when the write request calls back,
but in the function mirror_iteration_done(), it only frees the op,
not free the op->qiov.iov, so this causes memory leak.

It should use qemu_iovec_destroy() to free op->qiov.

Signed-off-by: Zhang Min <rudy.zhangmin@huawei.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

sheepdog: fix 'qemu-img map'

It was muted in the previous commit 4bc74be9. Let's revive it since nothing
prevents us to do it.

With this patch, following command will work as other formats:

$ qemu-img map sheepdog:image

Cc: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Documentation: qemu-img: Mention SIGUSR1 progress report

Document the SIGUSR1 behaviour of qemu-img. Also, added compare to the
list of subcommands that support -p.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

qemu-progress: Fix progress printing on SIGUSR1

Since commit a7aae221 ('Switch SIG_IPI to SIGUSR1'), SIGUSR1 is blocked
during startup, breaking the progress report in tools.

This patch reenables the signal when initialising a progress report.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

qemu-progress: Drop unused include

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>

vmdk: Check for overhead when opening

Report an error if file size is even smaller than metadata.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: fix wrong value of L1E_OFFSET_MASK, L2E_OFFSET_MASK and REFT_OFFSET_MASK

Accoring to qcow spec, the offset fields in l1e, l2e and ref table entry
start at bit 9. The offset is cluster offset, and the smallest possible
cluster size is 512 bytes.

Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

dataplane: fix shadowed return value

Propagate the error return value from get_indirect(). This bug was
introduced in commit 4d684832 ("vring: create a common function to parse
descriptors").

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: fix backing file segfault

When a backing file is opened such that (1) a protocol is directly
used as the block driver and (2) the block driver has bdrv_file_open,
bdrv_open_backing_file segfaults. The problem arises because
bdrv_open_common returns without setting bd->backing_hd->file.

To effect (1), you seem to have to use the -F flag in qemu-img. There
are several block drivers that satisfy (2), such as "file" and "nbd".
Here are some concrete examples:

    #!/bin/bash

    echo Test file format
    ./qemu-img create -f file base.file 1m
    ./qemu-img create -f qcow2 -F file -o backing_file=base.file\
        file-overlay.qcow2
    ./qemu-img convert -O raw file-overlay.qcow2 file-convert.raw

    echo Test nbd format
    SOCK=$PWD/nbd.sock
    ./qemu-img create -f raw base.raw 1m
    ./qemu-nbd -t -k $SOCK base.raw &
    trap "kill $!" EXIT
    while ! test -e $SOCK; do sleep 1; done
    ./qemu-img create -f qcow2 -F nbd -o backing_file=nbd:unix:$SOCK\
        nbd-overlay.qcow2
    ./qemu-img convert -O raw nbd-overlay.qcow2 nbd-convert.raw

Without this patch, the two qemu-img convert commands segfault.

This is a regression that was introduced in v1.7 by
dbecebddfa4932d1c83915bcb9b5ba5984eb91be.

Signed-off-by: Peter Feiner <peter@gridcentric.ca>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: Test file format nesting

Add a test for nested image formats.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: Test new blkdebug/blkverify interface

Add a test for the new blkdebug/blkverify interface.

This test is not written in Python, although it uses QMP. This is
because it invokes the qemu-io HMP command, which outputs errors to
stderr instead of returning them through QMP. Filtering and testing that
output is easier in a shell script than with the Python infrastructure.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests: Add test for qdict_flatten()

Add a test case for qdict_flatten() in tests/check-qdict.c. This test
case covers the flattening of subordinate QLists as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests: Add test for qdict_array_split()

Add a test case for qdict_array_split() in tests/check-qdict.c.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-io: Make filename optional

Giving a filename is actually not essential, since it can be specified
through the options as well - on the contrary: Sometimes a filename must
not be given.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qapi: QMP interface for blkdebug and blkverify

Add structures to support blkdebug and blkverify in blockdev-add.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qapi: Add "errno" to the list of polluted words

Using "errno" directly as an identifier results in various syntax
errors; therefore it should be added to the list of polluted words.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blkverify: Don't require protocol filename

If the filename is not prefixed by "blkverify:" in
blkverify_parse_filename(), the blkverify driver was not selected
through that protocol prefix, but by an explicit command line (or QMP)
option (like driver=blkverify).

If blkverify_parse_filename() has been called, a filename has been
given. If it is not prefixed, it is probably really just a plain
filename. This is no problem, since we can use it as the test image
filename and rely on the user to specify the raw image filename through
the new corresponding option.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blkverify: Allow command-line configuration

Introduce the "test" and "raw" options for specifying images.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blkdebug: Allow command-line file configuration

Introduce the "image" option as an alternative to specifying the image
through the filename.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockdev: Move "file" to legacy_opts

Specifying the image filename through the "file" option is a legacy
option and should not be supported by blockdev-add (in that case, giving
a string for "file" references an existing block device).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Allow recursive "file"s

It should be possible to use a format as a driver for a file which in
turn requires another file, i.e., nesting file formats.

Allowing nested file formats results in e.g. qcow2 BlockDriverStates
never being directly passed to bdrv_open_common() from bdrv_file_open(),
but instead being handed through bdrv_open(). This changes the error
message when trying to give a filename to qcow2, i.e. trying to use it
as a driver for the protocol level. Therefore, change the reference
output of I/O test 051 accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Use bdrv_open_image() in bdrv_open()

Using bdrv_open_image() instead of bdrv_file_open() directly in
bdrv_open() is easier.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Add bdrv_open_image()

Add a common function for opening images to be used for block drivers
specified through BlockdevRefs in an option QDict. The difference from
bdrv_file_open() is that this function may invoke bdrv_open() instead,
allowing auto-detection of the driver to be used; and second, it
automatically extracts the BlockdevRef from the option QDict.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Allow block devices without files

blkdebug and blkverify will, in order to retain compatibility, not
support the field "file" implicitly through bdrv_open(). In order to be
able to use those drivers without giving a filename anyway, it is
necessary to be able to have block devices without files implicitly
opened by bdrv_open(). This is the case, if there was neither a file
name, a reference to an existing block device to use as a file nor
options specific to the file.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Pass reference to bdrv_file_open()

With that now being possible, bdrv_open() should try to extract a block
device reference from the options and pass it to bdrv_file_open().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Allow reference for bdrv_file_open()

Allow specifying a reference to an existing block device (by name) for
bdrv_file_open() instead of a filename and/or options.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blkdebug: Use command-line in read_config()

Use qemu_config_parse_qdict() to parse the command-line options in
addition to the config file.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blkdebug: Always call read_config()

Move the check whether there actually is a config file into the
read_config() function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-option: Add qemu_config_parse_qdict()

This function basically parses command-line options given as a QDict
replacing a config file.

For instance, the QDict {"section.opt1": 42, "section.opt2": 23}
corresponds to the config file:

[section]
opt1 = 42
opt2 = 23

It is possible to specify multiple sections and also multiple sections
of the same type. On the command line, this looks like the following:

inject-error.0.event=reftable_load,\
inject-error.1.event=l2_load,\
set-state.event=l1_update

This would correspond to the following config file:

[inject-error "inject-error.0"]
event = reftable_load

[inject-error "inject-error.1"]
event = l2_load

[set-state]
event = l1_update

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qapi: extend qdict_flatten() for QLists

Reversing qdict_array_split(), qdict_flatten() should flatten QLists as
well by interpreting them as QDicts where every entry's key is its
index.

This allows bringing QDicts with QLists from QMP commands to the same
form as they would be given as command-line options, thereby allowing
them to be parsed the same way.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qdict: Add qdict_array_split()

This function splits a QDict consisting of entries prefixed by
incrementally enumerated indices into a QList of QDicts.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blkdebug: Don't require sophisticated filename

If the filename is not prefixed by "blkdebug:" in
blkdebug_parse_filename(), the blkdebug driver was not selected through
that protocol prefix, but by an explicit command line option
(file.driver=blkdebug or something similar). Contrary to the current
reaction, this is not a problem at all; we just need to store the
filename (in the x-image option) and can go on; the user just has to
manually specify the config option.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blkdebug: Use errp for read_config()

Use an Error variable in the read_config() function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-io: add command completion

Autocomplete qemu-io commands at the interactive prompt.

Note this only completes command names and not their options.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-io: use readline.c

Use readline.c for command-line history. There was support for GNU
Readline and BSD Editline but it was never compiled in. Since QEMU has
its own readline.c, just use that when qemu-io runs with stdin attached
to a terminal.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

osdep: add qemu_set_tty_echo()

Using stdin with readline.c requires disabling echo and line buffering.
Add a portable wrapper to set the terminal attributes under Linux and
Windows.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

readline: move readline to a generic location

Now that the monitor and readline are decoupled, readline.h no longer
belongs in include/monitor/. Put the header into include/qemu/.

Move the source file into util/ so it can be linked as part of
libqemuutil.a.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

readline: decouple readline from the monitor

Make the readline.c functionality reusable.  Instead of calling
monitor_printf() and monitor_flush() directly, invoke function pointers
provided by the user.

This way readline.c does not know about Monitor and other users will be
able to make use of readline.c.

Note that there is already an "opaque" argument to the ReadLineFunc
callback.  Consistently call it "readline_opaque" from now on to
distinguish from the ReadLinePrintfFunc/ReadLineFlushFunc "opaque"
argument.

I also dropped the printf macro trickery since it's now highly unlikely
that anyone modifying readline.c would call printf(3) directly.  We no
longer need this protection.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

vmdk: Fix big flat extent IO

Local variable "n" as int64_t avoids overflow with large sector number
calculation. See test case change for failure case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

docs: qcow2 compat=1.1 is now the default

Commit 9117b47717ad208b12786ce88eacb013f9b3dd1c ("qcow2: Change default
for new images to compat=1.1") changed the default qcow2 image format
version but forgot to update qemu-doc.texi and qemu-img.texi.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qtest: Fix the bug about disable vnc causes "make check" fail

When we disable vnc from "./configure", QEMU can't use the vnc option.
So qtest can't use the "vnc -none ", otherwise "make check" fails.
If QEMU uses "-display none", "-vnc none" is excrescent, So we just need to drop it.

Signed-off-by: Kewei Yu <keweihk@gmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

sheepdog: fix clone operation by 'qemu-img create -b'

We should pass base_inode->vdi_id to base_vdi_id of SheepdogVdiReq so that sheep
can create a clone instead a fresh volume.

This fixes following command:

qemu-create -b sheepdog:base sheepdog:clone

so users can boot sheepdog:clone as a normal volume.

Cc: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

gluster: Add support for creating zero-filled image

GlusterFS supports creation of zero-filled file on GlusterFS volume
by means of an API called glfs_zerofill(). Use this API from QEMU to
create an image that is filled with zeroes by using the preallocation
option of qemu-img.

qemu-img create gluster://server/volume/image -o preallocation=full 10G

The allowed values for preallocation are 'full' and 'off'. By default
preallocation is off and image is not zero-filled.

glfs_zerofill() offloads the writing of zeroes to the server and if
the storage supports SCSI WRITESAME, GlusterFS server can issue
BLKZEROOUT ioctl to achieve the zeroing.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

gluster: Implement .bdrv_co_write_zeroes for gluster

Support .bdrv_co_write_zeroes() from gluster driver by using GlusterFS API
glfs_zerofill() that off-loads the writing of zeroes to GlusterFS server.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

gluster: Convert aio routines into coroutines

Convert the read, write, flush and discard implementations from aio-based
ones to coroutine based ones.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/iscsi: return -ENOMEM if an async call fails immediately

if an async libiscsi call fails directly it can only be due
to an out of memory condition. All other errors are returned
through the callback.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Ronnie Sahlberg <ronniesahlberg@gmail.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-iotests: Clean up all extents for vmdk

This modifies _cleanup_test_img to remove all the extent files listed by
"qemu-img info"'s format specific information.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-iotests: Add _unsupported_imgopts for vmdk subformats

Some cases are not applicable for vmdk subformats those don't support
certain features, e.g. backing file, and some others can't run on
mult-file image, e.g. monolithicFlat. This adds declaration in test
cases to skip them automatically, so that iotests on vmdk can go
more smoothly (without manually picking of cases for each subformat).

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-iotests: Introduce _unsupported_imgopts

Introduce _unsupported_imgopts that causes _notrun for specific image
options.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

rbd: switch from pipe to QEMUBH completion notification

rbd callbacks are called from non-QEMU threads.  Up until now a pipe was
used to signal completion back to the QEMU iothread.

The pipe writer code handles EAGAIN using select(2).  The select(2) API
is not scalable since fd_set size is static.  FD_SET() can write beyond
the end of fd_set if the file descriptor number is too high.  (QEMU's
main loop uses poll(2) to avoid this issue with select(2).)

Since the pipe itself is quite clumsy to use and QEMUBH is now
thread-safe, just schedule a BH from the rbd callback function.  This
way we can simplify I/O completion in addition to eliminating the
potential FD_SET() crash when file descriptor numbers become too high.

Crash scenario: QEMU already has 1024 file descriptors open.  Hotplug an
rbd drive and get the pipe writer to take the select(2) code path.

Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
Tested-by: Josh Durgin <josh.durgin@inktank.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Revert "error: Don't use error_report() for assertion msgs."

This reverts commit d32934c84c72f57e78d430c22974677b7bcabe5d.

The original implementation before this patch makes abortive error
messages much more friendly. The underlying bug that required this
change is now fixed. Revert.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>

tests: Add libqemustub to qom-interface-check

The recent addition of util/error.c's dependency on error_report()
causes this test to fail to link due to a number of missing monitor
related symbols. All these symbols are however defined by libqemustub.
Add this libary to the link.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>

SPARC: Fix LEON3 power down instruction

Synchronize the program counter before the power down helper call
otherwise interrupts will return to the wrong context.

Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>

error: Don't use error_report() for assertion msgs.

Use fprintf(stderr instead. This removes dependency of libqemuutil.a
on the monitor.

We can further justify this change, in that this code path should only
trigger under a fatal error condition. fprintf-stderr is probably the
appropriate medium as under a fatal error conidition the monitor itself
may be down and out for the count. So assertion failure messages should
go lowest common denominator - straight to stderr.

Fixes the build as reported by Kevin Wolf. Issue debugged and change
suggested by Luiz Capitulino. Issue introduced by
5d24ee70bcbcf578614193526bcd5ed30a8eb16c.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>

Merge remote branch 'luiz/queue/qmp' into qmpq

* luiz/queue/qmp:
  migration: qmp_migrate(): keep working after syntax error
  qerror: Remove assert_no_error()
  qemu-option: Remove qemu_opts_create_nofail
  target-i386: Remove assert_no_error usage
  hw: Remove assert_no_error usages
  qdev: Delete dead code
  error: Add error_abort
  monitor: add object-add (QMP) and object_add (HMP) command
  monitor: add object-del (QMP) and object_del (HMP) command
  qom: catch errors in object_property_add_child
  qom: fix leak for objects created with -object
  rng: initialize file descriptor to -1
  qemu-monitor: HMP cpu-add wrapper
  vl: add missing transition debug->finish_migrate

Message-Id: 1389045795-18706-1-git-send-email-lcapitulino@redhat.com
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>

Microblaze: Convert Microblaze-pic handling to GPIOs

This patch uses inbound GPIO lines (IRQ and FIR) for
interrupts instead of using the old pic_cpu method,
which doesn't correspond to real hardware.

This creates the CPU's inbound IRQ and FIR GPIO lines and
updates the Microblaze boards to use this new method.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Suggested-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reveiwed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>

target-arm: Switch ARMCPUInfo arrays to use terminator entries

Switch the ARMCPUInfo arrays in cpu.c and cpu64.c to use a terminator
entry rather than looping based on ARRAY_SIZE. The latter causes
compile warnings on some versions of gcc if the configure options
happen to result in an empty array.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Tested-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>

Merge remote-tracking branch 'quintela/tags/migration/20140113' into staging

migration.next for 20140113

# gpg: Signature made Mon 13 Jan 2014 09:38:27 AM PST using RSA key ID 5872D723
# gpg: Can't check signature: public key not found

* quintela/tags/migration/20140113: (49 commits)
  migration: synchronize memory bitmap 64bits at a time
  ram: split function that synchronizes a range
  memory: syncronize kvm bitmap using bitmaps operations
  memory: move bitmap synchronization to its own function
  kvm: refactor start address calculation
  kvm: use directly cpu_physical_memory_* api for tracking dirty pages
  memory: unfold memory_region_test_and_clear()
  memory: split cpu_physical_memory_* functions to its own include
  memory: cpu_physical_memory_set_dirty_tracking() should return void
  memory: make cpu_physical_memory_reset_dirty() take a length parameter
  memory: s/dirty/clean/ in cpu_physical_memory_is_dirty()
  memory: cpu_physical_memory_clear_dirty_range() now uses bitmap operations
  memory: cpu_physical_memory_set_dirty_range() now uses bitmap operations
  memory: use find_next_bit() to find dirty bits
  memory: s/mask/clear/ cpu_physical_memory_mask_dirty_range
  memory: cpu_physical_memory_get_dirty() is used as returning a bool
  memory: make cpu_physical_memory_get_dirty() the main function
  memory: unfold cpu_physical_memory_set_dirty_flag()
  memory: unfold cpu_physical_memory_set_dirty() in its only user
  memory: unfold cpu_physical_memory_clear_dirty_flag() in its only user
  ...

Message-id: 1389634834-24181-1-git-send-email-quintela@redhat.com
Signed-off-by: Anthony Liguori <aliguori@amazon.com>

migration: synchronize memory bitmap 64bits at a time

We use the old code if the bitmaps are not aligned

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>

ram: split function that synchronizes a range

This function is the only bit where we care about speed.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>

memory: syncronize kvm bitmap using bitmaps operations

If bitmaps are aligned properly, use bitmap operations. If they are
not, just use old bit at a time code.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>

memory: move bitmap synchronization to its own function

We want to have all the functions that handle directly the dirty
bitmap near. We will change it later.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>