tcg: Move some opcode generation functions out of line
Some of these functions are really quite large. We have a number of
things that ought to be circularly dependent, but we duplicated code
to break that chain for the inlines.
This saved 25% of the code size of one of the translators I examined.
Reviewed-by: Bastian Koppelmann <kbastian@mail.uni-paderborn.de> Signed-off-by: Richard Henderson <rth@twiddle.net>
Alex Williamson [Tue, 10 Feb 2015 17:25:44 +0000 (10:25 -0700)]
vfio: Use vfio type1 v2 IOMMU interface
The difference between v1 and v2 is fairly subtle, simply more
deterministic behavior for unmaps. The v1 interface allows the user
to attempt to unmap sub-regions of previous mappings, returning
success with zero size if unable to comply. This was a reflection of
the underlying IOMMU API. The v2 interface requires that the user
may only unmap fully contained mappings, ie. an unmap cannot intersect
or bisect a previous mapping, but may cover multiple mappings. QEMU
never made use of the sub-region v1 support anyway, so we can support
either v1 or v2. We'll favor v2 since it's newer.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Paolo Bonzini [Tue, 10 Feb 2015 17:25:44 +0000 (10:25 -0700)]
vfio: unmap and free BAR data in instance_finalize
In the case of VFIO, the unrealize callback is too early to munmap the
BARs. The munmap must be delayed until memory accesses are complete.
To do this, split vfio_unmap_bars in two. The removal step, now called
vfio_unregister_bars, remains in vfio_exitfn. The reclamation step
is vfio_unmap_bars and is moved to the instance_finalize callback.
Similarly, quirk MemoryRegions have to be removed during
vfio_unregister_bars, but freeing the data structure must be delayed
to vfio_unmap_bars.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Paolo Bonzini [Tue, 10 Feb 2015 17:25:44 +0000 (10:25 -0700)]
vfio: free dynamically-allocated data in instance_finalize
In order to enable out-of-BQL address space lookup, destruction of
devices needs to be split in two phases.
Unrealize is the first phase; once it complete no new accesses will
be started, but there may still be pending memory accesses can still
be completed.
The second part is freeing the device, which only happens once all memory
accesses are complete. At this point the reference count has dropped to
zero, an RCU grace period must have completed (because the RCU-protected
FlatViews hold a reference to the device via memory_region_ref). This is
when instance_finalize is called.
Freeing data belongs in an instance_finalize callback, because the
dynamically allocated memory can still be used after unrealize by the
pending memory accesses.
This starts the process by creating an instance_finalize callback and
freeing most of the dynamically-allocated data in instance_finalize.
Because instance_finalize is also called on error paths or also when
the device is actually not realized, the common code needs some changes
to be ready for this. The error path in vfio_initfn can be simplified too.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Now that vfio_put_base_device is called unconditionally at instance_finalize
time, it can be called twice if vfio_populate_device fails. This works
but it is slightly harder to follow.
Change vfio_get_device to not touch the vbasedev struct until it will
definitely succeed, moving the vfio_populate_device call back to vfio-pci.
This way, vfio_put_base_device will only be called once.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Paolo Bonzini [Tue, 10 Feb 2015 17:25:44 +0000 (10:25 -0700)]
memory: unregister AddressSpace MemoryListener within BQL
address_space_destroy_dispatch is called from an RCU callback and hence
outside the iothread mutex (BQL). However, after address_space_destroy
no new accesses can hit the destroyed AddressSpace so it is not necessary
to observe changes to the memory map. Move the memory_listener_unregister
call earlier, to make it thread-safe again.
Reported-by: Alex Williamson <alex.williamson@redhat.com> Fixes: 374f2981d1f10bc4307f250f24b2a7ddb9b14be0 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Stefan Weil [Fri, 6 Feb 2015 21:43:17 +0000 (22:43 +0100)]
virtio: Fix warning caused by missing 'static' attribute
Warning from the Sparse static analysis tool:
hw/char/virtio-serial-bus.c:31:3:
warning: symbol 'vserdevices' was not declared. Should it be static?
Cc: Amit Shah <amit.shah@redhat.com> Cc: Anthony Liguori <aliguori@amazon.com> Cc: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Stefan Weil [Fri, 6 Feb 2015 21:43:13 +0000 (22:43 +0100)]
serial: Fix warnings caused by missing 'static' attribute
Warnings from the Sparse static analysis tool:
hw/char/serial.c:630:26: warning: symbol
'vmstate_serial_thr_ipending' was not declared. Should it be static?
hw/char/serial.c:646:26: warning: symbol
'vmstate_serial_tsr' was not declared. Should it be static?
hw/char/serial.c:665:26: warning: symbol
'vmstate_serial_recv_fifo' was not declared. Should it be static?
hw/char/serial.c:681:26: warning: symbol
'vmstate_serial_xmit_fifo' was not declared. Should it be static?
hw/char/serial.c:697:26: warning: symbol
'vmstate_serial_fifo_timeout_timer' was not declared. Should it be static?
hw/char/serial.c:713:26: warning: symbol
'vmstate_serial_timeout_ipending' was not declared. Should it be static?
hw/char/serial.c:729:26: warning: symbol
'vmstate_serial_poll' was not declared. Should it be static?
Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Stefan Weil [Fri, 6 Feb 2015 21:43:11 +0000 (22:43 +0100)]
migration: Fix warnings caused by missing 'static' attribute
Warnings from the Sparse static analysis tool:
migration-rdma.c:151:12: warning:
symbol 'wrid_desc' was not declared. Should it be static?
migration-rdma.c:190:12: warning:
symbol 'control_desc' was not declared. Should it be static?
migration-rdma.c:3301:19: warning:
symbol 'rdma_read_ops' was not declared. Should it be static?
migration-rdma.c:3308:19: warning:
symbol 'rdma_write_ops' was not declared. Should it be static?
Cc: Juan Quintela <quintela@redhat.com> Cc: Amit Shah <amit.shah@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Stefan Weil [Fri, 6 Feb 2015 21:43:10 +0000 (22:43 +0100)]
migration: Fix warning caused by missing declaration of vmstate_dummy
Warning from the Sparse static analysis tool:
stubs/vmstate.c:4:26: warning:
symbol 'vmstate_dummy' was not declared. Should it be static?
Cc: Juan Quintela <quintela@redhat.com> Cc: Amit Shah <amit.shah@redhat.com> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
translate-all: Use g_try_malloc() for dynamic translator buffer
The USE_MMAP code can fail, and the caller handles the failure
already. Let the !USE_MMAP code fail as well, for consistency.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
vnc: g_realloc() can't fail, bury dead error handling
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
rdma: g_malloc0() can't fail, bury dead error handling
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
kvm: g_malloc() can't fail, bury dead error handling
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
rtl8139: g_malloc() can't fail, bury dead error handling
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
onenand: g_malloc() can't fail, bury dead error handling
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Greg Kurz [Sat, 7 Feb 2015 10:25:14 +0000 (11:25 +0100)]
Fix name error in migration stream analyzation script
It fixes the following error:
Traceback (most recent call last):
File "./scripts/analyze-migration.py", line 584, in <module>
dump.read(dump_memory = args.memory)
File "./scripts/analyze-migration.py", line 528, in read
self.sections[section_id].read()
File "./scripts/analyze-migration.py", line 250, in read
self.file.readvar(n_valid * HASH_PTE_SIZE_64)
NameError: global name 'HASH_PTE_SIZE_64' is not defined
Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
util/uri: URI member path can be null, compare more carfully
uri_resolve_relative() calls strcmp(bas->path, ref->path). However,
either argument could be null! Evidence: the code checks for null
after the comparison. Spotted by Coverity.
I suspect this was screwed up when we stole the code from libxml2.
There the conditional reads
int
xmlStrEqual(const xmlChar *str1, const xmlChar *str2) {
if (str1 == str2) return(1);
if (str1 == NULL) return(0);
if (str2 == NULL) return(0);
do {
if (*str1++ != *str2) return(0);
} while (*str2++);
return(1);
}
Fix by replicating libxml2's logic faithfully.
Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
spapr_vio: Pair g_malloc() with g_free(), not free()
Spotted by Coverity with preview checker ALLOC_FREE_MISMATCH enabled
and my "coverity: Model g_free() isn't necessarily free()" model patch
applied.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
qemu-option: Pair g_malloc() with g_free(), not free()
Spotted by Coverity with preview checker ALLOC_FREE_MISMATCH enabled
and my "coverity: Model g_free() isn't necessarily free()" model patch
applied.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
qemu-option: Replace pointless use of g_malloc0() by g_malloc()
get_opt_value() takes a write-only buffer, so zeroing it is pointless.
We don't do it elsewhere, either.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Cc: <qemu-stable@nongnu.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
valgrind complains about:
==42062== 16 bytes in 1 blocks are definitely lost in loss record 387 of 1,048
==42062== at 0x402DCB2: malloc (vg_replace_malloc.c:299)
==42062== by 0x40C1BE3: g_malloc (in /usr/lib64/libglib-2.0.so.0.3800.2)
==42062== by 0x40DA133: g_slice_alloc (in /usr/lib64/libglib-2.0.so.0.3800.2)
==42062== by 0x40DB2E5: g_slist_prepend (in /usr/lib64/libglib-2.0.so.0.3800.2)
==42062== by 0x801637FF: object_class_get_list_tramp (object.c:690)
==42062== by 0x40A96C9: g_hash_table_foreach (in /usr/lib64/libglib-2.0.so.0.3800.2)
==42062== by 0x80164885: object_class_foreach (object.c:665)
==42062== by 0x80164975: object_class_get_list (object.c:698)
==42062== by 0x800100A5: machine_parse (vl.c:2447)
==42062== by 0x800100A5: main (vl.c:3756)
Lets free machines in case of mc.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Paolo Bonzini [Mon, 26 Jan 2015 11:12:24 +0000 (12:12 +0100)]
qemu-sockets: improve error reporting in unix_listen_opts
Coverity complains about not checking the returned value of mkstemp. While
at it, also improve error checking for snprintf, and refine error messages
in general.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Paolo Bonzini [Mon, 26 Jan 2015 11:12:21 +0000 (12:12 +0100)]
cpu-exec: drop dead assignment
All uses of TB inside cpu_exec are dominated by "tb = tb_find_fast(env)",
and there are no uses after the switch statement. So the assignment
is dead, as reported by Coverity.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Thomas Huth [Tue, 27 Jan 2015 12:11:26 +0000 (13:11 +0100)]
qemu-log: Correct help text of 'log cpu_reset'
The logging of the CPU state during reset is done for all architectures
nowadays (see cpu_common_reset() in qom/cpu.c), so the "x86 only" text
does not apply here anymore.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Chen Gang S [Sun, 25 Jan 2015 00:00:42 +0000 (08:00 +0800)]
linux-user/syscall.c: do_ioctl_dm: Need to call unlock_user() before going to failure return in default case
In abi_long do_ioctl_dm(), after lock_user() call, the code does
not call unlock_user() before going to failure return in default case.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Chen Gang S [Sun, 25 Jan 2015 11:35:58 +0000 (19:35 +0800)]
linux-user/main.c: Use TARGET_SIG* instead of SIG*
In main.c, all SIG* should be TARGET_SIG*, since the relevant functions
(queue_signal() and gdb_handlesig()) expect TARGET_SIG*.
The corresponding vi command is "1,$ s/\<SIG/TARGET_SIG/g".
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Chen Gang S [Fri, 23 Jan 2015 10:07:50 +0000 (18:07 +0800)]
linux-user/syscall.c: Fix typo issue for using target_vec[i].iov_len instead of target_vec[i].iov_base
It is only a typo issue, need use tswapal(target_vec[i].iov_len) for the
len.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Chen Gang S [Fri, 23 Jan 2015 10:01:09 +0000 (18:01 +0800)]
linux-user/syscall.c: lock_iovec: unlock vec[i] in failure processing code block
When failure occurs during locking of vec[i], we also need to unlock all
already locked vec[i] in failure processing code block before return.
Code in unlock_user() checks vec[i].iov_base for NULL, so there's no
need not check it .
If error is EFAULT when "i == 0", vec[i].iov_base is NULL, we can just
skip it, so can still use "while (--i >= 0)" loop condition.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
vl: Fix bogus error message for implied mon ID clashing
monitor_parse() desugars --monitor, --qmp and -qmp-pretty to --mon.
The ID it picks can clash with a user-specified ID. When it happens,
the error message is misleading.
Reproducer:
$ qemu --mon id=compat_monitor0 --monitor stdio
Message before the patch:
duplicate chardev: compat_monitor0
There's no "duplicate chardev" here. The problem is a duplicate
monitor ID. Moreover, the message provides no clue which option
caused the problem. The patch changes the message to:
qemu: --monitor stdio: Duplicate ID 'compat_monitor0' for mon
monitor_parse() is also used for creating a default monitor, but
that's not done when the user specifies a monitor, so an ID clash is
impossible then.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
target-mips: Clean up switch fall through after commit fecd264
Commit fecd264 added a number of fall-throughs, but neglected to
properly document them as intentional. Commit d922445 cleaned that up
for many, but not all cases. Take care of the remaining ones.
Spotted by Coverity.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Peter Maydell [Fri, 6 Feb 2015 18:06:07 +0000 (18:06 +0000)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block patches for 2.3
# gpg: Signature made Fri 06 Feb 2015 17:14:10 GMT using RSA key ID C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
* remotes/kevin/tags/for-upstream: (47 commits)
block/raw-posix.c: Fix raw_getlength() on Mac OS X block devices
block: Eliminate silly QERR_ macros used for encryption keys
block: New bdrv_add_key(), convert monitor to use it
blockdev: Eliminate silly QERR_BLOCK_JOB_NOT_ACTIVE macro
blockdev: Give find_block_job() an Error ** parameter
qcow2: Rewrite qcow2_alloc_bytes()
block: Give always priority to unused entries in the qcow2 L2 cache
nbd: fix max_discard/max_transfer_length
block: introduce BDRV_REQUEST_MAX_SECTORS
nbd: Improve error messages
iotests: Fix 104 for NBD
iotests: Fix 100 for nbd
iotests: Fix 083
block: fix off-by-one error in qcow and qcow2
qemu-iotests: add 116 invalid QED input file tests
qed: check for header size overflow
block/dmg: improve zeroes handling
block/dmg: support bzip2 block entry types
block/dmg: factor out block type check
block/dmg: use SectorNumber from BLKX header
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Programmingkid [Mon, 19 Jan 2015 22:12:55 +0000 (17:12 -0500)]
block/raw-posix.c: Fix raw_getlength() on Mac OS X block devices
This patch replaces the dummy code in raw_getlength() for block devices
on OS X, which always returned LLONG_MAX, with a real implementation
that returns the actual block device size.
Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Kevin Wolf [Fri, 6 Feb 2015 17:00:14 +0000 (18:00 +0100)]
Merge remote-tracking branch 'mreitz/block' into queue-block
* mreitz/block:
block: Eliminate silly QERR_ macros used for encryption keys
block: New bdrv_add_key(), convert monitor to use it
blockdev: Eliminate silly QERR_BLOCK_JOB_NOT_ACTIVE macro
blockdev: Give find_block_job() an Error ** parameter
block: Eliminate silly QERR_ macros used for encryption keys
The QERR_ macros are leftovers from the days of "rich" error objects.
They're used with error_set() and qerror_report(), and expand into the
first *two* arguments. This trickiness has become pointless. Clean
up QERR_DEVICE_ENCRYPTED and QERR_DEVICE_NOT_ENCRYPTED.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1422524221-8566-5-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
block: New bdrv_add_key(), convert monitor to use it
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1422524221-8566-4-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
The QERR_ macros are leftovers from the days of "rich" error objects.
They're used with error_set() and qerror_report(), and expand into the
first *two* arguments. This trickiness has become pointless. Clean
this one up.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1422524221-8566-3-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
blockdev: Give find_block_job() an Error ** parameter
When find_block_job() fails, all its callers build the same Error
object. Build it in find_block_job() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1422524221-8566-2-git-send-email-armbru@redhat.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
Alberto Garcia [Thu, 5 Feb 2015 12:55:31 +0000 (14:55 +0200)]
block: Give always priority to unused entries in the qcow2 L2 cache
The current algorithm to replace entries from the L2 cache gives
priority to newer hits by dividing the hit count of all existing
entries by two everytime there is a cache miss.
However, if there are several cache misses the hit count of the
existing entries can easily go down to 0. This will result in those
entries being replaced even when there are others that have never been
used.
This problem is more noticeable with larger disk images and cache
sizes, since the chances of having several misses before the cache is
full are higher.
If we make sure that the hit count can never go down to 0 again,
unused entries will always have priority.
Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Denis V. Lunev [Fri, 6 Feb 2015 11:24:43 +0000 (14:24 +0300)]
nbd: fix max_discard/max_transfer_length
nbd_co_discard calls nbd_client_session_co_discard which uses uint32_t
as the length in bytes of the data to discard due to the following
definition:
struct nbd_request {
uint32_t magic;
uint32_t type;
uint64_t handle;
uint64_t from;
uint32_t len; <-- the length of data to be discarded, in bytes
} QEMU_PACKED;
Thus we should limit bl_max_discard to UINT32_MAX >> BDRV_SECTOR_BITS to
avoid overflow.
NBD read/write code uses the same structure for transfers. Fix
max_transfer_length accordingly.
Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Peter Lieven <pl@kamp.de> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Lieven [Fri, 6 Feb 2015 10:54:11 +0000 (11:54 +0100)]
block: introduce BDRV_REQUEST_MAX_SECTORS
we check and adjust request sizes at several places with
sometimes inconsistent checks or default values:
INT_MAX
INT_MAX >> BDRV_SECTOR_BITS
UINT_MAX >> BDRV_SECTOR_BITS
SIZE_MAX >> BDRV_SECTOR_BITS
This patches introdocues a macro for the maximal allowed sectors
per request and uses it at several places.
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Tue, 27 Jan 2015 02:02:59 +0000 (21:02 -0500)]
nbd: Improve error messages
This patch makes use of the Error object for nbd_receive_negotiate() so
that errors during negotiation look nicer.
Furthermore, this patch adds an additional error message if the received
magic was wrong, but would be correct for the other protocol version,
respectively: So if an export name was specified, but the NBD server
magic corresponds to an old handshake, this condition is explicitly
signaled to the user, and vice versa.
As these messages are now part of the "Could not open image" error
message, additional filtering has to be employed in iotest 083, which
this patch does as well.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Tue, 27 Jan 2015 02:02:58 +0000 (21:02 -0500)]
iotests: Fix 104 for NBD
_make_test_img sets up an NBD server, _cleanup_test_img shuts it down;
thus, _cleanup_test_img has to be called before _make_test_img is
invoked another time.
Furthermore, the pipe through _filter_test_img was unnecessary;
_make_test_img already takes care of that.
And finally, a filter is added to _filter_img_info to replace
"nbd://127.0.0.1:10810" by "TEST_DIR/t.IMGFMT", since the former is the
way to express the full image path (normally the latter) for NBD tests.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Tue, 27 Jan 2015 02:02:57 +0000 (21:02 -0500)]
iotests: Fix 100 for nbd
In case of NBD, _make_test_img starts a new NBD server. Therefore,
_cleanup_test_img (which shuts that server down) has to be invoked
before the next _make_test_img call in order to make 100 work for NBD.
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Max Reitz [Tue, 27 Jan 2015 02:02:56 +0000 (21:02 -0500)]
iotests: Fix 083
As of 8f9e835fd2e687d2bfe936819c3494af4343614d, probing should be
disabled in the qemu-iotests (at least when using qemu-io). This broke
083's reference output (which consisted mostly of "Could not read image
for determining its format").
This patch fixes it.
Note that one case which failed before is now successful: Disconnect
after data. This is due to qemu having read twice before (once for
probing, once for the qemu-io read command), but only once now (the
qemu-io read command). Therefore, reading is successful (which is
correct).
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Jeff Cody [Tue, 27 Jan 2015 13:33:55 +0000 (08:33 -0500)]
block: fix off-by-one error in qcow and qcow2
This fixes an off-by-one error introduced in 9a29e18. Both qcow and
qcow2 need to make sure to leave room for string terminator '\0' for
the backing file, so the max length of the non-terminated string is
either 1023 or PATH_MAX - 1.
Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Mon, 12 Jan 2015 12:31:33 +0000 (12:31 +0000)]
qemu-iotests: add 116 invalid QED input file tests
These tests exercise error code paths in the QED image format. The
tests are very simple, they just prove that the error path exits
cleanly.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1421065893-18875-3-git-send-email-stefanha@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Mon, 12 Jan 2015 12:31:32 +0000 (12:31 +0000)]
qed: check for header size overflow
Header size is denoted in clusters. The maximum cluster size is 64 MB
but there is no limit on header size. Check for uint32_t overflow in
case the header size field has a whacky value.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1421065893-18875-2-git-send-email-stefanha@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:15 +0000 (18:48 +0100)]
block/dmg: improve zeroes handling
Disk images may contain large all-zeroes gaps (1.66k sectors or 812 MiB
is seen in the real world). These blocks (type 2) do not need to be
extracted into a temporary buffer, there is no need to allocate memory
for these blocks nor to check its length.
(For the test image, the maximum uncompressed size is 1054371 bytes,
probably for a bzip2-compressed block.)
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-13-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:14 +0000 (18:48 +0100)]
block/dmg: support bzip2 block entry types
This patch adds support for bzip2-compressed block entries as introduced
with OS X 10.4 (source: https://en.wikipedia.org/wiki/Apple_Disk_Image).
It was tested against a 5.2G "OS X Yosemite" installation image which
stores the BLXX block in the XML property list (instead of resource
forks) and has over 5k chunks.
New configure entries are added (--enable-bzip2 / --disable-bzip2) to
control inclusion of bzip2 functionality (which requires linking against
libbz2). The help message suggests that this option is needed for DMG
files, but the tests are generic enough that other parts of QEMU can use
bzip2 if needed.
The identifiers are based on http://newosxbook.com/DMG.html.
The decompression routines are based on the zlib case, but as there is
no way to reset the decompression state (unlike zlib), memory is
allocated and deallocated for every decompression. This should not be
problematic as the decompression takes most of the time and as blocks
are typically about/over 1 MiB in size, only one allocation is done
every 2000 sectors.
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1420566495-13284-12-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:13 +0000 (18:48 +0100)]
block/dmg: factor out block type check
In preparation for adding bzip2 support, split the type check into a
separate function. Make all offsets relative to the begin of a chunk
such that it is easier to recognize the position without having to
add up all offsets. Some comments are added to describe the fields.
There is no functional change.
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-11-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:12 +0000 (18:48 +0100)]
block/dmg: use SectorNumber from BLKX header
Previously the sector table parsing relied on the previous offset of
the DMG file. Now it uses the sector number from the BLKX header
(see http://newosxbook.com/DMG.html).
The implementation of dmg2img (from vu1tur) does not base the output
sector on the location of the terminator (0xffffffff) either so it
should be safe to drop this dependency on the previous state.
(It makes somehow makes sense, a terminator should halt further
processing of a block and is perhaps used to preallocate some space.)
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-10-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:11 +0000 (18:48 +0100)]
block/dmg: fix sector data offset calculation
This patch addresses two issues:
- The data fork offset was not taken into account, resulting in failure
to read an InstallESD.dmg file (5164763151 bytes) which had a
non-zero DataForkOffset field.
- The offset of the previous block ("partition") was unconditionally
added to the current block because older files would start the input
offset of a new block at zero. Newer files (including vlc-2.1.5.dmg,
tuxpaint-0.9.15-macosx.dmg and OS X Yosemite [MAS].dmg) failed in
reads because these files have chunk offsets, relative to the begin
of a data fork.
Now the data offset of the mish is taken into account. While we could
check that the data_offset is within the data fork, let's not do that
here as it would only result in parse failures on invalid files (rather
than gracefully handling such bad files). dmg_read will error out if
the offset is incorrect.
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-9-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:10 +0000 (18:48 +0100)]
block/dmg: set virtual size to a non-zero value
Right now the virtual size is always reported as zero which makes it
impossible to convert between formats.
After this patch, the number of sectors will be read from the trailer
("koly" block).
To verify the behavior, the output of `dmg2img foo.dmg foo.img` was
compared against `qemu-img convert -f dmg -O raw foo.dmg foo.raw`. The
tests showed that the file contents are exactly the same, except that
QEMU creates a slightly larger file (it matches the total sectors
count).
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-8-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:09 +0000 (18:48 +0100)]
block/dmg: process XML plists
The format is simple enough to avoid using a full-blown XML parser. It
assumes that all BLKX items begin with the "mish" magic word, therefore
it is not a problem if other values get matched which are not a BLKX
block.
The offsets are based on the description at
http://newosxbook.com/DMG.html
For compatibility with glib 2.12, use g_base64_decode (which
additionally requires an extra buffer allocation) instead of
g_base64_decode_inplace (which is only available since glib 2.20).
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-7-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:08 +0000 (18:48 +0100)]
block/dmg: validate chunk size to avoid overflow
Previously the chunk size was not checked, allowing for a large memory
allocation. This patch checks whether the chunks size is within the
resource fork length, and whether the resource fork is below the
trailer of the dmg file.
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-6-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:07 +0000 (18:48 +0100)]
block/dmg: process a buffer instead of reading ints
As the decoded plist XML is not a pointer in the file,
dmg_read_mish_block must be able to process a buffer instead of a file
pointer. Since the full buffer must be processed, let's change the
return value again to just a success flag.
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-5-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:06 +0000 (18:48 +0100)]
block/dmg: extract processing of resource forks
Besides the offset, also read the resource length. This length is now
used in the extracted function to verify the end of the resource fork
against "count" from the resource fork.
Instead of relying on the value of offset to conclude whether the
resource fork is available or not (info_begin==0), check the
rsrc_fork_length instead. This would allow a dmg file to begin with a
resource fork. This seemingly unnecessary restriction was found while
trying to craft a DMG file by hand.
Other changes:
- Do not require resource data offset to be 0x100 (but check that it
is within bounds though).
- Further improve boundary checking (resource data must be within
the resource fork).
- Use correct value for resource data length (spotted by John Snow)
- Consider the resource data offset when determining info_end.
This fixes an EINVAL on the tuxpaint dmg example.
The resource fork format is documented at
https://developer.apple.com/legacy/library/documentation/mac/pdf/MoreMacintoshToolbox.pdf#page=151
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1420566495-13284-4-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Extract the mish block decoder such that this can be used for other
formats in the future. A new DmgHeaderState struct is introduced to
share state while decoding.
The code is kept unchanged as much as possible, a "fail" label is added
for example where a simple return would probably do. In dmg_open, the
variable "tmp" is renamed to "rsrc_data_offset" for clarity and comments
have been added explaining various data.
Note that this patch has one subtle difference with the previous
version which should not affect functionality. In the previous code,
the end of a resource was inferred from the mish block (the offsets
would be increased by the fields). In this patch, the resource length
is used instead to avoid the need to rely on the previous offsets.
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1420566495-13284-3-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Wu [Tue, 6 Jan 2015 17:48:04 +0000 (18:48 +0100)]
block/dmg: properly detect the UDIF trailer
DMG files have a variable length with a UDIF trailer at the end of a
file. This UDIF trailer is essential as it describes the contents of
the image. At the moment however, the start of this trailer is almost
always incorrect as bdrv_getlength() returns a multiple of the block
size (rounded up). This results in a failure to recognize DMG files,
resulting in Invalid argument (EINVAL) errors.
As there is no API to retrieve the real file size, look for the magic
header in the last two sectors to find the start of this 512-byte UDIF
trailer (the "koly" block).
The resource fork offset ("info_begin") has its offset adjusted as the
initial value of offset does not mean "end of file" anymore, but "begin
of UDIF trailer".
[Replaced error_set(errp, ERROR_CLASS_GENERIC_ERROR, ...) with
error_setg(errp, ...) as discussed with Peter.
--Stefan]
Signed-off-by: Peter Wu <peter@lekensteyn.nl> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1420566495-13284-2-git-send-email-peter@lekensteyn.nl Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Francesco Romani [Mon, 12 Jan 2015 13:11:13 +0000 (14:11 +0100)]
block: add event when disk usage exceeds threshold
Managing applications, like oVirt (http://www.ovirt.org), make extensive
use of thin-provisioned disk images.
To let the guest run smoothly and be not unnecessarily paused, oVirt sets
a disk usage threshold (so called 'high water mark') based on the occupation
of the device, and automatically extends the image once the threshold
is reached or exceeded.
In order to detect the crossing of the threshold, oVirt has no choice but
aggressively polling the QEMU monitor using the query-blockstats command.
This lead to unnecessary system load, and is made even worse under scale:
deployments with hundreds of VMs are no longer rare.
To fix this, this patch adds:
* A new monitor command `block-set-write-threshold', to set a mark for
a given block device.
* A new event `BLOCK_WRITE_THRESHOLD', to report if a block device
usage exceeds the threshold.
* A new `write_threshold' field into the `BlockDeviceInfo' structure,
to report the configured threshold.
This will allow the managing application to use smarter and more
efficient monitoring, greatly reducing the need of polling.
[Updated qemu-iotests 067 output to add the new 'write_threshold'
property. --Stefan]
[Changed g_assert_false() to !g_assert() to fix the build on older glib
versions. --Kevin]
Signed-off-by: Francesco Romani <fromani@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1421068273-692-1-git-send-email-fromani@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Fam Zheng [Fri, 16 Jan 2015 01:38:42 +0000 (09:38 +0800)]
qemu-iotests: Fix supported_oses check
There is a bug in the recently added sys.platform test, and we no longer
run python tests, because "linux2" is the value to compare here. So do a
prefix match. According to python doc [1], the way to use sys.platform
is "unless you want to test for a specific system version, it is
therefore recommended to use the following idiom":
Peter Lieven [Mon, 2 Feb 2015 13:52:21 +0000 (14:52 +0100)]
virtio-blk: introduce multiread
this patch finally introduces multiread support to virtio-blk. While
multiwrite support was there for a long time, read support was missing.
The complete merge logic is moved into virtio-blk.c which has
been the only user of request merging ever since. This is required
to be able to merge chunks of requests and immediately invoke callbacks
for those requests. Secondly, this is required to switch to
direct invocation of coroutines which is planned at a later stage.
The following benchmarks show the performance of running fio with
4 worker threads on a local ram disk. The numbers show the average
of 10 test runs after 1 run as warmup phase.
Peter Lieven [Mon, 2 Feb 2015 13:52:19 +0000 (14:52 +0100)]
hw/virtio-blk: add a constant for max number of merged requests
As it was not obvious (at least for me) where the 32 comes from;
add a constant for it.
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Peter Lieven [Mon, 2 Feb 2015 13:52:18 +0000 (14:52 +0100)]
block: add accounting for merged requests
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>