Eric Blake [Thu, 17 Nov 2016 20:13:55 +0000 (14:13 -0600)]
qcow2: Inform block layer about discard boundaries
At the qcow2 layer, discard is only possible on a per-cluster
basis; at the moment, qcow2 silently rounds any unaligned
requests to this granularity. However, an upcoming patch will
fix a regression in the block layer ignoring too much of an
unaligned discard request, by changing the block layer to
break up a discard request at alignment boundaries; for that
to work, the block layer must know about our limits.
However, we can't go one step further by changing
qcow2_discard_clusters() to assert that requests are always
aligned, since that helper function is reached on paths
outside of the block layer.
CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit ecdbead659f037dc572bba9eb1cd31a5a1a9ad9a) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Samuel Thibault [Sun, 13 Nov 2016 22:54:27 +0000 (23:54 +0100)]
slirp: Fix access to freed memory
if_start() goes through the slirp->if_fastq and slirp->if_batchq
list of pending messages, and accesses ifm->ifq_so->so_nqueued of its
elements if ifm->ifq_so != NULL. When freeing a socket, we thus need
to make sure that any pending message for this socket does not refer
to the socket any more.
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Tested-by: Brian Candler <b.candler@pobox.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit ea64d5f08817b5e79e17135dce516c7583107f91) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Greg Kurz [Fri, 4 Nov 2016 08:39:15 +0000 (09:39 +0100)]
vhost: adapt vhost_verify_ring_mappings() to virtio 1 ring layout
With virtio 1, the vring layout is split in 3 separate regions of
contiguous memory for the descriptor table, the available ring and the
used ring, as opposed with legacy virtio which uses a single region.
In case of memory re-mapping, the code ensures it doesn't affect the
vring mapping. This is done in vhost_verify_ring_mappings() which assumes
the device is legacy.
This patch changes vhost_verify_ring_mappings() to check the mappings of
each part of the vring separately.
This works for legacy mappings as well.
Cc: qemu-stable@nongnu.org Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit f1f9e6c5961ffb36fd4a81cd7edcded7bfad2ab2) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Kevin Wolf [Fri, 4 Nov 2016 23:03:15 +0000 (00:03 +0100)]
block: Don't mark node clean after failed flush
Commit 3ff2f67a changed bdrv_co_flush() so that no flush is issues if
the image hasn't been dirtied since the last flush. This is not quite
correct: The condition should be that the image hasn't been dirtied
since the last _successful_ flush. This patch changes the logic
accordingly.
Without this fix, subsequent bdrv_co_flush() calls would return success
without actually doing anything even though the image is still dirty.
The difference is visible in some blkdebug test cases where error
messages incorrectly disappeared after commit 3ff2f67a.
Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 1478300595-10090-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit e6af1e085416378918cca357bf2abd8b90224667)
virtio 1.0 spec says this is a legacy feature bit,
hide it from guests in modern mode.
Note: for cross-version migration compatibility,
we keep the bit set in host_features.
The result will be that a guest migrating cross-version
will see host features change under it.
As guests only seem to read it once, this should
not be an issue. Meanwhile, will work to fix guests to
ignore this bit in virtio1 mode, too.
Cc: qemu-stable@nongnu.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
(cherry picked from commit 2a083ffd2e37ef08769749a5c7cfc6ca65c9f8ea) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Legacy features are those that transitional devices only
expose on the legacy interface.
Allow different ones per device class.
Cc: qemu-stable@nongnu.org # dependency for the next patch Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
(cherry picked from commit 9b706dbbbb81f5cb7c67e491d38cd6077205e056)
Conflicts:
hw/virtio/virtio.c
* drop context dep on ff4c07df
* resolv func dep on ff4c07df creating vdc variable in
virtio_device_class_init()
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Mon, 21 Nov 2016 05:28:12 +0000 (16:28 +1100)]
target-ppc: Fix CPU migration from qemu-2.6 <-> later versions
When migration for target-ppc was converted to vmstate, several
VMSTATE_EQUAL() checks were foolishly included of things that really
should be internal state. Specifically we verified equality of the
insns_flags and insns_flags2 fields, which are used within TCG to
determine which groups of instructions are available on this cpu
model. Between qemu-2.6 and qemu-2.7 we made some changes to these
classes which broke migration.
This path fixes migration both forwards and backwards. On migration
from 2.6 to later versions we import the fields into teporary
variables, which we then ignore. In migration backwards, we populate
the temporary fields from the runtime fields, but mask out the bits
which were added after qemu-2.6, allowing the VMSTATE_EQUAL in
qemu-2.6 to accept the stream.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 16a2497bd44cac1856e259654fd304079bd1dcdc) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
This function is from net/socket.c, move it to net.c and net.h.
Add SocketReadState to make others reuse net_fill_rstate().
suggestion from jason.
This refactored the state out of NetSocketState into a
separate SocketReadState. This refactoring requires
that a callback is provided to be triggered upon
completion of a packet receive from the guest.
The patch only registered this callback in the codepaths
hit by -net socket,connect, not -net socket,listen. So
as a result packets sent by the guest in the latter case
get dropped on the floor.
This bug is hidden because net_fill_rstate() silently
does nothing if the callback is not set.
This patch adds in the middle callback registration
and also adds an assert so that QEMU aborts if there
are any other codepaths hit which are missing the
callback.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit e79cd4068063ea2859199002a049010a11202939) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Corey Minyard [Mon, 24 Oct 2016 20:10:21 +0000 (15:10 -0500)]
acpi/ipmi: Initialize the fwinfo before fetching it
The initialization was missed before, resulting in some
bad data in the smbus case.
Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: qemu-stable@nongnu.org Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 698ae42b9124dce23e03d0fea2e635b70540ef13) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Alex Williamson [Mon, 31 Oct 2016 15:53:03 +0000 (09:53 -0600)]
memory: Don't use memcpy for ram_device regions
With a vfio assigned device we lay down a base MemoryRegion registered
as an IO region, giving us read & write accessors. If the region
supports mmap, we lay down a higher priority sub-region MemoryRegion
on top of the base layer initialized as a RAM device pointer to the
mmap. Finally, if we have any quirks for the device (ie. address
ranges that need additional virtualization support), we put another IO
sub-region on top of the mmap MemoryRegion. When this is flattened,
we now potentially have sub-page mmap MemoryRegions exposed which
cannot be directly mapped through KVM.
This is as expected, but a subtle detail of this is that we end up
with two different access mechanisms through QEMU. If we disable the
mmap MemoryRegion, we make use of the IO MemoryRegion and service
accesses using pread and pwrite to the vfio device file descriptor.
If the mmap MemoryRegion is enabled and results in one of these
sub-page gaps, QEMU handles the access as RAM, using memcpy to the
mmap. Using either pread/pwrite or the mmap directly should be
correct, but using memcpy causes us problems. I expect that not only
does memcpy not necessarily honor the original width and alignment in
performing a copy, but it potentially also uses processor instructions
not intended for MMIO spaces. It turns out that this has been a
problem for Realtek NIC assignment, which has such a quirk that
creates a sub-page mmap MemoryRegion access.
To resolve this, we disable memory_access_is_direct() for ram_device
regions since QEMU assumes that it can use memcpy for those regions.
Instead we access through MemoryRegionOps, which replaces the memcpy
with simple de-references of standard sizes to the host memory.
With this patch we attempt to provide unrestricted access to the RAM
device, allowing byte through qword access as well as unaligned
access. The assumption here is that accesses initiated by the VM are
driven by a device specific driver, which knows the device
capabilities. If unaligned accesses are not supported by the device,
we don't want them to work in a VM by performing multiple aligned
accesses to compose the unaligned access. A down-side of this
philosophy is that the xp command from the monitor attempts to use
the largest available access weidth, unaware of the underlying
device. Using memcpy had this same restriction, but at least now an
operator can dump individual registers, even if blocks of device
memory may result in access widths beyond the capabilities of a
given device (RTL NICs only support up to dword).
Reported-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 4a2e242bbb306ef5c16ce9e7bb2da3bd8a4eb098) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Alex Williamson [Mon, 31 Oct 2016 15:53:03 +0000 (09:53 -0600)]
memory: Replace skip_dump flag with "ram_device"
Setting skip_dump on a MemoryRegion allows us to modify one specific
code path, but the restriction we're trying to address encompasses
more than that. If we have a RAM MemoryRegion backed by a physical
device, it not only restricts our ability to dump that region, but
also affects how we should manipulate it. Here we recognize that
MemoryRegions do not change to sometimes allow dumps and other times
not, so we replace setting the skip_dump flag with a new initializer
so that we know exactly the type of region to which we're applying
this behavior.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 21e00fa55f3fdfcbb20da7c6876c91ef3609b387) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
net: rtl8139: limit processing of ring descriptors
RTL8139 ethernet controller in C+ mode supports multiple
descriptor rings, each with maximum of 64 descriptors. While
processing transmit descriptor ring in 'rtl8139_cplus_transmit',
it does not limit the descriptor count and runs forever. Add
check to avoid it.
Reported-by: Andrew Henderson <hendersa@icculus.org> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit c7c35916692fe010fef25ac338443d3fe40be225) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Alberto Garcia [Mon, 17 Oct 2016 15:46:03 +0000 (18:46 +0300)]
qemu-iotests: Test I/O in a single drive from a throttling group
iotest 093 contains a test that creates a throttling group with
several drives and performs I/O in all of them. This patch adds a new
test that creates a similar setup but only performs I/O in one of the
drives at the same time.
This is useful to test that the round robin algorithm is behaving
properly in these scenarios, and is specifically written using the
regression introduced in 27ccdd52598290f0f8b58be56e as an example.
Signed-off-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit a26ddb43963e77aeebc2a4f011d27b2d9c017f21) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Alberto Garcia [Mon, 17 Oct 2016 15:46:02 +0000 (18:46 +0300)]
throttle: Correct access to wrong BlockBackendPublic structures
In 27ccdd52598290f0f8b58be56e235aff7aebfaf3 the throttling fields were
moved from BlockDriverState to BlockBackend. However in a few cases
the code started using throttling fields from the active BlockBackend
instead of the round-robin token, making the algorithm behave
incorrectly.
This can cause starvation if there's a throttling group with several
drives but only one of them has I/O.
Cc: qemu-stable@nongnu.org Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 6bf77e1c2dc24da1bade16e8a9a637f3b127314d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Thomas Huth [Wed, 21 Sep 2016 09:42:15 +0000 (11:42 +0200)]
ppc/kvm: Mark 64kB page size support as disabled if not available
QEMU currently refuses to start with KVM-PR and only prints out
qemu: fatal: Unknown MMU model 851972
when being started there. This is because commit 4322e8ced5aaac719
("ppc: Fix 64K pages support in full emulation") introduced a new
POWERPC_MMU_64K bit to indicate support for this page size, but
it never gets cleared on KVM-PR if the host kernel does not support
this. Thus we've got to turn off this bit in the mmu_model for KVM-PR.
Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 0d594f5565837fe2886a8aa307ef8abb65eab8f7) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
tests/test-qmp-input-strict: Cover missing struct members
These tests would have caught the bug fixed by the previous commit.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1475594630-24758-1-git-send-email-armbru@redhat.com>
(cherry picked from commit bce3035a44c40bd3ec29d3162025fd350f2d8dbf) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
qapi: Fix crash when 'any' or 'null' parameter is missing
Unlike the other visit methods, visit_type_any() and visit_type_null()
neglect to check whether qmp_input_get_object() succeeded. They crash
when it fails. Reproducer:
Since commit ad739706bbadee49, user_creatable_add_type() expects to be
given a qdict. However, if object-add is called without props, you reach
the assert: "qemu/qom/object_interfaces.c:115: user_creatable_add_type:
Assertion `qdict' failed.", because the qdict isn't created in this
case (it's optional).
Furthermore, qmp_input_visitor_new() is not meant to be called without a
dict, and a further commit will assert in this situation.
If none given, create an empty qdict in qmp to avoid the
user_creatable_add_type() assert(qdict).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20160922203927.28241-2-marcandre.lureau@redhat.com> Tested-by: Xiao Long Jiang <zxiaol@linux.vnet.ibm.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
(cherry picked from commit e64c75a9752c5d0fd64eb2e684c656a5ea7d03c6) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
char: fix missing return in error path for chardev TLS init
If the qio_channel_tls_new_(server|client) methods fail,
we disconnect the client. Unfortunately a missing return
means we then go on to try and run the TLS handshake on
a NULL I/O channel. This gives predictably segfaulty
results.
The main way to trigger this is to request a bogus TLS
priority string for the TLS credentials. e.g.
Most other ways appear impossible to trigger except
perhaps if OOM conditions cause gnutls initialization
to fail.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(cherry picked from commit 660a2d83e026496db6b3eaec2256a2cdd6c74de8) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Emilio G. Cota [Wed, 5 Oct 2016 22:34:39 +0000 (18:34 -0400)]
qht: fix unlock-after-free segfault upon resizing
The old map's bucket locks are being unlocked *after*
that same old map has been passed to RCU for destruction.
This is a bug that can cause a segfault, since there's
no guarantee that the deletion will be deferred (e.g.
there may be no concurrent readers).
The segfault is easily triggered in RHEL6/CentOS6 with qht-test,
particularly on a single-core system or by pinning qht-test
to a single core.
Fix it by unlocking the map's bucket locks right after having
published the new map, and (crucially) before marking the map
for deletion via call_rcu().
While at it, expand qht_do_resize() to atomically do (1) a reset,
(2) a resize, or (3) a reset+resize. This simplifies the calling
code, since the new function (qht_do_resize_reset()) acquires
and releases the buckets' locks.
Note that no qht_do_reset inline is provided, since it would have
no users--qht_reset() already performs a reset without taking
ht->lock.
Reported-by: Peter Maydell <peter.maydell@linaro.org> Reported-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1475706880-10667-3-git-send-email-cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 76b553b308dc8671eb672b889b38889b1231cf1e) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Emilio G. Cota [Wed, 5 Oct 2016 22:34:38 +0000 (18:34 -0400)]
qht: simplify qht_reset_size
Sometimes gcc doesn't pick up the fact that 'new' is properly
set if 'resize == true', which may generate an unnecessary
build warning.
Fix it by removing 'resize' and directly checking that 'new'
is non-NULL.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1475706880-10667-2-git-send-email-cota@braap.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit f555a9d0b3c785b698f32e6879e97d0a4b387314) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Eric Blake [Fri, 9 Sep 2016 03:14:14 +0000 (22:14 -0500)]
migrate: Fix cpu-throttle-increment regression in HMP
Commit 69ef1f3 accidentally broke migrate_set_parameter's ability
to set the cpu-throttle-increment to anything other than the
default, because it forgot to parse the user's string into an
integer.
CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
(cherry picked from commit bb2b777cf9a2862fe31a40256659ff49ae3d2006) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
John Snow [Fri, 23 Sep 2016 01:45:52 +0000 (21:45 -0400)]
block-backend: remove blk_flush_all
We can teach Xen to drain and flush each device as it needs to, instead
of trying to flush ALL devices. This removes the last user of
blk_flush_all.
The function is therefore removed under the premise that any new uses
of blk_flush_all would be the wrong paradigm: either flush the single
device that requires flushing, or use an appropriate flush_all mechanism
from outside of the BlkBackend layer.
Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 49137bf6845eaecad51a047fc06dd11c56118460) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
John Snow [Fri, 23 Sep 2016 01:45:51 +0000 (21:45 -0400)]
qemu: use bdrv_flush_all for vm_stop et al
Reimplement bdrv_flush_all for vm_stop. In contrast to blk_flush_all,
bdrv_flush_all does not have device model restrictions. This allows
us to flush and halt unconditionally without error.
This allows us to do things like migrate when we have a device with
an open tray, but has a node that may need to be flushed, or nodes
that aren't currently attached to any device and need to be flushed.
Specifically, this allows us to migrate when we have a CDROM with
an open tray.
Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 22af08eacf6b5aa0e6c0581e547380b3eb4f95e9)
Conflicts:
cpus.c
John Snow [Fri, 23 Sep 2016 01:45:50 +0000 (21:45 -0400)]
block: reintroduce bdrv_flush_all
Commit fe1a9cbc moved the flush_all routine from the bdrv layer to the
block-backend layer. In doing so, however, the semantics of the routine
changed slightly such that flush_all now used blk_flush instead of
bdrv_flush.
blk_flush can fail if the attached device model reports that it is not
"available," (i.e. the tray is open.) This changed the semantics of
flush_all such that it can now fail for e.g. open CDROM drives.
Reintroduce bdrv_flush_all to regain the old semantics without having to
alter the behavior of blk_flush or blk_flush_all, which are already
'doing the right thing.'
Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 4085f5c7a239567a292876f46cb59d9b19bcf6ac) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Eric Blake [Wed, 7 Sep 2016 21:27:20 +0000 (16:27 -0500)]
iscsi: Fix divide-by-zero regression on raw SG devices
When qemu uses iscsi devices in sg mode, iscsilun->block_size
is left at 0. Prior to commits cf081fca and similar, when
block limits were tracked in sectors, this did not matter:
various block limits were just left at 0. But when we started
scaling by block size, this caused SIGFPE.
Then, in a later patch, commit a5b8dd2c added an assertion to
bdrv_open_common() that request_alignment is always non-zero;
which was not true for SG mode. Rather than relax that assertion,
we can just provide a sane value (we don't know of any SG device
with a block size smaller than qemu's default sizing of 512 bytes).
One possible solution for SG mode is to just blindly skip ALL
of iscsi_refresh_limits(), since we already short circuit so
many other things in sg mode. But this patch takes a slightly
more conservative approach, and merely guarantees that scaling
will succeed, while still using multiples of the original size
where possible. Resulting limits may still be zero in SG mode
(that is, we mostly only fix block_size used as a denominator
or which affect assertions, not all uses).
Reported-by: Holger Schranz <holger@fam-schranz.de> Signed-off-by: Eric Blake <eblake@redhat.com> CC: qemu-stable@nongnu.org
Message-Id: <1473283640-15756-1-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 95eaa78537c734fa3cb3373d47ba8c0099a36ff0) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
The copy_sectors() code was originally using the 'sector'
parameter for encryption, which was passed in by the caller
from the QCowL2Meta.offset field (aka the guest logical
offset).
After the change, the code is using 'cluster_offset' which
was passed in from QCow2L2Meta.alloc_offset field (aka the
host physical offset).
This would cause the data to be encrypted using an incorrect
initialization vector which will in turn cause later reads
to return garbage.
Although current qcow2 built-in encryption is blocked from
usage in the emulator, one could still hit this if writing
to the file via qemu-{img,io,nbd} commands.
Cc: qemu-stable@nongnu.org Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit bb9f8dd0e15a9744b8d09d06ecb6a18ca3dcc173)
Conflicts:
tests/qemu-iotests/group
* drop context dependancy on non-2.7 iotest groups
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
David Gibson [Thu, 15 Sep 2016 06:11:48 +0000 (16:11 +1000)]
vfio/pci: Fix regression in MSI routing configuration
d1f6af6 "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup
of kvmchip routing configuration, that was mostly intended for x86.
However, it also contains a subtle change in behaviour which breaks EEH[1]
error recovery on certain VFIO passthrough devices on spapr guests. So far
it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be
other hardware with the same problem. It's also possible there could be
circumstances where it causes a bug on x86 as well, though I don't know of
any obvious candidates.
Prior to d1f6af6, both vfio_msix_vector_do_use() and
vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this
as the "dummy" vector used to make the host hardware state sync with the
guest expected hardware state in terms of MSI configuration.
Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op,
meaning the dummy irq would always be delivered via qemu. d1f6af6 changed
vfio_add_kvm_msi_virq() so it takes a vector number instead of the msg
parameter, and determines the correct message itself. The test for !msg
was removed, and not replaced with anything there or in the caller.
With an spapr guest which has a VFIO device, if an EEH error occurs on the
host hardware, then the device will be isolated then reset. This is a
combination of host and guest action, mediated by some EEH related
hypercalls. I haven't fully traced the mechanics, but somehow installing
the kvm irqchip route for the dummy irq on the BCM5719 means that after EEH
reset and recovery, at least some irqs are no longer delivered to the
guest.
In particular, the guest never gets the link up event, and so the NIC is
effectively dead.
[1] EEH (Enhanced Error Handling) is an IBM POWER server specific PCI-*
error reporting and recovery mechanism. The concept is somewhat
similar to PCI-E AER, but the details are different.
Cornelia Huck [Mon, 15 Aug 2016 09:10:28 +0000 (11:10 +0200)]
s390x/css: handle cssid 255 correctly
The cssid 255 is reserved but still valid from an architectural
point of view. However, feeding a bogus schid of 0xffffffff into
the virtio hypercall will lead to a crash:
John Snow [Mon, 26 Sep 2016 18:33:37 +0000 (14:33 -0400)]
ahci: clear aiocb in ncq_cb
Similar to existing fixes for IDE (87ac25fd) and ATAPI (7f951b2d), the
AIOCB must be cleared in the callback. Otherwise, we may accidentally
try to reset a dangling pointer in bdrv_aio_cancel() from a port reset.
Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1474575040-32079-2-git-send-email-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
(cherry picked from commit df403bc58859c893ebd0accda07678e84d15dc5d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Even if tray is not open, it can be empty (blk_is_inserted() == false).
Handle both cases correctly by replacing the s->tray_open checks with
blk_is_available(), which is an AND of the two.
Also simplify successive checks of them into blk_is_available(), in a
couple cases.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1473848224-24809-2-git-send-email-famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cd723b85601baa7a0eeffbac83421357a70d81ee) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Right after main_loop ends, we release various things but keep iothread
alive. The latter is not prepared to the sudden change of resources.
Specifically, after bdrv_close_all(), virtio-scsi dataplane get a
surprise at the empty BlockBackend:
(gdb) bt
at /usr/src/debug/qemu-2.6.0/hw/scsi/virtio-scsi.c:543
at /usr/src/debug/qemu-2.6.0/hw/scsi/virtio-scsi.c:577
It is because the d->conf.blk->root is set to NULL, then
blk_get_aio_context() returns qemu_aio_context, whereas s->ctx is still
pointing to the iothread:
hw/scsi/virtio-scsi.c:543:
if (s->dataplane_started) {
assert(blk_get_aio_context(d->conf.blk) == s->ctx);
}
To fix this, let's stop iothreads before doing bdrv_close_all().
Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1473326931-9699-1-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit dce8921b2baaf95974af8176406881872067adfa) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
crypto: ensure XTS is only used with ciphers with 16 byte blocks
The XTS cipher mode needs to be used with a cipher which has
a block size of 16 bytes. If a mis-matching block size is used,
the code will either corrupt memory beyond the IV array, or
not fully encrypt/decrypt the IV.
This fixes a memory corruption crash when attempting to use
cast5-128 with xts, since the former has an 8 byte block size.
A test case is added to ensure the cipher creation fails with
such an invalid combination.
Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit a5d2f44d0d3e7523670e103a8c37faed29ff2b76) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Paolo Bonzini [Mon, 29 Aug 2016 09:35:37 +0000 (11:35 +0200)]
scsi: mptconfig: fix misuse of MPTSAS_CONFIG_PACK
These issues cause respectively a QEMU crash and a leak of 2 bytes of
stack. They were discovered by VictorV of 360 Marvel Team.
Reported-by: Tom Victor <i-tangtianwen@360.cm> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 65a8e1f6413a0f6f79894da710b5d6d43361d27d) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
When LSI SAS1068 Host Bus emulator builds configuration page
headers, mptsas_config_pack() should assert that the size
fits in a byte. However, the size is expressed in 32-bit
units, so up to 1020 bytes fit. The assertion was only
allowing replies up to 252 bytes, so fix it.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1472645167-30765-2-git-send-email-ppandit@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cf2bce203a45d7437029d108357fb23fea0967b6) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
vmw_pvscsi: check page count while initialising descriptor rings
Vmware Paravirtual SCSI emulation uses command descriptors to
process SCSI commands. These descriptors come with their ring
buffers. A guest could set the page count for these rings to
an arbitrary value, leading to infinite loop or OOB access.
Add check to avoid it.
Reported-by: Tom Victor <vv474172261@gmail.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1472626169-12989-1-git-send-email-ppandit@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 7f61f4690dd153be98900a2a508b88989e692753) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Rony Weng [Mon, 29 Aug 2016 07:52:18 +0000 (15:52 +0800)]
scsi-disk: change disk serial length from 20 to 36
Openstack Cinder assigns volume a 36 characters uuid as serial.
QEMU will shrinks the uuid to 20 characters, which does not match
the original uuid.
Note that there is no limit to the length of the serial number in
the SCSI spec. 20 was copy-pasted from virtio-blk which in turn was
copy-pasted from ATA; 36 is even more arbitrary. However, bumping it
up too much might cause issues (e.g. 252 seems to make sense because
then the maximum amount of returned data is 256; but who knows there's
no off-by-one somewhere for such a nicely rounded number).
Signed-off-by: Rony Weng <ronyweng@synology.com>
Message-Id: <1472457138-23386-1-git-send-email-ronyweng@synology.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 48b6206305b8d56524ac2ee347b68e6e0a528559) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Lin Ma [Wed, 14 Sep 2016 06:22:50 +0000 (14:22 +0800)]
qemu-char: avoid segfault if user lacks of permisson of a given logfile
Function qemu_chr_alloc returns NULL if it failed to open logfile by any reason,
says no write permission. For backends tty, stdio and msmouse, They need to
check this return value to avoid segfault in this case.
Signed-off-by: Lin Ma <lma@suse.com> Cc: qemu-stable <qemu-stable@nongnu.org>
Message-Id: <20160914062250.22226-1-lma@suse.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 71200fb9664c2967a1cdd22b68b0da3a8b2b3eb7) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Vmware Paravirtual SCSI emulator while processing IO requests
could run into an infinite loop if 'pvscsi_ring_pop_req_descr'
always returned positive value. Limit IO loop to the ring size.
Cc: qemu-stable@nongnu.org Reported-by: Li Qiang <liqiang6-s@360.cn> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1473845952-30785-1-git-send-email-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d251157ac1928191af851d199a9ff255d330bec9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Li Qiang [Mon, 12 Sep 2016 12:44:11 +0000 (18:14 +0530)]
scsi: mptsas: use g_new0 to allocate MPTSASRequest object
When processing IO request in mptsas, it uses g_new to allocate
a 'req' object. If an error occurs before 'req->sreq' is
allocated, It could lead to an OOB write in mptsas_free_request
function. Use g_new0 to avoid it.
Reported-by: Li Qiang <liqiang6-s@360.cn> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1473684251-17476-1-git-send-email-ppandit@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 670e56d3ed2918b3861d9216f2c0540d9e9ae0d5) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Greg Kurz [Fri, 16 Sep 2016 09:44:49 +0000 (11:44 +0200)]
9pfs: fix potential segfault during walk
If the call to fid_to_qid() returns an error, we will call v9fs_path_free()
on uninitialized paths.
It is a regression introduced by the following commit:
56f101ecce0e 9pfs: handle walk of ".." in the root directory
Let's fix this by initializing dpath and path before calling fid_to_qid().
Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org>
[groug: updated the changelog to indicate this is regression and to provide
the offending commit SHA1] Signed-off-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 13fd08e631ec0c3ff5ad1bdcb6a4474c7d9a024f) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
virtio-balloon: discard virtqueue element on reset
The one pending element is being freed but not discarded on device
reset, which causes svq->inuse to creep up, eventually hitting the
"Virtqueue size exceeded" error.
Properly discarding the element on device reset makes sure that its
buffers are unmapped and the inuse counter stays balanced.
Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Roman Kagan <rkagan@virtuozzo.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Ladi Prosek <lprosek@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 104e70cae78bd4afd95d948c6aff188f10508a9c) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Stefan Hajnoczi [Wed, 7 Sep 2016 15:51:25 +0000 (11:51 -0400)]
virtio: zero vq->inuse in virtio_reset()
vq->inuse must be zeroed upon device reset like most other virtqueue
fields.
In theory, virtio_reset() just needs assert(vq->inuse == 0) since
devices must clean up in-flight requests during reset (requests cannot
not be leaked!).
In practice, it is difficult to achieve vq->inuse == 0 across reset
because balloon, blk, 9p, etc implement various different strategies for
cleaning up requests. Most devices call g_free(elem) directly without
telling virtio.c that the VirtQueueElement is cleaned up. Therefore
vq->inuse is not decremented during reset.
This patch zeroes vq->inuse and trusts that devices are not leaking
VirtQueueElements across reset.
I will send a follow-up series that refactors request life-cycle across
all devices and converts vq->inuse = 0 into assert(vq->inuse == 0) but
this more invasive approach is not appropriate for stable trees.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Cc: qemu-stable <qemu-stable@nongnu.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Ladi Prosek <lprosek@redhat.com>
(cherry picked from commit 4b7f91ed0270a371e1933efa21ba600b6da23ab9) Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Michael Roth [Wed, 2 Nov 2016 21:40:01 +0000 (16:40 -0500)]
Merge tag 'ppc-for-2.7-20161013' into stable-2.7-staging
qemu-2.7 (stable): ppc patch queue 2016-10-13
TCG for ppc does not properly implement hardware transactional memory.
It has a stub implementation in which transactions always fail.
Unfortunately in v2.7.0, HTM is advertised as being available to
guests, which means guests may incorrectly attempt to use it and hang.
This has been the case for a while, but has become more urgent with
recent (guest) Linux kernel versions which attempt to lazily enable
TM. Under TCG that now triggers the problem regularly, instead of
just when running a TM aware userspace program.
The problem is already fixed in the 2.8/master branch, by correctly
advertising HTM as not being available with TCG. This series
backports the relevant patches to the qemu-2.7 stable branch to fix
the problem there.
* tag 'ppc-for-2.7-20161013':
ppc: Check the availability of transactional memory
hw/ppc/spapr: Fix the selection of the processor features
hw/ppc/spapr: Move code related to "ibm,pa-features" to a separate function
linux-headers: update
Thomas Huth [Wed, 28 Sep 2016 11:16:30 +0000 (13:16 +0200)]
ppc: Check the availability of transactional memory
KVM-PR currently does not support transactional memory, and the
implementation in TCG is just a fake. We should not announce TM
support in the ibm,pa-features property when running on such a
system, so disable it by default and only enable it if the KVM
implementation supports it (i.e. recent versions of KVM-HV).
These changes are based on some earlier work from Anton Blanchard
(thanks!).
Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit bac3bf287ab60e264b636f5f00c116a19b655762)
Thomas Huth [Wed, 28 Sep 2016 11:16:29 +0000 (13:16 +0200)]
hw/ppc/spapr: Fix the selection of the processor features
The current code uses pa_features_206 for POWERPC_MMU_2_06, and
for everything else, it uses pa_features_207. This is bad in some
cases because there is also a "degraded" MMU version of ISA 2.06,
called POWERPC_MMU_2_06a, which should of course use the flags for
2.06 instead. And there is also the possibility that the user runs
the pseries machine with a POWER5+ or even 970 processor. In that
case we certainly do not want to set the flags for 2.07, and rather
simply skip the setting of the pa-features property instead.
Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 4cbec30d769a73853b60dc7f275e6e7da9ab5162)
Thomas Huth [Wed, 28 Sep 2016 11:16:28 +0000 (13:16 +0200)]
hw/ppc/spapr: Move code related to "ibm,pa-features" to a separate function
The function spapr_populate_cpu_dt() has become quite big
already, and since we likely have to extend the pa-features
property for every new processor generation, it is nicer
if we put the related code into a separate function.
Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 230bf719d3a3b144a4ffa441e5d6170ef0ad8999)
Greg Kurz [Tue, 30 Aug 2016 15:02:27 +0000 (17:02 +0200)]
9pfs: handle walk of ".." in the root directory
The 9P spec at http://man.cat-v.org/plan_9/5/intro says:
All directories must support walks to the directory .. (dot-dot) meaning
parent directory, although by convention directories contain no explicit
entry for .. or . (dot). The parent of the root directory of a server's
tree is itself.
This means that a client cannot walk further than the root directory
exported by the server. In other words, if the client wants to walk
"/.." or "/foo/../..", the server should answer like the request was
to walk "/".
This patch just does that:
- we cache the QID of the root directory at attach time
- during the walk we compare the QID of each path component with the root
QID to detect if we're in a "/.." situation
- if so, we skip the current component and go to the next one
Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The unlinkat request also gets patched for consistency (even if
rmdir("foo/..") is expected to fail according to POSIX.1-2001).
The various error values come from the linux manual pages.
Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Greg Kurz [Tue, 30 Aug 2016 17:11:05 +0000 (19:11 +0200)]
9pfs: forbid illegal path names
Empty path components don't make sense for most commands and may cause
undefined behavior, depending on the backend.
Also, the walk request described in the 9P spec [1] clearly shows that
the client is supposed to send individual path components: the official
linux client never sends portions of path containing the / character for
example.
Moreover, the 9P spec [2] also states that a system can decide to restrict
the set of supported characters used in path components, with an explicit
mention "to remove slashes from name components".
This patch introduces a new name_is_illegal() helper that checks the
names sent by the client are not empty and don't contain unwanted chars.
Since 9pfs is only supported on linux hosts, only the / character is
checked at the moment. When support for other hosts (AKA. win32) is added,
other chars may need to be blacklisted as well.
If a client sends an illegal path component, the request will fail and
ENOENT is returned to the client.
Suggested-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Paolo Bonzini [Tue, 30 Aug 2016 12:04:12 +0000 (14:04 +0200)]
Revert "Change net/socket.c to use socket_*() functions"
Since commit 7e8449594c929, the socket connect code is blocking, because
calling socket_connect() without callback is blocking. This reverts the
commit.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
translate: early exit in tb_flush if there is no tcg
tb_flush does all kind of things, which are very tcg specific. As it
is called from some places even for KVM (e.g. gdb server) it is better
to detect these cases and do an early exit.
This also fixes a crash in the gdb server that was triggered by
commit 909eaac9bbc2 ("tb hash: track translated blocks with qht").
Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Richard Henderson <rth@twiddle.net> Reported-by: Brent Baccala <cosine@freesoft.org> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-id: 1472148686-39841-1-git-send-email-borntraeger@de.ibm.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
vnc: only alloc server surface with clients connected
the VNC server was changed so that the 'vd->server' pixman
image was only allocated when a client is connected.
Since then if a client disconnects and then reconnects to
the VNC server all they will see is a black screen until
they do something that triggers a refresh. On a graphical
desktop this is not often noticed since there's many things
going on which cause a refresh. On a plain text console it
is really obvious since nothing refreshes frequently.
The problem is that the VNC server didn't update the guest
dirty bitmap, so still believes its server image is in sync
with the guest contents.
To fix this we must explicitly mark the entire guest desktop
as dirty after re-creating the server surface. Move this
logic into vnc_update_server_surface() so it is guaranteed
to be call in all code paths that re-create the surface
instead of only in vnc_dpy_switch()
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de> Tested-by: Peter Lieven <pl@kamp.de>
Message-id: 1471365032-18096-1-git-send-email-berrange@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Must include "qemu-version.h" for the QEMU_PKGVERSION definition.
Signed-off-by: Ed Maste <emaste@freebsd.org>
Message-id: 1471877833-52343-1-git-send-email-emaste@freebsd.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Stefan Hajnoczi [Mon, 15 Aug 2016 12:54:16 +0000 (13:54 +0100)]
virtio: decrement vq->inuse in virtqueue_discard()
virtqueue_discard() moves vq->last_avail_idx back so the element can be
popped again. It's necessary to decrement vq->inuse to avoid "leaking"
the element count.
Cc: qemu-stable@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Stefan Hajnoczi [Mon, 15 Aug 2016 12:54:15 +0000 (13:54 +0100)]
virtio: recalculate vq->inuse after migration
The vq->inuse field is not migrated. Many devices don't hold
VirtQueueElements across migration so it doesn't matter that vq->inuse
starts at 0 on the destination QEMU.
At least virtio-serial, virtio-blk, and virtio-balloon migrate while
holding VirtQueueElements. For these devices we need to recalculate
vq->inuse upon load so the value is correct.
Cc: qemu-stable@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Peter Maydell [Mon, 22 Aug 2016 09:02:28 +0000 (10:02 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Mon 22 Aug 2016 09:06:32 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
e1000e: remove internal interrupt flag
slirp: fix segv when init failed
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Cao jin [Thu, 18 Aug 2016 14:15:54 +0000 (22:15 +0800)]
e1000e: remove internal interrupt flag
Commit 66bf7d58 removed internal msi state flag E1000E_USE_MSI, E1000E_USE_MSIX
is not necessary too, remove it now. And interrupt flag field intr_state also
can be removed now.
CC: Dmitry Fleytman <dmitry@daynix.com> CC: Jason Wang <jasowang@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Marcel Apfelbaum <marcel@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Dmitry Fleytman <dmitry@daynix.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Since commit f6c2e66ae8c8a, slirp uses an exit notifier to call
slirp_smb_cleanup. However, if init() failed, the notifier isn't added,
and removing it will fail:
==18447== Invalid write of size 8
==18447== at 0x7EF2B5: notifier_remove (notify.c:32)
==18447== by 0x48E80C: qemu_remove_exit_notifier (vl.c:2661)
==18447== by 0x6A2187: net_slirp_cleanup (slirp.c:134)
==18447== by 0x69419D: qemu_cleanup_net_client (net.c:338)
==18447== by 0x69445B: qemu_del_net_client (net.c:401)
==18447== by 0x6A2B81: net_slirp_init (slirp.c:366)
==18447== by 0x6A4241: net_init_slirp (slirp.c:865)
==18447== by 0x695C6D: net_client_init1 (net.c:1051)
==18447== by 0x695F6E: net_client_init (net.c:1108)
==18447== by 0x696DBA: net_init_netdev (net.c:1498)
==18447== by 0x7F1F99: qemu_opts_foreach (qemu-option.c:1116)
==18447== by 0x696E60: net_init_clients (net.c:1516)
==18447== Address 0x0 is not stack'd, malloc'd or (recently) free'd
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Sascha Silbe [Thu, 18 Aug 2016 18:46:03 +0000 (20:46 +0200)]
test-logging: don't hard-code paths in /tmp
Since f6880b7f [qemu-log: support simple pid substitution for logs],
test-logging creates files with hard-coded names in /tmp. In the best
case, this prevents multiple developers from running "make check" on
the same machine. In the worst case, it allows for symlink attacks,
enabling an attacker to overwrite files that are writable to the
developer running "make check".
Instead of hard-coding the paths, create a temporary directory using
g_dir_make_tmp() and clean it up afterwards.
Fixes: f6880b7f ("qemu-log: support simple pid substitution for logs") Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1471545963-11720-3-git-send-email-silbe@linux.vnet.ibm.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Sascha Silbe [Thu, 18 Aug 2016 18:46:02 +0000 (20:46 +0200)]
glib: add compatibility implementation for g_dir_make_tmp()
We're going to make use of g_dir_make_tmp() in test-logging. Provide a
compatibility implementation of it for glib < 2.30.
May behave differently in some edge cases (e.g. pattern only at the
end of the template, the file name is not part of the error message),
but good enough in practice.
Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1471545963-11720-2-git-send-email-silbe@linux.vnet.ibm.com
[PMM: removed variable "template" which caused compilation failures
when C++ files include glib-compat.h] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Michal Privoznik [Fri, 19 Aug 2016 08:06:40 +0000 (10:06 +0200)]
syscall.c: Redefine IFLA_* enums
In 9c37146782 I've tried to fix a broken build with older
linux-headers. However, I didn't do it properly. The solution
implemented here is to grab the enums that caused the problem
initially, and rename their values so that they are "QEMU_"
prefixed. In order to guarantee matching values with actual
enums from linux-headers, the enums are seeded with starting
values from the original enums.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-id: 75c14d6e8a97c4ff3931d69c13eab7376968d8b4.1471593869.git.mprivozn@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Denis V. Lunev [Wed, 17 Aug 2016 18:06:54 +0000 (21:06 +0300)]
block: fix possible reorder of flush operations
This patch reduce CPU usage of flush operations a bit. When we have one
flush completed we should kick only next operation. We should not start
all pending operations in the hope that they will go back to wait on
wait_queue.
Also there is a technical possibility that requests will get reordered
with the previous approach. After wakeup all requests are removed from
the wait queue. They become active and they are processed one-by-one
adding to the wait queue in the same order. Though new flush can arrive
while all requests are not put into the queue.
Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Message-id: 1471457214-3994-3-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Evgeny Yakovlev [Wed, 17 Aug 2016 18:06:53 +0000 (21:06 +0300)]
block: fix deadlock in bdrv_co_flush
The following commit
commit 3ff2f67a7c24183fcbcfe1332e5223ac6f96438c
Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Date: Mon Jul 18 22:39:52 2016 +0300
block: ignore flush requests when storage is clean
has introduced a regression.
There is a problem that it is still possible for 2 requests to execute
in non sequential fashion and sometimes this results in a deadlock
when bdrv_drain_one/all are called for BDS with such stalled requests.
1. Current flushed_gen and flush_started_gen is 1.
2. Request 1 enters bdrv_co_flush to with write_gen 1 (i.e. the same
as flushed_gen). It gets past flushed_gen != flush_started_gen and
sets flush_started_gen to 1 (again, the same it was before).
3. Request 1 yields somewhere before exiting bdrv_co_flush
4. Request 2 enters bdrv_co_flush with write_gen 2. It gets past
flushed_gen != flush_started_gen and sets flush_started_gen to 2.
5. Request 2 runs to completion and sets flushed_gen to 2
6. Request 1 is resumed, runs to completion and sets flushed_gen to 1.
However flush_started_gen is now 2.
From here on out flushed_gen is always != to flush_started_gen and all
further requests will wait on flush_queue. This change replaces
flush_started_gen with an explicitly tracked active flush request.
Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-id: 1471457214-3994-2-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Peter Maydell [Thu, 18 Aug 2016 09:56:40 +0000 (10:56 +0100)]
Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging
# gpg: Signature made Thu 18 Aug 2016 06:36:16 BST
# gpg: using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F 3562 EF04 965B 398D 6211
* remotes/jasowang/tags/net-pull-request:
net/net: properly handle multiple packets in net_fill_rstate()
net: vmxnet: use g_new for pkt initialisation
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Thu, 18 Aug 2016 09:27:57 +0000 (10:27 +0100)]
Merge remote-tracking branch 'remotes/famz/tags/docker-pull-request' into staging
Fix 'make docker-test-mingw@fedora'
Peter,
This is the single patch that stalls patchew's mingw testing. Since it
is small and trivial, let's have it in 2.7.
Fam
# gpg: Signature made Wed 17 Aug 2016 13:13:53 BST
# gpg: using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021 AD56 CA35 624C 6A91 71C6
* remotes/famz/tags/docker-pull-request:
curl: Cast fd to int for DPRINTF
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Zhang Chen [Thu, 18 Aug 2016 03:23:25 +0000 (11:23 +0800)]
net/net: properly handle multiple packets in net_fill_rstate()
When network is busy, we will receive multiple packets at one time. In
that situation, we should keep trying to do the receiving instead of
finalizing only the first packet.
Signed-off-by: Zhang Chen <zhangchen.fnst@cn.fujitsu.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Li Qiang [Tue, 16 Aug 2016 11:28:01 +0000 (16:58 +0530)]
net: vmxnet: use g_new for pkt initialisation
When network transport abstraction layer initialises pkt, the maximum
fragmentation count is not checked. This could lead to an integer
overflow causing a NULL pointer dereference. Replace g_malloc() with
g_new() to catch the multiplication overflow.
Reported-by: Li Qiang <liqiang6-s@360.cn> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Acked-by: Dmitry Fleytman <dmitry@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Fam Zheng [Mon, 1 Aug 2016 05:04:48 +0000 (13:04 +0800)]
curl: Cast fd to int for DPRINTF
Currently "make docker-test-mingw@fedora" has a warning like:
/tmp/qemu-test/src/block/curl.c: In function 'curl_sock_cb':
/tmp/qemu-test/src/block/curl.c:172:6: warning: format '%d' expects
argument of type 'int', but argument 4 has type 'curl_socket_t {aka long
long unsigned int}'
DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, fd);
^
cc1: all warnings being treated as errors
Cast to int to suppress it.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1470027888-24381-1-git-send-email-famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
Peter Maydell [Thu, 11 Aug 2016 17:59:39 +0000 (18:59 +0100)]
linux-user: Fix llseek with high bit of offset_low set
The llseek syscall takes two 32-bit arguments, offset_high
and offset_low, which must be combined to form a single
64-bit offset. Unfortunately we were combining them with
(uint64_t)arg2 << 32) | arg3
and arg3 is a signed type; this meant that when promoting
arg3 to a 64-bit type it would be sign-extended. The effect
was that if the offset happened to have bit 31 set then
this bit would get sign-extended into all of bits 63..32.
Explicitly cast arg3 to abi_ulong to avoid the erroneous
sign extension.
Reported-by: Chanho Park <parkch98@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Tested-by: Chanho Park <parkch98@gmail.com>
Message-id: 1470938379-1133-1-git-send-email-peter.maydell@linaro.org
Michal Privoznik [Tue, 16 Aug 2016 09:47:43 +0000 (11:47 +0200)]
syscall.c: Fix build with older linux-headers
In c5dff280 we tried to make us understand netlink messages more.
So we've added a code that does some translation. However, the
code assumed linux-headers to be at least version 4.4 of it
because most of the symbols there (if not all of them) were added
in just that release. This, however, breaks build on systems with
older versions of the package.
Thomas Huth [Mon, 15 Aug 2016 08:24:54 +0000 (10:24 +0200)]
slirp: Rename "struct arphdr" to "struct slirp_arphdr"
struct arphdr is already used by the system headers on OpenBSD
and thus QEMU does not compile here anymore. Fix it by renaming
our struct to slirp_arphdr instead.
Reported-by: Brad Smith Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-id: 1471249494-17392-1-git-send-email-thuth@redhat.com Buglink: https://bugs.launchpad.net/qemu/+bug/1613133 Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Since commit d7a04fd7d5008, tcp_chr_wait_connected() was introduced,
so vhost-user could wait until a backend started successfully. In
vhost-user case, the chr socket must be plain unix, and the chr+vhost
setup happens synchronously during qemu startup.
However, with TLS and telnet socket, initial socket setup happens
asynchronously, and s->connected is not set after the socket is
accepted. In order for tcp_chr_wait_connected() to not keep accepting
new connections and proceed with the last accepted socket, it can
check for s->ioc instead.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 20160816083332.15088-1-marcandre.lureau@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The virtio-gpu.h file defines a macro VIRTIO_GPU_FILL_CMD
which includes a call to qemu_log_mask, but does not
include qemu/log.h. In a default configure, it is lucky
and gets qemu/log.h indirectly due to the 'log' trace
backend being enabled. If that trace backend is disabled
though, eg
./configure --enable-trace-backends=nop
Then the build will fail:
In file included from /home/berrange/src/virt/qemu/hw/display/virtio-gpu-3d.c:19:0:
/home/berrange/src/virt/qemu/hw/display/virtio-gpu-3d.c: In function ‘virgl_cmd_create_resource_2d’:
/home/berrange/src/virt/qemu/include/hw/virtio/virtio-gpu.h:138:13: error: implicit declaration of function ‘qemu_log_mask’ [-Werror=implicit-function-declaration]
qemu_log_mask(LOG_GUEST_ERROR, \
^
/home/berrange/src/virt/qemu/hw/display/virtio-gpu-3d.c:34:5: note: in expansion of macro ‘VIRTIO_GPU_FILL_CMD’
VIRTIO_GPU_FILL_CMD(c2d);
^~~~~~~~~~~~~~~~~~~
/home/berrange/src/virt/qemu/hw/display/virtio-gpu-3d.c:34:5: error: nested extern declaration of ‘qemu_log_mask’ [-Werror=nested-externs]
In file included from /home/berrange/src/virt/qemu/hw/display/virtio-gpu-3d.c:19:0:
/home/berrange/src/virt/qemu/include/hw/virtio/virtio-gpu.h:138:27: error: ‘LOG_GUEST_ERROR’ undeclared (first use in this function)
qemu_log_mask(LOG_GUEST_ERROR, \
[snip many more errors]
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1470648700-3474-1-git-send-email-berrange@redhat.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Tue, 16 Aug 2016 08:32:40 +0000 (09:32 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Block layer patches for 2.7.0-rc3
# gpg: Signature made Mon 15 Aug 2016 14:55:46 BST
# gpg: using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
iotests: Test case for wrong runtime option types
block/nbd: Store runtime option values
block/blkdebug: Store config filename
block/nbd: Use QemuOpts for runtime options
block/ssh: Use QemuOpts for runtime options
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 15 Aug 2016 20:48:03 +0000 (21:48 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160815' into staging
ppc patch queue for 2016-08-15
Just a single patch here, I hope this is the last ppc / spapr fix to
squeeze into qemu-2.7.
# gpg: Signature made Mon 15 Aug 2016 07:46:36 BST
# gpg: using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg: aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg: aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg: It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E 87DC 6C38 CACA 20D9 B392
* remotes/dgibson/tags/ppc-for-2.7-20160815:
ppc: parse cpu features once
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 15 Aug 2016 18:04:51 +0000 (19:04 +0100)]
Merge remote-tracking branch 'remotes/sstabellini/tags/xen-20160812-tag-2' into staging
Xen 2016/08/12, fixed commit message
# gpg: Signature made Sat 13 Aug 2016 00:39:09 BST
# gpg: using RSA key 0x894F8F4870E1AE90
# gpg: Good signature from "Stefano Stabellini <stefano.stabellini@eu.citrix.com>"
# Primary key fingerprint: D04E 33AB A51F 67BA 07D3 0AEA 894F 8F48 70E1 AE90
* remotes/sstabellini/tags/xen-20160812-tag-2:
xen: handle inbound migration of VMs without ioreq server pages
Xen: fix converity warning of xen_pt_config_init()
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Peter Maydell [Mon, 8 Aug 2016 16:11:28 +0000 (17:11 +0100)]
pc-bios/optionrom: Fix OpenBSD build with better detection of linker emulation
The various host OSes are irritatingly variable about the name
of the linker emulation we need to pass to ld's -m option to
build the i386 option ROMs. Instead of doing this via a
CONFIG ifdef, check in configure whether any of the emulation
names we know about will work and pass the right answer through
to the makefile. If we can't find one, we fall back to not trying
to build the option ROMs, in the same way we would for a non-x86
host platform.
This is in particular necessary to unbreak the build on OpenBSD,
since it wants a different answer to FreeBSD and we don't have
an existing CONFIG_ variable that distinguishes the two.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Sean Bruno <sbruno@freebsd.org>
Message-id: 1470672688-6754-1-git-send-email-peter.maydell@linaro.org
Max Reitz [Mon, 15 Aug 2016 13:29:27 +0000 (15:29 +0200)]
iotests: Test case for wrong runtime option types
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>