git.proxmox.com Git - mirror_qemu.git/log
mirror_qemu.git
7 years ago qdev: Use GList for global properties
Eduardo Habkost [Thu, 28 Jan 2016 14:22:35 +0000 (12:22 -0200)]
qdev: Use GList for global properties

If the same GlobalProperty struct is registered twice, the list
entry gets corrupted, making tqe_next point to itself, and
qdev_prop_set_globals() gets stuck in a loop. The bug can be
easily reproduced by running:

  $ qemu-system-x86_64 -rtc-td-hack -rtc-td-hack

Change global_props to use GList instead of queue.h, making the
code simpler and able to deal with properties being registered
twice.
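
The failure mode can be sketched with a simplified model. This is not QEMU's queue.h macro code: `Node`, `IntrusiveQueue` and `walk` are hypothetical names, and the model only assumes that the link is stored inside the element, as it is for queue.h-style lists.

```python
class Node:
    """An element that embeds its own link, as intrusive lists do."""

    def __init__(self, name):
        self.name = name
        self.next = None


class IntrusiveQueue:
    """Simplified tail queue: the link lives inside the element itself."""

    def __init__(self):
        self.head = None
        self.tail = None

    def insert_tail(self, node):
        node.next = None
        if self.tail is None:
            self.head = node
        else:
            # If the tail is this very node, node.next now points to node.
            self.tail.next = node
        self.tail = node


def walk(queue, limit=10):
    """Count visited elements, giving up after `limit` steps."""
    steps, node = 0, queue.head
    while node is not None and steps < limit:
        steps += 1
        node = node.next
    return steps


prop = Node("example-property")  # hypothetical property name

queue = IntrusiveQueue()
queue.insert_tail(prop)
queue.insert_tail(prop)          # second registration of the same element

# A container that owns its links (the role GList plays) has no such problem.
external = [prop, prop]
```

After the second insertion `prop.next` refers to `prop` itself, so an unbounded walk never terminates, whereas iterating `external` simply visits the property twice.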

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
7 years ago Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160617' into staging
Peter Maydell [Fri, 17 Jun 2016 11:36:27 +0000 (12:36 +0100)]
Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160617' into staging

ppc patch queue for 2016-06-17

Here's the current accumulated set of spapr, ppc and related patches.
  * The big thing in here is CPU hotplug for spapr
    - This includes a number of acked generic changes adding new
      infrastructure for hotplugging cpu cores
  * A number of TCG bug fixes are also included
  * This adds a new testcase to make it harder to accidentally break
    Macintosh (and other openbios) platforms

# gpg: Signature made Fri 17 Jun 2016 07:35:29 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160617:
  spapr: implement query-hotpluggable-cpus callback
  hmp: Add 'info hotpluggable-cpus' HMP command
  QMP: Add query-hotpluggable-cpus
  spapr: CPU hot unplug support
  spapr: CPU hotplug support
  spapr: convert boot CPUs into CPU core devices
  spapr: Move spapr_cpu_init() to spapr_cpu_core.c
  spapr: Abstract CPU core device and type specific core devices
  qom: API to get instance_size of a type
  spapr_drc: Prevent detach racing against attach for CPU DR
  xics,xics_kvm: Handle CPU unplug correctly
  cpu: Abstract CPU core type
  qdev: hotplug: Introduce HotplugHandler.pre_plug() callback
  target-ppc: Fix rlwimi, rlwinm, rlwnm
  vfio: Fix broken EEH
  target-ppc: Bug in BookE wait instruction
  ppc / sparc: Add a tester for checking whether OpenBIOS runs successfully
  hw/ppc/spapr: Silence deprecation message in qtest mode

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years ago Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
Peter Maydell [Fri, 17 Jun 2016 10:25:46 +0000 (11:25 +0100)]
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: new features, cleanups, fixes

Beginning of reconnect support for vhost-user.
Misc cleanups and fixes.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 17 Jun 2016 01:28:39 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  MAINTAINERS: add Marcel to PCI
  msi_init: change return value to 0 on success
  fix some coding style problems
  pci core: assert ENOSPC when add capability
  test: start vhost-user reconnect test
  tests: append i386 tests
  vhost-net: save & restore vring enable state
  vhost-net: save & restore vhost-user acked features
  vhost-net: do not crash if backend is not present
  vhost-user: disconnect on start failure
  qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
  tests/vhost-user-bridge: workaround stale vring base
  tests/vhost-user-bridge: add client mode
  vhost-user: add ability to know vhost-user backend disconnection
  pci: fix pci_requester_id()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Conflicts:
tests/Makefile.include

7 years ago spapr: implement query-hotpluggable-cpus callback
Igor Mammedov [Fri, 10 Jun 2016 00:59:08 +0000 (06:29 +0530)]
spapr: implement query-hotpluggable-cpus callback

It returns a list of present and hotpluggable CPU objects, each
with a list of properties to use with device_add.

In the spapr case, the returned list would look like:
-> { "execute": "query-hotpluggable-cpus" }
<- {"return": [
     { "props": { "core": 8 }, "type": "POWER8-spapr-cpu-core",
       "vcpus-count": 2 },
     { "props": { "core": 0 }, "type": "POWER8-spapr-cpu-core",
       "vcpus-count": 2,
       "qom-path": "/machine/unattached/device[0]"}
   ]}
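
As a rough illustration of how a management tool might consume this reply, the sketch below turns every entry without a 'qom-path' (that is, not yet present) into a device_add command. The helper name `device_add_commands` and the generated `id` values are assumptions made for the example; only the reply itself comes from the message above.

```python
import json

# The reply shown above, as a management tool would receive it.
REPLY_TEXT = """
{"return": [
  {"props": {"core": 8}, "type": "POWER8-spapr-cpu-core", "vcpus-count": 2},
  {"props": {"core": 0}, "type": "POWER8-spapr-cpu-core", "vcpus-count": 2,
   "qom-path": "/machine/unattached/device[0]"}
]}
"""


def device_add_commands(entries, id_prefix="core"):
    """Build one device_add command per entry that is not plugged yet.

    An entry with a 'qom-path' describes a CPU object that already exists,
    so it is skipped. The 'id' values are made up for this sketch.
    """
    commands = []
    for entry in entries:
        if "qom-path" in entry:
            continue
        arguments = {"driver": entry["type"],
                     "id": f"{id_prefix}{len(commands)}"}
        arguments.update(entry["props"])
        commands.append({"execute": "device_add", "arguments": arguments})
    return commands


commands = device_add_commands(json.loads(REPLY_TEXT)["return"])
```

For the reply above this yields a single command, for the core with "core": 8.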

TODO:
  add 'node' property for core <-> numa node mapping

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago hmp: Add 'info hotpluggable-cpus' HMP command
Bharata B Rao [Fri, 10 Jun 2016 00:59:07 +0000 (06:29 +0530)]
hmp: Add 'info hotpluggable-cpus' HMP command

This is the HMP equivalent for QMP query-hotpluggable-cpus.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Fixed problem with printf formats on 32-bit host]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago QMP: Add query-hotpluggable-cpus
Igor Mammedov [Fri, 10 Jun 2016 00:59:06 +0000 (06:29 +0530)]
QMP: Add query-hotpluggable-cpus

It allows mgmt to query present and hotpluggable CPU objects. A target
platform that wishes to support the command must implement and set the
MachineClass.query_hotpluggable_cpus callback, which returns a list of
possible CPU objects with the options that would be needed for
hotplugging them.

There are:
'type': 'str' - QOM CPU object type for usage with device_add
'vcpus-count': 'int' - number of logical VCPU threads per
                        CPU object (mgmt needs to know)

and a set of optional fields that are used for hotplugging a CPU
object and allow mgmt tools to know what/where it could be
hotplugged:
[node],[socket],[core],[thread]

For present CPUs there is a 'qom-path' field which would allow mgmt to
inspect whatever object/abstraction the target platform considers
as CPU object.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago spapr: CPU hot unplug support
Bharata B Rao [Fri, 10 Jun 2016 00:59:05 +0000 (06:29 +0530)]
spapr: CPU hot unplug support

Remove the CPU core device by removing the underlying CPU thread devices.
Hot removal of CPU for sPAPR guests is achieved by sending the hot unplug
notification to the guest. Release the vCPU object after CPU hot unplug so
that vCPU fd can be parked and reused.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago spapr: CPU hotplug support
Bharata B Rao [Fri, 10 Jun 2016 00:59:04 +0000 (06:29 +0530)]
spapr: CPU hotplug support

Set up device tree entries for the hotplugged CPU core and use the
existing RTAS event logging infrastructure to send CPU hotplug notification
to the guest.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago spapr: convert boot CPUs into CPU core devices
Bharata B Rao [Fri, 10 Jun 2016 00:59:03 +0000 (06:29 +0530)]
spapr: convert boot CPUs into CPU core devices

Introduce sPAPRMachineClass.dr_cpu_enabled to indicate support for
CPU core hotplug. Initialize boot time CPUs as core devices and prevent
topologies that result in partially filled cores. Both of these are done
only if CPU core hotplug is supported.

Note: An unrelated change in the call to xics_system_init() is done
in this patch as it makes sense to use the local variable smt introduced
in this patch instead of kvmppc_smt_threads() call here.

TODO: We derive sPAPR core type by looking at -cpu <model>. However
we don't take care of "compat=" feature yet for boot time as well
as hotplug CPUs.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago spapr: Move spapr_cpu_init() to spapr_cpu_core.c
Bharata B Rao [Fri, 10 Jun 2016 00:59:02 +0000 (06:29 +0530)]
spapr: Move spapr_cpu_init() to spapr_cpu_core.c

Start consolidating CPU init related routines in spapr_cpu_core.c. As
part of this, move spapr_cpu_init() and its dependencies from spapr.c
to spapr_cpu_core.c

No functionality change in this patch.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Rename TIMEBASE_FREQ to SPAPR_TIMEBASE_FREQ, since it's now in a
 public(ish) header]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago spapr: Abstract CPU core device and type specific core devices
Bharata B Rao [Fri, 10 Jun 2016 00:59:01 +0000 (06:29 +0530)]
spapr: Abstract CPU core device and type specific core devices

Add an sPAPR-specific abstract CPU core device that is based on the generic
CPU core device. Use this as the base type to create sPAPR CPU specific core
devices.

TODO:
- Add core types for other remaining CPU types
- Handle CPU model alias correctly

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago qom: API to get instance_size of a type
Bharata B Rao [Fri, 10 Jun 2016 00:59:00 +0000 (06:29 +0530)]
qom: API to get instance_size of a type

Add an API object_type_get_size(const char *typename) that returns the
instance_size of the given typename.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago spapr_drc: Prevent detach racing against attach for CPU DR
Bharata B Rao [Thu, 12 May 2016 03:48:21 +0000 (09:18 +0530)]
spapr_drc: Prevent detach racing against attach for CPU DR

If a CPU is hot removed while hotplug of the same is still in progress,
the guest crashes. Prevent this by ensuring that detach is done only
after attach has completed.

The existing code already prevents such a race for PCI hotplug. However,
given that a CPU, unlike PCI, is a logical DR and starts in the ISOLATED
state, we need logic that works for CPUs too.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
               [Don't set awaiting_attach for PCI devices]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago xics,xics_kvm: Handle CPU unplug correctly
Bharata B Rao [Thu, 12 May 2016 03:48:20 +0000 (09:18 +0530)]
xics,xics_kvm: Handle CPU unplug correctly

XICS is set up for each CPU during initialization. Provide a routine
to undo the same when a CPU is unplugged. While here, move ss->cs management
into xics from xics_kvm since there is nothing KVM specific in it.
Also ensure xics reset doesn't set irq for CPUs that are already unplugged.

This allows reboot of a VM that has undergone CPU hotplug and unplug
to work correctly.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago cpu: Abstract CPU core type
Bharata B Rao [Thu, 12 May 2016 03:48:16 +0000 (09:18 +0530)]
cpu: Abstract CPU core type

Add an abstract CPU core type that could be used by machines that want
to define and hotplug CPUs in core granularity.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
               [Integer core property]
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
[dwg: changed property names to 'core-id' and 'nr-threads']
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago qdev: hotplug: Introduce HotplugHandler.pre_plug() callback
Igor Mammedov [Thu, 12 May 2016 03:48:15 +0000 (09:18 +0530)]
qdev: hotplug: Introduce HotplugHandler.pre_plug() callback

The pre_plug callback is called before device.realize() is executed.
This allows a device's properties to be checked or set from the HotplugHandler.
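
The ordering can be pictured with a small model. Everything here is hypothetical (the `Device` and `HotplugHandler` classes, the `core` and `threads` properties, and the `hotplug` driver function); it only illustrates a hook that runs before realize and may validate or fill in properties. The later `plug` step is an assumption added for context.

```python
class Device:
    """Stand-in for a device object with properties and a realize step."""

    def __init__(self, **props):
        self.props = dict(props)
        self.realized = False
        self.log = []

    def realize(self):
        self.log.append("realize")
        self.realized = True


class HotplugHandler:
    """Stand-in handler; property names are invented for the example."""

    def pre_plug(self, dev):
        dev.log.append("pre_plug")
        # Check a property before the device is realized...
        if dev.props.get("core", -1) < 0:
            raise ValueError("core must be set and non-negative")
        # ...and set one that the device did not specify.
        dev.props.setdefault("threads", 1)

    def plug(self, dev):
        dev.log.append("plug")


def hotplug(handler, dev):
    """Run the handler hooks around realize, pre_plug first."""
    handler.pre_plug(dev)   # may reject or adjust the device
    dev.realize()
    handler.plug(dev)
    return dev
```

A device rejected in `pre_plug` is never realized, which is the point of running the check that early.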

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago target-ppc: Fix rlwimi, rlwinm, rlwnm
Richard Henderson [Thu, 16 Jun 2016 19:04:04 +0000 (12:04 -0700)]
target-ppc: Fix rlwimi, rlwinm, rlwnm

In 63ae0915f8ec, I arranged to use a 32-bit rotate, without
considering the effect of a mask value that wraps around to
the high bits of the word.
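
The effect can be reproduced with a reference model of rlwinm on a 64-bit register. The sketch assumes the usual architectural description (rotate the low word, replicate it into both halves, then AND with MASK(MB+32, ME+32), bit 0 being the most significant); it is not QEMU's TCG code, and `rlwinm_rotate32_only` is a hypothetical stand-in for any implementation that rotates only 32 bits.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def rotl32(x, n):
    """Rotate the low 32 bits of x left by n."""
    x &= MASK32
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def mask64(mb, me):
    """Ones in bits mb..me (bit 0 is the MSB), wrapping when mb > me."""
    from_mb = (1 << (64 - mb)) - 1     # bits mb..63
    after_me = (1 << (63 - me)) - 1    # bits me+1..63
    if mb <= me:
        return from_mb & ~after_me & MASK64
    return (from_mb | ~after_me) & MASK64


def rlwinm64(rs, sh, mb, me):
    """Reference model: the rotated word is replicated into both halves."""
    r32 = rotl32(rs, sh)
    replicated = (r32 << 32) | r32
    return replicated & mask64(mb + 32, me + 32)


def rlwinm_rotate32_only(rs, sh, mb, me):
    """Same mask, but the rotate result is only zero-extended."""
    return rotl32(rs, sh) & mask64(mb + 32, me + 32)
```

With MB <= ME the mask stays inside the low word and both functions agree. With MB > ME the mask wraps and covers the high 32 bits, so the zero-extended variant loses the replicated upper half.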

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago vfio: Fix broken EEH
Gavin Shan [Wed, 15 Jun 2016 04:28:27 +0000 (14:28 +1000)]
vfio: Fix broken EEH

vfio_eeh_container_op() is the backend that communicates with
host kernel to support EEH functionality in QEMU. However, the
function should return the value from host kernel instead of 0
unconditionally.

dwg: Specifically the problem occurs for the handful of EEH
sub-operations which can return a non-zero, non-error result.

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
[dwg: clarification to commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago target-ppc: Bug in BookE wait instruction
Jakub Horak [Mon, 6 Jun 2016 08:47:28 +0000 (10:47 +0200)]
target-ppc: Bug in BookE wait instruction

Fixed bug in code generation for the PowerPC "wait" instruction. It
doesn't make sense to store a non-initialized register.

Signed-off-by: Jakub Horak <thement@ibawizard.net>
[dwg: revised commit message]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago ppc / sparc: Add a tester for checking whether OpenBIOS runs successfully
Thomas Huth [Tue, 14 Jun 2016 13:57:56 +0000 (15:57 +0200)]
ppc / sparc: Add a tester for checking whether OpenBIOS runs successfully

Since the mac99 and g3beige PowerPC machines recently broke without
being noticed, it would be good to have a tester for "make check"
that detects such issues immediately. A simple way to test the firmware
of these machines is to use the "-prom-env" parameter of QEMU. This
parameter can be used to put some Forth code into the 'boot-command'
firmware variable which then can signal success to the tester by
writing a magic value to a known memory location. And since some of the
Sparc machines are also using OpenBIOS, they are now tested with this
prom-env-tester, too.
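
A minimal sketch of the idea, not the actual tester: `prom_env_args` builds the command-line fragment and `wait_for_magic` polls a caller-supplied memory reader. Both names, the magic value and the address are assumptions, and the Forth snippet assumes the firmware parses numbers as hexadecimal and provides the standard `l!` (store 32-bit value) word.

```python
def prom_env_args(magic, address):
    """Return QEMU arguments that make the firmware store `magic` at `address`.

    The Forth fragment 'VALUE ADDR l!' writes a 32-bit value to memory.
    """
    forth = f"{magic:x} {address:x} l!"
    return ["-prom-env", f"boot-command={forth}"]


def wait_for_magic(read_word, address, magic, attempts=100):
    """Poll guest memory through `read_word` until the magic value shows up."""
    for _ in range(attempts):
        if read_word(address) == magic:
            return True
    return False
```

A test harness would pass these arguments to the machine under test and then poll the chosen address until the firmware has run the boot command, or give up after a bounded number of attempts.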

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
[dwg: Removed sparc64, because it trips a TCG bug on 32-bit hosts]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago MAINTAINERS: add Marcel to PCI
Michael S. Tsirkin [Mon, 13 Jun 2016 20:06:32 +0000 (23:06 +0300)]
MAINTAINERS: add Marcel to PCI

Marcel is reviewing PCI patches anyway; things will
be easier if people remember to Cc him.

Cc: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago msi_init: change return value to 0 on success
Cao jin [Fri, 10 Jun 2016 09:54:32 +0000 (17:54 +0800)]
msi_init: change return value to 0 on success

No caller uses its return value as the MSI capability offset; this also
makes its return behaviour consistent with msix_init().

cc: Michael S. Tsirkin <mst@redhat.com>
cc: Paolo Bonzini <pbonzini@redhat.com>
cc: Hannes Reinecke <hare@suse.de>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>

Acked-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago fix some coding style problems
Cao jin [Fri, 10 Jun 2016 09:54:23 +0000 (17:54 +0800)]
fix some coding style problems

It has:
1. More newlines make the code block well separated.
2. Add more comments for msi_init.
3. Fix an indentation in vmxnet3.c.
4. ioh3420 & xio3130_downstream: put PCI Express capability init function
   together, make it more readable.

cc: Michael S. Tsirkin <mst@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Dmitry Fleytman <dmitry@daynix.com>
cc: Jason Wang <jasowang@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago pci core: assert ENOSPC when add capability
Cao jin [Fri, 10 Jun 2016 09:54:22 +0000 (17:54 +0800)]
pci core: assert ENOSPC when add capability

ENOSPC is a programming error; assert on it for debugging.

cc: Michael S. Tsirkin <mst@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago test: start vhost-user reconnect test
Marc-André Lureau [Mon, 6 Jun 2016 16:45:08 +0000 (18:45 +0200)]
test: start vhost-user reconnect test

This is a simple reconnect test that checks whether vhost-user
reconnection is possible and restores the state. A more complete test
would actually manipulate and check the ring contents (such extended
testing would benefit from the libvhost-user proposed in QEMU list to
avoid duplication of ring manipulations)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago tests: append i386 tests
Marc-André Lureau [Mon, 6 Jun 2016 16:45:07 +0000 (18:45 +0200)]
tests: append i386 tests

Do not overwrite x86-64 tests, re-enable vhost-user-test.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago vhost-net: save & restore vring enable state
Marc-André Lureau [Mon, 6 Jun 2016 16:45:06 +0000 (18:45 +0200)]
vhost-net: save & restore vring enable state

A driver may change the vring enable state at run time but vhost-user
backend may not be present (a contrived example is when the backend is
disconnected and the device is reconfigured after driver rebinding)

Restore the vring state when the vhost-user backend is started, so it
can process the ring.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago vhost-net: save & restore vhost-user acked features
Marc-André Lureau [Mon, 6 Jun 2016 16:45:05 +0000 (18:45 +0200)]
vhost-net: save & restore vhost-user acked features

The initial vhost-user connection sets the features to be negotiated
with the driver. Renegotiation isn't possible without device reset.

To handle reconnection of vhost-user backend, ensure the same set of
features are provided, and reuse already acked features.
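
A toy model of that rule, with invented names (`VhostUserNet`, `backend_connected`, `driver_acked`, `features_for_backend`) and arbitrary feature bits. It assumes the first connection fixes the offered set, a reconnecting backend must offer exactly that set again, and the bits the driver acked are kept and replayed.

```python
class VhostUserNet:
    """Remembers what was offered and acked across backend reconnects."""

    def __init__(self):
        self.offered_features = None   # fixed by the first connection
        self.acked_features = 0        # what the driver accepted

    def backend_connected(self, backend_features):
        """Return True if this backend may be used."""
        if self.offered_features is None:
            self.offered_features = backend_features
            return True
        # No renegotiation without a device reset: the set must match.
        return backend_features == self.offered_features

    def driver_acked(self, features):
        """Record the subset of offered features the driver accepted."""
        self.acked_features = features & self.offered_features

    def features_for_backend(self):
        """Features to hand to a (re)connected backend."""
        return self.acked_features
```

A backend that comes back with a different feature set is refused rather than silently renegotiated, since the driver has already committed to the acked bits.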

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago vhost-net: do not crash if backend is not present
Marc-André Lureau [Mon, 6 Jun 2016 16:45:04 +0000 (18:45 +0200)]
vhost-net: do not crash if backend is not present

Do not crash when backend is not present while enabling the ring. A
following patch will save the enabled state so it can be restored once
the backend is started.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago vhost-user: disconnect on start failure
Marc-André Lureau [Mon, 6 Jun 2016 16:45:03 +0000 (18:45 +0200)]
vhost-user: disconnect on start failure

If the backend failed to start (for example feature negotiation failed),
do not exit, but disconnect the char device instead. Slightly more
robust for reconnect case.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd
Tetsuya Mukawa [Mon, 6 Jun 2016 16:45:02 +0000 (18:45 +0200)]
qemu-char: add qemu_chr_disconnect to close a fd accepted by listen fd

The patch introduces qemu_chr_disconnect(). The function closes an fd
accepted by a listen fd. We already have qemu_chr_delete(), but it
closes not only the accepted fd but also the listen fd. The new function
is used when we still want to keep the listen fd.

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago tests/vhost-user-bridge: workaround stale vring base
Marc-André Lureau [Mon, 6 Jun 2016 16:45:01 +0000 (18:45 +0200)]
tests/vhost-user-bridge: workaround stale vring base

This patch is a similar solution to what Yuanhan Liu/Huawei Xie have
suggested for DPDK. When vubr quits (killed or crashed), a restarted
vubr would get a stale vring base from QEMU. That would break the kernel
virtio net completely, leaving it non-functional unless a driver
reset is done.

So, instead of getting the stale vring base from QEMU, Huawei suggested
we could get a proper one from used->idx. This works because the queue's
packets are processed in order.
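
Why used->idx is a safe restart point can be shown with a toy ring. `Ring`, `Backend` and `base_after_restart` are hypothetical names; the model assumes 16-bit wrapping indices and strictly in-order processing, so that every buffer before used_idx has been completed and none after it has.

```python
IDX_MASK = 0xFFFF  # ring indices are 16-bit and wrap around


class Ring:
    def __init__(self):
        self.avail_idx = 0   # advanced by the driver
        self.used_idx = 0    # advanced by the backend


class Backend:
    def __init__(self, ring, last_avail_idx):
        self.ring = ring
        self.last_avail_idx = last_avail_idx
        self.processed = []

    def run(self, budget):
        """Process up to `budget` buffers, strictly in order."""
        while budget > 0 and self.last_avail_idx != self.ring.avail_idx:
            self.processed.append(self.last_avail_idx)
            self.last_avail_idx = (self.last_avail_idx + 1) & IDX_MASK
            self.ring.used_idx = (self.ring.used_idx + 1) & IDX_MASK
            budget -= 1


def base_after_restart(ring):
    """With in-order processing, used_idx is where the old backend stopped."""
    return ring.used_idx


ring = Ring()
ring.avail_idx = 5                    # driver queued buffers 0..4
first = Backend(ring, last_avail_idx=0)
first.run(budget=3)                   # handles 0, 1, 2, then "crashes"

second = Backend(ring, base_after_restart(ring))
second.run(budget=10)                 # resumes at 3 instead of replaying 0..2
```

Starting the second backend from a stale base of 0 would replay buffers the driver has already seen completed.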

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago tests/vhost-user-bridge: add client mode
Marc-André Lureau [Mon, 6 Jun 2016 16:45:00 +0000 (18:45 +0200)]
tests/vhost-user-bridge: add client mode

If -c is specified, vubr will try to connect to the socket instead of
listening for connections.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago vhost-user: add ability to know vhost-user backend disconnection
Tetsuya Mukawa [Mon, 6 Jun 2016 16:44:59 +0000 (18:44 +0200)]
vhost-user: add ability to know vhost-user backend disconnection

Current QEMU cannot detect vhost-user backend disconnection. This
patch adds the ability to detect it.
To detect disconnection, add a watcher for the G_IO_HUP event. When a
G_IO_HUP event is detected, the disconnected socket is read
to cause a CHR_EVENT_CLOSED.
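
The same pattern, waiting for the socket to become ready and then reading it to confirm that the peer is gone, can be sketched outside GLib. This is only an analogue: it uses `select` rather than a G_IO_HUP watch, and `peer_disconnected` is a hypothetical helper.

```python
import select
import socket


def peer_disconnected(sock, timeout=1.0):
    """Return True if the other end of `sock` has closed the connection.

    A closed peer makes the socket readable, and the read then returns
    no data. MSG_PEEK keeps any real pending data in the socket buffer.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False
    return sock.recv(1, socket.MSG_PEEK) == b""
```

The read is what turns "something happened on this fd" into a definite close, which mirrors reading the disconnected socket to trigger the close event.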

Signed-off-by: Tetsuya Mukawa <mukawa@igel.co.jp>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reviewed-by: Victor Kaplansky <victork@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago pci: fix pci_requester_id()
Peter Xu [Tue, 17 May 2016 11:26:10 +0000 (19:26 +0800)]
pci: fix pci_requester_id()

This fixes an SID verification failure when IOMMU IR is enabled with PCI
bridges. The existing pci_requester_id() is more like getting BDF info
only, so rename it to pci_get_bdf(). Meanwhile, provide the correct
implementation to get the requester ID. VT-d spec 5.1.1 is a good
reference; though it talks only about interrupt delivery, the rule works
exactly the same for non-interrupt cases.

Currently, there are three use cases for pci_requester_id():

- PCIX status bits: here we need BDF only, not requester ID. Replacing
  with pci_get_bdf().
- PCIe Error injection and MSI delivery: for both these cases, we are
  looking for requester IDs. Here we should use the new impl.

To avoid a PCI walk every time we send an MSI message, a requester_id
cache field is added to PCIDevice to cache the result when the PCI
device is initialized.
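
The distinction between the two values, and the caching, can be sketched as follows. The BDF packing (bus in bits 15:8, device in 7:3, function in 2:0) is standard PCI; `PciDevice`, `requester_id_cache` as an attribute name, and the caller-supplied `resolve_requester_id` function are assumptions, and the actual topology walk is deliberately left out.

```python
def pci_devfn(slot, func):
    """Pack device (slot) and function numbers into an 8-bit devfn."""
    return ((slot & 0x1F) << 3) | (func & 0x07)


def pci_bdf(bus, devfn):
    """Pack bus and devfn into the 16-bit bus/device/function value."""
    return ((bus & 0xFF) << 8) | (devfn & 0xFF)


class PciDevice:
    def __init__(self, bus, slot, func, resolve_requester_id):
        self.bus = bus
        self.devfn = pci_devfn(slot, func)
        # Resolved once at init so that sending an MSI never has to walk
        # the bus hierarchy. The resolver is supplied by the caller.
        self.requester_id_cache = resolve_requester_id(self)

    def bdf(self):
        """The device's own address, regardless of any bridges above it."""
        return pci_bdf(self.bus, self.devfn)

    def requester_id(self):
        """The ID seen upstream, which may differ from bdf() behind a bridge."""
        return self.requester_id_cache
```

Callers that only need the device's address use `bdf()`, while error injection and MSI delivery use `requester_id()`.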

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
7 years ago hw/ppc/spapr: Silence deprecation message in qtest mode
Thomas Huth [Tue, 14 Jun 2016 17:23:03 +0000 (19:23 +0200)]
hw/ppc/spapr: Silence deprecation message in qtest mode

When running "make check", there is currently always an error message
saying "spapr-pci-vfio-host-bridge is deprecated". This happens because
the QOM tests are instantiating all possible devices, and the error
message is currently located in the instance_init() function of the
device. Since it is legal for the tests to instantiate a device without
using it, the error message should be silenced when we're running in
test mode.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
7 years ago Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging
Peter Maydell [Thu, 16 Jun 2016 16:58:45 +0000 (17:58 +0100)]
Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* KVM startup speedup (Chao Peng)
* configure fixes and cleanups (David, Thomas)
* ctags fix (Sergey)
* NBD cleanups (Peter, Eric)
* "-L help" command line option (Richard)
* More esp.c bugfixes (me, Prasad)
* KVM_CAP_MAX_VCPU_ID support (Greg)

# gpg: Signature made Thu 16 Jun 2016 17:39:10 BST
# gpg:                using RSA key 0xBFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (29 commits)
  vl: smp_parse: cleanups
  scsi: esp: make cmdbuf big enough for maximum CDB size
  scsi: esp: clean up handle_ti/esp_do_dma if s->do_cmd
  scsi: esp: respect FIFO invariant after message phase
  scsi: esp: check buffer length before reading scsi command
  nbd: Avoid magic number for NBD max name size
  nbd: Detect servers that send unexpected error values
  nbd: Clean up ioctl handling of qemu-nbd -c
  nbd: Group all Linux-specific ioctl code in one place
  nbd: Reject unknown request flags
  nbd: Improve server handling of bogus commands
  nbd: Quit server after any write error
  nbd: More debug typo fixes, use correct formats
  nbd: Use BDRV_REQ_FUA for better FUA where supported
  vl.c: Add '-L help' which lists data dirs.
  KVM: use KVM_CAP_MAX_VCPU_ID
  scsi-disk: Use (unsigned long) typecasts when using "%lu" format string
  target-i386: kvm: cache KVM_GET_SUPPORTED_CPUID data
  nbd: simplify the nbd_request and nbd_reply structs
  nbd: Don't use cpu_to_*w() functions
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years ago vl: smp_parse: cleanups
Andrew Jones [Fri, 10 Jun 2016 17:40:12 +0000 (19:40 +0200)]
vl: smp_parse: cleanups

No functional changes; only some code movement and removal of
dead code (impossible conditions). Also, max_cpus can be
initialized to 1, like smp_cpus, because it's either set by the
user or set to smp_cpus, when smp_cpus is set by the user, or
set to 1, when nothing is set.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Message-Id: <1465580427-13596-2-git-send-email-drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years ago scsi: esp: make cmdbuf big enough for maximum CDB size
Prasad J Pandit [Wed, 15 Jun 2016 22:22:35 +0000 (00:22 +0200)]
scsi: esp: make cmdbuf big enough for maximum CDB size

While doing DMA read into ESP command buffer 's->cmdbuf', it could
write past the 's->cmdbuf' area, if it was transferring more than 16
bytes.  Increase the command buffer size to 32, which is maximum when
's->do_cmd' is set, and add a check on 'len' to avoid OOB access.
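
The shape of such a check, as a sketch rather than the actual esp.c change: `CommandBuffer` and `dma_read` are hypothetical, and the only facts taken from the message are the 32-byte size and the need to validate the length before copying.

```python
CMDBUF_SIZE = 32  # size named in the message above


class CommandBuffer:
    def __init__(self):
        self.buf = bytearray(CMDBUF_SIZE)
        self.cmdlen = 0

    def dma_read(self, data):
        """Append `data` only if it fits in the remaining space.

        Returns False and leaves the buffer untouched when the transfer
        would run past the end of the buffer.
        """
        if len(data) > CMDBUF_SIZE - self.cmdlen:
            return False
        self.buf[self.cmdlen:self.cmdlen + len(data)] = data
        self.cmdlen += len(data)
        return True
```

Comparing against the remaining space, rather than the total size, also covers transfers that arrive in several pieces.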

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years ago scsi: esp: clean up handle_ti/esp_do_dma if s->do_cmd
Paolo Bonzini [Wed, 15 Jun 2016 12:29:33 +0000 (14:29 +0200)]
scsi: esp: clean up handle_ti/esp_do_dma if s->do_cmd

Avoid duplicated code between esp_do_dma and handle_ti.  esp_do_dma
has the same code that handle_ti contains after the call to esp_do_dma;
but the code in handle_ti is never reached because it is in an "else if".
Remove the else and also the pointless return.

esp_do_dma also has a partially dead assignment of the to_device
variable.  Sink it to the point where it's actually used.

Finally, assert that the other caller of esp_do_dma (esp_transfer_data)
only transfers data and not a command.  This is true because get_cmd
cancels the old request synchronously before its caller handle_satn_stop
sets do_cmd to 1.

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years ago scsi: esp: respect FIFO invariant after message phase
Paolo Bonzini [Tue, 14 Jun 2016 13:10:24 +0000 (15:10 +0200)]
scsi: esp: respect FIFO invariant after message phase

The FIFO contains two bytes; hence the write ptr should be two bytes ahead
of the read pointer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years ago scsi: esp: check buffer length before reading scsi command
Prasad J Pandit [Tue, 31 May 2016 17:53:27 +0000 (23:23 +0530)]
scsi: esp: check buffer length before reading scsi command

The 53C9X Fast SCSI Controller (FSC) comes with an internal 16-byte
FIFO buffer, which is used to handle command and data transfers.
In non-DMA mode, the get_cmd() routine uses 'ti_size' to read the SCSI
command into a buffer. Add a check that validates the command length
against the buffer size to avoid any overrun.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464717207-7549-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: Avoid magic number for NBD max name size
Eric Blake [Wed, 11 May 2016 22:39:44 +0000 (16:39 -0600)]
nbd: Avoid magic number for NBD max name size

Declare a constant and use that when determining if an export
name fits within the constraints we are willing to support.

Note that upstream NBD recently documented that clients MUST
support export names of 256 bytes (not including trailing NUL),
and SHOULD support names up to 4096 bytes.  4096 is a bit big
(we would lose benefits of stack-allocation of a name array),
and we already have other limits in place (for example, qcow2
snapshot names are clamped around 1024).  So for now, just
stick to the required minimum, as that's easier to audit than
a full-scale support for larger names.
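
As an illustration of the limit discussed above, here is a small Python
sketch of a name-length check. The constant name NBD_MAX_NAME_SIZE and
the helper export_name_fits are assumptions for this example; only the
256-byte figure comes from the commit message.

```python
NBD_MAX_NAME_SIZE = 256  # bytes, excluding the trailing NUL (assumed name)


def export_name_fits(name: str) -> bool:
    """Return True if the UTF-8 encoded export name is within the limit."""
    # The limit is in bytes, so measure the encoded form, not the characters.
    return len(name.encode("utf-8")) <= NBD_MAX_NAME_SIZE
```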

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-12-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: Detect servers that send unexpected error values
Eric Blake [Wed, 11 May 2016 22:39:43 +0000 (16:39 -0600)]
nbd: Detect servers that send unexpected error values

Add some debugging to flag servers that are not compliant with
the NBD protocol.  This would have flagged the server bug
fixed in commit c0301fcc.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Message-Id: <1463006384-7734-11-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: Clean up ioctl handling of qemu-nbd -c
Eric Blake [Wed, 11 May 2016 22:39:40 +0000 (16:39 -0600)]
nbd: Clean up ioctl handling of qemu-nbd -c

The kernel ioctl() interface into NBD is limited to 'unsigned long';
we MUST pass in input with that type (and not int or size_t, as
there may be platform ABIs where the wrong types promote incorrectly
through var-args).  Furthermore, on 32-bit platforms, the kernel
is limited to a maximum export size of 2T (our BLKSIZE of 512 times
a SIZE_BLOCKS constrained by 32 bit unsigned long).
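
The 2T figure follows from simple arithmetic, sketched here in Python.
The helper name max_kernel_export_size is hypothetical; the block size
of 512 and the 32-bit unsigned long are taken from the text above.

```python
NBD_BLOCK_SIZE = 512  # the BLKSIZE mentioned in the commit message


def max_kernel_export_size(ulong_bits: int) -> int:
    """Largest export size in bytes whose block count fits in unsigned long."""
    max_blocks = (1 << ulong_bits) - 1
    return max_blocks * NBD_BLOCK_SIZE
```

With a 32-bit unsigned long this gives 2^41 - 512 bytes, that is, just
under 2 TiB.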

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-8-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: Group all Linux-specific ioctl code in one place
Eric Blake [Wed, 11 May 2016 22:39:39 +0000 (16:39 -0600)]
nbd: Group all Linux-specific ioctl code in one place

NBD ioctl()s are used to manage an NBD client session where
initial handshake is done in userspace, but then the transmission
phase is handed off to the kernel through a /dev/nbdX device.
As such, all ioctls sent to the kernel on the /dev/nbdX fd belong
in client.c; nbd_disconnect() was out-of-place in server.c.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-7-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: Reject unknown request flags
Eric Blake [Wed, 11 May 2016 22:39:38 +0000 (16:39 -0600)]
nbd: Reject unknown request flags

The NBD protocol says that clients should not send a command flag
that has not been negotiated (whether by the client requesting an
option during a handshake, or because we advertise support for the
flag in response to NBD_OPT_EXPORT_NAME), and that servers should
reject invalid flags with EINVAL.  We were silently ignoring the
flags instead.  The client can't rely on our behavior, since it is
their fault for passing the bad flag in the first place, but it's
better to be robust up front than to possibly behave differently
than the client was expecting with the attempted flag.
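
A minimal Python sketch of the flag check described above. The flag
value FLAG_FUA and the function check_request_flags are illustrative
assumptions, not QEMU's definitions; the only behaviour taken from the
text is that a flag outside the negotiated set is rejected with EINVAL.

```python
import errno

FLAG_FUA = 0x1  # illustrative bit value, not the wire encoding


def check_request_flags(flags: int, negotiated: int) -> int:
    """Return 0 if every set flag was negotiated, else -EINVAL."""
    if flags & ~negotiated:
        return -errno.EINVAL
    return 0
```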

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alex Bligh <alex@alex.org.uk>
Message-Id: <1463006384-7734-6-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: Improve server handling of bogus commands
Eric Blake [Wed, 11 May 2016 22:39:37 +0000 (16:39 -0600)]
nbd: Improve server handling of bogus commands

We have a few bugs in how we handle invalid client commands:

- A client can send an NBD_CMD_DISC where from + len overflows,
convincing us to reply with an error and stay connected, even
though the protocol requires us to silently disconnect. Fix by
hoisting the special case sooner.

- A client can send an NBD_CMD_WRITE where from + len overflows,
where we reply to the client with EINVAL without consuming the
payload; this will normally cause us to fail if the next thing
read is not the right magic, but in rare cases, could cause us
to interpret the data payload as valid commands and do things
not requested by the client. Fix by adding a complete flag to
track whether we are in sync or must disconnect.

Furthermore, we have split the checks for bogus from/len across
two functions, when it is easier to do it all at once.
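
To show why a from + len check needs care, here is a Python model of
64-bit arithmetic. Both functions are hypothetical illustrations: the
first shows how a naive check on a wrapped sum can pass, the second is
one overflow-free way to write the test. Neither is QEMU's code.

```python
U64_MAX = (1 << 64) - 1


def wrapped_sum_check(offset: int, length: int, export_size: int) -> bool:
    """Naive pattern: the 64-bit sum wraps around and looks in range."""
    return ((offset + length) & U64_MAX) <= export_size


def range_is_valid(offset: int, length: int, export_size: int) -> bool:
    """Check offset + length <= export_size without computing the sum."""
    if not (0 <= offset <= U64_MAX and 0 <= length <= U64_MAX):
        return False
    # Written as a subtraction so nothing can overflow in fixed-width C.
    return offset <= export_size and length <= export_size - offset
```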

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-5-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: Quit server after any write error
Eric Blake [Wed, 11 May 2016 22:39:36 +0000 (16:39 -0600)]
nbd: Quit server after any write error

We should never ignore failure from nbd_negotiate_send_rep(); if
we are unable to write to the client, then it is not worth trying
to continue the negotiation.  Fortunately, the problem is not
too severe - chances are that the errors being ignored here (mainly
inability to write the reply to the client) are indications of
a closed connection or something similar, which will also affect
the next attempt to interact with the client and eventually reach
a point where the errors are detected to end the loop.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-4-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: More debug typo fixes, use correct formats
Eric Blake [Wed, 11 May 2016 22:39:35 +0000 (16:39 -0600)]
nbd: More debug typo fixes, use correct formats

Clean up some debug message oddities missed earlier; this includes
some typos, and recognizing that %d is not necessarily compatible
with uint32_t. Also add a couple messages that I found useful
while debugging things.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-3-git-send-email-eblake@redhat.com>
[Do not use PRIx16, clang complains. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: Use BDRV_REQ_FUA for better FUA where supported
Eric Blake [Wed, 11 May 2016 22:39:34 +0000 (16:39 -0600)]
nbd: Use BDRV_REQ_FUA for better FUA where supported

Rather than always flushing ourselves, let the block layer
forward the FUA on to the underlying device - where all
underlying layers also understand FUA, we are now more
efficient; and where any underlying layer doesn't understand
it, now the block layer takes care of the full flush fallback
on our behalf.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463006384-7734-2-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agovl.c: Add '-L help' which lists data dirs.
Richard W.M. Jones [Mon, 16 May 2016 16:34:35 +0000 (17:34 +0100)]
vl.c: Add '-L help' which lists data dirs.

QEMU compiles a list of data directories from various sources.  When
consuming a QEMU binary it's useful to be able to get this list of
data directories: a primary reason is so you can list what BIOSes or
keymaps ship with this version of QEMU.  However without reproducing
the method that QEMU uses internally, it's not possible to get the
list of data directories.

This commit adds a simple '-L help' option that just lists out the
data directories as qemu calculates them:

$ ./x86_64-softmmu/qemu-system-x86_64 -L help
/home/rjones/d/qemu/pc-bios
/usr/local/share/qemu

$ ./x86_64-softmmu/qemu-system-x86_64 -L /tmp -L help
/tmp
/home/rjones/d/qemu/pc-bios
/usr/local/share/qemu

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <1463416475-11728-2-git-send-email-rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoKVM: use KVM_CAP_MAX_VCPU_ID
Greg Kurz [Thu, 26 May 2016 08:02:23 +0000 (10:02 +0200)]
KVM: use KVM_CAP_MAX_VCPU_ID

As stated in linux/Documentation/virtual/kvm/api.txt:

The maximum possible value for max_vcpu_id can be retrieved using the
KVM_CAP_MAX_VCPU_ID of the KVM_CHECK_EXTENSION ioctl() at run-time.

If the KVM_CAP_MAX_VCPU_ID does not exist, you should assume that
max_vcpu_id is the same as the value returned from KVM_CAP_MAX_VCPUS.
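
The fallback rule quoted above can be sketched as follows. This is a
Python model: check_extension is a hypothetical stand-in for the
KVM_CHECK_EXTENSION ioctl, assumed here to return 0 for a capability
the kernel does not know.

```python
from typing import Callable


def max_vcpu_id(check_extension: Callable[[str], int]) -> int:
    """Return the vcpu id limit, falling back to KVM_CAP_MAX_VCPUS."""
    limit = check_extension("KVM_CAP_MAX_VCPU_ID")
    if limit <= 0:
        # Older kernels: assume the id limit equals the vcpu count limit.
        limit = check_extension("KVM_CAP_MAX_VCPUS")
    return limit
```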

Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
Message-Id: <146424974323.5666.5471538288045048119.stgit@bahia.huguette.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoscsi-disk: Use (unsigned long) typecasts when using "%lu" format string
Thomas Huth [Mon, 13 Jun 2016 08:10:18 +0000 (10:10 +0200)]
scsi-disk: Use (unsigned long) typecasts when using "%lu" format string

Some source code analyzers like cppcheck emit a warning if
the sign of the argument does not match the format string.

Ticket: https://bugs.launchpad.net/qemu/+bug/1589564
Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1465805418-15906-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agotarget-i386: kvm: cache KVM_GET_SUPPORTED_CPUID data
Chao Peng [Mon, 13 Jun 2016 02:21:27 +0000 (10:21 +0800)]
target-i386: kvm: cache KVM_GET_SUPPORTED_CPUID data

The KVM_GET_SUPPORTED_CPUID ioctl is called frequently when initializing
a CPU. Depending on the CPU features and CPU count, the number of calls
can be extremely high, which slows down QEMU booting significantly. In
our testing, we saw 5922 calls with the switches:

    -cpu SandyBridge -smp 6,sockets=6,cores=1,threads=1

This ioctl takes more than 100ms, which is almost half of the total
QEMU startup time.

In most cases the data returned by two different invocations does not
change, which means we can cache the data to avoid trapping into the
kernel a second time. For the cache to be safe, one assumption must
hold: the ioctl is stateless. This is not true for
CPUID leaves in general (such as CPUID leaf 0xD, whose value depends
on guest XCR0 and IA32_XSS) but it is true of KVM_GET_SUPPORTED_CPUID,
which runs before there is a value for XCR0 and IA32_XSS.
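
The caching idea is ordinary memoization of a stateless query. A
Python sketch, in which the class SupportedCpuidCache and its query
callback are hypothetical stand-ins for the ioctl wrapper:

```python
from typing import Callable, Optional, Tuple


class SupportedCpuidCache:
    """Call the expensive query once and reuse its result afterwards."""

    def __init__(self, query: Callable[[], Tuple[int, ...]]) -> None:
        self._query = query
        self._data: Optional[Tuple[int, ...]] = None
        self.calls = 0  # how many times the real query ran

    def get(self) -> Tuple[int, ...]:
        if self._data is None:
            self.calls += 1
            self._data = self._query()
        return self._data
```

This is only safe because the result does not depend on state that
changes between calls, which is the assumption spelled out above.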

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Message-Id: <1465784487-23482-1-git-send-email-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: simplify the nbd_request and nbd_reply structs
Paolo Bonzini [Mon, 13 Jun 2016 09:42:40 +0000 (11:42 +0200)]
nbd: simplify the nbd_request and nbd_reply structs

These structs are never used to represent the bytes that go over the
network.  The big-endian network data is built into a uint8_t array
in nbd_{receive,send}_{request,reply}.  Remove the unused magic field,
reorder the struct to avoid holes, and remove the packed attribute.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: Don't use cpu_to_*w() functions
Peter Maydell [Fri, 10 Jun 2016 16:15:42 +0000 (17:15 +0100)]
nbd: Don't use cpu_to_*w() functions

The cpu_to_*w() functions just compose a pointer dereference
with a byteswap. Instead use st*_p(), which handles potential
pointer misalignment and avoids the need to cast the pointer.
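
For readers unfamiliar with this style of helper, the idea is to store
a value byte by byte in a fixed byte order at any offset. A Python
analogue using the standard struct module; the name stl_be is
hypothetical and only loosely follows the st*_p naming.

```python
import struct


def stl_be(buf: bytearray, offset: int, value: int) -> None:
    """Store a 32-bit big-endian value at any byte offset in `buf`."""
    # pack_into writes individual bytes, so alignment is irrelevant.
    struct.pack_into(">I", buf, offset, value)
```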

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1465575342-12146-1-git-send-email-peter.maydell@linaro.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agonbd: Don't use *_to_cpup() functions
Peter Maydell [Fri, 10 Jun 2016 15:00:36 +0000 (16:00 +0100)]
nbd: Don't use *_to_cpup() functions

The *_to_cpup() functions are not very useful, as they simply do
a pointer dereference and then a *_to_cpu(). Instead use either:
 * ld*_*_p(), if the data is at an address that might not be
   correctly aligned for the load
 * a local dereference and *_to_cpu(), if the pointer is
   the correct type and known to be correctly aligned
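
The first option, loading from a possibly unaligned address, looks
like this in a Python analogue built on the standard struct module.
The name ldq_be is hypothetical and only loosely follows the ld*_*_p
naming.

```python
import struct


def ldq_be(buf: bytes, offset: int) -> int:
    """Load a 64-bit big-endian value from any byte offset in `buf`."""
    # unpack_from reads byte by byte, so an odd offset is fine.
    return struct.unpack_from(">Q", buf, offset)[0]
```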

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <1465570836-22211-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoconfigure: Remove unused CONFIG_SIGEV_THREAD_ID switch
Thomas Huth [Fri, 10 Jun 2016 15:04:44 +0000 (17:04 +0200)]
configure: Remove unused CONFIG_SIGEV_THREAD_ID switch

The CONFIG_SIGEV_THREAD_ID switch is unused since the related code
has been removed by commit 6d327171551a12b937c5718073b9848d0274c74d
("aio / timers: Remove alarm timers"), so it can safely be removed
nowadays.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1465571084-19885-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoavx2 configure: Use primitives in test
Dr. David Alan Gilbert [Fri, 10 Jun 2016 11:16:18 +0000 (12:16 +0100)]
avx2 configure: Use primitives in test

Use the avx2 primitives during the test, thus making sure that the
compiler and assembler could actually use avx2.

This also detects the failure case on gcc 4.8.x with -save-temps
and avoids the need for the gcc version check in cutils.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1465557378-24105-3-git-send-email-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoMake avx2 configure test work with -O2
Dr. David Alan Gilbert [Fri, 10 Jun 2016 11:16:17 +0000 (12:16 +0100)]
Make avx2 configure test work with -O2

When configured with --extra-cflags=-O2, gcc optimised out the test
and the readelf check failed, leaving avx2 disabled.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <1465557378-24105-2-git-send-email-dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoMakefile: Fix tag file generation targets
Sergey Fedorov [Thu, 9 Jun 2016 17:58:35 +0000 (20:58 +0300)]
Makefile: Fix tag file generation targets

"ctags" produces a file named "tags", not "ctags". It doesn't look
reasonable to use a phony target name as the name of a file to remove.
Just use exact file names to remove in "ctags" and "TAGS" target recipes.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Message-Id: <1465495115-24665-1-git-send-email-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoconfigure: Enable -Werror for MinGW builds, too
Thomas Huth [Wed, 8 Jun 2016 08:13:26 +0000 (10:13 +0200)]
configure: Enable -Werror for MinGW builds, too

MinGW seems to compile currently without warnings, so it should
be safe to enable -Werror now for this environment, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <1465373606-18486-1-git-send-email-thuth@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoclean-includes: run it once more
Paolo Bonzini [Mon, 6 Jun 2016 16:56:37 +0000 (18:56 +0200)]
clean-includes: run it once more

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoos-posix: include sys/mman.h
Paolo Bonzini [Mon, 6 Jun 2016 11:57:39 +0000 (13:57 +0200)]
os-posix: include sys/mman.h

qemu/osdep.h checks whether MAP_ANONYMOUS is defined, but this check
is bogus without a previous inclusion of sys/mman.h.  Include it in
sysemu/os-posix.h and remove it from everywhere else.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoconfigure: Remove unused CONFIG_ZERO_MALLOC setting
Thomas Huth [Wed, 8 Jun 2016 15:11:23 +0000 (17:11 +0200)]
configure: Remove unused CONFIG_ZERO_MALLOC setting

CONFIG_ZERO_MALLOC was only used in qemu-malloc.c and
this file has been removed with the following commit:

41a748265f4879b52b0e87ff9c93bed975163886
Remove qemu_malloc/qemu_free

So we don't need this configuration setting anymore.
This patch also removes the z_version variable, since
this is now also not needed anymore.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Message-Id: <1465398683-3152-1-git-send-email-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
7 years agoMerge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging
Peter Maydell [Thu, 16 Jun 2016 14:22:56 +0000 (15:22 +0100)]
Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches

# gpg: Signature made Thu 16 Jun 2016 15:01:27 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (39 commits)
  hbitmap: add 'pos < size' asserts
  iotests: Add test for oVirt-like storage migration
  iotests: Add test for post-mirror backing chains
  block/null: Implement bdrv_refresh_filename()
  block/mirror: Fix target backing BDS
  block: Allow replacement of a BDS by its overlay
  rbd:change error_setg() to error_setg_errno()
  iotests: 095: Clean up QEMU before showing image info
  block: Create the commit block job before reopening any image
  block: Prevent sleeping jobs from resuming if they have been paused
  block: use the block job list in qmp_query_block_jobs()
  block: use the block job list in bdrv_drain_all()
  block: Fix snapshot=on with aio=native
  block: Remove bs->zero_beyond_eof
  qcow2: Let vmstate call qcow2_co_preadv/pwrite directly
  block: Make bdrv_load/save_vmstate coroutine_fns
  block: Allow .bdrv_load/save_vmstate() to return 0/-errno
  block: Make .bdrv_load_vmstate() vectored
  block: Introduce bdrv_preadv()
  doc: Fix mailing list address in tests/qemu-iotests/README
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
7 years agoMerge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-06-16' into queue...
Kevin Wolf [Thu, 16 Jun 2016 13:22:18 +0000 (15:22 +0200)]
Merge remote-tracking branch 'mreitz/tags/pull-block-for-kevin-2016-06-16' into queue-block

Block patches

# gpg: Signature made Thu Jun 16 15:21:35 2016 CEST
# gpg:                using RSA key 0x3BB14202E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40
#      Subkey fingerprint: 58B3 81CE 2DC8 9CF9 9730  EE64 3BB1 4202 E838 ACAD

* mreitz/tags/pull-block-for-kevin-2016-06-16:
  hbitmap: add 'pos < size' asserts
  iotests: Add test for oVirt-like storage migration
  iotests: Add test for post-mirror backing chains
  block/null: Implement bdrv_refresh_filename()
  block/mirror: Fix target backing BDS
  block: Allow replacement of a BDS by its overlay
  rbd:change error_setg() to error_setg_errno()
  iotests: 095: Clean up QEMU before showing image info
  block: Create the commit block job before reopening any image
  block: Prevent sleeping jobs from resuming if they have been paused
  block: use the block job list in qmp_query_block_jobs()
  block: use the block job list in bdrv_drain_all()

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agohbitmap: add 'pos < size' asserts
Vladimir Sementsov-Ogievskiy [Tue, 14 Jun 2016 17:08:12 +0000 (20:08 +0300)]
hbitmap: add 'pos < size' asserts

Currently, a failure in hbitmap_set on start + count > size comes from
hbitmap_set
  hb_count_between
    hbitmap_iter_init
      assert(pos < hb->size)

This patch adds such checks to set/get/reset functions of hbitmap.
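
The effect of checking at the entry points can be shown with a toy
bitmap in Python. RangeCheckedBitmap is a hypothetical flat model; the
real hbitmap is hierarchical and its exact assertions may differ.

```python
class RangeCheckedBitmap:
    """Toy flat bitmap that validates ranges in set/get/reset."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.bits = [False] * size

    def _check(self, start: int, count: int) -> None:
        # Fail here, at the caller-facing function, not deep in a helper.
        assert 0 <= start and count >= 0 and start + count <= self.size

    def set(self, start: int, count: int) -> None:
        self._check(start, count)
        for pos in range(start, start + count):
            self.bits[pos] = True

    def reset(self, start: int, count: int) -> None:
        self._check(start, count)
        for pos in range(start, start + count):
            self.bits[pos] = False

    def get(self, pos: int) -> bool:
        assert 0 <= pos < self.size
        return self.bits[pos]
```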

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 1465924093-76875-2-git-send-email-vsementsov@virtuozzo.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoiotests: Add test for oVirt-like storage migration
Max Reitz [Fri, 10 Jun 2016 18:57:50 +0000 (20:57 +0200)]
iotests: Add test for oVirt-like storage migration

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-6-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoiotests: Add test for post-mirror backing chains
Max Reitz [Fri, 10 Jun 2016 18:57:49 +0000 (20:57 +0200)]
iotests: Add test for post-mirror backing chains

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-5-mreitz@redhat.com
Reviewed-by: Fam Zheng <famz@redhat.com>
[mreitz@redhat.com: Removed unnecessary imports]
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoblock/null: Implement bdrv_refresh_filename()
Max Reitz [Fri, 10 Jun 2016 18:57:48 +0000 (20:57 +0200)]
block/null: Implement bdrv_refresh_filename()

The null block driver ignores any filename used for creating its BDSs,
which allows creating such BDSs even without any filename at all. In
that case, we currently construct a JSON filename when queried instead
of a plain "null-co://" or "null-aio://". This patch implements
bdrv_refresh_filename() to remedy this behavior.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-4-mreitz@redhat.com
[mreitz@redhat.com: Added commit message]
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoblock/mirror: Fix target backing BDS
Max Reitz [Fri, 10 Jun 2016 18:57:47 +0000 (20:57 +0200)]
block/mirror: Fix target backing BDS

Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().

First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).

Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().

Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:

- If blockdev-mirror is used, we can assume the user made sure that the
  target already has the correct backing chain. In particular, we should
  not try to open a backing file if the target does not have any yet.

- If drive-mirror with mode=absolute-paths is used, we can and should
  reuse the already existing chain of nodes that the source BDS is in.
  In case of sync=full, no backing BDS is required; with sync=top, we
  just link the source's backing BDS to the target, and with sync=none,
  we use the source BDS as the target's backing BDS.
  We should not try to open these backing files anew because this would
  lead to two BDSs existing per physical file in the backing chain, and
  we would like to avoid such concurrent access.

- If drive-mirror with mode=existing is used, we have to use the
  information provided in the physical image file which means opening
  the target's backing chain completely anew, just as it has been done
  already.
  If the target's backing chain shares images with the source, this may
  lead to multiple BDSs per physical image file. But since we cannot
  reliably ascertain this case, there is nothing we can do about it.
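
The three cases above amount to a small decision table. A Python
sketch, in which the function target_backing_strategy and the returned
description strings are hypothetical; only the command, mode and sync
names are taken from the commit message.

```python
def target_backing_strategy(command: str, mode: str = "", sync: str = "") -> str:
    """Describe how the target's backing chain is set up on completion."""
    if command == "blockdev-mirror":
        # The user is assumed to have prepared the target already.
        return "keep the target's existing backing chain"
    if command != "drive-mirror":
        raise ValueError("unknown command: " + command)
    if mode == "absolute-paths":
        by_sync = {
            "full": "no backing BDS",
            "top": "link the source's backing BDS",
            "none": "use the source BDS as backing BDS",
        }
        if sync not in by_sync:
            raise ValueError("unknown sync mode: " + sync)
        return by_sync[sync]
    if mode == "existing":
        return "open the backing chain anew from the image file"
    raise ValueError("unknown mode: " + mode)
```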

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-3-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoblock: Allow replacement of a BDS by its overlay
Max Reitz [Fri, 10 Jun 2016 18:57:46 +0000 (20:57 +0200)]
block: Allow replacement of a BDS by its overlay

change_parent_backing_link() asserts that the BDS to be replaced is not
used as a backing file. However, we may want to replace a BDS by its
overlay in which case that very link should not be redirected.

For instance, when doing a sync=none drive-mirror operation, we may have
the following BDS/BB forest before block job completion:

  target

  base <- source <- BlockBackend

During job completion, we want to establish the source BDS as the
target's backing node:

          target
            |
            v
  base <- source <- BlockBackend

This makes the target a valid replacement for the source:

          target <- BlockBackend
            |
            v
  base <- source

Without this modification to change_parent_backing_link() we have to
inject the target into the graph before the source is its backing node,
thus temporarily creating a wrong graph:

  target <- BlockBackend

  base <- source

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-2-mreitz@redhat.com
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agorbd:change error_setg() to error_setg_errno()
Vikhyat Umrao [Mon, 9 May 2016 07:51:59 +0000 (13:21 +0530)]
rbd:change error_setg() to error_setg_errno()

The Ceph RBD block driver does not use error_setg_errno() where
it is possible to do so. This patch replaces error_setg()
with error_setg_errno().

Signed-off-by: Vikhyat Umrao <vumrao@redhat.com>
Message-id: 1462780319-5796-1-git-send-email-vumrao@redhat.com
Reviewed-by: Josh Durgin <jdurgin@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoiotests: 095: Clean up QEMU before showing image info
Fam Zheng [Fri, 3 Jun 2016 09:07:52 +0000 (17:07 +0800)]
iotests: 095: Clean up QEMU before showing image info

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1464944872-24484-1-git-send-email-famz@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoblock: Create the commit block job before reopening any image
Alberto Garcia [Fri, 27 May 2016 10:53:39 +0000 (12:53 +0200)]
block: Create the commit block job before reopening any image

If the base or overlay images need to be reopened in read-write mode
but the block_job_create() call fails then no one will put those
images back in read-only mode.

We can solve this problem easily by calling block_job_create() first.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: aa495045770a6f1a7cc5d408397a17c75097fdd8.1464346103.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoblock: Prevent sleeping jobs from resuming if they have been paused
Alberto Garcia [Fri, 27 May 2016 10:53:38 +0000 (12:53 +0200)]
block: Prevent sleeping jobs from resuming if they have been paused

If we pause a block job and drain its BlockDriverState we want
the job to remain inactive until we call block_job_resume() again.

However if we pause the job while it is sleeping then it will resume
when the sleep timer fires.

This patch prevents that from happening by checking if the job has
been paused after it comes back from sleeping.
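
The check can be modelled without coroutines. In this Python sketch
the Job class and the on_sleep_timer callback are hypothetical; the
only behaviour taken from the text is that a job paused during its
sleep must not continue when the timer fires.

```python
class Job:
    """Toy block job that does one unit of work per timer wake-up."""

    def __init__(self) -> None:
        self.paused = False
        self.iterations = 0

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:
        self.paused = False

    def on_sleep_timer(self) -> bool:
        """Called when the sleep timer fires; True if work was done."""
        if self.paused:
            # Paused while sleeping: stay inactive until resume().
            return False
        self.iterations += 1
        return True
```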

Signed-off-by: Alberto Garcia <berto@igalia.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 3d9011151512326b890d22bdab3530244ef349d7.1464346103.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoblock: use the block job list in qmp_query_block_jobs()
Alberto Garcia [Fri, 27 May 2016 10:53:37 +0000 (12:53 +0200)]
block: use the block job list in qmp_query_block_jobs()

qmp_query_block_jobs() uses bdrv_next() to look for block jobs, but
this function can only find those in top-level BlockDriverStates.

This patch uses block_job_next() instead.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: a8b7e5497b7c1fa67c12fcceae1630d01c3b1f96.1464346103.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoblock: use the block job list in bdrv_drain_all()
Alberto Garcia [Fri, 27 May 2016 10:53:36 +0000 (12:53 +0200)]
block: use the block job list in bdrv_drain_all()

bdrv_drain_all() pauses all block jobs by using bdrv_next() to iterate
over all top-level BlockDriverStates. Therefore the code is unable to
find block jobs in other nodes.

This patch uses block_job_next() to iterate over all block jobs.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Message-id: 55ee7d7d4a65c28aa1a1b28823897ef326f328e2.1464346103.git.berto@igalia.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
7 years agoblock: Fix snapshot=on with aio=native
Kevin Wolf [Thu, 16 Jun 2016 10:59:30 +0000 (12:59 +0200)]
block: Fix snapshot=on with aio=native

snapshot=on creates a temporary overlay that is always opened with
cache=unsafe (the cache mode specified by the user is only for the
actual image file and its children). This means that we must not inherit
the BDRV_O_NATIVE_AIO flag for the temporary overlay because trying to
use Linux AIO with cache=unsafe results in an error.

Reproducer without this patch:

$ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on
qemu-system-x86_64: -drive file=/tmp/test.qcow2,cache=none,aio=native,snapshot=on: aio=native was
specified, but it requires cache.direct=on, which was not specified.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years agoblock: Remove bs->zero_beyond_eof
Kevin Wolf [Wed, 1 Jun 2016 15:13:47 +0000 (17:13 +0200)]
block: Remove bs->zero_beyond_eof

It is always true for open images now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agoqcow2: Let vmstate call qcow2_co_preadv/pwrite directly
Kevin Wolf [Wed, 1 Jun 2016 15:07:24 +0000 (17:07 +0200)]
qcow2: Let vmstate call qcow2_co_preadv/pwrite directly

We don't really want to go through the block layer in order to read from
or write to the vmstate in a qcow2 image. Doing so required a few ugly
hacks like saving and restoring the old image size (because writing to
vmstate offsets would increase the image size) or disabling the "reads
after EOF = zeroes" logic. When calling the right functions directly,
these hacks aren't necessary any more.

Note that .bdrv_load/save_vmstate() return 0 instead of the number of
bytes in case of success now.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agoblock: Make bdrv_load/save_vmstate coroutine_fns
Kevin Wolf [Thu, 9 Jun 2016 14:24:44 +0000 (16:24 +0200)]
block: Make bdrv_load/save_vmstate coroutine_fns

This allows drivers to share code between normal I/O and vmstate
accesses.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agoblock: Allow .bdrv_load/save_vmstate() to return 0/-errno
Kevin Wolf [Fri, 10 Jun 2016 15:57:26 +0000 (17:57 +0200)]
block: Allow .bdrv_load/save_vmstate() to return 0/-errno

The return value of .bdrv_load/save_vmstate() can be any non-negative
number in case of success now. It used to be bytes/-errno.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years agoblock: Make .bdrv_load_vmstate() vectored
Kevin Wolf [Thu, 9 Jun 2016 14:50:16 +0000 (16:50 +0200)]
block: Make .bdrv_load_vmstate() vectored

This brings it in line with .bdrv_save_vmstate().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago block: Introduce bdrv_preadv()
Kevin Wolf [Thu, 9 Jun 2016 14:36:00 +0000 (16:36 +0200)]
block: Introduce bdrv_preadv()

We already have a byte-based bdrv_pwritev(), but the read counterpart
was still missing. This commit adds it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago doc: Fix mailing list address in tests/qemu-iotests/README
Thomas Huth [Thu, 16 Jun 2016 07:53:53 +0000 (09:53 +0200)]
doc: Fix mailing list address in tests/qemu-iotests/README

The address of the mailing list is qemu-devel@nongnu.org
instead of qemu-devel@savannah.nongnu.org. And while we're
at it, also mention the qemu-block mailing list here.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years ago linux-aio: Cancel BH if not needed
Kevin Wolf [Fri, 28 Nov 2014 14:23:12 +0000 (15:23 +0100)]
linux-aio: Cancel BH if not needed

linux-aio uses a BH to make sure that the remaining completions are
processed even in nested event loops of completion callbacks, which
avoids deadlocks.

There is no need, however, to have the BH overhead for the first call
into qemu_laio_completion_bh() or after all pending completions have
already been processed. Therefore, this patch calls directly into
qemu_laio_completion_bh() in qemu_laio_completion_cb() and cancels
the BH after qemu_laio_completion_bh() has processed all pending
completions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
7 years ago block: Don't enforce 512 byte minimum alignment
Kevin Wolf [Fri, 3 Jun 2016 14:45:44 +0000 (16:45 +0200)]
block: Don't enforce 512 byte minimum alignment

If block drivers say that they can do an alignment < 512 bytes, let's
just suppose they mean it. raw-posix used to be an offender with respect
to this, but it can actually deal with byte-aligned requests now.

The default is still 512 bytes for any drivers that only implement
sector-based interfaces, but it is 1 now for drivers that implement
.bdrv_co_preadv.
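
The default selection described above could be sketched roughly as
follows. The struct and function names are hypothetical and heavily
simplified; they are not the actual block layer code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512u   /* same value as BDRV_SECTOR_SIZE */

/* Hypothetical, simplified stand-in for a block driver's callback table. */
struct toy_driver {
    bool has_co_preadv;    /* driver implements a byte-based read callback */
};

/* Default request alignment: 1 byte for byte-based drivers, one sector for
 * drivers that only offer sector-based interfaces. */
static uint32_t default_request_alignment(const struct toy_driver *drv)
{
    assert(drv != NULL);
    return drv->has_co_preadv ? 1 : SECTOR_SIZE;
}
```

This only models the default; a driver can still report a different
alignment of its own.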

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago raw-posix: Implement .bdrv_co_preadv/pwritev
Kevin Wolf [Fri, 3 Jun 2016 15:36:27 +0000 (17:36 +0200)]
raw-posix: Implement .bdrv_co_preadv/pwritev

The raw-posix block driver actually supports byte-aligned requests now
on non-O_DIRECT images, like it already (and previously incorrectly)
claimed in bs->request_alignment.

For some block drivers this means that a RMW cycle can be avoided when
they write sub-sector metadata e.g. for cluster allocation.
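
To illustrate when a read-modify-write cycle is needed, here is a small
sketch. The helper name is made up for this example and the check is a
simplification of what the block layer does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when [offset, offset + bytes) does not start and end on a
 * multiple of `align`. In that case the surrounding data has to be read,
 * patched and written back as a whole. */
static bool needs_rmw(uint64_t offset, uint64_t bytes, uint32_t align)
{
    assert(align != 0);
    return (offset % align) != 0 || ((offset + bytes) % align) != 0;
}
```

With these definitions, an 8-byte metadata update at offset 8 needs RMW
at a 512-byte alignment, but not when the driver accepts byte-aligned
requests (alignment 1).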

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago raw-posix: Switch to bdrv_co_* interfaces
Kevin Wolf [Wed, 6 Aug 2014 15:18:07 +0000 (17:18 +0200)]
raw-posix: Switch to bdrv_co_* interfaces

In order to use the modern byte-based .bdrv_co_preadv/pwritev()
interface, this patch switches raw-posix to coroutine-based interfaces
as a first step. In terms of semantics and performance, it doesn't make
a difference with the existing code whether we go from a coroutine to a
callback-based interface already in block/io.c or only in linux-aio.c.

As there have been concerns in the past that this change may be a step
in the wrong direction with respect to a possible AIO fast path, the
old callback-based interface for linux-aio is left around and can be
reactivated when a fast path (e.g. directly from virtio-blk dataplane,
bypassing the whole block layer) is implemented.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago block: Prepare bdrv_aligned_pwritev() for byte-aligned requests
Kevin Wolf [Fri, 3 Jun 2016 16:42:51 +0000 (18:42 +0200)]
block: Prepare bdrv_aligned_pwritev() for byte-aligned requests

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago block: Prepare bdrv_aligned_preadv() for byte-aligned requests
Kevin Wolf [Fri, 3 Jun 2016 14:17:28 +0000 (16:17 +0200)]
block: Prepare bdrv_aligned_preadv() for byte-aligned requests

This patch makes bdrv_aligned_preadv() ready to accept byte-aligned
requests. Note that this doesn't mean that such requests are actually
made. The caller still ensures that all requests are aligned to at least
512 bytes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago block: Byte-based bdrv_co_do_copy_on_readv()
Kevin Wolf [Thu, 2 Jun 2016 09:41:52 +0000 (11:41 +0200)]
block: Byte-based bdrv_co_do_copy_on_readv()

In a first step to convert the common I/O path to work on bytes rather
than sectors, this converts the copy-on-read logic that is used by
bdrv_aligned_preadv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago block: drop support for using qcow[2] encryption with system emulators
Daniel P. Berrange [Mon, 13 Jun 2016 11:30:09 +0000 (12:30 +0100)]
block: drop support for using qcow[2] encryption with system emulators

Back in the 2.3.0 release we declared qcow[2] encryption as
deprecated, warning people that it would be removed in a future
release.

  commit a1f688f4152e65260b94f37543521ceff8bfebe4
  Author: Markus Armbruster <armbru@redhat.com>
  Date:   Fri Mar 13 21:09:40 2015 +0100

    block: Deprecate QCOW/QCOW2 encryption

The code still exists today, but by a (happy?) accident we entirely
broke the ability to use qcow[2] encryption in the system emulators
in the 2.4.0 release due to

  commit 8336aafae1451d54c81dd2b187b45f7c45d2428e
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Tue May 12 17:09:18 2015 +0100

    qcow2/qcow: protect against uninitialized encryption key

This commit was designed to prevent future coding bugs which
might cause QEMU to read/write data on an encrypted block
device in plain text mode before a decryption key is set.

It turns out this preventative measure was a little too good,
because we already had a long-standing bug where QEMU read
encrypted data in plain text mode during system emulator
startup, in order to guess disk geometry:

  Thread 10 (Thread 0x7fffd3fff700 (LWP 30373)):
  #0  0x00007fffe90b1a28 in raise () at /lib64/libc.so.6
  #1  0x00007fffe90b362a in abort () at /lib64/libc.so.6
  #2  0x00007fffe90aa227 in __assert_fail_base () at /lib64/libc.so.6
  #3  0x00007fffe90aa2d2 in  () at /lib64/libc.so.6
  #4  0x000055555587ae19 in qcow2_co_readv (bs=0x5555562accb0, sector_num=0, remaining_sectors=1, qiov=0x7fffffffd260) at block/qcow2.c:1229
  #5  0x000055555589b60d in bdrv_aligned_preadv (bs=bs@entry=0x5555562accb0, req=req@entry=0x7fffd3ffea50, offset=offset@entry=0, bytes=bytes@entry=512, align=align@entry=512, qiov=qiov@entry=0x7fffffffd260, flags=0) at block/io.c:908
  #6  0x000055555589b8bc in bdrv_co_do_preadv (bs=0x5555562accb0, offset=0, bytes=512, qiov=0x7fffffffd260, flags=<optimized out>) at block/io.c:999
  #7  0x000055555589c375 in bdrv_rw_co_entry (opaque=0x7fffffffd210) at block/io.c:544
  #8  0x000055555586933b in coroutine_thread (opaque=0x555557876310) at coroutine-gthread.c:134
  #9  0x00007ffff64e1835 in g_thread_proxy (data=0x5555562b5590) at gthread.c:778
  #10 0x00007ffff6bb760a in start_thread () at /lib64/libpthread.so.0
  #11 0x00007fffe917f59d in clone () at /lib64/libc.so.6

  Thread 1 (Thread 0x7ffff7ecab40 (LWP 30343)):
  #0  0x00007fffe91797a9 in syscall () at /lib64/libc.so.6
  #1  0x00007ffff64ff87f in g_cond_wait (cond=cond@entry=0x555555e085f0 <coroutine_cond>, mutex=mutex@entry=0x555555e08600 <coroutine_lock>) at gthread-posix.c:1397
  #2  0x00005555558692c3 in qemu_coroutine_switch (co=<optimized out>) at coroutine-gthread.c:117
  #3  0x00005555558692c3 in qemu_coroutine_switch (from_=0x5555562b5e30, to_=to_@entry=0x555557876310, action=action@entry=COROUTINE_ENTER) at coroutine-gthread.c:175
  #4  0x0000555555868a90 in qemu_coroutine_enter (co=0x555557876310, opaque=0x0) at qemu-coroutine.c:116
  #5  0x0000555555859b84 in thread_pool_completion_bh (opaque=0x7fffd40010e0) at thread-pool.c:187
  #6  0x0000555555859514 in aio_bh_poll (ctx=ctx@entry=0x5555562953b0) at async.c:85
  #7  0x0000555555864d10 in aio_dispatch (ctx=ctx@entry=0x5555562953b0) at aio-posix.c:135
  #8  0x0000555555864f75 in aio_poll (ctx=ctx@entry=0x5555562953b0, blocking=blocking@entry=true) at aio-posix.c:291
  #9  0x000055555589c40d in bdrv_prwv_co (bs=bs@entry=0x5555562accb0, offset=offset@entry=0, qiov=qiov@entry=0x7fffffffd260, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:591
  #10 0x000055555589c503 in bdrv_rw_co (bs=bs@entry=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845, is_write=is_write@entry=false, flags=flags@entry=(unknown: 0)) at block/io.c:614
  #11 0x000055555589c562 in bdrv_read_unthrottled (nb_sectors=21845, buf=0x7fffffffd2e0 "\321,", sector_num=0, bs=0x5555562accb0) at block/io.c:622
  #12 0x000055555589c562 in bdrv_read_unthrottled (bs=0x5555562accb0, sector_num=sector_num@entry=0, buf=buf@entry=0x7fffffffd2e0 "\321,", nb_sectors=nb_sectors@entry=21845) at block/io.c:634
    nb_sectors@entry=1) at block/block-backend.c:504
  #14 0x0000555555752e9f in guess_disk_lchs (blk=blk@entry=0x5555562a5290, pcylinders=pcylinders@entry=0x7fffffffd52c, pheads=pheads@entry=0x7fffffffd530, psectors=psectors@entry=0x7fffffffd534) at hw/block/hd-geometry.c:68
  #15 0x0000555555752ff7 in hd_geometry_guess (blk=0x5555562a5290, pcyls=pcyls@entry=0x555557875d1c, pheads=pheads@entry=0x555557875d20, psecs=psecs@entry=0x555557875d24, ptrans=ptrans@entry=0x555557875d28) at hw/block/hd-geometry.c:133
  #16 0x0000555555752b87 in blkconf_geometry (conf=conf@entry=0x555557875d00, ptrans=ptrans@entry=0x555557875d28, cyls_max=cyls_max@entry=65536, heads_max=heads_max@entry=16, secs_max=secs_max@entry=255, errp=errp@entry=0x7fffffffd5e0) at hw/block/block.c:71
  #17 0x0000555555799bc4 in ide_dev_initfn (dev=0x555557875c80, kind=IDE_HD) at hw/ide/qdev.c:174
  #18 0x0000555555768394 in device_realize (dev=0x555557875c80, errp=0x7fffffffd640) at hw/core/qdev.c:247
  #19 0x0000555555769a81 in device_set_realized (obj=0x555557875c80, value=<optimized out>, errp=0x7fffffffd730) at hw/core/qdev.c:1058
  #20 0x00005555558240ce in property_set_bool (obj=0x555557875c80, v=<optimized out>, opaque=0x555557875de0, name=<optimized out>, errp=0x7fffffffd730)
        at qom/object.c:1514
  #21 0x0000555555826c87 in object_property_set_qobject (obj=obj@entry=0x555557875c80, value=value@entry=0x55555784bcb0, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/qom-qobject.c:24
  #22 0x0000555555825760 in object_property_set_bool (obj=obj@entry=0x555557875c80, value=value@entry=true, name=name@entry=0x55555591cb3d "realized", errp=errp@entry=0x7fffffffd730) at qom/object.c:905
  #23 0x000055555576897b in qdev_init_nofail (dev=dev@entry=0x555557875c80) at hw/core/qdev.c:380
  #24 0x0000555555799ead in ide_create_drive (bus=bus@entry=0x555557629630, unit=unit@entry=0, drive=0x5555562b77e0) at hw/ide/qdev.c:122
  #25 0x000055555579a746 in pci_ide_create_devs (dev=dev@entry=0x555557628db0, hd_table=hd_table@entry=0x7fffffffd830) at hw/ide/pci.c:440
  #26 0x000055555579b165 in pci_piix3_ide_init (bus=<optimized out>, hd_table=0x7fffffffd830, devfn=<optimized out>) at hw/ide/piix.c:218
  #27 0x000055555568ca55 in pc_init1 (machine=0x5555562960a0, pci_enabled=1, kvmclock_enabled=<optimized out>) at /home/berrange/src/virt/qemu/hw/i386/pc_piix.c:256
  #28 0x0000555555603ab2 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4249

So the safety net is correctly preventing QEMU from reading
cipher text as if it were plain text during startup, and is
aborting QEMU to avoid bad usage of this data.

For added fun this bug only happens if the encrypted qcow2
file happens to have data written to the first cluster,
otherwise the cluster won't be allocated and so qcow2 would
not try the decryption routines at all, just return all 0's.

That no one even noticed, let alone reported, this bug that
has shipped in 2.4.0, 2.5.0 and 2.6.0 shows that the number
of actual users of encrypted qcow2 is approximately zero.

So rather than fix the crash, and backport it to stable
releases, just go ahead with what we have warned users about
and disable any use of qcow2 encryption in the system
emulators. qemu-img/qemu-io/qemu-nbd are still able to access
qcow2 encrypted images for the sake of data conversion.

In the future, qcow2 will gain support for the alternative
LUKS format, but when this happens it'll be using the
'-object secret' infrastructure for getting keys, which
avoids this problematic scenario entirely.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years ago block: Assert that flags are in range
Eric Blake [Mon, 13 Jun 2016 18:56:35 +0000 (12:56 -0600)]
block: Assert that flags are in range

Add a new BDRV_REQ_MASK constant, and use it to make sure that
caller flags are always valid.

Tested with 'make check' and with qemu-iotests on both '-raw'
and '-qcow2'; the only failure turned up was fixed in the
previous commit.
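
The idea of a mask check can be sketched as below. The flag names and
values are illustrative only; the real BDRV_REQ_* definitions live in
QEMU's block headers and may differ:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, not the real BDRV_REQ_* constants. */
enum {
    TOY_REQ_COPY_ON_READ = 0x1,
    TOY_REQ_ZERO_WRITE   = 0x2,
    TOY_REQ_MAY_UNMAP    = 0x4,
    TOY_REQ_MASK         = 0x7,   /* OR of every valid flag above */
};

/* True if no bit outside the mask is set. */
static bool flags_in_range(unsigned int flags)
{
    return (flags & ~(unsigned int)TOY_REQ_MASK) == 0;
}

/* Entry-point style check: abort on a caller bug instead of silently
 * passing unknown bits further down. */
static unsigned int checked_flags(unsigned int flags)
{
    assert(flags_in_range(flags));
    return flags;
}
```

Keeping the mask next to the flag definitions means that a newly added
flag only has to be ORed into one constant for the check to accept it.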

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years ago block: Avoid bogus flags during mirroring
Eric Blake [Mon, 13 Jun 2016 18:56:34 +0000 (12:56 -0600)]
block: Avoid bogus flags during mirroring

Commit e253f4b8 converted mirroring from sector-based bdrv_aio_*
to byte-based blk_aio_*, but failed to account for the subtle
difference in signatures (the former takes a semi-redundant length,
the latter takes a flags parameter).  Since all of our flags are
currently smaller in size than BDRV_SECTOR_SIZE, it has no ill
effects until we either perform sub-sector mirroring, or we start
asserting that no unexpected flags are set.  I found it while
testing new asserts when qemu-iotests 132 started warning about an
unknown flag 0x200000.
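
Why the bug stayed hidden can be shown with a little arithmetic: a
length that is a multiple of the sector size has its low nine bits
clear, so it cannot collide with any flag bit below 512. A sketch, with
a made-up helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u   /* same value as BDRV_SECTOR_SIZE */

/* True if a byte length, wrongly passed in the flags slot, would set any
 * bit below SECTOR_SIZE, i.e. any bit where a small flag could live. */
static bool length_hits_low_flag_bits(uint64_t length_bytes)
{
    /* The bit trick below relies on SECTOR_SIZE being a power of two. */
    assert((SECTOR_SIZE & (SECTOR_SIZE - 1)) == 0);
    return (length_bytes & (SECTOR_SIZE - 1)) != 0;
}
```

A sector-aligned length such as 0x200000 sets none of the low bits, so
it only shows up once unknown high bits are checked. A sub-sector
length such as 100 would set low bits and could be misread as flags.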

Signed-off-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
7 years ago qemu-img bench: Fix uninitialised writethrough mode
Kevin Wolf [Tue, 14 Jun 2016 09:29:32 +0000 (11:29 +0200)]
qemu-img bench: Fix uninitialised writethrough mode

If no -t option is specified, bool writethrough stayed uninitialised.
Initialise it as false, which makes cache=writeback the default cache
mode.
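
The pattern behind the fix, sketched with a hypothetical and much
simplified option scan (this is not qemu-img's real parser, and it only
knows the "writethrough" mode string):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns the writethrough setting implied by the
 * arguments. The explicit initialiser gives a defined result when no -t
 * option is present. */
static bool parse_writethrough(int argc, char **argv)
{
    bool writethrough = false;   /* default: cache=writeback */

    assert(argv != NULL);
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-t") == 0) {
            writethrough = strcmp(argv[i + 1], "writethrough") == 0;
        }
    }
    return writethrough;
}
```

Without the initialiser, the value returned for a command line that has
no -t option would be indeterminate.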

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7 years ago m25p80: fix test on blk_pread() return value
Cédric Le Goater [Tue, 31 May 2016 11:36:05 +0000 (13:36 +0200)]
m25p80: fix test on blk_pread() return value

commit 243e6f69c129 ("m25p80: Switch to byte-based block access")
replaced blk_read() calls with blk_pread(), but the return values are
different: blk_read() returns 0 on success, whereas blk_pread()
returns the number of bytes read.
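
A sketch of the pitfall, assuming the usual conventions (0 on success
for the sector-based call, a byte count for the byte-based one, and a
negative errno on failure for both). toy_pread() is a stand-in written
for this example, not QEMU's implementation:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Toy stand-in with blk_pread()-style results: returns the number of
 * bytes read on success, or a negative errno on failure. */
static int toy_pread(const char *src, size_t src_len, size_t offset,
                     void *buf, int count)
{
    assert(src != NULL && buf != NULL);
    if (count < 0 || offset > src_len || (size_t)count > src_len - offset) {
        return -EIO;
    }
    memcpy(buf, src + offset, (size_t)count);
    return count;
}

/* Check written for a 0-on-success API: misreports a successful
 * byte-count return as a failure. */
static bool failed_old_check(int ret)
{
    return ret != 0;
}

/* Check that matches the bytes-or-negative-errno convention. */
static bool failed_new_check(int ret)
{
    return ret < 0;
}
```

With a successful 4-byte read, failed_old_check() reports an error even
though the data arrived, while failed_new_check() only fires on a
negative return.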

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>