git.proxmox.com Git - mirror

target/ppc: Clean up probing of VMX, VSX and DFP availability on KVM

When constructing the "host" cpu class we modify whether the VMX and VSX
vector extensions and DFP (Decimal Floating Point) are available
based on whether KVM can support those instructions.  This can depend on
policy in the host kernel as well as on the actual host cpu capabilities.

However, the way we probe for this is not very nice: we explicitly check
the host's device tree.  That works in practice, but it's not really
correct, since the device tree is a property of the host kernel's platform
which we don't really know about.  We get away with it because the only
modern POWER platforms happen to encode VMX, VSX and DFP availability in
the device tree in the same way.

Arguably we should have an explicit KVM capability for this, but we haven't
needed one so far.  Barring specific KVM policies which don't yet exist,
each of these instruction classes will be available in the guest if and
only if they're available in the qemu userspace process.  We can determine
that from the ELF AUX vector we're supplied with.

Once reworked like this, there are no more callers for kvmppc_get_vmx() and
kvmppc_get_dfp() so remove them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 3f2ca480eb872b4946baf77f756236b637a5b15a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

spapr: Validate capabilities on migration

Now that the "pseries" machine type implements optional capabilities (well,
one so far) there's the possibility of having different capabilities
available at either end of a migration.  Although arguably a user error,
it would be nice to catch this situation and fail as gracefully as we can.

This adds code to migrate the capabilities flags.  These aren't pulled
directly into the destination's configuration since what the user has
specified on the destination command line should take precedence.  However,
they are checked against the destination capabilities.

If the source was using a capability which is absent on the destination,
we fail the migration, since that could easily cause a guest crash or other
bad behaviour.  If the source lacked a capability which is present on the
destination we warn, but allow the migration to proceed.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit be85537d654565e35e359a74b46fc08b7956525c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

spapr: Treat Hardware Transactional Memory (HTM) as an optional capability

This adds an spapr capability bit for Hardware Transactional Memory.  It is
enabled by default for pseries-2.11 and earlier machine types. with POWER8
or later CPUs (as it must be, since earlier qemu versions would implicitly
allow it).  However it is disabled by default for the latest pseries-2.12
machine type.

This means that with the latest machine type, HTM will not be available,
regardless of CPU, unless it is explicitly enabled on the command line.
That change is made on the basis that:

* This way running with -M pseries,accel=tcg will start with whatever cpu
   and will provide the same guest visible model as with accel=kvm.
     - More specifically, this means existing make check tests don't have
       to be modified to use cap-htm=off in order to run with TCG

* We hope to add a new "HTM without suspend" feature in the not too
   distant future which could work on both POWER8 and POWER9 cpus, and
   could be enabled by default.

* Best guesses suggest that future POWER cpus may well only support the
   HTM-without-suspend model, not the (frankly, horribly overcomplicated)
   POWER8 style HTM with suspend.

* Anecdotal evidence suggests problems with HTM being enabled when it
   wasn't wanted are more common than being missing when it was.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit ee76a09fc72cfbfab2bb5529320ef7e460adffd8)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

spapr: Capabilities infrastructure

Because PAPR is a paravirtual environment access to certain CPU (or other)
facilities can be blocked by the hypervisor.  PAPR provides ways to
advertise in the device tree whether or not those features are available to
the guest.

In some places we automatically determine whether to make a feature
available based on whether our host can support it, in most cases this is
based on limitations in the available KVM implementation.

Although we correctly advertise this to the guest, it means that host
factors might make changes to the guest visible environment which is bad:
as well as generaly reducing reproducibility, it means that a migration
between different host environments can easily go bad.

We've mostly gotten away with it because the environments considered mature
enough to be well supported (basically, KVM on POWER8) have had consistent
feature availability.  But, it's still not right and some limitations on
POWER9 is going to make it more of an issue in future.

This introduces an infrastructure for defining "sPAPR capabilities".  These
are set by default based on the machine version, masked by the capabilities
of the chosen cpu, but can be overriden with machine properties.

The intention is at reset time we verify that the requested capabilities
can be supported on the host (considering TCG, KVM and/or host cpu
limitations).  If not we simply fail, rather than silently modifying the
advertised featureset to the guest.

This does mean that certain configurations that "worked" may now fail, but
such configurations were already more subtly broken.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
(cherry picked from commit 33face6b8981add8eba1f7cdaf4cf6cede415d2e)
Conflicts:
include/hw/ppc/spapr.h
*drop context dep on 60c6823b9bc
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

spapr: Add pseries-2.12 machine type

While we're at it fix a couple of small errors in the 2.11 and 2.10 models
(they didn't have any real effect, but don't quite match the template).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 2b6154120cbd7f5514cefd3c6084d39922d26d88)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

spapr: don't initialize PATB entry if max-cpu-compat < power9

if KVM is enabled and KVM capabilities MMU radix is available,
the partition table entry (patb_entry) for the radix mode is
initialized by default in ppc_spapr_reset().

It's a problem if we want to migrate the guest to a POWER8 host
while the kernel is not started to set the value to the one
expected for a POWER8 CPU.

The "-machine max-cpu-compat=power8" should allow to migrate
a POWER9 KVM host to a POWER8 KVM host, but because patb_entry
is set, the destination QEMU tries to enable radix mode on the
POWER8 host. This fails and cancels the migration:

    Process table config unsupported by the host
    error while loading state for instance 0x0 of device 'spapr'
    load of migration failed: Invalid argument

This patch doesn't set the PATB entry if the user provides
a CPU compatibility mode that doesn't support radix mode.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 1481fe5fcfeb7fcf3c1ebb9d8c0432e3e0188ccf)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

linux-user/signal.c: Rename MC_* defines

The SPARC code in linux-user/signal.c defines a set of
MC_* constants. On some SPARC hosts these are also defined
by sys/ucontext.h, resulting in build failures:

linux-user/signal.c:2786:0: error: "MC_NGREG" redefined [-Werror]
#define MC_NGREG 19

In file included from /usr/include/signal.h:302:0,
from include/qemu/osdep.h:86,
from linux-user/signal.c:19:
/usr/include/sparc64-linux-gnu/sys/ucontext.h:59:0: note: this is the location of the previous definition
# define MC_NGREG __MC_NGREG

Rename all these constants to SPARC_MC_* to avoid the clash.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1517318239-15764-1-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 8ebb314b957403c1c9a3f1cf995f73c6ae9d5d10)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

spapr_pci: fix MSI/MSIX selection

In various place we don't correctly check if the device supports MSI or
MSI-X. This can cause devices to be advertised with MSI support, even
if they only support MSI-X (like virtio-pci-* devices for example):

                ethernet@0 {
                        ibm,req#msi = <0x1>; <--- wrong!
.
ibm,loc-code = "qemu_virtio-net-pci:0000:00:00.0";
.
ibm,req#msi-x = <0x3>;
                };

Worse, this can also cause the "ibm,change-msi" RTAS call to corrupt the
PCI status and cause migration to fail:

  qemu-system-ppc64: get_pci_config_device: Bad config data: i=0x6
    read: 0 device: 10 cmask: 10 wmask: 0 w1cmask:0
                              ^^
           PCI_STATUS_CAP_LIST bit which is assumed to be constant

This patch changes spapr_populate_pci_child_dt() to properly check for
MSI support using msi_present(): this ensures that PCIDevice::msi_cap
was set by msi_init() and that msi_nr_vectors_allocated() will look at
the right place in the config space.

Checking PCIDevice::msix_entries_nr is enough for MSI-X but let's add
a call to msix_present() there as well for consistency.

It also changes rtas_ibm_change_msi() to select the appropriate MSI
type in Function 1 instead of always selecting plain MSI. This new
behaviour is compliant with LoPAPR 1.1, as described in "Table 71.
ibm,change-msi Argument Call Buffer":

  Function 1: If Number Outputs is equal to 3, request to set to a new
           number of MSIs (including set to 0).
           If the “ibm,change-msix-capable” property exists and Number
           Outputs is equal to 4, request is to set to a new number of
           MSI or MSI-X (platform choice) interrupts (including set to
           0).

Since MSI is the the platform default (LoPAPR 6.2.3 MSI Option), let's
check for MSI support first.

And finally, it checks the input parameters are valid, as described in
LoPAPR 1.1 "R1–7.3.10.5.1–3":

  For the MSI option: The platform must return a Status of -3 (Parameter
  error) from ibm,change-msi, with no change in interrupt assignments if
  the PCI configuration address does not support MSI and Function 3 was
  requested (that is, the “ibm,req#msi” property must exist for the PCI
  configuration address in order to use Function 3), or does not support
  MSI-X and Function 4 is requested (that is, the “ibm,req#msi-x” property
  must exist for the PCI configuration address in order to use Function 4),
  or if neither MSIs nor MSI-Xs are supported and Function 1 is requested.

This ensures that the ret_intr_type variable contains a valid MSI type
for this device, and that spapr_msi_setmsg() won't corrupt the PCI status.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
(cherry picked from commit 9cbe305b60cc49cfcd134765b85c28be95b1b57d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

usb-storage: Fix share-rw option parsing

Because usb-storage creates an internal scsi device, we should propagate
options. We already do so for bootindex etc, but failed to take care of
share-rw. Fix it in an apparent way: add a new parameter to
scsi_bus_legacy_add_drive and pass in s->conf.share_rw.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Message-id: 20180117005222.4781-1-famz@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 395b95395934785ca86baafd314d0c31b307d16d)
Conflicts:
hw/usb/dev-storage.c
* dropped context dep on ceff3e1f01e
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

osdep: Retry SETLK upon EINTR

We could hit lock failure if there is a signal that makes fcntl return
-1 and errno set to EINTR. In this case we should retry.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit f86428a1f4f91a460ed585682af70d3e8c31dc06)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

s390x/kvm: provide stfle.81

stfle.81 (ppa15) is a transparent facility that can be passed to the
guest without the need to implement hypervisor support. As this feature
can be provided by firmware we add it to all full models.

Cc: qemu-stable@nongnu.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20180118085628.40798-4-borntraeger@de.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 9f0d13f4f1de3cf9b70435cc4e87a301ee12471f)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

s390x/kvm: Handle bpb feature

We need to handle the bpb control on reset and migration. Normally
stfle.82 is transparent (and the normal guest part works without
hypervisor activity). To prevent any issues we require full
host kernel support for this feature.

Cc: qemu-stable@nongnu.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20180118085628.40798-3-borntraeger@de.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[CH: 'Branch Prediction Blocking' -> 'Branch prediction blocking']
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit b073c87517d4d348c7bac0f0b35e8e83e6354d82)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

linux-headers: update

Update headers against 4.15-rc9.

Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 9cbb636270b4df6f0a548e5c34b895330db5df8b)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

linux-headers: update to 4.15-rc1

Update headers against v4.15-rc1.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1511883692-11511-4-git-send-email-eric.auger@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
(cherry picked from commit dd8739669f95b30653a3a05cb2e21da3f52894fa)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

s390x: fix storage attributes migration for non-small guests

Fix storage attribute migration so that it does not fail for guests
with more than a few GB of RAM.
With such guests, the index in the buffer would go out of bounds,
usually by large amounts, thus receiving -EFAULT from the kernel.
Migration itself would be successful, but storage attributes would then
not be migrated completely.

This patch fixes the out of bounds access, and thus migration of all
storage attributes when the guest have large amounts of memory.

Cc: qemu-stable@nongnu.org
Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Fixes: 903fd80b03243476 ("s390x/migration: Storage attributes device")
Message-Id: <1516297904-18188-1-git-send-email-imbrenda@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 46fa893355e0bd88f3c59b886f0d75cbd5f0bbbe)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

linux-user: Fix locking order in fork_start()

Our locking order is that the tb lock should be taken
inside the mmap_lock, but fork_start() grabs locks the
other way around. This means that if a heavily multithreaded
guest process (such as Java) calls fork() it can deadlock,
with the thread that called fork() stuck in fork_start()
with the tb lock and waiting for the mmap lock, but some
other thread in tb_find() with the mmap lock and waiting
for the tb lock. The cpu_list_lock() should also always be
taken last, not first.

Fix this by making fork_start() grab the locks in the
right order. The order in which we drop locks doesn't
matter, so we leave fork_end() the way it is.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <1512397331-15238-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
(cherry picked from commit 024949caf32805f4cc3e7d363a80084b47aac1f6)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

i386: Add EPYC-IBPB CPU model

EPYC-IBPB is a copy of the EPYC CPU model with
just CPUID_8000_0008_EBX_IBPB added.

Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-7-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 6cfbc54e8903a9bcc0346119949162d040c144c1)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

i386: Add new -IBRS versions of Intel CPU models

The new MSR IA32_SPEC_CTRL MSR was introduced by a recent Intel
microcode updated and can be used by OSes to mitigate
CVE-2017-5715. Unfortunately we can't change the existing CPU
models without breaking existing setups, so users need to
explicitly update their VM configuration to use the new *-IBRS
CPU model if they want to expose IBRS to guests.

The new CPU models are simple copies of the existing CPU models,
with just CPUID_7_0_EDX_SPEC_CTRL added and model_id updated.

Cc: Jiri Denemark <jdenemar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-6-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit ac96c41354b7e4c70b756342d9b686e31ab87458)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

i386: Add FEAT_8000_0008_EBX CPUID feature word

Add the new feature word and the "ibpb" feature flag.

Based on a patch by Paolo Bonzini.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-5-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 1b3420e1c4d523c49866cca4e7544753201cd43d)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

i386: Add spec-ctrl CPUID bit

Add the feature name and a CPUID_7_0_EDX_SPEC_CTRL macro.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-4-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit a2381f0934432ef2cd47a335348ba8839632164c)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

i386: Add support for SPEC_CTRL MSR

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-3-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit a33a2cfe2f771b360b3422f6cdf566a560860bfc)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

i386: Change X86CPUDefinition::model_id to const char*

It is valid to have a 48-character model ID on CPUID, however the
definition of X86CPUDefinition::model_id is char[48], which can
make the compiler drop the null terminator from the string.

If a CPU model happens to have 48 bytes on model_id, "-cpu help"
will print garbage and the object_property_set_str() call at
x86_cpu_load_def() will read data outside the model_id array.

We could increase the array size to 49, but this would mean the
compiler would not issue a warning if a 49-char string is used by
mistake for model_id.

To make things simpler, simply change model_id to be const char*,
and validate the string length using an assert() on
x86_register_cpudef_type().

Reported-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Message-Id: <20180109154519.25634-2-ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
(cherry picked from commit 807e9869b8c4119b81df902625af818519e01759)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

hw/pci-bridge: fix QEMU crash because of pcie-root-port

If we try to use more pcie_root_ports then available slots
and an IO hint is passed to the port, QEMU crashes because
we try to init the "IO hint" capability even if the device
is not created.
Fix it by checking for error before adding the capability,
so QEMU can fail gracefully.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit fced4d00e68e7559c73746d963265f7fd0b6abf9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

scsi-disk: release AioContext in unaligned WRITE SAME case

scsi_write_same_complete() can retry the write if the request was
unaligned. Make sure to release the AioContext when that code path is
taken!

This patch fixes a hang when QEMU terminates after an unaligned WRITE
SAME request has been processed with dataplane. The hang occurs because
iothread_stop_all() cannot acquire the AioContext lock that was leaked
by the IOThread in scsi_write_same_complete().

Fixes: b9e413dd37 ("block: explicitly acquire aiocontext in aio callbacks that need it").
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reported-by: Cong Li <coli@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180104142502.15175-1-stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 24355b79bdaf6ab12f7c610b032fc35ec045cd55)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

hw/sd/ssi-sd: Reset SD card on controller reset

Since ssi-sd is still using the legacy SD card API, the SD
card created by sd_init() is not plugged into any bus. This
means that the controller has to reset it manually.

Failing to do this mostly didn't affect the guest since the
guest typically does a programmed SD card reset as part of
its SD controller driver initialization, but meant that
migration failed because it's only in sd_reset() that we
set up the wpgrps_size field.

In the case of sd-ssi, we have to implement an entire
reset function since there wasn't one previously, and
that requires a QOM cast macro that got omitted when this
device was QOMified.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1515506513-31961-4-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 8046d44f3c9f67828d3368797d4d314433ee75e9)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

hw/sd/milkymist-memcard: Reset SD card on controller reset

Since milkymist-memcard is still using the legacy SD card API,
the SD card created by sd_init() is not plugged into any bus.
This means that the controller has to reset it manually.

Failing to do this mostly didn't affect the guest since the
guest typically does a programmed SD card reset as part of
its SD controller driver initialization, but meant that
migration failed because it's only in sd_reset() that we
set up the wpgrps_size field.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1515506513-31961-3-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 16bf0e0e7aaa8efc0b8ee7e2aecb2fa235f82d38)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

hw/sd/pl181: Reset SD card on controller reset

Since pl181 is still using the legacy SD card API, the SD
card created by sd_init() is not plugged into any bus. This
means that the controller has to reset it manually.

Failing to do this mostly didn't affect the guest since the
guest typically does a programmed SD card reset as part of
its SD controller driver initialization, but meant that
migration failed because it's only in sd_reset() that we
set up the wpgrps_size field.

Cc: qemu-stable@nongnu.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1739378
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 1515506513-31961-2-git-send-email-peter.maydell@linaro.org
(cherry picked from commit 0cb57cc701839e7358918d5f2922ccbc04d28d17)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

vhost: remove assertion to prevent crash

QEMU will assert on vhost-user backed virtio device hotplug if QEMU is
using more RAM regions than VHOST_MEMORY_MAX_NREGIONS (for example if
it were started with a lot of DIMM devices).

Fix it by returning error instead of asserting and let callers of
vhost_set_mem_table() handle error condition gracefully.

Cc: qemu-stable@nongnu.org
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Jay Zhou <jianjay.zhou@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit f4bf56fb78ed0e9f60fa1ed656c14ff4c494da5a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

virtio_error: don't invoke status callbacks

Backends don't need to know what frontend requested a reset,
and notifying then from virtio_error is messy because
virtio_error itself might be invoked from backend.

Let's just set the status directly.

Cc: qemu-stable@nongnu.org
Reported-by: Ilya Maximets <i.maximets@samsung.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 8fc47c876de638353bb635872f2c25bb7f4a3d6e)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

hw/intc/arm_gic: reserved register addresses are RAZ/WI

The GICv2 specification says that reserved register addresses
must RAZ/WI; now that we implement external abort handling
for Arm CPUs this means we must return MEMTX_OK rather than
MEMTX_ERROR, to avoid generating a spurious guest data abort.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1513183941-24300-3-git-send-email-peter.maydell@linaro.org
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
(cherry picked from commit 0cf09852015e47a5fbb974ff7ac320366afd21ee)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

hw/intc/arm_gicv3: Make reserved register addresses RAZ/WI

The GICv3 specification says that reserved register addresses
should RAZ/WI. This means we need to return MEMTX_OK, not MEMTX_ERROR,
because now that we support generating external aborts the
latter will cause an abort on new board models.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1513183941-24300-2-git-send-email-peter.maydell@linaro.org
Reviewed-by: Alistair Francis <alistair.francis@xilinx.com>
(cherry picked from commit f1945632b43e36bd9f3e0c2feb0e5b152be7ed91)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

vfio: Fix vfio-kvm group registration

Commit 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container
attaching") moved registration of groups with the vfio-kvm device from
vfio_get_group() to vfio_connect_container(), but it missed the case
where a group is attached to an existing container and takes an early
exit. Perhaps this is a less common case on ppc64/spapr, but on x86
(without viommu) all groups are connected to the same container and
thus only the first group gets registered with the vfio-kvm device.
This becomes a problem if we then hot-unplug the devices associated
with that first group and we end up with KVM being misinformed about
any vfio connections that might remain. Fix by including the call to
vfio_kvm_device_add_group() in this early exit path.

Fixes: 8c37faa475f3 ("vfio-pci, ppc64/spapr: Reorder group-to-container attaching")
Cc: qemu-stable@nongnu.org # qemu-2.10+
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
(cherry picked from commit 2016986aedb6ea2839662eb5f60630f3e231bd1a)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

block: Open backing image in force share mode for size probe

Management tools create overlays of running guests with qemu-img:

  $ qemu-img create -b /image/in/use.qcow2 -f qcow2 /overlay/image.qcow2

but this doesn't work anymore due to image locking:

    qemu-img: /overlay/image.qcow2: Failed to get shared "write" lock
    Is another process using the image?
    Could not open backing image to determine size.
Use the force share option to allow this use case again.

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit cc954f01e3c004aad081aa36736a17e842b80211)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

block: Call .drain_begin only once in bdrv_drain_all_begin()

bdrv_drain_all_begin() used to call the .bdrv_co_drain_begin() driver
callback inside its polling loop. This means that how many times it got
called for each node depended on long it had to poll the event loop.

This is obviously not right and results in nodes that stay drained even
after bdrv_drain_all_end(), which calls .bdrv_co_drain_begin() once per
node.

Fix bdrv_drain_all_begin() to call the callback only once, too.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit 2da9b7d456278bccc6ce889ae350f2867155d7e8)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

block: Make bdrv_drain_invoke() recursive

This change separates bdrv_drain_invoke(), which calls the BlockDriver
drain callbacks, from bdrv_drain_recurse(). Instead, the function
performs its own recursion now.

One reason for this is that bdrv_drain_recurse() can be called multiple
times by bdrv_drain_all_begin(), but the callbacks may only be called
once. The separation is necessary to fix this bug.

The other reason is that we intend to go to a model where we call all
driver callbacks first, and only then start polling. This is not fully
achieved yet with this patch, as bdrv_drain_invoke() contains a
BDRV_POLL_WHILE() loop for the block driver callbacks, which can still
call callbacks for any unrelated event. It's a step in this direction
anyway.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
(cherry picked from commit db0289b9b26cb653d5662f5d6a2a52d70243cd56)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

block/nbd: fix segmentation fault when .desc is not null-terminated

The find_desc_by_name() from util/qemu-option.c relies on the .name not being
NULL to call strcmp(). This check becomes unsafe when the list is not
NULL-terminated, which is the case of nbd_runtime_opts in block/nbd.c, and can
result in segmentation fault when strcmp() tries to access an invalid memory:

    #0 0x00007fff8c75f7d4 in __strcmp_power9 () from /lib64/libc.so.6
    #1 0x00000000102d3ec8 in find_desc_by_name (desc=0x1036d6f0, name=0x28e46670 "server.path") at util/qemu-option.c:166
    #2 0x00000000102d93e0 in qemu_opts_absorb_qdict (opts=0x28e47a80, qdict=0x28e469a0, errp=0x7fffec247c98) at util/qemu-option.c:1026
    #3 0x000000001012a2e4 in nbd_open (bs=0x28e42290, options=0x28e469a0, flags=24578, errp=0x7fffec247d80) at block/nbd.c:406
    #4 0x00000000100144e8 in bdrv_open_driver (bs=0x28e42290, drv=0x1036e070 <bdrv_nbd_unix>, node_name=0x0, options=0x28e469a0, open_flags=24578, errp=0x7fffec247f50) at block.c:1135
    #5 0x0000000010015b04 in bdrv_open_common (bs=0x28e42290, file=0x0, options=0x28e469a0, errp=0x7fffec247f50) at block.c:1395

>From gdb, the desc[i].name was not NULL and resulted in strcmp() accessing an
invalid memory:

    >>> p desc[5]
    $8 = {
      name = 0x1037f098 "R27A",
      type = 1561964883,
      help = 0xc0bbb23e <error: Cannot access memory at address 0xc0bbb23e>,
      def_value_str = 0x2 <error: Cannot access memory at address 0x2>
    }
    >>> p desc[6]
    $9 = {
      name = 0x103dac78 <__gcov0.do_qemu_init_bdrv_nbd_init> "\001",
      type = 272101528,
      help = 0x29ec0b754403e31f <error: Cannot access memory at address 0x29ec0b754403e31f>,
      def_value_str = 0x81f343b9 <error: Cannot access memory at address 0x81f343b9>
    }

This patch fixes the segmentation fault in strcmp() by adding a NULL element at
the end of nbd_runtime_opts.desc list, which is the common practice to most of
other structs like runtime_opts in block/null.c. Thus, the desc[i].name != NULL
check becomes safe because it will not evaluate to true when .desc list reached
its end.

Reported-by: R. Nageswara Sastry <nasastry@in.ibm.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1727259
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.vnet.ibm.com>
Message-Id: <20180105133241.14141-2-muriloo@linux.vnet.ibm.com>
CC: qemu-stable@nongnu.org
Fixes: 7ccc44fd7d1dfa62c4d6f3a680df809d6e7068ce
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit c4365735a7d38f4355c6f77e6670d3972315f7c2)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

qemu-pr-helper: miscellaneous fixes

1) Return a generic sense if TEST UNIT READY does not provide one;

2) Fix two mistakes in copying from the spec.

Cc: qemu-stable@nongnu.org
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a4a9b6eaf35dbe4bf0e069854945bf5e45fc7eab)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

qemu-options: Remove stray colons from output of --help

Commit 43f187a broke --help: it put colons into blank lines. It
removed the colon from DEFHEADING(TITLE:) and added it back in the
macro expansion of DEFHEADING(TITLE), so hxtool can emit "@subsection
TITLE" more easily. Trouble is it's added back even for the blank
lines made with DEFHEADING().

Put the colons back where they were before commit 43f187a, and strip
them in hxtool instead.

Cc: Paolo Bonzini <pbonzini@redhat.com>
CC: qemu-stable@nongnu.org
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20171002140307.5292-2-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
(cherry picked from commit de6b4f908c300c7e7e0dc057310f5cbdcf1aed78)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

target/sh4: fix TCG leak during gusa sequence

This fixes bug #1735384 while running java under qemu-sh4. When debug
was enabled it showed a problem with TCG temps. Once fixed I was able
to run java -version normally.

Cc: qemu-stable@nongnu.org
Reported-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20171206093050.25308-1-alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
(cherry picked from commit 6d56fc6cc372284a4571f09b361a9ccd99318103)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

block/iscsi: dont leave allocmap in an invalid state on UNMAP failure

we forgot to set the allocmap to invalid if an UNMAP call fails.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1512733868-9009-2-git-send-email-pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit aef172ffdc2f9c41d9cc043a55f1259e7c07e587)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

target/i386: Fix handling of VEX prefixes

In commit e3af7c788b73a6495eb9d94992ef11f6ad6f3c56 we
replaced direct calls to to cpu_ld*_code() with calls
to the x86_ld*_code() wrappers which incorporate an
advance of s->pc. Unfortunately we didn't notice that
in one place the old code was deliberately not incrementing
s->pc:

@@ -4501,7 +4528,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
             static const int pp_prefix[4] = {
                 0, PREFIX_DATA, PREFIX_REPZ, PREFIX_REPNZ
             };
-            int vex3, vex2 = cpu_ldub_code(env, s->pc);
+            int vex3, vex2 = x86_ldub_code(env, s);

             if (!CODE64(s) && (vex2 & 0xc0) != 0xc0) {
                 /* 4.1.4.6: In 32-bit mode, bits [7:6] must be 11b,

This meant we were mishandling this set of instructions.
Remove the manual advance of s->pc for the "is VEX" case
(which is now done by x86_ldub_code()) and instead rewind
PC in the case where we decide that this isn't really VEX.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Reported-by: Alexandro Sanchez Bach <alexandro@phi.nz>
Message-Id: <1513163959-17545-1-git-send-email-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cfcca361d77142f25fb1128755084cf91faa4db7)
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Update version for v2.11.0 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Update version for v2.11.0-rc5 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Generate UNDEF for 32-bit Thumb2 insns

The refactoring of commit 296e5a0a6c3935 has a nasty bug:
it accidentally dropped the generation of code to raise
the UNDEF exception when disas_thumb2_insn() returns nonzero.
This means that 32-bit Thumb2 instruction patterns that
ought to UNDEF just act like nops instead. This is likely
to break any number of things, including the kernel's "disable
the FPU and use the UNDEF exception to identify when to turn
it back on again" trick.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1513006964-3371-1-git-send-email-peter.maydell@linaro.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Update version for v2.11.0-rc4 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

vhost-scsi: add missing virtqueue_size parameter

Commit 5c0919d02066 ("virtio-scsi: Add virtqueue_size parameter allowing
virtqueue size to be set.") introduced a new parameter to virtio-scsi.
Later, commit 920036106044 ("vhost-user-scsi: add missing virtqueue_size
param") added that parameter to the new vhost-user-scsi interface but
neglected the existing vhost-scsi interface it was built on.

Apply the same change to vhost-scsi, so that we can boot a guest with
a device defined. This also avoids crashing a guest when hotplugging
a vhost-scsi device.

Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com>
Message-id: 20171201151538.6844-2-farman@linux.vnet.ibm.com
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.11-20171205' into staging

ppc patch queue 2017-12-05

Alas, this is yet another fix for ppc that I think it's worth
squeezing into 2.11.  It's a really ugly fix for some pretty ugly
code, but it does seem to address a real problem.  It's also a problem
that's appeared relatively recently, since it was either created by,
or made much easier to trigger by, by the merge of MTTCG.

# gpg: Signature made Tue 05 Dec 2017 05:24:04 GMT
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.11-20171205:
  target/ppc: Fix system lockups caused by interrupt_request state corruption

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/ppc: Fix system lockups caused by interrupt_request state corruption

Occasionally in Linux guests on x86_64 we're seeing logs like:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100req 00000004

when they should read:

ppc_set_irq: 0x55b4e0d562f0 n_IRQ 8 level 1 => pending 00000100req 00000002

The "00000004" is CPU_INTERRUPT_EXITTB yet the code calls
cpu_interrupt(cs, CPU_INTERRUPT_HARD) ("00000002") in this function
just before the log message. Something is causing the HARD bit setting
to get lost.

The knock on effect of losing that bit is the decrementer timer interrupts
don't get delivered which causes the guest to sit idle in its idle handler
and 'hang'.

The issue occurs due to races from code which sets CPU_INTERRUPT_EXITTB.

Rather than poking directly into cs->interrupt_request, that code needs to:

a) hold BQL
b) use the cpu_interrupt() helper

This patch fixes the call sites to do this, fixing the hang. The calls
are made from a variety of contexts so a helper function is added to handle
the necessary locking. This can likely be improved and optimised in the future
but it ensures the code is correct and doesn't lockup as it stands today.

Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches for 2.11.0-rc4

# gpg: Signature made Mon 04 Dec 2017 16:46:07 GMT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  blockjob: Make block_job_pause_all() keep a reference to the jobs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

blockjob: Make block_job_pause_all() keep a reference to the jobs

Starting from commit 40840e419be31e6a32e6ea24511c74b389d5e0e4 we are
pausing all block jobs during bdrv_reopen_multiple() to prevent any of
them from finishing and removing nodes from the graph while they are
being reopened.

It turns out that pausing a block job doesn't necessarily prevent it
from finishing: a paused block job can still run its exit function
from the main loop and call block_job_completed(). The mirror block
job in particular always goes to the main loop while it is paused (by
virtue of the bdrv_drained_begin() call in mirror_run()).

Destroying a paused block job during bdrv_reopen_multiple() has two
consequences:

   1) The references to the nodes involved in the job are released,
      possibly destroying some of them. If those nodes were in the
      reopen queue this would trigger the problem originally described
      in commit 40840e419be, crashing QEMU.

   2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
      not be doing all necessary bdrv_parent_drained_end() calls.

I can reproduce problem 1) easily with iotest 030 by increasing
STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
the iotest like in this example:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html

This patch keeps an additional reference to all block jobs between
block_job_pause_all() and block_job_resume_all(), guaranteeing that
they are kept alive.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: fixes for rc3

A bunch of fixes all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 01 Dec 2017 17:06:33 GMT
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream:
  pc: fix crash on attempted cpu unplug
  virtio: check VirtQueue Vring object is set
  vhost: fix error check in vhost_verify_ring_mappings()
  dump-guest-memory.py: fix No symbol "vmcoreinfo_find"
  vhost: restore avail index from vring used index on disconnection
  virtio: Add queue interface to restore avail index from vring used index
  i386/msi: Correct mask of destination ID in MSI address

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.11-20171204' into staging

ppc patch queue 2017-12-04

We are, alas, not yet to the bottom of ppc bugs.  This pull request
fixes several more.  I believe they're important enough to include in
2.11. despite the late date.

# gpg: Signature made Mon 04 Dec 2017 03:40:56 GMT
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.11-20171204:
  spapr: Include "pre-plugged" DIMMS in ram size calculation at reset
  target-ppc: Don't invalidate non-supported msr bits
  pseries: fix TCG migration

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

spapr: Include "pre-plugged" DIMMS in ram size calculation at reset

At guest reset time, we allocate a hash page table (HPT) for the guest
based on the guest's RAM size.  If dynamic HPT resizing is not available we
use the maximum RAM size, if it is we use the current RAM size.

But the "current RAM size" calculation is incorrect - we just use the
"base" ram_size from the machine structure.  This doesn't include any
pluggable DIMMs that are already plugged at reset time.

This means that if you try to start a 'pseries' machine with a DIMM
specified on the command line that's much larger than the "base" RAM size,
then the guest will get a woefully inadequate HPT.  This can lead to a
guest freeze during boot as it runs out of HPT space during initial MMU
setup.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>

pc: fix crash on attempted cpu unplug

when qemu is started with '-no-acpi' CLI option, an attempt
to unplug a CPU using device_del results in null pointer
dereference at:

  #0 object_get_class
  #1 pc_machine_device_unplug_request_cb
  #2 qmp_marshal_device_del

which is caused by pcms->acpi_dev == NULL due to ACPI support
being disabled.

Considering that ACPI support is necessary for unplug to work,
check that it's enabled and fail unplug request gracefully
if no acpi device were found.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: check VirtQueue Vring object is set

A guest could attempt to use an uninitialised VirtQueue object
or unset Vring.align leading to a arithmetic exception. Add check
to avoid it.

Reported-by: Zhangboxian <zhangboxian@huawei.com>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>

vhost: fix error check in vhost_verify_ring_mappings()

Since commit f1f9e6c5 "vhost: adapt vhost_verify_ring_mappings() to
virtio 1 ring layout", we check the mapping of each part (descriptor
table, available ring and used ring) of each virtqueue separately.

The checking of a part is done by the vhost_verify_ring_part_mapping()
function: it returns either 0 on success or a negative errno if the
part cannot be mapped at the same place.

Unfortunately, the vhost_verify_ring_mappings() function checks its
return value the other way round. It means that we either:
- only verify the descriptor table of the first virtqueue, and if it
is valid we ignore all the other mappings
- or ignore all broken mappings until we reach a valid one

ie, we only raise an error if all mappings are broken, and we consider
all mappings are valid otherwise (false success), which is obviously
wrong.

This patch ensures that vhost_verify_ring_mappings() only returns
success if ALL mappings are okay.

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

dump-guest-memory.py: fix No symbol "vmcoreinfo_find"

When qemu is compiled without debug, the dump gdb python script can fail with:

Error occurred in Python command: No symbol "vmcoreinfo_find" in current context.

Because vmcoreinfo_find() is inlined and not exported.

Use the underlying object_resolve_path_type() to get the instance instead.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: restore avail index from vring used index on disconnection

vhost_virtqueue_stop() gets avail index value from the backend,
except if the backend is not responding.

It happens when the backend crashes, and in this case, internal
state of the virtio queue is inconsistent, making packets
to corrupt the vring state.

With a Linux guest, it results in following error message on
backend reconnection:

[   22.444905] virtio_net virtio0: output.0:id 0 is not a head!
[   22.446746] net enp0s3: Unexpected TXQ (0) queue failure: -5
[   22.476360] net enp0s3: Unexpected TXQ (0) queue failure: -5

Fixes: 283e2c2adcb8 ("net: virtio-net discards TX data after link down")
Cc: qemu-stable@nongnu.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: Add queue interface to restore avail index from vring used index

In case of backend crash, it is not possible to restore internal
avail index from the backend value as vhost_get_vring_base
callback fails.

This patch provides a new interface to restore internal avail index
from the vring used index, as done by some vhost-user backend on
reconnection.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

i386/msi: Correct mask of destination ID in MSI address

According to SDM 10.11.1, only [19:12] bits of MSI address are
Destination ID, change the mask to avoid ambiguity for VT-d spec
has used the bit 4 to indicate a remappable interrupt request.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

target-ppc: Don't invalidate non-supported msr bits

The msr invalidation code (commits 993eb and 2360b) inverts all
bits except MSR_TGPR and MSR_HVB. On non PowerPC 601 processors
this leads to incorrect change of excp_prefix in hreg_store_msr()
function. The problem is that new msr value get multiplied by msr_mask
and inverted msr does not, thus values of MSR_EP bit in new msr value
and inverted msr are distinct, so that excp_prefix changes but should
not.

Signed-off-by: Kurban Mallachiev <mallachiev@ispras.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

pseries: fix TCG migration

Migration of pseries is broken with TCG because
QEMU tries to restore KVM MMU state unconditionally.

The result is a SIGSEGV in kvm_vm_ioctl():

  #0  kvm_vm_ioctl (s=0x0, type=-2146390353)
      at qemu/accel/kvm/kvm-all.c:2032
  #1  0x00000001003e3e2c in kvmppc_configure_v3_mmu (cpu=<optimized out>,
      radix=<optimized out>, gtse=<optimized out>, proc_tbl=<optimized out>)
      at qemu/target/ppc/kvm.c:396
  #2  0x00000001002f8b88 in spapr_post_load (opaque=0x1019103c0,
      version_id=<optimized out>) at qemu/hw/ppc/spapr.c:1578
  #3  0x000000010059e4cc in vmstate_load_state (f=0x106230000,
      vmsd=0x1009479e0 <vmstate_spapr>, opaque=0x1019103c0,
      version_id=<optimized out>) at qemu/migration/vmstate.c:165
  #4  0x00000001005987e0 in vmstate_load (f=<optimized out>, se=<optimized out>)
      at qemu/migration/savevm.c:748

This patch fixes the problem by not calling the KVM function with the
TCG mode.

Fixes: d39c90f5f3 ("spapr: Fix migration of Radix guests")
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Update version for v2.11.0-rc3 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches for 2.11.0-rc3

# gpg: Signature made Wed 29 Nov 2017 15:25:13 GMT
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  block/nfs: fix nfs_client_open for filesize greater than 1TB
  blockjob: reimplement block_job_sleep_ns to allow cancellation
  blockjob: introduce block_job_do_yield
  blockjob: remove clock argument from block_job_sleep_ns
  block: Expect graph changes in bdrv_parent_drained_begin/end
  blockjob: Remove the job from the list earlier in block_job_unref()
  QAPI & interop: Clarify events emitted by 'block-job-cancel'
  qemu-options: Mention locking option of file driver
  docs: Add image locking subsection
  iotests: fix 075 and 078

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'mreitz/tags/pull-block-2017-11-29' into queue-block

One block patch for 2.11.0-rc3

# gpg: Signature made Wed Nov 29 15:28:38 2017 CET
# gpg:                using RSA key F407DB0061D5CF40
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40

* mreitz/tags/pull-block-2017-11-29:
  block/nfs: fix nfs_client_open for filesize greater than 1TB

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/nfs: fix nfs_client_open for filesize greater than 1TB

DIV_ROUND_UP(st.st_size, BDRV_SECTOR_SIZE) was overflowing ret (int) if
st.st_size is greater than 1TB.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-id: 1511798407-31129-1-git-send-email-pl@kamp.de
Signed-off-by: Max Reitz <mreitz@redhat.com>

blockjob: reimplement block_job_sleep_ns to allow cancellation

This reverts the effects of commit 4afeffc857 ("blockjob: do not allow
coroutine double entry or entry-after-completion", 2017-11-21)

This fixed the symptom of a bug rather than the root cause. Canceling the
wait on a sleeping blockjob coroutine is generally fine, we just need to
make it work correctly across AioContexts. To do so, use a QEMUTimer
that calls block_job_enter. Use a mutex to ensure that block_job_enter
synchronizes correctly with block_job_sleep_ns.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-By: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockjob: introduce block_job_do_yield

Hide the clearing of job->busy in a single function, and set it
in block_job_enter. This lets block_job_do_yield verify that
qemu_coroutine_enter is not used while job->busy = false.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-By: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockjob: remove clock argument from block_job_sleep_ns

All callers are using QEMU_CLOCK_REALTIME, and it will not be possible to
support more than one clock when block_job_sleep_ns switches to a single
timer stored in the BlockJob struct.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Tested-By: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Expect graph changes in bdrv_parent_drained_begin/end

The .drained_begin/end callbacks can (directly or indirectly via
aio_poll()) cause block nodes to be removed or the current BdrvChild to
point to a different child node.

Use QLIST_FOREACH_SAFE() to make sure we don't access invalid
BlockDriverStates or accidentally continue iterating the parents of the
new child node instead of the node we actually came from.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Tested-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockjob: Remove the job from the list earlier in block_job_unref()

When destroying a block job in block_job_unref() we should remove it
from the job list before calling block_job_remove_all_bdrv().

This is because removing the BDSs can trigger an aio_poll() and wake
up other jobs that might attempt to use the block job list. If that
happens the job we're currently destroying should not be in that list
anymore.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2017-11-28' into staging

nbd patches for 2017-11-28

Eric Blake - 0/2 fix two NBD server CVEs

# gpg: Signature made Tue 28 Nov 2017 12:58:29 GMT
# gpg:                using RSA key 0xA7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2017-11-28:
  nbd/server: CVE-2017-15118 Stack smash on large export name
  nbd/server: CVE-2017-15119 Reject options larger than 32M

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

nbd/server: CVE-2017-15118 Stack smash on large export name

Introduced in commit f37708f6b8 (2.10).  The NBD spec says a client
can request export names up to 4096 bytes in length, even though
they should not expect success on names longer than 256.  However,
qemu hard-codes the limit of 256, and fails to filter out a client
that probes for a longer name; the result is a stack smash that can
potentially give an attacker arbitrary control over the qemu
process.

The smash can be easily demonstrated with this client:
$ qemu-io f raw nbd://localhost:10809/$(printf %3000d 1 | tr ' ' a)

If the qemu NBD server binary (whether the standalone qemu-nbd, or
the builtin server of QMP nbd-server-start) was compiled with
-fstack-protector-strong, the ability to exploit the stack smash
into arbitrary execution is a lot more difficult (but still
theoretically possible to a determined attacker, perhaps in
combination with other CVEs).  Still, crashing a running qemu (and
losing the VM) is bad enough, even if the attacker did not obtain
full execution control.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd/server: CVE-2017-15119 Reject options larger than 32M

The NBD spec gives us permission to abruptly disconnect on clients
that send outrageously large option requests, rather than having
to spend the time reading to the end of the option.  No real
option request requires that much data anyways; and meanwhile, we
already have the practice of abruptly dropping the connection on
any client that sends NBD_CMD_WRITE with a payload larger than 32M.

For comparison, nbdkit drops the connection on any request with
more than 4096 bytes; however, that limit is probably too low
(as the NBD spec states an export name can theoretically be up
to 4096 bytes, which means a valid NBD_OPT_INFO could be even
longer) - even if qemu doesn't permit exports longer than 256
bytes.

It could be argued that a malicious client trying to get us to
read nearly 4G of data on a bad request is a form of denial of
service.  In particular, if the server requires TLS, but a client
that does not know the TLS credentials sends any option (other
than NBD_OPT_STARTTLS or NBD_OPT_EXPORT_NAME) with a stated
payload of nearly 4G, then the server was keeping the connection
alive trying to read all the payload, tying up resources that it
would rather be spending on a client that can get past the TLS
handshake.  Hence, this warranted a CVE.

Present since at least 2.5 when handling known options, and made
worse in 2.6 when fixing support for NBD_FLAG_C_FIXED_NEWSTYLE
to handle unknown options.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>

Merge remote-tracking branch 'remotes/berrange/tags/pull-qio-2017-11-28-1' into staging

Merge qio 2017/11/28 v1

# gpg: Signature made Tue 28 Nov 2017 10:49:08 GMT
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/pull-qio-2017-11-28-1:
  sockets: avoid crash when cleaning up sockets for an invalid FD

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

sockets: avoid crash when cleaning up sockets for an invalid FD

If socket_listen_cleanup is passed an invalid FD, then querying the socket
local address will fail. We must thus be prepared for the returned addr to
be NULL

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Tue 28 Nov 2017 03:58:11 GMT
# gpg:                using RSA key 0xEF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  virtio-net: don't touch virtqueue if vm is stopped

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

virtio-net: don't touch virtqueue if vm is stopped

Guest state should not be touched if VM is stopped, unfortunately we
didn't check running state and tried to drain tx queue unconditionally
in virtio_net_set_status(). A crash was then noticed as a migration
destination when user type quit after virtqueue state is loaded but
before region cache is initialized. In this case,
virtio_net_drop_tx_queue_data() tries to access the uninitialized
region cache.

Fix this by only dropping tx queue data when vm is running.

Fixes: 283e2c2adcb80 ("net: virtio-net discards TX data after link down")
Cc: Yuri Benditovich <yuri.benditovich@daynix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

QAPI & interop: Clarify events emitted by 'block-job-cancel'

When you cancel an in-progress 'mirror' job (or "active `block-commit`")
with QMP `block-job-cancel`, it emits the event: BLOCK_JOB_CANCELLED.
However, when `block-job-cancel` is issued *after* `drive-mirror` has
indicated (via the event BLOCK_JOB_READY) that the source and
destination have reached synchronization:

    [...] # Snip `drive-mirror` invocation & outputs
    {
      "execute":"block-job-cancel",
      "arguments":{
        "device":"virtio0"
      }
    }

    {"return": {}}

It (`block-job-cancel`) will counterintuitively emit the event
'BLOCK_JOB_COMPLETED':

    {
      "timestamp":{
        "seconds":1510678024,
        "microseconds":526240
      },
      "event":"BLOCK_JOB_COMPLETED",
      "data":{
        "device":"virtio0",
        "len":41126400,
        "offset":41126400,
        "speed":0,
        "type":"mirror"
      }
    }

But this is expected behaviour, where the _COMPLETED event indicates
that synchronization has successfully ended (and the destination now has
a point-in-time copy, which is at the time of cancel).

So add a small note to this effect in 'block-core.json'.  While at it,
also update the "Live disk synchronization -- drive-mirror and
blockdev-mirror" section in 'live-block-operations.rst'.

(Thanks: Max Reitz for reminding me of this caveat on IRC.)

Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.11-20171127' into staging

ppc patch queue 2017-11-27

This series contains a couple of migration fixes for hash guests on
POWER9 radix MMU hosts.

# gpg: Signature made Mon 27 Nov 2017 04:27:15 GMT
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.11-20171127:
  target/ppc: Fix setting of cpu->compat_pvr on incoming migration
  target/ppc: Move setting of patb_entry on hash table init

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

qemu-options: Mention locking option of file driver

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

docs: Add image locking subsection

This documents the image locking feature and explains when and how
related options can be used.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: fix 075 and 078

Both of these tests are for formats which now stipulate that they are
read-only. Adjust the tests to match.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

target/ppc: Fix setting of cpu->compat_pvr on incoming migration

cpu->compat_pvr is used to store the current compat mode of the cpu.

On the receiving side during incoming migration we check compatibility
with the compat mode by calling ppc_set_compat(). However we fail to set
the compat mode with the hypervisor since the "new" compat mode doesn't
differ from the current (due to a "cpu->compat_pvr != compat_pvr" check).
This means that kvm runs the vcpus without a compat mode, which is the
incorrect behaviour. The implication being that a compatibility mode
will never be in effect after migration.

To fix this so that the compat mode is correctly set with the
hypervisor, store the desired compat mode and reset cpu->compat_pvr to
zero before calling ppc_set_compat().

Fixes: 5dfaa532 ("ppc: fix ppc_set_compat() with KVM PR")
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: Move setting of patb_entry on hash table init

The patb_entry is used to store the location of the process table in
guest memory. The msb is also used to indicate the mmu mode of the
guest, that is patb_entry & 1 << 63 ? radix_mode : hash_mode.

Currently we set this to zero in spapr_setup_hpt_and_vrma() since if
this function gets called then we know we're hash. However some code
paths, such as setting up the hpt on incoming migration of a hash guest,
call spapr_reallocate_hpt() directly bypassing this higher level
function. Since we assume radix if the host is capable this results in
the msb in patb_entry being left set so in spapr_post_load() we call
kvmppc_configure_v3_mmu() and tell the host we're radix which as
expected means addresses cannot be translated once we actually run the cpu.

To fix this move the zeroing of patb_entry into spapr_reallocate_hpt().

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

osdep.h: Make TIME_MAX handle different time_t types

In our various supported host OSes, the time_t type may be either 32
or 64 bit, and could in theory also be either signed or unsigned.
Notably, in OpenBSD time_t is a 64 bit type even if 'long' is 32
bits, so using LONG_MAX for TIME_MAX is incorrect.

Use an approach suggested by Paolo Bonzini which calculates
the maximum value of the type rather than hardcoding it;
to do this we use the TYPE_MAXIMUM macro from Gnulib.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1511452598-6077-1-git-send-email-peter.maydell@linaro.org

hw/arm/virt: Add 2.11 machine type

Add virt-2.11 machine type.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-id: 1511516626-21178-1-git-send-email-eric.auger@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20171124' into staging

Deal with the fallout from the deletion of the old s390 virtio header
in Linux master.

# gpg: Signature made Fri 24 Nov 2017 09:56:49 GMT
# gpg:                using RSA key 0xDECF6B93C6F02FAF
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>"
# gpg:                 aka "Cornelia Huck <huckc@linux.vnet.ibm.com>"
# gpg:                 aka "Cornelia Huck <cornelia.huck@de.ibm.com>"
# gpg:                 aka "Cornelia Huck <cohuck@kernel.org>"
# gpg:                 aka "Cornelia Huck <cohuck@redhat.com>"
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0  18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20171124:
  s390/kvm_virtio/linux-headers: remove traces of old virtio transport

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

s390/kvm_virtio/linux-headers: remove traces of old virtio transport

We no longer support the old s390 transport, neither does the newest
Linux kernel. Remove it from the linux header script as well as the
s390x virtio code. We still should handle the VIRTIO_NOTIFY hypercall,
to tolerate early printk on older guest kernels without an sclp console.
We continue to ignore these events.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Message-Id: <20171115154223.109991-1-borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>

configure: Deal with OpenBSD/i386 emulation linker

OpenBSD/i386 uses elf_i386_obsd for the emulation linker.

Signed-off-by: Brad Smith <brad@comstyle.com>
Message-id: 20171107234608.GA395@humpty.home.comstyle.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20171122' into staging

migration/next for 20171122

# gpg: Signature made Wed 22 Nov 2017 08:43:13 GMT
# gpg:                using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* remotes/juanquintela/tags/migration/20171122:
  migration/ram.c: do not set 'postcopy_running' in POSTCOPY_INCOMING_END
  migration, xen: Fix block image lock issue on live migration

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.11-20171122' into staging

ppc patch queue 2017-11-22

Several more fixes to merge for qemu-2.11.

# gpg: Signature made Wed 22 Nov 2017 04:29:57 GMT
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.11-20171122:
  ppc: fix VTB migration
  spapr: Implement bug in spapr-vty device to be compatible with PowerVM
  hw/ppc/spapr: Fix virtio-scsi bootindex handling for LUNs >= 256

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Fix build of console and GUI executables for Windows

It was broken by commit 8ecc89f6e792152496eccb684d6c8c48aba8027d which
moved the SDL linker flags from macro libs_softmmu to macro SDL_LIBS.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 20171116163732.31584-1-sw@weilnetz.de
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tcg: Fix compilation without TCG

Commit 27266271977c started to use tb_unlock() and tlb_set_dirty() on
non TCG code. Add the functions as stubs, so that builds with TCG
disabled continue to compile.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
[PMM: tweaked commit message]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

migration/ram.c: do not set 'postcopy_running' in POSTCOPY_INCOMING_END

When migrating a VM with 'migrate_set_capability postcopy-ram on'
a postcopy_state is set during the process, ending up with the
state POSTCOPY_INCOMING_END when the migration is over. This
postcopy_state is taken into account inside ram_load to check
how it will load the memory pages. This same ram_load is called when
in a loadvm command.

Inside ram_load, the logic to see if we're at postcopy_running state
is:

postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING

postcopy_state_get() returns this enum type:

typedef enum {
    POSTCOPY_INCOMING_NONE = 0,
    POSTCOPY_INCOMING_ADVISE,
    POSTCOPY_INCOMING_DISCARD,
    POSTCOPY_INCOMING_LISTENING,
    POSTCOPY_INCOMING_RUNNING,
    POSTCOPY_INCOMING_END
} PostcopyState;

In the case where ram_load is executed and postcopy_state is
POSTCOPY_INCOMING_END, postcopy_running will be set to 'true' and
ram_load will behave like a postcopy is in progress. This scenario isn't
achievable in a migration but it is reproducible when executing
savevm/loadvm after migrating with 'postcopy-ram on', causing loadvm
to fail with Error -22:

Source:

(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate tcp:127.0.0.1:4444

Dest:

(qemu) migrate_set_capability postcopy-ram on
(qemu)
ubuntu1704-intel login:
Ubuntu 17.04 ubuntu1704-intel ttyS0

ubuntu1704-intel login: (qemu)
(qemu) savevm test1
(qemu) loadvm test1
Unknown combination of migration flags: 0x4 (postcopy mode)
error while loading state for instance 0x0 of device 'ram'
Error -22 while loading VM state
(qemu)

This patch fixes this problem by changing the existing logic for
postcopy_advised and postcopy_running in ram_load, making them
'false' if we're at POSTCOPY_INCOMING_END state.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

ppc: fix VTB migration

Migration of a system under stress (for example, with
"stress-ng --numa 2") triggers on the destination
some kernel watchdog messages like:

NMI watchdog: BUG: soft lockup - CPU#0 stuck for 3489660870s!
NMI watchdog: BUG: soft lockup - CPU#1 stuck for 3489660884s!

This problem appears with the changes introduced by
42043e4 spapr: clock should count only if vm is running

I think this commit only triggers the problem.

Kernel computes the soft lockup duration using the
Virtual Timebase register (VTB), not using the Timebase
Register (TBR, the one 42043e4 stops).

It appears VTB is not migrated, so this patch adds it in
the list of the SPRs to migrate, and fixes the problem.

For the migration, I've tested a migration from qemu-2.8.0 and
pseries-2.8.0 to a patched master (qemu-2.11.0-rc1). The received
VTB is 0 (as is it not initialized by qemu-2.8.0), but the value
seems to be ignored by KVM and a non zero VTB is used by the kernel.
I have no explanation for that, but as the original problem appears
only with SMP system under stress I suspect some problems in KVM
(I think because VTB is shared by all threads of a core).

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: Implement bug in spapr-vty device to be compatible with PowerVM

The spapr-vty device implements the PAPR defined virtual console,
which is also implemented by IBM's proprietary PowerVM hypervisor.

PowerVM's implementation has a bug where it inserts an extra \0 after
every \r going to the guest. Because of that Linux's guest side
driver has a workaround which strips \0 characters that appear
immediately after a \r.

That means that when running under qemu, sending a binary stream from
host to guest via spapr-vty which happens to include a \r\0 sequence
will get corrupted by that workaround.

To deal with that, this patch duplicates PowerVM's bug, inserting an
extra \0 after each \r. Ugly, but the best option available.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>

hw/ppc/spapr: Fix virtio-scsi bootindex handling for LUNs >= 256

LUNs >= 256 have to be encoded with the so-called "flat space
addressing method" for virtio-scsi, where an additional bit has to
be set. SLOF already took care of this with the following commit:

https://git.qemu.org/?p=SLOF.git;a=commitdiff;h=f72a37713fea47da
(see https://bugzilla.redhat.com/show_bug.cgi?id=1431584 for details)

But QEMU does not use this encoding yet for device tree paths
that have to be handed over to SLOF to deal with the "bootindex"
property, so SLOF currently fails to boot from virtio-scsi devices
with LUNs >= 256 in the right boot order. Fix it by using the bit
to indicate the "flat space addressing method" for LUNs >= 256.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

migration, xen: Fix block image lock issue on live migration

When doing a live migration of a Xen guest with libxl, the images for
block devices are locked by the original QEMU process, and this prevent
the QEMU at the destination to take the lock and the migration fail.

>From QEMU point of view, once the RAM of a domain is migrated, there is
two QMP commands, "stop" then "xen-save-devices-state", at which point a
new QEMU is spawned at the destination.

Release locks in "xen-save-devices-state" so the destination can takes
them, if it's a live migration.

This patch add the "live" parameter to "xen-save-devices-state" which
default to true so older version of libxenlight can work with newer
version of QEMU.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Update version for v2.11.0-rc2 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>