git.proxmox.com Git - mirror

block: Cater to iscsi with non-power-of-2 discard

Dell Equallogic iSCSI SANs have a very unusual advertised geometry:

$ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
wsnz:0
maximum compare and write length:1
optimal transfer length granularity:0
maximum transfer length:0
optimal transfer length:0
maximum prefetch xdread xdwrite transfer length:0
maximum unmap lba count:30720
maximum unmap block descriptor count:2
optimal unmap granularity:30720
ugavalid:1
unmap granularity alignment:0
maximum write same length:30720

which says that both the maximum and the optimal discard size
is 15M. It is not immediately apparent if the device allows
discard requests not aligned to the optimal size, nor if it
allows discards at a finer granularity than the optimal size.

I tried to find details in the SCSI Commands Reference Manual
Rev. A on what valid values of maximum and optimal sizes are
permitted, but while that document mentions a "Block Limits
VPD Page", I couldn't actually find documentation of that page
or what values it would have, or if a SCSI device has an
advertisement of its minimal unmap granularity. So it is not
obvious to me whether the Dell Equallogic device is compliance
with the SCSI specification.

Fortunately, it is easy enough to support non-power-of-2 sizing,
even if it means we are less efficient than truly possible when
targetting that device (for example, it means that we refuse to
unmap anything that is not a multiple of 15M and aligned to a
15M boundary, even if the device truly does support a smaller
granularity where unmapping actually works).

Reported-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-5-git-send-email-eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

osdep: Document differences in rounding macros

Make it obvious which macros are safe in which situations.

Useful since QEMU_ALIGN_UP and ROUND_UP both purport to do
the same thing, but differ on whether the alignment must be
a power of 2.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-4-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

nbd: Limit nbdflags to 16 bits

Rather than asserting that nbdflags is within range, just give
it the correct type to begin with :) nbdflags corresponds to
the per-export portion of NBD Protocol "transmission flags", which
is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO.

Furthermore, upstream NBD has never passed the global flags to
the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first
introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually
tried to OR the global flags with the transmission flags, with
the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9
caused all earlier NBD 3.x clients to treat every export as
read-only; NBD 3.10 and later intentionally clip things to 16
bits to pass only transmission flags). Qemu should follow suit,
since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE
and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior
during transmission.

CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-3-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

nbd: Fix bad flag detection on server

Commit ab7c548e added a check for invalid flags, but used an
early return on error instead of properly going through the
cleanup label.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-2-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

i2c: fix migration regression introduced by broadcast support

QEMU fails migration with following error:

qemu-system-x86_64: Missing section footer for i2c_bus
qemu-system-x86_64: load of migration failed: Invalid argument

when migrating from:
qemu-system-x86_64-v2.6.0 -m 256M rhel72.img -M pc-i440fx-2.6
to
qemu-system-x86_64-v2.7.0-rc0 -m 256M rhel72.img -M pc-i440fx-2.6

Regression is added by commit 2293c27f (i2c: implement broadcast write)

Fix it by dropping 'broadcast' VMState introduced by 2293c27f and
reuse broadcast 0x00 address as broadcast flag in bus->saved_address.
Then if there were ongoing broadcast at migration time, set
bus->saved_address to it and at i2c_slave_post_load() time check
for it instead of transfering and using 'broadcast' VMState.

As result of reusing existing saved_address VMState, no compat
glue will be needed to keep forward/backward compatiblity. which
makes fix much less intrusive.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1469623198-177227-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

mptsas: really fix migration compatibility

Commit 2e2aa316 removed internal flag msi_in_use, but it
existed in vmstate. Restore it for migration to older QEMU
versions.

Reported-by: Amit Shah <amit.shah@redhat.com>
Suggested-by: Amit Shah <amit.shah@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qdist: return "(empty)" instead of NULL when printing an empty dist

Printf'ing a NULL string is undefined behaviour. Avoid it.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1469459025-23606-4-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qdist: use g_renew and g_new instead of g_realloc and g_malloc.

This is safer against overflow. g_renew is available in all
version of glib, while g_realloc_n is only available in 2.24.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1469459025-23606-3-git-send-email-cota@braap.org>
[Rewritten to use g_new/g_renew. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qdist: fix memory leak during binning

In qdist_bin__internal(), to->entries is initialized to a 1-element array,
which we then leak when n == from->n. Fix it.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1469459025-23606-2-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target-i386: fix typo in xsetbv implementation

QEMU 2.6 added support for the XSAVE family of instructions, which
includes the XSETBV instruction which allows setting the XCR0
register.

But, when booting Linux kernels with XSAVE support enabled, I was
getting very early crashes where the instruction pointer was set
to 0x3.  I tracked it down to a jump instruction generated by this:

        gen_jmp_im(s->pc - pc_start);

where s->pc is pointing to the instruction after XSETBV and pc_start
is pointing _at_ XSETBV.  Subtract the two and you get 0x3.  Whoops.

The fix is to replace this typo with the pattern found everywhere
else in the file when folks want to end the translation buffer.

Richard Henderson confirmed that this is a bug and that this is the
correct fix.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: qemu-stable@nongnu.org
Cc: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qht: do not segfault when gathering stats from an uninitialized qht

So far, QHT functions assume that the passed qht has previously been
initialized--otherwise they segfault.

This patch makes an exception for qht_statistics_init, with the goal
of simplifying calling code. For instance, qht_statistics_init is
called from the 'info jit' dump, and given that under KVM the TB qht
is never initialized, we get a segfault. Thus, instead of complicating
the 'info jit' code with additional checks, let's allow passing an
uninitialized qht to qht_statistics_init.

While at it, add a test for this to test-qht.

Before the patch (for $ qemu -enable-kvm [...]):
(qemu) info jit
[...]
direct jump count   0 (0%) (2 jumps=0 0%)
Program received signal SIGSEGV, Segmentation fault.

After the patch the "TB hash buckets", "TB hash occupancy"
and "TB hash avg chain" lines are omitted.
(qemu) info jit
[...]
direct jump count   0 (0%) (2 jumps=0 0%)
TB hash buckets     0/0 (-nan% head buckets used)
TB hash occupancy   nan% avg chain occ. Histogram: (null)
TB hash avg chain   nan buckets. Histogram: (null)
[...]

Reported by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1469205390-14369-1-git-send-email-cota@braap.org>
[Extract printing statistics to an entirely separate function. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util: Drop inet_listen()

Since commit e65c67e4, inet_listen() is not used anymore, and all
inet listen operation goes through QIOChannel.

Cc: Daniel P. Berrange <berrange@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-Id: <1469451771-1173-3-git-send-email-caoj.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util: drop unix_nonblocking_connect()

It is never used; all nonblocking connect now goes through
socket_connect(), which calls unix_connect_addr().

Cc: Daniel P. Berrange <berrange@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-Id: <1469097213-26441-3-git-send-email-caoj.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util: drop inet_nonblocking_connect()

It is never used; all nonblocking connect now goes through
socket_connect(), which calls inet_connect_addr().

Cc: Daniel P. Berrange <berrange@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-Id: <1469097213-26441-2-git-send-email-caoj.fnst@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

checkpatch: add check for bzero

Tested-By: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fix qemu exit on memory hotplug when allocation fails at prealloc time

When adding hostmem backend at runtime, QEMU might exit with error:
"os_mem_prealloc: Insufficient free host memory pages available to allocate guest RAM"

It happens due to os_mem_prealloc() not handling errors gracefully.

Fix it by passing errp argument so that os_mem_prealloc() could
report error to callers and undo performed allocation when
os_mem_prealloc() fails.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1469008443-72059-1-git-send-email-imammedo@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

numa: set the memory backend "is_mapped" field

Commit 2aece63 "hostmem: detect host backend memory is being used properly"
added a way to know if a memory backend is busy or available for use. It
caused a slight regression if we pass the same backend to a NUMA node and
to a pc-dimm device:

-m 1G,slots=2,maxmem=2G \
-object memory-backend-ram,size=1G,id=mem-mem1 \
-device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
-numa node,nodeid=0,memdev=mem-mem1

Before commit 2aece63, this would cause QEMU to print an error message and
to exit gracefully:

qemu-system-ppc64: -device pc-dimm,id=dimm-mem1,memdev=mem-mem1:
can't use already busy memdev: mem-mem1

Since commit 2aece63, QEMU hits an assertion in the memory code:

qemu-system-ppc64: memory.c:1934: memory_region_add_subregion_common:
Assertion `!subregion->container' failed.
Aborted

This happens because pc-dimm devices don't use memory_region_is_mapped()
anymore and cannot guess the backend is already used by a NUMA node.

Let's revert to the previous behavior by turning the NUMA code to also
call host_memory_backend_set_mapped() when it uses a backend.

Fixes: 2aece63c8a9d2c3a8ff41d2febc4cdeff2633331
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <146891691503.15642.9817215371777203794.stgit@bahia.lan>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util/qht: Document memory ordering assumptions

It is naturally expected that some memory ordering should be provided
around qht_insert() and qht_lookup(). Document these assumptions in the
header file and put some comments in the source to denote how that
memory ordering requirements are fulfilled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Sergey Fedorov: commit title and message provided;
comment on qht_remove() elided]
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Message-Id: <20160715175852.30749-2-sergey.fedorov@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

MAINTAINERS: Update the Xilinx maintainers

Update the Xilinx maintainers documentation to simplify what we maintain
and cover all of our upstream code.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Fix bsd-user build errors after 8642c1b81e0418df066a7960a7426d85a923a253

LINK sparc-bsd-user/qemu-sparc
bsd-user/main.o: In function `cpu_loop':
/home/sbruno/bsd/qemu/bsd-user/main.c:515: undefined reference to `cpu_sparc_exec'
c++: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[1]: *** [Makefile:197: qemu-sparc] Error 1
gmake: *** [Makefile:204: subdir-sparc-bsd-user] Error 2

LINK i386-bsd-user/qemu-i386
bsd-user/main.o: In function `cpu_loop':
/home/sbruno/bsd/qemu/bsd-user/main.c:174: undefined reference to `cpu_x86_exec'
c++: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[1]: *** [Makefile:197: qemu-i386] Error 1
gmake: *** [Makefile:204: subdir-i386-bsd-user] Error 2

Signed-off-by: Sean Bruno <sbruno@freebsd.org>
Message-id: 20160729160235.64525-1-sbruno@freebsd.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Update version for v2.7.0-rc1 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

avx2 configure: Disable if static build

This avoids a segfault like the following for at least some 4.8 versions
of gcc when configured with --static if avx2 instructions are also
enabled:

Program received signal SIGSEGV, Segmentation fault.
buffer_find_nonzero_offset_ifunc () at ./util/cutils.c:333
333     {
(gdb) bt
#0  buffer_find_nonzero_offset_ifunc () at ./util/cutils.c:333
#1  0x0000000000939c58 in __libc_start_main ()
#2  0x0000000000419337 in _start ()

Signed-off-by: Aaron Lindsay <alindsay@codeaurora.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Unbreak FreeBSD build after optionrom update.

Update the build flags appropriately for FreeBSD and add the
correct LD_EMULATION type for the FreeBSD build case.

Fixes FreeBSD build error:
ld: unrecognised emulation mode: elf_i386
Supported emulations: elf_x86_64_fbsd elf_i386_fbsd
gmake[1]: *** [Makefile:51: linuxboot_dma.img] Error 1
gmake: *** [Makefile:229: romsubdir-optionrom] Error 2

Signed-off-by: Sean Bruno <sbruno@freebsd.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

optionrom: fix detection of -Wa,-32

The cc-option macro runs $(CC) in -S mode (generate assembly) to avoid a
pointless run of the assembler. However, this does not work when you want
to detect support for cc->as option passthrough. clang ignores -Wa unless
-c is provided, and exits successfully even if the -Wa,-32 option is not
supported.

Reported-by: Stefan Hajnoczi <stefanha@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1469043409-14033-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/lalrae/tags/mips-20160729' into staging

MIPS patches 2016-07-29

Changes:
* bug fixes

# gpg: Signature made Fri 29 Jul 2016 09:44:13 BST
# gpg:                using RSA key 0x52118E3C0B29DA6B
# gpg: Good signature from "Leon Alrae <leon.alrae@imgtec.com>"
# Primary key fingerprint: 8DD3 2F98 5495 9D66 35D4  4FC0 5211 8E3C 0B29 DA6B

* remotes/lalrae/tags/mips-20160729:
  target-mips: fix EntryHi.EHINV being cleared on TLB exception
  hw/mips_malta: Fix YAMON API print routine

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160729' into staging

ppc patch queue 2016-07-29

Here are the current pending ppc and spapr related patches for
qemu-2.7.  Given the freeze status, these are all bugfixes, with two
exceptions:

  * There's some final rework of the vcpu hotplug model.  Specifically
    we add spapr specific code on the generic basis Igor established
    to make cpu_index stable for pseries-2.7 and later machine types.
      - This allows us to remove the limitation that cpu cores had to
        be inserted in linear order, and removed in LIFO order.
      - This is worth merging this late in 2.7 because it will avoid
        considerable future grief with management layers needing to
        discover whether out-of-order hotplug is possible, amongst
        other things.
      - For now we do add a constraint that the initial cpu cannot be
        unplugged.
  * We add two extra testcases to make check, for postcopy and
    drive_del on ppc64.
      - Not strictly bugfixes, but safe, because they don't affect the
        actual code, and increase test coverage.

# gpg: Signature made Fri 29 Jul 2016 05:50:02 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160729:
  tests: add drive_del-test to ppc/ppc64
  spapr: Prevent boot CPU core removal
  ppc: Fix fault PC reporting for lve*/stve* VMX instructions
  test: port postcopy test to ppc64
  Revert "spapr: Ensure CPU cores are added contiguously and removed in LIFO order"
  spapr: init CPUState->cpu_index with index relative to core-id

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging

pc, pci, virtio: cleanups, fixes

a bunch of bugfixes and a couple of cleanups
making these easier and/or making debugging easier

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Fri 29 Jul 2016 04:11:01 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17  0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA  8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (41 commits)
  mptsas: Fix a migration compatible issue
  vhost: do not update last avail idx on get_vring_base() failure
  vhost: add vhost_net_set_backend()
  vhost-user: add error report in vhost_user_write()
  tests: fix vhost-user-test leak
  tests: plug some leaks in virtio-net-test
  vhost-user: wait until backend init is completed
  char: add and use tcp_chr_wait_connected
  char: add chr_wait_connected callback
  vhost: add assert() to check runtime behaviour
  vhost-net: vhost_migration_done is vhost-user specific
  Revert "vhost-net: do not crash if backend is not present"
  vhost-user: add get_vhost_net() assertions
  vhost-user: keep vhost_net after a disconnection
  vhost-user: check vhost_user_{read,write}() return value
  vhost-user: check qemu_chr_fe_set_msgfds() return value
  vhost-user: call set_msgfds unconditionally
  qemu-char: fix qemu_chr_fe_set_msgfds() crash when disconnected
  vhost: use error_report() instead of fprintf(stderr,...)
  vhost: add missing VHOST_OPS_DEBUG
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging

# gpg: Signature made Thu 28 Jul 2016 23:50:37 BST
# gpg:                using RSA key 0x7DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/ide-pull-request:
  ide: fix halted IO segfault at reset

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tests: add drive_del-test to ppc/ppc64

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

mptsas: Fix a migration compatible issue

My previous commit 2e2aa316 removed internal flag msi_in_use, which
exists in vmstate, use VMSTATE_UNUSED for migration compatibility.

Reported-by: Amit Shah <amit.shah@redhat.com>
Suggested-by: Amit Shah <amit.shah@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>

vhost: do not update last avail idx on get_vring_base() failure

The state.num value will probably be 0 in this case, but that
doesn't make sense to update.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

spapr: Prevent boot CPU core removal

Boot CPU is assumed to be always present in QEMU code. So
until that assumptions are gone, deny removal request.
In another words, QEMU won't support boot CPU core hot-unplug.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
[dwg: Tweaked error message for clarity]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc: Fix fault PC reporting for lve*/stve* VMX instructions

We forgot to do gen_update_nip() for these like we do with other
helpers. Fix this, but in a more efficient way by passing the RA
to the accessors instead so the overhead is only taken on faults.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

test: port postcopy test to ppc64

As userfaultfd syscall is available on powerpc, migration
postcopy can be used.

This patch adds the support needed to test this on powerpc,
instead of using a bootsector to run code to modify memory,
we use a FORTH script in "boot-command" property.

As spapr machine doesn't support "-prom-env" argument
(the nvram is initialized by SLOF and not by QEMU),
"boot-command" is provided to SLOF via a file mapped nvram
(with "-drive file=...,if=pflash")

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Revert "spapr: Ensure CPU cores are added contiguously and removed in LIFO order"

This reverts commit 5cbc64de25973e9129c5a7897734a06ac64b9aff.

Now that we have stable cpu_index values for pseries-2.7 (and future)
machine types, we can now safely allow hotplug and unplug in any order.

Conflicts:
hw/ppc/spapr_cpu_core.c

Some conflicts on revert due to some small changes in the inserted
code since the original commit.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: init CPUState->cpu_index with index relative to core-id

It will enshure that cpu_index for a given cpu stays the same
regardless of the order cpus has been created/deleted and so
it would be possible to migrate QEMU instance with out of order
created CPU.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ide: fix halted IO segfault at reset

If one attempts to perform a system_reset after a failed IO request
that causes the VM to enter a paused state, QEMU will segfault trying
to free up the pending IO requests.

These requests have already been completed and freed, though, so all
we need to do is NULL them before we enter the paused state.

Existing AHCI tests verify that halted requests are still resumed
successfully after a STOP event.

Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Message-id: 1469635201-11918-2-git-send-email-jsnow@redhat.com
Signed-off-by: John Snow <jsnow@redhat.com>

vhost: add vhost_net_set_backend()

Not all vhost-user backends support ops->vhost_net_set_backend(). It is
a nicer to provide an assert/error than to crash trying to
call. Furthermore, it improves a bit the code by hiding vhost_ops
details.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-user: add error report in vhost_user_write()

Similar to vhost_user_read() error report, it is useful to have early
error report.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: fix vhost-user-test leak

Spotted by valgrind.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

tests: plug some leaks in virtio-net-test

Found thanks to valgrind.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-user: wait until backend init is completed

The chardev waits for an initial connection before starting qemu, and
vhost-user should wait for the backend negotiation to be completed
before starting qemu too.

vhost-user is started in the net_vhost_user_event callback, which is
synchronously called after the socket is connected. Use a
VhostUserState.started flag to indicate vhost-user init completed
successfully and qemu can be started.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

char: add and use tcp_chr_wait_connected

Add a chr_wait_connected for the tcp backend, and use it in the
open_socket() function.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

char: add chr_wait_connected callback

A function to wait on the backend to be connected, to be used in the
following patches.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: add assert() to check runtime behaviour

All these functions must be called only after the backend is connected.
They are called from virtio-net.c, after either virtio or link status
change.

The check for nc->peer->link_down should ensure vhost_net_{start,stop}()
are always called between vhost_user_{start,stop}().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-net: vhost_migration_done is vhost-user specific

Either the callback is mandatory to implement, in which case an assert()
is more appropriate, or it's not and we can't tell much whether the
function should fail or not (given it's name, I guess it should silently
success by default). Instead, make the implementation mandatory and
vhost-user specific to be more clear about its usage.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Revert "vhost-net: do not crash if backend is not present"

Now that get_vhost_net() returns non-null after a successful
vhost_net_init(), we no longer need to check this case.

This reverts commit ecd34898596c60f79886061618dd7e01001113ad.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-user: add get_vhost_net() assertions

Add a few assertions to be more explicit about the runtime behaviour
after the previous patch: get_vhost_net() is non-null after
net_vhost_user_init().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-user: keep vhost_net after a disconnection

Many code paths assume get_vhost_net() returns non-null.

Keep VhostUserState.vhost_net after a successful vhost_net_init(),
instead of freeing it in vhost_net_cleanup().

VhostUserState.vhost_net is thus freed before after being recreated or
on final vhost_user_cleanup() and there is no need to save the acked
features.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-user: check vhost_user_{read,write}() return value

The vhost-user code is quite inconsistent with error handling. Instead
of ignoring some return values of read/write and silently going on with
invalid state (invalid read for example), break the code flow when the
error happened.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-user: check qemu_chr_fe_set_msgfds() return value

Check qemu_chr_fe_set_msgfds() for errors, to make sure the message to
be sent is correct.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-user: call set_msgfds unconditionally

It is fine to call set_msgfds() with 0 fd, and ensures any previous fd
array is cleared.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

qemu-char: fix qemu_chr_fe_set_msgfds() crash when disconnected

Calling qemu_chr_fe_set_msgfds() on unconnected socket leads to crash
since s->ioc is NULL in this case. Return an error earlier instead.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: use error_report() instead of fprintf(stderr,...)

Let's use qemu proper error reporting API, this ensures the error is
reported at the right place (stderr or monitor), with a conventional
format.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: add missing VHOST_OPS_DEBUG

Add missing VHOST_OPS_DEBUG() logs, for completeness.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: do not assert() on vhost_ops failure

Calling a vhost operation may fail, for example with disconnected
vhost-user backend, but qemu shouldn't abort in this case.

Log an error instead, except on error and cleanup code paths where it
can be mostly ignored.

Let's use a VHOST_OPS_DEBUG macro to easily disable those messages once
disconnected backend stabilizes.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: fix calling vhost_dev_cleanup() after vhost_dev_init()

vhost_net_init() calls vhost_dev_init() and in case of failure, calls
vhost_dev_cleanup() directly. However, the structure is already
partially cleaned on error. Calling vhost_dev_cleanup() again will call
vhost_virtqueue_cleanup() on already clean queues, and causing potential
double-close. Instead, adjust dev->nvqs and simplify vhost_dev_init()
code to not call vhost_virtqueue_cleanup() but vhost_dev_cleanup()
instead.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-net: always call vhost_dev_cleanup() on failure

vhost_dev_init(), calling vhost backend initialization, should be
cleaned up after failure too. Call vhost_dev_cleanup() in all failure
cases. First, it needs to zero-alloc the struct to avoid the initial
garbage.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: make vhost_dev_cleanup() idempotent

It is called on multiple code path, so make it safe to call several
times (note: I don't remember a reproducer here, but a function called
'cleanup' should probably be idempotent in my book)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: fix cleanup on not fully initialized device

If vhost_dev_init() failed, caller may still call vhost_dev_cleanup()
later. However, vhost_dev_cleanup() tries to remove the device from the
list even if it wasn't yet added, which may lead to crashes. Similarly
for the memory listener.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: assert the log was cleaned up

Make sure the log was released on cleanup, or it will leak (the
alternative is to call vhost_log_put() unconditionally, but it may hide
some dev state issues).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: make vhost_log_put() idempotent

Although not strictly required, it is nice to have vhost_log_put()
safely callable multiple times.

Clear dev->log* when calling vhost_log_put() to make the function
idempotent. This also simplifies a bit the caller work.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost: don't assume opaque is a fd, use backend cleanup

vhost-dev opaque isn't necessarily an fd, it can be a chardev when using
vhost-user. Goto fail, so vhost_backend_cleanup() is called to handle
backend cleanup appropriately.

vhost_set_backend_type() should never fail, use an assert().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-user: disconnect on HUP

In some cases, qemu_chr_fe_read_all() on HUP event doesn't raise
CHR_EVENT_CLOSED because the read/recv function returns -1 on
disconnected peers (for example with tch_chr_recv, an ECONNRESET errno
overwritten as EIO).

It is simpler to explicitely disconnect on HUP, rising CHR_EVENT_CLOSED
if it wasn't disconnected already.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-user: minor simplification

Shorten the code and make it more clear by using the specialized
function g_str_has_prefix().

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

misc: indentation

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio: check vring descriptor buffer length

virtio back end uses set of buffers to facilitate I/O operations.
An infinite loop unfolds in virtqueue_pop() if a buffer was
of zero size. Add check to avoid it.

Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

hw/virtio-pci: fix virtio behaviour

Enable transitional virtio devices by default.
Enable virtio-1.0 for devices plugged into
PCIe ports (Root ports or Downstream ports).

Using the virtio-1 mode will remove the limitation
of the number of devices that can be attached to a machine
by removing the need for the IO BAR.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>

apb: convert init to realize

Convert a device model where initialization obviously can't fail,
make it implement realize() rather than init().

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>

hw/pci-bridge: Convert pxb initialization functions to Error

Firstly, convert pxb_dev_init_common() to Error and rename
it to pxb_dev_realize_common().
Actually, pxb_register_bus() is converted as well.

And then,
convert pxb_dev_initfn() and pxb_pcie_dev_initfn() to Error,
rename them to pxb_dev_realize() and pxb_pcie_dev_realize()
respectively.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/apci: handle 64-bit MMIO regions correctly

In build_crs(), the calculation and merging of the ranges already happens
in 64-bit, but the entry boundaries are silently truncated to 32-bit in the
call to aml_dword_memory(). Fix it by handling the 64-bit MMIO ranges separately.
This fixes 64-bit BARs behind PXBs.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

acpi: refactor pxb crs computation

Instead of always passing both IO and MEM ranges when
computing CRS ranges, define a new CrsRangeSet structure
that include them both.

This is done before introducing a third type of range,
64-bit MEM, so it will be easier to pass them all around.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/acpi: fix a DSDT table issue when a pxb is present.

PXBs do not support hotplug so they don't have a PCNT function.
Since the PXB's PCI root-bus is a child bus of bus 0, the
build_dsdt code will add a call to the corresponding PCNT function.

Fix this by skipping the PCNT call for the above case.
While at it skip also PCIe child buses.

Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/pxb: declare pxb devices as not hot-pluggable

Prevent future issues when hotplug will work for devices
attached to pxbs.

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/pcie-root-port: Fix PCIe root port initialization

Specify the root port interrupt pin as part of the init
process for cases when msi/msix are not enabled.

Fixes "hw/pci/pci.c:196:23: runtime error: shift exponent -1 is negative"
warning from clang's sanitizer.

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

pcie: fix link active status bit migration

We changed link status register in pci express endpoint capability
over time. Specifically,

commit b2101eae63ea57b571cee4a9075a4287d24ba4a4 ("pcie: Set the "link
active" in the link status register") set data link layer link active
bit in this register without adding compatibility to old machine types.

When migrating from qemu 2.3 and older this affects xhci devices which
under machine type 2.0 and older have a pci express endpoint capability
even if they are on a pci bus.

Add compatibility flags to make this bit value match what it was under
2.3.

Additionally, to avoid breaking migration from qemu 2.3 and up,
suppress checking link status during migration: this seems sane
since hardware can change link status at any time.

https://bugzilla.redhat.com/show_bug.cgi?id=1352860

Reported-by: Gerd Hoffmann <kraxel@redhat.com>
Fixes: b2101eae63ea57b571cee4a9075a4287d24ba4a4
("pcie: Set the "link active" in the link status register")
Cc: qemu-stable@nongnu.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

target-mips: fix EntryHi.EHINV being cleared on TLB exception

While implementing TLB invalidation feature we forgot to modify
part of code responsible for updating EntryHi during TLB exception.
Consequently EntryHi.EHINV is unexpectedly cleared on the exception.

Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>

hw/mips_malta: Fix YAMON API print routine

The print routine provided as part of the in-built bootloader had a bug
in that it attempted to use a jump instruction as part of a loop, but
the target has its upper bits zeroed leading to control flow
transferring to 0xb0000814 rather than the intended 0xbfc00814. Fix this
by using a branch instruction instead, which seems more fit for purpose.

A simple way to test this is to build a Linux kernel with EVA enabled &
attempt to boot it in QEMU. It will attempt to print a message
indicating the configuration mismatch but QEMU would previously
incorrectly jump & wind up printing a continuous stream of the letter E.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Leon Alrae <leon.alrae@imgtec.com>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Leon Alrae <leon.alrae@imgtec.com>
Signed-off-by: Leon Alrae <leon.alrae@imgtec.com>

Merge remote-tracking branch 'remotes/ehabkost/tags/x86-pull-request' into staging

x86 and machine queue, 2016-07-27

Highlights:
* Fixes to allow CPU hotplug/unplug in any order;
* Exit QEMU on invalid global properties.

# gpg: Signature made Wed 27 Jul 2016 15:28:53 BST
# gpg:                using RSA key 0x2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/x86-pull-request:
  vl: exit if a bad property value is passed to -global
  qdev: ignore GlobalProperty.errp for hotplugged devices
  machine: Add comment to abort path in machine_set_kernel_irqchip
  Revert "pc: Enforce adding CPUs contiguously and removing them in opposite order"
  pc: Init CPUState->cpu_index with index in possible_cpus[]
  qdev: Fix object reference leak in case device.realize() fails
  exec: Set cpu_index only if it's not been explictly set
  exec: Don't use cpu_index to detect if cpu_exec_init()'s been called
  exec: Reduce CONFIG_USER_ONLY ifdeffenery

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/stefanha/tags/CVE-2016-5403-virtio-unbounded-allocation-pull-request' into staging

# gpg: Signature made Wed 27 Jul 2016 16:13:02 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/CVE-2016-5403-virtio-unbounded-allocation-pull-request:
  virtio: error out if guest exceeds virtqueue size

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Tue 26 Jul 2016 21:51:38 BST
# gpg:                using RSA key 0xBDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98  D624 BDBE 7B27 C0DE 3057

* remotes/cody/tags/block-pull-request:
  mirror: double performance of the bulk stage if the disc is full
  block/gluster: fix doc in the qapi schema and member name

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

vl: exit if a bad property value is passed to -global

When passing '-global driver=host-powerpc64-cpu,property=compat,value=foo'
on the command line, without this patch, we get the following warning per
device (which means many lines if the guests has many cpus):

qemu-system-ppc64: Warning: can't apply global host-powerpc64-cpu.compat=foo:
Invalid compatibility mode "foo"

... and QEMU continues execution, ignoring the property.

With this patch, we get a single line:

qemu-system-ppc64: can't apply global host-powerpc64-cpu.compat=foo:
Invalid compatibility mode "foo"

... and QEMU exits.

The previous behavior is kept for hotplugged devices since we don't want
QEMU to exit when doing device_add.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

qdev: ignore GlobalProperty.errp for hotplugged devices

This patch ensures QEMU won't terminate while hotplugging a device if the
global property cannot be set and errp points to error_fatal or error_abort.

While here, it also fixes indentation of the typename argument.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

machine: Add comment to abort path in machine_set_kernel_irqchip

We're not supposed to abort when the user passes a bogus value.
Since the checking is done in visit_type_OnOffSplit(), the call
to abort() is legitimate. Let's add a comment to make it
explicit.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

virtio: error out if guest exceeds virtqueue size

A broken or malicious guest can submit more requests than the virtqueue
size permits, causing unbounded memory allocation in QEMU.

The guest can submit requests without bothering to wait for completion
and is therefore not bound by virtqueue size.  This requires reusing
vring descriptors in more than one request, which is not allowed by the
VIRTIO 1.0 specification.

In "3.2.1 Supplying Buffers to The Device", the VIRTIO 1.0 specification
says:

  1. The driver places the buffer into free descriptor(s) in the
     descriptor table, chaining as necessary

and

  Note that the above code does not take precautions against the
  available ring buffer wrapping around: this is not possible since the
  ring buffer is the same size as the descriptor table, so step (1) will
  prevent such a condition.

This implies that placing more buffers into the virtqueue than the
descriptor table size is not allowed.

QEMU is missing the check to prevent this case.  Processing a request
allocates a VirtQueueElement leading to unbounded memory allocation
controlled by the guest.

Exit with an error if the guest provides more requests than the
virtqueue size permits.  This bounds memory allocation and makes the
buggy guest visible to the user.

This patch fixes CVE-2016-5403 and was reported by Zhenhao Hong from 360
Marvel Team, China.

Reported-by: Zhenhao Hong <hongzhenhao@360.cn>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

mirror: double performance of the bulk stage if the disc is full

Mirror can do up to 16 in-flight requests, but actually on full copy
(the whole source disk is non-zero) in-flight is always 1. This happens
as the request is not limited in size: the data occupies maximum available
capacity of s->buf.

The patch limits the size of the request to some artificial constant
(1 Mb here), which is not that big or small. This effectively enables
back parallelism in mirror code as it was designed.

The result is important: the time to migrate 10 Gb disk is reduced from
~350 sec to 170 sec.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-id: 1468516741-82174-1-git-send-email-vsementsov@virtuozzo.com
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>

block/gluster: fix doc in the qapi schema and member name

1. qapi @BlockdevOptionsGluster schema member name s/debug_level/debug-level/
2. rearrange the versioning
3. s/server description/servers description/

Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Message-Id: <1469198048-8535-1-git-send-email-prasanna.kalever@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>

Revert "pc: Enforce adding CPUs contiguously and removing them in opposite order"

This reverts commit 4da7faaeb0c7dd3f7f233165d336c878f78fd1eb.

Since commit:
pc: init CPUState->cpu_index with index in possible_cpus[]
cpu_index is stable regardless of the order cpus were created
and QEMU instance stays migratable always so limitation added
by 4da7faaeb could be safely removed.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

pc: Init CPUState->cpu_index with index in possible_cpus[]

It will enshure that cpu_index for a given cpu stays the same
regardless of the order cpus has been created/deleted.

No compat code is needed as for initial cpus index in
possible_cpus[] matches cpu_index that's been auto-allocated
in cpu_exec_init().

Tha same applies for hotplug with cpu-add command if cpus are
added sequentially in increasing order as 'id' matches cpu_index.

If cpu-add had been used for creating out-of-order cpus,
that created unmigratable instance since it were not possible
to start target with the same cpu_index using old way
of migrating instance with hotplugged cpus:

* source QEMU with CLI (-smp 1,maxcpus=3 and cpu-add id=2)
  following set of cpu_index is allocated [0, 1] with
  apics set [0, 2] respectivelly
* target QEMU is started with CLI -smp 2,maxcpus=3
  resulting in set of cpu_index [0, 1] but with
  set of apics [0, 1] wich doesn't match source.

So we don't need compat code in this case as it's never worked
and newelly added device_add support would use stable cpu_index
set by machine to begin with, so it won't have above limitation
and source QEMU could be migrated to destination regardless
of the order cpus were created.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

qdev: Fix object reference leak in case device.realize() fails

If device doesn't have parent assined before its realize
is called, device_set_realized() will implicitly set parent
to '/machine/unattached'.

However device_set_realized() may fail after that point at
several other points leaving not realized object dangling
in '/machine/unattached' and as result caller of

  obj = object_new()
    obj->ref == 1
  object_property_set_bool(obj,..., true, "realized",...)
    obj->ref == 2
  if (fail)
      object_unref(obj);
      obj->ref == 1

will get object leak instead of expected object destruction.

Fix it by making device_set_realized() to cleanup after itself
in case of failure.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

exec: Set cpu_index only if it's not been explictly set

It keeps the legacy behavior for all users that doesn't care
about stable cpu_index value, but would allow boards that
would support device_add/device_del to set stable cpu_index
that won't depend on order in which cpus are created/destroyed.

While at that simplify cpu_get_free_index() as cpu_index
generated by USER_ONLY and softmmu variants is the same
since none of the users support cpu-remove so far, except
of not yet released spapr/x86 device_add/delr, which
will be altered by follow up patches to set stable
cpu_index manually.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

exec: Don't use cpu_index to detect if cpu_exec_init()'s been called

Instead use QTAIL's tqe_prev field to detect if cpu's been
placed in list by cpu_exec_init() which is always set if
QTAIL element is in list.

Fixes SIGSEGV on failure path in case cpu_index is assigned
by board and cpu.relalize() fails before cpu_exec_init() is called.

In follow up patches, cpu_index will be assigned by boards that
support cpu hot(un)plug and need stable cpu_index that doesn't
depend on order cpus are created/removed.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

exec: Reduce CONFIG_USER_ONLY ifdeffenery

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2016-07-26' into staging

Block patches for 2.7.0-rc1

# gpg: Signature made Tue 26 Jul 2016 18:11:36 BST
# gpg:                using RSA key 0x3BB14202E838ACAD
# gpg: Good signature from "Max Reitz <mreitz@redhat.com>"
# Primary key fingerprint: 91BE B60A 30DB 3E88 57D1  1829 F407 DB00 61D5 CF40
#      Subkey fingerprint: 58B3 81CE 2DC8 9CF9 9730  EE64 3BB1 4202 E838 ACAD

* remotes/maxreitz/tags/pull-block-2016-07-26:
  iotest: fix python based IO tests
  block: export LUKS specific data to qemu-img info
  crypto: add support for querying parameters for block encryption
  AioContext: correct comments
  qcow2: do not allocate extra memory

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

iotest: fix python based IO tests

The previous commit refactoring iotests.py:

  commit 66613974468fb6e1609fb3eabf55981b1ee436cf
  Author: Daniel P. Berrange <berrange@redhat.com>
  Date:   Wed Jul 20 14:23:10 2016 +0100

    scripts: refactor the VM class in iotests for reuse

was not properly tested and included a number of broken
bits.

- The 'event_match' method was not moved into qemu.py
- The 'self._args' list parameter in QEMUMachine needs
   to be copied otherwise modifications will affect the
   global 'qemu_opts' variable in iotests.py
- The QEMUQtestMachine class methods had inverted
   parameter order for the super() calls
- The QEMUQtestMachine class forgot to add
   '-machine accel=qtest'
- The QEMUQtestMachine class constructor needs to set
   a default 'name' value before using it as it may
   be None
- The QEMUQtestMachine class constructor needs to use
   named parameters when calling the super constructor
   as it is leaving out some positional parameters.
- The 'qemu_prog' variable should be a string not a
   list in iotests.py
- The VM classs constructor needs to use named
   parameters when calling the super constructor
   as it is leaving out some positional parameters.
- The path to the socket-scm-helper needs to be
   passed into the QEMUMachine class

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1469549767-27249-1-git-send-email-berrange@redhat.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

block: export LUKS specific data to qemu-img info

The qemu-img info command has the ability to expose format
specific metadata about volumes. Wire up this facility for
the LUKS driver to report on cipher configuration and key
slot usage.

    $ qemu-img info ~/VirtualMachines/demo.luks
    image: /home/berrange/VirtualMachines/demo.luks
    file format: luks
    virtual size: 98M (102760448 bytes)
    disk size: 100M
    encrypted: yes
    Format specific information:
        ivgen alg: plain64
        hash alg: sha1
        cipher alg: aes-128
        uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
        cipher mode: xts
        slots:
            [0]:
                active: true
                iters: 572706
                key offset: 4096
                stripes: 4000
            [1]:
                active: false
                key offset: 135168
            [2]:
                active: false
                key offset: 266240
            [3]:
                active: false
                key offset: 397312
            [4]:
                active: false
                key offset: 528384
            [5]:
                active: false
                key offset: 659456
            [6]:
                active: false
                key offset: 790528
            [7]:
                active: false
                key offset: 921600
        payload offset: 2097152
        master key iters: 142375

One somewhat undesirable artifact is that the data fields are
printed out in (apparently) random order. This will be addressed
later by changing the way the block layer pretty-prints the
image specific data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1469192015-16487-3-git-send-email-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>

crypto: add support for querying parameters for block encryption

When creating new block encryption volumes, we accept a list of
parameters to control the formatting process. It is useful to
be able to query what those parameters were for existing block
devices. Add a qcrypto_block_get_info() method which returns a
QCryptoBlockInfo instance to report this data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1469192015-16487-2-git-send-email-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>

AioContext: correct comments

Correct comments of field notify_me

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-id: 1468575858-22975-1-git-send-email-caoj.fnst@cn.fujitsu.com
Signed-off-by: Max Reitz <mreitz@redhat.com>

qcow2: do not allocate extra memory

There are no needs to allocate more than one cluster, as we set
avail_out for deflate to one cluster.

Zlib docs (http://www.zlib.net/manual.html) says:
"deflate compresses as much data as possible, and stops when the input
buffer becomes empty or the output buffer becomes full."

So, deflate will not write more than avail_out to output buffer. If
there is not enough space in output buffer for compressed data (it may
be larger than input data) deflate just returns Z_OK. (if all data is
compressed and written to output buffer deflate returns Z_STREAM_END).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 1468515565-81313-1-git-send-email-vsementsov@virtuozzo.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160726' into staging

ppc patch queue 2016-07-26

Here's the current batch of ppc and spapr related patches intended for
qemu-2.7.  Given the late stage in 2.7 development, these are all
bugfixes with one exception:

The "spapr: disintricate core-id from DT semantics" changes the way
ids are assigned in the new core-based hotplug infrastructure.  This
isn't strictly a bugfix, but we've determined that the current way of
assigning core-ids will cause considerable grief with future plans for
cpu hotplug.  Therefore it's better to fix this now, late in 2.7,
before we have a released version with the problematic numbering.

# gpg: Signature made Tue 26 Jul 2016 04:04:57 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160726:
  spapr: disintricate core-id from DT semantics
  target-ppc: add PPC_MFTB flag to e500mc and e5500
  spapr: fix spapr-nvram migration
  hw/ppc/spapr: Make sure to close the htab_fd when migration is canceled
  ppc: Huge page detection mechanism fixes - Episode III

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>