git.proxmox.com Git - mirror

i386: Compile CPUX86State xsave_buf only when support KVM or HVF

While at it, also rename var to indicate it is not used only in KVM.

Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
Reviewed-by: Patrick Colp <patrick.colp@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Message-Id: <20180914003827.124570-2-liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: rename HF_SVMI_MASK to HF_GUEST_MASK

This flag will be used for KVM's nested VMX migration; the HF_GUEST_MASK name
is already used in KVM, adopt it in QEMU as well.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: unify masking of interrupts

Interrupt handling depends on various flags in env->hflags or env->hflags2,
and the exact detail were not exactly replicated between x86_cpu_has_work
and x86_cpu_exec_interrupt. Create a new function that extracts the
highest-priority non-masked interrupt, and use it in both functions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

char-pty: remove unnecessary #ifdef

For some reason __APPLE__ was not checked in pty code. However, the #ifdef
is redundant: this file is already compiled only if CONFIG_POSIX, same as
util/qemu-openpty.c which it uses.

Reported-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

test-char: add socket reconnect test

This test exhibits a regression fixed by the previous reverts.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180817135224.22971-5-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

test-char: fix random socket test failure

Peter reported a test failure on FreeBSD with the new reconnect test:

MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))}
gtester -k --verbose -m=quick tests/test-char
TEST: tests/test-char... (pid=16190)
  /char/null:                                                          OK
  /char/invalid:                                                       OK
  /char/ringbuf:                                                       OK
  /char/mux:                                                           OK
  /char/stdio:                                                         OK
  /char/pipe:                                                          OK
  /char/file:                                                          OK
  /char/file-fifo:                                                     OK
  /char/udp:                                                           OK
  /char/serial:                                                        OK
  /char/hotswap:                                                       OK
  /char/socket/basic:                                                  OK
  /char/socket/reconnect:                                              FAIL
GTester: last random seed: R02S521380d9c12f1dac3ad1763bf5665c27
(pid=16367)
  /char/socket/fdpass:                                                 OK
FAIL: tests/test-char
**
ERROR:tests/test-char.c:353:char_socket_test_common: assertion failed:
(object_property_get_bool(OBJECT(chr_client), "connected",
&error_abort))

It turns out that the socket test code checks both server and client
connection states, but doesn't wait for both.

Wait for the client side as well.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20180823143125.16767-5-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

char-socket: update all ioc handlers when changing context

So far, tcp_chr_update_read_handler() only updated the read
handler. Let's also update the hup handler.

Factorize the code while at it. (note that s->ioc != NULL when
s->connected)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180817135224.22971-4-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Revert "chardev: tcp: postpone async connection setup"

This reverts commit 25679e5d58e258e9950685ffbd0cae4cd40d9cc2.

This commit broke "reconnect socket" chardev that are created after
"machine_done": they no longer try to connect. It broke also
vhost-user-test that uses chardev while there is no "machine_done"
event.

The goal of this patch was to move the "connect" source to the
frontend context. chr->gcontext is set with
qemu_chr_fe_set_handlers(). But there is no guarantee that it will be
called, so we can't delay connection until then: the chardev should
still attempt to connect during open(). qemu_chr_fe_set_handlers() is
eventually called later and will update the context.

Unless there is a good reason to not use initially the default
context, I think we should revert to the previous state to fix the
regressions.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180817135224.22971-3-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Revert "chardev: tcp: postpone TLS work until machine done"

This reverts commit 99f2f54174a595e3ada6e4332fcd2b37ebb0d55d.

See next commit reverting 25679e5d58e258e9950685ffbd0cae4cd40d9cc2 as
well for rationale.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180817135224.22971-2-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

memory: cleanup side effects of memory_region_init_foo() on failure

if MemoryRegion intialization fails it's left in semi-initialized state,
where it's size is not 0 and attached as child to owner object.
And this leds to crash in following use-case:
    (monitor) object_add memory-backend-file,id=mem1,size=99999G,mem-path=/tmp/foo,discard-data=yes
    memory.c:2083: memory_region_get_ram_ptr: Assertion `mr->ram_block' failed
    Aborted (core dumped)
it happens due to assumption that memory region is intialized when
   memory_region_size() != 0
and therefore it's ok to access it in
   file_backend_unparent()
      if (memory_region_size() != 0)
          memory_region_get_ram_ptr()

which happens when object_add fails and unparents failed backend making
file_backend_unparent() access invalid memory region.

Fix it by making sure that memory_region_init_foo() APIs cleanup externally
visible side effects on failure (like set size to 0 and unparenting object)

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <1536064777-42312-1-git-send-email-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw: hyperv_testdev: add read callback

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <20180912160118.21158-4-liq3ea@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw: pc-testdev: add read memory region callback

Also change the write callback name.

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <20180912160118.21158-5-liq3ea@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw: debugexit: add read callback

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <20180912160118.21158-3-liq3ea@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fw_cfg_mem: add read memory region callback

Signed-off-by: Li Qiang <liq3ea@gmail.com>
Message-Id: <20180912160118.21158-2-liq3ea@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ui: fix virtual timers

UI uses timers based on virtual clock for managing key queue.
This is incorrect because this service is not related to the guest state,
and its events should not be recorded and replayed. But these timers should
stop when the guest is not executing.
This patch changes using virtual clock to the new virtual_ext clock,
which runs as virtual clock, but its timers are not saved to the log.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180912082013.3228.33664.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

slirp: fix ipv6 timers

ICMP implementation for IPv6 uses timers based on virtual clock.
This is incorrect because this service is not related to the guest state,
and its events should not be recorded and replayed.
This patch changes using virtual clock to the new virtual_ext clock.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180912082007.3228.91491.stgit@pasha-VirtualBox>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

timer: introduce new virtual clock

Slirp and VNC modules use virtual clock for processing some events that
are related to the guest execution speed.
But virtual clock-related events are consideres to be deterministic and
are recorded/replayed by icount mechanism. But slirp and VNC lie outside
the recorded guest core (which includes CPU and peripherals).
Therefore slirp and VNC are external for the guest, but should work at
guest speed.
This patch introduces new virtual clock which can be used for external
subsystems for running timers that are synchronized with the guest.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180912082002.3228.82417.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

replay: allow loading any snapshots before recording

This patch enables using -loadvm in recording mode to allow starting
the execution recording from any of the available snapshots.
It also fixes loading of the record/replay state, therefore snapshots
created in replay mode may also be used for starting the new recording.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180912081939.3228.56131.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

translator: fix breakpoint processing

QEMU cannot pass through the breakpoints when 'si' command is used
in remote gdb. This patch disables inserting the breakpoints
when we are already single stepping though the gdb remote protocol.
This patch also fixes icount calculation for the blocks that include
breakpoints - instruction with breakpoint is not executed and shouldn't
be used in icount calculation.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180912081910.3228.8523.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

replay: flush events when exiting

This patch adds events processing when emulation finishes instead
of just cleaning the queue. Now the bdrv coroutines will be in consistent
state when emulator closes. It allows correct polling of the block layer
at exit.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180912081859.3228.79735.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

replay: wake up vCPU when replaying

In record/replay icount mode vCPU thread and iothread synchronize
the execution using the checkpoints.
vCPU thread processes the virtual timers and iothread processes all others.
When iothread wants to wake up sleeping vCPU thread, it sends dummy queued
work. Therefore it could be the following sequence of the events in
record mode:
- IO: sending dummy work
- IO: processing timers
- CPU: wakeup
- CPU: clearing dummy work
- CPU: processing virtual timers

But due to the races in replay mode the sequence may change:
- IO: sending dummy work
- CPU: wakeup
- CPU: clearing dummy work
- CPU: sleeping again because nothing to do
- IO: Processing timers
- CPU: zzzz

In this case vCPU will not wake up, because dummy work is not to be set up
again.

This patch tries to wake up the vCPU when it sleeps and the icount warp
checkpoint isn't met. It means that vCPU has something to do, because
there are no other reasons of non-matching warp checkpoint.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
--

v5: improve checking that vCPU is still sleeping
Message-Id: <20180912081945.3228.19776.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

configure: enable mttcg for i386 and x86_64

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move x86_64_hregs to DisasContext

And convert it to a bool to use an existing hole
in the struct.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_tmp1_i64 to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_tmp3_i32 to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_tmp2_i32 to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_ptr1 to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_ptr0 to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_tmp4 to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_tmp0 to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_T1 to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_T0 to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_A0 to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: move cpu_cc_srcT to DisasContext

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

change get_image_size return type to int64_t

Previously, if the size of initrd >=2G, qemu exits with error:
root@haswell-OptiPlex-9020:/home/lizj# /home/lizhijian/lkp/qemu-colo/x86_64-softmmu/qemu-system-x86_64 -kernel ./vmlinuz-4.16.0-rc4 -initrd large.cgz -nographic
qemu: error reading initrd large.cgz: No such file or directory
root@haswell-OptiPlex-9020:/home/lizj# du -sh large.cgz
2.5G large.cgz

this patch changes the caller side that use this function to calculate
size of initrd file as well.

v2: update error message and int64_t printing format

Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Message-Id: <1536833233-14121-1-git-send-email-lizhijian@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Delete PID file on exit

Register an exit notifier to remove the PID file. By the time atexit()
is called, qemu_write_pidfile() guarantees QEMU owns the PID file,
thus we could safely remove it when exiting.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180907121319.8607-4-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

serial: fix DLL writes

Commit 0147883450fe84bb8de2d4a58381881f4262ce9b tries to handle
word-sized writes to DLL/DLH, but due to a typo,
this patch is causing tracebacks in all Linux kernels running the PXA
serial driver, due to an unexpected DLL register value. Here is the
surrounding code from drivers/tty/serial/pxa.c:

serial_out(up, UART_DLL, quot & 0xff); /* LS of divisor */

/*
* work around Errata #75 according to Intel(R) PXA27x
* Processor Family Specification Update (Nov 2005)
*/
dll = serial_in(up, UART_DLL);
WARN_ON(dll != (quot & 0xff)); // <-- warning

Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 0147883450fe84bb8de2d4a58381881f4262ce9b
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util: use fcntl() for qemu_write_pidfile() locking

Daniel BerrangÃ© suggested to use fcntl() locks rather than lockf().

'man lockf':

   On Linux, lockf() is just an interface on top of fcntl(2) locking.
   Many other systems implement lockf() in this way, but note that
   POSIX.1 leaves the relationship between lockf() and fcntl(2) locks
   unspecified.  A portable application should probably avoid mixing
   calls to these interfaces.

IOW, if its just a shim around fcntl() on many systems, it is clearer
if we just use fcntl() directly, as we then know how fcntl() locks will
behave if they're on a network filesystem like NFS.

Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180831145314.14736-3-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util: add qemu_write_pidfile()

There are variants of qemu_create_pidfile() in qemu-pr-helper and
qemu-ga. Let's have a common implementation in libqemuutil.

The code is initially based from pr-helper write_pidfile(), with
various improvements and suggestions from Daniel BerrangÃ©:

  QEMU will leave the pidfile existing on disk when it exits which
  initially made me think it avoids the deletion race. The app
  managing QEMU, however, may well delete the pidfile after it has
  seen QEMU exit, and even if the app locks the pidfile before
  deleting it, there is still a race.

  eg consider the following sequence

        QEMU 1        libvirtd        QEMU 2

  1.    lock(pidfile)

  2.    exit()

  3.                 open(pidfile)

  4.                 lock(pidfile)

  5.                                  open(pidfile)

  6.                 unlink(pidfile)

  7.                 close(pidfile)

  8.                                  lock(pidfile)

  IOW, at step 8 the new QEMU has successfully acquired the lock, but
  the pidfile no longer exists on disk because it was deleted after
  the original QEMU exited.

  While we could just say no external app should ever delete the
  pidfile, I don't think that is satisfactory as people don't read
  docs, and admins don't like stale pidfiles being left around on
  disk.

  To make this robust, I think we might want to copy libvirt's
  approach to pidfile acquisition which runs in a loop and checks that
  the file on disk /after/ acquiring the lock matches the file that
  was locked. Then we could in fact safely let QEMU delete its own
  pidfiles on clean exit..

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180831145314.14736-2-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw/char/sh_serial: Add timeout handling to unbreak serial input

As of commit 18e8cf159177100e ("serial: sh-sci: increase RX FIFO trigger
defaults for (H)SCIF") in Linux v4.11-rc1, the serial console on the
QEMU SH4 target is broken: it delays serial input until enough data has
been received.

Since aforementioned commit, the Linux SCIF driver programs the Receive
FIFO Data Count Trigger bits in the FIFO Control Register, to postpone
generating a receive interrupt until:
  1. At least the receive trigger count of bytes of data are available
     in the receive FIFO, OR
  2. No further data has been received for at least 15 etu after the
     last received data.

While QEMU implements the former, it does not implement the latter.
Hence the receive interrupt is not generated until the former condition
is met.

Fix this by adding basic timeout handling.  As the QEMU SCIF emulation
ignores any serial speed programming, the timeout value used conforms to
a default speed of 9600 bps, which is fine for any interactive console.

Reported-by: Rob Landley <rob@landley.net>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Ulrich Hecht <uli@fpond.eu>
Tested-by: Rob Landley <rob@landley.net>
Tested-by: Rich Felker <dalias@libc.org>
Message-Id: <20180905131125.12635-1-geert+renesas@glider.be>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

configure: preserve various environment variables in config.status

The config.status script is auto-generated by configure upon
completion. The intention is that config.status can be later invoked by
the developer directly, or by make indirectly, to re-detect the same
environment that configure originally used.

The current config.status script, however, only contains a record of the
command line arguments to configure. Various environment variables have
an effect on what configure will find. In particular PKG_CONFIG_LIBDIR &
PKG_CONFIG_PATH vars will affect what libraries pkg-config finds. The
PATH var will affect what toolchain binaries and XXXX-config scripts are
found. The LD_LIBRARY_PATH var will affect what libraries are
found. Most commands have env variables that will override the name/path
of the default version configure finds.

All these key env variables should be recorded in the config.status script.

Autoconf would also preserve CFLAGS, LDFLAGS, LIBS, CPPFLAGS, but QEMU
deals with those differently, expecting extra flags to be set using
configure args, rather than env variables. At the end of the script we
also don't have the original values of those env vars, as we modify them
during configure.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-Id: <20180904123603.10016-1-berrange@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

kvm: x86: Fix kvm_arch_fixup_msi_route for remap-less case

The AMD IOMMU does not (yet) support interrupt remapping. But
kvm_arch_fixup_msi_route assumes that all implementations do and crashes
when the AMD IOMMU is used in KVM mode.

Fixes: 8b5ed7dffa1f ("intel_iommu: add support for split irqchip")
Reported-by: Christopher Goldsworthy <christopher.goldsworthy@outlook.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-Id: <48ae78d8-58ec-8813-8680-6f407ea46041@siemens.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hostmem-memfd: add checks before adding hostmem-memfd & properties

Run some memfd-related checks before registering hostmem-memfd &
various properties. This will help libvirt to figure out what the host
is supposed to be capable of.

qemu_memfd_check() is changed to a less optimized version, since it is
used with various flags, it no longer caches the result.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180906161415.8543-1-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

dump: fix Windows dump memory run mapping

We should map and use guest memory run by parts if it can't be mapped as
a whole.
After this patch, continuos guest physical memory blocks which are not
continuos in host virtual address space will be processed correctly.

Signed-off-by: Viktor Prutyanov <viktor.prutyanov@virtuozzo.com>
Message-Id: <1535567456-6904-1-git-send-email-viktor.prutyanov@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cpus: access .qemu_icount_bias with atomic64

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-11-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cpus: access .qemu_icount with atomic64

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-10-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cpus: take seqlock across qemu_icount updates

Even though writes of qemu_icount can safely race with reads in
qemu_icount_raw, qemu_icount is also read by icount_adjust, which
runs in the I/O thread. Therefore, writes do needs protection of
the vm_clock_lock; for simplicity the patch protects it with both
seqlock+spinlock, which we already do for hosts that lack 64-bit atomics.

The bug actually predated the introduction of vm_clock_lock;
cpu_update_icount would have needed the BQL before the spinlock was
introduced.

Reported-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

test-rcu-list: access n_reclaims and n_nodes_removed with atomic64

To avoid undefined behaviour.

Note that these "atomics" are atomic in the "access once" sense.
The variables are updated by a single thread at a time, so no
"full" atomics are necessary.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-6-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qsp: use atomic64 accessors

With the seqlock, we either have to use atomics to remain
within defined behaviour (and note that 64-bit atomics aren't
always guaranteed to compile, irrespective of __nocheck), or
drop the atomics and be in undefined behaviour territory.

Fix it by dropping the seqlock and using atomic64 accessors.
This will limit scalability when !CONFIG_ATOMIC64, but those
machines (1) don't have many users and (2) are unlikely to
have many cores.

- With CONFIG_ATOMIC64:
$ tests/atomic_add-bench -n 1 -m -p
Throughput: 13.00 Mops/s

- Forcing !CONFIG_ATOMIC64:
$ tests/atomic_add-bench -n 1 -m -p
Throughput: 10.89 Mops/s

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-5-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tests: add atomic64-bench

- With CONFIG_ATOMIC64:
$ tests/atomic64-bench  -n 1
Throughput:         310.40 Mops/s

- Without:
$ tests/atomic64-bench  -n 1
Throughput:         149.08 Mops/s

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-4-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

util: add atomic64

This introduces read/set accessors for int64_t and uint64_t.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-3-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cacheinfo: add i/d cache_linesize_log

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180910232752.31565-2-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cpus: initialize timers_state.vm_clock_lock

We forgot to initialize the spinlock introduced in 94377115b2
("cpus: protect TimerState writes with a spinlock", 2018-08-23).
Fix it.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180903171831.15446-5-cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

atomic: fix comment s/x64_64/x86_64/

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20180903171831.15446-4-cota@braap.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

ps2: prevent changing irq state on save and load

Commit 2858ab09e6f708e381fc1a1cc87e747a690c4884 changed
PS/2 keyboard/mouse buffers to the standard size. However, its state
may change when migrating from the old buffer size and therefore irq needs
updating. But this change made wrong, because it throws the whole queue
if there are too much data instead of cropping it.

That commit also updates irq (because the queue state may change).
But updating the irq may change the VM state (and determinism of
the execution). E.g., when replaying the execution, one may save
the VM state and the state of the interrupt controller will be updated
at the moment of saving, instead of using the recorded update events.

This patch makes the queue update deterministic: it removes the update_irq
call and crops the queue to prevent losing the characters and changing
the required irq status.

Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>
Message-Id: <20180511081601.14610.39946.stgit@pasha-VirtualBox>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

es1370: fix ADC_FRAMEADR and ADC_FRAMECNT

They are not consecutive with DAC1_FRAME* and DAC2_FRAME*.

Fixes: 154c1d1f960c5147a3f8ef00907504112f271cd8
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

qsp: hide indirect function calls from Coverity

Coverity does not see anymore that qemu_mutex_lock is taking a lock.
Hide all the QSP magic so that static analysis works again.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

virtio: Return true from virtio_queue_empty if broken

Both virtio-blk and virtio-scsi use virtio_queue_empty() as the
loop condition in VQ handlers (virtio_blk_handle_vq,
virtio_scsi_handle_cmd_vq). When a device is marked broken in
virtqueue_pop, for example if a vIOMMU address translation failed, we
want to break out of the loop.

This fixes a hanging problem when booting a CentOS 3.10.0-862.el7.x86_64
kernel with ATS enabled:

  $ qemu-system-x86_64 \
    ... \
    -device intel-iommu,intremap=on,caching-mode=on,eim=on,device-iotlb=on \
    -device virtio-scsi-pci,iommu_platform=on,ats=on,id=scsi0,bus=pci.4,addr=0x0

The dead loop happens immediately when the kernel boots and initializes
the device, where virtio_scsi_data_plane_handle_cmd will not return:

    > ...
    > #13 0x00005586602b7793 in virtio_scsi_handle_cmd_vq
    > #14 0x00005586602b8d66 in virtio_scsi_data_plane_handle_cmd
    > #15 0x00005586602ddab7 in virtio_queue_notify_aio_vq
    > #16 0x00005586602dfc9f in virtio_queue_host_notifier_aio_poll
    > #17 0x00005586607885da in run_poll_handlers_once
    > #18 0x000055866078880e in try_poll_mode
    > #19 0x00005586607888eb in aio_poll
    > #20 0x0000558660784561 in aio_wait_bh_oneshot
    > #21 0x00005586602b9582 in virtio_scsi_dataplane_stop
    > #22 0x00005586605a7110 in virtio_bus_stop_ioeventfd
    > #23 0x00005586605a9426 in virtio_pci_stop_ioeventfd
    > #24 0x00005586605ab808 in virtio_pci_common_write
    > #25 0x0000558660242396 in memory_region_write_accessor
    > #26 0x00005586602425ab in access_with_adjusted_size
    > #27 0x0000558660245281 in memory_region_dispatch_write
    > #28 0x00005586601e008e in flatview_write_continue
    > #29 0x00005586601e01d8 in flatview_write
    > #30 0x00005586601e04de in address_space_write
    > #31 0x00005586601e052f in address_space_rw
    > #32 0x00005586602607f2 in kvm_cpu_exec
    > #33 0x0000558660227148 in qemu_kvm_cpu_thread_fn
    > #34 0x000055866078bde7 in qemu_thread_start
    > #35 0x00007f5784906594 in start_thread
    > #36 0x00007f5784639e6f in clone

With this patch, virtio_queue_empty will now return 1 as soon as the
vdev is marked as broken, after a "virtio: zero sized buffers are not
allowed" error.

To be consistent, update virtio_queue_empty_rcu as well.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <20180910145616.8598-2-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge remote-tracking branch 'remotes/dgibson/tags/libfdt-20181002' into staging

Update dtc submodule to v1.4.7

We have some upcoming things planned for ppc that will require some
newer libfdt features.  In preparation, update the dtc/libfdt
submodule to upstreasm version v1.4.7.

# gpg: Signature made Tue 02 Oct 2018 05:23:43 BST
# gpg:                using RSA key 6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/libfdt-20181002:
  Update dtc/libfdt submodule to v1.4.7

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/xtensa/tags/20181001-xtensa' into staging

target/xtensa: preparation for FLIX support

Separate generation of per-instruction code (such as raising exceptions
and terminating TB) from per-opcode code.

# gpg: Signature made Mon 01 Oct 2018 19:14:34 BST
# gpg:                using RSA key 51F9CC91F83FA044
# gpg: Good signature from "Max Filippov <filippov@cadence.com>"
# gpg:                 aka "Max Filippov <max.filippov@cogentembedded.com>"
# gpg:                 aka "Max Filippov <jcmvbkbc@gmail.com>"
# Primary key fingerprint: 2B67 854B 98E5 327D CDEB  17D8 51F9 CC91 F83F A044

* remotes/xtensa/tags/20181001-xtensa:
  target/xtensa: extract gen_check_interrupts call
  target/xtensa: make rsr/wsr helpers return void
  target/xtensa: extract unconditional TB termination via slot 0
  target/xtensa: always end TB on CCOUNT access/CCOMPARE write
  target/xtensa: change SR number checks to assertions
  target/xtensa: extract unconditional TB termination
  target/xtensa: extract test for division by zero
  target/xtensa: extract test for cpdisabled exception
  target/xtensa: extract test for alloca exception
  target/xtensa: extract test for window underflow exception
  target/xtensa: extract test for window overflow exception
  target/xtensa: extract test for debug exception
  target/xtensa: extract test for syscall instruction
  target/xtensa: extract test for privileged instruction
  target/xtensa: extract test for an illegal instruction

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Update dtc/libfdt submodule to v1.4.7

dtc v1.4.7 contains a bunch of improvements to make libfdt safer against
handling a corrupted or malicious tree, which is a good thing to have. It
also includes an explicit fdt checking function that we'll be wanting in
future.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/xtensa: extract gen_check_interrupts call

- mark instructions that affect active IRQ level;
- put call for gen_check_interrupts right after the instruction
translation; when FLIX is enabled it will need to appear before
other exits from the TB as well;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: make rsr/wsr helpers return void

Now that all logic for TB termination is extracted from rsr/wsr their
return value is not used and may be dropped.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract unconditional TB termination via slot 0

- mark instructions that require TB termination via slot 0;
- put TB termination right after the instruction translation loop, if
termination w/o TB linking wasn't requested;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: always end TB on CCOUNT access/CCOMPARE write

Currently we only end TB in icount mode, because access to CCOUNT or
write to CCOMPARE are IO operations. Simplify the behaviour a bit and
end TB unconditionally.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: change SR number checks to assertions

Opcode decoding with libisa takes care about range of valid group SRs,
like CCOMPARE, IBREAKA, DBREAKA or DBREAKC. Turn range checks in wsr
implementations into assertions.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract unconditional TB termination

- mark all instructions that exit TB and require dynamic search for the
next TB;
- put TB termination right after the instruction translation loop;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract test for division by zero

- mark quos/quou/rems/remu instructions;
- drop parameter 0 from the translate_quou and split translate_remu from
it;
- put test for division by zero exception right after the coprocessor
exception test;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract test for cpdisabled exception

- add XtensaOpcodeOps::coprocessor with bitmask of coprocessors used by
  the instruction;
- replace coprocessor id parameter of gen_check_cpenable with the
  bitmask of used coprocessors;
- collect coprocessor IDs used by an instruction in the disassembly
  loop;
- put test for coprocessor disabled exception after the alloca test;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract test for alloca exception

- mark movsp instruction;
- put test for alloca exception right after the test for window
underflow;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract test for window underflow exception

- mark retw and retw.n instructions;
- extract window inderflow test from retw helper;
- put underflow exception check generation right after the overflow
check;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract test for window overflow exception

- add ps.callinc to the TB flags, that allows testing all instructions
  for window overflow statically;
- drop gen_window_check* functions; replace them with get_window_check
  that accepts bitmask of used registers;
- add XtensaOpcodeOps::test_overflow that returns bitmask of implicitly
  used registers; use it for entry and call{,x}{4,8,12};
- drop window overflow test from the entry helper;
- drop parameter 0 from translate_[di]cache and use translate_nop for
  d/i cache opcodes that don't need memory accessibility check;
- add bitmask XtensaOpcodeOps::windowed_register_op that marks opcode
  arguments that refer to windowed registers;
- translate windowed_register_op mask to a mask of actually used
  registers in the disassembly loop;
- add check for window overflow right after the check for debug
  exception;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract test for debug exception

- mark break and break.n instructions;
- collect debug cause bits from parameter 0 of instructions marked for
debug exception;
- put debug exception check right after syscall check;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract test for syscall instruction

- mark syscall instruction;
- put syscall exception check right after privileged exception check;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract test for privileged instruction

- mark privileged instructions;
- put single privileged instruction check after disassembly loop;
- translate_[di]cache: drop parameter 0, shift parameters one down;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: extract test for an illegal instruction

- TB flags: add XTENSA_TBFLAG_CWOE that corresponds to the architectural
CWOE state;
- entry: move CWOE check from the helper to the test_ill_entry;
- retw: move CWOE check from the helper to the test_ill_retw;
- separate instruction disassembly loop and translation loop; save
disassembly results in local array;

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- qcow2 cache option default changes (Linux: 32 MB maximum, limited by
  whatever cache size can be made use of with the specific image;
  default cache-clean-interval of 10 minutes)
- reopen: Allow specifying unchanged child node references, and changing
  a few generic options (discard, detect-zeroes)
- Fix werror/rerror defaults for -device drive=<node-name>
- Test case fixes

# gpg: Signature made Mon 01 Oct 2018 18:17:35 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream: (23 commits)
  tests/test-bdrv-drain: Fix too late qemu_event_reset()
  test-replication: Lock AioContext around blk_unref()
  qcow2: Fix cache-clean-interval documentation
  block-backend: Set werror/rerror defaults in blk_new()
  qcow2: Explicit number replaced by a constant
  qcow2: Set the default cache-clean-interval to 10 minutes
  qcow2: Resize the cache upon image resizing
  qcow2: Increase the default upper limit on the L2 cache size
  qcow2: Assign the L2 cache relatively to the image size
  qcow2: Avoid duplication in setting the refcount cache size
  qcow2: Make sizes more humanly readable
  include: Add a lookup table of sizes
  qcow2: Options' documentation fixes
  block: Allow changing 'detect-zeroes' on reopen
  block: Allow changing 'discard' on reopen
  file-posix: Forbid trying to change unsupported options during reopen
  block: Forbid trying to change unsupported options during reopen
  block: Allow child references on reopen
  block: Don't look for child references in append_open_options()
  block: Remove child references from bs->{options,explicit_options}
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tests/test-bdrv-drain: Fix too late qemu_event_reset()

qemu_event_reset() must be called before the AIO request in a different
iothread is submitted. Otherwise the request could be completed before
we do the qemu_event_reset() and the test would hang in
qemu_event_wait().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Tested-by: Max Reitz <mreitz@redhat.com>

test-replication: Lock AioContext around blk_unref()

Recently, the test case has started failing because some job related
functions want to drop the AioContext lock even though it hasn't been
taken:

    (gdb) bt
    #0  0x00007f51c067c9fb in raise () from /lib64/libc.so.6
    #1  0x00007f51c067e77d in abort () from /lib64/libc.so.6
    #2  0x0000558c9d5dde7b in error_exit (err=<optimized out>, msg=msg@entry=0x558c9d6fe120 <__func__.18373> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
    #3  0x0000558c9d6b5263 in qemu_mutex_unlock_impl (mutex=mutex@entry=0x558c9f3999a0, file=file@entry=0x558c9d6fd36f "util/async.c", line=line@entry=516) at util/qemu-thread-posix.c:96
    #4  0x0000558c9d6b0565 in aio_context_release (ctx=ctx@entry=0x558c9f399940) at util/async.c:516
    #5  0x0000558c9d5eb3da in job_completed_txn_abort (job=0x558c9f68e640) at job.c:738
    #6  0x0000558c9d5eb227 in job_finish_sync (job=0x558c9f68e640, finish=finish@entry=0x558c9d5eb8d0 <job_cancel_err>, errp=errp@entry=0x0) at job.c:986
    #7  0x0000558c9d5eb8ee in job_cancel_sync (job=<optimized out>) at job.c:941
    #8  0x0000558c9d64d853 in replication_close (bs=<optimized out>) at block/replication.c:148
    #9  0x0000558c9d5e5c9f in bdrv_close (bs=0x558c9f41b020) at block.c:3420
    #10 bdrv_delete (bs=0x558c9f41b020) at block.c:3629
    #11 bdrv_unref (bs=0x558c9f41b020) at block.c:4685
    #12 0x0000558c9d62a3f3 in blk_remove_bs (blk=blk@entry=0x558c9f42a7c0) at block/block-backend.c:783
    #13 0x0000558c9d62a667 in blk_delete (blk=0x558c9f42a7c0) at block/block-backend.c:402
    #14 blk_unref (blk=0x558c9f42a7c0) at block/block-backend.c:457
    #15 0x0000558c9d5dfcea in test_secondary_stop () at tests/test-replication.c:478
    #16 0x00007f51c1f13178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    #17 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    #18 0x00007f51c1f1337b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
    #19 0x00007f51c1f13552 in g_test_run_suite () from /lib64/libglib-2.0.so.0
    #20 0x00007f51c1f13571 in g_test_run () from /lib64/libglib-2.0.so.0
    #21 0x0000558c9d5de31f in main (argc=<optimized out>, argv=<optimized out>) at tests/test-replication.c:581

It is yet unclear whether this should really be considered a bug in the
test case or whether blk_unref() should work for callers that haven't
taken the AioContext lock, but in order to fix the build tests quickly,
just take the AioContext lock around blk_unref().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Fix cache-clean-interval documentation

Fixing cache-clean-interval documentation following the recent change to
a default of 600 seconds on supported plarforms (only Linux currently).

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-backend: Set werror/rerror defaults in blk_new()

Currently, the default values for werror and rerror have to be set
explicitly with blk_set_on_error() by the callers of blk_new(). The only
caller actually doing this is blockdev_init(), which is called for
BlockBackends created using -drive.

In particular, anonymous BlockBackends created with
-device ...,drive=<node-name> didn't get the correct default set and
instead defaulted to the integer value 0 (= BLOCKDEV_ON_ERROR_REPORT).
This is the intended default for rerror anyway, but the default for
werror should be BLOCKDEV_ON_ERROR_ENOSPC.

Set the defaults in blk_new() instead so that they apply no matter what
way the BlockBackend was created.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>

Merge remote-tracking branch 'remotes/kraxel/tags/ui-20181001-pull-request' into staging

ui: some small fixes/improvements.

# gpg: Signature made Mon 01 Oct 2018 11:42:16 BST
# gpg:                using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/ui-20181001-pull-request:
  gtk: add zoom-to-fit to gtk options.
  vnc: call sasl_server_init() only when required
  sdl2: show console #0 unconditionally

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kraxel/tags/usb-20181001-pull-request' into staging

usb: fixes for mtp, hub and ohci.

# gpg: Signature made Mon 01 Oct 2018 10:28:36 BST
# gpg:                using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/usb-20181001-pull-request:
  ohci: set effectively usb frame rate to 1kHz
  usb-hub: clear suspend on detach
  usb-mtp: reset ObjectInfo dataset size on cleanup
  doc: replace x-root with rootdir for usb-mtp
  usb-mtp: fix error conditions for write operation

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

qcow2: Explicit number replaced by a constant

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Set the default cache-clean-interval to 10 minutes

The default cache-clean-interval is set to 10 minutes, in order to lower
the overhead of the qcow2 caches (before the default was 0, i.e.
disabled).

* For non-Linux platforms the default is kept at 0, because
cache-clean-interval is not supported there yet.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Resize the cache upon image resizing

The caches are now recalculated upon image resizing. This is done
because the new default behavior of assigning L2 cache relatively to
the image size, implies that the cache will be adapted accordingly
after an image resize.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Increase the default upper limit on the L2 cache size

The upper limit on the L2 cache size is increased from 1 MB to 32 MB
on Linux platforms, and to 8 MB on other platforms (this difference is
caused by the ability to set intervals for cache cleaning on Linux
platforms only).

This is done in order to allow default full coverage with the L2 cache
for images of up to 256 GB in size (was 8 GB). Note, that only the
needed amount to cover the full image is allocated. The value which is
changed here is just the upper limit on the L2 cache size, beyond which
it will not grow, even if the size of the image will require it to.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Assign the L2 cache relatively to the image size

Sufficient L2 cache can noticeably improve the performance when using
large images with frequent I/O.

Previously, unless 'cache-size' was specified and was large enough, the
L2 cache was set to a certain size without taking the virtual image size
into account.

Now, the L2 cache assignment is aware of the virtual size of the image,
and will cover the entire image, unless the cache size needed for that is
larger than a certain maximum. This maximum is set to 1 MB by default
(enough to cover an 8 GB image with the default cluster size) but can
be increased or decreased using the 'l2-cache-size' option. This option
was previously documented as the *maximum* L2 cache size, and this patch
makes it behave as such, instead of as a constant size. Also, the
existing option 'cache-size' can limit the sum of both L2 and refcount
caches, as previously.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Avoid duplication in setting the refcount cache size

The refcount cache size does not need to be set to its minimum value in
read_cache_sizes(), as it is set to at least its minimum value in
qcow2_update_options_prepare().

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Make sizes more humanly readable

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

include: Add a lookup table of sizes

Adding a lookup table for the powers of two, with the appropriate size
prefixes. This is needed when a size has to be stringified, in which
case something like '(1 * KiB)' would become a literal '(1 * (1L << 10))'
string. Powers of two are used very often for sizes, so such a table
will also make it easier and more intuitive to write them.

This table is generatred using the following AWK script:

BEGIN {
suffix="KMGTPE";
for(i=10; i<64; i++) {
val=2**i;
s=substr(suffix, int(i/10), 1);
n=2**(i%10);
pad=21-int(log(n)/log(10));
printf("#define S_%d%siB %*d\n", n, s, pad, val);
}
}

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Options' documentation fixes

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Allow changing 'detect-zeroes' on reopen

'detect-zeroes' is one of the basic BlockdevOptions available for all
drivers, but it's not handled by bdrv_reopen_prepare(), so any attempt
to change it results in an error:

(qemu) qemu-io virtio0 "reopen -o detect-zeroes=on"
Cannot change the option 'detect-zeroes'

Since there's no reason why we shouldn't allow changing it and the
implementation is simple let's just do it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Allow changing 'discard' on reopen

'discard' is one of the basic BlockdevOptions available for all
drivers, but it's not handled by bdrv_reopen_prepare() so any attempt
to change it results in an error:

(qemu) qemu-io virtio0 "reopen -o discard=on"
Cannot change the option 'discard'

Since there's no reason why we shouldn't allow changing it and the
implementation is simple let's just do it.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

file-posix: Forbid trying to change unsupported options during reopen

The file-posix code is used for the "file", "host_device" and
"host_cdrom" drivers, and it allows reopening images. However the only
option that is actually processed is "x-check-cache-dropped", and
changes in all other options (e.g. "filename") are silently ignored:

(qemu) qemu-io virtio0 "reopen -o file.filename=no-such-file"

While we could allow changing some of the other options, let's keep
things as they are for now but return an error if the user tries to
change any of them.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Forbid trying to change unsupported options during reopen

The bdrv_reopen_prepare() function checks all options passed to each
BlockDriverState (in the reopen_state->options QDict) and makes all
necessary preparations to apply the option changes requested by the
user.

Options are removed from the QDict as they are processed, so at the
end of bdrv_reopen_prepare() only the options that can't be changed
are left. Then a loop goes over all remaining options and verifies
that the old and new values are identical, returning an error if
they're not.

The problem is that at the moment there are options that are removed
from the QDict although they can't be changed. The consequence of this
is any modification to any of those options is silently ignored:

(qemu) qemu-io virtio0 "reopen -o discard=on"

This happens when all options from bdrv_runtime_opts are removed
from the QDict but then only a few of them are processed. Since
it's especially important that "node-name" and "driver" are not
changed, the code puts them back into the QDict so they are checked
at the end of the function. Instead of putting only those two options
back into the QDict, this patch puts all unprocessed options using
qemu_opts_to_qdict().

update_flags_from_options() also needs to be modified to prevent
BDRV_OPT_CACHE_NO_FLUSH, BDRV_OPT_CACHE_DIRECT and BDRV_OPT_READ_ONLY
from going back to the QDict.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Allow child references on reopen

In the previous patches we removed all child references from
bs->{options,explicit_options} because keeping them is useless and
wrong.

Because of this, any attempt to reopen a BlockDriverState using a
child reference as one of its options would result in a failure,
because bdrv_reopen_prepare() would detect that there's a new option
(the child reference) that wasn't present in bs->options.

But passing child references on reopen can be useful. It's a way to
specify a BDS's child without having to pass recursively all of the
child's options, and if the reference points to a different BDS then
this can allow us to replace the child.

However, replacing the child is something that needs to be implemented
case by case and only when it makes sense. For now, this patch allows
passing a child reference as long as it points to the current child of
the BlockDriverState.

It's also important to remember that, as a consequence of the
previous patches, this child reference will be removed from
bs->{options,explicit_options} after the reopening has been completed.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Don't look for child references in append_open_options()

In the previous patch we removed child references from bs->options, so
there's no need to look for them here anymore.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Remove child references from bs->{options,explicit_options}

Block drivers allow opening their children using a reference to an
existing BlockDriverState. These references remain stored in the
'options' and 'explicit_options' QDicts, but we don't need to keep
them once everything is open.

What is more important, these values can become wrong if the children
change:

    $ qemu-img create -f qcow2 hd0.qcow2 10M
    $ qemu-img create -f qcow2 hd1.qcow2 10M
    $ qemu-img create -f qcow2 hd2.qcow2 10M
    $ $QEMU -drive if=none,file=hd0.qcow2,node-name=hd0 \
            -drive if=none,file=hd1.qcow2,node-name=hd1,backing=hd0 \
            -drive file=hd2.qcow2,node-name=hd2,backing=hd1

After this hd2 has hd1 as its backing file. Now let's remove it using
block_stream:

    (qemu) block_stream hd2 0 hd0.qcow2

Now hd0 is the backing file of hd2, but hd2's options QDicts still
contain backing=hd1.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

file-posix: x-check-cache-dropped should default to false on reopen

The default value of x-check-cache-dropped is false. There's no reason
to use the previous value as a default in raw_reopen_prepare() because
bdrv_reopen_queue_child() already takes care of putting the old
options in the BDRVReopenState.options QDict.

If x-check-cache-dropped was previously set but is now missing from
the reopen QDict then it should be reset to false.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>