git.proxmox.com Git - qemu.git/log

qcow2: Refactor qcow2_free_any_clusters

Zero clusters will add another cluster type. Refactor the open-coded
cluster type detection into a switch of QCOW2_CLUSTER_* options so that
the detection is in a single place. This makes it easier to add new
cluster types.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
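
Note: the idea of detecting the cluster type in one place can be sketched as follows. This is only an illustration, not the patch itself. The flag values, the mask and all names are assumptions, and the zero type is included only to show how a later type would slot in.

```c
#include <stdint.h>

/* Assumed values for illustration; check the qcow2 headers for the real ones. */
#define OFLAG_COPIED     (1ULL << 63)
#define OFLAG_COMPRESSED (1ULL << 62)
#define OFLAG_ZERO       (1ULL << 0)
#define L2E_OFFSET_MASK  0x00fffffffffffe00ULL

enum cluster_type {
    CLUSTER_UNALLOCATED,
    CLUSTER_NORMAL,
    CLUSTER_COMPRESSED,
    CLUSTER_ZERO
};

/* The only place that inspects the flag bits of an L2 entry. */
enum cluster_type get_cluster_type(uint64_t l2_entry)
{
    if (l2_entry & OFLAG_COMPRESSED) {
        return CLUSTER_COMPRESSED;
    } else if (l2_entry & OFLAG_ZERO) {
        return CLUSTER_ZERO;
    } else if (!(l2_entry & L2E_OFFSET_MASK)) {
        return CLUSTER_UNALLOCATED;
    }
    return CLUSTER_NORMAL;
}

/* Callers switch on the type; returns 1 if there is host data to release.
 * In this sketch zero clusters are assumed to own no host data. */
int cluster_has_data_to_free(uint64_t l2_entry)
{
    switch (get_cluster_type(l2_entry)) {
    case CLUSTER_COMPRESSED:
    case CLUSTER_NORMAL:
        return 1;
    case CLUSTER_ZERO:
    case CLUSTER_UNALLOCATED:
        return 0;
    }
    return 0;
}
```

Adding a cluster type then means touching get_cluster_type() and the switch statements, rather than every open-coded flag test.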

qcow2: Ignore reserved bits in L1/L2 entries

This changes the remaining places that assume that the only flags
are QCOW_OFLAG_COPIED and QCOW_OFLAG_COMPRESSED to properly mask out
reserved bits.

It does not convert bdrv_check yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Fail write_compressed when overwriting data

qcow2_alloc_compressed_cluster_offset() already fails if the copied flag
is set, because qcow2_write_compressed() doesn't perform COW as it would
have to do to allow this.

However, what we really want to check here is whether the cluster is
allocated or not. With internal snapshots the copied flag may not be set
on allocated clusters. Check the cluster offset instead.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
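
Note: a minimal sketch of the difference between the two checks, assuming L2 entries in host byte order. The flag value, the offset mask and the function names are illustrative assumptions.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define OFLAG_COPIED    (1ULL << 63)
#define L2E_OFFSET_MASK 0x00fffffffffffe00ULL

/* Old check: misses allocated clusters that are shared with a snapshot,
 * because those do not carry the copied flag. */
int would_overwrite_old(uint64_t l2_entry)
{
    return (l2_entry & OFLAG_COPIED) != 0;
}

/* New check: any non-zero cluster offset means the cluster is allocated. */
int would_overwrite_new(uint64_t l2_entry)
{
    return (l2_entry & L2E_OFFSET_MASK) != 0;
}
```

For an entry such as 0x20000 (allocated, copied flag clear), only the second check reports that data would be overwritten.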

qcow2: Ignore reserved bits in count_contiguous_clusters()

Until now, count_contiguous_clusters() had an argument that specified
flags to be ignored in the comparison, i.e. flags that are allowed to
change between contiguous clusters.

This patch changes the function so that it ignores all flags by default
now and you need to pass the flags on which it should stop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
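
Note: the inverted flag handling can be sketched like this. It is a simplified illustration that assumes host byte order entries and an assumed offset mask; the signature is not the one used in the tree.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define OFLAG_COPIED    (1ULL << 63)
#define L2E_OFFSET_MASK 0x00fffffffffffe00ULL

/* Count how many entries, starting at l2_table[0], describe clusters that
 * are contiguous in the image file. Only the offset bits and the bits in
 * stop_flags take part in the comparison; all other bits are ignored. */
int count_contiguous(int nb_clusters, uint64_t cluster_size,
                     const uint64_t *l2_table, uint64_t stop_flags)
{
    uint64_t mask = stop_flags | L2E_OFFSET_MASK;
    uint64_t first = l2_table[0] & mask;
    int i;

    if (first == 0) {
        return 0;
    }
    for (i = 0; i < nb_clusters; i++) {
        uint64_t expected = first + (uint64_t)i * cluster_size;
        if ((l2_table[i] & mask) != expected) {
            break;
        }
    }
    return i;
}
```

Passing 0 as stop_flags compares offsets only; passing OFLAG_COPIED additionally stops at the first entry whose copied flag differs from the first one.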

qcow2: Ignore reserved bits in get_cluster_offset

With this change, reading from a qcow2 image ignores all reserved bits
that are set in an L1 or L2 table entry.

Now get_cluster_offset() assigns *cluster_offset only the offset without
any other flags. The cluster type is no longer encoded in the offset,
but returned as a positive return value in case of success.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Save disk size in snapshot header

This allows different snapshots of an image to have different
sizes, which is a requirement for enabling image resizing even with
images that have internal snapshots.

We don't implement actual support for it now, but make sure that the
additional field is present and not completely ignored in all version 3
images. Trying to load a snapshot of a different size returns
an error.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Specification for qcow2 version 3

This updates the qcow2 specification to cover version 3. It contains the
following changes:

- Added compatible/incompatible/auto-clear feature bits plus an optional
  feature name table to allow useful error messages even if an older
  version doesn't know some feature at all.

- Configurable refcount width. If you don't want to use internal
  snapshots, make refcounts one bit and save cache space and I/O.

- Zero cluster flags. This allows discard even with a backing file that
  doesn't contain zeros. It is also useful for copy-on-read/image
  streaming, as you'll want to keep sparseness without accessing the
  remote image for an unallocated cluster all the time.

- Fixed internal snapshot metadata to use 64 bit VM state size. You
  can't save a snapshot of a VM with >= 4 GB RAM today.

- Extended internal snapshot metadata to contain the disk size, so that
  resizing images that have snapshots can be allowed in the future.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
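
Note: one way a reader of the format might treat the three feature bit classes, as a rough sketch. The struct, the function name and the error code are assumptions; the specification itself is authoritative for the exact rules.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical in-memory copy of the three feature fields. */
struct feature_bits {
    uint64_t incompatible;
    uint64_t compatible;
    uint64_t autoclear;
};

/* Returns 0 if the image may be opened, a negative errno value otherwise.
 * known_* are the bits this (hypothetical) implementation understands. */
int check_features(struct feature_bits *f, uint64_t known_incompatible,
                   uint64_t known_autoclear, int writable)
{
    /* Unknown incompatible bits: refuse to open the image. */
    if (f->incompatible & ~known_incompatible) {
        return -ENOTSUP;
    }
    /* Unknown compatible bits can be ignored safely. */

    /* Unknown auto-clear bits are dropped before the image is modified. */
    if (writable) {
        f->autoclear &= known_autoclear;
    }
    return 0;
}
```

The optional feature name table would be consulted in the first branch to print a useful name for the unknown bit.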

qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at()

Refcount block allocation and refcount table growth rely on
s->free_cluster_index pointing to somewhere after the current
allocation. Change qcow2_alloc_cluster_at() to fulfill this
assumption.

Without this change it could happen that a newly allocated refcount
block and the allocated data block point to the same area in the image
file, causing data corruption in the long run.

This fixes a bug that became first visible after commit 250196f1.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: Resolve test failures caused by hostname

`hostname -s` may output an error:
hostname: Name or service not known
This causes all tests to fail for `make check-block`.

Suppress such error messages, letting the tests succeed.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-img: let 'qemu-img convert' flush data

The 'qemu-img convert -h' output advertises that the default cache mode is
'writeback', while in fact it is 'unsafe'.

This patch 1) fixes the help text and 2) lets bdrv_close() call bdrv_flush().

2) is needed because some backend storage doesn't have a self-flush
mechanism (e.g. sheepdog), so we need to call bdrv_flush() to make
sure the image is really written to the storage instead of lingering in
the writeback cache forever.

Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio: simplify qemu_aio_wait

The do...while loop can never loop, because select will just not return
0 when invoked with infinite timeout.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

aio: return "AIO in progress" state from qemu_aio_wait

The definition of when qemu_aio_flush should loop is much simpler
than it looks. It just has to call qemu_aio_wait until it makes
no progress and all flush callbacks return false. qemu_aio_wait
is the logical place to tell the caller about this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
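
Note: the loop described above can be sketched with stand-in functions. Everything here is hypothetical; toy_aio_wait() merely plays the role of a wait function that reports whether AIO is still in progress.

```c
#include <stdbool.h>

/* Stand-in for a wait function: true means progress was made or
 * requests are still pending, false means there is nothing left to do. */
typedef bool (*aio_wait_fn)(void *opaque);

/* Flush: keep waiting until the wait function reports no more work.
 * Returns the number of productive rounds (for illustration only). */
int aio_flush_sketch(aio_wait_fn wait, void *opaque)
{
    int rounds = 0;

    while (wait(opaque)) {
        rounds++;
    }
    return rounds;
}

/* Toy wait function: completes one pending request per call. */
bool toy_aio_wait(void *opaque)
{
    int *pending = opaque;

    if (*pending == 0) {
        return false;
    }
    (*pending)--;
    return true;
}
```

With three pending toy requests the loop runs three productive rounds and then stops on the first call that reports no progress.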

aio: remove process_queue callback and qemu_aio_process_queue

Both unused after the previous patch.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

posix-aio: merge posix_aio_process_queue and posix_aio_read

posix_aio_read already calls qemu_aio_process_queue, and dually
qemu_aio_process_queue is always followed by a select loop that calls
posix_aio_read.

No races are possible, so there is no need for a separate process_queue
callback.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-tool: map vm_clock to rt_clock

QED uses vm_clock timers so that images are not touched during and after
migration. This however does not apply to qemu-io and qemu-img.
Treat vm_clock as a synonym for rt_clock there, and enable it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-io: use main_loop_wait

This will let timers run during aio_read and aio_write commands,
though not during synchronous commands.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: allow interrupting a co_sleep_ns

In the next patch we want to reenter the coroutine from
block_job_cancel_sync and cancel the timer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Fix return value of alloc_refcount_block

Someone forgot something in commit 29c1a730... Documenting the right
return value is not enough, you also need to actually return it in the
code.

This bug sometimes causes error return values even when everything has
succeeded: The new offset of the refcount block is truncated to 32 bits
and interpreted as signed. At least with small cluster sizes it's easy
to get a negative return value this way.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
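
Note: the truncation effect can be reproduced in isolation. The function names below are made up for this sketch, and the narrowing conversion is implementation-defined in C; the comments describe what common compilers such as GCC and Clang do.

```c
#include <stdint.h>

/* Buggy shape: a 64-bit file offset leaves through a 32-bit signed return
 * value. On common compilers the upper bits are dropped and bit 31 becomes
 * the sign bit. */
int32_t return_offset_truncated(int64_t offset)
{
    return (int32_t)offset;
}

/* Fixed shape: the return value only carries status (0 or negative errno),
 * the offset travels through an out parameter of the right width. */
int return_status(int64_t offset, int64_t *out_offset)
{
    *out_offset = offset;
    return 0;
}
```

An offset of 0x80000000 (2 GB) already produces a negative value in the first variant, which a caller would read as an error.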

qcow2: Fix error handling in qcow2_alloc_cluster_offset

If do_alloc_cluster_offset() fails, the error handling code tried to
remove the request from the in-flight queue, to which it wasn't added
yet, resulting in a NULL pointer dereference.

m->nb_clusters really only becomes != 0 when the request is in the list.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

ide: convert ide_sector_write() to asynchronous I/O

The IDE PIO write sector code path uses bdrv_write() and hence can make
the guest unresponsive while the I/O request is in progress. This patch
converts ide_sector_write() to use bdrv_aio_writev() by using the
BUSY_STAT bit to tell the guest that the request is in progress.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Tested-by: Richard Davies <richard@arachsys.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

ide: convert ide_sector_read() to asynchronous I/O

The IDE PIO interface currently uses bdrv_read() to perform reads
synchronously.  Synchronous I/O in the vcpu thread is bad because it
prevents the guest from executing code - it makes the guest
unresponsive.

This patch converts IDE PIO to use bdrv_aio_readv().  We simply need to
use the BUSY_STAT status so the guest knows to wait while we are busy.

The only external user of ide_sector_read() is restart behavior on I/O
errors and it is not affected by this change.  We still need to restart
I/O in the same way.

Migration is also unaffected if I understand the code correctly.  We
continue to use the same transfer function and the BUSY_STAT status
should never be migrated since we flush I/O before migrating device
state.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Tested-by: Richard Davies <richard@arachsys.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-io: Add command line switch for cache mode

To be used as in 'qemu-io -t writeback test.img'

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

block: Fix spelling in comment (ineffcient -> inefficient)

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: fix error in 005

According to the comment, we should not read again here; we should write.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Drain requests in bdrv_close

If an AIO request is in flight that refers to a BlockDriverState that
has been closed and possibly even freed, more or less anything could
happen. I have seen segfaults, -EBADF return values and qcow2 sometimes
actually catches the situation in bdrv_close() and abort()s.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

qemu-iotests: Test bdrv_close while AIO is in flight

If the BlockDriverState is closed/freed without draining the AIO
requests first, the request coroutines may work on invalid data and file
descriptors or have some dangling pointers that cause segfaults.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

qemu-iotests: Always filter cluster_size out in _make_test_img

Some image formats do have a cluster size, others don't, but there are
tests that work with both sets of images and currently we get failures
because the qemu-img create output doesn't mention the cluster size for
some formats.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Merge remote-tracking branch 'origin/master' into staging

* origin/master:
  Allow controlling volume with PulseAudio backend
  configure: pa_simple is not needed anymore
  Do not use pa_simple PulseAudio API
  audio/spice: add support for volume control
  hw/ac97: add support for volume control
  hw/ac97: the volume mask is not only 0x1f
  hw/ac97: remove USE_MIXER code
  audio: don't apply volume effect if backend has VOICE_VOLUME_CAP
  audio: add VOICE_VOLUME ctl

Merge remote-tracking branch 'spice/spice.v52' into staging

* spice/spice.v52:
  qxl-render: fix broken vnc+spice since commit f934493
  qxl: set default values of vram*_size_mb to -1
  trace-events: remove unused qxl_vga_ioport_while_not_in_vga_mode

Merge remote-tracking branch 'kraxel/usb.46' into staging

* kraxel/usb.46: (21 commits)
  usb-ehci: drop assert()
  usb-redir: Notify our peer when we reject a device due to a speed mismatch
  usb-ehci: Drop unused sofv value
  usb-host: rewrite usb_linux_update_endp_table
  usb: use USBDescriptor for endpoint descriptors.
  usb: use USBDescriptor for interface descriptors.
  usb: use USBDescriptor for config descriptors.
  usb: use USBDescriptor for device qualifier descriptors.
  usb: add USBDescriptor, use for device descriptors.
  usb-ehci: frindex always is a 14 bits counter
  usb-ehci: fix ehci_child_detach
  usb-hub: add tracepoints
  usb_packet_set_state: handle p->ep == NULL
  usb-host: add property to turn off pipelining
  usb-host: add usb packet to request tracepoints
  usb-host: trace canceled requests
  usb-host: trace emulated requests
  Add bootindex support to usb-host and usb-redir
  usb-uhci: queuing fix
  usb-uhci: stop queue filling when we find a in-flight td
  ...

qxl-render: fix broken vnc+spice since commit f934493

Notify any listeners such as vnc that the displaysurface has been
changed, otherwise they will segfault when first accessing the freed old
displaysurface data.

Signed-off-by: Alon Levy <alevy@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

qxl: set default values of vram*_size_mb to -1

The addition of those values caused a regression where not specifying
any value for the vram bar size would result in a 4096 _byte_ surface
area. This is ok for the windows driver but causes the X driver to be
unusable. Also, it's a regression. This patch returns the default
behavior of having a 64 megabyte vram BAR.

Signed-off-by: Alon Levy <alevy@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

trace-events: remove unused qxl_vga_ioport_while_not_in_vga_mode

The resulting stp file fails to load because of an unresolvable probe.

Signed-off-by: Alon Levy <alevy@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Allow controlling volume with PulseAudio backend

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>

configure: pa_simple is not needed anymore

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>

Do not use pa_simple PulseAudio API

Unfortunately, pa_simple is a limited API which doesn't let us
retrieve the associated pa_stream. It is needed to control the volume
of the stream.

In v4:
- add missing braces

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>

audio/spice: add support for volume control

Use Spice server volume control API when available.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>

hw/ac97: add support for volume control

Combine output volume with Master and PCM registers values.
Use default values in mixer_reset ().
Set volume on post-load to update backend values.

v4,v5:
- fix some code style

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>

hw/ac97: the volume mask is not only 0x1f

It varies case by case (see Table 66. AC '97 Baseline Audio Register Map)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>

hw/ac97: remove USE_MIXER code

That code doesn't compile. The interesting bits for volume control are
going to be rewritten in the following patch.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>

audio: don't apply volume effect if backend has VOICE_VOLUME_CAP

If the audio backend is capable of volume control, don't apply
software volume (mixeng_volume ()), but instead, rely on backend
volume control. This will allow guest to have full range volume
control.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>

audio: add VOICE_VOLUME ctl

Add a new PCM control operation to update the stream volume on the
audio backend. The argument given is a SWVoiceOut/SWVoiceIn.

v4:
- verified other backends didn't fail/assert on this new control;
  they arbitrarily return 0 or -1, but we ignore the return value.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: malc <av1474@comtv.ru>

seabios: update to 1.7.0

Update roms/seabios and pc-bios/bios.bin to the 1.7.0 release.
Most noticeable new feature is virtio-scsi support.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-ehci: drop assert()

Not sure what the purpose of the assert() was, in any case it is bogus.
We can arrive there if transfer descriptors passed to us from the guest
failed to pass sanity checks, i.e. it is guest-triggerable. We deal
with that case by resetting the host controller. Everything is ok, no
need to throw a core dump here.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-redir: Notify our peer when we reject a device due to a speed mismatch

Also cleanup (reset) our device state when we reject a device due to a
speed mismatch.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-ehci: Drop unused sofv value

The sofv value only ever gets a value assigned and is never used (read)
anywhere, so we can just drop it.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-host: rewrite usb_linux_update_endp_table

This patch carries a complete rewrite of the usb descriptor parser.
Changes / improvements:

* We are using the USBDescriptor struct instead of hard-coded offsets
   now to access descriptor data.
* (debug) printfs are all gone, tracepoints have been added instead.
* We don't try (and fail) to skip over unneeded descriptors.  We parse
   them all one by one.  We keep track of which configuration, interface
   and altsetting we are looking at and use this information to figure
   out which descriptors are in use and which we can ignore.
* On parse errors we clear all endpoint information, which will
   disallow any communication with the device, except control endpoint
   messages.  This makes sure we don't end up with a silly device state
   where half of the endpoints got enabled and the other half was left
   disabled.
* Some sanity checks have been added.

The new parser is more robust and also leaves complete device
information in the trace log if you enable the usb_host_parse_*
tracepoints.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb: use USBDescriptor for endpoint descriptors.

Add endpoint descriptor substruct to USBDescriptor,
use it in the descriptor generator code.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb: use USBDescriptor for interface descriptors.

Add interface descriptor substruct to USBDescriptor,
use it in the descriptor generator code.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb: use USBDescriptor for config descriptors.

Add config descriptor substruct to USBDescriptor,
use it in the descriptor generator code.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb: use USBDescriptor for device qualifier descriptors.

Add device qualifier substruct to USBDescriptor,
use it in the descriptor generator code.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb: add USBDescriptor, use for device descriptors.

This patch adds a new type for the binary representation of usb
descriptors. It is put into use for the descriptor generator code
where the struct replaces the hard-coded offsets.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-ehci: frindex always is a 14 bits counter

frindex always is a 14 bits counter, and not a 13 bits one as we were
emulating. There are some subtle hints to this in the spec, first of all
"Table 2-12. FRINDEX - Frame Index Register" says:
"Bit 13:0 Frame Index. The value in this register increments at the end of
each time frame (e.g. micro-frame). Bits [N:3] are used for the Frame List
current index. This means that each location of the frame list is accessed
8 times (frames or micro-frames) before moving to the next index. The
following illustrates values of N based on the value of the Frame List
Size field in the USBCMD register.

USBCMD[Frame List Size] Number Elements N
00b 1024 12
01b 512 11
10b 256 10
11b Reserved"

Notice how the text says that "Bits [N:3]" are used ...; it does
NOT say that when N == 12 (our case) the counter will wrap from 8191 to 0,
or in other words that it is a 13 bits counter (bits 0 - 12).

The other hint is in "Table 2-10. USBSTS USB Status Register Bit Definitions":

"Bit 3 Frame List Rollover - R/WC. The Host Controller sets this bit to a one
when the Frame List Index (see Section 2.3.4) rolls over from its maximum value
to zero. The exact value at which the rollover occurs depends on the frame
list size. For example, if the frame list size (as programmed in the Frame
List Size field of the USBCMD register) is 1024, the Frame Index Register
rolls over every time FRINDEX[13] toggles. Similarly, if the size is 512,
the Host Controller sets this bit to a one every time FRINDEX[12] toggles."

Notice how this text talks about setting bit 3 when bit 13 of frindex toggles
(when there are 1024 entries, so our case), so this indicates that frindex
has a bit 13 making it a 14 bit counter.

Besides these clear hints the real proof is in the pudding. Before this
patch I could not stream data from a USB2 webcam under Windows XP; after
this patch, using a USB2 webcam under Windows XP works fine, and no
regressions with other operating systems were seen.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
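
Note: the counter behaviour argued for above, sketched for the 1024-element frame list case (N == 12). Function and macro names are made up for illustration; this is not the emulation code.

```c
#include <stdint.h>

#define FRINDEX_MASK   0x3fffu  /* bits 13:0, a 14 bit counter */
#define FRINDEX_BIT13  0x2000u

/* Advance FRINDEX by one micro-frame. *rollover is set when FRINDEX[13]
 * toggles, which is when the Frame List Rollover status bit would be
 * raised for a 1024-element frame list. */
uint32_t frindex_advance(uint32_t frindex, int *rollover)
{
    uint32_t next = (frindex + 1) & FRINDEX_MASK;

    *rollover = ((frindex ^ next) & FRINDEX_BIT13) != 0;
    return next;
}

/* Frame list index for 1024 elements: bits [12:3] of FRINDEX. */
unsigned frame_list_index(uint32_t frindex)
{
    return (frindex >> 3) & 0x3ffu;
}
```

With a 13 bit counter the value after 0x1fff would be 0; here it is 0x2000, while the frame list index still wraps to 0 at that point.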

usb-ehci: fix ehci_child_detach

Looks like a cut+paste bug from ehci_detach. When the device itself is
detached from an ehci port (ehci_detach op) we have to clear the
device pointer for the companion port too. When a device gets removed
from a downstream port of a usb hub (ehci_child_detach op) the ehci port
where the usb hub is plugged in is not affected.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-hub: add tracepoints

Add tracepoints to the usb hub emulation.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb_packet_set_state: handle p->ep == NULL

usb_packet_set_state can be called with p->ep = NULL. The tracepoint
there tries to log endpoint information, which leads to a segfault.
This patch makes usb_packet_set_state handle the NULL pointer properly.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-host: add property to turn off pipelining

Add a property to usb-host to disable the bulk endpoint pipelining.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-host: add usb packet to request tracepoints

Add pointer to USBPacket to all tracepoints tracking requests to make it
easier to identify them when multiple requests are in flight.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-host: trace canceled requests

Add tracepoints to track canceled requests.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-host: trace emulated requests

Add tracepoint to track completion of emulated control requests.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Add bootindex support to usb-host and usb-redir

When passing through a usb pendrive seabios will present it in the F12
boot menu and will happily boot from it.

This patch adds bootorder support so you can even make it the default
boot device.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-uhci: queuing fix

When we queue up usb packets we may happen to find an already queued
packet, which also might be finished at that point already. We don't
want to continue processing the packet at this point though, so let's
just signal back we've found an in-flight packet when in queuing mode.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-uhci: stop queue filling when we find a in-flight td

Not only QHs can form rings, but TDs too.  With the new
queuing/pipelining support we are following TD chains and
can actually walk in circles.  An assert() prevents us from
entering an endless loop then.

Fix is easy:  Just stop queuing when we figure the TD we are
about to queue up is in flight already.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
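
Note: the stop condition can be sketched on a plain linked list. The struct and function are hypothetical and far simpler than the real transfer descriptor handling.

```c
#include <stddef.h>

/* Hypothetical, minimal transfer descriptor. */
struct td {
    struct td *next;  /* link to the next TD; links may form a ring */
    int in_flight;    /* set once the TD has been queued */
};

/* Follow the chain and queue every TD that is not in flight yet.
 * Stops at the end of the chain or at the first TD already in flight,
 * so a ring is walked at most once. Returns the number of TDs queued. */
int queue_td_chain(struct td *first)
{
    struct td *td;
    int queued = 0;

    for (td = first; td != NULL; td = td->next) {
        if (td->in_flight) {
            break;
        }
        td->in_flight = 1;
        queued++;
    }
    return queued;
}
```

On a ring of three TDs the walk queues three entries and stops when it comes back to the first one.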

usb/vmstate: add parent dev path

... to make vmstate id string truly unique with multiple host
controllers, i.e. move from "1/usb-ptr" to "0000:00:01.3/1/usb-ptr"
(usb tablet connected to piix3 uhci).

This obviously breaks migration.  To handle this the usb bus
property "full-path" is added.  When setting this to false old
behavior is maintained.  This way current qemu will be compatible
with old versions when started using '-M pc-$oldversion'.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
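
Note: how the path prefix could be assembled depending on the property, as a small sketch. The function name and parameters are assumptions; only the example strings come from the message above.

```c
#include <stdio.h>

/* Build the device path part of the vmstate id.
 * parent_path: path of the host controller, e.g. "0000:00:01.3"
 * port_path:   port on the usb bus, e.g. "1"
 * full_path:   mirrors the "full-path" bus property
 * Returns what snprintf() returns. */
int usb_dev_path_sketch(char *buf, size_t len, const char *parent_path,
                        const char *port_path, int full_path)
{
    if (full_path && parent_path != NULL) {
        return snprintf(buf, len, "%s/%s", parent_path, port_path);
    }
    return snprintf(buf, len, "%s", port_path);
}
```

With full_path set to 0 the result is the old, shorter path, which is what keeps migration from older versions working.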

qemu-timer.c: Remove 250us timeouts

Basically, the main wait loop calls qemu_run_all_timers() unconditionally. The
first thing this routine used to do is to see if a timer had been serviced,
and then reset the loop timeout to the next deadline.

However, the new deadlines had not been calculated at that point, as
qemu_run_timers() had not been called yet for each of the clocks. So
qemu_rearm_alarm_timer() would end up with a negative or zero deadline, and
default to setting a 250us timeout for the loop.

As qemu_run_timers() is called for each clock, the real deadlines would be put
in place, but because a loop timeout was already set, the loop timeout would
not be changed.

Once that 250us timeout fired, the real deadline would be used for the
subsequent timeout.

For idle VMs, this effectively doubles the number of times through the loop,
doubling the number of select() system calls, timer calls, etc. putting added
scheduling pressure on the kernel. And under cgroups, this really causes a big
problem because the cgroup code does not scale well.

By simply running the timers before trying to rearm the timer, we always rearm
with a non-zero deadline, effectively halving the number of system calls.

Signed-off-by: Peter Portante <pportant@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
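
Note: the ordering problem can be reproduced with a toy model. The struct, the function names and the single periodic timer are assumptions made for this sketch; only the 250us fallback is taken from the description above.

```c
#include <stdint.h>

#define MIN_REARM_NS 250000  /* the 250us fallback */

/* Toy periodic timer; period_ns must be greater than zero. */
struct sim_timer {
    int64_t expire_ns;
    int64_t period_ns;
};

/* Timeout for the next loop iteration: the nearest deadline, or the
 * 250us fallback if that deadline is zero or negative. */
int64_t rearm_timeout(const struct sim_timer *t, int n, int64_t now_ns)
{
    int64_t best = INT64_MAX;
    int i;

    for (i = 0; i < n; i++) {
        int64_t d = t[i].expire_ns - now_ns;
        if (d < best) {
            best = d;
        }
    }
    return best <= 0 ? MIN_REARM_NS : best;
}

/* Service expired timers by moving them to their next deadline. */
void run_timers(struct sim_timer *t, int n, int64_t now_ns)
{
    int i;

    for (i = 0; i < n; i++) {
        while (t[i].expire_ns <= now_ns) {
            t[i].expire_ns += t[i].period_ns;
        }
    }
}
```

Calling rearm_timeout() before run_timers() at the moment a timer expires yields the 250us fallback; calling it afterwards yields the real deadline.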

Merge remote-tracking branch 'kiszka/queues/pending' into staging

* kiszka/queues/pending:
  vapic: Disable for pre-1.1 machines
  Kick io-thread on qemu_chr_accept_input
  pcnet: Properly handle TX requests during Link Fail
  pcnet: Clear ERR in CSR0 on stop
  signrom: Rewrite as python script

Conflicts:
hw/pc_piix.c

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

Merge remote-tracking branch 'mst/tags/for_anthony' into staging

* mst/tags/for_anthony:
  pci: fix corrupted pci conf index register by unaligned write
  acpi: explicitly account for >1 device per slot
  acpi_piix4: Re-define PCI hotplug eject register read
  acpi_piix4: Remove PCI_RMV_BASE write code
  acpi_piix4: Fix PCI hotplug race
  acpi_piix4: Disallow write to up/down PCI hotplug registers
  virtio-pci: change virtio balloon PCI class code
  ivshmem: add missing msix calls
  vhost: readd assert statement
  vhost: Fix size of dirty log sync on resize
  pc: reduce duplication in compat machine types
  piix_pci: fix typo in i400FX chipset init code

Merge remote-tracking branch 'sstabellini/for_anthony' into staging

* sstabellini/for_anthony:
  xen: introduce an event channel for buffered io event notifications
  xen-mapcache: don't unmap locked entry during mapcache invalidation
  Xen, mapcache: Fix the compute of the size of bucket.
  xen: handle backend deletion from xenstore
  Xen: Add xen-apic support and hook it up.
  Xen: basic HVM MSI injection support.

vapic: Disable for pre-1.1 machines

The kvmvapic was not present in older QEMU versions, thus must be
disabled in compat machines.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

Kick io-thread on qemu_chr_accept_input

Once a chr frontend is able to receive input again, we need to inform
the io-thread about this fact. Otherwise, main_loop_wait may continue to
select without the related backend file descriptor in its set. This can
cause high input latencies if only low-rate events arrive otherwise.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

pcnet: Properly handle TX requests during Link Fail

As long as we have no link and we aren't in internal loopback mode, no
packet must be sent. Instead, LCAR needs to be set in any active TX
descriptor and also CERR in CSR0.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

pcnet: Clear ERR in CSR0 on stop

pcnet_stop already clears any reason (BABL, CERR, MISS, MERR) why ERR
(bit 15) should be set in CSR0. So we have to clear that bit as well.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

signrom: Rewrite as python script

Now that we have a hard dependency on python anyway, we can replace the
slow shell script to calculate the option ROM checksum with a fast AND
portable python version. Tested both with python 2.7 and 3.1.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
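
Note: the checksum rule itself is small. The sketch below is written in C rather than Python and is not the script from this commit. It assumes the usual option ROM layout: byte 2 holds the size in 512-byte blocks, the checksum goes in the last byte, and all bytes must sum to zero modulo 256.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of all bytes modulo 256. */
uint8_t rom_byte_sum(const uint8_t *rom, size_t size)
{
    uint8_t sum = 0;
    size_t i;

    for (i = 0; i < size; i++) {
        sum = (uint8_t)(sum + rom[i]);
    }
    return sum;
}

/* Store a checksum in the last byte so that the whole ROM sums to zero.
 * Returns 0 on success, -1 if the buffer is too small for the declared size. */
int sign_rom_sketch(uint8_t *rom, size_t len)
{
    size_t size;

    if (len < 3) {
        return -1;
    }
    size = (size_t)rom[2] * 512;
    if (size == 0 || size > len) {
        return -1;
    }
    rom[size - 1] = (uint8_t)(0u - rom_byte_sum(rom, size - 1));
    return 0;
}
```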

Merge branch 'w64' of git://qemu.weilnetz.de/qemu

* 'w64' of git://qemu.weilnetz.de/qemu:
  w64: Fix time conversion for some versions of MinGW-w64
  nbd: Fix compiler warning (w64)
  disas: Replace 'unsigned long' by 'uintptr_t'
  cpu-exec: Remove non-portable type cast and fix format string
  target-mips: Fix type cast for w64 (uintptr_t)
  w64: Fix type cast in os_host_main_loop_wait
  w64: Fix data types in softmmu*.h
  w64: Use uintptr_t in exec.c
  softmmu: Use uintptr_t for physaddr and rename it
  w64: Fix struct CPUTLBEntry
  w64: Fix definition of setjmp
  w32: Move defines for socket specific errors to qemu-os-win32.h
  w64: Use larger alignment for section with generated code
  w64: Fix data types in cpu-all.h, exec.c
  w64: Fix type casts used in some macros in cpu-all.h
  tcg/i386: Add support for w64 ABI
  tcg/i386: Use GDB JIT debugging interface only for hosts with ELF

target-alpha: QOM'ify CPU init

Move code from cpu_alpha_init() into a CPU initializer.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Richard Henderson <rth@twiddle.net>

target-alpha: QOM'ify CPU

Embed CPUAlphaState as first member of AlphaCPU.

Signed-off-by: Andreas Färber <afaerber@suse.de>
Acked-by: Richard Henderson <rth@twiddle.net>

w64: Fix time conversion for some versions of MinGW-w64

tb.time is a time value, but not necessarily of the same size as time_t:
while time_t is 64 bit for w64, tb.time still is 32 bit only.

Therefore we need an explicit conversion.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

nbd: Fix compiler warning (w64)

Portable printing of dev_offset (data type off_t) needs a type cast.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
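
Note: a portable way to print an off_t value, as a sketch. The helper name is made up; the cast plus PRId64 is one common pattern and not necessarily the exact one used in the patch.

```c
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>

/* off_t has no printf conversion of its own and its width differs between
 * hosts, so convert to a fixed-width type and use the matching macro. */
int format_dev_offset(char *buf, size_t len, off_t dev_offset)
{
    return snprintf(buf, len, "%" PRId64, (int64_t)dev_offset);
}
```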

disas: Replace 'unsigned long' by 'uintptr_t'

This is needed for w64. It changes nothing for other hosts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

cpu-exec: Remove non-portable type cast and fix format string

This change is needed for w64, but also changes the code for other hosts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

target-mips: Fix type cast for w64 (uintptr_t)

This changes nothing for other hosts.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

w64: Fix type cast in os_host_main_loop_wait

Casting a pointer to an integer must use (DWORD_PTR) instead of (DWORD).
This also matches the definition of 'fd' (gint for w32, gint64 for w64).

Signed-off-by: Stefan Weil <sw@weilnetz.de>

w64: Fix data types in softmmu*.h

w64 requires uintptr_t.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

w64: Use uintptr_t in exec.c

Replace all type casts to 'long' or 'unsigned long' by 'intptr_t' or 'uintptr_t'.

For type casts which are only used to extract the lower bits of an address
or to modify those bits, signedness does not matter. There I always use 'uintptr_t'.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
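
Note: why the type matters. On w64, long is 32 bits wide while pointers are 64 bits wide, so a cast to unsigned long drops the upper half of an address. A small sketch with made-up helper names and an assumed 4 KiB page size:

```c
#include <stdint.h>

#define SKETCH_PAGE_BITS 12  /* assumed page size of 4 KiB */
#define SKETCH_PAGE_MASK (((uintptr_t)1 << SKETCH_PAGE_BITS) - 1)

/* Low bits of an address: signedness is irrelevant, so uintptr_t is used. */
uintptr_t page_offset_of(const void *p)
{
    return (uintptr_t)p & SKETCH_PAGE_MASK;
}

/* Address with the low bits cleared; all upper bits survive the cast. */
uintptr_t page_base_of(const void *p)
{
    return (uintptr_t)p & ~SKETCH_PAGE_MASK;
}
```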

softmmu: Use uintptr_t for physaddr and rename it

Variable physaddr is a host address which should be represented by
data type 'uintptr_t'.

This is needed for w64 and changes nothing for other hosts.

v2:
Rename physaddr -> hostaddr (suggested by Blue Swirl).

Signed-off-by: Stefan Weil <sw@weilnetz.de>

w64: Fix struct CPUTLBEntry

For w64, some entries need 'uintptr_t' instead of 'unsigned long'.

For other host systems, both data types are identical, so nothing changes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

w64: Fix definition of setjmp

The default definition of setjmp which is implemented in MinGW-w64
cannot be used with programs like QEMU which call longjmp from
code without structured exception handling (SEH).

This code therefore disables stack unwinding.

We could also implement SEH for QEMU's generated JIT code, but
that is much more difficult. Stack unwinding would also cost
execution time.
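
A sketch of what such a redefinition can look like. It assumes the
two-argument _setjmp() of MinGW-w64, where a NULL frame argument turns
off unwinding; the exact form used in the patch may differ. The
functions leave_loop and run_loop are hypothetical and only show a
longjmp from a callee back to the setjmp point.

```c
#include <setjmp.h>
#include <stddef.h>

#if defined(_WIN64)
/* Assumption: MinGW-w64 _setjmp(env, frame); NULL means no unwinding. */
# undef setjmp
# define setjmp(env) _setjmp(env, NULL)
#endif

static jmp_buf exit_env;
static volatile int exit_code;

/* Leave the loop from a nested call, as generated code would. */
static void leave_loop(int code)
{
    exit_code = code;
    longjmp(exit_env, 1);
}

static int run_loop(int code)
{
    if (setjmp(exit_env) == 0) {
        leave_loop(code);
        return -1;   /* not reached */
    }
    return exit_code;
}
```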

Signed-off-by: Stefan Weil <sw@weilnetz.de>

w32: Move defines for socket specific errors to qemu-os-win32.h

As those defines are only used for w32,
they should be in the header file for w32.

All files which include slirp.h or qemu_socket.h also
include qemu-os-win32.h.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

w64: Use larger alignment for section with generated code

The MinGW-w64 compiler allows __attribute__((aligned (32))).
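
A small example of the attribute with GCC-compatible compilers. The
buffer name and size are made up for illustration and are not taken from
QEMU.

```c
#include <stdint.h>

/* Hypothetical buffer for generated code, aligned to 32 bytes. */
static uint8_t code_buffer[4096] __attribute__((aligned(32)));

/* Check whether an address is a multiple of 32. */
static int is_aligned_32(const void *p)
{
    return ((uintptr_t)p & 31) == 0;
}
```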

Signed-off-by: Stefan Weil <sw@weilnetz.de>

w64: Fix data types in cpu-all.h, exec.c

w64 needs uintptr_t instead of unsigned long.
For other hosts, nothing changes.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

w64: Fix type casts used in some macros in cpu-all.h

Instead of type casts to long, w64 needs type casts to intptr_t.
For other hosts, this changes nothing.

Signed-off-by: Stefan Weil <sw@weilnetz.de>

tcg/i386: Add support for w64 ABI

w64 uses the registers rcx, rdx, r8 and r9 for function arguments,
so it needs a different declaration of tcg_target_call_iarg_regs.

rax, rcx, rdx, r8, r9, r10 and r11 may be changed by function calls.

rbx, rbp, rdi, rsi, r12, r13, r14 and r15 remain unchanged by function calls.
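
For comparison, the integer argument registers of both ABIs can be
written as tables. This is only a sketch with register names as strings;
the enum, the arrays and iarg_reg() are hypothetical and do not mirror
the TCG declarations. The System V order is added here for contrast.

```c
#include <stddef.h>

enum host_abi { ABI_SYSV, ABI_WIN64 };

/* System V AMD64: six integer argument registers. */
static const char *const sysv_iarg_regs[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* w64: four integer argument registers. */
static const char *const win64_iarg_regs[] = {
    "rcx", "rdx", "r8", "r9"
};

/* Return the register for integer argument n, or NULL if it is on the stack. */
static const char *iarg_reg(enum host_abi abi, size_t n)
{
    if (abi == ABI_WIN64) {
        size_t count = sizeof(win64_iarg_regs) / sizeof(win64_iarg_regs[0]);
        return n < count ? win64_iarg_regs[n] : NULL;
    } else {
        size_t count = sizeof(sysv_iarg_regs) / sizeof(sysv_iarg_regs[0]);
        return n < count ? sysv_iarg_regs[n] : NULL;
    }
}
```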

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>

tcg/i386: Use GDB JIT debugging interface only for hosts with ELF

Not all i386 / x86_64 hosts use ELF.
Ask the compiler whether ELF is used.

On w64, gdb crashes when ELF_HOST_MACHINE is defined.
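
One way to ask the compiler is the predefined macro __ELF__, which GCC
sets for ELF targets. The sketch below assumes that macro; HOST_USES_ELF
and gdb_jit_supported() are names invented for this example.

```c
/* GCC predefines __ELF__ when the target object format is ELF. */
#ifdef __ELF__
# define HOST_USES_ELF 1
#else
# define HOST_USES_ELF 0
#endif

/* Only offer the GDB JIT interface when the host format is ELF. */
static int gdb_jit_supported(void)
{
    return HOST_USES_ELF;
}
```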

Cc: Blue Swirl <blauwirbel@gmail.com>
Acked-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>

Merge branch 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf

* 'ppc-for-upstream' of git://repo.or.cz/qemu/agraf:
  pseries: Fix reset of VIO network device
  pseries: Reset vscsi properly
  pseries: Correctly use the device model reset hooks
  pseries: Remove old hcalls hook stub
  pseries: Remove old debug leftovers from spapr_vscsi
  pseries: Fix RTAS based config access
  target-ppc/machine.c: Drop unnecessary ifdefs
  target-ppc: Init dcache and icache size for e500 user mode
  target-ppc: Fix type casts for w64 (uintptr_t)
  target-ppc: QOM'ify CPU reset
  target-ppc: Start QOM'ifying CPU init
  target-ppc: QOM'ify CPU
  target-ppc: Add hooks for handling tcg and kvm limitations
  target-ppc: Drop cpu_ppc_close()
  pseries: Consolidate hack for RTAS display-character usage
  pseries: Remove unused fields from VIOsPAPRBus structure
  pseries: Implement RTAS system-reboot call
  pseries: Fix bug with reset of VIO CRQs
  pseries: Clean up hcall_dprintf() debugging messages
  PPC: Fix TLB invalidation bug within the PPC interrupt handler.

pseries: Fix reset of VIO network device

Currently, the PAPR VIO network device does not have a reset handler. This
means that after a hard reset, H_REGISTER_LOGICAL_LAN will return an error
when the new guest boot attempts to initialize the device.

This patch corrects this, adding a suitable reset hook.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andreas Färber <afaerber@suse.de>

pseries: Reset vscsi properly

Currently the PAPR vscsi implementation does not properly clear its table
of request tags when the system is reset. This patch adds a reset hook
to do so.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andreas Färber <afaerber@suse.de>

pseries: Correctly use the device model reset hooks

Recently we added code to properly clean away VIO CRQs on reset.  However,
this directly uses qemu_register, rather than the existing device model
reset callbacks.  This patch cleans this up by adding proper use of the
reset hook to the VIO bus model.  The existing CRQ reset code is converted
to the new method.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andreas Färber <afaerber@suse.de>

pseries: Remove old hcalls hook stub

Some time ago we removed all use of the 'hcalls' callback in the pseries
VIO code, which was used to work around an ordering problem which has since
been solved properly. However, the function pointer for the hook remains.
This patch cleans it away.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andreas Färber <afaerber@suse.de>

pseries: Remove old debug leftovers from spapr_vscsi

The PAPR VSCSI emulation contains a few lines of code which were once used
for debug but now do nothing at all. This patch removes them.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andreas Färber <afaerber@suse.de>

pseries: Fix RTAS based config access

On the pseries platform, access to PCI config space is via RTAS calls
(which go to the hypervisor) rather than MMIO. This means we don't use
the same code path as nearly everyone else which goes through pci_host.c
and we're missing some of the parameter checking along the way.

We do have some parameter checking in the RTAS calls, but it's not enough.
It checks for overruns, but does not check for unaligned accesses or
oversized accesses (which means the guest could trigger an assertion
failure from pci_host_config_{read,write}_common()). Worse, it doesn't do
the basic checking for the number of RTAS arguments and results before
accessing them.

This patch fixes these bugs.
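
The kinds of checks described above could look roughly like this. It is
a sketch under assumptions, not the code of the patch: the function
name, its parameters and the idea of passing the expected argument
counts are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate a config space access before touching any argument:
 * argument/result counts, access size, alignment and overrun.
 */
static bool config_access_ok(uint32_t nargs, uint32_t nret,
                             uint32_t want_nargs, uint32_t want_nret,
                             uint32_t addr, uint32_t size, uint32_t limit)
{
    if (nargs != want_nargs || nret != want_nret) {
        return false;               /* wrong number of arguments/results */
    }
    if (size != 1 && size != 2 && size != 4) {
        return false;               /* oversized or odd access size */
    }
    if (addr & (size - 1)) {
        return false;               /* unaligned access */
    }
    if (addr >= limit || size > limit - addr) {
        return false;               /* overrun of the config space */
    }
    return true;
}
```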

Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[AF: Fix typos spotted by mst]
Signed-off-by: Andreas Färber <afaerber@suse.de>