Avi Kivity [Sun, 14 Aug 2011 04:04:49 +0000 (07:04 +0300)]
posix-aio-compat: fix latency issues
In certain circumstances, posix-aio-compat can incur a lot of latency:
- threads are created by vcpu threads, so if vcpu affinity is set,
aio threads inherit vcpu affinity. This can cause many aio threads
to compete for one cpu.
- we can create up to max_threads (64) aio threads in one go; since a
pthread_create can take around 30μs, we have up to 2ms of cpu time
under a global lock.
Fix by:
- moving thread creation to the main thread, so we inherit the main
thread's affinity instead of the vcpu thread's affinity.
- if a thread is currently being created, and we need to create yet
another thread, let thread being born create the new thread, reducing
the amount of time we spend under the main thread.
- drop the local lock while creating a thread (we may still hold the
global mutex, though)
Note this doesn't eliminate latency completely; scheduler artifacts or
lack of host cpu resources can still cause it. We may want pre-allocated
threads when this cannot be tolerated.
Thanks to Uli Obergfell of Red Hat for his excellent analysis and suggestions.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Nicholas Thomas [Mon, 15 Aug 2011 09:00:34 +0000 (10:00 +0100)]
block/curl: Handle failed reads gracefully.
Current behaviour if a read fails is for the acb to not get finished.
This causes an infinite loop in bdrv_read_em (block.c). The read failure
never gets reported to the guest and if the error condition clears, the
process never recovers.
With this patch, when curl reports a failure we finish the acb as a
failure. This results in the guest receiving an I/O error (rather than
the read hanging indefinitely) and if the error condition subsequently
clears, retries work as expected.
The simplest test is to put an ISO on a web server you have control over
and open it with qemu-io. Then move the ISO out of the way and attempt
to read some data - you should see behaviour matching the above.
Signed-off-by: Nick Thomas <nick@bytemark.co.uk> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Kevin Wolf [Mon, 8 Aug 2011 12:09:12 +0000 (14:09 +0200)]
qemu-img: Use qemu_blockalign
Now that you can use cache=none for the output file in qemu-img, we should
properly align our buffers so that raw-posix doesn't have to use its (smaller)
bounce buffer.
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Philipp Hahn [Thu, 4 Aug 2011 17:22:10 +0000 (19:22 +0200)]
qcow2: Fix DEBUG_* compilation
By introducing BlockDriverState compiling qcow2 with DEBUG_ALLOC and DEBUG_EXT
defined got broken.
Define a BdrvCheckResult structure locally which is now needed as the second
argument.
Also fix qcow2_read_extensions() needing BDRVQcowState.
Signed-off-by: Philipp Hahn <hahn@univention.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Stefan Hajnoczi [Thu, 4 Aug 2011 11:26:52 +0000 (12:26 +0100)]
block: add cache=directsync parameter to -drive
This patch adds -drive cache=directsync for O_DIRECT | O_SYNC host file
I/O with no disk write cache presented to the guest.
This mode is useful when guests may not be sending flushes when
appropriate and therefore leave data at risk in case of power failure.
When cache=directsync is used, write operations are only completed to
the guest when data is safely on disk.
This new mode is like cache=writethrough but it bypasses the host page
cache.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This is a microblaze target specific function that belongs outside
of xilinx.h (which is a collection of target independent device model
instantiator functions)
Signed-off-by: Peter A. G. Crosthwaite <peter.crosthwaite@petalogix.com> Signed-off-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
Jan Kiszka [Mon, 22 Aug 2011 16:35:25 +0000 (18:35 +0200)]
Replace qemu_system_cond with VCPU stop mechanism
We can express the VCPU thread wakeup with the stop mechanism, saving
both qemu_system_ready and the qemu_system_cond. For KVM threads, we can
just enter the main loop as long as the thread is stopped. The central
TCG thread is better held back before the loop as there can be side
effects of the services called even when all CPUs are stopped.
Creating VCPUs in stopped state will also be required for proper CPU
hotplugging support.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Jan Kiszka [Mon, 22 Aug 2011 17:12:12 +0000 (19:12 +0200)]
vga: Use linear mapping + dirty logging in chain 4 memory access mode
Most VGA memory access modes require MMIO handling as they demand weird
logic to get a byte from or into the video RAM. However, there is one
exception: chain 4 mode with all memory planes enabled for writing. This
mode actually allows lineary mapping, which can then be combined with
dirty logging to accelerate KVM.
This patch accelerates specifically VBE accesses like they are used by
grub in graphical mode. Not only the standard VGA adapter benefits from
this, also vmware and spice in VGA mode.
CC: Gerd Hoffmann <kraxel@redhat.com> CC: Avi Kivity <avi@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Jan Kiszka [Mon, 22 Aug 2011 17:12:11 +0000 (19:12 +0200)]
vmware-vga: Eliminate vga_dirty_log_restart
After the conversion to the new Memory API, vga_dirty_log_restart became
seriously pointless. Remove it from vmware-vga and and then finally drop
the service.
CC: Andrzej Zaborowski <balrogg@gmail.com> CC: Avi Kivity <avi@redhat.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Jan Kiszka [Mon, 22 Aug 2011 15:46:03 +0000 (17:46 +0200)]
Do not kick vcpus in TCG mode
In TCG mode, iothread and vcpus run in lock-step. So it's pointless to
send a signal from qemu_cpu_kick to the vcpu thread - if we got here,
the receiver already left the vcpu loop.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Jan Kiszka [Mon, 22 Aug 2011 15:46:02 +0000 (17:46 +0200)]
Poll main loop after I/O events were received
Polling until select returns empty fdsets helps to reduce the switches
between iothread and vcpus. The benefit of this patch is best visible
when running an SMP guest on an SMP host in emulation mode.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Jan Kiszka [Mon, 22 Aug 2011 15:46:01 +0000 (17:46 +0200)]
Do not drop global mutex for polled main loop runs
If we call select without a timeout, it's more efficient to keep the
global mutex locked as we may otherwise just play ping pong with a
vcpu thread contending for it. This is particularly important for TCG
mode where we run in lock-step with the vcpu thread.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
The legacy functions that we're wrapping expect that offset
to be included in the register. Indeed, they generally
expect the absolute address and then mask off the "high" bits.
The FDC is the first converted device with a non-zero offset.
Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Avi Kivity <avi@redhat.com>
Anthony Liguori [Mon, 22 Aug 2011 16:14:56 +0000 (11:14 -0500)]
memory: temporarily suppress the subregion collision warning
After 312b4234, the APIC and PCI devices are colliding with each other. This
is harmless in practice because the APIC accesses are special cased and never
make there way onto the bus.
Avi is working on a proper fix, but until that's ready, avoid printing the
warning.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Avi Kivity [Mon, 15 Aug 2011 14:17:38 +0000 (17:17 +0300)]
440fx: fix PAM, PCI holes
The current implementation of PAM and the PCI holes is broken in several
ways:
- PCI BARs are not restricted to the PCI hole (a BAR may hide memory)
- PCI devices do not respect PAM (if a PCI device maps a region while
PAM maps the region to RAM, the request will be honored)
This patch fixes things by introducing a pci address space, and using
memory region aliases to represent PAM regions, SMRAM, and PCI holes.
The memory hierarchy looks something like
system_memory
|
+--- low memory alias (0-0xe0000000)
| |
| +-- ram@0
|
+--- high memory alias (0x100000000-EOM)
| |
| +-- ram@0xe0000000
|
+--- pci hole alias (end of low memory-0x100000000)
| |
| +-- pci@end-of-low-memory
|
|
+--- pam[n] (0xc0000-0xc3fff etc) (when set to pci, priority 1)
| |
| +-- pci@0xc4000 etc
|
+--- smram (0xa0000-0xbffff) (when set to pci/vga, priority 1)
|
+-- pci@0xa0000 etc
Avi Kivity [Mon, 15 Aug 2011 14:17:34 +0000 (17:17 +0300)]
sysbus: remove sysbus_init_mmio_cb()
This problem with this function is that it is not reversible - it is
impossible to know where things are registered and unregister them
exactly. As there are no more users, we can remove it.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Avi Kivity [Mon, 15 Aug 2011 14:17:29 +0000 (17:17 +0300)]
sysbus: add a variant of sysbus_init_mmio_cb with an unmap callback
sysbus_init_mmio_cb() uses the destructive IO_MEM_UNASSIGNED to remove a
region. Provide an alternative that calls an unmap callback, so the removal
may be done non-destructively.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Brad [Sun, 14 Aug 2011 00:30:14 +0000 (20:30 -0400)]
Improvements to libtool support.
Improvements to the libtool support in QEMU. Replace hard coded
libtool in the infrastructure with $(LIBTOOL) and allow
overriding the libtool binary used via the configure
script.
Reviewed-by: Andreas F=E4rber <andreas.faerber@web.de> Signed-off-by: Brad Smith <brad@comstyle.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Bjørn Mork [Wed, 17 Aug 2011 09:03:14 +0000 (11:03 +0200)]
e1000: use MII status register for link up/down
Some guests will use the standard MII status register
to verify link state. They will not notice link changes
unless this register is updated.
Verified with Linux 3.0 and Windows XP guests.
Without this patch, ethtool will report speed and duplex as
unknown when the link is down, but still report the link as
up. This is because the Linux e1000 driver checks the
mac_reg[STATUS] register link state before it checks speed
and duplex, but uses the phy_reg[PHY_STATUS] register for
the actual link state check. Fix by updating both registers
on link state changes.
Linux guest before:
(qemu) set_link e1000.0 off
kvm-sid:~# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: umbg
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
(qemu) set_link e1000.0 on
Linux guest after:
(qemu) set_link e1000.0 off
[ 63.384221] e1000: eth0 NIC Link is Down
kvm-sid:~# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: umbg
Wake-on: d
Current message level: 0x00000007 (7)
drv probe link
Link detected: no
(qemu) set_link e1000.0 on
[ 84.304582] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Michael Roth [Thu, 11 Aug 2011 20:38:12 +0000 (15:38 -0500)]
guest agent: remove uneeded dependencies
This patch tries to cull any uneeded library dependencies from the guest
agent to improve portability across various distros. We do so by being
as explicit as possible about in-tree dependencies rather than relying
on existing *-obj-y targets, and by manually setting LIBS for the
qemu-ga target to avoid pulling in LIBS_TOOLS libraries discovered by
configure.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Michael Roth [Thu, 11 Aug 2011 20:38:11 +0000 (15:38 -0500)]
guest agent: remove g_strcmp0 usage
g_strcmp0 isn't in all version of glib 2.0, so don't use it to avoid
build breakage on older distros.
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Avi Kivity [Thu, 11 Aug 2011 07:40:26 +0000 (10:40 +0300)]
memory: crack wide ioport accesses into smaller ones when needed
The memory API supports cracking wide accesses into narrower ones
when needed; but this was no implemented for the pio address space,
causing lsi53c895a's IO BAR to malfunction.
Fix by correctly cracking wide accesses when needed.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Avi Kivity [Thu, 11 Aug 2011 07:40:25 +0000 (10:40 +0300)]
memory: abstract cracking of write access ops into a function
The memory API automatically cracks large reads and writes into smaller
ones when needed. Factor out this mechanism, which is now duplicated between
memory reads and memory writes, into a function.
Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>