git.proxmox.com Git - qemu.git/log

migration: do not sent zero pages in bulk stage

during bulk stage of ram migration if a page is a
zero page do not send it at all.
the memory at the destination reads as zero anyway.

even if there is an madvise with QEMU_MADV_DONTNEED
at the target upon receipt of a zero page I have observed
that the target starts swapping if the memory is overcommitted.
it seems that the pages are dropped asynchronously.

this patch also updates QMP to return the number of
skipped pages in MigrationStats.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: add an indicator for bulk state of ram migration

the first round of ram transfer is special since all pages
are dirty and thus all memory pages are transferred to
the target. this patch adds a boolean variable to track
this stage.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: search for zero instead of dup pages

virtually all dup pages are zero pages. remove
the special is_dup_page() function and use the
optimized buffer_find_nonzero_offset() function
instead.

here buffer_find_nonzero_offset() is used directly
to avoid the unnecssary additional checks in
buffer_is_zero().

raw performace gain checking 1 GByte zeroed memory
over is_dup_page() is approx. 10-12% with SSE2
and 8-10% with unsigned long arithmedtic.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

bitops: unroll while loop in find_next_bit()

this patch adopts the loop unrolling idea of bitmap_is_zero() to
speed up the skipping of large areas with zeros in find_next_bit().

this routine is extensively used to find dirty pages in
live migration.

testing only the find_next_bit performance on a zeroed bitfield
the loop onrolling decreased executing time by approx. 50% on x86_64.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>

buffer_is_zero: use vector optimizations if possible

performance gain on SSE2 is approx. 20-25%. altivec
is not tested. performance for unsigned long arithmetic
is unchanged.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Orit Wasserman <owasserm@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

cutils: add a function to find non-zero content in a buffer

this adds buffer_find_nonzero_offset() which is a SSE2/Altivec
optimized function that searches for non-zero content in a
buffer.

the function starts full unrolling only after the first few chunks have
been checked one by one. analyzing real memory page data has revealed
that non-zero pages are non-zero within the first 256-512 bits in
most cases. as this function is also heavily used to check for zero memory
pages this tweak has been made to avoid the high setup costs of the fully
unrolled check for non-zero pages.

due to the optimizations used in the function there are restrictions
on buffer address and search length. the function
can_use_buffer_find_nonzero_content() can be used to check if
the function can be used safely.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>

move vector definitions to qemu-common.h

vector optimizations will now be used at various places
not just in is_dup_page() in arch_init.c

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Juan Quintela <quintela@redhat.com>

savevm: Fix bugs in the VMSTATE_VBUFFER_MULTIPLY definition

The VMSTATE_BUFFER_MULTIPLY macro is misnamed - it actually specifies
a variably sized buffer with VMS_VBUFFER, so should be named
VMSTATE_VBUFFER_MULTIPLY. This patch fixes this (the macro had no current
users under either name).

In addition, unlike the other VMSTATE_VBUFFER variants, this macro did not
specify VMS_POINTER. This patch fixes this bug as well.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Juan Quintela <quintela@redhat.com>

savevm: Add VMSTATE_STRUCT_VARRAY_POINTER_UINT32

Currently the savevm code contains a VMSTATE_STRUCT_VARRAY_POINTER_INT32
helper (a variably sized array with the number of elements in an int32_t),
but not VMSTATE_STRUCT_VARRAY_POINTER_UINT32 (... with the number of
elements in a uint32_t). This patch (trivially) fixes the deficiency.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Juan Quintela <quintela@redhat.com>

savevm: Add VMSTATE_FLOAT64 helpers

The current savevm code includes VMSTATE helpers for a number of commonly
used data types, but not for the float64 type used by the internal floating
point emulation code. This patch fixes the deficiency.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Juan Quintela <quintela@redhat.com>

savevm: Add VMSTATE_UINTTL_EQUAL helper

This adds an _EQUAL VMSTATE helper for target_ulongs, defined in terms of
VMSTATE_UINT32_EQUAL or VMSTATE_UINT64_EQUAL as appropriate.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Juan Quintela <quintela@redhat.com>

savevm: Add VMSTATE_UINT64_EQUAL helpers

The savevm code already includes a number of *_EQUAL helpers which act as
sanity checks verifying that the configuration of the saved state matches
that of the machine we're loading into to work. Variants already exist
for 8 bit 16 bit and 32 bit integers, but not 64 bit integers. This patch
fills that hole, adding a UINT64 version.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: Improve QMP documentation

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

Merge remote-tracking branch 'stefanha/net' into staging

# By Dmitry Fleytman (5) and others
# Via Stefan Hajnoczi
* stefanha/net:
  net: increase buffer size to accommodate Jumbo frame pkts
  VMXNET3 device implementation
  Packet abstraction for VMWARE network devices
  Common definitions for VMWARE devices
  net: iovec checksum calculator
  Checksum-related utility functions
  net: use socket_set_nodelay() for -netdev socket

Merge remote-tracking branch 'stefanha/block' into staging

# By Liu Yuan (1) and Stefan Weil (1)
# Via Stefan Hajnoczi
* stefanha/block:
block: Add options QDict to bdrv_file_open() prototypes (fix MinGW build)
rbd: fix compile error

Merge remote-tracking branch 'kraxel/ipxe.3' into staging

# By Gerd Hoffmann
# Via Gerd Hoffmann
* kraxel/ipxe.3:
ipxe: update binaries
ipxe: disable two second timeout

glib: add a compatibility interface for g_timeout_add_seconds

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

gtk: Release modifier when graphic console loses keyboard focus

This solves, e.g., sticky ALT when selecting a GTK menu, switching to a
different window or selecting a different virtual console.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-id: 514F417A.6010908@web.de
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

net: increase buffer size to accommodate Jumbo frame pkts

Socket buffer sizes were hard-coded to 4K for VDE and socket netdevs.  Bump this
up to 68K (ala tap netdev) to handle maximum GSO packet size (64k) plus plenty
of room for the ethernet and virtio_net headers.

Originally, ran into this limitation when using -netdev UDP sockets to connect
VM-to-VM, where VM interface is configure with MTU=9000.  (Using virtio_net
NIC model).  Test is simple: ping -M do -s 8500 <target>.  This test will
attempt to ping with unfragmented packet of given size.  Without patch, size
is limited to < 4K (minus protocol hdrs).  With patch, ping test works with pkt
size up to 9000 (again, minus protocol hdrs).

v2: per Stefan, increase buf size to (4096+65536) as done in tap and apply
    to vde and socket netdevs.
v1: increase buf size to 12K just for -netdev UDP sockets

Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

VMXNET3 device implementation

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Yan Vugenfirer <yan@daynix.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Packet abstraction for VMWARE network devices

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Yan Vugenfirer <yan@daynix.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Common definitions for VMWARE devices

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Yan Vugenfirer <yan@daynix.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

net: iovec checksum calculator

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Yan Vugenfirer <yan@daynix.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Checksum-related utility functions

net_checksum_add_cont()
checksum calculation for scattered data with odd chunk sizes

net_raw_checksum()
checksum calculation for a buffer

Signed-off-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Yan Vugenfirer <yan@daynix.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

net: use socket_set_nodelay() for -netdev socket

Reduce -netdev socket latency by disabling the Nagle algorithm on
SOCK_STREAM sockets in net/socket.c. Since we are tunelling Ethernet
over TCP we shouldn't artificially delay outgoing packets, let the guest
decide packet scheduling.

I already get sub-millisecond -netdev socket ping times on localhost, so
there was no measurable difference in my testing. This won't hurt
though and may improve remote socket performance.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Daniel P. Berrange <berrange@redhat.com>

block: Add options QDict to bdrv_file_open() prototypes (fix MinGW build)

The new parameter is unused yet.

This part was missing in commit 787e4a8500020695eb391e2f1cc4767ee071d441.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

rbd: fix compile error

Commit 787e4a85 [block: Add options QDict to bdrv_file_open() prototypes] didn't
update rbd.c accordingly.

Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

ipxe: update binaries

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

ipxe: disable two second timeout

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Remove device_tree.o from hw/moxie/Makefile.objs.

Here's a fix for the build problem identified by Aurelien Jarno here:
http://lists.gnu.org/archive/html/qemu-devel/2013-03/msg04177.html

Signed-off-by: Anthony Green <green@moxielogic.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

tcg-optimize: Fold sub r,0,x to neg r,x

Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

target-i386: Don't modify env->eflags around cpu_dump_state

We can compute the value in cpu_dump_state anyway, and gratuitous
modifications to eflags creates heisenbugs.

Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

target-i386: Fix flags computation for ADOX

When starting from CC_OP_DYNAMIC, and issuing adox before adcx,
a typo used the wrong value for the resulting CC_OP.

Cc: Blue Swirl <blauwirbel@gmail.com>
Reported-by: Torbjorn Granlund <tg@gmplib.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

Add top level changes for moxie

Signed-off-by: Anthony Green <green@moxielogic.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

Add sample moxie system

Signed-off-by: Anthony Green <green@moxielogic.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

Add moxie disassembler

Signed-off-by: Anthony Green <green@moxielogic.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

Add moxie target code

Signed-off-by: Anthony Green <green@moxielogic.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>

Merge branch 'for-upstream' of git://github.com/mwalle/qemu

* 'for-upstream' of git://github.com/mwalle/qemu:
  configure: rename OpenGL feature to GLX
  configure: proper OpenGL/GLX probe
  target-lm32: use HELPER() macro
  target-lm32: flush tlb after clearing env
  target-lm32: remove dead code
  target-lm32: fix cmpgui and cmpgeui opcodes
  tests: tcg: lm32: add more test cases
  target-lm32: don't log cpu state in translation
  lm32_uart: fix receive buffering
  milkymist-uart: fix receive buffering
  lm32-dis: fix NULL pointer dereference
  target-lm32: fix debug memory access

Merge branch 'ppc-for-upstream' of git://github.com/agraf/qemu

* 'ppc-for-upstream' of git://github.com/agraf/qemu: (58 commits)
  target-ppc: Use NARROW_MODE macro for tlbie
  target-ppc: Use NARROW_MODE macro for addresses
  target-ppc: Use NARROW_MODE macro for comparisons
  target-ppc: Use NARROW_MODE macro for branches
  target-ppc: Fix add and subf carry generation in narrow mode
  target-ppc: Use QOM method dispatch for MMU fault handling
  target-ppc: Move ppc tlb_fill implementation into mmu_helper.c
  target-ppc: Split user only code out of mmu_helper.c
  mmu-hash64: Implement Virtual Page Class Key Protection
  mmu-hash*: Merge translate and fault handling functions
  mmu-hash*: Don't use full ppc_hash{32, 64}_translate() path for get_phys_page_debug()
  mmu-hash*: Correctly mask RPN from hash PTE
  mmu-hash*: Clean up real address calculation
  mmu-hash*: Clean up PTE flags update
  mmu-hash64: Factor SLB N bit into permissions bits
  mmu-hash*: Clean up permission checking
  mmu-hash32: Remove nx from context structure
  mmu-hash*: Don't update PTE flags when permission is denied
  mmu-hash32: Don't look up page tables on BAT permission error
  mmu-hash32: Cleanup BAT lookup
  ...

tcg: Fix occasional TCG broken problem when ldst optimization enabled

is_tcg_gen_code() checks the upper limit of TCG generated code range wrong, so
that TCG could get broken occasionally only when CONFIG_QEMU_LDST_OPTIMIZATION
enabled. The reason is code_gen_buffer_max_size does not cover the upper range
up to (TCG_MAX_OP_SIZE * OPC_BUF_SIZE), thus code_gen_buffer_max_size should be
modified to code_gen_buffer_size.

CC: qemu-stable@nongnu.org
Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>

Merge remote-tracking branch 'kwolf/for-anthony' into staging

# By Kevin Wolf (12) and Peter Lieven (2)
# Via Kevin Wolf
* kwolf/for-anthony:
  nbd: Check against invalid option combinations
  nbd: Use default port if only host is specified
  block: Allow omitting the file name when using driver-specific options
  block: Make find_image_format safe with NULL filename
  block: Rename variable to avoid shadowing
  block: Introduce .bdrv_parse_filename callback
  nbd: Accept -drive options for the network connection
  nbd: Remove unused functions
  nbd: Keep hostname and port separate
  qemu-socket: Make socket_optslist public
  block: Pass bdrv_file_open() options to block drivers
  block: Add options QDict to bdrv_file_open() prototypes
  block: complete all IOs before resizing a device
  Revert "block: complete all IOs before .bdrv_truncate"

Merge remote-tracking branch 'stefanha/trivial-patches' into staging

# By liguang (2) and others
# Via Stefan Hajnoczi
* stefanha/trivial-patches:
  qdev: remove redundant abort()
  gitignore: ignore more files
  Use proper term in TCG README
  serial: Fix debug format strings
  Fix typos and misspellings
  Advertise --libdir in configure --help output
  memory: fix a bug of detection of memory region collision
  MinGW: Replace setsockopt by qemu_setsocketopt

Merge remote-tracking branch 'cohuck/virtio-ccw-upstr' into staging

# By Cornelia Huck
# Via Cornelia Huck
* cohuck/virtio-ccw-upstr:
  virtio-ccw, s390-virtio: Use generic virtio-blk macro.
  s390-virtio, virtio-ccw: Add config_wce for virtio-blk.
  virtio-ccw: Add missing blk chs properties.

nbd: Check against invalid option combinations

A file name may only specified if no host or socket path is specified.
The latter two may not appear at the same time either.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

nbd: Use default port if only host is specified

The URL method already takes care to apply the default port when none is
specfied. Directly specifying driver-specific options required the port
number until now. Allow leaving it out and apply the default.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

block: Allow omitting the file name when using driver-specific options

After this patch, using -drive with an empty file name continues to open
the file if driver-specific options are used. If no driver-specific
options are specified, the semantics stay as it was: It defines a drive
without an inserted medium.

In order to achieve this, bdrv_open() must be made safe to work with a
NULL filename parameter. The assumption that is made is that only block
drivers which implement bdrv_parse_filename() support using driver
specific options and could therefore work without a filename. These
drivers must make sure to cope with NULL in their implementation of
.bdrv_open() (this is only NBD for now). For all other drivers, the
block layer code will make sure to error out before calling into their
code - they can't possibly work without a filename.

Now an NBD connection can be opened like this:

qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

block: Make find_image_format safe with NULL filename

In order to achieve this, the .bdrv_probe callbacks of all drivers must
cope with this. The DMG driver is the only one that bases its decision
on the filename and it needs to be changed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

block: Rename variable to avoid shadowing

bdrv_open() uses two different variables called options. Rename one of
them to avoid confusion and to allow the outer one to be accessed
everywhere.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

block: Introduce .bdrv_parse_filename callback

If a driver needs structured data and not just a string, it can provide
a .bdrv_parse_filename callback now that parses the command line string
into separate options. Keeping this separate from .bdrv_open_filename
ensures that the preferred way of directly specifying the options always
works as well if parsing the string works.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

nbd: Accept -drive options for the network connection

The existing parsers for the file name now parse everything into the
bdrv_open() options QDict. Instead of using these parsers, you can now
directly specify the options on the command line, like this:

qemu-system-x86_64 -drive file=nbd:,file.port=1234,file.host=::1

Clearly the file=... part could use further improvement, but it's a
start.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

nbd: Remove unused functions

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

nbd: Keep hostname and port separate

The NBD block supports an URL syntax, for which a URL parser returns
separate hostname and port fields. It also supports the traditional qemu
syntax encoded in a filename. Until now, after parsing the URL to get
each piece of information, a new string is built to be fed to socket
functions.

Instead of building a string in the URL case that is immediately parsed
again, parse the string in both cases and use the QemuOpts interface to
qemu-sockets.c.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

qemu-socket: Make socket_optslist public

Allow other users to create the QemuOpts needed for inet_connect_opts().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

block: Pass bdrv_file_open() options to block drivers

Specify -drive file.option=... on the command line to pass the option to
the protocol instead of the format driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

block: Add options QDict to bdrv_file_open() prototypes

The new parameter is unused yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

block: complete all IOs before resizing a device

this patch ensures that all pending IOs are completed
before a device is resized. this is especially important
if a device is shrinked as it the bdrv_check_request()
result is invalidated.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Revert "block: complete all IOs before .bdrv_truncate"

brdv_truncate() is also called from readv/writev commands on self-
growing file based storage. this will result in requests waiting
for theirselves to complete.

This reverts commit 9a665b2b8640e464f0a778216fc2dca8d02acf33.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qdev: remove redundant abort()

Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

gitignore: ignore more files

ignore *.patch, *.gcda, *.gcno

Signed-off-by: liguang <lig.fnst@cn.fujitsu.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

Use proper term in TCG README

In TCG, "target" means the host architecture for which TCG generates
the code. Using "guest" rather than "target" to make the document more
consistent.

Signed-off-by: Chen Wei-Ren <chenwj@iis.sinica.edu.tw>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

target-ppc: Use NARROW_MODE macro for tlbie

Removing conditional compilation in the process.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: Use NARROW_MODE macro for addresses

Removing conditional compilation in the process.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: Use NARROW_MODE macro for comparisons

Removing conditional compilation in the process.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: Use NARROW_MODE macro for branches

Removing conditional compilation in the process.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: Fix add and subf carry generation in narrow mode

The set of computations used in b5a73f8d8a57e940f9bbeb399a9e47897522ee9a
are only valid if the current word size == target_long size. This failed
to take ppc64 in 32-bit (narrow) mode into account.

Add a NARROW_MODE macro to avoid conditional compilation.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: Use QOM method dispatch for MMU fault handling

After previous cleanups, the many scattered checks of env->mmu_model in
the ppc MMU implementation have, at least for "classic" hash MMUs been
reduced (almost) to a single switch at the top of
cpu_ppc_handle_mmu_fault().

An explicit switch is still a pretty ugly way of handling this though.  Now
that Andreas Färber's CPU QOM cleanups for ppc have gone in, it's quite
straightforward to instead make the handle_mmu_fault function a QOM method
on the CPU object.

This patch implements such a scheme, initializing the method pointer at
the same time as the mmu_model variable.  We need to keep the latter around
for now, because of the MMU types (BookE, 4xx, et al) which haven't been
converted to the new scheme yet, and also for a few other uses.  It would
be good to clean those up eventually.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: Move ppc tlb_fill implementation into mmu_helper.c

For softmmu builds the interface from the generic code to the target
specific MMU implementation is through the tlb_fill() function. For ppc
this is currently in mem_helper.c, whereas it would make more sense in
mmu_helper.c. This patch moves it, which also allows
cpu_ppc_handle_mmu_fault() to become a local function in mmu_helper.c

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: Split user only code out of mmu_helper.c

mmu_helper.c is, for obvious reasons, almost entirely concerned with
softmmu builds of qemu.  However, it does contain one stub function which
is used when CONFIG_USER_ONLY=y - the user only versoin of
cpu_ppc_handle_mmu_fault, which always triggers an exception.  The entire
rest of the file is surrounded by #if !defined(CONFIG_USER_ONLY).

We clean this up by moving the user only stub into its own new file,
removing the ifdefs and building mmu_helper.c only when CONFIG_SOFTMMU
is set.  This also lets us remove the #define of cpu_handle_mmu_fault to
cpu_ppc_handle_mmu_fault - that name is only used from generic code for
user only - so we just name our split user version by the generic name.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash64: Implement Virtual Page Class Key Protection

Version 2.06 of the Power architecture describes an additional page
protection mechanism.  Each virtual page has a "class" (0-31) recorded in
the PTE.  The AMR register contains bits which can prohibit reads and/or
writes on a class by class basis.  Interestingly, the AMR is userspace
readable and writable, however user mode writes are masked by the contents
of the UAMOR which is privileged.

This patch implements this protection mechanism, along with the AMR and
UAMOR SPRs.  The architecture also specifies a hypervisor-privileged AMOR
register which masks user and supervisor writes to the AMR and UAMOR.  We
leave this out for now, since we don't at present model hypervisor mode
correctly in any case.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: fix 32-bit hosts]
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Merge translate and fault handling functions

ppc_hash{32,64}_handle_mmu_fault() is now the only caller of
ppc_hash{32,64{_translate(), so this patch combines them together. This
means that instead of one returning a variety of non-obvious error codes
which then get translated into the various mmu exception conditions, we can
just generate the exceptions as we discover problems in the translation
path. This also removes the last usage of mmu_ctx_hash{32,64}.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Don't use full ppc_hash{32, 64}_translate() path for get_phys_page_debug()

Currently the hash mmu versionsof get_phys_page_debug() use the same
ppc64_hash64_translate() function to do the translation logic as the normal
mm fault handler code.

That sounds like a good idea, but has some complications. The debug path
doesn't need, or even want some parts of the full translation path, like
permissions checking. Furthermore, the pte flags update included in the
normal path means that the debug call is not quite side effect free.

This patch, therefore, reimplements get_phys_page_debug as the minimal
required subset of the full translation path.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>`z
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Correctly mask RPN from hash PTE

BEHAVIOUR CHANGE

At present we take the whole of word 1 of the hash PTE as the real page
number used to calculate the translated address.  This is incorrect,
because it leaves the flags from the low bits of PTE word 1 in place in the
rpm.  We mostly get away with that because the value is later masked by
TARGET_PAGE_MASK.

More recent 64-bit CPUs also have a small number of flag bits (PP0 and
KEY) in the top bits of PTE word 1.  Any guest which used those bits would
fail with the current code.

This patch fixes the problem by correctly masking out the RPN field of
PTE word 1.  This is safe, even for older CPUs which didn't have PP0 and
KEY, because although the RPN notionally extended to the very top of PTE
word 1, none of those CPUs actually implemented that many real address
bits.

We add analogous masking to the 32-bit code, even though it also doesn't
have the high flag bits, for consistency and clarity.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Clean up real address calculation

More recent 64-bit hash MMUs support multiple page sizes, and PTEs for
large pages only include the offset of the whole large page.  But the qemu
tlb only handles pages of the base size (4k) so we need to break up the
large pages into 4k pieces for the qemu tlb.  To do that we have a somewhat
awkward piece of code that adds the folds address bits 4k and the page size
from the virtual address into the real address from the pte.

This patch simplifies this redefining the raddr output of
ppc_hash64_translate() to be the full real address of the faulting address,
rather than just the (4k) page offset.  Computing that turns out to be
simpler, and is fine for the caller, since it already masks with
TARGET_PAGE_MASK before inserting into the qemu tlb.

The multiple page size complication doesn't exist for 32-bit hash mmus, but
we make an analogous cleanup there for consistency.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Clean up PTE flags update

Currently the ppc_hash{32,64}_pte_update_flags() helper functions update a
PTE's referenced and changed bits as necessary to reflect the access. It
is somewhat long winded, though. This patch open codes them in their
(single) callers, in a simpler way.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash64: Factor SLB N bit into permissions bits

BEHAVIOUR CHANGE

Currently, for 64-bit hash mmu, the execute protection bit placed into the
qemu tlb is based only on the N (No execute) bit from the PTE. However,
No Execute can also be set at the segment level. We do check this on
execute faults, but this still means we could incorrectly allow execution
of code from a No Execute segment, if a prior read or write fault caused
the page to be loaded into the qemu tlb with PROT_EXEC set.

To correct this, we (re-)check the segment level no execute permission when
generating the protection bits for the qemu tlb.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Clean up permission checking

Currently checking of PTE permission bits is split messily amongst
ppc_hash{32,64}_pp_check(), ppc_hash{32,64}_check_prot() and their callers.
This patch cleans this up to have the new function
ppc_hash{32,64}_pte_prot() compute the page permissions from the SLBE (for
64-bit) or segment register (32-bit) and the pte. A greatly simplified
version of the actual permissions check is then open coded in the callers.

The 32-bit version of ppc_hash32_pte_prot() is implemented in terms of
ppc_hash32_pp_prot(), a renamed and slightly cleaned up version of the old
ppc_hash32_pp_check(), which is also used for checking BAT permissions on
the 601.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash32: Remove nx from context structure

Previous cleanups have meant the nx field of the mmu_ctx_hash32 structure
is now only used within ppc_hash32_translate(), and so it can be replaced
by a local variable.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Don't update PTE flags when permission is denied

BEHAVIOUR CHANGE

Currently if ppc_hash{32,64}_translate() finds a PTE matching the given
virtual address, it will always update the PTE's R & C (Referenced and
Changed) bits.  This happens even if the PTE's permissions mean we are
about to deny the translation.

This is clearly a bug, although we get away with it because:
  a) It will only incorrectly set, never reset the bits, which should not
cause guest correctness problems.
  b) Linux guests never use the R & C bits anyway.

This patch fixes the behaviour, only updating R & C when access is granted
by the PTE.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash32: Don't look up page tables on BAT permission error

BEHAVIOUR CHANGE

Currently, on any failure translating an address with BATs, we proceed to
normal segment and page table translation.  That's incorrect if the
BAT error was due to permissions, rather than not finding a matching BAT.
We've gotten away with it because a guest would not usually put
translations for the same address in both BATs and page table.  Nonetheless
this patch corrects the logic, only doing page table lookup if no BAT
is found.  A matching BAT with bad permissions will now correctly trigger
an exception.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash32: Cleanup BAT lookup

This patch makes a general cleanup of the ppc_hash32_get_bat() function,
renaming it to ppc_hash32_bat_lookup(). In particular, the new function
only looks for a matching BAT, with the permissions check from the old
function moved to the caller.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash32: Clean up BAT matching logic

The code to search for a matching BAT for a virtual address is somewhat
longwinded and awkward. In particular, it relies on seperate size and
validity information being returned from the hash32_bat_size() function
(and 601 specific variant).

We simplify this by having hash32_bat_size() return instead a mask of the
virtual address bits to match, and 0 for invalid (since a BAT can never
match the entire address space).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash32: Split BAT size logic from permissions logic

hash32_bat_size_prot() and its 601 variant, as the name suggests, returns
both a BAT's size - needed to search for a matching BAT - and its
permissions, only relevant once a matching BAT has been located.

There's no particular advantage to combining these, so we split these roles
into seperate functions for clarity.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash32: Remove odd pointer usage from BAT code

In the code for handling BATs, the hash32_bat_size_prot() and
hash32_bat_601_size_prot() functions are passed the BAT contents by
reference (pointer) for no clear reason, since they only need the values
within.

This patch removes this odd usage, and uses the resulting change to clean
up the caller slightly.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Fold pte_check*() logic into caller

With previous cleanups made, the 32-bit and 64-bit pte_check*() functions
are pretty trivial and only have one call site. This patch therefore
clarifies the overall code flow by folding those functions into their
call site.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash64: Clean up ppc_hash64_htab_lookup()

This patch makes a general cleanup of the address mangling logic in
ppc_hash64_htab_lookup(). In particular it now avoids repeatedly switching
on the segment size. The lack of SLB and multiple segment sizes on 32-bit
means an analogous cleanup is not needed there.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Remove permission checking from find_pte{32, 64}()

find_pte{32,64}() are poorly named, since they both find a PTE and do
permissions checking of it. This patch makes them only locate a matching
PTE, moving the permission checking and other logic to the caller. We
rename the resulting search functions ppc_hash{32,64}_htab_lookup().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Make find_pte{32, 64} do more of the job of finding ptes

find_pte{32,64}() are not particularly well named. They only "find" a PTE
within a given PTE group, and they also do permissions checking and other
things.

This patch makes it somewhat close to matching the name, by folding the
search of both primary and secondary hash bucket into it, along with the
various address bit shuffling to determine the right hash buckets.

In the 32-bit case we also remove the code for splitting large pages into
4k pieces for the qemu tlb, since no 32-bit hash MMUs support multiple page
sizes.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Separate PTEG searching from permissions checking

find_pte{32,64{() do several things.  First they search through a PTEG
ooking for a PTE matching our virtual address.  Then they do permissions
checking and other processing on that PTE.

This patch separates the search by VA out from the rest.  The search is
combined with the pte{32,64}_match() functions into new
ppc_has{32,64}_pteg_search() functions.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Don't keep looking for PTEs after we find a match

BEHAVIOUR CHANGE

The ppc hash mmu hashes each virtual address to a primary and secondary
possible hash bucket (aka PTE group or PTEG) each with 8 PTEs.  Then we
need a linear search through the PTEs to find the correct one for the
virtual address we're translating.

It is a programming error for the guest to insert multiple PTEs mapping the
same virtual address into a PTEG - in this case the ppc architecture says
the MMU can either act as if just one was present, or give a machine check.
Currently our code takes the first matching PTE in a PTEG if it finds a
successful translation.  But if a matching PTE is found, but permission
bits don't allow the access, we keep looking through the PTEG, checking
that any other matching PTEs contain an identical translation.

That behaviour is perhaps not exactly wrong, but it's certainly not useful.
This patch changes it to always just find the first matching PTE in a PTEG.

In addition, if we get a permissions problem on the primary PTEG, we then
search the secondary PTEG.  This is incorrect - a permission denying PTE
in the primary PTEG should not be overwritten by an access granting PTE in
the secondary (although again, it would be a programming error for the
guest to set up such a situation anyway).  So additionally we update the
code to only search the secondary PTEG if no matching PTE is found in the
primary at all.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Cleanup segment-level NX check

On the ppc hash mmus, no-execute can be set at the segment level (on more
recent 64-bit hash mmus it can also be set at the page level). This patch
separates out this check to make it clearer what is going on, and avoiding
excessive indentation of the remaining translation code.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash32: Split direct store segment handling into a helper

This further separates the unusual case handling of direct store segments
from the main translation path by moving its logic into a helper function,
with some tiny cleanups along the way.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash32: Split out handling of direct store segments

At present a large chunk of ppc_hash32_translate() is taken up with an
ugly if selecting between direct store segments (hardly ever used) and
normal paged segments. This patch clarifies the flow of code by
handling direct store segments immediately then returning, leaving the
straight line code to describe the normal MMU path.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Combine ppc_hash{32, 64}_get_physical_address and get_segment{32, 64}()

After previous work, ppc_hash{32,64}_get_physical_address() are almost
trivial wrappers around get_segment{32,64}() which does nearly all the work of
translating an address according to the hash mmu model. Therefore combine the
two functions into one, under the better name of
ppc_hash{32,64}_translate().

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Remove eaddr field from mmu_ctx_hash{32, 64}

The eaddr field of mmu_ctx_hash{32,64} is effectively just used to pass the
effective address from get_segment{32,64}() to find_pte{32,64}(). Just
pass it as a normal parameter instead.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash64: Remove nx from mmu_ctx_hash64

The nx field in mmu_ctx_hash64 is used in two different functions.  But its
used for slightly different things in each place, and the value is never
propagated between them.  In other words, it might as well be two local
variables.  This patch makes it so.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Reduce use of access_type

In ppc env->access_type is updated by e.g. integer load/stores with
ACCESS_INT floating point load/stores with ACCESS_FLOAT and so forth.  In
hash mmu fault paths it can also b set to ACCESS_CODE for instruction
fetch accesses.

But the only place which uses anything more of the access_type than
whether it is instruction fetch or data access is the direct store segment
handling.  Instruction versus data access can be more simply determined
from the rw value passed down from the top.

This changes the code to use rw in preference to checking access_type.
For the 32-bit case there is a small amount of code (for direct store
segments) that still needs the full access type.  Instead of passing it
all the way down the stack, we retrieve it from the env structure, which
is where it came anyway, before this patch.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Add hash pte load/store helpers

On real hardware the ppc hash page table is stored in memory; accordingly
our mmu emulation code can read a hash page table in guest memory.  But,
when paravirtualized under PAPR, the real hash page table is in host
memory, accessible to the guest only via hypercalls.  We model this by
also allowing the MMU emulation code to access a specially allocated hash
page table outside the guest's memory image. At present these two options
are implemented with some ugly conditionals at each access point in the mmu
emulation code.  In the implementation of the PAPR hypercalls, we assume
the external hash table.

This patch cleans things up by adding helpers to load and store from the
hash table for both 32-bit and 64-bit hash mmus.  The 64-bit versions
handle both the in-guest-memory and outside guest memory cases.  The 32-bit
versions only handle the in-guest-memory case since no 32-bit systems can
have an external hash table at present.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

mmu-hash*: Add header file for definitions

Currently cpu.h contains a number of definitions relating to the 64-bit
hash MMU. Some are used in the MMU emulation code, but some are only used
in the spapr MMU management hcall implementations.

This patch moves these definitions (except for a few that are needed
more widely) into mmu-hash64.h header, shared between the MMU emulation
code and the spapr hcall code. The MMU emulation code is also updated to
actually use a number of those definitions in place of hard coded
constants.

Similarly, we add new analogous definitions to mmu-hash32.h and use those
in place of many hard-coded constants in mmu-hash32.c

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[agraf: fix 32-bit hosts]
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: mmu_ctx_t should not be a global type

mmu_ctx_t is currently defined in cpu.h.  However it is used for temporary
information relating to mmu translation, and is only used in mmu_helper.c
and (now) mmu-hash{32,64}.c.  Furthermore it contains information which
should be specific to particular MMU types.  Therefore, move its definition
to mmu_helper.c.  mmu-hash{32,64}.c are converted to use new data types
private to the relevant MMUs (identical to mmu_ctx_t for now, but that will
change in future patches).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>

target-ppc: Disentangle BAT code for 32-bit hash MMUs

The functions for looking up BATs (Block Address Translation - essentially
a level 0 TLB) are shared between the classic 32-bit hash MMUs and the
6xx style software loaded TLB implementations.

This patch splits out a copy for the 32-bit hash MMUs, to facilitate
cleaning it up. The remaining version is left, but cleaned up slightly
to no longer deal with PowerPC 601 peculiarities (601 has a hash MMU).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexander Graf <agraf@suse.de>