git.proxmox.com Git - mirror

block: Convert bdrv_pread(v) to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_write() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_read() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Use BlockBackend for I/O in bdrv_commit()

Just like block jobs, the HMP commit command should use its own
BlockBackend for doing I/O on BlockDriverStates.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Move bdrv_commit() to block/commit.c

No code changes, just moved from one file to another.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_co_do_readv/writev to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_aio_writev() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_aio_readv() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_co_writev() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block: Convert bdrv_co_readv() to BdrvChild

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

vhdx: Some more BlockBackend use in vhdx_create()

This does some easy conversions from bdrv_* to blk_* functions in
vhdx_create(). We should avoid bypassing the BlockBackend layer whenever
possible.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

blkreplay: Convert to byte-based I/O

The blkreplay driver only forwards the requests it gets, so converting
it to byte granularity is trivial.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

vvfat: Use BdrvChild for s->qcow

vvfat uses a temporary qcow file to cache written data in read-write
mode. In order to do things properly, this should show up in the BDS
graph and I/O should go through BdrvChild like for every other node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

block/qdev: Fix NULL access when using BB twice

BlockBackend has only a single pointer to its guest device, so it makes
sure that only a single guest device is attached to it. device-add
returns an error if you try to attach a second device to a BB. In order
to make the error message nicer, -device that manually connects to a
if=none block device get a different message than -drive that implicitly
creates a guest device. The if=... option is stored in DriveInfo.

However, since blockdev-add exists, not every BlockBackend has a
DriveInfo any more. Check that it exists before we dereference it.

QMP reproducer resulting in a segfault:

{"execute":"blockdev-add","arguments":{"options":{"id":"disk","driver":"file","filename":"/tmp/test.img"}}}
{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"disk"}}
{"execute":"device_add","arguments":{"driver":"virtio-blk-pci","drive":"disk"}}

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

block: fix return code for partial write for Linux AIO

Partial write most likely means that there is not space rather than
"something wrong happens". Thus it would be more natural to return
ENOSPC rather than EINVAL.

The problem actually happens with NBD server, which has reported EINVAL
rather then ENOSPC on the first error using its protocol, which makes
report to the user wrong.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Use bool as appropriate for BDS members

Using int for values that are only used as booleans is confusing.
While at it, rearrange a couple of members so that all the bools
are contiguous.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Fix error message style

error_setg() is not supposed to be used for multi-sentence
messages; tweak the message to append a hint instead.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Move request_alignment into BlockLimit

It makes more sense to have ALL block size limit constraints
in the same struct. Improve the documentation while at it.

Simplify a couple of conditionals, now that we have audited and
documented that request_alignment is always non-zero.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Split bdrv_merge_limits() from bdrv_refresh_limits()

During bdrv_merge_limits(), we were computing initial limits
based on another BDS in two places. At first glance, the two
computations are not identical (one is doing straight copying,
the other is doing merging towards or away from zero) - but
when you realize that the first round is starting with all-0
memory, all of the merging happens to work. Factoring out the
merging makes it easier to track how two BDS limits are merged,
in case we have future reasons to merge in even more limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Drop raw_refresh_limits()

The raw block driver was blindly copying all limits from bs->file,
even though: 1. the main bdrv_refresh_limits() already does this
for many of the limits, and 2. blindly copying from the children
can weaken any stricter limits that were already inherited from
the backing chain during the main bdrv_refresh_limits(). Also,
a future patch is about to move .request_alignment into
BlockLimits, and that is a limit that should NOT be copied from
other layers in the BDS chain.

Thus, we can completely drop raw_refresh_limits(), and rely on
the block layer setting up the proper limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Switch discard length bounds to byte-based

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_discard and
discard_alignment.  Rename them, using 'pdiscard' as an aid to
track which remaining discard interfaces need conversion, and so
that the compiler will help us catch the change in semantics
across any rebased code.  The BlockLimits type is now completely
byte-based; and in iscsi.c, sector_limits_lun2qemu() is no
longer needed.

pdiscard_alignment is made unsigned (we use power-of-2 alignments
as bitmasks, where unsigned is easier to think about) while
leaving max_pdiscard signed (since we still have an 'int'
interface); this is comparable to what commit cf081fc did for
write zeroes limits.  We may later want to make everything an
unsigned 64-bit limit - but that requires a bigger code audit.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Wording tweaks to write zeroes limits

Improve the documentation of the write zeroes limits, to mention
additional constraints that drivers should observe. Worth squashing
into commit cf081fca, if that hadn't been pushed already :)

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Switch transfer length bounds to byte-based

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_transfer_length
and opt_transfer_length. Rename them (dropping the _length suffix)
so that the compiler will help us catch the change in semantics
across any rebased code, and improve the documentation. Use unsigned
values, so that we don't have to worry about negative values and
so that bit-twiddling is easier; however, we are still constrained
by 2^31 of signed int in most APIs.

When a value comes from an external source (iscsi and raw-posix),
sanitize the results to ensure that opt_transfer is a power of 2.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Set default request_alignment during bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Now that all drivers have been updated to supply an override
of request_alignment during their .bdrv_refresh_limits(), as
needed, the block layer itself can defer setting the default
alignment until part of the overall bdrv_refresh_limits().

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Set request_alignment during .bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Add a .bdrv_refresh_limits() to all four of our legacy devices
that will always be sector-only (bochs, cloop, dmg, vvfat), in
spite of their recent conversion to expose a byte interface.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

raw-win32: Set request_alignment during .bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

In this case, raw_probe_alignment() already did what we needed,
so just fix its signature and wire it in correctly.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qcow2: Set request_alignment during .bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iscsi: Set request_alignment during .bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blkdebug: Set request_alignment during .bdrv_refresh_limits()

We want to eventually stick request_alignment alongside other
BlockLimits, but first, we must ensure it is populated at the
same time as all other limits, rather than being a special case
that is set only when a block is first opened.

Note that when the user does not provide "align", then we were
defaulting to bs->request_alignment - but at this stage in the
initialization, that was always 512. We were also rejecting an
explicit "align":0 from the user; this patch now allows that,
as an explicit request for the default alignment (which may not
always be 512 in the future).

qemu-iotests 77 is particularly sensitive to the fact that we
can specify an artificial alignment override in blkdebug, and
that override must continue to work even when limits are
refreshed on an already open device.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Give nonzero result to blk_get_max_transfer_length()

Making all callers special-case 0 as unlimited is awkward,
and we DO have a hard maximum of BDRV_REQUEST_MAX_SECTORS given
our current block layer API limits.

In the case of scsi, this means that we now always advertise a
limit to the guest, even in cases where the underlying layers
previously use 0 for no inherent limit beyond the block layer.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

scsi: Advertise limits by blocksize, not 512

s->blocksize may be larger than 512, in which case our
tweaks to max_xfer_len and opt_xfer_len must be scaled
appropriately.

CC: qemu-stable@nongnu.org
Reported-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iscsi: Advertise realistic limits to block layer

The function sector_limits_lun2qemu() returns a value in units of
the block layer's 512-byte sector, and can be as large as
0x40000000, which is much larger than the block layer's inherent
limit of BDRV_REQUEST_MAX_SECTORS. The block layer already
handles '0' as a synonym to the inherent limit, and it is nicer
to return this value than it is to calculate an arbitrary
maximum, for two reasons: we want to ensure that the block layer
continues to special-case '0' as 'no limit beyond the inherent
limits'; and we want to be able to someday expand the block
layer to allow 64-bit limits, where auditing for uses of
BDRV_REQUEST_MAX_SECTORS will help us make sure we aren't
artificially constraining iscsi to old block layer limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

nbd: Advertise realistic limits to block layer

We were basing the advertisement of maximum discard and transfer
length off of UINT32_MAX, but since the rest of the block layer
has signed int limits on a transaction, nothing could ever reach
that maximum, and we risk overflowing an int once things are
converted to byte-based rather than sector-based limits. What's
more, we DO have a much smaller limit: both the current kernel
and qemu-nbd have a hard limit of 32M on a read or write
transaction, and while they may also permit up to a full 32 bits
on a discard transaction, the upstream NBD protocol is proposing
wording that without any explicit advertisement otherwise,
clients should limit ALL requests to the same limits as read and
write, even though the other requests do not actually require as
many bytes across the wire. So the better limit to tell the
block layer is 32M for both values.

Behavior doesn't actually change with this patch (the block layer
is currently ignoring the max_transfer advertisements); but when
that problem is fixed in a later series, this patch will prevent
the exposure of a latent bug.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

nbd: Allow larger requests

The NBD layer was breaking up request at a limit of 2040 sectors
(just under 1M) to cater to old qemu-nbd. But the server limit
was raised to 32M in commit 2d8214885 to match the kernel, more
than three years ago; and the upstream NBD Protocol is proposing
documentation that without any explicit communication to state
otherwise, a client should be able to safely assume that a 32M
transaction will work. It is time to rely on the larger sizing,
and any downstream distro that cares about maximum
interoperability to older qemu-nbd servers can just tweak the
value of #define NBD_MAX_SECTORS.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Fix harmless off-by-one in bdrv_aligned_preadv()

If the amount of data to read ends exactly on the total size
of the bs, then we were wasting time creating a local qiov
to read the data in preparation for what would normally be
appending zeroes beyond the end, even though this corner case
has nothing further to do.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Document supported flags during bdrv_aligned_preadv()

We don't pass any flags on to drivers to handle. Tighten an
assert to explain why we pass 0 to bdrv_driver_preadv(), and add
some comments on things to be aware of if we want to turn on
per-BDS BDRV_REQ_FUA support during reads in the future. Also,
document that we may want to consider using unmap during
copy-on-read operations where the read is all zeroes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Tighter assertions on bdrv_aligned_pwritev()

For symmetry with bdrv_aligned_preadv(), assert that the caller
really has aligned things properly. This requires adding an align
parameter, which is used now only in the new asserts, but will
come in handy in a later patch that adds auto-fragmentation to the
max transfer size, since that value need not always be a multiple
of the alignment, and therefore must be rounded down.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-img: fix failed autotests

There are 9 iotests failed on Ubuntu 15.10 at the moment.
The problem is that options parsing in qemu-img is broken by the
following commit:
    commit 10985131e337a0c52c5bd1e191fd7867a6ff8d02
    Author: Denis V. Lunev <den@openvz.org>
    Date:   Fri Jun 17 17:44:13 2016 +0300
    qemu-img: move common options parsing before commands processing

This strange command line reports error
  ./qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1024
  qemu-img: Invalid image size specified!
while original code parses it successfully.

The problem is that getopt_long state should be reset. This could be done
using this assignment according to the manual:
    optind = 0

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Eric Blake <eblake@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-ipxe-20160704-1' into staging

ipxe: update submodule from 4e03af8ec to 041863191
e1000e+vmxnet3: add boot rom

# gpg: Signature made Mon 04 Jul 2016 07:25:46 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-ipxe-20160704-1:
  build: add pc-bios to config-host.mak deps
  ipxe: add new roms to BLOBS
  ipxe: update prebuilt binaries
  vmxnet3: add boot rom
  e1000e: add boot rom
  ipxe: add vmxnet3 rom
  ipxe: add e1000e rom
  ipxe: update submodule from 4e03af8ec to 041863191

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160705' into staging

ppc patch queue for 2016-07-05

Here's the current ppc, sPAPR and related drivers patch queue.

  * The big addition is dynamic DMA window support (this includes some
    core VFIO changes)
  * There are also several fixes to the MMU emulation for bugs
    introduced with the HV mode patches
  * Several other bugfixes and cleanups

Changes in v2:
  I messed up and forgot to make a fix in the last patch which BenH
  pointed out (introduced by my rebasing).  That's fixed in this
  version, and I'm replacing the tag in place with the revised
  version.

# gpg: Signature made Tue 05 Jul 2016 06:28:58 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160705:
  ppc/hash64: Fix support for LPCR:ISL
  ppc/hash64: Add proper real mode translation support
  target-ppc: Return page shift from PTEG search
  target-ppc: Simplify HPTE matching
  target-ppc: Correct page size decoding in ppc_hash64_pteg_search()
  ppc: simplify ppc_hash64_hpte_page_shift_noslb()
  spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)
  vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)
  vfio: Add host side DMA window capabilities
  vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)
  spapr_iommu: Realloc guest visible TCE table when starting/stopping listening
  ppc: simplify max_smt initialization in ppc_cpu_realizefn()
  spapr: Ensure thread0 of CPU core is always realized first
  ppc: Fix xsrdpi, xvrdpi and xvrspi rounding

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

ppc/hash64: Fix support for LPCR:ISL

We need to ignore the segment page size and essentially treat
all pages as coming from a 4K segment.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[dwg: Adjusted for differences in my version of the prereq patches]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc/hash64: Add proper real mode translation support

This adds proper support for translating real mode addresses based
on the combination of HV and LPCR bits. This handles HRMOR offset
for hypervisor real mode, and both RMA and VRMA modes for guest
real mode. PAPR mode adjusts the offsets appropriately to match the
RMA used in TCG, but we need to limit to the max supported by the
implementation (16G).

This includes some fixes by Cédric Le Goater <clg@kaod.org>

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[dwg: Adjusted for differences in my version of the prereq patches]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target-ppc: Return page shift from PTEG search

ppc_hash64_pteg_search() now decodes a PTEs page size encoding, which it
didn't previously do. This means we're now double decoding the page size
because we check it int he fault path after ppc64_hash64_htab_lookup()
returns.

To avoid this duplication have ppc_hash64_pteg_search() and
ppc_hash64_htab_lookup() return the page size from the PTE and use that in
the callers instead of decoding again.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

target-ppc: Simplify HPTE matching

ppc_hash64_pteg_search() explicitly checks each HPTE's VALID and
SECONDARY bits, then uses the HPTE64_V_COMPARE() macro to check the B field
and AVPN. However, a small tweak to HPTE64_V_COMPARE() means we can check
all of these bits at once with a suitable ptem value. So, consolidate all
the comparisons for simplicity.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

target-ppc: Correct page size decoding in ppc_hash64_pteg_search()

The architecture specifies that when searching a PTEG for PTEs, entries
with a page size encoding that's not valid for the current segment should
be ignored, continuing the search.

The current implementation does this with ppc_hash64_pte_size_decode()
which is a very incomplete implementation of this check. We already have
code to do a full and correct page size decode in hpte_page_shift().

This patch moves hpte_page_shift() so it can be used in
ppc_hash64_pteg_search() and adjusts the latter's parameters to include
a full SLBE instead of just a segment page shift.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ppc: simplify ppc_hash64_hpte_page_shift_noslb()

The segment page shift parameter is never used. Let's remove it.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW)

This adds support for Dynamic DMA Windows (DDW) option defined by
the SPAPR specification which allows to have additional DMA window(s)

The "ddw" property is enabled by default on a PHB but for compatibility
the pseries-2.6 machine and older disable it.
This also creates a single DMA window for the older machines to
maintain backward migration.

This implements DDW for PHB with emulated and VFIO devices. The host
kernel support is required. The advertised IOMMU page sizes are 4K and
64K; 16M pages are supported but not advertised by default, in order to
enable them, the user has to specify "pgsz" property for PHB and
enable huge pages for RAM.

The existing linux guests try creating one additional huge DMA window
with 64K or 16MB pages and map the entire guest RAM to. If succeeded,
the guest switches to dma_direct_ops and never calls TCE hypercalls
(H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
and not waste time on map/unmap later. This adds a "dma64_win_addr"
property which is a bus address for the 64bit window and by default
set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware
uses and this allows having emulated and VFIO devices on the same bus.

This adds 4 RTAS handlers:
* ibm,query-pe-dma-window
* ibm,create-pe-dma-window
* ibm,remove-pe-dma-window
* ibm,reset-pe-dma-window
These are registered from type_init() callback.

These RTAS handlers are implemented in a separate file to avoid polluting
spapr_iommu.c with PCI.

This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs
and updates all references to dma_liobn. However this does not add
64bit LIOBN to the migration stream as in fact even 32bit LIOBN is
rather pointless there (as it is a PHB property and the management
software can/should pass LIOBNs via CLI) but we keep it for the backward
migration support.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)

New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management.
This adds ability to VFIO common code to dynamically allocate/remove
DMA windows in the host kernel when new VFIO container is added/removed.

This adds a helper to vfio_listener_region_add which makes
VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds just created IOMMU into
the host IOMMU list; the opposite action is taken in
vfio_listener_region_del.

When creating a new window, this uses heuristic to decide on the TCE table
levels number.

This should cause no guest visible change in behavior.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Added some casts to prevent printf() warnings on certain targets
where the kernel headers' __u64 doesn't match uint64_t or PRIx64]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

vfio: Add host side DMA window capabilities

There are going to be multiple IOMMUs per a container. This moves
the single host IOMMU parameter set to a list of VFIOHostDMAWindow.

This should cause no behavioral change and will be used later by
the SPAPR TCE IOMMU v2 which will also add a vfio_host_win_del() helper.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)

This makes use of the new "memory registering" feature. The idea is
to provide the userspace ability to notify the host kernel about pages
which are going to be used for DMA. Having this information, the host
kernel can pin them all once per user process, do locked pages
accounting (once) and not spent time on doing that in real time with
possible failures which cannot be handled nicely in some cases.

This adds a prereg memory listener which listens on address_space_memory
and notifies a VFIO container about memory which needs to be
pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped.

The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
not call it when v2 is detected and enabled.

This enforces guest RAM blocks to be host page size aligned; however
this is not new as KVM already requires memory slots to be host page
size aligned.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[dwg: Fix compile error on 32-bit host]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr_iommu: Realloc guest visible TCE table when starting/stopping listening

The sPAPR TCE tables manage 2 copies when VFIO is using an IOMMU -
a guest view of the table and a hardware TCE table. If there is no VFIO
presense in the address space, then just the guest view is used, if
this is the case, it is allocated in the KVM. However since there is no
support yet for VFIO in KVM TCE hypercalls, when we start using VFIO,
we need to move the guest view from KVM to the userspace; and we need
to do this for every IOMMU on a bus with VFIO devices.

This implements the callbacks for the sPAPR IOMMU - notify_started()
reallocated the guest view to the user space, notify_stopped() does
the opposite.

This removes explicit spapr_tce_set_need_vfio() call from PCI hotplug
path as the new callbacks do this better - they notify IOMMU at
the exact moment when the configuration is changed, and this also
includes the case of PCI hot unplug.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc: simplify max_smt initialization in ppc_cpu_realizefn()

kvmppc_smt_threads() returns 1 if KVM is not enabled.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: Ensure thread0 of CPU core is always realized first

During CPU core realization, we create all the thread objects and parent
them to the core object in a loop. However, the realization of thread
objects is done separately by walking the threads of a core using
object_child_foreach(). With this, there is no guarantee on the order
in which the child thread objects get realized. Since CPU device tree
properties are currently derived from the CPU thread object, we assume
thread0 of the core to be the representative thread of the core when
creating device tree properties for the core. If thread0 is not the
first thread that gets realized, then we would end up having an
incorrect dt_id for the core and this causes hotplug failures from
the guest.

Fix this by realizing each thread object by walking the core's thread
object list thereby ensuring that thread0 and other threads are always
realized in the correct order.

Future TODO: CPU DT nodes are per-core properties and we should
ideally base the creation of CPU DT nodes on core objects rather than
the thread objects.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

ppc: Fix xsrdpi, xvrdpi and xvrspi rounding

xsrdpi, xvrdpi and xvrspi use the round ties away method, not round
nearest even.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-seabios-20160704-3' into staging

Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86."

# gpg: Signature made Mon 04 Jul 2016 16:24:55 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-seabios-20160704-3:
  Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86."

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/berrange/tags/pull-qcrypto-2016-07-04-1' into staging

Merge qcrypto 2016/07/04 v1

# gpg: Signature made Mon 04 Jul 2016 15:54:26 BST
# gpg:                using RSA key 0xBE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/pull-qcrypto-2016-07-04-1:
  crypto: allow default TLS priority to be chosen at build time
  crypto: add support for TLS priority string override
  crypto: implement sha224, sha384, sha512 and ripemd160 hashes
  crypto: switch hash code to use nettle/gcrypt directly
  crypto: rename OUT to out in xts test to avoid clash on MinGW
  crypto: fix handling of iv generator hash defaults

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Revert "bios: Add fast variant of SeaBIOS for use with -kernel on x86."

This reverts commit 4e04ab6a63ebe9fb4305e7e8e49cc8b0095db8fb.

Also remove pc-bios/bios-fast.bin.

Commit was merged by mistake.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

crypto: allow default TLS priority to be chosen at build time

Modern gnutls can use a global config file to control the
crypto priority settings for TLS connections. For example
the priority string "@SYSTEM" instructs gnutls to find the
priority setting named "SYSTEM" in the global config file.

Latest gnutls GIT codebase gained the ability to reference
multiple priority strings in the config file, with the first
one that is found to existing winning. This means it is now
possible to configure QEMU out of the box with a default
priority of "@QEMU,SYSTEM", which says to look for the
settings "QEMU" first, and if not found, use the "SYSTEM"
settings.

To make use of this facility, we introduce the ability to
set the QEMU default priority at build time via a new
configure argument. It is anticipated that distro vendors
will set this when building QEMU to a suitable value for
use with distro crypto policy setup. eg current Fedora
would run

./configure --tls-priority=@SYSTEM

while future Fedora would run

./configure --tls-priority=@QEMU,SYSTEM

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

crypto: add support for TLS priority string override

The gnutls default priority is either "NORMAL" (most historical
versions of gnutls) which is a built-in label in gnutls code,
or "@SYSTEM" (latest gnutls on Fedora at least) which refers
to an admin customizable entry in a gnutls config file.

Regardless of which default is used by a distro, they are both
global defaults applying to all applications using gnutls. If
a single application on the system needs to use a weaker set
of crypto priorities, this potentially forces the weakness onto
all applications. Or conversely if a single application wants a
strong default than all others, it can't do this via the global
config file.

This adds an extra parameter to the tls credential object which
allows the mgmt app / user to explicitly provide a priority
string to QEMU when configuring TLS.

For example, to use the "NORMAL" priority, but disable SSL 3.0
one can now configure QEMU thus:

  $QEMU -object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
                priority="NORMAL:-VERS-SSL3.0" \
        ..other args...

If creating tls-creds-anon, whatever priority the user specifies
will always have "+ANON-DH" appended to it, since that's mandatory
to make the anonymous credentials work.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

crypto: implement sha224, sha384, sha512 and ripemd160 hashes

Wire up the nettle and gcrypt hash backends so that they can
support the sha224, sha384, sha512 and ripemd160 hash algorithms.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20160704' into staging

target-arm queue:
* fix semihosting SYS_HEAPINFO call for A64 guests
* fix crash if guest tries to write to ROM on imx boards
* armv7m_nvic: fix crash for debugger reads from some registers
* virt: mark PCIe host controller as dma-coherent in the DT
* add data-driven register API
* Xilinx Zynq: add devcfg device model
* m25p80: fix various bugs
* ast2400: add SMC controllers and SPI flash slaves

# gpg: Signature made Mon 04 Jul 2016 13:17:34 BST
# gpg:                using RSA key 0x3C2525ED14360CDE
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>"
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>"
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>"
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20160704: (23 commits)
  ast2400: create SPI flash slaves
  ast2400: add SPI flash slaves
  ast2400: add SMC controllers (FMC and SPI)
  m25p80: qdev-ify drive property
  m25p80: change cur_addr to 32 bit integer
  m25p80: avoid out of bounds accesses
  m25p80: do not put iovec on the stack
  ssi: change ssi_slave_init to be a realize ops
  xilinx_zynq: Connect devcfg to the Zynq machine model
  dma: Add Xilinx Zynq devcfg device model
  register: Add block initialise helper
  register: QOMify
  register: Define REG and FIELD macros
  register: Add Memory API glue
  register: Add Register API
  bitops: Add MAKE_64BIT_MASK macro
  hw/arm/virt: mark the PCIe host controller as DMA coherent in the DT
  armv7m_nvic: Use qemu_get_cpu(0) instead of current_cpu
  memory: Assert that memory_region_init_rom_device() ops aren't NULL
  imx: Use memory_region_init_rom() for ROMs
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kraxel/tags/pull-seabios-20160704-1' into staging

seabios: update from 1.9.1 to 1.9.3

# gpg: Signature made Mon 04 Jul 2016 10:29:47 BST
# gpg:                using RSA key 0x4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>"
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>"
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>"
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/pull-seabios-20160704-1:
  seabios: update binaries from 1.9.1 to 1.9.3
  seabios: update 128k config
  bios: Add fast variant of SeaBIOS for use with -kernel on x86.
  seabios: update submodule from 1.9.1 to 1.9.3

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

ast2400: create SPI flash slaves

A set of SPI flash slaves is attached under the flash controllers of
the palmetto platform. "n25q256a" flash modules are used for the BMC
and "mx25l25635e" for the host. These types are common in the
OpenPower ecosystem.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467138270-32481-9-git-send-email-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

ast2400: add SPI flash slaves

Each controller on the ast2400 has a memory range on which it maps its
flash module slaves. Each slave is assigned a memory segment for its
mapping that can be changed at bootime with the Segment Address
Register. This is not supported in the current implementation so we
are using the defaults provided by the specs.

Each SPI flash slave can then be accessed in two modes: Command and
User. When in User mode, accesses to the memory segment of the slaves
are translated in SPI transfers. When in Command mode, the HW
generates the SPI commands automatically and the memory segment is
accessed as if doing a MMIO. Other SPI controllers call that mode
linear addressing mode.

For this purpose, we are adding below each crontoller an array of
structs gathering for each SPI flash module, a segment rank, a
MemoryRegion to handle the memory accesses and the associated SPI
slave device, which should be a m25p80.

Only the User mode is supported for now but we are preparing ground
for the Command mode. The framework is sufficient to support Linux.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467138270-32481-8-git-send-email-clg@kaod.org
[PMM: Use g_new0() rather than g_malloc0()]
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

ast2400: add SMC controllers (FMC and SPI)

The Aspeed AST2400 soc includes a static memory controller for the BMC
which supports NOR, NAND and SPI flash memory modules. This controller
has two modes : the SMC for the legacy interface which supports only
one module and the FMC for the new interface which supports up to five
modules. The AST2400 also includes a SPI only controller used for the
host firmware, commonly called BIOS on Intel. It can be used in three
mode : a SPI master, SPI slave and SPI pass-through

Below is the initial framework for the SMC controller (FMC mode only)
and the SPI controller: the sysbus object, MMIO for registers
configuration and controls. Each controller has a SPI bus and a
configurable number of CS lines for SPI flash slaves.

The differences between the controllers are small, so they are
abstracted using indirections on the register numbers.

Only SPI flash modules are supported.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467138270-32481-7-git-send-email-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: added one missing error_propagate]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

m25p80: qdev-ify drive property

This allows specifying the property via -drive if=none and creating
the flash device with -device.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467138270-32481-6-git-send-email-clg@kaod.org
[clg: added an extra fix for sabrelite_init()
keeping the test on flash_dev did not seem necessary. ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

m25p80: change cur_addr to 32 bit integer

The maximum amount of storage that can be addressed by the m25p80 command
set is 4 GiB. However, cur_addr is currently a 64-bit integer. To avoid
further problems related to sign extension of signed 32-bit integer
expressions, change cur_addr to a 32 bit integer. Preserve migration
format by adding a dummy 4-byte field in place of the (big-endian)
high four bytes in the formerly 64-bit cur_addr field.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467138270-32481-5-git-send-email-clg@kaod.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

m25p80: avoid out of bounds accesses

s->cur_addr can be made to point outside s->storage, either by
writing a value >= 128 to s->ear (because s->ear * MAX_3BYTES_SIZE
is a signed integer and sign-extends into the 64-bit cur_addr),
or just by writing an address beyond the size of the flash being
emulated. Avoid the sign extension to make the code cleaner, and
on top of that mask s->cur_addr to s->size.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467138270-32481-4-git-send-email-clg@kaod.org
Reviewed by: Marcin Krzeminski <marcin.krzeminski@nokia.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

m25p80: do not put iovec on the stack

When doing a read-modify-write cycle, QEMU uses the iovec after returning
from blk_aio_pwritev. m25p80 puts the iovec on the stack of blk_aio_pwritev's
caller, which causes trouble in this case. This has been a problem
since commit 243e6f6 ("m25p80: Switch to byte-based block access",
2016-05-12) started doing writes at a smaller granularity than 512 bytes.
In principle however it could have broken before when using -drive
if=mtd,cache=none on a disk with 4K native sectors.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467138270-32481-3-git-send-email-clg@kaod.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

ssi: change ssi_slave_init to be a realize ops

This enables qemu to handle late inits and report errors. All the SSI
slave routine names were changed accordingly. Code was modified to
handle errors when possible (m25p80 and ssi-sd)

Tested with the m25p80 slave object.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-id: 1467138270-32481-2-git-send-email-clg@kaod.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

xilinx_zynq: Connect devcfg to the Zynq machine model

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 85f39c9a13569b1113dacac3b952b0af54fc1260.1467053537.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

dma: Add Xilinx Zynq devcfg device model

Add a minimal model for the devcfg device which is part of Zynq.
This model supports DMA capabilities and interrupt generation.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 83df49d8fa2d203a421ca71620809e4b04754e65.1467053537.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

register: Add block initialise helper

Add a helper that will scan a static RegisterAccessInfo Array
and populate a container MemoryRegion with registers as defined.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: 347b810b2799e413c98d5bbeca97bcb1557946c3.1467053537.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

register: QOMify

QOMify registers as a child of TYPE_DEVICE. This allows registers to
define GPIOs.

Define an init helper that will do QOM initialisation.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: KONRAD Frederic <fred.konrad@greensocs.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 2545f71db26bf5586ca0c08a3e3cf1b217450552.1467053537.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

register: Define REG and FIELD macros

Define some macros that can be used for defining registers and fields.

The REG32 macro will define A_FOO, for the byte address of a register
as well as R_FOO for the uint32_t[] register number (A_FOO / 4).

The FIELD macro will define FOO_BAR_MASK, FOO_BAR_SHIFT and
FOO_BAR_LENGTH constants for field BAR in register FOO.

Finally, there are some shorthand helpers for extracting/depositing
fields from registers based on these naming schemes.

Usage can greatly reduce the verbosity of device code.

The deposit and extract macros (eg FIELD_EX32, FIELD_DP32 etc.) can be
used to generate extract and deposits without any repetition of the name
stems.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: bbd87a3c03b1f173b1ed73a6d502c0196c18a72f.1467053537.git.alistair.francis@xilinx.com
[ EI Changes:
* Add Deposit macros
]
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

register: Add Memory API glue

Add memory io handlers that glue the register API to the memory API.
Just translation functions at this stage. Although it does allow for
devices to be created without all-in-one mmio r/w handlers.

This patch also adds the RegisterInfoArray struct, which allows all of
the individual RegisterInfo structs to be grouped into a single memory
region.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Message-id: f7704d8ac6ac0f469ed35401f8151a38bd01468b.1467053537.git.alistair.francis@xilinx.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

register: Add Register API

This API provides some encapsulation of registers and factors out some
common functionality to common code. Bits of device state (usually MMIO
registers) often have all sorts of access restrictions and semantics
associated with them. This API allows you to define what those
restrictions are on a bit-by-bit basis.

Helper functions are then used to access the register which observe the
semantics defined by the RegisterAccessInfo struct.

Some features:
Bits can be marked as read_only (ro field)
Bits can be marked as write-1-clear (w1c field)
Bits can be marked as reserved (rsvd field)
Reset values can be defined (reset)
Bits can be marked clear on read (cor)
Pre and post action callbacks can be added to read and write ops
Verbose debugging info can be enabled/disabled

Useful for defining device register spaces in a data driven way. Cuts
down on a lot of the verbosity and repetition in the switch-case blocks
in the standard foo_mmio_read/write functions.

Also useful for automated generation of device models from hardware
design sources.

Signed-off-by: Peter Crosthwaite <peter.crosthwaite@xilinx.com>
Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 40d62c7e1bf6e63bb4193ec46b15092a7d981e59.1467053537.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

bitops: Add MAKE_64BIT_MASK macro

Add a macro that creates a 64bit value which has length number of ones
shifted across by the value of shift.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 9773244aa1c8c26b8b82cb261d8f5dd4b7b9fcf9.1467053537.git.alistair.francis@xilinx.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm/virt: mark the PCIe host controller as DMA coherent in the DT

Since QEMU performs cacheable accesses to guest memory when doing DMA
as part of the implementation of emulated PCI devices, guest drivers
should use cacheable accesses as well when running under KVM. Since this
essentially means that emulated PCI devices are DMA coherent, set the
'dma-coherent' DT property on the PCIe host controller DT node.

This brings the DT description into line with the ACPI description,
which already marks the PCI bridge as cache coherent (see commit
bc64b96c984abf).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Message-id: 1467134090-5099-1-git-send-email-ard.biesheuvel@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

armv7m_nvic: Use qemu_get_cpu(0) instead of current_cpu

Starting QEMU with -S results in current_cpu containing its initial
value of NULL. It is however possible to connect to such QEMU instance
and query various CPU registers, one example being CPUID, and doing that
results in QEMU segfaulting.

Using qemu_get_cpu(0) seem reasonable enough given that ARMv7M
architecture is a single core architecture.

Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

memory: Assert that memory_region_init_rom_device() ops aren't NULL

It doesn't make sense to pass a NULL ops argument to
memory_region_init_rom_device(), because the effect will
be that if the guest tries to write to the memory region
then QEMU will segfault. Catch the bug earlier by sanity
checking the arguments to this function, and remove the
misleading documentation that suggests that passing NULL
might be sensible.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1467122287-24974-4-git-send-email-peter.maydell@linaro.org

imx: Use memory_region_init_rom() for ROMs

The imx boards were all incorrectly creating ROMs using
memory_region_init_rom_device() with a NULL ops pointer. This
will cause QEMU to abort if the guest tries to write to the
ROM. Switch to the new memory_region_init_rom() instead.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1467122287-24974-3-git-send-email-peter.maydell@linaro.org

memory: Provide memory_region_init_rom()

Provide a new helper function memory_region_init_rom() for memory
regions which are read-only (and unlike those created by
memory_region_init_rom_device() don't have special behaviour
for writes). This has the same behaviour as calling
memory_region_init_ram() and then memory_region_set_readonly()
(which is what we do today in boards with pure ROMs) but is a
more easily discoverable API for the purpose.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1467122287-24974-2-git-send-email-peter.maydell@linaro.org

target-arm/arm-semi.c: Fix SYS_HEAPINFO for 64-bit guests

SYS_HEAPINFO is one of the few semihosting calls which has to write
values back into a parameter block in memory. When we added
support for 64-bit semihosting we updated the code which reads from
the parameter block to read 64-bit words but forgot to change the
code that writes back into the block. Update it to treat the
block as a set of words of the appropriate width for the guest.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 1466783381-29506-3-git-send-email-peter.maydell@linaro.org

linux-user: Make semihosting heap/stack fields abi_ulongs

The fields in the TaskState heap_base, heap_limit and stack_base
are all guest addresses (representing the locations of the heap
and stack for the guest binary), so they should be abi_ulong
rather than uint32_t. (This only in practice affects ARM AArch64
since all the other semihosting implementations are 32-bit.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Message-id: 1466783381-29506-2-git-send-email-peter.maydell@linaro.org

Merge remote-tracking branch 'remotes/thibault/tags/samuel-thibault' into staging

slirp updates

# gpg: Signature made Sun 03 Jul 2016 23:03:04 BST
# gpg:                using RSA key 0xE3E51CE8FB6B2F1D
# gpg: Good signature from "Samuel Thibault <samuel.thibault@gnu.org>"
# gpg:                 aka "Samuel Thibault <sthibault@debian.org>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@inria.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@labri.fr>"
# gpg:                 aka "Samuel Thibault <samuel.thibault@ens-lyon.org>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 900C B024 B679 31D4 0F82  304B D017 8C76 7D06 9EE6
#      Subkey fingerprint: F632 74CD C630 0873 CB3D  29D9 E3E5 1CE8 FB6B 2F1D

* remotes/thibault/tags/samuel-thibault:
  slirp: Add support for stateless DHCPv6
  slirp: Remove superfluous memset() calls from the TFTP code
  slirp: Add RDNSS advertisement
  slirp: Support link-local DNS addresses
  slirp: Add dns6 resolution
  slirp: Split get_dns_addr

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

crypto: switch hash code to use nettle/gcrypt directly

Currently the internal hash code is using the gnutls hash APIs.
GNUTLS in turn is wrapping either nettle or gcrypt. Not only
were the GNUTLS hash APIs not added until GNUTLS 2.9.10, but
they don't expose support for all the algorithms QEMU needs
to use with LUKS.

Address this by directly wrapping nettle/gcrypt in QEMU and
avoiding GNUTLS's extra layer of indirection. This gives us
support for hash functions on a much wider range of platforms
and opens up ability to support more hash functions. It also
avoids a GNUTLS bug which would not correctly handle hashing
of large data blocks if int != size_t.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

crypto: rename OUT to out in xts test to avoid clash on MinGW

On MinGW one of the system headers already has "OUT" defined
which causes a compile failure of the test suite. Rename the
test suite var to 'out' to avoid this clash

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

crypto: fix handling of iv generator hash defaults

When opening an existing LUKS volume, if the iv generator is
essiv, then the iv hash algorithm is mandatory to provide. We
must report an error if it is omitted in the cipher mode spec,
not silently default to hash 0 (md5). If the iv generator is
not essiv, then we explicitly ignore any iv hash algorithm,
rather than report an error, for compatibility with dm-crypt.

When creating a new LUKS volume, if the iv generator is essiv
and no iv hsah algorithm is provided, we should default to
using the sha256 hash.

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

seabios: update binaries from 1.9.1 to 1.9.3

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

seabios: update 128k config

Turn off mpt-scsi and bootsplash to keep size below 128k.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

bios: Add fast variant of SeaBIOS for use with -kernel on x86.

This commit adds a fast variant of SeaBIOS called 'bios-fast.bin'.

It's designed to be the fastest (also the smallest, but that's not the
main aim) SeaBIOS that is just enough to boot a Linux kernel using the
-kernel option on i686 and x86_64.

This commit does not modify the -kernel option to use this. You have
to specify it by doing something like this:

-kernel vmlinuz -bios bios-fast.bin

Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

seabios: update submodule from 1.9.1 to 1.9.3

git shortlog
============

Alex Williamson (1):
      fw/pci: Add support for mapping Intel IGD via QEMU

Haozhong Zhang (1):
      fw/msr_feature_control: add support to set MSR_IA32_FEATURE_CONTROL

Kevin O'Connor (1):
      build: fix .text section address alignment

Marcel Apfelbaum (1):
      fw/pci: add Q35 S3 support

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

slirp: Add support for stateless DHCPv6

Provide basic support for stateless DHCPv6 (see RFC 3736) so
that guests can also automatically boot via IPv6 with SLIRP
(for IPv6 network booting, see RFC 5970 for details).

Tested with:

qemu-system-ppc64 -nographic -vga none -boot n -net nic \
-net user,ipv6=yes,ipv4=no,tftp=/path/to/tftp,bootfile=ppc64.img

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>

slirp: Remove superfluous memset() calls from the TFTP code

Commit fad7fb9ccd8013ea03 ("Add IPv6 support to the TFTP code")
refactored some common code for preparing the mbuf into a new
function called tftp_prep_mbuf_data(). One part of this common
code is to do a "memset(m->m_data, 0, m->m_size);" for the related
buffer first. However, at two spots, the memset() was not removed
from the calling function, so it currently done twice in these code
paths. Thus let's delete these superfluous memsets in the calling
functions now.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>

slirp: Add RDNSS advertisement

This adds the RDNSS option to IPv6 router advertisements, so that the guest
can autoconfigure the DNS server address.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
---
Changes since last submission:
- Disable on windows, until we have support for it

slirp: Support link-local DNS addresses

They look like fe80::%eth0

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
---
Changes since last submission:
- fix windows build

slirp: Add dns6 resolution

This makes get_dns_addr address family-agnostic, thus allowing to add the
IPv6 case.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>

slirp: Split get_dns_addr

Separate get_dns_addr into get_dns_addr_cached and get_dns_addr_resolv_conf
to make conversion to IPv6 easier.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>

Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging

Only trivial fixes.

# gpg: Signature made Fri 01 Jul 2016 13:39:06 BST
# gpg:                using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <gkurz@fr.ibm.com>"
# gpg:                 aka "Greg Kurz <groug@free.fr>"
# gpg:                 aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg:                 aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg:                 aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>"
# gpg:                 aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894  DBA2 02FC 3AEB 0101 DBC2

* remotes/gkurz/tags/for-upstream:
  9p: synth: drop v9fs_ prefix
  9p: don't include <sys/uio.h>

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>