git.proxmox.com Git - mirror

configure: introduce --enable-vhost-user-blk-server

Make it possible to compile out the vhost-user-blk server. It is enabled
by default on Linux.

Note that vhost-user-server.c depends on libvhost-user, which requires
CONFIG_LINUX. The CONFIG_VHOST_USER dependency was erroneous since that
option controls vhost-user frontends (previously known as "master") and
not device backends (previously known as "slave").

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-3-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

libvhost-user: follow QEMU comment style

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201027173528.213464-2-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-blk: set features before setting inflight feature

Virtqueue has split and packed, so before setting inflight,
you need to inform the back-end virtqueue format.

Signed-off-by: Jin Yu <jin.yu@intel.com>
Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Message-Id: <20201103123617.28256-1-jin.yu@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Revert "vhost-blk: set features before setting inflight feature"

This reverts commit adb29c027341ba095a3ef4beef6aaef86d3a520e.

The commit broke -device vhost-user-blk-pci because the
vhost_dev_prepare_inflight() function it introduced segfaults in
vhost_dev_set_features() when attempting to access struct vhost_dev's
vdev pointer before it has been assigned.

To reproduce the segfault simply launch a vhost-user-blk device with the
contrib vhost-user-blk device backend:

  $ build/contrib/vhost-user-blk/vhost-user-blk -s /tmp/vhost-user-blk.sock -r -b /var/tmp/foo.img
  $ build/qemu-system-x86_64 \
        -device vhost-user-blk-pci,id=drv0,chardev=char1,addr=4.0 \
        -object memory-backend-memfd,id=mem,size=1G,share=on \
        -M memory-backend=mem,accel=kvm \
        -chardev socket,id=char1,path=/tmp/vhost-user-blk.sock
  Segmentation fault (core dumped)

Cc: Jin Yu <jin.yu@intel.com>
Cc: Raphael Norwitz <raphael.norwitz@nutanix.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102165709.232180-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

net: Add vhost-vdpa in show_netdevs()

Fix the bug that while Check qemu supported netdev,
there is no vhost-vdpa

Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20201016030909.9522-2-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vhost-vdpa: Add qemu_close in vhost_vdpa_cleanup

fix the bug that fd will still open after the cleanup

Signed-off-by: Cindy Lu <lulu@redhat.com>
Message-Id: <20201016030909.9522-1-lulu@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vfio: Don't issue full 2^64 unmap

IOMMUs may declare memory regions spanning from 0 to UINT64_MAX. When
attempting to deal with such region, vfio_listener_region_del() passes a
size of 2^64 to int128_get64() which throws an assertion failure. Even
ignoring this, the VFIO_IOMMU_DMA_MAP ioctl cannot handle this size
since the size field is 64-bit. Split the request in two.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-11-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-iommu: Set supported page size mask

The virtio-iommu device can deal with arbitrary page sizes for virtual
endpoints, but for endpoints assigned with VFIO it must follow the page
granule used by the host IOMMU driver.

Implement the interface to set the vIOMMU page size mask, called by VFIO
for each endpoint. We assume that all host IOMMU drivers use the same
page granule (the host page granule). Override the page_size_mask field
in the virtio config space.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-10-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vfio: Set IOMMU page size as per host supported page size

Set IOMMU supported page size mask same as host Linux supported page
size mask.

Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-9-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

memory: Add interface to set iommu page size mask

Allow to set the page size mask supported by an iommu memory region.
This enables a vIOMMU to communicate the page size granule supported by
an assigned device, on hosts that use page sizes greater than 4kB.

Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-8-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-iommu: Add notify_flag_changed() memory region callback

Add notify_flag_changed() to notice when memory listeners are added and
removed.

Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-7-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-iommu: Add replay() memory region callback

Implement the replay callback to setup all mappings for a new memory
region.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-6-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-iommu: Call memory notifiers in attach/detach

Call the memory notifiers when attaching an endpoint to a domain, to
replay existing mappings, and when detaching the endpoint, to remove all
mappings.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-5-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-iommu: Add memory notifiers for map/unmap

Extend VIRTIO_IOMMU_T_MAP/UNMAP request to notify memory listeners. It
will call VFIO notifier to map/unmap regions in the physical IOMMU.

Signed-off-by: Bharat Bhushan <bbhushan2@marvell.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-4-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-iommu: Store memory region in endpoint struct

Store the memory region associated to each endpoint into the endpoint
structure, to allow efficient memory notification on map/unmap.

Acked-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-3-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-iommu: Fix virtio_iommu_mr()

Due to an invalid mask, virtio_iommu_mr() may return the wrong memory
region. It hasn't been too problematic so far because the function was
only used to test existence of an endpoint, but that is about to change.

Fixes: cfb42188b24d ("virtio-iommu: Implement attach/detach command")
Cc: QEMU Stable <qemu-stable@nongnu.org>
Acked-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Message-Id: <20201030180510.747225-2-jean-philippe@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/smbios: Fix leaked fd in save_opt_one() error path

Fix the following Coverity issue (RESOURCE_LEAK):

CID 1432879: Resource leak

Handle variable fd going out of scope leaks the handle.

Replace a close() call by qemu_close() since the handle is
opened with qemu_open().

Fixes: bb99f4772f5 ("hw/smbios: support loading OEM strings values from a file")
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201030152742.1553968-1-philmd@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/virtio/vhost-backend: Fix Coverity CID 1432871

Fix uninitialized value issues reported by Coverity:

Field 'msg.reserved' is uninitialized when calling write().

While the 'struct vhost_msg' does not have a 'reserved' field,
we still initialize it to have the two parts of the function
consistent.

Reported-by: Coverity (CID 1432864: UNINIT)
Fixes: c471ad0e9bd ("vhost_net: device IOTLB support")
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201103063541.2463363-1-philmd@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/acpi : add spaces around operator

Fix code style. Operator needs spaces both sides.

Signed-off-by: Xinhao Zhang <zhangxinhao1@huawei.com>
Signed-off-by: Kai Deng <dengkai1@huawei.com>
Message-Id: <20201103102634.273021-3-zhangxinhao1@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/acpi : add space before the open parenthesis '('

Fix code style. Space required before the open parenthesis '('.

Signed-off-by: Xinhao Zhang <zhangxinhao1@huawei.com>
Signed-off-by: Kai Deng <dengkai1@huawei.com>
Message-Id: <20201103102634.273021-2-zhangxinhao1@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/acpi : Don't use '#' flag of printf format

Fix code style. Don't use '#' flag of printf format ('%#') in
format strings, use '0x' prefix instead

Signed-off-by: Xinhao Zhang <zhangxinhao1@huawei.com>
Signed-off-by: Kai Deng <dengkai1@huawei.com>
Message-Id: <20201103102634.273021-1-zhangxinhao1@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virito-mem: Implement get_min_alignment()

The block size determines the alignment requirements. Implement
get_min_alignment() of the TYPE_MEMORY_DEVICE interface.

This allows auto-assignment of a properly aligned address in guest
physical address space. For example, when specifying a 2GB block size
for a virtio-mem device with 10GB with a memory setup "-m 4G, 20G",
we'll no longer fail when realizing.

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20201008083029.9504-7-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

memory-device: Add get_min_alignment() callback

Add a callback that can be used to express additional alignment
requirements (exceeding the ones from the memory region).

Will be used by virtio-mem to express special alignment requirements due
to manually configured, big block sizes (e.g., 1GB with an ordinary
memory-backend-ram). This avoids failing later when realizing, because
auto-detection wasn't able to assign a properly aligned address.

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20201008083029.9504-6-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

memory-device: Support big alignment requirements

Let's warn instead of bailing out - the worst thing that can happen is
that we'll fail hot/coldplug later. The user got warned, and this should
be rare.

This will be necessary for memory devices with rather big (user-defined)
alignment requirements - say a virtio-mem device with a 2G block size -
which will become important, for example, when supporting vfio in the
future.

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20201008083029.9504-5-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-mem: Probe THP size to determine default block size

Let's allow a minimum block size of 1 MiB in all configurations. Select
the default block size based on
- The page size of the memory backend.
- The THP size if the memory backend size corresponds to the real host
page size.
- The global minimum of 1 MiB.
and warn if something smaller is configured by the user.

VIRTIO_MEM only supports Linux (depends on LINUX), so we can probe the
THP size unconditionally.

For now we only support virtio-mem on x86-64 - there isn't a user-visible
change (x86-64 only supports 2 MiB THP on the PMD level) - the default
was, and will be 2 MiB.

If we ever have THP on the PUD level (e.g., 1 GiB THP on x86-64), we
expect it to be more transparent - e.g., to only optimize fully populated
ranges unless explicitly told /configured otherwise (in contrast to PMD
THP).

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20201008083029.9504-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-mem: Make sure "usable_region_size" is always multiples of the block size

The spec states:
"The device MUST set addr, region_size, usable_region_size, plugged_size,
requested_size to multiples of block_size."

With block sizes > 256MB, we currently wouldn't guarantee that for the
usable_region_size.

Note that we cannot exceed the region_size, as we already enforce the
alignment there properly.

Fixes: 910b25766b33 ("virtio-mem: Paravirtualized memory hot(un)plug")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20201008083029.9504-3-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-mem: Make sure "addr" is always multiples of the block size

The spec states:
"The device MUST set addr, region_size, usable_region_size, plugged_size,
requested_size to multiples of block_size."

In some cases, we currently don't guarantee that for "addr": For example,
when starting a VM with 4 GiB boot memory and a virtio-mem device with a
block size of 2 GiB, "memaddr"/"addr" will be auto-assigned to
0x140000000 (5 GiB).

We'll try to improve auto-assignment for memory devices next, to avoid
bailing out in case memory device code selects a bad address.

Note: The Linux driver doesn't support such big block sizes yet.

Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Fixes: 910b25766b33 ("virtio-mem: Paravirtualized memory hot(un)plug")
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20201008083029.9504-2-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

pc: comment style fixup

Fix up checkpatch comment style warnings.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Chen Qun <kuhn.chenqun@huawei.com>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20201102' into staging

target-arm queue:
* target/arm: Fix Neon emulation bugs on big-endian hosts
* target/arm: fix handling of HCR.FB
* target/arm: fix LORID_EL1 access check
* disas/capstone: Fix monitor disassembly of >32 bytes
* hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
* hw/arm/boot: fix SVE for EL3 direct kernel boot
* hw/display/omap_lcdc: Fix potential NULL pointer dereference
* hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
* target/arm: Get correct MMU index for other-security-state
* configure: Test that gio libs from pkg-config work
* hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
* docs: Fix building with Sphinx 3
* tests/qtest/npcm7xx_rng-test: Disable randomness tests

# gpg: Signature made Mon 02 Nov 2020 17:09:00 GMT
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20201102: (26 commits)
  tests/qtest/npcm7xx_rng-test: Disable randomness tests
  qemu-option-trace.rst.inc: Don't use option:: markup
  scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments
  hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work
  configure: Test that gio libs from pkg-config work
  target/arm: Get correct MMU index for other-security-state
  hw/display/exynos4210_fimd: Fix potential NULL pointer dereference
  hw/display/omap_lcdc: Fix potential NULL pointer dereference
  hw/arm/boot: fix SVE for EL3 direct kernel boot
  hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)
  disas/capstone: Fix monitor disassembly of >32 bytes
  target/arm: fix LORID_EL1 access check
  target/arm: fix handling of HCR.FB
  target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts
  target/arm: Fix float16 pairwise Neon ops on big-endian hosts
  target/arm: Improve do_prewiden_3d
  target/arm: Simplify do_long_3d and do_2scalar_long
  target/arm: Rename neon_load_reg64 to vfp_load_reg64
  target/arm: Add read/write_neon_element64
  target/arm: Rename neon_load_reg32 to vfp_load_reg32
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20201102a' into staging

Migration and virtiofs fixes 2020-11-02

Fixes for postcopy migration test hang
A seccomp crash for virtiofsd on some !x86
Help message and minor CID fix

And another crack at Max's set.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
# gpg: Signature made Mon 02 Nov 2020 19:54:59 GMT
# gpg:                using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7
# gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full]
# Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7

* remotes/dgilbert/tags/pull-migration-20201102a:
  tests/acceptance: Add virtiofs_submounts.py
  tests/acceptance/boot_linux: Accept SSH pubkey
  virtiofsd: Announce sub-mount points
  virtiofsd: Add mount ID to the lo_inode key
  meson.build: Check for statx()
  virtiofsd: Add attr_flags to fuse_entry_param
  virtiofsd: Check FUSE_SUBMOUNTS
  virtiofsd: Fix the help message of posix lock
  tools/virtiofsd: Check vu_init() return value (CID 1435958)
  virtiofsd: Seccomp: Add 'send' for syslog
  migration: Postpone the kick of the fault thread after recover
  migration: Unify reset of last_rb on destination node when recover

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tests/acceptance: Add virtiofs_submounts.py

This test invokes several shell scripts to create a random directory
tree full of submounts, and then check in the VM whether every submount
has its own ID and the structure looks as expected.

(Note that the test scripts must be non-executable, so Avocado will not
try to execute them as if they were tests on their own, too.)

Because at this commit's date it is unlikely that the Linux kernel on
the image provided by boot_linux.py supports submounts in virtio-fs, the
test will be cancelled if no custom Linux binary is provided through the
vmlinuz parameter.  (The on-image kernel can be used by providing an
empty string via vmlinuz=.)

So, invoking the test can be done as follows:
$ avocado run \
    tests/acceptance/virtiofs_submounts.py \
    -p vmlinuz=/path/to/linux/build/arch/x86/boot/bzImage

This test requires root privileges (through passwordless sudo -n),
because at this point, virtiofsd requires them.  (If you have a
timestamp_timeout period for sudoers (e.g. the default of 5 min), you
can provide this by executing something like "sudo true" before invoking
Avocado.)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <20201102161859.156603-8-mreitz@redhat.com>
Tested-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

tests/acceptance/boot_linux: Accept SSH pubkey

Let download_cloudinit() take an optional pubkey, which subclasses of
BootLinux can pass through setUp().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Willian Rampazzo <willianr@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102161859.156603-7-mreitz@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

virtiofsd: Announce sub-mount points

Whenever we encounter a directory with an st_dev or mount ID that
differs from that of its parent, we set the FUSE_ATTR_SUBMOUNT flag so
the guest can create a submount for it.

We only need to do so in lo_do_lookup().  The following functions return
a fuse_attr object:
- lo_create(), though fuse_reply_create(): Calls lo_do_lookup().
- lo_lookup(), though fuse_reply_entry(): Calls lo_do_lookup().
- lo_mknod_symlink(), through fuse_reply_entry(): Calls lo_do_lookup().
- lo_link(), through fuse_reply_entry(): Creating a link cannot create a
  submount, so there is no need to check for it.
- lo_getattr(), through fuse_reply_attr(): Announcing submounts when the
  node is first detected (at lookup) is sufficient.  We do not need to
  return the submount attribute later.
- lo_do_readdir(), through fuse_add_direntry_plus(): Calls
  lo_do_lookup().

Make announcing submounts optional, so submounts are only announced to
the guest with the announce_submounts option.  Some users may prefer the
current behavior, so that the guest learns nothing about the host mount
structure.

(announce_submounts is force-disabled when the guest does not present
the FUSE_SUBMOUNTS capability, or when there is no statx().)

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102161859.156603-6-mreitz@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

virtiofsd: Add mount ID to the lo_inode key

Using st_dev is not sufficient to uniquely identify a mount: You can
mount the same device twice, but those are still separate trees, and
e.g. by mounting something else inside one of them, they may differ.

Using statx(), we can get a mount ID that uniquely identifies a mount.
If that is available, add it to the lo_inode key.

Most of this patch is taken from Miklos's mail here:
https://marc.info/?l=fuse-devel&m=160062521827983
(virtiofsd-use-mount-id.patch attachment)

Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102161859.156603-5-mreitz@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

meson.build: Check for statx()

Check whether the glibc provides statx() and if so, define CONFIG_STATX.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102161859.156603-4-mreitz@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

virtiofsd: Add attr_flags to fuse_entry_param

fuse_entry_param is converted to fuse_attr on the line (by
fill_entry()), so it should have a member that mirrors fuse_attr.flags.

fill_entry() should then copy this fuse_entry_param.attr_flags to
fuse_attr.flags.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102161859.156603-3-mreitz@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

virtiofsd: Check FUSE_SUBMOUNTS

FUSE_SUBMOUNTS is a pure indicator by the kernel to signal that it
supports submounts. It does not check its state in the init reply, so
there is nothing for fuse_lowlevel.c to do but to check its existence
and copy it into fuse_conn_info.capable.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20201102161859.156603-2-mreitz@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

virtiofsd: Fix the help message of posix lock

The commit 88fc107956a5812649e5918e0c092d3f78bb28ad disabled remote
posix locks by default. But the --help message still says it is enabled
by default. So fix it to output no_posix_lock.

Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Message-Id: <20201027081558.29904-1-zhangjiachen.jaycee@bytedance.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

tools/virtiofsd: Check vu_init() return value (CID 1435958)

Since commit 6f5fd837889, vu_init() can fail if malloc() returns NULL.

This fixes the following Coverity warning:

CID 1435958 (#1 of 1): Unchecked return value (CHECKED_RETURN)

Fixes: 6f5fd837889 ("libvhost-user: support many virtqueues")
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20201102092339.2034297-1-philmd@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

virtiofsd: Seccomp: Add 'send' for syslog

On ppc, and some other archs, it looks like syslog ends up using 'send'
rather than 'sendto'.

Reference: https://github.com/kata-containers/kata-containers/issues/1050

Reported-by: amulmek1@in.ibm.com
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201102150750.34565-1-dgilbert@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Postpone the kick of the fault thread after recover

The new migrate_send_rp_req_pages_pending() call should greatly improve
destination responsiveness because it will resync faulted address after
postcopy recovery. However it is also the 1st place to initiate the page
request from the main thread.

One thing is overlooked on that migrate_send_rp_message_req_pages() is not
designed to be thread-safe. So if we wake the fault thread before syncing all
the faulted pages in the main thread, it means they can race.

Postpone the wake up operation after the sync of faulted addresses.

Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery")
Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201102153010.11979-3-peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

migration: Unify reset of last_rb on destination node when recover

When postcopy recover happens, we need to reset last_rb after each return of
postcopy_pause_fault_thread() because that means we just got the postcopy
migration continued.

Unify this reset to the place right before we want to kick the fault thread
again, when we get the command MIG_CMD_POSTCOPY_RESUME from source.

This is actually more than that - because the main thread on destination will
now be able to call migrate_send_rp_req_pages_pending() too, so the fault
thread is not the only user of last_rb now. Move the reset earlier will allow
the first call to migrate_send_rp_req_pages_pending() to use the reset value
even if called from the main thread.

(NOTE: this is not a real fix to 0c26781c09 mentioned below, however it is just
a mark that when picking up 0c26781c09 we'd better have this one too; the real
fix will come later)

Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery")
Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201102153010.11979-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20201102' into staging

nvme pull 2 Nov 2020

# gpg: Signature made Mon 02 Nov 2020 15:20:30 GMT
# gpg:                using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0
# gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown]
# gpg:                 aka "Keith Busch <keith.busch@gmail.com>" [unknown]
# gpg:                 aka "Keith Busch <keith.busch@intel.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5  02EC 6251 5661 0A4F 6CC0

* remotes/nvme/tags/pull-nvme-20201102: (30 commits)
  hw/block/nvme: fix queue identifer validation
  hw/block/nvme: fix create IO SQ/CQ status codes
  hw/block/nvme: fix prp mapping status codes
  hw/block/nvme: report actual LBA data shift in LBAF
  hw/block/nvme: add trace event for requests with non-zero status code
  hw/block/nvme: add nsid to get/setfeat trace events
  hw/block/nvme: reject io commands if only admin command set selected
  hw/block/nvme: support for admin-only command set
  hw/block/nvme: validate command set selected
  hw/block/nvme: support per-namespace smart log
  hw/block/nvme: fix log page offset check
  hw/block/nvme: remove pointless rw indirection
  hw/block/nvme: update nsid when registered
  hw/block/nvme: change controller pci id
  pci: allocate pci id for nvme
  hw/block/nvme: support multiple namespaces
  hw/block/nvme: refactor identify active namespace id list
  hw/block/nvme: add support for sgl bit bucket descriptor
  hw/block/nvme: add support for scatter gather lists
  hw/block/nvme: harden cmb access
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tests/qtest/npcm7xx_rng-test: Disable randomness tests

The randomness tests in the NPCM7xx RNG test fail intermittently
but fairly frequently. On my machine running the test in a loop:
while QTEST_QEMU_BINARY=./qemu-system-aarch64 ./tests/qtest/npcm7xx_rng-test; do true; done

will fail in less than a minute with an error like:
ERROR:../../tests/qtest/npcm7xx_rng-test.c:256:test_first_byte_runs:
assertion failed (calc_runs_p(buf.l, sizeof(buf) * BITS_PER_BYTE) > 0.01): (0.00286205989 > 0.01)

(Failures have been observed on all 4 of the randomness tests,
not just first_byte_runs.)

It's not clear why these tests are failing like this, but intermittent
failures make CI and merge testing awkward, so disable running them
unless a developer specifically sets QEMU_TEST_FLAKY_RNG_TESTS when
running the test suite, until we work out the cause.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20201102152454.8287-1-peter.maydell@linaro.org
Reviewed-by: Havard Skinnemoen <hskinnemoen@google.com>

qemu-option-trace.rst.inc: Don't use option:: markup

Sphinx 3.2 is pickier than earlier versions about the option:: markup,
and complains about our usage in qemu-option-trace.rst:

../../docs/qemu-option-trace.rst.inc:4:Malformed option description
  '[enable=]PATTERN', should look like "opt", "-opt args", "--opt args",
  "/opt args" or "+opt args"

In this file, we're really trying to document the different parts of
the top-level --trace option, which qemu-nbd.rst and qemu-img.rst
have already introduced with an option:: markup.  So it's not right
to use option:: here anyway.  Switch to a different markup
(definition lists) which gives about the same formatted output.

(Unlike option::, this markup doesn't produce index entries; but
at the moment we don't do anything much with indexes anyway, and
in any case I think it doesn't make much sense to have individual
index entries for the sub-parts of the --trace option.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-3-peter.maydell@linaro.org

scripts/kerneldoc: For Sphinx 3 use c:macro for macros with arguments

The kerneldoc script currently emits Sphinx markup for a macro with
arguments that uses the c:function directive. This is correct for
Sphinx versions earlier than Sphinx 3, where c:macro doesn't allow
documentation of macros with arguments and c:function is not picky
about the syntax of what it is passed. However, in Sphinx 3 the
c:macro directive was enhanced to support macros with arguments,
and c:function was made more picky about what syntax it accepted.

When kerneldoc is told that it needs to produce output for Sphinx
3 or later, make it emit c:function only for functions and c:macro
for macros with arguments. We assume that anything with a return
type is a function and anything without is a macro.

This fixes the Sphinx error:

/home/petmay01/linaro/qemu-from-laptop/qemu/docs/../include/qom/object.h:155:Error in declarator
If declarator-id with parameters (e.g., 'void f(int arg)'):
  Invalid C declaration: Expected identifier in nested name. [error at 25]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    -------------------------^
If parenthesis in noptr-declarator (e.g., 'void (*f(int arg))(double)'):
  Error in declarator or parameters
  Invalid C declaration: Expecting "(" in parameters. [error at 39]
    DECLARE_INSTANCE_CHECKER ( InstanceType,  OBJ_NAME,  TYPENAME)
    ---------------------------------------^

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 20201030174700.7204-2-peter.maydell@linaro.org

hw/intc/arm_gicv3_cpuif: Make GIC maintenance interrupts work

In gicv3_init_cpuif() we copy the ARMCPU gicv3_maintenance_interrupt
into the GICv3CPUState struct's maintenance_irq field.  This will
only work if the board happens to have already wired up the CPU
maintenance IRQ before the GIC was realized.  Unfortunately this is
not the case for the 'virt' board, and so the value that gets copied
is NULL (since a qemu_irq is really a pointer to an IRQState struct
under the hood).  The effect is that the CPU interface code never
actually raises the maintenance interrupt line.

Instead, since the GICv3CPUState has a pointer to the CPUState, make
the dereference at the point where we want to raise the interrupt, to
avoid an implicit requirement on board code to wire things up in a
particular order.

Reported-by: Jose Martins <josemartins90@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201009153904.28529-1-peter.maydell@linaro.org
Reviewed-by: Luc Michel <luc@lmichel.fr>

configure: Test that gio libs from pkg-config work

On some hosts (eg Ubuntu Bionic) pkg-config returns a set of
libraries for gio-2.0 which don't actually work when compiling
statically. (Specifically, the returned library string includes
-lmount, but not -lblkid which -lmount depends upon, so linking
fails due to missing symbols.)

Check that the libraries work, and don't enable gio if they don't,
in the same way we do for gnutls.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20200928160402.7961-1-peter.maydell@linaro.org

target/arm: Get correct MMU index for other-security-state

In arm_v7m_mmu_idx_for_secstate() we get the 'priv' level to pass to
armv7m_mmu_idx_for_secstate_and_priv() by calling arm_current_el().
This is incorrect when the security state being queried is not the
current one, because arm_current_el() uses the current security state
to determine which of the banked CONTROL.nPRIV bits to look at.
The effect was that if (for instance) Secure state was in privileged
mode but Non-Secure was not then we would return the wrong MMU index.

The only places where we are using this function in a way that could
trigger this bug are for the stack loads during a v8M function-return
and for the instruction fetch of a v8M SG insn.

Fix the bug by expanding out the M-profile version of the
arm_current_el() logic inline so it can use the passed in secstate
rather than env->v7m.secure.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201022164408.13214-1-peter.maydell@linaro.org

hw/display/exynos4210_fimd: Fix potential NULL pointer dereference

In exynos4210_fimd_update(), the pointer s is dereferinced before
being check if it is valid, which may lead to NULL pointer dereference.
So move the assignment to global_width after checking that the s is valid.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 5F9F8D88.9030102@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/display/omap_lcdc: Fix potential NULL pointer dereference

In omap_lcd_interrupts(), the pointer omap_lcd is dereferinced before
being check if it is valid, which may lead to NULL pointer dereference.
So move the assignment to surface after checking that the omap_lcd is valid
and move surface_bits_per_pixel(surface) to after the surface assignment.

Reported-by: Euler Robot <euler.robot@huawei.com>
Signed-off-by: AlexChen <alex.chen@huawei.com>
Message-id: 5F9CDB8A.9000001@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm/boot: fix SVE for EL3 direct kernel boot

When booting a CPU with EL3 using the -kernel flag, set up CPTR_EL3 so
that SVE will not trap to EL3.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030151541.11976-1-remi@remlab.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm/smmuv3: Fix potential integer overflow (CID 1432363)

Use the BIT_ULL() macro to ensure we use 64-bit arithmetic.
This fixes the following Coverity issue (OVERFLOW_BEFORE_WIDEN):

  CID 1432363 (#1 of 1): Unintentional integer overflow:

  overflow_before_widen:
    Potentially overflowing expression 1 << scale with type int
    (32 bits, signed) is evaluated using 32-bit arithmetic, and
    then used in a context that expects an expression of type
    hwaddr (64 bits, unsigned).

Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Eric Auger <eric.auger@redhat.com>
Message-id: 20201030144617.1535064-1-philmd@redhat.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

disas/capstone: Fix monitor disassembly of >32 bytes

If we're using the capstone disassembler, disassembly of a run of
instructions more than 32 bytes long disassembles the wrong data for
instructions beyond the 32 byte mark:

(qemu) xp /16x 0x100
0000000000000100: 0x00000005 0x54410001 0x00000001 0x00001000
0000000000000110: 0x00000000 0x00000004 0x54410002 0x3c000000
0000000000000120: 0x00000000 0x00000004 0x54410009 0x74736574
0000000000000130: 0x00000000 0x00000000 0x00000000 0x00000000
(qemu) xp /16i 0x100
0x00000100: 00000005 andeq r0, r0, r5
0x00000104: 54410001 strbpl r0, [r1], #-1
0x00000108: 00000001 andeq r0, r0, r1
0x0000010c: 00001000 andeq r1, r0, r0
0x00000110: 00000000 andeq r0, r0, r0
0x00000114: 00000004 andeq r0, r0, r4
0x00000118: 54410002 strbpl r0, [r1], #-2
0x0000011c: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x00000120: 54410001 strbpl r0, [r1], #-1
0x00000124: 00000001 andeq r0, r0, r1
0x00000128: 00001000 andeq r1, r0, r0
0x0000012c: 00000000 andeq r0, r0, r0
0x00000130: 00000004 andeq r0, r0, r4
0x00000134: 54410002 strbpl r0, [r1], #-2
0x00000138: 3c000000 .byte 0x00, 0x00, 0x00, 0x3c
0x0000013c: 00000000 andeq r0, r0, r0

Here the disassembly of 0x120..0x13f is using the data that is in
0x104..0x123.

This is caused by passing the wrong value to the read_memory_func().
The intention is that at this point in the loop the 'cap_buf' buffer
already contains 'csize' bytes of data for the instruction at guest
addr 'pc', and we want to read in an extra 'tsize' bytes. Those
extra bytes are therefore at 'pc + csize', not 'pc'. On the first
time through the loop 'csize' happens to be zero, so the initial read
of 32 bytes into cap_buf is correct and as long as the disassembly
never needs to read more data we return the correct information.

Use the correct guest address in the call to read_memory_func().

Cc: qemu-stable@nongnu.org
Fixes: https://bugs.launchpad.net/qemu/+bug/1900779
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201022132445.25039-1-peter.maydell@linaro.org

target/arm: fix LORID_EL1 access check

Secure mode is not exempted from checking SCR_EL3.TLOR, and in the
future HCR_EL2.TLOR when S-EL2 is enabled.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: fix handling of HCR.FB

HCR should be applied when NS is set, not when it is cleared.

Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Fix VUDOT/VSDOT (scalar) on big-endian hosts

The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.

For these insns, the index is indexing over group of 4 8-bit values,
so 32 bits per indexed entity, and H4() is therefore what we want.
(For Neon the only possible input indexes are 0 and 1.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-3-peter.maydell@linaro.org

target/arm: Fix float16 pairwise Neon ops on big-endian hosts

In the neon_padd/pmax/pmin helpers for float16, a cut-and-paste error
meant we were using the H4() address swizzler macro rather than the
H2() which is required for 2-byte data. This had no effect on
little-endian hosts but meant we put the result data into the
destination Dreg in the wrong order on big-endian hosts.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-2-peter.maydell@linaro.org

target/arm: Improve do_prewiden_3d

We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Simplify do_long_3d and do_2scalar_long

In both cases, we can sink the write-back and perform
the accumulate into the normal destination temps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-11-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Rename neon_load_reg64 to vfp_load_reg64

The only uses of this function are for loading VFP
double-precision values, and nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Add read/write_neon_element64

Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Rename neon_load_reg32 to vfp_load_reg32

The only uses of this function are for loading VFP
single-precision values, and nothing to do with NEON.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Expand read/write_neon_element32 to all MemOp

We can then use this to improve VMOV (scalar to gp) and
VMOV (gp to scalar) so that we simply perform the memory
operation that we wanted, rather than inserting or
extracting from a 32-bit quantity.

These were the last uses of neon_load/store_reg, so remove them.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Add read/write_neon_element32

Model these off the aa64 read/write_vec_element functions.
Use it within translate-neon.c.inc. The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Use neon_element_offset in vfp_reg_offset

This seems a bit more readable than using offsetof CPU_DoubleU.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Use neon_element_offset in neon_load/store_reg

These are the only users of neon_reg_offset, so remove that.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Move neon_element_offset to translate.c

This will shortly have users outside of translate-neon.c.inc.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Introduce neon_full_reg_offset

This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0. This fixes a bug
when running on a big-endian host.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20201102' into staging

xen patch

- Rework Xen disk unplug to work with newer command line
  options.

# gpg: Signature made Mon 02 Nov 2020 14:42:37 GMT
# gpg:                using RSA key F80C006308E22CFD8A92E7980CF5572FD7FB55AF
# gpg:                issuer "anthony.perard@citrix.com"
# gpg: Good signature from "Anthony PERARD <anthony.perard@gmail.com>" [marginal]
# gpg:                 aka "Anthony PERARD <anthony.perard@citrix.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 5379 2F71 024C 600F 778A  7161 D8D5 7199 DF83 42C8
#      Subkey fingerprint: F80C 0063 08E2 2CFD 8A92  E798 0CF5 572F D7FB 55AF

* remotes/aperard/tags/pull-xen-20201102:
  xen: rework pci_piix3_xen_ide_unplug

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

xen: rework pci_piix3_xen_ide_unplug

This is to allow IDE disks to be unplugged when adding to QEMU via:
    -drive file=/root/disk_file,if=none,id=ide-disk0,format=raw
    -device ide-hd,drive=ide-disk0,bus=ide.0,unit=0

as the current code only works for disk added with:
    -drive file=/root/disk_file,if=ide,index=0,media=disk,format=raw

Since the code already have the IDE controller as `dev`, we don't need
to use the legacy DriveInfo to find all the drive we want to unplug.
We can simply use `blk` from the controller, as it kind of was already
assume to be the same, by setting it to NULL.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: John Snow <jsnow@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Message-Id: <20201027154058.495112-1-anthony.perard@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>

Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' into staging

9pfs: only test case changes this time

* Fix occasional test failures with parallel tests.

* Fix coverity error in test code.

* Avoid error when auto removing test directory if it disappeared
  for some reason.

* Refactor: Rename functions to make top-level test functions fs_*()
  easily distinguishable from utility test functions do_*().

* Refactor: Drop unnecessary function arguments in utility test
  functions.

* More test cases using the 9pfs 'local' filesystem driver backend,
  namely for the following 9p requests: Tunlinkat, Tlcreate, Tsymlink
  and Tlink.

# gpg: Signature made Mon 02 Nov 2020 09:31:35 GMT
# gpg:                using RSA key 96D8D110CF7AF8084F88590134C2B58765A47395
# gpg:                issuer "qemu_oss@crudebyte.com"
# gpg: Good signature from "Christian Schoenebeck <qemu_oss@crudebyte.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: ECAB 1A45 4014 1413 BA38  4926 30DB 47C3 A012 D5F4
#      Subkey fingerprint: 96D8 D110 CF7A F808 4F88  5901 34C2 B587 65A4 7395

* remotes/cschoenebeck/tags/pull-9p-20201102:
  tests/9pfs: add local Tunlinkat hard link test
  tests/9pfs: add local Tlink test
  tests/9pfs: add local Tunlinkat symlink test
  tests/9pfs: add local Tsymlink test
  tests/9pfs: add local Tunlinkat file test
  tests/9pfs: add local Tlcreate test
  tests/9pfs: add local Tunlinkat directory test
  tests/9pfs: simplify do_mkdir()
  tests/9pfs: Turn fs_mkdir() into a helper
  tests/9pfs: Turn fs_readdir_split() into a helper
  tests/9pfs: Factor out do_attach() helper
  tests/9pfs: Set alloc in fs_create_dir()
  tests/9pfs: Factor out do_version() helper
  tests/9pfs: Force removing of local 9pfs test directory
  tests/9pfs: fix coverity error in create_local_test_dir()
  tests/9pfs: fix test dir for parallel tests
  tests/9pfs: make create/remove test dir public

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20201101.0' into staging

VFIO update 2020-11-01

* Migration support (Kirti Wankhede)
* s390 DMA limiting (Matthew Rosato)
* zPCI hardware info (Matthew Rosato)
* Lock guard (Amey Narkhede)
* Print fixes (Zhengui li)
* Warning/build fixes

# gpg: Signature made Sun 01 Nov 2020 20:38:10 GMT
# gpg:                using RSA key 239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" [full]
# gpg:                 aka "Alex Williamson <alex@shazbot.org>" [full]
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>" [full]
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>" [full]
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-update-20201101.0: (32 commits)
  vfio: fix incorrect print type
  hw/vfio: Use lock guard macros
  s390x/pci: get zPCI function info from host
  vfio: Add routine for finding VFIO_DEVICE_GET_INFO capabilities
  s390x/pci: use a PCI Function structure
  s390x/pci: clean up s390 PCI groups
  s390x/pci: use a PCI Group structure
  s390x/pci: create a header dedicated to PCI CLP
  s390x/pci: Honor DMA limits set by vfio
  s390x/pci: Add routine to get the vfio dma available count
  vfio: Find DMA available capability
  vfio: Create shared routine for scanning info capabilities
  s390x/pci: Move header files to include/hw/s390x
  linux-headers: update against 5.10-rc1
  update-linux-headers: Add vfio_zdev.h
  qapi: Add VFIO devices migration stats in Migration stats
  vfio: Make vfio-pci device migration capable
  vfio: Add ioctl to get dirty pages bitmap during dma unmap
  vfio: Dirty page tracking when vIOMMU is enabled
  vfio: Add vfio_listener_log_sync to mark dirty pages
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

vfio: fix incorrect print type

The type of input variable is unsigned int
while the printer type is int. So fix incorrect print type.

Signed-off-by: Zhengui li <lizhengui@huawei.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

hw/vfio: Use lock guard macros

Use qemu LOCK_GUARD macros in hw/vfio.
Saves manual unlock calls

Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390x/pci: get zPCI function info from host

We use the capability chains of the VFIO_DEVICE_GET_INFO ioctl to retrieve
the CLP information that the kernel exports.

To be compatible with previous kernel versions we fall back on previous
predefined values, same as the emulation values, when the ioctl is found
to not support capability chains. If individual CLP capabilities are not
found, we fall back on default values for only those capabilities missing
from the chain.

This patch is based on work previously done by Pierre Morel.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[aw: non-Linux build fixes]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Add routine for finding VFIO_DEVICE_GET_INFO capabilities

Now that VFIO_DEVICE_GET_INFO supports capability chains, add a helper
function to find specific capabilities in the chain.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390x/pci: use a PCI Function structure

We use a ClpRspQueryPci structure to hold the information related to a
zPCI Function.

This allows us to be ready to support different zPCI functions and to
retrieve the zPCI function information from the host.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390x/pci: clean up s390 PCI groups

Add a step to remove all stashed PCI groups to avoid stale data between
machine resets.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390x/pci: use a PCI Group structure

We use a S390PCIGroup structure to hold the information related to a
zPCI Function group.

This allows us to be ready to support multiple groups and to retrieve
the group information from the host.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390x/pci: create a header dedicated to PCI CLP

To have a clean separation between s390-pci-bus.h and s390-pci-inst.h
headers we export the PCI CLP instructions in a dedicated header.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390x/pci: Honor DMA limits set by vfio

When an s390 guest is using lazy unmapping, it can result in a very
large number of oustanding DMA requests, far beyond the default
limit configured for vfio. Let's track DMA usage similar to vfio
in the host, and trigger the guest to flush their DMA mappings
before vfio runs out.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[aw: non-Linux build fixes]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390x/pci: Add routine to get the vfio dma available count

Create new files for separating out vfio-specific work for s390
pci. Add the first such routine, which issues VFIO_IOMMU_GET_INFO
ioctl to collect the current dma available count.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[aw: Fix non-Linux build with CONFIG_LINUX]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Find DMA available capability

The underlying host may be limiting the number of outstanding DMA
requests for type 1 IOMMU. Add helper functions to check for the
DMA available capability and retrieve the current number of DMA
mappings allowed.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[aw: vfio_get_info_dma_avail moved inside CONFIG_LINUX]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Create shared routine for scanning info capabilities

Rather than duplicating the same loop in multiple locations,
create a static function to do the work.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

s390x/pci: Move header files to include/hw/s390x

Seems a more appropriate location for them.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

linux-headers: update against 5.10-rc1

commit 3650b228f83adda7e5ee532e2b90429c03f7b9ec

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
[aw: drop pvrdma_ring.h changes to avoid revert of d73415a31547 ("qemu/atomic.h: rename atomic_ to qatomic_")]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

update-linux-headers: Add vfio_zdev.h

vfio_zdev.h is used by s390x zPCI support to pass device-specific
CLP information between host and userspace.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

qapi: Add VFIO devices migration stats in Migration stats

Added amount of bytes transferred to the VM at destination by all VFIO
devices

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Make vfio-pci device migration capable

If the device is not a failover primary device, call
vfio_migration_probe() and vfio_migration_finalize() to enable
migration support for those devices that support it respectively to
tear it down again.
Removed migration blocker from VFIO PCI device specific structure and use
migration blocker from generic structure of VFIO device.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Add ioctl to get dirty pages bitmap during dma unmap

With vIOMMU, IO virtual address range can get unmapped while in pre-copy
phase of migration. In that case, unmap ioctl should return pages pinned
in that range and QEMU should find its correcponding guest physical
addresses and report those dirty.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
[aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Dirty page tracking when vIOMMU is enabled

When vIOMMU is enabled, register MAP notifier from log_sync when all
devices in container are in stop and copy phase of migration. Call replay
and get dirty pages from notifier callback.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Add vfio_listener_log_sync to mark dirty pages

vfio_listener_log_sync gets list of dirty pages from container using
VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and mark those pages dirty when all
devices are stopped and saving state.
Return early for the RAM block section of mapped MMIO region.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
[aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Add function to start and stop dirty pages tracking

Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking
for VFIO devices.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Get migration capability flags for container

Added helper functions to get IOMMU info capability chain.
Added function to get migration capability information from that
capability chain for IOMMU container.

Similar change was proposed earlier:
https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg03759.html

Disable migration for devices if IOMMU module doesn't support migration
capability.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Cc: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabled

mr->ram_block is NULL when mr->is_iommu is true, then fr.dirty_log_mask
wasn't set correctly due to which memory listener's log_sync doesn't
get called.
This patch returns log_mask with DIRTY_MEMORY_MIGRATION set when
IOMMU is enabled.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Add load state functions to SaveVMHandlers

Sequence during _RESUMING device state:
While data for this device is available, repeat below steps:
a. read data_offset from where user application should write data.
b. write data of data_size to migration region from data_offset.
c. write data_size which indicates vendor driver that data is written in
staging buffer.

For user, data is opaque. User should write data in the same order as
received.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Add save state functions to SaveVMHandlers

Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy
functions. These functions handles pre-copy and stop-and-copy phase.

In _SAVING|_RUNNING device state or pre-copy phase:
- read pending_bytes. If pending_bytes > 0, go through below steps.
- read data_offset - indicates kernel driver to write data to staging
  buffer.
- read data_size - amount of data in bytes written by vendor driver in
  migration region.
- read data_size bytes of data from data_offset in the migration region.
- Write data packet to file stream as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
VFIO_MIG_FLAG_END_OF_STATE }

In _SAVING device state or stop-and-copy phase
a. read config space of device and save to migration file stream. This
   doesn't need to be from vendor driver. Any other special config state
   from driver can be saved as data in following iteration.
b. read pending_bytes. If pending_bytes > 0, go through below steps.
c. read data_offset - indicates kernel driver to write data to staging
   buffer.
d. read data_size - amount of data in bytes written by vendor driver in
   migration region.
e. read data_size bytes of data from data_offset in the migration region.
f. Write data packet as below:
   {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
g. iterate through steps b to f while (pending_bytes > 0)
h. Write {VFIO_MIG_FLAG_END_OF_STATE}

When data region is mapped, its user's responsibility to read data from
data_offset of data_size before moving to next steps.

Added fix suggested by Artem Polyakov to reset pending_bytes in
vfio_save_iterate().
Added fix suggested by Zhi Wang to add 0 as data size in migration stream and
add END_OF_STATE delimiter to indicate phase complete.

Suggested-by: Artem Polyakov <artemp@nvidia.com>
Suggested-by: Zhi Wang <zhi.wang.linux@gmail.com>
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Register SaveVMHandlers for VFIO device

Define flags to be used as delimiter in migration stream for VFIO devices.
Added .save_setup and .save_cleanup functions. Map & unmap migration
region from these functions at source during saving or pre-copy phase.

Set VFIO device state depending on VM's state. During live migration, VM is
running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO
device. During save-restore, VM is paused, _SAVING state is set for VFIO device.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Add migration state change notifier

Added migration state change notifier to get notification on migration state
change. These states are translated to VFIO device state and conveyed to
vendor driver.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
Reviewed-by: Neo Jia <cjia@nvidia.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>