git.proxmox.com Git - mirror

qemu-io: Allow larger write zeroes under no fallback

When writing zeroes can fall back to a slow write, permitting an
overly large request can become an amplification denial of service
attack in triggering a large amount of work from a small request. But
the whole point of the no fallback flag is to quickly determine if
writing an entire device to zero can be done quickly (such as when it
is already known that the device started with zero contents); in those
cases, artificially capping things at 2G in qemu-io itself doesn't
help us.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20211203231539.3900865-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

qemu-io: Utilize 64-bit status during map

The block layer has supported 64-bit block status from drivers since
commit 86a3d5c688 ("block: Add .bdrv_co_block_status() callback",
v2.12) and friends, with individual driver callbacks responsible for
capping things where necessary.  Artificially capping things below 2G
in the qemu-io 'map' command, added in commit d6a644bbfe ("block: Make
bdrv_is_allocated() byte-based", v2.10) is thus no longer necessary.

One way to test this is with qemu-nbd as server on a raw file larger
than 4G (the entire file should show as allocated), plus 'qemu-io -f
raw -c map nbd://localhost --trace=nbd_\*' as client.  Prior to this
patch, the NBD_CMD_BLOCK_STATUS requests are fragmented at 0x7ffffe00
distances; with this patch, the fragmenting changes to 0x7fffffff
(since the NBD protocol is currently still limited to 32-bit
transactions - see block/nbd.c:nbd_client_co_block_status).  Then in
later patches, once I add an NBD extension for a 64-bit block status,
the same map command completes with just one NBD_CMD_BLOCK_STATUS.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20211203231539.3900865-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

nbd/server: Minor cleanups

Spelling fixes, grammar improvements and consistent spacing, noticed
while preparing other patches in this file.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20211203231539.3900865-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

tests/qemu-iotests: validate NBD TLS with UNIX sockets and PSK

This validates that connections to an NBD server running on a UNIX
socket can use TLS with pre-shared keys (PSK).

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-13-berrange@redhat.com>
[eblake: squash in rebase fix]
Tested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

tests/qemu-iotests: validate NBD TLS with UNIX sockets

This validates that connections to an NBD server running on a UNIX
socket can use TLS, and require a TLS hostname override to pass
certificate validation.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-12-berrange@redhat.com>
[eblake: squash in rebase fix]
Tested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

tests/qemu-iotests: validate NBD TLS with hostname mismatch

This validates that connections to an NBD server where the certificate
hostname does not match will fail. It further validates that using the
new 'tls-hostname' override option can solve the failure.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-11-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

tests/qemu-iotests: convert NBD TLS test to use standard filters

Using standard filters is more future proof than rolling our own.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-10-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

tests/qemu-iotests: introduce filter for qemu-nbd export list

Introduce a filter for the output of qemu-nbd export list so it can be
reused in multiple tests.

The filter is a bit more permissive that what test 241 currently uses,
as its allows printing of the export count, along with any possible
error messages that might be emitted.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-9-berrange@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

tests/qemu-iotests: expand _filter_nbd rules

Some tests will want to use 'localhost' instead of '127.0.0.1', and
some will use the image options syntax rather than the classic URI
syntax.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-8-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

tests/qemu-iotests: add QEMU_IOTESTS_REGEN=1 to update reference file

When developing an I/O test it is typical to add some logic to the
test script, run it to view the output diff, and then apply the
output diff to the reference file. This can be drastically simplified
by letting the test runner update the reference file in place.

By setting 'QEMU_IOTESTS_REGEN=1', the test runner will report the
failure and show the diff, but at the same time update the reference
file. So next time the I/O test is run it will succeed.

Continuing to display the diff when updating the reference gives the
developer a chance to review what was changed.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-7-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

block/nbd: don't restrict TLS usage to IP sockets

The TLS usage for NBD was restricted to IP sockets because validating
x509 certificates requires knowledge of the hostname that the client
is connecting to.

TLS does not have to use x509 certificates though, as PSK (pre-shared
keys) provide an alternative credential option. These have no
requirement for a hostname and can thus be trivially used for UNIX
sockets.

Furthermore, with the ability to overide the default hostname for
TLS validation in the previous patch, it is now also valid to want
to use x509 certificates with FD passing and UNIX sockets.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-6-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

qemu-nbd: add --tls-hostname option for TLS certificate validation

When using the --list option, qemu-nbd acts as an NBD client rather
than a server. As such when using TLS, it has a need to validate
the server certificate. This adds a --tls-hostname option which can
be used to override the default hostname used for certificate
validation.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-5-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

block/nbd: support override of hostname for TLS certificate validation

When connecting to an NBD server with TLS and x509 credentials,
the client must validate the hostname it uses for the connection,
against that published in the server's certificate. If the client
is tunnelling its connection over some other channel, however, the
hostname it uses may not match the info reported in the server's
certificate. In such a case, the user needs to explicitly set an
override for the hostname to use for certificate validation.

This is achieved by adding a 'tls-hostname' property to the NBD
block driver.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-4-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

block: pass desired TLS hostname through from block driver client

In

  commit a71d597b989fd701b923f09b3c20ac4fcaa55e81
  Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
  Date:   Thu Jun 10 13:08:00 2021 +0300

    block/nbd: reuse nbd_co_do_establish_connection() in nbd_open()

the use of the 'hostname' field from the BDRVNBDState struct was
lost, and 'nbd_connect' just hardcoded it to match the IP socket
address. This was a harmless bug at the time since we block use
with anything other than IP sockets.

Shortly though, we want to allow the caller to override the hostname
used in the TLS certificate checks. This is to allow for TLS
when doing port forwarding or tunneling. Thus we need to reinstate
the passing along of the 'hostname'.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-3-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

crypto: mandate a hostname when checking x509 creds on a client

Currently the TLS session object assumes that the caller will always
provide a hostname when using x509 creds on a client endpoint. This
relies on the caller to detect and report an error if the user has
configured QEMU with x509 credentials on a UNIX socket. The migration
code has such a check, but it is too broad, reporting an error when
the user has configured QEMU with PSK credentials on a UNIX socket,
where hostnames are irrelevant.

Putting the check into the TLS session object credentials validation
code ensures we report errors in only the scenario that matters.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220304193610.3293146-2-berrange@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

Merge remote-tracking branch 'remotes/hreitz-gitlab/tags/pull-block-2022-03-07' into staging

Block patches for 7.0-rc0:
- New fleecing backup scheme
- iotest fixes
- Fixes for the curl block driver
- Fix for the preallocate block driver
- IDE fix for zero-length TRIM requests

# gpg: Signature made Mon 07 Mar 2022 10:33:31 GMT
# gpg:                using RSA key CB62D7A0EE3829E45F004D34A1FA40D098019CDF
# gpg:                issuer "hreitz@redhat.com"
# gpg: Good signature from "Hanna Reitz <hreitz@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: CB62 D7A0 EE38 29E4 5F00  4D34 A1FA 40D0 9801 9CDF

* remotes/hreitz-gitlab/tags/pull-block-2022-03-07: (23 commits)
  iotests/image-fleecing: test push backup with fleecing
  iotests/image-fleecing: add test case with bitmap
  iotests.py: add qemu_io_pipe_and_status()
  iotests/image-fleecing: add test-case for fleecing format node
  block: copy-before-write: realize snapshot-access API
  block: introduce snapshot-access block driver
  block/io: introduce block driver snapshot-access API
  block/reqlist: add reqlist_wait_all()
  block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()
  block/reqlist: reqlist_find_conflict(): use ranges_overlap()
  block: intoduce reqlist
  block/block-copy: add block_copy_reset()
  block/copy-before-write: add bitmap open parameter
  block/block-copy: block_copy_state_new(): add bitmap parameter
  block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value
  block/block-copy: move copy_bitmap initialization to block_copy_state_new()
  iotests: Write test output to TEST_DIR
  tests/qemu-iotests/testrunner: Quote "case not run" lines in TAP mode
  tests/qemu-iotests/040: Skip TestCommitWithFilters without 'throttle'
  block: fix preallocate filter: don't do unaligned preallocate requests
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

iotests/image-fleecing: test push backup with fleecing

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220303194349.2304213-17-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

iotests/image-fleecing: add test case with bitmap

Note that reads zero areas (not dirty in the bitmap) fails, that's
correct.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220303194349.2304213-16-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

iotests.py: add qemu_io_pipe_and_status()

Add helper that returns both status and output, to be used in the
following commit

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220303194349.2304213-15-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

iotests/image-fleecing: add test-case for fleecing format node

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-14-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block: copy-before-write: realize snapshot-access API

Current scheme of image fleecing looks like this:

[guest]                    [NBD export]
  |                              |
  |root                          | root
  v                              v
[copy-before-write] -----> [temp.qcow2]
  |                 target  |
  |file                     |backing
  v                         |
[active disk] <-------------+

- On guest writes copy-before-write filter copies old data from active
   disk to temp.qcow2. So fleecing client (NBD export) when reads
   changed regions from temp.qcow2 image and unchanged from active disk
   through backing link.

This patch makes possible new image fleecing scheme:

[guest]                   [NBD export]
   |                            |
   | root                       | root
   v                 file       v
[copy-before-write]<------[snapshot-access]
   |           |
   | file      | target
   v           v
[active-disk] [temp.img]

- copy-before-write does CBW operations and also provides
   snapshot-access API. The API may be accessed through
   snapshot-access driver.

Benefits of new scheme:

1. Access control: if remote client try to read data that not covered
   by original dirty bitmap used on copy-before-write open, client gets
   -EACCES.

2. Discard support: if remote client do DISCARD, this additionally to
   discarding data in temp.img informs block-copy process to not copy
   these clusters. Next read from discarded area will return -EACCES.
   This is significant thing: when fleecing user reads data that was
   not yet copied to temp.img, we can avoid copying it on further guest
   write.

3. Synchronisation between client reads and block-copy write is more
   efficient. In old scheme we just rely on BDRV_REQ_SERIALISING flag
   used for writes to temp.qcow2. New scheme is less blocking:
     - fleecing reads are never blocked: if data region is untouched or
       in-flight, we just read from active-disk, otherwise we read from
       temp.img
     - writes to temp.img are not blocked by fleecing reads
     - still, guest writes of-course are blocked by in-flight fleecing
       reads, that currently read from active-disk - it's the minimum
       necessary blocking

4. Temporary image may be of any format, as we don't rely on backing
   feature.

5. Permission relation are simplified. With old scheme we have to share
   write permission on target child of copy-before-write, otherwise
   backing link conflicts with copy-before-write file child write
   permissions. With new scheme we don't have backing link, and
   copy-before-write node may have unshared access to temporary node.
   (Not realized in this commit, will be in future).

6. Having control on fleecing reads we'll be able to implement
   alternative behavior on failed copy-before-write operations.
   Currently we just break guest request (that's a historical behavior
   of backup). But in some scenarios it's a bad behavior: better
   is to drop the backup as failed but don't break guest request.
   With new scheme we can simply unset some bits in a bitmap on CBW
   failure and further fleecing reads will -EACCES, or something like
   this. (Not implemented in this commit, will be in future)
   Additional application for this is implementing timeout for CBW
   operations.

Iotest 257 output is updated, as two more bitmaps now live in
copy-before-write filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220303194349.2304213-13-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block: introduce snapshot-access block driver

The new block driver simply utilizes snapshot-access API of underlying
block node.

In further patches we want to use it like this:

[guest]                   [NBD export]
   |                            |
   | root                       | root
   v                 file       v
[copy-before-write]<------[snapshot-access]
   |           |
   | file      | target
   v           v
[active-disk] [temp.img]

This way, NBD client will be able to read snapshotted state of active
disk, when active disk is continued to be written by guest. This is
known as "fleecing", and currently uses another scheme based on qcow2
temporary image which backing file is active-disk. New scheme comes
with benefits - see next commit.

The other possible application is exporting internal snapshots of
qcow2, like this:

[guest]          [NBD export]
   |                  |
   | root             | root
   v       file       v
[qcow2]<---------[snapshot-access]

For this, we'll need to implement snapshot-access API handlers in
qcow2 driver, and improve snapshot-access block driver (and API) to
make it possible to select snapshot by name. Another thing to improve
is size of snapshot. Now for simplicity we just use size of bs->file,
which is OK for backup, but for qcow2 snapshots export we'll need to
imporve snapshot-access API to get size of snapshot.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220303194349.2304213-12-vsementsov@virtuozzo.com>
[hreitz: Rebased on block GS/IO split]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/io: introduce block driver snapshot-access API

Add new block driver handlers and corresponding generic wrappers.
It will be used to allow copy-before-write filter to provide
reach fleecing interface in further commit.

In future this approach may be used to allow reading qcow2 internal
snapshots, for example to export them through NBD.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-11-vsementsov@virtuozzo.com>
[hreitz: Rebased on block GS/IO split]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/reqlist: add reqlist_wait_all()

Add function to wait for all intersecting requests.
To be used in the further commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-10-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()

Add a convenient function similar with bdrv_block_status() to get
status of dirty bitmap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-9-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/reqlist: reqlist_find_conflict(): use ranges_overlap()

Let's reuse convenient helper.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-8-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block: intoduce reqlist

Split intersecting-requests functionality out of block-copy to be
reused in copy-before-write filter.

Note: while being here, fix tiny typo in MAINTAINERS.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-7-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/block-copy: add block_copy_reset()

Split block_copy_reset() out of block_copy_reset_unallocated() to be
used separately later.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-6-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/copy-before-write: add bitmap open parameter

This brings "incremental" mode to copy-before-write filter: user can
specify bitmap so that filter will copy only "dirty" areas.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20220303194349.2304213-5-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/block-copy: block_copy_state_new(): add bitmap parameter

This will be used in the following commit to bring "incremental" mode
to copy-before-write filter.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-4-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return value

That simplifies handling failure in existing code and in further new
usage of bdrv_merge_dirty_bitmap().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-3-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/block-copy: move copy_bitmap initialization to block_copy_state_new()

We are going to complicate bitmap initialization in the further
commit. And in future, backup job will be able to work without filter
(when source is immutable), so we'll need same bitmap initialization in
copy-before-write filter and in backup job. So, it's reasonable to do
it in block-copy.

Note that for now cbw_open() is the only caller of
block_copy_state_new().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220303194349.2304213-2-vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

iotests: Write test output to TEST_DIR

Drop the use of OUTPUT_DIR (test/qemu-iotests under the build
directory), and instead write test output files (.out.bad, .notrun, and
.casenotrun) to TEST_DIR.

With this, the same test can be run concurrently without the separate
instances interfering, because they will need separate TEST_DIRs anyway.
Running the same test separately is useful when running the iotests with
various format/protocol combinations in parallel, or when you just want
to aggressively exercise a single test (e.g. when it fails only
sporadically).

Putting this output into TEST_DIR means that it will stick around for
inspection after the test run is done (though running the same test in
the same TEST_DIR will overwrite it, just as it used to be); but given
that TEST_DIR is a scratch directory, it should be clear that users can
delete all of its content at any point.  (And if TEST_DIR is on tmpfs,
it will just disappear on shutdown.)  Contrarily, alternative approaches
that would put these output files into OUTPUT_DIR with some prefix to
differentiate between separate test runs might easily lead to cluttering
OUTPUT_DIR.

(This change means OUTPUT_DIR is no longer written to by the iotests, so
we can drop its usage altogether.)

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220221172909.762858-1-hreitz@redhat.com>
[hreitz: Simplified `Path(os.path.join(x, y))` to `Path(x, y)`, as
         suggested by Vladimir; and rebased on 9086c7639822b6
         ("tests/qemu-iotests: Rework the checks and spots using GNU
         sed")]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

tests/qemu-iotests/testrunner: Quote "case not run" lines in TAP mode

In TAP mode, the stdout is reserved for the TAP protocol, so we
have to make sure to mark other lines with a comment '#' character
at the beginning to avoid that the TAP parser at the other end
gets confused.

To test this condition, run "configure" for example with:

--block-drv-rw-whitelist=copy-before-write,qcow2,raw,file,host_device,blkdebug,null-co,copy-on-read

so that iotest 041 will report that some tests are not run due to
the missing "quorum" driver. Without this change, "make check-block"
fails since the meson tap parser gets confused by these messages.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220223124353.3273898-1-thuth@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

tests/qemu-iotests/040: Skip TestCommitWithFilters without 'throttle'

iotest 040 already has some checks for the availability of the 'throttle'
driver, but some new code has been added in the course of time that
depends on 'throttle' but does not check for its availability. Add
a check to the TestCommitWithFilters class so that this iotest now
also passes again if 'throttle' has not been enabled in the QEMU
binaries.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220223123127.3206042-1-thuth@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block: fix preallocate filter: don't do unaligned preallocate requests

There is a bug in handling BDRV_REQ_NO_WAIT flag: we still may wait in
wait_serialising_requests() if request is unaligned. And this is
possible for the only user of this flag (preallocate filter) if
underlying file is unaligned to its request_alignment on start.

So, we have to fix preallocate filter to do only aligned preallocate
requests.

Next, we should fix generic block/io.c somehow. Keeping in mind that
preallocate is the only user of BDRV_REQ_NO_WAIT and that we have to
fix its behavior now, it seems more safe to just assert that we never
use BDRV_REQ_NO_WAIT with unaligned requests and add corresponding
comment. Let's do so.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Message-Id: <20220215121609.38570-1-vsementsov@virtuozzo.com>
[hreitz: Rebased on block GS/IO split]
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/curl.c: Check error return from curl_easy_setopt()

Coverity points out that we aren't checking the return value
from curl_easy_setopt() for any of the calls to it we make
in block/curl.c.

Some of these options are documented as always succeeding (e.g.
CURLOPT_VERBOSE) but others have documented failure cases (e.g.
CURLOPT_URL). For consistency we check every call, even the ones
that theoretically cannot fail.

Fixes: Coverity CID 1459336, 1459482, 1460331
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220222152341.850419-3-peter.maydell@linaro.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

block/curl.c: Set error message string if curl_init_state() fails

In curl_open(), the 'out' label assumes that the state->errmsg string
has been set (either by curl_easy_perform() or by manually copying a
string into it); however if curl_init_state() fails we will jump to
that label without setting the string. Add the missing error string
setup.

(We can't be specific about the cause of failure: the documentation
of curl_easy_init() just says "If this function returns NULL,
something went wrong".)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220222152341.850419-2-peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>

ide: Increment BB in-flight counter for TRIM BH

When we still have an AIOCB registered for DMA operations, we try to
settle the respective operation by draining the BlockBackend associated
with the IDE device.

However, this assumes that every DMA operation is associated with an
increment of the BlockBackend’s in-flight counter (e.g. through some
ongoing I/O operation), so that draining the BB until its in-flight
counter reaches 0 will settle all DMA operations.  That is not the case:
For TRIM, the guest can issue a zero-length operation that will not
result in any I/O operation forwarded to the BlockBackend, and also not
increment the in-flight counter in any other way.  In such a case,
blk_drain() will be a no-op if no other operations are in flight.

It is clear that if blk_drain() is a no-op, the value of
s->bus->dma->aiocb will not change between checking it in the `if`
condition and asserting that it is NULL after blk_drain().

The particular problem is that ide_issue_trim() creates a BH
(ide_trim_bh_cb()) to settle the TRIM request: iocb->common.cb() is
ide_dma_cb(), which will either create a new request, or find the
transfer to be done and call ide_set_inactive(), which clears
s->bus->dma->aiocb.  Therefore, the blk_drain() must wait for
ide_trim_bh_cb() to run, which currently it will not always do.

To fix this issue, we increment the BlockBackend's in-flight counter
when the TRIM operation begins (in ide_issue_trim(), when the
ide_trim_bh_cb() BH is created) and decrement it when ide_trim_bh_cb()
is done.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2029980
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220120142259.120189-1-hreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Tested-by: John Snow <jsnow@redhat.com>

Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220305' into staging

ppc-7.0 queue :

* Clang fixes
* Vector/VSX instruction batch fixes

# gpg: Signature made Sat 05 Mar 2022 08:18:06 GMT
# gpg:                using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1
# gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: A0F6 6548 F048 95EB FE6B  0B60 51A3 43C7 CFFB ECA1

* remotes/legoater/tags/pull-ppc-20220305:
  target/ppc: Add missing helper_reset_fpstatus to helper_XVCVSPBF16
  target/ppc: Add missing helper_reset_fpstatus to VSX_MAX_MINC
  target/ppc: split XXGENPCV macros for readability
  target/ppc: use andc in vrlqmi
  target/ppc: use extract/extract2 to create vrlqnm mask
  target/ppc: use ext32u and deposit in do_vx_vmulhw_i64
  target/ppc: Fix vmul[eo]* instructions marked 2.07
  tests/tcg/ppc64le: Use Altivec register names in clobber list
  tests/tcg/ppc64le: emit bcdsub with .long when needed
  tests/tcg/ppc64le: drop __int128 usage in bcdsub
  target/ppc: change xs[n]madd[am]sp to use float64r32_muladd
  tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf
  Use long endian options for ppc64

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/rth-gitlab/tags/pull-tcg-20220304' into staging

Reorder do_constant_folding_cond test to satisfy valgrind.
Fix value of MAX_OPC_PARAM_IARGS.
Add opcodes for vector nand, nor, eqv.
Support vector nand, nor, eqv on PPC and S390X hosts.
Support AVX512VL, AVX512BW, AVX512DQ, and AVX512VBMI2.

# gpg: Signature made Fri 04 Mar 2022 18:59:08 GMT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth-gitlab/tags/pull-tcg-20220304: (21 commits)
  tcg/i386: Implement bitsel for avx512
  tcg/i386: Implement more logical operations for avx512
  tcg/i386: Implement avx512 multiply
  tcg/i386: Implement avx512 min/max/abs
  tcg/i386: Expand scalar rotate with avx512 insns
  tcg/i386: Remove rotls_vec from tcg_target_op_def
  tcg/i386: Expand vector word rotate as avx512vbmi2 shift-double
  tcg/i386: Support avx512vbmi2 vector shift-double instructions
  tcg/i386: Implement avx512 variable rotate
  tcg/i386: Implement avx512 immediate rotate
  tcg/i386: Implement avx512 immediate sari shift
  tcg/i386: Implement avx512 scalar shift
  tcg/i386: Implement avx512 variable shifts
  tcg/i386: Use tcg_can_emit_vec_op in expand_vec_cmp_noinv
  tcg/i386: Add tcg_out_evex_opc
  tcg/i386: Detect AVX512
  tcg/s390x: Implement vector NAND, NOR, EQV
  tcg/ppc: Implement vector NAND, NOR, EQV
  tcg: Add opcodes for vector nand, nor, eqv
  tcg: Set MAX_OPC_PARAM_IARGS to 7
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kwolf-gitlab/tags/for-upstream' into staging

Block layer patches

- qemu-storage-daemon: Add --daemonize
- Fix x-blockdev-amend and block node activation code which incorrectly
  executed code in the iothread that must run in the main thread.
- Add macros for coroutine-safe TLS variables (required for correctness
  with LTO)
- Fix crashes with concurrent I/O and bdrv_refresh_limits()
- Split block APIs in global state and I/O
- iotests: Don't refuse to run at all without GNU sed, just skip tests
  that need it

# gpg: Signature made Fri 04 Mar 2022 17:18:31 GMT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kwolf-gitlab/tags/for-upstream: (50 commits)
  block/amend: Keep strong reference to BDS
  block/amend: Always call .bdrv_amend_clean()
  tests/qemu-iotests: Rework the checks and spots using GNU sed
  iotests/graph-changes-while-io: New test
  iotests: Allow using QMP with the QSD
  block: Make bdrv_refresh_limits() non-recursive
  job.h: assertions in the callers of JobDriver function pointers
  job.h: split function pointers in JobDriver
  block-backend-common.h: split function pointers in BlockDevOps
  block_int-common.h: assertions in the callers of BdrvChildClass function pointers
  block_int-common.h: split function pointers in BdrvChildClass
  block_int-common.h: assertions in the callers of BlockDriver function pointers
  block_int-common.h: split function pointers in BlockDriver
  block/coroutines: I/O and "I/O or GS" API
  block/copy-before-write.h: global state API + assertions
  include/block/snapshot: global state API + assertions
  assertions for blockdev.h global state API
  include/sysemu/blockdev.h: global state API
  assertions for blockjob.h global state API
  include/block/blockjob.h: global state API
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/ppc: Add missing helper_reset_fpstatus to helper_XVCVSPBF16

Fixes: 3909ff1fac ("target/ppc: Implement xvcvbf16spn and xvcvspbf16 instructions")
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220304175156.2012315-8-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

target/ppc: Add missing helper_reset_fpstatus to VSX_MAX_MINC

Fixes: da499405aa ("target/ppc: Refactor VSX_MAX_MINC helper")
Signed-off-by: Víctor Colombo <victor.colombo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220304175156.2012315-7-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

target/ppc: split XXGENPCV macros for readability

Fixes: b090f4f1e3c9 ("target/ppc: Implement xxgenpcv[bhwd]m instruction")
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220304175156.2012315-6-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

target/ppc: use andc in vrlqmi

Fixes: 7e5947df6e94 ("target/ppc: implement vrlqmi")
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220304175156.2012315-5-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

target/ppc: use extract/extract2 to create vrlqnm mask

Fixes: 4e272668406b ("target/ppc: implement vrlqnm")
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220304175156.2012315-4-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

target/ppc: use ext32u and deposit in do_vx_vmulhw_i64

Fixes: 29e9dfcf755e ("target/ppc: vmulh* instructions without helpers")
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220304175156.2012315-3-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

target/ppc: Fix vmul[eo]* instructions marked 2.07

Some ISA v2.03 Vector Multiply instructions marked to be ISA v2.07 only.
This patch fixes it.

Fixes: 80eca687c851 ("target/ppc: moved vector even and odd multiplication to decodetree")
Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
Suggested-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Lucas Mateus Castro (alqotel) <lucas.araujo@eldorado.org.br>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220304175156.2012315-2-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

tests/tcg/ppc64le: Use Altivec register names in clobber list

LLVM/Clang doesn't know the VSX registers when compiling with
-mabi=elfv1. Use only registers >= 32 and list them with their Altivec
name.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220304165417.1981159-6-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

tests/tcg/ppc64le: emit bcdsub with .long when needed

Based on GCC docs[1], we use the '-mpower8-vector' flag at config-time
to detect the toolchain support to the bcdsub instruction. LLVM/Clang
supports this flag since version 3.6[2], but the instruction and related
builtins were only added in LLVM 14[3]. In the absence of other means to
detect this support at config-time, we resort to __has_builtin to
identify the presence of __builtin_bcdsub at compile-time. If the
builtin is not available, the instruction is emitted with a ".long".

[1] https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html
[2] https://github.com/llvm/llvm-project/commit/59eb767e11d4ffefb5f55409524e5c8416b2b0db
[3] https://github.com/llvm/llvm-project/commit/c933c2eb334660c131f4afc9d194fafb0cec0423

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220304165417.1981159-5-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

tests/tcg/ppc64le: drop __int128 usage in bcdsub

Using __int128 with inline asm constraints like "v" generates incorrect
code when compiling with LLVM/Clang (e.g., only one doubleword of the
VSR is loaded). Instead, use a GPR pair to pass the 128-bits value and
load the VSR with mtvsrd/xxmrghd.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220304165417.1981159-4-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

target/ppc: change xs[n]madd[am]sp to use float64r32_muladd

Change VSX Scalar Multiply-Add/Subtract Type-A/M Single Precision
helpers to use float64r32_muladd. This method should correctly handle
all rounding modes, so the workaround for float_round_nearest_even can
be dropped.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220304165417.1981159-3-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

tests/tcg/ppc64le: use inline asm instead of __builtin_mtfsf

LLVM/Clang does not support __builtin_mtfsf.

Acked-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220304165417.1981159-2-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

Use long endian options for ppc64

GCC options pairs -mlittle/-mlittle-endian and -mbig/-mbig-endian are
equivalent on ppc64 architecture. However, Clang supports only long
version of the options.

Use longer form in configure to properly support both GCC and Clang
compiler. In addition, fix this issue in tcg test configure.

Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20220131091714.4825-1-mrezanin@redhat.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>

Merge remote-tracking branch 'remotes/kraxel/tags/kraxel-20220304-pull-request' into staging

usb: fixes for ohci, xhci, mtp and redirect
audio: latency fixes
ui: opengl and cocoa fixes
firmware: ovmf tabel aprser fixes

# gpg: Signature made Fri 04 Mar 2022 14:18:47 GMT
# gpg:                using RSA key A0328CFFB93A17A79901FE7D4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/kraxel-20220304-pull-request: (35 commits)
  hw/display/vmware_vga: replace fprintf calls with trace events
  edid: Fix clock of Detailed Timing Descriptor
  softmmu/qdev-monitor: Add virtio-gpu-gl aliases
  ui/cocoa: Add Services menu
  ui/clipboard: fix use-after-free regression
  ui: do not create a surface when resizing a GL scanout
  ui/console: fix texture leak when calling surface_gl_create_texture()
  ui/console: fix crash when using gl context with non-gl listeners
  docs: Add spec of OVMF GUIDed table for SEV guests
  hw/i386: Replace magic number with field length calculation
  hw/i386: Improve bounds checking in OVMF table parsing
  coreaudio: Notify error in coreaudio_init_out
  hw/usb/redirect.c: Stop using qemu_oom_check()
  sdlaudio: fix samples vs. frames mix-up
  paaudio: fix samples vs. frames mix-up
  ossaudio: reduce effective playback buffer size
  dsoundaudio: reduce effective playback buffer size
  paaudio: reduce effective playback buffer size
  audio: restore mixing-engine playback buffer size
  Revert "audio: fix wavcapture segfault"
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tcg/i386: Implement bitsel for avx512

The general ternary logic operation can implement BITSEL.
Funnel the 4-operand operation into three variants of the
3-operand instruction, depending on input operand overlap.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Implement more logical operations for avx512

AVX512VL has a general ternary logic operation, VPTERNLOGQ,
which can implement NOT, ORC, NAND, NOR, EQV.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Implement avx512 multiply

AVX512DQ has VPMULLQ.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Implement avx512 min/max/abs

AVX512VL has VPABSQ, VPMAXSQ, VPMAXUQ, VPMINSQ, VPMINUQ.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Expand scalar rotate with avx512 insns

Expand 32-bit and 64-bit scalar rotate with VPRO[LR]V;
expand 16-bit scalar rotate with VPSHLDV.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Remove rotls_vec from tcg_target_op_def

There is no such instruction on x86, so we should
not be pretending it has arguments.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Expand vector word rotate as avx512vbmi2 shift-double

While there are no specific 16-bit rotate instructions, there
are double-word shifts, which can perform the same operation.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Support avx512vbmi2 vector shift-double instructions

We will use VPSHLD, VPSHLDV and VPSHRDV for 16-bit rotates.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Implement avx512 variable rotate

AVX512VL has VPROLVD and VPRORVQ.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Implement avx512 immediate rotate

AVX512VL has VPROLD and VPROLQ, layered onto the same
opcode as PSHIFTD, but requires EVEX encoding and W1.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Implement avx512 immediate sari shift

AVX512 has VPSRAQ with immediate operand, in the same form as
with AVX, but requires EVEX encoding and W1.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Implement avx512 scalar shift

AVX512VL has VPSRAQ.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Implement avx512 variable shifts

AVX512VL has VPSRAVQ, and
AVX512BW has VPSLLVW, VPSRAVW, VPSRLVW.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Use tcg_can_emit_vec_op in expand_vec_cmp_noinv

The condition for UMIN/UMAX availability is about to change;
use the canonical version.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Add tcg_out_evex_opc

The evex encoding is added here, for use in a subsequent patch.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Detect AVX512

There are some operation sizes in some subsets of AVX512 that
are missing from previous iterations of AVX. Detect them.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/s390x: Implement vector NAND, NOR, EQV

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/ppc: Implement vector NAND, NOR, EQV

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Add opcodes for vector nand, nor, eqv

We've had placeholders for these opcodes for a while,
and should have support on ppc, s390x and avx512 hosts.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Set MAX_OPC_PARAM_IARGS to 7

The last entry of DEF_HELPERS_FLAGS_n is DEF_HELPER_FLAGS_7 and
thus the MAX_OPC_PARAM_IARGS should be 7.

Reviewed-by: Taylor Simpson <tsimpson@quicinc.com>
Signed-off-by: Ziqiao Kong <ziqiaokong@gmail.com>
Message-Id: <20220227113127.414533-2-ziqiaokong@gmail.com>
Fixes: e6cadf49c3d ("tcg: Add support for a helper with 7 arguments")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/optimize: only read val after const check

valgrind pointed out that arg_info()->val can be undefined which will
be the case if the arguments are not constant. The ordering of the
checks will have ensured we never relied on an undefined value but for
the sake of completeness re-order the code to be clear.

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220209112142.3367525-1-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

block/amend: Keep strong reference to BDS

Otherwise, the BDS might be freed while the job is running, which would
cause a use-after-free.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220304153729.711387-5-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/amend: Always call .bdrv_amend_clean()

.bdrv_amend_clean() says block drivers can use it to clean up what was
done in .bdrv_amend_pre_run(). Therefore, it should always be called
after .bdrv_amend_pre_run(), which means we need it to call it in the
JobDriver.free() callback, not in JobDriver.clean().

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220304153729.711387-3-hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Rework the checks and spots using GNU sed

Instead of failing the iotests if GNU sed is not available (or skipping
them completely in the check-block.sh script), it would be better to
simply skip the bash-based tests that rely on GNU sed, so that the other
tests could still be run. Thus we now explicitely use "gsed" (either as
direct program or as a wrapper around "sed" if it's the GNU version)
in the spots that rely on the GNU sed behavior. Statements that use the
"-r" parameter of sed have been switched to use "-E" instead, since this
switch is supported by all sed versions on our supported build hosts
(most also support "-r", but macOS' sed only supports "-E"). With all
these changes in place, we then can also remove the sed checks from the
check-block.sh script, so that "make check-block" can now be run on
systems without GNU sed, too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20220216125454.465041-1-thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests/graph-changes-while-io: New test

Test the following scenario:
1. Some block node (null-co) attached to a user (here: NBD server) that
performs I/O and keeps the node in an I/O thread
2. Repeatedly run blockdev-add/blockdev-del to add/remove an overlay
to/from that node

Each blockdev-add triggers bdrv_refresh_limits(), and because
blockdev-add runs in the main thread, it does not stop the I/O requests.
I/O can thus happen while the limits are refreshed, and when such a
request sees a temporarily invalid block limit (e.g. alignment is 0),
this may easily crash qemu (or the storage daemon in this case).

The block layer needs to ensure that I/O requests to a node are paused
while that node's BlockLimits are refreshed.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220216105355.30729-4-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: Allow using QMP with the QSD

Add a parameter to optionally open a QMP connection when creating a
QemuStorageDaemon instance.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220216105355.30729-3-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: Make bdrv_refresh_limits() non-recursive

bdrv_refresh_limits() recurses down to the node's children.  That does
not seem necessary: When we refresh limits on some node, and then
recurse down and were to change one of its children's BlockLimits, then
that would mean we noticed the changed limits by pure chance.  The fact
that we refresh the parent's limits has nothing to do with it, so the
reason for the change probably happened before this point in time, and
we should have refreshed the limits then.

Consequently, we should actually propagate block limits changes upwards,
not downwards.  That is a separate and pre-existing issue, though, and
so will not be addressed in this patch.

The problem with recursing is that bdrv_refresh_limits() is not atomic.
It begins with zeroing BDS.bl, and only then sets proper, valid limits.
If we do not drain all nodes whose limits are refreshed, then concurrent
I/O requests can encounter invalid request_alignment values and crash
qemu.  Therefore, a recursing bdrv_refresh_limits() requires the whole
subtree to be drained, which is currently not ensured by most callers.

A non-recursive bdrv_refresh_limits() only requires the node in question
to not receive I/O requests, and this is done by most callers in some
way or another:
- bdrv_open_driver() deals with a new node with no parents yet
- bdrv_set_file_or_backing_noperm() acts on a drained node
- bdrv_reopen_commit() acts only on drained nodes
- bdrv_append() should in theory require the node to be drained; in
  practice most callers just lock the AioContext, which should at least
  be enough to prevent concurrent I/O requests from accessing invalid
  limits

So we can resolve the bug by making bdrv_refresh_limits() non-recursive.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1879437
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220216105355.30729-2-hreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

job.h: assertions in the callers of JobDriver function pointers

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-32-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

job.h: split function pointers in JobDriver

The job API will be handled separately in another serie.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-31-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block-backend-common.h: split function pointers in BlockDevOps

Assertions in the callers of the function pointrs are already
added by previous patches.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20220303151616.325444-30-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block_int-common.h: assertions in the callers of BdrvChildClass function pointers

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-29-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block_int-common.h: split function pointers in BdrvChildClass

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-28-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block_int-common.h: assertions in the callers of BlockDriver function pointers

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-27-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block_int-common.h: split function pointers in BlockDriver

Similar to the header split, also the function pointers in BlockDriver
can be split in I/O and global state.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-26-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/coroutines: I/O and "I/O or GS" API

block coroutines functions run in different aiocontext, and are
not protected by the BQL. Therefore are I/O.

On the other side, generated_co_wrapper functions use BDRV_POLL_WHILE,
meaning the caller can either be the main loop or a specific iothread.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-25-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/copy-before-write.h: global state API + assertions

copy-before-write functions always run under BQL.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-24-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

include/block/snapshot: global state API + assertions

Snapshots run also under the BQL, so they all are
in the global state API. The aiocontext lock that they hold
is currently an overkill and in future could be removed.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-23-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

assertions for blockdev.h global state API

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-22-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

include/sysemu/blockdev.h: global state API

blockdev functions run always under the BQL lock.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-21-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

assertions for blockjob.h global state API

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-20-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

include/block/blockjob.h: global state API

blockjob functions run always under the BQL lock.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-19-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block.c: add assertions to static functions

Following the assertion derived from the API split,
propagate the assertion also in the static functions.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-18-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

GS and IO CODE macros for blockjob_int.h

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-17-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

include/block/blockjob_int.h: split header into I/O and GS API

Since the I/O functions are not many, keep a single file.
Also split the function pointers in BlockJobDriver.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20220303151616.325444-16-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>