git.proxmox.com Git - mirror

intel_iommu: Fix root_scalable migration breakage

When introducing the initial support for scalable mode we added a
new field into vmstate however we blindly migrate that field without
notice. That'll break migration no matter forward or backward.

The normal way should be that we use something like
VMSTATE_UINT32_TEST() or subsections for the new vmstate field however
for this case of vt-d we can even make it simpler because we've
already migrated all the registers and it'll be fairly simple that we
re-generate root_scalable field from the register values during post
load of the device.

Fixes: fb43cf739e ("intel_iommu: scalable mode emulation")
Reviewed-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190329061422.7926-2-peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

virtio-net: Fix typo in comment

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Message-Id: <20190321161832.10533-1-yuval.shaia@oracle.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

intel_iommu: Correct caching-mode error message

If we try to use the intel-iommu device with vfio-pci devices without
caching mode enabled, we're told:

qemu-system-x86_64: We need to set caching-mode=1 for intel-iommu to enable
device assignment with IOMMU protection.

But to enable caching mode, the option is actually "caching-mode=on".

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <155364147432.16467.15898335025013220939.stgit@gimli.home>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Williamson <<a href="mailto:alex.williamson@redhat.com" target="_blank" rel="noreferrer">alex.williamson@redhat.com</a>><br>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

acpi: verify file entries in bios_linker_loader_add_pointer()

The callers to bios_linker_find_file() assert that the file entry returned
is not NULL, except for those in bios_linker_loader_add_pointer(). Add two
asserts in that case for completeness and to facilitate static code analysis.

Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <1553199229-25318-1-git-send-email-liam.merwick@oracle.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Merge remote-tracking branch 'remotes/berrange/tags/filemon-next-pull-request' into staging

filemon: various fixes / improvements to file monitor for USB MTP

Ensure watch IDs unique within a monitor and avoid integer wraparound
issues when many watches are set & unset over time.

# gpg: Signature made Tue 02 Apr 2019 13:53:40 BST
# gpg:                using RSA key BE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>" [full]
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>" [full]
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/filemon-next-pull-request:
  filemon: fix watch IDs to avoid potential wraparound issues
  filemon: ensure watch IDs are unique to QFileMonitor scope
  tests: refactor file monitor test to make it more understandable

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- file-posix: Ignore unlock failure instead of crashing
- gluster: Limit the transfer size to 512 MiB
- stream: Fix backing chain freezing
- qemu-img: Enable BDRV_REQ_MAY_UNMAP for zero writes in convert
- iotests fixes

# gpg: Signature made Tue 02 Apr 2019 13:47:43 BST
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  tests/qemu-iotests/235: Allow fallback to tcg
  block: test block-stream with a base node that is used by block-commit
  block: freeze the backing chain earlier in stream_start()
  block: continue until base is found in bdrv_freeze_backing_chain() et al
  block/file-posix: do not fail on unlock bytes
  tests/qemu-iotests: Remove redundant COPYING file
  block/gluster: limit the transfer size to 512 MiB
  qemu-img: Enable BDRV_REQ_MAY_UNMAP in convert
  iotests: Fix test 200 on s390x without virtio-pci

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

filemon: fix watch IDs to avoid potential wraparound issues

Watch IDs are allocated from incrementing a int counter against
the QFileMonitor object. In very long life QEMU processes with
a huge amount of USB MTP activity creating & deleting directories
it is just about conceivable that the int counter can wrap
around. This would result in incorrect behaviour of the file
monitor watch APIs due to clashing watch IDs.

Instead of trying to detect this situation, this patch changes
the way watch IDs are allocated. It is turned into an int64_t
variable where the high 32 bits are set from the underlying
inotify "int" ID. This gives an ID that is guaranteed unique
for the directory as a whole, and we can rely on the kernel
to enforce this. QFileMonitor then sets the low 32 bits from
a per-directory counter.

The USB MTP device only sets watches on the directory as a
whole, not files within, so there is no risk of guest
triggered wrap around on the low 32 bits.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

filemon: ensure watch IDs are unique to QFileMonitor scope

The watch IDs are mistakenly only unique within the scope of the
directory being monitored. This is not useful for clients which are
monitoring multiple directories. They require watch IDs to be unique
globally within the QFileMonitor scope.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Tested-by: Bandan Das <bsd@redhat.com>
Reviewed-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

tests: refactor file monitor test to make it more understandable

The current file monitor unit tests are too clever for their own good
making it hard to understand the desired output.

Instead of trying to infer the expected events, explicitly list the
events we expect in the operation sequence.

Instead of dynamically building a matrix of tests, just have one giant
operation sequence that validates all scenarios in a single test.

Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

tests/qemu-iotests/235: Allow fallback to tcg

iotest 235 currently only works with KVM - this is bad for systems where
it is not available, e.g. CI pipelines. The test also works when using
"tcg" as accelerator, so we can simply add that to the list of accelerators,
too.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: test block-stream with a base node that is used by block-commit

The base node of a block-stream operation indicates the first image
from the backing chain starting from which no data is copied to the
top node.

The block-stream job allows others to use that base image, so a second
block-stream job could be writing to it at the same time. An important
restriction is that the base image must not disappear while the stream
job is ongoing. stream_start() freezes the backing chain from top to
base with that purpose but it does it too late in the code so there is
a race condition there.

This bug was fixed in the previous commit, and this patch contains an
iotest for this scenario.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: freeze the backing chain earlier in stream_start()

Commit 6585493369819a48d34a86d57ec6b97cb5cd9bc0 added code to freeze
the backing chain from 'top' to 'base' for the duration of the
block-stream job.

The problem is that the freezing happens too late in stream_start():
during the bdrv_reopen_set_read_only() call earlier in that function
another job can jump in and remove the base image. If that happens we
have an invalid chain and QEMU crashes.

This patch puts the bdrv_freeze_backing_chain() call at the beginning
of the function.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block: continue until base is found in bdrv_freeze_backing_chain() et al

All three functions that handle the BdrvChild.frozen attribute walk
the backing chain from 'bs' to 'base' and stop either when 'base' is
found or at the end of the chain if 'base' is NULL.

However if 'base' is not found then the functions return without
errors as if it was NULL.

This is wrong: if the caller passed an incorrect parameter that means
that there is a bug in the code.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/file-posix: do not fail on unlock bytes

bdrv_replace_child() calls bdrv_check_perm() with error_abort on
loosening permissions. However file-locking operations may fail even
in this case, for example on NFS. And this leads to Qemu crash.

Let's avoid such errors. Note, that we ignore such things anyway on
permission update commit and abort.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: Remove redundant COPYING file

The file tests/qemu-iotests/COPYING is the same text as in the
COPYING file in the main directory. So as far as I can see, we don't
need the duplicate here.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

block/gluster: limit the transfer size to 512 MiB

Several versions of GlusterFS (3.12? -> 6.0.1) fail when the
transfer size is greater or equal to 1024 MiB, so we are
limiting the transfer size to 512 MiB to avoid this rare issue.

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1691320
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Niels de Vos <ndevos@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-img: Enable BDRV_REQ_MAY_UNMAP in convert

With Kevin's "block: Fix slow pre-zeroing in qemu-img convert"[1]
(commit c9fdcf202f, 'qemu-img: Use BDRV_REQ_NO_FALLBACK for
pre-zeroing') we skip the pre zero step called like this:

    blk_make_zero(s->target, BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK)

And we write zeroes later using:

    blk_co_pwrite_zeroes(s->target,
                         sector_num << BDRV_SECTOR_BITS,
                         n << BDRV_SECTOR_BITS, 0);

Since we use flags=0, this is translated to NBD_CMD_WRITE_ZEROES with
NBD_CMD_FLAG_NO_HOLE flag, which cause the NBD server to allocated space
instead of punching a hole.

Here is an example failure:

$ dd if=/dev/urandom of=src.img bs=1M count=5
$ truncate -s 50m src.img
$ truncate -s 50m dst.img
$ nbdkit -f -v -e '' -U nbd.sock file file=dst.img

$ ./qemu-img convert -n src.img nbd:unix:nbd.sock

We can see in nbdkit log that it received the NBD_CMD_FLAG_NO_HOLE
(may_trim=0):

nbdkit: file[1]: debug: newstyle negotiation: flags: export 0x4d
nbdkit: file[1]: debug: pwrite count=2097152 offset=0
nbdkit: file[1]: debug: pwrite count=2097152 offset=2097152
nbdkit: file[1]: debug: pwrite count=1048576 offset=4194304
nbdkit: file[1]: debug: zero count=33554432 offset=5242880 may_trim=0
nbdkit: file[1]: debug: zero count=13631488 offset=38797312 may_trim=0
nbdkit: file[1]: debug: flush

And the image became fully allocated:

$ qemu-img info dst.img
virtual size: 50M (52428800 bytes)
disk size: 50M

With this change we see that nbdkit did not receive the
NBD_CMD_FLAG_NO_HOLE (may_trim=1):

nbdkit: file[1]: debug: newstyle negotiation: flags: export 0x4d
nbdkit: file[1]: debug: pwrite count=2097152 offset=0
nbdkit: file[1]: debug: pwrite count=2097152 offset=2097152
nbdkit: file[1]: debug: pwrite count=1048576 offset=4194304
nbdkit: file[1]: debug: zero count=33554432 offset=5242880 may_trim=1
nbdkit: file[1]: debug: zero count=13631488 offset=38797312 may_trim=1
nbdkit: file[1]: debug: flush

And the file is sparse as expected:

$ qemu-img info dst.img
virtual size: 50M (52428800 bytes)
disk size: 5.0M

[1] http://lists.nongnu.org/archive/html/qemu-block/2019-03/msg00761.html

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: Fix test 200 on s390x without virtio-pci

virtio-pci is optional on s390x, e.g. in downstream RHEL builds, it
is disabled. On s390x, virtio-ccw should be used instead. Other tests
like 051 or 240 already use virtio-scsi-ccw instead of virtio-scsi-pci
on s390x, so let's do the same here and always use virtio-scsi-ccw on
s390x.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge remote-tracking branch 'remotes/kraxel/tags/fixes-20190402-pull-request' into staging

fixes for 4.0 (audio, usb),

# gpg: Signature made Tue 02 Apr 2019 07:46:22 BST
# gpg:                using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/fixes-20190402-pull-request:
  audio: fix audio timer rate conversion bug
  usb-mtp: remove usb_mtp_object_free_one
  usb-mtp: fix return status of delete
  hw/usb/bus.c: Handle "no speed matched" case in usb_mask_to_str()
  Revert "audio: fix pc speaker init"

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

audio: fix audio timer rate conversion bug

Currently the default audio timer frequency is 10000Hz instead of
a period of 10000us. Also the audiodev timer-period property gets
converted like a frequency. Only handling of the legacy
QEMU_AUDIO_TIMER_PERIOD environment variable is correct because
it's actually a frequency.

With this patch the property timer-period is really a timer period
and QEMU_AUDIO_TIMER_PERIOD remains a frequency.

Fixes: 71830221fb "-audiodev command line option basic implementation."
Signed-off-by: Volker Rümelin <vr_qemu@t-online.de>
Reviewed-by: Zoltán Kővágó <DirtY.iCE.hu@gmail.com>
Message-id: 90b95e4f-39ef-2b01-da6a-857ebaee1ec5@t-online.de
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-mtp: remove usb_mtp_object_free_one

This function is used in the delete path only and can
be replaced by a call to usb_mtp_object_free.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Bandan Das <bsd@redhat.com>
Message-Id: <20190401211712.19012-3-bsd@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

usb-mtp: fix return status of delete

Spotted by Coverity: CID 1399414

mtp delete allows the return status of delete succeeded,
partial_delete or readonly - when none of the objects could be
deleted. Give more meaningful names to return values of the
delete function.

Some initiators recurse over the objects themselves. In that case,
only READ_ONLY can be returned.

Signed-off-by: Bandan Das <bsd@redhat.com>
Message-Id: <20190401211712.19012-2-bsd@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2019-04-01' into staging

nbd patches for 2019-04-01

- Better behavior of qemu-img map on NBD images
- Fixes for NBD protocol alignment corner cases:
- the server has fewer places where it sends reads or block status
   not aligned to its advertised block size
- the client has more cases where it can work around server
   non-compliance present in qemu 3.1
- the client now avoids non-compliant requests when interoperating
   with nbdkit or other servers not advertising block size

# gpg: Signature made Mon 01 Apr 2019 15:06:54 BST
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full]
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full]
# gpg:                 aka "[jpeg image of size 6874]" [full]
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2019-04-01:
  nbd/client: Trace server noncompliance on structured reads
  nbd/server: Advertise actual minimum block size
  block: Add bdrv_get_request_alignment()
  nbd/client: Support qemu-img convert from unaligned size
  nbd/client: Reject inaccessible tail of inconsistent server
  nbd/client: Report offsets in bdrv_block_status
  nbd/client: Lower min_block for block-status, unaligned size
  iotests: Add 241 to test NBD on unaligned images
  nbd-client: Work around server BLOCK_STATUS misalignment at EOF
  qemu-img: Gracefully shutdown when map can't finish
  nbd: Permit simple error to NBD_CMD_BLOCK_STATUS
  nbd: Don't lose server's error to NBD_CMD_BLOCK_STATUS
  nbd: Tolerate some server non-compliance in NBD_CMD_BLOCK_STATUS
  qemu-img: Report bdrv_block_status failures

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

nbd/client: Trace server noncompliance on structured reads

Just as we recently added a trace for a server sending block status
that doesn't match the server's advertised minimum block alignment,
let's do the same for read chunks. But since qemu 3.1 is such a
server (because it advertised 512-byte alignment, but when serving a
file that ends in data but is not sector-aligned, NBD_CMD_READ would
detect a mid-sector change between data and hole at EOF and the
resulting read chunks are unaligned), we don't want to change our
behavior of otherwise tolerating unaligned reads.

Note that even though we fixed the server for 4.0 to advertise an
actual block alignment (which gets rid of the unaligned reads at EOF
for posix files), we can still trigger it via other means:

$ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file

Arguably, that is a bug in the blkdebug block status function, for
leaking a block status that is not aligned. It may also be possible to
observe issues with a backing layer with smaller alignment than the
active layer, although so far I have been unable to write a reliable
iotest for that scenario.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190330165349.32256-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

nbd/server: Advertise actual minimum block size

Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split their
reply according to bdrv_block_status() boundaries. If the block device
has a request_alignment smaller than 512, but we advertise a block
alignment of 512 to the client, then this can result in the server
reply violating client expectations by reporting a smaller region of
the export than what the client is permitted to address (although this
is less of an issue for qemu 4.0 clients, given recent client patches
to overlook our non-compliance at EOF). Since it's always better to
be strict in what we send, it is worth advertising the actual minimum
block limit rather than blindly rounding it up to 512.

Note that this patch is not foolproof - it is still possible to
provoke non-compliant server behavior using:

$ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file

That is arguably a bug in the blkdebug driver (it should never pass
back block status smaller than its alignment, even if it has to make
multiple bdrv_get_status calls and determine the
least-common-denominator status among the group to return). It may
also be possible to observe issues with a backing layer with smaller
alignment than the active layer, although so far I have been unable to
write a reliable iotest for that scenario (but again, an issue like
that could be argued to be a bug in the block layer, or something
where we need a flag to bdrv_block_status() to state whether the
result must be aligned to the current layer's limits or can be
subdivided for accuracy when chasing backing files).

Anyways, as blkdebug is not normally used, and as this patch makes our
server more interoperable with qemu 3.1 clients, it is worth applying
now, even while we still work on a larger patch series for the 4.1
timeframe to have byte-accurate file lengths.

Note that the iotests output changes - for 223 and 233, we can see the
server's better granularity advertisement; and for 241, the three test
cases have the following effects:
- natural alignment: the server's smaller alignment is now advertised,
and the hole reported at EOF is now the right result; we've gotten rid
of the server's non-compliance
- forced server alignment: the server still advertises 512 bytes, but
still sends a mid-sector hole. This is still a server compliance bug,
which needs to be fixed in the block layer in a later patch; output
does not change because the client is already being tolerant of the
non-compliance
- forced client alignment: the server's smaller alignment means that
the client now sees the server's status change mid-sector without any
protocol violations, but the fact that the map shows an unaligned
mid-sector hole is evidence of the block layer problems with aligned
block status, to be fixed in a later patch

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-7-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: rebase to enhanced iotest 241 coverage]

block: Add bdrv_get_request_alignment()

The next patch needs access to a device's minimum permitted
alignment, since NBD wants to advertise this to clients. Add
an accessor function, borrowing from blk_get_max_transfer()
for accessing a backend's block limits.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190329042750.14704-6-eblake@redhat.com>

nbd/client: Support qemu-img convert from unaligned size

If an NBD server advertises a size that is not a multiple of a sector,
the block layer rounds up that size, even though we set info.size to
the exact byte value sent by the server. The block layer then proceeds
to let us read or query block status on the hole that it added past
EOF, which the NBD server is unlikely to be happy with. Fortunately,
qemu as a server never advertizes an unaligned size, so we generally
don't run into this problem; but the nbdkit server makes it easy to
test:

$ printf %1000d 1 > f1
$ ~/nbdkit/nbdkit -fv file f1 & pid=$!
$ qemu-img convert -f raw nbd://localhost:10809 f2
$ kill $pid
$ qemu-img compare f1 f2

Pre-patch, the server attempts a 1024-byte read, which nbdkit
rightfully rejects as going beyond its advertised 1000 byte size; the
conversion fails and the output files differ (not even the first
sector is copied, because qemu-img does not follow ddrescue's habit of
trying smaller reads to get as much information as possible in spite
of errors). Post-patch, the client's attempts to read (and query block
status, for new enough nbdkit) are properly truncated to the server's
length, with sane handling of the hole the block layer forced on
us. Although f2 ends up as a larger file (1024 bytes instead of 1000),
qemu-img compare shows the two images to have identical contents for
display to the guest.

I didn't add iotests coverage since I didn't want to add a dependency
on nbdkit in iotests. I also did NOT patch write, trim, or write
zeroes - these commands continue to fail (usually with ENOSPC, but
whatever the server chose), because we really can't write to the end
of the file, and because 'qemu-img convert' is the most common case
where we care about being tolerant (which is read-only). Perhaps we
could truncate the request if the client is writing zeros to the tail,
but that seems like more work, especially if the block layer is fixed
in 4.1 to track byte-accurate sizing (in which case this patch would
be reverted as unnecessary).

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-5-eblake@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>

nbd/client: Reject inaccessible tail of inconsistent server

The NBD spec suggests that a server should never advertise a size
inconsistent with its minimum block alignment, as that tail is
effectively inaccessible to a compliant client obeying those block
constraints. Since we have a habit of rounding up rather than
truncating, to avoid losing the last few bytes of user input, and we
cannot access the tail when the server advertises bogus block sizing,
abort the connection to alert the server to fix their bug. And
rejecting such servers matches what we already did for a min_block
that was not a power of 2 or which was larger than max_block.

Does not impact either qemu (which always sends properly aligned
sizes) or nbdkit (which does not send minimum block requirements yet);
so this is mostly aimed at new NBD server implementations, and ensures
that the rest of our code can assume the size is aligned.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190330155704.24191-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

hw/usb/bus.c: Handle "no speed matched" case in usb_mask_to_str()

In usb_mask_to_str() we convert a mask of USB speeds into
a human-readable string (like "full+high") for use in
tracing and error messages. However the conversion code
doesn't do anything to the string buffer if the passed in
speedmask doesn't match any of the recognized speeds,
which means that the tracing and error messages will
end up with random garbage in them. This can happen if
we're doing USB device passthrough.

Handle the "unrecognized speed" case by using the
string "unknown".

Fixes: https://bugs.launchpad.net/qemu/+bug/1603785
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20190328133503.6490-1-peter.maydell@linaro.org
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

Revert "audio: fix pc speaker init"

This reverts commit bd56d378842c238c8901536c06c20a4a51ee9761.

Turned out it isn't that simple as the device needs the pit object link.
So "-device isa-pcspk" isn't going wo work anyway. We are in freeze, so
just reverting the thing is the best way to handle this for now, trying
to come up with something better can be done in the 4.1 devel cycle.

Also add a comment noting the object link.

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Message-id: 20190328071121.21147-1-kraxel@redhat.com

nbd/client: Report offsets in bdrv_block_status

It is desirable for 'qemu-img map' to have the same output for a file
whether it is served over file or nbd protocols. However, ever since
we implemented block status for NBD (2.12), the NBD protocol forgot to
inform the block layer that as the final layer in the chain, the
offset is valid; without an offset, the human-readable form of
qemu-img map gives up with the unhelpful:

$ nbdkit -U - data data="1" size=512 --run 'qemu-img map $nbd'
Offset          Length          Mapped to       File
qemu-img: File contains external, encrypted or compressed clusters.

The --output=json form always works, because it is reporting the
lower-level bdrv_block_status results directly rather than trying to
filter out sparse ranges for human consumption - but now it also
shows the offset member.

With this patch, the human output changes to:

Offset          Length          Mapped to       File
0               0x200           0               nbd+unix://?socket=/tmp/nbdkitOxeoLa/socket

This change is observable to several iotests.

Fixes: 78a33ab5
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-4-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

nbd/client: Lower min_block for block-status, unaligned size

We have a latent bug in our NBD client code, tickled by the brand new
nbdkit 1.11.10 block status support:

$ nbdkit --filter=log --filter=truncate -U - \
           data data="1" size=511 truncate=64K logfile=/dev/stdout \
           --run 'qemu-img convert $nbd /var/tmp/out'
...
qemu-img: block/io.c:2122: bdrv_co_block_status: Assertion `*pnum && QEMU_IS_ALIGNED(*pnum, align) && align > offset - aligned_offset' failed.

The culprit? Our implementation of .bdrv_co_block_status can return
unaligned block status for any server that operates with a lower
actual alignment than what we tell the block layer in
request_alignment, in violation of the block layer's constraints. To
date, we've been unable to trip the bug, because qemu as NBD server
always advertises block sizing (at which point it is a server bug if
the server sends unaligned status - although qemu 3.1 is such a server
and I've sent separate patches for 4.0 both to get the server to obey
the spec, and to let the client to tolerate server oddities at EOF).

But nbdkit does not (yet) advertise block sizing, and therefore is not
in violation of the spec for returning block status at whatever
boundaries it wants, and those unaligned results can occur anywhere
rather than just at EOF. While we are still wise to avoid sending
sub-sector read/write requests to a server of unknown origin, we MUST
consider that a server telling us block status without an advertised
block size is correct.  So, we either have to munge unaligned answers
from the server into aligned ones that we hand back to the block
layer, or we have to tell the block layer about a smaller alignment.

Similarly, if the server advertises an image size that is not
sector-aligned, we might as well assume that the server intends to let
us access those tail bytes, and therefore supports a minimum block
size of 1, regardless of whether the server supports block status
(although we still need more patches to fix the problem that with an
unaligned image, we can send read or block status requests that exceed
EOF to the server). Again, qemu as server cannot trip this problem
(because it rounds images to sector alignment), but nbdkit advertised
unaligned size even before it gained block status support.

Solve both alignment problems at once by using better heuristics on
what alignment to report to the block layer when the server did not
give us something to work with. Note that very few NBD servers
implement block status (to date, only qemu and nbdkit are known to do
so); and as the NBD spec mentioned block sizing constraints prior to
documenting block status, it can be assumed that any future
implementations of block status are aware that they must advertise
block size if they want a minimum size other than 1.

We've had a long history of struggles with picking the right alignment
to use in the block layer, as evidenced by the commit message of
fd8d372d (v2.12) that introduced the current choice of forced 512-byte
alignment.

There is no iotest coverage for this fix, because qemu can't provoke
it, and I didn't want to make test 241 dependent on nbdkit.

Fixes: fd8d372d
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>

iotests: Add 241 to test NBD on unaligned images

Add a test for the NBD client workaround in the previous patch. It's
not really feasible for an iotest to assume a specific tracing engine,
so we can't really probe trace_nbd_parse_blockstatus_compliance to see
if the server was fixed vs. whether the client just worked around the
server (other than by rearranging order between code patches and this
test). But having a successful exchange sure beats the previous state
of an error message. Since format probing can change alignment, we can
use that as an easy way to test several configurations.

Not tested yet, but worth adding to this test in future patches: an
NBD server that can advertise a non-sector-aligned size (such as
nbdkit) causes qemu as the NBD client to misbehave when it rounds the
size up and accesses beyond the advertised size. Qemu as NBD server
never advertises a non-sector-aligned size (since bdrv_getlength()
currently rounds up to sector boundaries); until qemu can act as such
a server, testing that flaw will have to rely on external binaries.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190329042750.14704-2-eblake@redhat.com>
Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
[eblake: add forced-512 alignment, and nbdkit reproducer comment]

nbd-client: Work around server BLOCK_STATUS misalignment at EOF

The NBD spec is clear that a server that advertises a minimum block
size should reply to NBD_CMD_BLOCK_STATUS with extents aligned
accordingly. However, we know that the qemu NBD server implementation
has had a corner-case bug where it is not compliant with the spec,
present since the introduction of NBD_CMD_BLOCK_STATUS in qemu 2.12
(and unlikely to be patched in time for 4.0). Namely, when qemu is
serving a file that is not a multiple of 512 bytes, it rounds the size
advertised over NBD up to the next sector boundary (someday, I'd like
to fix that to be byte-accurate, but it's a much bigger audit not
appropriate for this release); yet if the final sector contains data
prior to EOF, lseek(SEEK_HOLE) will point to the implicit hole
mid-sector which qemu then reported over NBD.

We are well within our rights to hang up on a server that can't follow
the spec, but it is more useful to try and keep the connection alive
in spite of the problem. Do so by tracing a message about the problem,
and then either truncating the request back to an aligned boundary (if
it covered more than the final sector) or widening it out to the full
boundary with a forced status of data (since truncating would result
in 0 bytes, but we have to make progress, and valid since data is a
default-safe answer). And in practice, since the problem only happens
on a sector that starts with data and ends with a hole, we are going
to want to read that full sector anyway (where qemu as the server
fills in the tail beyond EOF with appropriate NUL bytes).

Easy reproduction:
$ printf %1000d 1 > file
$ qemu-nbd -f raw -t file & pid=$!
$ qemu-img map --output=json -f raw nbd://localhost:10809
qemu-img: Could not read file metadata: Invalid argument
$ kill $pid

where the patched version instead succeeds with:
[{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true}]

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190326171317.4036-1-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

qemu-img: Gracefully shutdown when map can't finish

Trying 'qemu-img map -f raw nbd://localhost:10809' causes the
NBD server to output a scary message:

qemu-nbd: Disconnect client, due to: Failed to read request: Unexpected end-of-file before all bytes were read

This is because the NBD client, being remote, has no way to expose a
human-readable map (the --output=json data is fine, however). But
because we exit(1) right after the message, causing the client to
bypass all block cleanup, the server sees the abrupt exit and warns,
whereas it would be silent had the client had a chance to send
NBD_CMD_DISC. Other protocols may have similar cleanup issues, where
failure to blk_unref() could cause unintended effects.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190326184043.7544-1-eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

nbd: Permit simple error to NBD_CMD_BLOCK_STATUS

The NBD spec is clear that when structured replies are active, a
simple error reply is acceptable to any command except for
NBD_CMD_READ.  However, we were mistakenly requiring structured errors
for NBD_CMD_BLOCK_STATUS, and hanging up on a server that gave a
simple error (since qemu does not behave as such a server, we didn't
notice the problem until now).  Broken since its introduction in
commit 78a33ab5 (v2.12).

Noticed while debugging a separate failure reported by nbdkit while
working out its initial implementation of BLOCK_STATUS, although it
turns out that nbdkit also chose to send structured error replies for
BLOCK_STATUS, so I had to manually provoke the situation by hacking
qemu's server to send a simple error reply:

| diff --git i/nbd/server.c w/nbd/server.c
| index fd013a2817a..833288d7c45 100644
| 00--- i/nbd/server.c
| +++ w/nbd/server.c
| @@ -2269,6 +2269,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
|                                        "discard failed", errp);
|
|      case NBD_CMD_BLOCK_STATUS:
| +        return nbd_co_send_simple_reply(client, request->handle, ENOMEM,
| +                                        NULL, 0, errp);
|          if (!request->len) {
|              return nbd_send_generic_reply(client, request->handle, -EINVAL,
|                                            "need non-zero length", errp);
|

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20190325190104.30213-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

nbd: Don't lose server's error to NBD_CMD_BLOCK_STATUS

When the server replies with a (structured [*]) error to
NBD_CMD_BLOCK_STATUS, without any extent information sent first, the
client code was blindly throwing away the server's error code and
instead telling the caller that EIO occurred.  This has been broken
since its introduction in 78a33ab5 (v2.12, where we should have called:
   error_setg(&local_err, "Server did not reply with any status extents");
   nbd_iter_error(&iter, false, -EIO, &local_err);
to declare the situation as a non-fatal error if no earlier error had
already been flagged, rather than just blindly slamming iter.err and
iter.ret), although it is more noticeable since commit 7f86068d, which
actually tries hard to preserve the server's code thanks to a separate
iter.request_ret.

[*] The spec is clear that the server is also permitted to reply with
a simple error, but that's a separate fix.

I was able to provoke this scenario with a hack to the server, then
seeing whether ENOMEM makes it back to the caller:

| diff --git a/nbd/server.c b/nbd/server.c
| index fd013a2817a..29c7995de02 100644
| --- a/nbd/server.c
| +++ b/nbd/server.c
| @@ -2269,6 +2269,8 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
|                                        "discard failed", errp);
|
|      case NBD_CMD_BLOCK_STATUS:
| +        return nbd_send_generic_reply(client, request->handle, -ENOMEM,
| +                                      "no status for you today", errp);
|          if (!request->len) {
|              return nbd_send_generic_reply(client, request->handle, -EINVAL,
|                                            "need non-zero length", errp);
| --

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190325190104.30213-2-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

nbd: Tolerate some server non-compliance in NBD_CMD_BLOCK_STATUS

The NBD spec states that NBD_CMD_FLAG_REQ_ONE (which we currently
always use) should not reply with an extent larger than our request,
and that the server's response should be exactly one extent. Right
now, that means that if a server sends more than one extent, we treat
the server as broken, fail the block status request, and disconnect,
which prevents all further use of the block device. But while good
software should be strict in what it sends, it should be tolerant in
what it receives.

While trying to implement NBD_CMD_BLOCK_STATUS in nbdkit, we
temporarily had a non-compliant server sending too many extents in
spite of REQ_ONE. Oddly enough, 'qemu-img convert' with qemu 3.1
failed with a somewhat useful message:
  qemu-img: Protocol error: invalid payload for NBD_REPLY_TYPE_BLOCK_STATUS

which then disappeared with commit d8b4bad8, on the grounds that an
error message flagged only at the time of coroutine teardown is
pointless, and instead we should rely on the actual failed API to
report an error - in other words, the 3.1 behavior was masking the
fact that qemu-img was not reporting an error. That has since been
fixed in the previous patch, where qemu-img convert now fails with:
  qemu-img: error while reading block status of sector 0: Invalid argument

But even that is harsh.  Since we already partially relaxed things in
commit acfd8f7a to tolerate a server that exceeds the cap (although
that change was made prior to the NBD spec actually putting a cap on
the extent length during REQ_ONE - in fact, the NBD spec change was
BECAUSE of the qemu behavior prior to that commit), it's not that much
harder to argue that we should also tolerate a server that sends too
many extents.  But at the same time, it's nice to trace when we are
being tolerant of server non-compliance, in order to help server
writers fix their implementations to be more portable (if they refer
to our traces, rather than just stderr).

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190323212639.579-3-eblake@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

qemu-img: Report bdrv_block_status failures

If bdrv_block_status_above() fails, we are aborting the convert
process but failing to print an error message. Broken in commit
690c7301 (v2.4) when rewriting convert's logic.

Discovered when teaching nbdkit to support NBD_CMD_BLOCK_STATUS, and
accidentally violating the protocol by returning more than one extent
in spite of qemu asking for NBD_CMD_FLAG_REQ_ONE. The qemu NBD code
should probably handle the server's non-compliance more gracefully
than failing with EINVAL, but qemu-img shouldn't be silently
squelching any block status failures. It doesn't help that qemu 3.1
masks the qemu-img bug with extra noise that the nbd code is dumping
to stderr (that noise was cleaned up in d8b4bad8).

Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20190323212639.579-2-eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Merge remote-tracking branch 'remotes/rth/tags/pull-axp-20190325' into staging

Update palcode for machine checks.

# gpg: Signature made Mon 25 Mar 2019 23:09:24 GMT
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-axp-20190325:
  pc-bios: Update palcode-clipper

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/jasowang/tags/net-pull-request' into staging

# gpg: Signature made Fri 29 Mar 2019 07:30:26 GMT
# gpg:                using RSA key EF04965B398D6211
# gpg: Good signature from "Jason Wang (Jason Wang on RedHat) <jasowang@redhat.com>" [marginal]
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 215D 46F4 8246 689E C77F  3562 EF04 965B 398D 6211

* remotes/jasowang/tags/net-pull-request:
  net: tap: use qemu_set_nonblock
  MAINTAINERS: Update the latest email address
  e1000: Delay flush queue when receive RCTL
  net/socket: learn to talk with a unix dgram socket

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-4.0-20190329' into staging

ppc patch queue 2019-03-29

Here's a set of bugfixes for ppc, aimed at qemu-4.0 during hard freeze.

We have one cleanup that's not strictly a bugfix, but will avoid an
ugly external interface making it to a released version.

We have one change to generic code to tweak the semantics of
qemu_getrampagesize() which fixes a bug for ppc.  This does have a
possible impact on s390x which uses this function for a different
purpose.  I've discussed with David Hildenbrand and Igor Mammedov,
however and we think it won't immediately break anything due to some
existing bugs in the s390 usage.  David H will be following up with
some s390 fixes in that area.

# gpg: Signature made Fri 29 Mar 2019 03:27:49 GMT
# gpg:                using RSA key 75F46586AE61A66CC44E87DC6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>" [full]
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>" [full]
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>" [full]
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>" [unknown]
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-4.0-20190329:
  exec: Only count mapped memory backends for qemu_getrampagesize()
  spapr/irq: Add XIVE sanity checks on non-P9 machines
  spapr: Simplify handling of host-serial and host-model values
  target/ppc: Fix QEMU crash with stxsdx
  target/ppc: Improve comment of bcctr used for spectre v2 mitigation
  target/ppc: Consolidate 64-bit server processor detection in a helper
  target/ppc: Enable "decrement and test CTR" version of bcctr
  target/ppc: Fix TCG temporary leaks in gen_bcond()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

net: tap: use qemu_set_nonblock

The fcntl will change the flags directly, use qemu_set_nonblock()
instead.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

MAINTAINERS: Update the latest email address

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

e1000: Delay flush queue when receive RCTL

Due to too early RCT0 interrput, win10x32 may hang on booting.
This problem can be reproduced by doing power cycle on win10x32 guest.
In our environment, we have 10 win10x32 and stress power cycle.
The problem will happen about 20 rounds.

Below shows some log with comment:

The normal case:

22831@1551928392.984687:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
22831@1551928392.985655:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
22831@1551928392.985801:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: RCTL: 0, mac_reg[RCTL] = 0x0
22831@1551928393.056710:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: ICR read: 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: RCTL: 0, mac_reg[RCTL] = 0x0
22831@1551928393.077548:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: ICR read: 0
e1000: set_ics 2, ICR 0, IMR 0
e1000: set_ics 2, ICR 2, IMR 0
e1000: RCTL: 0, mac_reg[RCTL] = 0x0
22831@1551928393.102974:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
22831@1551928393.103267:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
e1000: RCTL: 255, mac_reg[RCTL] = 0x40002 <- win10x32 says it can handle
RX now
e1000: set_ics 0, ICR 2, IMR 9d <- unmask interrupt
e1000: RCTL: 255, mac_reg[RCTL] = 0x48002
e1000: set_ics 80, ICR 2, IMR 9d <- interrupt and work!
...

The bad case:

27744@1551930483.117766:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
27744@1551930483.118398:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: RCTL: 0, mac_reg[RCTL] = 0x0
27744@1551930483.198063:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: ICR read: 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: RCTL: 0, mac_reg[RCTL] = 0x0
27744@1551930483.218675:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
e1000: set_ics 0, ICR 0, IMR 0
e1000: ICR read: 0
e1000: set_ics 2, ICR 0, IMR 0
e1000: set_ics 2, ICR 2, IMR 0
e1000: RCTL: 0, mac_reg[RCTL] = 0x0
27744@1551930483.241768:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
27744@1551930483.241979:e1000x_rx_disabled Received packet dropped
because receive is disabled RCTL = 0
e1000: RCTL: 255, mac_reg[RCTL] = 0x40002 <- win10x32 says it can handle
RX now
e1000: set_ics 80, ICR 2, IMR 0 <- flush queue (caused by setting RCTL)
e1000: set_ics 0, ICR 82, IMR 9d <- unmask interrupt and because 0x82&0x9d
!= 0 generate interrupt, hang on here...

To workaround this problem, simply delay flush queue. Also stop receiving
when timer is going to run.

Tested on CentOS, Win7SP1x64 and Win10x32.

Signed-off-by: yuchenlin <yuchenlin@synology.com>
Reviewed-by: Dmitry Fleytman <dmitry.fleytman@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

net/socket: learn to talk with a unix dgram socket

-net socket has a fd argument, and may be passed pre-opened sockets.

TCP sockets use framing.
UDP sockets have datagram boundaries.

When given a unix dgram socket, it will be able to read from it, but
will attempt to send on the dgram_dst, which is unset. The other end
will not receive the data.

Let's teach -net socket to recognize a UNIX DGRAM socket, and use the
regular send() command (without dgram_dst).

This makes running slirp out-of-process possible that
way (python pseudo-code):

a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

subprocess.Popen('qemu -net socket,fd=%d -net user' % a.fileno(), shell=True)
subprocess.Popen('qemu ... -net nic -net socket,fd=%d' % b.fileno(), shell=True)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>

exec: Only count mapped memory backends for qemu_getrampagesize()

qemu_getrampagesize() works out the minimum host page size backing any of
guest RAM.  This is required in a few places, such as for POWER8 PAPR KVM
guests, because limitations of the hardware virtualization mean the guest
can't use pagesizes larger than the host pages backing its memory.

However, it currently checks against *every* memory backend, whether or not
it is actually mapped into guest memory at the moment.  This is incorrect.

This can cause a problem attempting to add memory to a POWER8 pseries KVM
guest which is configured to allow hugepages in the guest (e.g.
-machine cap-hpt-max-page-size=16m).  If you attempt to add non-hugepage,
you can (correctly) create a memory backend, however it (correctly) will
throw an error when you attempt to map that memory into the guest by
'device_add'ing a pc-dimm.

What's not correct is that if you then reset the guest a startup check
against qemu_getrampagesize() will cause a fatal error because of the new
memory object, even though it's not mapped into the guest.

This patch corrects the problem by adjusting find_max_supported_pagesize()
(called from qemu_getrampagesize() via object_child_foreach) to exclude
non-mapped memory backends.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>

spapr/irq: Add XIVE sanity checks on non-P9 machines

On non-P9 machines, the XIVE interrupt mode is not advertised, see
spapr_dt_ov5_platform_support(). Add a couple of checks on the machine
configuration to filter bogus setups and prevent OS failures :

                     Interrupt modes

  CPU/Compat      XICS    XIVE                dual

   P8/P8          OK      QEMU failure (1)    OK (3)
   P9/P8          OK      QEMU failure (2)    OK (3)
   P9/P9          OK      OK                  OK

  (1) CPU exception model is incompatible with XIVE and the presenters
      will fail to realize.

  (2) CPU exception model is compatible with XIVE, but the XIVE CAS
      advertisement is dropped when in POWER8 mode. So we could ended up
      booting with the XIVE DT properties but without the HCALLs. Avoid
      confusing Linux with such settings and fail under QEMU.

  (3) force XICS in machine init

Remove the check on XIVE-only machines in spapr_machine_init(), which
has now become redundant.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Message-Id: <20190328100044.11408-1-clg@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

spapr: Simplify handling of host-serial and host-model values

27461d69a0f "ppc: add host-serial and host-model machine attributes
(CVE-2019-8934)" introduced 'host-serial' and 'host-model' machine
properties for spapr to explicitly control the values advertised to the
guest in device tree properties with the same names.

The previous behaviour on KVM was to unconditionally populate the device
tree with the real host serial number and model, which leaks possibly
sensitive information about the host to the guest.

To maintain compatibility for old machine types, we allowed those props
to be set to "passthrough" to take the value from the host as before.  Or
they could be set to "none" to explicitly omit the device tree items.

Special casing specific values on what's otherwise a user supplied string
is very ugly.  So, this patch simplifies things by implementing the
backwards compatibility in a different way: we have a machine class flag
set for the older machines, and we only load the host values into the
device tree if A) they're not set by the user and B) we have that flag set.

This does mean that the "passthrough" functionality is no longer available
with the current machine type.  That's ok though: if a user or management
layer really wants the information passed through they can read it
themselves (OpenStack Nova already does something similar for x86).

It also means the user can't explicitly ask for the values to be omitted
on the old machine types.  I think that's an acceptable trade-off: if you
care enough about not leaking the host information you can either move to
the new machine type, or use a dummy value for the properties.

For the new machine type, this also removes an odd inconsistency
between running on a POWER and non-POWER (or non-Linux) hosts: if the
host information couldn't be read from where we expect (in the host's
device tree as exposed by Linux), we'd fallback to omitting the guest
device tree items.

While we're there, improve some poorly worded comments, and the help text
for the properties.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Tested-by: Greg Kurz <groug@kaod.org>

target/ppc: Fix QEMU crash with stxsdx

I've been hitting several QEMU crashes while running a fedora29 ppc64le
guest under TCG. Each time, this would occur several minutes after the
guest reached login:

Fedora 29 (Twenty Nine)
Kernel 4.20.6-200.fc29.ppc64le on an ppc64le (hvc0)

Web console: https://localhost:9090/

localhost login:
tcg/tcg.c:3211: tcg fatal error

This happens because a bug crept up in the gen_stxsdx() helper when it
was converted to use VSR register accessors by commit 8b3b2d75c7c04
"target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}() helpers
for VSR register access".

The code creates a temporary, passes it directly to gen_qemu_st64_i64()
and then to set_cpu_vrsh()... which looks like this was mistakenly
coded as a load instead of a store.

Reverse the logic: read the VSR to the temporary first and then store
it to memory.

Fixes: 8b3b2d75c7c0481544e277dad226223245e058eb
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155371035249.2038502.12364252604337688538.stgit@bahia.lan>
Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: Improve comment of bcctr used for spectre v2 mitigation

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155359567174.1794128.3183997593369465355.stgit@bahia.lan>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: Consolidate 64-bit server processor detection in a helper

We use PPC_SEGMENT_64B in various places to guard code that is specific
to 64-bit server processors compliant with arch 2.x. Consolidate the
logic in a helper macro with an explicit name.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155327783157.1283071.3747129891004927299.stgit@bahia.lan>
Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: Enable "decrement and test CTR" version of bcctr

Even if all ISAs up to v3 indeed mention:

    If the "decrement and test CTR" option is specified (BO2=0), the
    instruction form is invalid.

The UMs of all existing 64-bit server class processors say:

    If BO[2] = 0, the contents of CTR (before any update) are used as the
    target address and for the test of the contents of CTR to resolve the
    branch. The contents of the CTR are then decremented and written back
    to the CTR.

The linux kernel has spectre v2 mitigation code that relies on a
BO[2] = 0 variant of bcctr, which is now activated by default on
spapr, even with TCG. This causes linux guests to panic with
the default machine type under TCG.

Since any CPU model can provide its own behaviour for invalid forms,
we could possibly introduce a new instruction flag to handle this.
In practice, since the behaviour is shared by all 64-bit server
processors starting with 970 up to POWER9, let's reuse the
PPC_SEGMENT_64B flag. Caveat: this may have to be fixed later if
POWER10 introduces a different behaviour.

The existing behaviour of throwing a program interrupt is kept for
all other CPU models.

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155327782604.1283071.10640596307206921951.stgit@bahia.lan>
Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

target/ppc: Fix TCG temporary leaks in gen_bcond()

Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <155327782047.1283071.10234727692461848972.stgit@bahia.lan>
Tested-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

Merge remote-tracking branch 'remotes/alistair/tags/pull-device-tree-20190327' into staging

Device Tree Pull Request for 4.0

A single patch updating the MAINTAINERS file for 4.0.

# gpg: Signature made Wed 27 Mar 2019 17:02:00 GMT
# gpg:                using RSA key F6C4AC46D4934868D3B8CE8F21E10D29DF977054
# gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [full]
# Primary key fingerprint: F6C4 AC46 D493 4868 D3B8  CE8F 21E1 0D29 DF97 7054

* remotes/alistair/tags/pull-device-tree-20190327:
  MAINTAINERS: Update the device tree maintainers

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20190327' into staging

pull-seccomp-20190327

# gpg: Signature made Wed 27 Mar 2019 12:12:39 GMT
# gpg:                using RSA key DF32E7C0F0FFF9A2
# gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <otubo@redhat.com>" [full]
# Primary key fingerprint: D67E 1B50 9374 86B4 0723  DBAB DF32 E7C0 F0FF F9A2

* remotes/otubo/tags/pull-seccomp-20190327:
  seccomp: report more useful errors from seccomp
  seccomp: don't kill process for resource control syscalls

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Kconfig improvements (msi_nonbroken, imply for default PCI devices)
* intel-iommu: sharing passthrough FlatViews (Peter)
* Fix for SEV with VFIO (Brijesh)
* Allow compilation without CONFIG_PARALLEL (Thomas)

# gpg: Signature made Thu 21 Mar 2019 16:42:24 GMT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (23 commits)
  virtio-vga: only enable for specific boards
  config-all-devices.mak: rebuild on reconfigure
  minikconf: fix parser typo
  intel-iommu: optimize nodmar memory regions
  test-announce-self: convert to qgraph
  hw/alpha/Kconfig: DP264 hardware requires e1000 network card
  hw/hppa/Kconfig: Dino board requires e1000 network card
  hw/sh4/Kconfig: r2d machine requires the rtl8139 network card
  hw/ppc/Kconfig: e500 based machines require virtio-net-pci device
  hw/ppc/Kconfig: Bamboo machine requires e1000 network card
  hw/mips/Kconfig: Fulong 2e board requires ati-vga/rtl8139 PCI devices
  hw/mips/Kconfig: Malta machine requires the pcnet network card
  hw/i386/Kconfig: enable devices that can be created by default
  hw/isa/Kconfig: PIIX4 southbridge requires USB UHCI
  hw/isa/Kconfig: i82378 SuperIO requires PC speaker device
  prep: do not select I82374
  hw/i386/Kconfig: PC uses I8257, not I82374
  hw/char/parallel: Make it possible to compile also without CONFIG_PARALLEL
  target/i386: sev: Do not pin the ram device memory region
  memory: Fix the memory region type assignment order
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# hw/rdma/Makefile.objs
# hw/riscv/sifive_plic.c

Merge remote-tracking branch 'remotes/xtensa/tags/20190326-xtensa' into staging

target/xtensa fixes for v4.0:

- fix translation of FLIX bundles with multiple references to the same
  register;
- don't announce exit simcall;
- clean up tests/tcg/xtensa.

# gpg: Signature made Tue 26 Mar 2019 17:58:59 GMT
# gpg:                using RSA key 2B67854B98E5327DCDEB17D851F9CC91F83FA044
# gpg:                issuer "jcmvbkbc@gmail.com"
# gpg: Good signature from "Max Filippov <filippov@cadence.com>" [unknown]
# gpg:                 aka "Max Filippov <max.filippov@cogentembedded.com>" [full]
# gpg:                 aka "Max Filippov <jcmvbkbc@gmail.com>" [full]
# Primary key fingerprint: 2B67 854B 98E5 327D CDEB  17D8 51F9 CC91 F83F A044

* remotes/xtensa/tags/20190326-xtensa:
  tests/tcg/xtensa: clean up test set
  target/xtensa: don't announce exit simcall
  target/xtensa: fix break_dependency for repeated resources

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

MAINTAINERS: Update the device tree maintainers

Remove Alex as a Device Tree maintainer as requested by him. Add myself
as a maintainer to avoid it being orphaned. Also add David as a
Reviewer (R) as he is the libfdt and DTC maintainer.

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alexander Graf <agraf@csgraf.de>
Acked-by: David Gibson <david@gibson.dropbear.id.au>

seccomp: report more useful errors from seccomp

Most of the seccomp functions return errnos as a negative return
value. The code is currently ignoring these and reporting a generic
error message for all seccomp failure scenarios making debugging
painful. Report a more precise error from each failed call and include
errno if it is available.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Eduardo Otubo <otubo@redhat.com>

seccomp: don't kill process for resource control syscalls

The Mesa library tries to set process affinity on some of its threads in
order to optimize its performance. Currently this results in QEMU being
immediately terminated when seccomp is enabled.

Mesa doesn't consider failure of the process affinity settings to be
fatal to its operation, but our seccomp policy gives it no choice in
gracefully handling this denial.

It is reasonable to consider that malicious code using the resource
control syscalls to be a less serious attack than if they were trying
to spawn processes or change UIDs and other such things. Generally
speaking changing the resource control setting will "merely" affect
quality of service of processes on the host. With this in mind, rather
than kill the process, we can relax the policy for these syscalls to
return the EPERM errno value. This allows callers to detect that QEMU
does not want them to change resource allocations, and apply some
reasonable fallback logic.

The main downside to this is for code which uses these syscalls but does
not check the return value, blindly assuming they will always
succeeed. Returning an errno could result in sub-optimal behaviour.
Arguably though such code is already broken & needs fixing regardless.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Eduardo Otubo <otubo@redhat.com>

Update version for v4.0.0-rc1 release

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging

Block layer patches:

- Fix slow pre-zeroing in qemu-img convert
- Test case for block job pausing on I/O errors

# gpg: Signature made Tue 26 Mar 2019 15:28:00 GMT
# gpg:                using RSA key 7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74  56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  qemu-io: Add write -n for BDRV_REQ_NO_FALLBACK
  qemu-img: Use BDRV_REQ_NO_FALLBACK for pre-zeroing
  file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes
  block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers
  block: Add BDRV_REQ_NO_FALLBACK
  block: Remove error messages in bdrv_make_zero()
  iotests: add 248: test resume mirror after auto pause on ENOSPC

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/kraxel/tags/fixes-20190326-pull-request' into staging

fixes for 4.0: ohci and ati-vga

# gpg: Signature made Tue 26 Mar 2019 14:05:40 GMT
# gpg:                using RSA key 4CB6D8EED3E87138
# gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
# gpg:                 aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
# gpg:                 aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
# Primary key fingerprint: A032 8CFF B93A 17A7 9901  FE7D 4CB6 D8EE D3E8 7138

* remotes/kraxel/tags/fixes-20190326-pull-request:
  ati-vga: Fix indexed access to video memory
  ohci: don't die on ED_LINK_LIMIT overflow

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20190326' into staging

target-arm queue:
* Set SIMDMISC and FPMISC for 32-bit -cpu max
   (fixes regression from 3.1)
* fix vCont packet handling when no thread is specified

# gpg: Signature made Tue 26 Mar 2019 13:09:48 GMT
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83  15CF 3C25 25ED 1436 0CDE

* remotes/pmaydell/tags/pull-target-arm-20190326:
  gdbstub: fix vCont packet handling when no thread is specified
  target/arm: Set SIMDMISC and FPMISC for 32-bit -cpu max

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

gdbstub: fix vCont packet handling when no thread is specified

The vCont packet accepts a series of actions, each being applied on a
given thread ID. Giving no thread ID for an action is valid and means
"all threads".

This commit fixes vCont packets being incorrectly rejected when no
thread ID was given for an action.

In multiprocess mode, the GDB Remote Protocol specification is unclear
on what "all threads" means. We choose to apply the action on all
threads of all attached processes.

This commit is based on the initial fix by Lucien Murray-Pitts.

Fixes: e40e5204af8388
Reported-by: Lucien Murray-Pitts <lucienmp_antispam@yahoo.com>
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Luc Michel <luc.michel@greensocs.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190325110452.6756-1-luc.michel@greensocs.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Set SIMDMISC and FPMISC for 32-bit -cpu max

Fixes: https://bugs.launchpad.net/bugs/1821430
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20190325161338.6536-1-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

ati-vga: Fix indexed access to video memory

Coverity (CID 1399700) found that this was wrong so instead of trying
to do it by hand use existing access functions that should work better.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-id: 20190318223842.427CB7456B2@zero.eik.bme.hu
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

ohci: don't die on ED_LINK_LIMIT overflow

Stop processing the descriptor list instead. The next frame timer tick will
resume the work

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1686705
Suggested-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 20190321085212.10796-1-lvivier@redhat.com
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

qemu-io: Add write -n for BDRV_REQ_NO_FALLBACK

This makes the new BDRV_REQ_NO_FALLBACK flag available in the qemu-io
write command.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>

qemu-img: Use BDRV_REQ_NO_FALLBACK for pre-zeroing

If qemu-img convert sees that the target image isn't zero-initialised
yet, it tries to do an efficient zero write for the whole image first
to save the overhead of repeated explicit zero writes during the
conversion. Obviously, this provides only an advantage if the
pre-zeroing is actually efficient. Otherwise, we can end up writing
zeroes slowly while zeroing out the whole image, and then overwrite the
same blocks again with real data, potentially doubling the written data.

Pass BDRV_REQ_NO_FALLBACK to blk_make_zero() to avoid this case. If we
can't efficiently zero out, we'll instead write explicit zeroes only if
there is no data to be written to a block.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>

file-posix: Support BDRV_REQ_NO_FALLBACK for zero writes

We know that the kernel implements a slow fallback code path for
BLKZEROOUT, so if BDRV_REQ_NO_FALLBACK is given, we shouldn't call it.
The other operations we call in the context of .bdrv_co_pwrite_zeroes
should usually be quick, so no modification should be needed for them.
If we ever notice that there are additional problematic cases, we can
still make these conditional as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>

block: Advertise BDRV_REQ_NO_FALLBACK in filter drivers

Filter drivers that support .bdrv_co_pwrite_zeroes can safely advertise
BDRV_REQ_NO_FALLBACK because they just forward the request flags to
their child node.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>

block: Add BDRV_REQ_NO_FALLBACK

For qemu-img convert, we want an operation that zeroes out the whole
image if this can be done efficiently, but that returns an error
otherwise so we don't write explicit zeroes and immediately overwrite
them with the real data, potentially doubling the amount of data to be
written.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>

block: Remove error messages in bdrv_make_zero()

There is only a single caller of bdrv_make_zero(), which is qemu-img
convert. If the function fails, we just fall back to a different method
of zeroing out blocks on the target image. There is no good reason to
print error messages on stderr when the higher level operation will
actually succeed.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Eric Blake <eblake@redhat.com>

iotests: add 248: test resume mirror after auto pause on ENOSPC

Test that mirror job actually resume on resume command after being
automatically paused on ENOSPC error.

It's a follow-up test for 8d9648cbf3e
"blockjob: fix user pause in block_job_error_action"

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Tested-by: John Snow <jsnow@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge remote-tracking branch 'remotes/palmer/tags/riscv-for-master-4.0-rc1-v2' into staging

A second RISC-V Patch for 4.0.0-rc1

Sorry for sending two back-to-back pull requests.  It looks like I
misunderstood Kito and there were actually two patches necessary to fix
the GCC test suite runs.

# gpg: Signature made Tue 26 Mar 2019 10:20:20 GMT
# gpg:                using RSA key 00CE76D1834960DFCE886DF8EF4CA1502CCBAB41
# gpg:                issuer "palmer@dabbelt.com"
# gpg: Good signature from "Palmer Dabbelt <palmer@dabbelt.com>" [unknown]
# gpg:                 aka "Palmer Dabbelt <palmer@sifive.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 00CE 76D1 8349 60DF CE88  6DF8 EF4C A150 2CCB AB41

* remotes/palmer/tags/riscv-for-master-4.0-rc1-v2:
  target/riscv: Fix wrong expanding for c.fswsp

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/riscv: Fix wrong expanding for c.fswsp

base register is no rs1 not rs2 for fsw.

Signed-off-by: Kito Cheng <kito.cheng@gmail.com>
Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>

Merge remote-tracking branch 'remotes/armbru/tags/pull-pflash-2019-03-26' into staging

Pflash and firmware configuration patches for 2019-03-26

# gpg: Signature made Tue 26 Mar 2019 07:21:13 GMT
# gpg:                using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-pflash-2019-03-26:
  pflash: Bury disabled code to limit device sizes
  pflash: Require backend size to match device, improve errors

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/armbru/tags/pull-misc-2019-03-26' into staging

Miscellaneous patches for 2019-03-26

# gpg: Signature made Tue 26 Mar 2019 07:10:23 GMT
# gpg:                using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-misc-2019-03-26:
  qapi/qmp-dispatch: fix return value in do_qmp_dispatch
  json: Fix off-by-one assert check in next_state()
  xen-block: Replace qdict_put_obj() by qdict_put() where appropriate
  util/error: Remove an unnecessary NULL check

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/palmer/tags/riscv-for-master-4.0-rc1' into staging

A Single RISC-V Patch for 4.0-rc1

If this is too late I'm OK with it being in rc2, but it fixes a concrete
regression and nobody has complained yet so I'd prefer it to be in rc1
if possible.

The fix is to zero-extend the inputs to DIVUW and REMUW, which was
exposed by the GCC test suite.

# gpg: Signature made Tue 26 Mar 2019 05:54:20 GMT
# gpg:                using RSA key 00CE76D1834960DFCE886DF8EF4CA1502CCBAB41
# gpg:                issuer "palmer@dabbelt.com"
# gpg: Good signature from "Palmer Dabbelt <palmer@dabbelt.com>" [unknown]
# gpg:                 aka "Palmer Dabbelt <palmer@sifive.com>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 00CE 76D1 8349 60DF CE88  6DF8 EF4C A150 2CCB AB41

* remotes/palmer/tags/riscv-for-master-4.0-rc1:
  target/riscv: Zero extend the inputs of divuw and remuw

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

pflash: Bury disabled code to limit device sizes

We disabled code to limit device sizes to 8, 16, 32 or 64MiB more than
a decade ago in commit 95d1f3edd5e and c8b153d7949, v0.9.1. Bury.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
[Extracted from a larger patch, extended to pflash_cfi02.c]
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190319163551.32499-3-armbru@redhat.com>

pflash: Require backend size to match device, improve errors

We reject undersized backends with a rather enigmatic "failed to read
the initial flash content" error.  For instance:

    $ qemu-system-ppc64 -S -display none -M sam460ex -drive if=pflash,format=raw,file=eins.img
    qemu-system-ppc64: Initialization of device cfi.pflash02 failed: failed to read the initial flash content

We happily accept oversized images, ignoring their tail.  Throwing
away parts of firmware that way is pretty much certain to end in an
even more enigmatic failure to boot.

Require the backend's size to match the device's size exactly.  Report
mismatch like this:

    qemu-system-ppc64: Initialization of device cfi.pflash01 failed: device requires 1048576 bytes, block backend provides 512 bytes

Improve the error for actual read failures to "can't read block
backend".

To avoid duplicating even more code between the two pflash device
models, do all that in new helper blk_check_size_and_read_all().

The error reporting can still be confusing.  For instance:

    qemu-system-ppc64 -S -display none -M taihu -drive if=pflash,format=raw,file=eins.img  -drive if=pflash,unit=1,format=raw,file=zwei.img
    qemu-system-ppc64: Initialization of device cfi.pflash02 failed: device requires 2097152 bytes, block backend provides 512 bytes

Leaves the user guessing which of the two -drive is wrong.  Mention
the issue in a TODO comment.

Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190319163551.32499-2-armbru@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

qapi/qmp-dispatch: fix return value in do_qmp_dispatch

There are no harm but just looks weird to return bool in
pointer-returning function. Introduced in 69240fe62d1 with the whole
failure-checking "if" chunk.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20190325154748.66381-1-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

json: Fix off-by-one assert check in next_state()

The assert checking if the value of lexer->state in next_state(),
which is used as an index to the 'json_lexer' array, incorrectly
checks for an index value less than or equal to ARRAY_SIZE(json_lexer).
Fix assert so that it just checks for an index less than the array size.

Signed-off-by: Liam Merwick <liam.merwick@oracle.com>
Message-Id: <1553169472-25325-1-git-send-email-liam.merwick@oracle.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

xen-block: Replace qdict_put_obj() by qdict_put() where appropriate

Patch created mechanically by rerunning:

    $ spatch --sp-file scripts/coccinelle/qobject.cocci \
             --macro-file scripts/cocci-macro-file.h \
             --dir hw/block --in-place

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190313174433.12966-1-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

util/error: Remove an unnecessary NULL check

This NULL check was required while introduced in 680d16dcb79f.
Later refactor added a NULL check in error_setv(), so this check
is now redundant.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190302223825.11192-2-philmd@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>

pc-bios: Update palcode-clipper

Report machine checks to the kernel.
It is now using these for probing missing devices.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge remote-tracking branch 'remotes/juanquintela/tags/migration-pull-request' into staging

Pull request

- Rebase last pull request
- Drop multifd
- several other minor fixesLaLaLa

# gpg: Signature made Mon 25 Mar 2019 17:46:29 GMT
# gpg:                using RSA key F487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>" [full]
# gpg:                 aka "Juan Quintela <quintela@trasno.org>" [full]
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* remotes/juanquintela/tags/migration-pull-request:
  migration/postcopy: Update the bandwidth during postcopy
  Migration/colo.c: Make user obtain the last COLO mode info after failover
  Migration/colo.c: Add the necessary checks for colo_do_failover
  Migration/colo.c: Add new COLOExitReason to handle all failover state
  Migration/colo.c: Fix COLO failover status error
  migration/rdma: Check qemu_rdma_init_one_block
  migration: add support for a "tls-authz" migration parameter
  multifd: Drop x-
  multifd: Add some padding
  multifd: Change default packet size
  multifd: Be flexible about packet size
  multifd: Drop x-multifd-page-count parameter
  multifd: Create new next_packet_size field
  multifd: Rename "size" member to pages_alloc
  multifd: Only send pages when packet are not empty

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

migration/postcopy: Update the bandwidth during postcopy

The recently added max-postcopy-bandwidth parameter is only read
at the transition from precopy->postcopy where as the older
max-bandwidth parameter updates the migration bandwidth when changed
even if the migration is already running.

Fix this discrepency so that:
  a) You can change the bandwidth during postcopy by setting
     max-postcopy-bandwidth

  b) Changing max-bandwidth during postcopy has no effect
     (it currently changes the postcopy bandwidth which isn't
     expected).

Fixes: 7e555c6c
bz: https://bugzilla.redhat.com/show_bug.cgi?id=1686321
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Migration/colo.c: Make user obtain the last COLO mode info after failover

Add the last_colo_mode to save the status after failover.
This patch can solve the issue that user want to get last colo mode
use query_colo_status after failover.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Migration/colo.c: Add the necessary checks for colo_do_failover

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Migration/colo.c: Add new COLOExitReason to handle all failover state

In this patch we add the processing state for COLOExitReason,
because we have to identify COLO in the failover processing state or
failover error state. In the way, we can handle all the failover state.
We have improved the description of the COLOExitReason by the way.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

Migration/colo.c: Fix COLO failover status error

When finished COLO failover, the status is FAILOVER_STATUS_COMPLETED.
The origin codes misunderstand the FAILOVER_STATUS_REQUIRE.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration/rdma: Check qemu_rdma_init_one_block

Actually it can't fail at the moment, but Coverity moans that
it's the only place it's not checked, and it's an easy check.

Reported-by: Coverity (CID 1399413)
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

migration: add support for a "tls-authz" migration parameter

The QEMU instance that runs as the server for the migration data
transport (ie the target QEMU) needs to be able to configure access
control so it can prevent unauthorized clients initiating an incoming
migration. This adds a new 'tls-authz' migration parameter that is used
to provide the QOM ID of a QAuthZ subclass instance that provides the
access control check. This is checked against the x509 certificate
obtained during the TLS handshake.

For example, when starting a QEMU for incoming migration, it is
possible to give an example identity of the source QEMU that is
intended to be connecting later:

  $QEMU \
     -monitor stdio \
     -incoming defer \
     ...other args...

  (qemu) object_add tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\
             endpoint=server,verify-peer=yes \
  (qemu) object_add authz-simple,id=auth0,identity=CN=laptop.example.com,,\
             O=Example Org,,L=London,,ST=London,,C=GB \
  (qemu) migrate_incoming tcp:localhost:9000

Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

multifd: Drop x-

We make it supported from now on.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

multifd: Add some padding

Add some padding.
MultifdInit_t is padded to 64 bytes.
MultiFDPacket_t is padded to 320bytes (64 * 5).

Signed-off-by: Juan Quintela <quintela@redhat.com>

multifd: Change default packet size

We moved from 64KB to 512KB, as it makes less locking contention
without any downside in testing.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>

multifd: Be flexible about packet size

This way we can change the packet size in the future and everything
will work. We choose an arbitrary big number (100 times configured
size) as a limit about how big we will reallocate.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>