git.proxmox.com Git - mirror

blockjobs: add CONCLUDED state

add a new state "CONCLUDED" that identifies a job that has ceased all
operations. The wording was chosen to avoid any phrasing that might
imply success, error, or cancellation. The task has simply ceased all
operation and can never again perform any work.

("finished", "done", and "completed" might all imply success.)

Transitions:
Running  -> Concluded: normal completion
Ready    -> Concluded: normal completion
Aborting -> Concluded: error and cancellations

Verbs:
None as of this commit. (a future commit adds 'dismiss')

             +---------+
             |UNDEFINED|
             +--+------+
                |
             +--v----+
   +---------+CREATED|
   |         +--+----+
   |            |
   |         +--v----+     +------+
   +---------+RUNNING<----->PAUSED|
   |         +--+-+--+     +------+
   |            | |
   |            | +------------------+
   |            |                    |
   |         +--v--+       +-------+ |
   +---------+READY<------->STANDBY| |
   |         +--+--+       +-------+ |
   |            |                    |
+--v-----+   +--v------+             |
|ABORTING+--->CONCLUDED<-------------+
+--------+   +---------+

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockjobs: add ABORTING state

Add a new state ABORTING.

This makes transitions from normative states to error states explicit
in the STM, and serves as a disambiguation for which states may complete
normally when normal end-states (CONCLUDED) are added in future commits.

Notably, Paused/Standby jobs do not transition directly to aborting,
as they must wake up first and cooperate in their cancellation.

Transitions:
Created -> Aborting: can be cancelled (by the system)
Running -> Aborting: can be cancelled or encounter an error
Ready   -> Aborting: can be cancelled or encounter an error

Verbs:
None. The job must finish cleaning itself up and report its final status.

             +---------+
             |UNDEFINED|
             +--+------+
                |
             +--v----+
   +---------+CREATED|
   |         +--+----+
   |            |
   |         +--v----+     +------+
   +---------+RUNNING<----->PAUSED|
   |         +--+----+     +------+
   |            |
   |         +--v--+       +-------+
   +---------+READY<------->STANDBY|
   |         +-----+       +-------+
   |
+--v-----+
|ABORTING|
+--------+

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockjobs: add block_job_verb permission table

Which commands ("verbs") are appropriate for jobs in which state is
also somewhat burdensome to keep track of.

As of this commit, it looks rather useless, but begins to look more
interesting the more states we add to the STM table.

A recurring theme is that no verb will apply to an 'undefined' job.

Further, it's not presently possible to restrict the "pause" or "resume"
verbs any more than they are in this commit because of the asynchronous
nature of how jobs enter the PAUSED state; justifications for some
seemingly erroneous applications are given below.

=====
Verbs
=====

Cancel:    Any state except undefined.
Pause:     Any state except undefined;
           'created': Requests that the job pauses as it starts.
           'running': Normal usage. (PAUSED)
           'paused':  The job may be paused for internal reasons,
                      but the user may wish to force an indefinite
                      user-pause, so this is allowed.
           'ready':   Normal usage. (STANDBY)
           'standby': Same logic as above.
Resume:    Any state except undefined;
           'created': Will lift a user's pause-on-start request.
           'running': Will lift a pause request before it takes effect.
           'paused':  Normal usage.
           'ready':   Will lift a pause request before it takes effect.
           'standby': Normal usage.
Set-speed: Any state except undefined, though ready may not be meaningful.
Complete:  Only a 'ready' job may accept a complete request.

=======
Changes
=======

(1)

To facilitate "nice" error checking, all five major block-job verb
interfaces in blockjob.c now support an errp parameter:

- block_job_user_cancel is added as a new interface.
- block_job_user_pause gains an errp paramter
- block_job_user_resume gains an errp parameter
- block_job_set_speed already had an errp parameter.
- block_job_complete already had an errp parameter.

(2)

block-job-pause and block-job-resume will no longer no-op when trying
to pause an already paused job, or trying to resume a job that isn't
paused. These functions will now report that they did not perform the
action requested because it was not possible.

iotests have been adjusted to address this new behavior.

(3)

block-job-complete doesn't worry about checking !block_job_started,
because the permission table guards against this.

(4)

test-bdrv-drain's job implementation needs to announce that it is
'ready' now, in order to be completed.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests: add pause_wait

Split out the pause command into the actual pause and the wait.
Not every usage presently needs to resubmit a pause request.

The intent with the next commit will be to explicitly disallow
redundant or meaningless pause/resume requests, so the tests
need to become more judicious to reflect that.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockjobs: add state transition table

The state transition table has mostly been implied. We're about to make
it a bit more complex, so let's make the STM explicit instead.

Perform state transitions with a function that for now just asserts the
transition is appropriate.

Transitions:
Undefined -> Created: During job initialization.
Created   -> Running: Once the job is started.
                      Jobs cannot transition from "Created" to "Paused"
                      directly, but will instead synchronously transition
                      to running to paused immediately.
Running   -> Paused:  Normal workflow for pauses.
Running   -> Ready:   Normal workflow for jobs reaching their sync point.
                      (e.g. mirror)
Ready     -> Standby: Normal workflow for pausing ready jobs.
Paused    -> Running: Normal resume.
Standby   -> Ready:   Resume of a Standby job.

+---------+
|UNDEFINED|
+--+------+
   |
+--v----+
|CREATED|
+--+----+
   |
+--v----+     +------+
|RUNNING<----->PAUSED|
+--+----+     +------+
   |
+--v--+       +-------+
|READY<------->STANDBY|
+-----+       +-------+

Notably, there is no state presently defined as of this commit that
deals with a job after the "running" or "ready" states, so this table
will be adjusted alongside the commits that introduce those states.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockjobs: add status enum

We're about to add several new states, and booleans are becoming
unwieldly and difficult to reason about. It would help to have a
more explicit bookkeeping of the state of blockjobs. To this end,
add a new "status" field and add our existing states in a redundant
manner alongside the bools they are replacing:

UNDEFINED: Placeholder, default state. Not currently visible to QMP
           unless changes occur in the future to allow creating jobs
           without starting them via QMP.
CREATED:   replaces !!job->co && paused && !busy
RUNNING:   replaces effectively (!paused && busy)
PAUSED:    Nearly redundant with info->paused, which shows pause_count.
           This reports the actual status of the job, which almost always
           matches the paused request status. It differs in that it is
           strictly only true when the job has actually gone dormant.
READY:     replaces job->ready.
STANDBY:   Paused, but job->ready is true.

New state additions in coming commits will not be quite so redundant:

WAITING:   Waiting on transaction. This job has finished all the work
           it can until the transaction converges, fails, or is canceled.
PENDING:   Pending authorization from user. This job has finished all the
           work it can until the job or transaction is finalized via
           block_job_finalize. This implies the transaction has converged
           and left the WAITING phase.
ABORTING:  Job has encountered an error condition and is in the process
           of aborting.
CONCLUDED: Job has ceased all operations and has a return code available
           for query and may be dismissed via block_job_dismiss.
NULL:      Job has been dismissed and (should) be destroyed. Should never
           be visible to QMP.

Some of these states appear somewhat superfluous, but it helps define the
expected flow of a job; so some of the states wind up being synchronous
empty transitions. Importantly, jobs can be in only one of these states
at any given time, which helps code and external users alike reason about
the current condition of a job unambiguously.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Blockjobs: documentation touchup

Trivial; Document what the job creation flags do,
and some general tidying.

Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockjobs: model single jobs as transactions

model all independent jobs as single job transactions.

It's one less case we have to worry about when we add more states to the
transition machine. This way, we can just treat all job lifetimes exactly
the same. This helps tighten assertions of the STM graph and removes some
conditionals that would have been needed in the coming commits adding a
more explicit job lifetime management API.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

blockjobs: fix set-speed kick

If speed is '0' it's not actually "less than" the previous speed.
Kick the job in this case too.

Signed-off-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Merge remote-tracking branch 'remotes/xtensa/tags/20180316-xtensa' into staging

target/xtensa linux-user support.

- small cleanup for xtensa registers dumping (-d cpu);
- add support for debugging linux-user process with xtensa-linux-gdb
  (as opposed to xtensa-elf-gdb), which can only access unprivileged
  registers;
- enable MTTCG for target/xtensa;
- cleanup in linux-user/mmap area making sure that it works correctly
  with limited 30-bit-wide user address space;
- import xtensa-specific definitions from the linux kernel,
  conditionalize user-only/softmmu-only code and add handlers for
  signals, exceptions, process/thread creation and core registers dumping.

# gpg: Signature made Fri 16 Mar 2018 16:46:19 GMT
# gpg:                using RSA key 51F9CC91F83FA044
# gpg: Good signature from "Max Filippov <filippov@cadence.com>"
# gpg:                 aka "Max Filippov <max.filippov@cogentembedded.com>"
# gpg:                 aka "Max Filippov <jcmvbkbc@gmail.com>"
# Primary key fingerprint: 2B67 854B 98E5 327D CDEB  17D8 51F9 CC91 F83F A044

* remotes/xtensa/tags/20180316-xtensa:
  MAINTAINERS: fix W: address for xtensa
  qemu-binfmt-conf.sh: add qemu-xtensa
  target/xtensa: add linux-user support
  linux-user: drop unused target_msync function
  linux-user: fix target_mprotect/target_munmap error return values
  linux-user: fix assertion in shmdt
  linux-user: fix mmap/munmap/mprotect/mremap/shmat
  target/xtensa: support MTTCG
  target/xtensa: use correct number of registers in gdbstub
  target/xtensa: mark register windows in the dump
  target/xtensa: dump correct physical registers

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# linux-user/syscall.c

Merge remote-tracking branch 'remotes/ehabkost/tags/machine-next-pull-request' into staging

PC compat bug fix for 2.12

# gpg: Signature made Fri 16 Mar 2018 19:30:25 GMT
# gpg:                using RSA key 2807936F984DC5A6
# gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
# Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6

* remotes/ehabkost/tags/machine-next-pull-request:
  pc: correct misspelled CPU model-id for pc 2.2

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

pc: correct misspelled CPU model-id for pc 2.2

Signed-off-by: Wang Xin <wangxinxin.wang@huawei.com>
Message-Id: <1517367668-25048-1-git-send-email-wangxinxin.wang@huawei.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180316' into staging

Queued TCG patches

# gpg: Signature made Thu 15 Mar 2018 17:17:31 GMT
# gpg:                using RSA key 64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>"
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20180316:
  tcg: Add choose_vector_size
  tcg/i386: Support INDEX_op_dup2_vec for -m32
  tcg: Improve tcg_gen_muli_i32/i64

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

MAINTAINERS: fix W: address for xtensa

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

qemu-binfmt-conf.sh: add qemu-xtensa

Register qemu-xtensa and qemu-xtensaeb for transparent linux userspace
emulation.

Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: add linux-user support

Import list of syscalls from the kernel source. Conditionalize code/data
that is only used with softmmu. Implement exception handlers. Implement
signal hander (only the core registers for now, no coprocessors or TIE).

Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

Merge remote-tracking branch 'remotes/jnsnow/tags/bitmaps-pull-request' into staging

# gpg: Signature made Tue 13 Mar 2018 21:11:43 GMT
# gpg:                using RSA key 7DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/bitmaps-pull-request:
  iotests: add dirty bitmap postcopy test
  iotests: add dirty bitmap migration test
  migration: add postcopy migration of dirty bitmaps
  migration: allow qmp command migrate-start-postcopy for any postcopy
  migration: add is_active_iterate handler
  migration/qemu-file: add qemu_put_counted_string()
  migration: include migrate_dirty_bitmaps in migrate_postcopy
  qapi: add dirty-bitmaps migration capability
  migration: introduce postcopy-only pending
  dirty-bitmap: add locked state
  block/dirty-bitmap: add _locked version of bdrv_reclaim_dirty_bitmap
  block/dirty-bitmap: fix locking in bdrv_reclaim_dirty_bitmap
  block/dirty-bitmap: add bdrv_dirty_bitmap_enable_successor()

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2018-03-13-v2' into staging

nbd patches for 2018-03-13

- Eric Blake: iotests: Fix stuck NBD process on 33
- Vladimir Sementsov-Ogievskiy: 0/5 nbd server fixing and refactoring before BLOCK_STATUS
- Eric Blake: nbd/server: Honor FUA request on NBD_CMD_TRIM
- Stefan Hajnoczi: 0/2 block: fix nbd-server-stop crash after blockdev-snapshot-sync
- Vladimir Sementsov-Ogievskiy: nbd block status base:allocation

# gpg: Signature made Tue 13 Mar 2018 20:48:37 GMT
# gpg:                using RSA key A7A16B4A2527436A
# gpg: Good signature from "Eric Blake <eblake@redhat.com>"
# gpg:                 aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>"
# gpg:                 aka "[jpeg image of size 6874]"
# Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2  F3AA A7A1 6B4A 2527 436A

* remotes/ericb/tags/pull-nbd-2018-03-13-v2:
  iotests: new test 209 for NBD BLOCK_STATUS
  iotests: add file_path helper
  iotests.py: tiny refactor: move system imports up
  nbd: BLOCK_STATUS for standard get_block_status function: client part
  block/nbd-client: save first fatal error in nbd_iter_error
  nbd: BLOCK_STATUS for standard get_block_status function: server part
  nbd/server: add nbd_read_opt_name helper
  nbd/server: add nbd_opt_invalid helper
  iotests: add 208 nbd-server + blockdev-snapshot-sync test case
  block: let blk_add/remove_aio_context_notifier() tolerate BDS changes
  nbd/server: Honor FUA request on NBD_CMD_TRIM
  nbd/server: refactor nbd_trip: split out nbd_handle_request
  nbd/server: refactor nbd_trip: cmd_read and generic reply
  nbd/server: fix: check client->closing before sending reply
  nbd/server: fix sparse read
  nbd/server: move nbd_co_send_structured_error up
  iotests: Fix stuck NBD process on 33

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

* Record-replay lockstep execution, log dumper and fixes (Alex, Pavel)
* SCSI fix to pass maximum transfer size (Daniel Barboza)
* chardev fixes and improved iothread support (Daniel Berrangé, Peter)
* checkpatch tweak (Eric)
* make help tweak (Marc-André)
* make more PCI NICs available with -net or -nic (myself)
* change default q35 NIC to e1000e (myself)
* SCSI support for NDOB bit (myself)
* membarrier system call support (myself)
* SuperIO refactoring (Philippe)
* miscellaneous cleanups and fixes (Thomas)

# gpg: Signature made Mon 12 Mar 2018 16:10:52 GMT
# gpg:                using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream: (69 commits)
  tcg: fix cpu_io_recompile
  replay: update documentation
  replay: save vmstate of the asynchronous events
  replay: don't process async events when warping the clock
  scripts/replay-dump.py: replay log dumper
  replay: avoid recursive call of checkpoints
  replay: check return values of fwrite
  replay: push replay_mutex_lock up the call tree
  replay: don't destroy mutex at exit
  replay: make locking visible outside replay code
  replay/replay-internal.c: track holding of replay_lock
  replay/replay.c: bump REPLAY_VERSION again
  replay: save prior value of the host clock
  replay: added replay log format description
  replay: fix save/load vm for non-empty queue
  replay: fixed replay_enable_events
  replay: fix processing async events
  cpu-exec: fix exception_index handling
  hw/i386/pc: Factor out the superio code
  hw/alpha/dp264: Use the TYPE_SMC37C669_SUPERIO
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# default-configs/i386-softmmu.mak
# default-configs/x86_64-softmmu.mak

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20180313.0' into staging

VFIO updates 2018-03-13

- Display support for vGPUs (Gerd Hoffmann)

- Enable new kernel support for mmaps overlapping MSI-X vector table,
   disable MSI-X emulation on POWER (Alexey Kardashevskiy)

# gpg: Signature made Tue 13 Mar 2018 19:48:49 GMT
# gpg:                using RSA key 239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-update-20180313.0:
  ppc/spapr, vfio: Turn off MSIX emulation for VFIO devices
  vfio-pci: Allow mmap of MSIX BAR
  vfio/pci: Relax DMA map errors for MMIO regions
  vfio/display: adding dmabuf support
  vfio/display: adding region support
  vfio/display: core & wireup
  vfio/common: cleanup in vfio_region_finalize
  secondary-vga: properly close QemuConsole on unplug
  console: minimal hotplug suport
  ui/pixman: add qemu_drm_format_to_pixman()
  standard-headers: add drm/drm_fourcc.h

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/berrange/tags/socket-next-pull-request' into staging

# gpg: Signature made Tue 13 Mar 2018 18:12:14 GMT
# gpg:                using RSA key BE86EBB415104FDF
# gpg: Good signature from "Daniel P. Berrange <dan@berrange.com>"
# gpg:                 aka "Daniel P. Berrange <berrange@redhat.com>"
# Primary key fingerprint: DAF3 A6FD B26B 6291 2D0E  8E3F BE86 EBB4 1510 4FDF

* remotes/berrange/tags/socket-next-pull-request:
  char: allow passing pre-opened socket file descriptor at startup
  char: refactor parsing of socket address information
  sockets: allow SocketAddress 'fd' to reference numeric file descriptors
  sockets: check that the named file descriptor is a socket
  sockets: move fd_is_socket() into common sockets code
  sockets: strengthen test suite IP protocol availability checks
  sockets: pull code for testing IP availability out of specific test
  cutils: add qemu_strtoi & qemu_strtoui parsers for int/unsigned int types
  char: don't silently skip tn3270 protocol init when TLS is enabled

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/vivier2/tags/linux-user-for-2.12-pull-request' into staging

# gpg: Signature made Tue 13 Mar 2018 17:33:03 GMT
# gpg:                using RSA key F30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>"
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>"
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier2/tags/linux-user-for-2.12-pull-request:
  linux-user: init_guest_space: Add a comment about search strategy
  linux-user: init_guest_space: Don't try to align if we'll reject it
  linux-user: init_guest_space: Clean up control flow a bit
  linux-user: init_guest_commpage: Add a comment about size check
  linux-user: init_guest_space: Clarify page alignment logic
  linux-user: init_guest_space: Correctly handle guest_start in commpage initialization
  linux-user: init_guest_space: Clean up if we can't initialize the commpage
  linux-user: Rename validate_guest_space => init_guest_commpage
  linux-user: Use #if to only call validate_guest_space for 32-bit ARM target
  qemu-binfmt-conf.sh: add qemu-xtensa
  linux-user: drop unused target_msync function
  linux-user: fix target_mprotect/target_munmap error return values
  linux-user: fix assertion in shmdt
  linux-user: fix mmap/munmap/mprotect/mremap/shmat
  linux-user: Support f_flags in statfs when available.
  linux-user: allows to use "--systemd ALL" with qemu-binfmt-conf.sh
  linux-user: Remove the unused "not implemented" signal handling stubs
  linux-user: Drop unicore32 code

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

tcg: Add choose_vector_size

This unifies 5 copies of checks for supported vector size,
and in the process fixes a missing check in tcg_gen_gvec_2s.

This lead to an assertion failure for 64-bit vector multiply,
which is not available in the AVX instruction set.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg/i386: Support INDEX_op_dup2_vec for -m32

Unknown why -m32 was passing with gcc but not clang; it should have
failed for both. This would be used for tcg_gen_dup_i64_vec, and
visible with the right TB and an aarch64 guest.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

tcg: Improve tcg_gen_muli_i32/i64

Convert multiplication by power of two to left shift.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream-sev' into staging

* Migrate MSR_SMI_COUNT (Liran)
* Update kernel headers (Gerd, myself)
* SEV support (Brijesh)

I have not tested non-x86 compilation, but I reordered the SEV patches
so that all non-x86-specific changes go first to catch any possible
issues (which weren't there anyway :)).

# gpg: Signature made Tue 13 Mar 2018 16:37:06 GMT
# gpg:                using RSA key BFFBD25F78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream-sev: (22 commits)
  sev/i386: add sev_get_capabilities()
  sev/i386: qmp: add query-sev-capabilities command
  sev/i386: qmp: add query-sev-launch-measure command
  sev/i386: hmp: add 'info sev' command
  cpu/i386: populate CPUID 0x8000_001F when SEV is active
  sev/i386: add migration blocker
  sev/i386: finalize the SEV guest launch flow
  sev/i386: add support to LAUNCH_MEASURE command
  target/i386: encrypt bios rom
  sev/i386: add command to encrypt guest memory region
  sev/i386: add command to create launch memory encryption context
  sev/i386: register the guest memory range which may contain encrypted data
  sev/i386: add command to initialize the memory encryption context
  include: add psp-sev.h header file
  sev/i386: qmp: add query-sev command
  target/i386: add Secure Encrypted Virtualization (SEV) object
  kvm: introduce memory encryption APIs
  kvm: add memory encryption context
  docs: add AMD Secure Encrypted Virtualization (SEV)
  machine: add memory-encryption option
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/stsquad/tags/pull-travis-speedup-130318-1' into staging

Some updates to reduce timeouts in Travis

# gpg: Signature made Tue 13 Mar 2018 16:44:46 GMT
# gpg:                using RSA key FBD0DB095A9E2A44
# gpg: Good signature from "Alex Bennée (Master Work Key) <alex.bennee@linaro.org>"
# Primary key fingerprint: 6685 AE99 E751 67BC AFC8  DF35 FBD0 DB09 5A9E 2A44

* remotes/stsquad/tags/pull-travis-speedup-130318-1:
  .travis.yml: add --disable-user with the rest of the disables
  .travis.yml: split default config into system and user
  .travis.yml: drop setting default log output

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Merge remote-tracking branch 'remotes/vivier/tags/m68k-for-2.12-pull-request' into staging

# gpg: Signature made Tue 13 Mar 2018 15:58:42 GMT
# gpg:                using RSA key F30C38BD3F2FBE3C
# gpg: Good signature from "Laurent Vivier <lvivier@redhat.com>"
# gpg:                 aka "Laurent Vivier <laurent@vivier.eu>"
# gpg:                 aka "Laurent Vivier (Red Hat) <lvivier@redhat.com>"
# Primary key fingerprint: CD2F 75DD C8E3 A4DC 2E4F  5173 F30C 38BD 3F2F BE3C

* remotes/vivier/tags/m68k-for-2.12-pull-request:
  target/m68k: implement fcosh
  target/m68k: implement fsinh
  target/m68k: implement ftanh
  target/m68k: implement fatanh
  target/m68k: implement facos
  target/m68k: implement fasin
  target/m68k: implement fatan
  target/m68k: implement fsincos
  target/m68k: implement fcos
  target/m68k: implement fsin
  target/m68k: implement ftan

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

iotests: add dirty bitmap postcopy test

Test
- start two vms (vm_a, vm_b)

- in a
    - do writes from set A
    - do writes from set B
    - fix bitmap sha256
    - clear bitmap
    - do writes from set A
    - start migration
- than, in b
    - wait vm start (postcopy should start)
    - do writes from set B
    - check bitmap sha256

The test should verify postcopy migration and then merging with delta
(changes in target, during postcopy process).

Reduce supported cache modes to only 'none', because with cache on time
from source.STOP to target.RESUME is unpredictable and we can fail with
timout while waiting for target.RESUME.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180313180320.339796-14-vsementsov@virtuozzo.com

iotests: add dirty bitmap migration test

The test starts two vms (vm_a, vm_b), create dirty bitmap in
the first one, do several writes to corresponding device and
then migrate vm_a to vm_b.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 20180313180320.339796-13-vsementsov@virtuozzo.com

migration: add postcopy migration of dirty bitmaps

Postcopy migration of dirty bitmaps. Only named dirty bitmaps are migrated.

If destination qemu is already containing a dirty bitmap with the same name
as a migrated bitmap (for the same node), then, if their granularities are
the same the migration will be done, otherwise the error will be generated.

If destination qemu doesn't contain such bitmap it will be created.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20180313180320.339796-12-vsementsov@virtuozzo.com
[Changed '+' to '*' as per list discussion. --js]
Signed-off-by: John Snow <jsnow@redhat.com>

migration: allow qmp command migrate-start-postcopy for any postcopy

Allow migrate-start-postcopy for any postcopy type

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20180313180320.339796-11-vsementsov@virtuozzo.com

migration: add is_active_iterate handler

Only-postcopy savevm states (dirty-bitmap) don't need live iteration, so
to disable them and stop transporting empty sections there is a new
savevm handler.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180313180320.339796-10-vsementsov@virtuozzo.com

migration/qemu-file: add qemu_put_counted_string()

Add function opposite to qemu_get_counted_string.
qemu_put_counted_string puts one-byte length of the string (string
should not be longer than 255 characters), and then it puts the string,
without last zero byte.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180313180320.339796-9-vsementsov@virtuozzo.com

migration: include migrate_dirty_bitmaps in migrate_postcopy

Enable postcopy if dirty bitmap migration is enabled.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180313180320.339796-8-vsementsov@virtuozzo.com

qapi: add dirty-bitmaps migration capability

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180313180320.339796-7-vsementsov@virtuozzo.com

migration: introduce postcopy-only pending

There would be savevm states (dirty-bitmap) which can migrate only in
postcopy stage. The corresponding pending is introduced here.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20180313180320.339796-6-vsementsov@virtuozzo.com

dirty-bitmap: add locked state

Add special state, when qmp operations on the bitmap are disabled.
It is needed during bitmap migration. "Frozen" state is not
appropriate here, because it looks like bitmap is unchanged.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180207155837.92351-5-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>

block/dirty-bitmap: add _locked version of bdrv_reclaim_dirty_bitmap

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180207155837.92351-4-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>

block/dirty-bitmap: fix locking in bdrv_reclaim_dirty_bitmap

Like other setters here these functions should take a lock.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Message-id: 20180207155837.92351-3-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>

iotests: new test 209 for NBD BLOCK_STATUS

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180312152126.286890-9-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

iotests: add file_path helper

Simple way to have auto generated filenames with auto cleanup. Like
FilePath but without using 'with' statement and without additional
indentation of the whole test.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180312152126.286890-8-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweak]
Signed-off-by: Eric Blake <eblake@redhat.com>

iotests.py: tiny refactor: move system imports up

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180312152126.286890-7-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd: BLOCK_STATUS for standard get_block_status function: client part

Minimal realization: only one extent in server answer is supported.
Flag NBD_CMD_FLAG_REQ_ONE is used to force this behavior.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180312152126.286890-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweaks, fix min_block check and 32-bit cap, use -1
instead of errno on failure in nbd_negotiate_simple_meta_context,
ensure that block status makes progress on success]
Signed-off-by: Eric Blake <eblake@redhat.com>

block/nbd-client: save first fatal error in nbd_iter_error

It is ok, that fatal error hides previous not fatal, but hiding
first fatal error is a bad feature.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180312152126.286890-5-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd: BLOCK_STATUS for standard get_block_status function: server part

Minimal realization: only one extent in server answer is supported.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180312152126.286890-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: tweak whitespace, move constant from .h to .c, improve
logic of check_meta_export_name, simplify nbd_negotiate_options
by doing more in nbd_negotiate_meta_queries]
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd/server: add nbd_read_opt_name helper

Add helper to read name in format:

uint32 len (<= NBD_MAX_NAME_SIZE)
len bytes string (not 0-terminated)

The helper will be reused in following patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180312152126.286890-3-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar fixes, actually check error]
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd/server: add nbd_opt_invalid helper

NBD_REP_ERR_INVALID is often parameter to nbd_opt_drop and it would
be used more in following patches. So, let's add a helper.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180312152126.286890-2-vsementsov@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

iotests: add 208 nbd-server + blockdev-snapshot-sync test case

This test case adds an NBD server export and then invokes
blockdev-snapshot-sync, which changes the BlockDriverState node that the
NBD server's BlockBackend points to. This is an interesting scenario to
test and exercises the code path fixed by the previous commit.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180306204819.11266-3-stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

block: let blk_add/remove_aio_context_notifier() tolerate BDS changes

Commit 2019ba0a0197 ("block: Add AioContextNotifier functions to BB")
added blk_add/remove_aio_context_notifier() and implemented them by
passing through the bdrv_*() equivalent.

This doesn't work across bdrv_append(), which detaches child->bs and
re-attaches it to a new BlockDriverState. When
blk_remove_aio_context_notifier() is called we will access the new BDS
instead of the one where the notifier was added!

>From the point of view of the blk_*() API user, changes to the root BDS
should be transparent.

This patch maintains a list of AioContext notifiers in BlockBackend and
adds/removes them from the BlockDriverState as needed.

Reported-by: Stefano Panella <spanella@gmail.com>
Cc: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20180306204819.11266-2-stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd/server: Honor FUA request on NBD_CMD_TRIM

The NBD spec states that since trim requests can affect disk contents,
then they should allow for FUA semantics just like writes for ensuring
the disk has settled before returning.  As bdrv_[co_]pdiscard() does
not support a flags argument, we can't pass FUA down the block layer
stack, and must therefore emulate it with a flush at the NBD layer.

Note that in all reality, generic well-behaved clients will never
send TRIM+FUA (in fact, qemu as a client never does, and we have no
intention to plumb flags into bdrv_pdiscard).  This is because the
NBD protocol states that it is unspecified to READ a trimmed area
(you might read stale data, all zeroes, or even random unrelated
data) without first rewriting it, and even the experimental
BLOCK_STATUS extension states that TRIM need not affect reported
status.  Thus, in the general case, a client cannot tell the
difference between an arbitrary server that ignores TRIM, a server
that had a power outage without flushing to disk, and a server that
actually affected the disk before returning; so waiting for the
trim actions to flush to disk makes little sense.  However, for a
specific client and server pair, where the client knows the server
treats TRIM'd areas as guaranteed reads-zero, waiting for a flush
makes sense, hence why the protocol documents that FUA is valid on
trim.  So, even though the NBD protocol doesn't have a way for the
server to advertise what effects (if any) TRIM will actually have,
and thus any client that relies on specific effects is probably
in error, we can at least support a client that requests TRIM+FUA.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180307225732.155835-1-eblake@redhat.com>

nbd/server: refactor nbd_trip: split out nbd_handle_request

Split out request handling logic.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-6-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: touch up blank line placement]
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd/server: refactor nbd_trip: cmd_read and generic reply

nbd_trip has difficult logic when sending replies: it tries to use one
code path for all replies. It is ok for simple replies, but is not
comfortable for structured replies. Also, two types of error (and
corresponding messages in local_err) - fatal (leading to disconnect)
and not-fatal (just to be sent to the client) are difficult to follow.

To make things a bit clearer, the following is done:
- split CMD_READ logic to separate function. It is the most difficult
   command for now, and it is definitely cramped inside nbd_trip. Also,
   it is difficult to follow CMD_READ logic, shared between
   "case NBD_CMD_READ" and "if"s under "reply:" label.
- create separate helper function nbd_send_generic_reply() and use it
   both in new nbd_do_cmd_read and for other commands in nbd_trip instead
   of common code-path under "reply:" label in nbd_trip. The helper
   supports an error message, so logic with local_err in nbd_trip is
   simplified.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-5-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: grammar tweaks and blank line placement]
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd/server: fix: check client->closing before sending reply

Since the unchanged code has just set client->recv_coroutine to
NULL before calling nbd_client_receive_next_request(), we are
spawning a new coroutine unconditionally, but the first thing
that coroutine will do is check for client->closing, making it
a no-op if we have already detected that the client is going
away. Furthermore, for any error other than EIO (where we
disconnect, which itself sets client->closing), if the client
has already gone away, we'll probably encounter EIO later
in the function and attempt disconnect at that point. Logically,
as soon as we know the connection is closing, there is no need
to try a likely-to-fail a response or spawn a no-op coroutine.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-4-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
[eblake: squash in further reordering: hoist check before spawning
next coroutine, and document rationale in commit message]
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd/server: fix sparse read

In case of io error in nbd_co_send_sparse_read we should not
"goto reply:", as it was a fatal error and the common behavior
is to disconnect in this case. We should not try to send the
client an additional error reply, since we already hit a
channel-io error on our previous attempt to send one.

Fix this by handling block-status error in nbd_co_send_sparse_read,
so nbd_co_send_sparse_read fails only on io error. Then just skip
common "reply:" code path in nbd_trip.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-3-vsementsov@virtuozzo.com>
[eblake: grammar tweaks]
Signed-off-by: Eric Blake <eblake@redhat.com>

nbd/server: move nbd_co_send_structured_error up

To be reused in nbd_co_send_sparse_read() in the following patch.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20180308184636.178534-2-vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>

iotests: Fix stuck NBD process on 33

Commit afe35cde6 added additional actions to test 33, but forgot
to reset the image between tests. As a result, './check -nbd 33'
fails because the qemu-nbd process from the first half is still
occupying the port, preventing the second half from starting a
new qemu-nbd process. Worse, the failure leaves a rogue qemu-nbd
process behind even after the test fails, which causes knock-on
failures to later tests that also want to start qemu-nbd.

Reported-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180312211156.452139-1-eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>

block/dirty-bitmap: add bdrv_dirty_bitmap_enable_successor()

Enabling bitmap successor is necessary to enable successors of bitmaps
being migrated before target vm start.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 20180207155837.92351-2-vsementsov@virtuozzo.com
Signed-off-by: John Snow <jsnow@redhat.com>

linux-user: drop unused target_msync function

target_msync is not used, remove its declaration and implementation.

Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

linux-user: fix target_mprotect/target_munmap error return values

target_mprotect/target_munmap return value goes through get_errno at the
call site, thus the functions must either set errno to host error code
and return -1 or return negative guest error code. Do the latter.

Cc: qemu-stable@nongnu.org
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

linux-user: fix assertion in shmdt

shmdt fails to call mmap_lock/mmap_unlock around page_set_flags,
resulting in the following assertion:
page_set_flags: Assertion `have_mmap_lock()' failed.

Wrap shmdt internals into mmap_lock/mmap_unlock.

Cc: qemu-stable@nongnu.org
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

linux-user: fix mmap/munmap/mprotect/mremap/shmat

In linux-user QEMU that runs for a target with TARGET_ABI_BITS bigger
than L1_MAP_ADDR_SPACE_BITS an assertion in page_set_flags fires when
mmap, munmap, mprotect, mremap or shmat is called for an address outside
the guest address space. mmap and mprotect should return ENOMEM in such
case.

Change definition of GUEST_ADDR_MAX to always be the last valid guest
address. Account for this change in open_self_maps.
Add macro guest_addr_valid that verifies if the guest address is valid.
Add function guest_range_valid that verifies if address range is within
guest address space and does not wrap around. Use that macro in
mmap/munmap/mprotect/mremap/shmat for error checking.

Cc: qemu-stable@nongnu.org
Cc: Riku Voipio <riku.voipio@iki.fi>
Cc: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: support MTTCG

- emit TCG barriers for MEMW, EXTW, S32RI and L32AI;
- do atomic_cmpxchg_i32 for S32C1I.

Cc: Emilio G. Cota <cota@braap.org>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: use correct number of registers in gdbstub

System emulation should provide access to all registers, userspace
emulation should only provide access to unprivileged registers.
Record register flags from GDB register map definition, calculate both
num_regs and num_core_regs if either is zero. Use num_regs in system
emulation, num_core_regs in userspace emulation gdbstub.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: mark register windows in the dump

Add arrows that mark beginning of register windows and position of the
current window in the windowed register file.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

target/xtensa: dump correct physical registers

xtensa_cpu_dump_state outputs CPU physical registers as is, without
synchronization from current window. That may result in different values
printed for the current window and corresponding physical registers.
Synchronize physical registers from window before dumping.

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>

Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging

# gpg: Signature made Tue 13 Mar 2018 12:28:21 GMT
# gpg:                using RSA key BDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98  D624 BDBE 7B27 C0DE 3057

* remotes/cody/tags/block-pull-request:
  block: include original filename when reporting invalid URIs

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

char: allow passing pre-opened socket file descriptor at startup

When starting QEMU management apps will usually setup a monitor socket, and
then open it immediately after startup. If not using QEMU's own -daemonize
arg, this process can be troublesome to handle correctly. The mgmt app will
need to repeatedly call connect() until it succeeds, because it does not
know when QEMU has created the listener socket. If can't retry connect()
forever though, because an error might have caused QEMU to exit before it
even creates the monitor.

The obvious way to fix this kind of problem is to just pass in a pre-opened
socket file descriptor for the QEMU monitor to listen on. The management
app can now immediately call connect() just once. If connect() fails it
knows that QEMU has exited with an error.

The SocketAddress(Legacy) structs allow for FD passing via the monitor, and
now via inherited file descriptors from the process that spawned QEMU. The
final missing piece is adding a 'fd' parameter in the socket chardev
options.

This allows both HMP usage, pass any FD number with SCM_RIGHTS, then
running HMP commands:

   getfd myfd
   chardev-add socket,fd=myfd

Note that numeric FDs cannot be referenced directly in HMP, only named FDs.

And also CLI usage, by leak FD 3 from parent by clearing O_CLOEXEC, then
spawning QEMU with

  -chardev socket,fd=3,id=mon
  -mon chardev=mon,mode=control

Note that named FDs cannot be referenced in CLI args, only numeric FDs.

We do not wire this up in the legacy chardev syntax, so you cannot use FD
passing with '-qmp', you must use the modern '-mon' + '-chardev' pair.

When passing pre-opened FDs there is a restriction on use of TLS encryption.
It can be used on a server socket chardev, but cannot be used for a client
socket chardev. This is because when validating a server's certificate, the
client needs to have a hostname available to match against the certificate
identity.

An illustrative example of usage is:

  #!/usr/bin/perl

  use IO::Socket::UNIX;
  use Fcntl;

  unlink "/tmp/qmp";
  my $srv = IO::Socket::UNIX->new(
    Type => SOCK_STREAM(),
    Local => "/tmp/qmp",
    Listen => 1,
  );

  my $flags = fcntl $srv, F_GETFD, 0;
  fcntl $srv, F_SETFD, $flags & ~FD_CLOEXEC;

  my $fd = $srv->fileno();

  exec "qemu-system-x86_64", \
      "-chardev", "socket,fd=$fd,server,nowait,id=mon", \
      "-mon", "chardev=mon,mode=control";

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

char: refactor parsing of socket address information

To prepare for handling more address types, refactor the parsing of
socket address information to make it more robust and extensible.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

sockets: allow SocketAddress 'fd' to reference numeric file descriptors

The SocketAddress 'fd' kind accepts the name of a file descriptor passed
to the monitor with the 'getfd' command. This makes it impossible to use
the 'fd' kind in cases where a monitor is not available. This can apply in
handling command line argv at startup, or simply if internal code wants to
use SocketAddress and pass a numeric FD it has acquired from elsewhere.

Fortunately the 'getfd' command mandated that the FD names must not start
with a leading digit. We can thus safely extend semantics of the
SocketAddress 'fd' kind, to allow a purely numeric name to reference an
file descriptor that QEMU already has open. There will be restrictions on
when each kind can be used.

In codepaths where we are handling a monitor command (ie cur_mon != NULL),
we will only support use of named file descriptors as before. Use of FD
numbers is still not permitted for monitor commands.

In codepaths where we are not handling a monitor command (ie cur_mon ==
NULL), we will not support named file descriptors. Instead we can reference
FD numers explicitly. This allows the app spawning QEMU to intentionally
"leak" a pre-opened socket to QEMU and reference that in a SocketAddress
definition, or for code inside QEMU to pass pre-opened FDs around.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

sockets: check that the named file descriptor is a socket

The SocketAddress struct has an "fd" type, which references the name of a
file descriptor passed over the monitor using the "getfd" command. We
currently blindly assume the FD is a socket, which can lead to hard to
diagnose errors later. This adds an explicit check that the FD is actually
a socket to improve the error diagnosis.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

sockets: move fd_is_socket() into common sockets code

The fd_is_socket() helper method is useful in a few places, so put it in
the common sockets code. Make the code more compact while moving it.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

sockets: strengthen test suite IP protocol availability checks

Instead of just checking whether it is possible to bind() on a socket, also
check that we can successfully connect() to the socket we bound to. This
more closely replicates the level of functionality that tests will actually
use.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

sockets: pull code for testing IP availability out of specific test

The test-io-channel-socket.c file has some useful helper functions for
checking if a specific IP protocol is available. Other tests need to
perform similar kinds of checks to avoid running tests that will fail
due to missing IP protocols.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

cutils: add qemu_strtoi & qemu_strtoui parsers for int/unsigned int types

There are qemu_strtoNN functions for various sized integers. This adds two
more for plain int & unsigned int types, with suitable range checking.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

ppc/spapr, vfio: Turn off MSIX emulation for VFIO devices

This adds a possibility for the platform to tell VFIO not to emulate MSIX
so MMIO memory regions do not get split into chunks in flatview and
the entire page can be registered as a KVM memory slot and make direct
MMIO access possible for the guest.

This enables the entire MSIX BAR mapping to the guest for the pseries
platform in order to achieve the maximum MMIO preformance for certain
devices.

Tested on:
LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02)

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio-pci: Allow mmap of MSIX BAR

At the moment we unconditionally avoid mapping MSIX data of a BAR and
emulate MSIX table in QEMU. However it is 1) not always necessary as
a platform may provide a paravirt interface for MSIX configuration;
2) can affect the speed of MMIO access by emulating them in QEMU when
frequently accessed registers share same system page with MSIX data,
this is particularly a problem for systems with the page size bigger
than 4KB.

A new capability - VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - has been added
to the kernel [1] which tells the userspace that mapping of the MSIX data
is possible now. This makes use of it so from now on QEMU tries mapping
the entire BAR as a whole and emulate MSIX on top of that.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Relax DMA map errors for MMIO regions

At the moment if vfio_memory_listener is registered in the system memory
address space, it maps/unmaps every RAM memory region for DMA.
It expects system page size aligned memory sections so vfio_dma_map
would not fail and so far this has been the case. A mapping failure
would be fatal. A side effect of such behavior is that some MMIO pages
would not be mapped silently.

However we are going to change MSIX BAR handling so we will end having
non-aligned sections in vfio_memory_listener (more details is in
the next patch) and vfio_dma_map will exit QEMU.

In order to avoid fatal failures on what previously was not a failure and
was just silently ignored, this checks the section alignment to
the smallest supported IOMMU page size and prints an error if not aligned;
it also prints an error if vfio_dma_map failed despite the page size check.
Both errors are not fatal; only MMIO RAM regions are checked
(aka "RAM device" regions).

If the amount of errors printed is overwhelming, the MSIX relocation
could be used to avoid excessive error output.

This is unlikely to cause any behavioral change.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: Fix Int128 bit ops]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/display: adding dmabuf support

Wire up dmabuf-based display.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/display: adding region support

Wire up region-based display.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed By: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/display: core & wireup

Infrastructure for display support. Must be enabled
using 'display' property.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed By: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/common: cleanup in vfio_region_finalize

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

secondary-vga: properly close QemuConsole on unplug

Using the new graphic_console_close() function.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

console: minimal hotplug suport

This patch allows to unbind devices from QemuConsoles, using the new
graphic_console_close() function. The QemuConsole will show a static
display then, saying the device was unplugged. When re-plugging a
display later on the QemuConsole will be reused.

Eventually we will allocate and release QemuConsoles dynamically at some
point in the future, that'll need more infrastructure though to notify
user interfaces (gtk, sdl, spice, ...) about QemuConsoles coming and
going.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

ui/pixman: add qemu_drm_format_to_pixman()

Map drm fourcc codes to pixman formats.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed by: Kirti Wankhede <kwankhede@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

standard-headers: add drm/drm_fourcc.h

So we can use the drm fourcc codes without a dependency on libdrm-devel.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

sev/i386: add sev_get_capabilities()

The function can be used to get the current SEV capabilities.
The capabilities include platform diffie-hellman key (pdh) and certificate
chain. The key can be provided to the external entities which wants to
establish a trusted channel between SEV firmware and guest owner.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sev/i386: qmp: add query-sev-capabilities command

The command can be used by libvirt to query the SEV capabilities.

Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sev/i386: qmp: add query-sev-launch-measure command

The command can be used by libvirt to retrieve the measurement of SEV guest.
This measurement is a signature of the memory contents that was encrypted
through the LAUNCH_UPDATE_DATA.

Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sev/i386: hmp: add 'info sev' command

The command can be used to show the SEV information when memory
encryption is enabled on AMD platform.

Cc: Eric Blake <eblake@redhat.com>
Cc: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Reviewed-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cpu/i386: populate CPUID 0x8000_001F when SEV is active

When SEV is enabled, CPUID 0x8000_001F should provide additional
information regarding the feature (such as which page table bit is used
to mark the pages as encrypted etc).

The details for memory encryption CPUID is available in AMD APM
(https://support.amd.com/TechDocs/24594.pdf) Section E.4.17

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sev/i386: add migration blocker

SEV guest migration is not implemented yet.

Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sev/i386: finalize the SEV guest launch flow

SEV launch flow requires us to issue LAUNCH_FINISH command before guest
is ready to run.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sev/i386: add support to LAUNCH_MEASURE command

During machine creation we encrypted the guest bios image, the
LAUNCH_MEASURE command can be used to retrieve the measurement of
the encrypted memory region. This measurement is a signature of
the memory contents that can be sent to the guest owner as an
attestation that the memory was encrypted correctly by the firmware.
VM management tools like libvirt can query the measurement using
query-sev-launch-measure QMP command.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: encrypt bios rom

SEV requires that guest bios must be encrypted before booting the guest.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sev/i386: add command to encrypt guest memory region

The KVM_SEV_LAUNCH_UPDATE_DATA command is used to encrypt a guest memory
region using the VM Encryption Key created using LAUNCH_START.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sev/i386: add command to create launch memory encryption context

The KVM_SEV_LAUNCH_START command creates a new VM encryption key (VEK).
The encryption key created with the command will be used for encrypting
the bootstrap images (such as guest bios).

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sev/i386: register the guest memory range which may contain encrypted data

When SEV is enabled, the hardware encryption engine uses a tweak such
that the two identical plaintext at different location will have a
different ciphertexts. So swapping or moving a ciphertexts of two guest
pages will not result in plaintexts being swapped. Hence relocating
a physical backing pages of the SEV guest will require some additional
steps in KVM driver. The KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctl can be
used to register/unregister the guest memory region which may contain the
encrypted data. KVM driver will internally handle the relocating physical
backing pages of registered memory regions.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

sev/i386: add command to initialize the memory encryption context

When memory encryption is enabled, KVM_SEV_INIT command is used to
initialize the platform. The command loads the SEV related persistent
data from non-volatile storage and initializes the platform context.
This command should be first issued before invoking any other guest
commands provided by the SEV firmware.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

char: don't silently skip tn3270 protocol init when TLS is enabled

Even if common tn3270 implementations do not support TLS, it is trivial to
have them proxied over a proxy like stunnel which adds TLS at the sockets
layer. We should thus not silently skip tn3270 protocol initialization
when TLS is enabled.

Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>