before stopping the source VM. Enabling this migration capability will
guarantee that, and thus can potentially reduce downtime even further.
-Note that currently VFIO migration is supported only for a single device. This
-is due to VFIO migration's lack of P2P support. However, P2P support is planned
-to be added later on.
+To support migration of multiple devices that might do P2P transactions between
+themselves, VFIO migration uAPI defines an intermediate P2P quiescent state.
+While in the P2P quiescent state, P2P DMA transactions cannot be initiated by
+the device, but the device can respond to incoming ones. Additionally, all
+outstanding P2P transactions are guaranteed to have been completed by the time
+the device enters this state.
+
+All the devices that support P2P migration are first transitioned to the P2P
+quiescent state and only then are they stopped or started. This makes migration
+safe P2P-wise, since the devices are not stopped or started atomically as a
+group.
+
+Thus, migration of multiple VFIO devices is allowed only if all of them
+support P2P migration. Migration of a single VFIO device is allowed
+regardless of P2P migration support.
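+
+As a rough illustration, the sketch below shows how such a state transition is
+issued through the VFIO migration uAPI. It is loosely modeled on QEMU's
+``vfio_migration_set_state()`` (hw/vfio/migration.c); error recovery and the
+pre-copy ``data_fd`` handling are omitted, and ``vbasedev->fd`` is assumed to
+be the device's file descriptor::
+
+    uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+                              sizeof(struct vfio_device_feature_mig_state),
+                              sizeof(uint64_t))] = {};
+    struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+    struct vfio_device_feature_mig_state *mig_state =
+        (struct vfio_device_feature_mig_state *)feature->data;
+
+    feature->argsz = sizeof(buf);
+    feature->flags = VFIO_DEVICE_FEATURE_SET |
+                     VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
+    /* For example, quiesce P2P before stopping the device: */
+    mig_state->device_state = VFIO_DEVICE_STATE_RUNNING_P2P;
+    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+        /* On failure, the caller would move the device to a recover state. */
+    }
+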
A detailed description of the UAPI for VFIO device migration can be found in
the comment for the ``vfio_device_mig_state`` structure in the header file
Flow of state changes during Live migration
===========================================
-Below is the flow of state change during live migration.
+Below is the state change flow during live migration for a VFIO device that
+supports both precopy and P2P migration. The flow for devices that don't
+support precopy or P2P migration is similar, except that the states relevant
+to the unsupported feature are skipped.
The values in the parentheses represent the VM state, the migration state, and
the VFIO device state, respectively.
-The text in the square brackets represents the flow if the VFIO device supports
-pre-copy.
Live migration save path
------------------------
::
- QEMU normal running state
- (RUNNING, _NONE, _RUNNING)
- |
+ QEMU normal running state
+ (RUNNING, _NONE, _RUNNING)
+ |
migrate_init spawns migration_thread
- Migration thread then calls each device's .save_setup()
- (RUNNING, _SETUP, _RUNNING [_PRE_COPY])
- |
- (RUNNING, _ACTIVE, _RUNNING [_PRE_COPY])
- If device is active, get pending_bytes by .state_pending_{estimate,exact}()
- If total pending_bytes >= threshold_size, call .save_live_iterate()
- [Data of VFIO device for pre-copy phase is copied]
- Iterate till total pending bytes converge and are less than threshold
- |
- On migration completion, vCPU stops and calls .save_live_complete_precopy for
- each active device. The VFIO device is then transitioned into _STOP_COPY state
- (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
- |
- For the VFIO device, iterate in .save_live_complete_precopy until
- pending data is 0
- (FINISH_MIGRATE, _DEVICE, _STOP)
- |
- (FINISH_MIGRATE, _COMPLETED, _STOP)
- Migraton thread schedules cleanup bottom half and exits
+ Migration thread then calls each device's .save_setup()
+ (RUNNING, _SETUP, _PRE_COPY)
+ |
+ (RUNNING, _ACTIVE, _PRE_COPY)
+ If device is active, get pending_bytes by .state_pending_{estimate,exact}()
+ If total pending_bytes >= threshold_size, call .save_live_iterate()
+ Data of VFIO device for pre-copy phase is copied
+ Iterate till total pending bytes converge and are less than threshold
+ |
+ On migration completion, the vCPUs and the VFIO device are stopped
+ The VFIO device is first put in P2P quiescent state
+ (FINISH_MIGRATE, _ACTIVE, _PRE_COPY_P2P)
+ |
+ Then the VFIO device is put in _STOP_COPY state
+ (FINISH_MIGRATE, _ACTIVE, _STOP_COPY)
+ .save_live_complete_precopy() is called for each active device
+ For the VFIO device, iterate in .save_live_complete_precopy() until
+ pending data is 0
+ |
+ (POSTMIGRATE, _COMPLETED, _STOP_COPY)
+ Migration thread schedules cleanup bottom half and exits
+ |
+ .save_cleanup() is called
+ (POSTMIGRATE, _COMPLETED, _STOP)
Live migration resume path
--------------------------
::
- Incoming migration calls .load_setup for each device
- (RESTORE_VM, _ACTIVE, _STOP)
- |
- For each device, .load_state is called for that device section data
- (RESTORE_VM, _ACTIVE, _RESUMING)
- |
- At the end, .load_cleanup is called for each device and vCPUs are started
- (RUNNING, _NONE, _RUNNING)
+ Incoming migration calls .load_setup() for each device
+ (RESTORE_VM, _ACTIVE, _STOP)
+ |
+ For each device, .load_state() is called for that device section data
+ (RESTORE_VM, _ACTIVE, _RESUMING)
+ |
+ At the end, .load_cleanup() is called for each device and vCPUs are started
+ The VFIO device is first put in P2P quiescent state
+ (RUNNING, _ACTIVE, _RUNNING_P2P)
+ |
+ (RUNNING, _NONE, _RUNNING)
Postcopy
========
VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
VMChangeStateHandler *cb,
void *opaque)
+{
+ return qdev_add_vm_change_state_handler_full(dev, cb, NULL, opaque);
+}
+
+/*
+ * Exactly like qdev_add_vm_change_state_handler() but passes a prepare_cb
+ * argument too.
+ */
+VMChangeStateEntry *qdev_add_vm_change_state_handler_full(
+ DeviceState *dev, VMChangeStateHandler *cb,
+ VMChangeStateHandler *prepare_cb, void *opaque)
{
int depth = qdev_get_dev_tree_depth(dev);
- return qemu_add_vm_change_state_handler_prio(cb, opaque, depth);
+ return qemu_add_vm_change_state_handler_prio_full(cb, prepare_cb, opaque,
+ depth);
}
#include "hw/vfio/vfio-common.h"
#include "hw/vfio/vfio.h"
+#include "hw/vfio/pci.h"
#include "exec/address-spaces.h"
#include "exec/memory.h"
#include "exec/ram_addr.h"
static Error *multiple_devices_migration_blocker;
-static unsigned int vfio_migratable_device_num(void)
+/*
+ * Migration of multiple devices is allowed only if all of them support P2P
+ * migration. Migration of a single device is allowed regardless of P2P
+ * migration support.
+ */
+static bool vfio_multiple_devices_migration_is_supported(void)
{
VFIOGroup *group;
VFIODevice *vbasedev;
unsigned int device_num = 0;
+ bool all_support_p2p = true;
QLIST_FOREACH(group, &vfio_group_list, next) {
QLIST_FOREACH(vbasedev, &group->device_list, next) {
if (vbasedev->migration) {
device_num++;
+
+ if (!(vbasedev->migration->mig_flags & VFIO_MIGRATION_P2P)) {
+ all_support_p2p = false;
+ }
}
}
}
- return device_num;
+ return all_support_p2p || device_num <= 1;
}
int vfio_block_multiple_devices_migration(VFIODevice *vbasedev, Error **errp)
{
int ret;
- if (multiple_devices_migration_blocker ||
- vfio_migratable_device_num() <= 1) {
+ if (vfio_multiple_devices_migration_is_supported()) {
return 0;
}
if (vbasedev->enable_migration == ON_OFF_AUTO_ON) {
- error_setg(errp, "Migration is currently not supported with multiple "
- "VFIO devices");
+ error_setg(errp, "Multiple VFIO devices migration is supported only if "
+ "all of them support P2P migration");
return -EINVAL;
}
+ if (multiple_devices_migration_blocker) {
+ return 0;
+ }
+
error_setg(&multiple_devices_migration_blocker,
- "Migration is currently not supported with multiple "
- "VFIO devices");
+ "Multiple VFIO devices migration is supported only if all of "
+ "them support P2P migration");
ret = migrate_add_blocker(multiple_devices_migration_blocker, errp);
if (ret < 0) {
error_free(multiple_devices_migration_blocker);
void vfio_unblock_multiple_devices_migration(void)
{
if (!multiple_devices_migration_blocker ||
- vfio_migratable_device_num() > 1) {
+ !vfio_multiple_devices_migration_is_supported()) {
return;
}
}
}
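+/*
+ * State check helpers that fold the P2P variants into the base states, so
+ * callers can treat _RUNNING/_RUNNING_P2P and _PRE_COPY/_PRE_COPY_P2P
+ * uniformly.
+ */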
+bool vfio_device_state_is_running(VFIODevice *vbasedev)
+{
+ VFIOMigration *migration = vbasedev->migration;
+
+ return migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+ migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P;
+}
+
+bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
+{
+ VFIOMigration *migration = vbasedev->migration;
+
+ return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ||
+ migration->device_state == VFIO_DEVICE_STATE_PRE_COPY_P2P;
+}
+
static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
{
VFIOGroup *group;
}
if (vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF &&
- (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
- migration->device_state == VFIO_DEVICE_STATE_PRE_COPY)) {
+ (vfio_device_state_is_running(vbasedev) ||
+ vfio_device_state_is_precopy(vbasedev))) {
return false;
}
}
return false;
}
- if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
- migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) {
+ if (vfio_device_state_is_running(vbasedev) ||
+ vfio_device_state_is_precopy(vbasedev)) {
continue;
} else {
return false;
hwaddr max32;
hwaddr min64;
hwaddr max64;
+ hwaddr minpci64;
+ hwaddr maxpci64;
} VFIODirtyRanges;
typedef struct VFIODirtyRangesListener {
MemoryListener listener;
} VFIODirtyRangesListener;
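+/*
+ * Return true if the memory region of this section is owned by a vfio-pci
+ * device in the given container (e.g. one of its BARs).
+ */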
+static bool vfio_section_is_vfio_pci(MemoryRegionSection *section,
+ VFIOContainer *container)
+{
+ VFIOPCIDevice *pcidev;
+ VFIODevice *vbasedev;
+ VFIOGroup *group;
+ Object *owner;
+
+ owner = memory_region_owner(section->mr);
+
+ QLIST_FOREACH(group, &container->group_list, container_next) {
+ QLIST_FOREACH(vbasedev, &group->device_list, next) {
+ if (vbasedev->type != VFIO_DEVICE_TYPE_PCI) {
+ continue;
+ }
+ pcidev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+ if (OBJECT(pcidev) == owner) {
+ return true;
+ }
+ }
+ }
+
+ return false;
+}
+
static void vfio_dirty_tracking_update(MemoryListener *listener,
MemoryRegionSection *section)
{
}
/*
- * The address space passed to the dirty tracker is reduced to two ranges:
- * one for 32-bit DMA ranges, and another one for 64-bit DMA ranges.
+ * The address space passed to the dirty tracker is reduced to three ranges:
+ * one for 32-bit DMA ranges, one for 64-bit DMA ranges and one for the
+ * PCI 64-bit hole.
+ *
* The underlying reports of dirty will query a sub-interval of each of
* these ranges.
*
- * The purpose of the dual range handling is to handle known cases of big
- * holes in the address space, like the x86 AMD 1T hole. The alternative
- * would be an IOVATree but that has a much bigger runtime overhead and
- * unnecessary complexity.
+ * The purpose of the three range handling is to handle known cases of big
+ * holes in the address space, like the x86 AMD 1T hole, and firmware (like
+ * OVMF) which may relocate the pci-hole64 to the end of the address space.
+ * The latter would otherwise generate large ranges for tracking, stressing
+ * the limits of supported hardware. The pci-hole32 will always be below 4G
+ * (overlapping or not) so it doesn't need special handling and is part of
+ * the 32-bit range.
+ *
+ * The alternative would be an IOVATree but that has a much bigger runtime
+ * overhead and unnecessary complexity.
*/
- min = (end <= UINT32_MAX) ? &range->min32 : &range->min64;
- max = (end <= UINT32_MAX) ? &range->max32 : &range->max64;
-
+ if (vfio_section_is_vfio_pci(section, dirty->container) &&
+ iova >= UINT32_MAX) {
+ min = &range->minpci64;
+ max = &range->maxpci64;
+ } else {
+ min = (end <= UINT32_MAX) ? &range->min32 : &range->min64;
+ max = (end <= UINT32_MAX) ? &range->max32 : &range->max64;
+ }
if (*min > iova) {
*min = iova;
}
memset(&dirty, 0, sizeof(dirty));
dirty.ranges.min32 = UINT32_MAX;
dirty.ranges.min64 = UINT64_MAX;
+ dirty.ranges.minpci64 = UINT64_MAX;
dirty.listener = vfio_dirty_tracking_listener;
dirty.container = container;
* DMA logging uAPI guarantees to support at least a number of ranges that
* fits into a single host kernel base page.
*/
- control->num_ranges = !!tracking->max32 + !!tracking->max64;
+ control->num_ranges = !!tracking->max32 + !!tracking->max64 +
+ !!tracking->maxpci64;
ranges = g_try_new0(struct vfio_device_feature_dma_logging_range,
control->num_ranges);
if (!ranges) {
if (tracking->max64) {
ranges->iova = tracking->min64;
ranges->length = (tracking->max64 - tracking->min64) + 1;
+ ranges++;
+ }
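+    /* A dedicated range for vfio-pci regions above 4G, if any were seen. */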
+ if (tracking->maxpci64) {
+ ranges->iova = tracking->minpci64;
+ ranges->length = (tracking->maxpci64 - tracking->minpci64) + 1;
}
trace_vfio_device_dirty_tracking_start(control->num_ranges,
tracking->min32, tracking->max32,
- tracking->min64, tracking->max64);
+ tracking->min64, tracking->max64,
+ tracking->minpci64, tracking->maxpci64);
return feature;
}
return "STOP_COPY";
case VFIO_DEVICE_STATE_RESUMING:
return "RESUMING";
+ case VFIO_DEVICE_STATE_RUNNING_P2P:
+ return "RUNNING_P2P";
case VFIO_DEVICE_STATE_PRE_COPY:
return "PRE_COPY";
+ case VFIO_DEVICE_STATE_PRE_COPY_P2P:
+ return "PRE_COPY_P2P";
default:
return "UNKNOWN STATE";
}
/* ---------------------------------------------------------------------- */
+static int vfio_save_prepare(void *opaque, Error **errp)
+{
+ VFIODevice *vbasedev = opaque;
+
+ /*
+ * Snapshot doesn't use postcopy nor background snapshot, so allow snapshot
+ * even if they are on.
+ */
+ if (runstate_check(RUN_STATE_SAVE_VM)) {
+ return 0;
+ }
+
+ if (migrate_postcopy_ram()) {
+ error_setg(
+ errp, "%s: VFIO migration is not supported with postcopy migration",
+ vbasedev->name);
+ return -EOPNOTSUPP;
+ }
+
+ if (migrate_background_snapshot()) {
+ error_setg(
+ errp,
+ "%s: VFIO migration is not supported with background snapshot",
+ vbasedev->name);
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
static int vfio_save_setup(QEMUFile *f, void *opaque)
{
VFIODevice *vbasedev = opaque;
VFIODevice *vbasedev = opaque;
VFIOMigration *migration = vbasedev->migration;
+ /*
+ * Changing device state from STOP_COPY to STOP can take time. Do it here,
+ * after migration has completed, so it won't increase downtime.
+ */
+ if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
+ /*
+ * If setting the device in STOP state fails, the device should be
+ * reset. To do so, use ERROR state as a recover state.
+ */
+ vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
+ VFIO_DEVICE_STATE_ERROR);
+ }
+
g_free(migration->data_buffer);
migration->data_buffer = NULL;
migration->precopy_init_size = 0;
VFIODevice *vbasedev = opaque;
VFIOMigration *migration = vbasedev->migration;
- if (migration->device_state != VFIO_DEVICE_STATE_PRE_COPY) {
+ if (!vfio_device_state_is_precopy(vbasedev)) {
return;
}
vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
*must_precopy += stop_copy_size;
- if (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY) {
+ if (vfio_device_state_is_precopy(vbasedev)) {
vfio_query_precopy_size(migration);
*must_precopy +=
static bool vfio_is_active_iterate(void *opaque)
{
VFIODevice *vbasedev = opaque;
- VFIOMigration *migration = vbasedev->migration;
- return migration->device_state == VFIO_DEVICE_STATE_PRE_COPY;
+ return vfio_device_state_is_precopy(vbasedev);
}
static int vfio_save_iterate(QEMUFile *f, void *opaque)
return ret;
}
- /*
- * If setting the device in STOP state fails, the device should be reset.
- * To do so, use ERROR state as a recover state.
- */
- ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
- VFIO_DEVICE_STATE_ERROR);
trace_vfio_save_complete_precopy(vbasedev->name, ret);
return ret;
}
static const SaveVMHandlers savevm_vfio_handlers = {
+ .save_prepare = vfio_save_prepare,
.save_setup = vfio_save_setup,
.save_cleanup = vfio_save_cleanup,
.state_pending_estimate = vfio_state_pending_estimate,
/* ---------------------------------------------------------------------- */
-static void vfio_vmstate_change(void *opaque, bool running, RunState state)
+static void vfio_vmstate_change_prepare(void *opaque, bool running,
+ RunState state)
{
VFIODevice *vbasedev = opaque;
VFIOMigration *migration = vbasedev->migration;
enum vfio_device_mig_state new_state;
int ret;
+ new_state = migration->device_state == VFIO_DEVICE_STATE_PRE_COPY ?
+ VFIO_DEVICE_STATE_PRE_COPY_P2P :
+ VFIO_DEVICE_STATE_RUNNING_P2P;
+
+ /*
+ * If setting the device in new_state fails, the device should be reset.
+ * To do so, use ERROR state as a recover state.
+ */
+ ret = vfio_migration_set_state(vbasedev, new_state,
+ VFIO_DEVICE_STATE_ERROR);
+ if (ret) {
+ /*
+ * Migration should be aborted in this case, but vm_state_notify()
+ * currently does not support reporting failures.
+ */
+ if (migrate_get_current()->to_dst_file) {
+ qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+ }
+ }
+
+ trace_vfio_vmstate_change_prepare(vbasedev->name, running,
+ RunState_str(state),
+ mig_state_to_str(new_state));
+}
+
+static void vfio_vmstate_change(void *opaque, bool running, RunState state)
+{
+ VFIODevice *vbasedev = opaque;
+ enum vfio_device_mig_state new_state;
+ int ret;
+
if (running) {
new_state = VFIO_DEVICE_STATE_RUNNING;
} else {
new_state =
- (migration->device_state == VFIO_DEVICE_STATE_PRE_COPY &&
+ (vfio_device_state_is_precopy(vbasedev) &&
(state == RUN_STATE_FINISH_MIGRATE || state == RUN_STATE_PAUSED)) ?
VFIO_DEVICE_STATE_STOP_COPY :
VFIO_DEVICE_STATE_STOP;
char id[256] = "";
g_autofree char *path = NULL, *oid = NULL;
uint64_t mig_flags = 0;
+ VMChangeStateHandler *prepare_cb;
if (!vbasedev->ops->vfio_get_object) {
return -EINVAL;
register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1, &savevm_vfio_handlers,
vbasedev);
- migration->vm_state = qdev_add_vm_change_state_handler(vbasedev->dev,
- vfio_vmstate_change,
- vbasedev);
+ prepare_cb = migration->mig_flags & VFIO_MIGRATION_P2P ?
+ vfio_vmstate_change_prepare :
+ NULL;
+ migration->vm_state = qdev_add_vm_change_state_handler_full(
+ vbasedev->dev, vfio_vmstate_change, prepare_cb, vbasedev);
migration->migration_state.notify = vfio_migration_state_notifier;
add_migration_state_change_notifier(&migration->migration_state);
vfio_listener_region_add_no_dma_map(const char *name, uint64_t iova, uint64_t size, uint64_t page_size) "Region \"%s\" 0x%"PRIx64" size=0x%"PRIx64" is not aligned to 0x%"PRIx64" and cannot be mapped for DMA"
vfio_listener_region_del(uint64_t start, uint64_t end) "region_del 0x%"PRIx64" - 0x%"PRIx64
vfio_device_dirty_tracking_update(uint64_t start, uint64_t end, uint64_t min, uint64_t max) "section 0x%"PRIx64" - 0x%"PRIx64" -> update [0x%"PRIx64" - 0x%"PRIx64"]"
-vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"]"
+vfio_device_dirty_tracking_start(int nr_ranges, uint64_t min32, uint64_t max32, uint64_t min64, uint64_t max64, uint64_t minpci, uint64_t maxpci) "nr_ranges %d 32:[0x%"PRIx64" - 0x%"PRIx64"], 64:[0x%"PRIx64" - 0x%"PRIx64"], pci64:[0x%"PRIx64" - 0x%"PRIx64"]"
vfio_disconnect_container(int fd) "close container->fd=%d"
vfio_put_group(int fd) "close group->fd=%d"
vfio_get_device(const char * name, unsigned int flags, unsigned int num_regions, unsigned int num_irqs) "Device %s flags: %u, regions: %u, irqs: %u"
vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
+vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
bool vfio_viommu_preset(VFIODevice *vbasedev);
int64_t vfio_mig_bytes_transferred(void);
void vfio_reset_bytes_transferred(void);
+bool vfio_device_state_is_running(VFIODevice *vbasedev);
+bool vfio_device_state_is_precopy(VFIODevice *vbasedev);
#ifdef CONFIG_LINUX
int vfio_get_region_info(VFIODevice *vbasedev, int index,
/* This runs inside the iothread lock. */
SaveStateHandler *save_state;
+ /*
+ * save_prepare is called early, even before migration starts, and can be
+ * used to perform early checks.
+ */
+ int (*save_prepare)(void *opaque, Error **errp);
void (*save_cleanup)(void *opaque);
int (*save_live_complete_postcopy)(QEMUFile *f, void *opaque);
int (*save_live_complete_precopy)(QEMUFile *f, void *opaque);
void *opaque);
VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
VMChangeStateHandler *cb, void *opaque, int priority);
+VMChangeStateEntry *
+qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
+ VMChangeStateHandler *prepare_cb,
+ void *opaque, int priority);
VMChangeStateEntry *qdev_add_vm_change_state_handler(DeviceState *dev,
VMChangeStateHandler *cb,
void *opaque);
+VMChangeStateEntry *qdev_add_vm_change_state_handler_full(
+ DeviceState *dev, VMChangeStateHandler *cb,
+ VMChangeStateHandler *prepare_cb, void *opaque);
void qemu_del_vm_change_state_handler(VMChangeStateEntry *e);
/**
* vm_state_notify: Notify the state of the VM
populate_time_info(info, s);
populate_ram_info(info, s);
populate_disk_info(info);
- populate_vfio_info(info);
+ migration_populate_vfio_info(info);
break;
case MIGRATION_STATUS_COLO:
info->has_status = true;
case MIGRATION_STATUS_COMPLETED:
populate_time_info(info, s);
populate_ram_info(info, s);
- populate_vfio_info(info);
+ migration_populate_vfio_info(info);
break;
case MIGRATION_STATUS_FAILED:
info->has_status = true;
s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE);
}
-void migrate_init(MigrationState *s)
+int migrate_init(MigrationState *s, Error **errp)
{
+ int ret;
+
+ ret = qemu_savevm_state_prepare(errp);
+ if (ret) {
+ return ret;
+ }
+
/*
* Reinitialise all migration state, except
* parameters/capabilities that the user set, and
s->iteration_initial_bytes = 0;
s->threshold_size = 0;
s->switchover_acked = false;
+ /*
+ * set mig_stats and compression_counters memory to zero for a
+ * new migration
+ */
+ memset(&mig_stats, 0, sizeof(mig_stats));
+ memset(&compression_counters, 0, sizeof(compression_counters));
+ migration_reset_vfio_bytes_transferred();
+
+ return 0;
}
int migrate_add_blocker_internal(Error *reason, Error **errp)
migrate_set_block_incremental(true);
}
- migrate_init(s);
- /*
- * set mig_stats compression_counters memory to zero for a
- * new migration
- */
- memset(&mig_stats, 0, sizeof(mig_stats));
- memset(&compression_counters, 0, sizeof(compression_counters));
- reset_vfio_bytes_transferred();
+ if (migrate_init(s, errp)) {
+ return false;
+ }
return true;
}
bool migration_is_setup_or_active(int state);
bool migration_is_running(int state);
-void migrate_init(MigrationState *s);
+int migrate_init(MigrationState *s, Error **errp);
bool migration_is_blocked(Error **errp);
/* True if outgoing migration has entered postcopy phase */
bool migration_in_postcopy(void);
bool migration_rate_limit(void);
void migration_cancel(const Error *error);
-void populate_vfio_info(MigrationInfo *info);
-void reset_vfio_bytes_transferred(void);
+void migration_populate_vfio_info(MigrationInfo *info);
+void migration_reset_vfio_bytes_transferred(void);
void postcopy_temp_page_reset(PostcopyTmpPage *tmp_page);
#endif
return false;
}
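+/*
+ * Invoke the .save_prepare() handler, if any, of each registered device.
+ * Runs early, from migrate_init(), before migration actually starts.
+ */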
+int qemu_savevm_state_prepare(Error **errp)
+{
+ SaveStateEntry *se;
+ int ret;
+
+ QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+ if (!se->ops || !se->ops->save_prepare) {
+ continue;
+ }
+ if (se->ops->is_active) {
+ if (!se->ops->is_active(se->opaque)) {
+ continue;
+ }
+ }
+
+ ret = se->ops->save_prepare(se->opaque, errp);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ return 0;
+}
+
void qemu_savevm_state_setup(QEMUFile *f)
{
MigrationState *ms = migrate_get_current();
return -EINVAL;
}
- migrate_init(ms);
- memset(&mig_stats, 0, sizeof(mig_stats));
- memset(&compression_counters, 0, sizeof(compression_counters));
- reset_vfio_bytes_transferred();
+ ret = migrate_init(ms, errp);
+ if (ret) {
+ return ret;
+ }
ms->to_dst_file = f;
qemu_mutex_unlock_iothread();
bool qemu_savevm_state_blocked(Error **errp);
void qemu_savevm_non_migratable_list(strList **reasons);
+int qemu_savevm_state_prepare(Error **errp);
void qemu_savevm_state_setup(QEMUFile *f);
bool qemu_savevm_state_guest_unplug_pending(void);
int qemu_savevm_state_resume_prepare(MigrationState *s);
#endif
#ifdef CONFIG_VFIO
-void populate_vfio_info(MigrationInfo *info)
+void migration_populate_vfio_info(MigrationInfo *info)
{
if (vfio_mig_active()) {
info->vfio = g_malloc0(sizeof(*info->vfio));
}
}
-void reset_vfio_bytes_transferred(void)
+void migration_reset_vfio_bytes_transferred(void)
{
vfio_reset_bytes_transferred();
}
#else
-void populate_vfio_info(MigrationInfo *info)
+void migration_populate_vfio_info(MigrationInfo *info)
{
}
-void reset_vfio_bytes_transferred(void)
+void migration_reset_vfio_bytes_transferred(void)
{
}
#endif
}
struct VMChangeStateEntry {
VMChangeStateHandler *cb;
+ VMChangeStateHandler *prepare_cb;
void *opaque;
QTAILQ_ENTRY(VMChangeStateEntry) entries;
int priority;
*/
VMChangeStateEntry *qemu_add_vm_change_state_handler_prio(
VMChangeStateHandler *cb, void *opaque, int priority)
+{
+ return qemu_add_vm_change_state_handler_prio_full(cb, NULL, opaque,
+ priority);
+}
+
+/**
+ * qemu_add_vm_change_state_handler_prio_full:
+ * @cb: the main callback to invoke
+ * @prepare_cb: a callback to invoke before the main callback
+ * @opaque: user data passed to the callbacks
+ * @priority: low priorities execute first when the vm runs and the reverse is
+ * true when the vm stops
+ *
+ * Register a main callback function and an optional prepare callback function
+ * that are invoked when the vm starts or stops running. The main callback and
+ * the prepare callback are called in two separate phases: First all prepare
+ * callbacks are called and only then all main callbacks are called. As its
+ * name suggests, the prepare callback can be used to do some preparatory work
+ * before invoking the main callback.
+ *
+ * Returns: an entry to be freed using qemu_del_vm_change_state_handler()
+ */
+VMChangeStateEntry *
+qemu_add_vm_change_state_handler_prio_full(VMChangeStateHandler *cb,
+ VMChangeStateHandler *prepare_cb,
+ void *opaque, int priority)
{
VMChangeStateEntry *e;
VMChangeStateEntry *other;
e = g_malloc0(sizeof(*e));
e->cb = cb;
+ e->prepare_cb = prepare_cb;
e->opaque = opaque;
e->priority = priority;
trace_vm_state_notify(running, state, RunState_str(state));
if (running) {
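+        /* Prepare phase: run all prepare callbacks before any main callback. */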
+ QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
+ if (e->prepare_cb) {
+ e->prepare_cb(e->opaque, running, state);
+ }
+ }
+
QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
e->cb(e->opaque, running, state);
}
} else {
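+        /* Same two phases, in reverse priority order since the VM is stopping. */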
+ QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
+ if (e->prepare_cb) {
+ e->prepare_cb(e->opaque, running, state);
+ }
+ }
+
QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
e->cb(e->opaque, running, state);
}