From nobody Wed May 1 20:56:04 2024
Delivered-To: importer@patchew.org
Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org;
Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=intel.com
Return-Path:
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 155056659747156.627798140844675; Tue, 19 Feb 2019 00:56:37 -0800 (PST)
Received: from localhost ([127.0.0.1]:44439 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gw1CY-0005f5-A2 for importer@patchew.org; Tue, 19 Feb 2019 03:56:30 -0500
Received: from eggs.gnu.org ([209.51.188.92]:55582) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gw18o-0002iP-Gt for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:40 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gw18m-00015g-Cv for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:38 -0500
Received: from mga05.intel.com ([192.55.52.43]:6447) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gw18l-00012i-US for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:36 -0500
Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Feb 2019 00:52:18 -0800
Received: from joy-desktop.sh.intel.com ([10.239.13.17]) by fmsmga004.fm.intel.com with ESMTP; 19 Feb 2019 00:52:14 -0800
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.58,387,1544515200"; d="scan'208";a="145414989"
From: Yan Zhao
To: alex.williamson@redhat.com, qemu-devel@nongnu.org
Date: Tue, 19
Feb 2019 16:52:14 +0800
Message-Id: <1550566334-3602-1-git-send-email-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized.
X-Received-From: 192.55.52.43
Subject: [Qemu-devel] [PATCH 1/5] vfio/migration: define kernel interfaces
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru, Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com, kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com, eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com, Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao, dgilbert@redhat.com, intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com, cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

- defined 4 device state regions: one control region and 3 data regions
- defined layout of control region in struct vfio_device_state_ctl
- defined 4 device states: running, stop, running&logging, stop&logging
- defined 3 device data categories: device config, device memory, system
  memory
- defined 2 device data capabilities: device memory and system memory
- defined device state interfaces' version and 12 device state interfaces

Signed-off-by: Yan Zhao
Signed-off-by: Kevin Tian
Signed-off-by: Yulei Zhang
---
 linux-headers/linux/vfio.h | 260 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 260 insertions(+)

diff --git a/linux-headers/linux/vfio.h
b/linux-headers/linux/vfio.h
index ceb6453..a124fc1 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -303,6 +303,56 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_SUBTYPE_INTEL_IGD_HOST_CFG (2)
 #define VFIO_REGION_SUBTYPE_INTEL_IGD_LPC_CFG (3)
 
+/* Device State region type and sub-type
+ *
+ * A VFIO device driver needs to register up to four device state regions in
+ * total: two mandatory and another two optional, if it plans to support device
+ * state management.
+ *
+ * 1. region CTL :
+ *          Mandatory.
+ *          This is a control region.
+ *          Its layout is defined in struct vfio_device_state_ctl.
+ *          Reading from this region can get version, capabilities and data
+ *          size of device state interfaces.
+ *          Writing to this region can set device state, data size and
+ *          choose which interface to use.
+ * 2. region DEVICE_CONFIG
+ *          Mandatory.
+ *          This is a data region that holds device config data.
+ *          Device config is data such as MMIOs, page tables...
+ *          Every device is supposed to possess device config data.
+ *          Usually the size of device config data is small (no bigger
+ *          than 10M), and it needs to be loaded in a certain strict
+ *          order.
+ *          Therefore no dirty data logging is enabled for device
+ *          config and it must be got/set as a whole.
+ *          Size of device config data is smaller than or equal to that of
+ *          device config region.
+ *          It can be mmapped into user space.
+ * 3. region DEVICE_MEMORY
+ *          Optional.
+ *          This is a data region that holds device memory data.
+ *          Device memory is device's internal memory, standalone and outside
+ *          system memory. It is usually very big.
+ *          Not all devices have device memory. For example, IGD only uses
+ *          system memory and has no device memory.
+ *          Size of device memory is usually larger than that of device
+ *          memory region. qemu needs to save/load it in chunks of size of
+ *          device memory region.
+ *          It can be mmapped into user space.
+ * 4.
region DIRTY_BITMAP
+ *          Optional.
+ *          This is a data region that holds bitmap of dirty pages in system
+ *          memory that a VFIO device produces.
+ *          It can be mmapped into user space.
+ */
+#define VFIO_REGION_TYPE_DEVICE_STATE (1 << 1)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL (1)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_CONFIG (2)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_MEMORY (3)
+#define VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_DIRTYBITMAP (4)
+
 /*
  * The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
  * which allows direct access to non-MSIX registers which happened to be within
@@ -816,6 +866,216 @@ struct vfio_iommu_spapr_tce_remove {
 };
 #define VFIO_IOMMU_SPAPR_TCE_REMOVE _IO(VFIO_TYPE, VFIO_BASE + 20)
 
+/* version number of the device state interface */
+#define VFIO_DEVICE_STATE_INTERFACE_VERSION 1
+
+/*
+ * For devices that have device memory, it is required to expose
+ * DEVICE_MEMORY capability.
+ *
+ * For devices producing dirty pages in system memory, it is required to
+ * expose cap SYSTEM_MEMORY in order to get dirty bitmap in certain range
+ * of system memory.
+ */
+#define VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY 1
+#define VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY 2
+
+/*
+ * DEVICE STATES
+ *
+ * Four states are defined for a VFIO device:
+ * RUNNING, RUNNING & LOGGING, STOP & LOGGING, STOP.
+ * They can be set by writing to device_state field of
+ * vfio_device_state_ctl region.
+ *
+ * RUNNING: In this state, a VFIO device is in active state ready to
+ * receive commands from device driver.
+ * It is the default state that a VFIO device enters initially.
+ *
+ * STOP: In this state, a VFIO device is deactivated and does not interact
+ * with device driver.
+ *
+ * LOGGING state is a special state that CANNOT exist
+ * independently.
+ * It must be set alongside state RUNNING or STOP, i.e.,
+ * RUNNING & LOGGING, STOP & LOGGING.
+ * It is used for dirty data logging both for device memory
+ * and system memory.
+ *
+ * LOGGING only impacts device/system memory. In LOGGING state, get buffer
+ * of device memory returns dirty pages since last call; outside LOGGING
+ * state, get buffer of device memory returns whole snapshot of device
+ * memory. system memory's dirty pages are only available in LOGGING state.
+ *
+ * Device config should be always accessible and return whole config snapshot
+ * regardless of LOGGING state.
+ * */
+#define VFIO_DEVICE_STATE_RUNNING 0
+#define VFIO_DEVICE_STATE_STOP 1
+#define VFIO_DEVICE_STATE_LOGGING 2
+
+/* action to get data from device memory or device config
+ * the action is written to device state's control region, and data is read
+ * from device memory region or device config region.
+ * Each time before reading device memory region or device config region,
+ * action VFIO_DEVICE_DATA_ACTION_GET_BUFFER must be written to the action
+ * field in control region. That is because device memory and device config
+ * regions are mmapped into user space. vendor driver has to be notified of
+ * the GET_BUFFER action in advance.
+ */
+#define VFIO_DEVICE_DATA_ACTION_GET_BUFFER 1
+
+/* action to set data to device memory or device config
+ * the action is written to device state's control region, and data is
+ * written to device memory region or device config region.
+ * Each time after writing to device memory region or device config region,
+ * action VFIO_DEVICE_DATA_ACTION_SET_BUFFER must be written to the action
+ * field in control region. That is because device memory and device config
+ * regions are mmapped into user space. vendor driver has to be notified of
+ * the SET_BUFFER action after data is written.
+ */
+#define VFIO_DEVICE_DATA_ACTION_SET_BUFFER 2
+
+/* layout of device state interfaces' control region
+ * By accessing the control region and reading/writing data in device config
+ * region, device memory region, system memory regions, the interfaces below
+ * can be implemented:
+ *
+ * 1. get version
+ *   (1) user space calls read system call on "version" field of control
+ *   region.
+ *   (2) vendor driver writes version number of device state interfaces
+ *   to the "version" field of control region.
+ *
+ * 2. get caps
+ *   (1) user space calls read system call on "caps" field of control region.
+ *   (2) if a VFIO device has huge device memory, vendor driver reports
+ *      VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY in "caps" field of control region.
+ *      if a VFIO device produces dirty pages in system memory, vendor driver
+ *      reports VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY in "caps" field of
+ *      control region.
+ *
+ * 3. set device state
+ *    (1) user space calls write system call on "device_state" field of
+ *    control region.
+ *    (2) device state transitions as:
+ *
+ *    RUNNING -- start dirty data logging --> RUNNING & LOGGING
+ *    RUNNING -- deactivate --> STOP
+ *    RUNNING -- deactivate & start dirty data logging --> STOP & LOGGING
+ *    RUNNING & LOGGING -- stop dirty data logging --> RUNNING
+ *    RUNNING & LOGGING -- deactivate --> STOP & LOGGING
+ *    RUNNING & LOGGING -- deactivate & stop dirty data logging --> STOP
+ *    STOP -- activate --> RUNNING
+ *    STOP -- start dirty data logging --> STOP & LOGGING
+ *    STOP -- activate & start dirty data logging --> RUNNING & LOGGING
+ *    STOP & LOGGING -- stop dirty data logging --> STOP
+ *    STOP & LOGGING -- activate --> RUNNING & LOGGING
+ *    STOP & LOGGING -- activate & stop dirty data logging --> RUNNING
+ *
+ * 4. get device config size
+ *   (1) user space calls read system call on "device_config.size" field of
+ *       control region for the total size of device config snapshot.
+ *   (2) vendor driver writes device config data's total size in
+ *       "device_config.size" field of control region.
+ *
+ * 5. set device config size
+ *   (1) user space calls write system call.
+ *       total size of device config snapshot --> "device_config.size" field
+ *       of control region.
+ *   (2) vendor driver reads device config data's total size from
+ *       "device_config.size" field of control region.
+ *
+ * 6. get device config buffer
+ *   (1) user space calls write system call.
+ *       "GET_BUFFER" --> "device_config.action" field of control region.
+ *   (2) vendor driver
+ *       a. gets whole snapshot for device config
+ *       b. writes whole device config snapshot to region
+ *       DEVICE_CONFIG.
+ *   (3) user space reads the whole of device config snapshot from region
+ *       DEVICE_CONFIG.
+ *
+ * 7. set device config buffer
+ *   (1) user space writes whole of device config data to region
+ *       DEVICE_CONFIG.
+ *   (2) user space calls write system call.
+ *       "SET_BUFFER" --> "device_config.action" field of control region.
+ *   (3) vendor driver loads whole of device config from region DEVICE_CONFIG.
+ *
+ * 8. get device memory size
+ *   (1) user space calls read system call on "device_memory.size" field of
+ *       control region for device memory size.
+ *   (2) vendor driver
+ *       a. gets device memory snapshot (in state RUNNING or STOP), or
+ *          gets device memory dirty data (in state RUNNING & LOGGING or
+ *          state STOP & LOGGING)
+ *       b. writes size in "device_memory.size" field of control region
+ *
+ * 9. set device memory size
+ *   (1) user space calls write system call on "device_memory.size" field of
+ *       control region to set total size of device memory snapshot.
+ *   (2) vendor driver reads device memory's size from "device_memory.size"
+ *       field of control region.
+ *
+ *
+ * 10. get device memory buffer
+ *   (1) user space calls write system call.
+ *       pos --> "device_memory.pos" field of control region,
+ *       "GET_BUFFER" --> "device_memory.action" field of control region.
+ *       (pos must be 0 or a multiple of the length of region DEVICE_MEMORY).
+ *   (2) vendor driver writes N'th chunk of device memory snapshot/dirty data
+ *       to region DEVICE_MEMORY.
+ *       (N equals pos/(region length of DEVICE_MEMORY))
+ *   (3) user space reads the N'th chunk of device memory snapshot/dirty data
+ *       from region DEVICE_MEMORY.
+ *
+ * 11. set device memory buffer
+ *   (1) user space writes N'th chunk of device memory snapshot/dirty data to
+ *       region DEVICE_MEMORY.
+ *       (N equals pos/(region length of DEVICE_MEMORY))
+ *   (2) user space writes pos to "device_memory.pos" field and writes
+ *       "SET_BUFFER" to "device_memory.action" field of control region.
+ *   (3) vendor driver loads N'th chunk of device memory snapshot/dirty data
+ *       from region DEVICE_MEMORY.
+ *
+ * 12. get system memory dirty bitmap
+ *   (1) user space calls write system call to specify the range of system
+ *       memory to query for dirty pages.
+ *       system memory's start address --> "system_memory.start_addr" field
+ *       of control region,
+ *       system memory's page count --> "system_memory.page_nr" field of
+ *       control region.
+ *   (2) if device state is not RUNNING & LOGGING or STOP & LOGGING,
+ *       vendor driver returns empty bitmap; otherwise,
+ *       vendor driver checks the page_nr,
+ *       if it's larger than the size that region DIRTY_BITMAP can support,
+ *       an error is returned; if not,
+ *       vendor driver returns a bitmap specifying the dirty pages that the
+ *       device produced since last query in this range of system memory.
+ *   (3) user space reads back the dirty bitmap from region DIRTY_BITMAP.
+ *
+ */
+
+struct vfio_device_state_ctl {
+	__u32 version; /* ro, version of device state interfaces */
+	__u32 device_state; /* VFIO device state, wo */
+	__u32 caps; /* ro */
+	struct {
+		__u32 action; /* wo, GET_BUFFER or SET_BUFFER */
+		__u64 size; /* rw, total size of device config */
+	} device_config;
+	struct {
+		__u32 action; /* wo, GET_BUFFER or SET_BUFFER */
+		__u64 size; /* rw, total size of device memory */
+		__u64 pos; /* chunk offset in total buffer of device memory */
+	} device_memory;
+	struct {
+		__u64 start_addr; /* wo */
+		__u64 page_nr; /* wo */
+	} system_memory;
+}__attribute__((packed));
+
 /* ***************************************************************** */
 
 #endif /* VFIO_H */
-- 
2.7.4

From nobody Wed May 1 20:56:04 2024
Delivered-To: importer@patchew.org
Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org;
Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=intel.com
Return-Path:
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1550566744527384.8052030238082; Tue, 19 Feb 2019 00:59:04 -0800 (PST)
Received: from localhost ([127.0.0.1]:44459 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gw1Ez-0007lE-Ao for importer@patchew.org; Tue, 19 Feb 2019 03:59:01 -0500
Received: from eggs.gnu.org ([209.51.188.92]:55588) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gw18o-0002iZ-Mw for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:41 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gw18m-00015S-3K for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:38 -0500
Received: from
mga05.intel.com ([192.55.52.43]:6456) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gw18l-00014Y-NG for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:36 -0500
Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Feb 2019 00:52:32 -0800
Received: from joy-desktop.sh.intel.com ([10.239.13.17]) by fmsmga002.fm.intel.com with ESMTP; 19 Feb 2019 00:52:28 -0800
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.58,387,1544515200"; d="scan'208";a="144637067"
From: Yan Zhao
To: alex.williamson@redhat.com, qemu-devel@nongnu.org
Date: Tue, 19 Feb 2019 16:52:27 +0800
Message-Id: <1550566347-3648-1-git-send-email-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com>
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized.
X-Received-From: 192.55.52.43
Subject: [Qemu-devel] [PATCH 2/5] vfio/migration: support device of device config capability
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru, Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com, kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com, eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com, Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao, dgilbert@redhat.com, intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com, cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com
Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org
Sender: "Qemu-devel"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Device config is the default data that every device should have, so the
device config capability is on by default and does not need to be set.

- Currently two types of resources are saved/loaded for a device with the
  device config capability:
  general PCI config data, and device config data.
  They are copied as a whole when precopy is stopped.

Migration setup flow:
- Set up device state regions, check the device state version and capabilities.
  Mmap Device Config Region and Dirty Bitmap Region, if available.
- If the device state regions fail to be set up, a migration blocker is
  registered instead.
- Add SaveVMHandlers to register device state save/load handlers.
- Register VM state change handler to set device's running/stop states.
- On migration startup on source machine, set device's state to
  VFIO_DEVICE_STATE_LOGGING

Signed-off-by: Yan Zhao
Signed-off-by: Yulei Zhang
---
 hw/vfio/Makefile.objs         |   2 +-
 hw/vfio/migration.c           | 633 ++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.c                 |   1 -
 hw/vfio/pci.h                 |  25 +-
 include/hw/vfio/vfio-common.h |   1 +
 5 files changed, 659 insertions(+), 3 deletions(-)
 create mode 100644 hw/vfio/migration.c

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 8b3f664..f32ff19 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,6 +1,6 @@
 ifeq ($(CONFIG_LINUX), y)
 obj-$(CONFIG_SOFTMMU) += common.o
-obj-$(CONFIG_PCI) += pci.o pci-quirks.o display.o
+obj-$(CONFIG_PCI) += pci.o pci-quirks.o display.o migration.o
 obj-$(CONFIG_VFIO_CCW) += ccw.o
 obj-$(CONFIG_SOFTMMU) += platform.o
 obj-$(CONFIG_VFIO_XGMAC) += calxeda-xgmac.o
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
new file mode 100644
index 0000000..16d6395
--- /dev/null
+++ b/hw/vfio/migration.c
@@ -0,0 +1,633 @@
+#include "qemu/osdep.h"
+
+#include "hw/vfio/vfio-common.h"
+#include "migration/blocker.h"
+#include "migration/register.h"
+#include "qapi/error.h"
+#include "pci.h"
+#include "sysemu/kvm.h"
+#include "exec/ram_addr.h"
+
+#define VFIO_SAVE_FLAG_SETUP 0
+#define VFIO_SAVE_FLAG_PCI 1
+#define VFIO_SAVE_FLAG_DEVCONFIG 2
+#define VFIO_SAVE_FLAG_DEVMEMORY 4
+#define VFIO_SAVE_FLAG_CONTINUE 8
+
+static int vfio_device_state_region_setup(VFIOPCIDevice *vdev,
+        VFIORegion *region, uint32_t subtype, const char *name)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    struct vfio_region_info *info;
+    int ret;
+
+    ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_DEVICE_STATE,
+            subtype, &info);
+    if (ret) {
+        error_report("Failed to get info of region %s", name);
+        return ret;
+    }
+
+    if (vfio_region_setup(OBJECT(vdev), vbasedev,
+                region, info->index, name)) {
+        error_report("Failed to setup migration region %s", name);
+        return -1;
+    }
+
+    if
(vfio_region_mmap(region)) {
+        error_report("Failed to mmap migration region %s", name);
+    }
+
+    return 0;
+}
+
+bool vfio_device_data_cap_system_memory(VFIOPCIDevice *vdev)
+{
+    return !!(vdev->migration->data_caps & VFIO_DEVICE_DATA_CAP_SYSTEM_MEMORY);
+}
+
+bool vfio_device_data_cap_device_memory(VFIOPCIDevice *vdev)
+{
+    return !!(vdev->migration->data_caps & VFIO_DEVICE_DATA_CAP_DEVICE_MEMORY);
+}
+
+static bool vfio_device_state_region_mmaped(VFIORegion *region)
+{
+    bool mmaped = true;
+    if (region->nr_mmaps != 1 || region->mmaps[0].offset ||
+            (region->size != region->mmaps[0].size) ||
+            (region->mmaps[0].mmap == NULL)) {
+        mmaped = false;
+    }
+
+    return mmaped;
+}
+
+static int vfio_get_device_config_size(VFIOPCIDevice *vdev)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_config =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG];
+    uint64_t len;
+    int sz;
+
+    sz = sizeof(len);
+    if (pread(vbasedev->fd, &len, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_config.size))
+            != sz) {
+        error_report("vfio: Failed to get length of device config");
+        return -1;
+    }
+    if (len > region_config->size) {
+        error_report("vfio: Error device config length");
+        return -1;
+    }
+    vdev->migration->devconfig_size = len;
+
+    return 0;
+}
+
+static int vfio_set_device_config_size(VFIOPCIDevice *vdev, uint64_t size)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_config =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG];
+    int sz;
+
+    if (size > region_config->size) {
+        return -1;
+    }
+
+    sz = sizeof(size);
+    if (pwrite(vbasedev->fd, &size, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_config.size))
+            != sz) {
+        error_report("vfio: Failed to set length of device
config");
+        return -1;
+    }
+    vdev->migration->devconfig_size = size;
+    return 0;
+}
+
+static int vfio_save_data_device_config(VFIOPCIDevice *vdev, QEMUFile *f)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_config =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG];
+    void *dest;
+    uint32_t sz;
+    uint8_t *buf = NULL;
+    uint32_t action = VFIO_DEVICE_DATA_ACTION_GET_BUFFER;
+    uint64_t len = vdev->migration->devconfig_size;
+
+    qemu_put_be64(f, len);
+
+    sz = sizeof(action);
+    if (pwrite(vbasedev->fd, &action, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_config.action))
+            != sz) {
+        error_report("vfio: action failure for device config get buffer");
+        return -1;
+    }
+
+    if (!vfio_device_state_region_mmaped(region_config)) {
+        buf = g_malloc(len);
+        if (buf == NULL) {
+            error_report("vfio: Failed to allocate memory for migrate");
+            return -1;
+        }
+        if (pread(vbasedev->fd, buf, len, region_config->fd_offset) != len) {
+            error_report("vfio: Failed to read device config buffer");
+            return -1;
+        }
+        qemu_put_buffer(f, buf, len);
+        g_free(buf);
+    } else {
+        dest = region_config->mmaps[0].mmap;
+        qemu_put_buffer(f, dest, len);
+    }
+    return 0;
+}
+
+static int vfio_load_data_device_config(VFIOPCIDevice *vdev,
+        QEMUFile *f, uint64_t len)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_config =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG];
+    void *dest;
+    uint32_t sz;
+    uint8_t *buf = NULL;
+    uint32_t action = VFIO_DEVICE_DATA_ACTION_SET_BUFFER;
+
+    vfio_set_device_config_size(vdev, len);
+
+    if (!vfio_device_state_region_mmaped(region_config)) {
+        buf = g_malloc(len);
+        if (buf == NULL) {
+            error_report("vfio: Failed to allocate memory for migrate");
+            return -1;
+        }
+
        qemu_get_buffer(f, buf, len);
+        if (pwrite(vbasedev->fd, buf, len,
+                    region_config->fd_offset) != len) {
+            error_report("vfio: Failed to write device config buffer");
+            return -1;
+        }
+        g_free(buf);
+    } else {
+        dest = region_config->mmaps[0].mmap;
+        qemu_get_buffer(f, dest, len);
+    }
+
+    sz = sizeof(action);
+    if (pwrite(vbasedev->fd, &action, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_config.action))
+            != sz) {
+        error_report("vfio: action failure for device config set buffer");
+        return -1;
+    }
+
+    return 0;
+}
+
+static int vfio_set_dirty_page_bitmap_chunk(VFIOPCIDevice *vdev,
+        uint64_t start_addr, uint64_t page_nr)
+{
+
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region_ctl =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    VFIORegion *region_bitmap =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_BITMAP];
+    unsigned long bitmap_size =
+        BITS_TO_LONGS(page_nr) * sizeof(unsigned long);
+    uint32_t sz;
+
+    struct {
+        __u64 start_addr;
+        __u64 page_nr;
+    } system_memory;
+    system_memory.start_addr = start_addr;
+    system_memory.page_nr = page_nr;
+    sz = sizeof(system_memory);
+    if (pwrite(vbasedev->fd, &system_memory, sz,
+                region_ctl->fd_offset +
+                offsetof(struct vfio_device_state_ctl, system_memory))
+            != sz) {
+        error_report("vfio: Failed to set system memory range for dirty pages");
+        return -1;
+    }
+
+    if (!vfio_device_state_region_mmaped(region_bitmap)) {
+        void *bitmap = g_malloc0(bitmap_size);
+
+        if (pread(vbasedev->fd, bitmap, bitmap_size,
+                    region_bitmap->fd_offset) != bitmap_size) {
+            error_report("vfio: Failed to read dirty bitmap data");
+            return -1;
+        }
+
+        cpu_physical_memory_set_dirty_lebitmap(bitmap, start_addr, page_nr);
+
+        g_free(bitmap);
+    } else {
+        cpu_physical_memory_set_dirty_lebitmap(
+                region_bitmap->mmaps[0].mmap,
+                start_addr, page_nr);
+    }
+    return 0;
+}
+
+int vfio_set_dirty_page_bitmap(VFIOPCIDevice *vdev,
+        uint64_t start_addr, uint64_t
page_nr)
+{
+    VFIORegion *region_bitmap =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_BITMAP];
+    unsigned long chunk_size = region_bitmap->size;
+    uint64_t chunk_pg_nr = (chunk_size / sizeof(unsigned long)) *
+        BITS_PER_LONG;
+
+    uint64_t cnt_left;
+    int rc = 0;
+
+    cnt_left = page_nr;
+
+    while (cnt_left >= chunk_pg_nr) {
+        rc = vfio_set_dirty_page_bitmap_chunk(vdev, start_addr, chunk_pg_nr);
+        if (rc) {
+            goto exit;
+        }
+        cnt_left -= chunk_pg_nr;
+        start_addr += chunk_pg_nr * TARGET_PAGE_SIZE; /* assumed page unit */
+    }
+    rc = vfio_set_dirty_page_bitmap_chunk(vdev, start_addr, cnt_left);
+
+exit:
+    return rc;
+}
+
+static int vfio_set_device_state(VFIOPCIDevice *vdev,
+        uint32_t dev_state)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+    uint32_t sz = sizeof(dev_state);
+
+    if (!vdev->migration) {
+        return -1;
+    }
+
+    if (pwrite(vbasedev->fd, &dev_state, sz,
+                region->fd_offset +
+                offsetof(struct vfio_device_state_ctl, device_state))
+            != sz) {
+        error_report("vfio: Failed to set device state %d", dev_state);
+        return -1;
+    }
+    vdev->migration->device_state = dev_state;
+    return 0;
+}
+
+static int vfio_get_device_data_caps(VFIOPCIDevice *vdev)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+
+    uint32_t caps;
+    uint32_t size = sizeof(caps);
+
+    if (pread(vbasedev->fd, &caps, size,
+                region->fd_offset +
+                offsetof(struct vfio_device_state_ctl, caps))
+            != size) {
+        error_report("%s Failed to read data caps of device states",
+                vbasedev->name);
+        return -1;
+    }
+    vdev->migration->data_caps = caps;
+    return 0;
+}
+
+
+static int vfio_check_devstate_version(VFIOPCIDevice *vdev)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIORegion *region =
+        &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL];
+
+    uint32_t version;
+    uint32_t size = sizeof(version);
+
+    if (pread(vbasedev->fd, &version, size,
+
                region->fd_offset +
+                offsetof(struct vfio_device_state_ctl, version))
+            != size) {
+        error_report("%s Failed to read version of device state interfaces",
+                vbasedev->name);
+        return -1;
+    }
+
+    if (version != VFIO_DEVICE_STATE_INTERFACE_VERSION) {
+        error_report("%s migration version mismatch, right version is %d",
+                vbasedev->name, VFIO_DEVICE_STATE_INTERFACE_VERSION);
+        return -1;
+    }
+
+    return 0;
+}
+
+static void vfio_vm_change_state_handler(void *pv, int running, RunState state)
+{
+    VFIOPCIDevice *vdev = pv;
+    uint32_t dev_state = vdev->migration->device_state;
+
+    if (!running) {
+        dev_state |= VFIO_DEVICE_STATE_STOP;
+    } else {
+        dev_state &= ~VFIO_DEVICE_STATE_STOP;
+    }
+
+    vfio_set_device_state(vdev, dev_state);
+}
+
+static void vfio_save_live_pending(QEMUFile *f, void *opaque,
+                                   uint64_t max_size,
+                                   uint64_t *res_precopy_only,
+                                   uint64_t *res_compatible,
+                                   uint64_t *res_post_copy_only)
+{
+    VFIOPCIDevice *vdev = opaque;
+
+    if (!vfio_device_data_cap_device_memory(vdev)) {
+        return;
+    }
+
+    return;
+}
+
+static int vfio_save_iterate(QEMUFile *f, void *opaque)
+{
+    VFIOPCIDevice *vdev = opaque;
+
+    if (!vfio_device_data_cap_device_memory(vdev)) {
+        return 0;
+    }
+
+    return 0;
+}
+
+static void vfio_pci_load_config(VFIOPCIDevice *vdev, QEMUFile *f)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    uint32_t ctl, msi_lo, msi_hi, msi_data, bar_cfg, i;
+    bool msi_64bit;
+
+    /* restore pci bar configuration */
+    ctl = pci_default_read_config(pdev, PCI_COMMAND, 2);
+    vfio_pci_write_config(pdev, PCI_COMMAND,
+            ctl & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY), 2);
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        bar_cfg = qemu_get_be32(f);
+        vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar_cfg, 4);
+    }
+    vfio_pci_write_config(pdev, PCI_COMMAND,
+            ctl | PCI_COMMAND_IO | PCI_COMMAND_MEMORY, 2);
+
+    /* restore msi configuration */
+    ctl = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
+    msi_64bit = !!(ctl &
PCI_MSI_FLAGS_64BIT);
+
+    vfio_pci_write_config(&vdev->pdev,
+            pdev->msi_cap + PCI_MSI_FLAGS,
+            ctl & ~PCI_MSI_FLAGS_ENABLE, 2);
+
+    msi_lo = qemu_get_be32(f);
+    vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_LO, msi_lo, 4);
+
+    if (msi_64bit) {
+        msi_hi = qemu_get_be32(f);
+        vfio_pci_write_config(pdev, pdev->msi_cap + PCI_MSI_ADDRESS_HI,
+                msi_hi, 4);
+    }
+    msi_data = qemu_get_be32(f);
+    vfio_pci_write_config(pdev,
+            pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
+            msi_data, 2);
+
+    vfio_pci_write_config(&vdev->pdev, pdev->msi_cap + PCI_MSI_FLAGS,
+            ctl | PCI_MSI_FLAGS_ENABLE, 2);
+
+}
+
+static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
+{
+    VFIOPCIDevice *vdev = opaque;
+    int flag;
+    uint64_t len;
+    int ret = 0;
+
+    if (version_id != VFIO_DEVICE_STATE_INTERFACE_VERSION) {
+        return -EINVAL;
+    }
+
+    do {
+        flag = qemu_get_byte(f);
+
+        switch (flag & ~VFIO_SAVE_FLAG_CONTINUE) {
+        case VFIO_SAVE_FLAG_SETUP:
+            break;
+        case VFIO_SAVE_FLAG_PCI:
+            vfio_pci_load_config(vdev, f);
+            break;
+        case VFIO_SAVE_FLAG_DEVCONFIG:
+            len = qemu_get_be64(f);
+            vfio_load_data_device_config(vdev, f, len);
+            break;
+        default:
+            ret = -EINVAL;
+        }
+    } while (flag & VFIO_SAVE_FLAG_CONTINUE);
+
+    return ret;
+}
+
+static void vfio_pci_save_config(VFIOPCIDevice *vdev, QEMUFile *f)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    uint32_t msi_cfg, msi_lo, msi_hi, msi_data, bar_cfg, i;
+    bool msi_64bit;
+
+    for (i = 0; i < PCI_ROM_SLOT; i++) {
+        bar_cfg = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
+        qemu_put_be32(f, bar_cfg);
+    }
+
+    msi_cfg = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
+    msi_64bit = !!(msi_cfg & PCI_MSI_FLAGS_64BIT);
+
+    msi_lo = pci_default_read_config(pdev,
+            pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
+    qemu_put_be32(f, msi_lo);
+
+    if (msi_64bit) {
+        msi_hi = pci_default_read_config(pdev,
+                pdev->msi_cap + PCI_MSI_ADDRESS_HI,
+                4);
+        qemu_put_be32(f,
msi_hi); + } + + msi_data =3D pci_default_read_config(pdev, + pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32= ), + 2); + qemu_put_be32(f, msi_data); + +} + +static int vfio_save_complete_precopy(QEMUFile *f, void *opaque) +{ + VFIOPCIDevice *vdev =3D opaque; + int rc =3D 0; + + qemu_put_byte(f, VFIO_SAVE_FLAG_PCI | VFIO_SAVE_FLAG_CONTINUE); + vfio_pci_save_config(vdev, f); + + qemu_put_byte(f, VFIO_SAVE_FLAG_DEVCONFIG); + rc +=3D vfio_get_device_config_size(vdev); + rc +=3D vfio_save_data_device_config(vdev, f); + + return rc; +} + +static int vfio_save_setup(QEMUFile *f, void *opaque) +{ + VFIOPCIDevice *vdev =3D opaque; + qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP); + + vfio_set_device_state(vdev, VFIO_DEVICE_STATE_RUNNING | + VFIO_DEVICE_STATE_LOGGING); + return 0; +} + +static int vfio_load_setup(QEMUFile *f, void *opaque) +{ + return 0; +} + +static void vfio_save_cleanup(void *opaque) +{ + VFIOPCIDevice *vdev =3D opaque; + uint32_t dev_state =3D vdev->migration->device_state; + + dev_state &=3D ~VFIO_DEVICE_STATE_LOGGING; + + vfio_set_device_state(vdev, dev_state); +} + +static SaveVMHandlers savevm_vfio_handlers =3D { + .save_setup =3D vfio_save_setup, + .save_live_pending =3D vfio_save_live_pending, + .save_live_iterate =3D vfio_save_iterate, + .save_live_complete_precopy =3D vfio_save_complete_precopy, + .save_cleanup =3D vfio_save_cleanup, + .load_setup =3D vfio_load_setup, + .load_state =3D vfio_load_state, +}; + +int vfio_migration_init(VFIOPCIDevice *vdev, Error **errp) +{ + int ret; + Error *local_err =3D NULL; + vdev->migration =3D g_new0(VFIOMigration, 1); + + if (vfio_device_state_region_setup(vdev, + &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL], + VFIO_REGION_SUBTYPE_DEVICE_STATE_CTL, + "device-state-ctl")) { + goto error; + } + + if (vfio_check_devstate_version(vdev)) { + goto error; + } + + if (vfio_get_device_data_caps(vdev)) { + goto error; + } + + if (vfio_device_state_region_setup(vdev, + 
&vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_CONFIG], + VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_CONFIG, + "device-state-data-device-config")) { + goto error; + } + + if (vfio_device_data_cap_device_memory(vdev)) { + error_report("No suppport of data cap device memory Yet"); + goto error; + } + + if (vfio_device_data_cap_system_memory(vdev) && + vfio_device_state_region_setup(vdev, + &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_BITMAP], + VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_DIRTYBITMAP, + "device-state-data-dirtybitmap")) { + goto error; + } + + vdev->migration->device_state =3D VFIO_DEVICE_STATE_RUNNING; + + register_savevm_live(NULL, TYPE_VFIO_PCI, -1, + VFIO_DEVICE_STATE_INTERFACE_VERSION, + &savevm_vfio_handlers, + vdev); + + vdev->migration->vm_state =3D + qemu_add_vm_change_state_handler(vfio_vm_change_state_handler, vde= v); + + return 0; +error: + error_setg(&vdev->migration_blocker, + "VFIO device doesn't support migration"); + ret =3D migrate_add_blocker(vdev->migration_blocker, &local_err); + if (local_err) { + error_propagate(errp, local_err); + error_free(vdev->migration_blocker); + } + + g_free(vdev->migration); + vdev->migration =3D NULL; + + return ret; +} + +void vfio_migration_finalize(VFIOPCIDevice *vdev) +{ + if (vdev->migration) { + int i; + qemu_del_vm_change_state_handler(vdev->migration->vm_state); + unregister_savevm(NULL, TYPE_VFIO_PCI, vdev); + for (i =3D 0; i < VFIO_DEVSTATE_REGION_NUM; i++) { + vfio_region_finalize(&vdev->migration->region[i]); + } + g_free(vdev->migration); + vdev->migration =3D NULL; + } else if (vdev->migration_blocker) { + migrate_del_blocker(vdev->migration_blocker); + error_free(vdev->migration_blocker); + } +} diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index c0cb1ec..b8e006b 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -37,7 +37,6 @@ =20 #define MSIX_CAP_LENGTH 12 =20 -#define TYPE_VFIO_PCI "vfio-pci" #define PCI_VFIO(obj) OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI) =20 static void 
vfio_disable_interrupts(VFIOPCIDevice *vdev); diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index b1ae4c0..4b7b1bb 100644 --- a/hw/vfio/pci.h +++ b/hw/vfio/pci.h @@ -19,6 +19,7 @@ #include "qemu/event_notifier.h" #include "qemu/queue.h" #include "qemu/timer.h" +#include "sysemu/sysemu.h" =20 #define PCI_ANY_ID (~0) =20 @@ -56,6 +57,21 @@ typedef struct VFIOBAR { QLIST_HEAD(, VFIOQuirk) quirks; } VFIOBAR; =20 +enum { + VFIO_DEVSTATE_REGION_CTL =3D 0, + VFIO_DEVSTATE_REGION_DATA_CONFIG, + VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY, + VFIO_DEVSTATE_REGION_DATA_BITMAP, + VFIO_DEVSTATE_REGION_NUM, +}; +typedef struct VFIOMigration { + VFIORegion region[VFIO_DEVSTATE_REGION_NUM]; + uint32_t data_caps; + uint32_t device_state; + uint64_t devconfig_size; + VMChangeStateEntry *vm_state; +} VFIOMigration; + typedef struct VFIOVGARegion { MemoryRegion mem; off_t offset; @@ -132,6 +148,8 @@ typedef struct VFIOPCIDevice { VFIOBAR bars[PCI_NUM_REGIONS - 1]; /* No ROM */ VFIOVGA *vga; /* 0xa0000, 0x3b0, 0x3c0 */ void *igd_opregion; + VFIOMigration *migration; + Error *migration_blocker; PCIHostDeviceAddress host; EventNotifier err_notifier; EventNotifier req_notifier; @@ -198,5 +216,10 @@ int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev, void vfio_display_reset(VFIOPCIDevice *vdev); int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp); void vfio_display_finalize(VFIOPCIDevice *vdev); - +bool vfio_device_data_cap_system_memory(VFIOPCIDevice *vdev); +bool vfio_device_data_cap_device_memory(VFIOPCIDevice *vdev); +int vfio_set_dirty_page_bitmap(VFIOPCIDevice *vdev, + uint64_t start_addr, uint64_t page_nr); +int vfio_migration_init(VFIOPCIDevice *vdev, Error **errp); +void vfio_migration_finalize(VFIOPCIDevice *vdev); #endif /* HW_VFIO_VFIO_PCI_H */ diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 1b434d0..ed43613 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -32,6 +32,7 @@ #endif =20 #define VFIO_MSG_PREFIX "vfio 
%s: " +#define TYPE_VFIO_PCI "vfio-pci" =20 enum { VFIO_DEVICE_TYPE_PCI =3D 0, --=20 2.7.4 From nobody Wed May 1 20:56:04 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=intel.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1550566494210791.805873270432; Tue, 19 Feb 2019 00:54:54 -0800 (PST) Received: from localhost ([127.0.0.1]:44379 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gw1Ax-0004AN-6M for importer@patchew.org; Tue, 19 Feb 2019 03:54:51 -0500 Received: from eggs.gnu.org ([209.51.188.92]:55682) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gw192-0002tU-US for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:53 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gw190-0001B8-VM for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:52 -0500 Received: from mga02.intel.com ([134.134.136.20]:25125) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gw18z-00019y-01 for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:50 -0500 Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Feb 2019 00:52:46 -0800 Received: from joy-desktop.sh.intel.com ([10.239.13.17]) by fmsmga007.fm.intel.com with ESMTP; 19 Feb 2019 00:52:42 -0800 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,387,1544515200"; 
d="scan'208";a="123524385" From: Yan Zhao To: alex.williamson@redhat.com, qemu-devel@nongnu.org Date: Tue, 19 Feb 2019 16:52:41 +0800 Message-Id: <1550566361-3697-1-git-send-email-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com> References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 134.134.136.20 Subject: [Qemu-devel] [PATCH 3/5] vfio/migration: tracking of dirty page in system memory X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru, Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com, kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com, eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com, Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao , dgilbert@redhat.com, intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com, cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Register the log_sync interface to hook into RAM's live migration callbacks: ram_save_pending |->migration_bitmap_sync |->memory_global_dirty_log_sync |->memory_region_sync_dirty_bitmap |->listener->log_sync(listener, &mrs); As a result, pages in system memory dirtied by the vfio device are saved/loaded iteratively by RAM's live migration code.
The bitmap of the device's dirty pages in system memory is retrieved from the Dirty Bitmap Region. Signed-off-by: Yan Zhao Signed-off-by: Yulei Zhang --- hw/vfio/common.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 7c185e5a..719e750 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -27,6 +27,7 @@ =20 #include "hw/vfio/vfio-common.h" #include "hw/vfio/vfio.h" +#include "hw/vfio/pci.h" #include "exec/address-spaces.h" #include "exec/memory.h" #include "hw/hw.h" @@ -698,9 +699,34 @@ static void vfio_listener_region_del(MemoryListener *l= istener, } } =20 +static void vfio_log_sync(MemoryListener *listener, + MemoryRegionSection *section) +{ + VFIOContainer *container =3D container_of(listener, VFIOContainer, lis= tener); + VFIOGroup *group =3D QLIST_FIRST(&container->group_list); + VFIODevice *vbasedev; + VFIOPCIDevice *vdev; + + ram_addr_t size =3D int128_get64(section->size); + uint64_t page_nr =3D size >> TARGET_PAGE_BITS; + uint64_t start_addr =3D section->offset_within_address_space; + + QLIST_FOREACH(vbasedev, &group->device_list, next) { + vdev =3D container_of(vbasedev, VFIOPCIDevice, vbasedev); + if (!vdev->migration || + !vfio_device_data_cap_system_memory(vdev) || + !(vdev->migration->device_state & VFIO_DEVICE_STATE_LOGGIN= G)) { + continue; + } + + vfio_set_dirty_page_bitmap(vdev, start_addr, page_nr); + } +} + static const MemoryListener vfio_memory_listener =3D { .region_add =3D vfio_listener_region_add, .region_del =3D vfio_listener_region_del, + .log_sync =3D vfio_log_sync, }; =20 static void vfio_listener_release(VFIOContainer *container) --=20 2.7.4 From nobody Wed May 1 20:56:04 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com:
domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=intel.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 155056675540752.256547784626946; Tue, 19 Feb 2019 00:59:15 -0800 (PST) Received: from localhost ([127.0.0.1]:44461 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gw1FA-0007vx-Cp for importer@patchew.org; Tue, 19 Feb 2019 03:59:12 -0500 Received: from eggs.gnu.org ([209.51.188.92]:55741) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gw198-0002yM-He for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:59 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gw197-0001DQ-LU for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:58 -0500 Received: from mga01.intel.com ([192.55.52.88]:27178) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gw197-0001CV-Be for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:52:57 -0500 Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Feb 2019 00:52:54 -0800 Received: from joy-desktop.sh.intel.com ([10.239.13.17]) by fmsmga008.fm.intel.com with ESMTP; 19 Feb 2019 00:52:52 -0800 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,387,1544515200"; d="scan'208";a="125511180" From: Yan Zhao To: alex.williamson@redhat.com, qemu-devel@nongnu.org Date: Tue, 19 Feb 2019 16:52:51 +0800 Message-Id: <1550566371-3743-1-git-send-email-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com> References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not 
recognized. X-Received-From: 192.55.52.88 Subject: [Qemu-devel] [PATCH 4/5] vfio/migration: turn on migration X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru, Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com, kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com, eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com, Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao , dgilbert@redhat.com, intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com, cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Initialize vfio migration in vfio_realize() and register a migration blocker if initialization fails. Finalize all migration resources in vfio_instance_finalize().
Signed-off-by: Yan Zhao Signed-off-by: Yulei Zhang --- hw/vfio/pci.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index b8e006b..8bf625e 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -3068,6 +3068,8 @@ static void vfio_realize(PCIDevice *pdev, Error **err= p) goto out_teardown; } =20 + vfio_migration_init(vdev, errp); + vfio_register_err_notifier(vdev); vfio_register_req_notifier(vdev); vfio_setup_resetfn_quirk(vdev); @@ -3089,6 +3091,7 @@ static void vfio_instance_finalize(Object *obj) =20 vfio_display_finalize(vdev); vfio_bars_finalize(vdev); + vfio_migration_finalize(vdev); g_free(vdev->emulated_config_bits); g_free(vdev->rom); /* @@ -3221,11 +3224,6 @@ static Property vfio_pci_dev_properties[] =3D { DEFINE_PROP_END_OF_LIST(), }; =20 -static const VMStateDescription vfio_pci_vmstate =3D { - .name =3D "vfio-pci", - .unmigratable =3D 1, -}; - static void vfio_pci_dev_class_init(ObjectClass *klass, void *data) { DeviceClass *dc =3D DEVICE_CLASS(klass); @@ -3233,7 +3231,6 @@ static void vfio_pci_dev_class_init(ObjectClass *klas= s, void *data) =20 dc->reset =3D vfio_pci_reset; dc->props =3D vfio_pci_dev_properties; - dc->vmsd =3D &vfio_pci_vmstate; dc->desc =3D "VFIO-based PCI device assignment"; set_bit(DEVICE_CATEGORY_MISC, dc->categories); pdc->realize =3D vfio_realize; --=20 2.7.4 From nobody Wed May 1 20:56:04 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zohomail.com; spf=pass (zoho.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; dmarc=fail(p=none dis=none) header.from=intel.com Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 
1550566580021636.4209026475681; Tue, 19 Feb 2019 00:56:20 -0800 (PST) Received: from localhost ([127.0.0.1]:44435 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gw1CI-0005RZ-Tl for importer@patchew.org; Tue, 19 Feb 2019 03:56:14 -0500 Received: from eggs.gnu.org ([209.51.188.92]:55870) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gw19I-00035D-MT for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:53:10 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gw19G-0001IO-W7 for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:53:08 -0500 Received: from mga14.intel.com ([192.55.52.115]:14108) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gw19G-0001I0-Il for qemu-devel@nongnu.org; Tue, 19 Feb 2019 03:53:06 -0500 Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga103.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Feb 2019 00:53:05 -0800 Received: from joy-desktop.sh.intel.com ([10.239.13.17]) by fmsmga005.fm.intel.com with ESMTP; 19 Feb 2019 00:53:00 -0800 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.58,387,1544515200"; d="scan'208";a="321511650" From: Yan Zhao To: alex.williamson@redhat.com, qemu-devel@nongnu.org Date: Tue, 19 Feb 2019 16:53:00 +0800 Message-Id: <1550566380-3788-1-git-send-email-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com> References: <1550566254-3545-1-git-send-email-yan.y.zhao@intel.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. 
X-Received-From: 192.55.52.115 Subject: [Qemu-devel] [PATCH 5/5] vfio/migration: support device memory capability X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: cjia@nvidia.com, kvm@vger.kernel.org, aik@ozlabs.ru, Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com, kwankhede@nvidia.com, eauger@redhat.com, yi.l.liu@intel.com, eskultet@redhat.com, ziye.yang@intel.com, mlevitsk@redhat.com, pasic@linux.ibm.com, arei.gonglei@huawei.com, felipe@nutanix.com, Ken.Xue@amd.com, kevin.tian@intel.com, Yan Zhao , dgilbert@redhat.com, intel-gvt-dev@lists.freedesktop.org, changpeng.liu@intel.com, cohuck@redhat.com, zhi.a.wang@intel.com, jonathan.davies@nutanix.com Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" If a device has the device memory capability, save/load data from device memory in the pre-copy and stop-and-copy phases. The LOGGING state is set for device memory for dirty page logging: in the LOGGING state, getting device memory returns a snapshot of the whole device memory; outside the LOGGING state, getting device memory returns the data dirtied since the last get operation. Device memory is usually very large, so QEMU needs to split it into several chunks, each no larger than the device memory region.
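The chunking can be sketched in isolation as below. This is only an illustration under stated assumptions, not the code in this patch: for_each_chunk(), chunk_fn, count_chunk() and struct chunk_stats are hypothetical names, and the callback stands in for the per-chunk save or load step.

```c
#include <stdint.h>

/* Hypothetical per-chunk callback; returns 0 on success, non-zero on error. */
typedef int (*chunk_fn)(uint64_t pos, uint64_t len, void *opaque);

/* Hypothetical bookkeeping used by the example callback below. */
struct chunk_stats {
    uint64_t chunks; /* number of chunks visited */
    uint64_t bytes;  /* sum of all chunk lengths */
};

/*
 * Walk [0, total_len) in pieces of at most region_size bytes.
 * Returns 0 on success, -1 if a callback fails or region_size is unusable.
 */
static int for_each_chunk(uint64_t total_len, uint64_t region_size,
                          chunk_fn fn, void *opaque)
{
    uint64_t pos = 0;

    if (region_size == 0) {
        /* A zero-sized region can only carry an empty payload. */
        return total_len ? -1 : 0;
    }

    while (pos < total_len) {
        uint64_t len = total_len - pos;

        if (len > region_size) {
            len = region_size;
        }
        if (fn(pos, len, opaque)) {
            return -1;
        }
        /* Advance, otherwise the loop never terminates. */
        pos += len;
    }

    return 0;
}

/* Example callback: only counts what it is handed. */
static int count_chunk(uint64_t pos, uint64_t len, void *opaque)
{
    struct chunk_stats *st = opaque;

    (void)pos;
    st->chunks++;
    st->bytes += len;
    return 0;
}
```

With total_len = 10 and region_size = 4, the callback is invoked for (pos, len) = (0, 4), (4, 4) and (8, 2), so only the last chunk is shorter than the region.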
Signed-off-by: Yan Zhao Signed-off-by: Kirti Wankhede --- hw/vfio/migration.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++++= ++-- hw/vfio/pci.h | 1 + 2 files changed, 231 insertions(+), 5 deletions(-) diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c index 16d6395..f1e9309 100644 --- a/hw/vfio/migration.c +++ b/hw/vfio/migration.c @@ -203,6 +203,201 @@ static int vfio_load_data_device_config(VFIOPCIDevice= *vdev, return 0; } =20 +static int vfio_get_device_memory_size(VFIOPCIDevice *vdev) +{ + VFIODevice *vbasedev =3D &vdev->vbasedev; + VFIORegion *region_ctl =3D + &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL]; + uint64_t len; + int sz; + + sz =3D sizeof(len); + if (pread(vbasedev->fd, &len, sz, + region_ctl->fd_offset + + offsetof(struct vfio_device_state_ctl, device_memory.size)) + !=3D sz) { + error_report("vfio: Failed to get length of device memory"); + return -1; + } + vdev->migration->devmem_size =3D len; + return 0; +} + +static int vfio_set_device_memory_size(VFIOPCIDevice *vdev, uint64_t size) +{ + VFIODevice *vbasedev =3D &vdev->vbasedev; + VFIORegion *region_ctl =3D + &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL]; + int sz; + + sz =3D sizeof(size); + if (pwrite(vbasedev->fd, &size, sz, + region_ctl->fd_offset + + offsetof(struct vfio_device_state_ctl, device_memory.size)) + !=3D sz) { + error_report("vfio: Failed to set length of device memory"); + return -1; + } + vdev->migration->devmem_size =3D size; + return 0; +} + +static +int vfio_save_data_device_memory_chunk(VFIOPCIDevice *vdev, QEMUFile *f, + uint64_t pos, uint64_t len) +{ + VFIODevice *vbasedev =3D &vdev->vbasedev; + VFIORegion *region_ctl =3D + &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL]; + VFIORegion *region_devmem =3D + &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY]; + void *dest; + uint32_t sz; + uint8_t *buf =3D NULL; + uint32_t action =3D VFIO_DEVICE_DATA_ACTION_GET_BUFFER; + + if (len > region_devmem->size) { + return -1; + } + + sz
=3D sizeof(pos); + if (pwrite(vbasedev->fd, &pos, sz, + region_ctl->fd_offset + + offsetof(struct vfio_device_state_ctl, device_memory.pos)) + !=3D sz) { + error_report("vfio: Failed to set save buffer pos"); + return -1; + } + sz =3D sizeof(action); + if (pwrite(vbasedev->fd, &action, sz, + region_ctl->fd_offset + + offsetof(struct vfio_device_state_ctl, device_memory.actio= n)) + !=3D sz) { + error_report("vfio: Failed to set save buffer action"); + return -1; + } + + if (!vfio_device_state_region_mmaped(region_devmem)) { + buf =3D g_malloc(len); + if (buf =3D=3D NULL) { + error_report("vfio: Failed to allocate memory for migrate"); + return -1; + } + if (pread(vbasedev->fd, buf, len, region_devmem->fd_offset) !=3D l= en) { + error_report("vfio: Failed to read device memory buffer"); + return -1; + } + qemu_put_be64(f, len); + qemu_put_be64(f, pos); + qemu_put_buffer(f, buf, len); + g_free(buf); + } else { + dest =3D region_devmem->mmaps[0].mmap; + qemu_put_be64(f, len); + qemu_put_be64(f, pos); + qemu_put_buffer(f, dest, len); + } + return 0; +} + +static int vfio_save_data_device_memory(VFIOPCIDevice *vdev, QEMUFile *f) +{ + VFIORegion *region_devmem =3D + &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY]; + uint64_t total_len =3D vdev->migration->devmem_size; + uint64_t pos =3D 0; + + qemu_put_be64(f, total_len); + while (pos < total_len) { + uint64_t len =3D region_devmem->size; + + if (pos + len >=3D total_len) { + len =3D total_len - pos; + } + if (vfio_save_data_device_memory_chunk(vdev, f, pos, len)) { + return -1; + } + pos +=3D len; + } + return 0; +} + +static +int vfio_load_data_device_memory_chunk(VFIOPCIDevice *vdev, QEMUFile *f, + uint64_t pos, uint64_t len) +{ + VFIODevice *vbasedev =3D &vdev->vbasedev; + VFIORegion *region_ctl =3D + &vdev->migration->region[VFIO_DEVSTATE_REGION_CTL]; + VFIORegion *region_devmem =3D + &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_MEMORY]; + + void *dest; + uint32_t sz; + uint8_t *buf =3D NULL; + uint32_t
action =3D VFIO_DEVICE_DATA_ACTION_SET_BUFFER; + + if (len > region_devmem->size) { + return -1; + } + + sz =3D sizeof(pos); + if (pwrite(vbasedev->fd, &pos, sz, + region_ctl->fd_offset + + offsetof(struct vfio_device_state_ctl, device_memory.pos)) + !=3D sz) { + error_report("vfio: Failed to set device memory buffer pos"); + return -1; + } + if (!vfio_device_state_region_mmaped(region_devmem)) { + buf =3D g_malloc(len); + if (buf =3D=3D NULL) { + error_report("vfio: Failed to allocate memory for migrate"); + return -1; + } + qemu_get_buffer(f, buf, len); + if (pwrite(vbasedev->fd, buf, len, + region_devmem->fd_offset) !=3D len) { + error_report("vfio: Failed to load device memory buffer"); + return -1; + } + g_free(buf); + } else { + dest =3D region_devmem->mmaps[0].mmap; + qemu_get_buffer(f, dest, len); + } + + sz =3D sizeof(action); + if (pwrite(vbasedev->fd, &action, sz, + region_ctl->fd_offset + + offsetof(struct vfio_device_state_ctl, device_memory.actio= n)) + !=3D sz) { + error_report("vfio: Failed to set load device memory buffer action= "); + return -1; + } + + return 0; + +} + +static int vfio_load_data_device_memory(VFIOPCIDevice *vdev, + QEMUFile *f, uint64_t total_len) +{ + uint64_t pos =3D 0, len =3D 0; + + vfio_set_device_memory_size(vdev, total_len); + + while (pos + len < total_len) { + len =3D qemu_get_be64(f); + pos =3D qemu_get_be64(f); + + vfio_load_data_device_memory_chunk(vdev, f, pos, len); + } + + return 0; +} + + static int vfio_set_dirty_page_bitmap_chunk(VFIOPCIDevice *vdev, uint64_t start_addr, uint64_t page_nr) { @@ -377,6 +572,10 @@ static void vfio_save_live_pending(QEMUFile *f, void *= opaque, return; } =20 + /* get dirty data size of device memory */ + vfio_get_device_memory_size(vdev); + + *res_precopy_only +=3D vdev->migration->devmem_size; return; } =20 @@ -388,7 +587,9 @@ static int vfio_save_iterate(QEMUFile *f, void *opaque) return 0; } =20 - return 0; + qemu_put_byte(f, VFIO_SAVE_FLAG_DEVMEMORY); + /* get dirty data of
device memory */ + return vfio_save_data_device_memory(vdev, f); } =20 static void vfio_pci_load_config(VFIOPCIDevice *vdev, QEMUFile *f) @@ -458,6 +659,10 @@ static int vfio_load_state(QEMUFile *f, void *opaque, = int version_id) len =3D qemu_get_be64(f); vfio_load_data_device_config(vdev, f, len); break; + case VFIO_SAVE_FLAG_DEVMEMORY: + len =3D qemu_get_be64(f); + vfio_load_data_device_memory(vdev, f, len); + break; default: ret =3D -EINVAL; } @@ -503,6 +708,13 @@ static int vfio_save_complete_precopy(QEMUFile *f, voi= d *opaque) VFIOPCIDevice *vdev =3D opaque; int rc =3D 0; =20 + if (vfio_device_data_cap_device_memory(vdev)) { + qemu_put_byte(f, VFIO_SAVE_FLAG_DEVMEMORY | VFIO_SAVE_FLAG_CONTINU= E); + /* get dirty data of device memory */ + vfio_get_device_memory_size(vdev); + rc =3D vfio_save_data_device_memory(vdev, f); + } + qemu_put_byte(f, VFIO_SAVE_FLAG_PCI | VFIO_SAVE_FLAG_CONTINUE); vfio_pci_save_config(vdev, f); =20 @@ -515,12 +727,22 @@ static int vfio_save_complete_precopy(QEMUFile *f, vo= id *opaque) =20 static int vfio_save_setup(QEMUFile *f, void *opaque) { + int rc =3D 0; VFIOPCIDevice *vdev =3D opaque; - qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP); + + if (vfio_device_data_cap_device_memory(vdev)) { + qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP | VFIO_SAVE_FLAG_CONTINUE); + qemu_put_byte(f, VFIO_SAVE_FLAG_DEVMEMORY); + /* get whole snapshot of device memory */ + vfio_get_device_memory_size(vdev); + rc =3D vfio_save_data_device_memory(vdev, f); + } else { + qemu_put_byte(f, VFIO_SAVE_FLAG_SETUP); + } =20 vfio_set_device_state(vdev, VFIO_DEVICE_STATE_RUNNING | VFIO_DEVICE_STATE_LOGGING); - return 0; + return rc; } =20 static int vfio_load_setup(QEMUFile *f, void *opaque) @@ -576,8 +798,11 @@ int vfio_migration_init(VFIOPCIDevice *vdev, Error **e= rrp) goto error; } =20 - if (vfio_device_data_cap_device_memory(vdev)) { - error_report("No suppport of data cap device memory Yet"); + if (vfio_device_data_cap_device_memory(vdev) && + 
vfio_device_state_region_setup(vdev, + &vdev->migration->region[VFIO_DEVSTATE_REGION_DATA_DEVICE_ME= MORY], + VFIO_REGION_SUBTYPE_DEVICE_STATE_DATA_MEMORY, + "device-state-data-device-memory")) { goto error; } =20 diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index 4b7b1bb..a2cc64b 100644 --- a/hw/vfio/pci.h +++ b/hw/vfio/pci.h @@ -69,6 +69,7 @@ typedef struct VFIOMigration { uint32_t data_caps; uint32_t device_state; uint64_t devconfig_size; + uint64_t devmem_size; VMChangeStateEntry *vm_state; } VFIOMigration; =20 --=20 2.7.4