From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com,
 nicolinc@nvidia.com, joao.m.martins@oracle.com, eric.auger@redhat.com,
 peterx@redhat.com, jasowang@redhat.com, kevin.tian@intel.com,
 yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com,
 Zhenzhong Duan
Subject: [PATCH v1 20/22] vfio/pci: Adapt vfio pci hot reset support with iommufd BE
Date: Wed, 30 Aug 2023 18:37:52 +0800
Message-Id: <20230830103754.36461-21-zhenzhong.duan@intel.com>
In-Reply-To:
 <20230830103754.36461-1-zhenzhong.duan@intel.com>
References: <20230830103754.36461-1-zhenzhong.duan@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The PCI hot reset path needs to reference PCI-specific functions and
data structures. Adding container-level callback functions for the
legacy and iommufd BEs that reference those PCI-specific functions and
data is no better than implementing reset support for the iommufd BE
directly in pci.c.

This way we can also share the common bus reset and system reset paths
between the different BEs.

Signed-off-by: Zhenzhong Duan
---
 hw/vfio/pci.c        | 224 +++++++++++++++++++++++++++++++++++++++----
 hw/vfio/trace-events |   1 +
 2 files changed, 208 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 34f65ecd17..3a8fee3c99 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -42,6 +42,7 @@
 #include "qapi/error.h"
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
+#include "linux/iommufd.h"
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
@@ -2378,22 +2379,13 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
     return (strcmp(tmp, name) == 0);
 }
 
-static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
+static int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
+                                       struct vfio_pci_hot_reset_info **info_p)
 {
-    VFIOGroup *group;
     struct vfio_pci_hot_reset_info *info;
-    struct vfio_pci_dependent_device *devices;
-    struct vfio_pci_hot_reset *reset;
-    int32_t *fds;
-    int ret, i, count;
-    bool multi = false;
-
-    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+    int ret, count;
 
-    if (!single) {
-        vfio_pci_pre_reset(vdev);
-    }
-    vdev->vbasedev.needs_reset = false;
+    assert(info_p && !*info_p);
 
     info = g_malloc0(sizeof(*info));
     info->argsz = sizeof(*info);
@@ -2401,24 +2393,53 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret && errno != ENOSPC) {
         ret = -errno;
+        g_free(info);
         if (!vdev->has_pm_reset) {
             error_report("vfio: Cannot reset device %s, "
                          "no available reset mechanism.", vdev->vbasedev.name);
         }
-        goto out_single;
+        return ret;
     }
 
     count = info->count;
-    info = g_realloc(info, sizeof(*info) + (count * sizeof(*devices)));
-    info->argsz = sizeof(*info) + (count * sizeof(*devices));
-    devices = &info->devices[0];
+    info = g_realloc(info, sizeof(*info) + (count * sizeof(info->devices[0])));
+    info->argsz = sizeof(*info) + (count * sizeof(info->devices[0]));
 
     ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret) {
         ret = -errno;
+        g_free(info);
         error_report("vfio: hot reset info failed: %m");
+        return ret;
+    }
+
+    *info_p = info;
+    return 0;
+}
+
+static int vfio_pci_hot_reset_legacy(VFIOPCIDevice *vdev, bool single)
+{
+    VFIOGroup *group;
+    struct vfio_pci_hot_reset_info *info = NULL;
+    struct vfio_pci_dependent_device *devices;
+    struct vfio_pci_hot_reset *reset;
+    int32_t *fds;
+    int ret, i, count;
+    bool multi = false;
+
+    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+
+    if (!single) {
+        vfio_pci_pre_reset(vdev);
+    }
+    vdev->vbasedev.needs_reset = false;
+
+    ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
+
+    if (ret) {
         goto out_single;
     }
+    devices = &info->devices[0];
 
     trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
 
@@ -2560,6 +2581,175 @@ out_single:
     return ret;
 }
 
+#ifdef CONFIG_IOMMUFD
+static VFIODevice *vfio_pci_find_by_iommufd_devid(__u32 devid)
+{
+    VFIOAddressSpace *space;
+    VFIOContainer *bcontainer;
+    VFIOIOMMUFDContainer *container;
+    VFIOIOASHwpt *hwpt;
+    VFIODevice *vbasedev_iter;
+    VFIOIOMMUBackendOpsClass *ops = VFIO_IOMMU_BACKEND_OPS_CLASS(
+        object_class_by_name(TYPE_VFIO_IOMMU_BACKEND_IOMMUFD_OPS));
+
+    QLIST_FOREACH(space, &vfio_address_spaces, list) {
+        QLIST_FOREACH(bcontainer, &space->containers, next) {
+            if (bcontainer->ops != ops) {
+                continue;
+            }
+            container = container_of(bcontainer, VFIOIOMMUFDContainer,
+                                     bcontainer);
+            QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+                QLIST_FOREACH(vbasedev_iter, &hwpt->device_list, next) {
+                    if (devid == vbasedev_iter->devid) {
+                        return vbasedev_iter;
+                    }
+                }
+            }
+        }
+    }
+    return NULL;
+}
+
+static int vfio_pci_hot_reset_iommufd(VFIOPCIDevice *vdev, bool single)
+{
+    struct vfio_pci_hot_reset_info *info = NULL;
+    struct vfio_pci_dependent_device *devices;
+    struct vfio_pci_hot_reset *reset;
+    int ret, i;
+    bool multi = false;
+
+    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+
+    if (!single) {
+        vfio_pci_pre_reset(vdev);
+    }
+    vdev->vbasedev.needs_reset = false;
+
+    ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
+
+    if (ret) {
+        goto out_single;
+    }
+
+    assert(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID);
+
+    devices = &info->devices[0];
+
+    if (!(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED)) {
+        if (!vdev->has_pm_reset) {
+            for (i = 0; i < info->count; i++) {
+                if (devices[i].devid == VFIO_PCI_DEVID_NOT_OWNED) {
+                    error_report("vfio: Cannot reset device %s, "
+                                 "depends on device %04x:%02x:%02x.%x "
+                                 "which is not owned.",
+                                 vdev->vbasedev.name, devices[i].segment,
+                                 devices[i].bus, PCI_SLOT(devices[i].devfn),
+                                 PCI_FUNC(devices[i].devfn));
+                }
+            }
+        }
+        ret = -EPERM;
+        goto out_single;
+    }
+
+    trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
+
+    for (i = 0; i < info->count; i++) {
+        VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
+
+        trace_vfio_pci_hot_reset_dep_devices_iommufd(devices[i].segment,
+                                        devices[i].bus,
+                                        PCI_SLOT(devices[i].devfn),
+                                        PCI_FUNC(devices[i].devfn),
+                                        devices[i].devid);
+
+        /*
+         * If a VFIO cdev device is resettable, all the dependent devices
+         * are either bound to same iommufd or within same iommu_groups as
+         * one of the iommufd bound devices.
+         */
+        assert(devices[i].devid != VFIO_PCI_DEVID_NOT_OWNED);
+
+        if (devices[i].devid == vdev->vbasedev.devid ||
+            devices[i].devid == VFIO_PCI_DEVID_OWNED) {
+            continue;
+        }
+
+        vbasedev_iter = vfio_pci_find_by_iommufd_devid(devices[i].devid);
+        if (!vbasedev_iter || !vbasedev_iter->dev->realized ||
+            vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+            continue;
+        }
+        tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+        if (single) {
+            ret = -EINVAL;
+            goto out_single;
+        }
+        vfio_pci_pre_reset(tmp);
+        tmp->vbasedev.needs_reset = false;
+        multi = true;
+    }
+
+    if (!single && !multi) {
+        ret = -EINVAL;
+        goto out_single;
+    }
+
+    /* Use zero length array for hot reset with iommufd backend */
+    reset = g_malloc0(sizeof(*reset));
+    reset->argsz = sizeof(*reset);
+
+    /* Bus reset! */
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+    g_free(reset);
+
+    trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
+                                    ret ? strerror(errno) : "Success");
+
+    /* Re-enable INTx on affected devices */
+    for (i = 0; i < info->count; i++) {
+        VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
+
+        if (devices[i].devid == vdev->vbasedev.devid ||
+            devices[i].devid == VFIO_PCI_DEVID_OWNED) {
+            continue;
+        }
+
+        vbasedev_iter = vfio_pci_find_by_iommufd_devid(devices[i].devid);
+        if (!vbasedev_iter || !vbasedev_iter->dev->realized ||
+            vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+            continue;
+        }
+        tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+        vfio_pci_post_reset(tmp);
+    }
+out_single:
+    if (!single) {
+        vfio_pci_post_reset(vdev);
+    }
+    g_free(info);
+
+    return ret;
+}
+#endif
+
+static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
+{
+#ifdef CONFIG_IOMMUFD
+    if (vdev->vbasedev.iommufd) {
+        return vfio_pci_hot_reset_iommufd(vdev, single);
+    } else
+#endif
+    {
+        return vfio_pci_hot_reset_legacy(vdev, single);
+    }
+}
+
+
+
 /*
  * We want to differentiate hot reset of multiple in-use devices vs hot reset
  * of a single in-use device. VFIO_DEVICE_RESET will already handle the case
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 60b56f23a1..c4f3b337b8 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -34,6 +34,7 @@ vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"
 vfio_pci_hot_reset(const char *name, const char *type) " (%s) %s"
 vfio_pci_hot_reset_has_dep_devices(const char *name) "%s: hot reset dependent devices:"
 vfio_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int group_id) "\t%04x:%02x:%02x.%x group %d"
+vfio_pci_hot_reset_dep_devices_iommufd(int domain, int bus, int slot, int function, int dev_id) "\t%04x:%02x:%02x.%x devid %d"
 vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: %s"
 vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
 vfio_populate_device_get_irq_info_failure(const char *errstr) "VFIO_DEVICE_GET_IRQ_INFO failure: %s"
-- 
2.34.1
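
Note for readers: vfio_pci_get_pci_hot_reset_info() in the patch above
queries twice. The first call passes only the fixed-size header and is
allowed to fail with ENOSPC while still reporting the entry count; the
buffer is then grown and the query repeated. The following is a minimal
standalone sketch of that sizing pattern, not QEMU or kernel code.
struct info_hdr, struct dep_dev, mock_query() and get_info() are
hypothetical names invented for illustration, mock_query() only imitates
the ENOSPC-then-retry behaviour the patch relies on, and plain
calloc/realloc stand in for the g_malloc0/g_realloc calls used in pci.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the uAPI structures (assumption). */
struct dep_dev {
    uint32_t devid;
};

struct info_hdr {
    uint32_t argsz;            /* bytes the caller has made available */
    uint32_t count;            /* number of entries the callee knows about */
    struct dep_dev devices[];  /* variable-length tail */
};

/*
 * Mock of the query call: pretends three dependent devices exist.
 * It always reports the count, and fails with ENOSPC when the buffer
 * described by argsz is too small to hold every entry.
 */
static int mock_query(struct info_hdr *info)
{
    static const uint32_t ids[] = { 7, 8, 9 };
    const uint32_t n = sizeof(ids) / sizeof(ids[0]);
    size_t need = sizeof(*info) + n * sizeof(info->devices[0]);

    info->count = n;
    if (info->argsz < need) {
        errno = ENOSPC;
        return -1;
    }
    for (uint32_t i = 0; i < n; i++) {
        info->devices[i].devid = ids[i];
    }
    return 0;
}

/*
 * Two-call pattern: probe with the bare header, grow, query again.
 * On success *info_p owns the buffer; on failure nothing is leaked.
 */
static int get_info(struct info_hdr **info_p)
{
    struct info_hdr *info, *bigger;
    size_t size;
    int err;

    assert(info_p && !*info_p);

    info = calloc(1, sizeof(*info));
    if (!info) {
        return -ENOMEM;
    }
    info->argsz = sizeof(*info);

    /* First call: ENOSPC is expected and only tells us the count. */
    if (mock_query(info) && errno != ENOSPC) {
        err = -errno;
        free(info);
        return err;
    }

    size = sizeof(*info) + info->count * sizeof(info->devices[0]);
    bigger = realloc(info, size);
    if (!bigger) {
        free(info);
        return -ENOMEM;
    }
    info = bigger;
    info->argsz = (uint32_t)size;

    /* Second call: the buffer is large enough, so any error is fatal. */
    if (mock_query(info)) {
        err = -errno;
        free(info);
        return err;
    }

    *info_p = info;
    return 0;
}
```

A caller would start with a NULL pointer, call get_info(&info), walk
info->devices[0..count-1] on success, and free the buffer afterwards.
Returning the buffer through an out-parameter that must start as NULL
mirrors the assert(info_p && !*info_p) contract in the patch and keeps
ownership on the caller's side only in the success case.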