From: Zhenzhong Duan
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com,
 nicolinc@nvidia.com, joao.m.martins@oracle.com, eric.auger@redhat.com,
 peterx@redhat.com, jasowang@redhat.com, kevin.tian@intel.com,
 yi.l.liu@intel.com, yi.y.sun@intel.com, chao.p.peng@intel.com,
 Zhenzhong Duan
Subject: [PATCH v5 07/20] vfio/pci: Introduce a vfio pci hot reset interface
Date: Thu, 9 Nov 2023 19:45:16 +0800
Message-Id: <20231109114529.1904193-8-zhenzhong.duan@intel.com>
In-Reply-To: <20231109114529.1904193-1-zhenzhong.duan@intel.com>
References: <20231109114529.1904193-1-zhenzhong.duan@intel.com>

Legacy vfio pci and iommufd cdev use different procedures to hot reset a
vfio device. Extend the current code by abstracting out a pci_hot_reset
callback for legacy vfio; the same interface will also be used by
iommufd cdev vfio devices.

Rename vfio_pci_hot_reset to vfio_legacy_pci_hot_reset and move it into
container.c; vfio_pci_host_match is moved with it as a dependency.
vfio_pci_hot_reset stays in pci.c as a thin wrapper that calls the
pci_hot_reset callback.

vfio_pci_[pre/post]_reset are exported so they can be called from both
the legacy and iommufd pci_hot_reset callbacks.

Suggested-by: Cédric Le Goater
Signed-off-by: Zhenzhong Duan
---
v5: Move vfio_legacy_pci_hot_reset into container.c

 hw/vfio/pci.h                         |   2 +
 include/hw/vfio/vfio-container-base.h |   3 +
 hw/vfio/container.c                   | 177 ++++++++++++++++++++++++++
 hw/vfio/pci.c                         | 176 +------------------------
 4 files changed, 187 insertions(+), 171 deletions(-)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 1006061afb..9264bce721 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -218,6 +218,8 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr);
 
 extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
 
+void vfio_pci_pre_reset(VFIOPCIDevice *vdev);
+void vfio_pci_post_reset(VFIOPCIDevice *vdev);
 int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
                                     struct vfio_pci_hot_reset_info **info_p);
 
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 4b6f017c6f..45bb19c767 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -106,6 +106,9 @@ struct VFIOIOMMUOps {
     int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
     int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
                               hwaddr iova, hwaddr size);
+    /* PCI specific */
+    int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
+
     /* SPAPR specific */
     int (*add_window)(VFIOContainerBase *bcontainer,
                       MemoryRegionSection *section,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index ed2d721b2b..cb957d063b 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -33,6 +33,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
+#include "pci.h"
 
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -922,6 +923,181 @@ static void vfio_legacy_detach_device(VFIODevice *vbasedev)
     vfio_put_group(group);
 }
 
+static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
+{
+    char tmp[13];
+
+    sprintf(tmp, "%04x:%02x:%02x.%1x", addr->domain,
+            addr->bus, addr->slot, addr->function);
+
+    return (strcmp(tmp, name) == 0);
+}
+
+static int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single)
+{
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+    VFIOGroup *group;
+    struct vfio_pci_hot_reset_info *info = NULL;
+    struct vfio_pci_dependent_device *devices;
+    struct vfio_pci_hot_reset *reset;
+    int32_t *fds;
+    int ret, i, count;
+    bool multi = false;
+
+    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+
+    if (!single) {
+        vfio_pci_pre_reset(vdev);
+    }
+    vdev->vbasedev.needs_reset = false;
+
+    ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
+
+    if (ret) {
+        goto out_single;
+    }
+    devices = &info->devices[0];
+
+    trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
+
+    /* Verify that we have all the groups required */
+    for (i = 0; i < info->count; i++) {
+        PCIHostDeviceAddress host;
+        VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
+
+        host.domain = devices[i].segment;
+        host.bus = devices[i].bus;
+        host.slot = PCI_SLOT(devices[i].devfn);
+        host.function = PCI_FUNC(devices[i].devfn);
+
+        trace_vfio_pci_hot_reset_dep_devices(host.domain,
+                host.bus, host.slot, host.function, devices[i].group_id);
+
+        if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
+            continue;
+        }
+
+        QLIST_FOREACH(group, &vfio_group_list, next) {
+            if (group->groupid == devices[i].group_id) {
+                break;
+            }
+        }
+
+        if (!group) {
+            if (!vdev->has_pm_reset) {
+                error_report("vfio: Cannot reset device %s, "
+                             "depends on group %d which is not owned.",
+                             vdev->vbasedev.name, devices[i].group_id);
+            }
+            ret = -EPERM;
+            goto out;
+        }
+
+        /* Prep dependent devices for reset and clear our marker. */
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            if (!vbasedev_iter->dev->realized ||
+                vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+            if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
+                if (single) {
+                    ret = -EINVAL;
+                    goto out_single;
+                }
+                vfio_pci_pre_reset(tmp);
+                tmp->vbasedev.needs_reset = false;
+                multi = true;
+                break;
+            }
+        }
+    }
+
+    if (!single && !multi) {
+        ret = -EINVAL;
+        goto out_single;
+    }
+
+    /* Determine how many group fds need to be passed */
+    count = 0;
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        for (i = 0; i < info->count; i++) {
+            if (group->groupid == devices[i].group_id) {
+                count++;
+                break;
+            }
+        }
+    }
+
+    reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
+    reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
+    fds = &reset->group_fds[0];
+
+    /* Fill in group fds */
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        for (i = 0; i < info->count; i++) {
+            if (group->groupid == devices[i].group_id) {
+                fds[reset->count++] = group->fd;
+                break;
+            }
+        }
+    }
+
+    /* Bus reset! */
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+    g_free(reset);
+
+    trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
+                                    ret ? strerror(errno) : "Success");
+
+out:
+    /* Re-enable INTx on affected devices */
+    for (i = 0; i < info->count; i++) {
+        PCIHostDeviceAddress host;
+        VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
+
+        host.domain = devices[i].segment;
+        host.bus = devices[i].bus;
+        host.slot = PCI_SLOT(devices[i].devfn);
+        host.function = PCI_FUNC(devices[i].devfn);
+
+        if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
+            continue;
+        }
+
+        QLIST_FOREACH(group, &vfio_group_list, next) {
+            if (group->groupid == devices[i].group_id) {
+                break;
+            }
+        }
+
+        if (!group) {
+            break;
+        }
+
+        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+            if (!vbasedev_iter->dev->realized ||
+                vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+                continue;
+            }
+            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+            if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
+                vfio_pci_post_reset(tmp);
+                break;
+            }
+        }
+    }
+out_single:
+    if (!single) {
+        vfio_pci_post_reset(vdev);
+    }
+    g_free(info);
+
+    return ret;
+}
+
 const VFIOIOMMUOps vfio_legacy_ops = {
     .dma_map = vfio_legacy_dma_map,
     .dma_unmap = vfio_legacy_dma_unmap,
@@ -929,4 +1105,5 @@ const VFIOIOMMUOps vfio_legacy_ops = {
     .detach_device = vfio_legacy_detach_device,
     .set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
     .query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
+    .pci_hot_reset = vfio_legacy_pci_hot_reset,
 };
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index eb55e8ae88..257dae6a87 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2374,7 +2374,7 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
     return 0;
 }
 
-static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
+void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint16_t cmd;
@@ -2411,7 +2411,7 @@ static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
     vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
 }
 
-static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
+void vfio_pci_post_reset(VFIOPCIDevice *vdev)
 {
     Error *err = NULL;
     int nr;
@@ -2435,16 +2435,6 @@ static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
     vfio_quirk_reset(vdev);
 }
 
-static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
-{
-    char tmp[13];
-
-    sprintf(tmp, "%04x:%02x:%02x.%1x", addr->domain,
-            addr->bus, addr->slot, addr->function);
-
-    return (strcmp(tmp, name) == 0);
-}
-
 int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
                                     struct vfio_pci_hot_reset_info **info_p)
 {
@@ -2485,166 +2475,10 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
 
 static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
 {
-    VFIOGroup *group;
-    struct vfio_pci_hot_reset_info *info = NULL;
-    struct vfio_pci_dependent_device *devices;
-    struct vfio_pci_hot_reset *reset;
-    int32_t *fds;
-    int ret, i, count;
-    bool multi = false;
-
-    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
-
-    if (!single) {
-        vfio_pci_pre_reset(vdev);
-    }
-    vdev->vbasedev.needs_reset = false;
-
-    ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
-
-    if (ret) {
-        goto out_single;
-    }
-    devices = &info->devices[0];
-
-    trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
-
-    /* Verify that we have all the groups required */
-    for (i = 0; i < info->count; i++) {
-        PCIHostDeviceAddress host;
-        VFIOPCIDevice *tmp;
-        VFIODevice *vbasedev_iter;
-
-        host.domain = devices[i].segment;
-        host.bus = devices[i].bus;
-        host.slot = PCI_SLOT(devices[i].devfn);
-        host.function = PCI_FUNC(devices[i].devfn);
-
-        trace_vfio_pci_hot_reset_dep_devices(host.domain,
-                host.bus, host.slot, host.function, devices[i].group_id);
-
-        if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
-            continue;
-        }
-
-        QLIST_FOREACH(group, &vfio_group_list, next) {
-            if (group->groupid == devices[i].group_id) {
-                break;
-            }
-        }
-
-        if (!group) {
-            if (!vdev->has_pm_reset) {
-                error_report("vfio: Cannot reset device %s, "
-                             "depends on group %d which is not owned.",
-                             vdev->vbasedev.name, devices[i].group_id);
-            }
-            ret = -EPERM;
-            goto out;
-        }
-
-        /* Prep dependent devices for reset and clear our marker. */
-        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
-            if (!vbasedev_iter->dev->realized ||
-                vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
-                continue;
-            }
-            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
-            if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
-                if (single) {
-                    ret = -EINVAL;
-                    goto out_single;
-                }
-                vfio_pci_pre_reset(tmp);
-                tmp->vbasedev.needs_reset = false;
-                multi = true;
-                break;
-            }
-        }
-    }
-
-    if (!single && !multi) {
-        ret = -EINVAL;
-        goto out_single;
-    }
-
-    /* Determine how many group fds need to be passed */
-    count = 0;
-    QLIST_FOREACH(group, &vfio_group_list, next) {
-        for (i = 0; i < info->count; i++) {
-            if (group->groupid == devices[i].group_id) {
-                count++;
-                break;
-            }
-        }
-    }
-
-    reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
-    reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
-    fds = &reset->group_fds[0];
-
-    /* Fill in group fds */
-    QLIST_FOREACH(group, &vfio_group_list, next) {
-        for (i = 0; i < info->count; i++) {
-            if (group->groupid == devices[i].group_id) {
-                fds[reset->count++] = group->fd;
-                break;
-            }
-        }
-    }
-
-    /* Bus reset! */
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
-    g_free(reset);
-
-    trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
-                                    ret ? strerror(errno) : "Success");
-
-out:
-    /* Re-enable INTx on affected devices */
-    for (i = 0; i < info->count; i++) {
-        PCIHostDeviceAddress host;
-        VFIOPCIDevice *tmp;
-        VFIODevice *vbasedev_iter;
-
-        host.domain = devices[i].segment;
-        host.bus = devices[i].bus;
-        host.slot = PCI_SLOT(devices[i].devfn);
-        host.function = PCI_FUNC(devices[i].devfn);
-
-        if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
-            continue;
-        }
-
-        QLIST_FOREACH(group, &vfio_group_list, next) {
-            if (group->groupid == devices[i].group_id) {
-                break;
-            }
-        }
-
-        if (!group) {
-            break;
-        }
-
-        QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
-            if (!vbasedev_iter->dev->realized ||
-                vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
-                continue;
-            }
-            tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
-            if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
-                vfio_pci_post_reset(tmp);
-                break;
-            }
-        }
-    }
-out_single:
-    if (!single) {
-        vfio_pci_post_reset(vdev);
-    }
-    g_free(info);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    const VFIOIOMMUOps *ops = vbasedev->bcontainer->ops;
 
-    return ret;
+    return ops->pci_hot_reset(vbasedev, single);
 }
 
 /*
-- 
2.34.1
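
The dispatch that vfio_pci_hot_reset performs after this patch can be
sketched in isolation as follows. This is a minimal, self-contained
illustration and not QEMU code: the Demo* types and demo_* functions are
hypothetical stand-ins for VFIODevice, VFIOContainerBase, VFIOIOMMUOps
and the legacy callback. The NULL check returning -ENOTSUP is an
assumption made for the sketch; the patch itself calls the callback
unconditionally.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QEMU types touched by the patch. */
typedef struct DemoDevice DemoDevice;

typedef struct DemoIOMMUOps {
    /* Backend-specific hot reset, same shape as the pci_hot_reset hook. */
    int (*pci_hot_reset)(DemoDevice *dev, bool single);
} DemoIOMMUOps;

typedef struct DemoContainer {
    const DemoIOMMUOps *ops;
} DemoContainer;

struct DemoDevice {
    DemoContainer *container;
    int resets;               /* counts resets, for illustration only */
};

/* Hypothetical "legacy" backend: only records that it was called. */
int demo_legacy_hot_reset(DemoDevice *dev, bool single)
{
    (void)single;
    dev->resets++;
    return 0;
}

const DemoIOMMUOps demo_legacy_ops = {
    .pci_hot_reset = demo_legacy_hot_reset,
};

/* A backend that does not implement the hook. */
const DemoIOMMUOps demo_empty_ops = {
    .pci_hot_reset = NULL,
};

/* Generic entry point: look up the backend's ops and forward the call. */
int demo_pci_hot_reset(DemoDevice *dev, bool single)
{
    const DemoIOMMUOps *ops;

    assert(dev && dev->container && dev->container->ops);
    ops = dev->container->ops;

    if (!ops->pci_hot_reset) {
        return -ENOTSUP;      /* assumption: not part of the patch */
    }
    return ops->pci_hot_reset(dev, single);
}
```

With this shape, a second backend only has to supply its own ops table;
the generic entry point and its callers stay unchanged.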