From: Lu Baolu
To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe, Kevin Tian, Eric Badger
Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu, Jason Gunthorpe
Subject: [PATCH v2 1/2] iommu: Add static iommu_ops->release_domain
Date: Thu, 29 Feb 2024 17:46:12 +0800
Message-Id: <20240229094613.121575-2-baolu.lu@linux.intel.com>
In-Reply-To: <20240229094613.121575-1-baolu.lu@linux.intel.com>

The current device_release callback for individual iommu drivers does the
following:

1) Silence IOMMU DMA translation: it detaches any existing domain from the
   device and puts it into a blocking state (some drivers might use the
   identity state).
2) Resource release: it releases resources allocated during the
   device_probe callback and restores the device to its pre-probe state.
Step 1 is challenging for individual iommu drivers because each must
check whether a domain is already attached to the device. Additionally,
if a deferred attach never occurred, device_release should avoid
modifying the hardware configuration, regardless of why it was called.

To simplify this process, introduce a static release_domain within the
iommu_ops structure. It can be either a blocking or an identity domain,
depending on the iommu hardware. The iommu core decides whether to attach
this domain before the device_release callback, eliminating repetitive
code in the various drivers.

Consequently, the device_release callback can focus solely on undoing
what device_probe did, including releasing all resources allocated
during that callback.

Co-developed-by: Jason Gunthorpe
Signed-off-by: Jason Gunthorpe
Signed-off-by: Lu Baolu
Reviewed-by: Kevin Tian
---
 include/linux/iommu.h |  1 +
 drivers/iommu/iommu.c | 19 +++++++++++++++----
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f08e6aa32657..2d44d4f01cc2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -585,6 +585,7 @@ struct iommu_ops {
 	struct module *owner;
 	struct iommu_domain *identity_domain;
 	struct iommu_domain *blocked_domain;
+	struct iommu_domain *release_domain;
 	struct iommu_domain *default_domain;
 };
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 210dc7b4c8cf..7158aa3d38af 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -461,13 +461,24 @@ static void iommu_deinit_device(struct device *dev)
 
 	/*
 	 * release_device() must stop using any attached domain on the device.
-	 * If there are still other devices in the group they are not effected
+	 * If there are still other devices in the group, they are not affected
 	 * by this callback.
 	 *
-	 * The IOMMU driver must set the device to either an identity or
-	 * blocking translation and stop using any domain pointer, as it is
-	 * going to be freed.
+	 * If the iommu driver provides release_domain, the core code ensures
+	 * that domain is attached prior to calling release_device. Drivers can
+	 * use this to enforce a translation on the idle iommu. Typically, the
+	 * global static blocked_domain is a good choice.
+	 *
+	 * Otherwise, the iommu driver must set the device to either an identity
+	 * or a blocking translation in release_device() and stop using any
+	 * domain pointer, as it is going to be freed.
+	 *
+	 * Regardless, if a delayed attach never occurred, then the release
+	 * should still avoid touching any hardware configuration either.
 	 */
+	if (!dev->iommu->attach_deferred && ops->release_domain)
+		ops->release_domain->ops->attach_dev(ops->release_domain, dev);
+
 	if (ops->release_device)
 		ops->release_device(dev);
 
-- 
2.34.1
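The ordering that patch 1/2 adds to iommu_deinit_device() can be modelled in user space. What follows is a simplified sketch under stated assumptions, not the kernel code. struct device, struct dev_iommu, struct iommu_ops and struct iommu_domain are cut down to the few fields the hunk touches. The names model_deinit_device(), blocked_attach(), driver_release_device(), trace_add() and release_trace() are hypothetical and exist only for illustration. The sketch shows that release_domain is attached before release_device() runs. It also shows that the attach is skipped entirely when the attach was deferred or when the driver provides no release_domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Cut-down stand-ins for the kernel structures (assumption: only the
 * fields the new iommu_deinit_device() hunk reads are modelled). */
struct device;
struct iommu_domain;

struct iommu_domain_ops {
	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
};

struct iommu_domain {
	const struct iommu_domain_ops *ops;
};

struct iommu_ops {
	struct iommu_domain *release_domain;    /* may be NULL */
	void (*release_device)(struct device *dev);
};

struct dev_iommu {
	bool attach_deferred;
};

struct device {
	struct dev_iommu *iommu;
	char trace[64];                         /* demo-only call log */
};

/* Append one step name to the device's call log, comma-separated. */
static void trace_add(struct device *dev, const char *step)
{
	if (dev->trace[0])
		strncat(dev->trace, ",", sizeof(dev->trace) - strlen(dev->trace) - 1);
	strncat(dev->trace, step, sizeof(dev->trace) - strlen(dev->trace) - 1);
}

/* Same decision and ordering as the hunk added by the patch. */
static void model_deinit_device(const struct iommu_ops *ops, struct device *dev)
{
	if (!dev->iommu->attach_deferred && ops->release_domain)
		ops->release_domain->ops->attach_dev(ops->release_domain, dev);

	if (ops->release_device)
		ops->release_device(dev);
}

static int blocked_attach(struct iommu_domain *domain, struct device *dev)
{
	assert(domain != NULL);
	trace_add(dev, "attach");
	return 0;
}

static void driver_release_device(struct device *dev)
{
	trace_add(dev, "release");
}

static const struct iommu_domain_ops blocked_ops = { .attach_dev = blocked_attach };
static struct iommu_domain blocked_domain = { .ops = &blocked_ops };

static char last_trace[64];

/* Hypothetical helper: run one release and return the recorded call order. */
const char *release_trace(bool attach_deferred, bool with_release_domain)
{
	struct dev_iommu param = { .attach_deferred = attach_deferred };
	struct device dev = { .iommu = &param };   /* trace starts zeroed */
	struct iommu_ops ops = {
		.release_domain = with_release_domain ? &blocked_domain : NULL,
		.release_device = driver_release_device,
	};

	model_deinit_device(&ops, &dev);
	memcpy(last_trace, dev.trace, sizeof(last_trace));
	return last_trace;
}
```

For example, release_trace(false, true) yields "attach,release". release_trace(true, true) yields just "release", which is the deferred-attach case the kdump kernel hits.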
From: Lu Baolu
To: Joerg Roedel, Will Deacon, Robin Murphy, Jason Gunthorpe, Kevin Tian, Eric Badger
Cc: iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu
Subject: [PATCH v2 2/2] iommu/vt-d: Fix NULL domain on device release
Date: Thu, 29 Feb 2024 17:46:13 +0800
Message-Id: <20240229094613.121575-3-baolu.lu@linux.intel.com>
In-Reply-To: <20240229094613.121575-1-baolu.lu@linux.intel.com>

In the kdump kernel, the IOMMU operates in deferred_attach mode. In this
mode, info->domain may not yet be assigned by the time the release_device
function is called. This leads to the following crash in the crash kernel:

    BUG: kernel NULL pointer dereference, address: 000000000000003c
    ...
    RIP: 0010:do_raw_spin_lock+0xa/0xa0
    ...
    _raw_spin_lock_irqsave+0x1b/0x30
    intel_iommu_release_device+0x96/0x170
    iommu_deinit_device+0x39/0xf0
    __iommu_group_remove_device+0xa0/0xd0
    iommu_bus_notifier+0x55/0xb0
    notifier_call_chain+0x5a/0xd0
    blocking_notifier_call_chain+0x41/0x60
    bus_notify+0x34/0x50
    device_del+0x269/0x3d0
    pci_remove_bus_device+0x77/0x100
    p2sb_bar+0xae/0x1d0
    ...
    i801_probe+0x423/0x740

Use the release_domain mechanism to fix it. The scalable-mode context
entry, which is not part of release_domain, should be cleared in
release_device().
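The condition that the patched intel_iommu_release_device() checks before clearing the scalable-mode context entry can be read as a plain predicate. This is a minimal sketch, assuming the three kernel helpers sm_supported(), dev_is_real_dma_subdevice() and context_copied() are reduced to booleans. struct release_state, should_teardown_sm_context() and teardown_decision() are hypothetical names. The point is that the decision no longer consults info->domain, which may still be NULL when the attach was deferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, flattened view of the per-device state the new release
 * path consults. Each flag stands in for a kernel helper of the same name.
 */
struct release_state {
	bool sm_supported;       /* sm_supported(iommu): scalable mode in use */
	bool real_dma_subdevice; /* dev_is_real_dma_subdevice(dev) */
	bool context_copied;     /* context_copied(iommu, bus, devfn) */
	const void *domain;      /* info->domain: may be NULL (deferred attach) */
};

/*
 * Mirrors the condition in the patched intel_iommu_release_device().
 * s->domain is deliberately never read.
 */
static bool should_teardown_sm_context(const struct release_state *s)
{
	assert(s != NULL);
	return s->sm_supported && !s->real_dma_subdevice && !s->context_copied;
}

/* Convenience wrapper for callers that only have the three flags. */
bool teardown_decision(bool sm, bool subdevice, bool copied)
{
	const struct release_state s = {
		.sm_supported = sm,
		.real_dma_subdevice = subdevice,
		.context_copied = copied,
		.domain = NULL,   /* the deferred-attach case from the crash */
	};

	return should_teardown_sm_context(&s);
}
```

As we understand it, context_copied() reports entries inherited from the previous kernel during kdump. With this condition, the release path leaves those entries alone instead of clearing them.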
Fixes: 586081d3f6b1 ("iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO")
Reported-by: Eric Badger
Closes: https://lore.kernel.org/r/20240113181713.1817855-1-ebadger@purestorage.com
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel/pasid.h |  1 +
 drivers/iommu/intel/iommu.c | 31 +++-----------
 drivers/iommu/intel/pasid.c | 83 +++++++++++++++++++++++++++++++++++++
 3 files changed, 90 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 487ede039bdd..42fda97fd851 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -318,4 +318,5 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 bool fault_ignore);
 void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
 					  struct device *dev, u32 pasid);
+void intel_pasid_teardown_sm_context(struct device *dev);
 #endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cc3994efd362..f74d42d3258f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3869,30 +3869,6 @@ static void domain_context_clear(struct device_domain_info *info)
 				   &domain_context_clear_one_cb, info);
 }
 
-static void dmar_remove_one_dev_info(struct device *dev)
-{
-	struct device_domain_info *info = dev_iommu_priv_get(dev);
-	struct dmar_domain *domain = info->domain;
-	struct intel_iommu *iommu = info->iommu;
-	unsigned long flags;
-
-	if (!dev_is_real_dma_subdevice(info->dev)) {
-		if (dev_is_pci(info->dev) && sm_supported(iommu))
-			intel_pasid_tear_down_entry(iommu, info->dev,
-					IOMMU_NO_PASID, false);
-
-		iommu_disable_pci_caps(info);
-		domain_context_clear(info);
-	}
-
-	spin_lock_irqsave(&domain->lock, flags);
-	list_del(&info->link);
-	spin_unlock_irqrestore(&domain->lock, flags);
-
-	domain_detach_iommu(domain, iommu);
-	info->domain = NULL;
-}
-
 /*
  * Clear the page table pointer in context or pasid table entries so that
  * all DMA requests without PASID from the device are blocked. If the page
@@ -4431,7 +4407,11 @@ static void intel_iommu_release_device(struct device *dev)
 	mutex_lock(&iommu->iopf_lock);
 	device_rbtree_remove(info);
 	mutex_unlock(&iommu->iopf_lock);
-	dmar_remove_one_dev_info(dev);
+
+	if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev) &&
+	    !context_copied(iommu, info->bus, info->devfn))
+		intel_pasid_teardown_sm_context(dev);
+
 	intel_pasid_free_table(dev);
 	intel_iommu_debugfs_remove_dev(info);
 	kfree(info);
@@ -4922,6 +4902,7 @@ static const struct iommu_dirty_ops intel_dirty_ops = {
 
 const struct iommu_ops intel_iommu_ops = {
 	.blocked_domain		= &blocking_domain,
+	.release_domain		= &blocking_domain,
 	.capable		= intel_iommu_capable,
 	.hw_info		= intel_iommu_hw_info,
 	.domain_alloc		= intel_iommu_domain_alloc,
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 108158e2b907..52068cf52fe2 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -667,3 +667,86 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
 
 	return 0;
 }
+
+/*
+ * Interfaces to setup or teardown a pasid table to the scalable-mode
+ * context table entry:
+ */
+
+/*
+ * Cache invalidation for changes to a scalable-mode context table
+ * entry.
+ *
+ * Section 6.5.3.3 of the VT-d spec:
+ * - Device-selective context-cache invalidation;
+ * - Domain-selective PASID-cache invalidation to affected domains
+ *   (can be skipped if all PASID entries were not-present);
+ * - Domain-selective IOTLB invalidation to affected domains;
+ * - Global Device-TLB invalidation to affected functions.
+ *
+ * Note that RWBF (Required Write-Buffer Flushing) capability has
+ * been deprecated for scalable mode. Section 11.4.2 of the VT-d spec:
+ *
+ * HRWBF: Hardware implementations reporting Scalable Mode Translation
+ *        Support (SMTS) as Set also report this field as Clear.
+ */
+static void sm_context_flush_caches(struct device *dev)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+
+	iommu->flush.flush_context(iommu, 0, PCI_DEVID(info->bus, info->devfn),
+				   DMA_CCMD_MASK_NOBIT, DMA_CCMD_DEVICE_INVL);
+	qi_flush_pasid_cache(iommu, 0, QI_PC_GLOBAL, 0);
+	iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH);
+	devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID);
+}
+
+static void context_entry_teardown_pasid_table(struct intel_iommu *iommu,
+					       struct context_entry *context)
+{
+	context_clear_entry(context);
+	if (!ecap_coherent(iommu->ecap))
+		clflush_cache_range(context, sizeof(*context));
+}
+
+static void device_pasid_table_teardown(struct device *dev, u8 bus, u8 devfn)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	struct context_entry *context;
+
+	spin_lock(&iommu->lock);
+	context = iommu_context_addr(iommu, bus, devfn, false);
+	if (!context) {
+		spin_unlock(&iommu->lock);
+		return;
+	}
+
+	context_entry_teardown_pasid_table(iommu, context);
+	spin_unlock(&iommu->lock);
+
+	sm_context_flush_caches(dev);
+}
+
+static int pci_pasid_table_teardown(struct pci_dev *pdev, u16 alias, void *data)
+{
+	struct device *dev = data;
+
+	if (dev == &pdev->dev)
+		device_pasid_table_teardown(dev, PCI_BUS_NUM(alias), alias & 0xff);
+
+	return 0;
+}
+
+void intel_pasid_teardown_sm_context(struct device *dev)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+
+	if (!dev_is_pci(dev)) {
+		device_pasid_table_teardown(dev, info->bus, info->devfn);
+		return;
+	}
+
+	pci_for_each_dma_alias(to_pci_dev(dev), pci_pasid_table_teardown, dev);
+}
-- 
2.34.1
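The alias walk in intel_pasid_teardown_sm_context() can be modelled as follows. This is a hedged user-space sketch, not the driver. model_for_each_alias() is a hypothetical stand-in for pci_for_each_dma_alias() that replays a fixed, made-up list of (owner, alias) reports. The owner IDs, the aliases and every model_* name are assumptions. The sketch shows three things. First, a 16-bit requester ID is split into bus and devfn the same way pci_pasid_table_teardown() does it. Second, reports made on behalf of another device (for example an upstream bridge) are skipped. Third, after each cleared entry, sm_context_flush_caches() issues its invalidations in a fixed order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same bit split as the kernel's PCI_BUS_NUM(alias) and (alias & 0xff). */
#define MODEL_PCI_BUS_NUM(x) (((x) >> 8) & 0xff)

struct alias_report {
	int owner;      /* hypothetical ID of the device the alias is reported for */
	uint16_t alias; /* requester ID: bus in bits 15:8, devfn in bits 7:0 */
};

struct teardown_log {
	uint8_t bus[8];
	uint8_t devfn[8];
	int n;               /* number of context entries torn down */
	char flushes[128];   /* comma-separated invalidation order */
};

static void log_flush(struct teardown_log *log, const char *step)
{
	size_t len = strlen(log->flushes);

	if (len)
		strncat(log->flushes, ",", sizeof(log->flushes) - len - 1);
	strncat(log->flushes, step, sizeof(log->flushes) - strlen(log->flushes) - 1);
}

/* Models device_pasid_table_teardown(): clear one entry, then flush. */
static void model_device_teardown(struct teardown_log *log, uint8_t bus, uint8_t devfn)
{
	assert(log->n < 8);
	log->bus[log->n] = bus;
	log->devfn[log->n] = devfn;
	log->n++;

	/* Same order as sm_context_flush_caches() in the patch. */
	log_flush(log, "context");  /* device-selective context cache */
	log_flush(log, "pasid");    /* global PASID cache */
	log_flush(log, "iotlb");    /* global IOTLB */
	log_flush(log, "devtlb");   /* device TLB for IOMMU_NO_PASID */
}

/* Models pci_pasid_table_teardown(): only act on the device's own reports. */
static int model_alias_cb(int owner, uint16_t alias, int self, struct teardown_log *log)
{
	if (owner == self)
		model_device_teardown(log, MODEL_PCI_BUS_NUM(alias), alias & 0xff);
	return 0;   /* 0 means "keep walking", as in the kernel callback */
}

/* Hypothetical stand-in for pci_for_each_dma_alias(). */
static void model_for_each_alias(const struct alias_report *walk, size_t n,
				 int self, struct teardown_log *log)
{
	for (size_t i = 0; i < n; i++)
		if (model_alias_cb(walk[i].owner, walk[i].alias, self, log))
			break;
}

/*
 * Made-up walk: device 1 reports its own RID 00:1f.1 (0x00f9) and a DMA
 * alias 00:1f.0 (0x00f8); device 2, an upstream bridge, reports 03:00.0.
 */
const struct teardown_log *teardown_demo(void)
{
	static struct teardown_log log;
	static const struct alias_report walk[] = {
		{ 1, 0x00f9 }, { 1, 0x00f8 }, { 2, 0x0300 },
	};

	memset(&log, 0, sizeof(log));
	model_for_each_alias(walk, sizeof(walk) / sizeof(walk[0]), 1, &log);
	return &log;
}
```

In this model only the two aliases owned by device 1 are torn down. Each teardown is followed by the full context, PASID, IOTLB and device-TLB flush sequence, mirroring the patch's choice to flush per cleared entry rather than batching.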