From: Lu Baolu
To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
Cc: Dmytro Maluka, Samiullah Khawaja, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu
Subject: [PATCH 1/8] iommu: Lift and generalize the STE/CD update code from SMMUv3
Date: Mon, 9 Mar 2026 14:06:41 +0800
Message-ID: <20260309060648.276762-2-baolu.lu@linux.intel.com>
In-Reply-To: <20260309060648.276762-1-baolu.lu@linux.intel.com>

From: Jason Gunthorpe

Many IOMMU implementations store data structures in host memory that can be
quite big.
The IOMMU DMA-reads the host memory in atomic quanta, usually 64 or 128 bits,
and reads an entry using multiple quanta reads.

Updating a host memory data structure entry while the HW is concurrently
DMA'ing it is a little bit involved, but if you want to do this hitlessly,
while never making the entry non-valid, then it becomes quite complicated.

entry_sync is a library to handle this task. It works on the notion of "used
bits", which reflect which bits the HW is actually sensitive to and which bits
are ignored by hardware. Many hardware specifications say things like 'if mode
is X then bits ABC are ignored'.

Using the ignored bits, entry_sync can often compute a series of ordered
writes and flushes that will allow the entry to be updated while keeping it
valid. If such an update is not possible, the entry will be made temporarily
non-valid.

64-bit and 128-bit quanta versions are provided to support existing IOMMUs.

Co-developed-by: Lu Baolu
Signed-off-by: Lu Baolu
Signed-off-by: Jason Gunthorpe
---
 drivers/iommu/Kconfig               |  14 +++
 drivers/iommu/Makefile              |   1 +
 drivers/iommu/entry_sync.h          |  66 +++++++++++++
 drivers/iommu/entry_sync_template.h | 143 ++++++++++++++++++++++++++++
 drivers/iommu/entry_sync.c          |  68 +++++++++++++
 5 files changed, 292 insertions(+)
 create mode 100644 drivers/iommu/entry_sync.h
 create mode 100644 drivers/iommu/entry_sync_template.h
 create mode 100644 drivers/iommu/entry_sync.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f86262b11416..2650c9fa125b 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -145,6 +145,20 @@ config IOMMU_DEFAULT_PASSTHROUGH
 
 endchoice
 
+config IOMMU_ENTRY_SYNC
+	bool
+	default n
+
+config IOMMU_ENTRY_SYNC64
+	bool
+	select IOMMU_ENTRY_SYNC
+	default n
+
+config IOMMU_ENTRY_SYNC128
+	bool
+	select IOMMU_ENTRY_SYNC
+	default n
+
 config OF_IOMMU
 	def_bool y
 	depends on OF && IOMMU_API
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 0275821f4ef9..bd923995497a 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
 obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_ENTRY_SYNC) += entry_sync.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
diff --git a/drivers/iommu/entry_sync.h b/drivers/iommu/entry_sync.h
new file mode 100644
index 000000000000..004d421c71c0
--- /dev/null
+++ b/drivers/iommu/entry_sync.h
@@ -0,0 +1,66 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Many IOMMU implementations store data structures in host memory that can be
+ * quite big. The iommu is able to DMA read the host memory using an atomic
+ * quanta, usually 64 or 128 bits, and will read an entry using multiple quanta
+ * reads.
+ *
+ * Updating the host memory data structure entry while the HW is concurrently
+ * DMA'ing it is a little bit involved, but if you want to do this hitlessly,
+ * while never making the entry non-valid, then it becomes quite complicated.
+ *
+ * entry_sync is a library to handle this task. It works on the notion of "used
+ * bits" which reflect which bits the HW is actually sensitive to and which bits
+ * are ignored by hardware. Many hardware specifications say things like 'if
+ * mode is X then bits ABC are ignored'.
+ *
+ * Using the ignored bits, entry_sync can often compute a series of ordered
+ * writes and flushes that will allow the entry to be updated while keeping it
+ * valid. If such an update is not possible then entry will be made temporarily
+ * non-valid.
+ *
+ * 64-bit and 128-bit quanta versions are provided to support existing iommus.
+ */
+#ifndef IOMMU_ENTRY_SYNC_H
+#define IOMMU_ENTRY_SYNC_H
+
+#include
+#include
+#include
+
+/* Caller allocates a stack array of this length to call entry_sync_write() */
+#define ENTRY_SYNC_MEMORY_LEN(writer) ((writer)->num_quantas * 3)
+
+struct entry_sync_writer_ops64;
+struct entry_sync_writer64 {
+	const struct entry_sync_writer_ops64 *ops;
+	size_t num_quantas;
+	size_t vbit_quanta;
+};
+
+struct entry_sync_writer_ops64 {
+	void (*get_used)(const __le64 *entry, __le64 *used);
+	void (*sync)(struct entry_sync_writer64 *writer);
+};
+
+void entry_sync_write64(struct entry_sync_writer64 *writer, __le64 *entry,
+			const __le64 *target, __le64 *memory,
+			size_t memory_len);
+
+struct entry_sync_writer_ops128;
+struct entry_sync_writer128 {
+	const struct entry_sync_writer_ops128 *ops;
+	size_t num_quantas;
+	size_t vbit_quanta;
+};
+
+struct entry_sync_writer_ops128 {
+	void (*get_used)(const u128 *entry, u128 *used);
+	void (*sync)(struct entry_sync_writer128 *writer);
+};
+
+void entry_sync_write128(struct entry_sync_writer128 *writer, u128 *entry,
+			 const u128 *target, u128 *memory,
+			 size_t memory_len);
+
+#endif
diff --git a/drivers/iommu/entry_sync_template.h b/drivers/iommu/entry_sync_template.h
new file mode 100644
index 000000000000..646f518b098e
--- /dev/null
+++ b/drivers/iommu/entry_sync_template.h
@@ -0,0 +1,143 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#include "entry_sync.h"
+#include
+#include
+
+#ifndef entry_sync_writer
+#define entry_sync_writer entry_sync_writer64
+#define quanta_t __le64
+#define NS(name) CONCATENATE(name, 64)
+#endif
+
+/*
+ * Figure out if we can do a hitless update of entry to become target. Returns a
+ * bit mask where 1 indicates that a quanta word needs to be set disruptively.
+ * unused_update is an intermediate value of entry that has unused bits set to
+ * their new values.
+ */
+static u8 NS(entry_quanta_diff)(struct entry_sync_writer *writer,
+				const quanta_t *entry, const quanta_t *target,
+				quanta_t *unused_update, quanta_t *memory)
+{
+	quanta_t *target_used = memory + writer->num_quantas * 1;
+	quanta_t *cur_used = memory + writer->num_quantas * 2;
+	u8 used_qword_diff = 0;
+	unsigned int i;
+
+	writer->ops->get_used(entry, cur_used);
+	writer->ops->get_used(target, target_used);
+
+	for (i = 0; i != writer->num_quantas; i++) {
+		/*
+		 * Check that masks are up to date, the make functions are not
+		 * allowed to set a bit to 1 if the used function doesn't say it
+		 * is used.
+		 */
+		WARN_ON_ONCE(target[i] & ~target_used[i]);
+
+		/* Bits can change because they are not currently being used */
+		unused_update[i] = (entry[i] & cur_used[i]) |
+				   (target[i] & ~cur_used[i]);
+		/*
+		 * Each bit indicates that a used bit in a qword needs to be
+		 * changed after unused_update is applied.
+		 */
+		if ((unused_update[i] & target_used[i]) != target[i])
+			used_qword_diff |= 1 << i;
+	}
+	return used_qword_diff;
+}
+
+/*
+ * Update the entry to the target configuration. The transition from the current
+ * entry to the target entry takes place over multiple steps that attempt to
+ * make the transition hitless if possible. This function takes care not to
+ * create a situation where the HW can perceive a corrupted entry. HW is only
+ * required to have a quanta-bit atomicity with stores from the CPU, while
+ * entries are many quanta bit values big.
+ *
+ * The difference between the current value and the target value is analyzed to
+ * determine which of three updates is required: disruptive, hitless or no
+ * change.
+ *
+ * In the most general disruptive case we can make any update in three steps:
+ *  - Disrupting the entry (V=0)
+ *  - Fill now unused quanta words, except the vbit quanta which contains V
+ *  - Make the vbit quanta have the final value and valid (V=1) with a single
+ *    quanta store
+ *
+ * However this disrupts the HW while it is happening. There are several
+ * interesting cases where an entry can be updated without disturbing the HW
+ * because only a small number of bits are changing (S1DSS, CONFIG, etc) or
+ * because the used bits don't intersect. We can detect this by calculating how
+ * many quanta need updating after adjusting the unused bits and skip the
+ * V=0 process. This relies on the IGNORED behavior described in the
+ * specification.
+ */
+void NS(entry_sync_write)(struct entry_sync_writer *writer, quanta_t *entry,
+			  const quanta_t *target, quanta_t *memory,
+			  size_t memory_len)
+{
+	quanta_t *unused_update = memory + writer->num_quantas * 0;
+	u8 used_qword_diff;
+
+	if (WARN_ON(memory_len !=
+		    ENTRY_SYNC_MEMORY_LEN(writer) * sizeof(*memory)))
+		return;
+
+	used_qword_diff = NS(entry_quanta_diff)(writer, entry, target,
+						unused_update, memory);
+	if (hweight8(used_qword_diff) == 1) {
+		/*
+		 * Only one quanta needs its used bits to be changed. This is a
+		 * hitless update, update all bits the current entry is ignoring
+		 * to their new values, then update a single "critical quanta"
+		 * to change the entry and finally 0 out any bits that are now
+		 * unused in the target configuration.
+		 */
+		unsigned int critical_qword_index = ffs(used_qword_diff) - 1;
+
+		/*
+		 * Skip writing unused bits in the critical quanta since we'll
+		 * be writing it in the next step anyway. This can save a sync
+		 * when the only change is in that quanta.
+		 */
+		unused_update[critical_qword_index] =
+			entry[critical_qword_index];
+		NS(entry_set)(writer, entry, unused_update, 0,
+			      writer->num_quantas);
+		NS(entry_set)(writer, entry, target, critical_qword_index, 1);
+		NS(entry_set)(writer, entry, target, 0, writer->num_quantas);
+	} else if (used_qword_diff) {
+		/*
+		 * At least two quantas need their inuse bits to be changed.
+		 * This requires a breaking update: zero the V bit, write all
+		 * other quantas, then set the quanta holding the V bit.
+		 */
+		unused_update[writer->vbit_quanta] = 0;
+		NS(entry_set)(writer, entry, unused_update, writer->vbit_quanta, 1);
+
+		if (writer->vbit_quanta != 0)
+			NS(entry_set)(writer, entry, target, 0,
+				      writer->vbit_quanta);
+		if (writer->vbit_quanta != writer->num_quantas - 1)
+			NS(entry_set)(writer, entry, target,
+				      writer->vbit_quanta + 1,
+				      writer->num_quantas - writer->vbit_quanta - 1);
+
+		NS(entry_set)(writer, entry, target, writer->vbit_quanta, 1);
+	} else {
+		/*
+		 * No inuse bit changed. Sanity check that all unused bits are 0
+		 * in the entry. The target was already sanity checked by
+		 * entry_quanta_diff().
+		 */
+		WARN_ON_ONCE(NS(entry_set)(writer, entry, target, 0,
+					   writer->num_quantas));
+	}
+}
+EXPORT_SYMBOL(NS(entry_sync_write));
+
+#undef entry_sync_writer
+#undef quanta_t
+#undef NS
diff --git a/drivers/iommu/entry_sync.c b/drivers/iommu/entry_sync.c
new file mode 100644
index 000000000000..48d31270dbba
--- /dev/null
+++ b/drivers/iommu/entry_sync.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Helpers for drivers to update multi-quanta entries shared with HW without
+ * races to minimize breaking changes.
+ */
+#include "entry_sync.h"
+#include
+#include
+
+#if IS_ENABLED(CONFIG_IOMMU_ENTRY_SYNC64)
+static bool entry_set64(struct entry_sync_writer64 *writer, __le64 *entry,
+			const __le64 *target, unsigned int start,
+			unsigned int len)
+{
+	bool changed = false;
+	unsigned int i;
+
+	for (i = start; len != 0; len--, i++) {
+		if (entry[i] != target[i]) {
+			WRITE_ONCE(entry[i], target[i]);
+			changed = true;
+		}
+	}
+
+	if (changed)
+		writer->ops->sync(writer);
+	return changed;
+}
+
+#define entry_sync_writer entry_sync_writer64
+#define quanta_t __le64
+#define NS(name) CONCATENATE(name, 64)
+#include "entry_sync_template.h"
+#endif
+
+#if IS_ENABLED(CONFIG_IOMMU_ENTRY_SYNC128)
+static bool entry_set128(struct entry_sync_writer128 *writer, u128 *entry,
+			 const u128 *target, unsigned int start,
+			 unsigned int len)
+{
+	bool changed = false;
+	unsigned int i;
+
+	for (i = start; len != 0; len--, i++) {
+		if (entry[i] != target[i]) {
+			/*
+			 * Use cmpxchg128 to generate an indivisible write from
+			 * the CPU to DMA'able memory. This must ensure that HW
+			 * sees either the new or old 128 bit value and not
+			 * something torn. As updates are serialized by a
+			 * spinlock, we use the local (unlocked) variant to
+			 * avoid unnecessary bus locking overhead.
+			 */
+			cmpxchg128_local(&entry[i], entry[i], target[i]);
+			changed = true;
+		}
+	}
+
+	if (changed)
+		writer->ops->sync(writer);
+	return changed;
+}
+
+#define entry_sync_writer entry_sync_writer128
+#define quanta_t u128
+#define NS(name) CONCATENATE(name, 128)
+#include "entry_sync_template.h"
+#endif
-- 
2.43.0
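The used-bits test in entry_quanta_diff() and the three-way choice in entry_sync_write() can be tried outside the kernel. The following is a minimal user-space sketch, not the library itself: it assumes a hypothetical two-quanta entry with 64-bit quanta (a V bit, a MODE bit, and fields that each mode ignores), and the names toy_get_used(), classify_update(), enum update_kind and the TOY_* masks are invented for illustration. It only reports which kind of update would be chosen; it performs no ordered writes or syncs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 2-quanta entry layout, for illustration only:
 *   q[0] bit 0       V     valid
 *   q[0] bit 1       MODE  0 = mode A, 1 = mode B
 *   q[0] bits 8-15   A_LO  used only in mode A
 *   q[1] bits 32-47  A_HI  used only in mode A
 *   q[1] bits 0-31   B     used only in mode B
 */
#define TOY_NUM_QUANTA	2
#define TOY_V		(UINT64_C(1) << 0)
#define TOY_MODE	(UINT64_C(1) << 1)
#define TOY_A_LO	(UINT64_C(0xff) << 8)
#define TOY_A_HI	(UINT64_C(0xffff) << 32)
#define TOY_B		UINT64_C(0xffffffff)

/* Plays the role of a get_used() op: the bits the toy HW looks at. */
static void toy_get_used(const uint64_t *entry, uint64_t *used)
{
	used[0] = TOY_V;	/* V always matters */
	used[1] = 0;
	if (!(entry[0] & TOY_V))
		return;		/* nothing else matters in an invalid entry */
	used[0] |= TOY_MODE;
	if (entry[0] & TOY_MODE) {
		used[1] |= TOY_B;
	} else {
		used[0] |= TOY_A_LO;
		used[1] |= TOY_A_HI;
	}
}

enum update_kind { UPDATE_NONE, UPDATE_HITLESS, UPDATE_DISRUPTIVE };

/*
 * Same test as entry_quanta_diff(): let every bit the current entry ignores
 * take its new value, then flag each quanta whose target-used bits still
 * differ. One flagged quanta means hitless; two or more means disruptive.
 */
static enum update_kind classify_update(const uint64_t *entry,
					const uint64_t *target, size_t n)
{
	uint64_t cur_used[TOY_NUM_QUANTA], target_used[TOY_NUM_QUANTA];
	unsigned int diff = 0;
	size_t i;

	assert(n == TOY_NUM_QUANTA);
	toy_get_used(entry, cur_used);
	toy_get_used(target, target_used);

	for (i = 0; i < n; i++) {
		uint64_t unused_update;

		/* Like the WARN_ON_ONCE(): target must not set ignored bits. */
		assert((target[i] & ~target_used[i]) == 0);

		unused_update = (entry[i] & cur_used[i]) |
				(target[i] & ~cur_used[i]);
		if ((unused_update & target_used[i]) != target[i])
			diff |= 1u << i;
	}

	if (diff == 0)
		return UPDATE_NONE;
	if ((diff & (diff - 1)) == 0)	/* exactly one bit set */
		return UPDATE_HITLESS;
	return UPDATE_DISRUPTIVE;
}
```

Under these assumptions, switching a valid entry from mode A to mode B is classified as hitless: the mode-B field lives in bits that mode A ignores, so it can be filled in first, and only quanta 0 (V and MODE) then changes with one store. Rewriting the mode-A fields in both quanta at once is classified as disruptive.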
From: Lu Baolu
To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
Cc: Dmytro Maluka, Samiullah Khawaja, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu
Subject: [PATCH 2/8] iommu/vt-d: Add entry_sync support for PASID entry updates
Date: Mon, 9 Mar 2026 14:06:42 +0800
Message-ID: <20260309060648.276762-3-baolu.lu@linux.intel.com>
In-Reply-To: <20260309060648.276762-1-baolu.lu@linux.intel.com>

Updating PASID table entries while the device hardware is possibly
performing DMA concurrently is complex. Traditionally, this required a
"clear-then-update" approach: clearing the Present bit, flushing caches,
updating the entry, and then restoring the Present bit. This causes
unnecessary latency or interruptions for transactions that might not even
be affected by the specific bits being changed.

Plumb this driver into the generic entry_sync library to modernize this
process. The library uses the concept of "used bits" to determine whether a
transition can be performed "hitlessly" (the used bits change in only one
128-bit quanta, which is then switched with a single atomic 128-bit store)
or whether a disruptive 3-step update is truly required.

The implementation includes:

- intel_pasid_get_used(): Defines which bits the IOMMU hardware is
  sensitive to based on the PGTT.
- intel_pasid_sync(): Handles the required clflushes, PASID cache
  invalidations, and IOTLB/Dev-TLB flushes required between update steps.
- 128-bit atomicity: Depends on IOMMU_ENTRY_SYNC128 to ensure that 512-bit
  PASID entries are updated in atomic 128-bit quanta, preventing the
  hardware from ever seeing a "torn" entry.

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel/Kconfig |   2 +
 drivers/iommu/intel/pasid.c | 173 ++++++++++++++++++++++++++++++++++++
 2 files changed, 175 insertions(+)

diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index 5471f814e073..7fa31b9d4ef4 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -26,6 +26,8 @@ config INTEL_IOMMU
 	select PCI_ATS
 	select PCI_PRI
 	select PCI_PASID
+	select IOMMU_ENTRY_SYNC
+	select IOMMU_ENTRY_SYNC128
 	help
 	  DMA remapping (DMAR) devices support enables independent address
 	  translations for Direct Memory Access (DMA) from devices.
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 9d30015b8940..5b9eb5c8f42d 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -21,12 +21,185 @@
 #include "iommu.h"
 #include "pasid.h"
 #include "../iommu-pages.h"
+#include "../entry_sync.h"
 
 /*
  * Intel IOMMU system wide PASID name space:
  */
 u32 intel_pasid_max_id = PASID_MAX;
 
+/*
+ * Plumb into the generic entry_sync library:
+ */
+static struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid);
+static void pasid_flush_caches(struct intel_iommu *iommu, struct pasid_entry *pte,
+			       u32 pasid, u16 did);
+static void intel_pasid_flush_present(struct intel_iommu *iommu, struct device *dev,
+				      u32 pasid, u16 did, struct pasid_entry *pte);
+static void pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
+						u16 did, u32 pasid);
+static void devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
+					   struct device *dev, u32 pasid);
+
+struct intel_pasid_writer {
+	struct entry_sync_writer128 writer;
+	struct intel_iommu *iommu;
+	struct device *dev;
+	u32 pasid;
+	struct pasid_entry orig_pte;
+	bool was_present;
+};
+
+/*
+ * Identify which bits of the 512-bit entry the HW is using. The "Used" bits
+ * are those that, if changed, would cause the IOMMU to behave differently
+ * for an active transaction.
+ */
+static void intel_pasid_get_used(const u128 *entry, u128 *used)
+{
+	struct pasid_entry *pe = (struct pasid_entry *)entry;
+	struct pasid_entry *ue = (struct pasid_entry *)used;
+	u16 pgtt;
+
+	/* Initialize used bits to 0. */
+	memset(ue, 0, sizeof(*ue));
+
+	/* Present bit always matters. */
+	ue->val[0] |= PASID_PTE_PRESENT;
+
+	/* Nothing more for non-present entries. */
+	if (!(pe->val[0] & PASID_PTE_PRESENT))
+		return;
+
+	pgtt = pasid_pte_get_pgtt(pe);
+	switch (pgtt) {
+	case PASID_ENTRY_PGTT_FL_ONLY:
+		/* AW, PGTT */
+		ue->val[0] |= GENMASK_ULL(4, 2) | GENMASK_ULL(8, 6);
+		/* DID, PWSNP, PGSNP */
+		ue->val[1] |= GENMASK_ULL(24, 23) | GENMASK_ULL(15, 0);
+		/* FSPTPTR, FSPM */
+		ue->val[2] |= GENMASK_ULL(63, 12) | GENMASK_ULL(3, 2);
+		break;
+	case PASID_ENTRY_PGTT_NESTED:
+		/* FPD, AW, PGTT, SSADE, SSPTPTR */
+		ue->val[0] |= GENMASK_ULL(63, 12) | GENMASK_ULL(9, 6) |
+			      GENMASK_ULL(4, 1);
+		/* PGSNP, DID, PWSNP */
+		ue->val[1] |= GENMASK_ULL(24, 23) | GENMASK_ULL(15, 0);
+		/* FSPTPTR, FSPM, EAFE, WPE, SRE */
+		ue->val[2] |= GENMASK_ULL(63, 12) | BIT_ULL(7) |
+			      GENMASK_ULL(4, 2) | BIT_ULL(0);
+		break;
+	case PASID_ENTRY_PGTT_SL_ONLY:
+		/* FPD, AW, PGTT, SSADE, SSPTPTR */
+		ue->val[0] |= GENMASK_ULL(63, 12) | GENMASK_ULL(9, 6) |
+			      GENMASK_ULL(4, 1);
+		/* DID, PWSNP */
+		ue->val[1] |= GENMASK_ULL(15, 0) | BIT_ULL(23);
+		break;
+	case PASID_ENTRY_PGTT_PT:
+		/* FPD, AW, PGTT */
+		ue->val[0] |= GENMASK_ULL(4, 2) | GENMASK_ULL(8, 6) | BIT_ULL(1);
+		/* DID, PWSNP */
+		ue->val[1] |= GENMASK_ULL(15, 0) | BIT_ULL(23);
+		break;
+	default:
+		WARN_ON(true);
+	}
+}
+
+static void intel_pasid_sync(struct entry_sync_writer128 *writer)
+{
+	struct intel_pasid_writer *p_writer = container_of(writer,
+			struct intel_pasid_writer, writer);
+	struct intel_iommu *iommu = p_writer->iommu;
+	struct device *dev = p_writer->dev;
+	bool was_present, is_present;
+	u32 pasid = p_writer->pasid;
+	struct pasid_entry *pte;
+	u16 old_did, old_pgtt;
+
+	pte = intel_pasid_get_entry(dev, pasid);
+	was_present = p_writer->was_present;
+	is_present = pasid_pte_is_present(pte);
+	old_did = pasid_get_domain_id(&p_writer->orig_pte);
+	old_pgtt = pasid_pte_get_pgtt(&p_writer->orig_pte);
+
+	/* Update the last present state: */
+	p_writer->was_present = is_present;
+
+	if (!ecap_coherent(iommu->ecap))
+		clflush_cache_range(pte, sizeof(*pte));
+
+	/* Sync for "P=0" to "P=1": */
+	if (!was_present) {
+		if (is_present)
+			pasid_flush_caches(iommu, pte, pasid,
+					   pasid_get_domain_id(pte));
+
+		return;
+	}
+
+	/* Sync for "P=1" to "P=1": */
+	if (is_present) {
+		intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+		return;
+	}
+
+	/* Sync for "P=1" to "P=0": */
+	pasid_cache_invalidation_with_pasid(iommu, old_did, pasid);
+
+	if (old_pgtt == PASID_ENTRY_PGTT_PT || old_pgtt == PASID_ENTRY_PGTT_FL_ONLY)
+		qi_flush_piotlb(iommu, old_did, pasid, 0, -1, 0);
+	else
+		iommu->flush.flush_iotlb(iommu, old_did, 0, 0, DMA_TLB_DSI_FLUSH);
+
+	devtlb_invalidation_with_pasid(iommu, dev, pasid);
+}
+
+static const struct entry_sync_writer_ops128 writer_ops128 = {
+	.get_used = intel_pasid_get_used,
+	.sync = intel_pasid_sync,
+};
+
+#define INTEL_PASID_SYNC_MEM_COUNT 12
+
+static int __maybe_unused intel_pasid_write(struct intel_iommu *iommu,
+					    struct device *dev, u32 pasid,
+					    u128 *target)
+{
+	struct pasid_entry *pte = intel_pasid_get_entry(dev, pasid);
+	struct intel_pasid_writer p_writer = {
+		.writer = {
+			.ops = &writer_ops128,
+			/* 512 bits total (4 * 128-bit chunks) */
+			.num_quantas = 4,
+			/* The 'P' bit is in the first 128-bit chunk */
+			.vbit_quanta = 0,
+		},
+		.iommu = iommu,
+		.dev = dev,
+		.pasid = pasid,
+	};
+	u128 memory[INTEL_PASID_SYNC_MEM_COUNT];
+
+	if (!pte)
+		return -ENODEV;
+
+	p_writer.orig_pte = *pte;
+	p_writer.was_present = pasid_pte_is_present(pte);
+
+	/*
+	 * The library now does the heavy lifting:
+	 * 1. Checks if it can do a 1-quanta hitless flip.
+	 * 2. If not, it does a 3-step V=0 (disruptive) update.
+	 */
+	entry_sync_write128(&p_writer.writer, (u128 *)pte, target, memory, sizeof(memory));
+
+	return 0;
+}
+
 /*
  * Per device pasid table management:
  */
-- 
2.43.0
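intel_pasid_sync() recovers its driver context from the embedded generic writer with container_of() and picks a flush based on the Present-bit transition of each step. The following user-space sketch shows only that dispatch; struct toy_writer, struct toy_pasid_writer and enum toy_flush are hypothetical, simplified stand-ins for the structures in the patch, and no cache flushes or invalidations are actually performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct entry_sync_writer128. */
struct toy_writer {
	void (*sync)(struct toy_writer *writer);
};

/* Which flush a real driver would issue for a step (recorded only). */
enum toy_flush {
	TOY_FLUSH_NONE,		/* P=0 -> P=0: no invalidation needed */
	TOY_FLUSH_NEW_ENTRY,	/* P=0 -> P=1: like pasid_flush_caches() */
	TOY_FLUSH_PRESENT,	/* P=1 -> P=1: like intel_pasid_flush_present() */
	TOY_FLUSH_TEARDOWN,	/* P=1 -> P=0: PASID cache, IOTLB, dev-TLB */
};

/* Hypothetical stand-in for struct intel_pasid_writer. */
struct toy_pasid_writer {
	struct toy_writer writer;	/* embedded generic writer */
	bool was_present;		/* P bit after the previous step */
	bool is_present;		/* P bit now in memory (set by caller) */
	enum toy_flush last_flush;	/* decision made by the last sync */
};

static void toy_pasid_sync(struct toy_writer *writer)
{
	struct toy_pasid_writer *pw =
		container_of(writer, struct toy_pasid_writer, writer);
	bool was = pw->was_present;
	bool is = pw->is_present;

	/* Remember the state for the next step, as intel_pasid_sync() does. */
	pw->was_present = is;

	if (!was)
		pw->last_flush = is ? TOY_FLUSH_NEW_ENTRY : TOY_FLUSH_NONE;
	else if (is)
		pw->last_flush = TOY_FLUSH_PRESENT;
	else
		pw->last_flush = TOY_FLUSH_TEARDOWN;
}
```

With vbit_quanta = 0, a disruptive update would drive these callbacks in the order P=1 to P=0 (teardown flushes), then P=0 to P=0 while the remaining quanta are written, then P=0 to P=1 when the first quanta is made valid again.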
From: Lu Baolu
To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
Cc: Dmytro Maluka, Samiullah Khawaja, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu
Subject: [PATCH 3/8] iommu/vt-d: Require CMPXCHG16B for PASID support
Date: Mon, 9 Mar 2026 14:06:43 +0800
Message-ID: <20260309060648.276762-4-baolu.lu@linux.intel.com>
In-Reply-To: <20260309060648.276762-1-baolu.lu@linux.intel.com>

The Intel IOMMU driver is moving toward using the generic entry_sync
library for PASID
table entry updates. This library requires 128-bit atomic write operations (cmpxchg128) to update 512-bit PASID entries in atomic quanta, ensuring the hardware never observes a torn entry. On x86_64, 128-bit atomicity is provided by the CMPXCHG16B instruction. Update the driver to: 1. Limit INTEL_IOMMU to X86_64, as 128-bit atomic operations are not available on 32-bit x86. 2. Gate pasid_supported() on the presence of X86_FEATURE_CX16. 3. Provide a boot-time warning if a PASID-capable IOMMU is detected on a CPU lacking the required instruction. Signed-off-by: Lu Baolu --- drivers/iommu/intel/Kconfig | 2 +- drivers/iommu/intel/iommu.h | 3 ++- drivers/iommu/intel/iommu.c | 4 ++++ 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig index 7fa31b9d4ef4..fee7fea9dfcb 100644 --- a/drivers/iommu/intel/Kconfig +++ b/drivers/iommu/intel/Kconfig @@ -11,7 +11,7 @@ config DMAR_DEBUG =20 config INTEL_IOMMU bool "Support for Intel IOMMU using DMA Remapping Devices" - depends on PCI_MSI && ACPI && X86 + depends on PCI_MSI && ACPI && X86_64 select IOMMU_API select GENERIC_PT select IOMMU_PT diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 599913fb65d5..54b58d01d0cb 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -535,7 +535,8 @@ enum { =20 #define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap)) #define pasid_supported(iommu) (sm_supported(iommu) && \ - ecap_pasid((iommu)->ecap)) + ecap_pasid((iommu)->ecap) && \ + boot_cpu_has(X86_FEATURE_CX16)) #define ssads_supported(iommu) (sm_supported(iommu) && \ ecap_slads((iommu)->ecap) && \ ecap_smpwc(iommu->ecap)) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index ef7613b177b9..5369526e89d0 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -2647,6 +2647,10 @@ int __init intel_iommu_init(void) pr_info_once("IOMMU batching disallowed due to 
virtualization\n"); iommu_set_dma_strict(); } + + if (ecap_pasid(iommu->ecap) && !boot_cpu_has(X86_FEATURE_CX16)) + pr_info_once("PASID disabled due to lack of CMPXCHG16B support.\n"); + iommu_device_sysfs_add(&iommu->iommu, NULL, intel_iommu_groups, "%s", iommu->name); --=20 2.43.0 From nobody Thu Apr 9 12:06:18 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A3C4B3502B9 for ; Mon, 9 Mar 2026 06:09:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773036594; cv=none; b=JF27fpSm63KMnRFTVvbg8RyGN5kgJ5ay/BxSL/+8978EkRHeIplZibhxP3cThSruj03fjdXnvDYZtufDwgcrPXYi1aNcJ/j55QnWNBokb+CDvyM5xCbLO26L5nD9cJ0qrau+ebU6l9ZKUOjDGglC6h2bgdSHWS+slJBJksYleYU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773036594; c=relaxed/simple; bh=deio4nPCsVkZq6SwU0doaBbEpKvzGPXhSnPfuCmiHQk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WKxJgoUcHapILVB0qMErrSO60hzh/vbfqQcKbGBu1Ojo/mz+McbjF6mIgpwpcweJ0HqyQn2avJOM2f+9rkL40sd5nHphyF3BwoPdCntI72FDWbmAuJaiYu5nIXZ8aVvx+b7j/QqgQlbkVVGcRUzctJzsgiuk7ZNAugGJEDzvPi4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=pass smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=T0DxM+Tq; arc=none smtp.client-ip=192.198.163.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="T0DxM+Tq" DKIM-Signature: v=1; 
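To make the "torn entry" concern concrete, here is a user-space sketch, assuming a little-endian host and an entry laid out as eight 64-bit words. It views the 512-bit entry as four 128-bit quanta (the unit a single cmpxchg128 can replace) and reports which quanta differ between two versions of an entry. The struct and helper names are hypothetical, not driver or library API, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <stdint.h>
#include <string.h>

typedef unsigned __int128 u128;

/* Hypothetical stand-in for a 512-bit PASID entry: eight 64-bit words. */
struct pasid_entry_model {
	uint64_t val[8];
};

#define PASID_QUANTA 4 /* 512 bits / 128 bits per cmpxchg128 */

/*
 * View the entry as four 128-bit quanta, like a (u128 *) cast would.
 * memcpy avoids strict-aliasing issues in user space; on a little-endian
 * CPU, val[2q] becomes the low half of quantum q.
 */
static void entry_to_quanta(const struct pasid_entry_model *pe,
			    u128 q[PASID_QUANTA])
{
	memcpy(q, pe->val, sizeof(pe->val));
}

/*
 * Return a bitmask of the quanta that differ between @cur and @next.
 * One cmpxchg128 replaces exactly one quantum, so a change with more
 * than one bit set here cannot be published in a single atomic store.
 */
static unsigned int changed_quanta(const struct pasid_entry_model *cur,
				   const struct pasid_entry_model *next)
{
	u128 a[PASID_QUANTA], b[PASID_QUANTA];
	unsigned int mask = 0;

	entry_to_quanta(cur, a);
	entry_to_quanta(next, b);
	for (int i = 0; i < PASID_QUANTA; i++)
		if (a[i] != b[i])
			mask |= 1u << i;
	return mask;
}
```

Words val[0] and val[1] share quantum 0, so changing both together is still a single-quantum update; touching val[2] and val[7] spans quanta 1 and 3 and needs a multi-step sequence.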
From: Lu Baolu
To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
Cc: Dmytro Maluka, Samiullah Khawaja, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu
Subject: [PATCH 4/8] iommu/vt-d: Add trace events for PASID entry sync updates
Date: Mon, 9 Mar 2026 14:06:44 +0800
Message-ID: <20260309060648.276762-5-baolu.lu@linux.intel.com>

The entry_sync library introduces a more complex, multi-step update
process for PASID table entries to enable hitless transitions. Add a set
of trace events specifically for the Intel PASID sync plumbing.

The new trace events are:

- entry_write_start / entry_write_complete: Captures the low 256 bits
  (the first two 128-bit quanta) of the PASID entry, together with the
  target, before and after the entry_sync library performs its update.
  This allows the final entry to be verified against the target.

- entry_get_used: Logs the same two quanta of the current entry
  alongside the calculated "used bits" mask. This is critical for
  debugging the library's decision on whether an update can be hitless
  or must be disruptive.

- entry_sync: Tracks the state transition (was_present vs. is_present)
  within the entry_sync callback. This helps verify that the correct
  cache invalidations and IOTLB flushes are triggered for each
  transition (e.g., P=1 to P=1 hitless vs. P=1 to P=0 disruptive).
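The hitless-versus-disruptive decision that entry_get_used exposes can be modeled as below. This is a simplified user-space sketch, assuming the entry_sync library keeps the per-quantum "used bits" comparison of the SMMUv3 code it was lifted from (patch 1/8), with 128-bit quanta instead of 64-bit qwords. All names are hypothetical and are not the library's API.

```c
#include <assert.h>

typedef unsigned __int128 u128;

#define NUM_QUANTA 4 /* 512-bit entry split into 128-bit atomic quanta */

enum update_kind {
	UPDATE_UNUSED_ONLY, /* only bits the hardware ignores change */
	UPDATE_HITLESS,     /* used bits change in exactly one quantum */
	UPDATE_DISRUPTIVE,  /* used bits change in several quanta */
};

/*
 * Bits the hardware currently uses keep their current value; all other
 * bits may take the target value early. If the result still differs
 * from the target in a bit the target uses, that quantum needs a
 * "used" change.
 */
static unsigned int used_quanta_diff(const u128 *cur, const u128 *cur_used,
				     const u128 *target,
				     const u128 *target_used)
{
	unsigned int diff = 0;

	for (int i = 0; i < NUM_QUANTA; i++) {
		u128 unused_update;

		/* The target must not set bits its own used mask excludes. */
		assert((target[i] & ~target_used[i]) == 0);

		unused_update = (cur[i] & cur_used[i]) |
				(target[i] & ~cur_used[i]);
		if ((unused_update & target_used[i]) != target[i])
			diff |= 1u << i;
	}
	return diff;
}

static enum update_kind classify_update(const u128 *cur, const u128 *cur_used,
					const u128 *target,
					const u128 *target_used)
{
	unsigned int diff = used_quanta_diff(cur, cur_used, target,
					     target_used);

	if (!diff)
		return UPDATE_UNUSED_ONLY;
	/* A single bit set: one quantum can be flipped atomically. */
	if ((diff & (diff - 1)) == 0)
		return UPDATE_HITLESS;
	return UPDATE_DISRUPTIVE;
}
```

In the disruptive case the code comment in intel_pasid_write() describes a 3-step update that first makes the entry non-present; the sketch only classifies and does not model that sequence.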

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel/trace.h | 107 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.c |  11 +++-
 2 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/trace.h b/drivers/iommu/intel/trace.h
index 6311ba3f1691..b0ccda6f8dc5 100644
--- a/drivers/iommu/intel/trace.h
+++ b/drivers/iommu/intel/trace.h
@@ -181,6 +181,113 @@ DEFINE_EVENT(cache_tag_flush, cache_tag_flush_range_np,
 		 unsigned long addr, unsigned long pages, unsigned long mask),
 	TP_ARGS(tag, start, end, addr, pages, mask)
 );
+
+DECLARE_EVENT_CLASS(entry_write,
+	TP_PROTO(struct device *dev, u32 pasid, u128 *target, u128 *curr),
+	TP_ARGS(dev, pasid, target, curr),
+
+	TP_STRUCT__entry(
+		__string(dev, dev_name(dev))
+		__field(u32, pasid)
+		__field(u64, t_w3)
+		__field(u64, t_w2)
+		__field(u64, t_w1)
+		__field(u64, t_w0)
+		__field(u64, c_w3)
+		__field(u64, c_w2)
+		__field(u64, c_w1)
+		__field(u64, c_w0)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev);
+		__entry->pasid = pasid;
+		/* Target Entry */
+		__entry->t_w0 = (u64)target[0];
+		__entry->t_w1 = (u64)(target[0] >> 64);
+		__entry->t_w2 = (u64)target[1];
+		__entry->t_w3 = (u64)(target[1] >> 64);
+		/* Current Entry */
+		__entry->c_w0 = (u64)curr[0];
+		__entry->c_w1 = (u64)(curr[0] >> 64);
+		__entry->c_w2 = (u64)curr[1];
+		__entry->c_w3 = (u64)(curr[1] >> 64);
+	),
+
+	TP_printk("%s[%u] target %016llx:%016llx:%016llx:%016llx, current %016llx:%016llx:%016llx:%016llx",
+		  __get_str(dev), __entry->pasid,
+		  __entry->t_w3, __entry->t_w2, __entry->t_w1, __entry->t_w0,
+		  __entry->c_w3, __entry->c_w2, __entry->c_w1, __entry->c_w0
+	)
+);
+
+DEFINE_EVENT(entry_write, entry_write_start,
+	TP_PROTO(struct device *dev, u32 pasid, u128 *target, u128 *curr),
+	TP_ARGS(dev, pasid, target, curr)
+);
+
+DEFINE_EVENT(entry_write, entry_write_complete,
+	TP_PROTO(struct device *dev, u32 pasid, u128 *target, u128 *curr),
+	TP_ARGS(dev, pasid, target, curr)
+);
+
+TRACE_EVENT(entry_get_used,
+	TP_PROTO(const u128 *pe, u128 *used),
+	TP_ARGS(pe, used),
+
+	TP_STRUCT__entry(
+		__field(u64, e_w3)
+		__field(u64, e_w2)
+		__field(u64, e_w1)
+		__field(u64, e_w0)
+		__field(u64, u_w3)
+		__field(u64, u_w2)
+		__field(u64, u_w1)
+		__field(u64, u_w0)
+	),
+
+	TP_fast_assign(
+		__entry->e_w0 = (u64)pe[0];
+		__entry->e_w1 = (u64)(pe[0] >> 64);
+		__entry->e_w2 = (u64)pe[1];
+		__entry->e_w3 = (u64)(pe[1] >> 64);
+
+		__entry->u_w0 = (u64)used[0];
+		__entry->u_w1 = (u64)(used[0] >> 64);
+		__entry->u_w2 = (u64)used[1];
+		__entry->u_w3 = (u64)(used[1] >> 64);
+	),
+
+	TP_printk("entry %016llx:%016llx:%016llx:%016llx, used %016llx:%016llx:%016llx:%016llx",
+		  __entry->e_w3, __entry->e_w2, __entry->e_w1, __entry->e_w0,
+		  __entry->u_w3, __entry->u_w2, __entry->u_w1, __entry->u_w0
+	)
+);
+
+TRACE_EVENT(entry_sync,
+	TP_PROTO(struct device *dev, u32 pasid, bool was_present, bool is_present),
+	TP_ARGS(dev, pasid, was_present, is_present),
+
+	TP_STRUCT__entry(
+		__string(dev, dev_name(dev))
+		__field(u32, pasid)
+		__field(bool, was_present)
+		__field(bool, is_present)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev);
+		__entry->pasid = pasid;
+		__entry->was_present = was_present;
+		__entry->is_present = is_present;
+	),
+
+	TP_printk("%s[%u] was %s, is now %s",
+		  __get_str(dev), __entry->pasid,
+		  __entry->was_present ? "present" : "non-present",
+		  __entry->is_present ? "present" : "non-present"
+	)
+);
 #endif /* _TRACE_INTEL_IOMMU_H */
 
 /* This part must be outside protection */
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 5b9eb5c8f42d..b7c8888afaef 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -20,6 +20,7 @@
 
 #include "iommu.h"
 #include "pasid.h"
+#include "trace.h"
 #include "../iommu-pages.h"
 #include "../entry_sync.h"
 
@@ -68,8 +69,10 @@ static void intel_pasid_get_used(const u128 *entry, u128 *used)
 	ue->val[0] |= PASID_PTE_PRESENT;
 
 	/* Nothing more for non-present entries. */
-	if (!(pe->val[0] & PASID_PTE_PRESENT))
+	if (!(pe->val[0] & PASID_PTE_PRESENT)) {
+		trace_entry_get_used(entry, used);
 		return;
+	}
 
 	pgtt = pasid_pte_get_pgtt(pe);
 	switch (pgtt) {
@@ -107,6 +110,8 @@ static void intel_pasid_get_used(const u128 *entry, u128 *used)
 	default:
 		WARN_ON(true);
 	}
+
+	trace_entry_get_used(entry, used);
 }
 
 static void intel_pasid_sync(struct entry_sync_writer128 *writer)
@@ -132,6 +137,8 @@ static void intel_pasid_sync(struct entry_sync_writer128 *writer)
 	if (!ecap_coherent(iommu->ecap))
 		clflush_cache_range(pte, sizeof(*pte));
 
+	trace_entry_sync(dev, pasid, was_present, is_present);
+
 	/* Sync for "P=0" to "P=1": */
 	if (!was_present) {
 		if (is_present)
@@ -195,7 +202,9 @@ static int __maybe_unused intel_pasid_write(struct intel_iommu *iommu,
 	 * 1. Checks if it can do a 1-quanta hitless flip.
 	 * 2. If not, it does a 3-step V=0 (disruptive) update.
 	 */
+	trace_entry_write_start(dev, pasid, target, (u128 *)pte);
 	entry_sync_write128(&p_writer.writer, (u128 *)pte, target, memory, sizeof(memory));
+	trace_entry_write_complete(dev, pasid, target, (u128 *)pte);
 
 	return 0;
 }
-- 
2.43.0
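The TP_fast_assign() blocks above split each u128 into two u64 halves and TP_printk() prints the most significant word first. The same packing can be sketched in user space as follows; the helper names are hypothetical, and `unsigned __int128` is a GCC/Clang extension standing in for the kernel's u128.

```c
#include <stdint.h>
#include <stdio.h>

typedef unsigned __int128 u128;

/*
 * Split two 128-bit quanta into four 64-bit words, as the trace events
 * do: w0/w1 are the low/high halves of quantum 0, w2/w3 of quantum 1.
 */
static void split_quanta(const u128 q[2], uint64_t w[4])
{
	w[0] = (uint64_t)q[0];
	w[1] = (uint64_t)(q[0] >> 64);
	w[2] = (uint64_t)q[1];
	w[3] = (uint64_t)(q[1] >> 64);
}

/* Format most-significant word first, matching the TP_printk() layout. */
static int format_entry(char *buf, size_t len, const u128 q[2])
{
	uint64_t w[4];

	split_quanta(q, w);
	return snprintf(buf, len, "%016llx:%016llx:%016llx:%016llx",
			(unsigned long long)w[3], (unsigned long long)w[2],
			(unsigned long long)w[1], (unsigned long long)w[0]);
}
```

For quanta built as `((u128)2 << 64) | 1` and `((u128)4 << 64) | 3`, format_entry() produces a 67-character string whose first field ends in `4` and whose last field ends in `1`.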
From: Lu Baolu
To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
Cc: Dmytro Maluka, Samiullah Khawaja, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu
Subject: [PATCH 5/8] iommu/vt-d: Use intel_pasid_write() for first-stage setup
Date: Mon, 9 Mar 2026 14:06:45 +0800
Message-ID: <20260309060648.276762-6-baolu.lu@linux.intel.com>

Refactor intel_pasid_setup_first_level() to use the intel_pasid_write()
helper. With the move to the entry_sync library, the driver now builds
the target entry in a local buffer and hands it off to
intel_pasid_write().

This refactoring removes the need for __domain_setup_first_level(),
simplifies locking by relying on the group mutex instead of
iommu->lock, and ensures a consistent update path for all first-stage
setups.

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel/iommu.h |  5 -----
 drivers/iommu/intel/iommu.c | 16 +++-------------
 drivers/iommu/intel/pasid.c | 36 +++++++++---------------------------
 drivers/iommu/intel/svm.c   |  5 ++---
 4 files changed, 14 insertions(+), 48 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 54b58d01d0cb..fd6ca3b7f594 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1202,11 +1202,6 @@ domain_add_dev_pasid(struct iommu_domain *domain,
 		     struct device *dev, ioasid_t pasid);
 void domain_remove_dev_pasid(struct iommu_domain *domain,
 			     struct device *dev, ioasid_t pasid);
-
-int __domain_setup_first_level(struct intel_iommu *iommu, struct device *dev,
-			       ioasid_t pasid, u16 did, phys_addr_t fsptptr,
-			       int flags, struct iommu_domain *old);
-
 int dmar_ir_support(void);
 
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 5369526e89d0..db5e8dad50dc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1248,16 +1248,6 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
 	__iommu_flush_cache(iommu, context, sizeof(*context));
 }
 
-int __domain_setup_first_level(struct intel_iommu *iommu, struct device *dev,
-			       ioasid_t pasid, u16 did, phys_addr_t fsptptr,
-			       int flags, struct iommu_domain *old)
-{
-	if (old)
-		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
-
-	return intel_pasid_setup_first_level(iommu, dev, fsptptr, pasid, did, flags);
-}
-
 static int domain_setup_second_level(struct intel_iommu *iommu,
 				     struct dmar_domain *domain,
 				     struct device *dev, ioasid_t pasid,
@@ -1301,9 +1291,9 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
 					       BIT(PT_FEAT_DMA_INCOHERENT)))
 		flags |= PASID_FLAG_PWSNP;
 
-	return __domain_setup_first_level(iommu, dev, pasid,
-					  domain_id_iommu(domain, iommu),
-					  pt_info.gcr3_pt, flags, old);
+	return intel_pasid_setup_first_level(iommu, dev, pt_info.gcr3_pt, pasid,
+					     domain_id_iommu(domain, iommu),
+					     flags);
 }
 
 static int dmar_domain_attach_device(struct dmar_domain *domain,
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index b7c8888afaef..8ea1ac8cbf5e 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -172,9 +172,8 @@ static const struct entry_sync_writer_ops128 writer_ops128 = {
 
 #define INTEL_PASID_SYNC_MEM_COUNT 12
 
-static int __maybe_unused intel_pasid_write(struct intel_iommu *iommu,
-					    struct device *dev, u32 pasid,
-					    u128 *target)
+static int intel_pasid_write(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, u128 *target)
 {
 	struct pasid_entry *pte = intel_pasid_get_entry(dev, pasid);
 	struct intel_pasid_writer p_writer = {
@@ -531,17 +530,14 @@ static void intel_pasid_flush_present(struct intel_iommu *iommu,
 
 /*
  * Set up the scalable mode pasid table entry for first only
- * translation type.
+ * translation type. Caller should zero out the entry before
+ * calling.
  */
 static void pasid_pte_config_first_level(struct intel_iommu *iommu,
 					 struct pasid_entry *pte,
 					 phys_addr_t fsptptr, u16 did,
 					 int flags)
 {
-	lockdep_assert_held(&iommu->lock);
-
-	pasid_clear_entry(pte);
-
 	/* Setup the first level page table pointer: */
 	pasid_set_flptr(pte, fsptptr);
 
@@ -564,7 +560,9 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu, struct device *dev,
 				  phys_addr_t fsptptr, u32 pasid, u16 did,
 				  int flags)
 {
-	struct pasid_entry *pte;
+	struct pasid_entry new_pte = {0};
+
+	iommu_group_mutex_assert(dev);
 
 	if (!ecap_flts(iommu->ecap)) {
 		pr_err("No first level translation support on %s\n",
@@ -578,25 +576,9 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu, struct device *dev,
 		return -EINVAL;
 	}
 
-	spin_lock(&iommu->lock);
-	pte = intel_pasid_get_entry(dev, pasid);
-	if (!pte) {
-		spin_unlock(&iommu->lock);
-		return -ENODEV;
-	}
+	pasid_pte_config_first_level(iommu, &new_pte, fsptptr, did, flags);
 
-	if (pasid_pte_is_present(pte)) {
-		spin_unlock(&iommu->lock);
-		return -EBUSY;
-	}
-
-	pasid_pte_config_first_level(iommu, pte, fsptptr, did, flags);
-
-	spin_unlock(&iommu->lock);
-
-	pasid_flush_caches(iommu, pte, pasid, did);
-
-	return 0;
+	return intel_pasid_write(iommu, dev, pasid, (u128 *)&new_pte);
 }
 
 /*
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index fea10acd4f02..978d63073e3b 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -171,9 +171,8 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 	/* Setup the pasid table: */
 	sflags = cpu_feature_enabled(X86_FEATURE_LA57) ? PASID_FLAG_FL5LP : 0;
 	sflags |= PASID_FLAG_PWSNP;
-	ret = __domain_setup_first_level(iommu, dev, pasid,
-					 FLPT_DEFAULT_DID, __pa(mm->pgd),
-					 sflags, old);
+	ret = intel_pasid_setup_first_level(iommu, dev, __pa(mm->pgd),
+					    pasid, FLPT_DEFAULT_DID, sflags);
 	if (ret)
 		goto out_unwind_iopf;
 
-- 
2.43.0
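The new "Caller should zero out the entry before calling" contract matters because the config helpers no longer call pasid_clear_entry() and only rewrite the fields they own. The sketch below illustrates this in user space. It assumes the field setters do a masked read-modify-write in the style of the driver's pasid_set_bits(). The struct, helpers and field positions are hypothetical and chosen for illustration only; they are not taken from this series.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct pasid_entry: eight 64-bit words. */
struct pasid_entry_model {
	uint64_t val[8];
};

/* Masked read-modify-write, assumed to mirror pasid_set_bits(). */
static void model_set_bits(uint64_t *ptr, uint64_t mask, uint64_t bits)
{
	*ptr = (*ptr & ~mask) | bits;
}

/* Illustrative field positions (assumptions for this sketch). */
#define MODEL_DID_MASK   0xffffULL                     /* val[1] bits 15:0 */
#define MODEL_PGTT_SHIFT 6
#define MODEL_PGTT_MASK  (0x7ULL << MODEL_PGTT_SHIFT) /* val[0] bits 8:6 */

/*
 * Like the pasid_pte_config_*() helpers after this series, only the
 * fields this function owns are written; the entry is not cleared.
 */
static void model_config_entry(struct pasid_entry_model *pe, uint16_t did,
			       uint64_t pgtt)
{
	model_set_bits(&pe->val[1], MODEL_DID_MASK, did);
	model_set_bits(&pe->val[0], MODEL_PGTT_MASK,
		       (pgtt << MODEL_PGTT_SHIFT) & MODEL_PGTT_MASK);
}
```

Starting from `struct pasid_entry_model new_pte = {0};`, as the refactored setup functions do, yields an entry containing only the configured fields. Reusing a buffer with stale contents would leak those stale bits into the target that is handed to intel_pasid_write().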
From: Lu Baolu
To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
Cc: Dmytro Maluka, Samiullah Khawaja, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu
Subject: [PATCH 6/8] iommu/vt-d: Use intel_pasid_write() for second-stage setup
Date: Mon, 9 Mar 2026 14:06:46 +0800
Message-ID: <20260309060648.276762-7-baolu.lu@linux.intel.com>

Refactor intel_pasid_setup_second_level() to use the intel_pasid_write()
helper. As with the first-stage setup, move the second-stage setup logic
to the entry_sync library by constructing the target PASID entry in a
local buffer and committing it via intel_pasid_write().

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel/iommu.c | 19 ++++---------------
 drivers/iommu/intel/pasid.c | 26 ++++----------------------
 2 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index db5e8dad50dc..b98020ac9de2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1248,17 +1248,6 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
 	__iommu_flush_cache(iommu, context, sizeof(*context));
 }
 
-static int domain_setup_second_level(struct intel_iommu *iommu,
-				     struct dmar_domain *domain,
-				     struct device *dev, ioasid_t pasid,
-				     struct iommu_domain *old)
-{
-	if (old)
-		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
-
-	return intel_pasid_setup_second_level(iommu, domain, dev, pasid);
-}
-
 static int domain_setup_passthrough(struct intel_iommu *iommu,
 				    struct device *dev, ioasid_t pasid,
 				    struct iommu_domain *old)
@@ -1323,8 +1312,8 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
 		ret = domain_setup_first_level(iommu, domain, dev,
 					       IOMMU_NO_PASID, NULL);
 	else if (intel_domain_is_ss_paging(domain))
-		ret = domain_setup_second_level(iommu, domain, dev,
-						IOMMU_NO_PASID, NULL);
+		ret = intel_pasid_setup_second_level(iommu, domain,
+						     dev, IOMMU_NO_PASID);
 	else if (WARN_ON(true))
 		ret = -EINVAL;
 
@@ -3634,8 +3623,8 @@ static int intel_iommu_set_dev_pasid(struct iommu_domain *domain,
 		ret = domain_setup_first_level(iommu, dmar_domain,
 					       dev, pasid, old);
 	else if (intel_domain_is_ss_paging(dmar_domain))
-		ret = domain_setup_second_level(iommu, dmar_domain,
-						dev, pasid, old);
+		ret = intel_pasid_setup_second_level(iommu, dmar_domain,
+						     dev, pasid);
 	else if (WARN_ON(true))
 		ret = -EINVAL;
 
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 8ea1ac8cbf5e..3084afb3d4a1 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -590,10 +590,7 @@ static void pasid_pte_config_second_level(struct intel_iommu *iommu,
 {
 	struct pt_iommu_vtdss_hw_info pt_info;
 
-	lockdep_assert_held(&iommu->lock);
-
 	pt_iommu_vtdss_hw_info(&domain->sspt, &pt_info);
-	pasid_clear_entry(pte);
 	pasid_set_domain_id(pte, did);
 	pasid_set_slptr(pte, pt_info.ssptptr);
 	pasid_set_address_width(pte, pt_info.aw);
@@ -611,9 +608,10 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid)
 {
-	struct pasid_entry *pte;
+	struct pasid_entry new_pte = {0};
 	u16 did;
 
+	iommu_group_mutex_assert(dev);
 
 	/*
 	 * If hardware advertises no support for second level
@@ -626,25 +624,9 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 	}
 
 	did = domain_id_iommu(domain, iommu);
+	pasid_pte_config_second_level(iommu, &new_pte, domain, did);
 
-	spin_lock(&iommu->lock);
-	pte = intel_pasid_get_entry(dev, pasid);
-	if (!pte) {
-		spin_unlock(&iommu->lock);
-		return -ENODEV;
-	}
-
-	if (pasid_pte_is_present(pte)) {
-		spin_unlock(&iommu->lock);
-		return -EBUSY;
-	}
-
-	pasid_pte_config_second_level(iommu, pte, domain, did);
-	spin_unlock(&iommu->lock);
-
-	pasid_flush_caches(iommu, pte, pasid, did);
-
-	return 0;
+	return intel_pasid_write(iommu, dev, pasid, (u128 *)&new_pte);
 }
 
 /*
-- 
2.43.0
From: Lu Baolu
To: Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Jason Gunthorpe
Cc: Dmytro Maluka, Samiullah Khawaja, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu
Subject: [PATCH 7/8] iommu/vt-d: Use intel_pasid_write() for pass-through setup
Date: Mon, 9 Mar 2026 14:06:47 +0800
Message-ID: <20260309060648.276762-8-baolu.lu@linux.intel.com>

Refactor intel_pasid_setup_pass_through() to use the intel_pasid_write()
helper. Move the pass-through setup implementation to the entry_sync
library: the target PASID entry is now constructed locally and committed
via the centralized intel_pasid_write() wrapper.

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel/iommu.c | 12 +-----------
 drivers/iommu/intel/pasid.c | 26 ++++----------------------
 2 files changed, 5 insertions(+), 33 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b98020ac9de2..f1f9fafd3984 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1248,16 +1248,6 @@ static void domain_context_clear_one(struct device_domain_info *info, u8 bus, u8
 	__iommu_flush_cache(iommu, context, sizeof(*context));
 }
 
-static int domain_setup_passthrough(struct intel_iommu *iommu,
-				    struct device *dev, ioasid_t pasid,
-				    struct iommu_domain *old)
-{
-	if (old)
-		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
-
-	return intel_pasid_setup_pass_through(iommu, dev, pasid);
-}
-
 static int domain_setup_first_level(struct intel_iommu *iommu,
 				    struct dmar_domain *domain,
 				    struct device *dev,
@@ -3848,7 +3838,7 @@ static int identity_domain_set_dev_pasid(struct iommu_domain *domain,
 	if (ret)
 		return ret;
 
-	ret = domain_setup_passthrough(iommu, dev, pasid, old);
+	ret = intel_pasid_setup_pass_through(iommu, dev, pasid);
 	if (ret) {
 		iopf_for_domain_replace(old, domain, dev);
 		return ret;
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 3084afb3d4a1..cb55ff422d7d 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -704,9 +704,6 @@ int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
 static void pasid_pte_config_pass_through(struct intel_iommu *iommu,
 					  struct pasid_entry *pte, u16 did)
 {
-	lockdep_assert_held(&iommu->lock);
-
-	pasid_clear_entry(pte);
 	pasid_set_domain_id(pte, did);
 	pasid_set_address_width(pte, iommu->agaw);
 	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_PT);
@@ -718,27 +715,12 @@ static void pasid_pte_config_pass_through(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct device *dev, u32 pasid)
 {
-	u16 did = FLPT_DEFAULT_DID;
-	struct pasid_entry *pte;
+	struct pasid_entry new_pte = {0};
 
-	spin_lock(&iommu->lock);
-	pte = intel_pasid_get_entry(dev, pasid);
-	if (!pte) {
-		spin_unlock(&iommu->lock);
-		return -ENODEV;
-	}
+	iommu_group_mutex_assert(dev);
+	pasid_pte_config_pass_through(iommu, &new_pte, FLPT_DEFAULT_DID);
 
-	if (pasid_pte_is_present(pte)) {
-		spin_unlock(&iommu->lock);
-		return -EBUSY;
-	}
-
-	pasid_pte_config_pass_through(iommu, pte, did);
-	spin_unlock(&iommu->lock);
-
-	pasid_flush_caches(iommu, pte, pasid, did);
-
-	return 0;
+	return intel_pasid_write(iommu, dev, pasid, (u128 *)&new_pte);
 }
 
 /*
-- 
2.43.0
From: Lu Baolu
To: Joerg Roedel , Will Deacon , Robin Murphy , Kevin Tian , Jason Gunthorpe
Cc: Dmytro Maluka , Samiullah Khawaja , iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu
Subject: [PATCH 8/8] iommu/vt-d: Use intel_pasid_write() for nested setup
Date: Mon, 9 Mar 2026 14:06:48 +0800
Message-ID: <20260309060648.276762-9-baolu.lu@linux.intel.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260309060648.276762-1-baolu.lu@linux.intel.com>
References: <20260309060648.276762-1-baolu.lu@linux.intel.com>

Refactor intel_pasid_setup_nested() to use the intel_pasid_write()
helper. Move the implementation to the entry_sync infrastructure: the
nested PASID entry is now constructed in a zero-initialized local
buffer and committed via the centralized intel_pasid_write() wrapper,
rather than being configured in place under iommu->lock.

This also removes the explicit tear-down of the previous entry: the
domain_setup_nested() wrapper in nested.c is dropped, and
intel_nested_set_dev_pasid() calls intel_pasid_setup_nested() directly.

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel/nested.c | 13 +------------
 drivers/iommu/intel/pasid.c  | 27 +++++----------------------
 2 files changed, 6 insertions(+), 34 deletions(-)

diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
index 2b979bec56ce..1cebc1232f70 100644
--- a/drivers/iommu/intel/nested.c
+++ b/drivers/iommu/intel/nested.c
@@ -131,17 +131,6 @@ static int intel_nested_cache_invalidate_user(struct iommu_domain *domain,
 	return ret;
 }
 
-static int domain_setup_nested(struct intel_iommu *iommu,
-			       struct dmar_domain *domain,
-			       struct device *dev, ioasid_t pasid,
-			       struct iommu_domain *old)
-{
-	if (old)
-		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
-
-	return intel_pasid_setup_nested(iommu, dev, pasid, domain);
-}
-
 static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
 				      struct device *dev, ioasid_t pasid,
 				      struct iommu_domain *old)
@@ -170,7 +159,7 @@ static int intel_nested_set_dev_pasid(struct iommu_domain *domain,
 	if (ret)
 		goto out_remove_dev_pasid;
 
-	ret = domain_setup_nested(iommu, dmar_domain, dev, pasid, old);
+	ret = intel_pasid_setup_nested(iommu, dev, pasid, dmar_domain);
 	if (ret)
 		goto out_unwind_iopf;
 
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index cb55ff422d7d..5e0548dd8388 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -754,12 +754,8 @@ static void pasid_pte_config_nestd(struct intel_iommu *iommu,
 {
 	struct pt_iommu_vtdss_hw_info pt_info;
 
-	lockdep_assert_held(&iommu->lock);
-
 	pt_iommu_vtdss_hw_info(&s2_domain->sspt, &pt_info);
 
-	pasid_clear_entry(pte);
-
 	if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL)
 		pasid_set_flpm(pte, 1);
 
@@ -806,7 +802,9 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
 	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
 	struct dmar_domain *s2_domain = domain->s2_domain;
 	u16 did = domain_id_iommu(domain, iommu);
-	struct pasid_entry *pte;
+	struct pasid_entry new_pte = {0};
+
+	iommu_group_mutex_assert(dev);
 
 	/* Address width should match the address width supported by hardware */
 	switch (s1_cfg->addr_width) {
@@ -837,23 +835,8 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
 		return -EINVAL;
 	}
 
-	spin_lock(&iommu->lock);
-	pte = intel_pasid_get_entry(dev, pasid);
-	if (!pte) {
-		spin_unlock(&iommu->lock);
-		return -ENODEV;
-	}
-	if (pasid_pte_is_present(pte)) {
-		spin_unlock(&iommu->lock);
-		return -EBUSY;
-	}
-
-	pasid_pte_config_nestd(iommu, pte, s1_cfg, s2_domain, did);
-	spin_unlock(&iommu->lock);
-
-	pasid_flush_caches(iommu, pte, pasid, did);
-
-	return 0;
+	pasid_pte_config_nestd(iommu, &new_pte, s1_cfg, s2_domain, did);
+	return intel_pasid_write(iommu, dev, pasid, (u128 *)&new_pte);
 }
 
 /*
-- 
2.43.0
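Both the pass-through and nested conversions follow the same pattern: build the complete new PASID entry in a zeroed local buffer, then hand it to intel_pasid_write(), instead of taking iommu->lock, refusing a present entry with -EBUSY and editing the live entry in place. The sketch below is not the series' entry_sync code. It is a minimal, self-contained model of the "used bits" update algorithm that SMMUv3 uses in arm_smmu_write_entry(), which patch 1/8 says is being lifted and generalized; that intel_pasid_write() makes a comparable decision is an assumption. The two-qword entry layout, the bit names, the helpers get_used(), entry_set() and entry_write(), and the syncs counter are all hypothetical and do not match the VT-d PASID entry format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical 2-qword toy entry, NOT the VT-d PASID entry layout:
 *   qw0: bit 0 = P (present), bits 1-3 = PGTT, bits 12-63 = table pointer
 *   qw1: bits 0-15 = DID
 */
#define NUM_QW    2
#define P_BIT     (1ULL << 0)
#define PGTT_MASK (7ULL << 1)
#define PGTT_SL   (2ULL << 1) /* translate through a page table */
#define PGTT_PT   (4ULL << 1) /* pass-through: table pointer is ignored */
#define PTR_MASK  (~0ULL << 12)
#define DID_MASK  0xffffULL

static unsigned int syncs; /* stands in for cache/IOTLB invalidations */

/* Report which bits the hardware looks at for a given entry value. */
static void get_used(const uint64_t *e, uint64_t *used)
{
	memset(used, 0, NUM_QW * sizeof(*used));
	used[0] = P_BIT;
	if (!(e[0] & P_BIT))
		return; /* non-present: everything else is ignored */
	used[0] |= PGTT_MASK;
	used[1] |= DID_MASK;
	if ((e[0] & PGTT_MASK) != PGTT_PT)
		used[0] |= PTR_MASK;
}

/* Copy qwords [start, start + len) and "flush" once if anything changed. */
static void entry_set(uint64_t *entry, const uint64_t *src,
		      unsigned int start, unsigned int len)
{
	bool changed = false;

	for (unsigned int i = start; i < start + len; i++) {
		if (entry[i] != src[i]) {
			entry[i] = src[i]; /* kernel code would use WRITE_ONCE() */
			changed = true;
		}
	}
	if (changed)
		syncs++;
}

/* Move a live entry to target; returns false only for a breaking update. */
static bool entry_write(uint64_t *entry, const uint64_t *target)
{
	uint64_t cur_used[NUM_QW], target_used[NUM_QW], unused_update[NUM_QW];
	unsigned int nr_diff = 0, crit = 0;

	get_used(entry, cur_used);
	get_used(target, target_used);
	for (unsigned int i = 0; i < NUM_QW; i++) {
		/* A target may only set bits that it reports as used. */
		assert(!(target[i] & ~target_used[i]));
		/* Bits the hardware ignores right now may change freely. */
		unused_update[i] = (entry[i] & cur_used[i]) |
				   (target[i] & ~cur_used[i]);
		/* Does a bit the target relies on still need to change? */
		if ((unused_update[i] & target_used[i]) != target[i]) {
			if (!nr_diff)
				crit = i;
			nr_diff++;
		}
	}

	if (nr_diff == 1) {
		/* Hitless: stage ignored bits, flip the critical qword, tidy up. */
		unused_update[crit] = entry[crit];
		entry_set(entry, unused_update, 0, NUM_QW);
		entry_set(entry, target, crit, 1);
		entry_set(entry, target, 0, NUM_QW);
		return true;
	}
	if (nr_diff > 1) {
		/* Breaking: clear qw0 (P=0), write the other qwords, then qw0. */
		unused_update[0] = 0;
		entry_set(entry, unused_update, 0, 1);
		entry_set(entry, target, 1, NUM_QW - 1);
		entry_set(entry, target, 0, 1);
		return false;
	}
	/* Only ignored bits differ. */
	entry_set(entry, target, 0, NUM_QW);
	return true;
}
```

Run against three targets in order, a zeroed entry reaches a pass-through configuration with DID 5 on the hitless path in two flushes. Moving from there to a translating entry with DID 7 changes used bits in both qwords, so the sketch clears P, writes qword 1, then rewrites qword 0 (three flushes). Changing only the table pointer afterwards is a single qword store and one flush.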