Date: Wed, 20 May 2026 15:07:43 +0000
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.54.0.631.ge1b05301d1-goog
Message-ID:
 <20260520150743.727106-1-joonwonkang@google.com>
Subject: [PATCH v2] iommu: Allow device driver to use its own PASID space for SVA
From: Joonwon Kang
To: jgg@ziepe.ca, will@kernel.org, robin.murphy@arm.com, joro@8bytes.org,
 jpb@kernel.org
Cc: Alexander.Grest@microsoft.com, amhetre@nvidia.com,
 baolu.lu@linux.intel.com, easwar.hariharan@linux.microsoft.com,
 jacob.jun.pan@linux.intel.com, kees@kernel.org, kevin.tian@intel.com,
 nicolinc@nvidia.com, praan@google.com, smostafa@google.com, tglx@kernel.org,
 mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org,
 hpa@zytor.com, peterz@infradead.org, sohil.mehta@intel.com, kas@kernel.org,
 alexander.shishkin@linux.intel.com, ryasuoka@redhat.com, xin@zytor.com,
 linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
 linux-arm-kernel@lists.infradead.org, joonwonkang@google.com
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

For SVA, the IOMMU core always allocates the PASID from the global PASID
space. The global PASID space exists because of a limitation of the
ENQCMD instruction on Intel CPUs: it fetches its PASID operand from
IA32_PASID, which is per-process, so a process that wants to communicate
with multiple devices using ENQCMD cannot change its PASID for each
device without the kernel's intervention. Note that ARM introduced a
similar instruction, ST64BV0.

Because of this, SVA with ARM SMMU v3 has been found not to work in our
environment when other modules/devices compete for PASIDs. The
environment looks as follows:

 - The device is not a PCIe device.
 - The device is to use SVA.
 - The supported SSID/PASID space is very small for the device; only 1
   to 3 SSIDs are supported.

With this setup, when other modules have allocated all the PASIDs that
our device is expected to use from the global PASID space via APIs like
iommu_alloc_global_pasid() or iommu_sva_bind_device(), SVA binding to
our device fails due to the lack of available PASIDs.

This commit resolves the issue by allowing a device driver to maintain
its own PASID space and assign a PASID from it to the process-device
bond via a new API, `iommu_sva_bind_device_pasid(dev, mm, pasid)`. Doing
so, however, prevents the process from executing ENQCMD-like
instructions at EL0, because the process cannot change its PASID in
IA32_PASID (or ACCDATA_EL1 on ARM) for each device without the kernel's
intervention. For this reason, calling `iommu_sva_bind_device()` and
then `iommu_sva_bind_device_pasid()` for the same process is not
allowed, and vice versa.

Currently, a process simultaneously doing SVA with multiple devices with
different PASIDs is not supported. So calling
`iommu_sva_bind_device_pasid()` multiple times for the same process with
different devices and different PASIDs is not allowed for now, while
calling `iommu_sva_bind_device()` multiple times with different devices
is. Another limitation is that a process cannot use
`iommu_sva_bind_device()` if it has ever used
`iommu_sva_bind_device_pasid()`, even after that bond has been unbound.

Suggested-by: Jason Gunthorpe
Suggested-by: Kevin Tian
Signed-off-by: Joonwon Kang
---
v2: Reuse iommu_mm->pasid after SVA bound by iommu_sva_bind_device_pasid()
    is unbound.
v1: Initial version.
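The binding rules described above can be checked against a small
user-space model. This is only a sketch under stated assumptions, not
the kernel code: `struct mm_model`, `model_bind()`, `model_unbind()` and
`model_enqcmd_pasid()` are hypothetical names, the `next_global`
argument stands in for whatever iommu_alloc_global_pasid() would return,
and a plain bond counter stands in for the sva_domains list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel constants, for illustration only. */
#define PASID_GLOBAL_ANY 0u          /* plays the role of IOMMU_PASID_GLOBAL_ANY */
#define PASID_INVALID    0xffffffffu /* plays the role of IOMMU_PASID_INVALID */

struct mm_model {
	bool has_data;       /* models mm->iommu_mm being set */
	bool pasid_global;   /* PASID was taken from the global space */
	unsigned int pasid;
	int nr_bonds;        /* models a non-empty sva_domains list */
};

/* Returns 0 on success or a negative errno, following the checks above. */
static int model_bind(struct mm_model *mm, unsigned int pasid,
		      unsigned int max_pasids, unsigned int next_global)
{
	bool want_global = (pasid == PASID_GLOBAL_ANY);

	if (mm->has_data) {
		/* Global and driver-owned PASIDs cannot be mixed for one mm. */
		if (want_global != mm->pasid_global)
			return -EBUSY;
		if (!mm->pasid_global) {
			/* With no bond left, the stored PASID may be replaced. */
			if (mm->nr_bonds == 0)
				mm->pasid = pasid;
			if (pasid != mm->pasid)
				return -ENOSPC;
		}
		if (mm->pasid >= max_pasids)
			return -EOVERFLOW;
	} else {
		if (want_global) {
			/* Global space exhausted by other users. */
			if (next_global == PASID_INVALID)
				return -ENOSPC;
			pasid = next_global;
		} else if (pasid >= max_pasids) {
			return -EOVERFLOW;
		}
		mm->has_data = true;
		mm->pasid_global = want_global;
		mm->pasid = pasid;
	}
	mm->nr_bonds++;
	return 0;
}

static void model_unbind(struct mm_model *mm)
{
	assert(mm->nr_bonds > 0);
	mm->nr_bonds--;
}

/* ENQCMD-style instructions only ever see a PASID from the global space. */
static unsigned int model_enqcmd_pasid(const struct mm_model *mm)
{
	if (!mm->has_data || !mm->pasid_global)
		return PASID_INVALID;
	return mm->pasid;
}
```

With max_pasids set to 4, binding a fresh mm with driver PASID 2
succeeds. While that bond exists, a second bind with PASID 3 returns
-ENOSPC and a global bind returns -EBUSY. After unbinding, the mm can be
rebound with PASID 3 (the v2 behaviour), but a global bind still returns
-EBUSY.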
 arch/x86/kernel/traps.c   |   9 +--
 drivers/iommu/iommu-sva.c | 151 +++++++++++++++++++++++++++++---------
 include/linux/iommu.h     |  14 +++-
 3 files changed, 134 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 0ca3912ecb7f..0131c8e5fb10 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -857,13 +857,12 @@ static bool try_fixup_enqcmd_gp(void)
 		return false;
 
 	/*
-	 * If the mm has not been allocated a
-	 * PASID, the #GP can not be fixed up.
+	 * If the mm has not been allocated a PASID or ENQCMD has been
+	 * disallowed, the #GP can not be fixed up.
 	 */
-	if (!mm_valid_pasid(current->mm))
-		return false;
-
 	pasid = mm_get_enqcmd_pasid(current->mm);
+	if (pasid == IOMMU_PASID_INVALID)
+		return false;
 
 	/*
 	 * Did this thread already have its PASID activated?
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index bc7c7232a43e..a83333651ad0 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -10,6 +10,9 @@
 
 #include "iommu-priv.h"
 
+/* Whether pasid is to be allocated from the global PASID space */
+#define IOMMU_PASID_GLOBAL_ANY IOMMU_NO_PASID
+
 static DEFINE_MUTEX(iommu_sva_lock);
 static bool iommu_sva_present;
 static LIST_HEAD(iommu_sva_mms);
@@ -17,10 +20,11 @@ static struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
 						   struct mm_struct *mm);
 
 /* Allocate a PASID for the mm within range (inclusive) */
-static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct device *dev)
+static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm,
+						 struct device *dev,
+						 ioasid_t pasid)
 {
 	struct iommu_mm_data *iommu_mm;
-	ioasid_t pasid;
 
 	lockdep_assert_held(&iommu_sva_lock);
 
@@ -30,8 +34,27 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
 	iommu_mm = mm->iommu_mm;
 	/* Is a PASID already associated with this mm? */
 	if (iommu_mm) {
+		if ((pasid == IOMMU_PASID_GLOBAL_ANY && !iommu_mm->pasid_global) ||
+		    (pasid != IOMMU_PASID_GLOBAL_ANY && iommu_mm->pasid_global))
+			return ERR_PTR(-EBUSY);
+
+		if (!iommu_mm->pasid_global) {
+			if (list_empty(&iommu_mm->sva_domains))
+				iommu_mm->pasid = pasid;
+
+			if (pasid != iommu_mm->pasid) {
+				/*
+				 * Currently, a process simultaneously doing
+				 * SVA with multiple devices with different
+				 * PASIDs is not supported.
+				 */
+				return ERR_PTR(-ENOSPC);
+			}
+		}
+
 		if (iommu_mm->pasid >= dev->iommu->max_pasids)
 			return ERR_PTR(-EOVERFLOW);
+
 		return iommu_mm;
 	}
 
@@ -39,37 +62,30 @@ static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm, struct de
 	if (!iommu_mm)
 		return ERR_PTR(-ENOMEM);
 
-	pasid = iommu_alloc_global_pasid(dev);
-	if (pasid == IOMMU_PASID_INVALID) {
-		kfree(iommu_mm);
-		return ERR_PTR(-ENOSPC);
+	if (pasid == IOMMU_PASID_GLOBAL_ANY) {
+		pasid = iommu_alloc_global_pasid(dev);
+		if (pasid == IOMMU_PASID_INVALID) {
+			kfree(iommu_mm);
+			return ERR_PTR(-ENOSPC);
+		}
+		iommu_mm->pasid_global = true;
+	} else {
+		if (pasid >= dev->iommu->max_pasids) {
+			kfree(iommu_mm);
+			return ERR_PTR(-EOVERFLOW);
+		}
+		iommu_mm->pasid_global = false;
 	}
 	iommu_mm->pasid = pasid;
 	iommu_mm->mm = mm;
 	INIT_LIST_HEAD(&iommu_mm->sva_domains);
-	/*
-	 * Make sure the write to mm->iommu_mm is not reordered in front of
-	 * initialization to iommu_mm fields. If it does, readers may see a
-	 * valid iommu_mm with uninitialized values.
-	 */
-	smp_store_release(&mm->iommu_mm, iommu_mm);
+
 	return iommu_mm;
 }
 
-/**
- * iommu_sva_bind_device() - Bind a process address space to a device
- * @dev: the device
- * @mm: the mm to bind, caller must hold a reference to mm_users
- *
- * Create a bond between device and address space, allowing the device to
- * access the mm using the PASID returned by iommu_sva_get_pasid(). If a
- * bond already exists between @device and @mm, an additional internal
- * reference is taken. Caller must call iommu_sva_unbind_device()
- * to release each reference.
- *
- * On error, returns an ERR_PTR value.
- */
-struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+static struct iommu_sva *iommu_sva_bind_device_internal(struct device *dev,
+							struct mm_struct *mm,
+							ioasid_t pasid)
 {
 	struct iommu_group *group = dev->iommu_group;
 	struct iommu_attach_handle *attach_handle;
@@ -84,7 +100,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 	mutex_lock(&iommu_sva_lock);
 
 	/* Allocate mm->pasid if necessary. */
-	iommu_mm = iommu_alloc_mm_data(mm, dev);
+	iommu_mm = iommu_alloc_mm_data(mm, dev, pasid);
 	if (IS_ERR(iommu_mm)) {
 		ret = PTR_ERR(iommu_mm);
 		goto out_unlock;
@@ -96,7 +112,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		handle = container_of(attach_handle, struct iommu_sva, handle);
 		if (attach_handle->domain->mm != mm) {
 			ret = -EBUSY;
-			goto out_unlock;
+			goto out_free_iommu_mm;
 		}
 		refcount_inc(&handle->users);
 		mutex_unlock(&iommu_sva_lock);
@@ -105,17 +121,17 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 
 	if (PTR_ERR(attach_handle) != -ENOENT) {
 		ret = PTR_ERR(attach_handle);
-		goto out_unlock;
+		goto out_free_iommu_mm;
 	}
 
 	handle = kzalloc_obj(*handle);
 	if (!handle) {
 		ret = -ENOMEM;
-		goto out_unlock;
+		goto out_free_iommu_mm;
 	}
 
 	/* Search for an existing domain. */
-	list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
+	list_for_each_entry(domain, &iommu_mm->sva_domains, next) {
 		ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid,
 						&handle->handle);
 		if (!ret) {
@@ -143,6 +159,15 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		list_add(&iommu_mm->mm_list_elm, &iommu_sva_mms);
 	}
 	list_add(&domain->next, &iommu_mm->sva_domains);
+	if (!mm->iommu_mm) {
+		/*
+		 * Make sure the write to mm->iommu_mm is not reordered in
+		 * front of initialization to iommu_mm fields. If it does,
+		 * readers may see a valid iommu_mm with uninitialized values.
+		 */
+		smp_store_release(&mm->iommu_mm, iommu_mm);
+	}
+
 out:
 	refcount_set(&handle->users, 1);
 	mutex_unlock(&iommu_sva_lock);
@@ -153,12 +178,66 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 	iommu_domain_free(domain);
 out_free_handle:
 	kfree(handle);
+out_free_iommu_mm:
+	if (!mm->iommu_mm) {
+		if (iommu_mm->pasid_global)
+			iommu_free_global_pasid(iommu_mm->pasid);
+		kfree(iommu_mm);
+	}
 out_unlock:
 	mutex_unlock(&iommu_sva_lock);
 	return ERR_PTR(ret);
 }
+
+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to mm_users
+ *
+ * Create a bond between device and address space, allowing the device to
+ * access the mm using the PASID returned by iommu_sva_get_pasid(). If a
+ * bond already exists between @device and @mm, an additional internal
+ * reference is taken. Caller must call iommu_sva_unbind_device()
+ * to release each reference.
+ *
+ * On error, returns an ERR_PTR value.
+ */
+struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+{
+	return iommu_sva_bind_device_internal(dev, mm, IOMMU_PASID_GLOBAL_ANY);
+}
 EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
 
+/**
+ * iommu_sva_bind_device_pasid() - Bind a process address space to a device
+ *				   with a designated pasid
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to mm_users
+ * @pasid: the pasid to assign to the bond
+ *
+ * Create a bond between device and address space, allowing the device to
+ * access the mm using the PASID returned by iommu_sva_get_pasid(). If a
+ * bond already exists between @device and @mm, an additional internal
+ * reference is taken. Caller must call iommu_sva_unbind_device()
+ * to release each reference.
+ *
+ * It is the caller's responsibility to maintain the PASID space for @pasid.
+ * After the bond is created, the process for @mm will not be able to execute
+ * ENQCMD or similar instructions at EL0. To allow those instructions at EL0,
+ * iommu_sva_bind_device() must be used instead.
+ *
+ * On error, returns an ERR_PTR value.
+ */
+struct iommu_sva *iommu_sva_bind_device_pasid(struct device *dev,
+					      struct mm_struct *mm,
+					      ioasid_t pasid)
+{
+	if (pasid == IOMMU_PASID_GLOBAL_ANY)
+		return ERR_PTR(-EINVAL);
+	return iommu_sva_bind_device_internal(dev, mm, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_device_pasid);
+
 /**
  * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
  * @handle: the handle returned by iommu_sva_bind_device()
@@ -198,9 +277,12 @@ EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
 
 u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 {
-	struct iommu_domain *domain = handle->handle.domain;
+	struct iommu_mm_data *iommu_mm = handle->handle.domain->mm->iommu_mm;
+
+	if (!iommu_mm)
+		return IOMMU_PASID_INVALID;
 
-	return mm_get_enqcmd_pasid(domain->mm);
+	return iommu_mm->pasid;
 }
 EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
 
@@ -211,7 +293,8 @@ void mm_pasid_drop(struct mm_struct *mm)
 	if (!iommu_mm)
 		return;
 
-	iommu_free_global_pasid(iommu_mm->pasid);
+	if (iommu_mm->pasid_global)
+		iommu_free_global_pasid(iommu_mm->pasid);
 	kfree(iommu_mm);
 }
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e587d4ac4d33..5b6116e7152d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1140,6 +1140,7 @@ struct iommu_sva {
 
 struct iommu_mm_data {
 	u32			pasid;
+	bool			pasid_global;
 	struct mm_struct	*mm;
 	struct list_head	sva_domains;
 	struct list_head	mm_list_elm;
@@ -1626,7 +1627,7 @@ static inline u32 mm_get_enqcmd_pasid(struct mm_struct *mm)
 {
 	struct iommu_mm_data *iommu_mm = READ_ONCE(mm->iommu_mm);
 
-	if (!iommu_mm)
+	if (!iommu_mm || !iommu_mm->pasid_global)
 		return IOMMU_PASID_INVALID;
 	return iommu_mm->pasid;
 }
@@ -1634,6 +1635,9 @@ static inline u32 mm_get_enqcmd_pasid(struct mm_struct *mm)
 void mm_pasid_drop(struct mm_struct *mm);
 struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 					struct mm_struct *mm);
+struct iommu_sva *iommu_sva_bind_device_pasid(struct device *dev,
+					      struct mm_struct *mm,
+					      ioasid_t pasid);
 void iommu_sva_unbind_device(struct iommu_sva *handle);
 u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 void iommu_sva_invalidate_kva_range(unsigned long start, unsigned long end);
@@ -1644,6 +1648,14 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
 	return ERR_PTR(-ENODEV);
 }
 
+static inline struct iommu_sva *
+iommu_sva_bind_device_pasid(struct device *dev,
+			    struct mm_struct *mm,
+			    ioasid_t pasid)
+{
+	return ERR_PTR(-ENODEV);
+}
+
 static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
 {
 }
-- 
2.54.0.631.ge1b05301d1-goog