From nobody Sat Feb 7 08:44:16 2026 Received: from mail-ed1-f73.google.com (mail-ed1-f73.google.com [209.85.208.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6E65A26D4C1 for ; Wed, 12 Nov 2025 10:15:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762942512; cv=none; b=EP4np2zZrmN2SWTRsJc8twICM0zlXC8uRvirS6VqUo5gkrAN9x6XZpHcuCgT8viTKOoRdqIUw6MjA7DaE87wAifQeePD2VVDpuNMT1UvQEn4kRKwrur6LOyqDTXqg6DT9QDl6sG7p39FCridvMPQjN6TtGtIpxGHfppizuDs6uE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762942512; c=relaxed/simple; bh=o9+xpW4uPh8uhRYnWukE8UfaplpoEAJ9o211/Bb8GT8=; h=Date:Mime-Version:Message-ID:Subject:From:To:Cc:Content-Type; b=ir6ryXcC5/kMaaxn1nvLVXJbufTMWMeLQ4WviqsmqjSijfrTicAzKyn/DqkIkk8l1bipLWsgtDErtLCWr4id3DFQOgo61Nvft+4ynxOPYQnxg5r1W2+yUCeUNkNKtAaI/O7yl/JG+T5GHuuA2AnVMmAOSDmz6uAKsTjnk5WTyPw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=tWODgprB; arc=none smtp.client-ip=209.85.208.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="tWODgprB" Received: by mail-ed1-f73.google.com with SMTP id 4fb4d7f45d1cf-6408222225eso885540a12.3 for ; Wed, 12 Nov 2025 02:15:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1762942509; x=1763547309; darn=vger.kernel.org; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=73I2FUGpROBWvvc1rEHs+Shl3zJLhmBgQPZT0bfkTyg=; b=tWODgprBhjFU294CNcY1IX7G2ROhJEo1yMmT4mKB6Ne2wonbAcA+iXPIQGEuPmOnc1 P+IGU/WOyRM0OnV9GrXfitQdqEP1/HkfYk9g2MIxlQ7sY9/6QgjPQRM5LmPlxvftlXur Erzh7ipTNE/m168aRaWC3PmTAz9zINzreMBhvrf0OnIgLcQCK6wWg6hY8mc7qsFheV3O L7usuiuI4HSFF/J5InXqqhpMNcvCYR/8DZLdBmC1KC+enCE3gNYxWXPJyJoPcB2KyRpE abHDTrl1CxUctGzAZgv+x9Y5Lxc9K1ZnraZPoICuT0Vb9tRrhJtggsWGT3BuZi+26xsv 4fIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1762942509; x=1763547309; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=73I2FUGpROBWvvc1rEHs+Shl3zJLhmBgQPZT0bfkTyg=; b=GzyuT64k41VzzRBmQhImJsgi8E0gDyboR3KQf4g/AbDJAFfPyYq0JXF8H2hkxRrOlH ySfcrVK1KlPaGC52NbZdlfBuKGS+rUPj2eCZqDWr0wsCh6zUQ9GmjkdXdLrk0O2g+Sn8 UVTT89lJZJfevTYQ6KMZMPj1QNOxI63wHbgarI5ic1ynprPZQTuc1mLF7azoLhtkUf/M 1rg9Mkyx1ExRvNcImjS+jvcJndBpoSqsj/iZEQeIol19EQ5VJ1xdD6BKP+rCj+dVClU3 ZYtJjRk2iAknix6Jj3HCUKenqTek3EI+o1CxOrknBpH8dK1eSpDYAAKzMDASdZ8V4bNG i/Xg== X-Forwarded-Encrypted: i=1; AJvYcCUfZ/8ulASBfxo8TfEfiJ0EGhHABPphelEk5zL1i0DD7c2Z64scm6wq+Z7kCo6D7Wl/5WNryhf8A9KTC4g=@vger.kernel.org X-Gm-Message-State: AOJu0YyIcDoAsL60i2ULRju1mebQd7fKOy9FXGfiHg7DCektmLKIRz8f XwPhNugTdwhJfweD1fIumA+v1VW1kgv+IYdWaY8rMSo87gVPI8H+REOpPwbXGKA68IrXyeRMhGM 2rlHniRwTgCUUCNIVsA== X-Google-Smtp-Source: AGHT+IFqKjM6hhCMFPCGk+2Jpbz5Hm1iHGXFKauxgdFMLZ34a2nCz7aZdsSy/MowpmbykFK42/F+adtGbWZpHz8= X-Received: from edg25.prod.google.com ([2002:a05:6402:23d9:b0:641:85b3:f437]) (user=aliceryhl job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6402:26d1:b0:640:fa38:7e4a with SMTP id 4fb4d7f45d1cf-6431a49a8cemr2140719a12.8.1762942508817; Wed, 12 Nov 2025 02:15:08 -0800 (PST) Date: Wed, 12 Nov 2025 10:15:00 +0000 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-B4-Tracking: v=1; b=H4sIACNeFGkC/0XMTQ6DIBCG4asQ1p1Gx/pTrtK4EBiVpAoFNDXGu 5fWRWf3TvI9Ow/kDQUu2M49rSYYO6coLoyrsZsHAqNTc8ywzNOBseCG2MknQU9ZgyhvJLXmaeA 89eb9wx7t2Z5eSzLj+fyTgp1gVoGnYBevCNy4BYibI019gErLor5jrfPyLlb8+rILBMpOk4mCr dU1b8Ar5O1xfACweRVgxwAAAA== X-Change-Id: 20251111-io-pgtable-fe0822b4ebdd X-Developer-Key: i=aliceryhl@google.com; a=openpgp; fpr=49F6C1FAA74960F43A5B86A1EE7A392FDE96209F X-Developer-Signature: v=1; a=openpgp-sha256; l=12178; i=aliceryhl@google.com; h=from:subject:message-id; bh=ZXrOBFdyf9TG39sXS6oqK5q38LX8QtvFEFyq+TKtdgQ=; b=owEBbQKS/ZANAwAKAQRYvu5YxjlGAcsmYgBpFF4nbVQ48kgpUcJKNVdzu9LCeXj0vRdE2I+Qv qo/Vy3hpOCJAjMEAAEKAB0WIQSDkqKUTWQHCvFIvbIEWL7uWMY5RgUCaRReJwAKCRAEWL7uWMY5 Ri6kEACJ1ErD1ff8rRfLJ/E1cEo3ogYuuSXdi9FVJi7KwZMw0ryDRadjy26jInquoxCCfu2qrS9 P6iY4wBtKDX+vOHETQMKq8nOmDOhDOB0nEEEPsnPD6dMTE+fzf6AOgO9xbpgbH9ZyYZIxbJPOtu nNlG1p3mvjEbL56gmynSEyunQkDjE3oOPUC9w2nozhVlLQEG/8deuKcR0dk2tF1PNG+2GRRGTB9 YgpwbCf8mWn/eIS7R1bUMzLHnLX10EbWTQVeG2MdPgMe/HzrUzmhnbFNSaRpmeiuUdYpqMwYdlN mXWPIVOB+qhqCR8rKwfOEBraN2N/xFGFqLpWUecNUPO5/8TgAM27a2fVJQchwqxdqVEQsIiiLrZ qYQ9WjW94fhJvg6SIPUmJGlA0YFZbGVFOnQhBb0x7+i2Q9HFjHisL5XcPBoTCia5oUkQLEUhTB/ Y0BXqipcIUL79CmELu152HEAl3LqSN9Oc9brq7CYPgNs4HjA/HZO/qj1MLk3iEEBy2f8y7ZMGMR IshBN7Kox/QxntpwacoOYZZCcJ6ScKG8qMcsA8OQAjOhO0idBw8egPuWPKFcy7U+uOe1w5E/Nhp fA6Ca+tiJaUZQK8oyZerF33kS9pOps9AEY4gFHLdmGSvP32fRejNa/56LKZ7DbPtAyzqgvszjIr OFZWcZQaNThXsdQ== X-Mailer: b4 0.14.2 Message-ID: <20251112-io-pgtable-v3-1-b00c2e6b951a@google.com> Subject: [PATCH v3] io: add io_pgtable abstraction From: Alice Ryhl To: Miguel Ojeda , Will Deacon , Daniel Almeida , Boris Brezillon Cc: Boqun Feng , Gary Guo , "=?utf-8?q?Bj=C3=B6rn_Roy_Baron?=" , Benno Lossin , Andreas Hindborg , Trevor Gross , Danilo Krummrich , Joerg Roedel , Robin Murphy , Lorenzo Stoakes , "Liam R. Howlett" , Asahi Lina , linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org, iommu@lists.linux.dev, linux-mm@kvack.org, Alice Ryhl Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Asahi Lina This will be used by the Tyr driver to create and modify the page table of each address space on the GPU. Each time a mapping gets created or removed by userspace, Tyr will call into GPUVM, which will figure out which calls to map_pages and unmap_pages are required to map the data in question in the page table so that the GPU may access those pages when using that address space. The Rust type wraps the struct using a raw pointer rather than the usual Opaque+ARef approach because Opaque+ARef requires the target type to be refcounted. Signed-off-by: Asahi Lina Co-Developed-by: Alice Ryhl Signed-off-by: Alice Ryhl Acked-by: Boris Brezillon --- This patch is based on [1] but I have rewritten and simplified large parts of it. The Asahi driver no longer uses the io-pgtable abstraction, and Nova never planned to (since NVIDIA has its own separate memory). Therefore, I have simplified these abstractions to fit the needs of the Tyr GPU driver. This series depends on the PhysAddr typedef [2]. [1]: https://lore.kernel.org/all/20250623-io_pgtable-v2-1-fd72daac75f1@coll= abora.com/ [2]: https://lore.kernel.org/all/20251112-resource-phys-typedefs-v2-0-53830= 7384f82@google.com/ --- rust/bindings/bindings_helper.h | 3 +- rust/kernel/io.rs | 1 + rust/kernel/io/pgtable.rs | 254 ++++++++++++++++++++++++++++++++++++= ++++ 3 files changed, 257 insertions(+), 1 deletion(-) diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helpe= r.h index 2e43c66635a2c9f31bd99b9817bd2d6ab89fbcf2..faab6bc9463321c092a8bbcb628= 1175e490caccd 100644 --- a/rust/bindings/bindings_helper.h +++ b/rust/bindings/bindings_helper.h @@ -56,8 +56,9 @@ #include #include #include -#include #include +#include +#include #include #include #include diff --git a/rust/kernel/io.rs b/rust/kernel/io.rs index 56a435eb14e3a1ce72dd58b88cbf296041f1703e..5913e240d5a9814ceed52c6dc1a= 798e64158d567 100644 --- a/rust/kernel/io.rs +++ b/rust/kernel/io.rs @@ -8,6 +8,7 @@ use crate::{bindings, build_assert, ffi::c_void}; =20 pub mod mem; +pub mod pgtable; pub mod poll; pub mod resource; =20 diff --git a/rust/kernel/io/pgtable.rs b/rust/kernel/io/pgtable.rs new file mode 100644 index 0000000000000000000000000000000000000000..fe05bc1673f9a7741a887a3c9bb= ad866dd17a2b5 --- /dev/null +++ b/rust/kernel/io/pgtable.rs @@ -0,0 +1,254 @@ +// SPDX-License-Identifier: GPL-2.0 + +//! IOMMU page table management. +//! +//! C header: [`include/io-pgtable.h`](srctree/include/io-pgtable.h) + +use core::{ + marker::PhantomData, + ptr::NonNull, // +}; + +use crate::{ + alloc, + bindings, + device::{Bound, Device}, + devres::Devres, + error::to_result, + io::PhysAddr, + prelude::*, // +}; + +use bindings::io_pgtable_fmt; + +/// Protection flags used with IOMMU mappings. +pub mod prot { + /// Read access. + pub const READ: u32 =3D bindings::IOMMU_READ; + /// Write access. + pub const WRITE: u32 =3D bindings::IOMMU_WRITE; + /// Request cache coherency. + pub const CACHE: u32 =3D bindings::IOMMU_CACHE; + /// Request no-execute permission. + pub const NOEXEC: u32 =3D bindings::IOMMU_NOEXEC; + /// MMIO peripheral mapping. + pub const MMIO: u32 =3D bindings::IOMMU_MMIO; + /// Privileged mapping. + pub const PRIV: u32 =3D bindings::IOMMU_PRIV; +} + +/// Represents a requested `io_pgtable` configuration. +pub struct Config { + /// Quirk bitmask (type-specific). + pub quirks: usize, + /// Valid page sizes, as a bitmask of powers of two. + pub pgsize_bitmap: usize, + /// Input address space size in bits. + pub ias: u32, + /// Output address space size in bits. + pub oas: u32, + /// IOMMU uses coherent accesses for page table walks. + pub coherent_walk: bool, +} + +/// An io page table using a specific format. +/// +/// # Invariants +/// +/// The pointer references a valid io page table. +pub struct IoPageTable { + ptr: NonNull, + _marker: PhantomData, +} + +// SAFETY: `struct io_pgtable_ops` is not restricted to a single thread. +unsafe impl Send for IoPageTable {} +// SAFETY: `struct io_pgtable_ops` may be accessed concurrently. +unsafe impl Sync for IoPageTable {} + +/// The format used by this page table. +pub trait IoPageTableFmt: 'static { + /// The value representing this format. + const FORMAT: io_pgtable_fmt; +} + +impl IoPageTable { + /// Create a new `IoPageTable` as a device resource. + #[inline] + pub fn new( + dev: &Device, + config: Config, + ) -> impl PinInit>, Error> + '_ { + // SAFETY: Devres ensures that the value is dropped during device = unbind. + Devres::new(dev, unsafe { Self::new_raw(dev, config) }) + } + + /// Create a new `IoPageTable`. + /// + /// # Safety + /// + /// If successful, then the returned value must be dropped before the = device is unbound. + #[inline] + pub unsafe fn new_raw(dev: &Device, config: Config) -> Result> { + let mut raw_cfg =3D bindings::io_pgtable_cfg { + quirks: config.quirks, + pgsize_bitmap: config.pgsize_bitmap, + ias: config.ias, + oas: config.oas, + coherent_walk: config.coherent_walk, + tlb: &raw const NOOP_FLUSH_OPS, + iommu_dev: dev.as_raw(), + // SAFETY: All zeroes is a valid value for `struct io_pgtable_= cfg`. + ..unsafe { core::mem::zeroed() } + }; + + // SAFETY: + // * The raw_cfg pointer is valid for the duration of this call. + // * The provided `FLUSH_OPS` contains valid function pointers tha= t accept a null pointer + // as cookie. + // * The caller ensures that the io pgtable does not outlive the d= evice. + let ops =3D unsafe { + bindings::alloc_io_pgtable_ops(F::FORMAT, &mut raw_cfg, core::= ptr::null_mut()) + }; + // INVARIANT: We successfully created a valid page table. + Ok(IoPageTable { + ptr: NonNull::new(ops).ok_or(ENOMEM)?, + _marker: PhantomData, + }) + } + + /// Obtain a raw pointer to the underlying `struct io_pgtable_ops`. + #[inline] + pub fn raw_ops(&self) -> *mut bindings::io_pgtable_ops { + self.ptr.as_ptr() + } + + /// Obtain a raw pointer to the underlying `struct io_pgtable`. + #[inline] + pub fn raw_pgtable(&self) -> *mut bindings::io_pgtable { + // SAFETY: The io_pgtable_ops of an io-pgtable is always the ops f= ield of a io_pgtable. + unsafe { kernel::container_of!(self.raw_ops(), bindings::io_pgtabl= e, ops) } + } + + /// Obtain a raw pointer to the underlying `struct io_pgtable_cfg`. + #[inline] + pub fn raw_cfg(&self) -> *mut bindings::io_pgtable_cfg { + // SAFETY: The `raw_pgtable()` method returns a valid pointer. + unsafe { &raw mut (*self.raw_pgtable()).cfg } + } + + /// Map a physically contiguous range of pages of the same size. + /// + /// # Safety + /// + /// * This page table must not contain any mapping that overlaps with = the mapping created by + /// this call. + /// * If this page table is live, then the caller must ensure that it'= s okay to access the + /// physical address being mapped for the duration in which it is ma= pped. + #[inline] + pub unsafe fn map_pages( + &self, + iova: usize, + paddr: PhysAddr, + pgsize: usize, + pgcount: usize, + prot: u32, + flags: alloc::Flags, + ) -> Result { + let mut mapped: usize =3D 0; + + // SAFETY: The `map_pages` function in `io_pgtable_ops` is never n= ull. + let map_pages =3D unsafe { (*self.raw_ops()).map_pages.unwrap_unch= ecked() }; + + // SAFETY: The safety requirements of this method are sufficient t= o call `map_pages`. + to_result(unsafe { + (map_pages)( + self.raw_ops(), + iova, + paddr, + pgsize, + pgcount, + prot as i32, + flags.as_raw(), + &mut mapped, + ) + })?; + + Ok(mapped) + } + + /// Unmap a range of virtually contiguous pages of the same size. + /// + /// # Safety + /// + /// This page table must contain a mapping at `iova` that consists of = exactly `pgcount` pages + /// of size `pgsize`. + #[inline] + pub unsafe fn unmap_pages(&self, iova: usize, pgsize: usize, pgcount: = usize) -> usize { + // SAFETY: The `unmap_pages` function in `io_pgtable_ops` is never= null. + let unmap_pages =3D unsafe { (*self.raw_ops()).unmap_pages.unwrap_= unchecked() }; + + // SAFETY: The safety requirements of this method are sufficient t= o call `unmap_pages`. + unsafe { (unmap_pages)(self.raw_ops(), iova, pgsize, pgcount, core= ::ptr::null_mut()) } + } +} + +// These bindings are currently designed for use by GPU drivers, which use= this page table together +// with GPUVM. When using GPUVM, a single mapping operation may be transla= ted into many operations +// on the page table, and in that case you generally want to flush the TLB= only once per GPUVM +// operation. Thus, do not use these callbacks as they would flush more of= ten than needed. +static NOOP_FLUSH_OPS: bindings::iommu_flush_ops =3D bindings::iommu_flush= _ops { + tlb_flush_all: Some(rust_tlb_flush_all_noop), + tlb_flush_walk: Some(rust_tlb_flush_walk_noop), + tlb_add_page: None, +}; + +#[no_mangle] +extern "C" fn rust_tlb_flush_all_noop(_cookie: *mut core::ffi::c_void) {} + +#[no_mangle] +extern "C" fn rust_tlb_flush_walk_noop( + _iova: usize, + _size: usize, + _granule: usize, + _cookie: *mut core::ffi::c_void, +) { +} + +impl Drop for IoPageTable { + fn drop(&mut self) { + // SAFETY: The caller of `ttbr` promised that the page table is no= t live when this + // destructor runs. + unsafe { bindings::free_io_pgtable_ops(self.0.ops) }; + } +} + +/// The `ARM_64_LPAE_S1` page table format. +pub enum ARM64LPAES1 {} + +impl IoPageTableFmt for ARM64LPAES1 { + const FORMAT: io_pgtable_fmt =3D bindings::io_pgtable_fmt_ARM_64_LPAE_= S1 as io_pgtable_fmt; +} + +impl IoPageTable { + /// Access the `ttbr` field of the configuration. + /// + /// This is the physical address of the page table, which may be passe= d to the device that + /// needs to use it. + /// + /// # Safety + /// + /// The caller must ensure that the device stops using the page table = before dropping it. + #[inline] + pub unsafe fn ttbr(&self) -> u64 { + // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES= 1`. + unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.ttbr } + } + + /// Access the `mair` field of the configuration. + #[inline] + pub fn mair(&self) -> u64 { + // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES= 1`. + unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.mair } + } +} --- base-commit: ffee675aceb9f44b0502a8bec912abb0c4f4af62 change-id: 20251111-io-pgtable-fe0822b4ebdd prerequisite-change-id: 20251106-resource-phys-typedefs-6db37927d159:v2 prerequisite-patch-id: 350421d8dbaf3db51b1243d82077c5eb88f54db5 prerequisite-patch-id: ac0166fb3cd235de76841789173051191a4d2434 prerequisite-patch-id: f4bca02c77c40093690b66cdf477f928784bdbf4 prerequisite-patch-id: 083d1c22b1a7eb0dcae37052b926362543c68e8a Best regards, --=20 Alice Ryhl