From nobody Sun Feb 8 02:21:36 2026
Date: Fri, 19 Dec 2025 10:50:52 +0000
Mime-Version: 1.0
Message-ID: <20251219-io-pgtable-v4-1-68aaa7a40380@google.com>
Subject: [PATCH v4] io: add io_pgtable abstraction
From: Alice Ryhl
To: Miguel Ojeda , Will Deacon , Daniel Almeida ,
 Boris Brezillon , Robin Murphy , Jason Gunthorpe
Cc: Boqun Feng , Gary Guo , Björn Roy Baron , Benno Lossin ,
 Andreas Hindborg , Trevor Gross , Danilo Krummrich , Joerg Roedel ,
 Lorenzo Stoakes , "Liam R. 
Howlett" , Asahi Lina , linux-kernel@vger.kernel.org,
 rust-for-linux@vger.kernel.org, iommu@lists.linux.dev, linux-mm@kvack.org,
 Alice Ryhl
Content-Type: text/plain; charset="utf-8"

From: Asahi Lina

This will be used by the Tyr driver to create and modify the page table
of each address space on the GPU. Each time a mapping gets created or
removed by userspace, Tyr will call into GPUVM, which will figure out
which calls to map_pages and unmap_pages are required to map the data in
question in the page table so that the GPU may access those pages when
using that address space.

The Rust type wraps the struct using a raw pointer rather than the usual
Opaque+ARef approach because Opaque+ARef requires the target type to be
refcounted.

Signed-off-by: Asahi Lina
Acked-by: Boris Brezillon
Co-developed-by: Alice Ryhl
Signed-off-by: Alice Ryhl
Reviewed-by: Daniel Almeida
---
Changes in v4:
- Rename prot::PRIV to prot::PRIVILEGED
- Adjust map_pages to return the length even on error.
- Explain return value in docs of map_pages and unmap_pages.
- Explain in map_pages that the caller must explicitly flush the TLB
  before accessing the resulting mapping.
- Add a safety requirement that access to a given range is required to
  be exclusive.
- Reword comment on NOOP_FLUSH_OPS.
- Rebase on v6.19-rc1 and pick up tags.
- Link to v3: https://lore.kernel.org/r/20251112-io-pgtable-v3-1-b00c2e6b951a@google.com

Changes in v3:
- Almost entirely rewritten from scratch.
- Link to v2: https://lore.kernel.org/all/20250623-io_pgtable-v2-1-fd72daac75f1@collabora.com/
---
 rust/bindings/bindings_helper.h |   3 +-
 rust/kernel/io.rs               |   1 +
 rust/kernel/io/pgtable.rs       | 278 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 281 insertions(+), 1 deletion(-)

diff --git a/rust/bindings/bindings_helper.h b/rust/bindings/bindings_helper.h
index a067038b4b422b4256f4a2b75fe644d47e6e82c8..1b05a5e4cfb4780fdc27813d708a8f1a6a2d9913 100644
--- a/rust/bindings/bindings_helper.h
+++ b/rust/bindings/bindings_helper.h
@@ -56,9 +56,10 @@
 #include
 #include
 #include
-#include
 #include
 #include
+#include
+#include
 #include
 #include
 #include
diff --git a/rust/kernel/io.rs b/rust/kernel/io.rs
index 98e8b84e68d11ef74b2026d8c3d847a127f4672d..88253158448cbf493ca200a87ef9ba958255e761 100644
--- a/rust/kernel/io.rs
+++ b/rust/kernel/io.rs
@@ -10,6 +10,7 @@
 };
 
 pub mod mem;
+pub mod pgtable;
 pub mod poll;
 pub mod resource;
 
diff --git a/rust/kernel/io/pgtable.rs b/rust/kernel/io/pgtable.rs
new file mode 100644
index 0000000000000000000000000000000000000000..11096acfa41d45125e866876e41459a347e9afe6
--- /dev/null
+++ b/rust/kernel/io/pgtable.rs
@@ -0,0 +1,278 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! IOMMU page table management.
+//!
+//! C header: [`include/io-pgtable.h`](srctree/include/io-pgtable.h)
+
+use core::{
+    marker::PhantomData,
+    ptr::NonNull, //
+};
+
+use crate::{
+    alloc,
+    bindings,
+    device::{Bound, Device},
+    devres::Devres,
+    error::to_result,
+    io::PhysAddr,
+    prelude::*, //
+};
+
+use bindings::io_pgtable_fmt;
+
+/// Protection flags used with IOMMU mappings.
+pub mod prot {
+    /// Read access.
+    pub const READ: u32 = bindings::IOMMU_READ;
+    /// Write access.
+    pub const WRITE: u32 = bindings::IOMMU_WRITE;
+    /// Request cache coherency.
+    pub const CACHE: u32 = bindings::IOMMU_CACHE;
+    /// Request no-execute permission.
+    pub const NOEXEC: u32 = bindings::IOMMU_NOEXEC;
+    /// MMIO peripheral mapping.
+    pub const MMIO: u32 = bindings::IOMMU_MMIO;
+    /// Privileged mapping.
+    pub const PRIVILEGED: u32 = bindings::IOMMU_PRIV;
+}
+
+/// Represents a requested `io_pgtable` configuration.
+pub struct Config {
+    /// Quirk bitmask (type-specific).
+    pub quirks: usize,
+    /// Valid page sizes, as a bitmask of powers of two.
+    pub pgsize_bitmap: usize,
+    /// Input address space size in bits.
+    pub ias: u32,
+    /// Output address space size in bits.
+    pub oas: u32,
+    /// IOMMU uses coherent accesses for page table walks.
+    pub coherent_walk: bool,
+}
+
+/// An io page table using a specific format.
+///
+/// # Invariants
+///
+/// The pointer references a valid io page table.
+pub struct IoPageTable<F: IoPageTableFmt> {
+    ptr: NonNull<bindings::io_pgtable_ops>,
+    _marker: PhantomData<F>,
+}
+
+// SAFETY: `struct io_pgtable_ops` is not restricted to a single thread.
+unsafe impl<F: IoPageTableFmt> Send for IoPageTable<F> {}
+// SAFETY: `struct io_pgtable_ops` may be accessed concurrently.
+unsafe impl<F: IoPageTableFmt> Sync for IoPageTable<F> {}
+
+/// The format used by this page table.
+pub trait IoPageTableFmt: 'static {
+    /// The value representing this format.
+    const FORMAT: io_pgtable_fmt;
+}
+
+impl<F: IoPageTableFmt> IoPageTable<F> {
+    /// Create a new `IoPageTable` as a device resource.
+    #[inline]
+    pub fn new(
+        dev: &Device<Bound>,
+        config: Config,
+    ) -> impl PinInit<Devres<Self>, Error> + '_ {
+        // SAFETY: Devres ensures that the value is dropped during device unbind.
+        Devres::new(dev, unsafe { Self::new_raw(dev, config) })
+    }
+
+    /// Create a new `IoPageTable`.
+    ///
+    /// # Safety
+    ///
+    /// If successful, then the returned value must be dropped before the device is unbound.
+    #[inline]
+    pub unsafe fn new_raw(dev: &Device<Bound>, config: Config) -> Result<Self> {
+        let mut raw_cfg = bindings::io_pgtable_cfg {
+            quirks: config.quirks,
+            pgsize_bitmap: config.pgsize_bitmap,
+            ias: config.ias,
+            oas: config.oas,
+            coherent_walk: config.coherent_walk,
+            tlb: &raw const NOOP_FLUSH_OPS,
+            iommu_dev: dev.as_raw(),
+            // SAFETY: All zeroes is a valid value for `struct io_pgtable_cfg`.
+            ..unsafe { core::mem::zeroed() }
+        };
+
+        // SAFETY:
+        // * The raw_cfg pointer is valid for the duration of this call.
+        // * The provided `FLUSH_OPS` contains valid function pointers that accept a null pointer
+        //   as cookie.
+        // * The caller ensures that the io pgtable does not outlive the device.
+        let ops = unsafe {
+            bindings::alloc_io_pgtable_ops(F::FORMAT, &mut raw_cfg, core::ptr::null_mut())
+        };
+        // INVARIANT: We successfully created a valid page table.
+        Ok(IoPageTable {
+            ptr: NonNull::new(ops).ok_or(ENOMEM)?,
+            _marker: PhantomData,
+        })
+    }
+
+    /// Obtain a raw pointer to the underlying `struct io_pgtable_ops`.
+    #[inline]
+    pub fn raw_ops(&self) -> *mut bindings::io_pgtable_ops {
+        self.ptr.as_ptr()
+    }
+
+    /// Obtain a raw pointer to the underlying `struct io_pgtable`.
+    #[inline]
+    pub fn raw_pgtable(&self) -> *mut bindings::io_pgtable {
+        // SAFETY: The io_pgtable_ops of an io-pgtable is always the ops field of an io_pgtable.
+        unsafe { kernel::container_of!(self.raw_ops(), bindings::io_pgtable, ops) }
+    }
+
+    /// Obtain a raw pointer to the underlying `struct io_pgtable_cfg`.
+    #[inline]
+    pub fn raw_cfg(&self) -> *mut bindings::io_pgtable_cfg {
+        // SAFETY: The `raw_pgtable()` method returns a valid pointer.
+        unsafe { &raw mut (*self.raw_pgtable()).cfg }
+    }
+
+    /// Map a physically contiguous range of pages of the same size.
+    ///
+    /// Even if successful, this operation may not map the entire range. In that case, only a
+    /// prefix of the range is mapped, and the returned integer indicates its length in bytes. The
+    /// caller will usually call `map_pages` again for the remaining range.
+    ///
+    /// The returned [`Result`] indicates whether an error was encountered while mapping pages.
+    /// Note that this may return a non-zero length even if an error was encountered. The caller
+    /// will usually [unmap the relevant pages](Self::unmap_pages) on error.
+    ///
+    /// The caller must flush the TLB before using the pgtable to access the newly created mapping.
+    ///
+    /// # Safety
+    ///
+    /// * No other io-pgtable operation may access the range `iova .. iova+pgsize*pgcount` while
+    ///   this `map_pages` operation executes.
+    /// * This page table must not contain any mapping that overlaps with the mapping created by
+    ///   this call.
+    /// * If this page table is live, then the caller must ensure that it's okay to access the
+    ///   physical address being mapped for the duration in which it is mapped.
+    #[inline]
+    #[must_use]
+    pub unsafe fn map_pages(
+        &self,
+        iova: usize,
+        paddr: PhysAddr,
+        pgsize: usize,
+        pgcount: usize,
+        prot: u32,
+        flags: alloc::Flags,
+    ) -> (usize, Result) {
+        let mut mapped: usize = 0;
+
+        // SAFETY: The `map_pages` function in `io_pgtable_ops` is never null.
+        let map_pages = unsafe { (*self.raw_ops()).map_pages.unwrap_unchecked() };
+
+        // SAFETY: The safety requirements of this method are sufficient to call `map_pages`.
+        let ret = to_result(unsafe {
+            (map_pages)(
+                self.raw_ops(),
+                iova,
+                paddr,
+                pgsize,
+                pgcount,
+                prot as i32,
+                flags.as_raw(),
+                &mut mapped,
+            )
+        });
+
+        (mapped, ret)
+    }
+
+    /// Unmap a range of virtually contiguous pages of the same size.
+    ///
+    /// This may not unmap the entire range, and returns the length of the unmapped prefix in
+    /// bytes.
+    ///
+    /// # Safety
+    ///
+    /// * No other io-pgtable operation may access the range `iova .. iova+pgsize*pgcount` while
+    ///   this `unmap_pages` operation executes.
+    /// * This page table must contain one or more consecutive mappings starting at `iova` whose
+    ///   total size is `pgcount * pgsize`.
+    #[inline]
+    #[must_use]
+    pub unsafe fn unmap_pages(&self, iova: usize, pgsize: usize, pgcount: usize) -> usize {
+        // SAFETY: The `unmap_pages` function in `io_pgtable_ops` is never null.
+        let unmap_pages = unsafe { (*self.raw_ops()).unmap_pages.unwrap_unchecked() };
+
+        // SAFETY: The safety requirements of this method are sufficient to call `unmap_pages`.
+        unsafe { (unmap_pages)(self.raw_ops(), iova, pgsize, pgcount, core::ptr::null_mut()) }
+    }
+}
+
+// For now, we do not provide the ability to flush the TLB via the built-in callback mechanism.
+// Instead, the `map_pages` function requires the caller to explicitly flush the TLB before the
+// pgtable is used to access the newly created range.
+//
+// This is done because the initial user of this abstraction may perform many calls to `map_pages`
+// in a single batched operation, and wishes to only flush the TLB once after performing the entire
+// batch of mappings. These callbacks would flush too often for that use-case.
+//
+// Support for flushing the TLB in these callbacks may be added in the future.
+static NOOP_FLUSH_OPS: bindings::iommu_flush_ops = bindings::iommu_flush_ops {
+    tlb_flush_all: Some(rust_tlb_flush_all_noop),
+    tlb_flush_walk: Some(rust_tlb_flush_walk_noop),
+    tlb_add_page: None,
+};
+
+#[no_mangle]
+extern "C" fn rust_tlb_flush_all_noop(_cookie: *mut core::ffi::c_void) {}
+
+#[no_mangle]
+extern "C" fn rust_tlb_flush_walk_noop(
+    _iova: usize,
+    _size: usize,
+    _granule: usize,
+    _cookie: *mut core::ffi::c_void,
+) {
+}
+
+impl<F: IoPageTableFmt> Drop for IoPageTable<F> {
+    fn drop(&mut self) {
+        // SAFETY: The caller of `ttbr` promised that the page table is not live when this
+        // destructor runs.
+        unsafe { bindings::free_io_pgtable_ops(self.raw_ops()) };
+    }
+}
+
+/// The `ARM_64_LPAE_S1` page table format.
+pub enum ARM64LPAES1 {}
+
+impl IoPageTableFmt for ARM64LPAES1 {
+    const FORMAT: io_pgtable_fmt = bindings::io_pgtable_fmt_ARM_64_LPAE_S1 as io_pgtable_fmt;
+}
+
+impl IoPageTable<ARM64LPAES1> {
+    /// Access the `ttbr` field of the configuration.
+    ///
+    /// This is the physical address of the page table, which may be passed to the device that
+    /// needs to use it.
+    ///
+    /// # Safety
+    ///
+    /// The caller must ensure that the device stops using the page table before dropping it.
+    #[inline]
+    pub unsafe fn ttbr(&self) -> u64 {
+        // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
+        unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.ttbr }
+    }
+
+    /// Access the `mair` field of the configuration.
+    #[inline]
+    pub fn mair(&self) -> u64 {
+        // SAFETY: `arm_lpae_s1_cfg` is the right cfg type for `ARM64LPAES1`.
+        unsafe { (*self.raw_cfg()).__bindgen_anon_1.arm_lpae_s1_cfg.mair }
+    }
+}

---
base-commit: 3e7f562e20ee87a25e104ef4fce557d39d62fa85
change-id: 20251111-io-pgtable-fe0822b4ebdd

Best regards,
-- 
Alice Ryhl
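As an illustrative aside (not part of the patch, and userspace-only Rust with hypothetical names): the `pgsize_bitmap` field of `Config` encodes each supported page size as one set bit, and a mapper typically picks the largest supported size compatible with the iova alignment and the remaining length. A minimal sketch of that selection logic:

```rust
/// A typical LPAE-style page-size bitmap: 4 KiB, 2 MiB and 1 GiB pages.
/// One set bit per supported page size; every page size is a power of two.
const PGSIZE_BITMAP: usize = (1 << 12) | (1 << 21) | (1 << 30);

/// Pick the largest supported page size whose alignment matches `iova`
/// and which does not exceed the remaining length `len`.
fn largest_pgsize(bitmap: usize, iova: usize, len: usize) -> usize {
    let mut best = 0;
    let mut bits = bitmap;
    while bits != 0 {
        let size = bits & bits.wrapping_neg(); // isolate the lowest set bit
        bits &= bits - 1; // clear it
        if size <= len && iova % size == 0 {
            // Bits are visited smallest first, so keep updating `best`.
            best = size;
        }
    }
    best
}

fn main() {
    // A 2 MiB-aligned iova with 4 MiB remaining can use 2 MiB pages.
    assert_eq!(largest_pgsize(PGSIZE_BITMAP, 0x20_0000, 0x40_0000), 0x20_0000);
    // A 4 KiB-aligned iova can only use 4 KiB pages.
    assert_eq!(largest_pgsize(PGSIZE_BITMAP, 0x1000, 0x40_0000), 0x1000);
}
```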
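Also illustrative (a hypothetical userspace sketch, not the kernel API itself): the calling convention documented for `map_pages` — it may map only a prefix of the requested range, and may report a non-zero mapped prefix even on error — leads callers to loop over the unmapped remainder and roll back on failure. `mock_map_pages` below stands in for the real operation:

```rust
/// Stand-in for `map_pages`: pretends the implementation maps at most
/// 4 pages per call, returning the number of bytes actually mapped plus
/// an error indication. Purely hypothetical.
fn mock_map_pages(_iova: usize, pgsize: usize, pgcount: usize) -> (usize, Result<(), i32>) {
    (pgcount.min(4) * pgsize, Ok(()))
}

/// Map `pgcount` pages of `pgsize` bytes at `iova`, retrying with the
/// unmapped remainder until everything is mapped, as the docs suggest.
fn map_all(mut iova: usize, pgsize: usize, mut pgcount: usize) -> Result<usize, i32> {
    let mut total = 0;
    while pgcount > 0 {
        let (mapped, ret) = mock_map_pages(iova, pgsize, pgcount);
        // Even on error, a prefix of `mapped` bytes may now be live.
        total += mapped;
        iova += mapped;
        pgcount -= mapped / pgsize;
        // On error a real caller would unmap the `total` bytes mapped so
        // far before propagating the failure.
        ret?;
        if mapped == 0 {
            break; // no forward progress; avoid looping forever
        }
    }
    Ok(total)
}

fn main() {
    // 10 pages of 4 KiB are mapped in chunks of 4 + 4 + 2 pages.
    assert_eq!(map_all(0x1000, 0x1000, 10), Ok(10 * 0x1000));
}
```

After such a loop succeeds, the caller still has to flush the TLB once — which is exactly why this abstraction installs no-op flush callbacks and leaves flushing to the caller.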