From nobody Sat Dec 28 10:25:33 2024 Received: from mail-wm1-f74.google.com (mail-wm1-f74.google.com [209.85.128.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7BE3A1FDE00 for ; Wed, 11 Dec 2024 10:37:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913459; cv=none; b=Gd/csDCi2y5XjRFPIBXMkB0Y2r3n7mkVHP+p/KhgYkp3Xv8cZ2Up1SFY3CKL4bnjJa2wgStdF08GLjPhF9/zhf9itmDC7Hwb2uZ40yYvZY/PZR0BDevWe72vwwWAP4dCtjcYk+aDUHvRlikZnsp/QvyMs73ZGV0lBtPgzFvpb2s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913459; c=relaxed/simple; bh=BP0rHFtdcn3zxhmWpCsC4fRZsGWpHTVijKhPN/V33Gw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=XXnEDtsTFIInamfZP5VXI+y9f2be4/n5azmQfz1htwmg0ou8Tw+0dFVv4eVM24YUwta/GfhENyRut1kjs4vu4513vzjOOqbKaCj5UixBaKgbBRiuDlueOROfIm5uK9ZnAqIBiYKv97KiP2YPNaMTUubIG0Fer+56YcZuB+yQc+8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=iZW15SrV; arc=none smtp.client-ip=209.85.128.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="iZW15SrV" Received: by mail-wm1-f74.google.com with SMTP id 5b1f17b1804b1-434f852cb35so2245175e9.0 for ; Wed, 11 Dec 2024 02:37:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1733913455; x=1734518255; 
darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=eeSYOeE5pMMeyg7ZhXQAlvtxanJpaTuPefZQ5vxq80o=; b=iZW15SrVYYPwGFTx5SFpIb3pf7o6hZbq/I6RjyhZrK5e7rIrYpCaFbsV2SvIk3MsZr jhGVTZm8Ms+q9LtSCXViXc3VpToWphWe9YUTDWraDqaQgNWDntU2FDJoNhCBjqLhhlok ZkYpVgH4v/DeiaB4WjqRJFtFkthJoWNpE6wMvB+yWgpLQjK8DYAbAsNihuGJYGlTNZ4u 8ITBAXZzpRsKAXFuamPnKLQcEcQ0WKe0Ug49OEhmvLs5C/yfepiqJnlH4gGp9Is+3L2v bd7OiCur6nKG/fSP6qmAOP73uuONvbEw/wFlW8siAXi1geYNBDke6QpVWecA9d+8d5kS 1nZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733913455; x=1734518255; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=eeSYOeE5pMMeyg7ZhXQAlvtxanJpaTuPefZQ5vxq80o=; b=C/UpC6muRUgRsKiyB1i0GePVBNSGeV5eE3mxprRg31GlYYmhdXWc9p0nLPRo2OpvQl m1sEV+hRxxaCwuAyjvrrLOkQ9wUijSU6if1h0YyWFuMkdW83fPETcxMmby9rfRRo3h1M W0Jtlf+mvMuwhNQnXIXnrhyzBMe/tue/a8JM/8TTw66HQQ5kLTq8RyuwAdgivYfZAhbv 0lxnk+t2a+rJXSkLVvTlB9RUv7NwfRPbZxMGAuhydv+C5Ojzp2Wd0qWKrrMie6zR69SJ NDLBx1Fqzp1oKNdJhBTgKHsBCn+rmW8hujN2UaYw+nV5JpLNa1vTUXHFoWttT6+Qfede CxMA== X-Forwarded-Encrypted: i=1; AJvYcCUQcdeV5B6Bc0HBd2551y/ZQotgTFphAYWPNObB5eEDYzb0zGp+Z6gp6y8NOFbWEexn8HKFZfHYTN8hP50=@vger.kernel.org X-Gm-Message-State: AOJu0YzwLAhw4gjI84gjXb3qX5P5BTJUz1JkHL+HbQrdaKxbdJK4OAyh hw7eceUbIo13BP5ORxNXnpSv7I6IvMxTKl5eAA7soRvZeShypjHfCjpl9KEaRwzkjpQKgOASmFo jJhImoMcrEUWOTA== X-Google-Smtp-Source: AGHT+IGlVGDsXAS8QTt20ZhjhUvb4+iLhx6imu5M/qvMwlRkpVFZ0hN9qToGEpQgPFG5rFlBMfq0S5s6OEGbDoM= X-Received: from wmbd2.prod.google.com ([2002:a05:600c:58c2:b0:434:f513:bb24]) (user=aliceryhl job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:b86:b0:434:f623:9ff3 with SMTP id 5b1f17b1804b1-4361c382ce3mr18383385e9.15.1733913454954; Wed, 11 Dec 2024 02:37:34 -0800 (PST) Date: Wed, 11 Dec 2024 10:37:05 +0000 In-Reply-To: <20241211-vma-v11-0-466640428fc3@google.com> 
Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241211-vma-v11-0-466640428fc3@google.com> X-Developer-Key: i=aliceryhl@google.com; a=openpgp; fpr=49F6C1FAA74960F43A5B86A1EE7A392FDE96209F X-Developer-Signature: v=1; a=openpgp-sha256; l=10055; i=aliceryhl@google.com; h=from:subject:message-id; bh=BP0rHFtdcn3zxhmWpCsC4fRZsGWpHTVijKhPN/V33Gw=; b=owEBbQKS/ZANAwAKAQRYvu5YxjlGAcsmYgBnWWtmTO1N1K8DUC6NoDXf02ue/qobHQiKrkrWy Qgkt4oEYx2JAjMEAAEKAB0WIQSDkqKUTWQHCvFIvbIEWL7uWMY5RgUCZ1lrZgAKCRAEWL7uWMY5 Rk37D/wKqutxvgqTXRQSIuDKk6I9CScFzIL8g8BqDE1KNEKiN6wrIWLFEcpw9BSH85DoRLdm0y2 XGydQAJ5iIJzfHWTw4k5RvMDkr8JHrdMTYTNLo9w49UHdABMFkiSk2ZRQ4p97t2rcPk4DciGYtB LYzQdCJPLhvlnk4j125r3wgO+8a1PgVElFmYrZZG42nFk3Do3gamix8Db9R8M6/l5Zl9o2uObsA ONIlvWmIedb1NJ5V/LfYc/HhYoSZT3wYNQRjET2d0352qRxyipt4oXDJfEOnf4PUc9WElAfZaEo Itha4cyfEjOZjky9RLsKXWzEQFyadVxgjxdtYQwzOScECwI9iVKIvRhYYHdjLqmmjwW0rkktYVb DDUuSOD3u9KPp1UEtAxW/wWaPN3JUFL2Ng+nkc5V7nbSzlMXF22g5ytMA0rpK1QOC5NIhqVXDup SbLgi25Ot6vGwTcrGyxuBbQjXTVfV51yZpVQPW96b3IF30Pd8OHeMa2bCusmjZU+Po5vkEtOpoY 8GLg8cf1on2Jt21bMRjq6S4gH4r9TLiZ5WgNBhTp1CAc443b92p/AVEzZshjGavyrKiMhSkBnbR j7QpB7Q3pSWCg1HJHzVSwr/MX9VStblyNEDSRZL9+R4YR7h3FeeWbC+qJWw7PscIJWVRRd+QSWf a+c0FhQRhWjjrTg== X-Mailer: b4 0.13.0 Message-ID: <20241211-vma-v11-1-466640428fc3@google.com> Subject: [PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct From: Alice Ryhl To: Miguel Ojeda , Matthew Wilcox , Lorenzo Stoakes , Vlastimil Babka , John Hubbard , "Liam R. 
Howlett" , Andrew Morton , Greg Kroah-Hartman , Arnd Bergmann , Christian Brauner , Jann Horn , Suren Baghdasaryan
Cc: Alex Gaynor , Boqun Feng , Gary Guo , "=?utf-8?q?Bj=C3=B6rn_Roy_Baron?=" , Benno Lossin , Andreas Hindborg , Trevor Gross , linux-kernel@vger.kernel.org, linux-mm@kvack.org, rust-for-linux@vger.kernel.org, Alice Ryhl
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

These abstractions allow you to reference a `struct mm_struct` using
both mmgrab and mmget refcounts. This is done using two Rust types:

* Mm - represents an mm_struct where you don't know anything about the
  value of mm_users.
* MmWithUser - represents an mm_struct where you know at compile time
  that mm_users is non-zero.

This allows us to encode in the type system whether a method requires
that mm_users is non-zero or not. For instance, you can always call
`mmget_not_zero` but you can only call `mmap_read_lock` when mm_users is
non-zero.

It's possible to access current->mm without a refcount increment, but
that is added in a later patch of this series.
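
As an illustration only, the two-type split described above can be
modelled in ordinary user-space Rust. This is a rough sketch under
stated assumptions, not the kernel code: `mm_users` is modelled as a
plain atomic counter, and `UserRef`, `users` and `address_space` are
made-up names standing in for `ARef<MmWithUser>` and for methods that
need a live address space.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Toy stand-in for `struct mm_struct`; only `mm_users` is modelled.
pub struct Mm {
    mm_users: AtomicUsize,
}

/// Same layout as `Mm`, but a `&MmWithUser` is only ever created while
/// `mm_users` is known to be non-zero.
#[repr(transparent)]
pub struct MmWithUser {
    mm: Mm,
}

/// Hypothetical owning handle (plays the role of `ARef<MmWithUser>`).
pub struct UserRef<'a>(&'a MmWithUser);

impl Mm {
    pub fn new(users: usize) -> Self {
        Mm { mm_users: AtomicUsize::new(users) }
    }

    /// Current value of the modelled `mm_users` counter.
    pub fn users(&self) -> usize {
        self.mm_users.load(Ordering::Acquire)
    }

    /// Increment `mm_users` unless it is already zero.
    pub fn mmget_not_zero(&self) -> Option<UserRef<'_>> {
        self.mm_users
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n == 0 { None } else { Some(n + 1) }
            })
            .ok()?;
        // SAFETY: `MmWithUser` is `repr(transparent)` over `Mm`, and we
        // just took a user count, so `mm_users` is non-zero.
        let with_user = unsafe { &*(self as *const Mm).cast::<MmWithUser>() };
        Some(UserRef(with_user))
    }
}

impl MmWithUser {
    /// Only reachable once the type system has seen a non-zero `mm_users`.
    pub fn address_space(&self) -> &'static str {
        "alive"
    }
}

// Make all `Mm` methods available on `MmWithUser`.
impl Deref for MmWithUser {
    type Target = Mm;
    fn deref(&self) -> &Mm {
        &self.mm
    }
}

impl Deref for UserRef<'_> {
    type Target = MmWithUser;
    fn deref(&self) -> &MmWithUser {
        self.0
    }
}

impl Drop for UserRef<'_> {
    fn drop(&mut self) {
        // Give the user count back, as `mmput` would.
        self.0.mm.mm_users.fetch_sub(1, Ordering::Release);
    }
}
```

In this model, `address_space` cannot be called on a bare `Mm`: the
caller first has to go through `mmget_not_zero`, which returns `None`
once the counter has reached zero.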
Acked-by: Lorenzo Stoakes (for mm bits)
Signed-off-by: Alice Ryhl
---
 rust/helpers/helpers.c |   1 +
 rust/helpers/mm.c      |  39 +++++++++
 rust/kernel/lib.rs     |   1 +
 rust/kernel/mm.rs      | 219 +++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 260 insertions(+)

diff --git a/rust/helpers/helpers.c b/rust/helpers/helpers.c
index dcf827a61b52..9d748ec845b3 100644
--- a/rust/helpers/helpers.c
+++ b/rust/helpers/helpers.c
@@ -16,6 +16,7 @@
 #include "fs.c"
 #include "jump_label.c"
 #include "kunit.c"
+#include "mm.c"
 #include "mutex.c"
 #include "page.c"
 #include "pid_namespace.c"
diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
new file mode 100644
index 000000000000..7201747a5d31
--- /dev/null
+++ b/rust/helpers/mm.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/mm.h>
+#include <linux/sched/mm.h>
+
+void rust_helper_mmgrab(struct mm_struct *mm)
+{
+	mmgrab(mm);
+}
+
+void rust_helper_mmdrop(struct mm_struct *mm)
+{
+	mmdrop(mm);
+}
+
+void rust_helper_mmget(struct mm_struct *mm)
+{
+	mmget(mm);
+}
+
+bool rust_helper_mmget_not_zero(struct mm_struct *mm)
+{
+	return mmget_not_zero(mm);
+}
+
+void rust_helper_mmap_read_lock(struct mm_struct *mm)
+{
+	mmap_read_lock(mm);
+}
+
+bool rust_helper_mmap_read_trylock(struct mm_struct *mm)
+{
+	return mmap_read_trylock(mm);
+}
+
+void rust_helper_mmap_read_unlock(struct mm_struct *mm)
+{
+	mmap_read_unlock(mm);
+}
diff --git a/rust/kernel/lib.rs b/rust/kernel/lib.rs
index e1065a7551a3..6555e0847192 100644
--- a/rust/kernel/lib.rs
+++ b/rust/kernel/lib.rs
@@ -46,6 +46,7 @@
 pub mod kunit;
 pub mod list;
 pub mod miscdevice;
+pub mod mm;
 #[cfg(CONFIG_NET)]
 pub mod net;
 pub mod page;
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
new file mode 100644
index 000000000000..84cba581edaa
--- /dev/null
+++ b/rust/kernel/mm.rs
@@ -0,0 +1,219 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Memory management.
+//!
+//! C header: [`include/linux/mm.h`](srctree/include/linux/mm.h)
+
+use crate::{
+    bindings,
+    types::{ARef, AlwaysRefCounted, NotThreadSafe, Opaque},
+};
+use core::{ops::Deref, ptr::NonNull};
+
+/// A wrapper for the kernel's `struct mm_struct`.
+///
+/// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
+/// [`mmget_not_zero`] to be able to access the address space.
+///
+/// The `ARef<Mm>` smart pointer holds an `mmgrab` refcount. Its destructor may sleep.
+///
+/// # Invariants
+///
+/// Values of this type are always refcounted using `mmgrab`.
+///
+/// [`mmget_not_zero`]: Mm::mmget_not_zero
+#[repr(transparent)]
+pub struct Mm {
+    mm: Opaque<bindings::mm_struct>,
+}
+
+// SAFETY: It is safe to call `mmdrop` on another thread than where `mmgrab` was called.
+unsafe impl Send for Mm {}
+// SAFETY: All methods on `Mm` can be called in parallel from several threads.
+unsafe impl Sync for Mm {}
+
+// SAFETY: By the type invariants, this type is always refcounted.
+unsafe impl AlwaysRefCounted for Mm {
+    #[inline]
+    fn inc_ref(&self) {
+        // SAFETY: The pointer is valid since self is a reference.
+        unsafe { bindings::mmgrab(self.as_raw()) };
+    }
+
+    #[inline]
+    unsafe fn dec_ref(obj: NonNull<Self>) {
+        // SAFETY: The caller is giving up their refcount.
+        unsafe { bindings::mmdrop(obj.cast().as_ptr()) };
+    }
+}
+
+/// A wrapper for the kernel's `struct mm_struct`.
+///
+/// This type is like [`Mm`], but with non-zero `mm_users`. It can only be used when `mm_users` can
+/// be proven to be non-zero at compile-time, usually because the relevant code holds an `mmget`
+/// refcount. It can be used to access the associated address space.
+///
+/// The `ARef<MmWithUser>` smart pointer holds an `mmget` refcount. Its destructor may sleep.
+///
+/// # Invariants
+///
+/// Values of this type are always refcounted using `mmget`. The value of `mm_users` is non-zero.
+#[repr(transparent)]
+pub struct MmWithUser {
+    mm: Mm,
+}
+
+// SAFETY: It is safe to call `mmput` on another thread than where `mmget` was called.
+unsafe impl Send for MmWithUser {}
+// SAFETY: All methods on `MmWithUser` can be called in parallel from several threads.
+unsafe impl Sync for MmWithUser {}
+
+// SAFETY: By the type invariants, this type is always refcounted.
+unsafe impl AlwaysRefCounted for MmWithUser {
+    #[inline]
+    fn inc_ref(&self) {
+        // SAFETY: The pointer is valid since self is a reference.
+        unsafe { bindings::mmget(self.as_raw()) };
+    }
+
+    #[inline]
+    unsafe fn dec_ref(obj: NonNull<Self>) {
+        // SAFETY: The caller is giving up their refcount.
+        unsafe { bindings::mmput(obj.cast().as_ptr()) };
+    }
+}
+
+// Make all `Mm` methods available on `MmWithUser`.
+impl Deref for MmWithUser {
+    type Target = Mm;
+
+    #[inline]
+    fn deref(&self) -> &Mm {
+        &self.mm
+    }
+}
+
+// These methods are safe to call even if `mm_users` is zero.
+impl Mm {
+    /// Call `mmgrab` on `current.mm`.
+    #[inline]
+    pub fn mmgrab_current() -> Option<ARef<Mm>> {
+        // SAFETY: It's safe to get the `mm` field from current.
+        let mm = unsafe {
+            let current = bindings::get_current();
+            (*current).mm
+        };
+
+        if mm.is_null() {
+            return None;
+        }
+
+        // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
+        // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
+        // duration of this function, and `current->mm` will stay valid for that long.
+        let mm = unsafe { Mm::from_raw(mm) };
+
+        // This increments the refcount using `mmgrab`.
+        Some(ARef::from(mm))
+    }
+
+    /// Returns a raw pointer to the inner `mm_struct`.
+    #[inline]
+    pub fn as_raw(&self) -> *mut bindings::mm_struct {
+        self.mm.get()
+    }
+
+    /// Obtain a reference from a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// The caller must ensure that `ptr` points at an `mm_struct`, and that it is not deallocated
+    /// during the lifetime 'a.
+    #[inline]
+    pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a Mm {
+        // SAFETY: Caller promises that the pointer is valid for 'a. Layouts are compatible due to
+        // repr(transparent).
+        unsafe { &*ptr.cast() }
+    }
+
+    /// Calls `mmget_not_zero` and returns a handle if it succeeds.
+    #[inline]
+    pub fn mmget_not_zero(&self) -> Option<ARef<MmWithUser>> {
+        // SAFETY: The pointer is valid since self is a reference.
+        let success = unsafe { bindings::mmget_not_zero(self.as_raw()) };
+
+        if success {
+            // SAFETY: We just created an `mmget` refcount.
+            Some(unsafe { ARef::from_raw(NonNull::new_unchecked(self.as_raw().cast())) })
+        } else {
+            None
+        }
+    }
+}
+
+// These methods require `mm_users` to be non-zero.
+impl MmWithUser {
+    /// Obtain a reference from a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// The caller must ensure that `ptr` points at an `mm_struct`, and that `mm_users` remains
+    /// non-zero for the duration of the lifetime 'a.
+    #[inline]
+    pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_struct) -> &'a MmWithUser {
+        // SAFETY: Caller promises that the pointer is valid for 'a. The layout is compatible due
+        // to repr(transparent).
+        unsafe { &*ptr.cast() }
+    }
+
+    /// Lock the mmap read lock.
+    #[inline]
+    pub fn mmap_read_lock(&self) -> MmapReadGuard<'_> {
+        // SAFETY: The pointer is valid since self is a reference.
+        unsafe { bindings::mmap_read_lock(self.as_raw()) };
+
+        // INVARIANT: We just acquired the read lock.
+        MmapReadGuard {
+            mm: self,
+            _nts: NotThreadSafe,
+        }
+    }
+
+    /// Try to lock the mmap read lock.
+    #[inline]
+    pub fn mmap_read_trylock(&self) -> Option<MmapReadGuard<'_>> {
+        // SAFETY: The pointer is valid since self is a reference.
+        let success = unsafe { bindings::mmap_read_trylock(self.as_raw()) };
+
+        if success {
+            // INVARIANT: We just acquired the read lock.
+            Some(MmapReadGuard {
+                mm: self,
+                _nts: NotThreadSafe,
+            })
+        } else {
+            None
+        }
+    }
+}
+
+/// A guard for the mmap read lock.
+///
+/// # Invariants
+///
+/// This `MmapReadGuard` guard owns the mmap read lock.
+pub struct MmapReadGuard<'a> {
+    mm: &'a MmWithUser,
+    // `mmap_read_lock` and `mmap_read_unlock` must be called on the same thread
+    _nts: NotThreadSafe,
+}
+
+impl Drop for MmapReadGuard<'_> {
+    #[inline]
+    fn drop(&mut self) {
+        // SAFETY: We hold the read lock by the type invariants.
+        unsafe { bindings::mmap_read_unlock(self.mm.as_raw()) };
+    }
+}
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Sat Dec 28 10:25:33 2024
Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8C1D7211A11 for ; Wed, 11 Dec 2024 10:37:38 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.73
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913460; cv=none; b=kLm3Soxiy9uvInvpWY2iHddVBZtjblo44/csHsvXVx1LlAKKXzUcDHryFIZrsGWpZpIVuh/rk4Cr6IQIjHxBEWAXqLGlPUkeHMGKlZUozUfgv+M8jGGQvwp3zwQxGmNiixNKvtWVnQEB/cG0DEF49w3AmNQ1v/5HrT0yrhJ+OSQ=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913460; c=relaxed/simple; bh=UELtGY1YbfyD3E/jKfzFIZ073fFyrrrn2baNLFYoZH8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=snAG7NMtnNcFkaWcrvMSUG3F6BAZDQPFyT+yeyF7YVbIbKfrvu+hoHbsdPXQZubVm/GGf/37DpNKhhZ8EuDNsp5/dFE+kqaOOnhpjOb8e/hQyARzqGF9iVjuTiT0FI1cdpGQS+eewOIhVxXTVhKHmVW/q3MQ1cTi02sSpngT4GI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=iUyquE6S; arc=none smtp.client-ip=209.85.128.73
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="iUyquE6S" Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-43582d49dacso15104155e9.2 for ; Wed, 11 Dec 2024 02:37:38 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1733913457; x=1734518257; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=ptCVgSajmLGd+33xQ7/1wwSdfyjzIQ8tWfofuzm1Y9U=; b=iUyquE6SWHsM3bA2/CyG4Stj9uT9IUUiiANDavdiOIsOsZi3okwo/dFn01A1pYnf9d A94ACTRWGfS0pZRL4H8KRcZWYoOdksd/TerPAC5P/iSJ+sxG1MU0cKq9DV/JRJY/Guhz sgKWXcr+N53T3LxWAOoeE3hP04gZVcsTtAVs4b11RXnawRmCqvoFLXh6FCQoeUejeVDv fUnzr/k2LMCx+oY5sLRezLfrvvJVO+4L8zvQ7bPRCMyUYmdt49W+M1laPg1i2QumGtBj 7ZTRlV+UJVpmGdwo5FWL1yQ2RAgdxQpiE84Z/tps4TVlZiq0xNN0kTrOmxFw43dsL6MI Bubw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733913457; x=1734518257; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ptCVgSajmLGd+33xQ7/1wwSdfyjzIQ8tWfofuzm1Y9U=; b=VEqka7rNQhLFV632Nn0EpiWuCxbFweSFloQphtcVEyIC7N5mDNWqUApZ4WaCzBBQLy oXs91xRZH9p0ftUsbb9t6vxoCx5Yb9ha/qNU/2uWZakjdxii1rv/TDd2floDY3+AsUxi GWaQ20QY24K6Ir1Z8Fz0Bw8nJgfJJg/pTdzkKQ8qgUMZ7uvnK0wFmmcLUmR8rcjCgW80 tKAlaz+Es+yx4eJEpgzeC5aM4BwJ3OI9O1cNPp7aAXhpjioaoJK+42UesLqmoejqJ7Y0 SpeZdtV/OOgk286hkSjsoiI6jdvjNwOuCtvchFYk3ueqLm0g5hrVJXqzY02mxUqLurGR 2t7g== X-Forwarded-Encrypted: i=1; AJvYcCX1EkTYOTHt70A+OBmiX6mvn4JAN7dwoZ2DUGJX0AMIVTQcWQb5H2U+OiDfQmQWS6r9N3SVF+z8YaRap10=@vger.kernel.org X-Gm-Message-State: AOJu0YzxJQROg2g7FamUUZD9f4w2c09mzwuag/L51xalGz49QUUvguxm 7uWXvIg1h3au+5gf9fhlxwtuNJYIjgS4NO05tb2KHZ3N2dbZn+O77GExmG5TQ+Dmsszt30sw0xv EmYGGtfBanuKxMg== 
X-Google-Smtp-Source: AGHT+IF2oHVYbWX6H19GjEN2XS8y+OVVbBpLGUljSwx1ScMsBgY3j4qEcrJlD9I/H+Lzexp4LvuOcuGlpE8PUZM= X-Received: from wmpd23.prod.google.com ([2002:a05:600c:4c17:b0:436:16c6:831]) (user=aliceryhl job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:1d20:b0:434:f218:e1a8 with SMTP id 5b1f17b1804b1-4361c3c70eamr16834795e9.19.1733913457042; Wed, 11 Dec 2024 02:37:37 -0800 (PST) Date: Wed, 11 Dec 2024 10:37:06 +0000 In-Reply-To: <20241211-vma-v11-0-466640428fc3@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241211-vma-v11-0-466640428fc3@google.com> X-Developer-Key: i=aliceryhl@google.com; a=openpgp; fpr=49F6C1FAA74960F43A5B86A1EE7A392FDE96209F X-Developer-Signature: v=1; a=openpgp-sha256; l=10059; i=aliceryhl@google.com; h=from:subject:message-id; bh=UELtGY1YbfyD3E/jKfzFIZ073fFyrrrn2baNLFYoZH8=; b=owEBbQKS/ZANAwAKAQRYvu5YxjlGAcsmYgBnWWtmJY7Ptcy7pNZrWwIGonxUXNeWa5B1NRzbo /1Y1X00++WJAjMEAAEKAB0WIQSDkqKUTWQHCvFIvbIEWL7uWMY5RgUCZ1lrZgAKCRAEWL7uWMY5 RisoD/9s1GclB/tPV2RQgJj7Nws8WY0RQiMF3+XoqpaXLo7CBWgXqFtqDRnmEDtnPSRjPNjZDUw vn76Q3hqvoFexzTpBLEIXNpq+0aW62Rs4WYTSB9EK4l03q8OIXaLWOuNzNlIFnkk5FWJQnnVctY 4WaWo/DVhnFu3CckEwyhSjyZaC6tPRmjhEOV0MV9Zbc8e1VQXzd6gIuukpFAHK46aLIKXZwmPa7 J9pTslBugUwDx0t/QdX6IDb8rto2Ux9i/12Ph2fwYr/mmAY0XkIqc3MavawP8NkoK9IqXgWDVfH OAk1I8Nd9M4GaTxeIr8Wqn3+VcYuY2axQuPSdU+N+/CI/5kSIxDN++PqFJG1ohgzX3zgVUHioTH 5ZwPVM+FGEGb2b6Lv4iIn2uoVLcgsE3qxZetWkS/wYejQJ8GS0tvdtyuY0zkLLM5uQqH6ZBqFMR VHC+va0R5zAofKnL4ktwO3H/+1SBj2euS1cjsFFE2EIDCz3ENNd32tgOwboechRxR0AXiyPxyuE Iw12dKxpN9KICnBsQaXZZMbK3xJQ5wsZ74pST127xzw0893fVbw2sKL5BfGFNVccPJv/QWbMq41 de5sNMp11t7DLip+VQnwMBl5Qlb/GCogJOR7gtVxW7AA1y9Eaqfqjj38Lo9unyrdcERTCEq0e+I leQBcaYv82zdrBw== X-Mailer: b4 0.13.0 Message-ID: <20241211-vma-v11-2-466640428fc3@google.com> Subject: [PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access From: Alice Ryhl To: Miguel Ojeda , Matthew Wilcox , Lorenzo 
Stoakes , Vlastimil Babka , John Hubbard , "Liam R. Howlett" , Andrew Morton , Greg Kroah-Hartman , Arnd Bergmann , Christian Brauner , Jann Horn , Suren Baghdasaryan
Cc: Alex Gaynor , Boqun Feng , Gary Guo , "=?utf-8?q?Bj=C3=B6rn_Roy_Baron?=" , Benno Lossin , Andreas Hindborg , Trevor Gross , linux-kernel@vger.kernel.org, linux-mm@kvack.org, rust-for-linux@vger.kernel.org, Alice Ryhl
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

This adds a type called VmAreaRef which is used when referencing a vma
that you have read access to. Here, read access means that you hold
either the mmap read lock or the vma read lock (or stronger).

Additionally, a vma_lookup method is added to the mmap read guard, which
enables you to obtain a &VmAreaRef in safe Rust code.

This patch only provides a way to lock the mmap read lock, but a
follow-up patch also provides a way to just lock the vma read lock.

Acked-by: Lorenzo Stoakes (for mm bits)
Reviewed-by: Jann Horn
Signed-off-by: Alice Ryhl
---
 rust/helpers/mm.c      |   6 ++
 rust/kernel/mm.rs      |  21 ++++++
 rust/kernel/mm/virt.rs | 191 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 218 insertions(+)

diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
index 7201747a5d31..7b72eb065a3e 100644
--- a/rust/helpers/mm.c
+++ b/rust/helpers/mm.c
@@ -37,3 +37,9 @@ void rust_helper_mmap_read_unlock(struct mm_struct *mm)
 {
 	mmap_read_unlock(mm);
 }
+
+struct vm_area_struct *rust_helper_vma_lookup(struct mm_struct *mm,
+					      unsigned long addr)
+{
+	return vma_lookup(mm, addr);
+}
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 84cba581edaa..ace8e7d57afe 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -12,6 +12,8 @@
 };
 use core::{ops::Deref, ptr::NonNull};
 
+pub mod virt;
+
 /// A wrapper for the kernel's `struct mm_struct`.
 ///
 /// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
@@ -210,6 +212,25 @@ pub struct MmapReadGuard<'a> {
     _nts: NotThreadSafe,
 }
 
+impl<'a> MmapReadGuard<'a> {
+    /// Look up a vma at the given address.
+    #[inline]
+    pub fn vma_lookup(&self, vma_addr: usize) -> Option<&virt::VmAreaRef> {
+        // SAFETY: We hold a reference to the mm, so the pointer must be valid. Any value is okay
+        // for `vma_addr`.
+        let vma = unsafe { bindings::vma_lookup(self.mm.as_raw(), vma_addr as _) };
+
+        if vma.is_null() {
+            None
+        } else {
+            // SAFETY: We just checked that a vma was found, so the pointer is valid. Furthermore,
+            // the returned area will borrow from this read lock guard, so it can only be used
+            // while the mmap read lock is still held.
+            unsafe { Some(virt::VmAreaRef::from_raw(vma)) }
+        }
+    }
+}
+
 impl Drop for MmapReadGuard<'_> {
     #[inline]
     fn drop(&mut self) {
diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
new file mode 100644
index 000000000000..68c763169cf0
--- /dev/null
+++ b/rust/kernel/mm/virt.rs
@@ -0,0 +1,191 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Virtual memory.
+
+use crate::{bindings, mm::MmWithUser, types::Opaque};
+
+/// A wrapper for the kernel's `struct vm_area_struct` with read access.
+///
+/// It represents an area of virtual memory.
+///
+/// # Invariants
+///
+/// The caller must hold the mmap read lock or the vma read lock.
+#[repr(transparent)]
+pub struct VmAreaRef {
+    vma: Opaque<bindings::vm_area_struct>,
+}
+
+// Methods you can call when holding the mmap or vma read lock (or stronger). They must be usable
+// no matter what the vma flags are.
+impl VmAreaRef {
+    /// Access a virtual memory area given a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap or vma
+    /// read lock (or stronger) is held for at least the duration of 'a.
+    #[inline]
+    pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
+        // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
+        unsafe { &*vma.cast() }
+    }
+
+    /// Returns a raw pointer to this area.
+    #[inline]
+    pub fn as_ptr(&self) -> *mut bindings::vm_area_struct {
+        self.vma.get()
+    }
+
+    /// Access the underlying `mm_struct`.
+    #[inline]
+    pub fn mm(&self) -> &MmWithUser {
+        // SAFETY: By the type invariants, this `vm_area_struct` is valid and we hold the mmap/vma
+        // read lock or stronger. This implies that the underlying mm has a non-zero value of
+        // `mm_users`.
+        unsafe { MmWithUser::from_raw((*self.as_ptr()).vm_mm) }
+    }
+
+    /// Returns the flags associated with the virtual memory area.
+    ///
+    /// The possible flags are a combination of the constants in [`flags`].
+    #[inline]
+    pub fn flags(&self) -> vm_flags_t {
+        // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+        // access is not a data race.
+        unsafe { (*self.as_ptr()).__bindgen_anon_2.vm_flags as _ }
+    }
+
+    /// Returns the (inclusive) start address of the virtual memory area.
+    #[inline]
+    pub fn start(&self) -> usize {
+        // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+        // access is not a data race.
+        unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_start as _ }
+    }
+
+    /// Returns the (exclusive) end address of the virtual memory area.
+    #[inline]
+    pub fn end(&self) -> usize {
+        // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+        // access is not a data race.
+        unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_end as _ }
+    }
+
+    /// Zap pages in the given page range.
+    ///
+    /// This clears page table mappings for the range at the leaf level, leaving all other page
+    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
+    /// anonymous memory is completely freed, file-backed memory has its reference count on page
+    /// cache folios dropped, and any dirty data will still be written back to disk as usual.
+    #[inline]
+    pub fn zap_page_range_single(&self, address: usize, size: usize) {
+        let (end, did_overflow) = address.overflowing_add(size);
+        if did_overflow || address < self.start() || self.end() < end {
+            // TODO: call WARN_ONCE once Rust version of it is added
+            return;
+        }
+
+        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
+        // sufficient for this method call. This method has no requirements on the vma flags. The
+        // address range is checked to be within the vma.
+        unsafe {
+            bindings::zap_page_range_single(
+                self.as_ptr(),
+                address as _,
+                size as _,
+                core::ptr::null_mut(),
+            )
+        };
+    }
+}
+
+/// The integer type used for vma flags.
+#[doc(inline)]
+pub use bindings::vm_flags_t;
+
+/// All possible flags for [`VmAreaRef`].
+pub mod flags {
+    use super::vm_flags_t;
+    use crate::bindings;
+
+    /// No flags are set.
+    pub const NONE: vm_flags_t = bindings::VM_NONE as _;
+
+    /// Mapping allows reads.
+    pub const READ: vm_flags_t = bindings::VM_READ as _;
+
+    /// Mapping allows writes.
+    pub const WRITE: vm_flags_t = bindings::VM_WRITE as _;
+
+    /// Mapping allows execution.
+    pub const EXEC: vm_flags_t = bindings::VM_EXEC as _;
+
+    /// Mapping is shared.
+    pub const SHARED: vm_flags_t = bindings::VM_SHARED as _;
+
+    /// Mapping may be updated to allow reads.
+    pub const MAYREAD: vm_flags_t = bindings::VM_MAYREAD as _;
+
+    /// Mapping may be updated to allow writes.
+    pub const MAYWRITE: vm_flags_t = bindings::VM_MAYWRITE as _;
+
+    /// Mapping may be updated to allow execution.
+    pub const MAYEXEC: vm_flags_t = bindings::VM_MAYEXEC as _;
+
+    /// Mapping may be updated to be shared.
+    pub const MAYSHARE: vm_flags_t = bindings::VM_MAYSHARE as _;
+
+    /// Page-ranges managed without `struct page`, just pure PFN.
+    pub const PFNMAP: vm_flags_t = bindings::VM_PFNMAP as _;
+
+    /// Memory mapped I/O or similar.
+    pub const IO: vm_flags_t = bindings::VM_IO as _;
+
+    /// Do not copy this vma on fork.
+    pub const DONTCOPY: vm_flags_t = bindings::VM_DONTCOPY as _;
+
+    /// Cannot expand with mremap().
+    pub const DONTEXPAND: vm_flags_t = bindings::VM_DONTEXPAND as _;
+
+    /// Lock the pages covered when they are faulted in.
+    pub const LOCKONFAULT: vm_flags_t = bindings::VM_LOCKONFAULT as _;
+
+    /// Is a VM accounted object.
+    pub const ACCOUNT: vm_flags_t = bindings::VM_ACCOUNT as _;
+
+    /// Should the VM suppress accounting.
+    pub const NORESERVE: vm_flags_t = bindings::VM_NORESERVE as _;
+
+    /// Huge TLB Page VM.
+    pub const HUGETLB: vm_flags_t = bindings::VM_HUGETLB as _;
+
+    /// Synchronous page faults. (DAX-specific)
+    pub const SYNC: vm_flags_t = bindings::VM_SYNC as _;
+
+    /// Architecture-specific flag.
+    pub const ARCH_1: vm_flags_t = bindings::VM_ARCH_1 as _;
+
+    /// Wipe VMA contents in child on fork.
+    pub const WIPEONFORK: vm_flags_t = bindings::VM_WIPEONFORK as _;
+
+    /// Do not include in the core dump.
+    pub const DONTDUMP: vm_flags_t = bindings::VM_DONTDUMP as _;
+
+    /// Not soft dirty clean area.
+    pub const SOFTDIRTY: vm_flags_t = bindings::VM_SOFTDIRTY as _;
+
+    /// Can contain `struct page` and pure PFN pages.
+    pub const MIXEDMAP: vm_flags_t = bindings::VM_MIXEDMAP as _;
+
+    /// MADV_HUGEPAGE marked this vma.
+    pub const HUGEPAGE: vm_flags_t = bindings::VM_HUGEPAGE as _;
+
+    /// MADV_NOHUGEPAGE marked this vma.
+    pub const NOHUGEPAGE: vm_flags_t = bindings::VM_NOHUGEPAGE as _;
+
+    /// KSM may merge identical pages.
+    pub const MERGEABLE: vm_flags_t = bindings::VM_MERGEABLE as _;
+}
-- 
2.47.1.613.gc27f4b7a9f-goog

From nobody Sat Dec 28 10:25:33 2024
Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A1B0A2210E5 for ; Wed, 11 Dec 2024 10:37:40 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.73
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913463; cv=none; b=JL9JTqAqdqEsmq0K/xGyjxiwAzYeg+ZACDyh8DiJjktcp5viKO83qWmSzTMeyZ94a/lV4qztSxUdrnXm1BmRDoX3/uT7pf5QNCzndwiOKt+mxO3UArP8qDGKNMnRi0XKVlBJkc3hGhNEPQ6ZMEBkyIaaF6zrOfmsLWdmkG5m/84=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913463; c=relaxed/simple; bh=6t+xF+P5/IR/lCGOL4lVeujRJ3nIoK+5J94bGPLjczU=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=oCG2URTjmyeCqS+sn99QwyCrQAcdkxKbBRk36oUEu9xEHYjpmnG+ho6cOteDF7Wqo9i3gKdfkWqHtAImu/FxFeQ1IiOq1B2wgbMax7m2a3FV9L5evDwRq7DOBMcJw1Y7GucXal7tTmz4n3YMKFIqnf4vHGsigYLeG2eSEpvNCzI=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=S6NjFYKW; arc=none smtp.client-ip=209.85.128.73
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="S6NjFYKW"
Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-434e8beeed9so37307495e9.1 for ; Wed, 11 Dec 2024 02:37:40 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1733913459; x=1734518259; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=xdRoZ/FXvJ1iNt28gv3dQ+sMcWS78rYWNAmYz+QTguo=; b=S6NjFYKWwSym4fUf+k0z26/gmycD63KMqkywRbEZOFAgLmYK1wnVxQfUNXW8SS2Wdy /DWCD5CMyg0pAlK1gXr2pKA/f1sO44Rbw8FR4MqY+E2OoBxNKsCuCM0exK83quEHB7AB cszXZ+gKTyp56KkVVw2XZ+azKG9m2cmJqwXEE3WYI4+kQcbSo/H5GpoBqaO3jR63ZTlA AX8h4pX+AbygbIDCZmyreVvzkJF4wRJSWsVqAFqf1llm6zdIkM2nbdDHH7+cuyb0VCtq 6af3/eoRCztg8cKhffH9kNEE5XMGO20aB3qd1HpZHsuHCCpFIqKqsASTTmF8dIH/Plnj eDSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733913459; x=1734518259; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xdRoZ/FXvJ1iNt28gv3dQ+sMcWS78rYWNAmYz+QTguo=; b=jlhSZWxoIxZIpBdNMZxyfoDMf891E8lkxUjMod+v/QeuCye7cA7hg++hL130FPUGyA 8uU/zj7Zy1ImeAU2j3TFFl0NvOPkceM+zUr3qGhRlOEoF+8Z0JZr1bwngHcHL7GDSkHL L+njbbKvytrYM/3uL8njXWRkkQmRaprxTqz1JgLfQcdpVXGSpVMJ+VcenX6q/vaBNNJR RutW/J473rKuXzvEslRK2S7cqURgePt4O76bS3RQJB0nVDVlWuLKT0oGP27Hvv+XFHE3 /ds/K7On9r5L5V/NSGidh0lmSPFx2Pn4yePHIQb5ToFZtEQ1lIqVSB/iZe8euQZwK9D3 JI7Q== X-Forwarded-Encrypted: i=1; AJvYcCUg4LmOpbRe2RrgdxipyJ+JU/OkYbo3c+v0Ix+yjlt8ytd6/hRMD5c/g26JyTqWt7R/6FVTh4zQRZ63VVI=@vger.kernel.org X-Gm-Message-State: AOJu0YwG7/OciLcooWXveliESDPlcWzfxJaaG6tZy6v1vabeKrCPyC+F NI1t4aimrCa6Q0JeAyEGlg54Tw5y/vMXBxOlxEvtHWkeP916iw4m1/9lwZuZfCBMuJspbgOpl9E zvcnnZDPwSksYTw== X-Google-Smtp-Source: AGHT+IFiaOcL2ApjopemuAF+csic5VYHrPo4suLSy1p/57Ja4FGM8Qf+KoTAcXifpYCYix4DnL3IX5KOAs2s7jk= X-Received: from wmoy22.prod.google.com ([2002:a05:600c:17d6:b0:436:1995:1888]) (user=aliceryhl job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:4f11:b0:435:23c:e23e with SMTP id 5b1f17b1804b1-4361c3ab01fmr16964065e9.12.1733913459182; Wed, 11 Dec 2024 02:37:39 -0800 
(PST) Date: Wed, 11 Dec 2024 10:37:07 +0000 In-Reply-To: <20241211-vma-v11-0-466640428fc3@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241211-vma-v11-0-466640428fc3@google.com> X-Developer-Key: i=aliceryhl@google.com; a=openpgp; fpr=49F6C1FAA74960F43A5B86A1EE7A392FDE96209F X-Developer-Signature: v=1; a=openpgp-sha256; l=3734; i=aliceryhl@google.com; h=from:subject:message-id; bh=6t+xF+P5/IR/lCGOL4lVeujRJ3nIoK+5J94bGPLjczU=; b=owEBbQKS/ZANAwAKAQRYvu5YxjlGAcsmYgBnWWtn4X75kf6LePY+XZjlPFNolUZoJJ13E+kKv AdV5RmQ4cmJAjMEAAEKAB0WIQSDkqKUTWQHCvFIvbIEWL7uWMY5RgUCZ1lrZwAKCRAEWL7uWMY5 RsYAEAClhhBwJB76Jj2L7B1ET41hYpXogDIazTH+rKg3i7QZObgXoaE2MPemKRlrV4yMInqgRij eWQIwr4ncHv/JV7zPTLkVOZ3LcVB0UPpgYcTPZ9/18yIS3+Db7En1gZRHrU9YmfFe4Incwr2JQK DB4vDQ2gtEQzfpx2s6Z77STq/wcZLrfnk6guaSaocfPSmf79vOKr18vnHn05DDVEtV6kV65amQT fEDULMlPeyXghGqiA1Xhv7wIq/jdIoFWmXTMgn46Zr5XpBKTwJAyh8GgaS0NrVLXXarNrOgpqjo ygUHM9Opb8NqrgNOae6CNtoBy3bVIE1Ftn8SGnhULvbvGQ2+C2utiYsAkWlaDVLLCmE0v4bow13 BUR5B8FS3dRzJ3d+n5fQkBdXnMw8Y2ovTwu+71t4Vz7hWt1OdmZ+IQCRoqx/DoPrJHRpR2hwmyW R/ZHCvJEbjPPbt+8KnCYToVtc0DTAJGW3I5t3TE+BMC/qBFk0kJ80vj9lwBRD8zh5j3KMToFUqm IBiRpwF1dIyfMCAaWSr6GoixEibdzj31iE3iTylBVC8DbZIl0BzB+K/rxomImUe7AIrEDRAd5R6 1RsaczeisEh5okvkLlrWcVFYbfxpa1IJJuHTKwlA1sHNehU3jTTg+NuvSPAnkHJe4ZL5Zy+MAuk tt79oV/6PfAzmtw== X-Mailer: b4 0.13.0 Message-ID: <20241211-vma-v11-3-466640428fc3@google.com> Subject: [PATCH v11 3/8] mm: rust: add vm_insert_page From: Alice Ryhl To: Miguel Ojeda , Matthew Wilcox , Lorenzo Stoakes , Vlastimil Babka , John Hubbard , "Liam R. 
Howlett" , Andrew Morton , Greg Kroah-Hartman , Arnd Bergmann , Christian Brauner , Jann Horn , Suren Baghdasaryan Cc: Alex Gaynor , Boqun Feng , Gary Guo , "=?utf-8?q?Bj=C3=B6rn_Roy_Baron?=" , Benno Lossin , Andreas Hindborg , Trevor Gross , linux-kernel@vger.kernel.org, linux-mm@kvack.org, rust-for-linux@vger.kernel.org, Alice Ryhl Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable The vm_insert_page method is only usable on vmas with the VM_MIXEDMAP flag, so we introduce a new type to keep track of such vmas. The approach used in this patch assumes that we will not need to encode many flag combinations in the type. I don't think we need to encode more than VM_MIXEDMAP and VM_PFNMAP as things are now. However, if that becomes necessary, using generic parameters in a single type would scale better as the number of flags increases. Acked-by: Lorenzo Stoakes (for mm bits) Signed-off-by: Alice Ryhl --- rust/kernel/mm/virt.rs | 71 ++++++++++++++++++++++++++++++++++++++++++++++= +++- 1 file changed, 70 insertions(+), 1 deletion(-) diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs index 68c763169cf0..3a23854e14f4 100644 --- a/rust/kernel/mm/virt.rs +++ b/rust/kernel/mm/virt.rs @@ -4,7 +4,15 @@ =20 //! Virtual memory. =20 -use crate::{bindings, mm::MmWithUser, types::Opaque}; +use crate::{ + bindings, + error::{to_result, Result}, + mm::MmWithUser, + page::Page, + types::Opaque, +}; + +use core::ops::Deref; =20 /// A wrapper for the kernel's `struct vm_area_struct` with read access. /// @@ -100,6 +108,67 @@ pub fn zap_page_range_single(&self, address: usize, si= ze: usize) { ) }; } + + /// Check whether the `VM_MIXEDMAP` flag is set. + /// + /// This can be used to access methods that require `VM_MIXEDMAP` to b= e set. + #[inline] + pub fn as_mixedmap_vma(&self) -> Option<&VmAreaMixedMap> { + if self.flags() & flags::MIXEDMAP !=3D 0 { + // SAFETY: We just checked that `VM_MIXEDMAP` is set. 
All othe= r requirements are + // satisfied by the type invariants of `VmAreaRef`. + Some(unsafe { VmAreaMixedMap::from_raw(self.as_ptr()) }) + } else { + None + } + } +} + +/// A wrapper for the kernel's `struct vm_area_struct` with read access an= d `VM_MIXEDMAP` set. +/// +/// It represents an area of virtual memory. +/// +/// # Invariants +/// +/// The caller must hold the mmap read lock or the vma read lock. The `VM_= MIXEDMAP` flag must be +/// set. +#[repr(transparent)] +pub struct VmAreaMixedMap { + vma: VmAreaRef, +} + +// Make all `VmAreaRef` methods available on `VmAreaMixedMap`. +impl Deref for VmAreaMixedMap { + type Target =3D VmAreaRef; + + #[inline] + fn deref(&self) -> &VmAreaRef { + &self.vma + } +} + +impl VmAreaMixedMap { + /// Access a virtual memory area given a raw pointer. + /// + /// # Safety + /// + /// Callers must ensure that `vma` is valid for the duration of 'a, an= d that the mmap read lock + /// (or stronger) is held for at least the duration of 'a. The `VM_MIX= EDMAP` flag must be set. + #[inline] + pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'= a Self { + // SAFETY: The caller ensures that the invariants are satisfied fo= r the duration of 'a. + unsafe { &*vma.cast() } + } + + /// Maps a single page at the given address within the virtual memory = area. + /// + /// This operation does not take ownership of the page. + #[inline] + pub fn vm_insert_page(&self, address: usize, page: &Page) -> Result { + // SAFETY: The caller has read access and has verified that `VM_MI= XEDMAP` is set. The page + // is order 0. The address is checked on the C side so it can take= any value. + to_result(unsafe { bindings::vm_insert_page(self.as_ptr(), address= as _, page.as_ptr()) }) + } } =20 /// The integer type used for vma flags. 
--=20 2.47.1.613.gc27f4b7a9f-goog From nobody Sat Dec 28 10:25:33 2024 Received: from mail-wr1-f73.google.com (mail-wr1-f73.google.com [209.85.221.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C6C6C230279 for ; Wed, 11 Dec 2024 10:37:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913464; cv=none; b=oF6hOv8pGHkRbqJC+M6CWgUpGqCtN7eL+g1GDYjoyy0GLpoxiI5hwyIFCCe7OndOP4RI6YIBW11ieS/hQ1LQf1tR/iuEs/tsI4f+CK+Favpe8IG9WsoJ/6h3v+5eyJcZBdagtRBpE2n0+wVOblJ+wq8tq5bAiSwcd8IYFnHTueo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913464; c=relaxed/simple; bh=PSMNvAHImS1yuLQzsvSoD2e3+5nrD/G5jvs5aJHnybQ=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=YDaojabCAHpxvbe2yNpFeO8NXB+mUXY7fPeYG1xCCCgWEDYf1B4XULa9jXDx07+SMUlPdyuEkk+o1zYV/Wkq6w0vIkop16FVZTRRpQbbj1o6SkVoKJ9vQ1Y/tBhGsjAux6apFex5x/Nsem/0C+qWX2I1bqS5HbJclubSTuzTcok= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=O/mARUAo; arc=none smtp.client-ip=209.85.221.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="O/mARUAo" Received: by mail-wr1-f73.google.com with SMTP id ffacd0b85a97d-385df115288so2797404f8f.2 for ; Wed, 11 Dec 2024 02:37:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; 
s=20230601; t=1733913461; x=1734518261; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=jtfdo1hWu/gkdW8i64KIMP+Laz0H38tgBK8KPM9++n4=; b=O/mARUAo5zcESSbeW2dtzZmGvBLDJbSDzb9uIwlt9AxFVDXPIZY0iobtp3f030Yrbg lGZr24wcGWvFWTewd8dJ7at0SRp39oM7SK1/4st9OcbIOGkwyJ77UbIpFWAMQ67CTgtw GI55cNSrrAQ+5KHeIoh+YkcyGRWrM/FPr4BcLgfHwLOeD/fpFPX8kdGnuiprbl1ReCMK x0k/91c3tImqbmSLJvsUjh4qehUCihwH5vn+aJK/oSvTT9eQkMX1P/JCiRm0FvIfDBOZ OCpWMU2hFmxoE26EbeKAt3hpuDPc43YSOha5wDY5RJctspHJoCECUvM94/+qN+OPUPgR 8V4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733913461; x=1734518261; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=jtfdo1hWu/gkdW8i64KIMP+Laz0H38tgBK8KPM9++n4=; b=OTHzDufPghiIYRtm1BkZmAMSDPQb7Hpt5Eax0BlOtpqf795i93p2HO393DRAXoI4fV ybV4A7LfykzVVpwJzssvs/XWc2bcXfrQ832CVaFgbi8vNtHJ9aganqkfOMchL1G+hzqn rodNuLaI5HReoIzbvl0RQ+s1VjWGqXaR8TBFWL3k0hUiO5FeJI5delT9pj6vx554dQa3 X3PCCI9gA1CmA3viukC5Fxyi5cF5rixguKaVe5bai02nVhxcvfPo7j/KOlJhD5GdcmhK 5BqCYZWy3M2Xkrf4DdvADL5ovBIbJ0yYh49tmBvjcaJOpPWEN38UQ92bycIyfs94QJHJ xYzg== X-Forwarded-Encrypted: i=1; AJvYcCUCpCXp7zm8CrOCNBUKaoTRX49qnxZzohLMrit9zfXkyt8eWzZ+su0SgTIQwsnjoKDXVpH90D7XHOL7QEk=@vger.kernel.org X-Gm-Message-State: AOJu0YzBqMF1BmIXxVtu+LN+L6TXt4YPymSXz3YNPELIsdSybOdrBxch 8oMhyHhz8F9AJUOSEV1+STvfLkaWOcZSeeYbz0McKzLfpR0QHZ1FaGlNcmC4vLDUWeAxOy8dIzx ys7C0ENoRv0hoyg== X-Google-Smtp-Source: AGHT+IEPSG2Hqz3P3TcvwAF16etBCufMRdEZ09D0s4Xc1JzGR3A143l7IWc1m3i2YQwgadn0owNcK8U+CXDiey4= X-Received: from wmok4.prod.google.com ([2002:a05:600c:4784:b0:434:a2c3:d51b]) (user=aliceryhl job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6000:4023:b0:385:eb17:cd3d with SMTP id ffacd0b85a97d-3864ce49640mr1860462f8f.8.1733913461236; Wed, 11 Dec 2024 02:37:41 -0800 (PST) Date: Wed, 11 Dec 2024 10:37:08 +0000 In-Reply-To: 
<20241211-vma-v11-0-466640428fc3@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241211-vma-v11-0-466640428fc3@google.com> X-Developer-Key: i=aliceryhl@google.com; a=openpgp; fpr=49F6C1FAA74960F43A5B86A1EE7A392FDE96209F X-Developer-Signature: v=1; a=openpgp-sha256; l=3946; i=aliceryhl@google.com; h=from:subject:message-id; bh=PSMNvAHImS1yuLQzsvSoD2e3+5nrD/G5jvs5aJHnybQ=; b=owEBbQKS/ZANAwAKAQRYvu5YxjlGAcsmYgBnWWtoiZME2oih+BwGJvgLeV9XdV6aNRW+Q12xj brbx3/LqYWJAjMEAAEKAB0WIQSDkqKUTWQHCvFIvbIEWL7uWMY5RgUCZ1lraAAKCRAEWL7uWMY5 RhRpEACoHtQCpffN1jELk1C4qWvx35jHIqYWia49W6YKDfgnOKbSh/PaQNsmFLQ5xqVgbm7v/UK aBQwg3EiD7cArhLxL66nlarZSNU4xAyIYRagufeEs62wVM8fiHrN0bLBQZaTeVR45ka6Dr/NpjT JDvwDHSpm37SvlfAmofc9S+xU4wW8bYutsCq3jCcK0SwgR0BZRcT+q5yYgBnR06vTwCt8Eo32tF 4eCgLuB09ENw8ozEaZuy3U/nIsKDw/Fs1cyweaV3JFTRhZu08ZCQFCMZIItw+z7S9L6SHZ7QOqo 2aszTbLLzdzJChsBBjyQ7S46UghGprkQh+8+iJ/+OOOK9T4A71euURV7yzgU5+I7pOVq7uuEUvc xMxwyyWyeR/cf/dst1SzUffc7jT2e/Ap4W1LCtdaFkEgNXR6/6vTsJ9ffKl5VQYaw5CW+v+L/NY BXwxO+E6cGtAf9afhviyTbXnazRnar/c2501rI4RBu/3zB+0LuIiuSBH4ZyHIkanDFBbekdJzfe j2fNRFns1Rv36w8SKGtQokraNhC7EX9sVaWLGkn778X7n00EcmfFH5uMSpERmffICVt/3aJa2rW nW09v5vj355pS053y8y4wrt1fWYw/tuQYoBi21rp2L+yWlr+nKcIRcoshluSVLDbH40G2H3jp6u boRup1nXBHzGN5w== X-Mailer: b4 0.13.0 Message-ID: <20241211-vma-v11-4-466640428fc3@google.com> Subject: [PATCH v11 4/8] mm: rust: add lock_vma_under_rcu From: Alice Ryhl To: Miguel Ojeda , Matthew Wilcox , Lorenzo Stoakes , Vlastimil Babka , John Hubbard , "Liam R. 
Howlett" , Andrew Morton , Greg Kroah-Hartman , Arnd Bergmann , Christian Brauner , Jann Horn , Suren Baghdasaryan Cc: Alex Gaynor , Boqun Feng , Gary Guo , "=?utf-8?q?Bj=C3=B6rn_Roy_Baron?=" , Benno Lossin , Andreas Hindborg , Trevor Gross , linux-kernel@vger.kernel.org, linux-mm@kvack.org, rust-for-linux@vger.kernel.org, Alice Ryhl Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Currently, the binder driver always uses the mmap lock to make changes to its vma. Because the mmap lock is global to the process, this can involve significant contention. However, the kernel has a feature called per-vma locks, which can significantly reduce contention. For example, you can take a vma lock in parallel with an mmap write lock. This is important because contention on the mmap lock has been a long-term recurring challenge for the Binder driver. This patch introduces support for using `lock_vma_under_rcu` from Rust. The Rust Binder driver will be able to use this to reduce contention on the mmap lock. Acked-by: Lorenzo Stoakes (for mm bits) Reviewed-by: Jann Horn Signed-off-by: Alice Ryhl --- rust/helpers/mm.c | 5 +++++ rust/kernel/mm.rs | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++= ++++ 2 files changed, 61 insertions(+) diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c index 7b72eb065a3e..81b510c96fd2 100644 --- a/rust/helpers/mm.c +++ b/rust/helpers/mm.c @@ -43,3 +43,8 @@ struct vm_area_struct *rust_helper_vma_lookup(struct mm_s= truct *mm, { return vma_lookup(mm, addr); } + +void rust_helper_vma_end_read(struct vm_area_struct *vma) +{ + vma_end_read(vma); +} diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs index ace8e7d57afe..425b73a9dfe6 100644 --- a/rust/kernel/mm.rs +++ b/rust/kernel/mm.rs @@ -13,6 +13,7 @@ use core::{ops::Deref, ptr::NonNull}; =20 pub mod virt; +use virt::VmAreaRef; =20 /// A wrapper for the kernel's `struct mm_struct`. 
/// @@ -170,6 +171,32 @@ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_st= ruct) -> &'a MmWithUser { unsafe { &*ptr.cast() } } =20 + /// Attempt to access a vma using the vma read lock. + /// + /// This is an optimistic trylock operation, so it may fail if there i= s contention. In that + /// case, you should fall back to taking the mmap read lock. + /// + /// When per-vma locks are disabled, this always returns `None`. + #[inline] + pub fn lock_vma_under_rcu(&self, vma_addr: usize) -> Option<VmaReadGuard<'_>> { + #[cfg(CONFIG_PER_VMA_LOCK)] + { + // SAFETY: Calling `bindings::lock_vma_under_rcu` is always ok= ay given an mm where + // `mm_users` is non-zero. + let vma =3D unsafe { bindings::lock_vma_under_rcu(self.as_raw(= ), vma_addr as _) }; + if !vma.is_null() { + return Some(VmaReadGuard { + // SAFETY: If `lock_vma_under_rcu` returns a non-null = ptr, then it points at a + // valid vma. The vma is stable for as long as the vma= read lock is held. + vma: unsafe { VmAreaRef::from_raw(vma) }, + _nts: NotThreadSafe, + }); + } + } + + None + } + /// Lock the mmap read lock. #[inline] pub fn mmap_read_lock(&self) -> MmapReadGuard<'_> { @@ -238,3 +265,32 @@ fn drop(&mut self) { unsafe { bindings::mmap_read_unlock(self.mm.as_raw()) }; } } + +/// A guard for the vma read lock. +/// +/// # Invariants +/// +/// This `VmaReadGuard` guard owns the vma read lock. +pub struct VmaReadGuard<'a> { + vma: &'a VmAreaRef, + // `vma_end_read` must be called on the same thread as where the lock = was taken + _nts: NotThreadSafe, +} + +// Make all `VmAreaRef` methods available on `VmaReadGuard`. +impl Deref for VmaReadGuard<'_> { + type Target =3D VmAreaRef; + + #[inline] + fn deref(&self) -> &VmAreaRef { + self.vma + } +} + +impl Drop for VmaReadGuard<'_> { + #[inline] + fn drop(&mut self) { + // SAFETY: We hold the read lock by the type invariants.
+ unsafe { bindings::vma_end_read(self.vma.as_ptr()) }; + } +} --=20 2.47.1.613.gc27f4b7a9f-goog From nobody Sat Dec 28 10:25:33 2024 Received: from mail-wr1-f73.google.com (mail-wr1-f73.google.com [209.85.221.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D03A8233691 for ; Wed, 11 Dec 2024 10:37:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913467; cv=none; b=RrjixCGpg0tyY5ov7ca668rQOGQ32Nw1wdUNVHPnt7nsaadzL0Ufh7SQs/ChT4mgOMAkaGGOMRX/F0TL2QvFhdbzh+ugADD2fAcMRCHht+cgbPPlf+z6lZHzy4KL+m5Hu17B9xRy33Ny0ji0FgBQNOuBOVcMC/twkn7FxMcEeoQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913467; c=relaxed/simple; bh=o52ixmPVakbGqHXkdZGKle8ANx4alFbrYcfoTBoaBeY=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=EvMhiU5miRljXo16n+x39aEaBNI+hDEVM74vGuFEba6vsjDxBYmYw1QbxjrM4EEGs/pyiH0U6IUrgpPlNz97RZjDwdIhByQ+RCdGWbAW9RgKRYa4jmRrlRrPXYHVWD9SNUE06grQQ4CltsVR7KYTlrDNAPE70QVCbMyHNXn7RXQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=nuj/+Oee; arc=none smtp.client-ip=209.85.221.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="nuj/+Oee" Received: by mail-wr1-f73.google.com with SMTP id ffacd0b85a97d-3862e986d17so1711729f8f.3 for ; Wed, 11 Dec 2024 02:37:44 -0800 (PST) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1733913463; x=1734518263; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=N4tIdMdsO0J6lrUvneIwXSrF/KOin5KwKJIBsiYufGA=; b=nuj/+OeeVMOiUj+t5OBbkKwDnqVnCgHfp00R3GWlmvtINfVvW7fkLbxPtLSdIi7odm A+3e9cYZ5A0RYxxRMF9ohOysJkcV9S4w98+rzVpxF3nR6NE48ZCCQ6Q9waHJos5n0s71 TvoYLslyYC2HYWBn4GUG+qaQiUUqrwLiBxsvLqLNYEB4q5kfLz/T5WjveFa3wu8CLyEz hlaYCOii+ac6v/CPyEha1u8wzKJPOzay3K4BjCihPIAMeBraC41NxyQkDRE6PB5QAhNU MGXDR5nXLoiH45j7eUqMUmztWIC9y5ZwzvIi7h7+w+B5rVYwlSZS4gCp8u3ZcGL+z3IM 0Obw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733913463; x=1734518263; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=N4tIdMdsO0J6lrUvneIwXSrF/KOin5KwKJIBsiYufGA=; b=fBKGO7fUNQuYgu9r5gp/JtQBwfHNQOAydqC9vevxBfIkWDnVehcZVXKqU8Kyit+vgI drKQQ0fxNb3aedUPd3almTOOlnYj/OuTk0Vu05Jh5aQ50K4t4SYXWoHSVu5rJ1p2JplG IdThI/hAJhGRQxAlQbHYEGOursZUWErWyiwvDSbreNXSkthGdM7FpdFyCZjP0w7CvjnM 8UA3+98E9+T4zVmC6a39XIAhOxU8dmiZbB2HwcBgvrsM1CMg2dV+3hQeSMKtIKsqanyd J68yLKIVDRQqQVmGDkcWxYr8wJqf/LBogyHTb6eyzAr6u/YZH8cN4HVIAdQzs1UU/pl6 WTyQ== X-Forwarded-Encrypted: i=1; AJvYcCWTjM2k8OApvV6Q305fWEIkACBvot/3T5NyqGy5+4U+RF2ANXm6PFNVpETza9h8mmZA5524Np1IIRekrgE=@vger.kernel.org X-Gm-Message-State: AOJu0YyjSTZRS2lp4lxorxkNLCKTSy9qiYji7yhnspY0tR3l5krGPxDQ g4tOZEOls5gy4FSsBVxKaaT7l5KewSxq82NNsSZYcajmubtHFkJ48SWpV2gpndTuKJM38CEp83+ SyPssmV/HrRTiQw== X-Google-Smtp-Source: AGHT+IGvJYqc2Db+rertFy18yncLZyz9Adh52dHN4rQJSEpkQ1PGruhxbsByd/HWJepWTHQaE7E4uK9LRFm4JnY= X-Received: from wmpr10.prod.google.com ([2002:a05:600c:320a:b0:435:dde5:2c3b]) (user=aliceryhl job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6000:2d84:b0:385:f979:7664 with SMTP id ffacd0b85a97d-3864ced3035mr1190487f8f.58.1733913463310; Wed, 11 Dec 2024 02:37:43 -0800 
(PST) Date: Wed, 11 Dec 2024 10:37:09 +0000 In-Reply-To: <20241211-vma-v11-0-466640428fc3@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241211-vma-v11-0-466640428fc3@google.com> X-Developer-Key: i=aliceryhl@google.com; a=openpgp; fpr=49F6C1FAA74960F43A5B86A1EE7A392FDE96209F X-Developer-Signature: v=1; a=openpgp-sha256; l=3090; i=aliceryhl@google.com; h=from:subject:message-id; bh=o52ixmPVakbGqHXkdZGKle8ANx4alFbrYcfoTBoaBeY=; b=owEBbQKS/ZANAwAKAQRYvu5YxjlGAcsmYgBnWWtoRxaCSkuHkzNXUxE2qqNVnv4EQP+W4uD5c UetmRby79SJAjMEAAEKAB0WIQSDkqKUTWQHCvFIvbIEWL7uWMY5RgUCZ1lraAAKCRAEWL7uWMY5 RmGCD/9Xgd51fs3bqaJoNH1j26BxFH1sX0Yp3b23v/LQVFSNhncGa6OGK550wg7ov0OeRIedPKw vfoO7FS+v9j13zOMi4A9nogfpPp0lYkqOTWrjQl4lIX68/kck8B5nybxGBMNAvS2CWxD8J0Oomq 6SQ+cb/xgGWhD7xV2WvIifQANQyuin8/tmalAHCl1YX+yIhv+kuE9c0KrvQTAZoWjdTvymoHJIf 1HJepqC3TOgcM0g6dYzn1eNvyFJ9ohJMe9U/AuPZyQf7j0uZPdI87WRmSPQo4Nz3eWY1OLsDC3O UXjBgTWPCBw2u6qNS1ZBOlc/qK8zjohbdQlEjtXZmBJySD+w3BE4xtEeY3yFUM4dz74jwcOQlWK WDLxpWyPhDtHAwUZApsiExI7rM4qZ5YDprePBLCPvtvFW0iPN05pOnPqHip3EZB571ZotEyrQqH M6TPjtBZ0so7nD5X7bMEUx1AWeXuWpfqU9K+OuvbpcPa1bSLiKddGL3/aXfKcp89ZoPPHKRWgE4 qQKlEHkqh+G265jkYzw0ENNNu9TBPO/eqNsH9Vz4XpCRGS3WC0l5ZbXStkimQTxSBRvxIimx72u 12CtVhWEgk2lytcOj92QGdVSVd3m1qTGXFcJ2PHYgnugqt12OMZ/3fsD9gyqGz17LTTFnjZ+Rls mQogTRrM6ybWoCA== X-Mailer: b4 0.13.0 Message-ID: <20241211-vma-v11-5-466640428fc3@google.com> Subject: [PATCH v11 5/8] mm: rust: add mmput_async support From: Alice Ryhl To: Miguel Ojeda , Matthew Wilcox , Lorenzo Stoakes , Vlastimil Babka , John Hubbard , "Liam R. 
Howlett" , Andrew Morton , Greg Kroah-Hartman , Arnd Bergmann , Christian Brauner , Jann Horn , Suren Baghdasaryan Cc: Alex Gaynor , Boqun Feng , Gary Guo , "=?utf-8?q?Bj=C3=B6rn_Roy_Baron?=" , Benno Lossin , Andreas Hindborg , Trevor Gross , linux-kernel@vger.kernel.org, linux-mm@kvack.org, rust-for-linux@vger.kernel.org, Alice Ryhl Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Adds an MmWithUserAsync type that uses mmput_async when dropped but is otherwise identical to MmWithUser. This has to be done using a separate type because the thing we are changing is the destructor. Rust Binder needs this to avoid a certain deadlock. See commit 9a9ab0d96362 ("binder: fix race between mmput() and do_exit()") for details. It's also needed in the shrinker to avoid cleaning up the mm in the shrinker's context. Acked-by: Lorenzo Stoakes (for mm bits) Signed-off-by: Alice Ryhl Reviewed-by: Andreas Hindborg --- rust/kernel/mm.rs | 49 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs index 425b73a9dfe6..50f4861ae4b9 100644 --- a/rust/kernel/mm.rs +++ b/rust/kernel/mm.rs @@ -98,6 +98,48 @@ fn deref(&self) -> &Mm { } } =20 +/// A wrapper for the kernel's `struct mm_struct`. +/// +/// This type is identical to `MmWithUser` except that it uses `mmput_asyn= c` when dropping a +/// refcount. This means that the destructor of `ARef<MmWithUserAsync>` is= safe to call in atomic +/// context. +/// +/// # Invariants +/// +/// Values of this type are always refcounted using `mmget`. The value of = `mm_users` is non-zero. +#[repr(transparent)] +pub struct MmWithUserAsync { + mm: MmWithUser, +} + +// SAFETY: It is safe to call `mmput_async` on another thread than where `= mmget` was called. +unsafe impl Send for MmWithUserAsync {} +// SAFETY: All methods on `MmWithUserAsync` can be called in parallel from= several threads.
+unsafe impl Sync for MmWithUserAsync {} + +// SAFETY: By the type invariants, this type is always refcounted. +unsafe impl AlwaysRefCounted for MmWithUserAsync { + fn inc_ref(&self) { + // SAFETY: The pointer is valid since self is a reference. + unsafe { bindings::mmget(self.as_raw()) }; + } + + unsafe fn dec_ref(obj: NonNull<Self>) { + // SAFETY: The caller is giving up their refcount. + unsafe { bindings::mmput_async(obj.cast().as_ptr()) }; + } +} + +// Make all `MmWithUser` methods available on `MmWithUserAsync`. +impl Deref for MmWithUserAsync { + type Target =3D MmWithUser; + + #[inline] + fn deref(&self) -> &MmWithUser { + &self.mm + } +} + // These methods are safe to call even if `mm_users` is zero. impl Mm { /// Call `mmgrab` on `current.mm`. @@ -171,6 +213,13 @@ pub unsafe fn from_raw<'a>(ptr: *const bindings::mm_st= ruct) -> &'a MmWithUser { unsafe { &*ptr.cast() } } =20 + /// Use `mmput_async` when dropping this refcount. + #[inline] + pub fn into_mmput_async(me: ARef<MmWithUser>) -> ARef<MmWithUserAsync>= { + // SAFETY: The layouts and invariants are compatible. + unsafe { ARef::from_raw(ARef::into_raw(me).cast()) } + } + /// Attempt to access a vma using the vma read lock. /// /// This is an optimistic trylock operation, so it may fail if there i= s contention. In that
In that --=20 2.47.1.613.gc27f4b7a9f-goog From nobody Sat Dec 28 10:25:33 2024 Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0027E2336BD for ; Wed, 11 Dec 2024 10:37:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913470; cv=none; b=ru+ykOWEyDPQo4eo6eRlvFAYgKM+zy9Jc1fd9iHvKyZqq12IfJfajPKbyvqggk8Ze+VGsYiAah+Hel/CVQD16DuH2HKixk6h7QHHHGdmua382U3RQW8tBrIJci6rz6xGgt9DhocseX0ECkfHZ7pfoVzhMylUs0cGopn3jEHNnNE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913470; c=relaxed/simple; bh=seaPzXRr+ND8q/GQYnUjLUdeX+eDlOtmUd9qc8R35Uw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=NU5jdji9AtmiUpkRTgjvMgdZjaBLUSycsA1YB4HBpb/bCSqFgKZtbozt28Adp/6R1OWhkE+Xifk1rKAdlOCTvNDzjPmcnocdexOr9YdbqdJwxCPZe+CQcWzQWcMpgoxw30glhIl500NLqzLlypLAeveXawS835O0PlxpfVTQHVI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=r4zwL1Sl; arc=none smtp.client-ip=209.85.128.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="r4zwL1Sl" Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-43619b135bcso2246185e9.1 for ; Wed, 11 Dec 2024 02:37:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; 
s=20230601; t=1733913465; x=1734518265; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=H9J7APDOlzo7bms4ez8DHMnNHWyqhL35jyR1kYpr9QI=; b=r4zwL1SlABpgTLct/fbhgNey/m5H+/wHBmv4fAoNu761+s6Bb5bMSvehYbvoBpUa8X rjP88O0OohFmW8QThTlXG92uc/A+iJeBhXBH3vX7uljjchcaPXTIkOXbJHQMFrqWjGg5 aEFCRC5ji5wzjtvR3Lk1I4v8WJB6DKOJZ2UVv1zBmx5ekIXHSqXN3MWw2VB6/FDzqUQP ULkhd+lETKksuttnkYln970qzJ1akFKquDIujrPjKt39X7t6sF/6uLoKkGdk+t/wKZtA vgGBR7pS2SqL/J9Q+bEW1aT7W8ZxZIFBgWhYZTgHTogWH+isnc1FLQWwCZnGRsEg9T5i FDWA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733913465; x=1734518265; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=H9J7APDOlzo7bms4ez8DHMnNHWyqhL35jyR1kYpr9QI=; b=kbGtdhLRcx3zbr2FsuRKf4Du1gP9gbzF5cdS9TRNXyXzdZATLo18YmKwJgEiNQjVvP zEO95RA1rkUj0A5wNl/JRBYASjZWv8xuigkC2HUDYIiIVywS+SLI8pQZE6bjKRDh76Xr +aXOgpNmCwPVfuexLfW0k9nq9Dvt9vk8GYqmsiS6XX9E7kVpTjJg5L+U/J7hTbPxLDW3 MI1Vpy37S4vNTtunS6Z9uY38rokEQ4jQl4DgmAJlw2/sMKakhAwwzWOu98XPl3E7Gp/1 Hn7+FkunmSw5McuIh2Ftra24LdD7wh/GwJzwfNTAJZUjU5LBtxKA0AJ3/HU+20r5cXHw dbJw== X-Forwarded-Encrypted: i=1; AJvYcCWfzIzQAt5dOAp4l4uKV/uBqdb2bGYyEaev7VVQ9+XLzxxzq+KUzGDVf5VR3shUTYi4c8tMGW4vGLbzSFs=@vger.kernel.org X-Gm-Message-State: AOJu0YzBR8sPISkUBFa2OWQconWoFbHdBEsUxenTGTWYXqvtdh1djgcp JGIY9DCutjnzF7W4PZEH2MdYD7xkVjrM/U5RagGKVoLbxdHDDV5Yh7lx86tAwtWJOaSfthFN+/z fd1yvFbMyhkvHQg== X-Google-Smtp-Source: AGHT+IEHXGYW3IxnBDufE3uW2/yO/pfGqLpiPsVXzHTtpypegQgEznYG6piRm0MwSTPxkbuG6wjEylcUkB+RemI= X-Received: from wmlu15.prod.google.com ([2002:a05:600c:210f:b0:434:f0d4:cbaf]) (user=aliceryhl job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:3b94:b0:434:f0df:9f6 with SMTP id 5b1f17b1804b1-4361c346814mr18704535e9.3.1733913465503; Wed, 11 Dec 2024 02:37:45 -0800 (PST) Date: Wed, 11 Dec 2024 10:37:10 +0000 In-Reply-To: 
<20241211-vma-v11-0-466640428fc3@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241211-vma-v11-0-466640428fc3@google.com> X-Developer-Key: i=aliceryhl@google.com; a=openpgp; fpr=49F6C1FAA74960F43A5B86A1EE7A392FDE96209F X-Developer-Signature: v=1; a=openpgp-sha256; l=8235; i=aliceryhl@google.com; h=from:subject:message-id; bh=seaPzXRr+ND8q/GQYnUjLUdeX+eDlOtmUd9qc8R35Uw=; b=owEBbQKS/ZANAwAKAQRYvu5YxjlGAcsmYgBnWWtph/aiDjcWSnagWFqg3c8CayEabkssTHofo IIZe48WPGWJAjMEAAEKAB0WIQSDkqKUTWQHCvFIvbIEWL7uWMY5RgUCZ1lraQAKCRAEWL7uWMY5 RhAmD/48pALXpz/NRk7/Fwz/9MCuo0u66DfT05kQmm6LINIciACvTECWnSEQ0tqAKRNpsp/3Fex M4Jpu48j5eDcPWEe/l1C35rsiOmHX4W4JVrnnbQZp+/CjgyqyKCcGdDeeAsbF9Fup1VX8msId3D /03PP6NhmDZYipxZdlby2oUlmR34VgBi5vhQUaLndkfR7m4sEFY9dttHnlypa8gl+V0gKxmY79U 9JhN1ZFWyvUIYXm8lClnAsbDLy08fgFrDbEUeFzSMhVR1kTYVdrrhv5dekv/RSP83A632PXlet0 PYeKGYna7WKapvDmtx8K7d0S/9kzlS2ohG7IDbWe5ZGerwfda1ceER8EtUwLQcpx6Lodt9LYOWU 0WrCa6ARlGG4I/YVU2U6dzSxkYI3Kozqnl6EzH58UkK9BXMGdSGAL+HX19s8k94rmcbVe5P7PHi W+05YwL+uuVwdmWJJ3JWrwvQTW1iF2nFdQXtgZ2iGluLiOIQYSiG9J8og/1HIcZ0hzT7VTzDTL7 8A3/CIHP3KV5PwM6wFMQXsbOnuL1jQF2ZQmrxOfLskPlDXn3meCorLLNaO9pkEG5rjCOyNHwD6p 7r6CLbjSBFYOiYh8+hJVc36KbYuQafNb+ARWM1sWBLGLoe0W8HWuM103Eetus0orFCyJ79zlJFI VIEfxrJdZ6KH4lQ== X-Mailer: b4 0.13.0 Message-ID: <20241211-vma-v11-6-466640428fc3@google.com> Subject: [PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap() From: Alice Ryhl To: Miguel Ojeda , Matthew Wilcox , Lorenzo Stoakes , Vlastimil Babka , John Hubbard , "Liam R. 
Howlett" , Andrew Morton , Greg Kroah-Hartman , Arnd Bergmann , Christian Brauner , Jann Horn , Suren Baghdasaryan Cc: Alex Gaynor , Boqun Feng , Gary Guo , "=?utf-8?q?Bj=C3=B6rn_Roy_Baron?=" , Benno Lossin , Andreas Hindborg , Trevor Gross , linux-kernel@vger.kernel.org, linux-mm@kvack.org, rust-for-linux@vger.kernel.org, Alice Ryhl Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable This type will be used when setting up a new vma in an f_ops->mmap() hook. Using a separate type from VmAreaRef allows us to have a separate set of operations that you are only able to use during the mmap() hook. For example, the VM_MIXEDMAP flag must not be changed after the initial setup that happens during the f_ops->mmap() hook. To avoid setting invalid flag values, the methods for clearing VM_MAYWRITE and similar involve a check of VM_WRITE, and return an error if VM_WRITE is set. Trying to use `try_clear_maywrite` without checking the return value results in a compiler warning (a build error when warnings are denied) because the `Result` type is marked #[must_use]. For now, there's only a method for VM_MIXEDMAP and not VM_PFNMAP. When we add a VM_PFNMAP method, we will need some way to prevent you from setting both VM_MIXEDMAP and VM_PFNMAP on the same vma. Acked-by: Lorenzo Stoakes (for mm bits) Reviewed-by: Jann Horn Signed-off-by: Alice Ryhl --- rust/kernel/mm/virt.rs | 181 +++++++++++++++++++++++++++++++++++++++++++++= +++- 1 file changed, 180 insertions(+), 1 deletion(-) diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs index 3a23854e14f4..6d9ba56d4f95 100644 --- a/rust/kernel/mm/virt.rs +++ b/rust/kernel/mm/virt.rs @@ -6,7 +6,7 @@ =20 use crate::{ bindings, - error::{to_result, Result}, + error::{code::EINVAL, to_result, Result}, mm::MmWithUser, page::Page, types::Opaque, @@ -171,6 +171,185 @@ pub fn vm_insert_page(&self, address: usize, page: &P= age) -> Result { } } =20 +/// A builder for setting up a vma in an `f_ops->mmap()` hook.
+/// +/// # Invariants +/// +/// For the duration of 'a, the referenced vma must be undergoing initiali= zation in an +/// `f_ops->mmap()` hook. +pub struct VmAreaNew { + vma: VmAreaRef, +} + +// Make all `VmAreaRef` methods available on `VmAreaNew`. +impl Deref for VmAreaNew { + type Target =3D VmAreaRef; + + #[inline] + fn deref(&self) -> &VmAreaRef { + &self.vma + } +} + +impl VmAreaNew { + /// Access a virtual memory area given a raw pointer. + /// + /// # Safety + /// + /// Callers must ensure that `vma` is undergoing initial vma setup for= the duration of 'a. + #[inline] + pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'= a Self { + // SAFETY: The caller ensures that the invariants are satisfied fo= r the duration of 'a. + unsafe { &*vma.cast() } + } + + /// Internal method for updating the vma flags. + /// + /// # Safety + /// + /// This must not be used to set the flags to an invalid value. + #[inline] + unsafe fn update_flags(&self, set: vm_flags_t, unset: vm_flags_t) { + let mut flags =3D self.flags(); + flags |=3D set; + flags &=3D !unset; + + // SAFETY: This is not a data race: the vma is undergoing initial = setup, so it's not yet + // shared. Additionally, `VmAreaNew` is `!Sync`, so it cannot be u= sed to write in parallel. + // The caller promises that this does not set the flags to an inva= lid value. + unsafe { (*self.as_ptr()).__bindgen_anon_2.__vm_flags =3D flags }; + } + + /// Set the `VM_MIXEDMAP` flag on this vma. + /// + /// This enables the vma to contain both `struct page` and pure PFN pa= ges. Returns a reference + /// that can be used to call `vm_insert_page` on the vma. + #[inline] + pub fn set_mixedmap(&self) -> &VmAreaMixedMap { + // SAFETY: We don't yet provide a way to set VM_PFNMAP, so this ca= nnot put the flags in an + // invalid state. + unsafe { self.update_flags(flags::MIXEDMAP, 0) }; + + // SAFETY: We just set `VM_MIXEDMAP` on the vma. 
+ unsafe { VmAreaMixedMap::from_raw(self.vma.as_ptr()) } + } + + /// Set the `VM_IO` flag on this vma. + /// + /// This is used for memory mapped IO and similar. The flag tells othe= r parts of the kernel to + /// avoid looking at the pages. For memory mapped IO this is useful as= accesses to the pages + /// could have side effects. + #[inline] + pub fn set_io(&self) { + // SAFETY: Setting the VM_IO flag is always okay. + unsafe { self.update_flags(flags::IO, 0) }; + } + + /// Set the `VM_DONTEXPAND` flag on this vma. + /// + /// This prevents the vma from being expanded with `mremap()`. + #[inline] + pub fn set_dontexpand(&self) { + // SAFETY: Setting the VM_DONTEXPAND flag is always okay. + unsafe { self.update_flags(flags::DONTEXPAND, 0) }; + } + + /// Set the `VM_DONTCOPY` flag on this vma. + /// + /// This prevents the vma from being copied on fork. This option is on= ly permanent if `VM_IO` + /// is set. + #[inline] + pub fn set_dontcopy(&self) { + // SAFETY: Setting the VM_DONTCOPY flag is always okay. + unsafe { self.update_flags(flags::DONTCOPY, 0) }; + } + + /// Set the `VM_DONTDUMP` flag on this vma. + /// + /// This prevents the vma from being included in core dumps. This opti= on is only permanent if + /// `VM_IO` is set. + #[inline] + pub fn set_dontdump(&self) { + // SAFETY: Setting the VM_DONTDUMP flag is always okay. + unsafe { self.update_flags(flags::DONTDUMP, 0) }; + } + + /// Returns whether `VM_READ` is set. + /// + /// This flag indicates whether userspace is mapping this vma as reada= ble. + #[inline] + pub fn get_read(&self) -> bool { + (self.flags() & flags::READ) !=3D 0 + } + + /// Try to clear the `VM_MAYREAD` flag, failing if `VM_READ` is set. + /// + /// This flag indicates whether userspace is allowed to make this vma = readable with + /// `mprotect()`. + /// + /// Note that this operation is irreversible. Once `VM_MAYREAD` has be= en cleared, it can never + /// be set again. 
+ #[inline] + pub fn try_clear_mayread(&self) -> Result { + if self.get_read() { + return Err(EINVAL); + } + // SAFETY: Clearing `VM_MAYREAD` is okay when `VM_READ` is not set. + unsafe { self.update_flags(0, flags::MAYREAD) }; + Ok(()) + } + + /// Returns whether `VM_WRITE` is set. + /// + /// This flag indicates whether userspace is mapping this vma as writa= ble. + #[inline] + pub fn get_write(&self) -> bool { + (self.flags() & flags::WRITE) !=3D 0 + } + + /// Try to clear the `VM_MAYWRITE` flag, failing if `VM_WRITE` is set. + /// + /// This flag indicates whether userspace is allowed to make this vma = writable with + /// `mprotect()`. + /// + /// Note that this operation is irreversible. Once `VM_MAYWRITE` has b= een cleared, it can never + /// be set again. + #[inline] + pub fn try_clear_maywrite(&self) -> Result { + if self.get_write() { + return Err(EINVAL); + } + // SAFETY: Clearing `VM_MAYWRITE` is okay when `VM_WRITE` is not s= et. + unsafe { self.update_flags(0, flags::MAYWRITE) }; + Ok(()) + } + + /// Returns whether `VM_EXEC` is set. + /// + /// This flag indicates whether userspace is mapping this vma as execu= table. + #[inline] + pub fn get_exec(&self) -> bool { + (self.flags() & flags::EXEC) !=3D 0 + } + + /// Try to clear the `VM_MAYEXEC` flag, failing if `VM_EXEC` is set. + /// + /// This flag indicates whether userspace is allowed to make this vma = executable with + /// `mprotect()`. + /// + /// Note that this operation is irreversible. Once `VM_MAYEXEC` has be= en cleared, it can never + /// be set again. + #[inline] + pub fn try_clear_mayexec(&self) -> Result { + if self.get_exec() { + return Err(EINVAL); + } + // SAFETY: Clearing `VM_MAYEXEC` is okay when `VM_EXEC` is not set. + unsafe { self.update_flags(0, flags::MAYEXEC) }; + Ok(()) + } +} + /// The integer type used for vma flags. 
#[doc(inline)] pub use bindings::vm_flags_t; --=20 2.47.1.613.gc27f4b7a9f-goog From nobody Sat Dec 28 10:25:33 2024 Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 324822368F8 for ; Wed, 11 Dec 2024 10:37:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913470; cv=none; b=MOqrjDznxRtm03V3Fkfjnl4yLuWezop8CNfprqtOOje8s4NE0647PTpBvncAQnbjSpp0vLyg22uF/owUUow60GnfT6e9eEIubLUdaLX3u2DUBTCJ9vxdWRgQ9Pp/0uOEU9hF/bUSvyr0+srLtkgRj3qluQJ1FhyZZwzz7Y1THjM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913470; c=relaxed/simple; bh=kzjS+ohxpZzNcxKXIXjmOFsrjiQU+XTutyXr8P3Ah4o=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=bEgLjJocDf8rXAn74dxrsS9X+pE/zwki0ju/mTPz+cwIhxUtdvwSR5fHqomgcuicuLILooNFYWbzX8G2iUS7xm4dDZLAvNpfcVaPgQdzgPAqVGGhkVM1uuxjNiiuhWslGSTbna8w+wHp3/AL33ogy16mbA2CZTUZAo0NOsw7ypg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Snz4vEGv; arc=none smtp.client-ip=209.85.128.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Snz4vEGv" Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-434f1d39147so22998515e9.0 for ; Wed, 11 Dec 2024 02:37:48 -0800 (PST) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1733913467; x=1734518267; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=rTGvDMTG6tS84iSkKnwUvzTY8tCVI3+QbIdPlF0cheo=; b=Snz4vEGv5rozvAbPIcHfUxgexy7ZSGcFaME9OU+2/J56NmlxBR2LkSDr1bU5Qpu4Sr 0gSsKsXcORkTQqNiO+ZWnD3jGlKqrIRGFD8VwzoYsOKtMJhXeOwCeQ4AR04wSRQ/u4Gw sXZuLEUCnh49VQOA9+Tu5+tH9NumIVQWUuerlOiWqBwq/bnwBZOIMgSnoXJnFnPyYLdi DkxqySeTYPTPbO2/LgdeiQ9VFUGecylV2sWZZAA2LQIP3RLyV37LK//LTwMqX26l8UH0 ZWqzs1GWwNlXV6iSDvnadb0ZZkMwqWlkFJEqAc4xfvldUydZilE1jIohDe28Xa5yaMnx Oiwg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733913467; x=1734518267; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=rTGvDMTG6tS84iSkKnwUvzTY8tCVI3+QbIdPlF0cheo=; b=nnVTYRYqIOq4NJFh269WwFby9ifuU/EPEnEcXJzjE8E+HcnqmTcGorLh1I24IqkZlI DcXWq3Fe1cXc3pIWo/xKgkAa6KeEJ6/WtwebpjKL8bboiMjJseBADsH9O1FsmX4OhHg2 s3IPeq7DjEEICJW/D1WYorJc9U4M/cQVxNU3ZIx7qi+rlUmGDqNlXSacnPTLjkyUiRCO uy6kNL7R+B83jBsru0z9OY8OyhxzGSL+VROTqQPjInNc4VSasesCd/0qwZN2aZJUqdzm bxY07FWmw6J7gLNUSiqzV4LOfxJd28kbHOZfuUXcQoOoZNCLWRq9wK7iRsrX8i3mWJZo h/Fg== X-Forwarded-Encrypted: i=1; AJvYcCVTikLQoshjy6Vk2aBSnftNvZQ5cGLz0VrwF53IxDGroT18XbEJfNim60xlcud0U3LfKOsNWh4fDmxbn04=@vger.kernel.org X-Gm-Message-State: AOJu0Yx6/IbVFrDKutcH298NRV9Be6LSTjB3KCFUo8GjTcdonfWNsu7d 3Blj9gsUWq7PgeH9s0MRREXzPIfG4Pl+0o/+Wc0mL6q0lZWv1cgr6JM/rjhE6SIu79FCCqxVv8o bo7UzC0HwG7r0BA== X-Google-Smtp-Source: AGHT+IFbQovoxVXUDOnrp4PlHMXfzU8u47wUa/LBd5eLin9mhsNX8CNYJjENnzGGaQ/unC/QIRj6aAvvCl/6L5c= X-Received: from wmbf11.prod.google.com ([2002:a05:600c:594b:b0:434:a98d:6a1c]) (user=aliceryhl job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:83cf:b0:434:edcf:7464 with SMTP id 5b1f17b1804b1-4361c418431mr18973185e9.30.1733913467697; Wed, 11 Dec 2024 02:37:47 -0800 (PST) Date: Wed, 11 Dec 
2024 10:37:11 +0000 In-Reply-To: <20241211-vma-v11-0-466640428fc3@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241211-vma-v11-0-466640428fc3@google.com> X-Developer-Key: i=aliceryhl@google.com; a=openpgp; fpr=49F6C1FAA74960F43A5B86A1EE7A392FDE96209F X-Developer-Signature: v=1; a=openpgp-sha256; l=3113; i=aliceryhl@google.com; h=from:subject:message-id; bh=kzjS+ohxpZzNcxKXIXjmOFsrjiQU+XTutyXr8P3Ah4o=; b=owEBbQKS/ZANAwAKAQRYvu5YxjlGAcsmYgBnWWtphPlZQRQby61uOX0k3ecWUliVN/FZJFCoi bM4aIl7KVeJAjMEAAEKAB0WIQSDkqKUTWQHCvFIvbIEWL7uWMY5RgUCZ1lraQAKCRAEWL7uWMY5 RpGaD/wMi3M4it7Hk8rAcHr7ywdgfyToJ+g739ILqsktYqKVFQs/m+KIByJs+9mKBODgiyL10K5 PEZYsa0Ox8s8XKZPI69psKntnEhe3GjUUvu7px0f15cmt57bRZODDgo3vI59YfdTW5FZOESP9RD YhSPLC08wMnnoQBxpcm99TqMClCmmsWIlo4bXZfGvuzS2CJUipJ9sptfZzzrYVIKNO23klgaGPQ 8yZF6MBm59u1msMIvMt35PK1aBm/0EVWPzHCjgQnSqZTbQfrF0VdVZ5yjRfJiEzqf7NT36yIkz5 a89RzhZKbKMOqBVSMGJ7WHIpfsblZ/9wBgcrgDYLjQRb0FTgJQemp5aXCPWaXMNLS/rtMH/O0VQ eaYb9srV1MWgtGQOwbKFC8C0jIWhmSOiTdUabE96UYeBrzR4OkXqW1nPI/E6OaG6jx9o34JdWHQ nCIW7jvcdA2V19zfyaTQT+zOkrX6OJ3rfccVMAXlArE2lHiCUpKWFt7L7z1DlYoYj9PmjnUaTZn IIsFDF/0jEngVlAuF+Fw+EHPuOXm2I4uasrQubr3yOClmPU2OeM9hwv1p1GvOzU6QfgS0LZempt ng0nHj9CaNoclmMK2hvLhOPagmeyxUlGOVFA12epr2x5ASIL++FcelN0S6Uqpdlju+Q7hHbXi1C UQa8vdmTLIt2gug== X-Mailer: b4 0.13.0 Message-ID: <20241211-vma-v11-7-466640428fc3@google.com> Subject: [PATCH v11 7/8] rust: miscdevice: add mmap support From: Alice Ryhl To: Miguel Ojeda , Matthew Wilcox , Lorenzo Stoakes , Vlastimil Babka , John Hubbard , "Liam R. 
Howlett" , Andrew Morton , Greg Kroah-Hartman , Arnd Bergmann , Christian Brauner , Jann Horn , Suren Baghdasaryan Cc: Alex Gaynor , Boqun Feng , Gary Guo , "=?utf-8?q?Bj=C3=B6rn_Roy_Baron?=" , Benno Lossin , Andreas Hindborg , Trevor Gross , linux-kernel@vger.kernel.org, linux-mm@kvack.org, rust-for-linux@vger.kernel.org, Alice Ryhl Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Add the ability to write a file_operations->mmap hook in Rust when using the miscdevice abstraction. The `vma` argument to the `mmap` hook uses the `VmAreaNew` type from the previous commit; this type provides the correct set of operations for a file_operations->mmap hook. Acked-by: Lorenzo Stoakes (for mm bits) Signed-off-by: Alice Ryhl --- rust/kernel/miscdevice.rs | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) diff --git a/rust/kernel/miscdevice.rs b/rust/kernel/miscdevice.rs index 7e2a79b3ae26..e5366f9c6d7f 100644 --- a/rust/kernel/miscdevice.rs +++ b/rust/kernel/miscdevice.rs @@ -11,6 +11,8 @@ use crate::{ bindings, error::{to_result, Error, Result, VTABLE_DEFAULT_ERROR}, + fs::File, + mm::virt::VmAreaNew, prelude::*, str::CStr, types::{ForeignOwnable, Opaque}, @@ -110,6 +112,15 @@ fn release(device: Self::Ptr) { drop(device); } =20 + /// Handle for mmap. + fn mmap( + _device: ::Borrowed<'_>, + _file: &File, + _vma: &VmAreaNew, + ) -> Result { + kernel::build_error!(VTABLE_DEFAULT_ERROR) + } + /// Handler for ioctls. /// /// The `cmd` argument is usually manipulated using the utilties in [`= kernel::ioctl`]. 
@@ -156,6 +167,7 @@ impl VtableHelper { const VTABLE: bindings::file_operations =3D bindings::file_operati= ons { open: Some(fops_open::), release: Some(fops_release::), + mmap: maybe_fn(T::HAS_MMAP, fops_mmap::), unlocked_ioctl: maybe_fn(T::HAS_IOCTL, fops_ioctl::), #[cfg(CONFIG_COMPAT)] compat_ioctl: if T::HAS_COMPAT_IOCTL { @@ -216,6 +228,31 @@ impl VtableHelper { 0 } =20 +/// # Safety +/// +/// `file` must be a valid file that is associated with a `MiscDeviceRegis= tration`. +/// `vma` must be a vma that is currently being mmap'ed with this file. +unsafe extern "C" fn fops_mmap( + file: *mut bindings::file, + vma: *mut bindings::vm_area_struct, +) -> c_int { + // SAFETY: The mmap call of a file can access the private data. + let private =3D unsafe { (*file).private_data }; + // SAFETY: Mmap calls can borrow the private data of the file. + let device =3D unsafe { ::borrow(private) }; + // SAFETY: The caller provides a vma that is undergoing initial VMA se= tup. + let area =3D unsafe { VmAreaNew::from_raw(vma) }; + // SAFETY: + // * The file is valid for the duration of this call. + // * There is no active fdget_pos region on the file on this thread. + let file =3D unsafe { File::from_raw_file(file) }; + + match T::mmap(device, file, area) { + Ok(()) =3D> 0, + Err(err) =3D> err.to_errno() as c_int, + } +} + /// # Safety /// /// `file` must be a valid file that is associated with a `MiscDeviceRegis= tration`. 
--=20 2.47.1.613.gc27f4b7a9f-goog From nobody Sat Dec 28 10:25:33 2024 Received: from mail-wm1-f74.google.com (mail-wm1-f74.google.com [209.85.128.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 518DB23690D for ; Wed, 11 Dec 2024 10:37:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913474; cv=none; b=ivuxCfa94z98yhQcGBsgTcRgN20gQ6y5L9CMPdYINRg+JsuAqSBnTJeArCWZJ7MAde86Z3LzjiEPbQuM00k8ix54QgnoomJQ9iBIBXz3lqfDGlXgGyNM3ly76Hc42f8O+kd0w6ujsleMe+cpya/e+O1wBw/UbMs2sX4m/Yys7ck= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1733913474; c=relaxed/simple; bh=StMV4D93TpURmhDr1aRbRum370oMmECfCETgkAAobmQ=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=sQ36FEbHuq8+izPvFDv4PFJ6OHb9NqucevrLMblKjfm4XW+zejZ/LaSxqgJiO+fJ4yovmpUchJFf/qVlSNIunaZ9S2NtKTJWTTyPKk0K71D8qngxiJ4rWvouSRoIgJsRnRlHqPZ95bYwf0PnPt5J/22UTjS8BTcYsElmXlLA4Pk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=GYIlSO+B; arc=none smtp.client-ip=209.85.128.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--aliceryhl.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="GYIlSO+B" Received: by mail-wm1-f74.google.com with SMTP id 5b1f17b1804b1-434c214c05aso50349765e9.0 for ; Wed, 11 Dec 2024 02:37:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; 
s=20230601; t=1733913470; x=1734518270; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=COiqNiZZHGXDau8NrwsaWsQuxOeFpNqIFDuZo6Q7yxY=; b=GYIlSO+B97qIxGbyD2h2TCX3/LlrzHIdXrTCwfYd/+MW/1t3OG9eJJ2a9VntDCaWqX ln73/fBsQdeLVxFDuv6lN4+0BmArUVzzr4IYkB7BlXoQUgiO840/3dvCqNkokUX/8i49 O2L274RnhiKentLIhnCrhtytK5noSB0O6yE7LSz2KI8Gxw/puQQbOsbctip4smIrpBWB rMlpvRZXZ0/pVwrlQtX0phQxYNv/WipEz6EgysxOQbbrM4tfYg0F7ZFnbKS2Go//JHr8 un7EaUApEPQBzQg1FUGj3uE2V42NDGK11LALzf5xtKQUIIjFspOL4GSWS7Q4iQWvAk1R mx5A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1733913470; x=1734518270; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=COiqNiZZHGXDau8NrwsaWsQuxOeFpNqIFDuZo6Q7yxY=; b=M+fk1fH2cOrtcevIXeR60RFMVdq7en9qfR4nmiSKK8KRfB7NNyOP5w4031dzyaNzub niSC/6igk1CXRlP+vc/pkIIGl8mmPDQsQEtZbTTukS58vSPGSpIj4mLi5rMDduLTuk+d yYalrf+82SgfksF8JT+JZ/rF8iIseHvjItKp019edLEUZoAcgpiTQydjS3BoxLvYZfLt Dvifdn/vtwYpTGHJwxeYZ/K78c4WIu+Tzn+Ls1OFvsOIeD1Vi9D56SnvL0hgODSO/Qui IZsMPoKcJ7ivU6z3wqZrIjmKx3X5DOzj5dtM+oYN5IWWPE3M6n3R+vqW4mYnPq/CPoTe TCgA== X-Forwarded-Encrypted: i=1; AJvYcCU+d+Gj/41/618wIEcaitLs7dBctXsWjRKQRf4g7PWcDVgPVvEYWAtOGGI8+RnkBMtgoWaW6UGSFrheIt0=@vger.kernel.org X-Gm-Message-State: AOJu0Yz0aunZlLXCnAFEPDQv0AqCKlYcr/p1pF3V2xBXZFN1geNoJFx/ QTE6/3TeSU7NPIKFhIOC0GaC8p1fpnOz7XzvNuM0esSnV9LhPuuOF517BCuc0fkMP6HUfGZ5vEb SVSBSyvKvy+toCQ== X-Google-Smtp-Source: AGHT+IGJWksM6TqfuEAm6PR3f/upyLvC2qq3UeG2eYl7geK7S/29j1j2r63y2Gy0jB84JFF33EsS10vrIieEeLs= X-Received: from wmbju8.prod.google.com ([2002:a05:600c:56c8:b0:434:feb1:add1]) (user=aliceryhl job=prod-delivery.src-stubby-dispatcher) by 2002:a05:600c:1d20:b0:431:3bf9:3ebb with SMTP id 5b1f17b1804b1-4361c429dedmr15419925e9.24.1733913469827; Wed, 11 Dec 2024 02:37:49 -0800 (PST) Date: Wed, 11 Dec 2024 10:37:12 +0000 In-Reply-To: 
<20241211-vma-v11-0-466640428fc3@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20241211-vma-v11-0-466640428fc3@google.com> X-Developer-Key: i=aliceryhl@google.com; a=openpgp; fpr=49F6C1FAA74960F43A5B86A1EE7A392FDE96209F X-Developer-Signature: v=1; a=openpgp-sha256; l=22122; i=aliceryhl@google.com; h=from:subject:message-id; bh=StMV4D93TpURmhDr1aRbRum370oMmECfCETgkAAobmQ=; b=owEBbQKS/ZANAwAKAQRYvu5YxjlGAcsmYgBnWWtqlOomS0h8JmMDkoglQIvfsmDNU4Ge9Cyvz cp4qxKVmm6JAjMEAAEKAB0WIQSDkqKUTWQHCvFIvbIEWL7uWMY5RgUCZ1lragAKCRAEWL7uWMY5 Rtq+EACKcdjUdiMl3drn8z33X6oq2sVRiZ63h26GpBhdtT4Pu4XbJ5dsrLQpt4HBMohyfcNJBl2 glSq0DquhIXF07WfXfh+S7IfeQo1tt7RqvpzmWYCAMTkiriJIagcjydFT5WHBX2K3deRnH/0MJf UjOOVQ9c1TYAEDcZm89a/jbumC75IKTVNrxBdyAjUqP4xSrJkjcTQASEF2PIJ/TDHAyhaKa01d8 D8bBK040A5ugUvgSIHnS5hBCv/HHMcMlGJzNq8k+Ae//dPuNZQ63+HgHiB2kmn9HEAXPXZYZJPc cCL6So9vYr7FjhHwlbDKlpx9p+VEWVxStqbtm6i+464u9YIctLCnhvZNPVBnYy5CH3WWQiQaS0+ C1jQvLJbVAgSySJSpu4OEBlC1PUgzQK/HBKtH9KKwrLaLWBF+ZmTjEf0XQbV81DKQqjTBKvOWDZ XIYcVkJ4HRGiCu/CLWjPfVf2L21R+fTOYo/o44PVKF4PNCgjNXp4ZxY3qFxyE6RlSu9Th1omtBP DQ8F+MxWuFXuvZSfAat4vGvLP+Vi2aqKR8hTcbPUpcYXF0U8kXmvcv038Xpt6yU9una+eVhcbJl dpVbSc4WCKGTdvdmibZkmIJr9lAnu1NQpX02WegzXy1A6lh2lPZhJIXlxAdNg/d0IMCgIIFvDx4 /8ePkAu2/EzpLbQ== X-Mailer: b4 0.13.0 Message-ID: <20241211-vma-v11-8-466640428fc3@google.com> Subject: [PATCH v11 8/8] task: rust: rework how current is accessed From: Alice Ryhl To: Miguel Ojeda , Matthew Wilcox , Lorenzo Stoakes , Vlastimil Babka , John Hubbard , "Liam R. 
Howlett" , Andrew Morton , Greg Kroah-Hartman , Arnd Bergmann , Christian Brauner , Jann Horn , Suren Baghdasaryan Cc: Alex Gaynor , Boqun Feng , Gary Guo , "=?utf-8?q?Bj=C3=B6rn_Roy_Baron?=" , Benno Lossin , Andreas Hindborg , Trevor Gross , linux-kernel@vger.kernel.org, linux-mm@kvack.org, rust-for-linux@vger.kernel.org, Alice Ryhl Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Introduce a new type called `CurrentTask` that lets you perform various operations that are only safe on the `current` task. Use the new type to provide a way to access the current mm without incrementing its refcount. With this change, you can write stuff such as let vma =3D current!().mm().lock_vma_under_rcu(addr); without incrementing any refcounts. This replaces the existing abstractions for accessing the current pid namespace. With the old approach, every field access to current involves both a macro and an unsafe helper function. The new approach simplifies that to a single safe function on the `CurrentTask` type. This makes it less heavy-weight to add additional current accessors in the future. That said, creating a `CurrentTask` type like the one in this patch requires that we are careful to ensure that it cannot escape the current task or otherwise access things after they are freed. To do this, I declared that it cannot escape the current "task context" where I defined a "task context" as essentially the region in which `current` remains unchanged. So e.g., release_task() or begin_new_exec() would leave the task context. If a userspace thread returns to userspace and later makes another syscall, then I consider the two syscalls to be different task contexts. This allows values stored in that task to be modified between syscalls, even if they're guaranteed to be immutable during a syscall. Ensuring correctness of `CurrentTask` is slightly tricky if we also want the ability to have a safe `kthread_use_mm()` implementation in Rust.
To support that safely, there are two patterns we need to ensure are safe: // Case 1: current!() called inside the scope. let mm; kthread_use_mm(some_mm, || { mm =3D current!().mm(); }); drop(some_mm); mm.do_something(); // UAF and: // Case 2: current!() called before the scope. let mm; let task =3D current!(); kthread_use_mm(some_mm, || { mm =3D task.mm(); }); drop(some_mm); mm.do_something(); // UAF The existing `current!()` abstraction already natively prevents the first case: The `&CurrentTask` would be tied to the inner scope, so the borrow-checker ensures that no reference derived from it can escape the scope. Fixing the second case is a bit more tricky. The solution is to essentially pretend that the contents of the scope execute on a different thread, which means that only thread-safe types can cross the boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to move it to another thread will fail, and this includes our fake pretend thread boundary. This has the disadvantage that other types that aren't thread-safe for reasons unrelated to `current` also cannot be moved across the `kthread_use_mm()` boundary. I consider this an acceptable tradeoff. Cc: Christian Brauner Signed-off-by: Alice Ryhl --- rust/kernel/mm.rs | 22 ---- rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++------------------= ---- 2 files changed, 167 insertions(+), 139 deletions(-) diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs index 50f4861ae4b9..f7d1079391ef 100644 --- a/rust/kernel/mm.rs +++ b/rust/kernel/mm.rs @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser { =20 // These methods are safe to call even if `mm_users` is zero. impl Mm { - /// Call `mmgrab` on `current.mm`. - #[inline] - pub fn mmgrab_current() -> Option> { - // SAFETY: It's safe to get the `mm` field from current.
- let mm =3D unsafe { - let current =3D bindings::get_current(); - (*current).mm - }; - - if mm.is_null() { - return None; - } - - // SAFETY: The value of `current->mm` is guaranteed to be null or = a valid `mm_struct`. We - // just checked that it's not null. Furthermore, the returned `&Mm= ` is valid only for the - // duration of this function, and `current->mm` will stay valid fo= r that long. - let mm =3D unsafe { Mm::from_raw(mm) }; - - // This increments the refcount using `mmgrab`. - Some(ARef::from(mm)) - } - /// Returns a raw pointer to the inner `mm_struct`. #[inline] pub fn as_raw(&self) -> *mut bindings::mm_struct { diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs index 07bc22a7645c..8c1ee46c03eb 100644 --- a/rust/kernel/task.rs +++ b/rust/kernel/task.rs @@ -7,6 +7,7 @@ use crate::{ bindings, ffi::{c_int, c_long, c_uint}, + mm::MmWithUser, pid_namespace::PidNamespace, types::{ARef, NotThreadSafe, Opaque}, }; @@ -31,22 +32,20 @@ #[macro_export] macro_rules! current { () =3D> { - // SAFETY: Deref + addr-of below create a temporary `TaskRef` that= cannot outlive the - // caller. + // SAFETY: This expression creates a temporary value that is dropp= ed at the end of the + // caller's scope. The following mechanisms ensure that the result= ing `&CurrentTask` cannot + // leave current task context: + // + // * To return to userspace, the caller must leave the current sco= pe. + // * Operations such as `begin_new_exec()` are necessarily unsafe = and the caller of + // `begin_new_exec()` is responsible for safety. + // * Rust abstractions for things such as a `kthread_use_mm()` sco= pe must require the + // closure to be `Send`, so the `NotThreadSafe` field of `Curren= tTask` ensures that the + // `&CurrentTask` cannot cross the scope in either direction. unsafe { &*$crate::task::Task::current() } }; } =20 -/// Returns the currently running task's pid namespace. -#[macro_export] -macro_rules! 
current_pid_ns { - () =3D> { - // SAFETY: Deref + addr-of below create a temporary `PidNamespaceR= ef` that cannot outlive - // the caller. - unsafe { &*$crate::task::Task::current_pid_ns() } - }; -} - /// Wraps the kernel's `struct task_struct`. /// /// # Invariants @@ -105,6 +104,44 @@ unsafe impl Send for Task {} // synchronised by C code (e.g., `signal_pending`). unsafe impl Sync for Task {} =20 +/// Represents the [`Task`] in the `current` global. +/// +/// This type exists to provide more efficient operations that are only va= lid on the current task. +/// For example, to retrieve the pid-namespace of a task, you must use rcu= protection unless it is +/// the current task. +/// +/// # Invariants +/// +/// Each value of this type must only be accessed from the task context it= was created within. +/// +/// Of course, every thread is in a different task context, but for the pu= rposes of this invariant, +/// these operations also permanently leave the task context: +/// +/// * Returning to userspace from system call context. +/// * Calling `release_task()`. +/// * Calling `begin_new_exec()` in a binary format loader. +/// +/// Other operations temporarily create a new sub-context: +/// +/// * Calling `kthread_use_mm()` creates a new context, and `kthread_unuse= _mm()` returns to the +/// old context. +/// +/// This means that a `CurrentTask` obtained before a `kthread_use_mm()` c= all may be used again +/// once `kthread_unuse_mm()` is called, but it must not be used between t= hese two calls. +/// Conversely, a `CurrentTask` obtained between a `kthread_use_mm()`/`kth= read_unuse_mm()` pair +/// must not be used after `kthread_unuse_mm()`. +#[repr(transparent)] +pub struct CurrentTask(Task, NotThreadSafe); + +// Make all `Task` methods available on `CurrentTask`. +impl Deref for CurrentTask { + type Target =3D Task; + #[inline] + fn deref(&self) -> &Task { + &self.0 + } +} + /// The type of process identifiers (PIDs). 
type Pid =3D bindings::pid_t; =20 @@ -131,119 +168,29 @@ pub fn current_raw() -> *mut bindings::task_struct { /// /// # Safety /// - /// Callers must ensure that the returned object doesn't outlive the c= urrent task/thread. - pub unsafe fn current() -> impl Deref { - struct TaskRef<'a> { - task: &'a Task, - _not_send: NotThreadSafe, + /// Callers must ensure that the returned object is only used to acces= s a [`CurrentTask`] + /// within the task context that was active when this function was cal= led. For more details, + /// see the invariants section for [`CurrentTask`]. + pub unsafe fn current() -> impl Deref { + struct TaskRef { + task: *const CurrentTask, } =20 - impl Deref for TaskRef<'_> { - type Target =3D Task; + impl Deref for TaskRef { + type Target =3D CurrentTask; =20 fn deref(&self) -> &Self::Target { - self.task + // SAFETY: The returned reference borrows from this `TaskR= ef`, so it cannot outlive + // the `TaskRef`, which the caller of `Task::current()` ha= s promised will not + // outlive the task/thread for which `self.task` is the `c= urrent` pointer. Thus, it + // is okay to return a `CurrentTask` reference here. + unsafe { &*self.task } } } =20 - let current =3D Task::current_raw(); TaskRef { - // SAFETY: If the current thread is still running, the current= task is valid. Given - // that `TaskRef` is not `Send`, we know it cannot be transfer= red to another thread - // (where it could potentially outlive the caller). - task: unsafe { &*current.cast() }, - _not_send: NotThreadSafe, - } - } - - /// Returns a PidNamespace reference for the currently executing task'= s/thread's pid namespace. - /// - /// This function can be used to create an unbounded lifetime by e.g.,= storing the returned - /// PidNamespace in a global variable which would be a bug. So the rec= ommended way to get the - /// current task's/thread's pid namespace is to use the [`current_pid_= ns`] macro because it is - /// safe. 
- /// - /// # Safety - /// - /// Callers must ensure that the returned object doesn't outlive the c= urrent task/thread. - pub unsafe fn current_pid_ns() -> impl Deref { - struct PidNamespaceRef<'a> { - task: &'a PidNamespace, - _not_send: NotThreadSafe, - } - - impl Deref for PidNamespaceRef<'_> { - type Target =3D PidNamespace; - - fn deref(&self) -> &Self::Target { - self.task - } - } - - // The lifetime of `PidNamespace` is bound to `Task` and `struct p= id`. - // - // The `PidNamespace` of a `Task` doesn't ever change once the `Ta= sk` is alive. A - // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)= ` will not have an effect - // on the calling `Task`'s pid namespace. It will only effect the = pid namespace of children - // created by the calling `Task`. This invariant guarantees that a= fter having acquired a - // reference to a `Task`'s pid namespace it will remain unchanged. - // - // When a task has exited and been reaped `release_task()` will be= called. This will set - // the `PidNamespace` of the task to `NULL`. So retrieving the `Pi= dNamespace` of a task - // that is dead will return `NULL`. Note, that neither holding the= RCU lock nor holding a - // referencing count to - // the `Task` will prevent `release_task()` being called. - // - // In order to retrieve the `PidNamespace` of a `Task` the `task_a= ctive_pid_ns()` function - // can be used. There are two cases to consider: - // - // (1) retrieving the `PidNamespace` of the `current` task - // (2) retrieving the `PidNamespace` of a non-`current` task - // - // From system call context retrieving the `PidNamespace` for case= (1) is always safe and - // requires neither RCU locking nor a reference count to be held. = Retrieving the - // `PidNamespace` after `release_task()` for current will return `= NULL` but no codepath - // like that is exposed to Rust. - // - // Retrieving the `PidNamespace` from system call context for (2) = requires RCU protection. 
- // Accessing `PidNamespace` outside of RCU protection requires a r= eference count that - // must've been acquired while holding the RCU lock. Note that acc= essing a non-`current` - // task means `NULL` can be returned as the non-`current` task cou= ld have already passed - // through `release_task()`. - // - // To retrieve (1) the `current_pid_ns!()` macro should be used wh= ich ensure that the - // returned `PidNamespace` cannot outlive the calling scope. The a= ssociated - // `current_pid_ns()` function should not be called directly as it= could be abused to - // created an unbounded lifetime for `PidNamespace`. The `current_= pid_ns!()` macro allows - // Rust to handle the common case of accessing `current`'s `PidNam= espace` without RCU - // protection and without having to acquire a reference count. - // - // For (2) the `task_get_pid_ns()` method must be used. This will = always acquire a - // reference on `PidNamespace` and will return an `Option` to forc= e the caller to - // explicitly handle the case where `PidNamespace` is `None`, some= thing that tends to be - // forgotten when doing the equivalent operation in `C`. Missing R= CU primitives make it - // difficult to perform operations that are otherwise safe without= holding a reference - // count as long as RCU protection is guaranteed. But it is not im= portant currently. But we - // do want it in the future. - // - // Note for (2) the required RCU protection around calling `task_a= ctive_pid_ns()` - // synchronizes against putting the last reference of the associat= ed `struct pid` of - // `task->thread_pid`. The `struct pid` stored in that field is us= ed to retrieve the - // `PidNamespace` of the caller. When `release_task()` is called `= task->thread_pid` will be - // `NULL`ed and `put_pid()` on said `struct pid` will be delayed i= n `free_pid()` via - // `call_rcu()` allowing everyone with an RCU protected access to = the `struct pid` acquired - // from `task->thread_pid` to finish. 
- // - // SAFETY: The current task's pid namespace is valid as long as th= e current task is running. - let pidns =3D unsafe { bindings::task_active_pid_ns(Task::current_= raw()) }; - PidNamespaceRef { - // SAFETY: If the current thread is still running, the current= task and its associated - // pid namespace are valid. `PidNamespaceRef` is not `Send`, s= o we know it cannot be - // transferred to another thread (where it could potentially o= utlive the current - // `Task`). The caller needs to ensure that the PidNamespaceRe= f doesn't outlive the - // current task/thread. - task: unsafe { PidNamespace::from_ptr(pidns) }, - _not_send: NotThreadSafe, + // CAST: The layout of `struct task_struct` and `CurrentTask` = is identical. + task: Task::current_raw().cast(), } } =20 @@ -326,6 +273,109 @@ pub fn wake_up(&self) { } } =20 +impl CurrentTask { + /// Access the address space of the current task. + /// + /// This function does not touch the refcount of the mm. + #[inline] + pub fn mm(&self) -> Option<&MmWithUser> { + // SAFETY: The `mm` field of `current` is not modified from other = threads, so reading it is + // not a data race. + let mm =3D unsafe { (*self.as_ptr()).mm }; + + if mm.is_null() { + return None; + } + + // SAFETY: If `current->mm` is non-null, then it references a vali= d mm with a non-zero + // value of `mm_users`. Furthermore, the returned `&MmWithUser` bo= rrows from this + // `CurrentTask`, so it cannot escape the scope in which the curre= nt pointer was obtained. + // + // This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` ar= e used. There are two + // relevant cases: + // * If the `&CurrentTask` was created before `kthread_use_mm()`, = then it cannot be + // accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` s= cope due to the + // `NotThreadSafe` field of `CurrentTask`. 
+        // * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
+        //   scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
+        //   also cannot escape that scope.
+        // In either case, it's not possible to read `current->mm` and keep using it after the
+        // scope is ended with `kthread_unuse_mm()`.
+        Some(unsafe { MmWithUser::from_raw(mm) })
+    }
+
+    /// Access the pid namespace of the current task.
+    ///
+    /// This function does not touch the refcount of the namespace or use RCU protection.
+    #[doc(alias = "task_active_pid_ns")]
+    #[inline]
+    pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
+        // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
+        // on the current task.
+        let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
+
+        if active_ns.is_null() {
+            return None;
+        }
+
+        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
+        //
+        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
+        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
+        // on the calling `Task`'s pid namespace. It will only affect the pid namespace of children
+        // created by the calling `Task`. This invariant guarantees that after having acquired a
+        // reference to a `Task`'s pid namespace it will remain unchanged.
+        //
+        // When a task has exited and been reaped `release_task()` will be called. This will set
+        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
+        // that is dead will return `NULL`. Note that neither holding the RCU lock nor holding a
+        // reference count to the `Task` will prevent `release_task()` being called.
+        //
+        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
+        // can be used. There are two cases to consider:
+        //
+        // (1) retrieving the `PidNamespace` of the `current` task
+        // (2) retrieving the `PidNamespace` of a non-`current` task
+        //
+        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
+        // requires neither RCU locking nor a reference count to be held. Retrieving the
+        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
+        // like that is exposed to Rust.
+        //
+        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
+        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
+        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
+        // task means `NULL` can be returned as the non-`current` task could have already passed
+        // through `release_task()`.
+        //
+        // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
+        // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
+        // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
+        // without RCU protection and without having to acquire a reference count.
+        //
+        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
+        // reference on `PidNamespace` and will return an `Option` to force the caller to
+        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
+        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
+        // difficult to perform operations that are otherwise safe without holding a reference
+        // count as long as RCU protection is guaranteed. But it is not important currently. But we
+        // do want it in the future.
+        //
+        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
+        // synchronizes against putting the last reference of the associated `struct pid` of
+        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
+        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
+        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
+        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
+        // from `task->thread_pid` to finish.
+        //
+        // SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
+        // Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
+        // escape the scope in which the current pointer was obtained.
+        Some(unsafe { PidNamespace::from_ptr(active_ns) })
+    }
+}
+
 // SAFETY: The type invariants guarantee that `Task` is always refcounted.
 unsafe impl crate::types::AlwaysRefCounted for Task {
     fn inc_ref(&self) {
-- 
2.47.1.613.gc27f4b7a9f-goog
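
The borrow-scoping argument in the SAFETY comments of `CurrentTask::active_pid_ns` can be illustrated outside the kernel with a minimal userspace sketch. Everything below is an assumption made for illustration, not part of this patch: `RawTask`, the `level` field, `from_raw` and `demo` are hypothetical stand-ins, and `PhantomData<*mut ()>` only approximates what `NotThreadSafe` does in the kernel crate. The sketch shows that a reference handed out by a `&self` method is tied to the borrow of the `CurrentTask`-like handle, and that a null pointer surfaces as `None` rather than being dereferenced.

```rust
use std::marker::PhantomData;
use std::ptr;

/// Hypothetical stand-in for the kernel's `struct pid_namespace`.
pub struct PidNamespace {
    pub level: u32,
}

/// Hypothetical stand-in for `struct task_struct`. The pointer may be null,
/// which models the state after `release_task()`.
pub struct RawTask {
    pub active_ns: *const PidNamespace,
}

/// Userspace analogue of `CurrentTask`. The raw-pointer `PhantomData` makes
/// the type `!Send` and `!Sync`, approximating `NotThreadSafe`.
pub struct CurrentTask {
    raw: *const RawTask,
    _not_send: PhantomData<*mut ()>,
}

impl CurrentTask {
    /// # Safety
    /// `raw`, and any namespace it points to, must stay valid for as long as
    /// the returned handle exists.
    pub unsafe fn from_raw(raw: *const RawTask) -> Self {
        CurrentTask { raw, _not_send: PhantomData }
    }

    /// Borrow the namespace without touching any reference count. Lifetime
    /// elision ties the returned reference to the borrow of `self`.
    pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
        // SAFETY: `from_raw` requires `raw` to be valid while `self` exists.
        let ns = unsafe { (*self.raw).active_ns };
        if ns.is_null() {
            return None;
        }
        // SAFETY: `ns` is non-null and valid for as long as `self` exists.
        Some(unsafe { &*ns })
    }
}

/// Reads the namespace level through a live task and through a task whose
/// namespace pointer has been cleared.
pub fn demo() -> (Option<u32>, Option<u32>) {
    let ns = PidNamespace { level: 1 };
    let live = RawTask { active_ns: &ns };
    let reaped = RawTask { active_ns: ptr::null() };

    // SAFETY: `ns`, `live` and `reaped` outlive both handles.
    let (a, b) = unsafe { (CurrentTask::from_raw(&live), CurrentTask::from_raw(&reaped)) };

    (a.active_pid_ns().map(|n| n.level), b.active_pid_ns().map(|n| n.level))
}
```

Because the returned `&PidNamespace` borrows from the handle, the compiler rejects code that tries to keep the reference after the handle is dropped, which is the property the patch relies on instead of taking a reference count.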