[v11] Rust support for mm_struct, vm_area_struct, and mmap

[PATCH v11 8/8] task: rust: rework how current is accessed

Posted by Alice Ryhl 1 year, 1 month ago

Introduce a new type called `CurrentTask` that lets you perform various
operations that are only safe on the `current` task. Use the new type to
provide a way to access the current mm without incrementing its
refcount.

With this change, you can write stuff such as

	let vma = current!().mm().lock_vma_under_rcu(addr);

without incrementing any refcounts.

This replaces the existing abstractions for accessing the current pid
namespace. With the old approach, every field access to current involves
both a macro and a unsafe helper function. The new approach simplifies
that to a single safe function on the `CurrentTask` type. This makes it
less heavy-weight to add additional current accessors in the future.

That said, creating a `CurrentTask` type like the one in this patch
requires that we are careful to ensure that it cannot escape the current
task or otherwise access things after they are freed. To do this, I
declared that it cannot escape the current "task context" where I
defined a "task context" as essentially the region in which `current`
remains unchanged. So e.g., release_task() or begin_new_exec() would
leave the task context.

If a userspace thread returns to userspace and later makes another
syscall, then I consider the two syscalls to be different task contexts.
This allows values stored in that task to be modified between syscalls,
even if they're guaranteed to be immutable during a syscall.

Ensuring correctness of `CurrentTask` is slightly tricky if we also want
the ability to have a safe `kthread_use_mm()` implementation in Rust. To
support that safely, there are two patterns we need to ensure are safe:

	// Case 1: current!() called inside the scope.
	let mm;
	kthread_use_mm(some_mm, || {
	    mm = current!().mm();
	});
	drop(some_mm);
	mm.do_something(); // UAF

and:

	// Case 2: current!() called before the scope.
	let mm;
	let task = current!();
	kthread_use_mm(some_mm, || {
	    mm = task.mm();
	});
	drop(some_mm);
	mm.do_something(); // UAF

The existing `current!()` abstraction already natively prevents the
first case: The `&CurrentTask` would be tied to the inner scope, so the
borrow-checker ensures that no reference derived from it can escape the
scope.

Fixing the second case is a bit more tricky. The solution is to
essentially pretend that the contents of the scope execute on an
different thread, which means that only thread-safe types can cross the
boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to
move it to another thread will fail, and this includes our fake pretend
thread boundary.

This has the disadvantage that other types that aren't thread-safe for
reasons unrelated to `current` also cannot be moved across the
`kthread_use_mm()` boundary. I consider this an acceptable tradeoff.

Cc: Christian Brauner <brauner@kernel.org>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/kernel/mm.rs   |  22 ----
 rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++----------------------
 2 files changed, 167 insertions(+), 139 deletions(-)

diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 50f4861ae4b9..f7d1079391ef 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
 
 // These methods are safe to call even if `mm_users` is zero.
 impl Mm {
-    /// Call `mmgrab` on `current.mm`.
-    #[inline]
-    pub fn mmgrab_current() -> Option<ARef<Mm>> {
-        // SAFETY: It's safe to get the `mm` field from current.
-        let mm = unsafe {
-            let current = bindings::get_current();
-            (*current).mm
-        };
-
-        if mm.is_null() {
-            return None;
-        }
-
-        // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
-        // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
-        // duration of this function, and `current->mm` will stay valid for that long.
-        let mm = unsafe { Mm::from_raw(mm) };
-
-        // This increments the refcount using `mmgrab`.
-        Some(ARef::from(mm))
-    }
-
     /// Returns a raw pointer to the inner `mm_struct`.
     #[inline]
     pub fn as_raw(&self) -> *mut bindings::mm_struct {
diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
index 07bc22a7645c..8c1ee46c03eb 100644
--- a/rust/kernel/task.rs
+++ b/rust/kernel/task.rs
@@ -7,6 +7,7 @@
 use crate::{
     bindings,
     ffi::{c_int, c_long, c_uint},
+    mm::MmWithUser,
     pid_namespace::PidNamespace,
     types::{ARef, NotThreadSafe, Opaque},
 };
@@ -31,22 +32,20 @@
 #[macro_export]
 macro_rules! current {
     () => {
-        // SAFETY: Deref + addr-of below create a temporary `TaskRef` that cannot outlive the
-        // caller.
+        // SAFETY: This expression creates a temporary value that is dropped at the end of the
+        // caller's scope. The following mechanisms ensure that the resulting `&CurrentTask` cannot
+        // leave current task context:
+        //
+        // * To return to userspace, the caller must leave the current scope.
+        // * Operations such as `begin_new_exec()` are necessarily unsafe and the caller of
+        //   `begin_new_exec()` is responsible for safety.
+        // * Rust abstractions for things such as a `kthread_use_mm()` scope must require the
+        //   closure to be `Send`, so the `NotThreadSafe` field of `CurrentTask` ensures that the
+        //   `&CurrentTask` cannot cross the scope in either direction.
         unsafe { &*$crate::task::Task::current() }
     };
 }
 
-/// Returns the currently running task's pid namespace.
-#[macro_export]
-macro_rules! current_pid_ns {
-    () => {
-        // SAFETY: Deref + addr-of below create a temporary `PidNamespaceRef` that cannot outlive
-        // the caller.
-        unsafe { &*$crate::task::Task::current_pid_ns() }
-    };
-}
-
 /// Wraps the kernel's `struct task_struct`.
 ///
 /// # Invariants
@@ -105,6 +104,44 @@ unsafe impl Send for Task {}
 // synchronised by C code (e.g., `signal_pending`).
 unsafe impl Sync for Task {}
 
+/// Represents the [`Task`] in the `current` global.
+///
+/// This type exists to provide more efficient operations that are only valid on the current task.
+/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
+/// the current task.
+///
+/// # Invariants
+///
+/// Each value of this type must only be accessed from the task context it was created within.
+///
+/// Of course, every thread is in a different task context, but for the purposes of this invariant,
+/// these operations also permanently leave the task context:
+///
+/// * Returning to userspace from system call context.
+/// * Calling `release_task()`.
+/// * Calling `begin_new_exec()` in a binary format loader.
+///
+/// Other operations temporarily create a new sub-context:
+///
+/// * Calling `kthread_use_mm()` creates a new context, and `kthread_unuse_mm()` returns to the
+///   old context.
+///
+/// This means that a `CurrentTask` obtained before a `kthread_use_mm()` call may be used again
+/// once `kthread_unuse_mm()` is called, but it must not be used between these two calls.
+/// Conversely, a `CurrentTask` obtained between a `kthread_use_mm()`/`kthread_unuse_mm()` pair
+/// must not be used after `kthread_unuse_mm()`.
+#[repr(transparent)]
+pub struct CurrentTask(Task, NotThreadSafe);
+
+// Make all `Task` methods available on `CurrentTask`.
+impl Deref for CurrentTask {
+    type Target = Task;
+    #[inline]
+    fn deref(&self) -> &Task {
+        &self.0
+    }
+}
+
 /// The type of process identifiers (PIDs).
 type Pid = bindings::pid_t;
 
@@ -131,119 +168,29 @@ pub fn current_raw() -> *mut bindings::task_struct {
     ///
     /// # Safety
     ///
-    /// Callers must ensure that the returned object doesn't outlive the current task/thread.
-    pub unsafe fn current() -> impl Deref<Target = Task> {
-        struct TaskRef<'a> {
-            task: &'a Task,
-            _not_send: NotThreadSafe,
+    /// Callers must ensure that the returned object is only used to access a [`CurrentTask`]
+    /// within the task context that was active when this function was called. For more details,
+    /// see the invariants section for [`CurrentTask`].
+    pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
+        struct TaskRef {
+            task: *const CurrentTask,
         }
 
-        impl Deref for TaskRef<'_> {
-            type Target = Task;
+        impl Deref for TaskRef {
+            type Target = CurrentTask;
 
             fn deref(&self) -> &Self::Target {
-                self.task
+                // SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
+                // the `TaskRef`, which the caller of `Task::current()` has promised will not
+                // outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
+                // is okay to return a `CurrentTask` reference here.
+                unsafe { &*self.task }
             }
         }
 
-        let current = Task::current_raw();
         TaskRef {
-            // SAFETY: If the current thread is still running, the current task is valid. Given
-            // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
-            // (where it could potentially outlive the caller).
-            task: unsafe { &*current.cast() },
-            _not_send: NotThreadSafe,
-        }
-    }
-
-    /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
-    ///
-    /// This function can be used to create an unbounded lifetime by e.g., storing the returned
-    /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
-    /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
-    /// safe.
-    ///
-    /// # Safety
-    ///
-    /// Callers must ensure that the returned object doesn't outlive the current task/thread.
-    pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
-        struct PidNamespaceRef<'a> {
-            task: &'a PidNamespace,
-            _not_send: NotThreadSafe,
-        }
-
-        impl Deref for PidNamespaceRef<'_> {
-            type Target = PidNamespace;
-
-            fn deref(&self) -> &Self::Target {
-                self.task
-            }
-        }
-
-        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
-        //
-        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
-        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
-        // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
-        // created by the calling `Task`. This invariant guarantees that after having acquired a
-        // reference to a `Task`'s pid namespace it will remain unchanged.
-        //
-        // When a task has exited and been reaped `release_task()` will be called. This will set
-        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
-        // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
-        // referencing count to
-        // the `Task` will prevent `release_task()` being called.
-        //
-        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
-        // can be used. There are two cases to consider:
-        //
-        // (1) retrieving the `PidNamespace` of the `current` task
-        // (2) retrieving the `PidNamespace` of a non-`current` task
-        //
-        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
-        // requires neither RCU locking nor a reference count to be held. Retrieving the
-        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
-        // like that is exposed to Rust.
-        //
-        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
-        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
-        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
-        // task means `NULL` can be returned as the non-`current` task could have already passed
-        // through `release_task()`.
-        //
-        // To retrieve (1) the `current_pid_ns!()` macro should be used which ensure that the
-        // returned `PidNamespace` cannot outlive the calling scope. The associated
-        // `current_pid_ns()` function should not be called directly as it could be abused to
-        // created an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows
-        // Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU
-        // protection and without having to acquire a reference count.
-        //
-        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
-        // reference on `PidNamespace` and will return an `Option` to force the caller to
-        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
-        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
-        // difficult to perform operations that are otherwise safe without holding a reference
-        // count as long as RCU protection is guaranteed. But it is not important currently. But we
-        // do want it in the future.
-        //
-        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
-        // synchronizes against putting the last reference of the associated `struct pid` of
-        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
-        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
-        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
-        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
-        // from `task->thread_pid` to finish.
-        //
-        // SAFETY: The current task's pid namespace is valid as long as the current task is running.
-        let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
-        PidNamespaceRef {
-            // SAFETY: If the current thread is still running, the current task and its associated
-            // pid namespace are valid. `PidNamespaceRef` is not `Send`, so we know it cannot be
-            // transferred to another thread (where it could potentially outlive the current
-            // `Task`). The caller needs to ensure that the PidNamespaceRef doesn't outlive the
-            // current task/thread.
-            task: unsafe { PidNamespace::from_ptr(pidns) },
-            _not_send: NotThreadSafe,
+            // CAST: The layout of `struct task_struct` and `CurrentTask` is identical.
+            task: Task::current_raw().cast(),
         }
     }
 
@@ -326,6 +273,109 @@ pub fn wake_up(&self) {
     }
 }
 
+impl CurrentTask {
+    /// Access the address space of the current task.
+    ///
+    /// This function does not touch the refcount of the mm.
+    #[inline]
+    pub fn mm(&self) -> Option<&MmWithUser> {
+        // SAFETY: The `mm` field of `current` is not modified from other threads, so reading it is
+        // not a data race.
+        let mm = unsafe { (*self.as_ptr()).mm };
+
+        if mm.is_null() {
+            return None;
+        }
+
+        // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
+        // value of `mm_users`. Furthermore, the returned `&MmWithUser` borrows from this
+        // `CurrentTask`, so it cannot escape the scope in which the current pointer was obtained.
+        //
+        // This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` are used. There are two
+        // relevant cases:
+        // * If the `&CurrentTask` was created before `kthread_use_mm()`, then it cannot be
+        //   accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` scope due to the
+        //   `NotThreadSafe` field of `CurrentTask`.
+        // * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
+        //   scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
+        //   also cannot escape that scope.
+        // In either case, it's not possible to read `current->mm` and keep using it after the
+        // scope is ended with `kthread_unuse_mm()`.
+        Some(unsafe { MmWithUser::from_raw(mm) })
+    }
+
+    /// Access the pid namespace of the current task.
+    ///
+    /// This function does not touch the refcount of the namespace or use RCU protection.
+    #[doc(alias = "task_active_pid_ns")]
+    #[inline]
+    pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
+        // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
+        // on the current task.
+        let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
+
+        if active_ns.is_null() {
+            return None;
+        }
+
+        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
+        //
+        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
+        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
+        // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
+        // created by the calling `Task`. This invariant guarantees that after having acquired a
+        // reference to a `Task`'s pid namespace it will remain unchanged.
+        //
+        // When a task has exited and been reaped `release_task()` will be called. This will set
+        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
+        // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
+        // referencing count to the `Task` will prevent `release_task()` being called.
+        //
+        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
+        // can be used. There are two cases to consider:
+        //
+        // (1) retrieving the `PidNamespace` of the `current` task
+        // (2) retrieving the `PidNamespace` of a non-`current` task
+        //
+        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
+        // requires neither RCU locking nor a reference count to be held. Retrieving the
+        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
+        // like that is exposed to Rust.
+        //
+        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
+        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
+        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
+        // task means `NULL` can be returned as the non-`current` task could have already passed
+        // through `release_task()`.
+        //
+        // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
+        // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
+        // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
+        // without RCU protection and without having to acquire a reference count.
+        //
+        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
+        // reference on `PidNamespace` and will return an `Option` to force the caller to
+        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
+        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
+        // difficult to perform operations that are otherwise safe without holding a reference
+        // count as long as RCU protection is guaranteed. But it is not important currently. But we
+        // do want it in the future.
+        //
+        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
+        // synchronizes against putting the last reference of the associated `struct pid` of
+        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
+        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
+        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
+        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
+        // from `task->thread_pid` to finish.
+        //
+        // SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
+        // Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
+        // escape the scope in which the current pointer was obtained.
+        Some(unsafe { PidNamespace::from_ptr(active_ns) })
+    }
+}
+
 // SAFETY: The type invariants guarantee that `Task` is always refcounted.
 unsafe impl crate::types::AlwaysRefCounted for Task {
     fn inc_ref(&self) {

-- 
2.47.1.613.gc27f4b7a9f-goog

Re: [PATCH v11 8/8] task: rust: rework how current is accessed

Posted by Boqun Feng 1 year, 1 month ago

On Wed, Dec 11, 2024 at 10:37:12AM +0000, Alice Ryhl wrote:
> Introduce a new type called `CurrentTask` that lets you perform various
> operations that are only safe on the `current` task. Use the new type to
> provide a way to access the current mm without incrementing its
> refcount.
> 
> With this change, you can write stuff such as
> 
> 	let vma = current!().mm().lock_vma_under_rcu(addr);
> 
> without incrementing any refcounts.
> 
> This replaces the existing abstractions for accessing the current pid
> namespace. With the old approach, every field access to current involves
> both a macro and a unsafe helper function. The new approach simplifies
> that to a single safe function on the `CurrentTask` type. This makes it
> less heavy-weight to add additional current accessors in the future.
> 
> That said, creating a `CurrentTask` type like the one in this patch
> requires that we are careful to ensure that it cannot escape the current
> task or otherwise access things after they are freed. To do this, I
> declared that it cannot escape the current "task context" where I
> defined a "task context" as essentially the region in which `current`
> remains unchanged. So e.g., release_task() or begin_new_exec() would
> leave the task context.
> 
> If a userspace thread returns to userspace and later makes another
> syscall, then I consider the two syscalls to be different task contexts.
> This allows values stored in that task to be modified between syscalls,
> even if they're guaranteed to be immutable during a syscall.
> 
> Ensuring correctness of `CurrentTask` is slightly tricky if we also want
> the ability to have a safe `kthread_use_mm()` implementation in Rust. To
> support that safely, there are two patterns we need to ensure are safe:
> 
> 	// Case 1: current!() called inside the scope.
> 	let mm;
> 	kthread_use_mm(some_mm, || {
> 	    mm = current!().mm();
> 	});
> 	drop(some_mm);
> 	mm.do_something(); // UAF
> 
> and:
> 
> 	// Case 2: current!() called before the scope.
> 	let mm;
> 	let task = current!();
> 	kthread_use_mm(some_mm, || {
> 	    mm = task.mm();
> 	});
> 	drop(some_mm);
> 	mm.do_something(); // UAF
> 
> The existing `current!()` abstraction already natively prevents the
> first case: The `&CurrentTask` would be tied to the inner scope, so the
> borrow-checker ensures that no reference derived from it can escape the
> scope.
> 
> Fixing the second case is a bit more tricky. The solution is to
> essentially pretend that the contents of the scope execute on an
> different thread, which means that only thread-safe types can cross the
> boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to
> move it to another thread will fail, and this includes our fake pretend
> thread boundary.
> 
> This has the disadvantage that other types that aren't thread-safe for
> reasons unrelated to `current` also cannot be moved across the
> `kthread_use_mm()` boundary. I consider this an acceptable tradeoff.
> 
> Cc: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
>  rust/kernel/mm.rs   |  22 ----
>  rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++----------------------
>  2 files changed, 167 insertions(+), 139 deletions(-)
> 
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> index 50f4861ae4b9..f7d1079391ef 100644
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
>  
>  // These methods are safe to call even if `mm_users` is zero.
>  impl Mm {
> -    /// Call `mmgrab` on `current.mm`.
> -    #[inline]
> -    pub fn mmgrab_current() -> Option<ARef<Mm>> {
> -        // SAFETY: It's safe to get the `mm` field from current.
> -        let mm = unsafe {
> -            let current = bindings::get_current();
> -            (*current).mm
> -        };
> -
> -        if mm.is_null() {
> -            return None;
> -        }
> -
> -        // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> -        // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> -        // duration of this function, and `current->mm` will stay valid for that long.
> -        let mm = unsafe { Mm::from_raw(mm) };
> -
> -        // This increments the refcount using `mmgrab`.
> -        Some(ARef::from(mm))
> -    }
> -

This is removed because of no user? If so, maybe don't introduce this at
all in the earlier patch of this series? The rest looks good to me.

Regards,
Boqun

>      /// Returns a raw pointer to the inner `mm_struct`.
>      #[inline]
>      pub fn as_raw(&self) -> *mut bindings::mm_struct {
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 07bc22a7645c..8c1ee46c03eb 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -7,6 +7,7 @@
>  use crate::{
>      bindings,
>      ffi::{c_int, c_long, c_uint},
> +    mm::MmWithUser,
>      pid_namespace::PidNamespace,
>      types::{ARef, NotThreadSafe, Opaque},
>  };
> @@ -31,22 +32,20 @@
[...]

Re: [PATCH v11 8/8] task: rust: rework how current is accessed

Posted by Alice Ryhl 1 year ago

On Tue, Dec 17, 2024 at 12:40 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Wed, Dec 11, 2024 at 10:37:12AM +0000, Alice Ryhl wrote:
> > Introduce a new type called `CurrentTask` that lets you perform various
> > operations that are only safe on the `current` task. Use the new type to
> > provide a way to access the current mm without incrementing its
> > refcount.
> >
> > With this change, you can write stuff such as
> >
> >       let vma = current!().mm().lock_vma_under_rcu(addr);
> >
> > without incrementing any refcounts.
> >
> > This replaces the existing abstractions for accessing the current pid
> > namespace. With the old approach, every field access to current involves
> > both a macro and a unsafe helper function. The new approach simplifies
> > that to a single safe function on the `CurrentTask` type. This makes it
> > less heavy-weight to add additional current accessors in the future.
> >
> > That said, creating a `CurrentTask` type like the one in this patch
> > requires that we are careful to ensure that it cannot escape the current
> > task or otherwise access things after they are freed. To do this, I
> > declared that it cannot escape the current "task context" where I
> > defined a "task context" as essentially the region in which `current`
> > remains unchanged. So e.g., release_task() or begin_new_exec() would
> > leave the task context.
> >
> > If a userspace thread returns to userspace and later makes another
> > syscall, then I consider the two syscalls to be different task contexts.
> > This allows values stored in that task to be modified between syscalls,
> > even if they're guaranteed to be immutable during a syscall.
> >
> > Ensuring correctness of `CurrentTask` is slightly tricky if we also want
> > the ability to have a safe `kthread_use_mm()` implementation in Rust. To
> > support that safely, there are two patterns we need to ensure are safe:
> >
> >       // Case 1: current!() called inside the scope.
> >       let mm;
> >       kthread_use_mm(some_mm, || {
> >           mm = current!().mm();
> >       });
> >       drop(some_mm);
> >       mm.do_something(); // UAF
> >
> > and:
> >
> >       // Case 2: current!() called before the scope.
> >       let mm;
> >       let task = current!();
> >       kthread_use_mm(some_mm, || {
> >           mm = task.mm();
> >       });
> >       drop(some_mm);
> >       mm.do_something(); // UAF
> >
> > The existing `current!()` abstraction already natively prevents the
> > first case: The `&CurrentTask` would be tied to the inner scope, so the
> > borrow-checker ensures that no reference derived from it can escape the
> > scope.
> >
> > Fixing the second case is a bit more tricky. The solution is to
> > essentially pretend that the contents of the scope execute on an
> > different thread, which means that only thread-safe types can cross the
> > boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to
> > move it to another thread will fail, and this includes our fake pretend
> > thread boundary.
> >
> > This has the disadvantage that other types that aren't thread-safe for
> > reasons unrelated to `current` also cannot be moved across the
> > `kthread_use_mm()` boundary. I consider this an acceptable tradeoff.
> >
> > Cc: Christian Brauner <brauner@kernel.org>
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > ---
> >  rust/kernel/mm.rs   |  22 ----
> >  rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++----------------------
> >  2 files changed, 167 insertions(+), 139 deletions(-)
> >
> > diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> > index 50f4861ae4b9..f7d1079391ef 100644
> > --- a/rust/kernel/mm.rs
> > +++ b/rust/kernel/mm.rs
> > @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
> >
> >  // These methods are safe to call even if `mm_users` is zero.
> >  impl Mm {
> > -    /// Call `mmgrab` on `current.mm`.
> > -    #[inline]
> > -    pub fn mmgrab_current() -> Option<ARef<Mm>> {
> > -        // SAFETY: It's safe to get the `mm` field from current.
> > -        let mm = unsafe {
> > -            let current = bindings::get_current();
> > -            (*current).mm
> > -        };
> > -
> > -        if mm.is_null() {
> > -            return None;
> > -        }
> > -
> > -        // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> > -        // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> > -        // duration of this function, and `current->mm` will stay valid for that long.
> > -        let mm = unsafe { Mm::from_raw(mm) };
> > -
> > -        // This increments the refcount using `mmgrab`.
> > -        Some(ARef::from(mm))
> > -    }
> > -
>
> This is removed because of no user? If so, maybe don't introduce this at
> all in the earlier patch of this series? The rest looks good to me.

I guess I can drop the temporary introduction of this. It's here due
to the history of this series where originally it only had
mmgrab_current, and Binder would use that. But with this patch, you
can use CurrentTask::mm() + ARef::from() to do the same thing. For
Binder, the difference doesn't matter, but the latter is more powerful
as you can access the current task's mm_struct without incrementing
refcounts.

Alice

Re: [PATCH v11 8/8] task: rust: rework how current is accessed

Posted by Boqun Feng 1 year ago

On Mon, Jan 13, 2025 at 11:30:05AM +0100, Alice Ryhl wrote:
> On Tue, Dec 17, 2024 at 12:40 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Wed, Dec 11, 2024 at 10:37:12AM +0000, Alice Ryhl wrote:
> > > Introduce a new type called `CurrentTask` that lets you perform various
> > > operations that are only safe on the `current` task. Use the new type to
> > > provide a way to access the current mm without incrementing its
> > > refcount.
> > >
> > > With this change, you can write stuff such as
> > >
> > >       let vma = current!().mm().lock_vma_under_rcu(addr);
> > >
> > > without incrementing any refcounts.
> > >
> > > This replaces the existing abstractions for accessing the current pid
> > > namespace. With the old approach, every field access to current involves
> > > both a macro and a unsafe helper function. The new approach simplifies
> > > that to a single safe function on the `CurrentTask` type. This makes it
> > > less heavy-weight to add additional current accessors in the future.
> > >
> > > That said, creating a `CurrentTask` type like the one in this patch
> > > requires that we are careful to ensure that it cannot escape the current
> > > task or otherwise access things after they are freed. To do this, I
> > > declared that it cannot escape the current "task context" where I
> > > defined a "task context" as essentially the region in which `current`
> > > remains unchanged. So e.g., release_task() or begin_new_exec() would
> > > leave the task context.
> > >
> > > If a userspace thread returns to userspace and later makes another
> > > syscall, then I consider the two syscalls to be different task contexts.
> > > This allows values stored in that task to be modified between syscalls,
> > > even if they're guaranteed to be immutable during a syscall.
> > >
> > > Ensuring correctness of `CurrentTask` is slightly tricky if we also want
> > > the ability to have a safe `kthread_use_mm()` implementation in Rust. To
> > > support that safely, there are two patterns we need to ensure are safe:
> > >
> > >       // Case 1: current!() called inside the scope.
> > >       let mm;
> > >       kthread_use_mm(some_mm, || {
> > >           mm = current!().mm();
> > >       });
> > >       drop(some_mm);
> > >       mm.do_something(); // UAF
> > >
> > > and:
> > >
> > >       // Case 2: current!() called before the scope.
> > >       let mm;
> > >       let task = current!();
> > >       kthread_use_mm(some_mm, || {
> > >           mm = task.mm();
> > >       });
> > >       drop(some_mm);
> > >       mm.do_something(); // UAF
> > >
> > > The existing `current!()` abstraction already natively prevents the
> > > first case: The `&CurrentTask` would be tied to the inner scope, so the
> > > borrow-checker ensures that no reference derived from it can escape the
> > > scope.
> > >
> > > Fixing the second case is a bit more tricky. The solution is to
> > > essentially pretend that the contents of the scope execute on an
> > > different thread, which means that only thread-safe types can cross the
> > > boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to
> > > move it to another thread will fail, and this includes our fake pretend
> > > thread boundary.
> > >
> > > This has the disadvantage that other types that aren't thread-safe for
> > > reasons unrelated to `current` also cannot be moved across the
> > > `kthread_use_mm()` boundary. I consider this an acceptable tradeoff.
> > >
> > > Cc: Christian Brauner <brauner@kernel.org>
> > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > > ---
> > >  rust/kernel/mm.rs   |  22 ----
> > >  rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++----------------------
> > >  2 files changed, 167 insertions(+), 139 deletions(-)
> > >
> > > diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> > > index 50f4861ae4b9..f7d1079391ef 100644
> > > --- a/rust/kernel/mm.rs
> > > +++ b/rust/kernel/mm.rs
> > > @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
> > >
> > >  // These methods are safe to call even if `mm_users` is zero.
> > >  impl Mm {
> > > -    /// Call `mmgrab` on `current.mm`.
> > > -    #[inline]
> > > -    pub fn mmgrab_current() -> Option<ARef<Mm>> {
> > > -        // SAFETY: It's safe to get the `mm` field from current.
> > > -        let mm = unsafe {
> > > -            let current = bindings::get_current();
> > > -            (*current).mm
> > > -        };
> > > -
> > > -        if mm.is_null() {
> > > -            return None;
> > > -        }
> > > -
> > > -        // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> > > -        // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> > > -        // duration of this function, and `current->mm` will stay valid for that long.
> > > -        let mm = unsafe { Mm::from_raw(mm) };
> > > -
> > > -        // This increments the refcount using `mmgrab`.
> > > -        Some(ARef::from(mm))
> > > -    }
> > > -
> >
> > This is removed because of no user? If so, maybe don't introduce this at
> > all in the earlier patch of this series? The rest looks good to me.
> 
> I guess I can drop the temporary introduction of this. It's here due
> to the history of this series where originally it only had
> mmgrab_current, and Binder would use that. But with this patch, you

As someone who usually dig a lot of history in git, I would like to see
the drop of the temporary introduction ;-) If you're going to rebase,
please see whether the drop can be done easily, thanks! Not a block
issue though.

With or without the change, feel free to add:

Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Regards,
Boqun

> can use CurrentTask::mm() + ARef::from() to do the same thing. For
> Binder, the difference doesn't matter, but the latter is more powerful
> as you can access the current task's mm_struct without incrementing
> refcounts.
> 
> Alice

Re: [PATCH v11 8/8] task: rust: rework how current is accessed

Posted by Andreas Hindborg 1 year, 1 month ago

"Alice Ryhl" <aliceryhl@google.com> writes:

> Introduce a new type called `CurrentTask` that lets you perform various
> operations that are only safe on the `current` task. Use the new type to
> provide a way to access the current mm without incrementing its
> refcount.
>
> With this change, you can write stuff such as
>
> 	let vma = current!().mm().lock_vma_under_rcu(addr);
>
> without incrementing any refcounts.
>
> This replaces the existing abstractions for accessing the current pid
> namespace. With the old approach, every field access to current involves
> both a macro and a unsafe helper function. The new approach simplifies
> that to a single safe function on the `CurrentTask` type. This makes it
> less heavy-weight to add additional current accessors in the future.
>
> That said, creating a `CurrentTask` type like the one in this patch
> requires that we are careful to ensure that it cannot escape the current
> task or otherwise access things after they are freed. To do this, I
> declared that it cannot escape the current "task context" where I
> defined a "task context" as essentially the region in which `current`
> remains unchanged. So e.g., release_task() or begin_new_exec() would
> leave the task context.
>
> If a userspace thread returns to userspace and later makes another
> syscall, then I consider the two syscalls to be different task contexts.
> This allows values stored in that task to be modified between syscalls,
> even if they're guaranteed to be immutable during a syscall.
>
> Ensuring correctness of `CurrentTask` is slightly tricky if we also want
> the ability to have a safe `kthread_use_mm()` implementation in Rust. To
> support that safely, there are two patterns we need to ensure are safe:
>
> 	// Case 1: current!() called inside the scope.
> 	let mm;
> 	kthread_use_mm(some_mm, || {
> 	    mm = current!().mm();
> 	});
> 	drop(some_mm);
> 	mm.do_something(); // UAF
>
> and:
>
> 	// Case 2: current!() called before the scope.
> 	let mm;
> 	let task = current!();
> 	kthread_use_mm(some_mm, || {
> 	    mm = task.mm();
> 	});
> 	drop(some_mm);
> 	mm.do_something(); // UAF
>
> The existing `current!()` abstraction already natively prevents the
> first case: The `&CurrentTask` would be tied to the inner scope, so the
> borrow-checker ensures that no reference derived from it can escape the
> scope.
>
> Fixing the second case is a bit more tricky. The solution is to
> essentially pretend that the contents of the scope execute on an
> different thread, which means that only thread-safe types can cross the
> boundary. Since `CurrentTask` is marked `NotThreadSafe`, attempts to
> move it to another thread will fail, and this includes our fake pretend
> thread boundary.
>
> This has the disadvantage that other types that aren't thread-safe for
> reasons unrelated to `current` also cannot be moved across the
> `kthread_use_mm()` boundary. I consider this an acceptable tradeoff.
>
> Cc: Christian Brauner <brauner@kernel.org>
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> ---
>  rust/kernel/mm.rs   |  22 ----
>  rust/kernel/task.rs | 284 ++++++++++++++++++++++++++++++----------------------
>  2 files changed, 167 insertions(+), 139 deletions(-)
>
> diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
> index 50f4861ae4b9..f7d1079391ef 100644
> --- a/rust/kernel/mm.rs
> +++ b/rust/kernel/mm.rs
> @@ -142,28 +142,6 @@ fn deref(&self) -> &MmWithUser {
>
>  // These methods are safe to call even if `mm_users` is zero.
>  impl Mm {
> -    /// Call `mmgrab` on `current.mm`.
> -    #[inline]
> -    pub fn mmgrab_current() -> Option<ARef<Mm>> {
> -        // SAFETY: It's safe to get the `mm` field from current.
> -        let mm = unsafe {
> -            let current = bindings::get_current();
> -            (*current).mm
> -        };
> -
> -        if mm.is_null() {
> -            return None;
> -        }
> -
> -        // SAFETY: The value of `current->mm` is guaranteed to be null or a valid `mm_struct`. We
> -        // just checked that it's not null. Furthermore, the returned `&Mm` is valid only for the
> -        // duration of this function, and `current->mm` will stay valid for that long.
> -        let mm = unsafe { Mm::from_raw(mm) };
> -
> -        // This increments the refcount using `mmgrab`.
> -        Some(ARef::from(mm))
> -    }
> -
>      /// Returns a raw pointer to the inner `mm_struct`.
>      #[inline]
>      pub fn as_raw(&self) -> *mut bindings::mm_struct {
> diff --git a/rust/kernel/task.rs b/rust/kernel/task.rs
> index 07bc22a7645c..8c1ee46c03eb 100644
> --- a/rust/kernel/task.rs
> +++ b/rust/kernel/task.rs
> @@ -7,6 +7,7 @@
>  use crate::{
>      bindings,
>      ffi::{c_int, c_long, c_uint},
> +    mm::MmWithUser,
>      pid_namespace::PidNamespace,
>      types::{ARef, NotThreadSafe, Opaque},
>  };
> @@ -31,22 +32,20 @@
>  #[macro_export]
>  macro_rules! current {
>      () => {
> -        // SAFETY: Deref + addr-of below create a temporary `TaskRef` that cannot outlive the
> -        // caller.
> +        // SAFETY: This expression creates a temporary value that is dropped at the end of the
> +        // caller's scope. The following mechanisms ensure that the resulting `&CurrentTask` cannot
> +        // leave current task context:
> +        //
> +        // * To return to userspace, the caller must leave the current scope.
> +        // * Operations such as `begin_new_exec()` are necessarily unsafe and the caller of
> +        //   `begin_new_exec()` is responsible for safety.
> +        // * Rust abstractions for things such as a `kthread_use_mm()` scope must require the
> +        //   closure to be `Send`, so the `NotThreadSafe` field of `CurrentTask` ensures that the
> +        //   `&CurrentTask` cannot cross the scope in either direction.
>          unsafe { &*$crate::task::Task::current() }
>      };
>  }
>
> -/// Returns the currently running task's pid namespace.
> -#[macro_export]
> -macro_rules! current_pid_ns {
> -    () => {
> -        // SAFETY: Deref + addr-of below create a temporary `PidNamespaceRef` that cannot outlive
> -        // the caller.
> -        unsafe { &*$crate::task::Task::current_pid_ns() }
> -    };
> -}
> -
>  /// Wraps the kernel's `struct task_struct`.
>  ///
>  /// # Invariants
> @@ -105,6 +104,44 @@ unsafe impl Send for Task {}
>  // synchronised by C code (e.g., `signal_pending`).
>  unsafe impl Sync for Task {}
>
> +/// Represents the [`Task`] in the `current` global.
> +///
> +/// This type exists to provide more efficient operations that are only valid on the current task.
> +/// For example, to retrieve the pid-namespace of a task, you must use rcu protection unless it is
> +/// the current task.
> +///
> +/// # Invariants
> +///
> +/// Each value of this type must only be accessed from the task context it was created within.
> +///
> +/// Of course, every thread is in a different task context, but for the purposes of this invariant,
> +/// these operations also permanently leave the task context:
> +///
> +/// * Returning to userspace from system call context.
> +/// * Calling `release_task()`.
> +/// * Calling `begin_new_exec()` in a binary format loader.
> +///
> +/// Other operations temporarily create a new sub-context:
> +///
> +/// * Calling `kthread_use_mm()` creates a new context, and `kthread_unuse_mm()` returns to the
> +///   old context.
> +///
> +/// This means that a `CurrentTask` obtained before a `kthread_use_mm()` call may be used again
> +/// once `kthread_unuse_mm()` is called, but it must not be used between these two calls.
> +/// Conversely, a `CurrentTask` obtained between a `kthread_use_mm()`/`kthread_unuse_mm()` pair
> +/// must not be used after `kthread_unuse_mm()`.
> +#[repr(transparent)]
> +pub struct CurrentTask(Task, NotThreadSafe);
> +
> +// Make all `Task` methods available on `CurrentTask`.
> +impl Deref for CurrentTask {
> +    type Target = Task;
> +    #[inline]
> +    fn deref(&self) -> &Task {
> +        &self.0
> +    }
> +}
> +
>  /// The type of process identifiers (PIDs).
>  type Pid = bindings::pid_t;
>
> @@ -131,119 +168,29 @@ pub fn current_raw() -> *mut bindings::task_struct {
>      ///
>      /// # Safety
>      ///
> -    /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> -    pub unsafe fn current() -> impl Deref<Target = Task> {
> -        struct TaskRef<'a> {
> -            task: &'a Task,
> -            _not_send: NotThreadSafe,
> +    /// Callers must ensure that the returned object is only used to access a [`CurrentTask`]
> +    /// within the task context that was active when this function was called. For more details,
> +    /// see the invariants section for [`CurrentTask`].
> +    pub unsafe fn current() -> impl Deref<Target = CurrentTask> {
> +        struct TaskRef {
> +            task: *const CurrentTask,
>          }
>
> -        impl Deref for TaskRef<'_> {
> -            type Target = Task;
> +        impl Deref for TaskRef {
> +            type Target = CurrentTask;
>
>              fn deref(&self) -> &Self::Target {
> -                self.task
> +                // SAFETY: The returned reference borrows from this `TaskRef`, so it cannot outlive
> +                // the `TaskRef`, which the caller of `Task::current()` has promised will not
> +                // outlive the task/thread for which `self.task` is the `current` pointer. Thus, it
> +                // is okay to return a `CurrentTask` reference here.
> +                unsafe { &*self.task }
>              }
>          }
>
> -        let current = Task::current_raw();
>          TaskRef {
> -            // SAFETY: If the current thread is still running, the current task is valid. Given
> -            // that `TaskRef` is not `Send`, we know it cannot be transferred to another thread
> -            // (where it could potentially outlive the caller).
> -            task: unsafe { &*current.cast() },
> -            _not_send: NotThreadSafe,
> -        }
> -    }
> -
> -    /// Returns a PidNamespace reference for the currently executing task's/thread's pid namespace.
> -    ///
> -    /// This function can be used to create an unbounded lifetime by e.g., storing the returned
> -    /// PidNamespace in a global variable which would be a bug. So the recommended way to get the
> -    /// current task's/thread's pid namespace is to use the [`current_pid_ns`] macro because it is
> -    /// safe.
> -    ///
> -    /// # Safety
> -    ///
> -    /// Callers must ensure that the returned object doesn't outlive the current task/thread.
> -    pub unsafe fn current_pid_ns() -> impl Deref<Target = PidNamespace> {
> -        struct PidNamespaceRef<'a> {
> -            task: &'a PidNamespace,
> -            _not_send: NotThreadSafe,
> -        }
> -
> -        impl Deref for PidNamespaceRef<'_> {
> -            type Target = PidNamespace;
> -
> -            fn deref(&self) -> &Self::Target {
> -                self.task
> -            }
> -        }
> -
> -        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> -        //
> -        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> -        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
> -        // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
> -        // created by the calling `Task`. This invariant guarantees that after having acquired a
> -        // reference to a `Task`'s pid namespace it will remain unchanged.
> -        //
> -        // When a task has exited and been reaped `release_task()` will be called. This will set
> -        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
> -        // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
> -        // referencing count to
> -        // the `Task` will prevent `release_task()` being called.
> -        //
> -        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
> -        // can be used. There are two cases to consider:
> -        //
> -        // (1) retrieving the `PidNamespace` of the `current` task
> -        // (2) retrieving the `PidNamespace` of a non-`current` task
> -        //
> -        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
> -        // requires neither RCU locking nor a reference count to be held. Retrieving the
> -        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
> -        // like that is exposed to Rust.
> -        //
> -        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> -        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
> -        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
> -        // task means `NULL` can be returned as the non-`current` task could have already passed
> -        // through `release_task()`.
> -        //
> -        // To retrieve (1) the `current_pid_ns!()` macro should be used which ensure that the
> -        // returned `PidNamespace` cannot outlive the calling scope. The associated
> -        // `current_pid_ns()` function should not be called directly as it could be abused to
> -        // created an unbounded lifetime for `PidNamespace`. The `current_pid_ns!()` macro allows
> -        // Rust to handle the common case of accessing `current`'s `PidNamespace` without RCU
> -        // protection and without having to acquire a reference count.
> -        //
> -        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
> -        // reference on `PidNamespace` and will return an `Option` to force the caller to
> -        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
> -        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
> -        // difficult to perform operations that are otherwise safe without holding a reference
> -        // count as long as RCU protection is guaranteed. But it is not important currently. But we
> -        // do want it in the future.
> -        //
> -        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
> -        // synchronizes against putting the last reference of the associated `struct pid` of
> -        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
> -        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
> -        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
> -        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
> -        // from `task->thread_pid` to finish.
> -        //
> -        // SAFETY: The current task's pid namespace is valid as long as the current task is running.
> -        let pidns = unsafe { bindings::task_active_pid_ns(Task::current_raw()) };
> -        PidNamespaceRef {
> -            // SAFETY: If the current thread is still running, the current task and its associated
> -            // pid namespace are valid. `PidNamespaceRef` is not `Send`, so we know it cannot be
> -            // transferred to another thread (where it could potentially outlive the current
> -            // `Task`). The caller needs to ensure that the PidNamespaceRef doesn't outlive the
> -            // current task/thread.
> -            task: unsafe { PidNamespace::from_ptr(pidns) },
> -            _not_send: NotThreadSafe,
> +            // CAST: The layout of `struct task_struct` and `CurrentTask` is identical.
> +            task: Task::current_raw().cast(),
>          }
>      }
>
> @@ -326,6 +273,109 @@ pub fn wake_up(&self) {
>      }
>  }
>
> +impl CurrentTask {
> +    /// Access the address space of the current task.
> +    ///
> +    /// This function does not touch the refcount of the mm.
> +    #[inline]
> +    pub fn mm(&self) -> Option<&MmWithUser> {
> +        // SAFETY: The `mm` field of `current` is not modified from other threads, so reading it is
> +        // not a data race.
> +        let mm = unsafe { (*self.as_ptr()).mm };
> +
> +        if mm.is_null() {
> +            return None;
> +        }
> +
> +        // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> +        // value of `mm_users`. Furthermore, the returned `&MmWithUser` borrows from this
> +        // `CurrentTask`, so it cannot escape the scope in which the current pointer was obtained.
> +        //
> +        // This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` are used. There are two
> +        // relevant cases:
> +        // * If the `&CurrentTask` was created before `kthread_use_mm()`, then it cannot be
> +        //   accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` scope due to the
> +        //   `NotThreadSafe` field of `CurrentTask`.
> +        // * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
> +        //   scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
> +        //   also cannot escape that scope.
> +        // In either case, it's not possible to read `current->mm` and keep using it after the
> +        // scope is ended with `kthread_unuse_mm()`.

I guess we don't actually need the last section until we see
`ktread_use_mm` / `kthread_unuse_mm` abstractions in tree?

> +        Some(unsafe { MmWithUser::from_raw(mm) })
> +    }
> +
> +    /// Access the pid namespace of the current task.

Is it an address space or a memory map(ping)? Can we use consistent vocabulary?

> +    ///
> +    /// This function does not touch the refcount of the namespace or use RCU protection.
> +    #[doc(alias = "task_active_pid_ns")]

What is with the alias?

> +    #[inline]
> +    pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
> +        // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
> +        // on the current task.
> +        let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
> +
> +        if active_ns.is_null() {
> +            return None;
> +        }
> +
> +        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> +        //
> +        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> +        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
> +        // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
> +        // created by the calling `Task`. This invariant guarantees that after having acquired a
> +        // reference to a `Task`'s pid namespace it will remain unchanged.
> +        //
> +        // When a task has exited and been reaped `release_task()` will be called. This will set
> +        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
> +        // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
> +        // referencing count to the `Task` will prevent `release_task()` being called.
> +        //
> +        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
> +        // can be used. There are two cases to consider:
> +        //
> +        // (1) retrieving the `PidNamespace` of the `current` task
> +        // (2) retrieving the `PidNamespace` of a non-`current` task
> +        //
> +        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
> +        // requires neither RCU locking nor a reference count to be held. Retrieving the
> +        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
> +        // like that is exposed to Rust.
> +        //
> +        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> +        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
> +        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
> +        // task means `NULL` can be returned as the non-`current` task could have already passed
> +        // through `release_task()`.
> +        //
> +        // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
> +        // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
> +        // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
> +        // without RCU protection and without having to acquire a reference count.
> +        //
> +        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
> +        // reference on `PidNamespace` and will return an `Option` to force the caller to
> +        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
> +        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
> +        // difficult to perform operations that are otherwise safe without holding a reference
> +        // count as long as RCU protection is guaranteed. But it is not important currently. But we
> +        // do want it in the future.
> +        //
> +        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
> +        // synchronizes against putting the last reference of the associated `struct pid` of
> +        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
> +        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
> +        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
> +        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
> +        // from `task->thread_pid` to finish.

While this comment is a nice piece of documentation, I think we should
move it elsewhere, or restrict it to paragraphs pertaining to (1), since
that is the only case we consider here?

> +        //
> +        // SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
> +        // Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
> +        // escape the scope in which the current pointer was obtained.
> +        Some(unsafe { PidNamespace::from_ptr(active_ns) })
> +    }

Can we move the impl block and the struct definition next to each other?


Best regards,
Andreas Hindborg

Re: [PATCH v11 8/8] task: rust: rework how current is accessed

Posted by Alice Ryhl 1 year, 1 month ago

On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
> > +impl CurrentTask {
> > +    /// Access the address space of the current task.
> > +    ///
> > +    /// This function does not touch the refcount of the mm.
> > +    #[inline]
> > +    pub fn mm(&self) -> Option<&MmWithUser> {
> > +        // SAFETY: The `mm` field of `current` is not modified from other threads, so reading it is
> > +        // not a data race.
> > +        let mm = unsafe { (*self.as_ptr()).mm };
> > +
> > +        if mm.is_null() {
> > +            return None;
> > +        }
> > +
> > +        // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
> > +        // value of `mm_users`. Furthermore, the returned `&MmWithUser` borrows from this
> > +        // `CurrentTask`, so it cannot escape the scope in which the current pointer was obtained.
> > +        //
> > +        // This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` are used. There are two
> > +        // relevant cases:
> > +        // * If the `&CurrentTask` was created before `kthread_use_mm()`, then it cannot be
> > +        //   accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` scope due to the
> > +        //   `NotThreadSafe` field of `CurrentTask`.
> > +        // * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
> > +        //   scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
> > +        //   also cannot escape that scope.
> > +        // In either case, it's not possible to read `current->mm` and keep using it after the
> > +        // scope is ended with `kthread_unuse_mm()`.
>
> I guess we don't actually need the last section until we see
> `ktread_use_mm` / `kthread_unuse_mm` abstractions in tree?

I mean, there could be such a scope in C code that called into Rust?
And I don't think there's anything wrong with future-proofing this
abstraction towards adding it in the future.

> > +        Some(unsafe { MmWithUser::from_raw(mm) })
> > +    }
> > +
> > +    /// Access the pid namespace of the current task.
>
> Is it an address space or a memory map(ping)? Can we use consistent vocabulary?

Neither. It's a pid namespace which has nothing to do with address
spaces or memory mappings. This part of this patch is moving an
existing abstraction to work with the reworked way to access current.

> > +    ///
> > +    /// This function does not touch the refcount of the namespace or use RCU protection.
> > +    #[doc(alias = "task_active_pid_ns")]
>
> What is with the alias?

This is the Rust equivalent to the C function called
task_active_pid_ns. The alias makes it easier to find it.

> > +    #[inline]
> > +    pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
> > +        // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
> > +        // on the current task.
> > +        let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
> > +
> > +        if active_ns.is_null() {
> > +            return None;
> > +        }
> > +
> > +        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> > +        //
> > +        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> > +        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
> > +        // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
> > +        // created by the calling `Task`. This invariant guarantees that after having acquired a
> > +        // reference to a `Task`'s pid namespace it will remain unchanged.
> > +        //
> > +        // When a task has exited and been reaped `release_task()` will be called. This will set
> > +        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
> > +        // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
> > +        // referencing count to the `Task` will prevent `release_task()` being called.
> > +        //
> > +        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
> > +        // can be used. There are two cases to consider:
> > +        //
> > +        // (1) retrieving the `PidNamespace` of the `current` task
> > +        // (2) retrieving the `PidNamespace` of a non-`current` task
> > +        //
> > +        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
> > +        // requires neither RCU locking nor a reference count to be held. Retrieving the
> > +        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
> > +        // like that is exposed to Rust.
> > +        //
> > +        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> > +        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
> > +        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
> > +        // task means `NULL` can be returned as the non-`current` task could have already passed
> > +        // through `release_task()`.
> > +        //
> > +        // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
> > +        // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
> > +        // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
> > +        // without RCU protection and without having to acquire a reference count.
> > +        //
> > +        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
> > +        // reference on `PidNamespace` and will return an `Option` to force the caller to
> > +        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
> > +        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
> > +        // difficult to perform operations that are otherwise safe without holding a reference
> > +        // count as long as RCU protection is guaranteed. But it is not important currently. But we
> > +        // do want it in the future.
> > +        //
> > +        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
> > +        // synchronizes against putting the last reference of the associated `struct pid` of
> > +        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
> > +        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
> > +        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
> > +        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
> > +        // from `task->thread_pid` to finish.
>
> While this comment is a nice piece of documentation, I think we should
> move it elsewhere, or restrict it to paragraphs pertaining to (1), since
> that is the only case we consider here?

Where would you move it?

> > +        //
> > +        // SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
> > +        // Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
> > +        // escape the scope in which the current pointer was obtained.
> > +        Some(unsafe { PidNamespace::from_ptr(active_ns) })
> > +    }
>
> Can we move the impl block and the struct definition next to each other?

I could move the definition of CurrentTask down, but I'm not really
convinced that it's an improvement.

Alice

Re: [PATCH v11 8/8] task: rust: rework how current is accessed

Posted by Andreas Hindborg 1 year, 1 month ago

"Alice Ryhl" <aliceryhl@google.com> writes:

> On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>> > +impl CurrentTask {
>> > +    /// Access the address space of the current task.
>> > +    ///
>> > +    /// This function does not touch the refcount of the mm.
>> > +    #[inline]
>> > +    pub fn mm(&self) -> Option<&MmWithUser> {
>> > +        // SAFETY: The `mm` field of `current` is not modified from other threads, so reading it is
>> > +        // not a data race.
>> > +        let mm = unsafe { (*self.as_ptr()).mm };
>> > +
>> > +        if mm.is_null() {
>> > +            return None;
>> > +        }
>> > +
>> > +        // SAFETY: If `current->mm` is non-null, then it references a valid mm with a non-zero
>> > +        // value of `mm_users`. Furthermore, the returned `&MmWithUser` borrows from this
>> > +        // `CurrentTask`, so it cannot escape the scope in which the current pointer was obtained.
>> > +        //
>> > +        // This is safe even if `kthread_use_mm()`/`kthread_unuse_mm()` are used. There are two
>> > +        // relevant cases:
>> > +        // * If the `&CurrentTask` was created before `kthread_use_mm()`, then it cannot be
>> > +        //   accessed during the `kthread_use_mm()`/`kthread_unuse_mm()` scope due to the
>> > +        //   `NotThreadSafe` field of `CurrentTask`.
>> > +        // * If the `&CurrentTask` was created within a `kthread_use_mm()`/`kthread_unuse_mm()`
>> > +        //   scope, then the `&CurrentTask` cannot escape that scope, so the returned `&MmWithUser`
>> > +        //   also cannot escape that scope.
>> > +        // In either case, it's not possible to read `current->mm` and keep using it after the
>> > +        // scope is ended with `kthread_unuse_mm()`.
>>
>> I guess we don't actually need the last section until we see
>> `ktread_use_mm` / `kthread_unuse_mm` abstractions in tree?
>
> I mean, there could be such a scope in C code that called into Rust?

👍

>> > +        Some(unsafe { MmWithUser::from_raw(mm) })
>> > +    }
>> > +
>> > +    /// Access the pid namespace of the current task.
>>
>> Is it an address space or a memory map(ping)? Can we use consistent vocabulary?
>
> Neither. It's a pid namespace which has nothing to do with address
> spaces or memory mappings. This part of this patch is moving an
> existing abstraction to work with the reworked way to access current.

Sorry, not sure what I was talking about here. I feel like this comment
landed in the wrong place 😬

I remember taking note of the use of VMA, memory map, address space all
over the place. I object to "VMA" and would rather have it spelled out
in documentation.

>
>> > +    ///
>> > +    /// This function does not touch the refcount of the namespace or use RCU protection.
>> > +    #[doc(alias = "task_active_pid_ns")]
>>
>> What is with the alias?
>
> This is the Rust equivalent to the C function called
> task_active_pid_ns. The alias makes it easier to find it.

Cool.

>
>> > +    #[inline]
>> > +    pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
>> > +        // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
>> > +        // on the current task.
>> > +        let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
>> > +
>> > +        if active_ns.is_null() {
>> > +            return None;
>> > +        }
>> > +
>> > +        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
>> > +        //
>> > +        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
>> > +        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
>> > +        // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
>> > +        // created by the calling `Task`. This invariant guarantees that after having acquired a
>> > +        // reference to a `Task`'s pid namespace it will remain unchanged.
>> > +        //
>> > +        // When a task has exited and been reaped `release_task()` will be called. This will set
>> > +        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
>> > +        // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
>> > +        // referencing count to the `Task` will prevent `release_task()` being called.
>> > +        //
>> > +        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
>> > +        // can be used. There are two cases to consider:
>> > +        //
>> > +        // (1) retrieving the `PidNamespace` of the `current` task
>> > +        // (2) retrieving the `PidNamespace` of a non-`current` task
>> > +        //
>> > +        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
>> > +        // requires neither RCU locking nor a reference count to be held. Retrieving the
>> > +        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
>> > +        // like that is exposed to Rust.
>> > +        //
>> > +        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
>> > +        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
>> > +        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
>> > +        // task means `NULL` can be returned as the non-`current` task could have already passed
>> > +        // through `release_task()`.
>> > +        //
>> > +        // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
>> > +        // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
>> > +        // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
>> > +        // without RCU protection and without having to acquire a reference count.
>> > +        //
>> > +        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
>> > +        // reference on `PidNamespace` and will return an `Option` to force the caller to
>> > +        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
>> > +        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
>> > +        // difficult to perform operations that are otherwise safe without holding a reference
>> > +        // count as long as RCU protection is guaranteed. But it is not important currently. But we
>> > +        // do want it in the future.
>> > +        //
>> > +        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
>> > +        // synchronizes against putting the last reference of the associated `struct pid` of
>> > +        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
>> > +        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
>> > +        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
>> > +        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
>> > +        // from `task->thread_pid` to finish.
>>
>> While this comment is a nice piece of documentation, I think we should
>> move it elsewhere, or restrict it to paragraphs pertaining to (1), since
>> that is the only case we consider here?
>
> Where would you move it?

The info about (2) should probably be with the implementation for that
case, when it lands. Perhaps we can move it hen?

>
>> > +        //
>> > +        // SAFETY: If `current`'s pid ns is non-null, then it references a valid pid ns.
>> > +        // Furthermore, the returned `&PidNamespace` borrows from this `CurrentTask`, so it cannot
>> > +        // escape the scope in which the current pointer was obtained.
>> > +        Some(unsafe { PidNamespace::from_ptr(active_ns) })
>> > +    }
>>
>> Can we move the impl block and the struct definition next to each other?
>
> I could move the definition of CurrentTask down, but I'm not really
> convinced that it's an improvement.

I would prefer that, but it's just personal preference. I think it makes
for a more comfortable ride when reading the code first time.



Best regards,
Andreas Hindborg

Re: [PATCH v11 8/8] task: rust: rework how current is accessed

Posted by Alice Ryhl 1 year ago

On Thu, Jan 9, 2025 at 9:42 AM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>
> "Alice Ryhl" <aliceryhl@google.com> writes:
>
> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
> >>
> >> "Alice Ryhl" <aliceryhl@google.com> writes:
> >> > +    #[inline]
> >> > +    pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
> >> > +        // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
> >> > +        // on the current task.
> >> > +        let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
> >> > +
> >> > +        if active_ns.is_null() {
> >> > +            return None;
> >> > +        }
> >> > +
> >> > +        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
> >> > +        //
> >> > +        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
> >> > +        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
> >> > +        // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
> >> > +        // created by the calling `Task`. This invariant guarantees that after having acquired a
> >> > +        // reference to a `Task`'s pid namespace it will remain unchanged.
> >> > +        //
> >> > +        // When a task has exited and been reaped `release_task()` will be called. This will set
> >> > +        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
> >> > +        // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
> >> > +        // referencing count to the `Task` will prevent `release_task()` being called.
> >> > +        //
> >> > +        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
> >> > +        // can be used. There are two cases to consider:
> >> > +        //
> >> > +        // (1) retrieving the `PidNamespace` of the `current` task
> >> > +        // (2) retrieving the `PidNamespace` of a non-`current` task
> >> > +        //
> >> > +        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
> >> > +        // requires neither RCU locking nor a reference count to be held. Retrieving the
> >> > +        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
> >> > +        // like that is exposed to Rust.
> >> > +        //
> >> > +        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
> >> > +        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
> >> > +        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
> >> > +        // task means `NULL` can be returned as the non-`current` task could have already passed
> >> > +        // through `release_task()`.
> >> > +        //
> >> > +        // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
> >> > +        // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
> >> > +        // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
> >> > +        // without RCU protection and without having to acquire a reference count.
> >> > +        //
> >> > +        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
> >> > +        // reference on `PidNamespace` and will return an `Option` to force the caller to
> >> > +        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
> >> > +        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
> >> > +        // difficult to perform operations that are otherwise safe without holding a reference
> >> > +        // count as long as RCU protection is guaranteed. But it is not important currently. But we
> >> > +        // do want it in the future.
> >> > +        //
> >> > +        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
> >> > +        // synchronizes against putting the last reference of the associated `struct pid` of
> >> > +        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
> >> > +        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
> >> > +        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
> >> > +        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
> >> > +        // from `task->thread_pid` to finish.
> >>
> >> While this comment is a nice piece of documentation, I think we should
> >> move it elsewhere, or restrict it to paragraphs pertaining to (1), since
> >> that is the only case we consider here?
> >
> > Where would you move it?
>
> The info about (2) should probably be with the implementation for that
> case, when it lands. Perhaps we can move it hen?

The function already exists. It's called Task::get_pid_ns(). I think
the comment makes sense here: get_pid_ns() is the normal case where
you don't skip synchronization, and active_pid_ns() is the special
case where you can skip RCU due to reasons. This comment explains that
normally you cannot skip RCU, but in this special case you can.

Alice

Re: [PATCH v11 8/8] task: rust: rework how current is accessed

Posted by Andreas Hindborg 1 year ago

"Alice Ryhl" <aliceryhl@google.com> writes:

> On Thu, Jan 9, 2025 at 9:42 AM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>>
>> "Alice Ryhl" <aliceryhl@google.com> writes:
>>
>> > On Mon, Dec 16, 2024 at 3:51 PM Andreas Hindborg <a.hindborg@kernel.org> wrote:
>> >>
>> >> "Alice Ryhl" <aliceryhl@google.com> writes:
>> >> > +    #[inline]
>> >> > +    pub fn active_pid_ns(&self) -> Option<&PidNamespace> {
>> >> > +        // SAFETY: It is safe to call `task_active_pid_ns` without RCU protection when calling it
>> >> > +        // on the current task.
>> >> > +        let active_ns = unsafe { bindings::task_active_pid_ns(self.as_ptr()) };
>> >> > +
>> >> > +        if active_ns.is_null() {
>> >> > +            return None;
>> >> > +        }
>> >> > +
>> >> > +        // The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.
>> >> > +        //
>> >> > +        // The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive. A
>> >> > +        // `unshare(CLONE_NEWPID)` or `setns(fd_pidns/pidfd, CLONE_NEWPID)` will not have an effect
>> >> > +        // on the calling `Task`'s pid namespace. It will only effect the pid namespace of children
>> >> > +        // created by the calling `Task`. This invariant guarantees that after having acquired a
>> >> > +        // reference to a `Task`'s pid namespace it will remain unchanged.
>> >> > +        //
>> >> > +        // When a task has exited and been reaped `release_task()` will be called. This will set
>> >> > +        // the `PidNamespace` of the task to `NULL`. So retrieving the `PidNamespace` of a task
>> >> > +        // that is dead will return `NULL`. Note, that neither holding the RCU lock nor holding a
>> >> > +        // referencing count to the `Task` will prevent `release_task()` being called.
>> >> > +        //
>> >> > +        // In order to retrieve the `PidNamespace` of a `Task` the `task_active_pid_ns()` function
>> >> > +        // can be used. There are two cases to consider:
>> >> > +        //
>> >> > +        // (1) retrieving the `PidNamespace` of the `current` task
>> >> > +        // (2) retrieving the `PidNamespace` of a non-`current` task
>> >> > +        //
>> >> > +        // From system call context retrieving the `PidNamespace` for case (1) is always safe and
>> >> > +        // requires neither RCU locking nor a reference count to be held. Retrieving the
>> >> > +        // `PidNamespace` after `release_task()` for current will return `NULL` but no codepath
>> >> > +        // like that is exposed to Rust.
>> >> > +        //
>> >> > +        // Retrieving the `PidNamespace` from system call context for (2) requires RCU protection.
>> >> > +        // Accessing `PidNamespace` outside of RCU protection requires a reference count that
>> >> > +        // must've been acquired while holding the RCU lock. Note that accessing a non-`current`
>> >> > +        // task means `NULL` can be returned as the non-`current` task could have already passed
>> >> > +        // through `release_task()`.
>> >> > +        //
>> >> > +        // To retrieve (1) the `&CurrentTask` type should be used which ensures that the returned
>> >> > +        // `PidNamespace` cannot outlive the current task context. The `CurrentTask::active_pid_ns`
>> >> > +        // function allows Rust to handle the common case of accessing `current`'s `PidNamespace`
>> >> > +        // without RCU protection and without having to acquire a reference count.
>> >> > +        //
>> >> > +        // For (2) the `task_get_pid_ns()` method must be used. This will always acquire a
>> >> > +        // reference on `PidNamespace` and will return an `Option` to force the caller to
>> >> > +        // explicitly handle the case where `PidNamespace` is `None`, something that tends to be
>> >> > +        // forgotten when doing the equivalent operation in `C`. Missing RCU primitives make it
>> >> > +        // difficult to perform operations that are otherwise safe without holding a reference
>> >> > +        // count as long as RCU protection is guaranteed. But it is not important currently. But we
>> >> > +        // do want it in the future.
>> >> > +        //
>> >> > +        // Note for (2) the required RCU protection around calling `task_active_pid_ns()`
>> >> > +        // synchronizes against putting the last reference of the associated `struct pid` of
>> >> > +        // `task->thread_pid`. The `struct pid` stored in that field is used to retrieve the
>> >> > +        // `PidNamespace` of the caller. When `release_task()` is called `task->thread_pid` will be
>> >> > +        // `NULL`ed and `put_pid()` on said `struct pid` will be delayed in `free_pid()` via
>> >> > +        // `call_rcu()` allowing everyone with an RCU protected access to the `struct pid` acquired
>> >> > +        // from `task->thread_pid` to finish.
>> >>
>> >> While this comment is a nice piece of documentation, I think we should
>> >> move it elsewhere, or restrict it to paragraphs pertaining to (1), since
>> >> that is the only case we consider here?
>> >
>> > Where would you move it?
>>
>> The info about (2) should probably be with the implementation for that
>> case, when it lands. Perhaps we can move it hen?
>
> The function already exists. It's called Task::get_pid_ns(). I think
> the comment makes sense here: get_pid_ns() is the normal case where
> you don't skip synchronization, and active_pid_ns() is the special
> case where you can skip RCU due to reasons. This comment explains that
> normally you cannot skip RCU, but in this special case you can.

Reading this again I think it should simply be cut down in size:


```
The lifetime of `PidNamespace` is bound to `Task` and `struct pid`.

The `PidNamespace` of a `Task` doesn't ever change once the `Task` is alive.

From system call context retrieving the `PidNamespace` for the current
task is always safe and requires neither RCU locking nor a reference
count to be held. Retrieving the `PidNamespace` after `release_task()`
for current will return `NULL` but no codepath like that is exposed to
Rust.
```

The rest is not relevant to this function and it does not help
understanding the function.


Another thought - add a link to `get_pid_ns`:

@@ -307,6 +307,8 @@ pub fn mm(&self) -> Option<&MmWithUser> {
     /// Access the pid namespace of the current task.
     ///
     /// This function does not touch the refcount of the namespace or use RCU protection.
+    ///
+    /// To access the pid namespace of another task, see [`Task::get_pid_ns`].
     #[doc(alias = "task_active_pid_ns")]
     #[inline]
     pub fn active_pid_ns(&self) -> Option<&PidNamespace> {


Best regards,
Andreas Hindborg

[PATCH v11 1/8] mm: rust: add abstraction for struct mm_struct
[PATCH v11 2/8] mm: rust: add vm_area_struct methods that require read access
[PATCH v11 3/8] mm: rust: add vm_insert_page
[PATCH v11 4/8] mm: rust: add lock_vma_under_rcu
[PATCH v11 5/8] mm: rust: add mmput_async support
[PATCH v11 6/8] mm: rust: add VmAreaNew for f_ops->mmap()
[PATCH v11 7/8] rust: miscdevice: add mmap support
[PATCH v11 8/8] task: rust: rework how current is accessed