[PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access

Posted by Alice Ryhl 1 year, 2 months ago
This adds a type called VmAreaRef which is used when referencing a vma
that you have read access to. Here, read access means that you hold
either the mmap read lock or the vma read lock (or stronger).

Additionally, a vma_lookup method is added to the mmap read guard, which
enables you to obtain a &VmAreaRef in safe Rust code.

This patch only provides a way to lock the mmap read lock, but a
follow-up patch also provides a way to just lock the vma read lock.
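
As a rough usage sketch (not part of this patch), looking up a vma and zapping
it while holding the mmap read lock could look like the following; the
`MmWithUser` type and its `mmap_read_lock()` method are assumed from the
earlier patch in this series:

    use kernel::mm::{virt::flags, MmWithUser};

    // Sketch: find the vma containing `addr` and, if it is a writable mapping,
    // zap its entire range. `MmWithUser::mmap_read_lock()` is assumed from the
    // earlier patch in this series.
    fn zap_if_writable(mm: &MmWithUser, addr: usize) {
        let guard = mm.mmap_read_lock();
        if let Some(vma) = guard.vma_lookup(addr) {
            if (vma.flags() & flags::WRITE) != 0 {
                vma.zap_page_range_single(vma.start(), vma.end() - vma.start());
            }
        }
        // The `&VmAreaRef` borrows from `guard`, so it cannot be used after
        // the mmap read lock is dropped here.
    }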

Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
---
 rust/helpers/mm.c      |   6 ++
 rust/kernel/mm.rs      |  21 ++++++
 rust/kernel/mm/virt.rs | 176 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 203 insertions(+)

diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c
index 7201747a5d31..7b72eb065a3e 100644
--- a/rust/helpers/mm.c
+++ b/rust/helpers/mm.c
@@ -37,3 +37,9 @@ void rust_helper_mmap_read_unlock(struct mm_struct *mm)
 {
 	mmap_read_unlock(mm);
 }
+
+struct vm_area_struct *rust_helper_vma_lookup(struct mm_struct *mm,
+					      unsigned long addr)
+{
+	return vma_lookup(mm, addr);
+}
diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs
index 84cba581edaa..ace8e7d57afe 100644
--- a/rust/kernel/mm.rs
+++ b/rust/kernel/mm.rs
@@ -12,6 +12,8 @@
 };
 use core::{ops::Deref, ptr::NonNull};
 
+pub mod virt;
+
 /// A wrapper for the kernel's `struct mm_struct`.
 ///
 /// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use
@@ -210,6 +212,25 @@ pub struct MmapReadGuard<'a> {
     _nts: NotThreadSafe,
 }
 
+impl<'a> MmapReadGuard<'a> {
+    /// Look up a vma at the given address.
+    #[inline]
+    pub fn vma_lookup(&self, vma_addr: usize) -> Option<&virt::VmAreaRef> {
+        // SAFETY: We hold a reference to the mm, so the pointer must be valid. Any value is okay
+        // for `vma_addr`.
+        let vma = unsafe { bindings::vma_lookup(self.mm.as_raw(), vma_addr as _) };
+
+        if vma.is_null() {
+            None
+        } else {
+            // SAFETY: We just checked that a vma was found, so the pointer is valid. Furthermore,
+            // the returned area will borrow from this read lock guard, so it can only be used
+            // while the mmap read lock is still held.
+            unsafe { Some(virt::VmAreaRef::from_raw(vma)) }
+        }
+    }
+}
+
 impl Drop for MmapReadGuard<'_> {
     #[inline]
     fn drop(&mut self) {
diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs
new file mode 100644
index 000000000000..6df145fea128
--- /dev/null
+++ b/rust/kernel/mm/virt.rs
@@ -0,0 +1,176 @@
+// SPDX-License-Identifier: GPL-2.0
+
+// Copyright (C) 2024 Google LLC.
+
+//! Virtual memory.
+
+use crate::{bindings, types::Opaque};
+
+/// A wrapper for the kernel's `struct vm_area_struct` with read access.
+///
+/// It represents an area of virtual memory.
+///
+/// # Invariants
+///
+/// The caller must hold the mmap read lock or the vma read lock.
+#[repr(transparent)]
+pub struct VmAreaRef {
+    vma: Opaque<bindings::vm_area_struct>,
+}
+
+// Methods you can call when holding the mmap or vma read lock (or stronger). They must be usable no
+// matter what the vma flags are.
+impl VmAreaRef {
+    /// Access a virtual memory area given a raw pointer.
+    ///
+    /// # Safety
+    ///
+    /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap or vma
+    /// read lock (or stronger) is held for at least the duration of 'a.
+    #[inline]
+    pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self {
+        // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a.
+        unsafe { &*vma.cast() }
+    }
+
+    /// Returns a raw pointer to this area.
+    #[inline]
+    pub fn as_ptr(&self) -> *mut bindings::vm_area_struct {
+        self.vma.get()
+    }
+
+    /// Returns the flags associated with the virtual memory area.
+    ///
+    /// The possible flags are a combination of the constants in [`flags`].
+    #[inline]
+    pub fn flags(&self) -> vm_flags_t {
+        // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+        // access is not a data race.
+        unsafe { (*self.as_ptr()).__bindgen_anon_2.vm_flags as _ }
+    }
+
+    /// Returns the (inclusive) start address of the virtual memory area.
+    #[inline]
+    pub fn start(&self) -> usize {
+        // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+        // access is not a data race.
+        unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_start as _ }
+    }
+
+    /// Returns the (exclusive) end address of the virtual memory area.
+    #[inline]
+    pub fn end(&self) -> usize {
+        // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this
+        // access is not a data race.
+        unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_end as _ }
+    }
+
+    /// Zap pages in the given page range.
+    ///
+    /// This clears page table mappings for the range at the leaf level, leaving all other page
+    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
+    /// anonymous memory is completely freed, file-backed memory has its reference count on page
+    /// cache folios dropped, and any dirty data will still be written back to disk as usual.
+    #[inline]
+    pub fn zap_page_range_single(&self, address: usize, size: usize) {
+        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
+        // sufficient for this method call. This method has no requirements on the vma flags. Any
+        // value of `address` and `size` is allowed.
+        unsafe {
+            bindings::zap_page_range_single(
+                self.as_ptr(),
+                address as _,
+                size as _,
+                core::ptr::null_mut(),
+            )
+        };
+    }
+}
+
+/// The integer type used for vma flags.
+#[doc(inline)]
+pub use bindings::vm_flags_t;
+
+/// All possible flags for [`VmAreaRef`].
+pub mod flags {
+    use super::vm_flags_t;
+    use crate::bindings;
+
+    /// No flags are set.
+    pub const NONE: vm_flags_t = bindings::VM_NONE as _;
+
+    /// Mapping allows reads.
+    pub const READ: vm_flags_t = bindings::VM_READ as _;
+
+    /// Mapping allows writes.
+    pub const WRITE: vm_flags_t = bindings::VM_WRITE as _;
+
+    /// Mapping allows execution.
+    pub const EXEC: vm_flags_t = bindings::VM_EXEC as _;
+
+    /// Mapping is shared.
+    pub const SHARED: vm_flags_t = bindings::VM_SHARED as _;
+
+    /// Mapping may be updated to allow reads.
+    pub const MAYREAD: vm_flags_t = bindings::VM_MAYREAD as _;
+
+    /// Mapping may be updated to allow writes.
+    pub const MAYWRITE: vm_flags_t = bindings::VM_MAYWRITE as _;
+
+    /// Mapping may be updated to allow execution.
+    pub const MAYEXEC: vm_flags_t = bindings::VM_MAYEXEC as _;
+
+    /// Mapping may be updated to be shared.
+    pub const MAYSHARE: vm_flags_t = bindings::VM_MAYSHARE as _;
+
+    /// Page-ranges managed without `struct page`, just pure PFN.
+    pub const PFNMAP: vm_flags_t = bindings::VM_PFNMAP as _;
+
+    /// Memory mapped I/O or similar.
+    pub const IO: vm_flags_t = bindings::VM_IO as _;
+
+    /// Do not copy this vma on fork.
+    pub const DONTCOPY: vm_flags_t = bindings::VM_DONTCOPY as _;
+
+    /// Cannot expand with mremap().
+    pub const DONTEXPAND: vm_flags_t = bindings::VM_DONTEXPAND as _;
+
+    /// Lock the pages covered when they are faulted in.
+    pub const LOCKONFAULT: vm_flags_t = bindings::VM_LOCKONFAULT as _;
+
+    /// Is a VM accounted object.
+    pub const ACCOUNT: vm_flags_t = bindings::VM_ACCOUNT as _;
+
+    /// Should the VM suppress accounting.
+    pub const NORESERVE: vm_flags_t = bindings::VM_NORESERVE as _;
+
+    /// Huge TLB Page VM.
+    pub const HUGETLB: vm_flags_t = bindings::VM_HUGETLB as _;
+
+    /// Synchronous page faults. (DAX-specific)
+    pub const SYNC: vm_flags_t = bindings::VM_SYNC as _;
+
+    /// Architecture-specific flag.
+    pub const ARCH_1: vm_flags_t = bindings::VM_ARCH_1 as _;
+
+    /// Wipe VMA contents in child on fork.
+    pub const WIPEONFORK: vm_flags_t = bindings::VM_WIPEONFORK as _;
+
+    /// Do not include in the core dump.
+    pub const DONTDUMP: vm_flags_t = bindings::VM_DONTDUMP as _;
+
+    /// Not soft dirty clean area.
+    pub const SOFTDIRTY: vm_flags_t = bindings::VM_SOFTDIRTY as _;
+
+    /// Can contain `struct page` and pure PFN pages.
+    pub const MIXEDMAP: vm_flags_t = bindings::VM_MIXEDMAP as _;
+
+    /// MADV_HUGEPAGE marked this vma.
+    pub const HUGEPAGE: vm_flags_t = bindings::VM_HUGEPAGE as _;
+
+    /// MADV_NOHUGEPAGE marked this vma.
+    pub const NOHUGEPAGE: vm_flags_t = bindings::VM_NOHUGEPAGE as _;
+
+    /// KSM may merge identical pages.
+    pub const MERGEABLE: vm_flags_t = bindings::VM_MERGEABLE as _;
+}

-- 
2.47.0.371.ga323438b13-goog
Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
Posted by Jann Horn 1 year, 2 months ago
On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> This adds a type called VmAreaRef which is used when referencing a vma
> that you have read access to. Here, read access means that you hold
> either the mmap read lock or the vma read lock (or stronger).
>
> Additionally, a vma_lookup method is added to the mmap read guard, which
> enables you to obtain a &VmAreaRef in safe Rust code.
>
> This patch only provides a way to lock the mmap read lock, but a
> follow-up patch also provides a way to just lock the vma read lock.
>
> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> Signed-off-by: Alice Ryhl <aliceryhl@google.com>

Reviewed-by: Jann Horn <jannh@google.com>

with one comment:

> +    /// Zap pages in the given page range.
> +    ///
> +    /// This clears page table mappings for the range at the leaf level, leaving all other page
> +    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
> +    /// anonymous memory is completely freed, file-backed memory has its reference count on page
> +    /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> +    #[inline]
> +    pub fn zap_page_range_single(&self, address: usize, size: usize) {
> +        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
> +        // sufficient for this method call. This method has no requirements on the vma flags. Any
> +        // value of `address` and `size` is allowed.

If we really want to allow any address and size, we might want to add
an early bailout in zap_page_range_single(). The comment on top of
zap_page_range_single() currently says "The range must fit into one
VMA", and it looks like by the point we reach a bailout, we could have
gone through an interval tree walk via
mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
for a range that ends before it starts; I don't know how safe that is.

> +        unsafe {
> +            bindings::zap_page_range_single(
> +                self.as_ptr(),
> +                address as _,
> +                size as _,
> +                core::ptr::null_mut(),
> +            )
> +        };
> +    }
> +}
Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
Posted by Alice Ryhl 1 year, 2 months ago
On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > This adds a type called VmAreaRef which is used when referencing a vma
> > that you have read access to. Here, read access means that you hold
> > either the mmap read lock or the vma read lock (or stronger).
> >
> > Additionally, a vma_lookup method is added to the mmap read guard, which
> > enables you to obtain a &VmAreaRef in safe Rust code.
> >
> > This patch only provides a way to lock the mmap read lock, but a
> > follow-up patch also provides a way to just lock the vma read lock.
> >
> > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
>
> Reviewed-by: Jann Horn <jannh@google.com>

Thanks!

> with one comment:
>
> > +    /// Zap pages in the given page range.
> > +    ///
> > +    /// This clears page table mappings for the range at the leaf level, leaving all other page
> > +    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
> > +    /// anonymous memory is completely freed, file-backed memory has its reference count on page
> > +    /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> > +    #[inline]
> > +    pub fn zap_page_range_single(&self, address: usize, size: usize) {
> > +        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
> > +        // sufficient for this method call. This method has no requirements on the vma flags. Any
> > +        // value of `address` and `size` is allowed.
>
> If we really want to allow any address and size, we might want to add
> an early bailout in zap_page_range_single(). The comment on top of
> zap_page_range_single() currently says "The range must fit into one
> VMA", and it looks like by the point we reach a bailout, we could have
> gone through an interval tree walk via
> mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> for a range that ends before it starts; I don't know how safe that is.

I could change the comment on zap_page_range_single() to say:

"The range should be contained within a single VMA. Otherwise an error
is returned."

And then I can add an overflow check at the top of
zap_page_range_single(). Sounds ok?

Alice
Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
Posted by Jann Horn 1 year, 2 months ago
On Wed, Nov 27, 2024 at 1:01 PM Alice Ryhl <aliceryhl@google.com> wrote:
> On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > This adds a type called VmAreaRef which is used when referencing a vma
> > > that you have read access to. Here, read access means that you hold
> > > either the mmap read lock or the vma read lock (or stronger).
> > >
> > > Additionally, a vma_lookup method is added to the mmap read guard, which
> > > enables you to obtain a &VmAreaRef in safe Rust code.
> > >
> > > This patch only provides a way to lock the mmap read lock, but a
> > > follow-up patch also provides a way to just lock the vma read lock.
> > >
> > > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> >
> > Reviewed-by: Jann Horn <jannh@google.com>
>
> Thanks!
>
> > with one comment:
> >
> > > +    /// Zap pages in the given page range.
> > > +    ///
> > > +    /// This clears page table mappings for the range at the leaf level, leaving all other page
> > > +    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
> > > +    /// anonymous memory is completely freed, file-backed memory has its reference count on page
> > > +    /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> > > +    #[inline]
> > > +    pub fn zap_page_range_single(&self, address: usize, size: usize) {
> > > +        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
> > > +        // sufficient for this method call. This method has no requirements on the vma flags. Any
> > > +        // value of `address` and `size` is allowed.
> >
> > If we really want to allow any address and size, we might want to add
> > an early bailout in zap_page_range_single(). The comment on top of
> > zap_page_range_single() currently says "The range must fit into one
> > VMA", and it looks like by the point we reach a bailout, we could have
> > gone through an interval tree walk via
> > mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> > for a range that ends before it starts; I don't know how safe that is.
>
> I could change the comment on zap_page_range_single() to say:
>
> "The range should be contained within a single VMA. Otherwise an error
> is returned."
>
> And then I can add an overflow check at the top of
> zap_page_range_single(). Sounds ok?

Yes, I think changing the comment like that and adding a check for
whether address+size wraps around there addresses this.
Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
Posted by Alice Ryhl 1 year, 2 months ago
On Wed, Nov 27, 2024 at 4:41 PM Jann Horn <jannh@google.com> wrote:
>
> On Wed, Nov 27, 2024 at 1:01 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
> > >
> > > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > > This adds a type called VmAreaRef which is used when referencing a vma
> > > > that you have read access to. Here, read access means that you hold
> > > > either the mmap read lock or the vma read lock (or stronger).
> > > >
> > > > Additionally, a vma_lookup method is added to the mmap read guard, which
> > > > enables you to obtain a &VmAreaRef in safe Rust code.
> > > >
> > > > This patch only provides a way to lock the mmap read lock, but a
> > > > follow-up patch also provides a way to just lock the vma read lock.
> > > >
> > > > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > >
> > > Reviewed-by: Jann Horn <jannh@google.com>
> >
> > Thanks!
> >
> > > with one comment:
> > >
> > > > +    /// Zap pages in the given page range.
> > > > +    ///
> > > > +    /// This clears page table mappings for the range at the leaf level, leaving all other page
> > > > +    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
> > > > +    /// anonymous memory is completely freed, file-backed memory has its reference count on page
> > > > +    /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> > > > +    #[inline]
> > > > +    pub fn zap_page_range_single(&self, address: usize, size: usize) {
> > > > +        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
> > > > +        // sufficient for this method call. This method has no requirements on the vma flags. Any
> > > > +        // value of `address` and `size` is allowed.
> > >
> > > If we really want to allow any address and size, we might want to add
> > > an early bailout in zap_page_range_single(). The comment on top of
> > > zap_page_range_single() currently says "The range must fit into one
> > > VMA", and it looks like by the point we reach a bailout, we could have
> > > gone through an interval tree walk via
> > > mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> > > for a range that ends before it starts; I don't know how safe that is.
> >
> > I could change the comment on zap_page_range_single() to say:
> >
> > "The range should be contained within a single VMA. Otherwise an error
> > is returned."
> >
> > And then I can add an overflow check at the top of
> > zap_page_range_single(). Sounds ok?
>
> Yes, I think changing the comment like that and adding a check for
> whether address+size wraps around there addresses this.

Hmm. Looking at this again now ...

For one, the function returns void so we can at best handle overflow
by performing a no-op.

Another question is, are we actually okay to call it with ranges
outside the vma? Does that just no-op or what? Maybe the Rust side
should just bail out early if the address range is not a subset of the
vma?

Alice
Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
Posted by Lorenzo Stoakes 1 year, 2 months ago
On Fri, Nov 29, 2024 at 12:44:10PM +0100, Alice Ryhl wrote:
> On Wed, Nov 27, 2024 at 4:41 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Wed, Nov 27, 2024 at 1:01 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
> > > >
> > > > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > > > This adds a type called VmAreaRef which is used when referencing a vma
> > > > > that you have read access to. Here, read access means that you hold
> > > > > either the mmap read lock or the vma read lock (or stronger).
> > > > >
> > > > > Additionally, a vma_lookup method is added to the mmap read guard, which
> > > > > enables you to obtain a &VmAreaRef in safe Rust code.
> > > > >
> > > > > This patch only provides a way to lock the mmap read lock, but a
> > > > > follow-up patch also provides a way to just lock the vma read lock.
> > > > >
> > > > > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > > >
> > > > Reviewed-by: Jann Horn <jannh@google.com>
> > >
> > > Thanks!
> > >
> > > > with one comment:
> > > >
> > > > > +    /// Zap pages in the given page range.
> > > > > +    ///
> > > > > +    /// This clears page table mappings for the range at the leaf level, leaving all other page
> > > > > +    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
> > > > > +    /// anonymous memory is completely freed, file-backed memory has its reference count on page
> > > > > +    /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> > > > > +    #[inline]
> > > > > +    pub fn zap_page_range_single(&self, address: usize, size: usize) {
> > > > > +        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
> > > > > +        // sufficient for this method call. This method has no requirements on the vma flags. Any
> > > > > +        // value of `address` and `size` is allowed.
> > > >
> > > > If we really want to allow any address and size, we might want to add
> > > > an early bailout in zap_page_range_single(). The comment on top of
> > > > zap_page_range_single() currently says "The range must fit into one
> > > > VMA", and it looks like by the point we reach a bailout, we could have
> > > > gone through an interval tree walk via
> > > > mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> > > > for a range that ends before it starts; I don't know how safe that is.
> > >
> > > I could change the comment on zap_page_range_single() to say:
> > >
> > > "The range should be contained within a single VMA. Otherwise an error
> > > is returned."
> > >
> > > And then I can add an overflow check at the top of
> > > zap_page_range_single(). Sounds ok?
> >
> > Yes, I think changing the comment like that and adding a check for
> > whether address+size wraps around there addresses this.
>
> Hmm. Looking at this again now ...
>
> For one, the function returns void so we can at best handle overflow
> by performing a no-op.
>
> Another question is, are we actually okay to call it with ranges
> outside the vma? Does that just no-op or what? Maybe the Rust side
> should just bail out early if the address range is not a subset of the
> vma?

In unmap_single_vma() invoked by zap_page_range_single() we check this:

	unsigned long start = max(vma->vm_start, start_addr);

	...

	if (start >= vma->vm_end)
		return;
	end = min(vma->vm_end, end_addr);
	if (end <= vma->vm_start)
		return;

So you could specify start < vma->vm_end and end > vma->vm_start and it'll get
clamped like:

0                                        MAX
        start  <-clamp->
<---------------------->xxxxxxxxxxxxxxxxxxxx
               |--------| VMA
xxxxxxxxxxxxxxxx<-------------------------->
	        <-clamp->

However note that we will tell mmu_notifier_range_init() and
hugetlb_zap_begin() incorrect values in this case... not sure.

I'd prefer it if rust would strictly only allow ranges within the VMA.

>
> Alice
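
(Purely for illustration, a stricter Rust-side variant along those lines might
look like the sketch below; this is not part of the posted patch, and whether a
bad range should no-op or return an error is left open in this thread.)

    pub fn zap_page_range_single(&self, address: usize, size: usize) {
        // Sketch: refuse any range that is not fully contained in
        // [self.start(), self.end()).
        let end = match address.checked_add(size) {
            Some(end) => end,
            None => return, // address + size overflows, so it cannot fit in this vma
        };
        if address < self.start() || end > self.end() {
            return; // not a subset of this vma; treated as a no-op here
        }
        // SAFETY: read access is held by the type invariant, and the range is
        // now known to lie within this single vma.
        unsafe {
            bindings::zap_page_range_single(
                self.as_ptr(),
                address as _,
                size as _,
                core::ptr::null_mut(),
            )
        };
    }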
Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
Posted by Alice Ryhl 1 year, 2 months ago
On Wed, Nov 27, 2024 at 4:41 PM Jann Horn <jannh@google.com> wrote:
>
> On Wed, Nov 27, 2024 at 1:01 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
> > >
> > > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > > This adds a type called VmAreaRef which is used when referencing a vma
> > > > that you have read access to. Here, read access means that you hold
> > > > either the mmap read lock or the vma read lock (or stronger).
> > > >
> > > > Additionally, a vma_lookup method is added to the mmap read guard, which
> > > > enables you to obtain a &VmAreaRef in safe Rust code.
> > > >
> > > > This patch only provides a way to lock the mmap read lock, but a
> > > > follow-up patch also provides a way to just lock the vma read lock.
> > > >
> > > > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > >
> > > Reviewed-by: Jann Horn <jannh@google.com>
> >
> > Thanks!
> >
> > > with one comment:
> > >
> > > > +    /// Zap pages in the given page range.
> > > > +    ///
> > > > +    /// This clears page table mappings for the range at the leaf level, leaving all other page
> > > > +    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
> > > > +    /// anonymous memory is completely freed, file-backed memory has its reference count on page
> > > > +    /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> > > > +    #[inline]
> > > > +    pub fn zap_page_range_single(&self, address: usize, size: usize) {
> > > > +        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
> > > > +        // sufficient for this method call. This method has no requirements on the vma flags. Any
> > > > +        // value of `address` and `size` is allowed.
> > >
> > > If we really want to allow any address and size, we might want to add
> > > an early bailout in zap_page_range_single(). The comment on top of
> > > zap_page_range_single() currently says "The range must fit into one
> > > VMA", and it looks like by the point we reach a bailout, we could have
> > > gone through an interval tree walk via
> > > mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> > > for a range that ends before it starts; I don't know how safe that is.
> >
> > I could change the comment on zap_page_range_single() to say:
> >
> > "The range should be contained within a single VMA. Otherwise an error
> > is returned."
> >
> > And then I can add an overflow check at the top of
> > zap_page_range_single(). Sounds ok?
>
> Yes, I think changing the comment like that and adding a check for
> whether address+size wraps around there addresses this.

Can there be a page at the top of the address space? If so, I have to
be a bit careful in the wrap-around check, because it should only fail
if the addition wraps around *and* the sum is non-zero.

Alice
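
(For illustration, a wrap-around check with that carve-out might look like the
sketch below: it rejects address + size only when the sum wraps around to a
non-zero value, so a range ending exactly at the top of the address space is
still accepted. The helper name is made up for this example.)

    /// Sketch: does `address..address + size` wrap around the address space?
    /// A range whose end wraps to exactly 0 (it ends at the very top) is allowed.
    fn range_wraps(address: usize, size: usize) -> bool {
        let end = address.wrapping_add(size);
        end < address && end != 0
    }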
Re: [PATCH v9 2/8] mm: rust: add vm_area_struct methods that require read access
Posted by Jann Horn 1 year, 2 months ago
On Wed, Nov 27, 2024 at 4:46 PM Alice Ryhl <aliceryhl@google.com> wrote:
>
> On Wed, Nov 27, 2024 at 4:41 PM Jann Horn <jannh@google.com> wrote:
> >
> > On Wed, Nov 27, 2024 at 1:01 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > On Tue, Nov 26, 2024 at 11:10 PM Jann Horn <jannh@google.com> wrote:
> > > >
> > > > On Fri, Nov 22, 2024 at 4:41 PM Alice Ryhl <aliceryhl@google.com> wrote:
> > > > > This adds a type called VmAreaRef which is used when referencing a vma
> > > > > that you have read access to. Here, read access means that you hold
> > > > > either the mmap read lock or the vma read lock (or stronger).
> > > > >
> > > > > Additionally, a vma_lookup method is added to the mmap read guard, which
> > > > > enables you to obtain a &VmAreaRef in safe Rust code.
> > > > >
> > > > > This patch only provides a way to lock the mmap read lock, but a
> > > > > follow-up patch also provides a way to just lock the vma read lock.
> > > > >
> > > > > Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> (for mm bits)
> > > > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > > >
> > > > Reviewed-by: Jann Horn <jannh@google.com>
> > >
> > > Thanks!
> > >
> > > > with one comment:
> > > >
> > > > > +    /// Zap pages in the given page range.
> > > > > +    ///
> > > > > +    /// This clears page table mappings for the range at the leaf level, leaving all other page
> > > > > +    /// tables intact, and freeing any memory referenced by the VMA in this range. That is,
> > > > > +    /// anonymous memory is completely freed, file-backed memory has its reference count on page
> > > > > +    /// cache folio's dropped, any dirty data will still be written back to disk as usual.
> > > > > +    #[inline]
> > > > > +    pub fn zap_page_range_single(&self, address: usize, size: usize) {
> > > > > +        // SAFETY: By the type invariants, the caller has read access to this VMA, which is
> > > > > +        // sufficient for this method call. This method has no requirements on the vma flags. Any
> > > > > +        // value of `address` and `size` is allowed.
> > > >
> > > > If we really want to allow any address and size, we might want to add
> > > > an early bailout in zap_page_range_single(). The comment on top of
> > > > zap_page_range_single() currently says "The range must fit into one
> > > > VMA", and it looks like by the point we reach a bailout, we could have
> > > > gone through an interval tree walk via
> > > > mmu_notifier_invalidate_range_start()->__mmu_notifier_invalidate_range_start()->mn_itree_invalidate()
> > > > for a range that ends before it starts; I don't know how safe that is.
> > >
> > > I could change the comment on zap_page_range_single() to say:
> > >
> > > "The range should be contained within a single VMA. Otherwise an error
> > > is returned."
> > >
> > > And then I can add an overflow check at the top of
> > > zap_page_range_single(). Sounds ok?
> >
> > Yes, I think changing the comment like that and adding a check for
> > whether address+size wraps around there addresses this.
>
> Can there be a page at the top of the address space? If so, I have to
> be a bit careful in the wrap-around check, because it should only fail
> if the addition wraps around *and* the sum is non-zero.

Uh, good question... I can't think of any architectures that have
userspace mappings right at the top of the address space, and having
objects allocated right at the end of the address space would violate
the C standard (because C guarantees that pointers pointing directly
behind an array behave reasonably, and this would not work if a
pointer pointing directly behind an array could wrap around to NULL).
And unless the userspace allocator takes responsibility for enforcing
this edge case, the kernel has to do it by preventing the last page of
the virtual address space from being mapped. (This is the reason why a
32-bit process on an arm64 kernel is normally only allowed to map
addresses up to 0xfffff000, see
<https://git.kernel.org/linus/d263119387de>.)

Allowing userspace to map the top of the address space would also be a
bad idea because then ERR_PTR() could return valid userspace pointers.

Looking at the current implementation of zap_page_range_single(), in
the case you're describing, unmap_single_vma() would get called with
end_addr==0, and then we'd bail out on the "if (end <= vma->vm_start)"
check. So calling zap_page_range_single() on such a range would
already fail.
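
(For illustration, the clamping Jann refers to, transliterated into a small
Rust sketch with a made-up helper name, shows why an end_addr of 0 always
bails out:)

    /// Sketch of the clamping in unmap_single_vma(): returns whether any of
    /// the requested range survives clamping against [vm_start, vm_end).
    fn clamped_range_is_nonempty(vm_start: usize, vm_end: usize,
                                 start_addr: usize, end_addr: usize) -> bool {
        let start = core::cmp::max(vm_start, start_addr);
        if start >= vm_end {
            return false;
        }
        let end = core::cmp::min(vm_end, end_addr);
        if end <= vm_start {
            // With end_addr == 0, `end` is 0 and this check always triggers.
            return false;
        }
        true
    }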