[PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM

Joel Fernandes posted 7 patches 3 months, 3 weeks ago
[PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
Posted by Joel Fernandes 3 months, 3 weeks ago
Required for writing page tables directly to VRAM physical memory,
before page tables and MMU are setup.

Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
---
 drivers/gpu/nova-core/mm/mod.rs    |   3 +
 drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
 drivers/gpu/nova-core/nova_core.rs |   1 +
 drivers/gpu/nova-core/regs.rs      |  29 +++-
 4 files changed, 273 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/nova-core/mm/mod.rs
 create mode 100644 drivers/gpu/nova-core/mm/pramin.rs

diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
new file mode 100644
index 000000000000..54c7cd9416a9
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/mod.rs
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0
+
+pub(crate) mod pramin;
diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
new file mode 100644
index 000000000000..4f4e1b8c0b9b
--- /dev/null
+++ b/drivers/gpu/nova-core/mm/pramin.rs
@@ -0,0 +1,241 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Direct VRAM access through PRAMIN window before page tables are set up.
+//! PRAMIN can also write to system memory, however for simplicty we only
+//! support VRAM access.
+//!
+//! # Examples
+//!
+//! ## Writing u32 data to VRAM
+//!
+//! ```no_run
+//! use crate::driver::Bar0;
+//! use crate::mm::pramin::PraminVram;
+//!
+//! fn write_data_to_vram(bar: &Bar0) -> Result {
+//!     let pramin = PraminVram::new(bar);
+//!     // Write 4 32-bit words to VRAM at offset 0x10000
+//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
+//!     pramin.write::<u32>(0x10000, &data)?;
+//!     Ok(())
+//! }
+//! ```
+//!
+//! ## Reading bytes from VRAM
+//!
+//! ```no_run
+//! use crate::driver::Bar0;
+//! use crate::mm::pramin::PraminVram;
+//!
+//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
+//!     let pramin = PraminVram::new(bar);
+//!     // Read a u8 from VRAM starting at offset 0x20000
+//!     pramin.read::<u8>(0x20000, buffer)?;
+//!     Ok(())
+//! }
+//! ```
+
+#![expect(dead_code)]
+
+use crate::driver::Bar0;
+use crate::regs;
+use core::mem;
+use kernel::prelude::*;
+
+/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
+/// the VRAM directly. These addresses are consistent across all GPUs.
+const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
+const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
+
+/// Trait for types that can be read/written through PRAMIN.
+pub(crate) trait PraminNum: Copy + Default + Sized {
+    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
+
+    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
+
+    fn size_bytes() -> usize {
+        mem::size_of::<Self>()
+    }
+
+    fn alignment() -> usize {
+        Self::size_bytes()
+    }
+}
+
+/// Macro to implement PraminNum trait for unsigned integer types.
+macro_rules! impl_pramin_unsigned_num {
+    ($bits:literal) => {
+        ::kernel::macros::paste! {
+            impl PraminNum for [<u $bits>] {
+                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
+                    bar.[<try_read $bits>](offset)
+                }
+
+                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
+                    bar.[<try_write $bits>](self, offset)
+                }
+            }
+        }
+    };
+}
+
+impl_pramin_unsigned_num!(8);
+impl_pramin_unsigned_num!(16);
+impl_pramin_unsigned_num!(32);
+impl_pramin_unsigned_num!(64);
+
+/// Direct VRAM access through PRAMIN window before page tables are set up.
+pub(crate) struct PraminVram<'a> {
+    bar: &'a Bar0,
+    saved_window_addr: usize,
+}
+
+impl<'a> PraminVram<'a> {
+    /// Create a new PRAMIN VRAM accessor, saving current window state,
+    /// the state is restored when the accessor is dropped.
+    ///
+    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
+    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
+    pub(crate) fn new(bar: &'a Bar0) -> Self {
+        let saved_window_addr = Self::get_window_addr(bar);
+        Self {
+            bar,
+            saved_window_addr,
+        }
+    }
+
+    /// Set BAR0 window to point to specific FB region.
+    ///
+    /// # Arguments
+    ///
+    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
+    ///                 Must be 64KB aligned (lower 16 bits zero).
+    fn set_window_addr(&self, fb_offset: usize) -> Result {
+        // FB offset must be 64KB aligned (hardware requirement for window_base field)
+        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
+        if fb_offset & 0xFFFF != 0 {
+            return Err(EINVAL);
+        }
+
+        let window_reg = regs::NV_PBUS_BAR0_WINDOW::default().set_window_addr(fb_offset);
+        window_reg.write(self.bar);
+
+        // Read back to ensure it took effect
+        let readback = regs::NV_PBUS_BAR0_WINDOW::read(self.bar);
+        if readback.window_base() != window_reg.window_base() {
+            return Err(EIO);
+        }
+
+        Ok(())
+    }
+
+    /// Get current BAR0 window offset.
+    ///
+    /// # Returns
+    ///
+    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
+    /// This offset is always 64KB aligned.
+    fn get_window_addr(bar: &Bar0) -> usize {
+        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
+        window_reg.get_window_addr()
+    }
+
+    /// Common logic for accessing VRAM data through PRAMIN with windowing.
+    ///
+    /// # Arguments
+    ///
+    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
+    ///                 Must be aligned to `T::alignment()`.
+    /// * `num_items` - Number of items of type `T` to process.
+    /// * `operation` - Closure called for each item to perform the actual read/write.
+    ///                 Takes two parameters:
+    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
+    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access
+    ///
+    /// The function automatically handles PRAMIN window repositioning when accessing
+    /// data that spans multiple 1MB windows.
+    fn access_vram<T: PraminNum, F>(
+        &self,
+        fb_offset: usize,
+        num_items: usize,
+        mut operation: F,
+    ) -> Result
+    where
+        F: FnMut(usize, usize) -> Result,
+    {
+        // FB offset must be aligned to the size of T
+        if fb_offset & (T::alignment() - 1) != 0 {
+            return Err(EINVAL);
+        }
+
+        let mut offset_bytes = fb_offset;
+        let mut remaining_items = num_items;
+        let mut data_index = 0;
+
+        while remaining_items > 0 {
+            // Align the window to 64KB boundary
+            let target_window = offset_bytes & !0xFFFF;
+            let window_offset = offset_bytes - target_window;
+
+            // Set window if needed
+            if target_window != Self::get_window_addr(self.bar) {
+                self.set_window_addr(target_window)?;
+            }
+
+            // Calculate how many items we can access from this window position
+            // We can access up to 1MB total, minus the offset within the window
+            let remaining_in_window = PRAMIN_SIZE - window_offset;
+            let max_items_in_window = remaining_in_window / T::size_bytes();
+            let items_to_write = core::cmp::min(remaining_items, max_items_in_window);
+
+            // Process data through PRAMIN
+            for i in 0..items_to_write {
+                // Calculate the byte offset in the PRAMIN window to write to.
+                let pramin_offset_bytes = PRAMIN_BASE + window_offset + (i * T::size_bytes());
+                operation(data_index + i, pramin_offset_bytes)?;
+            }
+
+            // Move to next chunk.
+            data_index += items_to_write;
+            offset_bytes += items_to_write * T::size_bytes();
+            remaining_items -= items_to_write;
+        }
+
+        Ok(())
+    }
+
+    /// Generic write for data to VRAM through PRAMIN.
+    ///
+    /// # Arguments
+    ///
+    /// * `fb_offset` - Starting byte offset in VRAM where data will be written.
+    ///                 Must be aligned to `T::alignment()`.
+    /// * `data` - Slice of items to write to VRAM. All items will be written sequentially
+    ///            starting at `fb_offset`.
+    pub(crate) fn write<T: PraminNum>(&self, fb_offset: usize, data: &[T]) -> Result {
+        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
+            data[data_idx].write_to_bar(self.bar, pramin_offset)
+        })
+    }
+
+    /// Generic read data from VRAM through PRAMIN.
+    ///
+    /// # Arguments
+    ///
+    /// * `fb_offset` - Starting byte offset in VRAM where data will be read from.
+    ///                 Must be aligned to `T::alignment()`.
+    /// * `data` - Mutable slice that will be filled with data read from VRAM.
+    ///            The number of items read equals `data.len()`.
+    pub(crate) fn read<T: PraminNum>(&self, fb_offset: usize, data: &mut [T]) -> Result {
+        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
+            data[data_idx] = T::read_from_bar(self.bar, pramin_offset)?;
+            Ok(())
+        })
+    }
+}
+
+impl<'a> Drop for PraminVram<'a> {
+    fn drop(&mut self) {
+        let _ = self.set_window_addr(self.saved_window_addr); // Restore original window.
+    }
+}
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 112277c7921e..6bd9fc3372d6 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -13,6 +13,7 @@
 mod gfw;
 mod gpu;
 mod gsp;
+mod mm;
 mod regs;
 mod util;
 mod vbios;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index a3836a01996b..ba09da7e1541 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -12,6 +12,7 @@
     FalconModSelAlgo, FalconSecurityModel, PFalcon2Base, PFalconBase, PeregrineCoreSelect,
 };
 use crate::gpu::{Architecture, Chipset};
+use kernel::bits::genmask_u32;
 use kernel::prelude::*;
 
 // PMC
@@ -43,7 +44,8 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
     }
 }
 
-// PBUS
+// PBUS - PBUS is a bus control unit, that helps the GPU communicate with the PCI bus.
+// Handles the BAR windows, decoding of MMIO read/writes on the BARs, etc.
 
 register!(NV_PBUS_SW_SCRATCH @ 0x00001400[64]  {});
 
@@ -52,6 +54,31 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
     31:16   frts_err_code as u16;
 });
 
+// BAR0 window control register to configure the BAR0 window for PRAMIN access
+// (direct physical VRAM access).
+register!(NV_PBUS_BAR0_WINDOW @ 0x00001700, "BAR0 window control register" {
+    25:24   target as u8, "Target (0=VID_MEM, 1=SYS_MEM_COHERENT, 2=SYS_MEM_NONCOHERENT)";
+    23:0    window_base as u32, "Window base address (bits 39:16 of FB addr)";
+});
+
+impl NV_PBUS_BAR0_WINDOW {
+    /// Returns the 64-bit aligned VRAM address of the window.
+    pub(crate) fn get_window_addr(self) -> usize {
+        (self.window_base() as usize) << 16
+    }
+
+    /// Sets the window address from a framebuffer offset.
+    /// The fb_offset must be 64KB aligned (lower bits discared).
+    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
+        // Calculate window base (bits 39:16 of FB address)
+        // The total FB address is 40 bits, mask anything above. Since we are
+        // right shifting the offset by 16 bits, the mask is only 24 bits.
+        let mask = genmask_u32(0..=23) as usize;
+        let window_base = ((fb_offset >> 16) & mask) as u32;
+        self.set_window_base(window_base)
+    }
+}
+
 // PFB
 
 // The following two registers together hold the physical system memory address that is used by the
-- 
2.34.1
Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
Posted by Alexandre Courbot 3 months, 2 weeks ago
On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
> Required for writing page tables directly to VRAM physical memory,
> before page tables and MMU are setup.
>
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/mm/mod.rs    |   3 +
>  drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/nova_core.rs |   1 +
>  drivers/gpu/nova-core/regs.rs      |  29 +++-
>  4 files changed, 273 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
>
> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
> new file mode 100644
> index 000000000000..54c7cd9416a9
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/mod.rs
> @@ -0,0 +1,3 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +pub(crate) mod pramin;
> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
> new file mode 100644
> index 000000000000..4f4e1b8c0b9b
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/pramin.rs
> @@ -0,0 +1,241 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Direct VRAM access through PRAMIN window before page tables are set up.
> +//! PRAMIN can also write to system memory, however for simplicty we only

s/simplicty/simplicity

> +//! support VRAM access.
> +//!
> +//! # Examples
> +//!
> +//! ## Writing u32 data to VRAM
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin::PraminVram;
> +//!
> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
> +//!     let pramin = PraminVram::new(bar);
> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
> +//!     pramin.write::<u32>(0x10000, &data)?;
> +//!     Ok(())
> +//! }
> +//! ```
> +//!
> +//! ## Reading bytes from VRAM
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin::PraminVram;
> +//!
> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
> +//!     let pramin = PraminVram::new(bar);
> +//!     // Read a u8 from VRAM starting at offset 0x20000
> +//!     pramin.read::<u8>(0x20000, buffer)?;
> +//!     Ok(())
> +//! }
> +//! ```
> +
> +#![expect(dead_code)]
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +use core::mem;
> +use kernel::prelude::*;
> +
> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
> +/// the VRAM directly. These addresses are consistent across all GPUs.
> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000

This definition looks like it could be an array of registers - that way
we could use its `BASE` associated constant and keep the hardware
offsets into the `regs` module.

Even if we don't use the array of registers for convenience, it is good
to have it defined in `regs` for consistency.

> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position

You can use `kernel::sizes::SZ_1M` here.

> +
> +/// Trait for types that can be read/written through PRAMIN.
> +pub(crate) trait PraminNum: Copy + Default + Sized {
> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
> +
> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
> +
> +    fn size_bytes() -> usize {
> +        mem::size_of::<Self>()
> +    }
> +
> +    fn alignment() -> usize {
> +        Self::size_bytes()
> +    }
> +}

Since this trait requires `Sized`, you can use `size_of` and `align_of`
directly, making the `size_bytes` and `alignment` methods redundant.
Only `write_to_bar` should remain.

I also wonder whether we couldn't get rid of this trait entirely by
leveragin `FromBytes` and `AsBytes`. Since the size of the type is
known, we could have read/write methods in Pramin that write its content
by using Io accessors of decreasing size (first 64-bit, then 32, etc)
until all the data is written.

> +
> +/// Macro to implement PraminNum trait for unsigned integer types.
> +macro_rules! impl_pramin_unsigned_num {
> +    ($bits:literal) => {
> +        ::kernel::macros::paste! {
> +            impl PraminNum for [<u $bits>] {
> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
> +                    bar.[<try_read $bits>](offset)
> +                }
> +
> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
> +                    bar.[<try_write $bits>](self, offset)
> +                }
> +            }
> +        }
> +    };
> +}
> +
> +impl_pramin_unsigned_num!(8);
> +impl_pramin_unsigned_num!(16);
> +impl_pramin_unsigned_num!(32);
> +impl_pramin_unsigned_num!(64);
> +
> +/// Direct VRAM access through PRAMIN window before page tables are set up.
> +pub(crate) struct PraminVram<'a> {

Let's use the shorter name `Pramin` - the limitation to VRAM is a
reasonable one (since the CPU can access its own system memory), it is
not necessary to encode it into the name.

> +    bar: &'a Bar0,
> +    saved_window_addr: usize,
> +}
> +
> +impl<'a> PraminVram<'a> {
> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
> +    /// the state is restored when the accessor is dropped.
> +    ///
> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
> +        let saved_window_addr = Self::get_window_addr(bar);
> +        Self {
> +            bar,
> +            saved_window_addr,
> +        }
> +    }
> +
> +    /// Set BAR0 window to point to specific FB region.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
> +    ///                 Must be 64KB aligned (lower 16 bits zero).

Let's follow the rust doccomment guidelines for the arguments.

> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
> +        if fb_offset & 0xFFFF != 0 {
> +            return Err(EINVAL);
> +        }

Since this method is private and called from controlled contexts for
which `fb_offset` should always be valid, we can request callers to
give us a "window index" (e.g. the `window_base` of the
`NV_PBUS_BAR0_WINDOW` register) directly and remove this check. That
will also let us remove the impl block on `NV_PBUS_BAR0_WINDOW`.

> +
> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::default().set_window_addr(fb_offset);
> +        window_reg.write(self.bar);
> +
> +        // Read back to ensure it took effect
> +        let readback = regs::NV_PBUS_BAR0_WINDOW::read(self.bar);
> +        if readback.window_base() != window_reg.window_base() {
> +            return Err(EIO);
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Get current BAR0 window offset.
> +    ///
> +    /// # Returns
> +    ///
> +    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
> +    /// This offset is always 64KB aligned.
> +    fn get_window_addr(bar: &Bar0) -> usize {
> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
> +        window_reg.get_window_addr()
> +    }
> +
> +    /// Common logic for accessing VRAM data through PRAMIN with windowing.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `num_items` - Number of items of type `T` to process.
> +    /// * `operation` - Closure called for each item to perform the actual read/write.
> +    ///                 Takes two parameters:
> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access

Formatting of arguments is strange here as well.

> +    ///
> +    /// The function automatically handles PRAMIN window repositioning when accessing
> +    /// data that spans multiple 1MB windows.

Inversely, this large method is under-documented. Understanding what
`operation` is supposed to do would be helpful.

> +    fn access_vram<T: PraminNum, F>(
> +        &self,
> +        fb_offset: usize,
> +        num_items: usize,
> +        mut operation: F,
> +    ) -> Result
> +    where
> +        F: FnMut(usize, usize) -> Result,
> +    {
> +        // FB offset must be aligned to the size of T
> +        if fb_offset & (T::alignment() - 1) != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut offset_bytes = fb_offset;
> +        let mut remaining_items = num_items;
> +        let mut data_index = 0;
> +
> +        while remaining_items > 0 {
> +            // Align the window to 64KB boundary
> +            let target_window = offset_bytes & !0xFFFF;
> +            let window_offset = offset_bytes - target_window;
> +
> +            // Set window if needed
> +            if target_window != Self::get_window_addr(self.bar) {
> +                self.set_window_addr(target_window)?;
> +            }
> +
> +            // Calculate how many items we can access from this window position
> +            // We can access up to 1MB total, minus the offset within the window
> +            let remaining_in_window = PRAMIN_SIZE - window_offset;
> +            let max_items_in_window = remaining_in_window / T::size_bytes();
> +            let items_to_write = core::cmp::min(remaining_items, max_items_in_window);
> +
> +            // Process data through PRAMIN
> +            for i in 0..items_to_write {
> +                // Calculate the byte offset in the PRAMIN window to write to.
> +                let pramin_offset_bytes = PRAMIN_BASE + window_offset + (i * T::size_bytes());
> +                operation(data_index + i, pramin_offset_bytes)?;
> +            }
> +
> +            // Move to next chunk.
> +            data_index += items_to_write;
> +            offset_bytes += items_to_write * T::size_bytes();
> +            remaining_items -= items_to_write;
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Generic write for data to VRAM through PRAMIN.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM where data will be written.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `data` - Slice of items to write to VRAM. All items will be written sequentially
> +    ///            starting at `fb_offset`.
> +    pub(crate) fn write<T: PraminNum>(&self, fb_offset: usize, data: &[T]) -> Result {
> +        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
> +            data[data_idx].write_to_bar(self.bar, pramin_offset)
> +        })
> +    }
> +
> +    /// Generic read data from VRAM through PRAMIN.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM where data will be read from.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `data` - Mutable slice that will be filled with data read from VRAM.
> +    ///            The number of items read equals `data.len()`.
> +    pub(crate) fn read<T: PraminNum>(&self, fb_offset: usize, data: &mut [T]) -> Result {
> +        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
> +            data[data_idx] = T::read_from_bar(self.bar, pramin_offset)?;
> +            Ok(())
> +        })
> +    }
> +}
> +
> +impl<'a> Drop for PraminVram<'a> {
> +    fn drop(&mut self) {
> +        let _ = self.set_window_addr(self.saved_window_addr); // Restore original window.
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 112277c7921e..6bd9fc3372d6 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -13,6 +13,7 @@
>  mod gfw;
>  mod gpu;
>  mod gsp;
> +mod mm;
>  mod regs;
>  mod util;
>  mod vbios;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index a3836a01996b..ba09da7e1541 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -12,6 +12,7 @@
>      FalconModSelAlgo, FalconSecurityModel, PFalcon2Base, PFalconBase, PeregrineCoreSelect,
>  };
>  use crate::gpu::{Architecture, Chipset};
> +use kernel::bits::genmask_u32;
>  use kernel::prelude::*;
>  
>  // PMC
> @@ -43,7 +44,8 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
>      }
>  }
>  
> -// PBUS
> +// PBUS - PBUS is a bus control unit, that helps the GPU communicate with the PCI bus.
> +// Handles the BAR windows, decoding of MMIO read/writes on the BARs, etc.
>  
>  register!(NV_PBUS_SW_SCRATCH @ 0x00001400[64]  {});
>  
> @@ -52,6 +54,31 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
>      31:16   frts_err_code as u16;
>  });
>  
> +// BAR0 window control register to configure the BAR0 window for PRAMIN access
> +// (direct physical VRAM access).
> +register!(NV_PBUS_BAR0_WINDOW @ 0x00001700, "BAR0 window control register" {
> +    25:24   target as u8, "Target (0=VID_MEM, 1=SYS_MEM_COHERENT, 2=SYS_MEM_NONCOHERENT)";
> +    23:0    window_base as u32, "Window base address (bits 39:16 of FB addr)";
> +});
> +
> +impl NV_PBUS_BAR0_WINDOW {
> +    /// Returns the 64-bit aligned VRAM address of the window.
> +    pub(crate) fn get_window_addr(self) -> usize {
> +        (self.window_base() as usize) << 16
> +    }
> +
> +    /// Sets the window address from a framebuffer offset.
> +    /// The fb_offset must be 64KB aligned (lower bits discared).
> +    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
> +        // Calculate window base (bits 39:16 of FB address)
> +        // The total FB address is 40 bits, mask anything above. Since we are
> +        // right shifting the offset by 16 bits, the mask is only 24 bits.
> +        let mask = genmask_u32(0..=23) as usize;
> +        let window_base = ((fb_offset >> 16) & mask) as u32;
> +        self.set_window_base(window_base)
> +    }
> +}

If you work directly with `window_base` as suggested above, this impl
block can be dropped altogether.
Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
Posted by Joel Fernandes 3 months, 2 weeks ago
Hi Alex,

On 10/22/2025 6:41 AM, Alexandre Courbot wrote:
> On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
>> Required for writing page tables directly to VRAM physical memory,
>> before page tables and MMU are setup.
>>
>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/mm/mod.rs    |   3 +
>>  drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
>>  drivers/gpu/nova-core/nova_core.rs |   1 +
>>  drivers/gpu/nova-core/regs.rs      |  29 +++-
>>  4 files changed, 273 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
>>
>> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
>> new file mode 100644
>> index 000000000000..54c7cd9416a9
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/mm/mod.rs
>> @@ -0,0 +1,3 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +pub(crate) mod pramin;
>> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
>> new file mode 100644
>> index 000000000000..4f4e1b8c0b9b
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/mm/pramin.rs
>> @@ -0,0 +1,241 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Direct VRAM access through PRAMIN window before page tables are set up.
>> +//! PRAMIN can also write to system memory, however for simplicty we only
> 
> s/simplicty/simplicity

Ok.

>> +//! support VRAM access.
>> +//!
>> +//! # Examples
>> +//!
>> +//! ## Writing u32 data to VRAM
>> +//!
>> +//! ```no_run
>> +//! use crate::driver::Bar0;
>> +//! use crate::mm::pramin::PraminVram;
>> +//!
>> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
>> +//!     let pramin = PraminVram::new(bar);
>> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
>> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
>> +//!     pramin.write::<u32>(0x10000, &data)?;
>> +//!     Ok(())
>> +//! }
>> +//! ```
>> +//!
>> +//! ## Reading bytes from VRAM
>> +//!
>> +//! ```no_run
>> +//! use crate::driver::Bar0;
>> +//! use crate::mm::pramin::PraminVram;
>> +//!
>> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
>> +//!     let pramin = PraminVram::new(bar);
>> +//!     // Read a u8 from VRAM starting at offset 0x20000
>> +//!     pramin.read::<u8>(0x20000, buffer)?;
>> +//!     Ok(())
>> +//! }
>> +//! ```
>> +
>> +#![expect(dead_code)]
>> +
>> +use crate::driver::Bar0;
>> +use crate::regs;
>> +use core::mem;
>> +use kernel::prelude::*;
>> +
>> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
>> +/// the VRAM directly. These addresses are consistent across all GPUs.
>> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
> 
> This definition looks like it could be an array of registers - that way
> we could use its `BASE` associated constant and keep the hardware
> offsets into the `regs` module.
> 
> Even if we don't use the array of registers for convenience, it is good
> to have it defined in `regs` for consistency.

Ok, I wanted to do that, but I thought since these are registers, it is weird to
move it there.

Also we need byte-level access, register macro is u32. I don't think we should
overload regs.rs just to store magic numbers, these are not registers right? We
have PRAM window configuration registers but that's different.

> 
>> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
> 
> You can use `kernel::sizes::SZ_1M` here.

Sure, will do.

>> +
>> +/// Trait for types that can be read/written through PRAMIN.
>> +pub(crate) trait PraminNum: Copy + Default + Sized {
>> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
>> +
>> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
>> +
>> +    fn size_bytes() -> usize {
>> +        mem::size_of::<Self>()
>> +    }
>> +
>> +    fn alignment() -> usize {
>> +        Self::size_bytes()
>> +    }
>> +}
> 
> Since this trait requires `Sized`, you can use `size_of` and `align_of`
> directly, making the `size_bytes` and `alignment` methods redundant.
> Only `write_to_bar` should remain.

Sure, slightly poorer caller-side readability though but its fine with me, I'll
do that.

> I also wonder whether we couldn't get rid of this trait entirely by
> leveragin `FromBytes` and `AsBytes`. Since the size of the type is
> known, we could have read/write methods in Pramin that write its content
> by using Io accessors of decreasing size (first 64-bit, then 32, etc)
> until all the data is written.

Ah great idea, I like this. Though per the other discussion with John on keeping
it simple (not doing bulk I/O operations), maybe we wouldn't need a trait at
all. Let me see.

> 
>> +
>> +/// Macro to implement PraminNum trait for unsigned integer types.
>> +macro_rules! impl_pramin_unsigned_num {
>> +    ($bits:literal) => {
>> +        ::kernel::macros::paste! {
>> +            impl PraminNum for [<u $bits>] {
>> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
>> +                    bar.[<try_read $bits>](offset)
>> +                }
>> +
>> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
>> +                    bar.[<try_write $bits>](self, offset)
>> +                }
>> +            }
>> +        }
>> +    };
>> +}
>> +
>> +impl_pramin_unsigned_num!(8);
>> +impl_pramin_unsigned_num!(16);
>> +impl_pramin_unsigned_num!(32);
>> +impl_pramin_unsigned_num!(64);
>> +
>> +/// Direct VRAM access through PRAMIN window before page tables are set up.
>> +pub(crate) struct PraminVram<'a> {
> 
> Let's use the shorter name `Pramin` - the limitation to VRAM is a
> reasonable one (since the CPU can access its own system memory), it is
> not necessary to encode it into the name.
Sure, sounds good.

> 
>> +    bar: &'a Bar0,
>> +    saved_window_addr: usize,
>> +}
>> +
>> +impl<'a> PraminVram<'a> {
>> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
>> +    /// the state is restored when the accessor is dropped.
>> +    ///
>> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
>> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
>> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
>> +        let saved_window_addr = Self::get_window_addr(bar);
>> +        Self {
>> +            bar,
>> +            saved_window_addr,
>> +        }
>> +    }
>> +
>> +    /// Set BAR0 window to point to specific FB region.
>> +    ///
>> +    /// # Arguments
>> +    ///
>> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
>> +    ///                 Must be 64KB aligned (lower 16 bits zero).
> 
> Let's follow the rust doccomment guidelines for the arguments.

Ok, Sure.
> 
>> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
>> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
>> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
>> +        if fb_offset & 0xFFFF != 0 {
>> +            return Err(EINVAL);
>> +        }
> 
> Since this method is private and called from controlled contexts for
> which `fb_offset` should always be valid, we can request callers to
> give us a "window index" (e.g. the `window_base` of the
> `NV_PBUS_BAR0_WINDOW` register) directly and remove this check. That
> will also let us remove the impl block on `NV_PBUS_BAR0_WINDOW`.
> 

The tradeoff being it may complicated callers of the function that deal purely
with addresses instead of window indices.

>> +    ///
>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>> +    /// data that spans multiple 1MB windows.
> 
> Inversely, this large method is under-documented. Understanding what
> `operation` is supposed to do would be helpful.

I will skip these comments for now as we discussed dropping complexity in other
thread, but thanks for the review on this. This function should be likely
dropped in the next iteration.

>> +
>> +    /// Sets the window address from a framebuffer offset.
>> +    /// The fb_offset must be 64KB aligned (lower bits discared).
>> +    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
>> +        // Calculate window base (bits 39:16 of FB address)
>> +        // The total FB address is 40 bits, mask anything above. Since we are
>> +        // right shifting the offset by 16 bits, the mask is only 24 bits.
>> +        let mask = genmask_u32(0..=23) as usize;
>> +        let window_base = ((fb_offset >> 16) & mask) as u32;
>> +        self.set_window_base(window_base)
>> +    }
>> +}
> 
> If you work directly with `window_base` as suggested above, this impl
> block can be dropped altogether.
But it will complicate callers. That's the tradeoff. I prefer to keep caller
side simple and abstract away complexity. But to your point, this is an internal
API so I can probably code it both ways and see what it looks like.

thanks,

 - Joel
Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
Posted by Alexandre Courbot 3 months, 2 weeks ago
On Thu Oct 23, 2025 at 7:04 AM JST, Joel Fernandes wrote:
> Hi Alex,
>
> On 10/22/2025 6:41 AM, Alexandre Courbot wrote:
>> On Tue Oct 21, 2025 at 3:55 AM JST, Joel Fernandes wrote:
>>> Required for writing page tables directly to VRAM physical memory,
>>> before page tables and MMU are setup.
>>>
>>> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>>> ---
>>>  drivers/gpu/nova-core/mm/mod.rs    |   3 +
>>>  drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
>>>  drivers/gpu/nova-core/nova_core.rs |   1 +
>>>  drivers/gpu/nova-core/regs.rs      |  29 +++-
>>>  4 files changed, 273 insertions(+), 1 deletion(-)
>>>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>>>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs
>>>
>>> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
>>> new file mode 100644
>>> index 000000000000..54c7cd9416a9
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/mm/mod.rs
>>> @@ -0,0 +1,3 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +pub(crate) mod pramin;
>>> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
>>> new file mode 100644
>>> index 000000000000..4f4e1b8c0b9b
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/mm/pramin.rs
>>> @@ -0,0 +1,241 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Direct VRAM access through PRAMIN window before page tables are set up.
>>> +//! PRAMIN can also write to system memory, however for simplicty we only
>> 
>> s/simplicty/simplicity
>
> Ok.
>
>>> +//! support VRAM access.
>>> +//!
>>> +//! # Examples
>>> +//!
>>> +//! ## Writing u32 data to VRAM
>>> +//!
>>> +//! ```no_run
>>> +//! use crate::driver::Bar0;
>>> +//! use crate::mm::pramin::PraminVram;
>>> +//!
>>> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
>>> +//!     let pramin = PraminVram::new(bar);
>>> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
>>> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
>>> +//!     pramin.write::<u32>(0x10000, &data)?;
>>> +//!     Ok(())
>>> +//! }
>>> +//! ```
>>> +//!
>>> +//! ## Reading bytes from VRAM
>>> +//!
>>> +//! ```no_run
>>> +//! use crate::driver::Bar0;
>>> +//! use crate::mm::pramin::PraminVram;
>>> +//!
>>> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
>>> +//!     let pramin = PraminVram::new(bar);
>>> +//!     // Read a u8 from VRAM starting at offset 0x20000
>>> +//!     pramin.read::<u8>(0x20000, buffer)?;
>>> +//!     Ok(())
>>> +//! }
>>> +//! ```
>>> +
>>> +#![expect(dead_code)]
>>> +
>>> +use crate::driver::Bar0;
>>> +use crate::regs;
>>> +use core::mem;
>>> +use kernel::prelude::*;
>>> +
>>> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
>>> +/// the VRAM directly. These addresses are consistent across all GPUs.
>>> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
>> 
>> This definition looks like it could be an array of registers - that way
>> we could use its `BASE` associated constant and keep the hardware
>> offsets into the `regs` module.
>> 
>> Even if we don't use the array of registers for convenience, it is good
>> to have it defined in `regs` for consistency.
>
> Ok, I wanted to do that, but I thought since these are registers, it is weird to
> move it there.

It's just that it looks like to keep all the layout in the same place.

>
> Also we need byte-level access, register macro is u32. I don't think we should
> overload regs.rs just to store magic numbers, these are not registers right? We
> have PRAM window configuration registers but that's different.

The register macro just gained support for `u8` thanks to your work, I
thought that would have been a good use-case. :)

Also we don't need to use the register accessors for this - the idea was
just to have the definition in `regs.rs`, and use the constant
containing its address instead of the regular register accessors if they
are not fit for this.

I don't feel too strongly about this though.

>
>> 
>>> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
>> 
>> You can use `kernel::sizes::SZ_1M` here.
>
> Sure, will do.
>
>>> +
>>> +/// Trait for types that can be read/written through PRAMIN.
>>> +pub(crate) trait PraminNum: Copy + Default + Sized {
>>> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
>>> +
>>> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
>>> +
>>> +    fn size_bytes() -> usize {
>>> +        mem::size_of::<Self>()
>>> +    }
>>> +
>>> +    fn alignment() -> usize {
>>> +        Self::size_bytes()
>>> +    }
>>> +}
>> 
>> Since this trait requires `Sized`, you can use `size_of` and `align_of`
>> directly, making the `size_bytes` and `alignment` methods redundant.
>> Only `write_to_bar` should remain.
>
> Sure, slightly poorer caller-side readability though but its fine with me, I'll
> do that.
>
>> I also wonder whether we couldn't get rid of this trait entirely by
>> leveragin `FromBytes` and `AsBytes`. Since the size of the type is
>> known, we could have read/write methods in Pramin that write its content
>> by using Io accessors of decreasing size (first 64-bit, then 32, etc)
>> until all the data is written.
>
> Ah great idea, I like this. Though per the other discussion with John on keeping
> it simple (not doing bulk I/O operations), maybe we wouldn't need a trait at
> all. Let me see.

Even better. :)

>
>> 
>>> +
>>> +/// Macro to implement PraminNum trait for unsigned integer types.
>>> +macro_rules! impl_pramin_unsigned_num {
>>> +    ($bits:literal) => {
>>> +        ::kernel::macros::paste! {
>>> +            impl PraminNum for [<u $bits>] {
>>> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
>>> +                    bar.[<try_read $bits>](offset)
>>> +                }
>>> +
>>> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
>>> +                    bar.[<try_write $bits>](self, offset)
>>> +                }
>>> +            }
>>> +        }
>>> +    };
>>> +}
>>> +
>>> +impl_pramin_unsigned_num!(8);
>>> +impl_pramin_unsigned_num!(16);
>>> +impl_pramin_unsigned_num!(32);
>>> +impl_pramin_unsigned_num!(64);
>>> +
>>> +/// Direct VRAM access through PRAMIN window before page tables are set up.
>>> +pub(crate) struct PraminVram<'a> {
>> 
>> Let's use the shorter name `Pramin` - the limitation to VRAM is a
>> reasonable one (since the CPU can access its own system memory), it is
>> not necessary to encode it into the name.
> Sure, sounds good.
>
>> 
>>> +    bar: &'a Bar0,
>>> +    saved_window_addr: usize,
>>> +}
>>> +
>>> +impl<'a> PraminVram<'a> {
>>> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
>>> +    /// the state is restored when the accessor is dropped.
>>> +    ///
>>> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
>>> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
>>> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
>>> +        let saved_window_addr = Self::get_window_addr(bar);
>>> +        Self {
>>> +            bar,
>>> +            saved_window_addr,
>>> +        }
>>> +    }
>>> +
>>> +    /// Set BAR0 window to point to specific FB region.
>>> +    ///
>>> +    /// # Arguments
>>> +    ///
>>> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
>>> +    ///                 Must be 64KB aligned (lower 16 bits zero).
>> 
>> Let's follow the rust doccomment guidelines for the arguments.
>
> Ok, Sure.
>> 
>>> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
>>> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
>>> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
>>> +        if fb_offset & 0xFFFF != 0 {
>>> +            return Err(EINVAL);
>>> +        }
>> 
>> Since this method is private and called from controlled contexts for
>> which `fb_offset` should always be valid, we can request callers to
>> give us a "window index" (e.g. the `window_base` of the
>> `NV_PBUS_BAR0_WINDOW` register) directly and remove this check. That
>> will also let us remove the impl block on `NV_PBUS_BAR0_WINDOW`.
>> 
>
> The tradeoff being it may complicated callers of the function that deal purely
> with addresses instead of window indices.

Would it though? IIUC the two callers of this method are aligning the
address already, so the internal check is superfluous. And this would
also simplify `NV_PBUS_BAR0_WINDOW`.

>
>>> +    ///
>>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>>> +    /// data that spans multiple 1MB windows.
>> 
>> Inversely, this large method is under-documented. Understanding what
>> `operation` is supposed to do would be helpful.
>
> I will skip these comments for now as we discussed dropping complexity in other
> thread, but thanks for the review on this. This function should be likely
> dropped in the next iteration.
>
>>> +
>>> +    /// Sets the window address from a framebuffer offset.
>>> +    /// The fb_offset must be 64KB aligned (lower bits discared).
>>> +    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
>>> +        // Calculate window base (bits 39:16 of FB address)
>>> +        // The total FB address is 40 bits, mask anything above. Since we are
>>> +        // right shifting the offset by 16 bits, the mask is only 24 bits.
>>> +        let mask = genmask_u32(0..=23) as usize;
>>> +        let window_base = ((fb_offset >> 16) & mask) as u32;
>>> +        self.set_window_base(window_base)
>>> +    }
>>> +}
>> 
>> If you work directly with `window_base` as suggested above, this impl
>> block can be dropped altogether.
> But it will complicate callers. That's the tradeoff. I prefer to keep caller
> side simple and abstract away complexity. But to your point, this is an internal
> API so I can probably code it both ways and see what it looks like.

I am not convinced that the callers would require extra complexity. If
it turns out they do, let's weight the cost/benefit of both approaches.
Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
Posted by John Hubbard 3 months, 2 weeks ago
On 10/20/25 11:55 AM, Joel Fernandes wrote:
> Required for writing page tables directly to VRAM physical memory,
> before page tables and MMU are setup.
> 
> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
> ---
>  drivers/gpu/nova-core/mm/mod.rs    |   3 +
>  drivers/gpu/nova-core/mm/pramin.rs | 241 +++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/nova_core.rs |   1 +
>  drivers/gpu/nova-core/regs.rs      |  29 +++-
>  4 files changed, 273 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/nova-core/mm/mod.rs
>  create mode 100644 drivers/gpu/nova-core/mm/pramin.rs

Hi Joel,

Lots of feedback here, I'm familiar with this part of the RM and GPU,
and have studied your patch here for a while.

First of all, I'd like very much for this patch to be at the beginning
of a new patchset, that includes calling code that actually uses this
pramin functionality. That's a good practice in general, which I want
to sort of reestablish here in nova, but also in this particular case
it is especially helpful in avoiding a lot of code that we don't need,
and won't need.

OK here we go.

> 
> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
> new file mode 100644
> index 000000000000..54c7cd9416a9
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/mod.rs
> @@ -0,0 +1,3 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +pub(crate) mod pramin;
> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
> new file mode 100644
> index 000000000000..4f4e1b8c0b9b
> --- /dev/null
> +++ b/drivers/gpu/nova-core/mm/pramin.rs
> @@ -0,0 +1,241 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Direct VRAM access through PRAMIN window before page tables are set up.

Let's omit "before page tables are set up", because that's an assumption
about the caller. In fact, this can be used at any time. (As an aside,
various diag tools use this HW feature to peek/poke at vidmem, as you
might expect.)

> +//! PRAMIN can also write to system memory, however for simplicty we only
> +//! support VRAM access.

Actually, I'd like to say something a bit different:

For Hopper/Blackwell and later GPUs, PRAMIN can only access VRAM.
For earlier GPUs, sysmem was technically supported, but in practice
never used in production drivers.

This is why Nova only provides VRAM access: Nova supports only a few
of those older GPU generations (Turing, Ampere, and Ada), and there is
no use case even for those few GPUs, for attempting sysmem access
via PRAMIN.


> +//!
> +//! # Examples
> +//!
> +//! ## Writing u32 data to VRAM
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin::PraminVram;
> +//!
> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
> +//!     let pramin = PraminVram::new(bar);
> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
> +//!     pramin.write::<u32>(0x10000, &data)?;
> +//!     Ok(())
> +//! }
> +//! ```
> +//!
> +//! ## Reading bytes from VRAM
> +//!
> +//! ```no_run
> +//! use crate::driver::Bar0;
> +//! use crate::mm::pramin::PraminVram;
> +//!
> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
> +//!     let pramin = PraminVram::new(bar);
> +//!     // Read a u8 from VRAM starting at offset 0x20000
> +//!     pramin.read::<u8>(0x20000, buffer)?;
> +//!     Ok(())
> +//! }
> +//! ```
> +
> +#![expect(dead_code)]
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +use core::mem;
> +use kernel::prelude::*;
> +
> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
> +/// the VRAM directly. These addresses are consistent across all GPUs.
> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
> +
> +/// Trait for types that can be read/written through PRAMIN.
> +pub(crate) trait PraminNum: Copy + Default + Sized {
> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
> +
> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
> +
> +    fn size_bytes() -> usize {
> +        mem::size_of::<Self>()
> +    }
> +
> +    fn alignment() -> usize {
> +        Self::size_bytes()
> +    }
> +}
> +
> +/// Macro to implement PraminNum trait for unsigned integer types.
> +macro_rules! impl_pramin_unsigned_num {
> +    ($bits:literal) => {
> +        ::kernel::macros::paste! {
> +            impl PraminNum for [<u $bits>] {
> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
> +                    bar.[<try_read $bits>](offset)
> +                }
> +
> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
> +                    bar.[<try_write $bits>](self, offset)
> +                }
> +            }
> +        }
> +    };
> +}
> +
> +impl_pramin_unsigned_num!(8);
> +impl_pramin_unsigned_num!(16);
> +impl_pramin_unsigned_num!(32);
> +impl_pramin_unsigned_num!(64);
> +
> +/// Direct VRAM access through PRAMIN window before page tables are set up.
> +pub(crate) struct PraminVram<'a> {
> +    bar: &'a Bar0,
> +    saved_window_addr: usize,
> +}
> +
> +impl<'a> PraminVram<'a> {
> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
> +    /// the state is restored when the accessor is dropped.
> +    ///
> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
> +        let saved_window_addr = Self::get_window_addr(bar);
> +        Self {
> +            bar,
> +            saved_window_addr,
> +        }
> +    }
> +
> +    /// Set BAR0 window to point to specific FB region.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
> +    ///                 Must be 64KB aligned (lower 16 bits zero).
> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
> +        if fb_offset & 0xFFFF != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::default().set_window_addr(fb_offset);
> +        window_reg.write(self.bar);
> +
> +        // Read back to ensure it took effect
> +        let readback = regs::NV_PBUS_BAR0_WINDOW::read(self.bar);
> +        if readback.window_base() != window_reg.window_base() {
> +            return Err(EIO);
> +        }

Let's not "read it back to ensure it took effect". This is not required
by the hardware.

> +
> +        Ok(())
> +    }
> +
> +    /// Get current BAR0 window offset.
> +    ///
> +    /// # Returns
> +    ///
> +    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
> +    /// This offset is always 64KB aligned.
> +    fn get_window_addr(bar: &Bar0) -> usize {
> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
> +        window_reg.get_window_addr()
> +    }
> +
> +    /// Common logic for accessing VRAM data through PRAMIN with windowing.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `num_items` - Number of items of type `T` to process.
> +    /// * `operation` - Closure called for each item to perform the actual read/write.
> +    ///                 Takes two parameters:
> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access
> +    ///
> +    /// The function automatically handles PRAMIN window repositioning when accessing
> +    /// data that spans multiple 1MB windows.
> +    fn access_vram<T: PraminNum, F>(
> +        &self,
> +        fb_offset: usize,
> +        num_items: usize,
> +        mut operation: F,
> +    ) -> Result

This is far too much functionality, and the code can be made much smaller
and simpler, and still get what we need. Open RM only supplies small accessors
(8 thru 64 bits wide), and no "bulk access". The calling code can loop if 
necessary.

We should do likewise, and avoid this.

Then we can just create things such as write_u32() or write<u32>(), etc.

And do we even *need* read?? I'm not sure we do.

This is hopefully showing the value of including the calling code, as
a follow-on patch in the series.

Hope this helps!

thanks,
-- 
John Hubbard


> +    where
> +        F: FnMut(usize, usize) -> Result,
> +    {
> +        // FB offset must be aligned to the size of T
> +        if fb_offset & (T::alignment() - 1) != 0 {
> +            return Err(EINVAL);
> +        }
> +
> +        let mut offset_bytes = fb_offset;
> +        let mut remaining_items = num_items;
> +        let mut data_index = 0;
> +
> +        while remaining_items > 0 {
> +            // Align the window to 64KB boundary
> +            let target_window = offset_bytes & !0xFFFF;
> +            let window_offset = offset_bytes - target_window;
> +
> +            // Set window if needed
> +            if target_window != Self::get_window_addr(self.bar) {
> +                self.set_window_addr(target_window)?;
> +            }
> +
> +            // Calculate how many items we can access from this window position
> +            // We can access up to 1MB total, minus the offset within the window
> +            let remaining_in_window = PRAMIN_SIZE - window_offset;
> +            let max_items_in_window = remaining_in_window / T::size_bytes();
> +            let items_to_write = core::cmp::min(remaining_items, max_items_in_window);
> +
> +            // Process data through PRAMIN
> +            for i in 0..items_to_write {
> +                // Calculate the byte offset in the PRAMIN window to write to.
> +                let pramin_offset_bytes = PRAMIN_BASE + window_offset + (i * T::size_bytes());
> +                operation(data_index + i, pramin_offset_bytes)?;
> +            }
> +
> +            // Move to next chunk.
> +            data_index += items_to_write;
> +            offset_bytes += items_to_write * T::size_bytes();
> +            remaining_items -= items_to_write;
> +        }
> +
> +        Ok(())
> +    }
> +
> +    /// Generic write for data to VRAM through PRAMIN.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM where data will be written.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `data` - Slice of items to write to VRAM. All items will be written sequentially
> +    ///            starting at `fb_offset`.
> +    pub(crate) fn write<T: PraminNum>(&self, fb_offset: usize, data: &[T]) -> Result {
> +        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
> +            data[data_idx].write_to_bar(self.bar, pramin_offset)
> +        })
> +    }
> +
> +    /// Generic read data from VRAM through PRAMIN.
> +    ///
> +    /// # Arguments
> +    ///
> +    /// * `fb_offset` - Starting byte offset in VRAM where data will be read from.
> +    ///                 Must be aligned to `T::alignment()`.
> +    /// * `data` - Mutable slice that will be filled with data read from VRAM.
> +    ///            The number of items read equals `data.len()`.
> +    pub(crate) fn read<T: PraminNum>(&self, fb_offset: usize, data: &mut [T]) -> Result {
> +        self.access_vram::<T, _>(fb_offset, data.len(), |data_idx, pramin_offset| {
> +            data[data_idx] = T::read_from_bar(self.bar, pramin_offset)?;
> +            Ok(())
> +        })
> +    }
> +}
> +
> +impl<'a> Drop for PraminVram<'a> {
> +    fn drop(&mut self) {
> +        let _ = self.set_window_addr(self.saved_window_addr); // Restore original window.
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 112277c7921e..6bd9fc3372d6 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -13,6 +13,7 @@
>  mod gfw;
>  mod gpu;
>  mod gsp;
> +mod mm;
>  mod regs;
>  mod util;
>  mod vbios;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index a3836a01996b..ba09da7e1541 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -12,6 +12,7 @@
>      FalconModSelAlgo, FalconSecurityModel, PFalcon2Base, PFalconBase, PeregrineCoreSelect,
>  };
>  use crate::gpu::{Architecture, Chipset};
> +use kernel::bits::genmask_u32;
>  use kernel::prelude::*;
>  
>  // PMC
> @@ -43,7 +44,8 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
>      }
>  }
>  
> -// PBUS
> +// PBUS - PBUS is a bus control unit, that helps the GPU communicate with the PCI bus.
> +// Handles the BAR windows, decoding of MMIO read/writes on the BARs, etc.
>  
>  register!(NV_PBUS_SW_SCRATCH @ 0x00001400[64]  {});
>  
> @@ -52,6 +54,31 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
>      31:16   frts_err_code as u16;
>  });
>  
> +// BAR0 window control register to configure the BAR0 window for PRAMIN access
> +// (direct physical VRAM access).
> +register!(NV_PBUS_BAR0_WINDOW @ 0x00001700, "BAR0 window control register" {
> +    25:24   target as u8, "Target (0=VID_MEM, 1=SYS_MEM_COHERENT, 2=SYS_MEM_NONCOHERENT)";
> +    23:0    window_base as u32, "Window base address (bits 39:16 of FB addr)";
> +});
> +
> +impl NV_PBUS_BAR0_WINDOW {
> +    /// Returns the 64-bit aligned VRAM address of the window.
> +    pub(crate) fn get_window_addr(self) -> usize {
> +        (self.window_base() as usize) << 16
> +    }
> +
> +    /// Sets the window address from a framebuffer offset.
> +    /// The fb_offset must be 64KB aligned (lower bits discared).
> +    pub(crate) fn set_window_addr(self, fb_offset: usize) -> Self {
> +        // Calculate window base (bits 39:16 of FB address)
> +        // The total FB address is 40 bits, mask anything above. Since we are
> +        // right shifting the offset by 16 bits, the mask is only 24 bits.
> +        let mask = genmask_u32(0..=23) as usize;
> +        let window_base = ((fb_offset >> 16) & mask) as u32;
> +        self.set_window_base(window_base)
> +    }
> +}
> +
>  // PFB
>  
>  // The following two registers together hold the physical system memory address that is used by the
Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
Posted by Joel Fernandes 3 months, 2 weeks ago
Hi John,

On 10/21/2025 10:18 PM, John Hubbard wrote:
[...]
> First of all, I'd like very much for this patch to be at the beginning
> of a new patchset, that includes calling code that actually uses this
> pramin functionality.

There's no code in this patchset that uses pramin functionality though, so not
sure what you mean? This is a prerequisite patch for mm as mentioned in the
cover letter.

>> diff --git a/drivers/gpu/nova-core/mm/mod.rs b/drivers/gpu/nova-core/mm/mod.rs
>> new file mode 100644
>> index 000000000000..54c7cd9416a9
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/mm/mod.rs
>> @@ -0,0 +1,3 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +pub(crate) mod pramin;
>> diff --git a/drivers/gpu/nova-core/mm/pramin.rs b/drivers/gpu/nova-core/mm/pramin.rs
>> new file mode 100644
>> index 000000000000..4f4e1b8c0b9b
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/mm/pramin.rs
>> @@ -0,0 +1,241 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Direct VRAM access through PRAMIN window before page tables are set up.
> 
> Let's omit "before page tables are set up", because that's an assumption
> about the caller. In fact, this can be used at any time. (As an aside,
> various diag tools use this HW feature to peek/poke at vidmem, as you
> might expect.)

Ok.

> 
>> +//! PRAMIN can also write to system memory, however for simplicty we only
>> +//! support VRAM access.
> 
> Actually, I'd like to say something a bit different:
> 
> For Hopper/Blackwell and later GPUs, PRAMIN can only access VRAM.
> For earlier GPUs, sysmem was technically supported, but in practice
> never used in production drivers.

Ah, will clarify.

> 
> This is why Nova only provides VRAM access: Nova supports only a few
> of those older GPU generations (Turing, Ampere, and Ada), and there is
> no use case even for those few GPUs, for attempting sysmem access
> via PRAMIN.

Makes sense. Plus we can also access sysmem from BAR1 if needed.

>> +//!
>> +//! # Examples
>> +//!
>> +//! ## Writing u32 data to VRAM
>> +//!
>> +//! ```no_run
>> +//! use crate::driver::Bar0;
>> +//! use crate::mm::pramin::PraminVram;
>> +//!
>> +//! fn write_data_to_vram(bar: &Bar0) -> Result {
>> +//!     let pramin = PraminVram::new(bar);
>> +//!     // Write 4 32-bit words to VRAM at offset 0x10000
>> +//!     let data: [u32; 4] = [0xDEADBEEF, 0xCAFEBABE, 0x12345678, 0x87654321];
>> +//!     pramin.write::<u32>(0x10000, &data)?;
>> +//!     Ok(())
>> +//! }
>> +//! ```
>> +//!
>> +//! ## Reading bytes from VRAM
>> +//!
>> +//! ```no_run
>> +//! use crate::driver::Bar0;
>> +//! use crate::mm::pramin::PraminVram;
>> +//!
>> +//! fn read_data_from_vram(bar: &Bar0, buffer: &mut KVec<u8>) -> Result {
>> +//!     let pramin = PraminVram::new(bar);
>> +//!     // Read a u8 from VRAM starting at offset 0x20000
>> +//!     pramin.read::<u8>(0x20000, buffer)?;
>> +//!     Ok(())
>> +//! }
>> +//! ```
>> +
>> +#![expect(dead_code)]
>> +
>> +use crate::driver::Bar0;
>> +use crate::regs;
>> +use core::mem;
>> +use kernel::prelude::*;
>> +
>> +/// PRAMIN is a window into the VRAM (not a hardware block) that is used to access
>> +/// the VRAM directly. These addresses are consistent across all GPUs.
>> +const PRAMIN_BASE: usize = 0x700000; // PRAMIN is always at BAR0 + 0x700000
>> +const PRAMIN_SIZE: usize = 0x100000; // 1MB aperture - max access per window position
>> +
>> +/// Trait for types that can be read/written through PRAMIN.
>> +pub(crate) trait PraminNum: Copy + Default + Sized {
>> +    fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self>;
>> +
>> +    fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result;
>> +
>> +    fn size_bytes() -> usize {
>> +        mem::size_of::<Self>()
>> +    }
>> +
>> +    fn alignment() -> usize {
>> +        Self::size_bytes()
>> +    }
>> +}
>> +
>> +/// Macro to implement PraminNum trait for unsigned integer types.
>> +macro_rules! impl_pramin_unsigned_num {
>> +    ($bits:literal) => {
>> +        ::kernel::macros::paste! {
>> +            impl PraminNum for [<u $bits>] {
>> +                fn read_from_bar(bar: &Bar0, offset: usize) -> Result<Self> {
>> +                    bar.[<try_read $bits>](offset)
>> +                }
>> +
>> +                fn write_to_bar(self, bar: &Bar0, offset: usize) -> Result {
>> +                    bar.[<try_write $bits>](self, offset)
>> +                }
>> +            }
>> +        }
>> +    };
>> +}
>> +
>> +impl_pramin_unsigned_num!(8);
>> +impl_pramin_unsigned_num!(16);
>> +impl_pramin_unsigned_num!(32);
>> +impl_pramin_unsigned_num!(64);
>> +
>> +/// Direct VRAM access through PRAMIN window before page tables are set up.
>> +pub(crate) struct PraminVram<'a> {
>> +    bar: &'a Bar0,
>> +    saved_window_addr: usize,
>> +}
>> +
>> +impl<'a> PraminVram<'a> {
>> +    /// Create a new PRAMIN VRAM accessor, saving current window state,
>> +    /// the state is restored when the accessor is dropped.
>> +    ///
>> +    /// The BAR0 window base must be 64KB aligned but provides 1MB of VRAM access.
>> +    /// Window is repositioned automatically when accessing data beyond 1MB boundaries.
>> +    pub(crate) fn new(bar: &'a Bar0) -> Self {
>> +        let saved_window_addr = Self::get_window_addr(bar);
>> +        Self {
>> +            bar,
>> +            saved_window_addr,
>> +        }
>> +    }
>> +
>> +    /// Set BAR0 window to point to specific FB region.
>> +    ///
>> +    /// # Arguments
>> +    ///
>> +    /// * `fb_offset` - VRAM byte offset where the window should be positioned.
>> +    ///                 Must be 64KB aligned (lower 16 bits zero).
>> +    fn set_window_addr(&self, fb_offset: usize) -> Result {
>> +        // FB offset must be 64KB aligned (hardware requirement for window_base field)
>> +        // Once positioned, the window provides access to 1MB of VRAM through PRAMIN aperture
>> +        if fb_offset & 0xFFFF != 0 {
>> +            return Err(EINVAL);
>> +        }
>> +
>> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::default().set_window_addr(fb_offset);
>> +        window_reg.write(self.bar);
>> +
>> +        // Read back to ensure it took effect
>> +        let readback = regs::NV_PBUS_BAR0_WINDOW::read(self.bar);
>> +        if readback.window_base() != window_reg.window_base() {
>> +            return Err(EIO);
>> +        }
> 
> Let's not "read it back to ensure it took effect". This is not required
> by the hardware.

Ok, it was more for my ascertaining that the write to effect, but you're right
its not necessary so I'll remove it.

>> +        Ok(())
>> +    }
>> +
>> +    /// Get current BAR0 window offset.
>> +    ///
>> +    /// # Returns
>> +    ///
>> +    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
>> +    /// This offset is always 64KB aligned.
>> +    fn get_window_addr(bar: &Bar0) -> usize {
>> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
>> +        window_reg.get_window_addr()
>> +    }
>> +
>> +    /// Common logic for accessing VRAM data through PRAMIN with windowing.
>> +    ///
>> +    /// # Arguments
>> +    ///
>> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
>> +    ///                 Must be aligned to `T::alignment()`.
>> +    /// * `num_items` - Number of items of type `T` to process.
>> +    /// * `operation` - Closure called for each item to perform the actual read/write.
>> +    ///                 Takes two parameters:
>> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
>> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access
>> +    ///
>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>> +    /// data that spans multiple 1MB windows.
>> +    fn access_vram<T: PraminNum, F>(
>> +        &self,
>> +        fb_offset: usize,
>> +        num_items: usize,
>> +        mut operation: F,
>> +    ) -> Result
> 
> This is far too much functionality, and the code can be made much smaller
> and simpler.
> and still get what we need. Open RM only supplies small accessors
> (8 thru 64 bits wide), and no "bulk access". The calling code can loop if 
> necessary.

The code uses a sliding window approach to reposition the moving window,
abstracting away the details of the moving window from the caller. That
simplifies the callers a lot as they don't need to "loop" and know when to move
the window when they hit limits. They can also write to greater than 1MB. The
bulk of the logic is in this function and the surrounding code is mostly
wrappers, which part is complicated or that you did not understand?

Just to note also, the PRAMIN moving window functionality in this patch allows
us to not need BAR2 to access VRAM for instance memory. That is a code
simplification then as we do not need code for BAR2 (the tradeoff being slightly
slower instance memory access). I confirmed with the team that this is also an
option. Abstracting the sliding window functionality becomes important then, so
I'd not vote for removing this functionality for that reason. And if we ever use
BAR2, having it is still useful because it allows us to have a fallback too for
comparison/reference.

> 
> We should do likewise, and avoid this.
> 
> Then we can just create things such as write_u32() or write<u32>(), etc.
> 
> And do we even *need* read?? I'm not sure we do.

We do need reads as we walk through page tables structures. Note that the page
tables are partially allocated by the GSP.

> 
> This is hopefully showing the value of including the calling code, as
> a follow-on patch in the series.

Unfortunately, there are too many dependencies as I mentioned in the cover
letter, so I would like to get functionality merged in stages. That's the
best way to make good progress IMO for nova-core. Of course, we have to careful
about design etc and I kept it as simple as possible out of that intention. My
pramin patch was written 3-4 months ago now, so I'd like to not keep it too
sitting comfortably in my tree. :). And this patch is critical for mm.

That said, I'll go look at the usecases and APIs again and see if I can simplify
anything more.

thanks,

 - Joel
Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
Posted by Alexandre Courbot 3 months, 2 weeks ago
On Thu Oct 23, 2025 at 2:48 AM JST, Joel Fernandes wrote:
<snip>
>>> +        Ok(())
>>> +    }
>>> +
>>> +    /// Get current BAR0 window offset.
>>> +    ///
>>> +    /// # Returns
>>> +    ///
>>> +    /// The byte offset in VRAM where the PRAMIN window is currently positioned.
>>> +    /// This offset is always 64KB aligned.
>>> +    fn get_window_addr(bar: &Bar0) -> usize {
>>> +        let window_reg = regs::NV_PBUS_BAR0_WINDOW::read(bar);
>>> +        window_reg.get_window_addr()
>>> +    }
>>> +
>>> +    /// Common logic for accessing VRAM data through PRAMIN with windowing.
>>> +    ///
>>> +    /// # Arguments
>>> +    ///
>>> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
>>> +    ///                 Must be aligned to `T::alignment()`.
>>> +    /// * `num_items` - Number of items of type `T` to process.
>>> +    /// * `operation` - Closure called for each item to perform the actual read/write.
>>> +    ///                 Takes two parameters:
>>> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
>>> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access
>>> +    ///
>>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>>> +    /// data that spans multiple 1MB windows.
>>> +    fn access_vram<T: PraminNum, F>(
>>> +        &self,
>>> +        fb_offset: usize,
>>> +        num_items: usize,
>>> +        mut operation: F,
>>> +    ) -> Result
>> 
>> This is far too much functionality, and the code can be made much smaller
>> and simpler.
>> and still get what we need. Open RM only supplies small accessors
>> (8 thru 64 bits wide), and no "bulk access". The calling code can loop if 
>> necessary.
>
> The code uses a sliding window approach to reposition the moving window,
> abstracting away the details of the moving window from the caller. That
> simplifies the callers a lot as they don't need to "loop" and know when to move
> the window when they hit limits. They can also write to greater than 1MB. The
> bulk of the logic is in this function and the surrounding code is mostly
> wrappers, which part is complicated or that you did not understand?
>
> Just to note also, the PRAMIN moving window functionality in this patch allows
> us to not need BAR2 to access VRAM for instance memory. That is a code
> simplification then as we do not need code for BAR2 (the tradeoff being slightly
> slower instance memory access). I confirmed with the team that this is also an
> option. Abstracting the sliding window functionality becomes important then, so
> I'd not vote for removing this functionality for that reason. And if we ever use
> BAR2, having it is still useful because it allows us to have a fallback too for
> comparison/reference.

Whether we want a sliding window mechanism or not, I think it is
valuable to expose the PRAMIN functionality the way the hardware
supports it (i.e. set base address and work with a fixed slice), and
then build QoL features like the sliding window on top of it, through
e.g. another type that wraps the basic PRAMIN one.

This would make the code easier to read, allow more flexibility for
users (although in the case of PRAMIN we might not really need it), and
matches what Rust does for e.g. `BufReader`, which consumes a basic reader
and provide buffering for it.

>
>> 
>> We should do likewise, and avoid this.
>> 
>> Then we can just create things such as write_u32() or write<u32>(), etc.
>> 
>> And do we even *need* read?? I'm not sure we do.
>
> We do need reads as we walk through page tables structures. Note that the page
> tables are partially allocated by the GSP.
>
>> 
>> This is hopefully showing the value of including the calling code, as
>> a follow-on patch in the series.
>
> Unfortunately, there are too many dependencies as I mentioned in the cover
> letter, so I would like to get functionality merged in stages. That's the
> best way to make good progress IMO for nova-core. Of course, we have to careful
> about design etc and I kept it as simple as possible out of that intention. My
> pramin patch was written 3-4 months ago now, so I'd like to not keep it too
> sitting comfortably in my tree. :). And this patch is critical for mm.

Although we have neglected it lately, we could use our
`nova-core-unstable` staging branch for that - IIRC the goal was also to
keep track of pending patches and make sure they don't bitrot until they
can be sent.
Re: [PATCH 6/7] nova-core: mm: Add support to use PRAMIN windows to write to VRAM
Posted by Joel Fernandes 3 months, 2 weeks ago
Hi,

On 10/22/2025 1:48 PM, Joel Fernandes wrote:
>>> +    /// * `fb_offset` - Starting byte offset in VRAM (framebuffer) where access begins.
>>> +    ///                 Must be aligned to `T::alignment()`.
>>> +    /// * `num_items` - Number of items of type `T` to process.
>>> +    /// * `operation` - Closure called for each item to perform the actual read/write.
>>> +    ///                 Takes two parameters:
>>> +    ///                 - `data_idx`: Index of the item in the data array (0..num_items)
>>> +    ///                 - `pramin_offset`: BAR0 offset in the PRAMIN aperture to access
>>> +    ///
>>> +    /// The function automatically handles PRAMIN window repositioning when accessing
>>> +    /// data that spans multiple 1MB windows.
>>> +    fn access_vram<T: PraminNum, F>(
>>> +        &self,
>>> +        fb_offset: usize,
>>> +        num_items: usize,
>>> +        mut operation: F,
>>> +    ) -> Result
>> This is far too much functionality, and the code can be made much smaller
>> and simpler.
>> and still get what we need. Open RM only supplies small accessors
>> (8 thru 64 bits wide), and no "bulk access". The calling code can loop if 
>> necessary.
>> The code uses a sliding window approach to reposition the moving window,
> abstracting away the details of the moving window from the caller.
For completeness: I chatted with John and wanted to update the thread. The main
"benefit" from this extra complexity is if we write > 1MB at a time (automatic
sliding of window). However, in all frankness I *currently* don't have such a
use case - all writes are within 1MB. Just to be clear though I do feel we will
need this though if we use PRAMIN as the primary way to access "instance memory"
instead of BAR2 since instance memory is much larger than the window.

So for next version of this patch, I will keep it simple (KISS principal). We
can add in the abstracted moving-window for >1MB writes when we need it as this
patch on the list and is not going anywhere. Also I think since we're exposing
the details of the window size to the caller, we could come up with compile-time
checks to make sure they're within bounds, so I'll proceed accordingly.

Thanks!

 - Joel