[PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion

Alexandre Courbot posted 16 patches 9 months, 3 weeks ago
There is a newer version of this series
[PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
Posted by Alexandre Courbot 9 months, 3 weeks ago
Upon reset, the GPU executes the GFW_BOOT firmware in order to
initialize its base parameters such as clocks. The driver must ensure
that this step is completed before using the hardware.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/devinit.rs   | 40 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/driver.rs    |  2 +-
 drivers/gpu/nova-core/gpu.rs       |  5 +++++
 drivers/gpu/nova-core/nova_core.rs |  1 +
 drivers/gpu/nova-core/regs.rs      | 11 +++++++++++
 5 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
new file mode 100644
index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
--- /dev/null
+++ b/drivers/gpu/nova-core/devinit.rs
@@ -0,0 +1,40 @@
+// SPDX-License-Identifier: GPL-2.0
+
+//! Methods for device initialization.
+
+use kernel::bindings;
+use kernel::devres::Devres;
+use kernel::prelude::*;
+
+use crate::driver::Bar0;
+use crate::regs;
+
+/// Wait for devinit FW completion.
+///
+/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
+/// considered unusable until this step is completed, so it must be waited on very early during
+/// driver initialization.
+pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
+    let mut timeout = 2000;
+
+    loop {
+        let gfw_booted = with_bar!(
+            bar,
+            |b| regs::Pgc6AonSecureScratchGroup05PrivLevelMask::read(b)
+                .read_protection_level0_enabled()
+                && (regs::Pgc6AonSecureScratchGroup05::read(b).value() & 0xff) == 0xff
+        )?;
+
+        if gfw_booted {
+            return Ok(());
+        }
+
+        if timeout == 0 {
+            return Err(ETIMEDOUT);
+        }
+        timeout -= 1;
+
+        // SAFETY: msleep should be safe to call with any parameter.
+        unsafe { bindings::msleep(2) };
+    }
+}
diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
     pub(crate) gpu: Gpu,
 }
 
-const BAR0_SIZE: usize = 8;
+const BAR0_SIZE: usize = 0x1000000;
 pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
 
 kernel::pci_device_table!(
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 866c5992b9eb27735975bb4948e522bc01fadaa2..1f7799692a0ab042f2540e01414f5ca347ae9ecc 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -2,6 +2,7 @@
 
 use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
 
+use crate::devinit;
 use crate::driver::Bar0;
 use crate::firmware::Firmware;
 use crate::regs;
@@ -168,6 +169,10 @@ pub(crate) fn new(
             spec.revision
         );
 
+        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
+        devinit::wait_gfw_boot_completion(&bar)
+            .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
+
         Ok(pin_init!(Self { spec, bar, fw }))
     }
 }
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -20,6 +20,7 @@ macro_rules! with_bar {
     }
 }
 
+mod devinit;
 mod driver;
 mod firmware;
 mod gpu;
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index e315a3011660df7f18c0a3e0582b5845545b36e2..fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -13,3 +13,14 @@
     7:4     major_rev => as u8, "major revision of the chip";
     28:20   chipset => try_into Chipset, "chipset model"
 );
+
+/* GC6 */
+
+register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
+    0:0     read_protection_level0_enabled => as_bit bool
+);
+
+/* TODO: This is an array of registers. */
+register!(Pgc6AonSecureScratchGroup05@0x00118234;
+    31:0    value => as u32
+);

-- 
2.49.0
Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
Posted by Danilo Krummrich 9 months, 3 weeks ago
On Sun, Apr 20, 2025 at 09:19:40PM +0900, Alexandre Courbot wrote:
> Upon reset, the GPU executes the GFW_BOOT firmware in order to
> initialize its base parameters such as clocks. The driver must ensure
> that this step is completed before using the hardware.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/devinit.rs   | 40 ++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/driver.rs    |  2 +-
>  drivers/gpu/nova-core/gpu.rs       |  5 +++++
>  drivers/gpu/nova-core/nova_core.rs |  1 +
>  drivers/gpu/nova-core/regs.rs      | 11 +++++++++++
>  5 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
> --- /dev/null
> +++ b/drivers/gpu/nova-core/devinit.rs
> @@ -0,0 +1,40 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Methods for device initialization.
> +
> +use kernel::bindings;
> +use kernel::devres::Devres;
> +use kernel::prelude::*;
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +
> +/// Wait for devinit FW completion.
> +///
> +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
> +/// considered unusable until this step is completed, so it must be waited on very early during
> +/// driver initialization.
> +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
> +    let mut timeout = 2000;
> +
> +    loop {
> +        let gfw_booted = with_bar!(
> +            bar,
> +            |b| regs::Pgc6AonSecureScratchGroup05PrivLevelMask::read(b)
> +                .read_protection_level0_enabled()
> +                && (regs::Pgc6AonSecureScratchGroup05::read(b).value() & 0xff) == 0xff
> +        )?;
> +
> +        if gfw_booted {
> +            return Ok(());
> +        }
> +
> +        if timeout == 0 {
> +            return Err(ETIMEDOUT);
> +        }
> +        timeout -= 1;
> +
> +        // SAFETY: msleep should be safe to call with any parameter.
> +        unsafe { bindings::msleep(2) };

I assume this goes away with [1]? Can we please add a corresponding TODO? Also,
do you mind preparing the follow-up patches for cases like this (there's also
the transmute one), such that we can apply them, once the dependencies did land
and such that we can verify that they suit our needs?

[1] https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/

> +    }
> +}
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
> --- a/drivers/gpu/nova-core/driver.rs
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
>      pub(crate) gpu: Gpu,
>  }
>  
> -const BAR0_SIZE: usize = 8;
> +const BAR0_SIZE: usize = 0x1000000;
>  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
>  
>  kernel::pci_device_table!(
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 866c5992b9eb27735975bb4948e522bc01fadaa2..1f7799692a0ab042f2540e01414f5ca347ae9ecc 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -2,6 +2,7 @@
>  
>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>  
> +use crate::devinit;
>  use crate::driver::Bar0;
>  use crate::firmware::Firmware;
>  use crate::regs;
> @@ -168,6 +169,10 @@ pub(crate) fn new(
>              spec.revision
>          );
>  
> +        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
> +        devinit::wait_gfw_boot_completion(&bar)
> +            .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
> +
>          Ok(pin_init!(Self { spec, bar, fw }))
>      }
>  }
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -20,6 +20,7 @@ macro_rules! with_bar {
>      }
>  }
>  
> +mod devinit;
>  mod driver;
>  mod firmware;
>  mod gpu;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index e315a3011660df7f18c0a3e0582b5845545b36e2..fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -13,3 +13,14 @@
>      7:4     major_rev => as u8, "major revision of the chip";
>      28:20   chipset => try_into Chipset, "chipset model"
>  );
> +
> +/* GC6 */
> +
> +register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
> +    0:0     read_protection_level0_enabled => as_bit bool
> +);
> +
> +/* TODO: This is an array of registers. */
> +register!(Pgc6AonSecureScratchGroup05@0x00118234;
> +    31:0    value => as u32
> +);

Please also document new register definitions.
Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
Posted by Alexandre Courbot 9 months, 2 weeks ago
On Tue Apr 22, 2025 at 8:36 PM JST, Danilo Krummrich wrote:
> On Sun, Apr 20, 2025 at 09:19:40PM +0900, Alexandre Courbot wrote:
>> Upon reset, the GPU executes the GFW_BOOT firmware in order to
>> initialize its base parameters such as clocks. The driver must ensure
>> that this step is completed before using the hardware.
>> 
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/devinit.rs   | 40 ++++++++++++++++++++++++++++++++++++++
>>  drivers/gpu/nova-core/driver.rs    |  2 +-
>>  drivers/gpu/nova-core/gpu.rs       |  5 +++++
>>  drivers/gpu/nova-core/nova_core.rs |  1 +
>>  drivers/gpu/nova-core/regs.rs      | 11 +++++++++++
>>  5 files changed, 58 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
>> --- /dev/null
>> +++ b/drivers/gpu/nova-core/devinit.rs
>> @@ -0,0 +1,40 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +//! Methods for device initialization.
>> +
>> +use kernel::bindings;
>> +use kernel::devres::Devres;
>> +use kernel::prelude::*;
>> +
>> +use crate::driver::Bar0;
>> +use crate::regs;
>> +
>> +/// Wait for devinit FW completion.
>> +///
>> +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
>> +/// considered unusable until this step is completed, so it must be waited on very early during
>> +/// driver initialization.
>> +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
>> +    let mut timeout = 2000;
>> +
>> +    loop {
>> +        let gfw_booted = with_bar!(
>> +            bar,
>> +            |b| regs::Pgc6AonSecureScratchGroup05PrivLevelMask::read(b)
>> +                .read_protection_level0_enabled()
>> +                && (regs::Pgc6AonSecureScratchGroup05::read(b).value() & 0xff) == 0xff
>> +        )?;
>> +
>> +        if gfw_booted {
>> +            return Ok(());
>> +        }
>> +
>> +        if timeout == 0 {
>> +            return Err(ETIMEDOUT);
>> +        }
>> +        timeout -= 1;
>> +
>> +        // SAFETY: msleep should be safe to call with any parameter.
>> +        unsafe { bindings::msleep(2) };
>
> I assume this goes away with [1]? Can we please add a corresponding TODO? Also,
> do you mind preparing the follow-up patches for cases like this (there's also
> the transmute one), such that we can apply them, once the dependencies did land
> and such that we can verify that they suit our needs?
>
> [1] https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/

Good idea. Added the TODO item with a link to the patch.

>
>> +    }
>> +}
>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>> index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
>> --- a/drivers/gpu/nova-core/driver.rs
>> +++ b/drivers/gpu/nova-core/driver.rs
>> @@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
>>      pub(crate) gpu: Gpu,
>>  }
>>  
>> -const BAR0_SIZE: usize = 8;
>> +const BAR0_SIZE: usize = 0x1000000;
>>  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
>>  
>>  kernel::pci_device_table!(
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index 866c5992b9eb27735975bb4948e522bc01fadaa2..1f7799692a0ab042f2540e01414f5ca347ae9ecc 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -2,6 +2,7 @@
>>  
>>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>>  
>> +use crate::devinit;
>>  use crate::driver::Bar0;
>>  use crate::firmware::Firmware;
>>  use crate::regs;
>> @@ -168,6 +169,10 @@ pub(crate) fn new(
>>              spec.revision
>>          );
>>  
>> +        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
>> +        devinit::wait_gfw_boot_completion(&bar)
>> +            .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
>> +
>>          Ok(pin_init!(Self { spec, bar, fw }))
>>      }
>>  }
>> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
>> index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
>> --- a/drivers/gpu/nova-core/nova_core.rs
>> +++ b/drivers/gpu/nova-core/nova_core.rs
>> @@ -20,6 +20,7 @@ macro_rules! with_bar {
>>      }
>>  }
>>  
>> +mod devinit;
>>  mod driver;
>>  mod firmware;
>>  mod gpu;
>> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
>> index e315a3011660df7f18c0a3e0582b5845545b36e2..fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac 100644
>> --- a/drivers/gpu/nova-core/regs.rs
>> +++ b/drivers/gpu/nova-core/regs.rs
>> @@ -13,3 +13,14 @@
>>      7:4     major_rev => as u8, "major revision of the chip";
>>      28:20   chipset => try_into Chipset, "chipset model"
>>  );
>> +
>> +/* GC6 */
>> +
>> +register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
>> +    0:0     read_protection_level0_enabled => as_bit bool
>> +);
>> +
>> +/* TODO: This is an array of registers. */
>> +register!(Pgc6AonSecureScratchGroup05@0x00118234;
>> +    31:0    value => as u32
>> +);
>
> Please also document new register definitions.

Thankfully Joel's documentation patches take care of this!

Cheers,
Alex.
Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
Posted by Joel Fernandes 9 months, 2 weeks ago

On 4/29/2025 8:48 AM, Alexandre Courbot wrote:
> On Tue Apr 22, 2025 at 8:36 PM JST, Danilo Krummrich wrote:
>> On Sun, Apr 20, 2025 at 09:19:40PM +0900, Alexandre Courbot wrote:
>>> Upon reset, the GPU executes the GFW_BOOT firmware in order to
>>> initialize its base parameters such as clocks. The driver must ensure
>>> that this step is completed before using the hardware.
>>>
>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>> ---
>>>  drivers/gpu/nova-core/devinit.rs   | 40 ++++++++++++++++++++++++++++++++++++++
>>>  drivers/gpu/nova-core/driver.rs    |  2 +-
>>>  drivers/gpu/nova-core/gpu.rs       |  5 +++++
>>>  drivers/gpu/nova-core/nova_core.rs |  1 +
>>>  drivers/gpu/nova-core/regs.rs      | 11 +++++++++++
>>>  5 files changed, 58 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
>>> new file mode 100644
>>> index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
>>> --- /dev/null
>>> +++ b/drivers/gpu/nova-core/devinit.rs
>>> @@ -0,0 +1,40 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +
>>> +//! Methods for device initialization.
>>> +
>>> +use kernel::bindings;
>>> +use kernel::devres::Devres;
>>> +use kernel::prelude::*;
>>> +
>>> +use crate::driver::Bar0;
>>> +use crate::regs;
>>> +
>>> +/// Wait for devinit FW completion.
>>> +///
>>> +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
>>> +/// considered unusable until this step is completed, so it must be waited on very early during
>>> +/// driver initialization.
>>> +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
>>> +    let mut timeout = 2000;
>>> +
>>> +    loop {
>>> +        let gfw_booted = with_bar!(
>>> +            bar,
>>> +            |b| regs::Pgc6AonSecureScratchGroup05PrivLevelMask::read(b)
>>> +                .read_protection_level0_enabled()
>>> +                && (regs::Pgc6AonSecureScratchGroup05::read(b).value() & 0xff) == 0xff
>>> +        )?;
>>> +
>>> +        if gfw_booted {
>>> +            return Ok(());
>>> +        }
>>> +
>>> +        if timeout == 0 {
>>> +            return Err(ETIMEDOUT);
>>> +        }
>>> +        timeout -= 1;
>>> +
>>> +        // SAFETY: msleep should be safe to call with any parameter.
>>> +        unsafe { bindings::msleep(2) };
>>
>> I assume this goes away with [1]? Can we please add a corresponding TODO? Also,
>> do you mind preparing the follow-up patches for cases like this (there's also
>> the transmute one), such that we can apply them, once the dependencies did land
>> and such that we can verify that they suit our needs?
>>
>> [1] https://lore.kernel.org/lkml/20250220070611.214262-8-fujita.tomonori@gmail.com/
> 
> Good idea. Added the TODO item with a link to the patch.
> 
>>
>>> +    }
>>> +}
>>> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
>>> index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
>>> --- a/drivers/gpu/nova-core/driver.rs
>>> +++ b/drivers/gpu/nova-core/driver.rs
>>> @@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
>>>      pub(crate) gpu: Gpu,
>>>  }
>>>  
>>> -const BAR0_SIZE: usize = 8;
>>> +const BAR0_SIZE: usize = 0x1000000;
>>>  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
>>>  
>>>  kernel::pci_device_table!(
>>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>>> index 866c5992b9eb27735975bb4948e522bc01fadaa2..1f7799692a0ab042f2540e01414f5ca347ae9ecc 100644
>>> --- a/drivers/gpu/nova-core/gpu.rs
>>> +++ b/drivers/gpu/nova-core/gpu.rs
>>> @@ -2,6 +2,7 @@
>>>  
>>>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>>>  
>>> +use crate::devinit;
>>>  use crate::driver::Bar0;
>>>  use crate::firmware::Firmware;
>>>  use crate::regs;
>>> @@ -168,6 +169,10 @@ pub(crate) fn new(
>>>              spec.revision
>>>          );
>>>  
>>> +        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
>>> +        devinit::wait_gfw_boot_completion(&bar)
>>> +            .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
>>> +
>>>          Ok(pin_init!(Self { spec, bar, fw }))
>>>      }
>>>  }
>>> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
>>> index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
>>> --- a/drivers/gpu/nova-core/nova_core.rs
>>> +++ b/drivers/gpu/nova-core/nova_core.rs
>>> @@ -20,6 +20,7 @@ macro_rules! with_bar {
>>>      }
>>>  }
>>>  
>>> +mod devinit;
>>>  mod driver;
>>>  mod firmware;
>>>  mod gpu;
>>> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
>>> index e315a3011660df7f18c0a3e0582b5845545b36e2..fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac 100644
>>> --- a/drivers/gpu/nova-core/regs.rs
>>> +++ b/drivers/gpu/nova-core/regs.rs
>>> @@ -13,3 +13,14 @@
>>>      7:4     major_rev => as u8, "major revision of the chip";
>>>      28:20   chipset => try_into Chipset, "chipset model"
>>>  );
>>> +
>>> +/* GC6 */
>>> +
>>> +register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
>>> +    0:0     read_protection_level0_enabled => as_bit bool
>>> +);
>>> +
>>> +/* TODO: This is an array of registers. */
>>> +register!(Pgc6AonSecureScratchGroup05@0x00118234;
>>> +    31:0    value => as u32
>>> +);
>>
>> Please also document new register definitions.
> 
> Thankfully Joel's documentation patches take care of this!
> 
Yes, my doc tree (now 8 patches) includes documenting these! Considering the
register renaming stuff it may conflict, but I'll fix that up. :)

thanks,

 - Joel
Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
Posted by Joel Fernandes 9 months, 3 weeks ago
Hi, Alex,
Just some documentation-type comments and one Rust-naming-convention comment:

On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
> Upon reset, the GPU executes the GFW_BOOT firmware in order to
> initialize its base parameters such as clocks. The driver must ensure
> that this step is completed before using the hardware.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/devinit.rs   | 40 ++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/driver.rs    |  2 +-
>  drivers/gpu/nova-core/gpu.rs       |  5 +++++
>  drivers/gpu/nova-core/nova_core.rs |  1 +
>  drivers/gpu/nova-core/regs.rs      | 11 +++++++++++
>  5 files changed, 58 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
> --- /dev/null
> +++ b/drivers/gpu/nova-core/devinit.rs
> @@ -0,0 +1,40 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +//! Methods for device initialization.

Let us clarify what devinit means.

devinit is a sequence of register read/writes after reset that performs tasks
such as:
1. Programming VRAM memory controller timings.
2. Power sequencing.
3. Clock and PLL configuration.
4. Thermal management.
5. Performs VRAM memory scrubbing (ECC initialization) - on some GPUs, it scrubs
only part of memory and then kicks of 'async scrubbing'.

devinit itself is a 'script' which is interpreted by the PMU microcontroller of
of the GPU by an interpreter program.

Note that devinit also needs to run during suspend/resume at runtime.

I talked with Alex and I could add a new patch on top of this patch to add these
clarifying 'doc' comments as well. I will commit them to my git branch and send
on top of this as needed, but Alex can feel free to decide to squash them as well.

> +
> +use kernel::bindings;
> +use kernel::devres::Devres;
> +use kernel::prelude::*;
> +
> +use crate::driver::Bar0;
> +use crate::regs;
> +
> +/// Wait for devinit FW completion.
> +///
> +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
> +/// considered unusable until this step is completed, so it must be waited on very early during
> +/// driver initialization.
> +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {

To reduce acronym soup, we can clarify gfw means 'GPU firmware', it is a broad
term used for VBIOS ROM components several of which execute before the driver
loads. Perhaps that part of comment can be 'the GPU firmware (gfw) code'.

> +    let mut timeout = 2000;
> +
> +    loop {
> +        let gfw_booted = with_bar!(
> +            bar,
> +            |b| regs::Pgc6AonSecureScratchGroup05PrivLevelMask::read(b)

Per my research, FWSEC is run before hand on the GSP in 'high secure' mode,
before the driver even loads. This happens roughly around the time devinit is
also happening (not sure if it before or after). This FWSEC is supposed to lower
the privilege level of the access to 'Pgc6AonSecureScratchGroup05' so that the
register is accessible by the CPU. I think we should mention that here as
rationale for why we need to read Pgc6AonSecureScratchGroup05PrivLevelMask first
before accessing Pgc6AonSecureScratchGroup05.

Here we should say we need to read the GFW_BOOT only once we know that the
privilege level has been reduced by the FWSEC

> +                .read_protection_level0_enabled()
> +                && (regs::Pgc6AonSecureScratchGroup05::read(b).value() & 0xff) == 0xff

I find this Rust convention for camel casing long constants very unreadable and
troubling: Pgc6AonSecureScratchGroup05. I think we should relax this requirement
for sake of readability. Could the Rust community / maintainers provide some input?

Apart from readability, it also makes searching for the same register name a
nightmare with other code bases written in C.

Couple of ideas discussed:

1. May be have a macro that converts
REG(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK) ->
regs::Pgc6AonSecureScratchGroup05 ?
But not sure what it takes on the rust side to implement a macro like that.

2. Adding doc comments both in regs.rs during defining the register, and
possibly at the caller site. This still does address the issue fully.


> +        )?;
> +
> +        if gfw_booted {
> +            return Ok(());
> +        }
> +
> +        if timeout == 0 {
> +            return Err(ETIMEDOUT);
> +        }
> +        timeout -= 1;
> +
> +        // SAFETY: msleep should be safe to call with any parameter.
> +        unsafe { bindings::msleep(2) };
> +    }
> +}
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> index a08fb6599267a960f0e07b6efd0e3b6cdc296aa4..752ba4b0fcfe8d835d366570bb2f807840a196da 100644
> --- a/drivers/gpu/nova-core/driver.rs
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -10,7 +10,7 @@ pub(crate) struct NovaCore {
>      pub(crate) gpu: Gpu,
>  }
>  
> -const BAR0_SIZE: usize = 8;
> +const BAR0_SIZE: usize = 0x1000000;
>  pub(crate) type Bar0 = pci::Bar<BAR0_SIZE>;
>  
>  kernel::pci_device_table!(
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index 866c5992b9eb27735975bb4948e522bc01fadaa2..1f7799692a0ab042f2540e01414f5ca347ae9ecc 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -2,6 +2,7 @@
>  
>  use kernel::{device, devres::Devres, error::code::*, pci, prelude::*};
>  
> +use crate::devinit;
>  use crate::driver::Bar0;
>  use crate::firmware::Firmware;
>  use crate::regs;
> @@ -168,6 +169,10 @@ pub(crate) fn new(
>              spec.revision
>          );
>  
> +        // We must wait for GFW_BOOT completion before doing any significant setup on the GPU.
> +        devinit::wait_gfw_boot_completion(&bar)
> +            .inspect_err(|_| pr_err!("GFW boot did not complete"))?;
> +
>          Ok(pin_init!(Self { spec, bar, fw }))
>      }
>  }
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index 0eecd612e34efc046dad852e6239de6ffa5fdd62..878161e060f54da7738c656f6098936a62dcaa93 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -20,6 +20,7 @@ macro_rules! with_bar {
>      }
>  }
>  
> +mod devinit;
>  mod driver;
>  mod firmware;
>  mod gpu;
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index e315a3011660df7f18c0a3e0582b5845545b36e2..fd7096f0ddd4af90114dd1119d9715d2cd3aa2ac 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -13,3 +13,14 @@
>      7:4     major_rev => as u8, "major revision of the chip";
>      28:20   chipset => try_into Chipset, "chipset model"
>  );
> +
> +/* GC6 */

GC6 is a GPU low-power state. The VRAM is in self-refresh and GPU itself is
powered down (all power rails not required for self-refresh).

The following registers are exposed by the hardware unit in the GPU which
manages the GC6 state transitions:

> +
> +register!(Pgc6AonSecureScratchGroup05PrivLevelMask@0x00118128;
> +    0:0     read_protection_level0_enabled => as_bit bool
> +);

This is a privilege-level-mask register, which dictates whether the host CPU can
access the register.

> +
> +/* TODO: This is an array of registers. */
> +register!(Pgc6AonSecureScratchGroup05@0x00118234;
> +    31:0    value => as u32
> +);
> 

These are always-on registers always available including in the GC6 state (which
makes sense since we need to access it to know if we are far enough in the boot
process).

thanks,

 - Joel
Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
Posted by Danilo Krummrich 9 months, 3 weeks ago
On Mon, Apr 21, 2025 at 05:45:33PM -0400, Joel Fernandes wrote:
> On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
> > diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
> > --- /dev/null
> > +++ b/drivers/gpu/nova-core/devinit.rs
> > @@ -0,0 +1,40 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +//! Methods for device initialization.
> 
> Let us clarify what devinit means.
> 
> devinit is a sequence of register read/writes after reset that performs tasks
> such as:
> 1. Programming VRAM memory controller timings.
> 2. Power sequencing.
> 3. Clock and PLL configuration.
> 4. Thermal management.
> 5. Performs VRAM memory scrubbing (ECC initialization) - on some GPUs, it scrubs
> only part of memory and then kicks of 'async scrubbing'.
> 
> devinit itself is a 'script' which is interpreted by the PMU microcontroller of
> of the GPU by an interpreter program.
> 
> Note that devinit also needs to run during suspend/resume at runtime.

Thanks for writing this up. I fully agree that those things have to be
documented.

> 
> I talked with Alex and I could add a new patch on top of this patch to add these
> clarifying 'doc' comments as well. I will commit them to my git branch and send
> on top of this as needed, but Alex can feel free to decide to squash them as well.

Fine with both, whatever you guys prefer.

> 
> > +
> > +use kernel::bindings;
> > +use kernel::devres::Devres;
> > +use kernel::prelude::*;
> > +
> > +use crate::driver::Bar0;
> > +use crate::regs;
> > +
> > +/// Wait for devinit FW completion.
> > +///
> > +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
> > +/// considered unusable until this step is completed, so it must be waited on very early during
> > +/// driver initialization.
> > +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
> 
> To reduce acronym soup, we can clarify gfw means 'GPU firmware', it is a broad
> term used for VBIOS ROM components several of which execute before the driver
> loads. Perhaps that part of comment can be 'the GPU firmware (gfw) code'.

Yes, we should absolutely explain acronyms as well as use consistent and defined
terminology when referring to things.

I think we should put both into Documentation/gpu/nova/ and add the
corresponding pointers in the code.

> I find this Rust convention for camel casing long constants very unreadable and
> troubling: Pgc6AonSecureScratchGroup05. I think we should relax this requirement
> for sake of readability. Could the Rust community / maintainers provide some input?
> 
> Apart from readability, it also makes searching for the same register name a
> nightmare with other code bases written in C.
> 
> Couple of ideas discussed:
> 
> 1. May be have a macro that converts
> REG(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK) ->
> regs::Pgc6AonSecureScratchGroup05 ?
> But not sure what it takes on the rust side to implement a macro like that.
> 
> 2. Adding doc comments both in regs.rs during defining the register, and
> possibly at the caller site. This still does address the issue fully.

If that addresses your concern, it sounds totally reasonable to me.
Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
Posted by Alexandre Courbot 9 months, 3 weeks ago
Hi Joel, Danilo,

On Tue Apr 22, 2025 at 8:28 PM JST, Danilo Krummrich wrote:
> On Mon, Apr 21, 2025 at 05:45:33PM -0400, Joel Fernandes wrote:
>> On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
>> > diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
>> > new file mode 100644
>> > index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
>> > --- /dev/null
>> > +++ b/drivers/gpu/nova-core/devinit.rs
>> > @@ -0,0 +1,40 @@
>> > +// SPDX-License-Identifier: GPL-2.0
>> > +
>> > +//! Methods for device initialization.
>> 
>> Let us clarify what devinit means.
>> 
>> devinit is a sequence of register read/writes after reset that performs tasks
>> such as:
>> 1. Programming VRAM memory controller timings.
>> 2. Power sequencing.
>> 3. Clock and PLL configuration.
>> 4. Thermal management.
>> 5. Performs VRAM memory scrubbing (ECC initialization) - on some GPUs, it scrubs
>> only part of memory and then kicks of 'async scrubbing'.
>> 
>> devinit itself is a 'script' which is interpreted by the PMU microcontroller of
>> of the GPU by an interpreter program.
>> 
>> Note that devinit also needs to run during suspend/resume at runtime.
>
> Thanks for writing this up. I fully agree that those things have to be
> documented.
>
>> 
>> I talked with Alex and I could add a new patch on top of this patch to add these
>> clarifying 'doc' comments as well. I will commit them to my git branch and send
>> on top of this as needed, but Alex can feel free to decide to squash them as well.
>
> Fine with both, whatever you guys prefer.

If that works with you, I will put Joel's patches improving the
documentation right after mines adding the code in the next revision. I
know this ideally should be a single patch, but researching this stuff
(and producing a proper writeup) is quite involved and a separate kind
of task from the quickly-translate-code-while-peeking-at-OpenRM work
that I did. 

>
>> 
>> > +
>> > +use kernel::bindings;
>> > +use kernel::devres::Devres;
>> > +use kernel::prelude::*;
>> > +
>> > +use crate::driver::Bar0;
>> > +use crate::regs;
>> > +
>> > +/// Wait for devinit FW completion.
>> > +///
>> > +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
>> > +/// considered unusable until this step is completed, so it must be waited on very early during
>> > +/// driver initialization.
>> > +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
>> 
>> To reduce acronym soup, we can clarify gfw means 'GPU firmware', it is a broad
>> term used for VBIOS ROM components several of which execute before the driver
>> loads. Perhaps that part of comment can be 'the GPU firmware (gfw) code'.
>
> Yes, we should absolutely explain acronyms as well as use consistent and defined
> terminology when referring to things.
>
> I think we should put both into Documentation/gpu/nova/ and add the
> corresponding pointers in the code.

SGTM.

>
>> I find this Rust convention for camel casing long constants very unreadable and
>> troubling: Pgc6AonSecureScratchGroup05. I think we should relax this requirement
>> for sake of readability. Could the Rust community / maintainers provide some input?
>> 
>> Apart from readability, it also makes searching for the same register name a
>> nightmare with other code bases written in C.
>> 
>> Couple of ideas discussed:
>> 
>> 1. May be have a macro that converts
>> REG(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK) ->
>> regs::Pgc6AonSecureScratchGroup05 ?
>> But not sure what it takes on the rust side to implement a macro like that.
>> 
>> 2. Adding doc comments both in regs.rs during defining the register, and
>> possibly at the caller site. This still does address the issue fully.
>
> If that addresses your concern, it sounds totally reasonable to me.

Sorry, I'm having trouble understanding what you guys are agreeing on.
:)

The most radical option would be to define the registers directly as
capital snake-case (NV_PGC6_...), basically a 1:1 match with OpenRM.
This would be the easiest way to cross-reference, but goes against the
Rust naming conventions. If we go all the way, this also means the field
accessors would be capital snake-case, unless we figure out a smart
macro to work this around...
Re: [PATCH 08/16] gpu: nova-core: wait for GFW_BOOT completion
Posted by Joel Fernandes 9 months, 3 weeks ago
On 4/22/2025 9:06 AM, Alexandre Courbot wrote:
> On Tue Apr 22, 2025 at 8:28 PM JST, Danilo Krummrich wrote:
>> On Mon, Apr 21, 2025 at 05:45:33PM -0400, Joel Fernandes wrote:
>>> On 4/20/2025 8:19 AM, Alexandre Courbot wrote:
>>>> diff --git a/drivers/gpu/nova-core/devinit.rs b/drivers/gpu/nova-core/devinit.rs
>>>> new file mode 100644
>>>> index 0000000000000000000000000000000000000000..ee5685aff845aa97d6b0fbe9528df9a7ba274b2c
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/nova-core/devinit.rs
>>>> @@ -0,0 +1,40 @@
>>>> +// SPDX-License-Identifier: GPL-2.0
>>>> +
>>>> +//! Methods for device initialization.
>>>
>>> Let us clarify what devinit means.
>>>
>>> devinit is a sequence of register read/writes after reset that performs tasks
>>> such as:
>>> 1. Programming VRAM memory controller timings.
>>> 2. Power sequencing.
>>> 3. Clock and PLL configuration.
>>> 4. Thermal management.
>>> 5. Performs VRAM memory scrubbing (ECC initialization) - on some GPUs, it scrubs
>>> only part of memory and then kicks of 'async scrubbing'.
>>>
>>> devinit itself is a 'script' which is interpreted by the PMU microcontroller of
>>> of the GPU by an interpreter program.
>>>
>>> Note that devinit also needs to run during suspend/resume at runtime.
>>
>> Thanks for writing this up. I fully agree that those things have to be
>> documented.

Thanks.

>>> I talked with Alex and I could add a new patch on top of this patch to add these
>>> clarifying 'doc' comments as well. I will commit them to my git branch and send
>>> on top of this as needed, but Alex can feel free to decide to squash them as well.
>>
>> Fine with both, whatever you guys prefer.
> 
> If that works with you, I will put Joel's patches improving the
> documentation right after mines adding the code in the next revision. I
> know this ideally should be a single patch, but researching this stuff
> (and producing a proper writeup) is quite involved and a separate kind
> of task from the quickly-translate-code-while-peeking-at-OpenRM work
> that I did. 

From my side, this makes sense to me.

>>>> +
>>>> +use kernel::bindings;
>>>> +use kernel::devres::Devres;
>>>> +use kernel::prelude::*;
>>>> +
>>>> +use crate::driver::Bar0;
>>>> +use crate::regs;
>>>> +
>>>> +/// Wait for devinit FW completion.
>>>> +///
>>>> +/// Upon reset, the GPU runs some firmware code to setup its core parameters. Most of the GPU is
>>>> +/// considered unusable until this step is completed, so it must be waited on very early during
>>>> +/// driver initialization.
>>>> +pub(crate) fn wait_gfw_boot_completion(bar: &Devres<Bar0>) -> Result<()> {
>>>
>>> To reduce acronym soup, we can clarify gfw means 'GPU firmware', it is a broad
>>> term used for VBIOS ROM components several of which execute before the driver
>>> loads. Perhaps that part of comment can be 'the GPU firmware (gfw) code'.
>>
>> Yes, we should absolutely explain acronyms as well as use consistent and defined
>> terminology when referring to things.
>>
>> I think we should put both into Documentation/gpu/nova/ and add the
>> corresponding pointers in the code.
> 
> SGTM.

Ack.

>>> I find this Rust convention for camel casing long constants very unreadable and
>>> troubling: Pgc6AonSecureScratchGroup05. I think we should relax this requirement
>>> for sake of readability. Could the Rust community / maintainers provide some input?
>>>
>>> Apart from readability, it also makes searching for the same register name a
>>> nightmare with other code bases written in C.
>>>
>>> Couple of ideas discussed:
>>>
>>> 1. May be have a macro that converts
>>> REG(NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK) ->
>>> regs::Pgc6AonSecureScratchGroup05 ?
>>> But not sure what it takes on the rust side to implement a macro like that.
>>>
>>> 2. Adding doc comments both in regs.rs during defining the register, and
>>> possibly at the caller site. This still does address the issue fully.
>>
>> If that addresses your concern, it sounds totally reasonable to me.
> 
> Sorry, I'm having trouble understanding what you guys are agreeing on.
> :)
> 
> The most radical option would be to define the registers directly as
> capital snake-case (NV_PGC6_...), basically a 1:1 match with OpenRM.
> This would be the easiest way to cross-reference, but goes against the
> Rust naming conventions. If we go all the way, this also means the field
> accessors would be capital snake-case, unless we figure out a smart
> macro to work this around...

I think the accessors can still be lower case, because we can do something like:

pgc6_reg = NV_PGC6_AON_SECURE_SCRATCH_GROUP_05_PRIV_LEVEL_MASK;

pgc6_reg.field_accessor();

?

Since rust convention already allows capital snake-case for statics and
constants, I think we should aim for this exception for Nova register defs
before discussing other options (i.e. directly replacing the definition of the
register from camel case to capital snake case) as is consistent with Nouveau,
Open RM etc. I think it will make things so much easier (and probably less
error-prone and maintainable) such as translation of GSP headers to Rust, string
search across Nouveau, Open RM repositories etc.

Thoughts?

thanks,

 - Joel