[v1] gpu: nova-core: gsp: add RM control command infrastructure

[PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles

Posted by Eliot Courtney 1 month, 1 week ago

Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
needed for RM control calls.

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
 drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
 2 files changed, 26 insertions(+)

diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 4740cda0b51c..2cadfcaf9a8a 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
 /// The reply from the GSP to the [`GetGspInfo`] command.
 pub(crate) struct GetGspStaticInfoReply {
     gpu_name: [u8; 64],
+    h_client: u32,
+    h_subdevice: u32,
 }
 
 impl MessageFromGsp for GetGspStaticInfoReply {
@@ -210,6 +212,8 @@ fn read(
     ) -> Result<Self, Self::InitError> {
         Ok(GetGspStaticInfoReply {
             gpu_name: msg.gpu_name_str(),
+            h_client: msg.h_internal_client(),
+            h_subdevice: msg.h_internal_subdevice(),
         })
     }
 }
@@ -236,6 +240,18 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
             .to_str()
             .map_err(GpuNameError::InvalidUtf8)
     }
+
+    /// Returns the internal client handle allocated by GSP-RM.
+    #[expect(dead_code)]
+    pub(crate) fn h_client(&self) -> u32 {
+        self.h_client
+    }
+
+    /// Returns the internal subdevice handle allocated by GSP-RM.
+    #[expect(dead_code)]
+    pub(crate) fn h_subdevice(&self) -> u32 {
+        self.h_subdevice
+    }
 }
 
 /// Send the [`GetGspInfo`] command and awaits for its reply.
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index 67f44421fcc3..aaf3509a0207 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -115,6 +115,16 @@ impl GspStaticConfigInfo {
     pub(crate) fn gpu_name_str(&self) -> [u8; 64] {
         self.0.gpuNameString
     }
+
+    /// Returns the internal client handle allocated by GSP-RM.
+    pub(crate) fn h_internal_client(&self) -> u32 {
+        self.0.hInternalClient
+    }
+
+    /// Returns the internal subdevice handle allocated by GSP-RM.
+    pub(crate) fn h_internal_subdevice(&self) -> u32 {
+        self.0.hInternalSubdevice
+    }
 }
 
 // SAFETY: Padding is explicit and will not contain uninitialized data.

-- 
2.53.0

Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles

Posted by Joel Fernandes 4 weeks ago

On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
> needed for RM control calls.
> 
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>  drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
> index 4740cda0b51c..2cadfcaf9a8a 100644
> --- a/drivers/gpu/nova-core/gsp/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/commands.rs
> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>  /// The reply from the GSP to the [`GetGspInfo`] command.
>  pub(crate) struct GetGspStaticInfoReply {
>      gpu_name: [u8; 64],
> +    h_client: u32,
> +    h_subdevice: u32,

I would rather have more descriptive names please. 'client_handle',
'subdevice_handle'. Also some explanation of what a client and a sub-device
mean somewhere in the comments or documentation would be nice.

>  }
>  
>  impl MessageFromGsp for GetGspStaticInfoReply {
> @@ -210,6 +212,8 @@ fn read(
>      ) -> Result<Self, Self::InitError> {
>          Ok(GetGspStaticInfoReply {
>              gpu_name: msg.gpu_name_str(),
> +            h_client: msg.h_internal_client(),
> +            h_subdevice: msg.h_internal_subdevice(),
>          })
>      }
>  }
> @@ -236,6 +240,18 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
>              .to_str()
>              .map_err(GpuNameError::InvalidUtf8)
>      }
> +
> +    /// Returns the internal client handle allocated by GSP-RM.
> +    #[expect(dead_code)]
> +    pub(crate) fn h_client(&self) -> u32 {
> +        self.h_client
> +    }
> +
> +    /// Returns the internal subdevice handle allocated by GSP-RM.
> +    #[expect(dead_code)]
> +    pub(crate) fn h_subdevice(&self) -> u32 {
> +        self.h_subdevice
> +    }

Same here.

>  }
>  
>  /// Send the [`GetGspInfo`] command and awaits for its reply.
> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
> index 67f44421fcc3..aaf3509a0207 100644
> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
> @@ -115,6 +115,16 @@ impl GspStaticConfigInfo {
>      pub(crate) fn gpu_name_str(&self) -> [u8; 64] {
>          self.0.gpuNameString
>      }
> +
> +    /// Returns the internal client handle allocated by GSP-RM.
> +    pub(crate) fn h_internal_client(&self) -> u32 {
> +        self.0.hInternalClient

What is the difference between an internal handle and a non-internal one?

And again, descriptive function names.

> +    }
> +
> +    /// Returns the internal subdevice handle allocated by GSP-RM.
> +    pub(crate) fn h_internal_subdevice(&self) -> u32 {

Same here.

thanks,

--
Joel Fernandes

Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles

Posted by Joel Fernandes 4 weeks ago


> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
> 
> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>> needed for RM control calls.
>> 
>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>> ---
>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>> 2 files changed, 26 insertions(+)
>> 
>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>> index 4740cda0b51c..2cadfcaf9a8a 100644
>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>> /// The reply from the GSP to the [`GetGspInfo`] command.
>> pub(crate) struct GetGspStaticInfoReply {
>>     gpu_name: [u8; 64],
>> +    h_client: u32,
>> +    h_subdevice: u32,
> 
> I would rather have more descriptive names please. 'client_handle',
> 'subdevice_handle'. Also some explanation of what a client and a sub-device
> mean somewhere in the comments or documentation would be nice.

Also just checking if we can have repr wrappers around the u32 for clients /
handles.  These concepts are quite common in Nvidia drivers so we should
probably create new types for them.

And if we can, please document the terminology (device, subdevice, client
handles, etc.) and perhaps even add new Documentation/ entries.

Thoughts?

-- 
Joel Fernandes

Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles

Posted by John Hubbard 4 weeks ago

On 3/9/26 4:41 PM, Joel Fernandes wrote:
>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>> needed for RM control calls.
>>>
>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>> ---
>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>> 2 files changed, 26 insertions(+)
>>>
>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>> pub(crate) struct GetGspStaticInfoReply {
>>>     gpu_name: [u8; 64],
>>> +    h_client: u32,
>>> +    h_subdevice: u32,
>>
>> I would rather have more descriptive names please. 'client_handle',

Maybe it's better to mirror the Open RM names, which are ancient and
well known in those circles. Changing them at this point is probably
going to result in a slightly worse situation, because there are
probably millions of lines of code out there that use the existing
nomenclature.

However...

>> 'subdevice_handle'. Also some explanation of what a client and a sub-device
>> mean somewhere in the comments or documentation would be nice.

Yes, although I expect you can simply refer to some well known pre-
existing documentation from NVIDIA for that!

> 
> Also just checking if we can have repr wrappers around the u32 for clients /
> handles.  These concepts are quite common in Nvidia drivers so we should
> probably create new types for them.
> 
> And if we can please document the terminology, device, subset, clients handles
> etc. and add new Documentation/ entries even.
> 
> Thoughts?
> 

This has already been done countless times by countless people I
think, and so we don't need to do it again. Just refer to existing
docs.

btw, as an aside:

I'm checking with our GSP firmware team to be sure, but my
understanding is that much of this is actually very temporary. Because
the GSP team does not want to continue on with this model in which
GSP has to maintain that kind of state: an internal hierarchy of
objects. Instead, they are hoping to move to an API in which nova
would directly refer to each object/item in GSP. And subdevice, in 
particular, is an old SLI term that no one wants to keep around
either. It was an ugly hack in Open RM that took more than a decade
to recover from, by moving the SLI concept out to user space.

So even though we should document what we're doing now, I would like
to also note that we suspect a certain amount of this will 
disappear, to be replaced with a somewhat simpler API, in the 
somewhat near future.

thanks,
-- 
John Hubbard

Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles

Posted by Joel Fernandes 4 weeks ago

Hi John,

> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
> 
> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>>> needed for RM control calls.
>>>> 
>>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>>> ---
>>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>>> 2 files changed, 26 insertions(+)
>>>> 
>>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>>> pub(crate) struct GetGspStaticInfoReply {
>>>>    gpu_name: [u8; 64],
>>>> +    h_client: u32,
>>>> +    h_subdevice: u32,
>>> 
>>> I would rather have more descriptive names please. 'client_handle',
> 
> Maybe it's better to mirror the Open RM names, which are ancient and
> well known in those circles. Changing them at this point is probably
> going to result in a slightly worse situation, because there are
> probably millions of lines of code out there that use the existing
> nomenclature.

I have to disagree a bit here. Saying h_ in code is a bit meaningless:
there is no mention of the word "handle" anywhere near these fields.
h_ could mean "higher", "hardware", or any number of things. The only
reason I know it means "handle" is because of expertise with Nvidia
drivers. The `_handle` suffix is self-documenting; `h_` is not.

> 
> However...
> 
>>> 'subdevice_handle'. Also some explanation of what a client and a sub-device
>>> mean somewhere in the comments or documentation would be nice.
> 
> Yes, although I expect you can simply refer to some well known pre-
> existing documentation from NVIDIA for that!

I apologize but I am a bit concerned with this approach because it feels we
are drifting into black box dev without encouraging more code comments,
documentation and cleaner code.

We need to make the driver as readable and well documented as possible, we
do not want another Nouveau with magic numbers and magic variable names.

At the very least, I would expect one or two lines of code comments explaining
what a handle is, what a client is, and how an internal client handle differs
from a non-internal one. I guess I do not understand the hesitation?

Sure external documentation is good but to clarify, I am referring to a few
code comments. That's not much to ask right?

Elaborate documentation files in kernel can be optional but there is
probably no harm in citing external references from in-kernel docs too. But
again I was more concerned about code comments and variable names.

> 
>> 
>> Also just checking if we can have repr wrappers around the u32 for clients /
>> handles.  These concepts are quite common in Nvidia drivers so we should
>> probably create new types for them.
>> 
>> And if we can please document the terminology, device, subset, clients handles
>> etc. and add new Documentation/ entries even.
>> 
>> Thoughts?
>> 
> 
> This has already been done countless times by countless people I
> think, and so we don't need to do it again. Just refer to existing
> docs.

Not sure if you are referring to nova-core repr or docs, but the repr
technique to wrap raw integers is pretty common in the firmware layer of
nova-core.

> 
> btw, as an aside:
> 
> I'm checking with our GSP firmware team to be sure, but my
> understanding is that much of this is actually very temporary. Because
> the GSP team does not want to continue on with this model in which
> GSP has to maintain that kind of state: an internal hierarchy of
> objects. Instead, they are hoping to move to an API in which nova
> would directly refer to each object/item in GSP. And subdevice, in

Even if this is "temporary" we can't just say "Oh, I'll just add these
legacy things and badly write them, because they're going anyway at some
unpredictable point in the future". After all, this is the Linux kernel we
are talking about :-). The bar is quite high.

> particular, is an old SLI term that no one wants to keep around
> either. It was an ugly hack in Open RM that took more than a decade
> to recover from, by moving the SLI concept out to user space.
> 
> So even though we should document what we're doing now, I would like
> to also note that we suspect a certain amount of this will
> disappear, to be replaced with a somewhat simpler API, in the
> somewhat near future.

Sure, but client handles are a broader GPU driver concept even if this
particular one is GSP-internal. We are certainly going to need a rust type
to represent a client right? Other GPU drivers also have concept of
clients. The point is not that `hInternalClient` represents a GPU user
today, it may well be temporary as you note, but that using
`#[repr(transparent)]` new types for raw u32 handles costs nothing and
makes the code better and more readable. This pattern is already
well-established in nova-core itself: see `PackedRegistryEntry` for example
being a repr type. IMHO, there should be little reason that we need the
struct to have magic u32 numbers in Rust code for concepts like "handles".

All I am saying is let us think this through before just doing the shortcut
of using u32 for client handles, etc. Rust gives us rich types, let's use them.
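
As a minimal sketch of the `#[repr(transparent)]` newtype idea described above:
the type names `ClientHandle` and `SubdeviceHandle` and the `from_raw`/`as_raw`
helpers are assumptions made for illustration, not names taken from nova-core.
The compile-time assertion shows that the wrapper has the same size as the raw
`u32` it wraps.

```rust
/// Handle to an RM client object.
///
/// Hypothetical type for illustration; it wraps the raw `u32` handle value.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ClientHandle(u32);

impl ClientHandle {
    /// Wraps a raw handle value received from the firmware.
    pub const fn from_raw(raw: u32) -> Self {
        Self(raw)
    }

    /// Returns the raw value, e.g. for writing into a firmware message.
    pub const fn as_raw(self) -> u32 {
        self.0
    }
}

/// Handle to an RM subdevice object (hypothetical type for illustration).
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SubdeviceHandle(u32);

impl SubdeviceHandle {
    /// Wraps a raw handle value received from the firmware.
    pub const fn from_raw(raw: u32) -> Self {
        Self(raw)
    }

    /// Returns the raw value, e.g. for writing into a firmware message.
    pub const fn as_raw(self) -> u32 {
        self.0
    }
}

// Compile-time check: the wrappers occupy exactly as much space as a `u32`.
const _: () = assert!(core::mem::size_of::<ClientHandle>() == core::mem::size_of::<u32>());
const _: () = assert!(core::mem::size_of::<SubdeviceHandle>() == core::mem::size_of::<u32>());
```

With two distinct types, a function that expects a `ClientHandle` rejects a
`SubdeviceHandle` at compile time, which a bare `u32` cannot do.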

thanks,

-- 
Joel Fernandes

Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles

Posted by Alexandre Courbot 4 weeks ago

On Tue Mar 10, 2026 at 11:17 AM JST, Joel Fernandes wrote:
> Hi John,
>
>> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>> 
>> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>>>> needed for RM control calls.
>>>>> 
>>>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>>>> ---
>>>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>>>> 2 files changed, 26 insertions(+)
>>>>> 
>>>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>>>> pub(crate) struct GetGspStaticInfoReply {
>>>>>    gpu_name: [u8; 64],
>>>>> +    h_client: u32,
>>>>> +    h_subdevice: u32,
>>>> 
>>>> I would rather have more descriptive names please. 'client_handle',
>> 
>> Maybe it's better to mirror the Open RM names, which are ancient and
>> well known in those circles. Changing them at this point is probably
>> going to result in a slightly worse situation, because there are
>> probably millions of lines of code out there that use the existing
>> nomenclature.
>
> I have to disagree a bit here. Saying h_ in code is a bit meaningless:
> there is no mention of the word "handle" anywhere near these fields.
> h_ could mean "higher", "hardware", or any number of things. The only
> reason I know it means "handle" is because of expertise with Nvidia
> drivers. The `_handle` suffix is self-documenting; `h_` is not.

I tend to agree with Joel that we should try to avoid NVisms when they
get in the way of clarity - that's what we did so far actually. We can
always mention the RM name of fields in the doccomments.

The only exception being generated bindings, but they reside in their
own module and are opaque to the rest of the driver.
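
A small sketch of what "descriptive names, with the RM name in the doccomments"
could look like. The struct name `StaticInfo`, its constructor, and the field
and accessor names are hypothetical and chosen for illustration; only the RM
field names `hInternalClient` and `hInternalSubdevice` come from the patch.

```rust
/// Subset of the static information reported by the GSP (illustrative only).
pub struct StaticInfo {
    /// Handle of the internal client allocated by GSP-RM.
    ///
    /// Named `hInternalClient` in the RM structure.
    client_handle: u32,
    /// Handle of the internal subdevice allocated by GSP-RM.
    ///
    /// Named `hInternalSubdevice` in the RM structure.
    subdevice_handle: u32,
}

impl StaticInfo {
    /// Builds the structure from the two raw handle values.
    pub fn new(client_handle: u32, subdevice_handle: u32) -> Self {
        Self {
            client_handle,
            subdevice_handle,
        }
    }

    /// Returns the internal client handle (`hInternalClient` in RM).
    pub fn client_handle(&self) -> u32 {
        self.client_handle
    }

    /// Returns the internal subdevice handle (`hInternalSubdevice` in RM).
    pub fn subdevice_handle(&self) -> u32 {
        self.subdevice_handle
    }
}
```

A reader searching for the RM name still finds the field through the
doccomment, while the code itself reads without NVIDIA-specific prefixes.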

Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles

Posted by Eliot Courtney 4 weeks ago

On Tue Mar 10, 2026 at 11:36 AM JST, Alexandre Courbot wrote:
> On Tue Mar 10, 2026 at 11:17 AM JST, Joel Fernandes wrote:
>> Hi John,
>>
>>> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>> 
>>> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>>>>> needed for RM control calls.
>>>>>> 
>>>>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>>>>> ---
>>>>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>>>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>>>>> 2 files changed, 26 insertions(+)
>>>>>> 
>>>>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>>>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>>>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>>>>> pub(crate) struct GetGspStaticInfoReply {
>>>>>>    gpu_name: [u8; 64],
>>>>>> +    h_client: u32,
>>>>>> +    h_subdevice: u32,
>>>>> 
>>>>> I would rather have more descriptive names please. 'client_handle',
>>> 
>>> Maybe it's better to mirror the Open RM names, which are ancient and
>>> well known in those circles. Changing them at this point is probably
>>> going to result in a slightly worse situation, because there are
>>> probably millions of lines of code out there that use the existing
>>> nomenclature.
>>
>> I have to disagree a bit here. Saying h_ in code is a bit meaningless:
>> there is no mention of the word "handle" anywhere near these fields.
>> h_ could mean "higher", "hardware", or any number of things. The only
>> reason I know it means "handle" is because of expertise with Nvidia
>> drivers. The `_handle` suffix is self-documenting; `h_` is not.
>
> I tend to agree with Joel that we should try to avoid NVisms when they
> get in the way of clarity - that's what we did so far actually. We can
> always mention the RM name of fields in the doccomments.
>
> The only exception being generated bindings, but they reside in their
> own module and are opaque to the rest of the driver.

Thanks everyone for your comments so far. I don't mind about the naming,
I can see both arguments. But we have two votes for different names for
h_client/h_subdevice/etc so far, so I'll go with that for now. If anyone
has other suggested names, I'm keen to hear them.

Having newtypes for client/device/subdevice/object is easy to do and
will help a lot with calling functions that take a bunch of these as
arguments so I think that is a great idea.

Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles

Posted by Danilo Krummrich 4 weeks ago

On Tue Mar 10, 2026 at 5:02 AM CET, Eliot Courtney wrote:
> On Tue Mar 10, 2026 at 11:36 AM JST, Alexandre Courbot wrote:
>> On Tue Mar 10, 2026 at 11:17 AM JST, Joel Fernandes wrote:
>>>> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>>> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>>>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>>>>> +    h_client: u32,
>>>>>>> +    h_subdevice: u32,
>>>>>> 
>>>>>> I would rather have more descriptive names please. 'client_handle',
>>>> 
>>>> Maybe it's better to mirror the Open RM names, which are ancient and
>>>> well known in those circles. Changing them at this point is probably
>>>> going to result in a slightly worse situation, because there are
>>>> probably millions of lines of code out there that use the existing
>>>> nomenclature.
>>>
>>> I have to disagree a bit here. Saying h_ in code is a bit meaningless:
>>> there is no mention of the word "handle" anywhere near these fields.
>>> h_ could mean "higher", "hardware", or any number of things. The only
>>> reason I know it means "handle" is because of expertise with Nvidia
>>> drivers. The `_handle` suffix is self-documenting; `h_` is not.
>>
>> I tend to agree with Joel that we should try to avoid NVisms when they
>> get in the way of clarity - that's what we did so far actually. We can
>> always mention the RM name of fields in the doccomments.
>>
>> The only exception being generated bindings, but they reside in their
>> own module and are opaque to the rest of the driver.
>
> Thanks everyone for your comments so far. I don't mind about the naming,
> I can see both arguments. But we have two votes for different names for
> h_client/h_subdevice/etc so far, so I'll go with that for now. If anyone
> has any other suggested names keen to hear.

Please don't mirror names from other drivers (regardless whether it is Open RM,
nouveau, etc.) just to be consistent with the naming of other drivers. If the
name is good, great, otherwise please pick what makes the most sense.

In this case h_client and h_subdevice aren't great names at all. I don't want to
establish a pattern where we prefix handles with 'h' to distinguish them from
other values.

There is no reason to have such a convention if we can just represent a
handle through a proper type (which has other advantages as well), i.e. we don't
need to resolve any ambiguity due to a primitive type.

For instance, we could have a gsp::Handle type and then this could just become:

	client: gsp::Handle,
	subdev: gsp::Handle,

If we want type safety to a point where we can't even confuse different types of
handles, we could give them a type state, so we don't have to re-implement the
same thing multiple times, e.g.

	client: gsp::Handle<Client>,
	subdev: gsp::Handle<Subdev>,

I.e. when a method expects a client handle, nothing else can be passed to it.
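
One possible way to sketch the typed-handle idea is a generic wrapper with
marker types. Everything here is an assumption for illustration: the marker
types `Client` and `Subdev`, the `from_raw`/`as_raw` helpers, and the
`client_raw` consumer are hypothetical, and `Handle` is written as a free type
rather than inside a `gsp` module.

```rust
use core::marker::PhantomData;

/// Marker for client handles. Uninhabited: it only exists at the type level.
pub enum Client {}

/// Marker for subdevice handles. Uninhabited: it only exists at the type level.
pub enum Subdev {}

/// A raw `u32` handle tagged with the kind of object it refers to.
#[repr(transparent)]
pub struct Handle<K> {
    raw: u32,
    _kind: PhantomData<K>,
}

// Implemented by hand so that `K` itself does not need to be `Clone`/`Copy`.
impl<K> Clone for Handle<K> {
    fn clone(&self) -> Self {
        *self
    }
}

impl<K> Copy for Handle<K> {}

impl<K> Handle<K> {
    /// Wraps a raw handle value.
    pub fn from_raw(raw: u32) -> Self {
        Self {
            raw,
            _kind: PhantomData,
        }
    }

    /// Returns the raw handle value.
    pub fn as_raw(self) -> u32 {
        self.raw
    }
}

/// Hypothetical consumer that only accepts a client handle.
///
/// Passing a `Handle<Subdev>` here is a type error at compile time.
pub fn client_raw(client: Handle<Client>) -> u32 {
    client.as_raw()
}
```

The marker type adds no storage: `PhantomData<K>` is zero-sized, so
`Handle<K>` stays the size of a `u32` while the shared implementation is
written only once.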

- Danilo

Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles

Posted by John Hubbard 4 weeks ago

On 3/9/26 7:17 PM, Joel Fernandes wrote:
> Hi John,
> 
>> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>>>> needed for RM control calls.
>>>>>
>>>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>>>> ---
>>>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>>>> 2 files changed, 26 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>>>> pub(crate) struct GetGspStaticInfoReply {
>>>>>    gpu_name: [u8; 64],
>>>>> +    h_client: u32,
>>>>> +    h_subdevice: u32,
>>>>
>>>> I would rather have more descriptive names please. 'client_handle',
>>
>> Maybe it's better to mirror the Open RM names, which are ancient and
>> well known in those circles. Changing them at this point is probably
>> going to result in a slightly worse situation, because there are
>> probably millions of lines of code out there that use the existing
>> nomenclature.
> 
> I have to disagree a bit here. Saying h_ in code is a bit meaningless:
> there is no mention of the word "handle" anywhere near these fields.
> h_ could mean "higher", "hardware", or any number of things. The only
> reason I know it means "handle" is because of expertise with Nvidia
> drivers. The `_handle` suffix is self-documenting; `h_` is not.

Maybe if we write h_client in the code, and "handle..." in a comment
above it? Or the other way around. Just to make the connection to
the Open RM client code that is out there.

> 
>>
>> However...
>>
>>>> 'subdevice_handle'. Also some explanation of what a client and a sub-device
>>>> mean somewhere in the comments or documentation would be nice.
>>
>> Yes, although I expect you can simply refer to some well known pre-
>> existing documentation from NVIDIA for that!
> 
> I apologize but I am a bit concerned with this approach because it feels we
> are drifting into black box dev without encouraging more code comments,
> documentation and cleaner code.
> 
> We need to make the driver as readable and well documented as possible, we
> do not want another Nouveau with magic numbers and magic variable names.
> 
> Very least I would expect at least one or two lines of code comments of
> what is a handle, what is a client, what is an internal client handle
> versus not. I guess I do not understand what is the hesitation?
> 
> Sure external documentation is good but to clarify, I am referring to a few
> code comments. That's not much to ask right?
> 

Sure, a short amount of comments would help, that sounds good.


> Elaborate documentation files in kernel can be optional but there is
> probably no harm in citing external references from in-kernel docs too. But
> again I was more concerned about code comments and variable names.
> 
>>
>>>
>>> Also just checking if we can have repr wrappers around the u32 for clients /
>>> handles.  These concepts are quite common in Nvidia drivers so we should
>>> probably create new types for them.
>>>
>>> And if we can please document the terminology, device, subset, clients handles
>>> etc. and add new Documentation/ entries even.
>>>
>>> Thoughts?
>>>
>>
>> This has already been done countless times by countless people I
>> think, and so we don't need to do it again. Just refer to existing
>> docs.
> 
> Not sure if you are referring to nova-core repr or docs, but the repr
> technique to wrap raw integers is pretty common in the firmware layer of
> nova-core.
> 
>>
>> btw, as an aside:
>>
>> I'm checking with our GSP firmware team to be sure, but my
>> understanding is that much of this is actually very temporary. Because
>> the GSP team does not want to continue on with this model in which
>> GSP has to maintain that kind of state: an internal hierarchy of
>> objects. Instead, they are hoping to move to an API in which nova
>> would directly refer to each object/item in GSP. And subdevice, in
> 
> Even if this is "temporary" we can't just say "Oh, I'll just add these
> legacy things and badly write them, because they're going anyway at some
> unpredictable point in the future". After all, this is the Linux kernel we
> are talking about :-). The bar is quite high.

I am not saying any such thing.

> 
>> particular, is an old SLI term that no one wants to keep around
>> either. It was an ugly hack in Open RM that took more than a decade
>> to recover from, by moving the SLI concept out to user space.
>>
>> So even though we should document what we're doing now, I would like
>> to also note that we suspect a certain amount of this will
>> disappear, to be replaced with a somewhat simpler API, in the
>> somewhat near future.
> 
> Sure, but client handles are a broader GPU driver concept even if this
> particular one is GSP-internal. We are certainly going to need a rust type
> to represent a client right? Other GPU drivers also have concept of
> clients. The point is not that `hInternalClient` represents a GPU user
> today, it may well be temporary as you note, but that using
> `#[repr(transparent)]` new types for raw u32 handles costs nothing and
> makes the code better and more readable. This pattern is already
> well-established in nova-core itself: see `PackedRegistryEntry` for example
> being a repr type. IMHO, there should be little reason that we need the
> struct to have magic u32 numbers in Rust code for concepts like "handles".
> 

We will debate this when it shows up, perhaps I should not have 
mentioned it, other than to remind Eliot to make it easy to delete.

> All I am saying is let us think this through before just doing the shortcut
> of using u32 for client handles, etc. Rust gives us rich types, let's use them.
> 

ohh, that's a whole other topic and idea. I wasn't going into that,
but feel free to explore if Rust can make it better.

thanks,
-- 
John Hubbard

Re: [PATCH 3/9] gpu: nova-core: gsp: expose GSP-RM internal client and subdevice handles

Posted by Joel Fernandes 3 weeks, 6 days ago


On 3/9/2026 10:29 PM, John Hubbard wrote:
> On 3/9/26 7:17 PM, Joel Fernandes wrote:
>> Hi John,
>>
>>> On Mar 9, 2026, at 8:06 PM, John Hubbard <jhubbard@nvidia.com> wrote:
>>>
>>> On 3/9/26 4:41 PM, Joel Fernandes wrote:
>>>>>> On Mar 9, 2026, at 5:22 PM, Joel Fernandes <joelagnelf@nvidia.com> wrote:
>>>>> On Fri, Feb 27, 2026 at 09:32:08PM +0900, Eliot Courtney wrote:
>>>>>> Expose the `hInternalClient` and `hInternalSubdevice` handles. These are
>>>>>> needed for RM control calls.
>>>>>>
>>>>>> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
>>>>>> ---
>>>>>> drivers/gpu/nova-core/gsp/commands.rs    | 16 ++++++++++++++++
>>>>>> drivers/gpu/nova-core/gsp/fw/commands.rs | 10 ++++++++++
>>>>>> 2 files changed, 26 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> index 4740cda0b51c..2cadfcaf9a8a 100644
>>>>>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>>>>>> @@ -197,6 +197,8 @@ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>>>>>> /// The reply from the GSP to the [`GetGspInfo`] command.
>>>>>> pub(crate) struct GetGspStaticInfoReply {
>>>>>>    gpu_name: [u8; 64],
>>>>>> +    h_client: u32,
>>>>>> +    h_subdevice: u32,
>>>>>
>>>>> I would rather have more descriptive names please. 'client_handle',
>>>
>>> Maybe it's better to mirror the Open RM names, which are ancient and
>>> well known in those circles. Changing them at this point is probably
>>> going to result in a slightly worse situation, because there are
>>> probably millions of lines of code out there that use the existing
>>> nomenclature.
>>
>> I have to disagree a bit here. Saying h_ in code is a bit meaningless:
>> there is no mention of the word "handle" anywhere near these fields.
>> h_ could mean "higher", "hardware", or any number of things. The only
>> reason I know it means "handle" is because of expertise with Nvidia
>> drivers. The `_handle` suffix is self-documenting; `h_` is not.
> 
> Maybe if we write h_client in the code, and "handle..." in a comment
> above it? Or the other way around. Just to make the connection to
> the Open RM client code that is out there.
> 
Sure, a descriptive field name is better, but if we're going with descriptive
type names instead, then a comment carrying the descriptive name would be fine
with me too.

Here are also some sample comments that could be used (Eliot, please confirm
their accuracy; this is from my notes/research):

A client is a handle that provides a namespace and lifetime boundary for GPU
resource access. All child objects (devices, memory, channels) are scoped to it
and freed when it's destroyed. Under a client, we have Device and Sub-Device
handles.

The Device/Sub-Device split comes from SLI (multiple GPUs acting as one), where
the Device broadcasts operations to all GPUs, while the Sub-Device targets a
single GPU. Today with single-GPU setups, the pair is still mandatory but always
created together as a 1:1 mapping to a single physical GPU.
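
As a rough sketch of how those comments could sit on rich types rather than on
bare u32 fields: the names `ClientHandle`, `SubdeviceHandle`, `from_raw` and
`as_raw` below are hypothetical and not taken from the patch, and the wording
of the doc comments still needs the same accuracy check as above.

```rust
/// Handle to an RM client.
///
/// A client provides a namespace and lifetime boundary for GPU resource
/// access: child objects are scoped to it and freed when it is destroyed.
/// (Hypothetical type name, for illustration only.)
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ClientHandle(u32);

impl ClientHandle {
    /// Wraps a raw handle value as reported by the firmware.
    pub const fn from_raw(raw: u32) -> Self {
        Self(raw)
    }

    /// Returns the raw value, e.g. for writing into a message header.
    pub const fn as_raw(self) -> u32 {
        self.0
    }
}

/// Handle to a Sub-Device, which targets a single physical GPU.
///
/// The Device/Sub-Device split is a leftover of SLI; with a single GPU the
/// pair maps 1:1 to that GPU. (Hypothetical type name, for illustration only.)
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SubdeviceHandle(u32);

impl SubdeviceHandle {
    /// Wraps a raw handle value as reported by the firmware.
    pub const fn from_raw(raw: u32) -> Self {
        Self(raw)
    }

    /// Returns the raw value, e.g. for writing into a message header.
    pub const fn as_raw(self) -> u32 {
        self.0
    }
}
```

With `#[repr(transparent)]` each wrapper has the same layout as the u32 it
holds, so the extra type costs nothing at runtime.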

>>> particular, is an old SLI term that no one wants to keep around
>>> either. It was an ugly hack in Open RM that took more than a decade
>>> to recover from, by moving the SLI concept out to user space.
>>>
>>> So even though we should document what we're doing now, I would like
>>> to also note that we suspect a certain amount of this will
>>> disappear, to be replaced with a somewhat simpler API, in the
>>> somewhat near future.
>>
>> Sure, but client handles are a broader GPU driver concept even if this
>> particular one is GSP-internal. We are certainly going to need a Rust type
>> to represent a client, right? Other GPU drivers also have a concept of
>> clients. The point is not that `hInternalClient` represents a GPU user
>> today, it may well be temporary as you note, but that using
>> `#[repr(transparent)]` new types for raw u32 handles costs nothing and
>> makes the code better and more readable. This pattern is already
>> well-established in nova-core itself: see `PackedRegistryEntry` for example
>> being a repr type. IMHO, there should be little reason that we need the
>> struct to have magic u32 numbers in Rust code for concepts like "handles".
>>
> 
> We will debate this when it shows up. Perhaps I should not have
> mentioned it, other than to remind Eliot to make it easy to delete.
> 
>> All I am saying is let us think this through before just taking the shortcut
>> of using u32 for client handles, etc. Rust gives us rich types, let's use them.
>>
> 
> ohh, that's a whole other topic and idea. I wasn't going into that,
> but feel free to explore if Rust can make it better.

Oh ok, yeah, it sounds like we are aligned on this, given Eliot's other reply
about introducing new rich types for these, so we should be good there (with
abundant comments on the rich types). I will explore this further on my side
as well.
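
To illustrate what the rich types would buy at call sites, here is a minimal
sketch. Everything in it is an assumption for discussion: `ClientHandle`,
`SubdeviceHandle` and `control_header` are made-up names, and the three-word
header layout is invented, not the real RM control message format.

```rust
/// Hypothetical newtype for a client handle.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ClientHandle(u32);

/// Hypothetical newtype for a subdevice handle.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct SubdeviceHandle(u32);

/// Packs both handles and a command id into a made-up header layout.
///
/// Because the two handle parameters have distinct types, passing them in
/// the wrong order is a compile error, unlike with two bare `u32` values.
fn control_header(client: ClientHandle, subdevice: SubdeviceHandle, cmd: u32) -> [u32; 3] {
    [client.0, subdevice.0, cmd]
}
```

A call such as `control_header(subdevice, client, cmd)` would be rejected by
the compiler with a type mismatch, which is the kind of mistake two plain u32
arguments would let through silently.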

Thanks!

--
Joel Fernandes