Currently, the GSP is left running after the driver is unbound. This is
not great for several reasons, notably that it can still access shared
memory areas that the kernel will now reclaim (especially problematic on
setups without an IOMMU).
Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when
unbinding. This stops the GSP and let us proceed with the rest of the
unbind sequence in the next patch.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/gpu.rs | 5 +++
drivers/gpu/nova-core/gsp/boot.rs | 42 +++++++++++++++++++++++
drivers/gpu/nova-core/gsp/commands.rs | 42 +++++++++++++++++++++++
drivers/gpu/nova-core/gsp/fw.rs | 4 +++
drivers/gpu/nova-core/gsp/fw/commands.rs | 27 +++++++++++++++
drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 8 +++++
6 files changed, 128 insertions(+)
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index ef6ceced350e..b94784f57b36 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -291,6 +291,9 @@ pub(crate) fn new<'a>(
/// Called when the corresponding [`Device`](device::Device) is unbound.
///
+ /// Prepares the GPU for unbinding by shutting down the GSP and unregistering the sysmem flush
+ /// memory page.
+ ///
/// Note: This method must only be called from `Driver::unbind`.
pub(crate) fn unbind(self: Pin<&mut Self>, dev: &device::Device<device::Core>) {
let this = self.project();
@@ -299,6 +302,8 @@ pub(crate) fn unbind(self: Pin<&mut Self>, dev: &device::Device<device::Core>) {
return;
};
+ let _ = kernel::warn_on_err!(this.gsp.unload(dev, bar, this.gsp_falcon,));
+
this.sysmem_flush.unregister(bar);
}
}
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index ca7efdaa753b..e12e1d3fd53f 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -32,6 +32,7 @@
},
gpu::Chipset,
gsp::{
+ cmdq::Cmdq,
commands,
sequencer::{
GspSequencer,
@@ -231,4 +232,45 @@ pub(crate) fn boot(
Ok(())
}
+
+ /// Shutdowns the GSP and wait until it is offline.
+ fn shutdown_gsp(
+ cmdq: &mut Cmdq,
+ bar: &Bar0,
+ gsp_falcon: &Falcon<Gsp>,
+ suspend: bool,
+ ) -> Result<()> {
+ // Send command to shutdown GSP and wait for response.
+ commands::unload_guest_driver(cmdq, bar, suspend)?;
+
+ // Wait until GSP signals it is suspended.
+ const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = 0x8000_0000;
+ read_poll_timeout(
+ || Ok(gsp_falcon.read_mailbox0(bar)),
+ |&mb0| mb0 == LIBOS_INTERRUPT_PROCESSOR_SUSPENDED,
+ Delta::ZERO,
+ Delta::from_secs(5),
+ )
+ .map(|_| ())
+ }
+
+ /// Attempts to unload the GSP firmware.
+ ///
+ /// This stops all activity on the GSP.
+ pub(crate) fn unload(
+ self: Pin<&mut Self>,
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ gsp_falcon: &Falcon<Gsp>,
+ ) -> Result {
+ let this = self.project();
+
+ /* Shutdown the GSP */
+
+ Self::shutdown_gsp(this.cmdq, bar, gsp_falcon, false)
+ .inspect_err(|e| dev_err!(dev, "unload guest driver failed: {:?}", e))?;
+ dev_dbg!(dev, "GSP shut down\n");
+
+ Ok(())
+ }
}
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 2050771f9b53..0bcfca8c7e42 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -225,3 +225,45 @@ pub(crate) fn get_gsp_info(cmdq: &mut Cmdq, bar: &Bar0) -> Result<GetGspStaticIn
}
}
}
+
+impl CommandToGsp for UnloadingGuestDriver {
+ const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
+ type Command = Self;
+ type InitError = Infallible;
+
+ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
+ *self
+ }
+}
+
+pub(crate) struct UnloadingGuestDriverReply;
+
+impl MessageFromGsp for UnloadingGuestDriverReply {
+ const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
+ type InitError = Infallible;
+ type Message = ();
+
+ fn read(
+ _msg: &Self::Message,
+ _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
+ ) -> Result<Self, Self::InitError> {
+ Ok(UnloadingGuestDriverReply)
+ }
+}
+
+/// Send the [`UnloadingGuestDriver`] command to the GSP and await the reply.
+pub(crate) fn unload_guest_driver(
+ cmdq: &mut Cmdq,
+ bar: &Bar0,
+ suspend: bool,
+) -> Result<UnloadingGuestDriverReply> {
+ cmdq.send_command(bar, UnloadingGuestDriver::new(suspend))?;
+
+ loop {
+ match cmdq.receive_msg::<UnloadingGuestDriverReply>(Delta::from_secs(5)) {
+ Ok(resp) => return Ok(resp),
+ Err(ERANGE) => continue,
+ Err(e) => return Err(e),
+ }
+ }
+}
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 09549f7db52d..228464fd47a0 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -209,6 +209,7 @@ pub(crate) enum MsgFunction {
GspInitPostObjGpu = bindings::NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU,
GspRmControl = bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
GetStaticInfo = bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
+ UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
// Event codes
GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
@@ -249,6 +250,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
}
bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => Ok(MsgFunction::GspRmControl),
bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => Ok(MsgFunction::GetStaticInfo),
+ bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
+ Ok(MsgFunction::UnloadingGuestDriver)
+ }
bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
bindings::NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => {
Ok(MsgFunction::GspRunCpuSequencer)
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index 85465521de32..c7df4783ad21 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -125,3 +125,30 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
// SAFETY: This struct only contains integer types for which all bit patterns
// are valid.
unsafe impl FromBytes for GspStaticConfigInfo {}
+
+/// Payload of the `UnloadingGuestDriver` command and message.
+#[repr(transparent)]
+#[derive(Clone, Copy, Debug, Zeroable)]
+pub(crate) struct UnloadingGuestDriver {
+ inner: bindings::rpc_unloading_guest_driver_v1F_07,
+}
+
+impl UnloadingGuestDriver {
+ pub(crate) fn new(suspend: bool) -> Self {
+ Self {
+ inner: bindings::rpc_unloading_guest_driver_v1F_07 {
+ bInPMTransition: if suspend { 1 } else { 0 },
+ bGc6Entering: 0,
+ newLevel: if suspend { 3 } else { 0 },
+ ..Zeroable::zeroed()
+ },
+ }
+ }
+}
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for UnloadingGuestDriver {}
+
+// SAFETY: This struct only contains integer types for which all bit patterns
+// are valid.
+unsafe impl FromBytes for UnloadingGuestDriver {}
diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index 6d25fe0bffa9..212ccc13c0cc 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -879,6 +879,14 @@ fn default() -> Self {
}
}
#[repr(C)]
+#[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
+pub struct rpc_unloading_guest_driver_v1F_07 {
+ pub bInPMTransition: u8_,
+ pub bGc6Entering: u8_,
+ pub __bindgen_padding_0: [u8; 2usize],
+ pub newLevel: u32_,
+}
+#[repr(C)]
#[derive(Debug, Default, MaybeZeroable)]
pub struct rpc_run_cpu_sequencer_v17_00 {
pub bufferSizeDWord: u32_,
--
2.52.0
Hi Alex,
Just did a quick pass, will do a proper review later:
> On Dec 16, 2025, at 12:13 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> Currently, the GSP is left running after the driver is unbound. This is
> not great for several reasons, notably that it can still access shared
> memory areas that the kernel will now reclaim (especially problematic on
> setups without an IOMMU).
>
> Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when
> unbinding. This stops the GSP and let us proceed with the rest of the
> unbind sequence in the next patch.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> drivers/gpu/nova-core/gpu.rs | 5 +++
> drivers/gpu/nova-core/gsp/boot.rs | 42 +++++++++++++++++++++++
> drivers/gpu/nova-core/gsp/commands.rs | 42 +++++++++++++++++++++++
> drivers/gpu/nova-core/gsp/fw.rs | 4 +++
> drivers/gpu/nova-core/gsp/fw/commands.rs | 27 +++++++++++++++
> drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 8 +++++
> 6 files changed, 128 insertions(+)
>
> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index ef6ceced350e..b94784f57b36 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -291,6 +291,9 @@ pub(crate) fn new<'a>(
>
> /// Called when the corresponding [`Device`](device::Device) is unbound.
> ///
> + /// Prepares the GPU for unbinding by shutting down the GSP and unregistering the sysmem flush
> + /// memory page.
> + ///
> /// Note: This method must only be called from `Driver::unbind`.
> pub(crate) fn unbind(self: Pin<&mut Self>, dev: &device::Device<device::Core>) {
> let this = self.project();
> @@ -299,6 +302,8 @@ pub(crate) fn unbind(self: Pin<&mut Self>, dev: &device::Device<device::Core>) {
> return;
> };
>
> + let _ = kernel::warn_on_err!(this.gsp.unload(dev, bar, this.gsp_falcon,));
> +
> this.sysmem_flush.unregister(bar);
> }
> }
> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
> index ca7efdaa753b..e12e1d3fd53f 100644
> --- a/drivers/gpu/nova-core/gsp/boot.rs
> +++ b/drivers/gpu/nova-core/gsp/boot.rs
> @@ -32,6 +32,7 @@
> },
> gpu::Chipset,
> gsp::{
> + cmdq::Cmdq,
> commands,
> sequencer::{
> GspSequencer,
> @@ -231,4 +232,45 @@ pub(crate) fn boot(
>
> Ok(())
> }
> +
> + /// Shutdowns the GSP and wait until it is offline.
> + fn shutdown_gsp(
> + cmdq: &mut Cmdq,
> + bar: &Bar0,
> + gsp_falcon: &Falcon<Gsp>,
> + suspend: bool,
> + ) -> Result<()> {
> + // Send command to shutdown GSP and wait for response.
> + commands::unload_guest_driver(cmdq, bar, suspend)?;
> +
> + // Wait until GSP signals it is suspended.
> + const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = 0x8000_0000;
> + read_poll_timeout(
> + || Ok(gsp_falcon.read_mailbox0(bar)),
> + |&mb0| mb0 == LIBOS_INTERRUPT_PROCESSOR_SUSPENDED,
> + Delta::ZERO,
> + Delta::from_secs(5),
> + )
> + .map(|_| ())
> + }
> +
> + /// Attempts to unload the GSP firmware.
> + ///
> + /// This stops all activity on the GSP.
> + pub(crate) fn unload(
> + self: Pin<&mut Self>,
> + dev: &device::Device<device::Bound>,
> + bar: &Bar0,
> + gsp_falcon: &Falcon<Gsp>,
> + ) -> Result {
> + let this = self.project();
> +
> + /* Shutdown the GSP */
> +
> + Self::shutdown_gsp(this.cmdq, bar, gsp_falcon, false)
> + .inspect_err(|e| dev_err!(dev, "unload guest driver failed: {:?}", e))?;
> + dev_dbg!(dev, "GSP shut down\n");
> +
> + Ok(())
> + }
> }
> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
> index 2050771f9b53..0bcfca8c7e42 100644
> --- a/drivers/gpu/nova-core/gsp/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/commands.rs
> @@ -225,3 +225,45 @@ pub(crate) fn get_gsp_info(cmdq: &mut Cmdq, bar: &Bar0) -> Result<GetGspStaticIn
> }
> }
> }
> +
> +impl CommandToGsp for UnloadingGuestDriver {
> + const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
> + type Command = Self;
> + type InitError = Infallible;
> +
> + fn init(&self) -> impl Init<Self::Command, Self::InitError> {
> + *self
> + }
> +}
> +
> +pub(crate) struct UnloadingGuestDriverReply;
> +
> +impl MessageFromGsp for UnloadingGuestDriverReply {
> + const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
> + type InitError = Infallible;
> + type Message = ();
> +
> + fn read(
> + _msg: &Self::Message,
> + _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
> + ) -> Result<Self, Self::InitError> {
> + Ok(UnloadingGuestDriverReply)
> + }
> +}
> +
> +/// Send the [`UnloadingGuestDriver`] command to the GSP and await the reply.
> +pub(crate) fn unload_guest_driver(
> + cmdq: &mut Cmdq,
> + bar: &Bar0,
> + suspend: bool,
> +) -> Result<UnloadingGuestDriverReply> {
> + cmdq.send_command(bar, UnloadingGuestDriver::new(suspend))?;
> +
> + loop {
> + match cmdq.receive_msg::<UnloadingGuestDriverReply>(Delta::from_secs(5)) {
> + Ok(resp) => return Ok(resp),
> + Err(ERANGE) => continue,
> + Err(e) => return Err(e),
> + }
This outer loop can go on infinitely, lets bound it?
Either outer timeout or bounded number of tries. Bounded tries better since it has inner timeout.
> + }
> +}
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index 09549f7db52d..228464fd47a0 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -209,6 +209,7 @@ pub(crate) enum MsgFunction {
> GspInitPostObjGpu = bindings::NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU,
> GspRmControl = bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
> GetStaticInfo = bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
> + UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
>
> // Event codes
> GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
> @@ -249,6 +250,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
> }
> bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => Ok(MsgFunction::GspRmControl),
> bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => Ok(MsgFunction::GetStaticInfo),
> + bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
> + Ok(MsgFunction::UnloadingGuestDriver)
> + }
> bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
> bindings::NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => {
> Ok(MsgFunction::GspRunCpuSequencer)
> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
> index 85465521de32..c7df4783ad21 100644
> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
> @@ -125,3 +125,30 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
> // SAFETY: This struct only contains integer types for which all bit patterns
> // are valid.
> unsafe impl FromBytes for GspStaticConfigInfo {}
> +
> +/// Payload of the `UnloadingGuestDriver` command and message.
> +#[repr(transparent)]
> +#[derive(Clone, Copy, Debug, Zeroable)]
> +pub(crate) struct UnloadingGuestDriver {
> + inner: bindings::rpc_unloading_guest_driver_v1F_07,
> +}
> +
> +impl UnloadingGuestDriver {
> + pub(crate) fn new(suspend: bool) -> Self {
> + Self {
> + inner: bindings::rpc_unloading_guest_driver_v1F_07 {
We should go through intermediate firmware representation than direct bindings access?
> + bInPMTransition: if suspend { 1 } else { 0 },
Then this can just be passed as a bool.
> + bGc6Entering: 0,
> + newLevel: if suspend { 3 } else { 0 },
> + ..Zeroable::zeroed()
> + },
> + }
> + }
> +}
> +
> +// SAFETY: Padding is explicit and will not contain uninitialized data.
> +unsafe impl AsBytes for UnloadingGuestDriver {}
> +
> +// SAFETY: This struct only contains integer types for which all bit patterns
> +// are valid.
> +unsafe impl FromBytes for UnloadingGuestDriver {}
> diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
> index 6d25fe0bffa9..212ccc13c0cc 100644
> --- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
> +++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
> @@ -879,6 +879,14 @@ fn default() -> Self {
> }
> }
> #[repr(C)]
> +#[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
> +pub struct rpc_unloading_guest_driver_v1F_07 {
> + pub bInPMTransition: u8_,
> + pub bGc6Entering: u8_,
> + pub __bindgen_padding_0: [u8; 2usize],
> + pub newLevel: u32_,
When these are intermediate represented, documentation would be nice on the fields.
thanks,
- Joel
> +}
> +#[repr(C)]
> #[derive(Debug, Default, MaybeZeroable)]
> pub struct rpc_run_cpu_sequencer_v17_00 {
> pub bufferSizeDWord: u32_,
>
> --
> 2.52.0
>
On Wed Dec 17, 2025 at 12:39 AM JST, Joel Fernandes wrote:
> Hi Alex,
>
> Just did a quick pass, will do a proper review later:
>
>> On Dec 16, 2025, at 12:13 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>>
>> Currently, the GSP is left running after the driver is unbound. This is
>> not great for several reasons, notably that it can still access shared
>> memory areas that the kernel will now reclaim (especially problematic on
>> setups without an IOMMU).
>>
>> Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when
>> unbinding. This stops the GSP and let us proceed with the rest of the
>> unbind sequence in the next patch.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>> drivers/gpu/nova-core/gpu.rs | 5 +++
>> drivers/gpu/nova-core/gsp/boot.rs | 42 +++++++++++++++++++++++
>> drivers/gpu/nova-core/gsp/commands.rs | 42 +++++++++++++++++++++++
>> drivers/gpu/nova-core/gsp/fw.rs | 4 +++
>> drivers/gpu/nova-core/gsp/fw/commands.rs | 27 +++++++++++++++
>> drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 8 +++++
>> 6 files changed, 128 insertions(+)
>>
>> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
>> index ef6ceced350e..b94784f57b36 100644
>> --- a/drivers/gpu/nova-core/gpu.rs
>> +++ b/drivers/gpu/nova-core/gpu.rs
>> @@ -291,6 +291,9 @@ pub(crate) fn new<'a>(
>>
>> /// Called when the corresponding [`Device`](device::Device) is unbound.
>> ///
>> + /// Prepares the GPU for unbinding by shutting down the GSP and unregistering the sysmem flush
>> + /// memory page.
>> + ///
>> /// Note: This method must only be called from `Driver::unbind`.
>> pub(crate) fn unbind(self: Pin<&mut Self>, dev: &device::Device<device::Core>) {
>> let this = self.project();
>> @@ -299,6 +302,8 @@ pub(crate) fn unbind(self: Pin<&mut Self>, dev: &device::Device<device::Core>) {
>> return;
>> };
>>
>> + let _ = kernel::warn_on_err!(this.gsp.unload(dev, bar, this.gsp_falcon,));
>> +
>> this.sysmem_flush.unregister(bar);
>> }
>> }
>> diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
>> index ca7efdaa753b..e12e1d3fd53f 100644
>> --- a/drivers/gpu/nova-core/gsp/boot.rs
>> +++ b/drivers/gpu/nova-core/gsp/boot.rs
>> @@ -32,6 +32,7 @@
>> },
>> gpu::Chipset,
>> gsp::{
>> + cmdq::Cmdq,
>> commands,
>> sequencer::{
>> GspSequencer,
>> @@ -231,4 +232,45 @@ pub(crate) fn boot(
>>
>> Ok(())
>> }
>> +
>> + /// Shutdowns the GSP and wait until it is offline.
>> + fn shutdown_gsp(
>> + cmdq: &mut Cmdq,
>> + bar: &Bar0,
>> + gsp_falcon: &Falcon<Gsp>,
>> + suspend: bool,
>> + ) -> Result<()> {
>> + // Send command to shutdown GSP and wait for response.
>> + commands::unload_guest_driver(cmdq, bar, suspend)?;
>> +
>> + // Wait until GSP signals it is suspended.
>> + const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = 0x8000_0000;
>> + read_poll_timeout(
>> + || Ok(gsp_falcon.read_mailbox0(bar)),
>> + |&mb0| mb0 == LIBOS_INTERRUPT_PROCESSOR_SUSPENDED,
>> + Delta::ZERO,
>> + Delta::from_secs(5),
>> + )
>> + .map(|_| ())
>> + }
>> +
>> + /// Attempts to unload the GSP firmware.
>> + ///
>> + /// This stops all activity on the GSP.
>> + pub(crate) fn unload(
>> + self: Pin<&mut Self>,
>> + dev: &device::Device<device::Bound>,
>> + bar: &Bar0,
>> + gsp_falcon: &Falcon<Gsp>,
>> + ) -> Result {
>> + let this = self.project();
>> +
>> + /* Shutdown the GSP */
>> +
>> + Self::shutdown_gsp(this.cmdq, bar, gsp_falcon, false)
>> + .inspect_err(|e| dev_err!(dev, "unload guest driver failed: {:?}", e))?;
>> + dev_dbg!(dev, "GSP shut down\n");
>> +
>> + Ok(())
>> + }
>> }
>> diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
>> index 2050771f9b53..0bcfca8c7e42 100644
>> --- a/drivers/gpu/nova-core/gsp/commands.rs
>> +++ b/drivers/gpu/nova-core/gsp/commands.rs
>> @@ -225,3 +225,45 @@ pub(crate) fn get_gsp_info(cmdq: &mut Cmdq, bar: &Bar0) -> Result<GetGspStaticIn
>> }
>> }
>> }
>> +
>> +impl CommandToGsp for UnloadingGuestDriver {
>> + const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
>> + type Command = Self;
>> + type InitError = Infallible;
>> +
>> + fn init(&self) -> impl Init<Self::Command, Self::InitError> {
>> + *self
>> + }
>> +}
>> +
>> +pub(crate) struct UnloadingGuestDriverReply;
>> +
>> +impl MessageFromGsp for UnloadingGuestDriverReply {
>> + const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
>> + type InitError = Infallible;
>> + type Message = ();
>> +
>> + fn read(
>> + _msg: &Self::Message,
>> + _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
>> + ) -> Result<Self, Self::InitError> {
>> + Ok(UnloadingGuestDriverReply)
>> + }
>> +}
>> +
>> +/// Send the [`UnloadingGuestDriver`] command to the GSP and await the reply.
>> +pub(crate) fn unload_guest_driver(
>> + cmdq: &mut Cmdq,
>> + bar: &Bar0,
>> + suspend: bool,
>> +) -> Result<UnloadingGuestDriverReply> {
>> + cmdq.send_command(bar, UnloadingGuestDriver::new(suspend))?;
>> +
>> + loop {
>> + match cmdq.receive_msg::<UnloadingGuestDriverReply>(Delta::from_secs(5)) {
>> + Ok(resp) => return Ok(resp),
>> + Err(ERANGE) => continue,
>> + Err(e) => return Err(e),
>> + }
>
> This outer loop can go on infinitely, lets bound it?
>
> Either outer timeout or bounded number of tries. Bounded tries better since it has inner timeout.
Yes. And what we really want is a more generic way to do this, because
this pattern occurs with several commands so far (and more to come).
>
>> + }
>> +}
>> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>> index 09549f7db52d..228464fd47a0 100644
>> --- a/drivers/gpu/nova-core/gsp/fw.rs
>> +++ b/drivers/gpu/nova-core/gsp/fw.rs
>> @@ -209,6 +209,7 @@ pub(crate) enum MsgFunction {
>> GspInitPostObjGpu = bindings::NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU,
>> GspRmControl = bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
>> GetStaticInfo = bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
>> + UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
>>
>> // Event codes
>> GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
>> @@ -249,6 +250,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
>> }
>> bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => Ok(MsgFunction::GspRmControl),
>> bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => Ok(MsgFunction::GetStaticInfo),
>> + bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
>> + Ok(MsgFunction::UnloadingGuestDriver)
>> + }
>> bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
>> bindings::NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => {
>> Ok(MsgFunction::GspRunCpuSequencer)
>> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
>> index 85465521de32..c7df4783ad21 100644
>> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
>> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
>> @@ -125,3 +125,30 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
>> // SAFETY: This struct only contains integer types for which all bit patterns
>> // are valid.
>> unsafe impl FromBytes for GspStaticConfigInfo {}
>> +
>> +/// Payload of the `UnloadingGuestDriver` command and message.
>> +#[repr(transparent)]
>> +#[derive(Clone, Copy, Debug, Zeroable)]
>> +pub(crate) struct UnloadingGuestDriver {
>> + inner: bindings::rpc_unloading_guest_driver_v1F_07,
>> +}
>> +
>> +impl UnloadingGuestDriver {
>> + pub(crate) fn new(suspend: bool) -> Self {
>> + Self {
>> + inner: bindings::rpc_unloading_guest_driver_v1F_07 {
>
> We should go through intermediate firmware representation than direct bindings access?
Only if the size of the bindings justifies it - here having an opaque
wrapper just just well, and spares us some code down the line as we
would have to initialize the bindings anyway.
>
>
>> + bInPMTransition: if suspend { 1 } else { 0 },
>
> Then this can just be passed as a bool.
>
>> + bGc6Entering: 0,
>> + newLevel: if suspend { 3 } else { 0 },
Note to self to figure out these magic numbers. :)
On Thu, 2025-12-18 at 22:27 +0900, Alexandre Courbot wrote:
> > > + loop {
> > > + match cmdq.receive_msg::<UnloadingGuestDriverReply>(Delta::from_secs(5)) {
> > > + Ok(resp) => return Ok(resp),
> > > + Err(ERANGE) => continue,
> > > + Err(e) => return Err(e),
> > > + }
> >
> > This outer loop can go on infinitely, lets bound it?
> >
> > Either outer timeout or bounded number of tries. Bounded tries better since it has inner
> > timeout.
>
> Yes. And what we really want is a more generic way to do this, because
> this pattern occurs with several commands so far (and more to come).
Isn't the real problem that we are polling for a specific message, when all message should be
handled asynchronously as events, like Nouveau does?
Err(ERANGE) => continue,
This effectively throws out all other messages, including errors and anything else important.
On Fri Dec 19, 2025 at 7:33 AM JST, Timur Tabi wrote:
> On Thu, 2025-12-18 at 22:27 +0900, Alexandre Courbot wrote:
>> > > + loop {
>> > > + match cmdq.receive_msg::<UnloadingGuestDriverReply>(Delta::from_secs(5)) {
>> > > + Ok(resp) => return Ok(resp),
>> > > + Err(ERANGE) => continue,
>> > > + Err(e) => return Err(e),
>> > > + }
>> >
>> > This outer loop can go on infinitely, lets bound it?
>> >
>> > Either outer timeout or bounded number of tries. Bounded tries better since it has inner
>> > timeout.
>>
>> Yes. And what we really want is a more generic way to do this, because
>> this pattern occurs with several commands so far (and more to come).
>
> Isn't the real problem that we are polling for a specific message, when all message should be
> handled asynchronously as events, like Nouveau does?
Does Nouveau really handle all messages asynchronously? Just taking a
look at `r535_gsp_rpc_send` I see:
* A potential busy-loop with `r535_gsp_rpc_handle_reply`, An argument to
* define whether we should wait for a reply (`policy`).
So it seems like each GSP command expecting a reply is effectively
looping until it arrives, with some messages (LIBOS_PRINT, SEQUENCER,
etc.) being managed by a notifier registered with the command queue. But
messages sent explicitly by the driver don't seem to make use of it and
instead process messages until they find their reply.
This seems to work because IIUC the GSP sends replies in the same order
as it received the messages (so one caller cannot consume the reply
intended to another) and GSP messages are a replacement for the CPU
messing with the hardware itself anyway. So in that context that design
is not particularly awful, but maybe we will want to switch to something
interrupt-based in Nova long-term anyway.
On Fri, 2025-12-19 at 12:39 +0900, Alexandre Courbot wrote: > Does Nouveau really handle all messages asynchronously? Just taking a > look at `r535_gsp_rpc_send` I see: > > * A potential busy-loop with `r535_gsp_rpc_handle_reply`, An argument to > * define whether we should wait for a reply (`policy`). > > So it seems like each GSP command expecting a reply is effectively > looping until it arrives, with some messages (LIBOS_PRINT, SEQUENCER, > etc.) being managed by a notifier registered with the command queue. But > messages sent explicitly by the driver don't seem to make use of it and > instead process messages until they find their reply. Yes, you're right. But the difference is that in Nouveau, all message processing is handled by r535_gsp_msg_recv(), which always also handles all of the asynchronous "other" messages. The above `loop` expression in Nova doesn't do that. It's missing the asynchronous handler. This is the crux of my concern. > This seems to work because IIUC the GSP sends replies in the same order > as it received the messages (so one caller cannot consume the reply > intended to another) and GSP messages are a replacement for the CPU > messing with the hardware itself anyway. So in that context that design > is not particularly awful, but maybe we will want to switch to something > interrupt-based in Nova long-term anyway. Sure, but we still need to do it the way Nouveau handles it. We need our own version of r535_gsp_rpc_handle_reply() which unifies handling of all incoming messages, either polling or interrupt-based. For now, we can always pass NVKM_GSP_RPC_REPLY_POLL or NVKM_GSP_RPC_REPLY_RECV, which are polling-based. NVKM_GSP_RPC_REPLY_NOWAIT is apparently interrupt-based, which I believe is triggered via r535_gsp_msgq_work().
On Sun Dec 21, 2025 at 6:30 AM JST, Timur Tabi wrote: > On Fri, 2025-12-19 at 12:39 +0900, Alexandre Courbot wrote: > > > >> Does Nouveau really handle all messages asynchronously? Just taking a >> look at `r535_gsp_rpc_send` I see: >> >> * A potential busy-loop with `r535_gsp_rpc_handle_reply`, An argument to >> * define whether we should wait for a reply (`policy`). >> >> So it seems like each GSP command expecting a reply is effectively >> looping until it arrives, with some messages (LIBOS_PRINT, SEQUENCER, >> etc.) being managed by a notifier registered with the command queue. But >> messages sent explicitly by the driver don't seem to make use of it and >> instead process messages until they find their reply. > > Yes, you're right. But the difference is that in Nouveau, all message processing is handled by > r535_gsp_msg_recv(), which always also handles all of the asynchronous "other" messages. > > The above `loop` expression in Nova doesn't do that. It's missing the asynchronous handler. > This is the crux of my concern. I agree with you that this is something we want to improve (either by adding a handler to the loop, or some other way). It should be its own patch series once we have better visibility about how we want to handle message, as the current implementation is still very crude and ad-hoc.
> On Dec 18, 2025, at 5:33 PM, Timur Tabi <ttabi@nvidia.com> wrote:
>
> On Thu, 2025-12-18 at 22:27 +0900, Alexandre Courbot wrote:
>>>> + loop {
>>>> + match cmdq.receive_msg::<UnloadingGuestDriverReply>(Delta::from_secs(5)) {
>>>> + Ok(resp) => return Ok(resp),
>>>> + Err(ERANGE) => continue,
>>>> + Err(e) => return Err(e),
>>>> + }
>>>
>>> This outer loop can go on infinitely, lets bound it?
>>>
>>> Either outer timeout or bounded number of tries. Bounded tries better since it has inner
>>> timeout.
>>
>> Yes. And what we really want is a more generic way to do this, because
>> this pattern occurs with several commands so far (and more to come).
>
> Isn't the real problem that we are polling for a specific message, when all message should be
> handled asynchronously as events, like Nouveau does?
>
> Err(ERANGE) => continue,
>
> This effectively throws out all other messages, including errors and anything else important.
>
Indeed, for that we need Interrupts. For the rest of the patterns where we need the message synchronously, we should bound this. Hanging in the driver is unacceptable.
Interrupts are almost here now, considering we/others have upstreamed all the required patches for Rust interrupt support.
On Thu, 2025-12-18 at 22:44 +0000, Joel Fernandes wrote: > > Isn't the real problem that we are polling for a specific message, when all message should be > > handled asynchronously as events, like Nouveau does? > > > > Err(ERANGE) => continue, > > > > This effectively throws out all other messages, including errors and anything else important. > > > > Indeed, for that we need Interrupts. For the rest of the patterns where we need the message > synchronously, we should bound this. Hanging in the driver is unacceptable. It's going to be difficulty to have a running asynchronous message handler in the background *and* poll synchronously for a specific message on occasional. I would say that even in this case, we should handle the message asynchronously. So instead of polling on the message queue, we just wait on a semaphore, with a timeout. > Interrupts are almost here now, considering we/others have upstreamed all the required patches for > Rust interrupt support. So my suggestion is that we don't overcomplicate this code today, since it's all going to be ripped out later.
On 12/18/2025 6:34 PM, Timur Tabi wrote:
> On Thu, 2025-12-18 at 22:44 +0000, Joel Fernandes wrote:
>>> Isn't the real problem that we are polling for a specific message, when all message should be
>>> handled asynchronously as events, like Nouveau does?
>>>
>>> Err(ERANGE) => continue,
>>>
>>> This effectively throws out all other messages, including errors and anything else important.
>>>
>>
>> Indeed, for that we need Interrupts. For the rest of the patterns where we need the message
>> synchronously, we should bound this. Hanging in the driver is unacceptable.
>
> It's going to be difficulty to have a running asynchronous message handler in the background *and*
> poll synchronously for a specific message on occasional. I would say that even in this case, we
> should handle the message asynchronously. So instead of polling on the message queue, we just wait
> on a semaphore, with a timeout.
I think we don't strictly need a semaphore for synchronous polling - the wait is
expected to be short AFAIK and if not we should just error out. What we need is
a registration mechanism that registers different event types and their
handlers, and if the message received is not an expected one, we simply call the
event handler registered while continuing to poll for the message we are
expecting until it is received: See how Nouveau does it in r535_gsp_msg_recv().
Anyway, the wait should be expected to be short and if not, we'd break out of
the loop { }.
Interestingly, Nouveau inserts 2 micro second sleeps while polling AFAICS. Where
as OpenRM simply spins without sleeps. I would even say that sleeping in the
loop is risky due to the dependency on atomic context, so we'd have to be
careful there (I am guessing all our usecases for these loops are non-atomic
context?).
We still need the interrupt handling for cases where we don't need synchronous
polling. During then, we will directly call the event handlers from
IRQ->workqueue path. The event handler registration/calling code in both cases
should be the same.
So in the loop { }, nova needs something like this:
Err(ERANGE) => {
// Dispatch to notification
dispatch_async_message(msg); // Same ones called by Async handling.
continue;
}
- Joel
On 12/18/2025 8:46 PM, Joel Fernandes wrote:
>
>
> On 12/18/2025 6:34 PM, Timur Tabi wrote:
>> On Thu, 2025-12-18 at 22:44 +0000, Joel Fernandes wrote:
>>>> Isn't the real problem that we are polling for a specific message, when all message should be
>>>> handled asynchronously as events, like Nouveau does?
>>>>
>>>> Err(ERANGE) => continue,
>>>>
>>>> This effectively throws out all other messages, including errors and anything else important.
>>>>
>>>
>>> Indeed, for that we need Interrupts. For the rest of the patterns where we need the message
>>> synchronously, we should bound this. Hanging in the driver is unacceptable.
>>
>> It's going to be difficulty to have a running asynchronous message handler in the background *and*
>> poll synchronously for a specific message on occasional. I would say that even in this case, we
>> should handle the message asynchronously. So instead of polling on the message queue, we just wait
>> on a semaphore, with a timeout.
>
> I think we don't strictly need a semaphore for synchronous polling - the wait is
> expected to be short AFAIK and if not we should just error out. What we need is
> a registration mechanism that registers different event types and their
> handlers, and if the message received is not an expected one, we simply call the
> event handler registered while continuing to poll for the message we are
> expecting until it is received: See how Nouveau does it in r535_gsp_msg_recv().
> Anyway, the wait should be expected to be short and if not, we'd break out of
> the loop { }.
>
> Interestingly, Nouveau inserts 2 micro second sleeps while polling AFAICS. Where
> as OpenRM simply spins without sleeps. I would even say that sleeping in the
> loop is risky due to the dependency on atomic context, so we'd have to be
> careful there (I am guessing all our usecases for these loops are non-atomic
> context?).
>
> We still need the interrupt handling for cases where we don't need synchronous
> polling. During then, we will directly call the event handlers from
> IRQ->workqueue path. The event handler registration/calling code in both cases
> should be the same.
>
> So in the loop { }, nova needs something like this:
>
> Err(ERANGE) => {
> // Dispatch to notification
> dispatch_async_message(msg); // Same ones called by Async handling.
> continue;
> }
Btw, I can work on this tomorrow and send out some patches.
Hi Alex,
>>> + }
>>> +}
>>> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>>> index 09549f7db52d..228464fd47a0 100644
>>> --- a/drivers/gpu/nova-core/gsp/fw.rs
>>> +++ b/drivers/gpu/nova-core/gsp/fw.rs
>>> @@ -209,6 +209,7 @@ pub(crate) enum MsgFunction {
>>> GspInitPostObjGpu = bindings::NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU,
>>> GspRmControl = bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
>>> GetStaticInfo = bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
>>> + UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
>>>
>>> // Event codes
>>> GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
>>> @@ -249,6 +250,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
>>> }
>>> bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => Ok(MsgFunction::GspRmControl),
>>> bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => Ok(MsgFunction::GetStaticInfo),
>>> + bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
>>> + Ok(MsgFunction::UnloadingGuestDriver)
>>> + }
>>> bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
>>> bindings::NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => {
>>> Ok(MsgFunction::GspRunCpuSequencer)
>>> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
>>> index 85465521de32..c7df4783ad21 100644
>>> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
>>> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
>>> @@ -125,3 +125,30 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
>>> // SAFETY: This struct only contains integer types for which all bit patterns
>>> // are valid.
>>> unsafe impl FromBytes for GspStaticConfigInfo {}
>>> +
>>> +/// Payload of the `UnloadingGuestDriver` command and message.
>>> +#[repr(transparent)]
>>> +#[derive(Clone, Copy, Debug, Zeroable)]
>>> +pub(crate) struct UnloadingGuestDriver {
>>> + inner: bindings::rpc_unloading_guest_driver_v1F_07,
>>> +}
>>> +
>>> +impl UnloadingGuestDriver {
>>> + pub(crate) fn new(suspend: bool) -> Self {
>>> + Self {
>>> + inner: bindings::rpc_unloading_guest_driver_v1F_07 {
>>
>> We should go through intermediate firmware representation than direct bindings access?
>
> Only if the size of the bindings justifies it - here having an opaque
> wrapper just just well, and spares us some code down the line as we
> would have to initialize the bindings anyway.
I am not sure about that, it sounds like a layering violation. It would be good not to keep the rules fuzzy about this, because then we could do it either way in all cases.
Another reason is that we cannot anticipate in advance which specific helper functions we will need to add in the future. Down the line, we may need to add some helper functions to the struct as well. Also having V1F07 in the name sounds very magic number-ish. It would be good to abstract that out with a better-named struct anyway.
Thanks,
- Joel
>
>>
>>
>>> + bInPMTransition: if suspend { 1 } else { 0 },
>>
>> Then this can just be passed as a bool.
>>
>>> + bGc6Entering: 0,
>>> + newLevel: if suspend { 3 } else { 0 },
>
> Note to self to figure out these magic numbers. :)
>
On Fri Dec 19, 2025 at 5:52 AM JST, Joel Fernandes wrote:
> Hi Alex,
>
>>>> + }
>>>> +}
>>>> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>>>> index 09549f7db52d..228464fd47a0 100644
>>>> --- a/drivers/gpu/nova-core/gsp/fw.rs
>>>> +++ b/drivers/gpu/nova-core/gsp/fw.rs
>>>> @@ -209,6 +209,7 @@ pub(crate) enum MsgFunction {
>>>> GspInitPostObjGpu = bindings::NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU,
>>>> GspRmControl = bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
>>>> GetStaticInfo = bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
>>>> + UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
>>>>
>>>> // Event codes
>>>> GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
>>>> @@ -249,6 +250,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
>>>> }
>>>> bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => Ok(MsgFunction::GspRmControl),
>>>> bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => Ok(MsgFunction::GetStaticInfo),
>>>> + bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
>>>> + Ok(MsgFunction::UnloadingGuestDriver)
>>>> + }
>>>> bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
>>>> bindings::NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => {
>>>> Ok(MsgFunction::GspRunCpuSequencer)
>>>> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
>>>> index 85465521de32..c7df4783ad21 100644
>>>> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
>>>> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
>>>> @@ -125,3 +125,30 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
>>>> // SAFETY: This struct only contains integer types for which all bit patterns
>>>> // are valid.
>>>> unsafe impl FromBytes for GspStaticConfigInfo {}
>>>> +
>>>> +/// Payload of the `UnloadingGuestDriver` command and message.
>>>> +#[repr(transparent)]
>>>> +#[derive(Clone, Copy, Debug, Zeroable)]
>>>> +pub(crate) struct UnloadingGuestDriver {
>>>> + inner: bindings::rpc_unloading_guest_driver_v1F_07,
>>>> +}
>>>> +
>>>> +impl UnloadingGuestDriver {
>>>> + pub(crate) fn new(suspend: bool) -> Self {
>>>> + Self {
>>>> + inner: bindings::rpc_unloading_guest_driver_v1F_07 {
>>>
>>> We should go through intermediate firmware representation than direct bindings access?
>>
>> Only if the size of the bindings justifies it - here having an opaque
>> wrapper just just well, and spares us some code down the line as we
>> would have to initialize the bindings anyway.
>
> I am not sure about that, it sounds like a layering violation. It would be good not to keep the rules fuzzy about this, because then we could do it either way in all cases.
>
> Another reason is that we cannot anticipate in advance which specific helper functions we will need to add in the future. Down the line, we may need to add some helper functions to the struct as well. Also having V1F07 in the name sounds very magic number-ish. It would be good to abstract that out with a better-named struct anyway.
The rules are not fuzzy. The only thing modules outside of `fw` are
seeing is a struct named `UnloadingGuestDriver`, and the name with
`V1F07` is not leaked.
Even if we had a different structure, we would still need to write the
rpc_unloading_guest_driver_v1F_07 at some point, so we would need to
refer to it in `fw` anyway. The current design is not any more lax than
that, it just removes one step.
> On Dec 18, 2025, at 10:26 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> On Fri Dec 19, 2025 at 5:52 AM JST, Joel Fernandes wrote:
>> Hi Alex,
>>
>>>>> + }
>>>>> +}
>>>>> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
>>>>> index 09549f7db52d..228464fd47a0 100644
>>>>> --- a/drivers/gpu/nova-core/gsp/fw.rs
>>>>> +++ b/drivers/gpu/nova-core/gsp/fw.rs
>>>>> @@ -209,6 +209,7 @@ pub(crate) enum MsgFunction {
>>>>> GspInitPostObjGpu = bindings::NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU,
>>>>> GspRmControl = bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
>>>>> GetStaticInfo = bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
>>>>> + UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
>>>>>
>>>>> // Event codes
>>>>> GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
>>>>> @@ -249,6 +250,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
>>>>> }
>>>>> bindings::NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => Ok(MsgFunction::GspRmControl),
>>>>> bindings::NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => Ok(MsgFunction::GetStaticInfo),
>>>>> + bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
>>>>> + Ok(MsgFunction::UnloadingGuestDriver)
>>>>> + }
>>>>> bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
>>>>> bindings::NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => {
>>>>> Ok(MsgFunction::GspRunCpuSequencer)
>>>>> diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
>>>>> index 85465521de32..c7df4783ad21 100644
>>>>> --- a/drivers/gpu/nova-core/gsp/fw/commands.rs
>>>>> +++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
>>>>> @@ -125,3 +125,30 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
>>>>> // SAFETY: This struct only contains integer types for which all bit patterns
>>>>> // are valid.
>>>>> unsafe impl FromBytes for GspStaticConfigInfo {}
>>>>> +
>>>>> +/// Payload of the `UnloadingGuestDriver` command and message.
>>>>> +#[repr(transparent)]
>>>>> +#[derive(Clone, Copy, Debug, Zeroable)]
>>>>> +pub(crate) struct UnloadingGuestDriver {
>>>>> + inner: bindings::rpc_unloading_guest_driver_v1F_07,
>>>>> +}
>>>>> +
>>>>> +impl UnloadingGuestDriver {
>>>>> + pub(crate) fn new(suspend: bool) -> Self {
>>>>> + Self {
>>>>> + inner: bindings::rpc_unloading_guest_driver_v1F_07 {
>>>>
>>>> We should go through intermediate firmware representation than direct bindings access?
>>>
>>> Only if the size of the bindings justifies it - here having an opaque
>>> wrapper just just well, and spares us some code down the line as we
>>> would have to initialize the bindings anyway.
>>
>> I am not sure about that, it sounds like a layering violation. It would be good not to keep the rules fuzzy about this, because then we could do it either way in all cases.
>>
>> Another reason is that we cannot anticipate in advance which specific helper functions we will need to add in the future. Down the line, we may need to add some helper functions to the struct as well. Also having V1F07 in the name sounds very magic number-ish. It would be good to abstract that out with a better-named struct anyway.
>
> The rules are not fuzzy. The only thing modules outside of `fw` are
> seeing is a struct named `UnloadingGuestDriver`, and the name with
> `V1F07` is not leaked.
>
> Even if we had a different structure, we would still need to write the
> rpc_unloading_guest_driver_v1F_07 at some point, so we would need to
> refer to it in `fw` anyway. The current design is not any more lax than
> that, it just removes one step.
Ah, I missed that. This is all in FW, so I think that clarifies the rule for me now.
Thank you.
© 2016 - 2026 Red Hat, Inc.