Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the idea now is that VFIO drivers, for NVIDIA GPUs that are supported by NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to let NovaCore bind to the VFs, and then have NovaCore call into the upper (VFIO) module via Aux Bus, but this turns out to be awkward and is no longer in favor.) So, in order to support that:

Nova-core must only bind to Physical Functions (PFs) and regular PCI devices, not to Virtual Functions (VFs) created through SR-IOV.

Add a method to check if a PCI device is a Virtual Function (VF). This allows Rust drivers to determine whether a device is a VF created through SR-IOV. This is required in order to implement VFIO, because drivers such as NovaCore must only bind to Physical Functions (PFs) or regular PCI devices. The VFs must be left unclaimed, so that a VFIO kernel module can claim them.

Use is_virtfn() in NovaCore, in preparation for it to be used in a VFIO scenario.

I've based this on top of today's driver-core-next [1], because the first patch belongs there, and the second patch applies cleanly to either driver-core-next or drm-rust-next. So this seems like the easiest to work with.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git/

John Hubbard (2):
  rust: pci: add is_virtfn(), to check for VFs
  gpu: nova-core: reject binding to SR-IOV Virtual Functions

 drivers/gpu/nova-core/driver.rs | 5 +++++
 rust/kernel/pci.rs              | 6 ++++++
 2 files changed, 11 insertions(+)

base-commit: 6d97171ac6585de698df019b0bfea3f123fd8385
--
2.51.0
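For context, a minimal sketch of the shape the two patches could take. This is not the actual patch content: it assumes the Rust pci::Device abstraction wraps the C struct pci_dev and that its is_virtfn bitfield is reachable through a bindgen-style accessor, and the nova-core fragment is illustrative only.

    // rust/kernel/pci.rs (sketch, not the actual patch)
    impl Device {
        /// Returns `true` if this PCI device is an SR-IOV Virtual Function (VF),
        /// `false` for Physical Functions and regular PCI devices.
        pub fn is_virtfn(&self) -> bool {
            // SAFETY: `self.as_raw()` points to a valid, bound `struct pci_dev`,
            // whose `is_virtfn` bitfield is set by the PCI core when the VF is
            // enumerated. (The accessor name assumes bindgen's bitfield getters.)
            unsafe { (*self.as_raw()).is_virtfn() != 0 }
        }
    }

    // drivers/gpu/nova-core/driver.rs (sketch): early in probe()
    // Nova-core only drives PFs and regular PCI devices; VFs are left
    // unclaimed so that a VFIO driver can bind to them instead.
    if pdev.is_virtfn() {
        return Err(ENODEV);
    }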
On Wed Oct 1, 2025 at 7:07 AM JST, John Hubbard wrote: > Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the > idea now is that VFIO drivers, for NVIDIA GPUs that are supported by > NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to > let NovaCore bind to the VFs, and then have NovaCore call into the upper > (VFIO) module via Aux Bus, but this turns out to be awkward and is no > longer in favor.) So, in order to support that: > > Nova-core must only bind to Physical Functions (PFs) and regular PCI > devices, not to Virtual Functions (VFs) created through SR-IOV. Naive question: will guests also see the passed-through VF as a VF? If so, wouldn't this change also prevents guests from using Nova?
On 1.10.2025 3.26, Alexandre Courbot wrote: > On Wed Oct 1, 2025 at 7:07 AM JST, John Hubbard wrote: >> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the >> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by >> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to >> let NovaCore bind to the VFs, and then have NovaCore call into the upper >> (VFIO) module via Aux Bus, but this turns out to be awkward and is no >> longer in favor.) So, in order to support that: >> >> Nova-core must only bind to Physical Functions (PFs) and regular PCI >> devices, not to Virtual Functions (VFs) created through SR-IOV. > > Naive question: will guests also see the passed-through VF as a VF? If > so, wouldn't this change also prevents guests from using Nova? Across the entire software stack (firmware and its interface, host/guest driver, and management stack), the design assumes that a VF is tied to a VM. The NVIDIA GPU already provides good enough mechanisms to enforce such isolation between containers on the PF. Moreover, a VF on bare metal is not the only way to support *container* environments; there are other approaches as well, for example a PF driver with the DRM cgroup. As I mentioned, it is really device/use-case specific. The device vendor chooses the best approach for supporting containers on bare metal based on their device's characteristics and its scheduling/resource isolation capabilities. Z.
On 9/30/25 5:26 PM, Alexandre Courbot wrote: > On Wed Oct 1, 2025 at 7:07 AM JST, John Hubbard wrote: >> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the >> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by >> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to >> let NovaCore bind to the VFs, and then have NovaCore call into the upper >> (VFIO) module via Aux Bus, but this turns out to be awkward and is no >> longer in favor.) So, in order to support that: >> >> Nova-core must only bind to Physical Functions (PFs) and regular PCI >> devices, not to Virtual Functions (VFs) created through SR-IOV. > > Naive question: will guests also see the passed-through VF as a VF? If > so, wouldn't this change also prevents guests from using Nova? I'm also new to this area. I would expect that guests *must* see these as PFs, otherwise...nothing makes any sense. Maybe Alex Williamson or Jason Gunthorpe (+CC) can chime in. thanks, -- John Hubbard
On Tue, Sep 30, 2025 at 06:26:23PM -0700, John Hubbard wrote: > On 9/30/25 5:26 PM, Alexandre Courbot wrote: > > On Wed Oct 1, 2025 at 7:07 AM JST, John Hubbard wrote: > >> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the > >> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by > >> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to > >> let NovaCore bind to the VFs, and then have NovaCore call into the upper > >> (VFIO) module via Aux Bus, but this turns out to be awkward and is no > >> longer in favor.) So, in order to support that: > >> > >> Nova-core must only bind to Physical Functions (PFs) and regular PCI > >> devices, not to Virtual Functions (VFs) created through SR-IOV. > > > > Naive question: will guests also see the passed-through VF as a VF? If > > so, wouldn't this change also prevents guests from using Nova? > > I'm also new to this area. I would expect that guests *must* see > these as PFs, otherwise...nothing makes any sense. > > Maybe Alex Williamson or Jason Gunthorpe (+CC) can chime in. Driver should never do something like this. Novacore should work on a VF pretending to be a PF in a VM, and it should work directly on that same VF outside a VM. It is not the job of driver to make binding decisions like 'oh VFs of this devices are usually VFIO so I will fail probe'. VFIO users should use the disable driver autobinding sysfs before creating SRIOV instance to prevent this auto binding and then bind VFIO manually. Or userspace can manually unbind novacore from the VF and rebind VFIO. Jason
On Wed, 1 Oct 2025 11:46:29 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote: > On Tue, Sep 30, 2025 at 06:26:23PM -0700, John Hubbard wrote: > > On 9/30/25 5:26 PM, Alexandre Courbot wrote: > > > On Wed Oct 1, 2025 at 7:07 AM JST, John Hubbard wrote: > > >> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the > > >> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by > > >> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to > > >> let NovaCore bind to the VFs, and then have NovaCore call into the upper > > >> (VFIO) module via Aux Bus, but this turns out to be awkward and is no > > >> longer in favor.) So, in order to support that: > > >> > > >> Nova-core must only bind to Physical Functions (PFs) and regular PCI > > >> devices, not to Virtual Functions (VFs) created through SR-IOV. > > > > > > Naive question: will guests also see the passed-through VF as a VF? If > > > so, wouldn't this change also prevents guests from using Nova? > > > > I'm also new to this area. I would expect that guests *must* see > > these as PFs, otherwise...nothing makes any sense. To answer this specific question, a VF essentially appears as a PF to the VM. The relationship between a PF and VF is established when SR-IOV is configured and in part requires understanding the offset and stride of the VF enumeration, none of which is visible to the VM. The gaps in VF devices (ex. device ID register) are also emulated in the hypervisor stack. > > Maybe Alex Williamson or Jason Gunthorpe (+CC) can chime in. > > Driver should never do something like this. > > Novacore should work on a VF pretending to be a PF in a VM, and it > should work directly on that same VF outside a VM. > > It is not the job of driver to make binding decisions like 'oh VFs of > this devices are usually VFIO so I will fail probe'. > > VFIO users should use the disable driver autobinding sysfs before > creating SRIOV instance to prevent this auto binding and then bind > VFIO manually. > > Or userspace can manually unbind novacore from the VF and rebind VFIO. But this is also true, unbinding "native" host drivers is a fact of life for vfio and we do have the sriov_drivers_autoprobe sysfs attributes if a user wants to set a policy for automatically probing VF drivers for a PF. I think the question would be whether a "bare" VF really provides a useful device for nova-core to bind to or if we're just picking it up because the ID table matches. It's my impression that we require a fair bit of software emulation/virtualization in the host vGPU driver to turn the VF into something that can work like a PF in the VM and I don't know that we can require nova-core to make use of a VF without that emulation/virtualization layer. For example, aren't VRAM allocations for a VF done as part of profiling the VF through the vGPU host driver? Thanks, Alex
On Wed, Oct 01, 2025 at 12:16:31PM -0600, Alex Williamson wrote: > I think the question would be whether a "bare" VF really provides a > useful device for nova-core to bind to or if we're just picking it > up It really should work; actual Linux containers are my go-to reason for people wanting to use VFs without a virtualization layer. > fair bit of software emulation/virtualization in the host vGPU driver to > turn the VF into something that can work like a PF in the VM and I > don't know that we can require nova-core to make use of a VF without > that emulation/virtualization layer. For example, aren't VRAM > allocations for a VF done as part of profiling the VF through the vGPU > host driver? The VF profiling should be designed to work without VFIO. It was one thing to have the VFIO variant driver profile mediated devices that only it can create, but now that it is a generic VF without mediation it doesn't make sense anymore. The question is how much mediation the variant driver inserts between the VM and the VF, and from what I can see that is mostly limited to config space. IOW, I would expect nova-core on the PF to have a way to profile and activate the VF to a usable state, and then nova-core can run either in a VM or directly on the VF. At least this is how all the NIC drivers have their SRIOV support designed today. Jason
On 10/1/25 11:30 AM, Jason Gunthorpe wrote: > On Wed, Oct 01, 2025 at 12:16:31PM -0600, Alex Williamson wrote: >> I think the question would be whether a "bare" VF really provides a >> useful device for nova-core to bind to or if we're just picking it >> up > > It really should work, actual linux containers are my goto reason for > people wanting to use VF's without a virtualization layer. This is a solid use case, even though we don't yet have it for GPUs. > >> fair bit of software emulation/virtualization in the host vGPU driver to >> turn the VF into something that can work like a PF in the VM and I >> don't know that we can require nova-core to make use of a VF without >> that emulation/virtualization layer. For example, aren't VRAM >> allocations for a VF done as part of profiling the VF through the vGPU >> host driver? > > The VF profiling should be designed to work without VFIO. So we'll need to add some support to nova-core, in order for that to happen. It's not there yet, of course. > > It is was one thing to have the VFIO variant driver profile mediated > devices that only it can create, but now that it is a generic VF > without mediation it doesn't make sense anymore. > > The question is how much mediation does the variant driver insert > between the VM and the VF, and from what I can see that is mostly > limited to config space.. > > IOW, I would expect nova-core on the PF has a way to profile and > activate the VF to a usable state and then nova-core can run either > through a vm or directly on the VF. > > At least this is how all the NIC drivers have their SRIOV support > designed today. > OK, so I really like this design direction, and we can go in that direction. However, I'd like to start with this tiny patchset first, because: a) It's only one "if" statement to delete, when we decide to start letting nova-core support VFs directly. b) This series simplifies handling of VFs for the first use case, which is vGPU running on VFIO. thanks, -- John Hubbard
On Wed Oct 1, 2025 at 10:26 AM JST, John Hubbard wrote: > On 9/30/25 5:26 PM, Alexandre Courbot wrote: >> On Wed Oct 1, 2025 at 7:07 AM JST, John Hubbard wrote: >>> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the >>> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by >>> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to >>> let NovaCore bind to the VFs, and then have NovaCore call into the upper >>> (VFIO) module via Aux Bus, but this turns out to be awkward and is no >>> longer in favor.) So, in order to support that: >>> >>> Nova-core must only bind to Physical Functions (PFs) and regular PCI >>> devices, not to Virtual Functions (VFs) created through SR-IOV. >> >> Naive question: will guests also see the passed-through VF as a VF? If >> so, wouldn't this change also prevents guests from using Nova? > > I'm also new to this area. I would expect that guests *must* see > these as PFs, otherwise...nothing makes any sense. But if the guest sees the passed-through VF as a PF, won't it try to do things it is not supposed to do like loading the GSP firmware (which is managed by the host)?
On 9/30/25 6:39 PM, Alexandre Courbot wrote: > On Wed Oct 1, 2025 at 10:26 AM JST, John Hubbard wrote: >> On 9/30/25 5:26 PM, Alexandre Courbot wrote: >>> On Wed Oct 1, 2025 at 7:07 AM JST, John Hubbard wrote: >>>> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the >>>> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by >>>> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to >>>> let NovaCore bind to the VFs, and then have NovaCore call into the upper >>>> (VFIO) module via Aux Bus, but this turns out to be awkward and is no >>>> longer in favor.) So, in order to support that: >>>> >>>> Nova-core must only bind to Physical Functions (PFs) and regular PCI >>>> devices, not to Virtual Functions (VFs) created through SR-IOV. >>> >>> Naive question: will guests also see the passed-through VF as a VF? If >>> so, wouldn't this change also prevents guests from using Nova? >> >> I'm also new to this area. I would expect that guests *must* see >> these as PFs, otherwise...nothing makes any sense. > > But if the guest sees the passed-through VF as a PF, won't it try to > do things it is not supposed to do like loading the GSP firmware (which > is managed by the host)? Yes. A non-paravirtualized guest will attempt to behave just like a bare metal driver would behave. It's the job of the various layers of virtualization to intercept and modify such things appropriately. Looking ahead: if the VFIO experts come back and tell us that guests see these as VFs, then there is still a way forward, because we talked about loading nova-core with a "vfio_mode" kernel module parameter. So then it becomes "if vfio_mode, then skip VFs". thanks, -- John Hubbard
On 1.10.2025 4.45, John Hubbard wrote: > On 9/30/25 6:39 PM, Alexandre Courbot wrote: >> On Wed Oct 1, 2025 at 10:26 AM JST, John Hubbard wrote: >>> On 9/30/25 5:26 PM, Alexandre Courbot wrote: >>>> On Wed Oct 1, 2025 at 7:07 AM JST, John Hubbard wrote: >>>>> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the >>>>> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by >>>>> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to >>>>> let NovaCore bind to the VFs, and then have NovaCore call into the upper >>>>> (VFIO) module via Aux Bus, but this turns out to be awkward and is no >>>>> longer in favor.) So, in order to support that: >>>>> >>>>> Nova-core must only bind to Physical Functions (PFs) and regular PCI >>>>> devices, not to Virtual Functions (VFs) created through SR-IOV. >>>> >>>> Naive question: will guests also see the passed-through VF as a VF? If >>>> so, wouldn't this change also prevents guests from using Nova? >>>

pdev->is_virtfn is set to "true" when the admin enables VFs via sysfs and the PF driver. Presumably, pdev->is_virtfn will be "false" all the time in the guest.

>>> I'm also new to this area. I would expect that guests *must* see >>> these as PFs, otherwise...nothing makes any sense. >> >> But if the guest sees the passed-through VF as a PF, won't it try to >> do things it is not supposed to do like loading the GSP firmware (which >> is managed by the host)? >

The guest driver will read PMC_BOOT_1 and check the PMC_BOOT_1_VGPU_VF flag to tell if it is running on a VF or a PF.

https://github.com/NVIDIA/open-gpu-kernel-modules/blob/main/src/nvidia/arch/nvalloc/unix/src/os-hypervisor.c#L945

> Yes. A non-paravirtualized guest will attempt to behave just like a > bare metal driver would behave. It's the job of the various layers > of virtualization to intercept and modify such things appropriately. > > Looking ahead: if the VFIO experts come back and tell us that guests > see these as VFs, then there is still a way forward, because we > talked about loading nova-core with a "vfio_mode" kernel module > parameter. So then it becomes "if vfio_mode, then skip VFs". > > > thanks,
On Wed, Oct 01, 2025 at 08:09:37AM +0000, Zhi Wang wrote: > >> But if the guest sees the passed-through VF as a PF, won't it try to > >> do things it is not supposed to do like loading the GSP firmware (which > >> is managed by the host)? > > > > The guest driver will read PMC_BOOT_1 and check PMC_BOOT_1_VGPU_VF flag > to tell if it is running on a VF or a PF. Yes exactly, and then novacore should modify its behavior and operate the device in a different mode. It doesn't matter if a VM is involved or not: a VF driver running side by side with the PF driver should still work. There are use cases where people do this, e.g. they can stick the VF into a Linux container and use the SRIOV mechanism as a QoS control. 'This container only gets 1/4 of a GPU' Jason
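To make the PMC_BOOT_1 point concrete, a purely hypothetical sketch of what such a check could look like on the driver side. The register offset and bit value below are placeholders, not the real definitions (those live in the open-gpu-kernel-modules headers linked above), and Bar0/read32() stand in for whatever MMIO accessor nova-core ends up using:

    // Hypothetical sketch: deciding at runtime whether we are driving a VF.
    const NV_PMC_BOOT_1: usize = 0x0; // placeholder offset, not the real one
    const PMC_BOOT_1_VGPU_VF: u32 = 1 << 0; // placeholder bit, not the real one

    fn running_on_vf(bar0: &Bar0) -> bool {
        // The same read works whether the function is a bare VF on the host
        // or a VF passed through to a guest, which is what would let a single
        // driver adapt its behavior in both cases.
        (bar0.read32(NV_PMC_BOOT_1) & PMC_BOOT_1_VGPU_VF) != 0
    }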
On 1.10.2025 17.48, Jason Gunthorpe wrote: > On Wed, Oct 01, 2025 at 08:09:37AM +0000, Zhi Wang wrote: >>>> But if the guest sees the passed-through VF as a PF, won't it try to >>>> do things it is not supposed to do like loading the GSP firmware (which >>>> is managed by the host)? >>> >> >> The guest driver will read PMC_BOOT_1 and check PMC_BOOT_1_VGPU_VF flag >> to tell if it is running on a VF or a PF. > > Yes exactly, and then novacore should modify its behavior and operate > the device in the different mode. > > It doesn't matter if a VM is involved or not, a VF driver running side > by side wit the PF driver should still work. > > There are use cases where people do this, eg they can stick the VF > into a linux container and use the SRIOV mechanism as a QOS control. > 'This container only gets 1/4 of a GPU' > Right, I also mentioned the same NIC/GPU use cases in another reply to Danilo. But what I gather is that NVIDIA doesn't use bare-metal VFs to support Linux containers; it seems there have been other solutions. IMHO, it is not mandatory that we support a VF driver on bare metal yet. Z. > Jason
On Wed, Oct 01, 2025 at 09:13:33PM +0000, Zhi Wang wrote: > Right, I also mentioned the same use cases of NIC/GPU in another reply > to Danilo. But what I get is NVIDIA doesn't use bare metal VF to support > linux container, I don't think it matters what "NVIDIA" does - this is the upstream architecture and it should be followed unless there is some significant reason. Jason
On 2.10.2025 14.58, Jason Gunthorpe wrote: > On Wed, Oct 01, 2025 at 09:13:33PM +0000, Zhi Wang wrote: > >> Right, I also mentioned the same use cases of NIC/GPU in another reply >> to Danilo. But what I get is NVIDIA doesn't use bare metal VF to support >> linux container, > > I don't think it matter what "NVIDIA" does - this is the upstream > architecture it should be followed unless there is some significant > reason. > Hmm. Can you elaborate why? From the device vendor's standpoint, they know the best approach for offering a good user experience given their device's characteristics. VF on bare metal is not the only approach for supporting *containers*. Some devices use it because they have to rely on it to deliver the user experience; it is mandatory for them because they have to, not because of the architecture. I am not sure why a device vendor has to be forced into supporting "VF on bare metal" if they have already offered users a solution via another approach. In fact, all the CSPs I know of who use GPU containers (not VM-based containers) widely for cloud gaming and ML are using the PF driver, because they expect far more high-density containers than the number of VFs the GPU can offer. I don't see that "VF on bare metal" is mandatory for GPU containers, at least not right now. Z. > Jason
On Thu, Oct 02, 2025 at 12:59:59PM +0000, Zhi Wang wrote: > On 2.10.2025 14.58, Jason Gunthorpe wrote: > > On Wed, Oct 01, 2025 at 09:13:33PM +0000, Zhi Wang wrote: > > > >> Right, I also mentioned the same use cases of NIC/GPU in another reply > >> to Danilo. But what I get is NVIDIA doesn't use bare metal VF to support > >> linux container, > > > > I don't think it matter what "NVIDIA" does - this is the upstream > > architecture it should be followed unless there is some significant > > reason. > > Hmm. Can you elaborate why? > > From the device vendor's stance, they know what is the best approach > to offer the better the user experience according to their device > characteristic. You can easially push the code to nova core not vfio and make it work generically, some significant reason is needed beyond "the vendor doesn't want to". Jason
On 2.10.2025 16.42, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 12:59:59PM +0000, Zhi Wang wrote: >> On 2.10.2025 14.58, Jason Gunthorpe wrote: >>> On Wed, Oct 01, 2025 at 09:13:33PM +0000, Zhi Wang wrote: >>> >>>> Right, I also mentioned the same use cases of NIC/GPU in another reply >>>> to Danilo. But what I get is NVIDIA doesn't use bare metal VF to support >>>> linux container, >>> >>> I don't think it matter what "NVIDIA" does - this is the upstream >>> architecture it should be followed unless there is some significant >>> reason. >> >> Hmm. Can you elaborate why? >> >> From the device vendor's stance, they know what is the best approach >> to offer the better the user experience according to their device >> characteristic. > > You can easially push the code to nova core not vfio and make it work > generically, some significant reason is needed beyond "the vendor > doesn't want to". > The point is that it is not that "easy": just pushing the code into nova-core does not make it work, because the entire software stack, including the firmware and its interface, is not designed for such a use case. It just wouldn't work. Z. > Jason
On Thu, Oct 02, 2025 at 02:29:09PM +0000, Zhi Wang wrote: > On 2.10.2025 16.42, Jason Gunthorpe wrote: > > On Thu, Oct 02, 2025 at 12:59:59PM +0000, Zhi Wang wrote: > >> On 2.10.2025 14.58, Jason Gunthorpe wrote: > >>> On Wed, Oct 01, 2025 at 09:13:33PM +0000, Zhi Wang wrote: > >>> > >>>> Right, I also mentioned the same use cases of NIC/GPU in another reply > >>>> to Danilo. But what I get is NVIDIA doesn't use bare metal VF to support > >>>> linux container, > >>> > >>> I don't think it matter what "NVIDIA" does - this is the upstream > >>> architecture it should be followed unless there is some significant > >>> reason. > >> > >> Hmm. Can you elaborate why? > >> > >> From the device vendor's stance, they know what is the best approach > >> to offer the better the user experience according to their device > >> characteristic. > > > > You can easially push the code to nova core not vfio and make it work > > generically, some significant reason is needed beyond "the vendor > > doesn't want to". You'd have to be more specific, I didn't see really any mediation stuff in the vfio driver to explain why the VF in the VM would act so differently that it "couldn't work" Even if there is some small FW issue, it is better to still structure things in the normal way and assume it will get fixed sometime later than to forever close that door. Jason
On 2.10.2025 17.31, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 02:29:09PM +0000, Zhi Wang wrote: >> On 2.10.2025 16.42, Jason Gunthorpe wrote: >>> On Thu, Oct 02, 2025 at 12:59:59PM +0000, Zhi Wang wrote: >>>> On 2.10.2025 14.58, Jason Gunthorpe wrote: >>>>> On Wed, Oct 01, 2025 at 09:13:33PM +0000, Zhi Wang wrote: >>>>> >>>>>> Right, I also mentioned the same use cases of NIC/GPU in another reply >>>>>> to Danilo. But what I get is NVIDIA doesn't use bare metal VF to support >>>>>> linux container, >>>>> >>>>> I don't think it matter what "NVIDIA" does - this is the upstream >>>>> architecture it should be followed unless there is some significant >>>>> reason. >>>> >>>> Hmm. Can you elaborate why? >>>> >>>> From the device vendor's stance, they know what is the best approach >>>> to offer the better the user experience according to their device >>>> characteristic. >>> >>> You can easially push the code to nova core not vfio and make it work >>> generically, some significant reason is needed beyond "the vendor >>> doesn't want to". > > You'd have to be more specific, I didn't see really any mediation > stuff in the vfio driver to explain why the VF in the VM would act so > differently that it "couldn't work" > From the device vendor’s perspective, we have no support or use case for a bare-metal VF model, not now and not in the foreseeable future. Even hypothetically, such support would not come from nova-core.ko, since that would defeat the purpose of maintaining a trimmed-down kernel module where minimizing the attack surface and preserving strict security boundaries are primary design goals. > Even if there is some small FW issue, it is better to still structure > things in the normal way and assume it will get fixed sometime later > than to forever close that door. > > Jason
On Tue, Oct 07, 2025 at 06:51:47AM +0000, Zhi Wang wrote: > > You'd have to be more specific, I didn't see really any mediation > > stuff in the vfio driver to explain why the VF in the VM would act so > > differently that it "couldn't work" > > From the device vendor’s perspective, we have no support or use case for > a bare-metal VF model, not now and not in the foreseeable future. Again, be specific: exactly what mediation in vfio is missing? > Even hypothetically, such support would not come from nova-core.ko, > since that would defeat the purpose of maintaining a trimmed-down > kernel module where minimizing the attack surface and preserving > strict security boundaries are primary design goals. Nonsense. If you move stuff from vfio to nova-core it doesn't change the "trimmed-down" nature one bit. I'm strongly against adding that profiling stuff to vfio, and I'm not hearing any reasons why nova is special and it must be done that way. Jason
On Tue Oct 7, 2025 at 8:51 AM CEST, Zhi Wang wrote: > From the device vendor’s perspective, we have no support or use case for > a bare-metal VF model, not now and not in the foreseeable future. Who is "we"? I think there'd be a ton of users that do see such use-cases. What does "no support" mean? Are there technical limitations that prevent an implementation (I haven't seen any so far)? > Even > hypothetically, such support would not come from nova-core.ko, since > that would defeat the purpose of maintaining a trimmed-down kernel > module where minimizing the attack surface and preserving strict > security boundaries are primary design goals. I wouldn't say the *primary* design goal is to be as trimmed-down as possible. The primary design goals are rather proper firmware abstraction, addressing design incompatibilities with modern graphics and compute APIs, memory safety concerns and general maintainability. It does make sense to not run the vGPU use-case on top of all the additional DRM stuff that will go into nova-drm, since this is clearly not needed in the vGPU use-case. But, it doesn't mean that we have to keep everything out of nova-core for this purpose. I think the bare-metal VF model is a very interesting use-case and if it is technically feasible we should support it. And I think it should be in nova-core. Running nova-core on a bare metal VF and running it on the same VF in a VM shouldn't be that different anyway, no?
On 7.10.2025 13.14, Danilo Krummrich wrote: > On Tue Oct 7, 2025 at 8:51 AM CEST, Zhi Wang wrote: >> From the device vendor’s perspective, we have no support or use case for >> a bare-metal VF model, not now and not in the foreseeable future. > > Who is we? I think there'd be a ton of users that do see such use-cases. > > What does "no support" mean? Are there technical limitation that prevent an> implementation (I haven't seen any so far)? > >> Even >> hypothetically, such support would not come from nova-core.ko, since >> that would defeat the purpose of maintaining a trimmed-down kernel >> module where minimizing the attack surface and preserving strict >> security boundaries are primary design goals. > > I wouldn't say the *primary* design goal is to be as trimmed-down as possible. > > The primary design goals are rather proper firmware abstraction, addressing > design incompatibilities with modern graphics and compute APIs, memory safety > concerns and general maintainability. > > It does make sense to not run the vGPU use-case on top of all the additional DRM > stuff that will go into nova-drm, since this is clearly not needed in the vGPU > use-case. But, it doesn't mean that we have to keep everything out of nova-core > for this purpose. > > I think the bare-metal VF model is a very interesting use-case and if it is > technically feasable we should support it. And I think it should be in > nova-core. The difference between nova-core running on a bare metal VF and > nova-core running on the same VF in a VM shouldn't be that different anyways, > no? @Neo. Can you shed some light here?
On Thu Oct 2, 2025 at 6:13 AM JST, Zhi Wang wrote: > On 1.10.2025 17.48, Jason Gunthorpe wrote: >> On Wed, Oct 01, 2025 at 08:09:37AM +0000, Zhi Wang wrote: >>>>> But if the guest sees the passed-through VF as a PF, won't it try to >>>>> do things it is not supposed to do like loading the GSP firmware (which >>>>> is managed by the host)? >>>> >>> >>> The guest driver will read PMC_BOOT_1 and check PMC_BOOT_1_VGPU_VF flag >>> to tell if it is running on a VF or a PF. >> >> Yes exactly, and then novacore should modify its behavior and operate >> the device in the different mode. >> >> It doesn't matter if a VM is involved or not, a VF driver running side >> by side wit the PF driver should still work. >> >> There are use cases where people do this, eg they can stick the VF >> into a linux container and use the SRIOV mechanism as a QOS control. >> 'This container only gets 1/4 of a GPU' >> > > Right, I also mentioned the same use cases of NIC/GPU in another reply > to Danilo. But what I get is NVIDIA doesn't use bare metal VF to support > linux container, it seems there have been other solutions. IMHO, it is > not mandatory that we have to support VF driver on bare metal so far > yet. For my education, what gets in the way of supporting a VF on the bare metal if we already support it from inside a VM?
On 10/1/25 6:43 PM, Alexandre Courbot wrote:
> On Thu Oct 2, 2025 at 6:13 AM JST, Zhi Wang wrote:
>> On 1.10.2025 17.48, Jason Gunthorpe wrote:
>>> On Wed, Oct 01, 2025 at 08:09:37AM +0000, Zhi Wang wrote:
...
>> Right, I also mentioned the same use cases of NIC/GPU in another reply
>> to Danilo. But what I get is NVIDIA doesn't use bare metal VF to support
>> linux container, it seems there have been other solutions. IMHO, it is
>> not mandatory that we have to support VF driver on bare metal so far
>> yet.
>
> For my education, what gets in the way of supporting a VF on the bare
> metal if we already support it from inside a VM?
Synthesizing a response from what I've learned here:
First of all, the PF and its VFs present the same PCI device ID, so
nova-core will get a probe() call for each of them, not just for the
PF. That has to be handled. (Thanks to Joel for pointing that out.)
Next, for actual true VF support, nova-core will need to "provision"
the VFs, which I have learned involves the following:
* allocate vidmem (or "VRAM" in DRM terminology)
* set up compute quotas
* configure which GPU features are exposed
thanks,
--
John Hubbard
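To make the provisioning list above concrete, a purely hypothetical sketch of what a PF-side VF provisioning interface might eventually look like in nova-core. None of these types, fields, or methods exist today; the names are invented for illustration:

    /// Hypothetical description of the resources the PF driver would assign
    /// to a VF before it can be used ("provisioning" / "profiling").
    struct VfProfile {
        /// Amount of vidmem ("VRAM" in DRM terminology) carved out for this VF.
        vram_size: u64,
        /// Share of GPU compute/engine time the VF is allowed to consume.
        compute_quota_percent: u8,
        /// Bitmask of optional GPU features exposed to the VF.
        exposed_features: u64,
    }

    impl Gpu {
        /// Hypothetical entry point: called on the PF to bring `vf_index`
        /// into a usable state before a VF driver (in a container or a VM)
        /// binds to it.
        fn provision_vf(&self, vf_index: u32, profile: &VfProfile) -> Result {
            // Would translate `profile` into the corresponding GSP/firmware
            // commands; omitted here.
            Ok(())
        }
    }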
On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... > Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the > idea now is that VFIO drivers, for NVIDIA GPUs that are supported by > NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to > let NovaCore bind to the VFs, and then have NovaCore call into the upper > (VFIO) module via Aux Bus, but this turns out to be awkward and is no > longer in favor.) So, in order to support that: > > Nova-core must only bind to Physical Functions (PFs) and regular PCI > devices, not to Virtual Functions (VFs) created through SR-IOV. > > Add a method to check if a PCI device is a Virtual Function (VF). This > allows Rust drivers to determine whether a device is a VF created > through SR-IOV. This is required in order to implement VFIO, because > drivers such as NovaCore must only bind to Physical Functions (PFs) or > regular PCI devices. The VFs must be left unclaimed, so that a VFIO > kernel module can claim them. Curiously based on a quick glance I didn't see any other drivers doing this which makes me wonder why we're different here. But it seems likely their virtual functions are supported by the same driver rather than requiring a different VF specific driver (or I glanced too quickly!). I'm guessing the proposal is to fail the probe() function in nova-core for the VFs - I'm not sure but does the driver core continue to try probing other drivers if one fails probe()? It seems like this would be something best filtered on in the device id table, although I understand that's not possible today. > Use is_virtfn() in NovaCore, in preparation for it to be used in a VFIO > scenario. > > I've based this on top of today's driver-core-next [1], because the > first patch belongs there, and the second patch applies cleanly to either > driver-core-next or drm-rust-next. So this seems like the easiest to > work with. > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git/ > > John Hubbard (2): > rust: pci: add is_virtfn(), to check for VFs > gpu: nova-core: reject binding to SR-IOV Virtual Functions > > drivers/gpu/nova-core/driver.rs | 5 +++++ > rust/kernel/pci.rs | 6 ++++++ > 2 files changed, 11 insertions(+) > > > base-commit: 6d97171ac6585de698df019b0bfea3f123fd8385 > -- > 2.51.0 >
On 9/30/25 5:29 PM, Alistair Popple wrote: > On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... >> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the >> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by >> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to >> let NovaCore bind to the VFs, and then have NovaCore call into the upper >> (VFIO) module via Aux Bus, but this turns out to be awkward and is no >> longer in favor.) So, in order to support that: >> >> Nova-core must only bind to Physical Functions (PFs) and regular PCI >> devices, not to Virtual Functions (VFs) created through SR-IOV. >> >> Add a method to check if a PCI device is a Virtual Function (VF). This >> allows Rust drivers to determine whether a device is a VF created >> through SR-IOV. This is required in order to implement VFIO, because >> drivers such as NovaCore must only bind to Physical Functions (PFs) or >> regular PCI devices. The VFs must be left unclaimed, so that a VFIO >> kernel module can claim them. > > Curiously based on a quick glance I didn't see any other drivers doing this > which makes me wonder why we're different here. But it seems likely their > virtual functions are supported by the same driver rather than requiring a > different VF specific driver (or I glanced too quickly!). I haven't checked into that, but it sounds reasonable. > > I'm guessing the proposal is to fail the probe() function in nova-core for > the VFs - I'm not sure but does the driver core continue to try probing other > drivers if one fails probe()? It seems like this would be something best > filtered on in the device id table, although I understand that's not possible > today. Yes, from my experience with building Nouveau and Nova and running both on the same system, with 2 GPUs: when Nova gets probed first, because Nova is a work in progress, however far it gets, it still fails the probe in the end. And then Nouveau gets probed, and claims the GPU. thanks, -- John Hubbard
On Wed Oct 1, 2025 at 3:22 AM CEST, John Hubbard wrote: > On 9/30/25 5:29 PM, Alistair Popple wrote: >> On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... >>> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the >>> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by >>> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to >>> let NovaCore bind to the VFs, and then have NovaCore call into the upper >>> (VFIO) module via Aux Bus, but this turns out to be awkward and is no >>> longer in favor.) So, in order to support that: >>> >>> Nova-core must only bind to Physical Functions (PFs) and regular PCI >>> devices, not to Virtual Functions (VFs) created through SR-IOV. >>> >>> Add a method to check if a PCI device is a Virtual Function (VF). This >>> allows Rust drivers to determine whether a device is a VF created >>> through SR-IOV. This is required in order to implement VFIO, because >>> drivers such as NovaCore must only bind to Physical Functions (PFs) or >>> regular PCI devices. The VFs must be left unclaimed, so that a VFIO >>> kernel module can claim them. >> >> Curiously based on a quick glance I didn't see any other drivers doing this >> which makes me wonder why we're different here. But it seems likely their >> virtual functions are supported by the same driver rather than requiring a >> different VF specific driver (or I glanced too quickly!). > > I haven't checked into that, but it sounds reasonable. There are multiple cases: Some devices have different PCI device IDs for their physical and virtual functions and different drivers handling then. One example for that is Intel IXGBE. But there are also some drivers, which do a similar check and just stop probing if they detect a virtual function. So, this patch series does not do anything uncommon. >> I'm guessing the proposal is to fail the probe() function in nova-core for >> the VFs - I'm not sure but does the driver core continue to try probing other >> drivers if one fails probe()? It seems like this would be something best >> filtered on in the device id table, although I understand that's not possible >> today. Yes, the driver core keeps going until it finds a driver that succeeds probing or no driver is left to probe. (This behavior is also the reason for the name probe() in the first place.) However, nowadays we ideally know whether a driver fits a device before probe() is called, but there are still exceptions; with PCI virtual functions we've just hit one of those. Theoretically, we could also indicate whether a driver handles virtual functions through a boolean in struct pci_driver, which would be a bit more elegant. If you want I can also pick this up with my SR-IOV RFC which will probably touch the driver structure as well; I plan to send something in a few days.
On 1.10.2025 13.32, Danilo Krummrich wrote: > On Wed Oct 1, 2025 at 3:22 AM CEST, John Hubbard wrote: >> On 9/30/25 5:29 PM, Alistair Popple wrote: >>> On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... >>>> Post-Kangrejos, the approach for NovaCore + VFIO has changed a bit: the >>>> idea now is that VFIO drivers, for NVIDIA GPUs that are supported by >>>> NovaCore, should bind directly to the GPU's VFs. (An earlier idea was to >>>> let NovaCore bind to the VFs, and then have NovaCore call into the upper >>>> (VFIO) module via Aux Bus, but this turns out to be awkward and is no >>>> longer in favor.) So, in order to support that: >>>> >>>> Nova-core must only bind to Physical Functions (PFs) and regular PCI >>>> devices, not to Virtual Functions (VFs) created through SR-IOV. >>>> >>>> Add a method to check if a PCI device is a Virtual Function (VF). This >>>> allows Rust drivers to determine whether a device is a VF created >>>> through SR-IOV. This is required in order to implement VFIO, because >>>> drivers such as NovaCore must only bind to Physical Functions (PFs) or >>>> regular PCI devices. The VFs must be left unclaimed, so that a VFIO >>>> kernel module can claim them. >>> >>> Curiously based on a quick glance I didn't see any other drivers doing this >>> which makes me wonder why we're different here. But it seems likely their >>> virtual functions are supported by the same driver rather than requiring a >>> different VF specific driver (or I glanced too quickly!). >> >> I haven't checked into that, but it sounds reasonable. > > There are multiple cases: > > Some devices have different PCI device IDs for their physical and virtual > functions and different drivers handling then. One example for that is Intel > IXGBE. > > But there are also some drivers, which do a similar check and just stop probing > if they detect a virtual function. > Right, it really depends on the hardware design and the intended use cases, and is therefore device-specific. In networking, for example, there are scenarios where VFs are used directly on bare metal - such as with DPDK to bypass the kernel network stack for better performance. In such cases, PF and VF drivers can end up being quite different and VF driver can attach on the baremetal (via pdev->is_virtfn in probe()). Similarly, in the GPU domain, there are comparable scenarios where VFs are exposed on bare metal for use cases, like containers. (I remember Xe driver can be attached to a VF in bare metal for such a use case.) For NVIDIA GPUs, VFs are only associated with VMs. So this change makes sense within this scope. Z. > So, this patch series does not do anything uncommon. > >>> I'm guessing the proposal is to fail the probe() function in nova-core for >>> the VFs - I'm not sure but does the driver core continue to try probing other >>> drivers if one fails probe()? It seems like this would be something best >>> filtered on in the device id table, although I understand that's not possible >>> today. > > Yes, the driver core keeps going until it finds a driver that succeeds probing > or no driver is left to probe. (This behavior is also the reason for the name > probe() in the first place.) > > However, nowadays we ideally know whether a driver fits a device before probe() > is called, but there are still exceptions; with PCI virtual functions we've just > hit one of those. 
> > Theoretically, we could also indicate whether a driver handles virtual functions > through a boolean in struct pci_driver, which would be a bit more elegant. > > If you want I can also pick this up with my SR-IOV RFC which will probably touch > the driver structure as well; I plan to send something in a few days.
On 10/1/25 6:52 AM, Zhi Wang wrote: > On 1.10.2025 13.32, Danilo Krummrich wrote: >> On Wed Oct 1, 2025 at 3:22 AM CEST, John Hubbard wrote: >>> On 9/30/25 5:29 PM, Alistair Popple wrote: >>>> On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... ... >> So, this patch series does not do anything uncommon. >> >>>> I'm guessing the proposal is to fail the probe() function in nova-core for >>>> the VFs - I'm not sure but does the driver core continue to try probing other >>>> drivers if one fails probe()? It seems like this would be something best >>>> filtered on in the device id table, although I understand that's not possible >>>> today. >> >> Yes, the driver core keeps going until it finds a driver that succeeds probing >> or no driver is left to probe. (This behavior is also the reason for the name >> probe() in the first place.) >> >> However, nowadays we ideally know whether a driver fits a device before probe() >> is called, but there are still exceptions; with PCI virtual functions we've just >> hit one of those. >> >> Theoretically, we could also indicate whether a driver handles virtual functions >> through a boolean in struct pci_driver, which would be a bit more elegant. >> >> If you want I can also pick this up with my SR-IOV RFC which will probably touch >> the driver structure as well; I plan to send something in a few days. As I mentioned in the other fork of this thread, I do think this is a good start. So unless someone disagrees, I'd like to go with this series (perhaps with better wording in the commit messages, and maybe a better comment above the probe() failure return) for now. And then we can add SRIOV support into nova-core when we are ready. Let me know--especially Jason--if that sounds reasonable, and if so I'll draft more accurate wording. thanks, -- John Hubbard
On Thu Oct 2, 2025 at 12:38 AM CEST, John Hubbard wrote: > On 10/1/25 6:52 AM, Zhi Wang wrote: >> On 1.10.2025 13.32, Danilo Krummrich wrote: >>> On Wed Oct 1, 2025 at 3:22 AM CEST, John Hubbard wrote: >>>> On 9/30/25 5:29 PM, Alistair Popple wrote: >>>>> On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... > ... >>> So, this patch series does not do anything uncommon. >>> >>>>> I'm guessing the proposal is to fail the probe() function in nova-core for >>>>> the VFs - I'm not sure but does the driver core continue to try probing other >>>>> drivers if one fails probe()? It seems like this would be something best >>>>> filtered on in the device id table, although I understand that's not possible >>>>> today. >>> >>> Yes, the driver core keeps going until it finds a driver that succeeds probing >>> or no driver is left to probe. (This behavior is also the reason for the name >>> probe() in the first place.) >>> >>> However, nowadays we ideally know whether a driver fits a device before probe() >>> is called, but there are still exceptions; with PCI virtual functions we've just >>> hit one of those. >>> >>> Theoretically, we could also indicate whether a driver handles virtual functions >>> through a boolean in struct pci_driver, which would be a bit more elegant. >>> >>> If you want I can also pick this up with my SR-IOV RFC which will probably touch >>> the driver structure as well; I plan to send something in a few days. > > As I mentioned in the other fork of this thread, I do think this is > a good start. So unless someone disagrees, I'd like to go with this > series (perhaps with better wording in the commit messages, and maybe > a better comment above the probe() failure return) for now. Indicating whether the driver supports VFs through a boolean in struct pci_driver is about the same effort (well, maybe slightly more), but solves the problem in a cleaner way since it avoids probe() being called in the first place. Other existing drivers benefit from that as well. Forget about the SR-IOV RFC I was talking about; I really just intended to offer to take care of that. :)
On Thu, Oct 02, 2025 at 12:52:10AM +0200, Danilo Krummrich wrote: > Indicating whether the driver supports VFs through a boolean in struct > pci_driver is about the same effort (well, maybe slightly more), but solves the > problem in a cleaner way since it avoids probe() being called in the first > place. Other existing drivers benefit from that as well. I'm strongly against that idea. Drivers should not be doing things like this, and giving them core code helpers to do something they should not do is the wrong direction. I think this patchset should simply be dropped. Novacore should try to boot on a VF and fail if it isn't set up. Jason
On Thu Oct 2, 2025 at 2:01 PM CEST, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 12:52:10AM +0200, Danilo Krummrich wrote: > >> Indicating whether the driver supports VFs through a boolean in struct >> pci_driver is about the same effort (well, maybe slightly more), but solves the >> problem in a cleaner way since it avoids probe() being called in the first >> place. Other existing drivers benefit from that as well. > > I'm strongly against that idea. > > Drivers should not be doing things like this, giving them core code > helpers to do something they should not do is the wrong direction. > > I think this patchset should be simply dropped. Novacore should try to > boot on a VF and fail if it isn't setup. Why? What about other upstream drivers that clearly assert that they don't support VFs? Why would we want to force them to try to boot to a point where they "naturally" fail? I think there's nothing wrong with allowing drivers to "officially" assert that they're intended for PFs only. Here are a few examples of drivers that have the same requirement: https://elixir.bootlin.com/linux/v6.17/source/drivers/net/ethernet/realtek/rtase/rtase_main.c#L2195 https://elixir.bootlin.com/linux/v6.17/source/drivers/net/ethernet/intel/ice/ice_main.c#L5266 https://elixir.bootlin.com/linux/v6.17/source/drivers/net/ethernet/intel/igb/igb_main.c#L3221
On Thu, Oct 02, 2025 at 02:08:27PM +0200, Danilo Krummrich wrote:
> Why? What about other upstream drivers that clearly assert that they don't
> support VFs?
They shouldn't be doing that either. There is lots of junk in Linux;
that doesn't mean it should be made first-class to encourage more
people to do the wrong thing.
> Why would we want to force them to try to boot to a point where
> they "naturally" fail?
We want them to work.
> https://elixir.bootlin.com/linux/v6.17/source/drivers/net/ethernet/realtek/rtase/rtase_main.c#L2195
> https://elixir.bootlin.com/linux/v6.17/source/drivers/net/ethernet/intel/ice/ice_main.c#L5266
> https://elixir.bootlin.com/linux/v6.17/source/drivers/net/ethernet/intel/igb/igb_main.c#L3221
This usage seems wrong to me:
commit 50ac7479846053ca8054be833c1594e64de496bb
Author: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Date: Wed Jul 28 12:39:10 2021 -0700
ice: Prevent probing virtual functions
The userspace utility "driverctl" can be used to change/override the
system's default driver choices. This is useful in some situations
(buggy driver, old driver missing a device ID, trying a workaround,
etc.) where the user needs to load a different driver.
However, this is also prone to user error, where a driver is mapped
to a device it's not designed to drive. For example, if the ice driver
is mapped to driver iavf devices, the ice driver crashes.
Add a check to return an error if the ice driver is being used to
probe a virtual function.
Decoding this.. There is actually an "iavf" driver, and it does have
special PCI IDs for VFs:
static const struct pci_device_id iavf_pci_tbl[] = {
{PCI_VDEVICE(INTEL, IAVF_DEV_ID_VF), 0},
{PCI_VDEVICE(INTEL, IAVF_DEV_ID_VF_HV), 0},
{PCI_VDEVICE(INTEL, IAVF_DEV_ID_X722_VF), 0},
{PCI_VDEVICE(INTEL, IAVF_DEV_ID_ADAPTIVE_VF), 0},
In normal cases iavf will probe to the SRIOV VFS just fine.
The above is saying if the user mis-uses driverctl to bind the ice
driver to a function that doesn't have matching PCI IDs then the
kernel crashes. Yeah. I'm pretty sure that is true for a lot of
drivers. Bind them to HW not in their ID tables and they are not
going to work right.
I would have rejected a patch like this. The ID table is already
correct and properly excludes VFs.
Jason
On Thu Oct 2, 2025 at 2:32 PM CEST, Jason Gunthorpe wrote: > On Thu, Oct 02, 2025 at 02:08:27PM +0200, Danilo Krummrich wrote: > >> Why? What about other upstream drivers that clearly assert that they don't >> support VFs? > > They shouldn't be doing that either. There is lots of junk in Linux, > that doesn't mean it should be made first-class to encourage more > people to do the wrong thing. Let's discontinue this thread and keep discussing in the v2 one, since the discussions converge.
On 10/1/25 3:52 PM, Danilo Krummrich wrote: > On Thu Oct 2, 2025 at 12:38 AM CEST, John Hubbard wrote: >> On 10/1/25 6:52 AM, Zhi Wang wrote: >>> On 1.10.2025 13.32, Danilo Krummrich wrote: >>>> On Wed Oct 1, 2025 at 3:22 AM CEST, John Hubbard wrote: >>>>> On 9/30/25 5:29 PM, Alistair Popple wrote: >>>>>> On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... >> ... >> As I mentioned in the other fork of this thread, I do think this is >> a good start. So unless someone disagrees, I'd like to go with this >> series (perhaps with better wording in the commit messages, and maybe >> a better comment above the probe() failure return) for now. > > Indicating whether the driver supports VFs through a boolean in struct > pci_driver is about the same effort (well, maybe slightly more), but solves the > problem in a cleaner way since it avoids probe() being called in the first > place. Other existing drivers benefit from that as well. Yes, that is cleaner, and like you say, nearly as easy. > > Forget about the SR-IOV RFC I was talking about; I really just intended to offer > to take care of that. :) I can send out a v2 with that "PCI driver bool: supports VFs" approach, glad to do that. thanks, -- John Hubbard
> On Oct 1, 2025, at 7:00 PM, John Hubbard <jhubbard@nvidia.com> wrote: > > On 10/1/25 3:52 PM, Danilo Krummrich wrote: >>> On Thu Oct 2, 2025 at 12:38 AM CEST, John Hubbard wrote: >>> On 10/1/25 6:52 AM, Zhi Wang wrote: >>>> On 1.10.2025 13.32, Danilo Krummrich wrote: >>>>> On Wed Oct 1, 2025 at 3:22 AM CEST, John Hubbard wrote: >>>>>> On 9/30/25 5:29 PM, Alistair Popple wrote: >>>>>>> On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... >>> ... >>> As I mentioned in the other fork of this thread, I do think this is >>> a good start. So unless someone disagrees, I'd like to go with this >>> series (perhaps with better wording in the commit messages, and maybe >>> a better comment above the probe() failure return) for now. >> >> Indicating whether the driver supports VFs through a boolean in struct >> pci_driver is about the same effort (well, maybe slightly more), but solves the >> problem in a cleaner way since it avoids probe() being called in the first >> place. Other existing drivers benefit from that as well. > > Yes, that is cleaner, and like you say, nearly as easy. > >> >> Forget about the SR-IOV RFC I was talking about; I really just intended to offer >> to take care of that. :) > > I can send out a v2 with that "PCI driver bool: supports VFs" approach, > glad to do that. Here is my opinion and correct me if I missed something: It feels premature to remove the option of nova-core binding to a VF, since other options to disable auto probing do exist as Jason pointed out. Taking a parallel with VFIO pass through for instance, the user already has to do some diligence like preventing drivers from binding and then making vfio-pci bind to the device IDs. This case is similar though slightly different, but VFIO setup requires some configuration anyway so will it really improve anything? I quietly suggest holding on till there is a real need or we are sure nova cannot bind to, or operate on a VF. This might also close the door to say any future testing we may do by binding to a VF for instance (yes we can delete a statement but..). Just my suggestion, but I do not strongly oppose either. thanks, - Joel > > > thanks, > -- > John Hubbard >
On 10/1/25 4:47 PM, Joel Fernandes wrote: >> On Oct 1, 2025, at 7:00 PM, John Hubbard <jhubbard@nvidia.com> wrote: >> On 10/1/25 3:52 PM, Danilo Krummrich wrote: >>>> On Thu Oct 2, 2025 at 12:38 AM CEST, John Hubbard wrote: >>>> On 10/1/25 6:52 AM, Zhi Wang wrote: >>>>> On 1.10.2025 13.32, Danilo Krummrich wrote: >>>>>> On Wed Oct 1, 2025 at 3:22 AM CEST, John Hubbard wrote: >>>>>>> On 9/30/25 5:29 PM, Alistair Popple wrote: >>>>>>>> On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... >>>> ... > Here is my opinion and correct me if I missed something: > > It feels premature to remove the option of nova-core binding to a VF, since other options to disable auto probing do exist as Jason pointed out. > > Taking a parallel with VFIO pass through for instance, the user already has to do some diligence like preventing drivers from binding and then making vfio-pci bind to the device IDs. This case is similar though slightly different, but VFIO setup requires some configuration anyway so will it really improve anything? > > I quietly suggest holding on till there is a real need or we are sure nova cannot bind to, or operate on a VF. This I'm confident that nova-core cannot properly handle a VF with *today's* code. There is no expectation at all for a VF to show up--yet. Which is why I think it's appropriate to skip it right now. thanks, -- John Hubbard
On Thu Oct 2, 2025 at 1:51 AM CEST, John Hubbard wrote: > On 10/1/25 4:47 PM, Joel Fernandes wrote: >>> On Oct 1, 2025, at 7:00 PM, John Hubbard <jhubbard@nvidia.com> wrote: >>> On 10/1/25 3:52 PM, Danilo Krummrich wrote: >>>>> On Thu Oct 2, 2025 at 12:38 AM CEST, John Hubbard wrote: >>>>> On 10/1/25 6:52 AM, Zhi Wang wrote: >>>>>> On 1.10.2025 13.32, Danilo Krummrich wrote: >>>>>>> On Wed Oct 1, 2025 at 3:22 AM CEST, John Hubbard wrote: >>>>>>>> On 9/30/25 5:29 PM, Alistair Popple wrote: >>>>>>>>> On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... >>>>> ... >> Here is my opinion and correct me if I missed something: >> >> It feels premature to remove the option of nova-core binding to a VF, since other options to disable auto probing do exist as Jason pointed out. >> >> Taking a parallel with VFIO pass through for instance, the user already has to do some diligence like preventing drivers from binding and then making vfio-pci bind to the device IDs. This case is similar though slightly different, but VFIO setup requires some configuration anyway so will it really improve anything? >> >> I quietly suggest holding on till there is a real need or we are sure nova cannot bind to, or operate on a VF. This > > I'm confident that nova-core cannot properly handle a VF with *today's* code. > There is no expectation at all for a VF to show up--yet. > > Which is why I think it's appropriate to skip it right now. I agree with John. If a driver does not support a certain device, it is not the user's responsibility to prevent probing. Currently nova-core does not support VFs, so it should never get probed for them in the first place.
> On Oct 1, 2025, at 7:56 PM, Danilo Krummrich <dakr@kernel.org> wrote: > > On Thu Oct 2, 2025 at 1:51 AM CEST, John Hubbard wrote: >> On 10/1/25 4:47 PM, Joel Fernandes wrote: >>>> On Oct 1, 2025, at 7:00 PM, John Hubbard <jhubbard@nvidia.com> wrote: >>>> On 10/1/25 3:52 PM, Danilo Krummrich wrote: >>>>>> On Thu Oct 2, 2025 at 12:38 AM CEST, John Hubbard wrote: >>>>>> On 10/1/25 6:52 AM, Zhi Wang wrote: >>>>>>> On 1.10.2025 13.32, Danilo Krummrich wrote: >>>>>>>> On Wed Oct 1, 2025 at 3:22 AM CEST, John Hubbard wrote: >>>>>>>>> On 9/30/25 5:29 PM, Alistair Popple wrote: >>>>>>>>>> On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... >>>>>> ... >>> Here is my opinion and correct me if I missed something: >>> >>> It feels premature to remove the option of nova-core binding to a VF, since other options to disable auto probing do exist as Jason pointed out. >>> >>> Taking a parallel with VFIO pass through for instance, the user already has to do some diligence like preventing drivers from binding and then making vfio-pci bind to the device IDs. This case is similar though slightly different, but VFIO setup requires some configuration anyway so will it really improve anything? >>> >>> I quietly suggest holding on till there is a real need or we are sure nova cannot bind to, or operate on a VF. This >> >> I'm confident that nova-core cannot properly handle a VF with *today's* code. >> There is no expectation at all for a VF to show up--yet. >> >> Which is why I think it's appropriate to skip it right now. > > I agree with John. > > If a driver does not support a certain device, it is not the user's > responsibility to prevent probing. Currently nova-core does not support VFs, so > it should never get probed for them in the first place. That works for me. If we are doing this, I would also suggest adding a detailed comment preceding the if statement, saying the reason for this is because the VFs share the same device IDs when in reality we have 2 different drivers that handle the different functions. Thanks.
On 10/1/25 5:48 PM, Joel Fernandes wrote: >> On Oct 1, 2025, at 7:56 PM, Danilo Krummrich <dakr@kernel.org> wrote: >> On Thu Oct 2, 2025 at 1:51 AM CEST, John Hubbard wrote: >>> On 10/1/25 4:47 PM, Joel Fernandes wrote: >>>>> On Oct 1, 2025, at 7:00 PM, John Hubbard <jhubbard@nvidia.com> wrote: >>>>> On 10/1/25 3:52 PM, Danilo Krummrich wrote: >>>>>>> On Thu Oct 2, 2025 at 12:38 AM CEST, John Hubbard wrote: >>>>>>> On 10/1/25 6:52 AM, Zhi Wang wrote: >>>>>>>> On 1.10.2025 13.32, Danilo Krummrich wrote: >>>>>>>>> On Wed Oct 1, 2025 at 3:22 AM CEST, John Hubbard wrote: >>>>>>>>>> On 9/30/25 5:29 PM, Alistair Popple wrote: >>>>>>>>>>> On 2025-10-01 at 08:07 +1000, John Hubbard <jhubbard@nvidia.com> wrote... >>>>>>> ... >> If a driver does not support a certain device, it is not the user's >> responsibility to prevent probing. Currently nova-core does not support VFs, so >> it should never get probed for them in the first place. >

> That works for me. If we are doing this, I would also suggest adding a detailed comment preceding the if statement,

The nova-core piece that decides this is not an if statement. It's a const. It really is cleaner. :)

> saying the reason for this is because the VFs share the same device IDs when in reality we have 2 different drivers that handle the different functions.
>

I've got it passing tests already, I'll add appropriate comments and post it shortly, and let's see what you think. thanks, -- John Hubbard
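For illustration only, one possible shape of the "driver declares whether it handles VFs" idea on the Rust side. This is a guess at the interface, not the actual v2 code; the constant name is invented, and the C-side plumbing (a corresponding flag in struct pci_driver) is omitted:

    // rust/kernel/pci.rs (hypothetical)
    pub trait Driver {
        /// Drivers that can also drive SR-IOV Virtual Functions set this to
        /// `true`; when `false`, the PCI core would skip probe() for VFs.
        const SUPPORTS_VIRTFN: bool = false;
        // ...existing items (ID_TABLE, probe(), ...)...
    }

    // drivers/gpu/nova-core/driver.rs (hypothetical): nothing to override;
    // the default `false` already expresses "nova-core only drives PFs today".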
On Wed, Oct 01, 2025 at 05:54:45PM -0700, John Hubbard wrote: > saying the reason for this is because the VFs share the same device > IDs when in reality we have 2 different drivers that handle the > different functions. That's the fundamental problem here. Presenting the same device ID when the device actually has a very different programming model is against how PCI is supposed to operate. For example, mlx5 devices give unique IDs to their VFs - though they don't have different programming models. If novacore doesn't work on VFs at all, even in VMs, then use the register-based detection mentioned earlier. Jason