RE: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in d3hot state

Posted by Duan, Zhenzhong 2 days, 13 hours ago

>-----Original Message-----
>From: Alex Williamson <alex.williamson@redhat.com>
>Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in
>d3hot state
>
>On Thu, 20 Feb 2025 04:24:13 +0000
>"Duan, Zhenzhong" <zhenzhong.duan@intel.com> wrote:
>
>> >-----Original Message-----
>> >From: Alex Williamson <alex.williamson@redhat.com>
>> >Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in
>> >d3hot state
>> >
>> >On Wed, 19 Feb 2025 18:58:58 +0100
>> >Eric Auger <eric.auger@redhat.com> wrote:
>> >
>> >> Since kernel commit:
>> >> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
>> >> in D3hot power state")
>> >> any attempt to do an mmap access to a BAR when the device is in d3hot
>> >> state will generate a fault.
>> >>
>> >> On system_powerdown, if the VFIO device is translated by an IOMMU,
>> >> the device is moved to D3hot state and then the vIOMMU gets disabled
>> >> by the guest. As a result of this latter operation, the address space is
>> >> swapped from translated to untranslated. When re-enabling the aliased
>> >> regions, the RAM regions are dma-mapped again and this causes DMA_MAP
>> >> faults when attempting the operation on BARs.
>> >>
>> >> To avoid doing the remap on those BARs, we compute whether the
>> >> device is in D3hot state and if so, skip the DMA MAP.
>> >
>> >Thinking on this some more, QEMU PCI code already manages the device
>> >BARs appearing in the address space based on the memory enable bit in
>> >the command register.  Should we do the same for PM state?
>> >
>> >IOW, the device going into low power state should remove the BARs from
>> >the AddressSpace and waking the device should re-add them.  The BAR DMA
>> >mapping should then always be consistent, whereas here nothing would
>> >remap the BARs when the device is woken.
>>
>> If BARs should be disabled before the D3hot transition, isn't it the guest's
>> responsibility to do that itself?
>> Just like what has been done for FLR, which calls pci_dev_save_and_disable().
>
>Nothing requires the guest to clear memory and IO from the command
>register before entering a low power state, nor are we going to get
>very far arguing that it's the guest's fault for triggering an error in
>the hypervisor.  The PCI spec indicates that memory and IO BARs are only
>accessible when the device is in the D0 power state.  On bare metal
>accessing the BAR for a device in a low power state would generate an
>unsupported request.

Understood. Yes, it makes sense to remove the BARs from the AddressSpace when
the device is in D3hot.

> Therefore why should QEMU map BARs of devices in
>low power states into the address space?
It should not.
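
For illustration, the rule discussed above (a BAR is visible in the address
space only when its enable bit is set in the command register and the device
is in D0) could be sketched roughly as below. This is a standalone sketch, not
QEMU code: bar_should_be_mapped() is a hypothetical helper, and the constants
are assumed to match the PCI register definitions (command register enable
bits, PMCSR power state field in bits 1:0, where 0 means D0 and 3 means D3hot).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed register values, mirroring the usual PCI definitions. */
#define PCI_COMMAND_IO          0x1     /* I/O space enable */
#define PCI_COMMAND_MEMORY      0x2     /* Memory space enable */
#define PCI_PM_CTRL_STATE_MASK  0x0003  /* PMCSR power state, 0 == D0 */

/*
 * Hypothetical helper: decide whether a BAR should currently appear in
 * the address space. The BAR is mapped only if the matching enable bit
 * is set in the command register AND the device is in D0.
 */
static bool bar_should_be_mapped(uint16_t cmd, uint16_t pmcsr, bool is_io_bar)
{
    uint16_t enable = is_io_bar ? PCI_COMMAND_IO : PCI_COMMAND_MEMORY;
    bool in_d0 = (pmcsr & PCI_PM_CTRL_STATE_MASK) == 0;

    return in_d0 && (cmd & enable) != 0;
}
```

With such a check re-evaluated on every PMCSR write as well as on every
command register write, the BARs would be removed when the guest moves the
device to D3hot and re-added when it returns to D0, so the DMA mappings would
stay consistent in both directions.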

Thanks
Zhenzhong