>-----Original Message-----
>From: Alex Williamson <alex.williamson@redhat.com>
>Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in
>d3hot state
>
>On Thu, 20 Feb 2025 04:24:13 +0000
>"Duan, Zhenzhong" <zhenzhong.duan@intel.com> wrote:
>
>> >-----Original Message-----
>> >From: Alex Williamson <alex.williamson@redhat.com>
>> >Subject: Re: [RFC 0/2] hw/vfio/pci: Prevent BARs from being dma mapped in
>> >d3hot state
>> >
>> >On Wed, 19 Feb 2025 18:58:58 +0100
>> >Eric Auger <eric.auger@redhat.com> wrote:
>> >
>> >> Since kernel commit:
>> >> 2b2c651baf1c ("vfio/pci: Invalidate mmaps and block the access
>> >> in D3hot power state")
>> >> any attempt to do an mmap access to a BAR when the device is in d3hot
>> >> state will generate a fault.
>> >>
>> >> On system_powerdown, if the VFIO device is translated by an IOMMU,
>> >> the device is moved to D3hot state and then the vIOMMU gets disabled
>> >> by the guest. As a result of this later operation, the address space is
>> >> swapped from translated to untranslated. When re-enabling the aliased
>> >> regions, the RAM regions are dma-mapped again and this causes DMA_MAP
>> >> faults when attempting the operation on BARs.
>> >>
>> >> To avoid doing the remap on those BARs, we compute whether the
>> >> device is in D3hot state and if so, skip the DMA MAP.
>> >
>> >Thinking on this some more, QEMU PCI code already manages the device
>> >BARs appearing in the address space based on the memory enable bit in
>> >the command register. Should we do the same for PM state?
>> >
>> >IOW, the device going into low power state should remove the BARs from
>> >the AddressSpace and waking the device should re-add them. The BAR DMA
>> >mapping should then always be consistent, whereas here nothing would
>> >remap the BARs when the device is woken.
>>
>> If BARs should be disabled before D3hot transition, isn't it guest's
>> responsibility to do that itself?
>> Just like what have been done for FLR which calls pci_dev_save_and_disable().
>
>Nothing requires the guest to clear memory and IO from the command
>register before entering a low power state, nor are we going to get
>very far arguing that it's the guest's fault for triggering an error in
>the hypervisor. The PCI spec indicates that memory and IO BARs are only
>accessible when the device is in the D0 power state. On bare metal
>accessing the BAR for a device in a low power state would generate an
>unsupported request.

Understood. Yes, it makes sense to remove the BARs from the AddressSpace
when the device is in D3hot.

> Therefore why should QEMU map BARs of devices in
>low power states into the address space?

It should not.

Thanks
Zhenzhong
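
A minimal sketch of the rule discussed above, written as a standalone C
model and not as QEMU code: memory BARs are treated as mapped only when the
memory enable bit is set in the command register and the PowerState field
reads D0. The struct and the model_* helpers are hypothetical names used
for illustration only. The two masks carry the values the PCI
specifications define (and that Linux pci_regs.h uses). How QEMU would hook
this into its real config-write path is not shown and is left as an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register bit values as defined by the PCI specifications. */
#define PCI_COMMAND_MEMORY      0x0002  /* command register: memory space enable */
#define PCI_PM_CTRL_STATE_MASK  0x0003  /* PMCSR: PowerState field (0 = D0, 3 = D3hot) */

/* Hypothetical model of the relevant device state; not a QEMU type. */
struct model_dev {
    uint16_t command;     /* PCI command register */
    uint16_t pmcsr;       /* power management control/status register */
    bool     bars_mapped; /* are memory BARs present in the address space? */
};

/* BARs decode only with memory enable set and the device in D0. */
static bool model_bars_should_be_mapped(const struct model_dev *d)
{
    bool mem_enabled = (d->command & PCI_COMMAND_MEMORY) != 0;
    bool in_d0 = (d->pmcsr & PCI_PM_CTRL_STATE_MASK) == 0;

    return mem_enabled && in_d0;
}

/* Re-evaluate the mapping after any write to either register. */
static void model_update_mappings(struct model_dev *d)
{
    d->bars_mapped = model_bars_should_be_mapped(d);
}

/* Guest write to the command register. */
static void model_write_command(struct model_dev *d, uint16_t val)
{
    d->command = val;
    model_update_mappings(d);
}

/* Guest write to PMCSR, e.g. 3 to enter D3hot or 0 to return to D0. */
static void model_write_pmcsr(struct model_dev *d, uint16_t val)
{
    d->pmcsr = val;
    model_update_mappings(d);
}
```

Because the same update runs on every command or PMCSR write, the BARs
drop out on the D0 to D3hot transition and come back when the device is
woken, which is the consistency property that skipping only the DMA map
would not provide.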