[PATCH v6 3/9] PCI: Avoid saving config space state if inaccessible

Farhan Ali posted 9 patches 2 weeks, 1 day ago
Posted by Farhan Ali 2 weeks, 1 day ago
The current reset process saves the device's config space state before
reset and restores it afterward. However, errors may occur unexpectedly,
and the device may become inaccessible or the config space itself may
be corrupted. This results in saving corrupted values that get
written back to the device during state restoration.

With a reset we want to recover/restore the device into a functional
state. So avoid saving the state of the config space when the
device config space is inaccessible/corrupted.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/pci/pci.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 608d64900fee..28c6b9e7f526 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5105,6 +5105,7 @@ EXPORT_SYMBOL_GPL(pci_dev_unlock);
 
 static void pci_dev_save_and_disable(struct pci_dev *dev)
 {
+	u32 val;
 	const struct pci_error_handlers *err_handler =
 			dev->driver ? dev->driver->err_handler : NULL;
 
@@ -5125,6 +5126,12 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
 	 */
 	pci_set_power_state(dev, PCI_D0);
 
+	pci_read_config_dword(dev, PCI_COMMAND, &val);
+	if (PCI_POSSIBLE_ERROR(val)) {
+		pci_warn(dev, "Device config space inaccessible\n");
+		return;
+	}
+
 	pci_save_state(dev);
 	/*
 	 * Disable the device by clearing the Command register, except for
-- 
2.43.0
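A user-space sketch of the check this patch adds may help illustrate the intended behavior. It mocks config space reads. `struct mock_dev`, `mock_read_dword()` and `mock_save_state()` are hypothetical stand-ins for `struct pci_dev`, `pci_read_config_dword()` and the save step in `pci_dev_save_and_disable()`. Only `PCI_ERROR_RESPONSE`, `PCI_POSSIBLE_ERROR()` and the `PCI_COMMAND` offset follow the kernel definitions. Like the patch, it assumes that an inaccessible device returns all ones on a config read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same definitions as include/linux/pci.h: all ones means "possible error". */
#define PCI_ERROR_RESPONSE (~0ULL)
#define PCI_POSSIBLE_ERROR(val) ((val) == ((__typeof__(val))PCI_ERROR_RESPONSE))

#define PCI_COMMAND 0x04	/* Command register; Status follows at 0x06 */
#define MOCK_CFG_SIZE 64	/* hypothetical: standard header only */

/* Hypothetical stand-in for struct pci_dev. */
struct mock_dev {
	bool accessible;		/* false models DPC, an s390 error state, ... */
	uint8_t cfg[MOCK_CFG_SIZE];	/* contents an accessible device returns */
	uint8_t saved[MOCK_CFG_SIZE];	/* stand-in for the saved state */
	bool state_saved;
};

/* Hypothetical accessor: an inaccessible device reads back as all ones. */
static uint32_t mock_read_dword(const struct mock_dev *dev, int where)
{
	uint32_t val;

	assert(where >= 0 && where % 4 == 0 && where + 4 <= MOCK_CFG_SIZE);
	if (!dev->accessible)
		return 0xffffffffu;
	memcpy(&val, &dev->cfg[where], sizeof(val));
	return val;
}

/* Models the patched flow: probe PCI_COMMAND first, save only if sane. */
static bool mock_save_state(struct mock_dev *dev)
{
	uint32_t val = mock_read_dword(dev, PCI_COMMAND);

	if (PCI_POSSIBLE_ERROR(val))
		return false;	/* skip: leave any earlier saved state untouched */

	for (int off = 0; off < MOCK_CFG_SIZE; off += 4) {
		uint32_t dw = mock_read_dword(dev, off);

		memcpy(&dev->saved[off], &dw, sizeof(dw));
	}
	dev->state_saved = true;
	return true;
}
```

As the macro name suggests, an all-ones value is only a *possible* error response. For the Command/Status dword probed here it should not occur on a healthy device, since Command register bits 11-15 are reserved.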
Re: [PATCH v6 3/9] PCI: Avoid saving config space state if inaccessible
Posted by Niklas Schnelle 2 weeks, 1 day ago
On Mon, 2025-12-01 at 14:08 -0800, Farhan Ali wrote:
> The current reset process saves the device's config space state before
> reset and restores it afterward. However, errors may occur unexpectedly,
> and the device may become inaccessible or the config space itself may
> be corrupted. This results in saving corrupted values that get
> written back to the device during state restoration.
> 
> With a reset we want to recover/restore the device into a functional
> state. So avoid saving the state of the config space when the
> device config space is inaccessible/corrupted.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>

I think the commit message needs more focus. Specifically, I think the
main point is the case that Lukas mentioned in the following quote from
the cover letter of his "PCI: Universal error recoverability of
devices" series:

"However errors may occur unexpectedly and it may then be impossible
to save Config Space because the device may be inaccessible (e.g. DPC)
or Config Space may be corrupted. So it must be saved ahead of time."

That case will inevitably occur when a state save / reset happens while
a PCI device is in the error state, on platforms like s390 or POWER or
with DPC, where Config Space will be inaccessible.

Moreover, I'd like to stress that this issue is independent of the
rest of your series. As we've seen in your experiments, it can be
triggered today when a vfio-pci user process blocks recovery, e.g. by
not handling the eventfd, and the user then tries to mitigate the
situation by performing a reset through sysfs. That reset saves the
0xff bytes read from the inaccessible config space, which may
subsequently kill the device on restore.
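To make the restore hazard a little more concrete: if the Command/Status dword was saved as all ones, restoring it would write all ones back to the Command register, among other registers. The sketch below is a hypothetical user-space helper (`describe_restored_command()` is not a kernel function). It decodes which Command bits such a value would set, using the bit values from include/uapi/linux/pci_regs.h. It does not model the real restore path or how a particular device reacts to these bits, which is device- and platform-specific.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Command register bits, values as in include/uapi/linux/pci_regs.h. */
#define PCI_COMMAND_IO		0x001	/* I/O space decode */
#define PCI_COMMAND_MEMORY	0x002	/* memory space decode */
#define PCI_COMMAND_MASTER	0x004	/* bus mastering (DMA) */
#define PCI_COMMAND_SPECIAL	0x008
#define PCI_COMMAND_INVALIDATE	0x010
#define PCI_COMMAND_VGA_PALETTE	0x020
#define PCI_COMMAND_PARITY	0x040
#define PCI_COMMAND_WAIT	0x080
#define PCI_COMMAND_SERR	0x100
#define PCI_COMMAND_FAST_BACK	0x200
#define PCI_COMMAND_INTX_DISABLE 0x400
/* Bits 11-15 are reserved. */

struct cmd_bit {
	uint16_t mask;
	const char *name;
};

static const struct cmd_bit cmd_bits[] = {
	{ PCI_COMMAND_IO, "IO" },
	{ PCI_COMMAND_MEMORY, "MEMORY" },
	{ PCI_COMMAND_MASTER, "MASTER" },
	{ PCI_COMMAND_SPECIAL, "SPECIAL" },
	{ PCI_COMMAND_INVALIDATE, "INVALIDATE" },
	{ PCI_COMMAND_VGA_PALETTE, "VGA_PALETTE" },
	{ PCI_COMMAND_PARITY, "PARITY" },
	{ PCI_COMMAND_WAIT, "WAIT" },
	{ PCI_COMMAND_SERR, "SERR" },
	{ PCI_COMMAND_FAST_BACK, "FAST_BACK" },
	{ PCI_COMMAND_INTX_DISABLE, "INTX_DISABLE" },
};

/* Compile-time check that the table covers all 11 defined bits. */
static_assert(sizeof(cmd_bits) / sizeof(cmd_bits[0]) == 11, "cmd_bits incomplete");

/*
 * Print each defined Command bit that restoring 'cmd' would set and
 * return how many reserved bits (11-15) it would also set.
 */
static int describe_restored_command(uint16_t cmd)
{
	int reserved = 0;

	for (size_t i = 0; i < sizeof(cmd_bits) / sizeof(cmd_bits[0]); i++)
		if (cmd & cmd_bits[i].mask)
			printf("sets %s\n", cmd_bits[i].name);

	for (unsigned int bit = 11; bit < 16; bit++)
		if (cmd & (1u << bit))
			reserved++;
	return reserved;
}
```

Called with 0xffff, the helper reports every defined bit, including MASTER, and returns 5 for the reserved bits.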

> ---
>  drivers/pci/pci.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 608d64900fee..28c6b9e7f526 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5105,6 +5105,7 @@ EXPORT_SYMBOL_GPL(pci_dev_unlock);
>  
>  static void pci_dev_save_and_disable(struct pci_dev *dev)
>  {
> +	u32 val;
>  	const struct pci_error_handlers *err_handler =
>  			dev->driver ? dev->driver->err_handler : NULL;
>  
> @@ -5125,6 +5126,12 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
>  	 */
>  	pci_set_power_state(dev, PCI_D0);
>  
> +	pci_read_config_dword(dev, PCI_COMMAND, &val);
> +	if (PCI_POSSIBLE_ERROR(val)) {
> +		pci_warn(dev, "Device config space inaccessible\n");
> +		return;
> +	}
> +

Can you explain your reasoning for not using pci_channel_offline()
here? This was suggested by Lukas in a previous iteration (link below)
and I would tend to prefer that as well.

https://lore.kernel.org/all/aOZoWDQV0TNh-NiM@wunner.de/

>  	pci_save_state(dev);
>  	/*
>  	 * Disable the device by clearing the Command register, except for
Re: [PATCH v6 3/9] PCI: Avoid saving config space state if inaccessible
Posted by Farhan Ali 2 weeks ago
On 12/2/2025 4:20 AM, Niklas Schnelle wrote:
> On Mon, 2025-12-01 at 14:08 -0800, Farhan Ali wrote:
>> The current reset process saves the device's config space state before
>> reset and restores it afterward. However, errors may occur unexpectedly,
>> and the device may become inaccessible or the config space itself may
>> be corrupted. This results in saving corrupted values that get
>> written back to the device during state restoration.
>>
>> With a reset we want to recover/restore the device into a functional
>> state. So avoid saving the state of the config space when the
>> device config space is inaccessible/corrupted.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> I think the commit message needs more focus. Specifically I think the
> main point is the case that Lukas mentioned in the following quote from
> the cover letter of his "PCI: Universal error recoverability of
> devices" series:
>
> "However errors may occur unexpectedly and it may then be impossible
> to save Config Space because the device may be inaccessible (e.g. DPC)
> or Config Space may be corrupted. So it must be saved ahead of time."

I agree, I can add this bit verbatim to the commit message.

>
> That case will inevitably happen when state save / reset happens while
> a PCI device is in the error state on a platform like s390, POWER, or
> with DPC where Config Space will be inaccessible.
>
> Moreover, I'd like to stress that this is an issue independent from the
> rest of your series. As we've seen in your experiments this can be
> triggered today when a vfio-pci user process blocks recovery, e.g. by
> not handling the eventfd, and then the user tries to mitigate the
> situation by performing a reset through sysfs, which then saves the
> 0xff bytes from inaccessible config space which may subsequently kill
> the device on restore.
>
>> ---
>>   drivers/pci/pci.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 608d64900fee..28c6b9e7f526 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -5105,6 +5105,7 @@ EXPORT_SYMBOL_GPL(pci_dev_unlock);
>>   
>>   static void pci_dev_save_and_disable(struct pci_dev *dev)
>>   {
>> +	u32 val;
>>   	const struct pci_error_handlers *err_handler =
>>   			dev->driver ? dev->driver->err_handler : NULL;
>>   
>> @@ -5125,6 +5126,12 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
>>   	 */
>>   	pci_set_power_state(dev, PCI_D0);
>>   
>> +	pci_read_config_dword(dev, PCI_COMMAND, &val);
>> +	if (PCI_POSSIBLE_ERROR(val)) {
>> +		pci_warn(dev, "Device config space inaccessible\n");
>> +		return;
>> +	}
>> +
> Can you explain your reasoning for not using pci_channel_offline()
> here? This was suggested by Lukas in a previous iteration (link below)
> and I would tend to prefer that as well.
>
> https://lore.kernel.org/all/aOZoWDQV0TNh-NiM@wunner.de/

AFAICT the error_state flag (checked by pci_channel_offline()) is set
by the error recovery code when an error is detected. I think using
pci_channel_offline() leaves a small window in which the device has
already entered an error state, so its config space is inaccessible,
but the error recovery code has not yet set the flag. This can happen,
for example, if we try to reset a device (with an ioctl like
VFIO_DEVICE_PCI_HOT_RESET) and an error occurs while we are in this
function, in the middle of handling the reset.

I think reading directly from the config space might be a better
indicator of the device's state?

Thanks

Farhan

>
>>   	pci_save_state(dev);
>>   	/*
>>   	 * Disable the device by clearing the Command register, except for