[PATCH v2 1/9] PCI: Avoid restoring error values in config space

Farhan Ali posted 9 patches 1 month, 1 week ago
There is a newer version of this series
[PATCH v2 1/9] PCI: Avoid restoring error values in config space
Posted by Farhan Ali 1 month, 1 week ago
The current reset process saves the device's config space state before
reset and restores it afterward. However, when a device is in an error
state before reset, config space reads may return error values instead of
valid data. This results in saving corrupted values that get written back
to the device during state restoration. Add validation to prevent writing
error values to the device when restoring the config space state after
reset.

Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
---
 drivers/pci/pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b0f4d98036cd..0dd95d782022 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1825,6 +1825,9 @@ static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
 	if (!force && val == saved_val)
 		return;
 
+	if (PCI_POSSIBLE_ERROR(saved_val))
+		return;
+
 	for (;;) {
 		pci_dbg(pdev, "restore config %#04x: %#010x -> %#010x\n",
 			offset, val, saved_val);
-- 
2.43.0
Re: [PATCH v2 1/9] PCI: Avoid restoring error values in config space
Posted by Alex Williamson 1 month, 1 week ago
On Mon, 25 Aug 2025 10:12:18 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> The current reset process saves the device's config space state before
> reset and restores it afterward. However, when a device is in an error
> state before reset, config space reads may return error values instead of
> valid data. This results in saving corrupted values that get written back
> to the device during state restoration. Add validation to prevent writing
> error values to the device when restoring the config space state after
> reset.
> 
> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> ---
>  drivers/pci/pci.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b0f4d98036cd..0dd95d782022 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1825,6 +1825,9 @@ static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
>  	if (!force && val == saved_val)
>  		return;
>  
> +	if (PCI_POSSIBLE_ERROR(saved_val))
> +		return;
> +
>  	for (;;) {
>  		pci_dbg(pdev, "restore config %#04x: %#010x -> %#010x\n",
>  			offset, val, saved_val);


The commit log makes this sound like more than it is.  We're really
only error checking the first 64 bytes of config space before restore,
the capabilities are not checked.  I suppose skipping the BARs and
whatnot is no worse than writing -1 to them, but this is only a
complete solution in the narrow case where we're relying on vfio-pci to
come in and restore the pre-open device state.

I had imagined that pci_save_state() might detect the error state of
the device, avoid setting state_saved, but we'd still perform the
restore callouts that only rely on internal kernel state, maybe adding a
fallback to restore the BARs from resource information.

This implementation serves a purpose, but the commit log should
describe the specific, narrow scenario this solves, and probably also
add a comment in the code about why we're not consistently checking the
saved state for errors.  Thanks,

Alex
Re: [PATCH v2 1/9] PCI: Avoid restoring error values in config space
Posted by Farhan Ali 1 month, 1 week ago
On 8/25/2025 2:35 PM, Alex Williamson wrote:
> On Mon, 25 Aug 2025 10:12:18 -0700
> Farhan Ali <alifm@linux.ibm.com> wrote:
>
>> The current reset process saves the device's config space state before
>> reset and restores it afterward. However, when a device is in an error
>> state before reset, config space reads may return error values instead of
>> valid data. This results in saving corrupted values that get written back
>> to the device during state restoration. Add validation to prevent writing
>> error values to the device when restoring the config space state after
>> reset.
>>
>> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
>> ---
>>   drivers/pci/pci.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index b0f4d98036cd..0dd95d782022 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -1825,6 +1825,9 @@ static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
>>   	if (!force && val == saved_val)
>>   		return;
>>   
>> +	if (PCI_POSSIBLE_ERROR(saved_val))
>> +		return;
>> +
>>   	for (;;) {
>>   		pci_dbg(pdev, "restore config %#04x: %#010x -> %#010x\n",
>>   			offset, val, saved_val);
>
> The commit log makes this sound like more than it is.  We're really
> only error checking the first 64 bytes of config space before restore,
> the capabilities are not checked.  I suppose skipping the BARs and
> whatnot is no worse than writing -1 to them, but this is only a
> complete solution in the narrow case where we're relying on vfio-pci to
> come in and restore the pre-open device state.
>
> I had imagined that pci_save_state() might detect the error state of
> the device, avoid setting state_saved, but we'd still perform the
> restore callouts that only rely on internal kernel state, maybe adding a
> fallback to restore the BARs from resource information.

I initially started with pci_save_state(), and avoid saving the state 
altogether. But that would mean we don't go restore the msix state and 
for s390 don't call arch_restore_msi_irqs(). Do you prefer to avoid 
saving the state at all? This change was small and sufficient enough to 
avoid breaking the device in my testing.

>
> This implementation serves a purpose, but the commit log should
> describe the specific, narrow scenario this solves, and probably also
> add a comment in the code about why we're not consistently checking the
> saved state for errors.  Thanks,
>
> Alex
Yes, I can re-word the commit message.

Thanks
Farhan
Re: [PATCH v2 1/9] PCI: Avoid restoring error values in config space
Posted by Alex Williamson 1 month, 1 week ago
On Mon, 25 Aug 2025 15:13:00 -0700
Farhan Ali <alifm@linux.ibm.com> wrote:

> On 8/25/2025 2:35 PM, Alex Williamson wrote:
> > On Mon, 25 Aug 2025 10:12:18 -0700
> > Farhan Ali <alifm@linux.ibm.com> wrote:
> >  
> >> The current reset process saves the device's config space state before
> >> reset and restores it afterward. However, when a device is in an error
> >> state before reset, config space reads may return error values instead of
> >> valid data. This results in saving corrupted values that get written back
> >> to the device during state restoration. Add validation to prevent writing
> >> error values to the device when restoring the config space state after
> >> reset.
> >>
> >> Signed-off-by: Farhan Ali <alifm@linux.ibm.com>
> >> ---
> >>   drivers/pci/pci.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index b0f4d98036cd..0dd95d782022 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -1825,6 +1825,9 @@ static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> >>   	if (!force && val == saved_val)
> >>   		return;
> >>   
> >> +	if (PCI_POSSIBLE_ERROR(saved_val))
> >> +		return;
> >> +
> >>   	for (;;) {
> >>   		pci_dbg(pdev, "restore config %#04x: %#010x -> %#010x\n",
> >>   			offset, val, saved_val);  
> >
> > The commit log makes this sound like more than it is.  We're really
> > only error checking the first 64 bytes of config space before restore,
> > the capabilities are not checked.  I suppose skipping the BARs and
> > whatnot is no worse than writing -1 to them, but this is only a
> > complete solution in the narrow case where we're relying on vfio-pci to
> > come in and restore the pre-open device state.
> >
> > I had imagined that pci_save_state() might detect the error state of
> > the device, avoid setting state_saved, but we'd still perform the
> > restore callouts that only rely on internal kernel state, maybe adding a
> > fallback to restore the BARs from resource information.  
> 
> I initially started with pci_save_state(), and avoid saving the state 
> altogether. But that would mean we don't go restore the msix state and 
> for s390 don't call arch_restore_msi_irqs(). Do you prefer to avoid 
> saving the state at all? This change was small and sufficient enough to 
> avoid breaking the device in my testing.

If we're only reading -1 from the device anyway, I'm not sure what
value we're adding to continue to save bogus data from the device.
There are also various restore sub-functions that don't need that saved
state, ex. PASID, PRI, ATS, REBAR, AER, MSI, MSIX, ACS, VF REBAR,
SRIOV.  We could push the state_saved check down into the functions
that do need the prior device state, add warnings and let the remaining
function proceed.  We really need to at least pull BAR values from
resources information for there to be a chance of a functional device
without relying on vfio-pci to restore that though.  Thanks,

Alex