[PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC

Terry Bowman posted 34 patches 3 weeks, 4 days ago
There is a newer version of this series
[PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
Posted by Terry Bowman 3 weeks, 4 days ago
The CXL driver's error handling for uncorrectable errors (UCE) will be
updated in the future. A required change is for the error handlers to
force a system panic when a UCE is detected.

Introduce PCI_ERS_RESULT_PANIC as an 'enum pci_ers_result' value. This will
be used by CXL UCE fatal and non-fatal recovery in future patches. Update
the PCIe recovery documentation to list PCI_ERS_RESULT_PANIC.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>

---

Changes in v13 -> v14:
- Add review-by for Dan
- Update Title prefix (Bjorn)
- Removed merge_result. Only logging error for device reporting the
  error (Dan)

Changes in v12 -> v13:
- Add Dave Jiang's, Jonathan's, Ben's review-by
- Typo fix (Ben)

Changes in v11 -> v12:
- Documentation requested (Lukas)
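
Illustration only, not part of this patch: a minimal userspace sketch of
how an error handler could report the new value and how a caller could act
on it. The mock_* names and struct mock_cxl_error are hypothetical, the
enum values other than 6 and 7 are assumed, and returning
PCI_ERS_RESULT_RECOVERED for the non-UCE case is an assumption rather than
the behavior of the future CXL patches.

```c
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel enum. Values 6 and 7 are taken from
 * the diff in this patch; the other values are assumed for illustration.
 * The kernel's (__force pci_ers_result_t) casts are omitted here.
 */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE = 1,
	PCI_ERS_RESULT_CAN_RECOVER = 2,
	PCI_ERS_RESULT_NEED_RESET = 3,
	PCI_ERS_RESULT_DISCONNECT = 4,
	PCI_ERS_RESULT_RECOVERED = 5,
	PCI_ERS_RESULT_NO_AER_DRIVER = 6,
	PCI_ERS_RESULT_PANIC = 7,
};

/* Hypothetical error summary; this is not a kernel structure. */
struct mock_cxl_error {
	bool uncorrectable;
};

/*
 * Hypothetical handler: report PANIC for any uncorrectable error.
 * The RECOVERED return for the other case is an assumption.
 */
enum pci_ers_result mock_cxl_error_detected(const struct mock_cxl_error *err)
{
	if (err->uncorrectable)
		return PCI_ERS_RESULT_PANIC;

	return PCI_ERS_RESULT_RECOVERED;
}

/*
 * Hypothetical caller-side check: true when the reported result asks
 * for a system panic, false for every other result.
 */
bool mock_should_panic(enum pci_ers_result result)
{
	return result == PCI_ERS_RESULT_PANIC;
}
```

In the kernel, the caller would presumably invoke panic() where this sketch
returns true. The sketch only computes the decision so that it can be
compiled and checked outside the kernel.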
---
 Documentation/PCI/pci-error-recovery.rst | 2 ++
 include/linux/pci.h                      | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
index 43bc4e3665b4..82ee2c8c0450 100644
--- a/Documentation/PCI/pci-error-recovery.rst
+++ b/Documentation/PCI/pci-error-recovery.rst
@@ -102,6 +102,8 @@ Possible return values are::
 		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
 		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
 		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
+		PCI_ERS_RESULT_NO_AER_DRIVER, /* No AER capabilities registered for the driver */
+		PCI_ERS_RESULT_PANIC,       /* System is unstable, panic. Is CXL specific */
 	};
 
 A driver does not have to implement all of these callbacks; however,
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f8e8b3df794d..ee05d5925b13 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -921,6 +921,9 @@ enum pci_ers_result {
 
 	/* No AER capabilities registered for the driver */
 	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
+
+	/* System is unstable, panic. Is CXL specific */
+	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
 };
 
 /* PCI bus error event callbacks */
-- 
2.34.1
Re: [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
Posted by Kuppuswamy Sathyanarayanan 3 weeks, 4 days ago
Hi,

On 1/14/2026 10:20 AM, Terry Bowman wrote:
> The CXL driver's error handling for uncorrectable errors (UCE) will be
> updated in the future. A required change is for the error handlers to
> force a system panic when a UCE is detected.
> 
> Introduce PCI_ERS_RESULT_PANIC as an 'enum pci_ers_result' value. This will
> be used by CXL UCE fatal and non-fatal recovery in future patches. Update
> the PCIe recovery documentation to list PCI_ERS_RESULT_PANIC.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> 

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

> ---
> 
> Changes in v13 -> v14:
> - Add review-by for Dan
> - Update Title prefix (Bjorn)
> - Removed merge_result. Only logging error for device reporting the
>   error (Dan)
> 
> Changes in  v12->v13:
> - Add Dave Jiang's, Jonathan's, Ben's review-by
> - Typo fix (Ben)
> 
> Changes v11 -> v12:
> - Documentation requested (Lukas)
> ---
>  Documentation/PCI/pci-error-recovery.rst | 2 ++
>  include/linux/pci.h                      | 3 +++
>  2 files changed, 5 insertions(+)
> 
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index 43bc4e3665b4..82ee2c8c0450 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -102,6 +102,8 @@ Possible return values are::
>  		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
>  		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
>  		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
> +		PCI_ERS_RESULT_NO_AER_DRIVER, /* No AER capabilities registered for the driver */
> +		PCI_ERS_RESULT_PANIC,       /* System is unstable, panic. Is CXL specific */
>  	};

I think you also need to update the "Detailed Steps" section of this
document to include details on when these new values should be returned
and how they affect the recovery flow.

>  
>  A driver does not have to implement all of these callbacks; however,
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index f8e8b3df794d..ee05d5925b13 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -921,6 +921,9 @@ enum pci_ers_result {
>  
>  	/* No AER capabilities registered for the driver */
>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> +
> +	/* System is unstable, panic. Is CXL specific */
> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>  };
>  
>  /* PCI bus error event callbacks */

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Re: [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
Posted by Bowman, Terry 3 weeks, 4 days ago
On 1/14/2026 12:58 PM, Kuppuswamy Sathyanarayanan wrote:
> Hi,
> 
> On 1/14/2026 10:20 AM, Terry Bowman wrote:
>> The CXL driver's error handling for uncorrectable errors (UCE) will be
>> updated in the future. A required change is for the error handlers to
>> force a system panic when a UCE is detected.
>>
>> Introduce PCI_ERS_RESULT_PANIC as an 'enum pci_ers_result' value. This will
>> be used by CXL UCE fatal and non-fatal recovery in future patches. Update
>> the PCIe recovery documentation to list PCI_ERS_RESULT_PANIC.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>>
> 
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
>> ---
>>
>> Changes in v13 -> v14:
>> - Add review-by for Dan
>> - Update Title prefix (Bjorn)
>> - Removed merge_result. Only logging error for device reporting the
>>   error (Dan)
>>
>> Changes in  v12->v13:
>> - Add Dave Jiang's, Jonathan's, Ben's review-by
>> - Typo fix (Ben)
>>
>> Changes v11 -> v12:
>> - Documentation requested (Lukas)
>> ---
>>  Documentation/PCI/pci-error-recovery.rst | 2 ++
>>  include/linux/pci.h                      | 3 +++
>>  2 files changed, 5 insertions(+)
>>
>> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
>> index 43bc4e3665b4..82ee2c8c0450 100644
>> --- a/Documentation/PCI/pci-error-recovery.rst
>> +++ b/Documentation/PCI/pci-error-recovery.rst
>> @@ -102,6 +102,8 @@ Possible return values are::
>>  		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
>>  		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
>>  		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
>> +		PCI_ERS_RESULT_NO_AER_DRIVER, /* No AER capabilities registered for the driver */
>> +		PCI_ERS_RESULT_PANIC,       /* System is unstable, panic. Is CXL specific */
>>  	};
> 
> I think you also need to update the "Detailed Steps" section of this
> document to include details on when these new values should be returned
> and how they affect the recovery flow.
> 

I had the details about PCI_ERS_RESULT_PANIC that you mention in v13. Bjorn asked me to remove them.

-Terry

>>  
>>  A driver does not have to implement all of these callbacks; however,
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index f8e8b3df794d..ee05d5925b13 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -921,6 +921,9 @@ enum pci_ers_result {
>>  
>>  	/* No AER capabilities registered for the driver */
>>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>> +
>> +	/* System is unstable, panic. Is CXL specific */
>> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>>  };
>>  
>>  /* PCI bus error event callbacks */
>
Re: [PATCH v14 27/34] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC
Posted by Kuppuswamy Sathyanarayanan 3 weeks, 4 days ago
Hi,

On 1/14/2026 11:20 AM, Bowman, Terry wrote:
> On 1/14/2026 12:58 PM, Kuppuswamy Sathyanarayanan wrote:
>> Hi,
>>
>> On 1/14/2026 10:20 AM, Terry Bowman wrote:
>>> The CXL driver's error handling for uncorrectable errors (UCE) will be
>>> updated in the future. A required change is for the error handlers to
>>> force a system panic when a UCE is detected.
>>>
>>> Introduce PCI_ERS_RESULT_PANIC as an 'enum pci_ers_result' value. This will
>>> be used by CXL UCE fatal and non-fatal recovery in future patches. Update
>>> the PCIe recovery documentation to list PCI_ERS_RESULT_PANIC.
>>>
>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>>>
>>
>> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>>> ---
>>>
>>> Changes in v13 -> v14:
>>> - Add review-by for Dan
>>> - Update Title prefix (Bjorn)
>>> - Removed merge_result. Only logging error for device reporting the
>>>   error (Dan)
>>>
>>> Changes in  v12->v13:
>>> - Add Dave Jiang's, Jonathan's, Ben's review-by
>>> - Typo fix (Ben)
>>>
>>> Changes v11 -> v12:
>>> - Documentation requested (Lukas)
>>> ---
>>>  Documentation/PCI/pci-error-recovery.rst | 2 ++
>>>  include/linux/pci.h                      | 3 +++
>>>  2 files changed, 5 insertions(+)
>>>
>>> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
>>> index 43bc4e3665b4..82ee2c8c0450 100644
>>> --- a/Documentation/PCI/pci-error-recovery.rst
>>> +++ b/Documentation/PCI/pci-error-recovery.rst
>>> @@ -102,6 +102,8 @@ Possible return values are::
>>>  		PCI_ERS_RESULT_NEED_RESET,  /* Device driver wants slot to be reset. */
>>>  		PCI_ERS_RESULT_DISCONNECT,  /* Device has completely failed, is unrecoverable */
>>>  		PCI_ERS_RESULT_RECOVERED,   /* Device driver is fully recovered and operational */
>>> +		PCI_ERS_RESULT_NO_AER_DRIVER, /* No AER capabilities registered for the driver */
>>> +		PCI_ERS_RESULT_PANIC,       /* System is unstable, panic. Is CXL specific */
>>>  	};
>>
>> I think you also need to update the "Detailed Steps" section of this
>> document to include details on when these new values should be returned
>> and how they affect the recovery flow.
>>
> 
> I had the details about PCI_ERS_RESULT_PANIC that you mention in v13. Bjorn asked me to remove them.

Sorry, I did not check the previous version.

What about PCI_ERS_RESULT_NO_AER_DRIVER? I think it needs to be included as
part of the STEP 1 details, but likely as a separate patch since it is
unrelated to the CXL changes in this series.

> 
> -Terry
> 
>>>  
>>>  A driver does not have to implement all of these callbacks; however,
>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>> index f8e8b3df794d..ee05d5925b13 100644
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -921,6 +921,9 @@ enum pci_ers_result {
>>>  
>>>  	/* No AER capabilities registered for the driver */
>>>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>>> +
>>> +	/* System is unstable, panic. Is CXL specific */
>>> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>>>  };
>>>  
>>>  /* PCI bus error event callbacks */
>>
> 

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer