[PATCH 06/15] cxl/aer/pci: Introduce PCI_ERS_RESULT_PANIC to pci_ers_result type

Terry Bowman posted 15 patches 1 month, 2 weeks ago
There is a newer version of this series
Posted by Terry Bowman 1 month, 2 weeks ago
The CXL AER service will be updated to support CXL PCIe port error
handling in the future. These devices will use a system panic during
recovery handling.

Add PCI_ERS_RESULT_PANIC enumeration to pci_ers_result type.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 include/linux/pci.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4cf89a4b4cbc..6f7e7371161d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -857,6 +857,9 @@ enum pci_ers_result {
 
 	/* No AER capabilities registered for the driver */
 	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
+
+	/* Device state requires system panic */
+	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
 };
 
 /* PCI bus error event callbacks */
-- 
2.34.1
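
For illustration only, here is a minimal userspace sketch of how a caller might act on the new value. It is not code from this series: the enum is a simplified model of the hunk above (the kernel's `__force` casts and the `pci_ers_result_t` typedef are dropped), and `ers_action()` is a hypothetical helper invented for this example.

```c
/*
 * Simplified userspace model of the two enumerators visible in the hunk.
 * The numeric values 6 and 7 are taken from the patch; everything else
 * here is an assumption made for illustration.
 */
enum ers_result {
	ERS_RESULT_NO_AER_DRIVER = 6,
	ERS_RESULT_PANIC = 7,
};

/*
 * Hypothetical helper: tell the caller what to do with a handler's result.
 * Only the PANIC value is singled out; any other result is left to the
 * ordinary recovery path.
 */
static const char *ers_action(enum ers_result result)
{
	switch (result) {
	case ERS_RESULT_PANIC:
		return "panic";
	default:
		return "continue";
	}
}
```

In a real kernel consumer the "panic" branch would presumably end in a call to panic(), but where that call lives is decided elsewhere in the series and is not shown in this patch.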
Re: [PATCH 06/15] cxl/aer/pci: Introduce PCI_ERS_RESULT_PANIC to pci_ers_result type
Posted by Jonathan Cameron 1 month, 1 week ago
On Tue, 8 Oct 2024 17:16:48 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> The CXL AER service will be updated to support CXL PCIe port error
> handling in the future. These devices will use a system panic during
> recovery handling.

Recovery handling by panic? :) That's an interesting form of recovery..

> 
> Add PCI_ERS_RESULT_PANIC enumeration to pci_ers_result type.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
>  include/linux/pci.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 4cf89a4b4cbc..6f7e7371161d 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -857,6 +857,9 @@ enum pci_ers_result {
>  
>  	/* No AER capabilities registered for the driver */
>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> +
> +	/* Device state requires system panic */
> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>  };
>  
>  /* PCI bus error event callbacks */
Re: [PATCH 06/15] cxl/aer/pci: Introduce PCI_ERS_RESULT_PANIC to pci_ers_result type
Posted by Terry Bowman 1 month, 1 week ago

On 10/16/24 11:30, Jonathan Cameron wrote:
> On Tue, 8 Oct 2024 17:16:48 -0500
> Terry Bowman <terry.bowman@amd.com> wrote:
> 
>> The CXL AER service will be updated to support CXL PCIe port error
>> handling in the future. These devices will use a system panic during
>> recovery handling.
> 
> Recovery handling by panic? :) That's an interesting form of recovery..
> 

Yes, Dan requested that all UCEs (fatal and non-fatal) be handled by panic in order
to limit the blast radius of corruption in the case of a UCE.

The recovery logic in cxl_do_recovery() (the path that does not panic) is also tested.

Regards,
Terry
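
The policy described above can be sketched roughly as follows. This is a hedged userspace illustration, not the series' implementation: the severity names and `classify_cxl_port_error()` are hypothetical, and mapping a correctable error to a "recovered" result is an assumption made only to give the function a non-panic branch.

```c
/* Hypothetical severity classes; these are not kernel identifiers. */
enum err_severity {
	SEV_CORRECTABLE,
	SEV_UNCORRECTABLE_NONFATAL,
	SEV_UNCORRECTABLE_FATAL,
};

/* Simplified stand-ins for two pci_ers_result values (5 and 7). */
enum ers_result {
	ERS_RESULT_RECOVERED = 5,
	ERS_RESULT_PANIC = 7,
};

/*
 * Every uncorrectable error, fatal or not, maps to the panic result,
 * which limits how far corrupted data can spread. Correctable errors
 * take the assumed non-panic branch.
 */
static enum ers_result classify_cxl_port_error(enum err_severity sev)
{
	switch (sev) {
	case SEV_UNCORRECTABLE_NONFATAL:
	case SEV_UNCORRECTABLE_FATAL:
		return ERS_RESULT_PANIC;
	case SEV_CORRECTABLE:
		break;
	}
	return ERS_RESULT_RECOVERED;
}
```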

>>
>> Add PCI_ERS_RESULT_PANIC enumeration to pci_ers_result type.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> ---
>>  include/linux/pci.h | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 4cf89a4b4cbc..6f7e7371161d 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -857,6 +857,9 @@ enum pci_ers_result {
>>  
>>  	/* No AER capabilities registered for the driver */
>>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>> +
>> +	/* Device state requires system panic */
>> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>>  };
>>  
>>  /* PCI bus error event callbacks */
>
Re: [PATCH 06/15] cxl/aer/pci: Introduce PCI_ERS_RESULT_PANIC to pci_ers_result type
Posted by Jonathan Cameron 1 month, 1 week ago
On Wed, 16 Oct 2024 12:31:35 -0500
Terry Bowman <Terry.Bowman@amd.com> wrote:

> On 10/16/24 11:30, Jonathan Cameron wrote:
> > On Tue, 8 Oct 2024 17:16:48 -0500
> > Terry Bowman <terry.bowman@amd.com> wrote:
> >   
> >> The CXL AER service will be updated to support CXL PCIe port error
> >> handling in the future. These devices will use a system panic during
> >> recovery handling.  
> > 
> > Recovery handling by panic? :) That's an interesting form of recovery..
> >   
> 
> Yes, Dan requested that all UCEs (fatal and non-fatal) be handled by panic in order
> to limit the blast radius of corruption in the case of a UCE.
That's fair enough.  Maybe it should be called attempted recovery handling ;)

This is fine.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Jonathan

> 
> The recovery logic in cxl_do_recovery() (the path that does not panic) is also tested.
> 
> Regards,
> Terry
> 
> >>
> >> Add PCI_ERS_RESULT_PANIC enumeration to pci_ers_result type.
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> >> ---
> >>  include/linux/pci.h | 3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index 4cf89a4b4cbc..6f7e7371161d 100644
> >> --- a/include/linux/pci.h
> >> +++ b/include/linux/pci.h
> >> @@ -857,6 +857,9 @@ enum pci_ers_result {
> >>  
> >>  	/* No AER capabilities registered for the driver */
> >>  	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
> >> +
> >> +	/* Device state requires system panic */
> >> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
> >>  };
> >>  
> >>  /* PCI bus error event callbacks */  
> >
Re: [PATCH 06/15] cxl/aer/pci: Introduce PCI_ERS_RESULT_PANIC to pci_ers_result type
Posted by Bowman, Terry 1 month, 1 week ago
Hi Jonathan,

On 10/17/2024 8:31 AM, Jonathan Cameron wrote:
> On Wed, 16 Oct 2024 12:31:35 -0500
> Terry Bowman <Terry.Bowman@amd.com> wrote:
> 
>> On 10/16/24 11:30, Jonathan Cameron wrote:
>>> On Tue, 8 Oct 2024 17:16:48 -0500
>>> Terry Bowman <terry.bowman@amd.com> wrote:
>>>    
>>>> The CXL AER service will be updated to support CXL PCIe port error
>>>> handling in the future. These devices will use a system panic during
>>>> recovery handling.
>>>
>>> Recovery handling by panic? :) That's an interesting form of recovery..
>>>    
>>
>> Yes, Dan requested that all UCEs (fatal and non-fatal) be handled by panic in order
>> to limit the blast radius of corruption in the case of a UCE.
> That's fair enough.  Maybe it should be called attempted recovery handling ;)
> 
> This is fine.
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> 
> Jonathan
> 

I'll add "attempted" recovery to the commit message.

Regards,
Terry

>>
>> The recovery logic in cxl_do_recovery() (the path that does not panic) is also tested.
>>
>> Regards,
>> Terry
>>
>>>>
>>>> Add PCI_ERS_RESULT_PANIC enumeration to pci_ers_result type.
>>>>
>>>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>>>> ---
>>>>   include/linux/pci.h | 3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index 4cf89a4b4cbc..6f7e7371161d 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -857,6 +857,9 @@ enum pci_ers_result {
>>>>   
>>>>   	/* No AER capabilities registered for the driver */
>>>>   	PCI_ERS_RESULT_NO_AER_DRIVER = (__force pci_ers_result_t) 6,
>>>> +
>>>> +	/* Device state requires system panic */
>>>> +	PCI_ERS_RESULT_PANIC = (__force pci_ers_result_t) 7,
>>>>   };
>>>>   
>>>>   /* PCI bus error event callbacks */
>>>    
>