[PATCH v2 01/14] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver'

Terry Bowman posted 14 patches 1 month ago
There is a newer version of this series
[PATCH v2 01/14] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver'
Posted by Terry Bowman 1 month ago
CXL.io provides PCIe like protocol error implementation, but CXL.io and
PCIe have different handling requirements.

The PCIe AER service driver may attempt recovering PCIe devices with
uncorrectable errors while recovery is not used for CXL.io. Recovery is not
used in the CXL.io recovery because of the potential for corruption on
what can be system memory.

Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
Create handlers for correctable and uncorrectable CXL.io error
handling.

The CXL error handlers will be used in future patches adding CXL PCIe
port protocol error handling.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 include/linux/pci.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 573b4c4c2be6..106ac83e3a7b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -886,6 +886,14 @@ struct pci_error_handlers {
 	void (*cor_error_detected)(struct pci_dev *dev);
 };
 
+/* CXL bus error event callbacks */
+struct cxl_error_handlers {
+	/* CXL bus error detected on this device */
+	bool (*error_detected)(struct pci_dev *dev);
+
+	/* Allow device driver to record more details of a correctable error */
+	void (*cor_error_detected)(struct pci_dev *dev);
+};
 
 struct module;
 
@@ -956,6 +964,7 @@ struct pci_driver {
 	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
 	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
 	const struct pci_error_handlers *err_handler;
+	const struct cxl_error_handlers *cxl_err_handler;
 	const struct attribute_group **groups;
 	const struct attribute_group **dev_groups;
 	struct device_driver	driver;
-- 
2.34.1
Re: [PATCH v2 01/14] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver'
Posted by Fan Ni 3 weeks, 3 days ago
On Fri, Oct 25, 2024 at 04:02:52PM -0500, Terry Bowman wrote:
> CXL.io provides PCIe like protocol error implementation, but CXL.io and
> PCIe have different handling requirements.
> 
> The PCIe AER service driver may attempt recovering PCIe devices with
> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
> used in the CXL.io recovery because of the potential for corruption on
> what can be system memory.
> 
> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
> Create handlers for correctable and uncorrectable CXL.io error
> handling.
> 
> The CXL error handlers will be used in future patches adding CXL PCIe
> port protocol error handling.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---

Reviewed-by: Fan Ni <fan.ni@samsung.com>

>  include/linux/pci.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 573b4c4c2be6..106ac83e3a7b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -886,6 +886,14 @@ struct pci_error_handlers {
>  	void (*cor_error_detected)(struct pci_dev *dev);
>  };
>  
> +/* CXL bus error event callbacks */
> +struct cxl_error_handlers {
> +	/* CXL bus error detected on this device */
> +	bool (*error_detected)(struct pci_dev *dev);
> +
> +	/* Allow device driver to record more details of a correctable error */
> +	void (*cor_error_detected)(struct pci_dev *dev);
> +};
>  
>  struct module;
>  
> @@ -956,6 +964,7 @@ struct pci_driver {
>  	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
>  	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
>  	const struct pci_error_handlers *err_handler;
> +	const struct cxl_error_handlers *cxl_err_handler;
>  	const struct attribute_group **groups;
>  	const struct attribute_group **dev_groups;
>  	struct device_driver	driver;
> -- 
> 2.34.1
> 

-- 
Fan Ni
Re: [PATCH v2 01/14] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver'
Posted by Dave Jiang 3 weeks, 3 days ago

On 10/25/24 2:02 PM, Terry Bowman wrote:
> CXL.io provides PCIe like protocol error implementation, but CXL.io and
> PCIe have different handling requirements.
> 
> The PCIe AER service driver may attempt recovering PCIe devices with
> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
> used in the CXL.io recovery because of the potential for corruption on
> what can be system memory.
> 
> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
> Create handlers for correctable and uncorrectable CXL.io error
> handling.
> 
> The CXL error handlers will be used in future patches adding CXL PCIe
> port protocol error handling.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
>  include/linux/pci.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 573b4c4c2be6..106ac83e3a7b 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -886,6 +886,14 @@ struct pci_error_handlers {
>  	void (*cor_error_detected)(struct pci_dev *dev);
>  };
>  
> +/* CXL bus error event callbacks */
> +struct cxl_error_handlers {
> +	/* CXL bus error detected on this device */
> +	bool (*error_detected)(struct pci_dev *dev);
> +
> +	/* Allow device driver to record more details of a correctable error */
> +	void (*cor_error_detected)(struct pci_dev *dev);
> +};
>  
>  struct module;
>  
> @@ -956,6 +964,7 @@ struct pci_driver {
>  	int  (*sriov_set_msix_vec_count)(struct pci_dev *vf, int msix_vec_count); /* On PF */
>  	u32  (*sriov_get_vf_total_msix)(struct pci_dev *pf);
>  	const struct pci_error_handlers *err_handler;
> +	const struct cxl_error_handlers *cxl_err_handler;
>  	const struct attribute_group **groups;
>  	const struct attribute_group **dev_groups;
>  	struct device_driver	driver;
Re: [PATCH v2 01/14] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver'
Posted by Jonathan Cameron 3 weeks, 4 days ago
On Fri, 25 Oct 2024 16:02:52 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> CXL.io provides PCIe like protocol error implementation, but CXL.io and
> PCIe have different handling requirements.
> 
> The PCIe AER service driver may attempt recovering PCIe devices with
> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
> used in the CXL.io recovery because of the potential for corruption on
> what can be system memory.
> 
> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
> Create handlers for correctable and uncorrectable CXL.io error
> handling.
> 
> The CXL error handlers will be used in future patches adding CXL PCIe
> port protocol error handling.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Re: [PATCH v2 01/14] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver'
Posted by Bowman, Terry 3 weeks, 4 days ago
Hi Jonathan,

Thank you for reviewing.

Regards,
Terry

On 10/30/2024 10:14 AM, Jonathan Cameron wrote:
> On Fri, 25 Oct 2024 16:02:52 -0500
> Terry Bowman <terry.bowman@amd.com> wrote:
>
>> CXL.io provides PCIe like protocol error implementation, but CXL.io and
>> PCIe have different handling requirements.
>>
>> The PCIe AER service driver may attempt recovering PCIe devices with
>> uncorrectable errors while recovery is not used for CXL.io. Recovery is not
>> used in the CXL.io recovery because of the potential for corruption on
>> what can be system memory.
>>
>> Create pci_driver::cxl_err_handlers similar to pci_driver::error_handler.
>> Create handlers for correctable and uncorrectable CXL.io error
>> handling.
>>
>> The CXL error handlers will be used in future patches adding CXL PCIe
>> port protocol error handling.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>