[PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

Bjorn Helgaas posted 3 patches 2 years ago
[PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
Posted by Bjorn Helgaas 2 years ago
From: Bjorn Helgaas <bhelgaas@google.com>

The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
Previously we printed these as "Corrected" or "Uncorrected".  To avoid
confusion, use the same terms as the spec.

One confusing situation is when one agent detects an error, but another
agent is responsible for recovery, e.g., by re-attempting the operation.
The first agent may log a "correctable" error but it has not yet been
corrected.  The recovery agent must report an uncorrectable error if it is
unable to recover.  If we print the first agent's error as "Corrected", it
gives the false impression that it has already been resolved.

Sample message change:

  - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
  + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 42a3bd35a3e1..20db80018b5d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
  * AER error strings
  */
 static const char *aer_error_severity_string[] = {
-	"Uncorrected (Non-Fatal)",
-	"Uncorrected (Fatal)",
-	"Corrected"
+	"Uncorrectable (Non-Fatal)",
+	"Uncorrectable (Fatal)",
+	"Correctable"
 };
 
 static const char *aer_error_layer[] = {
-- 
2.34.1
Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
Posted by Kuppuswamy Sathyanarayanan 1 year, 11 months ago

On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>  drivers/pci/pcie/aer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 42a3bd35a3e1..20db80018b5d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
>   * AER error strings
>   */
>  static const char *aer_error_severity_string[] = {
> -	"Uncorrected (Non-Fatal)",
> -	"Uncorrected (Fatal)",
> -	"Corrected"
> +	"Uncorrectable (Non-Fatal)",
> +	"Uncorrectable (Fatal)",
> +	"Correctable"
>  };
>  
>  static const char *aer_error_layer[] = {

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
Posted by Terry Bowman 2 years ago
Hi Bjorn,

Will help prevent confusion. LGTM. 

On 12/6/23 16:42, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/pcie/aer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 42a3bd35a3e1..20db80018b5d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
>   * AER error strings
>   */
>  static const char *aer_error_severity_string[] = {
> -	"Uncorrected (Non-Fatal)",
> -	"Uncorrected (Fatal)",
> -	"Corrected"
> +	"Uncorrectable (Non-Fatal)",
> +	"Uncorrectable (Fatal)",
> +	"Correctable"
>  };
>  
>  static const char *aer_error_layer[] = {
Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
Posted by Bjorn Helgaas 2 years ago
On Tue, Dec 12, 2023 at 09:00:24AM -0600, Terry Bowman wrote:
> Hi Bjorn,
> 
> Will help prevent confusion. LGTM. 

Thanks a lot for taking a look at these!  I'd like to give you credit
in the log, e.g., "Reviewed-by: Terry Bowman <Terry.Bowman@amd.com>",
but I'm OCD enough that I don't want to translate "LGTM" into that all
by myself.

If you want that credit (and, I guess, the privilege of being cc'd
when we find that these patches break something :)), just reply again
with that actual "Reviewed-by:" text in it.

Bjorn
Re: [PATCH 1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
Posted by Jonathan Cameron 2 years ago
On Wed,  6 Dec 2023 16:42:29 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com
Good to tidy this up. FWIW
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>