[PATCH v2] trace, RAS: Use __print_symbolic helper for entry severity for aer_events

Sargun Dhillon posted 1 patch 8 months ago
Posted by Sargun Dhillon 8 months ago
The chained ternary conditional operator in the perf event format for
ras:aer_event was causing a misrepresentation of the severity of the event
when used with "perf script". Rather than building our own hand-rolled
formatting, just use the __print_symbolic helper to format it.

Specifically, all corrected errors were being formatted as non-fatal,
uncorrected errors, as shown below with a BAD_TLP error, which is
correctable. This is due to a bug in libtraceevent, where chained
ternary conditions are not parsed correctly.
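
For illustration only (not part of the patch): a minimal userspace sketch
of the two formatting styles. The AER_* values are assumed to mirror
include/linux/aer.h, and struct sym_entry, severity_names and the three
functions are hypothetical names that only stand in for the
__print_symbolic() value/name table. The point is that both styles give
the same string for the three defined severities, so only the format
expression that userspace has to parse changes.

```c
#include <stddef.h>
#include <string.h>

/* Severity values, assumed to mirror include/linux/aer.h. */
#define AER_NONFATAL    0
#define AER_FATAL       1
#define AER_CORRECTABLE 2

/* Old style: the chained ternary from the removed TP_printk lines. */
static const char *severity_ternary(int severity)
{
	return severity == AER_CORRECTABLE ? "Corrected" :
		severity == AER_FATAL ?
		"Fatal" : "Uncorrected, non-fatal";
}

/* Hypothetical stand-in for a __print_symbolic() value/name table. */
struct sym_entry {
	int value;
	const char *name;
};

static const struct sym_entry severity_names[] = {
	{ AER_NONFATAL,    "Uncorrected, non-fatal" },
	{ AER_FATAL,       "Fatal" },
	{ AER_CORRECTABLE, "Corrected" },
};

/* New style: linear table lookup; this sketch returns NULL if unmatched. */
static const char *severity_symbolic(int severity)
{
	size_t i;

	for (i = 0; i < sizeof(severity_names) / sizeof(severity_names[0]); i++)
		if (severity_names[i].value == severity)
			return severity_names[i].name;
	return NULL;
}

/* Returns 1 when both styles produce the same text for a severity. */
static int severity_formats_agree(int severity)
{
	const char *sym = severity_symbolic(severity);

	return sym != NULL && strcmp(severity_ternary(severity), sym) == 0;
}
```

The two styles differ only for values outside the table: the ternary
falls through to "Uncorrected, non-fatal", while the lookup in this
sketch returns NULL. How the real helper prints unmatched values is not
modelled here.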

The before / after output is shown below. Uncorrectable events were also
tested to make sure they still show up as uncorrectable.

aer-inject was used with the following AER event injection script:
AER
PCI_ID 00:05.0
COR_STATUS BAD_TLP
HEADER_LOG 0 1 2 3

dmesg (unchanged between runs):
pcieport 0000:00:05.0: aer_inject: Injecting errors 00000040/00000000 into device 0000:00:05.0
pcieport 0000:00:05.0: AER: Correctable error message received from 0000:00:05.0
pcieport 0000:00:05.0: PCIe Bus Error: severity=Correctable, type=Data Link Layer, (Receiver ID)
pcieport 0000:00:05.0:   device [1b36:000c] error status/mask=00000040/0000e000
pcieport 0000:00:05.0:    [ 6] BadTLP

Before:
virtme-ng:/# perf script |cat
   irq/24-aerdrv     424 [002]   392.240255:          ras:aer_event: 0000:00:05.0 PCIe Bus Error: severity=Uncorrected, non-fatal, Bad TLP, TLP Header=Not available

After:
   irq/24-aerdrv     424 [002]    29.198383:          ras:aer_event: 0000:00:05.0 PCIe Bus Error: severity=Corrected, Bad TLP, TLP Header=Not available

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
---
 include/ras/ras_event.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index e5f7ee0864e7..9312007096d7 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -327,9 +327,10 @@ TRACE_EVENT(aer_event,
 
 	TP_printk("%s PCIe Bus Error: severity=%s, %s, TLP Header=%s\n",
 		__get_str(dev_name),
-		__entry->severity == AER_CORRECTABLE ? "Corrected" :
-			__entry->severity == AER_FATAL ?
-			"Fatal" : "Uncorrected, non-fatal",
+		__print_symbolic(__entry->severity,
+				 {AER_NONFATAL, "Uncorrected, non-fatal"},
+				 {AER_FATAL, "Fatal"},
+				 {AER_CORRECTABLE, "Corrected"}),
 		__entry->severity == AER_CORRECTABLE ?
 		__print_flags(__entry->status, "|", aer_correctable_errors) :
 		__print_flags(__entry->status, "|", aer_uncorrectable_errors),
-- 
2.47.1
Re: [PATCH v2] trace, RAS: Use __print_symbolic helper for entry severity for aer_events
Posted by Borislav Petkov 8 months ago
+ Rostedt.

On Mon, Apr 14, 2025 at 08:38:34AM -0700, Sargun Dhillon wrote:
> The chained ternary conditional operator in the perf event format for
> ras:aer_event was causing a misrepresentation of the severity of the event
> when used with "perf script". Rather than building our own hand-rolled
> formatting, just use the __print_symbolic helper to format it.
> 
> Specifically, all corrected errors were being formatted as non-fatal,
> uncorrected errors, as shown below with the BAD_TLP errors, which is
> correctable. This is due to a bug in libtraceevent, where chained
> ternary conditions are not parsed correctly.

So because *some* libtraceevent has a bug, we're wagging the dog, not the
tail?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v2] trace, RAS: Use __print_symbolic helper for entry severity for aer_events
Posted by Steven Rostedt 8 months ago
On Mon, 14 Apr 2025 18:33:47 +0200
Borislav Petkov <bp@alien8.de> wrote:

> On Mon, Apr 14, 2025 at 08:38:34AM -0700, Sargun Dhillon wrote:
> > The chained ternary conditional operator in the perf event format for
> > ras:aer_event was causing a misrepresentation of the severity of the event
> > when used with "perf script". Rather than building our own hand-rolled
> > formatting, just use the __print_symbolic helper to format it.
> > 
> > Specifically, all corrected errors were being formatted as non-fatal,
> > uncorrected errors, as shown below with the BAD_TLP errors, which is
> > correctable. This is due to a bug in libtraceevent, where chained
> > ternary conditions are not parsed correctly.  
> 
> So because *some* libtraceevent has a bug, we're wagging the dog, not the
> tail?

Agreed.

If something isn't parsed correctly in libtraceevent, please let me know!

Can you apply this to libtraceevent and see if it fixes your issue:

diff --git a/src/event-parse.c b/src/event-parse.c
index 6317ff6..4a09fcc 100644
--- a/src/event-parse.c
+++ b/src/event-parse.c
@@ -2083,6 +2083,16 @@ process_cond(struct tep_event *event, struct tep_print_arg *top, char **tok)
 
 	type = process_arg(event, right, &token);
 
+ againagain:
+	if (type == TEP_EVENT_ERROR)
+		goto out_free;
+
+	/* Handle other operations in the results */
+	if (type == TEP_EVENT_OP) {
+		type = process_op(event, right, &token);
+		goto againagain;
+	}
+
 	top->op.right = arg;
 
 	*tok = token;

I'm getting ready to post a new version of libtraceevent, and if this fixes
the parsing then I'll include this too.

-- Steve
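
Illustrative note, not from the thread: a small C sketch of how a
mis-grouped chained ternary could produce the "Before" output. It assumes,
without confirmation from the thread, that the parser stopped after the
first operand of the ":" branch and then applied the remaining "== 1" to
the whole conditional, and that a string compared against a number is
false. struct val, num(), str() and the format_* functions are
hypothetical names used only for this model; they are not libtraceevent
code.

```c
#include <stddef.h>
#include <string.h>

/* Severity values, assumed to mirror include/linux/aer.h. */
enum { AER_NONFATAL = 0, AER_FATAL = 1, AER_CORRECTABLE = 2 };

/* A value that is either a string (str != NULL) or a number. */
struct val {
	const char *str;
	long num;
};

static struct val num(long n)
{
	struct val v = { NULL, n };
	return v;
}

static struct val str(const char *s)
{
	struct val v = { s, 0 };
	return v;
}

/* Intended grouping: a ? x : (b ? y : z). */
static const char *format_right_assoc(long sev)
{
	return sev == AER_CORRECTABLE ? "Corrected" :
		(sev == AER_FATAL ? "Fatal" : "Uncorrected, non-fatal");
}

/* Assumed mis-grouping: ((a ? x : sev) == 1) ? y : z. */
static const char *format_misgrouped(long sev)
{
	struct val first = (sev == AER_CORRECTABLE) ? str("Corrected") : num(sev);
	/* Assumption: a string never compares equal to the number 1. */
	int is_fatal = (first.str == NULL && first.num == AER_FATAL);

	return is_fatal ? "Fatal" : "Uncorrected, non-fatal";
}

/* Returns 1 when the formatted text matches the expected text. */
static int shows(const char *got, const char *want)
{
	return strcmp(got, want) == 0;
}
```

Under these assumptions, format_misgrouped(AER_CORRECTABLE) yields
"Uncorrected, non-fatal" while fatal and non-fatal severities still format
as expected, which is consistent with the symptom reported in the patch
description.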