Hotplug events are critical indicators for analyzing hardware health,
particularly in AI supercomputers where surprise link downs can
significantly impact system performance and reliability. Failure
characterization analysis illustrates the significance of failures
caused by InfiniBand link errors. Meta observes that 2% of InfiniBand
failures in a machine learning cluster and 6% in a vision application
cluster co-occur with GPU failures, such as falling off the bus, which
may indicate a correlation with PCIe. [1]

Generate a RAS tracepoint for hotplug events to help with health checks.
The output looks like this:
$ echo 1 > /sys/kernel/debug/tracing/events/ras/pciehp_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
<...>-213 [001] ..... 43.762740: pciehp_event: 0000:00:02.0 slot:10, state:5, events:65792
[1] https://arxiv.org/abs/2410.21680
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
drivers/pci/hotplug/pciehp_ctrl.c | 5 +++++
include/ras/ras_event.h | 29 +++++++++++++++++++++++++++++
2 files changed, 34 insertions(+)
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index dcdbfcf404dd..ec9285e3b9b5 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -19,6 +19,7 @@
 #include <linux/types.h>
 #include <linux/pm_runtime.h>
 #include <linux/pci.h>
+#include <ras/ras_event.h>
 #include "pciehp.h"
 
 /* The following routines constitute the bulk of the
@@ -245,6 +246,8 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 		if (events & PCI_EXP_SLTSTA_PDC)
 			ctrl_info(ctrl, "Slot(%s): Card not present\n",
 				  slot_name(ctrl));
+		trace_pciehp_event(dev_name(&ctrl->pcie->port->dev),
+				   slot_name(ctrl), ON_STATE, events);
 		pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
 		break;
 	default:
@@ -282,6 +285,8 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 		if (link_active)
 			ctrl_info(ctrl, "Slot(%s): Link Up\n",
 				  slot_name(ctrl));
+		trace_pciehp_event(dev_name(&ctrl->pcie->port->dev),
+				   slot_name(ctrl), OFF_STATE, events);
 		ctrl->request_result = pciehp_enable_slot(ctrl);
 		break;
 	default:
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index e5f7ee0864e7..5013d6ff920e 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -338,6 +338,35 @@ TRACE_EVENT(aer_event,
 			"Not available")
 );
 
+TRACE_EVENT(pciehp_event,
+	TP_PROTO(const char *port_name,
+		 const char *slot,
+		 const u8 state,
+		 const u32 events),
+
+	TP_ARGS(port_name, slot, state, events),
+
+	TP_STRUCT__entry(
+		__string(	port_name,	port_name	)
+		__string(	slot,		slot		)
+		__field(	u8,		state		)
+		__field(	u32,		events		)
+	),
+
+	TP_fast_assign(
+		__assign_str(port_name);
+		__assign_str(slot);
+		__entry->state = state;
+		__entry->events = events;
+	),
+
+	TP_printk("%s slot:%s, state:%d, events:%d\n",
+		__get_str(port_name),
+		__get_str(slot),
+		__entry->state,
+		__entry->events)
+);
+
 /*
  * memory-failure recovery action result event
  *
--
2.39.3
On Fri, Nov 08, 2024 at 11:09:39AM +0800, Shuai Xue wrote:
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -19,6 +19,7 @@
>  #include <linux/types.h>
>  #include <linux/pm_runtime.h>
>  #include <linux/pci.h>
> +#include <ras/ras_event.h>
>  #include "pciehp.h"

Hm, why does the TRACE_EVENT() definition have to live in ras_event.h?
Why not, say, in pciehp.h?

> @@ -245,6 +246,8 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>  		if (events & PCI_EXP_SLTSTA_PDC)
>  			ctrl_info(ctrl, "Slot(%s): Card not present\n",
>  				  slot_name(ctrl));
> +		trace_pciehp_event(dev_name(&ctrl->pcie->port->dev),
> +				   slot_name(ctrl), ON_STATE, events);
>  		pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
>  		break;
>  	default:

I'd suggest using pci_name() instead of dev_name() as it's a little shorter.

Passing ON_STATE here isn't always accurate because there's
"case BLINKINGOFF_STATE" with a fallthrough preceding the
above code block.

Wouldn't it be more readable to just log the event that occurred
as a string, e.g. "Surprise Removal" (and "Insertion" or "Hot Add"
for the other trace event you're introducing) instead of the state?

Otherwise you see "ON_STATE" in the log but that's actually the
*old* value so you have to mentally convert this to "previously ON,
so now must be transitioning to OFF".

I'm fine with adding trace points to pciehp, I just want to make sure
we do it in a way that's easy to parse for admins.

Thanks,

Lukas
On 2024/11/10 01:52, Lukas Wunner wrote:
> On Fri, Nov 08, 2024 at 11:09:39AM +0800, Shuai Xue wrote:
>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -19,6 +19,7 @@
>>  #include <linux/types.h>
>>  #include <linux/pm_runtime.h>
>>  #include <linux/pci.h>
>> +#include <ras/ras_event.h>
>>  #include "pciehp.h"
>
> Hm, why does the TRACE_EVENT() definition have to live in ras_event.h?
> Why not, say, in pciehp.h?

IMHO, it is a type of RAS related event, so I add it in ras_event.h,
similar to other events like aer_event and memory_failure_event.

I could move it to pciehp.h, if the maintainers prefer that location.

>
>> @@ -245,6 +246,8 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>>  		if (events & PCI_EXP_SLTSTA_PDC)
>>  			ctrl_info(ctrl, "Slot(%s): Card not present\n",
>>  				  slot_name(ctrl));
>> +		trace_pciehp_event(dev_name(&ctrl->pcie->port->dev),
>> +				   slot_name(ctrl), ON_STATE, events);
>>  		pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
>>  		break;
>>  	default:
>
> I'd suggest using pci_name() instead of dev_name() as it's a little shorter.

Will use pci_name() instead.

>
> Passing ON_STATE here isn't always accurate because there's
> "case BLINKINGOFF_STATE" with a fallthrough preceding the
> above code block.

Yes, you are right, I missed the above fallthrough case.

>
> Wouldn't it be more readable to just log the event that occurred
> as a string, e.g. "Surprise Removal" (and "Insertion" or "Hot Add"
> for the other trace event you're introducing) instead of the state?
>
> Otherwise you see "ON_STATE" in the log but that's actually the
> *old* value so you have to mentally convert this to "previously ON,
> so now must be transitioning to OFF".

I see your point. "Surprise Removal" or "Insertion" is indeed the exact
state transition. However, I am concerned that using a string might make
it difficult for user space tools like rasdaemon to parse.

How about adding a new enum for state transition? For example:

	enum pciehp_trans_type {
		PCIEHP_SAFE_REMOVAL,
		PCIEHP_SURPRISE_REMOVAL,
		PCIEHP_Hot_Add,
		...
	}

And define the state transition as an int type for tracepoint, then
rasdaemon can parse the value easily.

	trace_pciehp_event(pci_name(&ctrl->pcie->port->dev),
			   slot_name(ctrl), PCIEHP_SAFE_REMOVAL, events);

And TP_printk with symbolic name of the state transition.

	TRACE_EVENT(pciehp_event,
		TP_PROTO(const char *port_name,
			 const char *slot,
			 const int trans_state),

		TP_ARGS(port_name, slot, trans_state),

		TP_STRUCT__entry(
			__string(	port_name,	port_name	)
			__string(	slot,		slot		)
			__field(	int,		trans_state	)
		),

		TP_fast_assign(
			__assign_str(port_name, port_name);
			__assign_str(slot, slot);
			__entry->trans_state = trans_state;
		),

		TP_printk("%s slot:%s, state:%d, events:%d\n",
			__get_str(port_name),
			__get_str(slot),
			__print_symbolic(__entry->trans_state, PCIEHP_SURPRISE_REMOVAL),
	);

>
> I'm fine with adding trace points to pciehp, I just want to make sure
> we do it in a way that's easy to parse for admins.

Thank you for the positive feedback :)

>
> Thanks,
>
> Lukas

Best Regards,
Shuai
On Sun, Nov 10, 2024 at 06:12:09PM +0800, Shuai Xue wrote:
> 2024/11/10 01:52, Lukas Wunner:
> > On Fri, Nov 08, 2024 at 11:09:39AM +0800, Shuai Xue wrote:
> > > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > > @@ -19,6 +19,7 @@
> > >  #include <linux/types.h>
> > >  #include <linux/pm_runtime.h>
> > >  #include <linux/pci.h>
> > > +#include <ras/ras_event.h>
> > >  #include "pciehp.h"
> >
> > Hm, why does the TRACE_EVENT() definition have to live in ras_event.h?
> > Why not, say, in pciehp.h?
>
> IMHO, it is a type of RAS related event, so I add it in ras_event.h,
> similar to other events like aer_event and memory_failure_event.
>
> I could move it to pciehp.h, if the maintainers prefer that location.

IMO pciehp.h makes more sense than ras/ras_event.h.

The addition of AER to ras/ras_event.h was over a decade ago with
commit 0a2409aad38e ("trace, AER: Move trace into unified interface").
That commit wasn't acked by Bjorn. It wasn't even cc'ed to linux-pci:

https://lore.kernel.org/all/1402475691-30045-3-git-send-email-gong.chen@linux.intel.com/

I can see a connection between AER and RAS, but PCI hotplug tracepoints
are not exclusively RAS, they might be useful for other purposes as well.
Note that pciehp is not just used on servers but also e.g. for Thunderbolt
on mobile devices and the tracepoints might come in handy to debug that.

> > Wouldn't it be more readable to just log the event that occurred
> > as a string, e.g. "Surprise Removal" (and "Insertion" or "Hot Add"
> > for the other trace event you're introducing) instead of the state?
> >
> > Otherwise you see "ON_STATE" in the log but that's actually the
> > *old* value so you have to mentally convert this to "previously ON,
> > so now must be transitioning to OFF".
>
> I see your point. "Surprise Removal" or "Insertion" is indeed the exact
> state transition. However, I am concerned that using a string might make
> it difficult for user space tools like rasdaemon to parse.

If this is parsed by a user space daemon, put the enum in a uapi header,
e.g. include/uapi/linux/pci.h.

> How about adding a new enum for state transition? For example:
>
> enum pciehp_trans_type {
> 	PCIEHP_SAFE_REMOVAL,
> 	PCIEHP_SURPRISE_REMOVAL,
> 	PCIEHP_Hot_Add,
> 	...
> }

In that case, I'd suggest adding an entry to the enum for all the
ctrl_info() messages, i.e.

	Link Up
	Link Down
	Card present
	Card not present

Amend pciehp_handle_presence_or_link_change() with curly braces
around all the affected if-blocks and put a trace event next to the
ctrl_info() message.

Also, since these events are not pciehp-specific, I'd call the enum
something like pci_hotplug_event and the entries PCI_HOTPLUG_...
(or PCI_HP_... if you prefer short names). These trace events could
in principle be raised by any of the other hotplug drivers in
drivers/pci/hotplug/, not just pciehp.

Thanks,

Lukas
On 2024/11/11 00:44, Lukas Wunner wrote:
> On Sun, Nov 10, 2024 at 06:12:09PM +0800, Shuai Xue wrote:
>> 2024/11/10 01:52, Lukas Wunner:
>>> On Fri, Nov 08, 2024 at 11:09:39AM +0800, Shuai Xue wrote:
>>>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>>>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>>>> @@ -19,6 +19,7 @@
>>>>  #include <linux/types.h>
>>>>  #include <linux/pm_runtime.h>
>>>>  #include <linux/pci.h>
>>>> +#include <ras/ras_event.h>
>>>>  #include "pciehp.h"
>>>
>>> Hm, why does the TRACE_EVENT() definition have to live in ras_event.h?
>>> Why not, say, in pciehp.h?
>>
>> IMHO, it is a type of RAS related event, so I add it in ras_event.h,
>> similar to other events like aer_event and memory_failure_event.
>>
>> I could move it to pciehp.h, if the maintainers prefer that location.
>
> IMO pciehp.h makes more sense than ras/ras_event.h.
>
> The addition of AER to ras/ras_event.h was over a decade ago with
> commit 0a2409aad38e ("trace, AER: Move trace into unified interface").
> That commit wasn't acked by Bjorn. It wasn't even cc'ed to linux-pci:
>
> https://lore.kernel.org/all/1402475691-30045-3-git-send-email-gong.chen@linux.intel.com/
>
> I can see a connection between AER and RAS, but PCI hotplug tracepoints
> are not exclusively RAS, they might be useful for other purposes as well.
> Note that pciehp is not just used on servers but also e.g. for Thunderbolt
> on mobile devices and the tracepoints might come in handy to debug that.

Got it, will move it to pciehp.h

>
>>> Wouldn't it be more readable to just log the event that occurred
>>> as a string, e.g. "Surprise Removal" (and "Insertion" or "Hot Add"
>>> for the other trace event you're introducing) instead of the state?
>>>
>>> Otherwise you see "ON_STATE" in the log but that's actually the
>>> *old* value so you have to mentally convert this to "previously ON,
>>> so now must be transitioning to OFF".
>>
>> I see your point. "Surprise Removal" or "Insertion" is indeed the exact
>> state transition. However, I am concerned that using a string might make
>> it difficult for user space tools like rasdaemon to parse.
>
> If this is parsed by a user space daemon, put the enum in a uapi header,
> e.g. include/uapi/linux/pci.h.

Will do it.

>
>> How about adding a new enum for state transition? For example:
>>
>> enum pciehp_trans_type {
>> 	PCIEHP_SAFE_REMOVAL,
>> 	PCIEHP_SURPRISE_REMOVAL,
>> 	PCIEHP_Hot_Add,
>> 	...
>> }
>
> In that case, I'd suggest adding an entry to the enum for all the
> ctrl_info() messages, i.e.
>
> 	Link Up
> 	Link Down
> 	Card present
> 	Card not present
>
> Amend pciehp_handle_presence_or_link_change() with curly braces
> around all the affected if-blocks and put a trace event next to the
> ctrl_info() message.

Will do it.

>
> Also, since these events are not pciehp-specific, I'd call the enum
> something like pci_hotplug_event and the entries PCI_HOTPLUG_...
> (or PCI_HP_... if you prefer short names). These trace events could
> in principle be raised by any of the other hotplug drivers in
> drivers/pci/hotplug/, not just pciehp.

I see. Will rename with PCI_HP_ prefix.

>
> Thanks,
>
> Lukas

Thank you for valuable comments.

Best Regards,
Shuai