Validate that there are no duplicate hwirqs in the irq debug
file system /sys/kernel/debug/irq/irqs/* per chip name.
One example log shows two duplicated hwirqs in the irq debug
file system.
$ sudo cat /sys/kernel/debug/irq/irqs/163
handler: handle_fasteoi_irq
device: 0019:00:00.0
<SNIP>
node: 1
affinity: 72-143
effectiv: 76
domain: irqchip@0x0000100022040000-3
hwirq: 0xc8000000
chip: ITS-MSI
flags: 0x20
$ sudo cat /sys/kernel/debug/irq/irqs/174
handler: handle_fasteoi_irq
device: 0039:00:00.0
<SNIP>
node: 3
affinity: 216-287
effectiv: 221
domain: irqchip@0x0000300022040000-3
hwirq: 0xc8000000
chip: ITS-MSI
flags: 0x20
The irq-check.sh can help to collect the hwirq and chip name from
/sys/kernel/debug/irq/irqs/* and print an error log when it finds a
duplicate hwirq per chip name.
Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fixes the above issue.
[1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
Signed-off-by: Joseph Jang <jjang@nvidia.com>
Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
---
tools/testing/selftests/drivers/irq/Makefile | 5 +++
tools/testing/selftests/drivers/irq/config | 2 +
.../selftests/drivers/irq/irq-check.sh | 39 +++++++++++++++++++
3 files changed, 46 insertions(+)
create mode 100644 tools/testing/selftests/drivers/irq/Makefile
create mode 100644 tools/testing/selftests/drivers/irq/config
create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh
diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
new file mode 100644
index 000000000000..d6998017c861
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+TEST_PROGS := irq-check.sh
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
new file mode 100644
index 000000000000..a53d3b713728
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/config
@@ -0,0 +1,2 @@
+CONFIG_GENERIC_IRQ_DEBUGFS=y
+CONFIG_GENERIC_IRQ_INJECTION=y
diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
new file mode 100755
index 000000000000..e784777043a1
--- /dev/null
+++ b/tools/testing/selftests/drivers/irq/irq-check.sh
@@ -0,0 +1,39 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This script needs root permission
+uid=$(id -u)
+if [ "$uid" -ne 0 ]; then
+	echo "SKIP: Must be run as root"
+	exit 4
+fi
+
+# Ensure debugfs is mounted
+mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
+if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
+	echo "SKIP: irq debugfs not found"
+	exit 4
+fi
+
+# Traverse the irq debug file system directory to collect chip_name and hwirq
+hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
+	# Read the first chip name and hwirq from the irq_file
+	chip_name=$(grep -m 1 'chip:' "$irq_file" | awk '{print $2}')
+	hwirq=$(grep -m 1 'hwirq:' "$irq_file" | awk '{print $2}')
+
+	if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
+		continue
+	fi
+
+	echo "$chip_name $hwirq"
+done)
+
+dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
+
+if [ -n "$dup_hwirq_list" ]; then
+	echo "ERROR: Found duplicate hwirq"
+	echo "$dup_hwirq_list"
+	exit 1
+fi
+
+exit 0
--
2.34.1
On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
> Validate there are no duplicate hwirq from the irq debug
> file system /sys/kernel/debug/irq/irqs/* per chip name.
>
> One example log show 2 duplicated hwirq in the irq debug
> file system.
>
> $ sudo cat /sys/kernel/debug/irq/irqs/163
> handler: handle_fasteoi_irq
> device: 0019:00:00.0
> <SNIP>
> node: 1
> affinity: 72-143
> effectiv: 76
> domain: irqchip@0x0000100022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> $ sudo cat /sys/kernel/debug/irq/irqs/174
> handler: handle_fasteoi_irq
> device: 0039:00:00.0
> <SNIP>
> node: 3
> affinity: 216-287
> effectiv: 221
> domain: irqchip@0x0000300022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> The irq-check.sh can help to collect hwirq and chip name from
> /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
> hwirq per chip name.
>
> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
I don't know enough about this issue to understand the details. It
seems like you look for duplicate hwirqs in chips with the same name,
e.g., "ITS-MSI" in this case? That name seems too generic to me
(might there be several instances of "ITS-MSI" in a system?)
Also, the name may come from chip->irq_print_chip(), so it apparently
relies on irqchip drivers to make the names unique if there are
multiple instances?
I would have expected looking for duplicates inside something more
specific, like "irqchip@0x0000300022040000-3". But again, I don't
know enough about the problem to speak confidently here.
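For illustration, a check keyed on the domain instead of the chip name might look like the sketch below. It is not part of the posted patch: find_dups_by_domain and its directory argument are hypothetical, and the demo files are made-up records that only borrow the domain string quoted above.

```shell
#!/bin/bash
# Sketch: report hwirq values that repeat within the same irq domain.
# find_dups_by_domain is a hypothetical helper, not part of the posted patch.
find_dups_by_domain() {
	local dir=$1 f
	for f in "$dir"/*; do
		[ -f "$f" ] || continue
		# Keep the first "domain:" and "hwirq:" values found in each file.
		awk '$1 == "domain:" && d == "" { d = $2 }
		     $1 == "hwirq:"  && h == "" { h = $2 }
		     END { if (d != "" && h != "") print d, h }' "$f"
	done | sort | uniq -d
}

# Demo on a scratch directory that mimics the debugfs layout (made-up data).
demo=$(mktemp -d)
printf 'domain: irqchip@0x0000300022040000-3\nhwirq: 0xc8000000\n' > "$demo/1"
printf 'domain: irqchip@0x0000300022040000-3\nhwirq: 0xc8000000\n' > "$demo/2"
printf 'domain: irqchip@0x0000100022040000-3\nhwirq: 0xc8000000\n' > "$demo/3"
find_dups_by_domain "$demo"   # prints: irqchip@0x0000300022040000-3 0xc8000000
rm -rf "$demo"
```

On a real system the argument would presumably be /sys/kernel/debug/irq/irqs; only pairs that repeat inside one domain are printed.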
Cosmetic nits:
- Tweak subject to match history (use "git log --oneline
tools/testing/selftests/drivers/" to see it), e.g.,
selftests: irq: Add check for duplicate hwirq
- Rewrap commit log to fill 75 columns. No point in using shorter
lines.
- Indent the "$ sudo cat ..." block by a couple spaces since it's
effectively a quotation, not part of the main text body.
- Possibly include sample output of irq-check.sh (also indented as a
quote) when run on the system where you manually found the
duplicate via "sudo cat /sys/kernel/debug/irq/irqs/..."
- Reword "The irq-check.sh can help ..." to something like this:
Add an irq-check.sh test to report errors when there are
duplicate hwirqs per chip name.
- Since the kernel patch has already been merged, cite it like this
instead of using the https://lore URL:
db744ddd59be ("PCI/MSI: Prevent MSI hardware interrupt number truncation")
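To get a feel for what such sample output would contain without guessing at a real log, the final stage of the posted script can be fed the two (chip, hwirq) pairs from the commit log. This is only a sketch: report_dups is a hypothetical wrapper around the script's "sort | uniq -cd" stage, and the column padding of the count depends on the uniq implementation.

```shell
#!/bin/bash
# Feed the posted pipeline the two (chip, hwirq) pairs shown in the commit log.
# report_dups is a hypothetical wrapper around the script's final stage.
report_dups() {
	local dups
	# uniq -cd keeps only repeated lines and prefixes each with its count.
	dups=$(sort | uniq -cd)
	if [ -n "$dups" ]; then
		echo "ERROR: Found duplicate hwirq"
		echo "$dups"
		return 1
	fi
	return 0
}

printf '%s\n' "ITS-MSI 0xc8000000" "ITS-MSI 0xc8000000" | report_dups ||
	echo "duplicates reported"
```

The second output line is a count of 2 followed by the "ITS-MSI 0xc8000000" pair.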
> Signed-off-by: Joseph Jang <jjang@nvidia.com>
> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
> ---
> tools/testing/selftests/drivers/irq/Makefile | 5 +++
> tools/testing/selftests/drivers/irq/config | 2 +
> .../selftests/drivers/irq/irq-check.sh | 39 +++++++++++++++++++
> 3 files changed, 46 insertions(+)
> create mode 100644 tools/testing/selftests/drivers/irq/Makefile
> create mode 100644 tools/testing/selftests/drivers/irq/config
> create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh
>
> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
> new file mode 100644
> index 000000000000..d6998017c861
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +TEST_PROGS := irq-check.sh
> +
> +include ../../lib.mk
> diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
> new file mode 100644
> index 000000000000..a53d3b713728
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/config
> @@ -0,0 +1,2 @@
> +CONFIG_GENERIC_IRQ_DEBUGFS=y
> +CONFIG_GENERIC_IRQ_INJECTION=y
> diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
> new file mode 100755
> index 000000000000..e784777043a1
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/irq-check.sh
> @@ -0,0 +1,39 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +# This script need root permission
> +uid=$(id -u)
> +if [ $uid -ne 0 ]; then
> + echo "SKIP: Must be run as root"
> + exit 4
> +fi
> +
> +# Ensure debugfs is mounted
> +mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
> +if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
> + echo "SKIP: irq debugfs not found"
> + exit 4
> +fi
> +
> +# Traverse the irq debug file system directory to collect chip_name and hwirq
> +hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
> + # Read chip name and hwirq from the irq_file
> + chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
> + hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
> +
> + if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
> + continue
> + fi
> +
> + echo "$chip_name $hwirq"
> +done)
> +
> +dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
> +
> +if [ -n "$dup_hwirq_list" ]; then
> + echo "ERROR: Found duplicate hwirq"
> + echo "$dup_hwirq_list"
> + exit 1
> +fi
> +
> +exit 0
> --
> 2.34.1
>
On 2024/10/19 3:34 AM, Bjorn Helgaas wrote:
> On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
>> Validate there are no duplicate hwirq from the irq debug
>> file system /sys/kernel/debug/irq/irqs/* per chip name.
>>
>> One example log show 2 duplicated hwirq in the irq debug
>> file system.
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/163
>> handler: handle_fasteoi_irq
>> device: 0019:00:00.0
>> <SNIP>
>> node: 1
>> affinity: 72-143
>> effectiv: 76
>> domain: irqchip@0x0000100022040000-3
>> hwirq: 0xc8000000
>> chip: ITS-MSI
>> flags: 0x20
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/174
>> handler: handle_fasteoi_irq
>> device: 0039:00:00.0
>> <SNIP>
>> node: 3
>> affinity: 216-287
>> effectiv: 221
>> domain: irqchip@0x0000300022040000-3
>> hwirq: 0xc8000000
>> chip: ITS-MSI
>> flags: 0x20
>>
>> The irq-check.sh can help to collect hwirq and chip name from
>> /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
>> hwirq per chip name.
>>
>> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
>> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
>
> I don't know enough about this issue to understand the details. It
> seems like you look for duplicate hwirqs in chips with the same name,
> e.g., "ITS-MSI" in this case? That name seems too generic to me
> (might there be several instances of "ITS-MSI" in a system?)
>
As far as I know, each PCIe device typically has only one ITS-MSI
controller. Having multiple ITS-MSI instances for the same device would
lead to confusion and potential conflicts in interrupt routing.
> Also, the name may come from chip->irq_print_chip(), so it apparently
> relies on irqchip drivers to make the names unique if there are
> multiple instances?
>
> I would have expected looking for duplicates inside something more
> specific, like "irqchip@0x0000300022040000-3". But again, I don't
> know enough about the problem to speak confidently here.
>
In our case, suppose we look for duplicates per irq domain, such as
"irqchip@0x0000100022040000-3" and "irqchip@0x0000300022040000-3", as in
the following example:
$ sudo cat /sys/kernel/debug/irq/irqs/163
handler: handle_fasteoi_irq
device: 0019:00:00.0
<SNIP>
node: 1
affinity: 72-143
effectiv: 76
domain: irqchip@0x0000100022040000-3
hwirq: 0xc8000000
chip: ITS-MSI
flags: 0x20
$ sudo cat /sys/kernel/debug/irq/irqs/174
handler: handle_fasteoi_irq
device: 0039:00:00.0
<SNIP>
node: 3
affinity: 216-287
effectiv: 221
domain: irqchip@0x0000300022040000-3
hwirq: 0xc8000000
chip: ITS-MSI
flags: 0x20
We could not detect the duplicated hwirq number (0xc8000000) in this case.
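The difference between the two groupings can be sketched with the two records above. The helper count_dups and its one-line "domain hwirq chip" input format are assumptions made for this illustration, not something the posted script uses.

```shell
#!/bin/bash
# Count repeated hwirq values under a chosen grouping key.
# Input lines: "<domain> <hwirq> <chip>"; count_dups is a hypothetical helper.
count_dups() {
	# $1 selects the grouping column: 1 = domain, 3 = chip
	awk -v col="$1" '{ seen[$col " " $2]++ }
		END { for (k in seen) if (seen[k] > 1) n++; print n + 0 }'
}

records='irqchip@0x0000100022040000-3 0xc8000000 ITS-MSI
irqchip@0x0000300022040000-3 0xc8000000 ITS-MSI'

echo "grouped by chip:   $(echo "$records" | count_dups 3) duplicate(s)"   # 1
echo "grouped by domain: $(echo "$records" | count_dups 1) duplicate(s)"   # 0
```

Grouping by chip name flags 0xc8000000 once, while grouping by domain flags nothing because the two domain strings differ.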
> Cosmetic nits:
>
> - Tweak subject to match history (use "git log --oneline
> tools/testing/selftests/drivers/" to see it), e.g.,
>
> selftests: irq: Add check for duplicate hwirq
>
> - Rewrap commit log to fill 75 columns. No point in using shorter
> lines.
>
> - Indent the "$ sudu cat ..." block by a couple spaces since it's
> effectively a quotation, not part of the main text body.
>
> - Possibly include sample output of irq-check.sh (also indented as a
> quote) when run on the system where you manually found the
> duplicate via "sudo cat /sys/kernel/debug/irq/irqs/..."
>
> - Reword "The irq-check.sh can help ..." to something like this:
>
> Add an irq-check.sh test to report errors when there are
> duplicate hwirqs per chip name.
>
> - Since the kernel patch has already been merged, cite it like this
> instead of using the https://lore URL:
>
> db744ddd59be ("PCI/MSI: Prevent MSI hardware interrupt number truncation")
>
If you agree to use the irq chip name ("ITS-MSI") to scan for duplicate
hwirqs, I can send a version 2 patch that addresses the suggestions above.
Thank you,
Joseph.
>> Signed-off-by: Joseph Jang <jjang@nvidia.com>
>> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
>> ---
>> tools/testing/selftests/drivers/irq/Makefile | 5 +++
>> tools/testing/selftests/drivers/irq/config | 2 +
>> .../selftests/drivers/irq/irq-check.sh | 39 +++++++++++++++++++
>> 3 files changed, 46 insertions(+)
>> create mode 100644 tools/testing/selftests/drivers/irq/Makefile
>> create mode 100644 tools/testing/selftests/drivers/irq/config
>> create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh
>>
>> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
>> new file mode 100644
>> index 000000000000..d6998017c861
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/Makefile
>> @@ -0,0 +1,5 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +TEST_PROGS := irq-check.sh
>> +
>> +include ../../lib.mk
>> diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
>> new file mode 100644
>> index 000000000000..a53d3b713728
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/config
>> @@ -0,0 +1,2 @@
>> +CONFIG_GENERIC_IRQ_DEBUGFS=y
>> +CONFIG_GENERIC_IRQ_INJECTION=y
>> diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
>> new file mode 100755
>> index 000000000000..e784777043a1
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/irq-check.sh
>> @@ -0,0 +1,39 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +# This script need root permission
>> +uid=$(id -u)
>> +if [ $uid -ne 0 ]; then
>> + echo "SKIP: Must be run as root"
>> + exit 4
>> +fi
>> +
>> +# Ensure debugfs is mounted
>> +mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
>> +if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
>> + echo "SKIP: irq debugfs not found"
>> + exit 4
>> +fi
>> +
>> +# Traverse the irq debug file system directory to collect chip_name and hwirq
>> +hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
>> + # Read chip name and hwirq from the irq_file
>> + chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
>> + hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
>> +
>> + if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
>> + continue
>> + fi
>> +
>> + echo "$chip_name $hwirq"
>> +done)
>> +
>> +dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
>> +
>> +if [ -n "$dup_hwirq_list" ]; then
>> + echo "ERROR: Found duplicate hwirq"
>> + echo "$dup_hwirq_list"
>> + exit 1
>> +fi
>> +
>> +exit 0
>> --
>> 2.34.1
>>
>
On Mon, Nov 11, 2024 at 03:21:36PM +0800, Joseph Jang wrote:
> On 2024/10/19 3:34 AM, Bjorn Helgaas wrote:
> > On Tue, Sep 03, 2024 at 06:44:26PM -0700, Joseph Jang wrote:
> > > Validate there are no duplicate hwirq from the irq debug
> > > file system /sys/kernel/debug/irq/irqs/* per chip name.
> > >
> > > One example log show 2 duplicated hwirq in the irq debug
> > > file system.
> > >
> > > $ sudo cat /sys/kernel/debug/irq/irqs/163
> > > handler: handle_fasteoi_irq
> > > device: 0019:00:00.0
> > > <SNIP>
> > > node: 1
> > > affinity: 72-143
> > > effectiv: 76
> > > domain: irqchip@0x0000100022040000-3
> > > hwirq: 0xc8000000
> > > chip: ITS-MSI
> > > flags: 0x20
> > >
> > > $ sudo cat /sys/kernel/debug/irq/irqs/174
> > > handler: handle_fasteoi_irq
> > > device: 0039:00:00.0
> > > <SNIP>
> > > node: 3
> > > affinity: 216-287
> > > effectiv: 221
> > > domain: irqchip@0x0000300022040000-3
> > > hwirq: 0xc8000000
> > > chip: ITS-MSI
> > > flags: 0x20
> > >
> > > The irq-check.sh can help to collect hwirq and chip name from
> > > /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
> > > hwirq per chip name.
> > >
> > > Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
> > > [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
> >
> > I don't know enough about this issue to understand the details. It
> > seems like you look for duplicate hwirqs in chips with the same name,
> > e.g., "ITS-MSI" in this case? That name seems too generic to me
> > (might there be several instances of "ITS-MSI" in a system?)
>
> As I know, each PCIe device typically has only one ITS-MSI controller.
> Having multiple ITS-MSI instances for the same device would lead to
> confusion and potential conflicts in interrupt routing.
>
> > Also, the name may come from chip->irq_print_chip(), so it apparently
> > relies on irqchip drivers to make the names unique if there are
> > multiple instances?
> >
> > I would have expected looking for duplicates inside something more
> > specific, like "irqchip@0x0000300022040000-3". But again, I don't
> > know enough about the problem to speak confidently here.
>
> In our case, If we look for duplicates by different irq domains like
> "irqchip@0x0000100022040000-3" and "irqchip@0x0000300022040000-3" as
> following example.
>
> $ sudo cat /sys/kernel/debug/irq/irqs/163
> handler: handle_fasteoi_irq
> device: 0019:00:00.0
> <SNIP>
> node: 1
> affinity: 72-143
> effectiv: 76
> domain: irqchip@0x0000100022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
> $ sudo cat /sys/kernel/debug/irq/irqs/174
> handler: handle_fasteoi_irq
> device: 0039:00:00.0
> <SNIP>
> node: 3
> affinity: 216-287
> effectiv: 221
> domain: irqchip@0x0000300022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> We could not detect the duplicated hwirq number (0xc8000000) in this
> case.
Again, this is really out of my area, but based on
Documentation/core-api/irq/irq-domain.rst, I assumed the point of
hwirq was that hwirq numbers were local to an interrupt controller,
i.e., to an irq_domain.
If that's the case, it should not be a problem if hwirq number
0xc8000000 is used in two separate irq_domains.
Bjorn
On Fri, Nov 22 2024 at 11:54, Bjorn Helgaas wrote:
> On Mon, Nov 11, 2024 at 03:21:36PM +0800, Joseph Jang wrote:
>> We could not detect the duplicated hwirq number (0xc8000000) in this
>> case.
>
> Again, this is really out of my area, but based on
> Documentation/core-api/irq/irq-domain.rst, I assumed the point of
> hwirq was that hwirq numbers were local to an interrupt controller,
> i.e., to an irq_domain.
Correct.
But due to the truncation problem in pci_msi_domain_calc_hwirq() we
ended up with the same hwirq number for two different interrupts in the
same domain.
That said, I'm not really convinced about the value of the proposed
script as it just checks at a random point in time, which does not give
any meaningful test coverage.
I'd rather want to see a check in the irq domain code itself. At the
point where an interrupt is inserted, the irqdomain can validate that
there is no existing mapping for the hardware interrupt number. This
check can be unconditionally enabled as interrupt setup is not really a
hotpath operation and the lookup in the revmap or the tree is cheap.
Thanks,
tglx
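The idea of validating at insertion time, rather than sampling debugfs afterwards, can be modeled in userspace as below. This is only a toy sketch and not kernel code: the associative array stands in for a domain's reverse map, and map_irq, revmap, domA and domB are hypothetical names. It needs bash 4 or newer.

```shell
#!/bin/bash
# Userspace model only: an associative array stands in for a domain's
# hwirq-to-virq reverse map. map_irq and revmap are hypothetical names,
# not kernel interfaces. Requires bash 4 or newer.
declare -A revmap

map_irq() {
	local domain=$1 hwirq=$2 virq=$3
	local key="$domain/$hwirq"
	# Refuse the insert if this (domain, hwirq) pair is already mapped.
	if [ -n "${revmap[$key]+set}" ]; then
		echo "reject: hwirq $hwirq in $domain already mapped to virq ${revmap[$key]}" >&2
		return 1
	fi
	revmap[$key]=$virq
}

map_irq domA 0xc8000000 163 && echo "virq 163 mapped"
map_irq domB 0xc8000000 174 && echo "virq 174 mapped"     # other domain: fine
map_irq domA 0xc8000000 175 || echo "virq 175 rejected"   # same domain: caught
```

Because the check runs on every insert, a collision is caught at the moment it happens instead of depending on when a script samples the system.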
On 2024/9/4 9:44 AM, Joseph Jang wrote:
> Validate there are no duplicate hwirq from the irq debug
> file system /sys/kernel/debug/irq/irqs/* per chip name.
>
> One example log show 2 duplicated hwirq in the irq debug
> file system.
>
> $ sudo cat /sys/kernel/debug/irq/irqs/163
> handler: handle_fasteoi_irq
> device: 0019:00:00.0
> <SNIP>
> node: 1
> affinity: 72-143
> effectiv: 76
> domain: irqchip@0x0000100022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> $ sudo cat /sys/kernel/debug/irq/irqs/174
> handler: handle_fasteoi_irq
> device: 0039:00:00.0
> <SNIP>
> node: 3
> affinity: 216-287
> effectiv: 221
> domain: irqchip@0x0000300022040000-3
> hwirq: 0xc8000000
> chip: ITS-MSI
> flags: 0x20
>
> The irq-check.sh can help to collect hwirq and chip name from
> /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
> hwirq per chip name.
>
> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
>
> Signed-off-by: Joseph Jang <jjang@nvidia.com>
> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
> ---
> tools/testing/selftests/drivers/irq/Makefile | 5 +++
> tools/testing/selftests/drivers/irq/config | 2 +
> .../selftests/drivers/irq/irq-check.sh | 39 +++++++++++++++++++
> 3 files changed, 46 insertions(+)
> create mode 100644 tools/testing/selftests/drivers/irq/Makefile
> create mode 100644 tools/testing/selftests/drivers/irq/config
> create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh
>
> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
> new file mode 100644
> index 000000000000..d6998017c861
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +TEST_PROGS := irq-check.sh
> +
> +include ../../lib.mk
> diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
> new file mode 100644
> index 000000000000..a53d3b713728
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/config
> @@ -0,0 +1,2 @@
> +CONFIG_GENERIC_IRQ_DEBUGFS=y
> +CONFIG_GENERIC_IRQ_INJECTION=y
> diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
> new file mode 100755
> index 000000000000..e784777043a1
> --- /dev/null
> +++ b/tools/testing/selftests/drivers/irq/irq-check.sh
> @@ -0,0 +1,39 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +
> +# This script need root permission
> +uid=$(id -u)
> +if [ $uid -ne 0 ]; then
> + echo "SKIP: Must be run as root"
> + exit 4
> +fi
> +
> +# Ensure debugfs is mounted
> +mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
> +if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
> + echo "SKIP: irq debugfs not found"
> + exit 4
> +fi
> +
> +# Traverse the irq debug file system directory to collect chip_name and hwirq
> +hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
> + # Read chip name and hwirq from the irq_file
> + chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
> + hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
> +
> + if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
> + continue
> + fi
> +
> + echo "$chip_name $hwirq"
> +done)
> +
> +dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
> +
> +if [ -n "$dup_hwirq_list" ]; then
> + echo "ERROR: Found duplicate hwirq"
> + echo "$dup_hwirq_list"
> + exit 1
> +fi
> +
> +exit 0
Hi Tglx,
I followed your suggestions in
https://www.mail-archive.com/linux-kselftest@vger.kernel.org/msg16952.html
to enable the IRQ debugfs and created a new script to scan for duplicated
hwirqs. If you have time, could you please take another look at the new
patch?
https://lore.kernel.org/all/20240904014426.3404397-1-jjang@nvidia.com/T/
Hi Shuah,
If you have time, could you take a look at the new patch?
Thank you,
Joseph.
On 10/17/24 22:29, Joseph Jang wrote:
>
>
> On 2024/9/4 9:44 AM, Joseph Jang wrote:
>> Validate there are no duplicate hwirq from the irq debug
>> file system /sys/kernel/debug/irq/irqs/* per chip name.
>>
>> One example log show 2 duplicated hwirq in the irq debug
>> file system.
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/163
>> handler: handle_fasteoi_irq
>> device: 0019:00:00.0
>> <SNIP>
>> node: 1
>> affinity: 72-143
>> effectiv: 76
>> domain: irqchip@0x0000100022040000-3
>> hwirq: 0xc8000000
>> chip: ITS-MSI
>> flags: 0x20
>>
>> $ sudo cat /sys/kernel/debug/irq/irqs/174
>> handler: handle_fasteoi_irq
>> device: 0039:00:00.0
>> <SNIP>
>> node: 3
>> affinity: 216-287
>> effectiv: 221
>> domain: irqchip@0x0000300022040000-3
>> hwirq: 0xc8000000
>> chip: ITS-MSI
>> flags: 0x20
>>
>> The irq-check.sh can help to collect hwirq and chip name from
>> /sys/kernel/debug/irq/irqs/* and print error log when find duplicate
>> hwirq per chip name.
>>
>> Kernel patch ("PCI/MSI: Fix MSI hwirq truncation") [1] fix above issue.
>> [1]: https://lore.kernel.org/all/20240115135649.708536-1-vidyas@nvidia.com/
>>
>> Signed-off-by: Joseph Jang <jjang@nvidia.com>
>> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com>
>> ---
>> tools/testing/selftests/drivers/irq/Makefile | 5 +++
>> tools/testing/selftests/drivers/irq/config | 2 +
>> .../selftests/drivers/irq/irq-check.sh | 39 +++++++++++++++++++
>> 3 files changed, 46 insertions(+)
>> create mode 100644 tools/testing/selftests/drivers/irq/Makefile
>> create mode 100644 tools/testing/selftests/drivers/irq/config
>> create mode 100755 tools/testing/selftests/drivers/irq/irq-check.sh
>>
>> diff --git a/tools/testing/selftests/drivers/irq/Makefile b/tools/testing/selftests/drivers/irq/Makefile
>> new file mode 100644
>> index 000000000000..d6998017c861
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/Makefile
>> @@ -0,0 +1,5 @@
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +TEST_PROGS := irq-check.sh
>> +
>> +include ../../lib.mk
>> diff --git a/tools/testing/selftests/drivers/irq/config b/tools/testing/selftests/drivers/irq/config
>> new file mode 100644
>> index 000000000000..a53d3b713728
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/config
>> @@ -0,0 +1,2 @@
>> +CONFIG_GENERIC_IRQ_DEBUGFS=y
>> +CONFIG_GENERIC_IRQ_INJECTION=y
>> diff --git a/tools/testing/selftests/drivers/irq/irq-check.sh b/tools/testing/selftests/drivers/irq/irq-check.sh
>> new file mode 100755
>> index 000000000000..e784777043a1
>> --- /dev/null
>> +++ b/tools/testing/selftests/drivers/irq/irq-check.sh
>> @@ -0,0 +1,39 @@
>> +#!/bin/bash
>> +# SPDX-License-Identifier: GPL-2.0
>> +
>> +# This script need root permission
>> +uid=$(id -u)
>> +if [ $uid -ne 0 ]; then
>> + echo "SKIP: Must be run as root"
>> + exit 4
>> +fi
>> +
>> +# Ensure debugfs is mounted
>> +mount -t debugfs nodev /sys/kernel/debug 2>/dev/null
>> +if [ ! -d "/sys/kernel/debug/irq/irqs" ]; then
>> + echo "SKIP: irq debugfs not found"
>> + exit 4
>> +fi
>> +
>> +# Traverse the irq debug file system directory to collect chip_name and hwirq
>> +hwirq_list=$(for irq_file in /sys/kernel/debug/irq/irqs/*; do
>> + # Read chip name and hwirq from the irq_file
>> + chip_name=$(cat "$irq_file" | grep -m 1 'chip:' | awk '{print $2}')
>> + hwirq=$(cat "$irq_file" | grep -m 1 'hwirq:' | awk '{print $2}' )
>> +
>> + if [ -z "$chip_name" ] || [ -z "$hwirq" ]; then
>> + continue
>> + fi
>> +
>> + echo "$chip_name $hwirq"
>> +done)
>> +
>> +dup_hwirq_list=$(echo "$hwirq_list" | sort | uniq -cd)
>> +
>> +if [ -n "$dup_hwirq_list" ]; then
>> + echo "ERROR: Found duplicate hwirq"
>> + echo "$dup_hwirq_list"
>> + exit 1
>> +fi
>> +
>> +exit 0
>
> Hi Tglx,
>
> I follow your suggestions https://www.mail-archive.com/linux-kselftest@vger.kernel.org/msg16952.html to enable IRQ DEBUG_FS and create a new script to scan duplicated hwirq. If you have available time, would you please help to take a look at new patch again ?
>
>
> https://lore.kernel.org/all/20240904014426.3404397-1-jjang@nvidia.com/T/
>
>
> Hi Shuah,
>
> If you have time, could you help to take a look at the new patch ?
>
Once Thomas reviews this and gives me the okay, I will accept the patch.
thanks,
-- Shuah