The new kunit tests at kernel/irq/irq_test.c were primarily tested on
x86_64, with QEMU and with ARCH=um builds. Naturally, there are other
architectures that throw complications in the mix, with various CPU
hotplug and IRQ implementation choices.

Guenter has been dutifully noticing and reporting these errors, in
places like:
https://lore.kernel.org/all/b4cf04ea-d398-473f-bf11-d36643aa50dd@roeck-us.net/

I hope I've addressed all the failures, but it's hard to tell when I
don't have cross-compilers and QEMU setups for all of these
architectures.

I've tested what I could on arm, powerpc, x86_64, and um ARCH.

This series is based on David's patch for these tests:

[PATCH] genirq/test: Fix depth tests on architectures with NOREQUEST by default.
https://lore.kernel.org/all/20250816094528.3560222-2-davidgow@google.com/

Brian Norris (6):
  genirq/test: Select IRQ_DOMAIN
  genirq/test: Factor out fake-virq setup
  genirq/test: Fail early if we can't request an IRQ
  genirq/test: Skip managed-affinity tests with !SPARSE_IRQ
  genirq/test: Drop CONFIG_GENERIC_IRQ_MIGRATION assumptions
  genirq/test: Ensure CPU 1 is online for hotplug test

 kernel/irq/Kconfig    |  1 +
 kernel/irq/irq_test.c | 64 ++++++++++++++++++++-----------------------
 2 files changed, 31 insertions(+), 34 deletions(-)

-- 
2.51.0.rc1.167.g924127e9c0-goog
On Mon, Aug 18, 2025 at 12:27:37PM -0700, Brian Norris wrote:
> The new kunit tests at kernel/irq/irq_test.c were primarily tested on
> x86_64, with QEMU and with ARCH=um builds. Naturally, there are other
> architectures that throw complications in the mix, with various CPU
> hotplug and IRQ implementation choices.
>
> Guenter has been dutifully noticing and reporting these errors, in
> places like:
> https://lore.kernel.org/all/b4cf04ea-d398-473f-bf11-d36643aa50dd@roeck-us.net/
>
> I hope I've addressed all the failures, but it's hard to tell when I
> don't have cross-compilers and QEMU setups for all of these
> architectures.
>
> I've tested what I could on arm, powerpc, x86_64, and um ARCH.
>
> This series is based on David's patch for these tests:
>
> [PATCH] genirq/test: Fix depth tests on architectures with NOREQUEST by default.
> https://lore.kernel.org/all/20250816094528.3560222-2-davidgow@google.com/
>

Looks pretty good.

Build results:
    total: 162 pass: 162 fail: 0
Qemu test results:
    total: 637 pass: 637 fail: 0
Unit test results:
    pass: 640616 fail: 13
Failed unit tests:
    arm64:imx8mp-evk:irq_cpuhotplug_test
    arm64:imx8mp-evk:irq_test_cases
    m68k:q800:irq_test_cases
    m68k:virt:irq_test_cases

Individual failures:

[ 32.613761] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
[ 32.613761] Expected remove_cpu(1) == 0, but
[ 32.613761]     remove_cpu(1) == -16 (0xfffffffffffffff0)
[ 32.621522] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
[ 32.621522] Expected add_cpu(1) == 0, but
[ 32.621522]     add_cpu(1) == 1 (0x1)
[ 32.630930] # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1

# irq_disable_depth_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
Expected virq >= 0, but
    virq == -12 (0xfffffffffffffff4)
# irq_disable_depth_test: pass:0 fail:1 skip:0 total:1
not ok 1 irq_disable_depth_test
# irq_free_disabled_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
Expected virq >= 0, but
    virq == -12 (0xfffffffffffffff4)
# irq_free_disabled_test: pass:0 fail:1 skip:0 total:1

Guenter
On Thu, Aug 21, 2025 at 10:02:52AM -0700, Guenter Roeck wrote:
> Build results:
>     total: 162 pass: 162 fail: 0
> Qemu test results:
>     total: 637 pass: 637 fail: 0
> Unit test results:
>     pass: 640616 fail: 13
> Failed unit tests:
>     arm64:imx8mp-evk:irq_cpuhotplug_test
>     arm64:imx8mp-evk:irq_test_cases
>     m68k:q800:irq_test_cases
>     m68k:virt:irq_test_cases
>
> Individual failures:
>
> [ 32.613761] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
> [ 32.613761] Expected remove_cpu(1) == 0, but
> [ 32.613761]     remove_cpu(1) == -16 (0xfffffffffffffff0)
> [ 32.621522] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
> [ 32.621522] Expected add_cpu(1) == 0, but
> [ 32.621522]     add_cpu(1) == 1 (0x1)
> [ 32.630930] # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1

I managed to get an imx8mp-evk setup running (both little and big
endian) and couldn't reproduce. But I'm guessing based on the logs that
we're racing with pci_call_probe(), which disables CPU hotplug
(cpu_hotplug_disable()) for its duration.

I'm not sure how to handle that.

1. I could just SKIP the test on EBUSY. But that'd make for flaky test
   coverage.
2. Expose some method to block cpu_hotplug_disable() users temporarily.
3. Stop trying to do CPU hotplug in a unit test. (It's bordering on
   "integration test"; but it's still useful IMO...)
4. Add an EBUSY retry loop? Or some other similar polling (if we had,
   say, a cpu_hotplug_disabled() API).

> # irq_disable_depth_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
> Expected virq >= 0, but
>     virq == -12 (0xfffffffffffffff4)
> # irq_disable_depth_test: pass:0 fail:1 skip:0 total:1
> not ok 1 irq_disable_depth_test
> # irq_free_disabled_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
> Expected virq >= 0, but
>     virq == -12 (0xfffffffffffffff4)
> # irq_free_disabled_test: pass:0 fail:1 skip:0 total:1

We've discussed this one, and I have a fix (depends on SPARSE_IRQ).

Brian
On 8/21/25 12:06, Brian Norris wrote:
> On Thu, Aug 21, 2025 at 10:02:52AM -0700, Guenter Roeck wrote:
>> Build results:
>>     total: 162 pass: 162 fail: 0
>> Qemu test results:
>>     total: 637 pass: 637 fail: 0
>> Unit test results:
>>     pass: 640616 fail: 13
>> Failed unit tests:
>>     arm64:imx8mp-evk:irq_cpuhotplug_test
>>     arm64:imx8mp-evk:irq_test_cases
>>     m68k:q800:irq_test_cases
>>     m68k:virt:irq_test_cases
>>
>> Individual failures:
>>
>> [ 32.613761] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
>> [ 32.613761] Expected remove_cpu(1) == 0, but
>> [ 32.613761]     remove_cpu(1) == -16 (0xfffffffffffffff0)
>> [ 32.621522] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
>> [ 32.621522] Expected add_cpu(1) == 0, but
>> [ 32.621522]     add_cpu(1) == 1 (0x1)
>> [ 32.630930] # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1
>
> I managed to get an imx8mp-evk setup running (both little and big
> endian) and couldn't reproduce. But I'm guessing based on the logs that
> we're racing with pci_call_probe(), which disables CPU hotplug
> (cpu_hotplug_disable()) for its duration.
>
> I'm not sure how to handle that.
>
> 1. I could just SKIP the test on EBUSY. But that'd make for flaky test
>    coverage.
> 2. Expose some method to block cpu_hotplug_disable() users temporarily.
> 3. Stop trying to do CPU hotplug in a unit test. (It's bordering on
>    "integration test"; but it's still useful IMO...)
> 4. Add an EBUSY retry loop? Or some other similar polling (if we had,
>    say, a cpu_hotplug_disabled() API).
>

Here is an additional data point: It only happens with big endian tests.
This always happens in my setup, and it only happens when booting from
virtio-pci but not when booting from other devices.

I just re-ran the test and it passed this time, so this is apparently
a flake. I'd suggest to ignore it for now. If I see it again and find
a clean way to reproduce it we can have another look. The emulated PCIe
controller for imx8mp-evk isn't exactly stable, so this may just be a side
effect of emulation problems.

Guenter
On Fri, Aug 22, 2025 at 11:34:04AM -0700, Guenter Roeck wrote:
> On 8/21/25 12:06, Brian Norris wrote:
> > On Thu, Aug 21, 2025 at 10:02:52AM -0700, Guenter Roeck wrote:
> > > Build results:
> > >     total: 162 pass: 162 fail: 0
> > > Qemu test results:
> > >     total: 637 pass: 637 fail: 0
> > > Unit test results:
> > >     pass: 640616 fail: 13
> > > Failed unit tests:
> > >     arm64:imx8mp-evk:irq_cpuhotplug_test
> > >     arm64:imx8mp-evk:irq_test_cases
> > >     m68k:q800:irq_test_cases
> > >     m68k:virt:irq_test_cases
> > >
> > > Individual failures:
> > >
> > > [ 32.613761] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:210
> > > [ 32.613761] Expected remove_cpu(1) == 0, but
> > > [ 32.613761]     remove_cpu(1) == -16 (0xfffffffffffffff0)
> > > [ 32.621522] # irq_cpuhotplug_test: EXPECTATION FAILED at kernel/irq/irq_test.c:212
> > > [ 32.621522] Expected add_cpu(1) == 0, but
> > > [ 32.621522]     add_cpu(1) == 1 (0x1)
> > > [ 32.630930] # irq_cpuhotplug_test: pass:0 fail:1 skip:0 total:1
> >
> > I managed to get an imx8mp-evk setup running (both little and big
> > endian) and couldn't reproduce. But I'm guessing based on the logs that
> > we're racing with pci_call_probe(), which disables CPU hotplug
> > (cpu_hotplug_disable()) for its duration.
> >
> > I'm not sure how to handle that.
> >
> > 1. I could just SKIP the test on EBUSY. But that'd make for flaky test
> >    coverage.
> > 2. Expose some method to block cpu_hotplug_disable() users temporarily.
> > 3. Stop trying to do CPU hotplug in a unit test. (It's bordering on
> >    "integration test"; but it's still useful IMO...)
> > 4. Add an EBUSY retry loop? Or some other similar polling (if we had,
> >    say, a cpu_hotplug_disabled() API).

Ah, I see that add_cpu() (cpu_subsys_online()) already has an -EBUSY
retry loop, but remove_cpu() doesn't. So #4 seems like a good solution.
It might even make sense to retry in cpu_subsys_offline(), rather than
just in the test. I'll give this some thought for later though.

> Here is an additional data point: It only happens with big endian tests.
> This always happens in my setup, and it only happens when booting from
> virtio-pci but not when booting from other devices.
>
> I just re-ran the test and it passed this time, so this is apparently
> a flake. I'd suggest to ignore it for now. If I see it again and find
> a clean way to reproduce it we can have another look. The emulated PCIe
> controller for imx8mp-evk isn't exactly stable, so this may just be a side
> effect of emulation problems.

This furthers my suspicion that it's a race with PCIe probing. On the
failure case, the test is running right after some PCI scan logs.

But I'm fine deferring for now, since it's not very reproducible.

Brian
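[Editorial sketch, not part of the thread or the series.] Option 4 above, a bounded retry on -EBUSY around the offline call, could look roughly like the following userspace model. Everything here is an assumption made for illustration: fake_remove_cpu() and struct fake_hotplug stand in for remove_cpu() and the "hotplug temporarily disabled" state, remove_cpu_retry() is a hypothetical helper name, and a kernel version would presumably sleep between attempts rather than spin.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for the hotplug state: the next
 * 'busy_calls_left' offline attempts fail with -EBUSY, as if another
 * user had CPU hotplug disabled for a while.
 */
struct fake_hotplug {
	int busy_calls_left;	/* attempts that will still return -EBUSY */
	int calls;		/* total attempts made so far */
};

/* Hypothetical stand-in for remove_cpu(); not the kernel function. */
static int fake_remove_cpu(struct fake_hotplug *hp, unsigned int cpu)
{
	(void)cpu;
	hp->calls++;
	if (hp->busy_calls_left > 0) {
		hp->busy_calls_left--;
		return -EBUSY;
	}
	return 0;
}

/*
 * Retry the offline attempt while it reports -EBUSY, up to max_tries
 * attempts. Any other result (success or a different error) is
 * returned immediately.
 */
static int remove_cpu_retry(struct fake_hotplug *hp, unsigned int cpu,
			    int max_tries)
{
	int ret = -EBUSY;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries && ret == -EBUSY; i++)
		ret = fake_remove_cpu(hp, cpu);

	return ret;
}
```

With a bound on the number of attempts, a persistent -EBUSY still surfaces as a failure instead of hanging the test, which keeps the "flaky coverage" concern from option 1 visible rather than hidden.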
On Tue, 19 Aug 2025 at 03:28, Brian Norris <briannorris@chromium.org> wrote:
>
> The new kunit tests at kernel/irq/irq_test.c were primarily tested on
> x86_64, with QEMU and with ARCH=um builds. Naturally, there are other
> architectures that throw complications in the mix, with various CPU
> hotplug and IRQ implementation choices.
>
> Guenter has been dutifully noticing and reporting these errors, in
> places like:
> https://lore.kernel.org/all/b4cf04ea-d398-473f-bf11-d36643aa50dd@roeck-us.net/
>
> I hope I've addressed all the failures, but it's hard to tell when I
> don't have cross-compilers and QEMU setups for all of these
> architectures.
>
> I've tested what I could on arm, powerpc, x86_64, and um ARCH.
>
> This series is based on David's patch for these tests:
>
> [PATCH] genirq/test: Fix depth tests on architectures with NOREQUEST by default.
> https://lore.kernel.org/all/20250816094528.3560222-2-davidgow@google.com/
>
>

Thanks very much. These patches all look good to me, so the series is:

Reviewed-by: David Gow <davidgow@google.com>

I am, however, still getting test failures on m68k (with CONFIG_VIRT=y):

./tools/testing/kunit/kunit.py run --arch m68k --cross_compile m68k-linux-gnu- irq*

[14:54:23] =============== irq_test_cases (4 subtests) ================
[14:54:23] # irq_disable_depth_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
[14:54:23] Expected virq >= 0, but
[14:54:23]     virq == -12 (0xfffffffffffffff4)
[14:54:23] [FAILED] irq_disable_depth_test
[14:54:23] # irq_free_disabled_test: ASSERTION FAILED at kernel/irq/irq_test.c:53
[14:54:23] Expected virq >= 0, but
[14:54:23]     virq == -12 (0xfffffffffffffff4)
[14:54:23] [FAILED] irq_free_disabled_test
[14:54:23] [SKIPPED] irq_shutdown_depth_test
[14:54:23] [SKIPPED] irq_cpuhotplug_test
[14:54:23] # module: irq_test
[14:54:23] # irq_test_cases: pass:0 fail:2 skip:2 total:4
[14:54:23] # Totals: pass:0 fail:2 skip:2 total:4
[14:54:23] ================= [FAILED] irq_test_cases ==================
[14:54:23] ============================================================
[14:54:23] Testing complete. Ran 4 tests: failed: 2, skipped: 2

Looks like __irq_alloc_descs() is returning -ENOMEM (as
irq_find_free_area() is returning 200 w/ nr_irqs == 200, and
CONFIG_SPARSE_IRQ=n).

But all of the other architectures I found worked okay, so this is at
least an improvement.

Thanks,
-- David

> Brian Norris (6):
>   genirq/test: Select IRQ_DOMAIN
>   genirq/test: Factor out fake-virq setup
>   genirq/test: Fail early if we can't request an IRQ
>   genirq/test: Skip managed-affinity tests with !SPARSE_IRQ
>   genirq/test: Drop CONFIG_GENERIC_IRQ_MIGRATION assumptions
>   genirq/test: Ensure CPU 1 is online for hotplug test
>
>  kernel/irq/Kconfig    |  1 +
>  kernel/irq/irq_test.c | 64 ++++++++++++++++++++-----------------------
>  2 files changed, 31 insertions(+), 34 deletions(-)
>
> --
> 2.51.0.rc1.167.g924127e9c0-goog
>
On Wed, Aug 20, 2025 at 03:00:34PM +0800, David Gow wrote:
> Looks like __irq_alloc_descs() is returning -ENOMEM (as
> irq_find_free_area() is returning 200 w/ nr_irqs == 200, and
> CONFIG_SPARSE_IRQ=n).

Thanks for the insight. I bothered compiling my own qemu just so I can
run m68k this time, and I can reproduce.

I wonder if I should make everything (CONFIG_IRQ_KUNIT_TEST) depend on
CONFIG_SPARSE_IRQ, since it seems like arches like m68k can't enable
SPARSE_IRQ, and they can't allocate new (fake) IRQs without it. That'd
be a tweak to patch 4.

Or maybe just 'depends on !M68K', since architectures with higher
NR_IRQS headroom may still work even without SPARSE_IRQ.

> But all of the other architectures I found worked okay, so this is at
> least an improvement.

Thanks for the testing.

Brian
On Thu, 21 Aug 2025 at 01:22, Brian Norris <briannorris@chromium.org> wrote:
>
> On Wed, Aug 20, 2025 at 03:00:34PM +0800, David Gow wrote:
> > Looks like __irq_alloc_descs() is returning -ENOMEM (as
> > irq_find_free_area() is returning 200 w/ nr_irqs == 200, and
> > CONFIG_SPARSE_IRQ=n).
>
> Thanks for the insight. I bothered compiling my own qemu just so I can
> run m68k this time, and I can reproduce.
>
> I wonder if I should make everything (CONFIG_IRQ_KUNIT_TEST) depend on
> CONFIG_SPARSE_IRQ, since it seems like arches like m68k can't enable
> SPARSE_IRQ, and they can't allocate new (fake) IRQs without it. That'd
> be a tweak to patch 4.
>
> Or maybe just 'depends on !M68K', since architectures with higher
> NR_IRQS headroom may still work even without SPARSE_IRQ.
>

I'm not an m68k expert (so I've CCed Geert), but I think different
m68k configs do have different NR_IRQS, so it's possible there are
working m68k setups, too. (It also seems slightly suspicious to me
that exactly 200 IRQs are allocated here, though, so a lack of extra
headroom may be deliberate and/or triggered by something trying to
allocate all IRQs.)

Personally, I don't have any m68k machines lying around, so disabling
the test so my qemu scripts don't report errors is fine by me. Ideally
the dependency would be as narrow as possible, but that may well be
!M68K.

The other option would be to try to skip the test if there aren't free
IRQs, but maybe that'd hide real issues?

Regardless, I'll defer to the IRQ and m68k experts here: as long as
I'm not seeing errors, I'm happy. :-)

Cheers,
-- David
Hi David,

On Thu, 21 Aug 2025 at 05:45, David Gow <davidgow@google.com> wrote:
> On Thu, 21 Aug 2025 at 01:22, Brian Norris <briannorris@chromium.org> wrote:
> > On Wed, Aug 20, 2025 at 03:00:34PM +0800, David Gow wrote:
> > > Looks like __irq_alloc_descs() is returning -ENOMEM (as
> > > irq_find_free_area() is returning 200 w/ nr_irqs == 200, and
> > > CONFIG_SPARSE_IRQ=n).
> >
> > Thanks for the insight. I bothered compiling my own qemu just so I can
> > run m68k this time, and I can reproduce.
> >
> > I wonder if I should make everything (CONFIG_IRQ_KUNIT_TEST) depend on
> > CONFIG_SPARSE_IRQ, since it seems like arches like m68k can't enable
> > SPARSE_IRQ, and they can't allocate new (fake) IRQs without it. That'd
> > be a tweak to patch 4.
> >
> > Or maybe just 'depends on !M68K', since architectures with higher
> > NR_IRQS headroom may still work even without SPARSE_IRQ.
>
> I'm not an m68k expert (so I've CCed Geert), but I think different
> m68k configs do have different NR_IRQS, so it's possible there are
> working m68k setups, too. (It also seems slightly suspicious to me
> that exactly 200 IRQs are allocated here, though, so a lack of extra
> headroom may be deliberate and/or triggered by something trying to
> allocate all IRQs.)
>
> Personally, I don't have any m68k machines lying around, so disabling
> the test so my qemu scripts don't report errors is fine by me. Ideally
> the dependency would be as narrow as possible, but that may well be
> !M68K.

M68k indeed has different values of NR_IRQS, based on the system(s)
support is enabled for. These values are based on the IRQ hierarchy
of the system(s), which is rather fixed. Hence this does not take into
account any additional irqchips that are being registered by e.g. tests...

"git grep -w NR_IRQS -- arch/*/include/" shows m68k is not the only
architecture having that limitation...

> The other option would be to try to skip the test if there aren't free
> IRQs, but maybe that'd hide real issues?
>
> Regardless, I'll defer to the IRQ and m68k experts here: as long as
> I'm not seeing errors, I'm happy. :-)

kernel/irq/irqdesc.c:

    static bool irq_expand_nr_irqs(unsigned int nr)
    {
            if (nr > MAX_SPARSE_IRQS)
                    return false;
            nr_irqs = nr;
            return true;
    }

kernel/irq/internals.h:

    #ifdef CONFIG_SPARSE_IRQ
    # define MAX_SPARSE_IRQS        INT_MAX
    #else
    # define MAX_SPARSE_IRQS        NR_IRQS
    #endif

So probably the test should depend on SPARSE_IRQ? Increasing NR_IRQS
everywhere when IRQ_KUNIT_TEST is enabled sounds rather invasive to me.

BTW, given the test calls irq_domain_alloc_descs(), I think it should
also depend on IRQ_DOMAIN.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
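[Editorial sketch, not part of the thread.] The -ENOMEM discussed above can be modeled in a few lines of userspace C. This is a deliberately simplified model and not the kernel code: struct irq_space, expand_nr_irqs() and alloc_descs() are hypothetical names, the allocator assumes IRQs are handed out contiguously from 0, and max_irqs plays the role of MAX_SPARSE_IRQS (NR_IRQS without SPARSE_IRQ, INT_MAX with it).

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified, hypothetical model of the IRQ number space. */
struct irq_space {
	unsigned int nr_irqs;	/* current size of the IRQ number space */
	unsigned int max_irqs;	/* models MAX_SPARSE_IRQS */
	unsigned int used;	/* IRQs [0, used) are already allocated */
};

/* Mirrors the shape of the irq_expand_nr_irqs() quoted above. */
static bool expand_nr_irqs(struct irq_space *s, unsigned int nr)
{
	if (nr > s->max_irqs)
		return false;
	s->nr_irqs = nr;
	return true;
}

/*
 * Allocate 'cnt' descriptors. Returns the first IRQ number, or -ENOMEM
 * when the range does not fit and the space cannot be expanded.
 */
static int alloc_descs(struct irq_space *s, unsigned int cnt)
{
	unsigned int start = s->used;	/* first free IRQ in this model */

	if (start + cnt > s->nr_irqs && !expand_nr_irqs(s, start + cnt))
		return -ENOMEM;

	s->used = start + cnt;
	return (int)start;
}
```

In this model, a space with nr_irqs == max_irqs == 200 and all 200 IRQs in use fails the next allocation with -ENOMEM (-12, matching the virq value in the logs), while the same state with max_irqs == INT_MAX grows to 201 and returns IRQ 200.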
Hi Geert,

On Thu, Aug 21, 2025 at 09:05:03AM +0200, Geert Uytterhoeven wrote:
> So probably the test should depend on SPARSE_IRQ? Increasing NR_IRQS
> everywhere when IRQ_KUNIT_TEST is enabled sounds rather invasive to me.

Yeah, I was leaning to 'depends on SPARSE_IRQ'

> BTW, given the test calls irq_domain_alloc_descs(), I think it should
> also depend on IRQ_DOMAIN.

Right, that's in patch 1.

I'll resend the series with a 'depends on SPARSE_IRQ'.

Thanks,
Brian
On 8/20/25 10:22, Brian Norris wrote:
> On Wed, Aug 20, 2025 at 03:00:34PM +0800, David Gow wrote:
>> Looks like __irq_alloc_descs() is returning -ENOMEM (as
>> irq_find_free_area() is returning 200 w/ nr_irqs == 200, and
>> CONFIG_SPARSE_IRQ=n).
>
> Thanks for the insight. I bothered compiling my own qemu just so I can
> run m68k this time, and I can reproduce.
>
> I wonder if I should make everything (CONFIG_IRQ_KUNIT_TEST) depend on
> CONFIG_SPARSE_IRQ, since it seems like arches like m68k can't enable
> SPARSE_IRQ, and they can't allocate new (fake) IRQs without it. That'd
> be a tweak to patch 4.
>
> Or maybe just 'depends on !M68K', since architectures with higher
> NR_IRQS headroom may still work even without SPARSE_IRQ.
>
>> But all of the other architectures I found worked okay, so this is at
>> least an improvement.
>
> Thanks for the testing.
>

I applied the series to my testing branch. I'll run a full test tonight
and report results tomorrow.

Guenter