Some arm64 systems rely on store_cpu_topology() to set up the real topology.
This needs to be done before the call to notify_cpu_starting(), which
tells the scheduler about the CPU; otherwise the core scheduling data
structures are set up in a way that does not match the actual topology.
Without this change stress-ng (which enables core scheduling in its prctl
tests) causes a warning and then a crash (trimmed for legibility):
[ 1853.805168] ------------[ cut here ]------------
[ 1853.809784] task_rq(b)->core != rq->core
[ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4
...
[ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
[ 1854.231256] Call trace:
[ 1854.233689] pick_next_task+0x3dc/0x81c
[ 1854.237512] __schedule+0x10c/0x4cc
[ 1854.240988] schedule_idle+0x34/0x54
Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock")
Signed-off-by: Phil Auld <pauld@redhat.com>
---
This is a similar issue to
f2703def339c ("MIPS: smp: fill in sibling and core maps earlier")
which fixed it for MIPS.
arch/arm64/kernel/smp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 27df5c1e6baa..3b46041f2b97 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -234,6 +234,7 @@ asmlinkage notrace void secondary_start_kernel(void)
 	 * Log the CPU info before it is marked online and might get read.
 	 */
 	cpuinfo_store_cpu();
+	store_cpu_topology(cpu);
 
 	/*
 	 * Enable GIC and timers.
@@ -242,7 +243,6 @@ asmlinkage notrace void secondary_start_kernel(void)
 
 	ipi_setup(cpu);
 
-	store_cpu_topology(cpu);
 	numa_add_cpu(cpu);
 
 	/*
--
2.18.0
On 22/03/2022 17:03, Phil Auld wrote:
> Some arm64 rely on store_cpu_topology() to setup the real topology.
> This needs to be done before the call to notify_cpu_starting() which
> tell the scheduler about the cpu otherwise the core scheduling data
> structures are setup in a way that does not match the actual topology.
>
> Without this change stress-ng (which enables core scheduling in its prctl
> tests) causes a warning and then a crash (trimmed for legibility):
>
> [ 1853.805168] ------------[ cut here ]------------
> [ 1853.809784] task_rq(b)->core != rq->core
> [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4
> ...
> [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> ...
> [ 1854.231256] Call trace:
> [ 1854.233689] pick_next_task+0x3dc/0x81c
> [ 1854.237512] __schedule+0x10c/0x4cc
> [ 1854.240988] schedule_idle+0x34/0x54
>
> Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock")
> Signed-off-by: Phil Auld <pauld@redhat.com>
> ---
> This is a similar issue to
> f2703def339c ("MIPS: smp: fill in sibling and core maps earlier")
> which fixed it for MIPS.
I assume this is for a machine which relies on MPIDR-based setup
(package_id == -1)? I.e. it doesn't have proper ACPI/(DT) data for
topology setup.
Tried on a ThunderX2 by disabling parse_acpi_topology() but then I end
up with a machine w/o SMT, so `stress-ng --prctl N` doesn't show this issue.
Which machine were you using?
> arch/arm64/kernel/smp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 27df5c1e6baa..3b46041f2b97 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -234,6 +234,7 @@ asmlinkage notrace void secondary_start_kernel(void)
> * Log the CPU info before it is marked online and might get read.
> */
> cpuinfo_store_cpu();
> + store_cpu_topology(cpu);
>
> /*
> * Enable GIC and timers.
> @@ -242,7 +243,6 @@ asmlinkage notrace void secondary_start_kernel(void)
>
> ipi_setup(cpu);
>
> - store_cpu_topology(cpu);
> numa_add_cpu(cpu);
>
> /*
On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
> On 22/03/2022 17:03, Phil Auld wrote:
> > Some arm64 rely on store_cpu_topology() to setup the real topology.
> > This needs to be done before the call to notify_cpu_starting() which
> > tell the scheduler about the cpu otherwise the core scheduling data
> > structures are setup in a way that does not match the actual topology.
> >
> > Without this change stress-ng (which enables core scheduling in its prctl
> > tests) causes a warning and then a crash (trimmed for legibility):
> >
> > [ 1853.805168] ------------[ cut here ]------------
> > [ 1853.809784] task_rq(b)->core != rq->core
> > [ 1853.809792] WARNING: CPU: 117 PID: 0 at kernel/sched/fair.c:11102 cfs_prio_less+0x1b4/0x1c4
> > ...
> > [ 1854.015210] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> > ...
> > [ 1854.231256] Call trace:
> > [ 1854.233689] pick_next_task+0x3dc/0x81c
> > [ 1854.237512] __schedule+0x10c/0x4cc
> > [ 1854.240988] schedule_idle+0x34/0x54
> >
> > Fixes: 9edeaea1bc45 ("sched: Core-wide rq->lock")
> > Signed-off-by: Phil Auld <pauld@redhat.com>
> > ---
> > This is a similar issue to
> > f2703def339c ("MIPS: smp: fill in sibling and core maps earlier")
> > which fixed it for MIPS.
>
> I assume this is for a machine which relies on MPIDR-based setup
> (package_id == -1)? I.e. it doesn't have proper ACPI/(DT) data for
> topology setup.
Yes, that's my understanding. No PPTT.
>
> Tried on a ThunderX2 by disabling parse_acpi_topology() but then I end
> up with a machine w/o SMT, so `stress-ng --prctl N` doesn't show this issue.
>
> Which machine were you using?
This instance is an HPE Apollo 70 set to smt-4. I believe it's ThunderX2
chips.
ARM (CN9980-2200LG4077-Y21-G)
Thanks,
Phil
>
> > arch/arm64/kernel/smp.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 27df5c1e6baa..3b46041f2b97 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -234,6 +234,7 @@ asmlinkage notrace void secondary_start_kernel(void)
> > * Log the CPU info before it is marked online and might get read.
> > */
> > cpuinfo_store_cpu();
> > + store_cpu_topology(cpu);
> >
> > /*
> > * Enable GIC and timers.
> > @@ -242,7 +243,6 @@ asmlinkage notrace void secondary_start_kernel(void)
> >
> > ipi_setup(cpu);
> >
> > - store_cpu_topology(cpu);
> > numa_add_cpu(cpu);
> >
> > /*
>
--
On 29/03/2022 17:20, Phil Auld wrote:
> On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
>> On 22/03/2022 17:03, Phil Auld wrote:

[...]

>> I assume this is for a machine which relies on MPIDR-based setup
>> (package_id == -1)? I.e. it doesn't have proper ACPI/(DT) data for
>> topology setup.
>
> Yes, that's my understanding. No PPTT.
>
>>
>> Tried on a ThunderX2 by disabling parse_acpi_topology() but then I end
>> up with a machine w/o SMT, so `stress-ng --prctl N` doesn't show this issue.
>>
>> Which machine were you using?
>
> This instance is an HPE Apollo 70 set to smt-4. I believe it's ThunderX2
> chips.
>
> ARM (CN9980-2200LG4077-Y21-G)

I'm using the same processor just with ACPI/PPTT.

# sudo dmidecode -t 4 | grep "Part Number"
Part Number: CN9980-2200LG4077-21-Y-G
Part Number: CN9980-2200LG4077-21-Y-G

# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings
0,32,64,96

# cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
SMT
MC
NUMA

But no matter whether I disable parse_acpi_topology() or just force
`cpu_topology[cpu].package_id = -1` in this function, I always end up with:

# cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
MC
NUMA

# cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0

so no SMT sched domain. The MPIDR-based topology fallback code in
store_cpu_topology() forces `cpuid_topo->thread_id = -1`.

IMHO this is why on my machine I don't see this issue while running:

root@oss-apollo7007:~# stress-ng --prctl 256 -t 60
stress-ng: info: [2388042] dispatching hogs: 256 prctl

Is there something I miss in my setup to provoke this issue?

[...]
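The sibling lists shown in the message above ("0,32,64,96" with SMT, "0" without) can also be checked programmatically. The following is only a minimal sketch in C, assuming the usual sysfs CPU-list syntax of comma-separated numbers and ranges; the helper name demo_count_cpulist() is hypothetical and not part of any kernel or stress-ng interface.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: count the CPUs named in a sysfs-style CPU list
 * such as "0,32,64,96" or "0-3,8". Returns -1 on malformed input.
 */
static int demo_count_cpulist(const char *s)
{
	int count = 0;

	assert(s != NULL);
	while (*s != '\0' && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi;

		if (end == s || lo < 0)
			return -1;
		hi = lo;
		if (*end == '-') {
			/* A range "lo-hi": parse the upper bound. */
			const char *p = end + 1;

			hi = strtol(p, &end, 10);
			if (end == p || hi < lo)
				return -1;
		}
		count += (int)(hi - lo + 1);
		if (*end == ',')
			end++;
		else if (*end != '\0' && *end != '\n')
			return -1;
		s = end;
	}
	return count;
}
```

A result greater than 1 for the contents of cpu0's thread_siblings_list would suggest that SMT siblings are visible to the kernel, which is the precondition for the crash discussed in this thread.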
On Tue, Mar 29, 2022 at 08:55:08PM +0200 Dietmar Eggemann wrote:
> On 29/03/2022 17:20, Phil Auld wrote:
> > On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
> >> On 22/03/2022 17:03, Phil Auld wrote:
>
> [...]
>
> >> I assume this is for a machine which relies on MPIDR-based setup
> >> (package_id == -1)? I.e. it doesn't have proper ACPI/(DT) data for
> >> topology setup.
> >
> > Yes, that's my understanding. No PPTT.
> >
> >>
> >> Tried on a ThunderX2 by disabling parse_acpi_topology() but then I end
> >> up with a machine w/o SMT, so `stress-ng --prctl N` doesn't show this issue.
> >>
> >> Which machine were you using?
> >
> > This instance is an HPE Apollo 70 set to smt-4. I believe it's ThunderX2
> > chips.
> >
> > ARM (CN9980-2200LG4077-Y21-G)
> I'm using the same processor just with ACPI/PPTT.
>

Maybe I'm misinformed about these systems having no PPTT...

I'm reclaiming the system. Is there a way I can tell from userspace?

> # sudo dmidecode -t 4 | grep "Part Number"
> Part Number: CN9980-2200LG4077-21-Y-G
> Part Number: CN9980-2200LG4077-21-Y-G
>
> # cat /sys/devices/system/cpu/cpu0/topology/thread_siblings
> 0,32,64,96
>
> # cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
> SMT
> MC
> NUMA
>
> But no matter whether I disable parse_acpi_topology() or just force
> `cpu_topology[cpu].package_id = -1` in this function, I always end up with:
>
> # cat /sys/kernel/debug/sched/domains/cpu0/domain*/name
> MC
> NUMA
>
> # cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
> 0
>
> so no SMT sched domain. The MPIDR-based topology fallback code in
> store_cpu_topology() forces `cpuid_topo->thread_id = -1`.

Right. So since I'm getting SMT it must not have package_id == -1.
In which case you should be able to reproduce it because it must
be that the call to update_siblings_masks() is required. That
appears to only be called from store_cpu_topology() which is
after the scheduler has already set up the core pointers.

The fix could be the same but I should reword the commit message
since it should affect all SMT arm systems I'd think.

Or maybe the ACPI topology code should call update_siblings_masks().

>
> IMHO this is why on my machine I don't see this issue while running:
>
> root@oss-apollo7007:~# stress-ng --prctl 256 -t 60
> stress-ng: info: [2388042] dispatching hogs: 256 prctl
>
> Is there something I miss in my setup to provoke this issue?
>

Make sure you have a stress-ng that is new enough and built against
headers that have the CORE_SCHED prctls defined.

BTW, thanks for taking a look.

Cheers,
Phil

> [...]
>

--
On 29/03/2022 21:50, Phil Auld wrote:
> On Tue, Mar 29, 2022 at 08:55:08PM +0200 Dietmar Eggemann wrote:
>> On 29/03/2022 17:20, Phil Auld wrote:
>>> On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
>>>> On 22/03/2022 17:03, Phil Auld wrote:

[...]

>>> This instance is an HPE Apollo 70 set to smt-4. I believe it's ThunderX2
>>> chips.
>>>
>>> ARM (CN9980-2200LG4077-Y21-G)
>> I'm using the same processor just with ACPI/PPTT.
>>
>
> Maybe I'm misinformed about these systems having no PPTT...
>
> I'm reclaiming the system. Is there a way I can tell from userspace?

# cat /sys/firmware/acpi/tables/PPTT > pptt.dat
# iasl -d pptt.dat
# vim pptt.dsl

[...]

>> so no SMT sched domain. The MPIDR-based topology fallback code in
>> store_cpu_topology() forces `cpuid_topo->thread_id = -1`.
>
> Right. So since I'm getting SMT it must not have package_id == -1.
> In which case you should be able to reproduce it because it must
> be that the call the update_siblings_masks() is required. That
> appears to only be called from store_cpu_topology() which is
> after the scheduler has already setup the core pointers.
>
> The fix could be the same but I should reword the commit message
> since it should effect all SMT arm systems I'd think.
>
> Or maybe the ACPI topology code should call update_sibling_masks().
>>
>> IMHO this is why on my machine I don't see this issue while running:
>>
>> root@oss-apollo7007:~# stress-ng --prctl 256 -t 60
>> stress-ng: info: [2388042] dispatching hogs: 256 prctl
>>
>> Is there something I miss in my setup to provoke this issue?
>>
>
> Make sure you have a stress-ng that is new enough and built against
> headers that have the CORE_SCHED prctls defined.

Ah, I was using a pretty old version 0.11.07. Now I switched to 0.13.12
which includes:

  9038e442b92d - stress-prctl: add Linux 5.14 PR_SCHED_CORE prctl

To get SCHED_CORE activated in stress-prctl.c, as a quick hack, I had to
add the definitions of PR_SCHED_CORE, PR_SCHED_CORE_GET, etc. to this file.

Now the issue you described triggers on this machine immediately.
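The "quick hack" described above amounts to supplying the PR_SCHED_CORE constants when the installed headers predate Linux 5.14. The sketch below is not the stress-ng code; it is a minimal illustration in C, assuming the constant values from include/uapi/linux/prctl.h. The names DEMO_SCOPE_THREAD and demo_sched_core_create() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/prctl.h>

/* Fallback definitions for headers older than Linux 5.14. */
#ifndef PR_SCHED_CORE
#define PR_SCHED_CORE			62
#define PR_SCHED_CORE_GET		0
#define PR_SCHED_CORE_CREATE		1
#define PR_SCHED_CORE_SHARE_TO		2
#define PR_SCHED_CORE_SHARE_FROM	3
#endif

/* Hypothetical name: scope value 0 selects a single thread. */
#define DEMO_SCOPE_THREAD 0UL

/*
 * Hypothetical helper: create a core-scheduling cookie for the calling
 * thread (pid 0 means "current") and read it back into *cookie.
 * Returns 0 on success or a negative errno value, for example when the
 * kernel lacks CONFIG_SCHED_CORE or SMT is not active.
 */
static int demo_sched_core_create(uint64_t *cookie)
{
	assert(cookie != NULL);
	*cookie = 0;

	if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0UL,
		  DEMO_SCOPE_THREAD, 0UL) == -1)
		return -errno;

	if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, 0UL,
		  DEMO_SCOPE_THREAD, (unsigned long)cookie) == -1)
		return -errno;

	return 0;
}
```

Calling demo_sched_core_create() from many threads on an affected SMT arm64 kernel exercises the same core-scheduling paths as `stress-ng --prctl`; whether that alone reproduces the warning is an assumption, not something this thread confirms.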
On Wed, Mar 30, 2022 at 05:48:34PM +0200 Dietmar Eggemann wrote:
> On 29/03/2022 21:50, Phil Auld wrote:
> > On Tue, Mar 29, 2022 at 08:55:08PM +0200 Dietmar Eggemann wrote:
> >> On 29/03/2022 17:20, Phil Auld wrote:
> >>> On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
> >>>> On 22/03/2022 17:03, Phil Auld wrote:
>
> [...]
>
> >>> This instance is an HPE Apollo 70 set to smt-4. I believe it's ThunderX2
> >>> chips.
> >>>
> >>> ARM (CN9980-2200LG4077-Y21-G)
> >> I'm using the same processor just with ACPI/PPTT.
> >>
> >
> > Maybe I'm misinformed about these systems having no PPTT...
> >
> > I'm reclaiming the system. Is there a way I can tell from userspace?
>
> # cat /sys/firmware/acpi/tables/PPTT > pptt.dat
> # iasl -d pptt.dat
> # vim pptt.dsl
>

Thanks, I'll give that a try. I suspect these are the same as yours though
and I was just mistaken :)

> [...]
>
> >> so no SMT sched domain. The MPIDR-based topology fallback code in
> >> store_cpu_topology() forces `cpuid_topo->thread_id = -1`.
> >
> > Right. So since I'm getting SMT it must not have package_id == -1.
> > In which case you should be able to reproduce it because it must
> > be that the call the update_siblings_masks() is required. That
> > appears to only be called from store_cpu_topology() which is
> > after the scheduler has already setup the core pointers.
> >
> > The fix could be the same but I should reword the commit message
> > since it should effect all SMT arm systems I'd think.
> >
> > Or maybe the ACPI topology code should call update_sibling_masks().
> >>
> >> IMHO this is why on my machine I don't see this issue while running:
> >>
> >> root@oss-apollo7007:~# stress-ng --prctl 256 -t 60
> >> stress-ng: info: [2388042] dispatching hogs: 256 prctl
> >>
> >> Is there something I miss in my setup to provoke this issue?
> >>
> >
> > Make sure you have a stress-ng that is new enough and built against
> > headers that have the CORE_SCHED prctls defined.
>
> Ah, I was using a pretty old version 0.11.07. Now I switched to 0.13.12
> which includes:
>
>   9038e442b92d - stress-prctl: add Linux 5.14 PR_SCHED_CORE prctl
>
> To get SCHED_CORE activated in stress-prctl.c, as a quick hack, I had to
> add the definitions of PR_SCHED_CORE, PR_SCHED_CORE_GET, etc. to this file.
>
> Now the issue you described triggers on this machine immediately.
>

Great!

I'll repost the patch with a more accurate commit message then. And if
you come up with something different that works for me too. Let me know
and I'll test it here.

Cheers,
Phil

--
On Wed, Mar 30, 2022 at 05:48:34PM +0200 Dietmar Eggemann wrote:
> On 29/03/2022 21:50, Phil Auld wrote:
> > On Tue, Mar 29, 2022 at 08:55:08PM +0200 Dietmar Eggemann wrote:
> >> On 29/03/2022 17:20, Phil Auld wrote:
> >>> On Tue, Mar 29, 2022 at 04:02:22PM +0200 Dietmar Eggemann wrote:
> >>>> On 22/03/2022 17:03, Phil Auld wrote:
>
> [...]
>
> >>> This instance is an HPE Apollo 70 set to smt-4. I believe it's ThunderX2
> >>> chips.
> >>>
> >>> ARM (CN9980-2200LG4077-Y21-G)
> >> I'm using the same processor just with ACPI/PPTT.
> >>
> >
> > Maybe I'm misinformed about these systems having no PPTT...
> >
> > I'm reclaiming the system. Is there a way I can tell from userspace?
>
> # cat /sys/firmware/acpi/tables/PPTT > pptt.dat
> # iasl -d pptt.dat
> # vim pptt.dsl
>

I don't have iasl but

# strings pptt.dat
PPTT
ServerCL
CAVM

So that looks like it has a PPTT entry.

Cheers,
Phil

> [...]
>
> >> so no SMT sched domain. The MPIDR-based topology fallback code in
> >> store_cpu_topology() forces `cpuid_topo->thread_id = -1`.
> >
> > Right. So since I'm getting SMT it must not have package_id == -1.
> > In which case you should be able to reproduce it because it must
> > be that the call the update_siblings_masks() is required. That
> > appears to only be called from store_cpu_topology() which is
> > after the scheduler has already setup the core pointers.
> >
> > The fix could be the same but I should reword the commit message
> > since it should effect all SMT arm systems I'd think.
> >
> > Or maybe the ACPI topology code should call update_sibling_masks().
> >>
> >> IMHO this is why on my machine I don't see this issue while running:
> >>
> >> root@oss-apollo7007:~# stress-ng --prctl 256 -t 60
> >> stress-ng: info: [2388042] dispatching hogs: 256 prctl
> >>
> >> Is there something I miss in my setup to provoke this issue?
> >>
> >
> > Make sure you have a stress-ng that is new enough and built against
> > headers that have the CORE_SCHED prctls defined.
>
> Ah, I was using a pretty old version 0.11.07. Now I switched to 0.13.12
> which includes:
>
>   9038e442b92d - stress-prctl: add Linux 5.14 PR_SCHED_CORE prctl
>
> To get SCHED_CORE activated in stress-prctl.c, as a quick hack, I had to
> add the definitions of PR_SCHED_CORE, PR_SCHED_CORE_GET, etc. to this file.
>
> Now the issue you described triggers on this machine immediately.
>

--
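The `strings pptt.dat` check in the message above works because every ACPI table starts with a 4-byte signature. Where iasl is unavailable, the same test can be made slightly stricter in a few lines of C. This is only a sketch, assuming the standard 36-byte ACPI table header with a little-endian length field at offset 4; demo_looks_like_pptt() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the standard ACPI system description table header. */
#define DEMO_ACPI_HEADER_LEN 36u

/*
 * Hypothetical helper: return 1 if buf begins with a plausible PPTT
 * table header, 0 otherwise. It checks only the signature and the
 * length field, not the checksum or the table contents.
 */
static int demo_looks_like_pptt(const unsigned char *buf, size_t len)
{
	uint32_t table_len;

	assert(buf != NULL);
	if (len < DEMO_ACPI_HEADER_LEN)
		return 0;
	if (memcmp(buf, "PPTT", 4) != 0)
		return 0;

	/* Bytes 4..7 hold the total table length, little-endian. */
	table_len = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
		    ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);

	return table_len >= DEMO_ACPI_HEADER_LEN;
}
```

In practice the buffer would hold the first bytes of /sys/firmware/acpi/tables/PPTT, which typically needs root to read. A positive result says only that the firmware ships a PPTT, not that its topology data is correct.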