kernel/rcu/tree_nocb.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
In the preparation stage of CPU online, if the corresponding
the rdp's->nocb_cb_kthread does not exist, will be created,
there is a situation where the rdp's rcuop kthreads creation fails,
and then de-offload this CPU's rdp, does not assign this CPU's
rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and
rdp's->rdp_gp->nocb_gp_kthread is still valid.
This will cause the subsequent re-offload operation of this offline
CPU, which will pass the conditional check and the kthread_unpark()
will access invalid rdp's->nocb_cb_kthread pointer.
This commit therefore use rdp's->nocb_gp_kthread instead of
rdp_gp's->nocb_gp_kthread for safety check.
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
kernel/rcu/tree_nocb.h | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
index 1596812f7f12..6679140bb0b5 100644
--- a/kernel/rcu/tree_nocb.h
+++ b/kernel/rcu/tree_nocb.h
@@ -1146,7 +1146,6 @@ static bool rcu_nocb_rdp_offload_wait_cond(struct rcu_data *rdp)
static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
{
int wake_gp;
- struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
WARN_ON_ONCE(cpu_online(rdp->cpu));
/*
@@ -1156,7 +1155,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
if (!rdp->nocb_gp_rdp)
return -EINVAL;
- if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
+ if (WARN_ON_ONCE(!rdp->nocb_gp_kthread))
return -EINVAL;
pr_info("Offloading %d\n", rdp->cpu);
@@ -1166,7 +1165,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
if (wake_gp)
- wake_up_process(rdp_gp->nocb_gp_kthread);
+ wake_up_process(rdp->nocb_gp_kthread);
swait_event_exclusive(rdp->nocb_state_wq,
rcu_nocb_rdp_offload_wait_cond(rdp));
--
2.17.1
Le Wed, May 07, 2025 at 07:26:05PM +0800, Zqiang a écrit : > In the preparation stage of CPU online, if the corresponding > the rdp's->nocb_cb_kthread does not exist, will be created, > there is a situation where the rdp's rcuop kthreads creation fails, > and then de-offload this CPU's rdp, does not assign this CPU's > rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and > rdp's->rdp_gp->nocb_gp_kthread is still valid. > > This will cause the subsequent re-offload operation of this offline > CPU, which will pass the conditional check and the kthread_unpark() > will access invalid rdp's->nocb_cb_kthread pointer. > > This commit therefore use rdp's->nocb_gp_kthread instead of > rdp_gp's->nocb_gp_kthread for safety check. > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> -- Frederic Weisbecker SUSE Labs
On 5/7/2025 7:26 AM, Zqiang wrote:
> In the preparation stage of CPU online, if the corresponding
> the rdp's->nocb_cb_kthread does not exist, will be created,
> there is a situation where the rdp's rcuop kthreads creation fails,
> and then de-offload this CPU's rdp, does not assign this CPU's
> rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and
> rdp's->rdp_gp->nocb_gp_kthread is still valid.
>
> This will cause the subsequent re-offload operation of this offline
> CPU, which will pass the conditional check and the kthread_unpark()
> will access invalid rdp's->nocb_cb_kthread pointer.
>
> This commit therefore use rdp's->nocb_gp_kthread instead of
> rdp_gp's->nocb_gp_kthread for safety check.
>
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
Is it possible that rdp_gp->nocb_gp_kthread is valid, but rdp->nocb_cb_kthread
is still invalid (because its creation failed?). This seems a bit fragile to me
to assume that if rdp_gp->nocb_gp_kthread then rdp->nocb_cb_kthread is valid. Or
is there a path that makes sure this invariant is always satisfied? If so, can
we add additional warnings for checking this invariant?
Also from the other thread, it sounds like there is more work to do here
(related patches so I'd like to defer this to 6.17 - feel free to keep posting
patches for this work though).
Thanks!
- Joel
> ---
> kernel/rcu/tree_nocb.h | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 1596812f7f12..6679140bb0b5 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1146,7 +1146,6 @@ static bool rcu_nocb_rdp_offload_wait_cond(struct rcu_data *rdp)
> static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> {
> int wake_gp;
> - struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
>
> WARN_ON_ONCE(cpu_online(rdp->cpu));
> /*
> @@ -1156,7 +1155,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> if (!rdp->nocb_gp_rdp)
> return -EINVAL;
>
> - if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
> + if (WARN_ON_ONCE(!rdp->nocb_gp_kthread))
> return -EINVAL;
>
> pr_info("Offloading %d\n", rdp->cpu);
> @@ -1166,7 +1165,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
>
> wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
> if (wake_gp)
> - wake_up_process(rdp_gp->nocb_gp_kthread);
> + wake_up_process(rdp->nocb_gp_kthread);
>
> swait_event_exclusive(rdp->nocb_state_wq,
> rcu_nocb_rdp_offload_wait_cond(rdp));
On 5/9/2025 3:07 PM, Joel Fernandes wrote:
>
>
> On 5/7/2025 7:26 AM, Zqiang wrote:
>> In the preparation stage of CPU online, if the corresponding
>> the rdp's->nocb_cb_kthread does not exist, will be created,
>> there is a situation where the rdp's rcuop kthreads creation fails,
>> and then de-offload this CPU's rdp, does not assign this CPU's
>> rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and
>> rdp's->rdp_gp->nocb_gp_kthread is still valid.
Maybe I mixed up what you're doing but this commit message is a bit hard to
parse. You're saying in the problematic scenario, "rdp's->nocb_gp_rdp" is valid,
but in the patch below you're checking for "rdp's->nocb_gp_rdp". So I am a bit
confused, why you would not run into the same problematic scenario. I think I
agree with your patch but it would be good to refine and clarify the problematic
condition, the commit message, and also please add some comments to the code :).
Do you have a reproducer for this? If not, maybe we can work on some test cases
Both CPU hotplug and offload is tested by rcutorture, so I'd expect we run into
it. But perhaps not, because it requires kthread creation to fail?
thanks,
- Joel
>>
>> This will cause the subsequent re-offload operation of this offline
>> CPU, which will pass the conditional check and the kthread_unpark()
>> will access invalid rdp's->nocb_cb_kthread pointer.
>>
>> This commit therefore use rdp's->nocb_gp_kthread instead of
>> rdp_gp's->nocb_gp_kthread for safety check.
>>
>> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
>
> Is it possible that rdp_gp->nocb_gp_kthread is valid, but rdp->nocb_cb_kthread
> is still invalid (because its creation failed?). This seems a bit fragile to me
> to assume that if rdp_gp->nocb_gp_kthread then rdp->nocb_cb_kthread is valid. Or
> is there a path that makes sure this invariant is always satisfied? If so, can
> we add additional warnings for checking this invariant?
>
> Also from the other thread, it sounds like there is more work to do here
> (related patches so I'd like to defer this to 6.17 - feel free to keep posting
> patches for this work though).
>
> Thanks!
>
> - Joel
>
>> ---
>> kernel/rcu/tree_nocb.h | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
>> index 1596812f7f12..6679140bb0b5 100644
>> --- a/kernel/rcu/tree_nocb.h
>> +++ b/kernel/rcu/tree_nocb.h
>> @@ -1146,7 +1146,6 @@ static bool rcu_nocb_rdp_offload_wait_cond(struct rcu_data *rdp)
>> static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
>> {
>> int wake_gp;
>> - struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
>>
>> WARN_ON_ONCE(cpu_online(rdp->cpu));
>> /*
>> @@ -1156,7 +1155,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
>> if (!rdp->nocb_gp_rdp)
>> return -EINVAL;
>>
>> - if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
>> + if (WARN_ON_ONCE(!rdp->nocb_gp_kthread))
>> return -EINVAL;
>>
>> pr_info("Offloading %d\n", rdp->cpu);
>> @@ -1166,7 +1165,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
>>
>> wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
>> if (wake_gp)
>> - wake_up_process(rdp_gp->nocb_gp_kthread);
>> + wake_up_process(rdp->nocb_gp_kthread);
>>
>> swait_event_exclusive(rdp->nocb_state_wq,
>> rcu_nocb_rdp_offload_wait_cond(rdp));
>
>
> On 5/9/2025 3:07 PM, Joel Fernandes wrote:
> >
> >
> > On 5/7/2025 7:26 AM, Zqiang wrote:
> >> In the preparation stage of CPU online, if the corresponding
> >> the rdp's->nocb_cb_kthread does not exist, will be created,
> >> there is a situation where the rdp's rcuop kthreads creation fails,
> >> and then de-offload this CPU's rdp, does not assign this CPU's
> >> rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and
> >> rdp's->rdp_gp->nocb_gp_kthread is still valid.
>
> Maybe I mixed up what you're doing but this commit message is a bit hard to
> parse. You're saying in the problematic scenario, "rdp's->nocb_gp_rdp" is valid,
> but in the patch below you're checking for "rdp's->nocb_gp_rdp". So I am a bit
> confused, why you would not run into the same problematic scenario. I think I
> agree with your patch but it would be good to refine and clarify the problematic
> condition, the commit message, and also please add some comments to the code :).
Hello, Joel
Maybe my description is not clear enough.
The scenario I want to describe is:
This is a small probability event,the rdp->nocb_cb_kthread creation
may fail in rcu_spawn_cpu_nocb_kthread(), this rdp->nocb_cb_kthread
will not set,is null pointer. but this rdp->rdp_gp->nocb_gp_kthread
and rdp->nocb_gp_rdp pointer is valid.
but in rcu_nocb_rdp_offload(), we only check rdp->nocb_gp_rdp and
rdp->rdp_gp->nocb_gp_kthread, this will cause kthread_unpark() access
null rdp->nocb_cb_kthread pointer.
so I use rdp->nocb_gp_kthread condition to make a judgment,
if the rdp->nocb_gp_kthread is valid, this means that rdp->nocb_cb_kthread
is also valid and there is no kthreads creation failure in
rcu_spawn_cpu_nocb_kthread()
Thanks
Zqiang
>
> Do you have a reproducer for this? If not, maybe we can work on some test cases
> Both CPU hotplug and offload is tested by rcutorture, so I'd expect we run into
> it. But perhaps not, because it requires kthread creation to fail?
>
> thanks,
>
> - Joel
>
>
> >>
> >> This will cause the subsequent re-offload operation of this offline
> >> CPU, which will pass the conditional check and the kthread_unpark()
> >> will access invalid rdp's->nocb_cb_kthread pointer.
> >>
> >> This commit therefore use rdp's->nocb_gp_kthread instead of
> >> rdp_gp's->nocb_gp_kthread for safety check.
> >>
> >> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> >
> > Is it possible that rdp_gp->nocb_gp_kthread is valid, but rdp->nocb_cb_kthread
> > is still invalid (because its creation failed?). This seems a bit fragile to me
> > to assume that if rdp_gp->nocb_gp_kthread then rdp->nocb_cb_kthread is valid. Or
> > is there a path that makes sure this invariant is always satisfied? If so, can
> > we add additional warnings for checking this invariant?
> >
> > Also from the other thread, it sounds like there is more work to do here
> > (related patches so I'd like to defer this to 6.17 - feel free to keep posting
> > patches for this work though).
> >
> > Thanks!
> >
> > - Joel
> >
> >> ---
> >> kernel/rcu/tree_nocb.h | 5 ++---
> >> 1 file changed, 2 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> >> index 1596812f7f12..6679140bb0b5 100644
> >> --- a/kernel/rcu/tree_nocb.h
> >> +++ b/kernel/rcu/tree_nocb.h
> >> @@ -1146,7 +1146,6 @@ static bool rcu_nocb_rdp_offload_wait_cond(struct rcu_data *rdp)
> >> static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> >> {
> >> int wake_gp;
> >> - struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
> >>
> >> WARN_ON_ONCE(cpu_online(rdp->cpu));
> >> /*
> >> @@ -1156,7 +1155,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> >> if (!rdp->nocb_gp_rdp)
> >> return -EINVAL;
> >>
> >> - if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
> >> + if (WARN_ON_ONCE(!rdp->nocb_gp_kthread))
> >> return -EINVAL;
> >>
> >> pr_info("Offloading %d\n", rdp->cpu);
> >> @@ -1166,7 +1165,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> >>
> >> wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
> >> if (wake_gp)
> >> - wake_up_process(rdp_gp->nocb_gp_kthread);
> >> + wake_up_process(rdp->nocb_gp_kthread);
> >>
> >> swait_event_exclusive(rdp->nocb_state_wq,
> >> rcu_nocb_rdp_offload_wait_cond(rdp));
> >
>
On Wed, May 07, 2025 at 07:26:05PM +0800, Zqiang wrote:
> In the preparation stage of CPU online, if the corresponding
> the rdp's->nocb_cb_kthread does not exist, will be created,
> there is a situation where the rdp's rcuop kthreads creation fails,
> and then de-offload this CPU's rdp, does not assign this CPU's
> rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and
> rdp's->rdp_gp->nocb_gp_kthread is still valid.
>
> This will cause the subsequent re-offload operation of this offline
> CPU, which will pass the conditional check and the kthread_unpark()
> will access invalid rdp's->nocb_cb_kthread pointer.
>
> This commit therefore use rdp's->nocb_gp_kthread instead of
> rdp_gp's->nocb_gp_kthread for safety check.
Let's see...
The rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() functions invoke
cpus_read_lock(), and thus exclude all the CPU-hotplug notifiers,
including the one that invokes rcutree_prepare_cpu(). There is also
rcu_spawn_gp_kthread(), but that is an early_initcall() that happens
before CPU hotplug can happen, at least for non-boot CPUs.
So rcu_spawn_cpu_nocb_kthread() cannot run concurrently with either
rcu_nocb_cpu_offload() or rcu_nocb_cpu_deoffload(), correct?
It appears that all CPUs (try to) create their rcuoc and rcuog kthreads
when they come online, regardless of the nohz_full and rcu_nocbs kernel
boot parameters, some old tree_nocb.h comments notwithstanding. ;-) The
rcu_organize_nocb_kthreads() function looks to cover all CPUs as well,
so ->nocb_gp_rdp will always be set after very early boot (give or take
alloc_bootmem_cpumask_var() failure in rcu_nocb_setup() and checked for
by the cpumask_available() in rcu_organize_nocb_kthreads()).
The rcu_spawn_cpu_nocb_kthread() can fail to spawn the GP kthread,
in which case both ->nocb_cb_kthread and ->nocb_gp_kthread remain
NULL.
If so, LGTM.
Thanx, Paul
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> ---
> kernel/rcu/tree_nocb.h | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 1596812f7f12..6679140bb0b5 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -1146,7 +1146,6 @@ static bool rcu_nocb_rdp_offload_wait_cond(struct rcu_data *rdp)
> static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> {
> int wake_gp;
> - struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
>
> WARN_ON_ONCE(cpu_online(rdp->cpu));
> /*
> @@ -1156,7 +1155,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> if (!rdp->nocb_gp_rdp)
> return -EINVAL;
>
> - if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
> + if (WARN_ON_ONCE(!rdp->nocb_gp_kthread))
> return -EINVAL;
>
> pr_info("Offloading %d\n", rdp->cpu);
> @@ -1166,7 +1165,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
>
> wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
> if (wake_gp)
> - wake_up_process(rdp_gp->nocb_gp_kthread);
> + wake_up_process(rdp->nocb_gp_kthread);
>
> swait_event_exclusive(rdp->nocb_state_wq,
> rcu_nocb_rdp_offload_wait_cond(rdp));
> --
> 2.17.1
>
>
> On Wed, May 07, 2025 at 07:26:05PM +0800, Zqiang wrote:
> > In the preparation stage of CPU online, if the corresponding
> > the rdp's->nocb_cb_kthread does not exist, will be created,
> > there is a situation where the rdp's rcuop kthreads creation fails,
> > and then de-offload this CPU's rdp, does not assign this CPU's
> > rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and
> > rdp's->rdp_gp->nocb_gp_kthread is still valid.
> >
> > This will cause the subsequent re-offload operation of this offline
> > CPU, which will pass the conditional check and the kthread_unpark()
> > will access invalid rdp's->nocb_cb_kthread pointer.
> >
> > This commit therefore use rdp's->nocb_gp_kthread instead of
> > rdp_gp's->nocb_gp_kthread for safety check.
>
> Let's see...
>
> The rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() functions invoke
> cpus_read_lock(), and thus exclude all the CPU-hotplug notifiers,
> including the one that invokes rcutree_prepare_cpu(). There is also
> rcu_spawn_gp_kthread(), but that is an early_initcall() that happens
> before CPU hotplug can happen, at least for non-boot CPUs.
>
> So rcu_spawn_cpu_nocb_kthread() cannot run concurrently with either
> rcu_nocb_cpu_offload() or rcu_nocb_cpu_deoffload(), correct?
Yes, the rcutree_prepare_cpu() is invoked under the cpus_write_lock()
protection.
>
> It appears that all CPUs (try to) create their rcuoc and rcuog kthreads
> when they come online, regardless of the nohz_full and rcu_nocbs kernel
> boot parameters, some old tree_nocb.h comments notwithstanding. ;-) The
> rcu_organize_nocb_kthreads() function looks to cover all CPUs as well,
> so ->nocb_gp_rdp will always be set after very early boot (give or take
> alloc_bootmem_cpumask_var() failure in rcu_nocb_setup() and checked for
> by the cpumask_available() in rcu_organize_nocb_kthreads()).
>
> The rcu_spawn_cpu_nocb_kthread() can fail to spawn the GP kthread,
> in which case both ->nocb_cb_kthread and ->nocb_gp_kthread remain
> NULL.
This is a low probability event, but it is possible, if this happens,
and we test it with rcutorture configured with parameters
nocbs_toggle and onoff_interval, it will trigger a null ptr access.
Thanks
Zqiang
>
> If so, LGTM.
>
> Thanx, Paul
>
> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > ---
> > kernel/rcu/tree_nocb.h | 5 ++---
> > 1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > index 1596812f7f12..6679140bb0b5 100644
> > --- a/kernel/rcu/tree_nocb.h
> > +++ b/kernel/rcu/tree_nocb.h
> > @@ -1146,7 +1146,6 @@ static bool rcu_nocb_rdp_offload_wait_cond(struct rcu_data *rdp)
> > static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> > {
> > int wake_gp;
> > - struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
> >
> > WARN_ON_ONCE(cpu_online(rdp->cpu));
> > /*
> > @@ -1156,7 +1155,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> > if (!rdp->nocb_gp_rdp)
> > return -EINVAL;
> >
> > - if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
> > + if (WARN_ON_ONCE(!rdp->nocb_gp_kthread))
> > return -EINVAL;
> >
> > pr_info("Offloading %d\n", rdp->cpu);
> > @@ -1166,7 +1165,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> >
> > wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
> > if (wake_gp)
> > - wake_up_process(rdp_gp->nocb_gp_kthread);
> > + wake_up_process(rdp->nocb_gp_kthread);
> >
> > swait_event_exclusive(rdp->nocb_state_wq,
> > rcu_nocb_rdp_offload_wait_cond(rdp));
> > --
> > 2.17.1
> >
On Fri, May 09, 2025 at 11:32:13AM +0800, Z qiang wrote:
> >
> > On Wed, May 07, 2025 at 07:26:05PM +0800, Zqiang wrote:
> > > In the preparation stage of CPU online, if the corresponding
> > > the rdp's->nocb_cb_kthread does not exist, will be created,
> > > there is a situation where the rdp's rcuop kthreads creation fails,
> > > and then de-offload this CPU's rdp, does not assign this CPU's
> > > rdp->nocb_cb_kthread pointer, but this rdp's->nocb_gp_rdp and
> > > rdp's->rdp_gp->nocb_gp_kthread is still valid.
> > >
> > > This will cause the subsequent re-offload operation of this offline
> > > CPU, which will pass the conditional check and the kthread_unpark()
> > > will access invalid rdp's->nocb_cb_kthread pointer.
> > >
> > > This commit therefore use rdp's->nocb_gp_kthread instead of
> > > rdp_gp's->nocb_gp_kthread for safety check.
> >
> > Let's see...
> >
> > The rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() functions invoke
> > cpus_read_lock(), and thus exclude all the CPU-hotplug notifiers,
> > including the one that invokes rcutree_prepare_cpu(). There is also
> > rcu_spawn_gp_kthread(), but that is an early_initcall() that happens
> > before CPU hotplug can happen, at least for non-boot CPUs.
> >
> > So rcu_spawn_cpu_nocb_kthread() cannot run concurrently with either
> > rcu_nocb_cpu_offload() or rcu_nocb_cpu_deoffload(), correct?
>
> Yes, the rcutree_prepare_cpu() is invoked under the cpus_write_lock()
> protection.
Very good!
> > It appears that all CPUs (try to) create their rcuoc and rcuog kthreads
> > when they come online, regardless of the nohz_full and rcu_nocbs kernel
> > boot parameters, some old tree_nocb.h comments notwithstanding. ;-) The
> > rcu_organize_nocb_kthreads() function looks to cover all CPUs as well,
> > so ->nocb_gp_rdp will always be set after very early boot (give or take
> > alloc_bootmem_cpumask_var() failure in rcu_nocb_setup() and checked for
> > by the cpumask_available() in rcu_organize_nocb_kthreads()).
> >
> > The rcu_spawn_cpu_nocb_kthread() can fail to spawn the GP kthread,
> > in which case both ->nocb_cb_kthread and ->nocb_gp_kthread remain
> > NULL.
>
> This is a low probability event, but it is possible, if this happens,
> and we test it with rcutorture configured with parameters
> nocbs_toggle and onoff_interval, it will trigger a null ptr access.
Understood, and that might be another patch if you would like to
persue it.
Thanx, Paul
> Thanks
> Zqiang
>
>
> >
> > If so, LGTM.
> >
> > Thanx, Paul
> >
> > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > > ---
> > > kernel/rcu/tree_nocb.h | 5 ++---
> > > 1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > index 1596812f7f12..6679140bb0b5 100644
> > > --- a/kernel/rcu/tree_nocb.h
> > > +++ b/kernel/rcu/tree_nocb.h
> > > @@ -1146,7 +1146,6 @@ static bool rcu_nocb_rdp_offload_wait_cond(struct rcu_data *rdp)
> > > static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> > > {
> > > int wake_gp;
> > > - struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
> > >
> > > WARN_ON_ONCE(cpu_online(rdp->cpu));
> > > /*
> > > @@ -1156,7 +1155,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> > > if (!rdp->nocb_gp_rdp)
> > > return -EINVAL;
> > >
> > > - if (WARN_ON_ONCE(!rdp_gp->nocb_gp_kthread))
> > > + if (WARN_ON_ONCE(!rdp->nocb_gp_kthread))
> > > return -EINVAL;
> > >
> > > pr_info("Offloading %d\n", rdp->cpu);
> > > @@ -1166,7 +1165,7 @@ static int rcu_nocb_rdp_offload(struct rcu_data *rdp)
> > >
> > > wake_gp = rcu_nocb_queue_toggle_rdp(rdp);
> > > if (wake_gp)
> > > - wake_up_process(rdp_gp->nocb_gp_kthread);
> > > + wake_up_process(rdp->nocb_gp_kthread);
> > >
> > > swait_event_exclusive(rdp->nocb_state_wq,
> > > rcu_nocb_rdp_offload_wait_cond(rdp));
> > > --
> > > 2.17.1
> > >
© 2016 - 2025 Red Hat, Inc.