cpumask: Ensure the visibility of set_nr_cpu_ids

[PATCH] cpumask: Ensure the visibility of set_nr_cpu_ids

Posted by I Hsin Cheng 1 year, 5 months ago

The variable "nr_cpu_ids" is a system-wide variable which should be seen
as consistent all the time. For example it's set in one of the kernel
setup procedure "prefill_possible_map", the operations here should
happens before all the code after setup, which means the operations here
should be visible to all the code after setup.

set_cpu_possible() ensure it's visibility because it eventually falls
into an atomic instruction, however the function "set_nr_cpu_ids()"
fails to make the guarantee since it only performs a normal write
operations.

Adding the macro "WRITE_ONCE()" will prevent the compiler from re-order
the instruction of the write operation for "nr_cpu_ids", so we can
guarantee the operation is visible to all the codes coming after it.

Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
---
 include/linux/cpumask.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index f10fb87d4..3731f5e43 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -46,7 +46,7 @@ static inline void set_nr_cpu_ids(unsigned int nr)
 #if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
 	WARN_ON(nr != nr_cpu_ids);
 #else
-	nr_cpu_ids = nr;
+	WRITE_ONCE(nr_cpu_ids, nr);
 #endif
 }
 
-- 
2.34.1

Re: [PATCH] cpumask: Ensure the visibility of set_nr_cpu_ids

Posted by Yury Norov 1 year, 5 months ago

On Sun, Aug 11, 2024 at 05:25:01PM +0800, I Hsin Cheng wrote:
> The variable "nr_cpu_ids" is a system-wide variable which should be seen
> as consistent all the time. For example it's set in one of the kernel
> setup procedure "prefill_possible_map", the operations here should
> happens before all the code after setup, which means the operations here
> should be visible to all the code after setup.
> 
> set_cpu_possible() ensure it's visibility because it eventually falls
> into an atomic instruction, however the function "set_nr_cpu_ids()"
> fails to make the guarantee since it only performs a normal write
> operations.

Set_cpu_possible() is a completely different thing.

> Adding the macro "WRITE_ONCE()" will prevent the compiler from re-order
> the instruction of the write operation for "nr_cpu_ids", so we can
> guarantee the operation is visible to all the codes coming after it.
> 
> Signed-off-by: I Hsin Cheng <richard120310@gmail.com>

I don't understand this. nr_cpu_ids is initialized at compile time
to NR_CPUS, to represent maximum number of bits in cpumasks.

Later on runtime we update nr_cpu_ids with an actual number of possible
CPUs in the system. The type of the variable is unsigned int, and it
means that threads accessing it will either fetch NR_CPUS, or new value
coherently.

Having nr_cpu_ids == NR_CPUS is not an error, it's just a non-optimal
value. The only effect of it is that kernel algorithms traverse unused
part of cpumasks for the first few microseconds after boot.

Can you explain in details what type of race you're trying to fix?
Which architecture? What is the race scenario?

> ---
>  include/linux/cpumask.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index f10fb87d4..3731f5e43 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -46,7 +46,7 @@ static inline void set_nr_cpu_ids(unsigned int nr)
>  #if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
>  	WARN_ON(nr != nr_cpu_ids);
>  #else
> -	nr_cpu_ids = nr;
> +	WRITE_ONCE(nr_cpu_ids, nr);

WRITE_ONCE()? How is that supposed to work? The only possible effect
would be reordering of a couple of instructions. How would that help
threads running on other CPUs synchronize any better?

Regardless, WRITE_ONCE() should always be paired with READ_ONCE() to
make it working. So, if we take this, we should also make every read of
nr_cpu_ids by using READ_ONCE(). nr_cpu_ids is used in fast paths in
many places, particularly as loop termination condition. Things like
this:

        while (cpu < READ_ONCE(nr_cpu_ids))
                do_something_very_quick();

would definitely hit performance.

Thanks,
Yury

>  #endif
>  }
>  
> -- 
> 2.34.1

Re: [PATCH] cpumask: Ensure the visibility of set_nr_cpu_ids

Posted by I Hsin Cheng 1 year, 5 months ago

On Sun, Aug 11, 2024 at 08:18:01AM -0700, Yury Norov wrote:
> On Sun, Aug 11, 2024 at 05:25:01PM +0800, I Hsin Cheng wrote:
> > The variable "nr_cpu_ids" is a system-wide variable which should be seen
> > as consistent all the time. For example it's set in one of the kernel
> > setup procedure "prefill_possible_map", the operations here should
> > happens before all the code after setup, which means the operations here
> > should be visible to all the code after setup.
> > 
> > set_cpu_possible() ensure it's visibility because it eventually falls
> > into an atomic instruction, however the function "set_nr_cpu_ids()"
> > fails to make the guarantee since it only performs a normal write
> > operations.
> 
> Set_cpu_possible() is a completely different thing.
>  
> > Adding the macro "WRITE_ONCE()" will prevent the compiler from re-order
> > the instruction of the write operation for "nr_cpu_ids", so we can
> > guarantee the operation is visible to all the codes coming after it.
> > 
> > Signed-off-by: I Hsin Cheng <richard120310@gmail.com>
> 
> I don't understand this. nr_cpu_ids is initialized at compile time
> to NR_CPUS, to represent maximum number of bits in cpumasks.
> 
> Later on runtime we update nr_cpu_ids with an actual number of possible
> CPUs in the system. The type of the variable is unsigned int, and it
> means that threads accessing it will either fetch NR_CPUS, or new value
> coherently.
> 
> Having nr_cpu_ids == NR_CPUS is not an error, it's just a non-optimal
> value. The only effect of it is that kernel algorithms traverse unused
> part of cpumasks for the first few microseconds after boot.
> 
> Can you explain in details what type of race you're trying to fix?
> Which architecture? What is the race scenario?
> 
> > ---
> >  include/linux/cpumask.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> > index f10fb87d4..3731f5e43 100644
> > --- a/include/linux/cpumask.h
> > +++ b/include/linux/cpumask.h
> > @@ -46,7 +46,7 @@ static inline void set_nr_cpu_ids(unsigned int nr)
> >  #if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
> >  	WARN_ON(nr != nr_cpu_ids);
> >  #else
> > -	nr_cpu_ids = nr;
> > +	WRITE_ONCE(nr_cpu_ids, nr);
> 
> 
> WRITE_ONCE()? How is that supposed to work? The only possible effect
> would be reordering of a couple of instructions. How would that help
> threads running on other CPUs synchronize any better?
> 
> Regardless, WRITE_ONCE() should always be paired with READ_ONCE() to
> make it working. So, if we take this, we should also make every read of
> nr_cpu_ids by using READ_ONCE(). nr_cpu_ids is used in fast paths in
> many places, particularly as loop termination condition. Things like
> this:
> 
>         while (cpu < READ_ONCE(nr_cpu_ids))
>                 do_something_very_quick();
> 
> would definitely hit performance.
> 
> Thanks,
> Yury
> 
> >  #endif
> >  }
> >  
> > -- 
> > 2.34.1

Thanks for the reply and your detailed explanation, I wasn't trying to
fix any race conditions, I was thinking the same as you mentioned that
some of the cpumask's operation would traverse more times than it
actually needs to. I took set_cpu_possible() as an example trying to
express I think it would be nice for nr_cpu_ids to make synchronized
guarantees across CPUs, as WRITE_ONCE() would also provide write memory
barrier.

> Regardless, WRITE_ONCE() should always be paired with READ_ONCE() to
> make it working. So, if we take this, we should also make every read of
> nr_cpu_ids by using READ_ONCE(). nr_cpu_ids is used in fast paths in
> many places, particularly as loop termination condition. Things like
> this:
>
>         while (cpu < READ_ONCE(nr_cpu_ids))
>                 do_something_very_quick();
>
> would definitely hit performance.

Indeed if WRITCE_ONCE() is going to be applied here, every read of
nr_cpu_ids will have to use READ_ONCE() and it'll harm the fast path's
performance. I didn't considered this part, thanks for pointing it out.
Comparing to the number of redundant traversal, I think the performance
hit introduced by READ_ONCE() in fast path will be larger, so we should
stick to the same way. I learned alot and should reconsidered many parts
of my former knowledge about cpumask from your reply.
Thanks again for your detail explanation and kindly reply !

Best regards,
I-Hsin Cheng