[patch V3 03/20] sched/mmcid: Cacheline align MM CID storage
Posted by Thomas Gleixner 3 months, 1 week ago
Both the per CPU storage and the data in mm_struct are heavily used
during context switch. As they can end up next to other frequently
modified data, they are subject to false sharing.

Make them cache line aligned.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/rseq_types.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/rseq_types.h
+++ b/include/linux/rseq_types.h
@@ -112,7 +112,7 @@ struct sched_mm_cid {
  */
 struct mm_cid_pcpu {
 	unsigned int	cid;
-};
+} ____cacheline_aligned_in_smp;
 
 /**
  * struct mm_mm_cid - Storage for per MM CID data
@@ -126,7 +126,7 @@ struct mm_mm_cid {
 	struct mm_cid_pcpu	__percpu *pcpu;
 	unsigned int		nr_cpus_allowed;
 	raw_spinlock_t		lock;
-};
+} ____cacheline_aligned_in_smp;
 #else /* CONFIG_SCHED_MM_CID */
 struct mm_mm_cid { };
 struct sched_mm_cid { };
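
For illustration, a minimal userspace sketch of the false sharing effect
the changelog describes (not kernel code; names and iteration counts are
made up). Two threads each hammer their own counter; when both counters
share a cache line, every store invalidates the other CPU's copy of that
line. Build with "gcc -O2 -pthread" and time each run with time(1):

#include <pthread.h>
#include <stdio.h>

#define ITERS		100000000UL
#define CACHELINE	64

/* Both counters packed into one cache line: false sharing */
struct counters_packed {
	unsigned long	a;
	unsigned long	b;
};

/* Each counter on its own cache line: no false sharing */
struct counters_aligned {
	unsigned long	a __attribute__((aligned(CACHELINE)));
	unsigned long	b __attribute__((aligned(CACHELINE)));
};

static struct counters_packed packed;
static struct counters_aligned aligned;

static void *bump(void *arg)
{
	volatile unsigned long *ctr = arg;
	unsigned long i;

	for (i = 0; i < ITERS; i++)
		(*ctr)++;
	return NULL;
}

static void run(const char *name, unsigned long *a, unsigned long *b)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, bump, a);
	pthread_create(&t2, NULL, bump, b);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("%s done\n", name);
}

int main(void)
{
	/* The packed run is typically several times slower due to
	 * cache line ping-pong between the two CPUs. */
	run("packed ", &packed.a, &packed.b);
	run("aligned", &aligned.a, &aligned.b);
	return 0;
}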
Re: [patch V3 03/20] sched/mmcid: Cacheline align MM CID storage
Posted by Mathieu Desnoyers 3 months, 1 week ago
On 2025-10-29 09:08, Thomas Gleixner wrote:
[...]
>   struct mm_cid_pcpu {
>   	unsigned int	cid;
> -};
> +} ____cacheline_aligned_in_smp;

What's the point in cacheline aligning this per-CPU variable?
Should we expect other per-CPU variables sharing the same cache line
to be updated frequently from remote CPUs?

I did not cacheline align it, expecting that per-CPU variables are
typically updated from their respective CPUs. So perhaps reality
doesn't match my expectations, but that's news to me.

> @@ -126,7 +126,7 @@ struct mm_mm_cid {
[...]
> -};
> +} ____cacheline_aligned_in_smp;

OK for this cacheline alignment.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
Re: [patch V3 03/20] sched/mmcid: Cacheline align MM CID storage
Posted by Thomas Gleixner 3 months, 1 week ago
On Wed, Oct 29 2025 at 11:39, Mathieu Desnoyers wrote:
> On 2025-10-29 09:08, Thomas Gleixner wrote:
> [...]
>>   struct mm_cid_pcpu {
>>   	unsigned int	cid;
>> -};
>> +} ____cacheline_aligned_in_smp;
>
> What's the point in cacheline aligning this per-CPU variable?
> Should we expect other per-CPU variables sharing the same cache line
> to be updated frequently from remote CPUs?
>
> I did not cacheline align it, expecting that per-CPU variables are
> typically updated from their respective CPUs. So perhaps reality
> doesn't match my expectations, but that's news to me.

It depends. While per CPU variables are typically updated only on the
local CPU, there are situations with cross CPU access, and then it
really depends on what ends up in close proximity. I made it that way
because these accesses showed up prominently in perf top, which means
there is contention on the cache line.

Thanks,

        tglx
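
To make the cross CPU access pattern concrete, here is a hypothetical
helper (not from this series; the function name is made up, the pcpu
field is from the patch above) reading another CPU's CID slot:

/*
 * A remote CPU inspecting another CPU's slot via per_cpu_ptr() pulls
 * the cache line out of the owning CPU's cache. If unrelated hot
 * per-CPU data shares that line, both sides pay for false sharing;
 * ____cacheline_aligned_in_smp gives the slot its own line so only
 * genuine CID traffic contends on it.
 */
static unsigned int mm_cid_peek_remote(struct mm_mm_cid *mc, int cpu)
{
	struct mm_cid_pcpu *pcp = per_cpu_ptr(mc->pcpu, cpu);

	return READ_ONCE(pcp->cid);
}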
Re: [patch V3 03/20] sched/mmcid: Cacheline align MM CID storage
Posted by Mathieu Desnoyers 3 months, 1 week ago
On 2025-10-29 17:09, Thomas Gleixner wrote:
> On Wed, Oct 29 2025 at 11:39, Mathieu Desnoyers wrote:
>> On 2025-10-29 09:08, Thomas Gleixner wrote:
>> [...]
>>>    struct mm_cid_pcpu {
>>>    	unsigned int	cid;
>>> -};
>>> +} ____cacheline_aligned_in_smp;
>>
>> What's the point in cacheline aligning this per-CPU variable?
>> Should we expect other per-CPU variables sharing the same cache line
>> to be updated frequently from remote CPUs?
>>
>> I did not cacheline align it, expecting that per-CPU variables are
>> typically updated from their respective CPUs. So perhaps reality
>> doesn't match my expectations, but that's news to me.
> 
> It depends. While per CPU variables are typically updated only on the
> local CPU, there are situations with cross CPU access, and then it
> really depends on what ends up in close proximity. I made it that way
> because these accesses showed up prominently in perf top, which means
> there is contention on the cache line.

I did notice false sharing in the past within the mm_struct, between
the mm_count field and the mm_cid percpu _pointer load_:

commit c1753fd02a00 ("mm: move mm_count into its own cache line")

Before understanding that it was actually the pointer load that was
false sharing with mm_count, I initially thought that the per-cpu
memory somehow had false sharing, because I was reading the perf
profiles incorrectly.

I just want to make sure that what you have identified in the perf
profiles is indeed false sharing of the per-cpu memory and not false
sharing of the per-cpu pointer load. Otherwise we'd be adding entirely
useless padding to percpu structures.

Note that in the current layout, atomic_t mm_users sits right beside
the pcpu_cid percpu pointer, which may cause false sharing if mm_users
is updated often. But if that's indeed the culprit, then just adding
the cacheline alignment on the new struct mm_mm_cid suffices.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
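
As an aside, a simplified sketch of the layout concern raised above
(illustrative only; the struct name is invented and the field order
does not match the real mm_struct):

/*
 * If a frequently written counter such as mm_users shares a cache
 * line with the percpu pointer that is loaded on every context
 * switch, the pointer *load* shows up as contended in perf even
 * though the percpu memory itself is fine. Cacheline aligning
 * struct mm_mm_cid pushes the embedded pointer onto its own line,
 * away from such neighbours.
 */
struct mm_struct_sketch {
	atomic_t		mm_users;	/* hot: written on clone/exit */
	/* ... other fields ... */
	struct mm_mm_cid	mm_cid;		/* aligned: own cache line */
};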