[PATCH 0/3] Remove volatile from arch_raw_cpu_ptr() and revert the hacks.

Sebastian Andrzej Siewior posted 3 patches 4 years, 2 months ago
There is a newer version of this series
[PATCH 0/3] Remove volatile from arch_raw_cpu_ptr() and revert the hacks.
Posted by Sebastian Andrzej Siewior 4 years, 2 months ago
On 2022-03-23 10:17:08 [-0700], Linus Torvalds wrote:
> I get the feeling that the real problem is that on x86, we have this:
> 
> #define arch_raw_cpu_ptr(ptr)                           \
> ({                                                      \
>         unsigned long tcp_ptr__;                        \
>         asm volatile("add " __percpu_arg(1) ", %0"      \
>                      : "=r" (tcp_ptr__)                 \
>                      : "m" (this_cpu_off), "0" (ptr));  \
>         (typeof(*(ptr)) __kernel __force *)tcp_ptr__;   \
> })
> 
> and that "volatile" is just *WRONG*.
> 
> That volatile is what literally tells the compiler "you can't remove
> this if it isn't used".

It is indeed just x86. After double checking arm/mips removes that code
properly.

> But there's no point to that.
> 
> So how about we
> 
>  (a) just revert commit 9983a9d577db4
> 
>  (b) remove that bogus 'volatile'
> 
> Doesn't that fix the problem?

The following series does that. The assembly code looks okay. In a few
simple test cases the this_cpu_ptr() usage is always created and is not
moved passed preempt_enable() statement.
The resulting vmlinux shrunk a bit. The test config lost ~2KiB:
     text      data       bss       dec      hex filename
 22533901  10722831  13963496  47220228  2d08604 vmlinux.volatile
 22531589  10722831  13971688  47226108  2d09cfc vmlinux.patched

after looking at it it was sometimes due avoiding this_cpu_ptr(),
sometimes it looked that the compiler made other decisions at the
earlier resulting to be more beneficial later on.

Sebastian
Re: [PATCH 0/3] Remove volatile from arch_raw_cpu_ptr() and revert the hacks.
Posted by Linus Torvalds 4 years, 2 months ago
On Thu, Mar 24, 2022 at 10:39 AM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> The following series does that. The assembly code looks okay. In a few
> simple test cases the this_cpu_ptr() usage is always created and is not
> moved passed preempt_enable() statement.

Hmm. I didn't think things through, and you're very correct in having
thought about preempt_enable/disable as needing barriers for that asm.

So the "volatile" does have that potential significance for that
arch_raw_cpu_ptr() in case it was what ordered it wrt preemption.

It turns out that it's all ok, because

 - even without the 'volatile', arch_raw_cpu_ptr() has an *input* that
is a memory location:

                     : "m" (this_cpu_off)

 - and prempt_{dis,en}able() fundamentally has a 'barrier()' in it

and as such it's not just random luck that the percpu thing is never
moved around preemption points. They are properly serialized even
without any 'volatile'.

But that "it's not just random luck" might be worth a comment.

So Ack on your series, but that additional comment might be worth it,
since I didn't even think of it until you mentioned it. Clearly much
too subtle.

Thanks,
              Linus
Re: [PATCH 0/3] Remove volatile from arch_raw_cpu_ptr() and revert the hacks.
Posted by Peter Zijlstra 4 years, 2 months ago
On Thu, Mar 24, 2022 at 11:28:16AM -0700, Linus Torvalds wrote:

> So Ack on your series, but that additional comment might be worth it,
> since I didn't even think of it until you mentioned it. Clearly much
> too subtle.

Sebastian, were you going to resend these patches with that comment on,
or should I go collect them as is for post -rc1?
Re: [PATCH 0/3] Remove volatile from arch_raw_cpu_ptr() and revert the hacks.
Posted by Sebastian Andrzej Siewior 4 years, 2 months ago
On 2022-03-28 15:55:14 [+0200], Peter Zijlstra wrote:
> On Thu, Mar 24, 2022 at 11:28:16AM -0700, Linus Torvalds wrote:
> 
> > So Ack on your series, but that additional comment might be worth it,
> > since I didn't even think of it until you mentioned it. Clearly much
> > too subtle.
> 
> Sebastian, were you going to resend these patches with that comment on,
> or should I go collect them as is for post -rc1?

Just did so. Sorry, had no time on FRI.

Sebastian
[PATCH v2 0/3] Remove volatile from arch_raw_cpu_ptr() and revert the hacks.
Posted by Sebastian Andrzej Siewior 4 years, 2 months ago
Remove volatile from arch_raw_cpu_ptr() and revert the hacks.

v1…v2: Updated the comments in 1/3

Sebastian
[PATCH v2 1/3] x86/percpu: Remove volatile from arch_raw_cpu_ptr().
Posted by Sebastian Andrzej Siewior 4 years, 2 months ago
The volatile attribute in the inline assembly of arch_raw_cpu_ptr()
forces the compiler to always generate the code, even if the compiler
can decide upfront that its result is not needed.

For instance invoking __intel_pmu_disable_all(false) (like
intel_pmu_snapshot_arch_branch_stack() does) leads to loading the
address of &cpu_hw_events into the register while compiler knows that it
has no need for it. This ends up with code like:

|	movq	$cpu_hw_events, %rax			#, tcp_ptr__
|	add	%gs:this_cpu_off(%rip), %rax		# this_cpu_off, tcp_ptr__
|	xorl	%eax, %eax				# tmp93

It also creates additional code within local_lock() with !RT &&
!LOCKDEP which is not desired.

By removing the volatile attribute the compiler can place the
function freely and avoid it if it is not needed in the end.
By using the function twice the compiler properly caches only the
variable offset and always loads the CPU-offset.

this_cpu_ptr() also remains properly placed within a preempt_disable()
sections because
- arch_raw_cpu_ptr() assembly has a memory input ("m" (this_cpu_off))
- prempt_{dis,en}able() fundamentally has a 'barrier()' in it

Therefore this_cpu_ptr() is already properly serialized and does not
rely on the 'volatile' attribute.

Remove volatile from arch_raw_cpu_ptr().

[ bigeasy: Added Linus' explanation why this_cpu_ptr() is not moved out
  of a preempt_disable() section without the 'volatile' attribute. ]

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 arch/x86/include/asm/percpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a3c33b79fb865..5d572a19a389c 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -38,7 +38,7 @@
 #define arch_raw_cpu_ptr(ptr)				\
 ({							\
 	unsigned long tcp_ptr__;			\
-	asm volatile("add " __percpu_arg(1) ", %0"	\
+	asm ("add " __percpu_arg(1) ", %0"	\
 		     : "=r" (tcp_ptr__)			\
 		     : "m" (this_cpu_off), "0" (ptr));	\
 	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;	\
-- 
2.35.1
[tip: locking/urgent] x86/percpu: Remove volatile from arch_raw_cpu_ptr().
Posted by tip-bot2 for Sebastian Andrzej Siewior 4 years, 2 months ago
The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     1c1e7e3c23dd25f938302428eeb22c3dda2c3427
Gitweb:        https://git.kernel.org/tip/1c1e7e3c23dd25f938302428eeb22c3dda2c3427
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Mon, 28 Mar 2022 16:58:08 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Apr 2022 09:59:38 +02:00

x86/percpu: Remove volatile from arch_raw_cpu_ptr().

The volatile attribute in the inline assembly of arch_raw_cpu_ptr()
forces the compiler to always generate the code, even if the compiler
can decide upfront that its result is not needed.

For instance invoking __intel_pmu_disable_all(false) (like
intel_pmu_snapshot_arch_branch_stack() does) leads to loading the
address of &cpu_hw_events into the register while compiler knows that it
has no need for it. This ends up with code like:

|	movq	$cpu_hw_events, %rax			#, tcp_ptr__
|	add	%gs:this_cpu_off(%rip), %rax		# this_cpu_off, tcp_ptr__
|	xorl	%eax, %eax				# tmp93

It also creates additional code within local_lock() with !RT &&
!LOCKDEP which is not desired.

By removing the volatile attribute the compiler can place the
function freely and avoid it if it is not needed in the end.
By using the function twice the compiler properly caches only the
variable offset and always loads the CPU-offset.

this_cpu_ptr() also remains properly placed within a preempt_disable()
sections because
- arch_raw_cpu_ptr() assembly has a memory input ("m" (this_cpu_off))
- prempt_{dis,en}able() fundamentally has a 'barrier()' in it

Therefore this_cpu_ptr() is already properly serialized and does not
rely on the 'volatile' attribute.

Remove volatile from arch_raw_cpu_ptr().

[ bigeasy: Added Linus' explanation why this_cpu_ptr() is not moved out
  of a preempt_disable() section without the 'volatile' attribute. ]

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220328145810.86783-2-bigeasy@linutronix.de
---
 arch/x86/include/asm/percpu.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index a3c33b7..13c0d63 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -38,9 +38,9 @@
 #define arch_raw_cpu_ptr(ptr)				\
 ({							\
 	unsigned long tcp_ptr__;			\
-	asm volatile("add " __percpu_arg(1) ", %0"	\
-		     : "=r" (tcp_ptr__)			\
-		     : "m" (this_cpu_off), "0" (ptr));	\
+	asm ("add " __percpu_arg(1) ", %0"		\
+	     : "=r" (tcp_ptr__)				\
+	     : "m" (this_cpu_off), "0" (ptr));		\
 	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;	\
 })
 #else
[PATCH v2 2/3] Revert "locking/local_lock: Make the empty local_lock_*() function a macro."
Posted by Sebastian Andrzej Siewior 4 years, 2 months ago
With volatile removed from arch_raw_cpu_ptr() the compiler no longer
creates the per-CPU reference. The usage of the macro can be reverted
now.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 include/linux/local_lock_internal.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 6d635e8306d64..975e33b793a77 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -44,9 +44,9 @@ static inline void local_lock_debug_init(local_lock_t *l)
 }
 #else /* CONFIG_DEBUG_LOCK_ALLOC */
 # define LOCAL_LOCK_DEBUG_INIT(lockname)
-# define local_lock_acquire(__ll)  do { typecheck(local_lock_t *, __ll); } while (0)
-# define local_lock_release(__ll)  do { typecheck(local_lock_t *, __ll); } while (0)
-# define local_lock_debug_init(__ll)  do { typecheck(local_lock_t *, __ll); } while (0)
+static inline void local_lock_acquire(local_lock_t *l) { }
+static inline void local_lock_release(local_lock_t *l) { }
+static inline void local_lock_debug_init(local_lock_t *l) { }
 #endif /* !CONFIG_DEBUG_LOCK_ALLOC */
 
 #define INIT_LOCAL_LOCK(lockname)	{ LOCAL_LOCK_DEBUG_INIT(lockname) }
-- 
2.35.1
[tip: locking/urgent] Revert "locking/local_lock: Make the empty local_lock_*() function a macro."
Posted by tip-bot2 for Sebastian Andrzej Siewior 4 years, 2 months ago
The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     2d2f8f083ef29e9b7adfe5cb421368331543473f
Gitweb:        https://git.kernel.org/tip/2d2f8f083ef29e9b7adfe5cb421368331543473f
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Mon, 28 Mar 2022 16:58:09 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Apr 2022 09:59:39 +02:00

Revert "locking/local_lock: Make the empty local_lock_*() function a macro."

With volatile removed from arch_raw_cpu_ptr() the compiler no longer
creates the per-CPU reference. The usage of the macro can be reverted
now.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220328145810.86783-3-bigeasy@linutronix.de
---
 include/linux/local_lock_internal.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h
index 6d635e8..975e33b 100644
--- a/include/linux/local_lock_internal.h
+++ b/include/linux/local_lock_internal.h
@@ -44,9 +44,9 @@ static inline void local_lock_debug_init(local_lock_t *l)
 }
 #else /* CONFIG_DEBUG_LOCK_ALLOC */
 # define LOCAL_LOCK_DEBUG_INIT(lockname)
-# define local_lock_acquire(__ll)  do { typecheck(local_lock_t *, __ll); } while (0)
-# define local_lock_release(__ll)  do { typecheck(local_lock_t *, __ll); } while (0)
-# define local_lock_debug_init(__ll)  do { typecheck(local_lock_t *, __ll); } while (0)
+static inline void local_lock_acquire(local_lock_t *l) { }
+static inline void local_lock_release(local_lock_t *l) { }
+static inline void local_lock_debug_init(local_lock_t *l) { }
 #endif /* !CONFIG_DEBUG_LOCK_ALLOC */
 
 #define INIT_LOCAL_LOCK(lockname)	{ LOCAL_LOCK_DEBUG_INIT(lockname) }
[PATCH v2 3/3] Revert "mm/page_alloc: mark pagesets as __maybe_unused"
Posted by Sebastian Andrzej Siewior 4 years, 2 months ago
The local_lock() is now using a proper static inline function which is
enough for llvm to accept that the variable is used.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bdc8f60ae4622..634323b13cd2f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -128,7 +128,7 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
 struct pagesets {
 	local_lock_t lock;
 };
-static DEFINE_PER_CPU(struct pagesets, pagesets) __maybe_unused = {
+static DEFINE_PER_CPU(struct pagesets, pagesets) = {
 	.lock = INIT_LOCAL_LOCK(lock),
 };
 
-- 
2.35.1
[tip: locking/urgent] Revert "mm/page_alloc: mark pagesets as __maybe_unused"
Posted by tip-bot2 for Sebastian Andrzej Siewior 4 years, 2 months ago
The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     273ba85b5e8b971ed28eb5c17e1638543be9237d
Gitweb:        https://git.kernel.org/tip/273ba85b5e8b971ed28eb5c17e1638543be9237d
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Mon, 28 Mar 2022 16:58:10 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Apr 2022 09:59:39 +02:00

Revert "mm/page_alloc: mark pagesets as __maybe_unused"

The local_lock() is now using a proper static inline function which is
enough for llvm to accept that the variable is used.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220328145810.86783-4-bigeasy@linutronix.de
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2db9578..6e5b448 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -128,7 +128,7 @@ static DEFINE_MUTEX(pcp_batch_high_lock);
 struct pagesets {
 	local_lock_t lock;
 };
-static DEFINE_PER_CPU(struct pagesets, pagesets) __maybe_unused = {
+static DEFINE_PER_CPU(struct pagesets, pagesets) = {
 	.lock = INIT_LOCAL_LOCK(lock),
 };