[PATCH] x86/altcall: always use a temporary parameter stashing variable

Roger Pau Monne posted 1 patch 1 month, 4 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20240228135908.13319-1-roger.pau@citrix.com
xen/arch/x86/include/asm/alternative.h | 36 +++++++++++++++++---------
1 file changed, 24 insertions(+), 12 deletions(-)
[PATCH] x86/altcall: always use a temporary parameter stashing variable
Posted by Roger Pau Monne 1 month, 4 weeks ago
The usage in ALT_CALL_ARG() on clang of:

register union {
    typeof(arg) e;
    const unsigned long r;
} ...

When `arg` is the first argument to alternative_{,v}call() and
const_vlapic_vcpu() is used results in clang 3.5.0 complaining with:

arch/x86/hvm/vlapic.c:141:47: error: non-const static data member must be initialized out of line
         alternative_call(hvm_funcs.test_pir, const_vlapic_vcpu(vlapic), vec) )

Workaround this by pulling `arg1` into a local variable, like it's done for
further arguments (arg2, arg3...)

Originally arg1 wasn't pulled into a variable because for the a1_ register
local variable the possible clobbering as a result of operators on other
variables don't matter:

https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables

Note clang version 3.8.1 seems to already be fixed and don't require the
workaround, but since it's harmless do it uniformly everywhere.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Fixes: 2ce562b2a413 ('x86/altcall: use a union as register type for function parameters on clang')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Gitlab CI seems OK on both the 4.17 and staging branches with this applied:

4.17:    https://gitlab.com/xen-project/people/royger/xen/-/pipelines/1193801183
staging: https://gitlab.com/xen-project/people/royger/xen/-/pipelines/1193801881
---
 xen/arch/x86/include/asm/alternative.h | 36 +++++++++++++++++---------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
index 3c14db5078ba..0d3697f1de49 100644
--- a/xen/arch/x86/include/asm/alternative.h
+++ b/xen/arch/x86/include/asm/alternative.h
@@ -253,21 +253,24 @@ extern void alternative_branches(void);
 })
 
 #define alternative_vcall1(func, arg) ({           \
-    ALT_CALL_ARG(arg, 1);                          \
+    typeof(arg) v1_ = (arg);                       \
+    ALT_CALL_ARG(v1_, 1);                          \
     ALT_CALL_NO_ARG2;                              \
     (void)sizeof(func(arg));                       \
     (void)alternative_callN(1, int, func);         \
 })
 
 #define alternative_call1(func, arg) ({            \
-    ALT_CALL_ARG(arg, 1);                          \
+    typeof(arg) v1_ = (arg);                       \
+    ALT_CALL_ARG(v1_, 1);                          \
     ALT_CALL_NO_ARG2;                              \
     alternative_callN(1, typeof(func(arg)), func); \
 })
 
 #define alternative_vcall2(func, arg1, arg2) ({           \
+    typeof(arg1) v1_ = (arg1);                            \
     typeof(arg2) v2_ = (arg2);                            \
-    ALT_CALL_ARG(arg1, 1);                                \
+    ALT_CALL_ARG(v1_, 1);                                 \
     ALT_CALL_ARG(v2_, 2);                                 \
     ALT_CALL_NO_ARG3;                                     \
     (void)sizeof(func(arg1, arg2));                       \
@@ -275,17 +278,19 @@ extern void alternative_branches(void);
 })
 
 #define alternative_call2(func, arg1, arg2) ({            \
+    typeof(arg1) v1_ = (arg1);                            \
     typeof(arg2) v2_ = (arg2);                            \
-    ALT_CALL_ARG(arg1, 1);                                \
+    ALT_CALL_ARG(v1_, 1);                                 \
     ALT_CALL_ARG(v2_, 2);                                 \
     ALT_CALL_NO_ARG3;                                     \
     alternative_callN(2, typeof(func(arg1, arg2)), func); \
 })
 
 #define alternative_vcall3(func, arg1, arg2, arg3) ({    \
+    typeof(arg1) v1_ = (arg1);                           \
     typeof(arg2) v2_ = (arg2);                           \
     typeof(arg3) v3_ = (arg3);                           \
-    ALT_CALL_ARG(arg1, 1);                               \
+    ALT_CALL_ARG(v1_, 1);                                \
     ALT_CALL_ARG(v2_, 2);                                \
     ALT_CALL_ARG(v3_, 3);                                \
     ALT_CALL_NO_ARG4;                                    \
@@ -294,9 +299,10 @@ extern void alternative_branches(void);
 })
 
 #define alternative_call3(func, arg1, arg2, arg3) ({     \
+    typeof(arg1) v1_ = (arg1);                            \
     typeof(arg2) v2_ = (arg2);                           \
     typeof(arg3) v3_ = (arg3);                           \
-    ALT_CALL_ARG(arg1, 1);                               \
+    ALT_CALL_ARG(v1_, 1);                                \
     ALT_CALL_ARG(v2_, 2);                                \
     ALT_CALL_ARG(v3_, 3);                                \
     ALT_CALL_NO_ARG4;                                    \
@@ -305,10 +311,11 @@ extern void alternative_branches(void);
 })
 
 #define alternative_vcall4(func, arg1, arg2, arg3, arg4) ({ \
+    typeof(arg1) v1_ = (arg1);                              \
     typeof(arg2) v2_ = (arg2);                              \
     typeof(arg3) v3_ = (arg3);                              \
     typeof(arg4) v4_ = (arg4);                              \
-    ALT_CALL_ARG(arg1, 1);                                  \
+    ALT_CALL_ARG(v1_, 1);                                   \
     ALT_CALL_ARG(v2_, 2);                                   \
     ALT_CALL_ARG(v3_, 3);                                   \
     ALT_CALL_ARG(v4_, 4);                                   \
@@ -318,10 +325,11 @@ extern void alternative_branches(void);
 })
 
 #define alternative_call4(func, arg1, arg2, arg3, arg4) ({  \
+    typeof(arg1) v1_ = (arg1);                              \
     typeof(arg2) v2_ = (arg2);                              \
     typeof(arg3) v3_ = (arg3);                              \
     typeof(arg4) v4_ = (arg4);                              \
-    ALT_CALL_ARG(arg1, 1);                                  \
+    ALT_CALL_ARG(v1_, 1);                                   \
     ALT_CALL_ARG(v2_, 2);                                   \
     ALT_CALL_ARG(v3_, 3);                                   \
     ALT_CALL_ARG(v4_, 4);                                   \
@@ -332,11 +340,12 @@ extern void alternative_branches(void);
 })
 
 #define alternative_vcall5(func, arg1, arg2, arg3, arg4, arg5) ({ \
+    typeof(arg1) v1_ = (arg1);                                    \
     typeof(arg2) v2_ = (arg2);                                    \
     typeof(arg3) v3_ = (arg3);                                    \
     typeof(arg4) v4_ = (arg4);                                    \
     typeof(arg5) v5_ = (arg5);                                    \
-    ALT_CALL_ARG(arg1, 1);                                        \
+    ALT_CALL_ARG(v1_, 1);                                         \
     ALT_CALL_ARG(v2_, 2);                                         \
     ALT_CALL_ARG(v3_, 3);                                         \
     ALT_CALL_ARG(v4_, 4);                                         \
@@ -347,11 +356,12 @@ extern void alternative_branches(void);
 })
 
 #define alternative_call5(func, arg1, arg2, arg3, arg4, arg5) ({  \
+    typeof(arg1) v1_ = (arg1);                                    \
     typeof(arg2) v2_ = (arg2);                                    \
     typeof(arg3) v3_ = (arg3);                                    \
     typeof(arg4) v4_ = (arg4);                                    \
     typeof(arg5) v5_ = (arg5);                                    \
-    ALT_CALL_ARG(arg1, 1);                                        \
+    ALT_CALL_ARG(v1_, 1);                                         \
     ALT_CALL_ARG(v2_, 2);                                         \
     ALT_CALL_ARG(v3_, 3);                                         \
     ALT_CALL_ARG(v4_, 4);                                         \
@@ -363,12 +373,13 @@ extern void alternative_branches(void);
 })
 
 #define alternative_vcall6(func, arg1, arg2, arg3, arg4, arg5, arg6) ({ \
+    typeof(arg1) v1_ = (arg1);                                          \
     typeof(arg2) v2_ = (arg2);                                          \
     typeof(arg3) v3_ = (arg3);                                          \
     typeof(arg4) v4_ = (arg4);                                          \
     typeof(arg5) v5_ = (arg5);                                          \
     typeof(arg6) v6_ = (arg6);                                          \
-    ALT_CALL_ARG(arg1, 1);                                              \
+    ALT_CALL_ARG(v1_, 1);                                               \
     ALT_CALL_ARG(v2_, 2);                                               \
     ALT_CALL_ARG(v3_, 3);                                               \
     ALT_CALL_ARG(v4_, 4);                                               \
@@ -379,12 +390,13 @@ extern void alternative_branches(void);
 })
 
 #define alternative_call6(func, arg1, arg2, arg3, arg4, arg5, arg6) ({  \
+    typeof(arg1) v1_ = (arg1);                                          \
     typeof(arg2) v2_ = (arg2);                                          \
     typeof(arg3) v3_ = (arg3);                                          \
     typeof(arg4) v4_ = (arg4);                                          \
     typeof(arg5) v5_ = (arg5);                                          \
     typeof(arg6) v6_ = (arg6);                                          \
-    ALT_CALL_ARG(arg1, 1);                                              \
+    ALT_CALL_ARG(v1_, 1);                                               \
     ALT_CALL_ARG(v2_, 2);                                               \
     ALT_CALL_ARG(v3_, 3);                                               \
     ALT_CALL_ARG(v4_, 4);                                               \
-- 
2.44.0


Re: [PATCH] x86/altcall: always use a temporary parameter stashing variable
Posted by Jan Beulich 1 month, 4 weeks ago
On 28.02.2024 14:59, Roger Pau Monne wrote:
> The usage in ALT_CALL_ARG() on clang of:
> 
> register union {
>     typeof(arg) e;
>     const unsigned long r;
> } ...
> 
> When `arg` is the first argument to alternative_{,v}call() and
> const_vlapic_vcpu() is used results in clang 3.5.0 complaining with:
> 
> arch/x86/hvm/vlapic.c:141:47: error: non-const static data member must be initialized out of line
>          alternative_call(hvm_funcs.test_pir, const_vlapic_vcpu(vlapic), vec) )
> 
> Workaround this by pulling `arg1` into a local variable, like it's done for
> further arguments (arg2, arg3...)
> 
> Originally arg1 wasn't pulled into a variable because for the a1_ register
> local variable the possible clobbering as a result of operators on other
> variables don't matter:
> 
> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
> 
> Note clang version 3.8.1 seems to already be fixed and don't require the
> workaround, but since it's harmless do it uniformly everywhere.
> 
> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Fixes: 2ce562b2a413 ('x86/altcall: use a union as register type for function parameters on clang')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

I'm okay with this change, but since you don't mention anything in this
regard: Did you look at whether / how generated code (with gcc) changes?

Jan

Re: [PATCH] x86/altcall: always use a temporary parameter stashing variable
Posted by Roger Pau Monné 1 month, 4 weeks ago
On Wed, Feb 28, 2024 at 03:28:25PM +0100, Jan Beulich wrote:
> On 28.02.2024 14:59, Roger Pau Monne wrote:
> > The usage in ALT_CALL_ARG() on clang of:
> > 
> > register union {
> >     typeof(arg) e;
> >     const unsigned long r;
> > } ...
> > 
> > When `arg` is the first argument to alternative_{,v}call() and
> > const_vlapic_vcpu() is used results in clang 3.5.0 complaining with:
> > 
> > arch/x86/hvm/vlapic.c:141:47: error: non-const static data member must be initialized out of line
> >          alternative_call(hvm_funcs.test_pir, const_vlapic_vcpu(vlapic), vec) )
> > 
> > Workaround this by pulling `arg1` into a local variable, like it's done for
> > further arguments (arg2, arg3...)
> > 
> > Originally arg1 wasn't pulled into a variable because for the a1_ register
> > local variable the possible clobbering as a result of operators on other
> > variables don't matter:
> > 
> > https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
> > 
> > Note clang version 3.8.1 seems to already be fixed and don't require the
> > workaround, but since it's harmless do it uniformly everywhere.
> > 
> > Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
> > Fixes: 2ce562b2a413 ('x86/altcall: use a union as register type for function parameters on clang')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> I'm okay with this change, but since you don't mention anything in this
> regard: Did you look at whether / how generated code (with gcc) changes?

So the specific example of vlapic_test_irq() shows no changes to the
generated code. bloat-o-meter shows a 0 delta:

add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Function                                     old     new   delta
Total: Before=3570098, After=3570098, chg +0.00%

I assume there can be some reordering of instructions, as we now force
arg1 temporary variable (v1_) to be initialized ahead of the rest of
the arguments.

Thanks, Roger.

Re: [PATCH] x86/altcall: always use a temporary parameter stashing variable
Posted by Jan Beulich 1 month, 4 weeks ago
On 28.02.2024 17:56, Roger Pau Monné wrote:
> On Wed, Feb 28, 2024 at 03:28:25PM +0100, Jan Beulich wrote:
>> On 28.02.2024 14:59, Roger Pau Monne wrote:
>>> The usage in ALT_CALL_ARG() on clang of:
>>>
>>> register union {
>>>     typeof(arg) e;
>>>     const unsigned long r;
>>> } ...
>>>
>>> When `arg` is the first argument to alternative_{,v}call() and
>>> const_vlapic_vcpu() is used results in clang 3.5.0 complaining with:
>>>
>>> arch/x86/hvm/vlapic.c:141:47: error: non-const static data member must be initialized out of line
>>>          alternative_call(hvm_funcs.test_pir, const_vlapic_vcpu(vlapic), vec) )
>>>
>>> Workaround this by pulling `arg1` into a local variable, like it's done for
>>> further arguments (arg2, arg3...)
>>>
>>> Originally arg1 wasn't pulled into a variable because for the a1_ register
>>> local variable the possible clobbering as a result of operators on other
>>> variables don't matter:
>>>
>>> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html#Local-Register-Variables
>>>
>>> Note clang version 3.8.1 seems to already be fixed and don't require the
>>> workaround, but since it's harmless do it uniformly everywhere.
>>>
>>> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> Fixes: 2ce562b2a413 ('x86/altcall: use a union as register type for function parameters on clang')
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>
>> I'm okay with this change, but since you don't mention anything in this
>> regard: Did you look at whether / how generated code (with gcc) changes?
> 
> So the specific example of vlapic_test_irq() shows no changes to the
> generated code. bloat-o-meter shows a 0 delta:
> 
> add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
> Function                                     old     new   delta
> Total: Before=3570098, After=3570098, chg +0.00%
> 
> I assume there can be some reordering of instructions, as we now force
> arg1 temporary variable (v1_) to be initialized ahead of the rest of
> the arguments.

Yeah, re-ordering is okay. I was more worried of new unnecessary register
copying, for example. Thanks for confirming.

Jan