[PATCH] x86/altcall: further refine clang workaround

Roger Pau Monne posted 1 patch 1 month, 2 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20240725105634.16825-1-roger.pau@citrix.com
There is a newer version of this series
xen/arch/x86/include/asm/alternative.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
[PATCH] x86/altcall: further refine clang workaround
Posted by Roger Pau Monne 1 month, 2 weeks ago
The current code in ALT_CALL_ARG() won't successfully workaround the clang
code-generation issue if the arg parameter has a size that's not a power of 2.
While there are no such sized parameters at the moment, improve the workaround
to also be effective when such sizes are used.

Instead of using a union with a long, add the necessary padding so that the
resulting struct always has the same size as a register, and let the compiler
zero-initialize the padding.

Reported-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/include/asm/alternative.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/include/asm/alternative.h b/xen/arch/x86/include/asm/alternative.h
index e63b45927643..c3e3a5e3f8a8 100644
--- a/xen/arch/x86/include/asm/alternative.h
+++ b/xen/arch/x86/include/asm/alternative.h
@@ -184,11 +184,11 @@ extern void alternative_branches(void);
  * https://github.com/llvm/llvm-project/issues/82598
  */
 #define ALT_CALL_ARG(arg, n)                                            \
-    register union {                                                    \
-        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
-        unsigned long r;                                                \
+    register struct {                                                   \
+        typeof(arg) e;                                                  \
+        char pad[sizeof(void *) - sizeof(arg)];                         \
     } a ## n ## _ asm ( ALT_CALL_arg ## n ) = {                         \
-        .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
+        .e = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })   \
     }
 #else
 #define ALT_CALL_ARG(arg, n) \
-- 
2.45.2


Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Jan Beulich 1 month, 2 weeks ago
On 25.07.2024 12:56, Roger Pau Monne wrote:
> --- a/xen/arch/x86/include/asm/alternative.h
> +++ b/xen/arch/x86/include/asm/alternative.h
> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
>   * https://github.com/llvm/llvm-project/issues/82598
>   */
>  #define ALT_CALL_ARG(arg, n)                                            \
> -    register union {                                                    \
> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
> -        unsigned long r;                                                \
> +    register struct {                                                   \
> +        typeof(arg) e;                                                  \
> +        char pad[sizeof(void *) - sizeof(arg)];                         \

One thing that occurred to me only after our discussion, and I then forgot
to mention this before you would send a patch: What if sizeof(void *) ==
sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
get rid of.

I was wondering whether we could get away resorting to bitfields, as those
are well-defined when having a width of zero:

    register struct {                                                   \
        typeof(arg) e;                                                  \
        unsigned long pad:(sizeof(void *) - sizeof(arg)) * 8;           \
    } ...

Yet when the width is zero, the field may not have name, whereas when the
field uniformly doesn't have a name, Clang would, like also for

    register struct {                                                   \
        typeof(arg) e;                                                  \
        unsigned long :0;                                               \
    } ...

regards that space as not needing any (re)init. Bottom line: For the
moment I'm out of ideas.

Jan
Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Roger Pau Monné 1 month, 2 weeks ago
On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
> On 25.07.2024 12:56, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/include/asm/alternative.h
> > +++ b/xen/arch/x86/include/asm/alternative.h
> > @@ -184,11 +184,11 @@ extern void alternative_branches(void);
> >   * https://github.com/llvm/llvm-project/issues/82598
> >   */
> >  #define ALT_CALL_ARG(arg, n)                                            \
> > -    register union {                                                    \
> > -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
> > -        unsigned long r;                                                \
> > +    register struct {                                                   \
> > +        typeof(arg) e;                                                  \
> > +        char pad[sizeof(void *) - sizeof(arg)];                         \
> 
> One thing that occurred to me only after our discussion, and I then forgot
> to mention this before you would send a patch: What if sizeof(void *) ==
> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
> get rid of.

I wondered about this, but I though it was only [] that we were trying
to get rid of, not [0].

> I was wondering whether we could get away resorting to bitfields, as those
> are well-defined when having a width of zero:
> 
>     register struct {                                                   \
>         typeof(arg) e;                                                  \
>         unsigned long pad:(sizeof(void *) - sizeof(arg)) * 8;           \
>     } ...
> 
> Yet when the width is zero, the field may not have name, whereas when the
> field uniformly doesn't have a name, Clang would, like also for
> 
>     register struct {                                                   \
>         typeof(arg) e;                                                  \
>         unsigned long :0;                                               \
>     } ...
> 
> regards that space as not needing any (re)init. Bottom line: For the
> moment I'm out of ideas.

Hm, I see.  I don't have any good ideas right now either.  Will put it
on the back burner and pick up later, already too much on my plate
right now to be playing clang games.  Thanks for your input.

Roger.
Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Jan Beulich 1 month, 2 weeks ago
On 25.07.2024 16:54, Roger Pau Monné wrote:
> On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
>> On 25.07.2024 12:56, Roger Pau Monne wrote:
>>> --- a/xen/arch/x86/include/asm/alternative.h
>>> +++ b/xen/arch/x86/include/asm/alternative.h
>>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
>>>   * https://github.com/llvm/llvm-project/issues/82598
>>>   */
>>>  #define ALT_CALL_ARG(arg, n)                                            \
>>> -    register union {                                                    \
>>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
>>> -        unsigned long r;                                                \
>>> +    register struct {                                                   \
>>> +        typeof(arg) e;                                                  \
>>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
>>
>> One thing that occurred to me only after our discussion, and I then forgot
>> to mention this before you would send a patch: What if sizeof(void *) ==
>> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
>> get rid of.
> 
> I wondered about this, but I though it was only [] that we were trying
> to get rid of, not [0].

Sadly (here) it's actually the other way around, aiui.

Jan

Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Roger Pau Monné 1 month, 1 week ago
On Thu, Jul 25, 2024 at 05:00:22PM +0200, Jan Beulich wrote:
> On 25.07.2024 16:54, Roger Pau Monné wrote:
> > On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
> >> On 25.07.2024 12:56, Roger Pau Monne wrote:
> >>> --- a/xen/arch/x86/include/asm/alternative.h
> >>> +++ b/xen/arch/x86/include/asm/alternative.h
> >>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
> >>>   * https://github.com/llvm/llvm-project/issues/82598
> >>>   */
> >>>  #define ALT_CALL_ARG(arg, n)                                            \
> >>> -    register union {                                                    \
> >>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
> >>> -        unsigned long r;                                                \
> >>> +    register struct {                                                   \
> >>> +        typeof(arg) e;                                                  \
> >>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
> >>
> >> One thing that occurred to me only after our discussion, and I then forgot
> >> to mention this before you would send a patch: What if sizeof(void *) ==
> >> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
> >> get rid of.
> > 
> > I wondered about this, but I though it was only [] that we were trying
> > to get rid of, not [0].
> 
> Sadly (here) it's actually the other way around, aiui.

The only other option I have in mind is using an oversized array on
the union, like:

#define ALT_CALL_ARG(arg, n)                                            \
    union {                                                             \
        typeof(arg) e[(sizeof(long) + sizeof(arg) - 1) / sizeof(arg)];  \
        unsigned long r;                                                \
    } a ## n ## __  = {                                                 \
        .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
    };                                                                  \
    register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) =      \
        a ## n ## __.r

Regards, Roger.

Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Jan Beulich 1 month, 1 week ago
On 26.07.2024 09:31, Roger Pau Monné wrote:
> On Thu, Jul 25, 2024 at 05:00:22PM +0200, Jan Beulich wrote:
>> On 25.07.2024 16:54, Roger Pau Monné wrote:
>>> On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
>>>> On 25.07.2024 12:56, Roger Pau Monne wrote:
>>>>> --- a/xen/arch/x86/include/asm/alternative.h
>>>>> +++ b/xen/arch/x86/include/asm/alternative.h
>>>>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
>>>>>   * https://github.com/llvm/llvm-project/issues/82598
>>>>>   */
>>>>>  #define ALT_CALL_ARG(arg, n)                                            \
>>>>> -    register union {                                                    \
>>>>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
>>>>> -        unsigned long r;                                                \
>>>>> +    register struct {                                                   \
>>>>> +        typeof(arg) e;                                                  \
>>>>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
>>>>
>>>> One thing that occurred to me only after our discussion, and I then forgot
>>>> to mention this before you would send a patch: What if sizeof(void *) ==
>>>> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
>>>> get rid of.
>>>
>>> I wondered about this, but I though it was only [] that we were trying
>>> to get rid of, not [0].
>>
>> Sadly (here) it's actually the other way around, aiui.
> 
> The only other option I have in mind is using an oversized array on
> the union, like:
> 
> #define ALT_CALL_ARG(arg, n)                                            \
>     union {                                                             \
>         typeof(arg) e[(sizeof(long) + sizeof(arg) - 1) / sizeof(arg)];  \
>         unsigned long r;                                                \
>     } a ## n ## __  = {                                                 \
>         .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
>     };                                                                  \
>     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) =      \
>         a ## n ## __.r

Yet that's likely awful code-gen wise? For the time being, can we perhaps
just tighten the BUILD_BUG_ON(), as iirc Alejandro had suggested?

Jan

Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Roger Pau Monné 1 month, 1 week ago
On Fri, Jul 26, 2024 at 09:36:15AM +0200, Jan Beulich wrote:
> On 26.07.2024 09:31, Roger Pau Monné wrote:
> > On Thu, Jul 25, 2024 at 05:00:22PM +0200, Jan Beulich wrote:
> >> On 25.07.2024 16:54, Roger Pau Monné wrote:
> >>> On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
> >>>> On 25.07.2024 12:56, Roger Pau Monne wrote:
> >>>>> --- a/xen/arch/x86/include/asm/alternative.h
> >>>>> +++ b/xen/arch/x86/include/asm/alternative.h
> >>>>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
> >>>>>   * https://github.com/llvm/llvm-project/issues/82598
> >>>>>   */
> >>>>>  #define ALT_CALL_ARG(arg, n)                                            \
> >>>>> -    register union {                                                    \
> >>>>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
> >>>>> -        unsigned long r;                                                \
> >>>>> +    register struct {                                                   \
> >>>>> +        typeof(arg) e;                                                  \
> >>>>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
> >>>>
> >>>> One thing that occurred to me only after our discussion, and I then forgot
> >>>> to mention this before you would send a patch: What if sizeof(void *) ==
> >>>> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
> >>>> get rid of.
> >>>
> >>> I wondered about this, but I though it was only [] that we were trying
> >>> to get rid of, not [0].
> >>
> >> Sadly (here) it's actually the other way around, aiui.
> > 
> > The only other option I have in mind is using an oversized array on
> > the union, like:
> > 
> > #define ALT_CALL_ARG(arg, n)                                            \
> >     union {                                                             \
> >         typeof(arg) e[(sizeof(long) + sizeof(arg) - 1) / sizeof(arg)];  \
> >         unsigned long r;                                                \
> >     } a ## n ## __  = {                                                 \
> >         .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
> >     };                                                                  \
> >     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) =      \
> >         a ## n ## __.r
> 
> Yet that's likely awful code-gen wise?

Seems OK: https://godbolt.org/z/nsdo5Gs8W

> For the time being, can we perhaps
> just tighten the BUILD_BUG_ON(), as iirc Alejandro had suggested?

My main concern with tightening the BUILD_BUG_ON() is that then I
would also like to do so for the GCC one, so that build fails
uniformly.

Thanks, Roger.

Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Jan Beulich 1 month, 1 week ago
On 26.07.2024 09:52, Roger Pau Monné wrote:
> On Fri, Jul 26, 2024 at 09:36:15AM +0200, Jan Beulich wrote:
>> On 26.07.2024 09:31, Roger Pau Monné wrote:
>>> On Thu, Jul 25, 2024 at 05:00:22PM +0200, Jan Beulich wrote:
>>>> On 25.07.2024 16:54, Roger Pau Monné wrote:
>>>>> On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
>>>>>> On 25.07.2024 12:56, Roger Pau Monne wrote:
>>>>>>> --- a/xen/arch/x86/include/asm/alternative.h
>>>>>>> +++ b/xen/arch/x86/include/asm/alternative.h
>>>>>>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
>>>>>>>   * https://github.com/llvm/llvm-project/issues/82598
>>>>>>>   */
>>>>>>>  #define ALT_CALL_ARG(arg, n)                                            \
>>>>>>> -    register union {                                                    \
>>>>>>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
>>>>>>> -        unsigned long r;                                                \
>>>>>>> +    register struct {                                                   \
>>>>>>> +        typeof(arg) e;                                                  \
>>>>>>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
>>>>>>
>>>>>> One thing that occurred to me only after our discussion, and I then forgot
>>>>>> to mention this before you would send a patch: What if sizeof(void *) ==
>>>>>> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
>>>>>> get rid of.
>>>>>
>>>>> I wondered about this, but I though it was only [] that we were trying
>>>>> to get rid of, not [0].
>>>>
>>>> Sadly (here) it's actually the other way around, aiui.
>>>
>>> The only other option I have in mind is using an oversized array on
>>> the union, like:
>>>
>>> #define ALT_CALL_ARG(arg, n)                                            \
>>>     union {                                                             \
>>>         typeof(arg) e[(sizeof(long) + sizeof(arg) - 1) / sizeof(arg)];  \
>>>         unsigned long r;                                                \
>>>     } a ## n ## __  = {                                                 \
>>>         .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
>>>     };                                                                  \
>>>     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) =      \
>>>         a ## n ## __.r
>>
>> Yet that's likely awful code-gen wise?
> 
> Seems OK: https://godbolt.org/z/nsdo5Gs8W

In which case why not go this route. If the compiler is doing fine with
that, maybe the array dimension expression could be further simplified,
accepting yet more over-sizing? Like "sizeof(void *) / sizeof (arg) + 1"
or even simply "sizeof(void *)"? Suitably commented of course ...

>> For the time being, can we perhaps
>> just tighten the BUILD_BUG_ON(), as iirc Alejandro had suggested?
> 
> My main concern with tightening the BUILD_BUG_ON() is that then I
> would also like to do so for the GCC one, so that build fails
> uniformly.

If we were to take that route, then yes, probably should constrain both
(with a suitable comment on the gcc one).

Jan

Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Alejandro Vallejo 1 month, 1 week ago
On Fri Jul 26, 2024 at 9:05 AM BST, Jan Beulich wrote:
> On 26.07.2024 09:52, Roger Pau Monné wrote:
> > On Fri, Jul 26, 2024 at 09:36:15AM +0200, Jan Beulich wrote:
> >> On 26.07.2024 09:31, Roger Pau Monné wrote:
> >>> On Thu, Jul 25, 2024 at 05:00:22PM +0200, Jan Beulich wrote:
> >>>> On 25.07.2024 16:54, Roger Pau Monné wrote:
> >>>>> On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
> >>>>>> On 25.07.2024 12:56, Roger Pau Monne wrote:
> >>>>>>> --- a/xen/arch/x86/include/asm/alternative.h
> >>>>>>> +++ b/xen/arch/x86/include/asm/alternative.h
> >>>>>>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
> >>>>>>>   * https://github.com/llvm/llvm-project/issues/82598
> >>>>>>>   */
> >>>>>>>  #define ALT_CALL_ARG(arg, n)                                            \
> >>>>>>> -    register union {                                                    \
> >>>>>>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
> >>>>>>> -        unsigned long r;                                                \
> >>>>>>> +    register struct {                                                   \
> >>>>>>> +        typeof(arg) e;                                                  \
> >>>>>>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
> >>>>>>
> >>>>>> One thing that occurred to me only after our discussion, and I then forgot
> >>>>>> to mention this before you would send a patch: What if sizeof(void *) ==
> >>>>>> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
> >>>>>> get rid of.
> >>>>>
> >>>>> I wondered about this, but I though it was only [] that we were trying
> >>>>> to get rid of, not [0].
> >>>>
> >>>> Sadly (here) it's actually the other way around, aiui.
> >>>
> >>> The only other option I have in mind is using an oversized array on
> >>> the union, like:
> >>>
> >>> #define ALT_CALL_ARG(arg, n)                                            \
> >>>     union {                                                             \
> >>>         typeof(arg) e[(sizeof(long) + sizeof(arg) - 1) / sizeof(arg)];  \
> >>>         unsigned long r;                                                \
> >>>     } a ## n ## __  = {                                                 \
> >>>         .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
> >>>     };                                                                  \
> >>>     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) =      \
> >>>         a ## n ## __.r
> >>
> >> Yet that's likely awful code-gen wise?
> > 
> > Seems OK: https://godbolt.org/z/nsdo5Gs8W
>
> In which case why not go this route. If the compiler is doing fine with
> that, maybe the array dimension expression could be further simplified,
> accepting yet more over-sizing? Like "sizeof(void *) / sizeof (arg) + 1"
> or even simply "sizeof(void *)"? Suitably commented of course ...
>
> >> For the time being, can we perhaps
> >> just tighten the BUILD_BUG_ON(), as iirc Alejandro had suggested?
> > 
> > My main concern with tightening the BUILD_BUG_ON() is that then I
> > would also like to do so for the GCC one, so that build fails
> > uniformly.
>
> If we were to take that route, then yes, probably should constrain both
> (with a suitable comment on the gcc one).
>
> Jan

Yet another way would be to have an intermediate `long` to cast onto. Compilers
will optimise away the copy. It ignores the different-type aliasing rules in
the C spec, so there's an assumption that we have -fno-strict-aliasing. But I
belive we do? Otherwise it should pretty much work on anything.

```
  #define ALT_CALL_ARG(arg, n)                                              \
      unsigned long __tmp = 0;                                              \
      *(typeof(arg) *)&__tmp =                                              \
          ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })          \
      register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = __tmp; \
```

fwiw, clang18 emits identical code compared with the previous godbolt link.

Link: https://godbolt.org/z/facd1M9xa

Cheers,
Alejandro
Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Alejandro Vallejo 1 month, 1 week ago
On Fri Jul 26, 2024 at 3:17 PM BST, Alejandro Vallejo wrote:
> On Fri Jul 26, 2024 at 9:05 AM BST, Jan Beulich wrote:
> > On 26.07.2024 09:52, Roger Pau Monné wrote:
> > > On Fri, Jul 26, 2024 at 09:36:15AM +0200, Jan Beulich wrote:
> > >> On 26.07.2024 09:31, Roger Pau Monné wrote:
> > >>> On Thu, Jul 25, 2024 at 05:00:22PM +0200, Jan Beulich wrote:
> > >>>> On 25.07.2024 16:54, Roger Pau Monné wrote:
> > >>>>> On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
> > >>>>>> On 25.07.2024 12:56, Roger Pau Monne wrote:
> > >>>>>>> --- a/xen/arch/x86/include/asm/alternative.h
> > >>>>>>> +++ b/xen/arch/x86/include/asm/alternative.h
> > >>>>>>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
> > >>>>>>>   * https://github.com/llvm/llvm-project/issues/82598
> > >>>>>>>   */
> > >>>>>>>  #define ALT_CALL_ARG(arg, n)                                            \
> > >>>>>>> -    register union {                                                    \
> > >>>>>>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
> > >>>>>>> -        unsigned long r;                                                \
> > >>>>>>> +    register struct {                                                   \
> > >>>>>>> +        typeof(arg) e;                                                  \
> > >>>>>>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
> > >>>>>>
> > >>>>>> One thing that occurred to me only after our discussion, and I then forgot
> > >>>>>> to mention this before you would send a patch: What if sizeof(void *) ==
> > >>>>>> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
> > >>>>>> get rid of.
> > >>>>>
> > >>>>> I wondered about this, but I though it was only [] that we were trying
> > >>>>> to get rid of, not [0].
> > >>>>
> > >>>> Sadly (here) it's actually the other way around, aiui.
> > >>>
> > >>> The only other option I have in mind is using an oversized array on
> > >>> the union, like:
> > >>>
> > >>> #define ALT_CALL_ARG(arg, n)                                            \
> > >>>     union {                                                             \
> > >>>         typeof(arg) e[(sizeof(long) + sizeof(arg) - 1) / sizeof(arg)];  \
> > >>>         unsigned long r;                                                \
> > >>>     } a ## n ## __  = {                                                 \
> > >>>         .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
> > >>>     };                                                                  \
> > >>>     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) =      \
> > >>>         a ## n ## __.r
> > >>
> > >> Yet that's likely awful code-gen wise?
> > > 
> > > Seems OK: https://godbolt.org/z/nsdo5Gs8W
> >
> > In which case why not go this route. If the compiler is doing fine with
> > that, maybe the array dimension expression could be further simplified,
> > accepting yet more over-sizing? Like "sizeof(void *) / sizeof (arg) + 1"
> > or even simply "sizeof(void *)"? Suitably commented of course ...
> >
> > >> For the time being, can we perhaps
> > >> just tighten the BUILD_BUG_ON(), as iirc Alejandro had suggested?
> > > 
> > > My main concern with tightening the BUILD_BUG_ON() is that then I
> > > would also like to do so for the GCC one, so that build fails
> > > uniformly.
> >
> > If we were to take that route, then yes, probably should constrain both
> > (with a suitable comment on the gcc one).
> >
> > Jan
>
> Yet another way would be to have an intermediate `long` to cast onto. Compilers
> will optimise away the copy. It ignores the different-type aliasing rules in
> the C spec, so there's an assumption that we have -fno-strict-aliasing. But I
> belive we do? Otherwise it should pretty much work on anything.
>
> ```
>   #define ALT_CALL_ARG(arg, n)                                              \
>       unsigned long __tmp = 0;                                              \
>       *(typeof(arg) *)&__tmp =                                              \
>           ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })          \
>       register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = __tmp; \
> ```
>
> fwiw, clang18 emits identical code compared with the previous godbolt link.
>
> Link: https://godbolt.org/z/facd1M9xa
>
> Cheers,
> Alejandro

Bah. s/b/__tmp/ in line15. Same output though, so the point still stands.

Cheers,
Alejandro
Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Roger Pau Monné 1 month, 1 week ago
On Fri, Jul 26, 2024 at 03:25:08PM +0100, Alejandro Vallejo wrote:
> On Fri Jul 26, 2024 at 3:17 PM BST, Alejandro Vallejo wrote:
> > On Fri Jul 26, 2024 at 9:05 AM BST, Jan Beulich wrote:
> > > On 26.07.2024 09:52, Roger Pau Monné wrote:
> > > > On Fri, Jul 26, 2024 at 09:36:15AM +0200, Jan Beulich wrote:
> > > >> On 26.07.2024 09:31, Roger Pau Monné wrote:
> > > >>> On Thu, Jul 25, 2024 at 05:00:22PM +0200, Jan Beulich wrote:
> > > >>>> On 25.07.2024 16:54, Roger Pau Monné wrote:
> > > >>>>> On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
> > > >>>>>> On 25.07.2024 12:56, Roger Pau Monne wrote:
> > > >>>>>>> --- a/xen/arch/x86/include/asm/alternative.h
> > > >>>>>>> +++ b/xen/arch/x86/include/asm/alternative.h
> > > >>>>>>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
> > > >>>>>>>   * https://github.com/llvm/llvm-project/issues/82598
> > > >>>>>>>   */
> > > >>>>>>>  #define ALT_CALL_ARG(arg, n)                                            \
> > > >>>>>>> -    register union {                                                    \
> > > >>>>>>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
> > > >>>>>>> -        unsigned long r;                                                \
> > > >>>>>>> +    register struct {                                                   \
> > > >>>>>>> +        typeof(arg) e;                                                  \
> > > >>>>>>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
> > > >>>>>>
> > > >>>>>> One thing that occurred to me only after our discussion, and I then forgot
> > > >>>>>> to mention this before you would send a patch: What if sizeof(void *) ==
> > > >>>>>> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
> > > >>>>>> get rid of.
> > > >>>>>
> > > >>>>> I wondered about this, but I though it was only [] that we were trying
> > > >>>>> to get rid of, not [0].
> > > >>>>
> > > >>>> Sadly (here) it's actually the other way around, aiui.
> > > >>>
> > > >>> The only other option I have in mind is using an oversized array on
> > > >>> the union, like:
> > > >>>
> > > >>> #define ALT_CALL_ARG(arg, n)                                            \
> > > >>>     union {                                                             \
> > > >>>         typeof(arg) e[(sizeof(long) + sizeof(arg) - 1) / sizeof(arg)];  \
> > > >>>         unsigned long r;                                                \
> > > >>>     } a ## n ## __  = {                                                 \
> > > >>>         .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
> > > >>>     };                                                                  \
> > > >>>     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) =      \
> > > >>>         a ## n ## __.r
> > > >>
> > > >> Yet that's likely awful code-gen wise?
> > > > 
> > > > Seems OK: https://godbolt.org/z/nsdo5Gs8W
> > >
> > > In which case why not go this route. If the compiler is doing fine with
> > > that, maybe the array dimension expression could be further simplified,
> > > accepting yet more over-sizing? Like "sizeof(void *) / sizeof (arg) + 1"
> > > or even simply "sizeof(void *)"? Suitably commented of course ...
> > >
> > > >> For the time being, can we perhaps
> > > >> just tighten the BUILD_BUG_ON(), as iirc Alejandro had suggested?
> > > > 
> > > > My main concern with tightening the BUILD_BUG_ON() is that then I
> > > > would also like to do so for the GCC one, so that build fails
> > > > uniformly.
> > >
> > > If we were to take that route, then yes, probably should constrain both
> > > (with a suitable comment on the gcc one).
> > >
> > > Jan
> >
> > Yet another way would be to have an intermediate `long` to cast onto. Compilers
> > will optimise away the copy. It ignores the different-type aliasing rules in
> > the C spec, so there's an assumption that we have -fno-strict-aliasing. But I
> > belive we do? Otherwise it should pretty much work on anything.
> >
> > ```
> >   #define ALT_CALL_ARG(arg, n)                                              \
> >       unsigned long __tmp = 0;                                              \
> >       *(typeof(arg) *)&__tmp =                                              \
> >           ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })          \
> >       register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = __tmp; \
> > ```
> >
> > fwiw, clang18 emits identical code compared with the previous godbolt link.
> >
> > Link: https://godbolt.org/z/facd1M9xa
> >
> > Cheers,
> > Alejandro
> 
> Bah. s/b/__tmp/ in line15. Same output though, so the point still stands.

Had to adjust it to:

#define ALT_CALL_ARG(arg, n)                                              \
    unsigned long a ## n ## __ = 0;                                       \
    *(typeof(arg) *)&a ## n ## __ =                                       \
        ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); });         \
    register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = a ## n ## __

So that tmp__ is not defined multiple times for repeated
ALT_CALL_ARG() usages.

Already tried something like this in the past, but it mixes code with
declarations, and that's forbidden in the current C standard that Xen
uses:

./arch/x86/include/asm/hvm/hvm.h:665:5: error: mixing declarations and code is incompatible with standards before C99 [-Werror,-Wdeclaration-after-statement]

The `*(typeof(arg) *)&__tmp = ...` line is considered code, and is
followed by further declarations.  Even if we moved both declarations
ahead of the assigns it would still complain when multiple
ALT_CALL_ARG() instances are used in the same altcall block.

Thanks, Roger.

Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Alejandro Vallejo 1 month, 1 week ago
On Fri Jul 26, 2024 at 4:18 PM BST, Roger Pau Monné wrote:
> On Fri, Jul 26, 2024 at 03:25:08PM +0100, Alejandro Vallejo wrote:
> > On Fri Jul 26, 2024 at 3:17 PM BST, Alejandro Vallejo wrote:
> > > On Fri Jul 26, 2024 at 9:05 AM BST, Jan Beulich wrote:
> > > > On 26.07.2024 09:52, Roger Pau Monné wrote:
> > > > > On Fri, Jul 26, 2024 at 09:36:15AM +0200, Jan Beulich wrote:
> > > > >> On 26.07.2024 09:31, Roger Pau Monné wrote:
> > > > >>> On Thu, Jul 25, 2024 at 05:00:22PM +0200, Jan Beulich wrote:
> > > > >>>> On 25.07.2024 16:54, Roger Pau Monné wrote:
> > > > >>>>> On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
> > > > >>>>>> On 25.07.2024 12:56, Roger Pau Monne wrote:
> > > > >>>>>>> --- a/xen/arch/x86/include/asm/alternative.h
> > > > >>>>>>> +++ b/xen/arch/x86/include/asm/alternative.h
> > > > >>>>>>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
> > > > >>>>>>>   * https://github.com/llvm/llvm-project/issues/82598
> > > > >>>>>>>   */
> > > > >>>>>>>  #define ALT_CALL_ARG(arg, n)                                            \
> > > > >>>>>>> -    register union {                                                    \
> > > > >>>>>>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
> > > > >>>>>>> -        unsigned long r;                                                \
> > > > >>>>>>> +    register struct {                                                   \
> > > > >>>>>>> +        typeof(arg) e;                                                  \
> > > > >>>>>>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
> > > > >>>>>>
> > > > >>>>>> One thing that occurred to me only after our discussion, and I then forgot
> > > > >>>>>> to mention this before you would send a patch: What if sizeof(void *) ==
> > > > >>>>>> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
> > > > >>>>>> get rid of.
> > > > >>>>>
> > > > >>>>> I wondered about this, but I though it was only [] that we were trying
> > > > >>>>> to get rid of, not [0].
> > > > >>>>
> > > > >>>> Sadly (here) it's actually the other way around, aiui.
> > > > >>>
> > > > >>> The only other option I have in mind is using an oversized array on
> > > > >>> the union, like:
> > > > >>>
> > > > >>> #define ALT_CALL_ARG(arg, n)                                            \
> > > > >>>     union {                                                             \
> > > > >>>         typeof(arg) e[(sizeof(long) + sizeof(arg) - 1) / sizeof(arg)];  \
> > > > >>>         unsigned long r;                                                \
> > > > >>>     } a ## n ## __  = {                                                 \
> > > > >>>         .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
> > > > >>>     };                                                                  \
> > > > >>>     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) =      \
> > > > >>>         a ## n ## __.r
> > > > >>
> > > > >> Yet that's likely awful code-gen wise?
> > > > > 
> > > > > Seems OK: https://godbolt.org/z/nsdo5Gs8W
> > > >
> > > > In which case why not go this route. If the compiler is doing fine with
> > > > that, maybe the array dimension expression could be further simplified,
> > > > accepting yet more over-sizing? Like "sizeof(void *) / sizeof (arg) + 1"
> > > > or even simply "sizeof(void *)"? Suitably commented of course ...
> > > >
> > > > >> For the time being, can we perhaps
> > > > >> just tighten the BUILD_BUG_ON(), as iirc Alejandro had suggested?
> > > > > 
> > > > > My main concern with tightening the BUILD_BUG_ON() is that then I
> > > > > would also like to do so for the GCC one, so that build fails
> > > > > uniformly.
> > > >
> > > > If we were to take that route, then yes, probably should constrain both
> > > > (with a suitable comment on the gcc one).
> > > >
> > > > Jan
> > >
> > > Yet another way would be to have an intermediate `long` to cast onto. Compilers
> > > will optimise away the copy. It ignores the different-type aliasing rules in
> > > the C spec, so there's an assumption that we have -fno-strict-aliasing. But I
> > > belive we do? Otherwise it should pretty much work on anything.
> > >
> > > ```
> > >   #define ALT_CALL_ARG(arg, n)                                              \
> > >       unsigned long __tmp = 0;                                              \
> > >       *(typeof(arg) *)&__tmp =                                              \
> > >           ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })          \
> > >       register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = __tmp; \
> > > ```
> > >
> > > fwiw, clang18 emits identical code compared with the previous godbolt link.
> > >
> > > Link: https://godbolt.org/z/facd1M9xa
> > >
> > > Cheers,
> > > Alejandro
> > 
> > Bah. s/b/__tmp/ in line15. Same output though, so the point still stands.
>
> Had to adjust it to:
>
> #define ALT_CALL_ARG(arg, n)                                              \
>     unsigned long a ## n ## __ = 0;                                       \
>     *(typeof(arg) *)&a ## n ## __ =                                       \
>         ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); });         \
>     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = a ## n ## __
>
> So that tmp__ is not defined multiple times for repeated
> ALT_CALL_ARG() usages.
>
> Already tried something like this in the past, but it mixes code with
> declarations, and that's forbidden in the current C standard that Xen
> uses:
>
> ./arch/x86/include/asm/hvm/hvm.h:665:5: error: mixing declarations and code is incompatible with standards before C99 [-Werror,-Wdeclaration-after-statement]
>
> The `*(typeof(arg) *)&__tmp = ...` line is considered code, and is
> followed by further declarations.  Even if we moved both declarations
> ahead of the assigns it would still complain when multiple
> ALT_CALL_ARG() instances are used in the same altcall block.
>
> Thanks, Roger.

That _was_ forbidden in C89, but it has been allowed since. We have a warning
enabled to cause it to fail even if we always use C99-compatible compilers. I
think we should change that.

Regardless, I think it can be worked around. This compiles (otherwise
untested):

#define ALT_CALL_ARG(arg, n)
    register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = ({   \
        unsigned long tmp = 0;                                          \
        *(typeof(arg) *)&a ## n ## __ = (arg);                          \
        BUILD_BUG_ON(sizeof(arg) > sizeof(void *));                     \
        tmp;                                                            \
    })

That said, if the oversized temp union works, I'm fine with that too.

Cheers,
Alejandro
Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Alejandro Vallejo 1 month, 1 week ago
On Fri Jul 26, 2024 at 5:25 PM BST, Alejandro Vallejo wrote:
> On Fri Jul 26, 2024 at 4:18 PM BST, Roger Pau Monné wrote:
> > On Fri, Jul 26, 2024 at 03:25:08PM +0100, Alejandro Vallejo wrote:
> > > On Fri Jul 26, 2024 at 3:17 PM BST, Alejandro Vallejo wrote:
> > > > On Fri Jul 26, 2024 at 9:05 AM BST, Jan Beulich wrote:
> > > > > On 26.07.2024 09:52, Roger Pau Monné wrote:
> > > > > > On Fri, Jul 26, 2024 at 09:36:15AM +0200, Jan Beulich wrote:
> > > > > >> On 26.07.2024 09:31, Roger Pau Monné wrote:
> > > > > >>> On Thu, Jul 25, 2024 at 05:00:22PM +0200, Jan Beulich wrote:
> > > > > >>>> On 25.07.2024 16:54, Roger Pau Monné wrote:
> > > > > >>>>> On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
> > > > > >>>>>> On 25.07.2024 12:56, Roger Pau Monne wrote:
> > > > > >>>>>>> --- a/xen/arch/x86/include/asm/alternative.h
> > > > > >>>>>>> +++ b/xen/arch/x86/include/asm/alternative.h
> > > > > >>>>>>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
> > > > > >>>>>>>   * https://github.com/llvm/llvm-project/issues/82598
> > > > > >>>>>>>   */
> > > > > >>>>>>>  #define ALT_CALL_ARG(arg, n)                                            \
> > > > > >>>>>>> -    register union {                                                    \
> > > > > >>>>>>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
> > > > > >>>>>>> -        unsigned long r;                                                \
> > > > > >>>>>>> +    register struct {                                                   \
> > > > > >>>>>>> +        typeof(arg) e;                                                  \
> > > > > >>>>>>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
> > > > > >>>>>>
> > > > > >>>>>> One thing that occurred to me only after our discussion, and I then forgot
> > > > > >>>>>> to mention this before you would send a patch: What if sizeof(void *) ==
> > > > > >>>>>> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
> > > > > >>>>>> get rid of.
> > > > > >>>>>
> > > > > >>>>> I wondered about this, but I though it was only [] that we were trying
> > > > > >>>>> to get rid of, not [0].
> > > > > >>>>
> > > > > >>>> Sadly (here) it's actually the other way around, aiui.
> > > > > >>>
> > > > > >>> The only other option I have in mind is using an oversized array on
> > > > > >>> the union, like:
> > > > > >>>
> > > > > >>> #define ALT_CALL_ARG(arg, n)                                            \
> > > > > >>>     union {                                                             \
> > > > > >>>         typeof(arg) e[(sizeof(long) + sizeof(arg) - 1) / sizeof(arg)];  \
> > > > > >>>         unsigned long r;                                                \
> > > > > >>>     } a ## n ## __  = {                                                 \
> > > > > >>>         .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
> > > > > >>>     };                                                                  \
> > > > > >>>     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) =      \
> > > > > >>>         a ## n ## __.r
> > > > > >>
> > > > > >> Yet that's likely awful code-gen wise?
> > > > > > 
> > > > > > Seems OK: https://godbolt.org/z/nsdo5Gs8W
> > > > >
> > > > > In which case why not go this route. If the compiler is doing fine with
> > > > > that, maybe the array dimension expression could be further simplified,
> > > > > accepting yet more over-sizing? Like "sizeof(void *) / sizeof (arg) + 1"
> > > > > or even simply "sizeof(void *)"? Suitably commented of course ...
> > > > >
> > > > > >> For the time being, can we perhaps
> > > > > >> just tighten the BUILD_BUG_ON(), as iirc Alejandro had suggested?
> > > > > > 
> > > > > > My main concern with tightening the BUILD_BUG_ON() is that then I
> > > > > > would also like to do so for the GCC one, so that build fails
> > > > > > uniformly.
> > > > >
> > > > > If we were to take that route, then yes, probably should constrain both
> > > > > (with a suitable comment on the gcc one).
> > > > >
> > > > > Jan
> > > >
> > > > Yet another way would be to have an intermediate `long` to cast onto. Compilers
> > > > will optimise away the copy. It ignores the different-type aliasing rules in
> > > > the C spec, so there's an assumption that we have -fno-strict-aliasing. But I
> > > > belive we do? Otherwise it should pretty much work on anything.
> > > >
> > > > ```
> > > >   #define ALT_CALL_ARG(arg, n)                                              \
> > > >       unsigned long __tmp = 0;                                              \
> > > >       *(typeof(arg) *)&__tmp =                                              \
> > > >           ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })          \
> > > >       register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = __tmp; \
> > > > ```
> > > >
> > > > fwiw, clang18 emits identical code compared with the previous godbolt link.
> > > >
> > > > Link: https://godbolt.org/z/facd1M9xa
> > > >
> > > > Cheers,
> > > > Alejandro
> > > 
> > > Bah. s/b/__tmp/ in line15. Same output though, so the point still stands.
> >
> > Had to adjust it to:
> >
> > #define ALT_CALL_ARG(arg, n)                                              \
> >     unsigned long a ## n ## __ = 0;                                       \
> >     *(typeof(arg) *)&a ## n ## __ =                                       \
> >         ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); });         \
> >     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = a ## n ## __
> >
> > So that tmp__ is not defined multiple times for repeated
> > ALT_CALL_ARG() usages.
> >
> > Already tried something like this in the past, but it mixes code with
> > declarations, and that's forbidden in the current C standard that Xen
> > uses:
> >
> > ./arch/x86/include/asm/hvm/hvm.h:665:5: error: mixing declarations and code is incompatible with standards before C99 [-Werror,-Wdeclaration-after-statement]
> >
> > The `*(typeof(arg) *)&__tmp = ...` line is considered code, and is
> > followed by further declarations.  Even if we moved both declarations
> > ahead of the assigns it would still complain when multiple
> > ALT_CALL_ARG() instances are used in the same altcall block.
> >
> > Thanks, Roger.
>
> That _was_ forbidden in C89, but it has been allowed since. We have a warning
> enabled to cause it to fail even if we always use C99-compatible compilers. I
> think we should change that.
>
> Regardless, I think it can be worked around. This compiles (otherwise
> untested):
>
> #define ALT_CALL_ARG(arg, n)
>     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = ({   \
>         unsigned long tmp = 0;                                          \
>         *(typeof(arg) *)&a ## n ## __ = (arg);                          \
>         BUILD_BUG_ON(sizeof(arg) > sizeof(void *));                     \
>         tmp;                                                            \
>     })
>
> That said, if the oversized temp union works, I'm fine with that too.
>
> Cheers,
> Alejandro

Sigh... I'm going at 1 mistake per email today. I meant:

#define ALT_CALL_ARG(arg, n)
     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = ({   \
         unsigned long tmp = 0;                                          \
         *(typeof(arg) *)&tmp = (arg);                                   \
         BUILD_BUG_ON(sizeof(arg) > sizeof(void *));                     \
         tmp;                                                            \
     })

Cheers,
Alejandro
Re: [PATCH] x86/altcall: further refine clang workaround
Posted by Roger Pau Monné 1 month, 1 week ago
On Fri, Jul 26, 2024 at 05:38:11PM +0100, Alejandro Vallejo wrote:
> On Fri Jul 26, 2024 at 5:25 PM BST, Alejandro Vallejo wrote:
> > On Fri Jul 26, 2024 at 4:18 PM BST, Roger Pau Monné wrote:
> > > On Fri, Jul 26, 2024 at 03:25:08PM +0100, Alejandro Vallejo wrote:
> > > > On Fri Jul 26, 2024 at 3:17 PM BST, Alejandro Vallejo wrote:
> > > > > On Fri Jul 26, 2024 at 9:05 AM BST, Jan Beulich wrote:
> > > > > > On 26.07.2024 09:52, Roger Pau Monné wrote:
> > > > > > > On Fri, Jul 26, 2024 at 09:36:15AM +0200, Jan Beulich wrote:
> > > > > > >> On 26.07.2024 09:31, Roger Pau Monné wrote:
> > > > > > >>> On Thu, Jul 25, 2024 at 05:00:22PM +0200, Jan Beulich wrote:
> > > > > > >>>> On 25.07.2024 16:54, Roger Pau Monné wrote:
> > > > > > >>>>> On Thu, Jul 25, 2024 at 03:18:29PM +0200, Jan Beulich wrote:
> > > > > > >>>>>> On 25.07.2024 12:56, Roger Pau Monne wrote:
> > > > > > >>>>>>> --- a/xen/arch/x86/include/asm/alternative.h
> > > > > > >>>>>>> +++ b/xen/arch/x86/include/asm/alternative.h
> > > > > > >>>>>>> @@ -184,11 +184,11 @@ extern void alternative_branches(void);
> > > > > > >>>>>>>   * https://github.com/llvm/llvm-project/issues/82598
> > > > > > >>>>>>>   */
> > > > > > >>>>>>>  #define ALT_CALL_ARG(arg, n)                                            \
> > > > > > >>>>>>> -    register union {                                                    \
> > > > > > >>>>>>> -        typeof(arg) e[sizeof(long) / sizeof(arg)];                      \
> > > > > > >>>>>>> -        unsigned long r;                                                \
> > > > > > >>>>>>> +    register struct {                                                   \
> > > > > > >>>>>>> +        typeof(arg) e;                                                  \
> > > > > > >>>>>>> +        char pad[sizeof(void *) - sizeof(arg)];                         \
> > > > > > >>>>>>
> > > > > > >>>>>> One thing that occurred to me only after our discussion, and I then forgot
> > > > > > >>>>>> to mention this before you would send a patch: What if sizeof(void *) ==
> > > > > > >>>>>> sizeof(arg)? Zero-sized arrays are explicitly something we're trying to
> > > > > > >>>>>> get rid of.
> > > > > > >>>>>
> > > > > > >>>>> I wondered about this, but I though it was only [] that we were trying
> > > > > > >>>>> to get rid of, not [0].
> > > > > > >>>>
> > > > > > >>>> Sadly (here) it's actually the other way around, aiui.
> > > > > > >>>
> > > > > > >>> The only other option I have in mind is using an oversized array on
> > > > > > >>> the union, like:
> > > > > > >>>
> > > > > > >>> #define ALT_CALL_ARG(arg, n)                                            \
> > > > > > >>>     union {                                                             \
> > > > > > >>>         typeof(arg) e[(sizeof(long) + sizeof(arg) - 1) / sizeof(arg)];  \
> > > > > > >>>         unsigned long r;                                                \
> > > > > > >>>     } a ## n ## __  = {                                                 \
> > > > > > >>>         .e[0] = ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })\
> > > > > > >>>     };                                                                  \
> > > > > > >>>     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) =      \
> > > > > > >>>         a ## n ## __.r
> > > > > > >>
> > > > > > >> Yet that's likely awful code-gen wise?
> > > > > > > 
> > > > > > > Seems OK: https://godbolt.org/z/nsdo5Gs8W
> > > > > >
> > > > > > In which case why not go this route. If the compiler is doing fine with
> > > > > > that, maybe the array dimension expression could be further simplified,
> > > > > > accepting yet more over-sizing? Like "sizeof(void *) / sizeof (arg) + 1"
> > > > > > or even simply "sizeof(void *)"? Suitably commented of course ...
> > > > > >
> > > > > > >> For the time being, can we perhaps
> > > > > > >> just tighten the BUILD_BUG_ON(), as iirc Alejandro had suggested?
> > > > > > > 
> > > > > > > My main concern with tightening the BUILD_BUG_ON() is that then I
> > > > > > > would also like to do so for the GCC one, so that build fails
> > > > > > > uniformly.
> > > > > >
> > > > > > If we were to take that route, then yes, probably should constrain both
> > > > > > (with a suitable comment on the gcc one).
> > > > > >
> > > > > > Jan
> > > > >
> > > > > Yet another way would be to have an intermediate `long` to cast onto. Compilers
> > > > > will optimise away the copy. It ignores the different-type aliasing rules in
> > > > > the C spec, so there's an assumption that we have -fno-strict-aliasing. But I
> > > > > belive we do? Otherwise it should pretty much work on anything.
> > > > >
> > > > > ```
> > > > >   #define ALT_CALL_ARG(arg, n)                                              \
> > > > >       unsigned long __tmp = 0;                                              \
> > > > >       *(typeof(arg) *)&__tmp =                                              \
> > > > >           ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); })          \
> > > > >       register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = __tmp; \
> > > > > ```
> > > > >
> > > > > fwiw, clang18 emits identical code compared with the previous godbolt link.
> > > > >
> > > > > Link: https://godbolt.org/z/facd1M9xa
> > > > >
> > > > > Cheers,
> > > > > Alejandro
> > > > 
> > > > Bah. s/b/__tmp/ in line15. Same output though, so the point still stands.
> > >
> > > Had to adjust it to:
> > >
> > > #define ALT_CALL_ARG(arg, n)                                              \
> > >     unsigned long a ## n ## __ = 0;                                       \
> > >     *(typeof(arg) *)&a ## n ## __ =                                       \
> > >         ({ BUILD_BUG_ON(sizeof(arg) > sizeof(void *)); (arg); });         \
> > >     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = a ## n ## __
> > >
> > > So that tmp__ is not defined multiple times for repeated
> > > ALT_CALL_ARG() usages.
> > >
> > > Already tried something like this in the past, but it mixes code with
> > > declarations, and that's forbidden in the current C standard that Xen
> > > uses:
> > >
> > > ./arch/x86/include/asm/hvm/hvm.h:665:5: error: mixing declarations and code is incompatible with standards before C99 [-Werror,-Wdeclaration-after-statement]
> > >
> > > The `*(typeof(arg) *)&__tmp = ...` line is considered code, and is
> > > followed by further declarations.  Even if we moved both declarations
> > > ahead of the assigns it would still complain when multiple
> > > ALT_CALL_ARG() instances are used in the same altcall block.
> > >
> > > Thanks, Roger.
> >
> > That _was_ forbidden in C89, but it has been allowed since. We have a warning
> > enabled to cause it to fail even if we always use C99-compatible compilers. I
> > think we should change that.
> >
> > Regardless, I think it can be worked around. This compiles (otherwise
> > untested):
> >
> > #define ALT_CALL_ARG(arg, n)
> >     register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = ({   \
> >         unsigned long tmp = 0;                                          \
> >         *(typeof(arg) *)&a ## n ## __ = (arg);                          \
> >         BUILD_BUG_ON(sizeof(arg) > sizeof(void *));                     \
> >         tmp;                                                            \
> >     })
> >
> > That said, if the oversized temp union works, I'm fine with that too.
> >
> > Cheers,
> > Alejandro
> 
> Sigh... I'm going at 1 mistake per email today. I meant:
> 
> #define ALT_CALL_ARG(arg, n)
>      register unsigned long a ## n ## _ asm ( ALT_CALL_arg ## n ) = ({   \
>          unsigned long tmp = 0;                                          \
>          *(typeof(arg) *)&tmp = (arg);                                   \
>          BUILD_BUG_ON(sizeof(arg) > sizeof(void *));                     \
>          tmp;                                                            \
>      })

I see, so placing the temp variable on an inner scope.  The code
generated seems to be the same from godbolt.  I don't have a strong
opinion on using either.  Your suggestion is slightly shorter than the
union one, so I might go for it.

Thanks, Roger.