[PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible

Andrew Cooper posted 4 patches 4 years, 2 months ago
There is a newer version of this series
[PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible
Posted by Andrew Cooper 4 years, 2 months ago
With altcall, we convert indirect branches into direct ones.  With that
complete, none of the potential targets need an endbr64 instruction.

Furthermore, removing the endbr64 instructions is a security defence-in-depth
improvement, because it limits the options available to an attacker who has
managed to hijack a function pointer.

Introduce a new .init.data.cf_clobber section.  Have _apply_alternatives()
walk over the entire section, looking for any pointers into .text, and clobber
an endbr64 instruction if found.  This is some minor structure (ab)use but it
works alarmingly well.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

It would be nice for the printk() to say "optimised away %u of %u", but the
latter number can only feasibly come from post-processing of xen-syms during
the build.
---
 xen/arch/x86/alternative.c | 38 ++++++++++++++++++++++++++++++++++++++
 xen/arch/x86/xen.lds.S     |  5 +++++
 xen/include/xen/init.h     |  2 ++
 3 files changed, 45 insertions(+)

diff --git a/xen/arch/x86/alternative.c b/xen/arch/x86/alternative.c
index 5ae4c80d5119..65fc8534b97f 100644
--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -173,6 +173,9 @@ text_poke(void *addr, const void *opcode, size_t len)
     return memcpy(addr, opcode, len);
 }
 
+extern unsigned long __initdata_cf_clobber_start[];
+extern unsigned long __initdata_cf_clobber_end[];
+
 /*
  * Replace instructions with better alternatives for this CPU type.
  * This runs before SMP is initialized to avoid SMP problems with
@@ -329,6 +332,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
         add_nops(buf + a->repl_len, total_len - a->repl_len);
         text_poke(orig, buf, total_len);
     }
+
+    /*
+     * Clobber endbr64 instructions now that altcall has finished optimised
+     * all indirect branches to direct ones.
+     */
+    if ( force && cpu_has_xen_ibt )
+    {
+        unsigned long *val;
+        unsigned int clobbered = 0;
+
+        /*
+         * This is some minor structure (ab)use.  We walk the entire contents
+         * of .init.data.cf_clobber as if it were an array of pointers.
+         *
+         * If the pointer points into .text, and has an endbr64 instruction,
+         * nop out the endbr64.  This causes the pointer to no longer be a
+         * legal indirect branch target under CET-IBT.  This is a
+         * defence-in-depth measure, to reduce the options available to an
+         * adversary who has managed to hijack a function pointer.
+         */
+        for ( val = __initdata_cf_clobber_start;
+              val < __initdata_cf_clobber_end;
+              val++ )
+        {
+            void *ptr = (void *)*val;
+
+            if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
+                continue;
+
+            add_nops(ptr, 4);
+            clobbered++;
+        }
+
+        printk("altcall: Optimised away %u endbr64 instructions\n", clobbered);
+    }
 }
 
 void init_or_livepatch apply_alternatives(struct alt_instr *start,
diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
index 87e344d4dd97..5b16a98e4df1 100644
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -214,6 +214,11 @@ SECTIONS
        *(.initcall1.init)
        __initcall_end = .;
 
+       . = ALIGN(POINTER_ALIGN);
+        __initdata_cf_clobber_start = .;
+	*(.init.data.cf_clobber)
+        __initdata_cf_clobber_end = .;
+
        *(.init.data)
        *(.init.data.rel)
        *(.init.data.rel.*)
diff --git a/xen/include/xen/init.h b/xen/include/xen/init.h
index bfe789e93f6b..66b324892a52 100644
--- a/xen/include/xen/init.h
+++ b/xen/include/xen/init.h
@@ -18,6 +18,8 @@
 #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
 #define __exit_call       __used_section(".exitcall.exit")
 
+#define __initdata_cf_clobber __section(".init.data.cf_clobber")
+
 /* These macros are used to mark some functions or 
  * initialized data (doesn't apply to uninitialized data)
  * as `initialization' functions. The kernel can take this
-- 
2.11.0


Re: [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible
Posted by Jan Beulich 4 years, 2 months ago
On 26.11.2021 22:22, Andrew Cooper wrote:
> With altcall, we convert indirect branches into direct ones.  With that
> complete, none of the potential targets need an endbr64 instruction.

Assuming that no other hooks remain which re-use the same function. I
think this constraint wants at least mentioning explicitly.

> Furthermore, removing the endbr64 instructions is a security defence-in-depth
> improvement, because it limits the options available to an attacker who has
> managed to hijack a function pointer.
> 
> Introduce a new .init.data.cf_clobber section.  Have _apply_alternatives()
> walk over the entire section, looking for any pointers into .text, and clobber
> an endbr64 instruction if found.  This is some minor structure (ab)use but it
> works alarmingly well.

Iirc you've said more than once that non-function-pointer data in
those structures is fine; I'm not convinced. What if a sequence of
sub-pointer-size fields has a value looking like a pointer into
.text? This may not be very likely, but would result in corruption
that may be hard to associate with anything. Of course, with the
is_endbr64() check and with a build time check of there not being
any stray ENDBR64 patterns in .text, that issue would disappear.
But we aren't quite there yet.

> --- a/xen/arch/x86/alternative.c
> +++ b/xen/arch/x86/alternative.c
> @@ -173,6 +173,9 @@ text_poke(void *addr, const void *opcode, size_t len)
>      return memcpy(addr, opcode, len);
>  }
>  
> +extern unsigned long __initdata_cf_clobber_start[];
> +extern unsigned long __initdata_cf_clobber_end[];

const please. I also would find it quite a bit better if these
were suitably typed such that ...

> @@ -329,6 +332,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>          add_nops(buf + a->repl_len, total_len - a->repl_len);
>          text_poke(orig, buf, total_len);
>      }
> +
> +    /*
> +     * Clobber endbr64 instructions now that altcall has finished optimised
> +     * all indirect branches to direct ones.
> +     */
> +    if ( force && cpu_has_xen_ibt )
> +    {
> +        unsigned long *val;
> +        unsigned int clobbered = 0;
> +
> +        /*
> +         * This is some minor structure (ab)use.  We walk the entire contents
> +         * of .init.data.cf_clobber as if it were an array of pointers.
> +         *
> +         * If the pointer points into .text, and has an endbr64 instruction,
> +         * nop out the endbr64.  This causes the pointer to no longer be a
> +         * legal indirect branch target under CET-IBT.  This is a
> +         * defence-in-depth measure, to reduce the options available to an
> +         * adversary who has managed to hijack a function pointer.
> +         */
> +        for ( val = __initdata_cf_clobber_start;
> +              val < __initdata_cf_clobber_end;
> +              val++ )
> +        {
> +            void *ptr = (void *)*val;

... no cast was needed here.

> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -214,6 +214,11 @@ SECTIONS
>         *(.initcall1.init)
>         __initcall_end = .;
>  
> +       . = ALIGN(POINTER_ALIGN);
> +        __initdata_cf_clobber_start = .;
> +	*(.init.data.cf_clobber)

Nit: hard tab slipped in here.

> --- a/xen/include/xen/init.h
> +++ b/xen/include/xen/init.h
> @@ -18,6 +18,8 @@
>  #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
>  #define __exit_call       __used_section(".exitcall.exit")
>  
> +#define __initdata_cf_clobber __section(".init.data.cf_clobber")

Just to repeat what I've said elsewhere: I think we want a const
version of this as well.

Jan


Re: [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible
Posted by Andrew Cooper 4 years, 2 months ago
On 01/12/2021 08:20, Jan Beulich wrote:
> On 26.11.2021 22:22, Andrew Cooper wrote:
>> With altcall, we convert indirect branches into direct ones.  With that
>> complete, none of the potential targets need an endbr64 instruction.
> Assuming that no other hooks remain which re-use the same function. I
> think this constraint wants at least mentioning explicitly.

Fair point, but I think it is entirely reasonable to expect logic not to
mix and match altcall on the same hook.

>
>> Furthermore, removing the endbr64 instructions is a security defence-in-depth
>> improvement, because it limits the options available to an attacker who has
>> managed to hijack a function pointer.
>>
>> Introduce a new .init.data.cf_clobber section.  Have _apply_alternatives()
>> walk over the entire section, looking for any pointers into .text, and clobber
>> an endbr64 instruction if found.  This is some minor structure (ab)use but it
>> works alarmingly well.
> Iirc you've said more than once that non-function-pointer data in
> those structures is fine; I'm not convinced. What if a sequence of
> sub-pointer-size fields has a value looking like a pointer into
> .text? This may not be very likely, but would result in corruption
> that may be hard to associate with anything. Of course, with the
> is_endbr64() check and with a build time check of there not being
> any stray ENDBR64 patterns in .text, that issue would disappear.
> But we aren't quite there yet.

I disagree with "not very likely" and put it firmly in the "not
plausible" category.

To cause a problem, you need an aligned something which isn't actually a
function pointer with a bit pattern forming [0xffff82d040200000,
ffff82d04039e1ba) which hits an ENDBR64 pattern.  Removing the stray
ENDBR64's doesn't prevent such a bit pattern pointing at a real (wrong)
function.

These structures are almost exclusively compile time generated.

So yes - it's not impossible, but it's also not going to happen
accidentally.


>
>> --- a/xen/arch/x86/alternative.c
>> +++ b/xen/arch/x86/alternative.c
>> @@ -173,6 +173,9 @@ text_poke(void *addr, const void *opcode, size_t len)
>>      return memcpy(addr, opcode, len);
>>  }
>>  
>> +extern unsigned long __initdata_cf_clobber_start[];
>> +extern unsigned long __initdata_cf_clobber_end[];
> const please. I also would find it quite a bit better if these
> were suitably typed such that ...
>
>> @@ -329,6 +332,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>>          add_nops(buf + a->repl_len, total_len - a->repl_len);
>>          text_poke(orig, buf, total_len);
>>      }
>> +
>> +    /*
>> +     * Clobber endbr64 instructions now that altcall has finished optimised
>> +     * all indirect branches to direct ones.
>> +     */
>> +    if ( force && cpu_has_xen_ibt )
>> +    {
>> +        unsigned long *val;
>> +        unsigned int clobbered = 0;
>> +
>> +        /*
>> +         * This is some minor structure (ab)use.  We walk the entire contents
>> +         * of .init.data.cf_clobber as if it were an array of pointers.
>> +         *
>> +         * If the pointer points into .text, and has an endbr64 instruction,
>> +         * nop out the endbr64.  This causes the pointer to no longer be a
>> +         * legal indirect branch target under CET-IBT.  This is a
>> +         * defence-in-depth measure, to reduce the options available to an
>> +         * adversary who has managed to hijack a function pointer.
>> +         */
>> +        for ( val = __initdata_cf_clobber_start;
>> +              val < __initdata_cf_clobber_end;
>> +              val++ )
>> +        {
>> +            void *ptr = (void *)*val;
> ... no cast was needed here.

Unless you know what this type is, I already tried and am stuck. 
Everything else requires more horrible casts on val.

>
>> --- a/xen/arch/x86/xen.lds.S
>> +++ b/xen/arch/x86/xen.lds.S
>> @@ -214,6 +214,11 @@ SECTIONS
>>         *(.initcall1.init)
>>         __initcall_end = .;
>>  
>> +       . = ALIGN(POINTER_ALIGN);
>> +        __initdata_cf_clobber_start = .;
>> +	*(.init.data.cf_clobber)
> Nit: hard tab slipped in here.

Will fix.

>
>> --- a/xen/include/xen/init.h
>> +++ b/xen/include/xen/init.h
>> @@ -18,6 +18,8 @@
>>  #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
>>  #define __exit_call       __used_section(".exitcall.exit")
>>  
>> +#define __initdata_cf_clobber __section(".init.data.cf_clobber")
> Just to repeat what I've said elsewhere: I think we want a const
> version of this as well.

I can, but does it really matter?  initconst is merged into initdata and
not actually read-only to begin with.

~Andrew

Re: [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible
Posted by Jan Beulich 4 years, 2 months ago
On 01.12.2021 20:07, Andrew Cooper wrote:
> On 01/12/2021 08:20, Jan Beulich wrote:
>> On 26.11.2021 22:22, Andrew Cooper wrote:
>>> With altcall, we convert indirect branches into direct ones.  With that
>>> complete, none of the potential targets need an endbr64 instruction.
>> Assuming that no other hooks remain which re-use the same function. I
>> think this constraint wants at least mentioning explicitly.
> 
> Fair point, but I think it is entirely reasonable to expect logic not to
> mix and match altcall on the same hook.

Take XSM's silo_xsm_ops and dummy_ops as an example. With what
xsm_fixup_ops() does it should be entirely benign if silo_xsm_ops
set any or all of the hooks which are currently unset to what
dummy_ops has.

>>> Furthermore, removing the endbr64 instructions is a security defence-in-depth
>>> improvement, because it limits the options available to an attacker who has
>>> managed to hijack a function pointer.
>>>
>>> Introduce a new .init.data.cf_clobber section.  Have _apply_alternatives()
>>> walk over the entire section, looking for any pointers into .text, and clobber
>>> an endbr64 instruction if found.  This is some minor structure (ab)use but it
>>> works alarmingly well.
>> Iirc you've said more than once that non-function-pointer data in
>> those structures is fine; I'm not convinced. What if a sequence of
>> sub-pointer-size fields has a value looking like a pointer into
>> .text? This may not be very likely, but would result in corruption
>> that may be hard to associate with anything. Of course, with the
>> is_endbr64() check and with a build time check of there not being
>> any stray ENDBR64 patterns in .text, that issue would disappear.
>> But we aren't quite there yet.
> 
> I disagree with "not very likely" and put it firmly in the "not
> plausible" category.
> 
> To cause a problem, you need an aligned something which isn't actually a
> function pointer with a bit pattern forming [0xffff82d040200000,
> ffff82d04039e1ba) which hits an ENDBR64 pattern.  Removing the stray
> ENDBR64's doesn't prevent such a bit pattern pointing at a real (wrong)
> function.

Why "aligned" in "aligned something"? And I also don't see what you're
trying to tell me with the last sentence. It's still .text corruption
that would result if such a pattern (crossing an insn boundary)
happened to be pointed at.

> These structures are almost exclusively compile time generated.
> 
> So yes - it's not impossible, but it's also not going to happen
> accidentally.

I wonder how you mean to exclude such accidents. It occurs to me that
checking the linked binary for the pattern isn't going to be enough.
Such a patter could also form with alternatives patching. (It's all
quite unlikely, yes, but imo we need to fully exclude the possibility.)

>>> --- a/xen/arch/x86/alternative.c
>>> +++ b/xen/arch/x86/alternative.c
>>> @@ -173,6 +173,9 @@ text_poke(void *addr, const void *opcode, size_t len)
>>>      return memcpy(addr, opcode, len);
>>>  }
>>>  
>>> +extern unsigned long __initdata_cf_clobber_start[];
>>> +extern unsigned long __initdata_cf_clobber_end[];
>> const please. I also would find it quite a bit better if these
>> were suitably typed such that ...
>>
>>> @@ -329,6 +332,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>>>          add_nops(buf + a->repl_len, total_len - a->repl_len);
>>>          text_poke(orig, buf, total_len);
>>>      }
>>> +
>>> +    /*
>>> +     * Clobber endbr64 instructions now that altcall has finished optimised
>>> +     * all indirect branches to direct ones.
>>> +     */
>>> +    if ( force && cpu_has_xen_ibt )
>>> +    {
>>> +        unsigned long *val;
>>> +        unsigned int clobbered = 0;
>>> +
>>> +        /*
>>> +         * This is some minor structure (ab)use.  We walk the entire contents
>>> +         * of .init.data.cf_clobber as if it were an array of pointers.
>>> +         *
>>> +         * If the pointer points into .text, and has an endbr64 instruction,
>>> +         * nop out the endbr64.  This causes the pointer to no longer be a
>>> +         * legal indirect branch target under CET-IBT.  This is a
>>> +         * defence-in-depth measure, to reduce the options available to an
>>> +         * adversary who has managed to hijack a function pointer.
>>> +         */
>>> +        for ( val = __initdata_cf_clobber_start;
>>> +              val < __initdata_cf_clobber_end;
>>> +              val++ )
>>> +        {
>>> +            void *ptr = (void *)*val;
>> ... no cast was needed here.
> 
> Unless you know what this type is, I already tried and am stuck. 
> Everything else requires more horrible casts on val.

It's as simple as I thought is would be; proposed respective patch
at the end of the mail (the two //temp-marked #define-s were needed so
I could build-test this without needing to pull in further patches of
yours). No new casts at all, and the one gone that I wanted to see
eliminated.

>>> --- a/xen/include/xen/init.h
>>> +++ b/xen/include/xen/init.h
>>> @@ -18,6 +18,8 @@
>>>  #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
>>>  #define __exit_call       __used_section(".exitcall.exit")
>>>  
>>> +#define __initdata_cf_clobber __section(".init.data.cf_clobber")
>> Just to repeat what I've said elsewhere: I think we want a const
>> version of this as well.
> 
> I can, but does it really matter?  initconst is merged into initdata and
> not actually read-only to begin with.

My remark wasn't about the actual mapping properties at all. What I'm
after is the compiler being able to spot modifications. If I see a
struct instance marked "const" and if I know the thing builds okay, I
know I don't need to go hunt for possible writes to this struct
instance. When it's non-const, to be sure there's no possible conflict
with the patching (yours or just the altcall part), I'd need to find
and verify all instances where the object gets written to.

Jan

**********************************************************************

--- a/xen/arch/x86/alternative.c
+++ b/xen/arch/x86/alternative.c
@@ -28,6 +28,9 @@
 #include <asm/nops.h>
 #include <xen/livepatch.h>
 
+#define cpu_has_xen_ibt true//temp
+#define is_endbr64(p) false//temp
+
 #define MAX_PATCH_LEN (255-1)
 
 extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
@@ -174,6 +177,9 @@ text_poke(void *addr, const void *opcode
     cpuid_eax(0);
 }
 
+extern void *const __initdata_cf_clobber_start[];
+extern void *const __initdata_cf_clobber_end[];
+
 /*
  * Replace instructions with better alternatives for this CPU type.
  * This runs before SMP is initialized to avoid SMP problems with
@@ -309,6 +315,41 @@ static void init_or_livepatch _apply_alt
         add_nops(buf + a->repl_len, total_len - a->repl_len);
         text_poke(orig, buf, total_len);
     }
+
+    /*
+     * Clobber endbr64 instructions now that altcall has finished optimised
+     * all indirect branches to direct ones.
+     */
+    if ( force && cpu_has_xen_ibt )
+    {
+        void *const *val;
+        unsigned int clobbered = 0;
+
+        /*
+         * This is some minor structure (ab)use.  We walk the entire contents
+         * of .init.data.cf_clobber as if it were an array of pointers.
+         *
+         * If the pointer points into .text, and has an endbr64 instruction,
+         * nop out the endbr64.  This causes the pointer to no longer be a
+         * legal indirect branch target under CET-IBT.  This is a
+         * defence-in-depth measure, to reduce the options available to an
+         * adversary who has managed to hijack a function pointer.
+         */
+        for ( val = __initdata_cf_clobber_start;
+              val < __initdata_cf_clobber_end;
+              val++ )
+        {
+            void *ptr = *val;
+
+            if ( !is_kernel_text(ptr) || !is_endbr64(ptr) )
+                continue;
+
+            add_nops(ptr, 4);
+            clobbered++;
+        }
+
+        printk("altcall: Optimised away %u endbr64 instructions\n", clobbered);
+    }
 }
 
 void init_or_livepatch apply_alternatives(struct alt_instr *start,
--- a/xen/arch/x86/xen.lds.S
+++ b/xen/arch/x86/xen.lds.S
@@ -217,6 +217,11 @@ SECTIONS
        *(.initcall1.init)
        __initcall_end = .;
 
+       . = ALIGN(POINTER_ALIGN);
+        __initdata_cf_clobber_start = .;
+       *(.init.data.cf_clobber)
+        __initdata_cf_clobber_end = .;
+
        *(.init.data)
        *(.init.data.rel)
        *(.init.data.rel.*)
--- a/xen/include/xen/init.h
+++ b/xen/include/xen/init.h
@@ -18,6 +18,8 @@
 #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
 #define __exit_call       __used_section(".exitcall.exit")
 
+#define __initdata_cf_clobber __section(".init.data.cf_clobber")
+
 /* These macros are used to mark some functions or 
  * initialized data (doesn't apply to uninitialized data)
  * as `initialization' functions. The kernel can take this


Re: [PATCH 2/4] x86/altcall: Optimise away endbr64 instruction where possible
Posted by Andrew Cooper 4 years, 2 months ago
On 02/12/2021 08:01, Jan Beulich wrote:
> On 01.12.2021 20:07, Andrew Cooper wrote:
>> On 01/12/2021 08:20, Jan Beulich wrote:
>>> On 26.11.2021 22:22, Andrew Cooper wrote:
>>>> With altcall, we convert indirect branches into direct ones.  With that
>>>> complete, none of the potential targets need an endbr64 instruction.
>>> Assuming that no other hooks remain which re-use the same function. I
>>> think this constraint wants at least mentioning explicitly.
>> Fair point, but I think it is entirely reasonable to expect logic not to
>> mix and match altcall on the same hook.
> Take XSM's silo_xsm_ops and dummy_ops as an example. With what
> xsm_fixup_ops() does it should be entirely benign if silo_xsm_ops
> set any or all of the hooks which are currently unset to what
> dummy_ops has.

We're talking very specifically about ops and ops-like structures, and
we don't just have random code calling dumy_ops->foo() when you've also
got xsm_foo() { altcall(ops->foo); }

In this case specifically, each of {flask,silo,dummy}_ops are static,
and xsm_fixup_ops() gets called exactly once on the root xsm_ops object,
so even code inside silo.c can't call silo->foo() and hit the dummy foo().

>>>> Furthermore, removing the endbr64 instructions is a security defence-in-depth
>>>> improvement, because it limits the options available to an attacker who has
>>>> managed to hijack a function pointer.
>>>>
>>>> Introduce a new .init.data.cf_clobber section.  Have _apply_alternatives()
>>>> walk over the entire section, looking for any pointers into .text, and clobber
>>>> an endbr64 instruction if found.  This is some minor structure (ab)use but it
>>>> works alarmingly well.
>>> Iirc you've said more than once that non-function-pointer data in
>>> those structures is fine; I'm not convinced. What if a sequence of
>>> sub-pointer-size fields has a value looking like a pointer into
>>> .text? This may not be very likely, but would result in corruption
>>> that may be hard to associate with anything. Of course, with the
>>> is_endbr64() check and with a build time check of there not being
>>> any stray ENDBR64 patterns in .text, that issue would disappear.
>>> But we aren't quite there yet.
>> I disagree with "not very likely" and put it firmly in the "not
>> plausible" category.
>>
>> To cause a problem, you need an aligned something which isn't actually a
>> function pointer with a bit pattern forming [0xffff82d040200000,
>> ffff82d04039e1ba) which hits an ENDBR64 pattern.  Removing the stray
>> ENDBR64's doesn't prevent such a bit pattern pointing at a real (wrong)
>> function.
> Why "aligned" in "aligned something"?

The non-function-pointer thing inside the ops struct needs to be 8-byte
aligned to trigger this bad behaviour to begin with, because we
interpret the struct as an array of unsigned longs.

Any 8-byte block containing a bool for example can't cause problems, nor
can a pair of adjacent uint32_t's if they're not on an 8 byte boundary.

>  And I also don't see what you're
> trying to tell me with the last sentence. It's still .text corruption
> that would result if such a pattern (crossing an insn boundary)
> happened to be pointed at.

We (will) have tooling to detect and reject ENDBR64 bit patterns which
aren't real ENDBR64 instructions.

But this "integer bit pattern that looks like a function pointer"
problem can target one of ~1600 (fewer in most builds) real ENDBR64
instructions of an unrelated function.

>> These structures are almost exclusively compile time generated.
>>
>> So yes - it's not impossible, but it's also not going to happen
>> accidentally.
> I wonder how you mean to exclude such accidents. It occurs to me that
> checking the linked binary for the pattern isn't going to be enough.
> Such a patter could also form with alternatives patching. (It's all
> quite unlikely, yes, but imo we need to fully exclude the possibility.)

Again, we're taking specifically ops structures, not arbitrary structures.

hvm_funcs is the only thing so far that has non-function pointer
members, and its got a string pointer (fine - not .text), and a couple
of integer fields, none of which will plausibly alias a function pointer.

I will fully admit that there is a risk of things going wrong.  I'm
happy copious health warnings wherever necessary, but I don't see
anything going wrong in practice without a deliberate attempt to tickle
this corner case.

>>>> --- a/xen/arch/x86/alternative.c
>>>> +++ b/xen/arch/x86/alternative.c
>>>> @@ -173,6 +173,9 @@ text_poke(void *addr, const void *opcode, size_t len)
>>>>      return memcpy(addr, opcode, len);
>>>>  }
>>>>  
>>>> +extern unsigned long __initdata_cf_clobber_start[];
>>>> +extern unsigned long __initdata_cf_clobber_end[];
>>> const please. I also would find it quite a bit better if these
>>> were suitably typed such that ...
>>>
>>>> @@ -329,6 +332,41 @@ static void init_or_livepatch _apply_alternatives(struct alt_instr *start,
>>>>          add_nops(buf + a->repl_len, total_len - a->repl_len);
>>>>          text_poke(orig, buf, total_len);
>>>>      }
>>>> +
>>>> +    /*
>>>> +     * Clobber endbr64 instructions now that altcall has finished optimised
>>>> +     * all indirect branches to direct ones.
>>>> +     */
>>>> +    if ( force && cpu_has_xen_ibt )
>>>> +    {
>>>> +        unsigned long *val;
>>>> +        unsigned int clobbered = 0;
>>>> +
>>>> +        /*
>>>> +         * This is some minor structure (ab)use.  We walk the entire contents
>>>> +         * of .init.data.cf_clobber as if it were an array of pointers.
>>>> +         *
>>>> +         * If the pointer points into .text, and has an endbr64 instruction,
>>>> +         * nop out the endbr64.  This causes the pointer to no longer be a
>>>> +         * legal indirect branch target under CET-IBT.  This is a
>>>> +         * defence-in-depth measure, to reduce the options available to an
>>>> +         * adversary who has managed to hijack a function pointer.
>>>> +         */
>>>> +        for ( val = __initdata_cf_clobber_start;
>>>> +              val < __initdata_cf_clobber_end;
>>>> +              val++ )
>>>> +        {
>>>> +            void *ptr = (void *)*val;
>>> ... no cast was needed here.
>> Unless you know what this type is, I already tried and am stuck. 
>> Everything else requires more horrible casts on val.
> It's as simple as I thought is would be; proposed respective patch
> at the end of the mail (the two //temp-marked #define-s were needed so
> I could build-test this without needing to pull in further patches of
> yours). No new casts at all, and the one gone that I wanted to see
> eliminated.

I can't have been very caffeinated while having those problems, clearly...

I have no idea how I didn't manage to come up with that as a working
solution.

>>>> --- a/xen/include/xen/init.h
>>>> +++ b/xen/include/xen/init.h
>>>> @@ -18,6 +18,8 @@
>>>>  #define __init_call(lvl)  __used_section(".initcall" lvl ".init")
>>>>  #define __exit_call       __used_section(".exitcall.exit")
>>>>  
>>>> +#define __initdata_cf_clobber __section(".init.data.cf_clobber")
>>> Just to repeat what I've said elsewhere: I think we want a const
>>> version of this as well.
>> I can, but does it really matter?  initconst is merged into initdata and
>> not actually read-only to begin with.
> My remark wasn't about the actual mapping properties at all. What I'm
> after is the compiler being able to spot modifications. If I see a
> struct instance marked "const" and if I know the thing builds okay, I
> know I don't need to go hunt for possible writes to this struct
> instance. When it's non-const, to be sure there's no possible conflict
> with the patching (yours or just the altcall part), I'd need to find
> and verify all instances where the object gets written to.

I've added __initconst_cf_clobber too.

~Andrew