[PATCH v2 3/4] xen: arm: enable stack protector feature

Volodymyr Babchuk posted 4 patches 3 weeks, 5 days ago
There is a newer version of this series
[PATCH v2 3/4] xen: arm: enable stack protector feature
Posted by Volodymyr Babchuk 3 weeks, 5 days ago
Enable previously added CONFIG_STACK_PROTECTOR feature for ARM
platform. Here we can call boot_stack_chk_guard_setup() in start_xen()
function, because it never returns, so stack protector code will not
be triggered because of changed canary.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>

---

In v2:
 - Reordered Kconfig entry
---
 xen/arch/arm/Kconfig | 1 +
 xen/arch/arm/setup.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 23bbc91aad..a24c88c327 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -16,6 +16,7 @@ config ARM
 	select HAS_ALTERNATIVE if HAS_VMAP
 	select HAS_DEVICE_TREE
 	select HAS_PASSTHROUGH
+	select HAS_STACK_PROTECTOR
 	select HAS_UBSAN
 	select IOMMU_FORCE_PT_SHARE
 
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 2e27af4560..f855e97e25 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -13,6 +13,7 @@
 #include <xen/domain_page.h>
 #include <xen/grant_table.h>
 #include <xen/types.h>
+#include <xen/stack-protector.h>
 #include <xen/string.h>
 #include <xen/serial.h>
 #include <xen/sched.h>
@@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr)
      */
     system_state = SYS_STATE_boot;
 
+    boot_stack_chk_guard_setup();
+
     if ( acpi_disabled )
     {
         printk("Booting using Device Tree\n");
-- 
2.47.1
Re: [PATCH v2 3/4] xen: arm: enable stack protector feature
Posted by Andrew Cooper 3 weeks, 1 day ago
On 30/11/2024 1:10 am, Volodymyr Babchuk wrote:
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 2e27af4560..f855e97e25 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr)
>       */
>      system_state = SYS_STATE_boot;
>  
> +    boot_stack_chk_guard_setup();
> +
>      if ( acpi_disabled )
>      {
>          printk("Booting using Device Tree\n");

I still think that __stack_chk_guard wants setting up in ASM before
entering C.

The only reason this call is so late is because Xen's get_random()
sequence is less than helpful.  That wants rewriting somewhat, but maybe
now isn't the best time.

Even if you initialise __stack_chk_guard it to -1 rather than 0, it's
still got a better chance of catching errors during very early boot; the
instrumentation is present, but is using 0 as the canary value.

On x86, dumping the current TSC value into __stack_chk_guard would be
far better than using -1.  Even if it skewed to a lower number, it's
unpredictable and not going to reoccur by accident during a stack overrun.

Surely ARM has something similar it could use?

[edit] Yes, get_cycles(), which every architecture seems to have.  In
fact, swapping get_random() from NOW() to get_cycles() would be good
enough to get it usable from early assembly.

~Andrew

A better option for get_random() would be to use a proven PRNG (e.g. one
of the xorshift family), seeded with get_cycles() and then re-seeded
with a real RDRAND/etc instruction if such a capability is found to
exist on the hardware.

Re: [PATCH v2 3/4] xen: arm: enable stack protector feature
Posted by Julien Grall 3 weeks, 1 day ago
On Tue, 3 Dec 2024 at 22:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> On 30/11/2024 1:10 am, Volodymyr Babchuk wrote:
> > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> > index 2e27af4560..f855e97e25 100644
> > --- a/xen/arch/arm/setup.c
> > +++ b/xen/arch/arm/setup.c
> > @@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr)
> >       */
> >      system_state = SYS_STATE_boot;
> >
> > +    boot_stack_chk_guard_setup();
> > +
> >      if ( acpi_disabled )
> >      {
> >          printk("Booting using Device Tree\n");
>
> I still think that __stack_chk_guard wants setting up in ASM before
> entering C.
>
> The only reason this call is so late is because Xen's get_random()
> sequence is less than helpful.  That wants rewriting somewhat, but maybe
> now isn't the best time.
>
> Even if you initialise __stack_chk_guard it to -1 rather than 0, it's
> still got a better chance of catching errors during very early boot; the
> instrumentation is present, but is using 0 as the canary value.
>
> On x86, dumping the current TSC value into __stack_chk_guard would be
> far better than using -1.  Even if it skewed to a lower number, it's
> unpredictable and not going to reoccur by accident during a stack overrun.
>
> Surely ARM has something similar it could use?

There is a optional system register to read a random number.

>
> [edit] Yes, get_cycles(), which every architecture seems to have.  In
> fact, swapping get_random() from NOW() to get_cycles() would be good
> enough to get it usable from early assembly.

Not quite. Technically we can't rely on the timer counter until
platform_init_time() is called.
This was to cater an issue on the exynos we used in OssTest. But
arguably this is the exception
rather than the norm because the firmware ought to properly initialize
the timer...

I haven't checked recent firmware. But I could be convinced to access
the counter before
hand if we really think that setting __stack_chk_guard from ASM is much better.

Cheers,
Re: [PATCH v2 3/4] xen: arm: enable stack protector feature
Posted by Andrew Cooper 3 weeks ago
On 03/12/2024 11:16 pm, Julien Grall wrote:
> On Tue, 3 Dec 2024 at 22:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 30/11/2024 1:10 am, Volodymyr Babchuk wrote:
>>> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
>>> index 2e27af4560..f855e97e25 100644
>>> --- a/xen/arch/arm/setup.c
>>> +++ b/xen/arch/arm/setup.c
>>> @@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr)
>>>       */
>>>      system_state = SYS_STATE_boot;
>>>
>>> +    boot_stack_chk_guard_setup();
>>> +
>>>      if ( acpi_disabled )
>>>      {
>>>          printk("Booting using Device Tree\n");
>> I still think that __stack_chk_guard wants setting up in ASM before
>> entering C.
>>
>> The only reason this call is so late is because Xen's get_random()
>> sequence is less than helpful.  That wants rewriting somewhat, but maybe
>> now isn't the best time.
>>
>> Even if you initialise __stack_chk_guard it to -1 rather than 0, it's
>> still got a better chance of catching errors during very early boot; the
>> instrumentation is present, but is using 0 as the canary value.
>>
>> On x86, dumping the current TSC value into __stack_chk_guard would be
>> far better than using -1.  Even if it skewed to a lower number, it's
>> unpredictable and not going to reoccur by accident during a stack overrun.
>>
>> Surely ARM has something similar it could use?
> There is a optional system register to read a random number.

Only in v8.5 as far as I can see, and even then it's not required. 
Also, it suffers from the same problem as RDRAND on x86; we need to boot
to at least feature detection before we can safely use it if it's available.

>
>> [edit] Yes, get_cycles(), which every architecture seems to have.  In
>> fact, swapping get_random() from NOW() to get_cycles() would be good
>> enough to get it usable from early assembly.
> Not quite. Technically we can't rely on the timer counter until
> platform_init_time() is called.
> This was to cater an issue on the exynos we used in OssTest. But
> arguably this is the exception
> rather than the norm because the firmware ought to properly initialize
> the timer...
>
> I haven't checked recent firmware. But I could be convinced to access
> the counter before
> hand if we really think that setting __stack_chk_guard from ASM is much better.

The C instrumentation is always present, right from the very start of
start_xen().

Even working with a canary of 0, there's some value.  It will spot
clobbering with a non-zero value, but it won't spot e.g. an overly-long
memset(, 0).

During boot, we're not defending against a malicious entity.  Simply
defending against bad developer expectations.

Therefore, anything to get a non-zero value prior to entering C will be
an improvement.  Best-effort is fine, and if there's one platform with
an errata that causes it to miss out, then oh well.  Any other platform
which manifests a crash will still lead to the problem being fixed.

I suppose taking this argument to it's logical conclusion, we could
initialise __stack_chk_guard with a poison pattern, although not one
shared by any other poison pattern in Xen.  That alone would be better
than using 0 in early boot.

~Andrew

Re: [PATCH v2 3/4] xen: arm: enable stack protector feature
Posted by Volodymyr Babchuk 2 weeks, 6 days ago
Hi Andrew,

Andrew Cooper <andrew.cooper3@citrix.com> writes:

> On 03/12/2024 11:16 pm, Julien Grall wrote:
>> On Tue, 3 Dec 2024 at 22:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> On 30/11/2024 1:10 am, Volodymyr Babchuk wrote:
>>>> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
>>>> index 2e27af4560..f855e97e25 100644
>>>> --- a/xen/arch/arm/setup.c
>>>> +++ b/xen/arch/arm/setup.c
>>>> @@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr)
>>>>       */
>>>>      system_state = SYS_STATE_boot;
>>>>
>>>> +    boot_stack_chk_guard_setup();
>>>> +
>>>>      if ( acpi_disabled )
>>>>      {
>>>>          printk("Booting using Device Tree\n");
>>> I still think that __stack_chk_guard wants setting up in ASM before
>>> entering C.
>>>
>>> The only reason this call is so late is because Xen's get_random()
>>> sequence is less than helpful.  That wants rewriting somewhat, but maybe
>>> now isn't the best time.
>>>
>>> Even if you initialise __stack_chk_guard it to -1 rather than 0, it's
>>> still got a better chance of catching errors during very early boot; the
>>> instrumentation is present, but is using 0 as the canary value.
>>>
>>> On x86, dumping the current TSC value into __stack_chk_guard would be
>>> far better than using -1.  Even if it skewed to a lower number, it's
>>> unpredictable and not going to reoccur by accident during a stack overrun.
>>>
>>> Surely ARM has something similar it could use?
>> There is a optional system register to read a random number.
>
> Only in v8.5 as far as I can see, and even then it's not required. 
> Also, it suffers from the same problem as RDRAND on x86; we need to boot
> to at least feature detection before we can safely use it if it's available.
>
>>
>>> [edit] Yes, get_cycles(), which every architecture seems to have.  In
>>> fact, swapping get_random() from NOW() to get_cycles() would be good
>>> enough to get it usable from early assembly.
>> Not quite. Technically we can't rely on the timer counter until
>> platform_init_time() is called.
>> This was to cater an issue on the exynos we used in OssTest. But
>> arguably this is the exception
>> rather than the norm because the firmware ought to properly initialize
>> the timer...
>>
>> I haven't checked recent firmware. But I could be convinced to access
>> the counter before
>> hand if we really think that setting __stack_chk_guard from ASM is much better.
>
> The C instrumentation is always present, right from the very start of
> start_xen().
>
> Even working with a canary of 0, there's some value.  It will spot
> clobbering with a non-zero value, but it won't spot e.g. an overly-long
> memset(, 0).
>
> During boot, we're not defending against a malicious entity.  Simply
> defending against bad developer expectations.
>
> Therefore, anything to get a non-zero value prior to entering C will be
> an improvement.  Best-effort is fine, and if there's one platform with
> an errata that causes it to miss out, then oh well.  Any other platform
> which manifests a crash will still lead to the problem being fixed.
>
> I suppose taking this argument to it's logical conclusion, we could
> initialise __stack_chk_guard with a poison pattern, although not one
> shared by any other poison pattern in Xen.  That alone would be better
> than using 0 in early boot.

Okay, so I come with three-stage initialization:

1. Static poison pattern
2. Time-based early value
3. Random-number based long-term value

So, apart from already present

static always_inline void boot_stack_chk_guard_setup(void);

I did this:

/*
 * Initial value is chosen by fair dice roll.
 * It will be updated during boot process.
 */
#if BITS_PER_LONG == 32
unsigned long __ro_after_init __stack_chk_guard = 0xdd2cc927;
#else
unsigned long __ro_after_init __stack_chk_guard = 0x2d853605a4d9a09c;
#endif

/* This function should be called from ASM only */
void __init asmlinkage boot_stack_chk_guard_setup_early(void)
{
    /*
     * Linear congruent generator. Constant is taken from
     * Tables Of Linear Congruential Generators
     * Of Different Sizes And Good Lattice Structure by Pierre L’Ecuyer
     */
#if BITS_PER_LONG == 32
    const unsigned long a = 2891336453;
#else
    const unsigned long a = 2862933555777941757;
#endif
    const unsigned c = 1;
    unsigned long cycles = get_cycles();

    if ( !cycles )
        return;

    __stack_chk_guard = cycles * a + c;
}

boot_stack_chk_guard_setup_early() is being called by ASM code during
early boot stages.

Then, later, boot_stack_chk_guard_setup() is called.

If you are okay with this approach, I'll send the next version.

-- 
WBR, Volodymyr
Re: [PATCH v2 3/4] xen: arm: enable stack protector feature
Posted by Jan Beulich 2 weeks, 3 days ago
On 06.12.2024 05:46, Volodymyr Babchuk wrote:
> So, apart from already present
> 
> static always_inline void boot_stack_chk_guard_setup(void);
> 
> I did this:
> 
> /*
>  * Initial value is chosen by fair dice roll.
>  * It will be updated during boot process.
>  */
> #if BITS_PER_LONG == 32
> unsigned long __ro_after_init __stack_chk_guard = 0xdd2cc927;

At least this and ...

> #else
> unsigned long __ro_after_init __stack_chk_guard = 0x2d853605a4d9a09c;
> #endif
> 
> /* This function should be called from ASM only */
> void __init asmlinkage boot_stack_chk_guard_setup_early(void)
> {
>     /*
>      * Linear congruent generator. Constant is taken from
>      * Tables Of Linear Congruential Generators
>      * Of Different Sizes And Good Lattice Structure by Pierre L’Ecuyer
>      */
> #if BITS_PER_LONG == 32
>     const unsigned long a = 2891336453;

... this will need a UL suffix for Misra. Probably best to add UL on all
four constants.

As to the comment, please adhere to ./CODING_STYLE.

> #else
>     const unsigned long a = 2862933555777941757;
> #endif
>     const unsigned c = 1;

I'm having a hard time seeing why this need to be a static variable. Its
sole use is ...

>     unsigned long cycles = get_cycles();
> 
>     if ( !cycles )
>         return;
> 
>     __stack_chk_guard = cycles * a + c;

... here, where you can as well write a literal 1.

Jan

Re: [PATCH v2 3/4] xen: arm: enable stack protector feature
Posted by Volodymyr Babchuk 2 weeks, 3 days ago
Hi Jan,

Jan Beulich <jbeulich@suse.com> writes:

[...]


>
>> #else
>>     const unsigned long a = 2862933555777941757;
>> #endif
>>     const unsigned c = 1;
>
> I'm having a hard time seeing why this need to be a static variable. Its
> sole use is ...

It's a constant in a hope that compiler is smart enough to optimize it out.

>>     unsigned long cycles = get_cycles();
>> 
>>     if ( !cycles )
>>         return;
>> 
>>     __stack_chk_guard = cycles * a + c;
>
> ... here, where you can as well write a literal 1.

For readability. Formula for LCG is X_n+1 = (X_n * a + c) mod m.

-- 
WBR, Volodymyr
Re: [PATCH v2 3/4] xen: arm: enable stack protector feature
Posted by Julien Grall 3 weeks, 5 days ago
Hi Volodymyr,

On 30/11/2024 01:10, Volodymyr Babchuk wrote:
> Enable previously added CONFIG_STACK_PROTECTOR feature for ARM
> platform. Here we can call boot_stack_chk_guard_setup() in start_xen()
> function, because it never returns, so stack protector code will not
> be triggered because of changed canary.

It would be good to explain how you decided to call...

> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> 
> ---
> 
> In v2:
>   - Reordered Kconfig entry
> ---
>   xen/arch/arm/Kconfig | 1 +
>   xen/arch/arm/setup.c | 3 +++
>   2 files changed, 4 insertions(+)
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index 23bbc91aad..a24c88c327 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -16,6 +16,7 @@ config ARM
>   	select HAS_ALTERNATIVE if HAS_VMAP
>   	select HAS_DEVICE_TREE
>   	select HAS_PASSTHROUGH
> +	select HAS_STACK_PROTECTOR
>   	select HAS_UBSAN
>   	select IOMMU_FORCE_PT_SHARE
>   
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 2e27af4560..f855e97e25 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -13,6 +13,7 @@
>   #include <xen/domain_page.h>
>   #include <xen/grant_table.h>
>   #include <xen/types.h>
> +#include <xen/stack-protector.h>
>   #include <xen/string.h>
>   #include <xen/serial.h>
>   #include <xen/sched.h>
> @@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr)
>        */
>       system_state = SYS_STATE_boot;
>   
> +    boot_stack_chk_guard_setup();

... the function here. If I am not mistaken, at this point, cpu_khz 
(used by NOW() in get_random()) would be zero. It is only initialized by 
preinit_xen_time() which happens later.

So I think it should be called a bit further down and gain a comment to 
about the call dependencies.

Cheers,

-- 
Julien Grall