Enable previously added CONFIG_STACK_PROTECTOR feature for ARM
platform. Here we can call boot_stack_chk_guard_setup() in start_xen()
function, because it never returns, so stack protector code will not
be triggered because of changed canary.
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
In v2:
- Reordered Kconfig entry
---
xen/arch/arm/Kconfig | 1 +
xen/arch/arm/setup.c | 3 +++
2 files changed, 4 insertions(+)
diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 23bbc91aad..a24c88c327 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -16,6 +16,7 @@ config ARM
select HAS_ALTERNATIVE if HAS_VMAP
select HAS_DEVICE_TREE
select HAS_PASSTHROUGH
+ select HAS_STACK_PROTECTOR
select HAS_UBSAN
select IOMMU_FORCE_PT_SHARE
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 2e27af4560..f855e97e25 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -13,6 +13,7 @@
#include <xen/domain_page.h>
#include <xen/grant_table.h>
#include <xen/types.h>
+#include <xen/stack-protector.h>
#include <xen/string.h>
#include <xen/serial.h>
#include <xen/sched.h>
@@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr)
*/
system_state = SYS_STATE_boot;
+ boot_stack_chk_guard_setup();
+
if ( acpi_disabled )
{
printk("Booting using Device Tree\n");
--
2.47.1
On 30/11/2024 1:10 am, Volodymyr Babchuk wrote: > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c > index 2e27af4560..f855e97e25 100644 > --- a/xen/arch/arm/setup.c > +++ b/xen/arch/arm/setup.c > @@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr) > */ > system_state = SYS_STATE_boot; > > + boot_stack_chk_guard_setup(); > + > if ( acpi_disabled ) > { > printk("Booting using Device Tree\n"); I still think that __stack_chk_guard wants setting up in ASM before entering C. The only reason this call is so late is because Xen's get_random() sequence is less than helpful. That wants rewriting somewhat, but maybe now isn't the best time. Even if you initialise __stack_chk_guard it to -1 rather than 0, it's still got a better chance of catching errors during very early boot; the instrumentation is present, but is using 0 as the canary value. On x86, dumping the current TSC value into __stack_chk_guard would be far better than using -1. Even if it skewed to a lower number, it's unpredictable and not going to reoccur by accident during a stack overrun. Surely ARM has something similar it could use? [edit] Yes, get_cycles(), which every architecture seems to have. In fact, swapping get_random() from NOW() to get_cycles() would be good enough to get it usable from early assembly. ~Andrew A better option for get_random() would be to use a proven PRNG (e.g. one of the xorshift family), seeded with get_cycles() and then re-seeded with a real RDRAND/etc instruction if such a capability is found to exist on the hardware.
On Tue, 3 Dec 2024 at 22:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > On 30/11/2024 1:10 am, Volodymyr Babchuk wrote: > > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c > > index 2e27af4560..f855e97e25 100644 > > --- a/xen/arch/arm/setup.c > > +++ b/xen/arch/arm/setup.c > > @@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr) > > */ > > system_state = SYS_STATE_boot; > > > > + boot_stack_chk_guard_setup(); > > + > > if ( acpi_disabled ) > > { > > printk("Booting using Device Tree\n"); > > I still think that __stack_chk_guard wants setting up in ASM before > entering C. > > The only reason this call is so late is because Xen's get_random() > sequence is less than helpful. That wants rewriting somewhat, but maybe > now isn't the best time. > > Even if you initialise __stack_chk_guard it to -1 rather than 0, it's > still got a better chance of catching errors during very early boot; the > instrumentation is present, but is using 0 as the canary value. > > On x86, dumping the current TSC value into __stack_chk_guard would be > far better than using -1. Even if it skewed to a lower number, it's > unpredictable and not going to reoccur by accident during a stack overrun. > > Surely ARM has something similar it could use? There is a optional system register to read a random number. > > [edit] Yes, get_cycles(), which every architecture seems to have. In > fact, swapping get_random() from NOW() to get_cycles() would be good > enough to get it usable from early assembly. Not quite. Technically we can't rely on the timer counter until platform_init_time() is called. This was to cater an issue on the exynos we used in OssTest. But arguably this is the exception rather than the norm because the firmware ought to properly initialize the timer... I haven't checked recent firmware. But I could be convinced to access the counter before hand if we really think that setting __stack_chk_guard from ASM is much better. Cheers,
On 03/12/2024 11:16 pm, Julien Grall wrote: > On Tue, 3 Dec 2024 at 22:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> On 30/11/2024 1:10 am, Volodymyr Babchuk wrote: >>> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c >>> index 2e27af4560..f855e97e25 100644 >>> --- a/xen/arch/arm/setup.c >>> +++ b/xen/arch/arm/setup.c >>> @@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr) >>> */ >>> system_state = SYS_STATE_boot; >>> >>> + boot_stack_chk_guard_setup(); >>> + >>> if ( acpi_disabled ) >>> { >>> printk("Booting using Device Tree\n"); >> I still think that __stack_chk_guard wants setting up in ASM before >> entering C. >> >> The only reason this call is so late is because Xen's get_random() >> sequence is less than helpful. That wants rewriting somewhat, but maybe >> now isn't the best time. >> >> Even if you initialise __stack_chk_guard it to -1 rather than 0, it's >> still got a better chance of catching errors during very early boot; the >> instrumentation is present, but is using 0 as the canary value. >> >> On x86, dumping the current TSC value into __stack_chk_guard would be >> far better than using -1. Even if it skewed to a lower number, it's >> unpredictable and not going to reoccur by accident during a stack overrun. >> >> Surely ARM has something similar it could use? > There is a optional system register to read a random number. Only in v8.5 as far as I can see, and even then it's not required. Also, it suffers from the same problem as RDRAND on x86; we need to boot to at least feature detection before we can safely use it if it's available. > >> [edit] Yes, get_cycles(), which every architecture seems to have. In >> fact, swapping get_random() from NOW() to get_cycles() would be good >> enough to get it usable from early assembly. > Not quite. Technically we can't rely on the timer counter until > platform_init_time() is called. > This was to cater an issue on the exynos we used in OssTest. But > arguably this is the exception > rather than the norm because the firmware ought to properly initialize > the timer... > > I haven't checked recent firmware. But I could be convinced to access > the counter before > hand if we really think that setting __stack_chk_guard from ASM is much better. The C instrumentation is always present, right from the very start of start_xen(). Even working with a canary of 0, there's some value. It will spot clobbering with a non-zero value, but it won't spot e.g. an overly-long memset(, 0). During boot, we're not defending against a malicious entity. Simply defending against bad developer expectations. Therefore, anything to get a non-zero value prior to entering C will be an improvement. Best-effort is fine, and if there's one platform with an errata that causes it to miss out, then oh well. Any other platform which manifests a crash will still lead to the problem being fixed. I suppose taking this argument to it's logical conclusion, we could initialise __stack_chk_guard with a poison pattern, although not one shared by any other poison pattern in Xen. That alone would be better than using 0 in early boot. ~Andrew
Hi Andrew, Andrew Cooper <andrew.cooper3@citrix.com> writes: > On 03/12/2024 11:16 pm, Julien Grall wrote: >> On Tue, 3 Dec 2024 at 22:00, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> On 30/11/2024 1:10 am, Volodymyr Babchuk wrote: >>>> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c >>>> index 2e27af4560..f855e97e25 100644 >>>> --- a/xen/arch/arm/setup.c >>>> +++ b/xen/arch/arm/setup.c >>>> @@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr) >>>> */ >>>> system_state = SYS_STATE_boot; >>>> >>>> + boot_stack_chk_guard_setup(); >>>> + >>>> if ( acpi_disabled ) >>>> { >>>> printk("Booting using Device Tree\n"); >>> I still think that __stack_chk_guard wants setting up in ASM before >>> entering C. >>> >>> The only reason this call is so late is because Xen's get_random() >>> sequence is less than helpful. That wants rewriting somewhat, but maybe >>> now isn't the best time. >>> >>> Even if you initialise __stack_chk_guard it to -1 rather than 0, it's >>> still got a better chance of catching errors during very early boot; the >>> instrumentation is present, but is using 0 as the canary value. >>> >>> On x86, dumping the current TSC value into __stack_chk_guard would be >>> far better than using -1. Even if it skewed to a lower number, it's >>> unpredictable and not going to reoccur by accident during a stack overrun. >>> >>> Surely ARM has something similar it could use? >> There is a optional system register to read a random number. > > Only in v8.5 as far as I can see, and even then it's not required. > Also, it suffers from the same problem as RDRAND on x86; we need to boot > to at least feature detection before we can safely use it if it's available. > >> >>> [edit] Yes, get_cycles(), which every architecture seems to have. In >>> fact, swapping get_random() from NOW() to get_cycles() would be good >>> enough to get it usable from early assembly. >> Not quite. Technically we can't rely on the timer counter until >> platform_init_time() is called. >> This was to cater an issue on the exynos we used in OssTest. But >> arguably this is the exception >> rather than the norm because the firmware ought to properly initialize >> the timer... >> >> I haven't checked recent firmware. But I could be convinced to access >> the counter before >> hand if we really think that setting __stack_chk_guard from ASM is much better. > > The C instrumentation is always present, right from the very start of > start_xen(). > > Even working with a canary of 0, there's some value. It will spot > clobbering with a non-zero value, but it won't spot e.g. an overly-long > memset(, 0). > > During boot, we're not defending against a malicious entity. Simply > defending against bad developer expectations. > > Therefore, anything to get a non-zero value prior to entering C will be > an improvement. Best-effort is fine, and if there's one platform with > an errata that causes it to miss out, then oh well. Any other platform > which manifests a crash will still lead to the problem being fixed. > > I suppose taking this argument to it's logical conclusion, we could > initialise __stack_chk_guard with a poison pattern, although not one > shared by any other poison pattern in Xen. That alone would be better > than using 0 in early boot. Okay, so I come with three-stage initialization: 1. Static poison pattern 2. Time-based early value 3. Random-number based long-term value So, apart from already present static always_inline void boot_stack_chk_guard_setup(void); I did this: /* * Initial value is chosen by fair dice roll. * It will be updated during boot process. */ #if BITS_PER_LONG == 32 unsigned long __ro_after_init __stack_chk_guard = 0xdd2cc927; #else unsigned long __ro_after_init __stack_chk_guard = 0x2d853605a4d9a09c; #endif /* This function should be called from ASM only */ void __init asmlinkage boot_stack_chk_guard_setup_early(void) { /* * Linear congruent generator. Constant is taken from * Tables Of Linear Congruential Generators * Of Different Sizes And Good Lattice Structure by Pierre L’Ecuyer */ #if BITS_PER_LONG == 32 const unsigned long a = 2891336453; #else const unsigned long a = 2862933555777941757; #endif const unsigned c = 1; unsigned long cycles = get_cycles(); if ( !cycles ) return; __stack_chk_guard = cycles * a + c; } boot_stack_chk_guard_setup_early() is being called by ASM code during early boot stages. Then, later, boot_stack_chk_guard_setup() is called. If you are okay with this approach, I'll send the next version. -- WBR, Volodymyr
On 06.12.2024 05:46, Volodymyr Babchuk wrote: > So, apart from already present > > static always_inline void boot_stack_chk_guard_setup(void); > > I did this: > > /* > * Initial value is chosen by fair dice roll. > * It will be updated during boot process. > */ > #if BITS_PER_LONG == 32 > unsigned long __ro_after_init __stack_chk_guard = 0xdd2cc927; At least this and ... > #else > unsigned long __ro_after_init __stack_chk_guard = 0x2d853605a4d9a09c; > #endif > > /* This function should be called from ASM only */ > void __init asmlinkage boot_stack_chk_guard_setup_early(void) > { > /* > * Linear congruent generator. Constant is taken from > * Tables Of Linear Congruential Generators > * Of Different Sizes And Good Lattice Structure by Pierre L’Ecuyer > */ > #if BITS_PER_LONG == 32 > const unsigned long a = 2891336453; ... this will need a UL suffix for Misra. Probably best to add UL on all four constants. As to the comment, please adhere to ./CODING_STYLE. > #else > const unsigned long a = 2862933555777941757; > #endif > const unsigned c = 1; I'm having a hard time seeing why this need to be a static variable. Its sole use is ... > unsigned long cycles = get_cycles(); > > if ( !cycles ) > return; > > __stack_chk_guard = cycles * a + c; ... here, where you can as well write a literal 1. Jan
Hi Jan, Jan Beulich <jbeulich@suse.com> writes: [...] > >> #else >> const unsigned long a = 2862933555777941757; >> #endif >> const unsigned c = 1; > > I'm having a hard time seeing why this need to be a static variable. Its > sole use is ... It's a constant in a hope that compiler is smart enough to optimize it out. >> unsigned long cycles = get_cycles(); >> >> if ( !cycles ) >> return; >> >> __stack_chk_guard = cycles * a + c; > > ... here, where you can as well write a literal 1. For readability. Formula for LCG is X_n+1 = (X_n * a + c) mod m. -- WBR, Volodymyr
Hi Volodymyr, On 30/11/2024 01:10, Volodymyr Babchuk wrote: > Enable previously added CONFIG_STACK_PROTECTOR feature for ARM > platform. Here we can call boot_stack_chk_guard_setup() in start_xen() > function, because it never returns, so stack protector code will not > be triggered because of changed canary. It would be good to explain how you decided to call... > > Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> > > --- > > In v2: > - Reordered Kconfig entry > --- > xen/arch/arm/Kconfig | 1 + > xen/arch/arm/setup.c | 3 +++ > 2 files changed, 4 insertions(+) > > diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig > index 23bbc91aad..a24c88c327 100644 > --- a/xen/arch/arm/Kconfig > +++ b/xen/arch/arm/Kconfig > @@ -16,6 +16,7 @@ config ARM > select HAS_ALTERNATIVE if HAS_VMAP > select HAS_DEVICE_TREE > select HAS_PASSTHROUGH > + select HAS_STACK_PROTECTOR > select HAS_UBSAN > select IOMMU_FORCE_PT_SHARE > > diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c > index 2e27af4560..f855e97e25 100644 > --- a/xen/arch/arm/setup.c > +++ b/xen/arch/arm/setup.c > @@ -13,6 +13,7 @@ > #include <xen/domain_page.h> > #include <xen/grant_table.h> > #include <xen/types.h> > +#include <xen/stack-protector.h> > #include <xen/string.h> > #include <xen/serial.h> > #include <xen/sched.h> > @@ -341,6 +342,8 @@ void asmlinkage __init start_xen(unsigned long fdt_paddr) > */ > system_state = SYS_STATE_boot; > > + boot_stack_chk_guard_setup(); ... the function here. If I am not mistaken, at this point, cpu_khz (used by NOW() in get_random()) would be zero. It is only initialized by preinit_xen_time() which happens later. So I think it should be called a bit further down and gain a comment to about the call dependencies. Cheers, -- Julien Grall
© 2016 - 2024 Red Hat, Inc.