From: David Keisar Schmidt <david.keisarschm@mail.huji.ac.il>
The memory randomization of the virtual address space of the kernel
memory regions (the physical memory mapping, vmalloc and vmemmap) in
arch/x86/mm/kaslr.c is based on prandom_bytes_state(), which uses the
prandom_u32 PRNG. However, the generator is seeded with
prandom_seed_state(), which effectively uses only 32 bits of the 64-bit
seed. As a result, observing ONE region's offset (say 30 bits of it) can
leave the attacker with as few as 2 candidate seeds, from which the
offsets of the remaining two regions can be computed.

Hence, we implemented an adjusted version of prandom_seed_state() inside
kaslr.c that takes advantage of all 64 bits of the seed. With this
implementation, enumerating the seed space is infeasible, and attacking
the linearity of the generator requires ~113 bits of output, which two
exposed region offsets (up to 30 bits each) do not provide.
Signed-off-by: David Keisar Schmidt <david.keisarschm@mail.huji.ac.il>
---
Changes since v3:
* We took a different approach, and replaced the invocation of
prandom_seed_state() with a revised version which is more secure.
Changes since v2:
* Edited the commit message.
arch/x86/mm/kaslr.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 557f0fe25..5fd73d5ad 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -59,6 +59,29 @@ static inline unsigned long get_padding(struct kaslr_memory_region *region)
{
return (region->size_tb << TB_SHIFT);
}
+static inline void _kaslr_prandom_seed_state(struct rnd_state *state, u64 seed)
+{
+ u32 i = ((seed >> 32) ^ (seed << 10) ^ seed) & 0xffffffffUL;
+ // To take advantage of all 64 bits of the seed
+ u32 j = ((seed>>32) ^ (seed<<10)) & 0xffffffffUL;
+ state->s1 = __seed(i, 2U);
+ state->s2 = __seed(j, 8U);
+ /* Ensure no obvious linear relation with the previous states */
+ state->s3 = __seed(next_pseudo_random32(i+j), 16U);
+ state->s4 = __seed(next_pseudo_random32(j-((i>>16)^(i<<16))), 128U);
+
+ /* Calling RNG ten times to satisfy recurrence condition */
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+ prandom_u32_state(state);
+}
/* Initialize base and padding for each memory region randomized with KASLR */
void __init kernel_randomize_memory(void)
@@ -113,7 +136,8 @@ void __init kernel_randomize_memory(void)
for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
remain_entropy -= get_padding(&kaslr_regions[i]);
- prandom_seed_state(&rand_state, kaslr_get_random_long("Memory"));
+ _kaslr_prandom_seed_state(&rand_state, kaslr_get_random_long
+ ("Memory"));
for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
unsigned long entropy;
--
2.38.0
On 1/13/23 13:39, david.keisarschm@mail.huji.ac.il wrote:
> +{
> + u32 i = ((seed >> 32) ^ (seed << 10) ^ seed) & 0xffffffffUL;
> + // To take advantage of all 64 bits of the seed
> + u32 j = ((seed>>32) ^ (seed<<10)) & 0xffffffffUL;

The and operation here is pointless. You are implicitly casting to u32.
Even if you had to mask explicitly, (u32) would be a lot more clear than
a hard-to-read constant.

> + state->s1 = __seed(i, 2U);
> + state->s2 = __seed(j, 8U);
> + /* Ensure no obvious linear relation with the previous states */
> + state->s3 = __seed(next_pseudo_random32(i+j), 16U);
> + state->s4 = __seed(next_pseudo_random32(j-((i>>16)^(i<<16))), 128U);
> +
> + /* Calling RNG ten times to satisfy recurrence condition */
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> + prandom_u32_state(state);
> +}

What recurrence relation is that? This looks like you are just trying to
re-invoke prandom_warmup(), but for heaven's sake, don't open-code it!

In fact, it seems you are just hacking an alternate version of
prandom_seed_full_state(). Why not just change prandom_seed_full_state()
if that is motivated? (You may need to factor out the iteration over all
CPUs.)

Honestly, I'm not a super expert in PRNGs, but this seems both slow (the
*only* motivation for prandom_u32() is to be fast) and pointless. If we
are relying on this for security, we should be using something that is
actually cryptographic; especially since this is only invoked on boot,
which is where entropy is maximally scarce and PRNGs maximally
vulnerable due to being close to their seed state.
Additionally, in terms of creating performant mixing functions, one
thing that comes to mind is the circular multiply:

static inline u32 circular_multiply(u32 a, u32 b)
{
	u64 m = (u64)a * b;
	return m + (m >> 32);
}

On x86, this is two instructions, even on 32 bits. One can use ^ instead
of + to make it a bit less algebraic, but I don't know for sure that
that is a net improvement.

	-hpa