drivers/char/random.c | 58 ++++++++++++++++++++----------------------- 1 file changed, 27 insertions(+), 31 deletions(-)
Rather than use spinlocks to protect batched entropy, we can instead
disable interrupts locally, since we're dealing with per-cpu data, and
manage resets with a basic generation counter. This should fix up the
below splat that Jonathan received with a PROVE_RAW_LOCK_NESTING=y
kernel.
Note that Sebastian has pointed out a few other areas where
using spinlock_t in an IRQ context is potentially problematic for
PREEMPT_RT. This patch handles one of those cases, and we'll hopefully
have additional patches for other cases.
[ 2.500000] [ BUG: Invalid wait context ]
[ 2.500000] 5.17.0-rc1 #563 Not tainted
[ 2.500000] -----------------------------
[ 2.500000] swapper/1 is trying to lock:
[ 2.500000] c0b0e9cc (batched_entropy_u32.lock){....}-{3:3}, at: invalidate_batched_entropy+0x18/0x4c
[ 2.500000] other info that might help us debug this:
[ 2.500000] context-{2:2}
[ 2.500000] 3 locks held by swapper/1:
[ 2.500000] #0: c0ae86ac (event_mutex){+.+.}-{4:4}, at: event_trace_init+0x4c/0xd8
[ 2.500000] #1: c0ae81b8 (trace_event_sem){+.+.}-{4:4}, at: event_trace_init+0x68/0xd8
[ 2.500000] #2: c19b05cc (&sb->s_type->i_mutex_key#2){+.+.}-{4:4}, at: start_creating+0x40/0xc4
[ 2.500000] stack backtrace:
[ 2.500000] CPU: 0 PID: 1 Comm: swapper Not tainted 5.17.0-rc1 #563
[ 2.500000] Hardware name: WPCM450 chip
[ 2.500000] [<c00100a8>] (unwind_backtrace) from [<c000db2c>] (show_stack+0x10/0x14)
[ 2.500000] [<c000db2c>] (show_stack) from [<c0054eec>] (__lock_acquire+0x3f0/0x189c)
[ 2.500000] [<c0054eec>] (__lock_acquire) from [<c0054478>] (lock_acquire+0x2b8/0x354)
[ 2.500000] [<c0054478>] (lock_acquire) from [<c0568028>] (_raw_spin_lock_irqsave+0x60/0x74)
[ 2.500000] [<c0568028>] (_raw_spin_lock_irqsave) from [<c030b6f4>] (invalidate_batched_entropy+0x18/0x4c)
[ 2.500000] [<c030b6f4>] (invalidate_batched_entropy) from [<c030e7fc>] (crng_fast_load+0xf0/0x110)
[ 2.500000] [<c030e7fc>] (crng_fast_load) from [<c030e954>] (add_interrupt_randomness+0x138/0x200)
[ 2.500000] [<c030e954>] (add_interrupt_randomness) from [<c0061b34>] (handle_irq_event_percpu+0x18/0x38)
[ 2.500000] [<c0061b34>] (handle_irq_event_percpu) from [<c0061b8c>] (handle_irq_event+0x38/0x5c)
[ 2.500000] [<c0061b8c>] (handle_irq_event) from [<c0065b28>] (handle_fasteoi_irq+0x9c/0x114)
[ 2.500000] [<c0065b28>] (handle_fasteoi_irq) from [<c0061178>] (handle_irq_desc+0x24/0x34)
[ 2.500000] [<c0061178>] (handle_irq_desc) from [<c056214c>] (generic_handle_arch_irq+0x28/0x3c)
[ 2.500000] [<c056214c>] (generic_handle_arch_irq) from [<c0008eb4>] (__irq_svc+0x54/0x80)
[ 2.500000] Exception stack(0xc1485d48 to 0xc1485d90)
[ 2.500000] 5d40: 9780e804 00000001 c09413d4 200000d3 60000053 c016af54
[ 2.500000] 5d60: 00000000 c0afa5b8 c14194e0 c19a1d48 c0789ce0 00000000 c1490480 c1485d98
[ 2.500000] 5d80: c0168970 c0168984 20000053 ffffffff
[ 2.500000] [<c0008eb4>] (__irq_svc) from [<c0168984>] (read_seqbegin.constprop.0+0x6c/0x90)
[ 2.500000] [<c0168984>] (read_seqbegin.constprop.0) from [<c016af54>] (d_lookup+0x14/0x40)
[ 2.500000] [<c016af54>] (d_lookup) from [<c015cecc>] (lookup_dcache+0x18/0x50)
[ 2.500000] [<c015cecc>] (lookup_dcache) from [<c015d868>] (lookup_one_len+0x90/0xe0)
[ 2.500000] [<c015d868>] (lookup_one_len) from [<c01e33e4>] (start_creating+0x68/0xc4)
[ 2.500000] [<c01e33e4>] (start_creating) from [<c01e398c>] (tracefs_create_file+0x30/0x11c)
[ 2.500000] [<c01e398c>] (tracefs_create_file) from [<c00c42f8>] (trace_create_file+0x14/0x38)
[ 2.500000] [<c00c42f8>] (trace_create_file) from [<c00cc854>] (event_create_dir+0x310/0x420)
[ 2.500000] [<c00cc854>] (event_create_dir) from [<c00cc9d8>] (__trace_early_add_event_dirs+0x28/0x50)
[ 2.500000] [<c00cc9d8>] (__trace_early_add_event_dirs) from [<c07c8d64>] (event_trace_init+0x70/0xd8)
[ 2.500000] [<c07c8d64>] (event_trace_init) from [<c07c8560>] (tracer_init_tracefs+0x14/0x284)
[ 2.500000] [<c07c8560>] (tracer_init_tracefs) from [<c000a330>] (do_one_initcall+0xdc/0x288)
[ 2.500000] [<c000a330>] (do_one_initcall) from [<c07bd1e8>] (kernel_init_freeable+0x1c4/0x20c)
[ 2.500000] [<c07bd1e8>] (kernel_init_freeable) from [<c05629c0>] (kernel_init+0x10/0x110)
[ 2.500000] [<c05629c0>] (kernel_init) from [<c00084f8>] (ret_from_fork+0x14/0x3c)
[ 2.500000] Exception stack(0xc1485fb0 to 0xc1485ff8)
[ 2.500000] 5fa0: 00000000 00000000 00000000 00000000
[ 2.500000] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 2.500000] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
Reported-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v1->v2:
- We move from Andy's original patch, which was a bit racey, to using a
simple generation counter.
drivers/char/random.c | 58 ++++++++++++++++++++-----------------------
1 file changed, 27 insertions(+), 31 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index b411182df6f6..8b18b3f1c317 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -2057,13 +2057,15 @@ struct ctl_table random_table[] = {
};
#endif /* CONFIG_SYSCTL */
+static atomic_t batch_generation = ATOMIC_INIT(0);
+
struct batched_entropy {
union {
u64 entropy_u64[CHACHA_BLOCK_SIZE / sizeof(u64)];
u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
};
unsigned int position;
- spinlock_t batch_lock;
+ int generation;
};
/*
@@ -2074,9 +2076,7 @@ struct batched_entropy {
* wait_for_random_bytes() should be called and return 0 at least once at any
* point prior.
*/
-static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
-};
+static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64);
u64 get_random_u64(void)
{
@@ -2084,41 +2084,52 @@ u64 get_random_u64(void)
unsigned long flags;
struct batched_entropy *batch;
static void *previous;
+ int next_gen;
warn_unseeded_randomness(&previous);
- batch = raw_cpu_ptr(&batched_entropy_u64);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
+ local_irq_save(flags);
+ batch = this_cpu_ptr(&batched_entropy_u64);
+
+ next_gen = atomic_read(&batch_generation);
+ if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0 ||
+ next_gen != batch->generation) {
extract_crng((u8 *)batch->entropy_u64);
batch->position = 0;
+ batch->generation = next_gen;
}
+
ret = batch->entropy_u64[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ local_irq_restore(flags);
return ret;
}
EXPORT_SYMBOL(get_random_u64);
-static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
-};
+static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32);
+
u32 get_random_u32(void)
{
u32 ret;
unsigned long flags;
struct batched_entropy *batch;
static void *previous;
+ int next_gen;
warn_unseeded_randomness(&previous);
- batch = raw_cpu_ptr(&batched_entropy_u32);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
+ local_irq_save(flags);
+ batch = this_cpu_ptr(&batched_entropy_u32);
+
+ next_gen = atomic_read(&batch_generation);
+ if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0 ||
+ next_gen != batch->generation) {
extract_crng((u8 *)batch->entropy_u32);
batch->position = 0;
+ batch->generation = next_gen;
}
+
ret = batch->entropy_u32[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ local_irq_restore(flags);
return ret;
}
EXPORT_SYMBOL(get_random_u32);
@@ -2129,22 +2140,7 @@ EXPORT_SYMBOL(get_random_u32);
* next usage. */
static void invalidate_batched_entropy(void)
{
- int cpu;
- unsigned long flags;
-
- for_each_possible_cpu(cpu) {
- struct batched_entropy *batched_entropy;
-
- batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
- spin_lock_irqsave(&batched_entropy->batch_lock, flags);
- batched_entropy->position = 0;
- spin_unlock(&batched_entropy->batch_lock);
-
- batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
- spin_lock(&batched_entropy->batch_lock);
- batched_entropy->position = 0;
- spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
- }
+ atomic_inc(&batch_generation);
}
/**
--
2.35.0
Hi Jason,
On Fri, Jan 28, 2022 at 11:35:48PM +0100, Jason A. Donenfeld wrote:
> Rather than use spinlocks to protect batched entropy, we can instead
> disable interrupts locally, since we're dealing with per-cpu data, and
> manage resets with a basic generation counter. This should fix up the
> below splat that Jonathan received with a PROVE_RAW_LOCK_NESTING=y
> kernel.
>
> Note that Sebastian has pointed out a few other areas where
> using spinlock_t in an IRQ context is potentially problematic for
> PREEMPT_RT. This patch handles one of those cases, and we'll hopefully
> have additional patches for other cases.
>
> [ 2.500000] [ BUG: Invalid wait context ]
[...]
>
> Reported-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
> Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
> Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Changes v1->v2:
> - We move from Andy's original patch, which was a bit racey, to using a
> simple generation counter.
>
> drivers/char/random.c | 58 ++++++++++++++++++++-----------------------
> 1 file changed, 27 insertions(+), 31 deletions(-)
This patch doesn't seem to apply properly on 5.17-rc1:
$ git am rand2.mbox
Applying: random: remove batched entropy locking
error: patch failed: drivers/char/random.c:2057
error: drivers/char/random.c: patch does not apply
Patch failed at 0001 random: remove batched entropy locking
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
So I cherry-picked it from https://git.zx2c4.com/linux-rng jd/no-batch-lock.
The result looks alright (no splat during boot).
Tested-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
--
With both patches, I get the following about three minutes after boot, but
I guess it will be addressed by one of the other patchsets under discussion:
[ 202.670000] =============================
[ 202.670000] [ BUG: Invalid wait context ]
[ 202.670000] 5.17.0-rc1-00001-g4e53c88bd88e #585 Not tainted
[ 202.670000] -----------------------------
[ 202.670000] swapper/0 is trying to lock:
[ 202.670000] c0b0ff3c (random_write_wait.lock){....}-{3:3}, at: __wake_up_common_lock+0x54/0xb4
[ 202.670000] other info that might help us debug this:
[ 202.670000] context-{2:2}
[ 202.670000] no locks held by swapper/0.
[ 202.670000] stack backtrace:
[ 202.670000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-00001-g4e53c88bd88e #585
[ 202.670000] Hardware name: WPCM450 chip
[ 202.670000] [<c00100a8>] (unwind_backtrace) from [<c000db2c>] (show_stack+0x10/0x14)
[ 202.670000] [<c000db2c>] (show_stack) from [<c0054eec>] (__lock_acquire+0x3f0/0x189c)
[ 202.670000] [<c0054eec>] (__lock_acquire) from [<c0054478>] (lock_acquire+0x2b8/0x354)
[ 202.670000] [<c0054478>] (lock_acquire) from [<c0568088>] (_raw_spin_lock_irqsave+0x60/0x74)
[ 202.670000] [<c0568088>] (_raw_spin_lock_irqsave) from [<c004c34c>] (__wake_up_common_lock+0x54/0xb4)
[ 202.670000] [<c004c34c>] (__wake_up_common_lock) from [<c004c3bc>] (__wake_up+0x10/0x18)
[ 202.670000] [<c004c3bc>] (__wake_up) from [<c030d898>] (crng_reseed.constprop.0+0x240/0x338)
[ 202.670000] [<c030d898>] (crng_reseed.constprop.0) from [<c0061b34>] (handle_irq_event_percpu+0x18/0x38)
[ 202.670000] [<c0061b34>] (handle_irq_event_percpu) from [<c0061b8c>] (handle_irq_event+0x38/0x5c)
[ 202.670000] [<c0061b8c>] (handle_irq_event) from [<c0065b28>] (handle_fasteoi_irq+0x9c/0x114)
[ 202.670000] [<c0065b28>] (handle_fasteoi_irq) from [<c0061178>] (handle_irq_desc+0x24/0x34)
[ 202.670000] [<c0061178>] (handle_irq_desc) from [<c05621ac>] (generic_handle_arch_irq+0x28/0x3c)
[ 202.670000] [<c05621ac>] (generic_handle_arch_irq) from [<c0008eb4>] (__irq_svc+0x54/0x80)
[ 202.670000] Exception stack(0xc0931f20 to 0xc0931f68)
[ 202.670000] 1f20: 00000000 0005317f 0005217f 60000013 00000000 00000000 c0930000 ffffffff
[ 202.670000] 1f40: c0937ac0 00000000 00053177 00000000 600000d3 c0931f70 c000b0e8 c000b0e0
[ 202.670000] 1f60: 60000013 ffffffff
[ 202.670000] [<c0008eb4>] (__irq_svc) from [<c000b0e0>] (arch_cpu_idle+0x24/0x34)
[ 202.670000] [<c000b0e0>] (arch_cpu_idle) from [<c0567e24>] (default_idle_call+0x40/0x80)
[ 202.670000] [<c0567e24>] (default_idle_call) from [<c0046fbc>] (do_idle+0xd4/0x254)
[ 202.670000] [<c0046fbc>] (do_idle) from [<c00474e4>] (cpu_startup_entry+0xc/0x10)
[ 202.670000] [<c00474e4>] (cpu_startup_entry) from [<c07bdedc>] (start_kernel+0x5cc/0x6c0)
[ 202.670000] random: crng init done
Hey Andy, Think I could bug you to review this patch? The general idea is based on your original patch, and I think this fits what we talked about on IRC. I figure we'll probably both page this out of our minds after another week or two of not thinking about it. It's here on cgit if that's easier to look at: https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/commit/?h=jd/no-batch-lock Thanks, Jason
On 2022-02-04 01:27:55 [+0100], Jason A. Donenfeld wrote: > Hey Andy, > > Think I could bug you to review this patch? The general idea is based > on your original patch, and I think this fits what we talked about on > IRC. I figure we'll probably both page this out of our minds after > another week or two of not thinking about it. > > It's here on cgit if that's easier to look at: > https://git.kernel.org/pub/scm/linux/kernel/git/crng/random.git/commit/?h=jd/no-batch-lock Please don't merge this: - This splat only occurs with CONFIG_PROVE_RAW_LOCK_NESTING enabled. This option has this in its description: │ NOTE: There are known nesting problems. So if you enable this │ option expect lockdep splats until these problems have been fully │ addressed which is work in progress. This config switch allows to │ identify and analyze these problems. It will be removed and the │ check permanently enabled once the main issues have been fixed. - The problem identified by the splat affects only PREEMPT_RT. Non-RT is not affected by this. - This patch disables interrupts and invokes extract_crng() which leads to other problems. If this patch is the way then it should be merged together with the other outstanding issues. > Thanks, > Jason Sebastian
Hi Sebastian, Please calm down a bit: this patch doesn't minimize the importance of working out a real solution for PREEMPT_RT, and I'm not under the illusion that this one here is the silver bullet. It does, however, have other merits, which may or may not have anything to do with PREEMPT_RT. To reiterate: I am taking your PREEMPT_RT concerns seriously, and I want to come up with a solution to that, which we're working toward more broadly in that other thread. Per your feedback on v1, this is no longer marked for stable and no longer purports to fix the PREEMPT_RT issues entirely. Actually, a large motivation for this includes the reason why Andy's original patch was laying around in the first place: we're trying to make this code faster. I can improve the commit message a bit though. On Fri, Feb 4, 2022 at 12:10 PM Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > - This splat only occurs with CONFIG_PROVE_RAW_LOCK_NESTING enabled. Right, the commit message for v2 mentions that. > - The problem identified by the splat affects only PREEMPT_RT. Non-RT is > not affected by this. Right. > > - This patch disables interrupts and invokes extract_crng() which leads > to other problems. The existing code, which uses a spinlock, also disables interrupts, right? So this isn't actually regressing in that regard. It just doesn't fix your PREEMPT_RT issue, right? Or is the issue you see that spinlock_t is a mutex on PREEMPT_RT, so we're disabling interrupts here in a way that we _weren't_ originally, in a PREEMPT_RT context? If that's the case, then I think I see your objection. I wonder if it'd be enough here to disable preemption instead? But then we run into trouble if this is called from an interrupt. Maybe it'd be best to retain the spinlock_t, which will amount to disabling interrupts on !PREEMPT_RT, since it'll never be contended, but will turn into a mutex on PREEMPT_RT, where it'll do the right thing from an exclusivity perspective. Would this be reasonable? Andy? Any suggestions? Jason
On 2022-02-04 14:42:03 [+0100], Jason A. Donenfeld wrote: > Hi Sebastian, Hi Jason, > Please calm down a bit: this patch doesn't minimize the importance of > working out a real solution for PREEMPT_RT, and I'm not under the > illusion that this one here is the silver bullet. It does, however, > have other merits, which may or may not have anything to do with > PREEMPT_RT. To reiterate: I am taking your PREEMPT_RT concerns > seriously, and I want to come up with a solution to that, which we're > working toward more broadly in that other thread. > > Per your feedback on v1, this is no longer marked for stable and no > longer purports to fix the PREEMPT_RT issues entirely. Actually, a > large motivation for this includes the reason why Andy's original > patch was laying around in the first place: we're trying to make this > code faster. The commit in tree you cited is b43db859a36cb553102c9c80431fc44618703bda. It does not mention anything regarding faster nor the performance improvement and conditions (hoth path, etc). It still has a stable tag. > I can improve the commit message a bit though. > > On Fri, Feb 4, 2022 at 12:10 PM Sebastian Andrzej Siewior > <bigeasy@linutronix.de> wrote: > > - This splat only occurs with CONFIG_PROVE_RAW_LOCK_NESTING enabled. > > Right, the commit message for v2 mentions that. > > > - The problem identified by the splat affects only PREEMPT_RT. Non-RT is > > not affected by this. > > Right. > > > > > - This patch disables interrupts and invokes extract_crng() which leads > > to other problems. > > The existing code, which uses a spinlock, also disables interrupts, > right? So this isn't actually regressing in that regard. It just > doesn't fix your PREEMPT_RT issue, right? The existing code uses spin_lock_irqsave() which do not disable on PREEMPT_RT. The local_irq_save() on the hand does. > Or is the issue you see that spinlock_t is a mutex on PREEMPT_RT, so > we're disabling interrupts here in a way that we _weren't_ originally, > in a PREEMPT_RT context? If that's the case, then I think I see your > objection. Exactly. > I wonder if it'd be enough here to disable preemption instead? But > then we run into trouble if this is called from an interrupt. Disabling preemption does not allow to acquire sleeping locks so no win. > Maybe it'd be best to retain the spinlock_t, which will amount to > disabling interrupts on !PREEMPT_RT, since it'll never be contended, > but will turn into a mutex on PREEMPT_RT, where it'll do the right > thing from an exclusivity perspective. Would this be reasonable? what does retain the spinlock_t mean since we already have a spinlock_t? > Andy? Any suggestions? > > Jason Sebastian
Hi Sebastian, On Fri, Feb 4, 2022 at 3:02 PM Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > The commit in tree you cited is b43db859a36cb553102c9c80431fc44618703bda. > It does not mention anything regarding faster nor the performance > improvement and conditions (hoth path, etc). It still has a stable tag. It dropped the Cc: stable@. It still has the Fixes:. I can get rid of the Fixes: too. I'll improve that message a bunch for a potential v3. > > Maybe it'd be best to retain the spinlock_t, which will amount to > > disabling interrupts on !PREEMPT_RT, since it'll never be contended, > > but will turn into a mutex on PREEMPT_RT, where it'll do the right > > thing from an exclusivity perspective. Would this be reasonable? > > what does retain the spinlock_t mean since we already have a spinlock_t? The idea would be to keep using spinlock_t like we do now -- no change there -- but move to using this atomic generation counter so that there's never any contention. Actually, though, I worry that that approach would throw out the gains we're getting by chucking the spinlock in the first place. What if we keep a spinlock_t there on PREEMPT_RT but stick with disabling interrupts on !PREEMPT_RT? I wish there was a solution or an API that amounted to the same thing so there wouldn't need to be an #ifdef, but I don't know what that'd be. Jason
On 2022-02-04 15:11:34 [+0100], Jason A. Donenfeld wrote: > Hi Sebastian, Hi Jason, > On Fri, Feb 4, 2022 at 3:02 PM Sebastian Andrzej Siewior > <bigeasy@linutronix.de> wrote: > > The commit in tree you cited is b43db859a36cb553102c9c80431fc44618703bda. > > It does not mention anything regarding faster nor the performance > > improvement and conditions (hoth path, etc). It still has a stable tag. > > It dropped the Cc: stable@. It still has the Fixes:. I can get rid of > the Fixes: too. I'll improve that message a bunch for a potential v3. Either you argue for bug fixing or performance improvement and I made it clear that it is not bug fixing. That Fixes: tag is enough for Greg to backport it. > > > Maybe it'd be best to retain the spinlock_t, which will amount to > > > disabling interrupts on !PREEMPT_RT, since it'll never be contended, > > > but will turn into a mutex on PREEMPT_RT, where it'll do the right > > > thing from an exclusivity perspective. Would this be reasonable? > > > > what does retain the spinlock_t mean since we already have a spinlock_t? > > The idea would be to keep using spinlock_t like we do now -- no change > there -- but move to using this atomic generation counter so that > there's never any contention. Actually, though, I worry that that > approach would throw out the gains we're getting by chucking the > spinlock in the first place. It is a per-CPU spinlock_t so there should be no contention if there is no cross-CPU access. The overhead are two atomic operations. > What if we keep a spinlock_t there on PREEMPT_RT but stick with > disabling interrupts on !PREEMPT_RT? I wish there was a solution or an > API that amounted to the same thing so there wouldn't need to be an > #ifdef, but I don't know what that'd be. If it is still to much try to look for locallock_t and local_lock_irqsave(). This is kind of like your local_irq_save() but you have lockdep annotations and PREEMPT_RT has a spinlock_t like behaviour. It also documents in-code what the scope of your locking is. > Jason Sebastian
Hi Sebastian, On Fri, Feb 4, 2022 at 3:30 PM Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > > What if we keep a spinlock_t there on PREEMPT_RT but stick with > > disabling interrupts on !PREEMPT_RT? I wish there was a solution or an > > API that amounted to the same thing so there wouldn't need to be an > > #ifdef, but I don't know what that'd be. > > If it is still to much try to look for locallock_t and > local_lock_irqsave(). This is kind of like your local_irq_save() but > you have lockdep annotations and PREEMPT_RT has a spinlock_t like > behaviour. It also documents in-code what the scope of your locking is. Oh, that's terrific, thanks! Sounds like exactly what we were looking for. I'll respin this patch with that. Jason
Rather than use spinlocks to protect batched entropy, we can instead
disable interrupts locally, since we're dealing with per-cpu data, and
manage resets with a basic generation counter. At the same time, we
can't quite do this on PREEMPT_RT, where we still want spinlocks-as-
mutexes semantics. So we use a local_lock_t, which provides the right
behavior for each. Because this is a per-cpu lock, that generation
counter is still doing the necessary CPU-to-CPU communication.
This should improve performance a bit. It will also fix the linked splat
that Jonathan received with a PROVE_RAW_LOCK_NESTING=y.
Note that Sebastian has pointed out a few other areas where
using spinlock_t in an IRQ context is potentially problematic for
PREEMPT_RT. This patch handles one of those cases, and we'll hopefully
have additional patches for other cases.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Tested-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net>
Link: https://lore.kernel.org/lkml/YfMa0QgsjCVdRAvJ@latitude/
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
drivers/char/random.c | 55 ++++++++++++++++++++++---------------------
1 file changed, 28 insertions(+), 27 deletions(-)
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 455615ac169a..3e54b90a3ff8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1732,13 +1732,16 @@ struct ctl_table random_table[] = {
};
#endif /* CONFIG_SYSCTL */
+static atomic_t batch_generation = ATOMIC_INIT(0);
+
struct batched_entropy {
union {
u64 entropy_u64[CHACHA_BLOCK_SIZE / sizeof(u64)];
u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
};
+ local_lock_t lock;
unsigned int position;
- spinlock_t batch_lock;
+ int generation;
};
/*
@@ -1750,7 +1753,7 @@ struct batched_entropy {
* point prior.
*/
static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u64) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u64.lock),
+ .lock = INIT_LOCAL_LOCK(batched_entropy_u64.lock)
};
u64 get_random_u64(void)
@@ -1759,41 +1762,54 @@ u64 get_random_u64(void)
unsigned long flags;
struct batched_entropy *batch;
static void *previous;
+ int next_gen;
warn_unseeded_randomness(&previous);
- batch = raw_cpu_ptr(&batched_entropy_u64);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
+ batch = this_cpu_ptr(&batched_entropy_u64);
+ local_lock_irqsave(&batch->lock, flags);
+
+ next_gen = atomic_read(&batch_generation);
+ if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0 ||
+ next_gen != batch->generation) {
extract_crng((u8 *)batch->entropy_u64);
batch->position = 0;
+ batch->generation = next_gen;
}
+
ret = batch->entropy_u64[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ local_unlock_irqrestore(&batch->lock, flags);
return ret;
}
EXPORT_SYMBOL(get_random_u64);
static DEFINE_PER_CPU(struct batched_entropy, batched_entropy_u32) = {
- .batch_lock = __SPIN_LOCK_UNLOCKED(batched_entropy_u32.lock),
+ .lock = INIT_LOCAL_LOCK(batched_entropy_u32.lock)
};
+
u32 get_random_u32(void)
{
u32 ret;
unsigned long flags;
struct batched_entropy *batch;
static void *previous;
+ int next_gen;
warn_unseeded_randomness(&previous);
- batch = raw_cpu_ptr(&batched_entropy_u32);
- spin_lock_irqsave(&batch->batch_lock, flags);
- if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0) {
+ batch = this_cpu_ptr(&batched_entropy_u32);
+ local_lock_irqsave(&batch->lock, flags);
+
+ next_gen = atomic_read(&batch_generation);
+ if (batch->position % ARRAY_SIZE(batch->entropy_u32) == 0 ||
+ next_gen != batch->generation) {
extract_crng((u8 *)batch->entropy_u32);
batch->position = 0;
+ batch->generation = next_gen;
}
+
ret = batch->entropy_u32[batch->position++];
- spin_unlock_irqrestore(&batch->batch_lock, flags);
+ local_unlock_irqrestore(&batch->lock, flags);
return ret;
}
EXPORT_SYMBOL(get_random_u32);
@@ -1804,22 +1820,7 @@ EXPORT_SYMBOL(get_random_u32);
* next usage. */
static void invalidate_batched_entropy(void)
{
- int cpu;
- unsigned long flags;
-
- for_each_possible_cpu(cpu) {
- struct batched_entropy *batched_entropy;
-
- batched_entropy = per_cpu_ptr(&batched_entropy_u32, cpu);
- spin_lock_irqsave(&batched_entropy->batch_lock, flags);
- batched_entropy->position = 0;
- spin_unlock(&batched_entropy->batch_lock);
-
- batched_entropy = per_cpu_ptr(&batched_entropy_u64, cpu);
- spin_lock(&batched_entropy->batch_lock);
- batched_entropy->position = 0;
- spin_unlock_irqrestore(&batched_entropy->batch_lock, flags);
- }
+ atomic_inc(&batch_generation);
}
/**
--
2.35.0
On 2022-02-04 16:51:42 [+0100], Jason A. Donenfeld wrote:
> index 455615ac169a..3e54b90a3ff8 100644
> --- a/drivers/char/random.c
> +++ b/drivers/char/random.c
> @@ -1759,41 +1762,54 @@ u64 get_random_u64(void)
> unsigned long flags;
> struct batched_entropy *batch;
> static void *previous;
> + int next_gen;
>
> warn_unseeded_randomness(&previous);
>
> - batch = raw_cpu_ptr(&batched_entropy_u64);
> - spin_lock_irqsave(&batch->batch_lock, flags);
> - if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
> + batch = this_cpu_ptr(&batched_entropy_u64);
> + local_lock_irqsave(&batch->lock, flags);
Does this compile and work? From the looks of it, this should be:
local_lock_irqsave(&batched_entropy_u64.lock, flags);
batch = this_cpu_ptr(&batched_entropy_u64);
and we could do s/this_cpu_ptr/raw_cpu_ptr/
Sebastian
Hi Sebastian, On Fri, Feb 4, 2022 at 4:57 PM Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > Does this compile and work? It does, yes. > From the looks of it, this should be: > > local_lock_irqsave(&batched_entropy_u64.lock, flags); > batch = this_cpu_ptr(&batched_entropy_u64); I wasn't aware you could reference percpu things statically like that without going through the ptr helper. Anyway, I'll switch to yours, as taking that lock prior seems better anyway. > > and we could do s/this_cpu_ptr/raw_cpu_ptr/ Ack. Jason
On Fri, Feb 4, 2022 at 4:57 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 2022-02-04 16:51:42 [+0100], Jason A. Donenfeld wrote:
> > index 455615ac169a..3e54b90a3ff8 100644
> > --- a/drivers/char/random.c
> > +++ b/drivers/char/random.c
> > @@ -1759,41 +1762,54 @@ u64 get_random_u64(void)
> > unsigned long flags;
> > struct batched_entropy *batch;
> > static void *previous;
> > + int next_gen;
> >
> > warn_unseeded_randomness(&previous);
> >
> > - batch = raw_cpu_ptr(&batched_entropy_u64);
> > - spin_lock_irqsave(&batch->batch_lock, flags);
> > - if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
> > + batch = this_cpu_ptr(&batched_entropy_u64);
> > + local_lock_irqsave(&batch->lock, flags);
>
> Does this compile and work? From the looks of it, this should be:
>
> local_lock_irqsave(&batched_entropy_u64.lock, flags);
> batch = this_cpu_ptr(&batched_entropy_u64);
>
> and we could do s/this_cpu_ptr/raw_cpu_ptr/
Why raw_cpu_ptr? include/linux/percpu-defs.h says about raw_*() operations:
* Operations for contexts where we do not want to do any checks for
* preemptions. Unless strictly necessary, always use [__]this_cpu_*()
* instead.
So when I see a raw_*() percpu thing, I read it as "it is expected
that we can migrate after this point (or we're in some really weird
context where the normal context check doesn't work)". Is that
incorrect?
On Wed, Feb 16, 2022 at 9:01 PM Jann Horn <jannh@google.com> wrote:
>
> On Fri, Feb 4, 2022 at 4:57 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> > On 2022-02-04 16:51:42 [+0100], Jason A. Donenfeld wrote:
> > > index 455615ac169a..3e54b90a3ff8 100644
> > > --- a/drivers/char/random.c
> > > +++ b/drivers/char/random.c
> > > @@ -1759,41 +1762,54 @@ u64 get_random_u64(void)
> > > unsigned long flags;
> > > struct batched_entropy *batch;
> > > static void *previous;
> > > + int next_gen;
> > >
> > > warn_unseeded_randomness(&previous);
> > >
> > > - batch = raw_cpu_ptr(&batched_entropy_u64);
> > > - spin_lock_irqsave(&batch->batch_lock, flags);
> > > - if (batch->position % ARRAY_SIZE(batch->entropy_u64) == 0) {
> > > + batch = this_cpu_ptr(&batched_entropy_u64);
> > > + local_lock_irqsave(&batch->lock, flags);
> >
> > Does this compile and work? From the looks of it, this should be:
> >
> > local_lock_irqsave(&batched_entropy_u64.lock, flags);
> > batch = this_cpu_ptr(&batched_entropy_u64);
> >
> > and we could do s/this_cpu_ptr/raw_cpu_ptr/
>
> Why raw_cpu_ptr? include/linux/percpu-defs.h says about raw_*() operations:
>
> * Operations for contexts where we do not want to do any checks for
> * preemptions. Unless strictly necessary, always use [__]this_cpu_*()
> * instead.
>
> So when I see a raw_*() percpu thing, I read it as "it is expected
> that we can migrate after this point (or we're in some really weird
> context where the normal context check doesn't work)". Is that
> incorrect?
If it says "contexts where we do not want to do any checks for
preemptions", then that would apply here I would think? We're taking a
local lock, which means afterwards there are no preemptions. For
context, the code that got committed after Sebastian's final review
is:
local_lock_irqsave(&batched_entropy_u32.lock, flags);
batch = raw_cpu_ptr(&batched_entropy_u32);
However, I think most other code uses this_cpu_ptr() instead? So not sure.
Jason
On 2022-02-16 21:58:14 [+0100], Jason A. Donenfeld wrote: > > Why raw_cpu_ptr? include/linux/percpu-defs.h says about raw_*() operations: > > > > * Operations for contexts where we do not want to do any checks for > > * preemptions. Unless strictly necessary, always use [__]this_cpu_*() > > * instead. > > > > So when I see a raw_*() percpu thing, I read it as "it is expected > > that we can migrate after this point (or we're in some really weird > > context where the normal context check doesn't work)". Is that > > incorrect? > > If it says "contexts where we do not want to do any checks for > preemptions", then that would apply here I would think? We're taking a > local lock, which means afterwards there are no preemptions. For > context, the code that got committed after Sebastian's final review > is: > > local_lock_irqsave(&batched_entropy_u32.lock, flags); > batch = raw_cpu_ptr(&batched_entropy_u32); > > However, I think most other code uses this_cpu_ptr() instead? So not sure. It depends what you are looking for. raw_cpu_ptr(&batched_entropy_u32) give you the pointer to &batched_entropy_u32 of _this_ CPU - the CPU you are currently running. this_cpu_ptr() does the same except that it has smp_processor_id() in it. smp_processor_id() will yell at you (given you enabled CONFIG_DEBUG_PREEMPT) if the code can migrate to another CPU. So it will yell at you unless: - you disable preemption / migration so code can't migrate - you run on a per-CPU thread i.e. a thread which is bound to a single CPU and therefore can't migrate to another. The local_lock_irqsave() acquires a lock / disables interrupt and therefore can't migrate. So I suggested to use raw_cpu_ptr() as an optimisation in the debug-case. If you disable preemption before accessing a per-CPU variable with this_cpu_ptr() then you _need_ to ensure that nobody is accessing the variable from in-IRQ context. Nobody will yell at you. But if you use a local-lock then you get lockdep annotation and lockdep will complain if you use that variable always in process context and sometimes in-IRQ. > Jason Sebastian
© 2016 - 2026 Red Hat, Inc.