[PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes

Posted by Leon Hwang 2 weeks, 5 days ago
Switch the free-node pop paths to raw_spin_trylock*() to avoid blocking
on contended LRU locks.

If the global or per-CPU LRU lock is unavailable, refuse to refill the
local free list and return NULL instead. This allows callers to back
off safely rather than blocking or re-entering the same lock context.

This change avoids lockdep warnings and potential deadlocks caused by
re-entrant LRU lock acquisition from NMI context, as shown below:

[  418.260323] bpf_testmod: oh no, recursing into test_1, recursion_misses 1
[  424.982207] ================================
[  424.982216] WARNING: inconsistent lock state
[  424.982223] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[  424.982314]  *** DEADLOCK ***
[...]

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
---
 kernel/bpf/bpf_lru_list.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
index c091f3232cc5..03d37f72731a 100644
--- a/kernel/bpf/bpf_lru_list.c
+++ b/kernel/bpf/bpf_lru_list.c
@@ -312,14 +312,15 @@ static void bpf_lru_list_push_free(struct bpf_lru_list *l,
 	raw_spin_unlock_irqrestore(&l->lock, flags);
 }
 
-static void bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
+static bool bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
 					   struct bpf_lru_locallist *loc_l)
 {
 	struct bpf_lru_list *l = &lru->common_lru.lru_list;
 	struct bpf_lru_node *node, *tmp_node;
 	unsigned int nfree = 0;
 
-	raw_spin_lock(&l->lock);
+	if (!raw_spin_trylock(&l->lock))
+		return false;
 
 	__local_list_flush(l, loc_l);
 
@@ -339,6 +340,8 @@ static void bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
 				      BPF_LRU_LOCAL_LIST_T_FREE);
 
 	raw_spin_unlock(&l->lock);
+
+	return true;
 }
 
 /*
@@ -418,7 +421,8 @@ static struct bpf_lru_node *bpf_percpu_lru_pop_free(struct bpf_lru *lru,
 
 	l = per_cpu_ptr(lru->percpu_lru, cpu);
 
-	raw_spin_lock_irqsave(&l->lock, flags);
+	if (!raw_spin_trylock_irqsave(&l->lock, flags))
+		return NULL;
 
 	__bpf_lru_list_rotate(lru, l);
 
@@ -451,13 +455,12 @@ static struct bpf_lru_node *bpf_common_lru_pop_free(struct bpf_lru *lru,
 
 	loc_l = per_cpu_ptr(clru->local_list, cpu);
 
-	raw_spin_lock_irqsave(&loc_l->lock, flags);
+	if (!raw_spin_trylock_irqsave(&loc_l->lock, flags))
+		return NULL;
 
 	node = __local_list_pop_free(loc_l);
-	if (!node) {
-		bpf_lru_list_pop_free_to_local(lru, loc_l);
+	if (!node && bpf_lru_list_pop_free_to_local(lru, loc_l))
 		node = __local_list_pop_free(loc_l);
-	}
 
 	if (node)
 		__local_list_add_pending(lru, loc_l, cpu, node, hash);
-- 
2.52.0
Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes
Posted by Daniel Borkmann 2 weeks, 5 days ago
On 1/19/26 3:21 PM, Leon Hwang wrote:
> Switch the free-node pop paths to raw_spin_trylock*() to avoid blocking
> on contended LRU locks.
> 
> If the global or per-CPU LRU lock is unavailable, refuse to refill the
> local free list and return NULL instead. This allows callers to back
> off safely rather than blocking or re-entering the same lock context.
> 
> This change avoids lockdep warnings and potential deadlocks caused by
> re-entrant LRU lock acquisition from NMI context, as shown below:
> 
> [  418.260323] bpf_testmod: oh no, recursing into test_1, recursion_misses 1
> [  424.982207] ================================
> [  424.982216] WARNING: inconsistent lock state
> [  424.982223] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> [  424.982314]  *** DEADLOCK ***
> [...]
> 
> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> ---
>   kernel/bpf/bpf_lru_list.c | 17 ++++++++++-------
>   1 file changed, 10 insertions(+), 7 deletions(-)

Documentation/bpf/map_lru_hash_update.dot needs update?

> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
> index c091f3232cc5..03d37f72731a 100644
> --- a/kernel/bpf/bpf_lru_list.c
> +++ b/kernel/bpf/bpf_lru_list.c
> @@ -312,14 +312,15 @@ static void bpf_lru_list_push_free(struct bpf_lru_list *l,
>   	raw_spin_unlock_irqrestore(&l->lock, flags);
>   }
>   
> -static void bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
> +static bool bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
>   					   struct bpf_lru_locallist *loc_l)
>   {
>   	struct bpf_lru_list *l = &lru->common_lru.lru_list;
>   	struct bpf_lru_node *node, *tmp_node;
>   	unsigned int nfree = 0;
>   
> -	raw_spin_lock(&l->lock);
> +	if (!raw_spin_trylock(&l->lock))
> +		return false;
>   

Could you provide some more analysis of the effect this has on real-world
programs? Presumably they'll unexpectedly encounter a lot more frequent
-ENOMEM errors from bpf_map_update_elem even though memory might be
available and it's just that the locks are contended?
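
(For reference, the caller path in question, paraphrased from
kernel/bpf/hashtab.c; today a NULL free node is reported as -ENOMEM, so a
contended trylock would be indistinguishable from genuine memory exhaustion:)

    /* htab_lru_map_update_elem(), paraphrased */
    l_new = prealloc_lru_pop(htab, key, hash);  /* pops an LRU free node */
    if (!l_new)
        return -ENOMEM;  /* "no memory" and "lock busy" look the same */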

Also, have you considered rqspinlock as a potential candidate to discover
deadlocks?
Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes
Posted by Leon Hwang 2 weeks, 5 days ago

On 20/1/26 03:47, Daniel Borkmann wrote:
> On 1/19/26 3:21 PM, Leon Hwang wrote:
>> Switch the free-node pop paths to raw_spin_trylock*() to avoid blocking
>> on contended LRU locks.
>>
>> If the global or per-CPU LRU lock is unavailable, refuse to refill the
>> local free list and return NULL instead. This allows callers to back
>> off safely rather than blocking or re-entering the same lock context.
>>
>> This change avoids lockdep warnings and potential deadlocks caused by
>> re-entrant LRU lock acquisition from NMI context, as shown below:
>>
>> [  418.260323] bpf_testmod: oh no, recursing into test_1,
>> recursion_misses 1
>> [  424.982207] ================================
>> [  424.982216] WARNING: inconsistent lock state
>> [  424.982223] inconsistent {INITIAL USE} -> {IN-NMI} usage.
>> [  424.982314]  *** DEADLOCK ***
>> [...]
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>   kernel/bpf/bpf_lru_list.c | 17 ++++++++++-------
>>   1 file changed, 10 insertions(+), 7 deletions(-)
> 
> Documentation/bpf/map_lru_hash_update.dot needs update?
> 

Yes, it needs update.

>> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
>> index c091f3232cc5..03d37f72731a 100644
>> --- a/kernel/bpf/bpf_lru_list.c
>> +++ b/kernel/bpf/bpf_lru_list.c
>> @@ -312,14 +312,15 @@ static void bpf_lru_list_push_free(struct
>> bpf_lru_list *l,
>>       raw_spin_unlock_irqrestore(&l->lock, flags);
>>   }
>>   -static void bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
>> +static bool bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
>>                          struct bpf_lru_locallist *loc_l)
>>   {
>>       struct bpf_lru_list *l = &lru->common_lru.lru_list;
>>       struct bpf_lru_node *node, *tmp_node;
>>       unsigned int nfree = 0;
>>   -    raw_spin_lock(&l->lock);
>> +    if (!raw_spin_trylock(&l->lock))
>> +        return false;
>>   
> 
> Could you provide some more analysis, and the effect this has on real-world
> programs? Presumably they'll unexpectedly encounter a lot more frequent
> -ENOMEM as an error on bpf_map_update_elem even though memory might be
> available just that locks are contended?
> 
> Also, have you considered rqspinlock as a potential candidate to discover
> deadlocks?

Thanks for the questions.

While I haven’t encountered this issue in production systems myself, the
deadlock has been observed repeatedly in practice, including the cases
shown in the cover letter. It can also be reproduced reliably when
running the LRU tests locally, so this is a real and recurring problem.

I agree that returning -ENOMEM when locks are contended is not ideal.
Using -EBUSY would better reflect the situation where memory is
available but forward progress is temporarily blocked by lock
contention. I can update the patch accordingly.

Regarding rqspinlock: as mentioned in the cover letter, Menglong
previously explored using rqspinlock to address these deadlocks but was
unable to arrive at a complete solution. After further off-list
discussion, we agreed that using trylock is a more practical approach
here. In most observed cases, the lock contention leading to deadlock
occurs in bpf_common_lru_pop_free(), and trylock allows callers to back
off safely rather than risking re-entrancy and deadlock.

Thanks,
Leon

Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes
Posted by Alexei Starovoitov 2 weeks, 5 days ago
On Mon, Jan 19, 2026 at 5:50 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 20/1/26 03:47, Daniel Borkmann wrote:
> > On 1/19/26 3:21 PM, Leon Hwang wrote:
> >> Switch the free-node pop paths to raw_spin_trylock*() to avoid blocking
> >> on contended LRU locks.
> >>
> >> If the global or per-CPU LRU lock is unavailable, refuse to refill the
> >> local free list and return NULL instead. This allows callers to back
> >> off safely rather than blocking or re-entering the same lock context.
> >>
> >> This change avoids lockdep warnings and potential deadlocks caused by
> >> re-entrant LRU lock acquisition from NMI context, as shown below:
> >>
> >> [  418.260323] bpf_testmod: oh no, recursing into test_1,
> >> recursion_misses 1
> >> [  424.982207] ================================
> >> [  424.982216] WARNING: inconsistent lock state
> >> [  424.982223] inconsistent {INITIAL USE} -> {IN-NMI} usage.
> >> [  424.982314]  *** DEADLOCK ***
> >> [...]
> >>
> >> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
> >> ---
> >>   kernel/bpf/bpf_lru_list.c | 17 ++++++++++-------
> >>   1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > Documentation/bpf/map_lru_hash_update.dot needs update?
> >
>
> Yes, it needs update.
>
> >> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
> >> index c091f3232cc5..03d37f72731a 100644
> >> --- a/kernel/bpf/bpf_lru_list.c
> >> +++ b/kernel/bpf/bpf_lru_list.c
> >> @@ -312,14 +312,15 @@ static void bpf_lru_list_push_free(struct
> >> bpf_lru_list *l,
> >>       raw_spin_unlock_irqrestore(&l->lock, flags);
> >>   }
> >>   -static void bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
> >> +static bool bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
> >>                          struct bpf_lru_locallist *loc_l)
> >>   {
> >>       struct bpf_lru_list *l = &lru->common_lru.lru_list;
> >>       struct bpf_lru_node *node, *tmp_node;
> >>       unsigned int nfree = 0;
> >>   -    raw_spin_lock(&l->lock);
> >> +    if (!raw_spin_trylock(&l->lock))
> >> +        return false;
> >>
> >
> > Could you provide some more analysis, and the effect this has on real-world
> > programs? Presumably they'll unexpectedly encounter a lot more frequent
> > -ENOMEM as an error on bpf_map_update_elem even though memory might be
> > available just that locks are contended?
> >
> > Also, have you considered rqspinlock as a potential candidate to discover
> > deadlocks?
>
> Thanks for the questions.
>
> While I haven’t encountered this issue in production systems myself, the
> deadlock has been observed repeatedly in practice, including the cases
> shown in the cover letter. It can also be reproduced reliably when
> running the LRU tests locally, so this is a real and recurring problem.
>
> I agree that returning -ENOMEM when locks are contended is not ideal.
> Using -EBUSY would better reflect the situation where memory is
> available but forward progress is temporarily blocked by lock
> contention. I can update the patch accordingly.
>
> Regarding rqspinlock: as mentioned in the cover letter, Menglong
> previously explored using rqspinlock to address these deadlocks but was
> unable to arrive at a complete solution. After further off-list
> discussion, we agreed that using trylock is a more practical approach
> here. In most observed cases, the lock contention leading to deadlock
> occurs in bpf_common_lru_pop_free(), and trylock allows callers to back
> off safely rather than risking re-entrancy and deadlock.

Sorry, trylock is not an option here.

We are not going to sacrifice LRU map reliability for the sake of syzbot.
Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes
Posted by bot+bpf-ci@kernel.org 2 weeks, 5 days ago
> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
> index c091f3232cc5..03d37f72731a 100644
> --- a/kernel/bpf/bpf_lru_list.c
> +++ b/kernel/bpf/bpf_lru_list.c

[ ... ]

> @@ -451,13 +455,12 @@ static struct bpf_lru_node *bpf_common_lru_pop_free(struct bpf_lru *lru,
>
>  	loc_l = per_cpu_ptr(clru->local_list, cpu);
>
> -	raw_spin_lock_irqsave(&loc_l->lock, flags);
> +	if (!raw_spin_trylock_irqsave(&loc_l->lock, flags))
> +		return NULL;
>
>  	node = __local_list_pop_free(loc_l);
> -	if (!node) {
> -		bpf_lru_list_pop_free_to_local(lru, loc_l);
> +	if (!node && bpf_lru_list_pop_free_to_local(lru, loc_l))
>  		node = __local_list_pop_free(loc_l);
> -	}
>
>  	if (node)
>  		__local_list_add_pending(lru, loc_l, cpu, node, hash);

The trylock conversion here looks correct, but what about the steal path
that follows later in bpf_common_lru_pop_free()? Looking at the existing
code (not shown in this diff), the steal loop still uses unconditional
raw_spin_lock_irqsave():

    do {
        steal_loc_l = per_cpu_ptr(clru->local_list, steal);

        raw_spin_lock_irqsave(&steal_loc_l->lock, flags);
        ...
    } while (...);

If the goal is to avoid NMI-context deadlocks when acquiring LRU locks,
can the same deadlock scenario occur when NMI interrupts during the steal
loop and the NMI handler tries to acquire the same steal_loc_l->lock?

Similarly, after a successful steal, there is another unconditional lock:

    if (node) {
        raw_spin_lock_irqsave(&loc_l->lock, flags);
        __local_list_add_pending(lru, loc_l, cpu, node, hash);
        raw_spin_unlock_irqrestore(&loc_l->lock, flags);
    }

Should these also use trylock to maintain consistency with the stated goal
of avoiding NMI-context deadlocks?


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21147913717
Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes
Posted by Leon Hwang 2 weeks, 5 days ago

On 20/1/26 02:46, bot+bpf-ci@kernel.org wrote:
>> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
>> index c091f3232cc5..03d37f72731a 100644
>> --- a/kernel/bpf/bpf_lru_list.c
>> +++ b/kernel/bpf/bpf_lru_list.c
> 
> [ ... ]
> 
>> @@ -451,13 +455,12 @@ static struct bpf_lru_node *bpf_common_lru_pop_free(struct bpf_lru *lru,
>>
>>  	loc_l = per_cpu_ptr(clru->local_list, cpu);
>>
>> -	raw_spin_lock_irqsave(&loc_l->lock, flags);
>> +	if (!raw_spin_trylock_irqsave(&loc_l->lock, flags))
>> +		return NULL;
>>
>>  	node = __local_list_pop_free(loc_l);
>> -	if (!node) {
>> -		bpf_lru_list_pop_free_to_local(lru, loc_l);
>> +	if (!node && bpf_lru_list_pop_free_to_local(lru, loc_l))
>>  		node = __local_list_pop_free(loc_l);
>> -	}
>>
>>  	if (node)
>>  		__local_list_add_pending(lru, loc_l, cpu, node, hash);
> 
> The trylock conversion here looks correct, but what about the steal path
> that follows later in bpf_common_lru_pop_free()? Looking at the existing
> code (not shown in this diff), the steal loop still uses unconditional
> raw_spin_lock_irqsave():
> 
>     do {
>         steal_loc_l = per_cpu_ptr(clru->local_list, steal);
> 
>         raw_spin_lock_irqsave(&steal_loc_l->lock, flags);
>         ...
>     } while (...);
> 
> If the goal is to avoid NMI-context deadlocks when acquiring LRU locks,
> can the same deadlock scenario occur when NMI interrupts during the steal
> loop and the NMI handler tries to acquire the same steal_loc_l->lock?
> 
> Similarly, after a successful steal, there is another unconditional lock:
> 
>     if (node) {
>         raw_spin_lock_irqsave(&loc_l->lock, flags);
>         __local_list_add_pending(lru, loc_l, cpu, node, hash);
>         raw_spin_unlock_irqrestore(&loc_l->lock, flags);
>     }
> 
> Should these also use trylock to maintain consistency with the stated goal
> of avoiding NMI-context deadlocks?
> 

This patch is not intended to eliminate all possible deadlock scenarios.
Its goal is to avoid deadlocks caused by long-lived critical sections
in the free-node pop paths, where lock contention can persist and lead
to re-entrant lock acquisition from NMI context.

The steal path and the post-steal update are both short-lived critical
sections. They do not exhibit the same contention characteristics and
have not been observed to trigger the reported deadlock scenarios.
Converting these paths to trylock would add complexity without clear
benefit, and is therefore unnecessary for the stated goal of this change.
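
(For context, the critical section in question, lightly paraphrased from
bpf_lru_list_pop_free_to_local(): the global l->lock is held across a list
rotation and a possible shrink of the LRU, which is what makes it long-lived
compared to the single list pop in the steal path.)

    raw_spin_lock(&l->lock);            /* global LRU lock */

    __local_list_flush(l, loc_l);
    __bpf_lru_list_rotate(lru, l);      /* rotate active/inactive lists */

    /* refill loc_l's free list, shrinking (evicting from) the LRU when the
     * global free list has run dry; this may walk a batch of nodes
     */
    ...

    raw_spin_unlock(&l->lock);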

Thanks,
Leon

> 
> ---
> AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
> See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> 
> CI run summary: https://github.com/kernel-patches/bpf/actions/runs/21147913717
Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes
Posted by Alexei Starovoitov 2 weeks, 5 days ago
On Mon, Jan 19, 2026 at 5:57 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>
>
>
> On 20/1/26 02:46, bot+bpf-ci@kernel.org wrote:
> >> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
> >> index c091f3232cc5..03d37f72731a 100644
> >> --- a/kernel/bpf/bpf_lru_list.c
> >> +++ b/kernel/bpf/bpf_lru_list.c
> >
> > [ ... ]
> >
> >> @@ -451,13 +455,12 @@ static struct bpf_lru_node *bpf_common_lru_pop_free(struct bpf_lru *lru,
> >>
> >>      loc_l = per_cpu_ptr(clru->local_list, cpu);
> >>
> >> -    raw_spin_lock_irqsave(&loc_l->lock, flags);
> >> +    if (!raw_spin_trylock_irqsave(&loc_l->lock, flags))
> >> +            return NULL;
> >>
> >>      node = __local_list_pop_free(loc_l);
> >> -    if (!node) {
> >> -            bpf_lru_list_pop_free_to_local(lru, loc_l);
> >> +    if (!node && bpf_lru_list_pop_free_to_local(lru, loc_l))
> >>              node = __local_list_pop_free(loc_l);
> >> -    }
> >>
> >>      if (node)
> >>              __local_list_add_pending(lru, loc_l, cpu, node, hash);
> >
> > The trylock conversion here looks correct, but what about the steal path
> > that follows later in bpf_common_lru_pop_free()? Looking at the existing
> > code (not shown in this diff), the steal loop still uses unconditional
> > raw_spin_lock_irqsave():
> >
> >     do {
> >         steal_loc_l = per_cpu_ptr(clru->local_list, steal);
> >
> >         raw_spin_lock_irqsave(&steal_loc_l->lock, flags);
> >         ...
> >     } while (...);
> >
> > If the goal is to avoid NMI-context deadlocks when acquiring LRU locks,
> > can the same deadlock scenario occur when NMI interrupts during the steal
> > loop and the NMI handler tries to acquire the same steal_loc_l->lock?
> >
> > Similarly, after a successful steal, there is another unconditional lock:
> >
> >     if (node) {
> >         raw_spin_lock_irqsave(&loc_l->lock, flags);
> >         __local_list_add_pending(lru, loc_l, cpu, node, hash);
> >         raw_spin_unlock_irqrestore(&loc_l->lock, flags);
> >     }
> >
> > Should these also use trylock to maintain consistency with the stated goal
> > of avoiding NMI-context deadlocks?
> >
>
> This patch is not intended to eliminate all possible deadlock scenarios.
> Its goal is to avoid deadlocks caused by long-lived critical sections
> in the free-node pop paths, where lock contention can persist and lead
> to re-entrant lock acquisition from NMI context.
>
> The steal path and the post-steal update are both short-lived critical
> sections. They do not exhibit the same contention characteristics and
> have not been observed to trigger the reported deadlock scenarios.
> Converting these paths to trylock would add complexity without clear
> benefit, and is therefore unnecessary for the stated goal of this change.

AI is correct. Either everything needs to be converted or none.
Adding trylock in a few places because syzbot found them is not fixing anything.
Just silencing one (or a few?) syzbot reports.
As I said in the other email, trylock is not an option.
rqspinlock is the only true way of addressing potential deadlocks.
If it's too hard, then leave it as-is. Do not hack things half way.
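
(For reference, a rough sketch of what an rqspinlock-based conversion of the
pop path could look like, assuming l->lock were changed from raw_spinlock_t
to rqspinlock_t and using the raw_res_spin_lock() API from
include/asm-generic/rqspinlock.h; illustrative only, not part of this series:)

    /* Hypothetical: a re-entrant or stuck acquisition fails with an error
     * instead of deadlocking, and the error is propagated so callers can
     * report something other than -ENOMEM.
     */
    static int bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
                                              struct bpf_lru_locallist *loc_l)
    {
        struct bpf_lru_list *l = &lru->common_lru.lru_list;
        int ret;

        ret = raw_res_spin_lock(&l->lock);
        if (ret)
            return ret;

        __local_list_flush(l, loc_l);
        /* ... rotate the LRU and refill loc_l's free list as today ... */

        raw_res_spin_unlock(&l->lock);
        return 0;
    }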
Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes
Posted by Leon Hwang 2 weeks, 5 days ago

On 20/1/26 10:01, Alexei Starovoitov wrote:
> On Mon, Jan 19, 2026 at 5:57 PM Leon Hwang <leon.hwang@linux.dev> wrote:
>>
>>
>>
>> On 20/1/26 02:46, bot+bpf-ci@kernel.org wrote:
>>>> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
>>>> index c091f3232cc5..03d37f72731a 100644
>>>> --- a/kernel/bpf/bpf_lru_list.c
>>>> +++ b/kernel/bpf/bpf_lru_list.c
>>>
>>> [ ... ]
>>>
>>>> @@ -451,13 +455,12 @@ static struct bpf_lru_node *bpf_common_lru_pop_free(struct bpf_lru *lru,
>>>>
>>>>      loc_l = per_cpu_ptr(clru->local_list, cpu);
>>>>
>>>> -    raw_spin_lock_irqsave(&loc_l->lock, flags);
>>>> +    if (!raw_spin_trylock_irqsave(&loc_l->lock, flags))
>>>> +            return NULL;
>>>>
>>>>      node = __local_list_pop_free(loc_l);
>>>> -    if (!node) {
>>>> -            bpf_lru_list_pop_free_to_local(lru, loc_l);
>>>> +    if (!node && bpf_lru_list_pop_free_to_local(lru, loc_l))
>>>>              node = __local_list_pop_free(loc_l);
>>>> -    }
>>>>
>>>>      if (node)
>>>>              __local_list_add_pending(lru, loc_l, cpu, node, hash);
>>>
>>> The trylock conversion here looks correct, but what about the steal path
>>> that follows later in bpf_common_lru_pop_free()? Looking at the existing
>>> code (not shown in this diff), the steal loop still uses unconditional
>>> raw_spin_lock_irqsave():
>>>
>>>     do {
>>>         steal_loc_l = per_cpu_ptr(clru->local_list, steal);
>>>
>>>         raw_spin_lock_irqsave(&steal_loc_l->lock, flags);
>>>         ...
>>>     } while (...);
>>>
>>> If the goal is to avoid NMI-context deadlocks when acquiring LRU locks,
>>> can the same deadlock scenario occur when NMI interrupts during the steal
>>> loop and the NMI handler tries to acquire the same steal_loc_l->lock?
>>>
>>> Similarly, after a successful steal, there is another unconditional lock:
>>>
>>>     if (node) {
>>>         raw_spin_lock_irqsave(&loc_l->lock, flags);
>>>         __local_list_add_pending(lru, loc_l, cpu, node, hash);
>>>         raw_spin_unlock_irqrestore(&loc_l->lock, flags);
>>>     }
>>>
>>> Should these also use trylock to maintain consistency with the stated goal
>>> of avoiding NMI-context deadlocks?
>>>
>>
>> This patch is not intended to eliminate all possible deadlock scenarios.
>> Its goal is to avoid deadlocks caused by long-lived critical sections
>> in the free-node pop paths, where lock contention can persist and lead
>> to re-entrant lock acquisition from NMI context.
>>
>> The steal path and the post-steal update are both short-lived critical
>> sections. They do not exhibit the same contention characteristics and
>> have not been observed to trigger the reported deadlock scenarios.
>> Converting these paths to trylock would add complexity without clear
>> benefit, and is therefore unnecessary for the stated goal of this change.
> 
> AI is correct. Either everything needs to be converted or none.
> Adding trylock in a few places because syzbot found them is not fixing anything.
> Just silencing one (or a few?) syzbot reports.
> As I said in the other email, trylock is not an option.
> rqspinlock is the only true way of addressing potential deadlocks.
> If it's too hard, then leave it as-is. Do not hack things half way.

Understood.

Leave it as-is.

Thanks,
Leon