[PATCH 2/2] posix-timers: Use RCU in posix_timer_add()

Eric Dumazet posted 2 patches 10 months, 1 week ago
There is a newer version of this series
Posted by Eric Dumazet 10 months, 1 week ago
If many posix timers are hashed in posix_timers_hashtable,
hash_lock can be held for long durations.

This can be really bad in some cases as Thomas
explained in https://lore.kernel.org/all/87ednpyyeo.ffs@tglx/

We can perform all searches under RCU, then acquire
the lock only when there is a good chance to need it,
and after cpu caches were populated.

I also added a cond_resched() in the possible long loop.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 kernel/time/posix-timers.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index 204a351a2fd3..dd2f9016d3dc 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
 
 		head = &posix_timers_hashtable[hash(sig, id)];
 
+		rcu_read_lock();
+		if (__posix_timers_find(head, sig, id)) {
+			rcu_read_unlock();
+			cond_resched();
+			continue;
+		}
+		rcu_read_unlock();
 		spin_lock(&hash_lock);
+		/*
+		 * We must perform the lookup under hash_lock protection
+		 * because another thread could have used the same id.
+		 * This is very unlikely, but possible.
+		 */
 		if (!__posix_timers_find(head, sig, id)) {
 			hlist_add_head_rcu(&timer->t_hash, head);
 			spin_unlock(&hash_lock);
-- 
2.48.1.601.g30ceb7b040-goog
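
The control flow described in the changelog above (search without the lock first, take the lock only when the id looks free, then check again under the lock) can be sketched in user space. This is a minimal model, not the kernel code: the names `claim_id`, `id_in_use`, `buckets` and `table_lock` are hypothetical, RCU is replaced by C11 acquire/release atomics, and the lockless walk is only safe here because this sketch never removes or frees entries. The real hash table also supports deletion, which is why the kernel needs RCU.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define NBUCKETS 64u

struct node {
	int id;
	struct node *next;	/* never changes once the node is published */
};

/* Hypothetical stand-ins for posix_timers_hashtable and hash_lock. */
static _Atomic(struct node *) buckets[NBUCKETS];
static atomic_flag table_lock = ATOMIC_FLAG_INIT;

static void table_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&table_lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void table_lock_release(void)
{
	atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Lockless lookup; valid in this sketch only because nodes are never freed. */
static bool id_in_use(unsigned int b, int id)
{
	struct node *n = atomic_load_explicit(&buckets[b], memory_order_acquire);

	for (; n; n = n->next)
		if (n->id == id)
			return true;
	return false;
}

/* Returns the claimed id, or -1 if no id in [start, start + max_tries) is free. */
static int claim_id(int start, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		int id = start + i;
		unsigned int b = (unsigned int)id % NBUCKETS;

		/* Cheap optimistic check: skip busy ids without touching the lock. */
		if (id_in_use(b, id))
			continue;

		table_lock_acquire();
		/* Recheck: another thread may have claimed the id in between. */
		if (!id_in_use(b, id)) {
			struct node *n = malloc(sizeof(*n));

			if (!n) {
				table_lock_release();
				return -1;
			}
			/* Fill in every field before the node becomes visible. */
			n->id = id;
			n->next = atomic_load_explicit(&buckets[b], memory_order_relaxed);
			atomic_store_explicit(&buckets[b], n, memory_order_release);
			table_lock_release();
			return id;
		}
		table_lock_release();
	}
	return -1;
}
```

In this model the lock is held only for the recheck and the insertion, so a long run of busy ids is walked entirely outside the lock.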
Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
Posted by Thomas Gleixner 10 months ago
On Fri, Feb 14 2025 at 13:59, Eric Dumazet wrote:

> If many posix timers are hashed in posix_timers_hashtable,
> hash_lock can be held for long durations.
>
> This can be really bad in some cases as Thomas
> explained in https://lore.kernel.org/all/87ednpyyeo.ffs@tglx/

I really hate the horrible ABI which we can't get rid of w/o breaking
CRIU.

The global hash really needs to go away and be replaced by a per signal
xarray. That can be done, but due to CRIU there is no way to make this
non-sparse by reusing holes, which are created by deleted timers.

The sad truth is that the kernel has absolutely zero clue that this
happens in a CRIU restore operation context, unless I'm missing
something.

If it were able to detect that, then we could work around it
somehow. But without that there is not much we can do aside from breaking
the ABI.

Though in the above thread the CRIU people already signaled that they
are willing to work out a migration scheme. I just forgot to revisit
this. Let me stare at it some more.

> We can perform all searches under RCU, then acquire
> the lock only when there is a good chance to need it,
> and after cpu caches were populated.
>
> I also added a cond_resched() in the possible long loop.

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog

> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  kernel/time/posix-timers.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index 204a351a2fd3..dd2f9016d3dc 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
>  
>  		head = &posix_timers_hashtable[hash(sig, id)];
>  
> +		rcu_read_lock();
> +		if (__posix_timers_find(head, sig, id)) {
> +			rcu_read_unlock();
> +			cond_resched();
> +			continue;
> +		}
> +		rcu_read_unlock();
>  		spin_lock(&hash_lock);
> +		/*
> +		 * We must perform the lookup under hash_lock protection
> +		 * because another thread could have used the same id.

Hmm, that won't help and is broken already today as timer->it_id is set at
the call site after releasing hash_lock.

> +		 * This is very unlikely, but possible.

Only if the process is able to install INT_MAX - 1 timers and the stupid
search wraps around (INT_MAX loops) on the other thread and ends up at
the same number again. But yes, theoretically it's possible. :)

So the timer ID must be set _before_ adding it to the hash list, but
that wants to be a separate patch.

> +		 */
>  		if (!__posix_timers_find(head, sig, id)) {
>  			hlist_add_head_rcu(&timer->t_hash, head);
>  			spin_unlock(&hash_lock);

Thanks,

        tglx
Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
Posted by Thomas Gleixner 10 months ago
On Mon, Feb 17 2025 at 20:24, Thomas Gleixner wrote:
> On Fri, Feb 14 2025 at 13:59, Eric Dumazet wrote:
>> @@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
>>  
>>  		head = &posix_timers_hashtable[hash(sig, id)];
>>  
>> +		rcu_read_lock();
>> +		if (__posix_timers_find(head, sig, id)) {
>> +			rcu_read_unlock();
>> +			cond_resched();
>> +			continue;
>> +		}
>> +		rcu_read_unlock();
>>  		spin_lock(&hash_lock);
>> +		/*
>> +		 * We must perform the lookup under hash_lock protection
>> +		 * because another thread could have used the same id.
>
> Hmm, that won't help and is broken already today as timer->it_id is set at
> the call site after releasing hash_lock.
>
>> +		 * This is very unlikely, but possible.
>
> Only if the process is able to install INT_MAX - 1 timers and the stupid
> search wraps around (INT_MAX loops) on the other thread and ends up at
> the same number again. But yes, theoretically it's possible. :)
>
> So the timer ID must be set _before_ adding it to the hash list, but
> that wants to be a separate patch.

It's even worse. __posix_timers_find() checks for both timer->it_id and
timer->it_signal, but the latter is only set when the timer is about to
go live. I have an idea, but that might be a bad one :)

Thanks,

        tglx
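
The window described above can be illustrated with a small model of the match condition. This is a sketch under stated assumptions: `struct sketch_timer` and `timer_matches` are hypothetical names, and the predicate only mirrors what the message says about __posix_timers_find() (it compares both the id and the signal pointer). An entry whose it_signal has not been set yet cannot match any lookup, even when its id is already in place.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the two fields the lookup compares. */
struct sketch_timer {
	int it_id;
	const void *it_signal;	/* NULL until the timer is about to go live */
};

/* Models the match condition: both the owner and the id have to agree. */
static bool timer_matches(const struct sketch_timer *t, const void *sig, int id)
{
	return t->it_signal == sig && t->it_id == id;
}
```

With `it_id` set to 7 and `it_signal` still NULL, `timer_matches(&t, sig, 7)` is false for every non-NULL `sig`, so a concurrent search for id 7 would treat the id as free during that window.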
Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
Posted by David Laight 10 months, 1 week ago
On Fri, 14 Feb 2025 13:59:11 +0000
Eric Dumazet <edumazet@google.com> wrote:

> If many posix timers are hashed in posix_timers_hashtable,
> hash_lock can be held for long durations.
> 
> This can be really bad in some cases as Thomas
> explained in https://lore.kernel.org/all/87ednpyyeo.ffs@tglx/
> 
> We can perform all searches under RCU, then acquire
> the lock only when there is a good chance to need it,
> and after cpu caches were populated.
> 
> I also added a cond_resched() in the possible long loop.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  kernel/time/posix-timers.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> index 204a351a2fd3..dd2f9016d3dc 100644
> --- a/kernel/time/posix-timers.c
> +++ b/kernel/time/posix-timers.c
> @@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
>  
>  		head = &posix_timers_hashtable[hash(sig, id)];
>  
> +		rcu_read_lock();
> +		if (__posix_timers_find(head, sig, id)) {
> +			rcu_read_unlock();
> +			cond_resched();
> +			continue;
> +		}
> +		rcu_read_unlock();
>  		spin_lock(&hash_lock);
> +		/*
> +		 * We must perform the lookup under hash_lock protection
> +		 * because another thread could have used the same id.
> +		 * This is very unlikely, but possible.
> +		 */

If next_posix_timer_id is 64-bit (so can't wrap) I think you can compare the
(unmasked by INT_MAX) value being used with the current value.
If the difference is small (well less than INT_MAX) I don't think you need
the rescan.
(Not going to help 32-bit - but who cares :-)

	David

>  		if (!__posix_timers_find(head, sig, id)) {
>  			hlist_add_head_rcu(&timer->t_hash, head);
>  			spin_unlock(&hash_lock);
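
The comparison suggested above can be written down as a short sketch. Everything here is an assumption made for illustration: the counter is modeled as a hypothetical 64-bit atomic named `next_ticket`, and `take_ticket`, `ticket_to_id` and `rescan_needed` are invented names. The sketch shows only the arithmetic of the check. Whether skipping the rescan is safe for every interleaving is not established by this code.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit allocation counter, assumed never to wrap in practice. */
static _Atomic uint64_t next_ticket;

/* Hand out the next raw (unmasked) counter value. */
static uint64_t take_ticket(void)
{
	return atomic_fetch_add_explicit(&next_ticket, 1, memory_order_relaxed);
}

/* Mask the raw value into the non-negative int range used for timer ids. */
static int ticket_to_id(uint64_t ticket)
{
	return (int)(ticket & INT_MAX);
}

/*
 * Two raw values map to the same id only if they differ by a multiple of
 * INT_MAX + 1. If the counter has advanced by less than INT_MAX since this
 * ticket was taken, no later ticket can share its id.
 */
static bool rescan_needed(uint64_t ticket)
{
	uint64_t now = atomic_load_explicit(&next_ticket, memory_order_relaxed);

	return now - ticket >= (uint64_t)INT_MAX;
}
```

A caller would take a ticket, derive the id from it, and consult `rescan_needed()` after acquiring the lock to decide whether the second lookup can be skipped.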
Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
Posted by Eric Dumazet 10 months, 1 week ago
On Fri, Feb 14, 2025 at 5:59 PM David Laight
<david.laight.linux@gmail.com> wrote:
>
> On Fri, 14 Feb 2025 13:59:11 +0000
> Eric Dumazet <edumazet@google.com> wrote:
>
> > If many posix timers are hashed in posix_timers_hashtable,
> > hash_lock can be held for long durations.
> >
> > This can be really bad in some cases as Thomas
> > explained in https://lore.kernel.org/all/87ednpyyeo.ffs@tglx/
> >
> > We can perform all searches under RCU, then acquire
> > the lock only when there is a good chance to need it,
> > and after cpu caches were populated.
> >
> > I also added a cond_resched() in the possible long loop.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  kernel/time/posix-timers.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
> > index 204a351a2fd3..dd2f9016d3dc 100644
> > --- a/kernel/time/posix-timers.c
> > +++ b/kernel/time/posix-timers.c
> > @@ -112,7 +112,19 @@ static int posix_timer_add(struct k_itimer *timer)
> >
> >               head = &posix_timers_hashtable[hash(sig, id)];
> >
> > +             rcu_read_lock();
> > +             if (__posix_timers_find(head, sig, id)) {
> > +                     rcu_read_unlock();
> > +                     cond_resched();
> > +                     continue;
> > +             }
> > +             rcu_read_unlock();
> >               spin_lock(&hash_lock);
> > +             /*
> > +              * We must perform the lookup under hash_lock protection
> > +              * because another thread could have used the same id.
> > +              * This is very unlikely, but possible.
> > +              */
>
> If next_posix_timer_id is 64-bit (so can't wrap) I think you can compare the
> (unmasked by INT_MAX) value being used with the current value.
> If the difference is small (well less than INT_MAX) I don't think you need
> the rescan.
> (Not going to help 32-bit - but who cares :-)

I just noticed the rescan is racy anyway, because when the other threads add
a timer, the timer->it_signal and timer->it_id are temporarily zero.

There is a small race window.

We can set timer->it_id earlier [1], but not timer->it_signal

More work is needed :)

[1]

diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c
index dd2f9016d3dc..59ff75c81cff 100644
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -126,6 +126,7 @@ static int posix_timer_add(struct k_itimer *timer)
                 * This is very unlikely, but possible.
                 */
                if (!__posix_timers_find(head, sig, id)) {
+                       timer->it_id = (timer_t)id;
                        hlist_add_head_rcu(&timer->t_hash, head);
                        spin_unlock(&hash_lock);
                        return id;
@@ -428,7 +429,6 @@ static int do_timer_create(clockid_t which_clock, struct sigevent *event,
                return new_timer_id;
        }

-       new_timer->it_id = (timer_t) new_timer_id;
        new_timer->it_clock = which_clock;
        new_timer->kclock = kc;
        new_timer->it_overrun = -1LL;
Re: [PATCH 2/2] posix-timers: Use RCU in posix_timer_add()
Posted by Thomas Gleixner 10 months ago
On Fri, Feb 14 2025 at 18:48, Eric Dumazet wrote:
> I just noticed the rescan is racy anyway, because when the other threads add
> a timer, the timer->it_signal and timer->it_id are temporarily zero.

Ah, you noticed too, but that has nothing to do with the rescan. That's
broken already today.

Thanks,

        tglx