From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155623.507944489@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 01/18] posix-timers: Ensure that timer initialization is fully visible
References: <20250308155501.391430556@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org
Date: Sat, 8 Mar 2025 17:48:10 +0100 (CET)

Frederic pointed out that the memory operations to initialize the timer are
not guaranteed to be visible when __lock_timer() observes a valid
timer::it_signal under timer::it_lock:

  T0                                  T1
  ---------                           -----------
  do_timer_create()
      // A
      new_timer->.... = ....
      spin_lock(current->sighand)
      // B
      WRITE_ONCE(new_timer->it_signal, current->signal)
      spin_unlock(current->sighand)
                                      sys_timer_*()
                                         t = __lock_timer()
                                               spin_lock(&timr->it_lock)
                                               // observes B
                                               if (timr->it_signal == current->signal)
                                                   return timr;
                                         if (!t)
                                               return;
                                         // Is not guaranteed to observe A

Protect the write of timer::it_signal, which makes the timer valid, with
timer::it_lock as well. This guarantees that T1 observes the complete
initialization A when it observes the valid signal pointer under
timer::it_lock. sighand::siglock must still be taken to protect the
signal::posix_timers list.

Reported-by: Frederic Weisbecker
Suggested-by: Frederic Weisbecker
Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
---
 kernel/time/posix-timers.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -462,14 +462,21 @@ static int do_timer_create(clockid_t whi
 	if (error)
 		goto out;

-	spin_lock_irq(&current->sighand->siglock);
-	/* This makes the timer valid in the hash table */
-	WRITE_ONCE(new_timer->it_signal, current->signal);
-	hlist_add_head(&new_timer->list, &current->signal->posix_timers);
-	spin_unlock_irq(&current->sighand->siglock);
 	/*
-	 * After unlocking sighand::siglock @new_timer is subject to
-	 * concurrent removal and cannot be touched anymore
+	 * timer::it_lock ensures that __lock_timer() observes a fully
+	 * initialized timer when it observes a valid timer::it_signal.
+	 *
+	 * sighand::siglock is required to protect signal::posix_timers.
+	 */
+	scoped_guard (spinlock_irq, &new_timer->it_lock) {
+		guard(spinlock)(&current->sighand->siglock);
+		/* This makes the timer valid in the hash table */
+		WRITE_ONCE(new_timer->it_signal, current->signal);
+		hlist_add_head(&new_timer->list, &current->signal->posix_timers);
+	}
+	/*
+	 * After unlocking @new_timer is subject to concurrent removal and
+	 * cannot be touched anymore
 	 */
 	return 0;
 out:

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155623.572035178@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 02/18] posix-timers: Initialise timer before adding it to the hash table
References: <20250308155501.391430556@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org
Date: Sat, 8 Mar 2025 17:48:14 +0100 (CET)

From: Eric Dumazet

A timer is only valid in the hashtable when both timer::it_signal and
timer::it_id are set to their final values, but timers are added without
those values being set.

The timer ID is allocated when the timer is added to the hash in invalid
state. The ID is taken from a monotonically increasing per-process counter
which wraps around after reaching INT_MAX. The hash insertion validates
that there is no timer with the allocated ID in the hash table which
belongs to the same process.

That opens a mostly theoretical race condition:

If other threads of the same process manage to create/delete timers in
rapid succession before the newly created timer is fully initialized and
wrap around to the timer ID which was handed out, then a duplicate timer ID
will be inserted into the hash table.

Prevent this by:

  1) Setting timer::it_id before inserting the timer into the hashtable.

  2) Storing the signal pointer in timer::it_signal with bit 0 set before
     inserting it into the hashtable.

     Bit 0 acts as an invalid bit, which means that the regular lookup for
     sys_timer_*() will fail the comparison with the signal pointer.

     But the lookup on insertion masks out bit 0 and can therefore detect
     a timer which is not yet valid, but allocated in the hash table.

     Bit 0 in the pointer is cleared once the initialization of the timer
     has completed.

[ tglx: Fold ID and signal initialization into one patch and massage change
  	log and comments. ]

Signed-off-by: Eric Dumazet
Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lore.kernel.org/all/20250219125522.2535263-3-edumazet@google.com
---
 kernel/time/posix-timers.c | 56 +++++++++++++++++++++++++++++++++------------
 1 file changed, 42 insertions(+), 14 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -72,13 +72,13 @@ static int hash(struct signal_struct *si
 	return hash_32(hash32_ptr(sig) ^ nr, HASH_BITS(posix_timers_hashtable));
 }

-static struct k_itimer *__posix_timers_find(struct hlist_head *head,
-					    struct signal_struct *sig,
-					    timer_t id)
+static struct k_itimer *posix_timer_by_id(timer_t id)
 {
+	struct signal_struct *sig = current->signal;
+	struct hlist_head *head = &posix_timers_hashtable[hash(sig, id)];
 	struct k_itimer *timer;

-	hlist_for_each_entry_rcu(timer, head, t_hash, lockdep_is_held(&hash_lock)) {
+	hlist_for_each_entry_rcu(timer, head, t_hash) {
 		/* timer->it_signal can be set concurrently */
 		if ((READ_ONCE(timer->it_signal) == sig) && (timer->it_id == id))
 			return timer;
@@ -86,12 +86,26 @@ static struct k_itimer *__posix_timers_f
 	return NULL;
 }

-static struct k_itimer *posix_timer_by_id(timer_t id)
+static inline struct signal_struct *posix_sig_owner(const struct k_itimer *timer)
 {
-	struct signal_struct *sig = current->signal;
-	struct hlist_head *head = &posix_timers_hashtable[hash(sig, id)];
+	unsigned long val = (unsigned long)timer->it_signal;
+
+	/*
+	 * Mask out bit 0, which acts as invalid marker to prevent
+	 * posix_timer_by_id() detecting it as valid.
+	 */
+	return (struct signal_struct *)(val & ~1UL);
+}
+
+static bool posix_timer_hashed(struct hlist_head *head, struct signal_struct *sig, timer_t id)
+{
+	struct k_itimer *timer;

-	return __posix_timers_find(head, sig, id);
+	hlist_for_each_entry_rcu(timer, head, t_hash, lockdep_is_held(&hash_lock)) {
+		if ((posix_sig_owner(timer) == sig) && (timer->it_id == id))
+			return true;
+	}
+	return false;
 }

 static int posix_timer_add(struct k_itimer *timer)
@@ -112,7 +126,19 @@ static int posix_timer_add(struct k_itim
 		sig->next_posix_timer_id = (id + 1) & INT_MAX;

 		head = &posix_timers_hashtable[hash(sig, id)];
-		if (!__posix_timers_find(head, sig, id)) {
+		if (!posix_timer_hashed(head, sig, id)) {
+			/*
+			 * Set the timer ID and the signal pointer to make
+			 * it identifiable in the hash table. The signal
+			 * pointer has bit 0 set to indicate that it is not
+			 * yet fully initialized. posix_timer_hashed()
+			 * masks this bit out, but the syscall lookup fails
+			 * to match due to it being set. This guarantees
+			 * that there can't be duplicate timer IDs handed
+			 * out.
+			 */
+			timer->it_id = (timer_t)id;
+			timer->it_signal = (struct signal_struct *)((unsigned long)sig | 1UL);
 			hlist_add_head_rcu(&timer->t_hash, head);
 			spin_unlock(&hash_lock);
 			return id;
@@ -406,8 +432,7 @@ static int do_timer_create(clockid_t whi

 	/*
 	 * Add the timer to the hash table. The timer is not yet valid
-	 * because new_timer::it_signal is still NULL. The timer id is also
-	 * not yet visible to user space.
+	 * after insertion, but has a unique ID allocated.
 	 */
 	new_timer_id = posix_timer_add(new_timer);
 	if (new_timer_id < 0) {
@@ -415,7 +440,6 @@ static int do_timer_create(clockid_t whi
 		return new_timer_id;
 	}

-	new_timer->it_id = (timer_t) new_timer_id;
 	new_timer->it_clock = which_clock;
 	new_timer->kclock = kc;
 	new_timer->it_overrun = -1LL;
@@ -453,7 +477,7 @@ static int do_timer_create(clockid_t whi
 	}
 	/*
 	 * After succesful copy out, the timer ID is visible to user space
-	 * now but not yet valid because new_timer::signal is still NULL.
+	 * now but not yet valid because new_timer::signal low order bit is 1.
 	 *
 	 * Complete the initialization with the clock specific create
 	 * callback.
@@ -470,7 +494,11 @@ static int do_timer_create(clockid_t whi
 	 */
 	scoped_guard (spinlock_irq, &new_timer->it_lock) {
 		guard(spinlock)(&current->sighand->siglock);
-		/* This makes the timer valid in the hash table */
+		/*
+		 * new_timer::it_signal contains the signal pointer with
+		 * bit 0 set, which makes it invalid for syscall operations.
+		 * Store the unmodified signal pointer to make it valid.
+		 */
 		WRITE_ONCE(new_timer->it_signal, current->signal);
 		hlist_add_head(&new_timer->list, &current->signal->posix_timers);
 	}

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155623.635612865@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 03/18] posix-timers: Add cond_resched() to posix_timer_add() search loop
References: <20250308155501.391430556@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org
Date: Sat, 8 Mar 2025 17:48:17 +0100 (CET)

From: Eric Dumazet

With a large number of POSIX timers, the search for a valid ID might cause
a soft lockup on PREEMPT_NONE/VOLUNTARY kernels.

Add cond_resched() to the loop to prevent that.
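The ID search loop touched here, together with the bit-0 invalid marker introduced in patch 02/18, can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: a flat array stands in for posix_timers_hashtable, no locking or RCU is modelled, and struct owner, struct toy_timer, owner_of(), id_hashed(), lookup(), toy_timer_add() and toy_timer_publish() are hypothetical names. It shows the counter wrapping via `& INT_MAX`, the insertion check masking bit 0, the syscall-style lookup ignoring tagged entries, and where the kernel loop calls cond_resched().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct signal_struct and struct k_itimer. */
struct owner {
	int next_id;                 /* per-process ID counter, wraps at INT_MAX */
};

struct toy_timer {
	struct owner *it_signal;     /* bit 0 set: allocated but not yet valid */
	int it_id;
};

#define TABLE_SIZE 16
static struct toy_timer *table[TABLE_SIZE];  /* hypothetical flat "hash table" */
static size_t table_len;

/* Mask out the invalid bit, modelled on posix_sig_owner(). */
static struct owner *owner_of(const struct toy_timer *t)
{
	return (struct owner *)((uintptr_t)t->it_signal & ~(uintptr_t)1);
}

/* Insertion-time check: also sees timers which are not yet valid. */
static bool id_hashed(const struct owner *sig, int id)
{
	for (size_t i = 0; i < table_len; i++)
		if (owner_of(table[i]) == sig && table[i]->it_id == id)
			return true;
	return false;
}

/* Syscall-style lookup: compares the raw pointer, so tagged timers never match. */
static struct toy_timer *lookup(const struct owner *sig, int id)
{
	for (size_t i = 0; i < table_len; i++)
		if (table[i]->it_signal == sig && table[i]->it_id == id)
			return table[i];
	return NULL;
}

/* Allocate a unique ID and insert @t in the invalid state. Returns the ID or -1. */
static int toy_timer_add(struct toy_timer *t, struct owner *sig)
{
	if (table_len == TABLE_SIZE)
		return -1;
	/* Try each possible ID at most once (INT_MAX + 1 candidates). */
	for (unsigned int tries = 0; tries <= (unsigned int)INT_MAX; tries++) {
		int id = sig->next_id;

		/* Unsigned arithmetic avoids signed overflow; wraps to 0 after INT_MAX. */
		sig->next_id = (int)(((unsigned int)id + 1u) & INT_MAX);
		if (!id_hashed(sig, id)) {
			t->it_id = id;
			t->it_signal = (struct owner *)((uintptr_t)sig | 1);
			table[table_len++] = t;
			return id;
		}
		/* The kernel loop releases hash_lock and then calls cond_resched() here. */
	}
	return -1;   /* no free ID; the kernel returns -EAGAIN */
}

/* Final step of creation: store the untagged pointer to make the timer valid. */
static void toy_timer_publish(struct toy_timer *t, struct owner *sig)
{
	t->it_signal = sig;
}
```

With next_id preset to INT_MAX, the first two insertions receive INT_MAX and 0. Forcing the counter back to INT_MAX then makes the next insertion skip both used IDs and take 1, even though the timer holding INT_MAX is still tagged invalid and therefore invisible to lookup().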

[ tglx: Split out from Eric's series ]

Signed-off-by: Eric Dumazet
Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
Link: https://lore.kernel.org/all/20250214135911.2037402-2-edumazet@google.com
---
 kernel/time/posix-timers.c | 1 +
 1 file changed, 1 insertion(+)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -144,6 +144,7 @@ static int posix_timer_add(struct k_itim
 			return id;
 		}
 		spin_unlock(&hash_lock);
+		cond_resched();
 	}
 	/* POSIX return code when no timer ID could be allocated */
 	return -EAGAIN;

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155623.701301552@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 04/18] posix-timers: Cleanup includes
References: <20250308155501.391430556@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org
Date: Sat, 8 Mar 2025 17:48:20 +0100 (CET)

Remove pointless includes and sort the remaining ones alphabetically.

Signed-off-by: Thomas Gleixner
Acked-by: Frederic Weisbecker
---
 kernel/time/posix-timers.c | 26 ++++++++++----------------
 1 file changed, 10 insertions(+), 16 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -9,28 +9,22 @@
  *
  * These are all the functions necessary to implement POSIX clocks & timers
  */
-#include
-#include
-#include
-#include
-#include
-#include
-
-#include
-#include
-#include
+#include
 #include
 #include
+#include
+#include
+#include
+#include
+#include
 #include
 #include
+#include
+#include
 #include
-#include
-#include
-#include
-#include
-#include
-#include
+#include
 #include
+#include

 #include "timekeeping.h"
 #include "posix-timers.h"

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155623.765462334@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 05/18] posix-timers: Remove a few paranoid warnings
References: <20250308155501.391430556@linutronix.de>
X-Mailing-List: linux-kernel@vger.kernel.org
Date: Sat, 8 Mar 2025 17:48:24 +0100 (CET)

Warnings about a non-initialized timer or non-existing callbacks are only
useful when implementing new POSIX clocks, but in that case a NULL pointer
dereference is to be expected anyway. :)

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
---
V2: New patch
---
 kernel/time/posix-timers.c | 37 ++++++++-----------------------------
 1 file changed, 8 insertions(+), 29 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -682,7 +682,6 @@ void common_timer_get(struct k_itimer *t

 static int do_timer_gettime(timer_t timer_id, struct itimerspec64 *setting)
 {
-	const struct k_clock *kc;
 	struct k_itimer *timr;
 	unsigned long flags;
 	int ret = 0;
@@ -692,11 +691,7 @@ static int do_timer_gettime(timer_t time
 		return -EINVAL;

 	memset(setting, 0, sizeof(*setting));
-	kc = timr->kclock;
-	if (WARN_ON_ONCE(!kc || !kc->timer_get))
-		ret = -EINVAL;
-	else
-		kc->timer_get(timr, setting);
+	timr->kclock->timer_get(timr, setting);

 	unlock_timer(timr, flags);
 	return ret;
@@ -824,7 +819,6 @@ static void common_timer_wait_running(st
 static struct k_itimer *timer_wait_running(struct k_itimer *timer,
 					   unsigned long *flags)
 {
-	const struct k_clock *kc = READ_ONCE(timer->kclock);
 	timer_t timer_id = READ_ONCE(timer->it_id);

 	/* Prevent kfree(timer) after dropping the lock */
@@ -835,8 +829,7 @@ static struct k_itimer *timer_wait_runni
 	 * kc->timer_wait_running() might drop RCU lock. So @timer
 	 * cannot be touched anymore after the function returns!
 	 */
-	if (!WARN_ON_ONCE(!kc->timer_wait_running))
-		kc->timer_wait_running(timer);
+	timer->kclock->timer_wait_running(timer);

 	rcu_read_unlock();
 	/* Relock the timer. It might be not longer hashed. */
@@ -899,7 +892,6 @@ static int do_timer_settime(timer_t time
 			    struct itimerspec64 *new_spec64,
 			    struct itimerspec64 *old_spec64)
 {
-	const struct k_clock *kc;
 	struct k_itimer *timr;
 	unsigned long flags;
 	int error;
@@ -922,11 +914,7 @@ static int do_timer_settime(timer_t time
 	/* Prevent signal delivery and rearming. */
 	timr->it_signal_seq++;

-	kc = timr->kclock;
-	if (WARN_ON_ONCE(!kc || !kc->timer_set))
-		error = -EINVAL;
-	else
-		error = kc->timer_set(timr, tmr_flags, new_spec64, old_spec64);
+	error = timr->kclock->timer_set(timr, tmr_flags, new_spec64, old_spec64);

 	if (error == TIMER_RETRY) {
 		// We already got the old time...
@@ -1008,18 +996,6 @@ static inline void posix_timer_cleanup_i
 	}
 }

-static inline int timer_delete_hook(struct k_itimer *timer)
-{
-	const struct k_clock *kc = timer->kclock;
-
-	/* Prevent signal delivery and rearming. */
-	timer->it_signal_seq++;
-
-	if (WARN_ON_ONCE(!kc || !kc->timer_del))
-		return -EINVAL;
-	return kc->timer_del(timer);
-}
-
 /* Delete a POSIX.1b interval timer. */
 SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
 {
@@ -1032,7 +1008,10 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
 	if (!timer)
 		return -EINVAL;

-	if (unlikely(timer_delete_hook(timer) == TIMER_RETRY)) {
+	/* Prevent signal delivery and rearming. */
+	timer->it_signal_seq++;
+
+	if (unlikely(timer->kclock->timer_del(timer) == TIMER_RETRY)) {
 		/* Unlocks and relocks the timer if it still exists */
 		timer = timer_wait_running(timer, &flags);
 		goto retry_delete;
@@ -1078,7 +1057,7 @@ static void itimer_delete(struct k_itime
 	 * mechanism. Worse, that timer mechanism might run the expiry
 	 * function concurrently.
 	 */
-	if (timer_delete_hook(timer) == TIMER_RETRY) {
+	if (timer->kclock->timer_del(timer) == TIMER_RETRY) {
 		/*
 		 * Timer is expired concurrently, prevent livelocks
 		 * and pointless spinning on RT.

From nobody Thu Dec 18 23:23:23 2025
header.b="nT5dcYVa"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="D463rrhM" Message-ID: <20250308155623.829215801@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1741452507; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=27HQMLTjuNSBvpM2M4GFmvhtGJQR57PqTAcp8ARZKmw=; b=nT5dcYVaQDkbMwfCDdBqE2/RZroJ2vFBnBB7aM3hi8EeXYyzZEFaxtUtN8Rl8OAoRflWlZ WCeJwGD4hcUBCXgWuZBfSg5Gn4b78qb1zj+ta8kxB4eWBnhAYudi+4rMk5FiaKDj2MVbI+ pqutsPn2QHo5hOp+ItR4d5EwW/pXi7PrU03T3BdDYHJQCGU2AK2U5jk6P8CZkRHssh03Dw Shz7Qt1bqTiGk2mnVUBaUnDfVBAPYwcvspaDRQpkJ5TdHkolxT9I6vpYQCsV/baPOlQ4Rg gRaFL/Mjs4+QdwwmMPoYkIow/tAHWv0s5ylh5TDPLG214vocQJMW+/uaF5hUbQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1741452507; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=27HQMLTjuNSBvpM2M4GFmvhtGJQR57PqTAcp8ARZKmw=; b=D463rrhMp/m32S+2lppElbQJ/KUgBknsUkczkjREpci+3hu+cZPAE4V1ga3pCGhuLvPDKm UKPoe19G0Fb/6yDw== From: Thomas Gleixner To: LKML Cc: Anna-Maria Behnsen , Frederic Weisbecker , Benjamin Segall , Eric Dumazet , Andrey Vagin , Pavel Tikhomirov , Peter Zijlstra , Cyrill Gorcunov Subject: [patch V3 06/18] posix-timers: Remove SLAB_PANIC from kmem cache References: <20250308155501.391430556@linutronix.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Date: Sat, 8 Mar 2025 17:48:26 +0100 (CET) Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There is no need to panic when the posix-timer kmem_cache can't be created. timer_create() will fail with -ENOMEM and that's it. 
Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
---
V2: New patch
---
 kernel/time/posix-timers.c |   11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -243,9 +243,8 @@ static int posix_get_hrtimer_res(clockid
 
 static __init int init_posix_timers(void)
 {
-	posix_timers_cache = kmem_cache_create("posix_timers_cache",
-					sizeof(struct k_itimer), 0,
-					SLAB_PANIC | SLAB_ACCOUNT, NULL);
+	posix_timers_cache = kmem_cache_create("posix_timers_cache", sizeof(struct k_itimer), 0,
+					       SLAB_ACCOUNT, NULL);
 	return 0;
 }
 __initcall(init_posix_timers);
@@ -371,8 +370,12 @@ static struct pid *good_sigevent(sigeven
 
 static struct k_itimer *alloc_posix_timer(void)
 {
-	struct k_itimer *tmr = kmem_cache_zalloc(posix_timers_cache, GFP_KERNEL);
+	struct k_itimer *tmr;
 
+	if (unlikely(!posix_timers_cache))
+		return NULL;
+
+	tmr = kmem_cache_zalloc(posix_timers_cache, GFP_KERNEL);
 	if (!tmr)
 		return tmr;
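The failure path of the SLAB_PANIC removal can be illustrated with a minimal userspace sketch. It assumes hypothetical stand-ins (struct timer_obj, struct obj_cache, timers_cache, alloc_timer(), create_timer()) for the kernel objects and uses calloc() in place of kmem_cache_zalloc(). It only shows that an allocator which tolerates a missing cache lets the caller return an error instead of requiring a panic at init time; the -ENOMEM mapping follows the changelog's wording.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct k_itimer. */
struct timer_obj {
	int id;
};

/* Hypothetical stand-in for the kmem_cache: only the object size matters. */
struct obj_cache {
	size_t objsize;
};

/* NULL when cache creation failed at init time (no SLAB_PANIC). */
static struct obj_cache *timers_cache;

/* Init-time cache setup; failure is simulated instead of panicking. */
static void init_timers_cache(bool simulate_failure)
{
	static struct obj_cache cache = { .objsize = sizeof(struct timer_obj) };

	timers_cache = simulate_failure ? NULL : &cache;
}

/* Same shape as alloc_posix_timer(): no cache means no timer. */
static struct timer_obj *alloc_timer(void)
{
	if (!timers_cache)
		return NULL;
	/* calloc() zeroes the object, like kmem_cache_zalloc() */
	return calloc(1, timers_cache->objsize);
}

/* Hypothetical timer_create()-level caller: failure becomes -ENOMEM. */
static int create_timer(struct timer_obj **out)
{
	struct timer_obj *t;

	assert(out != NULL);
	t = alloc_timer();
	if (!t)
		return -ENOMEM;
	*out = t;
	return 0;
}
```

In the patch the NULL check is wrapped in unlikely(), because the cache can only be missing if kmem_cache_create() failed during boot.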

Message-ID: <20250308155623.892762130@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 07/18] posix-timers: Use guards in a few places
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:28 +0100 (CET)

Switch locking and RCU to guards where applicable.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
---
V2: New patch
---
 kernel/time/posix-timers.c |   68 +++++++++++++++++++--------------------------------
 1 file changed, 30 insertions(+), 38 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -397,9 +397,8 @@ void posixtimer_free_timer(struct k_itim
 
 static void posix_timer_unhash_and_free(struct k_itimer *tmr)
 {
-	spin_lock(&hash_lock);
-	hlist_del_rcu(&tmr->t_hash);
-	spin_unlock(&hash_lock);
+	scoped_guard (spinlock, &hash_lock)
+		hlist_del_rcu(&tmr->t_hash);
 	posixtimer_putref(tmr);
 }
 
@@ -443,9 +442,8 @@ static int do_timer_create(clockid_t whi
 	new_timer->it_overrun = -1LL;
 
 	if (event) {
-		rcu_read_lock();
-		new_timer->it_pid = get_pid(good_sigevent(event));
-		rcu_read_unlock();
+		scoped_guard (rcu)
+			new_timer->it_pid = get_pid(good_sigevent(event));
 		if (!new_timer->it_pid) {
 			error = -EINVAL;
 			goto out;
@@ -579,7 +577,7 @@ static struct k_itimer *__lock_timer(tim
 	 * can't change, but timr::it_signal becomes NULL during
 	 * destruction.
 	 */
-	rcu_read_lock();
+	guard(rcu)();
 	timr = posix_timer_by_id(timer_id);
 	if (timr) {
 		spin_lock_irqsave(&timr->it_lock, *flags);
@@ -587,14 +585,10 @@ static struct k_itimer *__lock_timer(tim
 		 * Validate under timr::it_lock that timr::it_signal is
 		 * still valid. Pairs with #1 above.
 		 */
-		if (timr->it_signal == current->signal) {
-			rcu_read_unlock();
+		if (timr->it_signal == current->signal)
 			return timr;
-		}
 		spin_unlock_irqrestore(&timr->it_lock, *flags);
 	}
-	rcu_read_unlock();
-
 	return NULL;
 }
 
@@ -825,16 +819,15 @@ static struct k_itimer *timer_wait_runni
 	timer_t timer_id = READ_ONCE(timer->it_id);
 
 	/* Prevent kfree(timer) after dropping the lock */
-	rcu_read_lock();
-	unlock_timer(timer, *flags);
-
-	/*
-	 * kc->timer_wait_running() might drop RCU lock. So @timer
-	 * cannot be touched anymore after the function returns!
-	 */
-	timer->kclock->timer_wait_running(timer);
+	scoped_guard (rcu) {
+		unlock_timer(timer, *flags);
+		/*
+		 * kc->timer_wait_running() might drop RCU lock. So @timer
+		 * cannot be touched anymore after the function returns!
+		 */
+		timer->kclock->timer_wait_running(timer);
+	}
 
-	rcu_read_unlock();
 	/* Relock the timer. It might be not longer hashed. */
 	return lock_timer(timer_id, flags);
 }
@@ -1020,20 +1013,20 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
 		goto retry_delete;
 	}
 
-	spin_lock(&current->sighand->siglock);
-	hlist_del(&timer->list);
-	posix_timer_cleanup_ignored(timer);
-	/*
-	 * A concurrent lookup could check timer::it_signal lockless. It
-	 * will reevaluate with timer::it_lock held and observe the NULL.
-	 *
-	 * It must be written with siglock held so that the signal code
-	 * observes timer->it_signal == NULL in do_sigaction(SIG_IGN),
-	 * which prevents it from moving a pending signal of a deleted
-	 * timer to the ignore list.
-	 */
-	WRITE_ONCE(timer->it_signal, NULL);
-	spin_unlock(&current->sighand->siglock);
+	scoped_guard (spinlock, &current->sighand->siglock) {
+		hlist_del(&timer->list);
+		posix_timer_cleanup_ignored(timer);
+		/*
+		 * A concurrent lookup could check timer::it_signal lockless. It
+		 * will reevaluate with timer::it_lock held and observe the NULL.
+		 *
+		 * It must be written with siglock held so that the signal code
+		 * observes timer->it_signal == NULL in do_sigaction(SIG_IGN),
+		 * which prevents it from moving a pending signal of a deleted
+		 * timer to the ignore list.
+		 */
+		WRITE_ONCE(timer->it_signal, NULL);
+	}
 
 	unlock_timer(timer, flags);
 	posix_timer_unhash_and_free(timer);
@@ -1106,9 +1099,8 @@ void exit_itimers(struct task_struct *ts
 		return;
 
 	/* Protect against concurrent read via /proc/$PID/timers */
-	spin_lock_irq(&tsk->sighand->siglock);
-	hlist_move_list(&tsk->signal->posix_timers, &timers);
-	spin_unlock_irq(&tsk->sighand->siglock);
+	scoped_guard (spinlock_irq, &tsk->sighand->siglock)
+		hlist_move_list(&tsk->signal->posix_timers, &timers);
 
 	/* The timers are not longer accessible via tsk::signal */
 	while (!hlist_empty(&timers)) {
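The guard()/scoped_guard() helpers used above are built on the compiler's cleanup attribute. The following is a simplified userspace analogue, not the kernel's include/linux/cleanup.h: it assumes GCC or Clang (__attribute__((cleanup)) is a compiler extension), and the hypothetical struct toy_lock is a plain flag rather than a real spinlock. It shows the property the conversion relies on: the release runs automatically on every exit from the scope, including early returns such as the "return timr;" in __lock_timer().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy lock: a flag plus a counter, not a real spinlock. */
struct toy_lock {
	bool held;
	int acquisitions;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
	l->acquisitions++;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Called by the compiler with a pointer to the guard variable. */
static void toy_guard_cleanup(struct toy_lock **lp)
{
	toy_lock_release(*lp);
}

/*
 * Hypothetical TOY_GUARD(): take the lock now and release it when the
 * enclosing scope is left. One guard per scope (fixed variable name).
 */
#define TOY_GUARD(l)							\
	struct toy_lock *toy_guard_var					\
		__attribute__((cleanup(toy_guard_cleanup))) =		\
		(toy_lock_acquire(l), (l))

static int lookup(struct toy_lock *l, int key, bool *held_inside)
{
	TOY_GUARD(l);

	*held_inside = l->held;
	if (key < 0)
		return -1;	/* early return: the guard still releases */
	return key * 2;
}
```

Unlike this toy macro, the kernel's guard() generates a unique variable name, so several guards can coexist in one scope, and scoped_guard() limits the guard to the statement or block that follows it.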

Message-ID: <20250308155623.959825668@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 08/18] posix-timers: Simplify lock/unlock_timer()
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:30 +0100 (CET)

Since the integration of sigqueue into the timer struct, lock_timer() is only used in task context. So taking the lock with irqsave() is no longer required.

Convert it to use spin_[un]lock_irq().

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
---
V2: New patch
---
 kernel/time/posix-timers.c |   70 ++++++++++++++++++----------------------------------
 1 file changed, 29 insertions(+), 41 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -53,14 +53,19 @@ static const struct k_clock clock_realti
 #error "SIGEV_THREAD_ID must not share bit with other SIGEV values!"
 #endif
 
-static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags);
+static struct k_itimer *__lock_timer(timer_t timer_id);
 
-#define lock_timer(tid, flags)						   \
-({	struct k_itimer *__timr;					   \
-	__cond_lock(&__timr->it_lock, __timr = __lock_timer(tid, flags)); \
-	__timr;								   \
+#define lock_timer(tid)							\
+({	struct k_itimer *__timr;					\
+	__cond_lock(&__timr->it_lock, __timr = __lock_timer(tid));	\
+	__timr;								\
 })
 
+static inline void unlock_timer(struct k_itimer *timr)
+{
+	spin_unlock_irq(&timr->it_lock);
+}
+
 static int hash(struct signal_struct *sig, unsigned int nr)
 {
 	return hash_32(hash32_ptr(sig) ^ nr, HASH_BITS(posix_timers_hashtable));
@@ -144,11 +149,6 @@ static int posix_timer_add(struct k_itim
 	return -EAGAIN;
 }
 
-static inline void unlock_timer(struct k_itimer *timr, unsigned long flags)
-{
-	spin_unlock_irqrestore(&timr->it_lock, flags);
-}
-
 static int posix_get_realtime_timespec(clockid_t which_clock, struct timespec64 *tp)
 {
 	ktime_get_real_ts64(tp);
@@ -538,7 +538,7 @@ COMPAT_SYSCALL_DEFINE3(timer_create, clo
 }
 #endif
 
-static struct k_itimer *__lock_timer(timer_t timer_id, unsigned long *flags)
+static struct k_itimer *__lock_timer(timer_t timer_id)
 {
 	struct k_itimer *timr;
 
@@ -580,14 +580,14 @@ static struct k_itimer *__lock_timer(tim
 	guard(rcu)();
 	timr = posix_timer_by_id(timer_id);
 	if (timr) {
-		spin_lock_irqsave(&timr->it_lock, *flags);
+		spin_lock_irq(&timr->it_lock);
 		/*
 		 * Validate under timr::it_lock that timr::it_signal is
 		 * still valid. Pairs with #1 above.
 		 */
 		if (timr->it_signal == current->signal)
 			return timr;
-		spin_unlock_irqrestore(&timr->it_lock, *flags);
+		spin_unlock_irq(&timr->it_lock);
 	}
 	return NULL;
 }
@@ -680,17 +680,16 @@ void common_timer_get(struct k_itimer *t
 static int do_timer_gettime(timer_t timer_id, struct itimerspec64 *setting)
 {
 	struct k_itimer *timr;
-	unsigned long flags;
 	int ret = 0;
 
-	timr = lock_timer(timer_id, &flags);
+	timr = lock_timer(timer_id);
 	if (!timr)
 		return -EINVAL;
 
 	memset(setting, 0, sizeof(*setting));
 	timr->kclock->timer_get(timr, setting);
 
-	unlock_timer(timr, flags);
+	unlock_timer(timr);
 	return ret;
 }
 
@@ -746,15 +745,14 @@ SYSCALL_DEFINE2(timer_gettime32, timer_t
 SYSCALL_DEFINE1(timer_getoverrun, timer_t, timer_id)
 {
 	struct k_itimer *timr;
-	unsigned long flags;
 	int overrun;
 
-	timr = lock_timer(timer_id, &flags);
+	timr = lock_timer(timer_id);
 	if (!timr)
 		return -EINVAL;
 
 	overrun = timer_overrun_to_int(timr);
-	unlock_timer(timr, flags);
+	unlock_timer(timr);
 
 	return overrun;
 }
@@ -813,14 +811,13 @@ static void common_timer_wait_running(st
  * when the task which tries to delete or disarm the timer has preempted
  * the task which runs the expiry in task work context.
  */
-static struct k_itimer *timer_wait_running(struct k_itimer *timer,
-					   unsigned long *flags)
+static struct k_itimer *timer_wait_running(struct k_itimer *timer)
 {
 	timer_t timer_id = READ_ONCE(timer->it_id);
 
 	/* Prevent kfree(timer) after dropping the lock */
 	scoped_guard (rcu) {
-		unlock_timer(timer, *flags);
+		unlock_timer(timer);
 		/*
 		 * kc->timer_wait_running() might drop RCU lock. So @timer
 		 * cannot be touched anymore after the function returns!
@@ -829,7 +826,7 @@ static struct k_itimer *timer_wait_runni
 	}
 
 	/* Relock the timer. It might be not longer hashed. */
-	return lock_timer(timer_id, flags);
+	return lock_timer(timer_id);
 }
 
 /*
@@ -889,7 +886,6 @@ static int do_timer_settime(timer_t time
 				    struct itimerspec64 *old_spec64)
 {
 	struct k_itimer *timr;
-	unsigned long flags;
 	int error;
 
 	if (!timespec64_valid(&new_spec64->it_interval) ||
@@ -899,7 +895,7 @@ static int do_timer_settime(timer_t time
 	if (old_spec64)
 		memset(old_spec64, 0, sizeof(*old_spec64));
 
-	timr = lock_timer(timer_id, &flags);
+	timr = lock_timer(timer_id);
 retry:
 	if (!timr)
 		return -EINVAL;
@@ -916,10 +912,10 @@ static int do_timer_settime(timer_t time
 		// We already got the old time...
 		old_spec64 = NULL;
 		/* Unlocks and relocks the timer if it still exists */
-		timr = timer_wait_running(timr, &flags);
+		timr = timer_wait_running(timr);
 		goto retry;
 	}
-	unlock_timer(timr, flags);
+	unlock_timer(timr);
 
 	return error;
 }
@@ -995,10 +991,7 @@ static inline void posix_timer_cleanup_i
 /* Delete a POSIX.1b interval timer. */
 SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
 {
-	struct k_itimer *timer;
-	unsigned long flags;
-
-	timer = lock_timer(timer_id, &flags);
+	struct k_itimer *timer = lock_timer(timer_id);
 
 retry_delete:
 	if (!timer)
@@ -1009,7 +1002,7 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
 
 	if (unlikely(timer->kclock->timer_del(timer) == TIMER_RETRY)) {
 		/* Unlocks and relocks the timer if it still exists */
-		timer = timer_wait_running(timer, &flags);
+		timer = timer_wait_running(timer);
 		goto retry_delete;
 	}
 
@@ -1028,7 +1021,7 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
 		WRITE_ONCE(timer->it_signal, NULL);
 	}
 
-	unlock_timer(timer, flags);
+	unlock_timer(timer);
 	posix_timer_unhash_and_free(timer);
 	return 0;
 }
@@ -1039,12 +1032,7 @@ SYSCALL_DEFINE1(timer_delete, timer_t, t
  */
 static void itimer_delete(struct k_itimer *timer)
 {
-	unsigned long flags;
-
-	/*
-	 * irqsave is required to make timer_wait_running() work.
-	 */
-	spin_lock_irqsave(&timer->it_lock, flags);
+	spin_lock_irq(&timer->it_lock);
 
 retry_delete:
 	/*
@@ -1065,7 +1053,7 @@ static void itimer_delete(struct k_itime
 		 * do_exit() only for the last thread of the thread group.
 		 * So no other task can access and delete that timer.
 		 */
-		if (WARN_ON_ONCE(timer_wait_running(timer, &flags) != timer))
+		if (WARN_ON_ONCE(timer_wait_running(timer) != timer))
 			return;
 
 		goto retry_delete;
@@ -1082,7 +1070,7 @@ static void itimer_delete(struct k_itime
 	 */
 	WRITE_ONCE(timer->it_signal, NULL);
 
-	spin_unlock_irqrestore(&timer->it_lock, flags);
+	spin_unlock_irq(&timer->it_lock);
 	posix_timer_unhash_and_free(timer);
 }
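After this patch, lock_timer() returns the timer locked (or NULL) and unlock_timer() needs only the timer. That calling convention can be sketched in single-threaded userspace C. Everything here is hypothetical: a bool flag stands in for timr::it_lock, an owner pointer stands in for timr::it_signal, and a linear table scan replaces the RCU-protected hash lookup. The sketch models control flow only, not concurrency: look up, take the lock, re-validate ownership under the lock, and drop the lock again if validation fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded stand-in for struct k_itimer. */
struct toy_timer {
	int id;
	const void *owner;	/* role of timr->it_signal; NULL = slot unused */
	bool locked;		/* role of timr->it_lock */
};

#define NR_TOY_TIMERS 4
static struct toy_timer toy_table[NR_TOY_TIMERS];

/* Stand-in for the lockless posix_timer_by_id() hash lookup. */
static struct toy_timer *toy_find(int id)
{
	for (size_t i = 0; i < NR_TOY_TIMERS; i++) {
		if (toy_table[i].owner && toy_table[i].id == id)
			return &toy_table[i];
	}
	return NULL;
}

/* Returns the timer locked, or NULL; no flags are handed back. */
static struct toy_timer *toy_lock_timer(int id, const void *self)
{
	struct toy_timer *t = toy_find(id);

	if (!t)
		return NULL;
	t->locked = true;
	/* Re-validate under the lock: the lookup result may be stale. */
	if (t->owner == self)
		return t;
	t->locked = false;
	return NULL;
}

/* Counterpart of the simplified unlock_timer(struct k_itimer *). */
static void toy_unlock_timer(struct toy_timer *t)
{
	assert(t->locked);
	t->locked = false;
}
```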

Message-ID: <20250308155624.024143438@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 09/18] posix-timers: Rework timer removal
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:32 +0100 (CET)

sys_timer_delete() and the do_exit() cleanup function itimer_delete() do the same thing, but have needlessly different implementations instead of sharing the code.

The other oddity of timer deletion is that the timer is not invalidated before the actual deletion happens, which allows concurrent lookups to succeed.

That's wrong because a timer which is in the process of being deleted should not be visible, and actions like signal queueing, delivery and rearming should not happen once the task which invoked timer_delete() has the timer locked.

Rework the code so that:

  1) The signal queueing and delivery code ignores timers which are marked invalid

  2) The deletion implementation is shared between sys_timer_delete() and itimer_delete()

  3) The timer is invalidated and removed from the linked lists before the deletion callback of the relevant clock is invoked.

That requires reworking timer_wait_running(), as it looks up the timer again when relocking it at the end. In the deletion case this lookup would fail due to the preceding invalidation, and the wait loop would terminate prematurely.

But due to the preceding invalidation the timer cannot be accessed by other tasks anymore, so there is no way that the timer has been freed after the timer lock has been dropped.

Move the re-validation out of timer_wait_running() and handle it at the only other usage site, timer_settime().
Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
---
V2: Simplify timer_wait_running() locking - PeterZ
---
 include/linux/posix-timers.h |    7 +
 kernel/signal.c              |    2
 kernel/time/posix-timers.c   |  194 ++++++++++++++++++-------------------------------
 3 files changed, 90 insertions(+), 113 deletions(-)

--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -240,6 +240,13 @@ static inline void posixtimer_sigqueue_p
 
 	posixtimer_putref(tmr);
 }
+
+static inline bool posixtimer_valid(const struct k_itimer *timer)
+{
+	unsigned long val = (unsigned long)timer->it_signal;
+
+	return !(val & 0x1UL);
+}
 #else /* CONFIG_POSIX_TIMERS */
 static inline void posixtimer_sigqueue_getref(struct sigqueue *q) { }
 static inline void posixtimer_sigqueue_putref(struct sigqueue *q) { }
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2092,7 +2092,7 @@ static inline void posixtimer_sig_ignore
 	 * from a non-periodic timer, then just drop the reference
 	 * count. Otherwise queue it on the ignored list.
 	 */
-	if (tmr->it_signal && tmr->it_sig_periodic)
+	if (posixtimer_valid(tmr) && tmr->it_sig_periodic)
 		hlist_add_head(&tmr->ignored_list, &tsk->signal->ignored_posix_timers);
 	else
 		posixtimer_putref(tmr);
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -279,7 +279,7 @@ static bool __posixtimer_deliver_signal(
 	 * since the signal was queued. In either case, don't rearm and
 	 * drop the signal.
 	 */
-	if (timr->it_signal_seq != timr->it_sigqueue_seq || WARN_ON_ONCE(!timr->it_signal))
+	if (timr->it_signal_seq != timr->it_sigqueue_seq || !posixtimer_valid(timr))
 		return false;
 
 	if (!timr->it_interval || WARN_ON_ONCE(timr->it_status != POSIX_TIMER_REQUEUE_PENDING))
@@ -324,6 +324,9 @@ void posix_timer_queue_signal(struct k_i
 {
 	lockdep_assert_held(&timr->it_lock);
 
+	if (!posixtimer_valid(timr))
+		return;
+
 	timr->it_status = timr->it_interval ? POSIX_TIMER_REQUEUE_PENDING : POSIX_TIMER_DISARMED;
 	posixtimer_send_sigqueue(timr);
 }
@@ -553,11 +556,11 @@ static struct k_itimer *__lock_timer(tim
 	 * The hash lookup and the timers are RCU protected.
 	 *
 	 * Timers are added to the hash in invalid state where
-	 * timr::it_signal == NULL. timer::it_signal is only set after the
-	 * rest of the initialization succeeded.
+	 * timr::it_signal is marked invalid. timer::it_signal is only set
+	 * after the rest of the initialization succeeded.
 	 *
 	 * Timer destruction happens in steps:
-	 *  1) Set timr::it_signal to NULL with timr::it_lock held
+	 *  1) Set timr::it_signal marked invalid with timr::it_lock held
 	 *  2) Release timr::it_lock
 	 *  3) Remove from the hash under hash_lock
 	 *  4) Put the reference count.
@@ -574,8 +577,8 @@ static struct k_itimer *__lock_timer(tim
 	 *
 	 * The lookup validates locklessly that timr::it_signal ==
 	 * current::it_signal and timr::it_id == @timer_id. timr::it_id
-	 * can't change, but timr::it_signal becomes NULL during
-	 * destruction.
+	 * can't change, but timr::it_signal can become invalid during
+	 * destruction, which makes the locked check fail.
 	 */
 	guard(rcu)();
 	timr = posix_timer_by_id(timer_id);
@@ -811,22 +814,13 @@ static void common_timer_wait_running(st
  * when the task which tries to delete or disarm the timer has preempted
  * the task which runs the expiry in task work context.
  */
-static struct k_itimer *timer_wait_running(struct k_itimer *timer)
+static void timer_wait_running(struct k_itimer *timer)
 {
-	timer_t timer_id = READ_ONCE(timer->it_id);
-
-	/* Prevent kfree(timer) after dropping the lock */
-	scoped_guard (rcu) {
-		unlock_timer(timer);
-		/*
-		 * kc->timer_wait_running() might drop RCU lock. So @timer
-		 * cannot be touched anymore after the function returns!
-		 */
-		timer->kclock->timer_wait_running(timer);
-	}
-
-	/* Relock the timer. It might be not longer hashed. */
-	return lock_timer(timer_id);
+	/*
+	 * kc->timer_wait_running() might drop RCU lock. So @timer
+	 * cannot be touched anymore after the function returns!
+	 */
+	timer->kclock->timer_wait_running(timer);
 }
 
 /*
@@ -885,8 +879,7 @@ static int do_timer_settime(timer_t time
 				    struct itimerspec64 *new_spec64,
 				    struct itimerspec64 *old_spec64)
 {
-	struct k_itimer *timr;
-	int error;
+	int ret;
 
 	if (!timespec64_valid(&new_spec64->it_interval) ||
 	    !timespec64_valid(&new_spec64->it_value))
@@ -895,29 +888,36 @@ static int do_timer_settime(timer_t time
 	if (old_spec64)
 		memset(old_spec64, 0, sizeof(*old_spec64));
 
-	timr = lock_timer(timer_id);
-retry:
-	if (!timr)
-		return -EINVAL;
+	for (;;) {
+		struct k_itimer *timr = lock_timer(timer_id);
 
-	if (old_spec64)
-		old_spec64->it_interval = ktime_to_timespec64(timr->it_interval);
+		if (!timr)
+			return -EINVAL;
 
-	/* Prevent signal delivery and rearming. */
-	timr->it_signal_seq++;
+		if (old_spec64)
+			old_spec64->it_interval = ktime_to_timespec64(timr->it_interval);
 
-	error = timr->kclock->timer_set(timr, tmr_flags, new_spec64, old_spec64);
+		/* Prevent signal delivery and rearming. */
+		timr->it_signal_seq++;
+
+		ret = timr->kclock->timer_set(timr, tmr_flags, new_spec64, old_spec64);
+		if (ret != TIMER_RETRY) {
+			unlock_timer(timr);
+			break;
+		}
 
-	if (error == TIMER_RETRY) {
-		// We already got the old time...
+		/* Read the old time only once */
 		old_spec64 = NULL;
-		/* Unlocks and relocks the timer if it still exists */
-		timr = timer_wait_running(timr);
-		goto retry;
+		/* Protect the timer from being freed after the lock is dropped */
+		guard(rcu)();
+		unlock_timer(timr);
+		/*
+		 * timer_wait_running() might drop RCU read side protection
+		 * so the timer has to be looked up again!
+		 */
+		timer_wait_running(timr);
 	}
-	unlock_timer(timr);
-
-	return error;
+	return ret;
 }
 
 /* Set a POSIX.1b interval timer */
@@ -988,90 +988,56 @@ static inline void posix_timer_cleanup_i
 	}
 }
 
-/* Delete a POSIX.1b interval timer. */
-SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
+static void posix_timer_delete(struct k_itimer *timer)
 {
-	struct k_itimer *timer = lock_timer(timer_id);
-
-retry_delete:
-	if (!timer)
-		return -EINVAL;
-
-	/* Prevent signal delivery and rearming. */
+	/*
+	 * Invalidate the timer, remove it from the linked list and remove
+	 * it from the ignored list if pending.
+	 *
+	 * The invalidation must be written with siglock held so that the
+	 * signal code observes timer->it_valid == false in do_sigaction(),
+	 * which prevents it from moving a pending signal of a deleted
+	 * timer to the ignore list.
+	 *
+	 * The invalidation also prevents signal queueing, signal delivery
+	 * and therefore rearming from the signal delivery path.
+	 *
+	 * A concurrent lookup can still find the timer in the hash, but it
+	 * will check timer::it_signal with timer::it_lock held and observe
+	 * bit 0 set, which invalidates it. That also prevents the timer ID
+	 * from being handed out before this timer is completely gone.
+	 */
 	timer->it_signal_seq++;
 
-	if (unlikely(timer->kclock->timer_del(timer) == TIMER_RETRY)) {
-		/* Unlocks and relocks the timer if it still exists */
-		timer = timer_wait_running(timer);
-		goto retry_delete;
-	}
-
 	scoped_guard (spinlock, &current->sighand->siglock) {
+		unsigned long sig = (unsigned long)timer->it_signal | 1UL;
+
+		WRITE_ONCE(timer->it_signal, (struct signal_struct *)sig);
 		hlist_del(&timer->list);
 		posix_timer_cleanup_ignored(timer);
-		/*
-		 * A concurrent lookup could check timer::it_signal lockless. It
-		 * will reevaluate with timer::it_lock held and observe the NULL.
-		 *
-		 * It must be written with siglock held so that the signal code
-		 * observes timer->it_signal == NULL in do_sigaction(SIG_IGN),
-		 * which prevents it from moving a pending signal of a deleted
-		 * timer to the ignore list.
- */ - WRITE_ONCE(timer->it_signal, NULL); } =20 - unlock_timer(timer); - posix_timer_unhash_and_free(timer); - return 0; + while (timer->kclock->timer_del(timer) =3D=3D TIMER_RETRY) { + guard(rcu)(); + spin_unlock_irq(&timer->it_lock); + timer_wait_running(timer); + spin_lock_irq(&timer->it_lock); + } } =20 -/* - * Delete a timer if it is armed, remove it from the hash and schedule it - * for RCU freeing. - */ -static void itimer_delete(struct k_itimer *timer) +/* Delete a POSIX.1b interval timer. */ +SYSCALL_DEFINE1(timer_delete, timer_t, timer_id) { - spin_lock_irq(&timer->it_lock); - -retry_delete: - /* - * Even if the timer is not longer accessible from other tasks - * it still might be armed and queued in the underlying timer - * mechanism. Worse, that timer mechanism might run the expiry - * function concurrently. - */ - if (timer->kclock->timer_del(timer) =3D=3D TIMER_RETRY) { - /* - * Timer is expired concurrently, prevent livelocks - * and pointless spinning on RT. - * - * timer_wait_running() drops timer::it_lock, which opens - * the possibility for another task to delete the timer. - * - * That's not possible here because this is invoked from - * do_exit() only for the last thread of the thread group. - * So no other task can access and delete that timer. - */ - if (WARN_ON_ONCE(timer_wait_running(timer) !=3D timer)) - return; - - goto retry_delete; - } - hlist_del(&timer->list); - - posix_timer_cleanup_ignored(timer); + struct k_itimer *timer =3D lock_timer(timer_id); =20 - /* - * Setting timer::it_signal to NULL is technically not required - * here as nothing can access the timer anymore legitimately via - * the hash table. Set it to NULL nevertheless so that all deletion - * paths are consistent. 
-	 */
-	WRITE_ONCE(timer->it_signal, NULL);
+	if (!timer)
+		return -EINVAL;
 
-	spin_unlock_irq(&timer->it_lock);
+	posix_timer_delete(timer);
+	unlock_timer(timer);
+	/* Remove it from the hash, which frees up the timer ID */
 	posix_timer_unhash_and_free(timer);
+	return 0;
 }
 
 /*
@@ -1082,6 +1048,8 @@ static void itimer_delete(struct k_itime
 void exit_itimers(struct task_struct *tsk)
 {
 	struct hlist_head timers;
+	struct hlist_node *next;
+	struct k_itimer *timer;
 
 	if (hlist_empty(&tsk->signal->posix_timers))
 		return;
@@ -1091,8 +1059,10 @@ void exit_itimers(struct task_struct *ts
 	hlist_move_list(&tsk->signal->posix_timers, &timers);
 
 	/* The timers are not longer accessible via tsk::signal */
-	while (!hlist_empty(&timers)) {
-		itimer_delete(hlist_entry(timers.first, struct k_itimer, list));
+	hlist_for_each_entry_safe(timer, next, &timers, list) {
+		scoped_guard (spinlock_irq, &timer->it_lock)
+			posix_timer_delete(timer);
+		posix_timer_unhash_and_free(timer);
 		cond_resched();
 	}

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155624.087465658@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 10/18] posix-timers: Make lock_timer() use guard()
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:34 +0100 (CET)

From: Peter Zijlstra

The lookup and locking of posix timers requires the same repeating pattern
at all usage sites:

     tmr = lock_timer(timer_id);
     if (!tmr)
         return -EINVAL;
     ....
     unlock_timer(tmr);

Solve this with a guard implementation, which works in most places out of
the box except for those which need to unlock the timer inside the guard
scope. The only places where this matters are timer_delete() and
timer_settime(). In both cases the timer pointer needs to be preserved
across the end of the scope, which is solved by storing the pointer in a
variable outside of the scope.

timer_settime() also has to protect the timer with RCU before unlocking.
That cannot be done with guard(rcu) inside the guard scope, because that
RCU guard would be cleaned up before the timer is unlocked. Solve this by
providing the RCU protection open coded.
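The guard pattern described above relies on the compiler's cleanup attribute, which the kernel's cleanup.h guards are built on. The following is a minimal userspace sketch of the idea, not the kernel implementation: it assumes GCC or Clang for __attribute__((cleanup)), replaces the spinlock with a plain flag, and every model_* / MODEL_* name is hypothetical. It shows a lookup that fails with -EINVAL, the automatic unlock at scope exit, and the timer_delete() approach of keeping the pointer in a variable declared outside the guard scope.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct k_itimer; 'locked' replaces the spinlock */
struct model_timer {
	int locked;
	int overrun;
};

static struct model_timer model_timers[] = {
	{ .locked = 0, .overrun = 3 },
	{ .locked = 0, .overrun = 7 },
};

#define NR_MODEL_TIMERS ((int)(sizeof(model_timers) / sizeof(model_timers[0])))

/* Lookup and lock in one step; returns NULL (nothing locked) for unknown IDs */
static struct model_timer *model_lock_timer(int id)
{
	if (id < 0 || id >= NR_MODEL_TIMERS)
		return NULL;
	assert(!model_timers[id].locked);	/* catch double locking in the sketch */
	model_timers[id].locked = 1;
	return &model_timers[id];
}

/* Cleanup handler: like the patched unlock_timer(), it tolerates NULL */
static void model_unlock_timer(struct model_timer **tp)
{
	if (*tp)
		(*tp)->locked = 0;
}

/* Declare a timer pointer which is unlocked automatically when it goes out of scope */
#define MODEL_TIMER_GUARD(var, id)					\
	struct model_timer *var __attribute__((cleanup(model_unlock_timer))) = \
		model_lock_timer(id)

/* timer_getoverrun() style: the return value is computed before the unlock runs */
static int model_getoverrun(int id)
{
	MODEL_TIMER_GUARD(t, id);

	if (!t)
		return -EINVAL;
	return t->overrun;
}

/* timer_delete() style: the pointer is kept in a variable outside the scope */
static int model_delete(int id, int *locked_after_scope)
{
	struct model_timer *timer;

	{
		MODEL_TIMER_GUARD(t, id);

		if (!t)
			return -EINVAL;
		timer = t;
		timer->overrun = 0;	/* work which needs the lock */
	}
	/* The guard scope has ended, so the lock is no longer held here */
	*locked_after_scope = timer->locked;
	return 0;
}
```

Calling model_getoverrun(1) returns 7 and leaves the timer unlocked, while model_getoverrun(5) returns -EINVAL without touching any lock. Cleanup handlers run in reverse declaration order when a scope ends, so a second guard declared after MODEL_TIMER_GUARD() in the same scope would be released before the timer is unlocked; that is the ordering problem the change log gives for open coding the RCU protection in timer_settime().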
[ tglx: Made it work and added change log ]

Signed-off-by: Peter Zijlstra
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/all/20250224162103.GD11590@noisy.programming.kicks-ass.net
Acked-by: Frederic Weisbecker
---
V2a: Make unlock conditional - 0day
V2: New patch
---
 include/linux/cleanup.h    |   22 ++++++----
 kernel/time/posix-timers.c |   94 +++++++++++++++++----------------------------------
 2 files changed, 51 insertions(+), 65 deletions(-)

--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -291,11 +291,21 @@ static inline class_##_name##_t class_##
 #define __DEFINE_CLASS_IS_CONDITIONAL(_name, _is_cond)	\
 static __maybe_unused const bool class_##_name##_is_conditional = _is_cond
 
-#define DEFINE_GUARD(_name, _type, _lock, _unlock) \
+#define __DEFINE_GUARD_LOCK_PTR(_name, _exp) \
+	static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
+	{ return (void *)(__force unsigned long)*(_exp); }
+
+#define DEFINE_CLASS_IS_GUARD(_name) \
 	__DEFINE_CLASS_IS_CONDITIONAL(_name, false); \
+	__DEFINE_GUARD_LOCK_PTR(_name, _T)
+
+#define DEFINE_CLASS_IS_COND_GUARD(_name) \
+	__DEFINE_CLASS_IS_CONDITIONAL(_name, true); \
+	__DEFINE_GUARD_LOCK_PTR(_name, _T)
+
+#define DEFINE_GUARD(_name, _type, _lock, _unlock) \
 	DEFINE_CLASS(_name, _type, if (_T) { _unlock; }, ({ _lock; _T; }), _type _T); \
-	static inline void * class_##_name##_lock_ptr(class_##_name##_t *_T) \
-	{ return (void *)(__force unsigned long)*_T; }
+	DEFINE_CLASS_IS_GUARD(_name)
 
 #define DEFINE_GUARD_COND(_name, _ext, _condlock) \
 	__DEFINE_CLASS_IS_CONDITIONAL(_name##_ext, true); \
@@ -375,11 +385,7 @@ static inline void class_##_name##_destr
 	if (_T->lock) { _unlock; }					\
 }									\
 									\
-static inline void *class_##_name##_lock_ptr(class_##_name##_t *_T)	\
-{									\
-	return (void *)(__force unsigned long)_T->lock;			\
-}
-
+__DEFINE_GUARD_LOCK_PTR(_name, &_T->lock)
 
 #define __DEFINE_LOCK_GUARD_1(_name, _type, _lock)			\
 static inline class_##_name##_t class_##_name##_constructor(_type
*l) \ --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -63,9 +63,18 @@ static struct k_itimer *__lock_timer(tim =20 static inline void unlock_timer(struct k_itimer *timr) { - spin_unlock_irq(&timr->it_lock); + if (likely((timr))) + spin_unlock_irq(&timr->it_lock); } =20 +#define scoped_timer_get_or_fail(_id) \ + scoped_cond_guard(lock_timer, return -EINVAL, _id) + +#define scoped_timer (scope) + +DEFINE_CLASS(lock_timer, struct k_itimer *, unlock_timer(_T), __lock_timer= (id), timer_t id); +DEFINE_CLASS_IS_COND_GUARD(lock_timer); + static int hash(struct signal_struct *sig, unsigned int nr) { return hash_32(hash32_ptr(sig) ^ nr, HASH_BITS(posix_timers_hashtable)); @@ -682,18 +691,10 @@ void common_timer_get(struct k_itimer *t =20 static int do_timer_gettime(timer_t timer_id, struct itimerspec64 *settin= g) { - struct k_itimer *timr; - int ret =3D 0; - - timr =3D lock_timer(timer_id); - if (!timr) - return -EINVAL; - memset(setting, 0, sizeof(*setting)); - timr->kclock->timer_get(timr, setting); - - unlock_timer(timr); - return ret; + scoped_timer_get_or_fail(timer_id) + scoped_timer->kclock->timer_get(scoped_timer, setting); + return 0; } =20 /* Get the time remaining on a POSIX.1b interval timer. 
*/ @@ -747,17 +748,8 @@ SYSCALL_DEFINE2(timer_gettime32, timer_t */ SYSCALL_DEFINE1(timer_getoverrun, timer_t, timer_id) { - struct k_itimer *timr; - int overrun; - - timr =3D lock_timer(timer_id); - if (!timr) - return -EINVAL; - - overrun =3D timer_overrun_to_int(timr); - unlock_timer(timr); - - return overrun; + scoped_timer_get_or_fail(timer_id) + return timer_overrun_to_int(scoped_timer); } =20 static void common_hrtimer_arm(struct k_itimer *timr, ktime_t expires, @@ -875,12 +867,9 @@ int common_timer_set(struct k_itimer *ti return 0; } =20 -static int do_timer_settime(timer_t timer_id, int tmr_flags, - struct itimerspec64 *new_spec64, +static int do_timer_settime(timer_t timer_id, int tmr_flags, struct itimer= spec64 *new_spec64, struct itimerspec64 *old_spec64) { - int ret; - if (!timespec64_valid(&new_spec64->it_interval) || !timespec64_valid(&new_spec64->it_value)) return -EINVAL; @@ -888,36 +877,28 @@ static int do_timer_settime(timer_t time if (old_spec64) memset(old_spec64, 0, sizeof(*old_spec64)); =20 - for (;;) { - struct k_itimer *timr =3D lock_timer(timer_id); + for (; ; old_spec64 =3D NULL) { + struct k_itimer *timr; =20 - if (!timr) - return -EINVAL; + scoped_timer_get_or_fail(timer_id) { + timr =3D scoped_timer; =20 - if (old_spec64) - old_spec64->it_interval =3D ktime_to_timespec64(timr->it_interval); + if (old_spec64) + old_spec64->it_interval =3D ktime_to_timespec64(timr->it_interval); =20 - /* Prevent signal delivery and rearming. */ - timr->it_signal_seq++; - - ret =3D timr->kclock->timer_set(timr, tmr_flags, new_spec64, old_spec64); - if (ret !=3D TIMER_RETRY) { - unlock_timer(timr); - break; - } + /* Prevent signal delivery and rearming. */ + timr->it_signal_seq++; =20 - /* Read the old time only once */ - old_spec64 =3D NULL; - /* Protect the timer from being freed after the lock is dropped */ - guard(rcu)(); - unlock_timer(timr); - /* - * timer_wait_running() might drop RCU read side protection - * so the timer has to be looked up again! 
-		 */
+			int ret = timr->kclock->timer_set(timr, tmr_flags, new_spec64, old_spec64);
+			if (ret != TIMER_RETRY)
+				return ret;
 
+			/* Protect the timer from being freed when leaving the lock scope */
+			rcu_read_lock();
+		}
 		timer_wait_running(timr);
+		rcu_read_unlock();
 	}
-	return ret;
 }
 
 /* Set a POSIX.1b interval timer */
@@ -1028,13 +1009,12 @@ static void posix_timer_delete(struct k_
 /* Delete a POSIX.1b interval timer. */
 SYSCALL_DEFINE1(timer_delete, timer_t, timer_id)
 {
-	struct k_itimer *timer = lock_timer(timer_id);
-
-	if (!timer)
-		return -EINVAL;
+	struct k_itimer *timer;
 
-	posix_timer_delete(timer);
-	unlock_timer(timer);
+	scoped_timer_get_or_fail(timer_id) {
+		timer = scoped_timer;
+		posix_timer_delete(timer);
+	}
 	/* Remove it from the hash, which frees up the timer ID */
 	posix_timer_unhash_and_free(timer);
 	return 0;

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155624.151545978@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 11/18] posix-timers: Make signal_struct::next_posix_timer_id an atomic_t
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:36 +0100 (CET)

From: Eric Dumazet

The global hash_lock protecting the posix timer hash table can be heavily
contended, especially when there is an extensive linear search for a timer
ID.

Timer IDs are handed out by monotonically incrementing next_posix_timer_id
and then validating that there is no timer with the same ID in the hash
table. Both operations happen with the global hash lock held.

To reduce the hash lock contention, the hash will be reworked into a scaled
hash with per bucket locks, which requires handling the ID counter
locklessly.

Prepare for this by making next_posix_timer_id an atomic_t, which can be
used locklessly with atomic_fetch_inc().
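The lockless ID allocation can be illustrated with C11 atomics. This is a hedged userspace sketch, not the kernel code: model_next_id and model_next_timer_id() are hypothetical names, and C11 atomic_fetch_add() on an atomic_uint stands in for the kernel's atomic_fetch_inc() on an atomic_t (the unsigned type keeps the wrap past INT_MAX well defined in standard C). Each caller obtains a distinct counter value without holding a lock, and masking with INT_MAX clamps the ID to the positive range.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical stand-in for signal_struct::next_posix_timer_id */
static atomic_uint model_next_id;

/*
 * Fetch the current value and increment it in one atomic step, then clamp
 * the result to the positive int range. Concurrent callers each observe a
 * different counter value, so the counter itself needs no lock.
 */
static unsigned int model_next_timer_id(void)
{
	unsigned int id = atomic_fetch_add(&model_next_id, 1u) & INT_MAX;

	assert(id <= (unsigned int)INT_MAX);
	return id;
}
```

Once the counter passes INT_MAX, the clamped ID wraps back to 0, so the ID is not unique by itself; as in posix_timer_add(), it still has to be validated against the hash table with the lock held before it is used.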
[ tglx: Adopted from Eric's series, massaged change log and simplified it ]

Signed-off-by: Eric Dumazet
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/all/20250219125522.2535263-2-edumazet@google.com
Acked-by: Frederic Weisbecker
Reviewed-by: Frederic Weisbecker
---
V2: Use atomic_fetch_inc() - PeterZ
---
 include/linux/sched/signal.h |    2 +-
 kernel/time/posix-timers.c   |   14 +++++---------
 2 files changed, 6 insertions(+), 10 deletions(-)

--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -136,7 +136,7 @@ struct signal_struct {
 #ifdef CONFIG_POSIX_TIMERS
 
 	/* POSIX.1b Interval Timers */
-	unsigned int		next_posix_timer_id;
+	atomic_t		next_posix_timer_id;
 	struct hlist_head	posix_timers;
 	struct hlist_head	ignored_posix_timers;
 
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -119,21 +119,17 @@ static bool posix_timer_hashed(struct hl
 static int posix_timer_add(struct k_itimer *timer)
 {
 	struct signal_struct *sig = current->signal;
-	struct hlist_head *head;
-	unsigned int cnt, id;
 
 	/*
 	 * FIXME: Replace this by a per signal struct xarray once there is
 	 * a plan to handle the resulting CRIU regression gracefully.
 	 */
-	for (cnt = 0; cnt <= INT_MAX; cnt++) {
-		spin_lock(&hash_lock);
-		id = sig->next_posix_timer_id;
-
-		/* Write the next ID back. Clamp it to the positive space */
-		sig->next_posix_timer_id = (id + 1) & INT_MAX;
+	for (unsigned int cnt = 0; cnt <= INT_MAX; cnt++) {
+		/* Get the next timer ID and clamp it to positive space */
+		unsigned int id = atomic_fetch_inc(&sig->next_posix_timer_id) & INT_MAX;
+		struct hlist_head *head = &posix_timers_hashtable[hash(sig, id)];
 
-		head = &posix_timers_hashtable[hash(sig, id)];
+		spin_lock(&hash_lock);
 		if (!posix_timer_hashed(head, sig, id)) {
 			/*
 			 * Set the timer ID and the signal pointer to make

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155624.216091571@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 12/18] posix-timers: Improve hash table performance
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:38 +0100 (CET)
Eric and Ben reported a significant performance bottleneck on the global
hash, which is used to store posix timers for lookup.

Eric tried to do a lockless validation of a new timer ID before trying to
insert the timer, but that does not solve the problem. For the
non-contended case this is a pointless exercise, and for the contended case
the extra lookup just creates enough interleaving that all tasks can make
progress.

There are actually two real solutions to the problem:

  1) Provide a per process (signal struct) xarray storage

  2) Implement a smarter hash like the one in the futex code

#1 works perfectly fine for most cases, but the fact that CRIU enforced a
   linearly increasing timer ID to restore timers makes this problematic.

   It's easy enough to create a sparse timer ID space, which very quickly
   amounts to a large chunk of memory consumed by the xarray. 2048 timers
   with an ID offset of 512 consume more than one megabyte of memory for
   the xarray storage.

#2 The main advantage of the futex hash is that it uses per hash bucket
   locks instead of a global hash lock. Aside from that, it is scaled
   according to the number of CPUs at boot time.

Experiments with artificial benchmarks have shown that a scaled hash with
per bucket locks comes pretty close to the xarray performance, and in some
scenarios it performs better.
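The bucket selection and boot-time sizing of such a scaled hash can be sketched in userspace C. This is a hedged model, not the kernel code: model_bucket, model_table, model_table_init() and model_hash_bucket() are hypothetical, the per bucket spinlock and list head are reduced to placeholder fields, and model_hash_32() assumes the multiplicative scheme of the kernel's generic hash_32() (golden ratio constant 0x61C88647). The sizing mirrors posixtimer_init() in this patch (512 buckets per possible CPU, rounded up to a power of two, or 512 for CONFIG_BASE_SMALL), leaving out any adjustment alloc_large_system_hash() may make.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed to match the kernel's generic hash_32(): multiply, keep the top bits */
#define MODEL_GOLDEN_RATIO_32 0x61C88647u

static uint32_t model_hash_32(uint32_t val, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(val * MODEL_GOLDEN_RATIO_32) >> (32 - bits);
}

/* One bucket: its lock only serializes operations on this bucket's chain */
struct model_bucket {
	int lock;	/* placeholder for a spinlock_t */
	void *head;	/* placeholder for the hlist_head of the chain */
};

struct model_table {
	struct model_bucket *buckets;
	unsigned int bits;
};

static unsigned long model_roundup_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Size the table once at init: 512 buckets per CPU, rounded to a power of two */
static int model_table_init(struct model_table *t, unsigned int nr_cpus, int base_small)
{
	unsigned long size;
	unsigned int bits = 0;

	assert(base_small || nr_cpus >= 1);
	size = base_small ? 512 : model_roundup_pow_of_two(512ul * nr_cpus);

	t->buckets = calloc(size, sizeof(*t->buckets));
	if (!t->buckets)
		return -1;
	for (unsigned long s = size; s > 1; s >>= 1)	/* ilog2(size) */
		bits++;
	t->bits = bits;
	return 0;
}

/* Fold the owner pointer to 32 bits (like hash32_ptr()), mix in the ID, pick a bucket */
static struct model_bucket *model_hash_bucket(const struct model_table *t,
					      const void *sig, unsigned int id)
{
	uint64_t p = (uint64_t)(uintptr_t)sig;
	uint32_t key = (uint32_t)(p ^ (p >> 32)) ^ id;

	return &t->buckets[model_hash_32(key, t->bits)];
}
```

With 4 CPUs the table gets 2048 buckets (11 hash bits), and with the small configuration it stays at 512 buckets (9 bits). An insert or lookup then takes only the lock of the single bucket it hashes to, so tasks working on different buckets do not contend with each other.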
Test 1:

     A single process creates 20000 timers and afterwards invokes
     timer_getoverrun(2) on each of them:

                  mainline       Eric    newhash     xarray
     create          23 ms      23 ms       9 ms       8 ms
     getoverrun      14 ms      14 ms       5 ms       4 ms

Test 2:

     A single process creates 50000 timers and afterwards invokes
     timer_getoverrun(2) on each of them:

                  mainline       Eric    newhash     xarray
     create          98 ms     219 ms      20 ms      18 ms
     getoverrun      62 ms      62 ms      10 ms       9 ms

Test 3:

     A single process creates 100000 timers and afterwards invokes
     timer_getoverrun(2) on each of them:

                  mainline       Eric    newhash     xarray
     create         313 ms     750 ms      48 ms      33 ms
     getoverrun     261 ms     260 ms      20 ms      14 ms

Eric's changes create quite some overhead in the create() path due to the
double list walk, as the main issue according to perf is the list walk
itself. With 100k timers each of the 512 hash buckets in mainline contains
~200 timers, which in the worst case all need to be inspected. The same
problem applies to getoverrun(), where the lookup has to walk through the
hash buckets to find the timer it is looking for.

The scaled hash obviously reduces hash collisions and lock contention
significantly. This becomes more prominent with concurrency.

Test 4:

     A process creates 63 threads and all threads wait on a barrier
     before each instance creates 20000 timers and afterwards invokes
     timer_getoverrun(2) on each of them. The threads are pinned on
     separate CPUs to achieve maximum concurrency. The numbers are the
     average times per thread:

                  mainline       Eric    newhash     xarray
     create      180239 ms   38599 ms     579 ms     813 ms
     getoverrun    2645 ms    2642 ms      32 ms       7 ms

Test 5:

     A process forks 63 times and all forks wait on a barrier before each
     instance creates 20000 timers and afterwards invokes
     timer_getoverrun(2) on each of them. The processes are pinned on
     separate CPUs to achieve maximum concurrency. The numbers are the
     average times per process:

                  mainline       Eric    newhash     xarray
     create      157253 ms   40008 ms      83 ms      60 ms
     getoverrun    2611 ms    2614 ms      40 ms       4 ms

So clearly the reduction of lock contention with Eric's changes makes a
significant difference for the create() loop, but it does not mitigate the
problem of long list walks. That is clearly visible on the getoverrun()
side, which is purely dominated by the lookup itself: once the timer is
found, the syscall just reads from the timer structure with no other locks
or code paths involved and returns.

The difference between the thread and the fork case for the new hash and
the xarray is caused by contention on sighand::siglock, which only the
threads share. The xarray additionally suffers from contention on the
xarray lock on insertion.

The only case where the reworked hash slightly outperforms the xarray is a
tight loop which creates and deletes timers.

Test 6:

     A process creates 63 threads and all threads wait on a barrier
     before each instance runs a loop which creates and deletes a timer
     100000 times in a row. The threads are pinned on separate CPUs to
     achieve maximum concurrency. The numbers are the average times per
     thread:

                  mainline       Eric    newhash     xarray
     loop          5917 ms    5897 ms    5473 ms    7846 ms

Test 7:

     A process forks 63 times and all forks wait on a barrier before each
     instance runs a loop which creates and deletes a timer 100000 times
     in a row. The processes are pinned on separate CPUs to achieve
     maximum concurrency. The numbers are the average times per process:

                  mainline       Eric    newhash     xarray
     loop          5137 ms    7828 ms     891 ms     872 ms

In both tests there is not much contention on the hash, but the ucount
accounting for the signal and, in the thread case, the sighand::siglock
contention (plus the xarray locking) dominate the overhead.

As the memory consumption of the xarray in the sparse ID case is
significant, the scaled hash with per bucket locks seems to be the better
overall option.
While the xarray has faster lookup times for a large number of timers, the
syscalls which require the lookup are not an extreme hot path. Most
applications utilize signal delivery, and all syscalls except
timer_getoverrun(2) are anything but cheap.

So implement a scaled hash with per bucket locks, which offers the best
tradeoff between performance and memory consumption.

Reported-by: Eric Dumazet
Reported-by: Benjamin Segall
Signed-off-by: Thomas Gleixner
Acked-by: Frederic Weisbecker
---
V2: Replace hash() by hash_bucket(), which returns the bucket pointer.
---
 kernel/time/posix-timers.c |   99 ++++++++++++++++++++++++++++++---------------
 1 file changed, 68 insertions(+), 31 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -12,10 +12,10 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -40,8 +40,18 @@ static struct kmem_cache *posix_timers_c
  * This allows checkpoint/restore to reconstruct the exact timer IDs for
  * a process.
 */
-static DEFINE_HASHTABLE(posix_timers_hashtable, 9);
-static DEFINE_SPINLOCK(hash_lock);
+struct timer_hash_bucket {
+	spinlock_t lock;
+	struct hlist_head head;
+};
+
+static struct {
+	struct timer_hash_bucket *buckets;
+	unsigned long bits;
+} __timer_data __ro_after_init __aligned(2*sizeof(long));
+
+#define timer_buckets (__timer_data.buckets)
+#define timer_hashbits (__timer_data.bits)
 
 static const struct k_clock * const posix_clocks[];
 static const struct k_clock *clockid_to_kclock(const clockid_t id);
@@ -75,18 +85,18 @@ static inline void unlock_timer(struct k
 DEFINE_CLASS(lock_timer, struct k_itimer *, unlock_timer(_T), __lock_timer(id), timer_t id);
 DEFINE_CLASS_IS_COND_GUARD(lock_timer);
 
-static int hash(struct signal_struct *sig, unsigned int nr)
+static struct timer_hash_bucket *hash_bucket(struct signal_struct *sig, unsigned int nr)
 {
-	return hash_32(hash32_ptr(sig) ^ nr, HASH_BITS(posix_timers_hashtable));
+	return &timer_buckets[hash_32(hash32_ptr(sig) ^ nr, timer_hashbits)];
 }
 
 static struct k_itimer *posix_timer_by_id(timer_t id)
 {
 	struct signal_struct *sig = current->signal;
-	struct hlist_head *head = &posix_timers_hashtable[hash(sig, id)];
+	struct timer_hash_bucket *bucket = hash_bucket(sig, id);
 	struct k_itimer *timer;
 
-	hlist_for_each_entry_rcu(timer, head, t_hash) {
+	hlist_for_each_entry_rcu(timer, &bucket->head, t_hash) {
 		/* timer->it_signal can be set concurrently */
 		if ((READ_ONCE(timer->it_signal) == sig) && (timer->it_id == id))
 			return timer;
@@ -105,11 +115,13 @@ static inline struct signal_struct *posi
 	return (struct signal_struct *)(val & ~1UL);
 }
 
-static bool posix_timer_hashed(struct hlist_head *head, struct signal_struct *sig, timer_t id)
+static bool posix_timer_hashed(struct timer_hash_bucket *bucket, struct signal_struct *sig,
+			       timer_t id)
 {
+	struct hlist_head *head = &bucket->head;
 	struct k_itimer *timer;
 
-	hlist_for_each_entry_rcu(timer, head, t_hash, lockdep_is_held(&hash_lock)) {
+	hlist_for_each_entry_rcu(timer, head, t_hash, lockdep_is_held(&bucket->lock)) {
 		if ((posix_sig_owner(timer) == sig) && (timer->it_id == id))
 			return true;
 	}
@@ -120,34 +132,34 @@ static int posix_timer_add(struct k_itim
 {
 	struct signal_struct *sig = current->signal;
 
-	/*
-	 * FIXME: Replace this by a per signal struct xarray once there is
-	 * a plan to handle the resulting CRIU regression gracefully.
-	 */
 	for (unsigned int cnt = 0; cnt <= INT_MAX; cnt++) {
 		/* Get the next timer ID and clamp it to positive space */
 		unsigned int id = atomic_fetch_inc(&sig->next_posix_timer_id) & INT_MAX;
-		struct hlist_head *head = &posix_timers_hashtable[hash(sig, id)];
+		struct timer_hash_bucket *bucket = hash_bucket(sig, id);
 
-		spin_lock(&hash_lock);
-		if (!posix_timer_hashed(head, sig, id)) {
+		scoped_guard (spinlock, &bucket->lock) {
 			/*
-			 * Set the timer ID and the signal pointer to make
-			 * it identifiable in the hash table. The signal
-			 * pointer has bit 0 set to indicate that it is not
-			 * yet fully initialized. posix_timer_hashed()
-			 * masks this bit out, but the syscall lookup fails
-			 * to match due to it being set. This guarantees
-			 * that there can't be duplicate timer IDs handed
-			 * out.
+			 * Validate under the lock as this could have raced
+			 * against another thread ending up with the same
+			 * ID, which is highly unlikely, but possible.
 			 */
-			timer->it_id = (timer_t)id;
-			timer->it_signal = (struct signal_struct *)((unsigned long)sig | 1UL);
-			hlist_add_head_rcu(&timer->t_hash, head);
-			spin_unlock(&hash_lock);
-			return id;
+			if (!posix_timer_hashed(bucket, sig, id)) {
+				/*
+				 * Set the timer ID and the signal pointer to make
+				 * it identifiable in the hash table. The signal
+				 * pointer has bit 0 set to indicate that it is not
+				 * yet fully initialized. posix_timer_hashed()
+				 * masks this bit out, but the syscall lookup fails
+				 * to match due to it being set. This guarantees
+				 * that there can't be duplicate timer IDs handed
+				 * out.
+				 */
+				timer->it_id = (timer_t)id;
+				timer->it_signal = (struct signal_struct *)((unsigned long)sig | 1UL);
+				hlist_add_head_rcu(&timer->t_hash, &bucket->head);
+				return id;
+			}
 		}
-		spin_unlock(&hash_lock);
 		cond_resched();
 	}
 	/* POSIX return code when no timer ID could be allocated */
@@ -405,7 +417,9 @@ void posixtimer_free_timer(struct k_itim
 
 static void posix_timer_unhash_and_free(struct k_itimer *tmr)
 {
-	scoped_guard (spinlock, &hash_lock)
+	struct timer_hash_bucket *bucket = hash_bucket(posix_sig_owner(tmr), tmr->it_id);
+
+	scoped_guard (spinlock, &bucket->lock)
 		hlist_del_rcu(&tmr->t_hash);
 	posixtimer_putref(tmr);
 }
@@ -1486,3 +1500,26 @@ static const struct k_clock *clockid_to_
 
 	return posix_clocks[array_index_nospec(idx, ARRAY_SIZE(posix_clocks))];
 }
+
+static int __init posixtimer_init(void)
+{
+	unsigned long i, size;
+	unsigned int shift;
+
+	if (IS_ENABLED(CONFIG_BASE_SMALL))
+		size = 512;
+	else
+		size = roundup_pow_of_two(512 * num_possible_cpus());
+
+	timer_buckets = alloc_large_system_hash("posixtimers", sizeof(*timer_buckets),
+						size, 0, 0, &shift, NULL, size, size);
+	size = 1UL << shift;
+	timer_hashbits = ilog2(size);
+
+	for (i = 0; i < size; i++) {
+		spin_lock_init(&timer_buckets[i].lock);
+		INIT_HLIST_HEAD(&timer_buckets[i].head);
+	}
+	return 0;
+}
+core_initcall(posixtimer_init);

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155624.279080328@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 13/18] posix-timers: Switch to jhash32()
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:40 +0100 (CET)

The hash distribution of hash_32() is suboptimal. jhash2() provides a much better distribution, which evens out the length of the hash bucket lists and in turn avoids large outliers in list walk times.

Due to the sparse ID space (thanks CRIU) there is no guarantee that the timers will be fully evenly distributed over the hash buckets, but the behaviour is much better than with hash_32(), even for randomly sparse ID spaces.

For a pathological test case with 64 processes creating and accessing 20000 timers each, this results in a runtime reduction of ~10% and a significantly reduced runtime variation.
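The bucket selection can be explored with a small userspace sketch. It assumes a 64-bit owner pointer and uses the MurmurHash3 fmix32 finalizer as a stand-in for jhash2(), so it is not the kernel's hash; timer_hash(), table_size() and longest_chain() are hypothetical helpers. What it does mirror from these patches is indexing with a mask (size - 1) instead of a bit count, and the 512-buckets-per-CPU sizing of posixtimer_init() (ignoring CONFIG_BASE_SMALL and any adjustment made by alloc_large_system_hash()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in 32-bit mixer (the MurmurHash3 fmix32 finalizer). This is NOT
 * the kernel's jhash2(); it only plays the role of a hash whose low bits
 * are well mixed, so that masking them off yields a usable bucket index.
 */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Hypothetical key: mix both 32-bit halves of the owner pointer with the ID. */
static uint32_t timer_hash(uint64_t owner, uint32_t id)
{
	return mix32((uint32_t)owner ^ mix32((uint32_t)(owner >> 32) ^ id));
}

/* Sizing modeled on posixtimer_init(): 512 buckets per CPU, rounded up to 2^n. */
static size_t table_size(unsigned int ncpus)
{
	size_t want = (size_t)512 * ncpus, size = 1;

	while (size < want)
		size <<= 1;
	return size;
}

/* With a power-of-two size, "hash & (size - 1)" equals "hash % size". */
static size_t bucket_index(uint32_t hash, size_t nbuckets)
{
	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
	return hash & (nbuckets - 1);
}

/* Hash IDs [0, n) of one owner into @counts and return the longest chain. */
static size_t longest_chain(uint64_t owner, uint32_t n, size_t *counts, size_t nbuckets)
{
	size_t max = 0;

	for (size_t i = 0; i < nbuckets; i++)
		counts[i] = 0;
	for (uint32_t id = 0; id < n; id++) {
		size_t b = bucket_index(timer_hash(owner, id), nbuckets);

		if (++counts[b] > max)
			max = counts[b];
	}
	return max;
}
```

Swapping a plain multiplicative hash into timer_hash() and comparing longest_chain() results is one way to look at the outlier argument; the numbers depend entirely on the chosen key set.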
Signed-off-by: Thomas Gleixner
---
V2: New patch
---
 kernel/time/posix-timers.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -11,8 +11,8 @@
  */
 #include
 #include
-#include
 #include
+#include
 #include
 #include
 #include
@@ -47,11 +47,11 @@ struct timer_hash_bucket {
 
 static struct {
 	struct timer_hash_bucket *buckets;
-	unsigned long bits;
+	unsigned long mask;
 } __timer_data __ro_after_init __aligned(2*sizeof(long));
 
 #define timer_buckets (__timer_data.buckets)
-#define timer_hashbits (__timer_data.bits)
+#define timer_hashmask (__timer_data.mask)
 
 static const struct k_clock * const posix_clocks[];
 static const struct k_clock *clockid_to_kclock(const clockid_t id);
@@ -87,7 +87,7 @@ DEFINE_CLASS_IS_COND_GUARD(lock_timer);
 
 static struct timer_hash_bucket *hash_bucket(struct signal_struct *sig, unsigned int nr)
 {
-	return &timer_buckets[hash_32(hash32_ptr(sig) ^ nr, timer_hashbits)];
+	return &timer_buckets[jhash2((u32 *)&sig, sizeof(sig) / sizeof(u32), nr) & timer_hashmask];
 }
 
 static struct k_itimer *posix_timer_by_id(timer_t id)
@@ -1514,7 +1514,7 @@ static int __init posixtimer_init(void)
 	timer_buckets = alloc_large_system_hash("posixtimers", sizeof(*timer_buckets),
 						size, 0, 0, &shift, NULL, size, size);
 	size = 1UL << shift;
-	timer_hashbits = ilog2(size);
+	timer_hashmask = size - 1;
 
 	for (i = 0; i < size; i++) {
 		spin_lock_init(&timer_buckets[i].lock);

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155624.341108067@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 14/18] posix-timers: Avoid false cacheline sharing
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:42 +0100 (CET)

struct k_itimer has the hlist_node, which is used for lookup in the hash bucket, and the timer lock in the same cache line. That's obviously bad if one CPU fiddles with a timer while another is walking the hash bucket on which that timer is queued.

Avoid this by restructuring struct k_itimer so that the read-mostly fields (only modified during setup and teardown) are in the first cache line, and the lock and the remaining fields which get written to are in cache lines 2-N.

This reduces cacheline contention in a test case of 64 processes creating and accessing 20000 timers each by almost 30% according to perf.
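The same layout idea can be sketched in userspace, assuming a 64-byte cache line. struct timer_sketch is a hypothetical, heavily simplified stand-in, not struct k_itimer: its lookup-side fields sit in the first line and its lock starts the second. _Alignas on the lock forces both that placement and 64-byte alignment of the whole struct, roughly what the field reordering plus ____cacheline_aligned_in_smp achieve. The allocation helper shows why the patch also passes __alignof__(struct k_itimer) to kmem_cache_create(): alignment declared in the type only helps if the allocator honours it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE_SIZE 64 /* assumption: typical x86-64 cache line size */

/* Minimal stand-ins for the kernel's hlist_node and spinlock_t. */
struct node { struct node *next, **pprev; };
typedef struct { volatile int locked; } lock_t;

/*
 * Hypothetical, simplified layout in the spirit of the patch: fields read
 * by hash bucket walkers share cache line 0, write-hot fields start on
 * cache line 1.
 */
struct timer_sketch {
	/* cache line 0: read-mostly */
	struct node t_hash;
	struct node list;
	int32_t it_id;
	int32_t it_clock;
	void *it_signal;

	/* cache line 1 and above: modified on timer operations */
	_Alignas(CACHELINE_SIZE) lock_t it_lock;
	int it_status;
	int64_t it_overrun;
	int64_t it_interval;
};

/* Compile-time layout checks (static_assert comes from <assert.h> in C11). */
static_assert(offsetof(struct timer_sketch, it_signal) + sizeof(void *) <= CACHELINE_SIZE,
	      "read-mostly fields must fit in the first cache line");
static_assert(offsetof(struct timer_sketch, it_lock) == CACHELINE_SIZE,
	      "the lock must start the second cache line");

/* Aligned allocation; sizeof() is a multiple of the alignment, as aligned_alloc() requires. */
static struct timer_sketch *timer_sketch_alloc(void)
{
	return aligned_alloc(_Alignof(struct timer_sketch), sizeof(struct timer_sketch));
}

/* True if both addresses fall into the same CACHELINE_SIZE-sized line. */
static int same_cacheline(const void *a, const void *b)
{
	return (uintptr_t)a / CACHELINE_SIZE == (uintptr_t)b / CACHELINE_SIZE;
}
```

On machines with larger lines (128 bytes on 64-bit POWER, for example) CACHELINE_SIZE would have to change; the kernel side uses SMP_CACHE_BYTES for this.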
Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
---
V2: New patch
---
 include/linux/posix-timers.h | 21 ++++++++++++---------
 kernel/time/posix-timers.c   |  4 ++--
 2 files changed, 14 insertions(+), 11 deletions(-)

--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -177,23 +177,26 @@ static inline void posix_cputimers_init_
  * @rcu: RCU head for freeing the timer.
  */
 struct k_itimer {
-	struct hlist_node list;
-	struct hlist_node ignored_list;
+	/* 1st cacheline contains read-mostly fields */
 	struct hlist_node t_hash;
-	spinlock_t it_lock;
-	const struct k_clock *kclock;
-	clockid_t it_clock;
+	struct hlist_node list;
 	timer_t it_id;
+	clockid_t it_clock;
+	int it_sigev_notify;
+	enum pid_type it_pid_type;
+	struct signal_struct *it_signal;
+	const struct k_clock *kclock;
+
+	/* 2nd cacheline and above contain fields which are modified regularly */
+	spinlock_t it_lock;
 	int it_status;
 	bool it_sig_periodic;
 	s64 it_overrun;
 	s64 it_overrun_last;
 	unsigned int it_signal_seq;
 	unsigned int it_sigqueue_seq;
-	int it_sigev_notify;
-	enum pid_type it_pid_type;
 	ktime_t it_interval;
-	struct signal_struct *it_signal;
+	struct hlist_node ignored_list;
 	union {
 		struct pid *it_pid;
 		struct task_struct *it_process;
@@ -210,7 +213,7 @@ struct k_itimer {
 		} alarm;
 	} it;
 	struct rcu_head rcu;
-};
+} ____cacheline_aligned_in_smp;
 
 void run_posix_cpu_timers(void);
 void posix_cpu_timers_exit(struct task_struct *task);
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -260,8 +260,8 @@ static int posix_get_hrtimer_res(clockid
 
 static __init int init_posix_timers(void)
 {
-	posix_timers_cache = kmem_cache_create("posix_timers_cache", sizeof(struct k_itimer), 0,
-					       SLAB_ACCOUNT, NULL);
+	posix_timers_cache = kmem_cache_create("posix_timers_cache", sizeof(struct k_itimer),
+					       __alignof__(struct k_itimer), SLAB_ACCOUNT, NULL);
 	return 0;
 }
 __initcall(init_posix_timers);

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155624.403223080@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 15/18] posix-timers: Make per process list RCU safe
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:43 +0100 (CET)

Preparatory change to remove the sighand locking from the /proc/$PID/timers iterator.

Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
---
 kernel/time/posix-timers.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -518,7 +518,7 @@ static int do_timer_create(clockid_t whi
 		 * Store the unmodified signal pointer to make it valid.
 		 */
 		WRITE_ONCE(new_timer->it_signal, current->signal);
-		hlist_add_head(&new_timer->list, &current->signal->posix_timers);
+		hlist_add_head_rcu(&new_timer->list, &current->signal->posix_timers);
 	}
 	/*
 	 * After unlocking @new_timer is subject to concurrent removal and
@@ -1004,7 +1004,7 @@ static void posix_timer_delete(struct k_
 		unsigned long sig = (unsigned long)timer->it_signal | 1UL;
 
 		WRITE_ONCE(timer->it_signal, (struct signal_struct *)sig);
-		hlist_del(&timer->list);
+		hlist_del_rcu(&timer->list);
 		posix_timer_cleanup_ignored(timer);
 	}

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155624.465175807@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 16/18] posix-timers: Dont iterate /proc/$PID/timers with sighand::siglock held
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:45 +0100 (CET)

The readout of /proc/$PID/timers holds sighand::siglock with interrupts disabled. That is required to protect against concurrent modifications of the task::signal::posix_timers list because the list is not RCU safe.

With the conversion of the timer storage to an RCU protected hlist, this is no longer required.

The only requirement is to protect the returned entry against a concurrent free, which is trivial as the timers are RCU protected.

Removing the trylock of sighand::siglock is benign because the lifetime of task_struct::signal is bound to the lifetime of the task_struct itself.

There are two scenarios where this matters:

  1) The process is alive and not about to be checkpointed

  2) The process is stopped via ptrace for checkpointing

#1 is a racy snapshot of the armed timers and nothing can rely on it. It's nothing more than debug information, and it has been that way before because the sighand lock is dropped when the buffer is full and the restart of the iteration might find a completely different set of timers.

The task and therefore task::signal cannot be freed, as timers_start() acquired a reference count via get_pid_task().

#2: The process is stopped for checkpointing, so nothing can delete or create timers at this point. Neither can the process exit during the traversal.

If CRIU fails to observe an exit in progress prior to the dissemination of the timers, then there are more severe problems to solve in the CRIU mechanics, as they can't rely on posix timers being enabled in the first place.

Therefore replace the lock acquisition with rcu_read_lock() and switch the timer storage traversal over to seq_hlist_*_rcu().
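Without siglock, the walk can encounter a timer that is concurrently being deleted, which is why the patch's show_timer() takes the timer lock and skips entries for which posixtimer_valid() fails. The sketch below illustrates only the tagged-pointer convention visible in this series (posix_timer_delete() sets bit 0 of it_signal, posix_sig_owner() masks it off), using C11 atomics on a hypothetical struct entry. It assumes that "valid" means bit 0 is clear, and it does not implement RCU: in the kernel, rcu_read_lock() is what keeps the entries' memory alive during the traversal.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the kernel uses struct signal_struct and struct k_itimer. */
struct owner { int dummy; };

struct entry {
	_Atomic uintptr_t owner;	/* owner pointer; bit 0 set = not valid */
	int id;
	struct entry *next;
};

/* Creation: publish the entry with bit 0 clear, i.e. fully initialized. */
static void entry_publish(struct entry *e, struct owner *o)
{
	/* The owner must be at least 2-byte aligned so that bit 0 is free. */
	assert(((uintptr_t)o & 1) == 0);
	atomic_store_explicit(&e->owner, (uintptr_t)o, memory_order_release);
}

/* Deletion: set bit 0 but keep the owner recoverable, like posix_timer_delete(). */
static void entry_invalidate(struct entry *e)
{
	atomic_fetch_or_explicit(&e->owner, (uintptr_t)1, memory_order_release);
}

static int entry_valid(struct entry *e)
{
	return !(atomic_load_explicit(&e->owner, memory_order_acquire) & 1u);
}

/* Like posix_sig_owner(): strip the tag bit to get the owner back. */
static struct owner *entry_owner(struct entry *e)
{
	return (struct owner *)(atomic_load_explicit(&e->owner, memory_order_relaxed) &
				~(uintptr_t)1);
}

/* A reader that reports only valid entries and silently skips invalidated ones. */
static size_t count_valid(struct entry *head)
{
	size_t n = 0;

	for (struct entry *e = head; e; e = e->next)
		if (entry_valid(e))
			n++;
	return n;
}
```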
Signed-off-by: Thomas Gleixner
Reviewed-by: Frederic Weisbecker
---
 fs/proc/base.c | 48 ++++++++++++++++++++----------------------------
 1 file changed, 20 insertions(+), 28 deletions(-)

--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2497,11 +2497,9 @@ static const struct file_operations proc
 
 #if defined(CONFIG_CHECKPOINT_RESTORE) && defined(CONFIG_POSIX_TIMERS)
 struct timers_private {
-	struct pid *pid;
-	struct task_struct *task;
-	struct sighand_struct *sighand;
-	struct pid_namespace *ns;
-	unsigned long flags;
+	struct pid *pid;
+	struct task_struct *task;
+	struct pid_namespace *ns;
 };
 
 static void *timers_start(struct seq_file *m, loff_t *pos)
@@ -2512,54 +2510,48 @@ static void *timers_start(struct seq_fil
 	if (!tp->task)
 		return ERR_PTR(-ESRCH);
 
-	tp->sighand = lock_task_sighand(tp->task, &tp->flags);
-	if (!tp->sighand)
-		return ERR_PTR(-ESRCH);
-
-	return seq_hlist_start(&tp->task->signal->posix_timers, *pos);
+	rcu_read_lock();
+	return seq_hlist_start_rcu(&tp->task->signal->posix_timers, *pos);
 }
 
 static void *timers_next(struct seq_file *m, void *v, loff_t *pos)
 {
 	struct timers_private *tp = m->private;
-	return seq_hlist_next(v, &tp->task->signal->posix_timers, pos);
+
+	return seq_hlist_next_rcu(v, &tp->task->signal->posix_timers, pos);
 }
 
 static void timers_stop(struct seq_file *m, void *v)
 {
 	struct timers_private *tp = m->private;
 
-	if (tp->sighand) {
-		unlock_task_sighand(tp->task, &tp->flags);
-		tp->sighand = NULL;
-	}
-
 	if (tp->task) {
 		put_task_struct(tp->task);
 		tp->task = NULL;
+		rcu_read_unlock();
 	}
 }
 
 static int show_timer(struct seq_file *m, void *v)
 {
-	struct k_itimer *timer;
-	struct timers_private *tp = m->private;
-	int notify;
 	static const char * const nstr[] = {
-		[SIGEV_SIGNAL] = "signal",
-		[SIGEV_NONE] = "none",
-		[SIGEV_THREAD] = "thread",
+		[SIGEV_SIGNAL] = "signal",
+		[SIGEV_NONE] = "none",
+		[SIGEV_THREAD] = "thread",
 	};
 
-	timer = hlist_entry((struct hlist_node *)v, struct k_itimer, list);
-	notify = timer->it_sigev_notify;
+	struct k_itimer *timer = hlist_entry((struct hlist_node *)v, struct k_itimer, list);
+	struct timers_private *tp = m->private;
+	int notify = timer->it_sigev_notify;
+
+	guard(spinlock_irq)(&timer->it_lock);
+	if (!posixtimer_valid(timer))
+		return 0;
 
 	seq_printf(m, "ID: %d\n", timer->it_id);
-	seq_printf(m, "signal: %d/%px\n",
-		   timer->sigq.info.si_signo,
+	seq_printf(m, "signal: %d/%px\n", timer->sigq.info.si_signo,
 		   timer->sigq.info.si_value.sival_ptr);
-	seq_printf(m, "notify: %s/%s.%d\n",
-		   nstr[notify & ~SIGEV_THREAD_ID],
+	seq_printf(m, "notify: %s/%s.%d\n", nstr[notify & ~SIGEV_THREAD_ID],
 		   (notify & SIGEV_THREAD_ID) ? "tid" : "pid",
 		   pid_nr_ns(timer->it_pid, tp->ns));
 	seq_printf(m, "ClockID: %d\n", timer->it_clock);

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155624.526740902@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 17/18] posix-timers: Provide a mechanism to allocate a given timer ID
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:47 +0100 (CET)

Checkpoint/Restore in Userspace (CRIU) needs to reconstruct posix timers with the same timer ID on restore. It uses sys_timer_create() and relies on the monotonically increasing timer ID provided by this syscall: it creates and deletes timers until the desired ID is reached. This can loop for a long time when the checkpointed process had a very sparse timer ID range.

Implementing a new syscall to allow the creation of timers with a given timer ID has been debated, but that's tedious due to the 32/64-bit compat issues of sigevent_t, and of dubious value.

The restore mechanism of CRIU creates the timers in a state where all threads of the restored process are held on a barrier and cannot issue syscalls. That means the restorer task has exclusive control.

This allows addressing the issue with a prctl(), so that the restorer thread can do:

   if (prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_ON, 0, 0, 0))
      goto linear_mode;
   create_timers_with_explicit_ids();
   prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_OFF, 0, 0, 0);

This is backwards compatible because the prctl() fails on older kernels and CRIU can fall back to the linear timer ID mechanism. CRIU versions which do not know about the prctl() just work as before.

Implement the prctl() and modify timer_create() so that it copies the requested timer ID from userspace by utilizing the existing timer_t pointer, which is used to copy out the allocated timer ID on success.

If the prctl() is disabled, which it is by default, timer_create() works as before and does not try to read from the userspace pointer.
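A minimal userspace sketch of that restore flow, assuming Linux with glibc. restore_timer_id(), raw_timer_create() and raw_timer_delete() are hypothetical helpers; the prctl() option and its values are the ones added by this patch, defined locally in case the installed headers predate it. It goes through the raw syscall because the kernel's timer ID is an int while glibc's timer_t is an opaque handle, and it passes explicit zeros for arg3..arg5 because the kernel/sys.c hunk of this patch rejects nonzero values.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Values from the patch's include/uapi/linux/prctl.h hunk, for older headers. */
#ifndef PR_TIMER_CREATE_RESTORE_IDS
# define PR_TIMER_CREATE_RESTORE_IDS		77
# define PR_TIMER_CREATE_RESTORE_IDS_OFF	0
# define PR_TIMER_CREATE_RESTORE_IDS_ON		1
#endif

/* Raw timer_create(): the kernel writes (and in restore mode reads) an int ID. */
static int raw_timer_create(clockid_t clock, int *id)
{
	struct sigevent sev = { .sigev_notify = SIGEV_NONE };

	return (int)syscall(SYS_timer_create, clock, &sev, id);
}

static int raw_timer_delete(int id)
{
	return (int)syscall(SYS_timer_delete, id);
}

/* Hypothetical helper: create a timer with ID @wanted; returns 0 on success. */
static int restore_timer_id(clockid_t clock, int wanted, int *got)
{
	/* Timer IDs live in positive space. */
	assert(wanted >= 0);

	if (prctl(PR_TIMER_CREATE_RESTORE_IDS, (unsigned long)PR_TIMER_CREATE_RESTORE_IDS_ON,
		  0UL, 0UL, 0UL) == 0) {
		int id = wanted;	/* read by timer_create() in restore mode */
		int ret = raw_timer_create(clock, &id);

		prctl(PR_TIMER_CREATE_RESTORE_IDS, (unsigned long)PR_TIMER_CREATE_RESTORE_IDS_OFF,
		      0UL, 0UL, 0UL);
		if (ret)
			return -1;
		*got = id;
		return 0;
	}

	/* Linear mode for older kernels: create and delete until @wanted comes up. */
	for (int tries = 0; tries <= wanted; tries++) {
		int id;

		if (raw_timer_create(clock, &id))
			return -1;
		if (id == wanted) {
			*got = id;
			return 0;
		}
		raw_timer_delete(id);
		if (id > wanted)
			break;	/* the ID counter is already past @wanted */
	}
	errno = EAGAIN;
	return -1;
}
```

As the commit message stresses, the restore mode is meant for a restorer with exclusive control of the process: while it is on, any timer_create() call in that process would have the contents of its ID pointer interpreted as a requested ID.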
There is no problem when a broken or rogue user space application enables
the prctl(). If the user space pointer does not contain a valid ID, then
timer_create() fails. If the data is not initialized, but contains a
random valid ID, timer_create() will create that random timer ID or fail
if the ID is already in use.

As CRIU must use the raw syscall to avoid manipulating the internal state
of the restored process, this has no library dependencies and can be
adopted by CRIU right away.

Recreating two timers with IDs 1000000 and 2000000 takes 1.5 seconds with
the create/delete method. With the prctl() it takes 3 microseconds.

Signed-off-by: Thomas Gleixner
Reviewed-by: Cyrill Gorcunov
Tested-by: Cyrill Gorcunov
---
V2: Move the ID counter ahead to avoid collisions after switching back to
    normal mode.
---
 include/linux/posix-timers.h |    2 
 include/linux/sched/signal.h |    1 
 include/uapi/linux/prctl.h   |   10 ++++
 kernel/sys.c                 |    5 ++
 kernel/time/posix-timers.c   |   97 +++++++++++++++++++++++++++++++-----
 5 files changed, 89 insertions(+), 26 deletions(-)

--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -114,6 +114,7 @@ bool posixtimer_init_sigqueue(struct sig
 void posixtimer_send_sigqueue(struct k_itimer *tmr);
 bool posixtimer_deliver_signal(struct kernel_siginfo *info, struct sigqueue *timer_sigq);
 void posixtimer_free_timer(struct k_itimer *timer);
+long posixtimer_create_prctl(unsigned long ctrl);
 
 /* Init task static initializer */
 #define INIT_CPU_TIMERBASE(b) {						\
@@ -140,6 +141,7 @@ static inline void posixtimer_rearm_itim
 static inline bool posixtimer_deliver_signal(struct kernel_siginfo *info,
 					     struct sigqueue *timer_sigq) { return false; }
 static inline void posixtimer_free_timer(struct k_itimer *timer) { }
+static inline long posixtimer_create_prctl(unsigned long ctrl) { return -EINVAL; }
 #endif
 
 #ifdef CONFIG_POSIX_CPU_TIMERS_TASK_WORK
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -136,6 +136,7 @@ struct signal_struct {
 #ifdef CONFIG_POSIX_TIMERS
 
 	/* POSIX.1b Interval Timers */
+	unsigned int		timer_create_restore_ids:1;
 	atomic_t		next_posix_timer_id;
 	struct hlist_head	posix_timers;
 	struct hlist_head	ignored_posix_timers;
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -353,4 +353,14 @@ struct prctl_mm_map {
  */
 #define PR_LOCK_SHADOW_STACK_STATUS      76
 
+/*
+ * Controls the mode of timer_create() for CRIU restore operations.
+ * Enabling this allows CRIU to restore timers with explicit IDs.
+ *
+ * Don't use for normal operations as the result might be undefined.
+ */
+#define PR_TIMER_CREATE_RESTORE_IDS		77
+# define PR_TIMER_CREATE_RESTORE_IDS_OFF	0
+# define PR_TIMER_CREATE_RESTORE_IDS_ON		1
+
 #endif /* _LINUX_PRCTL_H */
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2811,6 +2811,11 @@ SYSCALL_DEFINE5(prctl, int, option, unsi
 			return -EINVAL;
 		error = arch_lock_shadow_stack_status(me, arg2);
 		break;
+	case PR_TIMER_CREATE_RESTORE_IDS:
+		if (arg3 || arg4 || arg5)
+			return -EINVAL;
+		error = posixtimer_create_prctl(arg2);
+		break;
 	default:
 		trace_task_prctl_unknown(option, arg2, arg3, arg4, arg5);
 		error = -EINVAL;
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -57,6 +58,8 @@ static const struct k_clock * const posi
 static const struct k_clock *clockid_to_kclock(const clockid_t id);
 static const struct k_clock clock_realtime, clock_monotonic;
 
+#define TIMER_ANY_ID		INT_MIN
+
 /* SIGEV_THREAD_ID cannot share a bit with the other SIGEV values. */
 #if SIGEV_THREAD_ID != (SIGEV_THREAD_ID & \
                        ~(SIGEV_SIGNAL | SIGEV_NONE | SIGEV_THREAD))
@@ -128,38 +131,60 @@ static bool posix_timer_hashed(struct ti
 	return false;
 }
 
-static int posix_timer_add(struct k_itimer *timer)
+static bool posix_timer_add_at(struct k_itimer *timer, struct signal_struct *sig, unsigned int id)
+{
+	struct timer_hash_bucket *bucket = hash_bucket(sig, id);
+
+	scoped_guard (spinlock, &bucket->lock) {
+		/*
+		 * Validate under the lock as this could have raced against
+		 * another thread ending up with the same ID, which is
+		 * highly unlikely, but possible.
+		 */
+		if (!posix_timer_hashed(bucket, sig, id)) {
+			/*
+			 * Set the timer ID and the signal pointer to make
+			 * it identifiable in the hash table. The signal
+			 * pointer has bit 0 set to indicate that it is not
+			 * yet fully initialized. posix_timer_hashed()
+			 * masks this bit out, but the syscall lookup fails
+			 * to match due to it being set. This guarantees
+			 * that there can't be duplicate timer IDs handed
+			 * out.
+			 */
+			timer->it_id = (timer_t)id;
+			timer->it_signal = (struct signal_struct *)((unsigned long)sig | 1UL);
+			hlist_add_head_rcu(&timer->t_hash, &bucket->head);
+			return true;
+		}
+	}
+	return false;
+}
+
+static int posix_timer_add(struct k_itimer *timer, int req_id)
 {
 	struct signal_struct *sig = current->signal;
 
+	if (unlikely(req_id != TIMER_ANY_ID)) {
+		if (!posix_timer_add_at(timer, sig, req_id))
+			return -EBUSY;
+
+		/*
+		 * Move the ID counter past the requested ID, so that after
+		 * switching back to normal mode the IDs are outside of the
+		 * exact allocated region. That avoids ID collisions on the
+		 * next regular timer_create() invocations.
+		 */
+		atomic_set(&sig->next_posix_timer_id, req_id + 1);
+		return req_id;
+	}
+
 	for (unsigned int cnt = 0; cnt <= INT_MAX; cnt++) {
 		/* Get the next timer ID and clamp it to positive space */
 		unsigned int id = atomic_fetch_inc(&sig->next_posix_timer_id) & INT_MAX;
-		struct timer_hash_bucket *bucket = hash_bucket(sig, id);
 
-		scoped_guard (spinlock, &bucket->lock) {
-			/*
-			 * Validate under the lock as this could have raced
-			 * against another thread ending up with the same
-			 * ID, which is highly unlikely, but possible.
-			 */
-			if (!posix_timer_hashed(bucket, sig, id)) {
-				/*
-				 * Set the timer ID and the signal pointer to make
-				 * it identifiable in the hash table. The signal
-				 * pointer has bit 0 set to indicate that it is not
-				 * yet fully initialized. posix_timer_hashed()
-				 * masks this bit out, but the syscall lookup fails
-				 * to match due to it being set. This guarantees
-				 * that there can't be duplicate timer IDs handed
-				 * out.
-				 */
-				timer->it_id = (timer_t)id;
-				timer->it_signal = (struct signal_struct *)((unsigned long)sig | 1UL);
-				hlist_add_head_rcu(&timer->t_hash, &bucket->head);
-				return id;
-			}
-		}
+		if (posix_timer_add_at(timer, sig, id))
+			return id;
 		cond_resched();
 	}
 	/* POSIX return code when no timer ID could be allocated */
@@ -364,6 +389,16 @@ static enum hrtimer_restart posix_timer_
 	return HRTIMER_NORESTART;
 }
 
+long posixtimer_create_prctl(unsigned long ctrl)
+{
+	if (ctrl > PR_TIMER_CREATE_RESTORE_IDS_ON)
+		return -EINVAL;
+
+	guard(spinlock_irq)(&current->sighand->siglock);
+	current->signal->timer_create_restore_ids = ctrl == PR_TIMER_CREATE_RESTORE_IDS_ON;
+	return 0;
+}
+
 static struct pid *good_sigevent(sigevent_t * event)
 {
 	struct pid *pid = task_tgid(current);
@@ -435,6 +470,7 @@ static int do_timer_create(clockid_t whi
 			   timer_t __user *created_timer_id)
 {
 	const struct k_clock *kc = clockid_to_kclock(which_clock);
+	timer_t req_id = TIMER_ANY_ID;
 	struct k_itimer *new_timer;
 	int error, new_timer_id;
 
@@ -449,11 +485,20 @@ static int do_timer_create(clockid_t whi
 
 	spin_lock_init(&new_timer->it_lock);
 
+	/* Special case for CRIU to restore timers with a given timer ID. */
+	if (unlikely(current->signal->timer_create_restore_ids)) {
+		if (copy_from_user(&req_id, created_timer_id, sizeof(req_id)))
+			return -EFAULT;
+		/* Valid IDs are 0..INT_MAX */
+		if ((unsigned int)req_id > INT_MAX)
+			return -EINVAL;
+	}
+
 	/*
 	 * Add the timer to the hash table. The timer is not yet valid
 	 * after insertion, but has a unique ID allocated.
 	 */
-	new_timer_id = posix_timer_add(new_timer);
+	new_timer_id = posix_timer_add(new_timer, req_id);
 	if (new_timer_id < 0) {
 		posixtimer_free_timer(new_timer);
 		return new_timer_id;

From nobody Thu Dec 18 23:23:23 2025
Message-ID: <20250308155624.590144807@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Anna-Maria Behnsen, Frederic Weisbecker, Benjamin Segall, Eric Dumazet, Andrey Vagin, Pavel Tikhomirov, Peter Zijlstra, Cyrill Gorcunov
Subject: [patch V3 18/18] selftests/timers/posix-timers: Add a test for exact allocation mode
References: <20250308155501.391430556@linutronix.de>
Date: Sat, 8 Mar 2025 17:48:49 +0100 (CET)

The exact timer ID allocation mode is used by CRIU to restore timers with a
given ID. Add a test case for it.

It's skipped on older kernels when the prctl() fails.

Signed-off-by: Thomas Gleixner
---
V3: Use the PRCTL defines
V2: Adapt to the ID counter change in the exact mode case
---
 tools/testing/selftests/timers/posix_timers.c |   66 +++++++++++++++++++++++++-
 1 file changed, 65 insertions(+), 1 deletion(-)

--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -7,6 +7,7 @@
  *   Kernel loop code stolen from Steven Rostedt
  */
 #define _GNU_SOURCE
+#include
 #include
 #include
 #include
@@ -599,14 +600,77 @@ static void check_overrun(int which, con
 			 "check_overrun %s\n", name);
 }
 
+#include
+
+static int do_timer_create(int *id)
+{
+	return syscall(__NR_timer_create, CLOCK_MONOTONIC, NULL, id);
+}
+
+static int do_timer_delete(int id)
+{
+	return syscall(__NR_timer_delete, id);
+}
+
+#ifndef PR_TIMER_CREATE_RESTORE_IDS
+# define PR_TIMER_CREATE_RESTORE_IDS		77
+# define PR_TIMER_CREATE_RESTORE_IDS_OFF	0
+# define PR_TIMER_CREATE_RESTORE_IDS_ON		1
+#endif
+
+static void check_timer_create_exact(void)
+{
+	int id;
+
+	if (prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_ON, 0, 0, 0)) {
+		switch (errno) {
+		case EINVAL:
+			ksft_test_result_skip("check timer create exact, not supported\n");
+			return;
+		default:
+			ksft_test_result_skip("check timer create exact, errno = %d\n", errno);
+			return;
+		}
+	}
+
+	id = 8;
+	if (do_timer_create(&id) < 0)
+		fatal_error(NULL, "timer_create()");
+
+	if (do_timer_delete(id))
+		fatal_error(NULL, "timer_delete()");
+
+	if (prctl(PR_TIMER_CREATE_RESTORE_IDS, PR_TIMER_CREATE_RESTORE_IDS_OFF, 0, 0, 0))
+		fatal_error(NULL, "prctl()");
+
+	if (id != 8) {
+		ksft_test_result_fail("check timer create exact %d != 8\n", id);
+		return;
+	}
+
+	/* Validate that it went back to normal mode and allocates ID 9 */
+	if (do_timer_create(&id) < 0)
+		fatal_error(NULL, "timer_create()");
+
+	if (do_timer_delete(id))
+		fatal_error(NULL, "timer_delete()");
+
+	if (id == 9)
+		ksft_test_result_pass("check timer create exact\n");
+	else
+		ksft_test_result_fail("check timer create exact. Disabling failed.\n");
+}
+
 int main(int argc, char **argv)
 {
 	ksft_print_header();
-	ksft_set_plan(18);
+	ksft_set_plan(19);
 
 	ksft_print_msg("Testing posix timers. False negative may happen on CPU execution \n");
 	ksft_print_msg("based timers if other threads run on the CPU...\n");
 
+	check_timer_create_exact();
+
 	check_itimer(ITIMER_VIRTUAL, "ITIMER_VIRTUAL");
 	check_itimer(ITIMER_PROF, "ITIMER_PROF");
 	check_itimer(ITIMER_REAL, "ITIMER_REAL");