timekeeping: Add a lockdep override in tick_freeze().

[PATCH] timekeeping: Add a lockdep override in tick_freeze().

Posted by Sebastian Andrzej Siewior 10 months, 1 week ago

tick_freeze() acquires a raw_spinlock_t (tick_freeze_lock). Later in the
callchain (timekeeping_suspend() -> mc146818_avoid_UIP()) the RTC driver
can acquire a spinlock_t which becomes a sleeping lock on PREEMPT_RT.
Lockdep complains about this lock nesting.

Add a lockdep override for this special case and a comment explaining
why it is okay.

Reported-by: Borislav Petkov <bp@alien8.de>
Closes: https://lore.kernel.org/all/20250330113202.GAZ-krsjAnurOlTcp-@fat_crate.local/
Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Closes: https://lore.kernel.org/all/CAP-bSRZ0CWyZZsMtx046YV8L28LhY0fson2g4EqcwRAVN1Jk+Q@mail.gmail.com/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/time/tick-common.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index a47bcf71defcf..8fd8e2ee09fa1 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -509,6 +509,7 @@ void tick_resume(void)
 
 #ifdef CONFIG_SUSPEND
 static DEFINE_RAW_SPINLOCK(tick_freeze_lock);
+static DEFINE_WAIT_OVERRIDE_MAP(tick_freeze_map, LD_WAIT_SLEEP);
 static unsigned int tick_freeze_depth;
 
 /**
@@ -528,9 +529,20 @@ void tick_freeze(void)
 	if (tick_freeze_depth == num_online_cpus()) {
 		trace_suspend_resume(TPS("timekeeping_freeze"),
 				     smp_processor_id(), true);
+		/*
+		 * All other CPUs have their interrupts disabled and are
+		 * suspended to idle. Other tasks have been frozen so there is
+		 * no scheduling happening. This means that there is no
+		 * concurrency in the system at this point. Therefore it is okay
+		 * to acquire a sleeping lock on PREEMPT_RT, such as spinlock_t,
+		 * because the lock can not be acquired and can not block.
+		 * Inform lockdep about the situation.
+		 */
+		lock_map_acquire_try(&tick_freeze_map);
 		system_state = SYSTEM_SUSPEND;
 		sched_clock_suspend();
 		timekeeping_suspend();
+		lock_map_release(&tick_freeze_map);
 	} else {
 		tick_suspend_local();
 	}
@@ -552,8 +564,16 @@ void tick_unfreeze(void)
 	raw_spin_lock(&tick_freeze_lock);
 
 	if (tick_freeze_depth == num_online_cpus()) {
+		/*
+		 * Similar to tick_freeze(). On resumption the first CPU may
+		 * acquire uncontended sleeping locks while other CPUs block on
+		 * tick_freeze_lock.
+		 */
+		lock_map_acquire_try(&tick_freeze_map);
 		timekeeping_resume();
 		sched_clock_resume();
+		lock_map_release(&tick_freeze_map);
+
 		system_state = SYSTEM_RUNNING;
 		trace_suspend_resume(TPS("timekeeping_freeze"),
 				     smp_processor_id(), false);
-- 
2.49.0

Re: [PATCH] timekeeping: Add a lockdep override in tick_freeze().

Posted by Chris Bainbridge 8 months, 1 week ago

Hi,

I'm getting "WARNING: inconsistent lock state" on resume with this
commit (92e250c624ea37fde64bfd624fd2556f0d846f18):

[   43.747069] ACPI: \_SB_.PEP_: Successfully transitioned to state screen off
[   43.753611] ACPI: \_SB_.PEP_: Successfully transitioned to state lps0 ms entry
[   43.753777] ACPI: \_SB_.PEP_: Successfully transitioned to state lps0 entry
[   43.838479] amd_pmc AMDI0005:00: failed to set RTC: -22
[   43.838489] PM: suspend-to-idle
[   43.838542] amd_pmc: SMU idlemask s0i3: 0xc00e0eb5
[   46.246882] Timekeeping suspended for 2.528 seconds
[   46.249132] PM: Triggering wakeup from IRQ 9
[   46.249292] ACPI: PM: ACPI fixed event wakeup
[   46.249297] PM: resume from suspend-to-idle
[   46.250801] amd_pmc AMDI0005:00: Last suspend didn't reach deepest state
[   46.251096] ACPI: \_SB_.PEP_: Successfully transitioned to state lps0 exit
[   46.257167] ACPI: \_SB_.PEP_: Successfully transitioned to state lps0 ms exit
[   46.258450] ACPI: \_SB_.PEP_: Successfully transitioned to state screen on
[   46.259652] ACPI: EC: interrupt unblocked
[   46.291758] PM: noirq resume of devices complete after 32.885 msecs

[   46.291907] ================================
[   46.291909] WARNING: inconsistent lock state
[   46.291911] 6.15.0-rc1-00002-g92e250c624ea #379 Not tainted
[   46.291914] --------------------------------
[   46.291915] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[   46.291917] irq/9-acpi/142 [HC0[0]:SC0[0]:HE1:SE1] takes:
[   46.291921] ffffffff88e48d18 (rtc_lock){?...}-{3:3}, at: cmos_interrupt+0x21/0x110
[   46.291939] {IN-HARDIRQ-W} state was registered at:
[   46.291941]   lock_acquire+0xc9/0x2d0
[   46.291946]   _raw_spin_lock+0x30/0x40
[   46.291951]   cmos_interrupt+0x21/0x110
[   46.291956]   __handle_irq_event_percpu+0x81/0x290
[   46.291962]   handle_irq_event+0x38/0x70
[   46.291966]   handle_edge_irq+0x85/0x210
[   46.291970]   __common_interrupt+0x45/0xd0
[   46.291973]   common_interrupt+0x80/0xa0
[   46.291977]   asm_common_interrupt+0x26/0x40
[   46.291981]   cpuidle_enter_state+0x113/0x540
[   46.291985]   cpuidle_enter+0x2d/0x40
[   46.291989]   do_idle+0x1f5/0x250
[   46.291992]   cpu_startup_entry+0x29/0x30
[   46.291994]   start_secondary+0x10c/0x130
[   46.291998]   common_startup_64+0x13e/0x141
[   46.292002] irq event stamp: 877
[   46.292004] hardirqs last  enabled at (877): [<ffffffff88778a90>] _raw_spin_unlock_irqrestore+0x40/0x50
[   46.292010] hardirqs last disabled at (876): [<ffffffff8877884c>] _raw_spin_lock_irqsave+0x5c/0x60
[   46.292014] softirqs last  enabled at (362): [<ffffffff87b0dc3c>] __irq_exit_rcu+0xcc/0xf0
[   46.292019] softirqs last disabled at (353): [<ffffffff87b0dc3c>] __irq_exit_rcu+0xcc/0xf0
[   46.292022] 
               other info that might help us debug this:
[   46.292024]  Possible unsafe locking scenario:

[   46.292025]        CPU0
[   46.292026]        ----
[   46.292027]   lock(rtc_lock);
[   46.292029]   <Interrupt>
[   46.292030]     lock(rtc_lock);
[   46.292032] 
                *** DEADLOCK ***

[   46.292033] no locks held by irq/9-acpi/142.
[   46.292035] 
               stack backtrace:
[   46.292040] CPU: 1 UID: 0 PID: 142 Comm: irq/9-acpi Not tainted 6.15.0-rc1-00002-g92e250c624ea #379 PREEMPT(voluntary) 
[   46.292046] Hardware name: HP HP Pavilion Aero Laptop 13-be0xxx/8916, BIOS F.17 12/18/2024
[   46.292049] Call Trace:
[   46.292052]  <TASK>
[   46.292058]  dump_stack_lvl+0x6e/0x90
[   46.292063]  print_usage_bug.part.0+0x22c/0x2c0
[   46.292067]  mark_lock+0x6f7/0x890
[   46.292072]  ? __lock_acquire+0x449/0x21c0
[   46.292077]  __lock_acquire+0x7e5/0x21c0
[   46.292080]  ? psi_task_switch+0x10a/0x330
[   46.292086]  lock_acquire+0xc9/0x2d0
[   46.292088]  ? cmos_interrupt+0x21/0x110
[   46.292094]  ? acpi_write_bit_register+0x79/0xf0
[   46.292102]  _raw_spin_lock+0x30/0x40
[   46.292106]  ? cmos_interrupt+0x21/0x110
[   46.292110]  cmos_interrupt+0x21/0x110
[   46.292115]  rtc_handler+0x28/0xc0
[   46.292120]  acpi_ev_fixed_event_detect+0xc8/0x140
[   46.292127]  ? irq_thread+0xa4/0x340
[   46.292129]  acpi_ev_sci_xrupt_handler+0x13/0x40
[   46.292132]  acpi_irq+0x16/0x30
[   46.292136]  irq_thread_fn+0x1d/0x50
[   46.292139]  irq_thread+0x1d5/0x340
[   46.292140]  ? irq_copy_pending.isra.0+0x70/0x70
[   46.292144]  ? irq_finalize_oneshot.part.0+0xc0/0xc0
[   46.292147]  ? irq_forced_thread_fn+0x70/0x70
[   46.292149]  kthread+0x10a/0x250
[   46.292154]  ? kthreads_online_cpu+0x130/0x130
[   46.292158]  ret_from_fork+0x31/0x50
[   46.292162]  ? kthreads_online_cpu+0x130/0x130
[   46.292166]  ret_from_fork_asm+0x11/0x20
[   46.292174]  </TASK>
[   46.292918] PM: early resume of devices complete after 0.829 msecs

Re: [PATCH] timekeeping: Add a lockdep override in tick_freeze().

Posted by Chris Bainbridge 8 months, 1 week ago

On Sat, May 31, 2025 at 07:27:03PM +0100, Chris Bainbridge wrote:
> Hi,
> 
> I'm getting "WARNING: inconsistent lock state" on resume with this
> commit (92e250c624ea37fde64bfd624fd2556f0d846f18):
> 

Further testing shows there are some required conditions for this
warning to be shown. The suspend must be of a short enough duration that
it does "not reach hardware sleep state" (according to amd_s2idle.py).

Also the warning is only shown once, I don't know if this is because the
conditions for the warning only occur once, or if there is log limit
somewhere that prevents it from being logged more than once.

I can reliably reproduce the warning by running amd_s2idle.py and
waiting for the automatic resume:

# ./amd_s2idle.py --log log --duration 5 --wait 4 --count 1
Debugging script for s2idle on AMD systems
💻 HP HP Pavilion Aero Laptop 13-be0xxx (103C_5335KV HP Pavilion) running BIOS 15.17 (F.17) released 12/18/2024 and EC 79.31
🐧 Debian GNU/Linux trixie/sid
🐧 Kernel 6.15.0-rc1-00002-g92e250c624ea
🔋 Battery BAT0 (313-27-3C-A PC03043XL) is operating at 100.00% of design
Checking prerequisites for s2idle
✅ Logs are provided via systemd
✅ AMD Ryzen 7 5800U with Radeon Graphics (family 19 model 50)
✅ SMT enabled
✅ LPS0 _DSM enabled
✅ ACPI FADT supports Low-power S0 idle
✅ HSMP driver `amd_hsmp` not detected (blocked: False)
✅ PMC driver `amd_pmc` loaded (Program 0 Firmware 64.73.0)
✅ GPU driver `amdgpu` bound to 0000:03:00.0
✅ System is configured for s2idle
✅ NVME Intel Corporation SSD 670p Series [Keystone Harbor] is configured for s2idle in BIOS
✅ GPIO driver `pinctrl_amd` available
🚦 Device firmware checks unavailable without fwupd gobject introspection
Started at 2025-05-31 19:46:33.911590 (cycle finish expected @ 2025-05-31 19:46:42.911616)
Results from last s2idle cycle
○ Suspend count: 1
○ Hardware sleep cycle count: 1
○ Wakeup triggered from IRQ 9: ACPI SCI
○ Woke up from IRQ 9: ACPI SCI
○ gpe03 increased from 140 to 148
✅ Userspace suspended for 0:00:08.256333
❌ Did not reach hardware sleep state

If the duration arg is 6 or higher, then amd_s2idle.py reports that the
hardware sleep state was entered, and the "inconsistent lock state"
warning does not appear. If the duration is too low (e.g. 1 second),
then the laptop does not wake up automatically, and upon pressing a
keyboard key, the amdgpu driver will report an error resuming the GPU,
and the GPU will not be working. (I don't think the amdgpu problem is
related to the lock state warning, I'm just mentioning it for
completeness). It is the state between these two cases, where the laptop
does suspend and resume correctly, but the suspend is too short to enter
a hardware sleep state, where the problem occurs.

Re: [PATCH] timekeeping: Add a lockdep override in tick_freeze().

Posted by Mateusz Jończyk 8 months, 1 week ago

W dniu 31.05.2025 o 21:16, Chris Bainbridge pisze:
> On Sat, May 31, 2025 at 07:27:03PM +0100, Chris Bainbridge wrote:
>> Hi,
>>
>> I'm getting "WARNING: inconsistent lock state" on resume with this
>> commit (92e250c624ea37fde64bfd624fd2556f0d846f18):
>>
> Further testing shows there are some required conditions for this
> warning to be shown. The suspend must be of a short enough duration that
> it does "not reach hardware sleep state" (according to amd_s2idle.py).
>
> Also the warning is only shown once, I don't know if this is because the
> conditions for the warning only occur once, or if there is log limit
> somewhere that prevents it from being logged more than once.
>
> I can reliably reproduce the warning by running amd_s2idle.py and
> waiting for the automatic resume:
>
> # ./amd_s2idle.py --log log --duration 5 --wait 4 --count 1
> Debugging script for s2idle on AMD systems
> 💻 HP HP Pavilion Aero Laptop 13-be0xxx (103C_5335KV HP Pavilion) running BIOS 15.17 (F.17) released 12/18/2024 and EC 79.31
> 🐧 Debian GNU/Linux trixie/sid
> 🐧 Kernel 6.15.0-rc1-00002-g92e250c624ea
> 🔋 Battery BAT0 (313-27-3C-A PC03043XL) is operating at 100.00% of design
> Checking prerequisites for s2idle
> ✅ Logs are provided via systemd
> ✅ AMD Ryzen 7 5800U with Radeon Graphics (family 19 model 50)
> ✅ SMT enabled
> ✅ LPS0 _DSM enabled
> ✅ ACPI FADT supports Low-power S0 idle
> ✅ HSMP driver `amd_hsmp` not detected (blocked: False)
> ✅ PMC driver `amd_pmc` loaded (Program 0 Firmware 64.73.0)
> ✅ GPU driver `amdgpu` bound to 0000:03:00.0
> ✅ System is configured for s2idle
> ✅ NVME Intel Corporation SSD 670p Series [Keystone Harbor] is configured for s2idle in BIOS
> ✅ GPIO driver `pinctrl_amd` available
> 🚦 Device firmware checks unavailable without fwupd gobject introspection
> Started at 2025-05-31 19:46:33.911590 (cycle finish expected @ 2025-05-31 19:46:42.911616)
> Results from last s2idle cycle
> ○ Suspend count: 1
> ○ Hardware sleep cycle count: 1
> ○ Wakeup triggered from IRQ 9: ACPI SCI
> ○ Woke up from IRQ 9: ACPI SCI
> ○ gpe03 increased from 140 to 148
> ✅ Userspace suspended for 0:00:08.256333
> ❌ Did not reach hardware sleep state
>
> If the duration arg is 6 or higher, then amd_s2idle.py reports that the
> hardware sleep state was entered, and the "inconsistent lock state"
> warning does not appear. If the duration is too low (e.g. 1 second),
> then the laptop does not wake up automatically, and upon pressing a
> keyboard key, the amdgpu driver will report an error resuming the GPU,
> and the GPU will not be working. (I don't think the amdgpu problem is
> related to the lock state warning, I'm just mentioning it for
> completeness). It is the state between these two cases, where the laptop
> does suspend and resume correctly, but the suspend is too short to enter
> a hardware sleep state, where the problem occurs.

Hello,

Thank you for this bug report.

amd_s2idle apparently uses an RTC alarm to wake the system up
(which on newer systems is handled by ACPI SCI instead).
When the delay before the alarm is very low (like 1 second),
the alarm fires before the system is fully
suspended and the system does not wake thereafter - you have
to wake it up manually. The ACPI SPI interrupt is queued, however,
and fires just thereafter.

It appears, however, that both the RTC interrupt and ACPI SPI
interrupts fired (one after the other or at the same time).

I have noticed that cmos_interrupt() in drivers/rtc/rtc-cmos.c
uses spin_lock(), not spin_lock_irqsave() etc., even though it
can be called from a non-interrupt context - indirectly by
cmos_resume() during system resume and also by rtc_handler().

This can lead to a deadlock and is likely while lockdep is
complaining - see "Single-lock state rules:" in
Documentation/locking/lockdep-design.rst .

It is possible that
commit 92e250c624ea ("timekeeping: Add a lockdep override in tick_freeze()")
is masking the current problem because only the first issue is shown.

I'll send you a debug patch shortly.

Greetings,
Mateusz

[DRAFT PATCH] rtc-cmos: use spin_lock_irqsave in cmos_interrupt

Posted by Mateusz Jończyk 8 months, 1 week ago

cmos_interrupt() can also be called also in non-interrupt contexts, such
as in ACPI handlers via rtc_handler(). Therefore, usage of
spin_lock(&rtc_lock) is insecure. Use spin_lock_irqsave() / etc.
instead.

Remove the local_irq_disable() hacks in cmos_check_wkalrm() and add a
comment so that these _irqsave / _irqrestore will not be disabled again,
as in

commit 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ")

Untested yet.

Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
Fixes: 13be2efc390a ("rtc: cmos: Disable irq around direct invocation of cmos_interrupt()")
---
 drivers/rtc/rtc-cmos.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 8172869bd3d7..399bb82e6153 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -692,8 +692,12 @@ static irqreturn_t cmos_interrupt(int irq, void *p)
 {
 	u8		irqstat;
 	u8		rtc_control;
+	unsigned long	flags;
 
-	spin_lock(&rtc_lock);
+	/* We cannot use spin_lock() here, as cmos_interrupt() is also called
+	 * in non-irq context.
+	 */
+	spin_lock_irqsave(&rtc_lock, flags);
 
 	/* When the HPET interrupt handler calls us, the interrupt
 	 * status is passed as arg1 instead of the irq number.  But
@@ -727,7 +731,7 @@ static irqreturn_t cmos_interrupt(int irq, void *p)
 			hpet_mask_rtc_irq_bit(RTC_AIE);
 		CMOS_READ(RTC_INTR_FLAGS);
 	}
-	spin_unlock(&rtc_lock);
+	spin_unlock_irqrestore(&rtc_lock, flags);
 
 	if (is_intr(irqstat)) {
 		rtc_update_irq(p, 1, irqstat);
@@ -1295,9 +1299,7 @@ static void cmos_check_wkalrm(struct device *dev)
 	 * ACK the rtc irq here
 	 */
 	if (t_now >= cmos->alarm_expires && cmos_use_acpi_alarm()) {
-		local_irq_disable();
 		cmos_interrupt(0, (void *)cmos->rtc);
-		local_irq_enable();
 		return;
 	}
 
-- 
2.25.1

Re: [DRAFT PATCH] rtc-cmos: use spin_lock_irqsave in cmos_interrupt

Posted by Sebastian Andrzej Siewior 8 months, 1 week ago

On 2025-06-02 22:20:19 [+0200], Mateusz Jończyk wrote:
> cmos_interrupt() can also be called also in non-interrupt contexts, such
> as in ACPI handlers via rtc_handler(). Therefore, usage of
> spin_lock(&rtc_lock) is insecure. Use spin_lock_irqsave() / etc.
> instead.
> 
> Remove the local_irq_disable() hacks in cmos_check_wkalrm() and add a
> comment so that these _irqsave / _irqrestore will not be disabled again,
> as in

This also broke PREEMPT_RT due to disable-IRQ -> spinlock_t.

> commit 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ")
> 
> Untested yet.
> 
> Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
> Fixes: 13be2efc390a ("rtc: cmos: Disable irq around direct invocation of cmos_interrupt()")

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian

Re: [DRAFT PATCH] rtc-cmos: use spin_lock_irqsave in cmos_interrupt

Posted by Chris Bainbridge 8 months, 1 week ago

On Mon, Jun 02, 2025 at 10:20:19PM +0200, Mateusz Jończyk wrote:
> cmos_interrupt() can also be called also in non-interrupt contexts, such
> as in ACPI handlers via rtc_handler(). Therefore, usage of
> spin_lock(&rtc_lock) is insecure. Use spin_lock_irqsave() / etc.
> instead.
> 
> Remove the local_irq_disable() hacks in cmos_check_wkalrm() and add a
> comment so that these _irqsave / _irqrestore will not be disabled again,
> as in
> 
> commit 6950d046eb6e ("rtc: cmos: Replace spin_lock_irqsave with spin_lock in hard IRQ")
> 
> Untested yet.
> 
> Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
> Fixes: 13be2efc390a ("rtc: cmos: Disable irq around direct invocation of cmos_interrupt()")

I tested this, and can confirm it does fix the "inconsistent lock state"
error in my test case.

Re: [PATCH] timekeeping: Add a lockdep override in tick_freeze().

Posted by Peter Zijlstra 10 months ago

On Fri, Apr 04, 2025 at 03:34:29PM +0200, Sebastian Andrzej Siewior wrote:
> tick_freeze() acquires a raw_spinlock_t (tick_freeze_lock). Later in the
> callchain (timekeeping_suspend() -> mc146818_avoid_UIP()) the RTC driver
> can acquire a spinlock_t which becomes a sleeping lock on PREEMPT_RT.
> Lockdep complains about this lock nesting.
> 
> Add a lockdep override for this special case and a comment explaining
> why it is okay.
> 
> Reported-by: Borislav Petkov <bp@alien8.de>
> Closes: https://lore.kernel.org/all/20250330113202.GAZ-krsjAnurOlTcp-@fat_crate.local/
> Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
> Closes: https://lore.kernel.org/all/CAP-bSRZ0CWyZZsMtx046YV8L28LhY0fson2g4EqcwRAVN1Jk+Q@mail.gmail.com/
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This is of course horrible :-) But yes, probably the best one can do
given how things are.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> ---
>  kernel/time/tick-common.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index a47bcf71defcf..8fd8e2ee09fa1 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -509,6 +509,7 @@ void tick_resume(void)
>  
>  #ifdef CONFIG_SUSPEND
>  static DEFINE_RAW_SPINLOCK(tick_freeze_lock);
> +static DEFINE_WAIT_OVERRIDE_MAP(tick_freeze_map, LD_WAIT_SLEEP);
>  static unsigned int tick_freeze_depth;
>  
>  /**
> @@ -528,9 +529,20 @@ void tick_freeze(void)
>  	if (tick_freeze_depth == num_online_cpus()) {
>  		trace_suspend_resume(TPS("timekeeping_freeze"),
>  				     smp_processor_id(), true);
> +		/*
> +		 * All other CPUs have their interrupts disabled and are
> +		 * suspended to idle. Other tasks have been frozen so there is
> +		 * no scheduling happening. This means that there is no
> +		 * concurrency in the system at this point. Therefore it is okay
> +		 * to acquire a sleeping lock on PREEMPT_RT, such as spinlock_t,
> +		 * because the lock can not be acquired and can not block.
> +		 * Inform lockdep about the situation.
> +		 */
> +		lock_map_acquire_try(&tick_freeze_map);
>  		system_state = SYSTEM_SUSPEND;
>  		sched_clock_suspend();
>  		timekeeping_suspend();
> +		lock_map_release(&tick_freeze_map);
>  	} else {
>  		tick_suspend_local();
>  	}
> @@ -552,8 +564,16 @@ void tick_unfreeze(void)
>  	raw_spin_lock(&tick_freeze_lock);
>  
>  	if (tick_freeze_depth == num_online_cpus()) {
> +		/*
> +		 * Similar to tick_freeze(). On resumption the first CPU may
> +		 * acquire uncontended sleeping locks while other CPUs block on
> +		 * tick_freeze_lock.
> +		 */
> +		lock_map_acquire_try(&tick_freeze_map);
>  		timekeeping_resume();
>  		sched_clock_resume();
> +		lock_map_release(&tick_freeze_map);
> +
>  		system_state = SYSTEM_RUNNING;
>  		trace_suspend_resume(TPS("timekeeping_freeze"),
>  				     smp_processor_id(), false);
> -- 
> 2.49.0
>

Re: [PATCH] timekeeping: Add a lockdep override in tick_freeze().

Posted by Mateusz Jończyk 10 months, 1 week ago

W dniu 4.04.2025 o 15:34, Sebastian Andrzej Siewior pisze:
> tick_freeze() acquires a raw_spinlock_t (tick_freeze_lock). Later in the
> callchain (timekeeping_suspend() -> mc146818_avoid_UIP()) the RTC driver
> can acquire a spinlock_t which becomes a sleeping lock on PREEMPT_RT.
> Lockdep complains about this lock nesting.
>
> Add a lockdep override for this special case and a comment explaining
> why it is okay.
>
> Reported-by: Borislav Petkov <bp@alien8.de>
> Closes: https://lore.kernel.org/all/20250330113202.GAZ-krsjAnurOlTcp-@fat_crate.local/
> Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
> Closes: https://lore.kernel.org/all/CAP-bSRZ0CWyZZsMtx046YV8L28LhY0fson2g4EqcwRAVN1Jk+Q@mail.gmail.com/
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>   kernel/time/tick-common.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
>
> diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
> index a47bcf71defcf..8fd8e2ee09fa1 100644
> --- a/kernel/time/tick-common.c
> +++ b/kernel/time/tick-common.c
> @@ -509,6 +509,7 @@ void tick_resume(void)
>   
>   #ifdef CONFIG_SUSPEND
>   static DEFINE_RAW_SPINLOCK(tick_freeze_lock);
> +static DEFINE_WAIT_OVERRIDE_MAP(tick_freeze_map, LD_WAIT_SLEEP);
>   static unsigned int tick_freeze_depth;
>   
>   /**
> @@ -528,9 +529,20 @@ void tick_freeze(void)
>   	if (tick_freeze_depth == num_online_cpus()) {
>   		trace_suspend_resume(TPS("timekeeping_freeze"),
>   				     smp_processor_id(), true);
> +		/*
> +		 * All other CPUs have their interrupts disabled and are
> +		 * suspended to idle. Other tasks have been frozen so there is
> +		 * no scheduling happening. This means that there is no
> +		 * concurrency in the system at this point. Therefore it is okay
> +		 * to acquire a sleeping lock on PREEMPT_RT, such as spinlock_t,
> +		 * because the lock can not be acquired and can not block.

because the lock cannot be held by other CPUs / threads and
acquiring it cannot block.

> +		 * Inform lockdep about the situation.
> +		 */

Greetings,

Mateusz

[tip: timers/urgent] timekeeping: Add a lockdep override in tick_freeze()

Posted by tip-bot2 for Sebastian Andrzej Siewior 10 months ago

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     92e250c624ea37fde64bfd624fd2556f0d846f18
Gitweb:        https://git.kernel.org/tip/92e250c624ea37fde64bfd624fd2556f0d846f18
Author:        Sebastian Andrzej Siewior <bigeasy@linutronix.de>
AuthorDate:    Fri, 04 Apr 2025 15:34:29 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Wed, 09 Apr 2025 22:30:39 +02:00

timekeeping: Add a lockdep override in tick_freeze()

tick_freeze() acquires a raw spinlock (tick_freeze_lock). Later in the
callchain (timekeeping_suspend() -> mc146818_avoid_UIP()) the RTC driver
acquires a spinlock which becomes a sleeping lock on PREEMPT_RT.  Lockdep
complains about this lock nesting.

Add a lockdep override for this special case and a comment explaining
why it is okay.

Reported-by: Borislav Petkov <bp@alien8.de>
Reported-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250404133429.pnAzf-eF@linutronix.de
Closes: https://lore.kernel.org/all/20250330113202.GAZ-krsjAnurOlTcp-@fat_crate.local/
Closes: https://lore.kernel.org/all/CAP-bSRZ0CWyZZsMtx046YV8L28LhY0fson2g4EqcwRAVN1Jk+Q@mail.gmail.com/
---
 kernel/time/tick-common.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index a47bcf7..9a38594 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -509,6 +509,7 @@ void tick_resume(void)
 
 #ifdef CONFIG_SUSPEND
 static DEFINE_RAW_SPINLOCK(tick_freeze_lock);
+static DEFINE_WAIT_OVERRIDE_MAP(tick_freeze_map, LD_WAIT_SLEEP);
 static unsigned int tick_freeze_depth;
 
 /**
@@ -528,9 +529,22 @@ void tick_freeze(void)
 	if (tick_freeze_depth == num_online_cpus()) {
 		trace_suspend_resume(TPS("timekeeping_freeze"),
 				     smp_processor_id(), true);
+		/*
+		 * All other CPUs have their interrupts disabled and are
+		 * suspended to idle. Other tasks have been frozen so there
+		 * is no scheduling happening. This means that there is no
+		 * concurrency in the system at this point. Therefore it is
+		 * okay to acquire a sleeping lock on PREEMPT_RT, such as a
+		 * spinlock, because the lock cannot be held by other CPUs
+		 * or threads and acquiring it cannot block.
+		 *
+		 * Inform lockdep about the situation.
+		 */
+		lock_map_acquire_try(&tick_freeze_map);
 		system_state = SYSTEM_SUSPEND;
 		sched_clock_suspend();
 		timekeeping_suspend();
+		lock_map_release(&tick_freeze_map);
 	} else {
 		tick_suspend_local();
 	}
@@ -552,8 +566,16 @@ void tick_unfreeze(void)
 	raw_spin_lock(&tick_freeze_lock);
 
 	if (tick_freeze_depth == num_online_cpus()) {
+		/*
+		 * Similar to tick_freeze(). On resumption the first CPU may
+		 * acquire uncontended sleeping locks while other CPUs block on
+		 * tick_freeze_lock.
+		 */
+		lock_map_acquire_try(&tick_freeze_map);
 		timekeeping_resume();
 		sched_clock_resume();
+		lock_map_release(&tick_freeze_map);
+
 		system_state = SYSTEM_RUNNING;
 		trace_suspend_resume(TPS("timekeeping_freeze"),
 				     smp_processor_id(), false);