[PATCH] arm64: paravirt: Disable IRQs during stolen_time_cpu_down_prepare

Elliot Berman posted 1 patch 4 years ago
arch/arm64/kernel/paravirt.c | 3 +++
1 file changed, 3 insertions(+)
[PATCH] arm64: paravirt: Disable IRQs during stolen_time_cpu_down_prepare
Posted by Elliot Berman 4 years ago
From: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>

During hotplug, the stolen time data structure is unmapped and memset.
There is a possibility of the timer IRQ being triggered before memset
and stolen time is getting updated as part of this timer IRQ handler. This
causes the below crash in timer handler -

  [ 3457.473139][    C5] Unable to handle kernel paging request at virtual address ffffffc03df05148
  ...
  [ 3458.154398][    C5] Call trace:
  [ 3458.157648][    C5]  para_steal_clock+0x30/0x50
  [ 3458.162319][    C5]  irqtime_account_process_tick+0x30/0x194
  [ 3458.168148][    C5]  account_process_tick+0x3c/0x280
  [ 3458.173274][    C5]  update_process_times+0x5c/0xf4
  [ 3458.178311][    C5]  tick_sched_timer+0x180/0x384
  [ 3458.183164][    C5]  __run_hrtimer+0x160/0x57c
  [ 3458.187744][    C5]  hrtimer_interrupt+0x258/0x684
  [ 3458.192698][    C5]  arch_timer_handler_virt+0x5c/0xa0
  [ 3458.198002][    C5]  handle_percpu_devid_irq+0xdc/0x414
  [ 3458.203385][    C5]  handle_domain_irq+0xa8/0x168
  [ 3458.208241][    C5]  gic_handle_irq.34493+0x54/0x244
  [ 3458.213359][    C5]  call_on_irq_stack+0x40/0x70
  [ 3458.218125][    C5]  do_interrupt_handler+0x60/0x9c
  [ 3458.223156][    C5]  el1_interrupt+0x34/0x64
  [ 3458.227560][    C5]  el1h_64_irq_handler+0x1c/0x2c
  [ 3458.232503][    C5]  el1h_64_irq+0x7c/0x80
  [ 3458.236736][    C5]  free_vmap_area_noflush+0x108/0x39c
  [ 3458.242126][    C5]  remove_vm_area+0xbc/0x118
  [ 3458.246714][    C5]  vm_remove_mappings+0x48/0x2a4
  [ 3458.251656][    C5]  __vunmap+0x154/0x278
  [ 3458.255796][    C5]  stolen_time_cpu_down_prepare+0xc0/0xd8
  [ 3458.261542][    C5]  cpuhp_invoke_callback+0x248/0xc34
  [ 3458.266842][    C5]  cpuhp_thread_fun+0x1c4/0x248
  [ 3458.271696][    C5]  smpboot_thread_fn+0x1b0/0x400
  [ 3458.276638][    C5]  kthread+0x17c/0x1e0
  [ 3458.280691][    C5]  ret_from_fork+0x10/0x20

As a fix, disable the IRQs during hotplug until we unmap and memset the
stolen time structure.

Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
---
 arch/arm64/kernel/paravirt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
index 75fed4460407..fc05a08557e0 100644
--- a/arch/arm64/kernel/paravirt.c
+++ b/arch/arm64/kernel/paravirt.c
@@ -70,13 +70,16 @@ static u64 para_steal_clock(int cpu)
 static int stolen_time_cpu_down_prepare(unsigned int cpu)
 {
 	struct pv_time_stolen_time_region *reg;
+	unsigned long flags;
 
 	reg = this_cpu_ptr(&stolen_time_region);
 	if (!reg->kaddr)
 		return 0;
 
+	local_irq_save(flags);
 	memunmap(reg->kaddr);
 	memset(reg, 0, sizeof(*reg));
+	local_irq_restore(flags);
 
 	return 0;
 }
-- 
2.25.1
Re: [PATCH] arm64: paravirt: Disable IRQs during stolen_time_cpu_down_prepare
Posted by Juergen Gross 4 years ago
On 20.04.22 22:44, Elliot Berman wrote:
> From: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
> 
> During hotplug, the stolen time data structure is unmapped and memset.
> There is a possibility of the timer IRQ being triggered before memset
> and stolen time is getting updated as part of this timer IRQ handler. This
> causes the below crash in timer handler -
> 
>    [ 3457.473139][    C5] Unable to handle kernel paging request at virtual address ffffffc03df05148
>    ...
>    [ 3458.154398][    C5] Call trace:
>    [ 3458.157648][    C5]  para_steal_clock+0x30/0x50
>    [ 3458.162319][    C5]  irqtime_account_process_tick+0x30/0x194
>    [ 3458.168148][    C5]  account_process_tick+0x3c/0x280
>    [ 3458.173274][    C5]  update_process_times+0x5c/0xf4
>    [ 3458.178311][    C5]  tick_sched_timer+0x180/0x384
>    [ 3458.183164][    C5]  __run_hrtimer+0x160/0x57c
>    [ 3458.187744][    C5]  hrtimer_interrupt+0x258/0x684
>    [ 3458.192698][    C5]  arch_timer_handler_virt+0x5c/0xa0
>    [ 3458.198002][    C5]  handle_percpu_devid_irq+0xdc/0x414
>    [ 3458.203385][    C5]  handle_domain_irq+0xa8/0x168
>    [ 3458.208241][    C5]  gic_handle_irq.34493+0x54/0x244
>    [ 3458.213359][    C5]  call_on_irq_stack+0x40/0x70
>    [ 3458.218125][    C5]  do_interrupt_handler+0x60/0x9c
>    [ 3458.223156][    C5]  el1_interrupt+0x34/0x64
>    [ 3458.227560][    C5]  el1h_64_irq_handler+0x1c/0x2c
>    [ 3458.232503][    C5]  el1h_64_irq+0x7c/0x80
>    [ 3458.236736][    C5]  free_vmap_area_noflush+0x108/0x39c
>    [ 3458.242126][    C5]  remove_vm_area+0xbc/0x118
>    [ 3458.246714][    C5]  vm_remove_mappings+0x48/0x2a4
>    [ 3458.251656][    C5]  __vunmap+0x154/0x278
>    [ 3458.255796][    C5]  stolen_time_cpu_down_prepare+0xc0/0xd8
>    [ 3458.261542][    C5]  cpuhp_invoke_callback+0x248/0xc34
>    [ 3458.266842][    C5]  cpuhp_thread_fun+0x1c4/0x248
>    [ 3458.271696][    C5]  smpboot_thread_fn+0x1b0/0x400
>    [ 3458.276638][    C5]  kthread+0x17c/0x1e0
>    [ 3458.280691][    C5]  ret_from_fork+0x10/0x20
> 
> As a fix, disable the IRQs during hotplug until we unmap and memset the
> stolen time structure.

This will work for the call chain of your observed crash, but are
you sure that para_steal_clock() can't be called from another cpu
concurrently?

In case you verified this can't happen, you can add my:

Reviewed-by: Juergen Gross <jgross@suse.com>

Otherwise you either need to use RCU for doing the memunmap(), or a
lock to protect stolen_time_region.


Juergen
Re: [PATCH] arm64: paravirt: Disable IRQs during stolen_time_cpu_down_prepare
Posted by Will Deacon 4 years ago
On Thu, Apr 21, 2022 at 09:44:28AM +0200, Juergen Gross wrote:
> On 20.04.22 22:44, Elliot Berman wrote:
> > From: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
> > 
> > During hotplug, the stolen time data structure is unmapped and memset.
> > There is a possibility of the timer IRQ being triggered before memset
> > and stolen time is getting updated as part of this timer IRQ handler. This
> > causes the below crash in timer handler -
> > 
> >    [ 3457.473139][    C5] Unable to handle kernel paging request at virtual address ffffffc03df05148
> >    ...
> >    [ 3458.154398][    C5] Call trace:
> >    [ 3458.157648][    C5]  para_steal_clock+0x30/0x50
> >    [ 3458.162319][    C5]  irqtime_account_process_tick+0x30/0x194
> >    [ 3458.168148][    C5]  account_process_tick+0x3c/0x280
> >    [ 3458.173274][    C5]  update_process_times+0x5c/0xf4
> >    [ 3458.178311][    C5]  tick_sched_timer+0x180/0x384
> >    [ 3458.183164][    C5]  __run_hrtimer+0x160/0x57c
> >    [ 3458.187744][    C5]  hrtimer_interrupt+0x258/0x684
> >    [ 3458.192698][    C5]  arch_timer_handler_virt+0x5c/0xa0
> >    [ 3458.198002][    C5]  handle_percpu_devid_irq+0xdc/0x414
> >    [ 3458.203385][    C5]  handle_domain_irq+0xa8/0x168
> >    [ 3458.208241][    C5]  gic_handle_irq.34493+0x54/0x244
> >    [ 3458.213359][    C5]  call_on_irq_stack+0x40/0x70
> >    [ 3458.218125][    C5]  do_interrupt_handler+0x60/0x9c
> >    [ 3458.223156][    C5]  el1_interrupt+0x34/0x64
> >    [ 3458.227560][    C5]  el1h_64_irq_handler+0x1c/0x2c
> >    [ 3458.232503][    C5]  el1h_64_irq+0x7c/0x80
> >    [ 3458.236736][    C5]  free_vmap_area_noflush+0x108/0x39c
> >    [ 3458.242126][    C5]  remove_vm_area+0xbc/0x118
> >    [ 3458.246714][    C5]  vm_remove_mappings+0x48/0x2a4
> >    [ 3458.251656][    C5]  __vunmap+0x154/0x278
> >    [ 3458.255796][    C5]  stolen_time_cpu_down_prepare+0xc0/0xd8
> >    [ 3458.261542][    C5]  cpuhp_invoke_callback+0x248/0xc34
> >    [ 3458.266842][    C5]  cpuhp_thread_fun+0x1c4/0x248
> >    [ 3458.271696][    C5]  smpboot_thread_fn+0x1b0/0x400
> >    [ 3458.276638][    C5]  kthread+0x17c/0x1e0
> >    [ 3458.280691][    C5]  ret_from_fork+0x10/0x20
> > 
> > As a fix, disable the IRQs during hotplug until we unmap and memset the
> > stolen time structure.
> 
> This will work for the call chain of your observed crash, but are
> you sure that para_steal_clock() can't be called from another cpu
> concurrently?

Agreed, this needs checking as update_rq_clock() is called from all over the
place.

> In case you verified this can't happen, you can add my:
> 
> Reviewed-by: Juergen Gross <jgross@suse.com>
> 
> Otherwise you either need to use RCU for doing the memunmap(), or a
> lock to protect stolen_time_region.

Yes, I think RCU would make a lot of sense here, deferring the memunmap
until there are no longer any readers. The reader is currently doing:

	if (!reg->kaddr)
		return 0;

	return le64_to_cpu(READ_ONCE(reg->kaddr->stolen_time));

so we'd also want an rcu_dereference() on reg->kaddr to avoid reading it
twice.

Will
Re: [PATCH] arm64: paravirt: Disable IRQs during stolen_time_cpu_down_prepare
Posted by Elliot Berman 4 years ago

On 4/21/2022 2:10 AM, Will Deacon wrote:
> On Thu, Apr 21, 2022 at 09:44:28AM +0200, Juergen Gross wrote:
>> On 20.04.22 22:44, Elliot Berman wrote:
>>> From: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
>>>
>>> During hotplug, the stolen time data structure is unmapped and memset.
>>> There is a possibility of the timer IRQ being triggered before memset
>>> and stolen time is getting updated as part of this timer IRQ handler. This
>>> causes the below crash in timer handler -
>>>
>>>     [ 3457.473139][    C5] Unable to handle kernel paging request at virtual address ffffffc03df05148
>>>     ...
>>>     [ 3458.154398][    C5] Call trace:
>>>     [ 3458.157648][    C5]  para_steal_clock+0x30/0x50
>>>     [ 3458.162319][    C5]  irqtime_account_process_tick+0x30/0x194
>>>     [ 3458.168148][    C5]  account_process_tick+0x3c/0x280
>>>     [ 3458.173274][    C5]  update_process_times+0x5c/0xf4
>>>     [ 3458.178311][    C5]  tick_sched_timer+0x180/0x384
>>>     [ 3458.183164][    C5]  __run_hrtimer+0x160/0x57c
>>>     [ 3458.187744][    C5]  hrtimer_interrupt+0x258/0x684
>>>     [ 3458.192698][    C5]  arch_timer_handler_virt+0x5c/0xa0
>>>     [ 3458.198002][    C5]  handle_percpu_devid_irq+0xdc/0x414
>>>     [ 3458.203385][    C5]  handle_domain_irq+0xa8/0x168
>>>     [ 3458.208241][    C5]  gic_handle_irq.34493+0x54/0x244
>>>     [ 3458.213359][    C5]  call_on_irq_stack+0x40/0x70
>>>     [ 3458.218125][    C5]  do_interrupt_handler+0x60/0x9c
>>>     [ 3458.223156][    C5]  el1_interrupt+0x34/0x64
>>>     [ 3458.227560][    C5]  el1h_64_irq_handler+0x1c/0x2c
>>>     [ 3458.232503][    C5]  el1h_64_irq+0x7c/0x80
>>>     [ 3458.236736][    C5]  free_vmap_area_noflush+0x108/0x39c
>>>     [ 3458.242126][    C5]  remove_vm_area+0xbc/0x118
>>>     [ 3458.246714][    C5]  vm_remove_mappings+0x48/0x2a4
>>>     [ 3458.251656][    C5]  __vunmap+0x154/0x278
>>>     [ 3458.255796][    C5]  stolen_time_cpu_down_prepare+0xc0/0xd8
>>>     [ 3458.261542][    C5]  cpuhp_invoke_callback+0x248/0xc34
>>>     [ 3458.266842][    C5]  cpuhp_thread_fun+0x1c4/0x248
>>>     [ 3458.271696][    C5]  smpboot_thread_fn+0x1b0/0x400
>>>     [ 3458.276638][    C5]  kthread+0x17c/0x1e0
>>>     [ 3458.280691][    C5]  ret_from_fork+0x10/0x20
>>>
>>> As a fix, disable the IRQs during hotplug until we unmap and memset the
>>> stolen time structure.
>>
>> This will work for the call chain of your observed crash, but are
>> you sure that para_steal_clock() can't be called from another cpu
>> concurrently?
> 
> Agreed, this needs checking as update_rq_clock() is called from all over the
> place.
> 
>> In case you verified this can't happen, you can add my:
>>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>>
>> Otherwise you either need to use RCU for doing the memunmap(), or a
>> lock to protect stolen_time_region.
> 
> Yes, I think RCU would make a lot of sense here, deferring the memunmap
> until there are no longer any readers. The reader is currently doing:
> 
> 	if (!reg->kaddr)
> 		return 0;
> 
> 	return le64_to_cpu(READ_ONCE(reg->kaddr->stolen_time));
> 
> so we'd also want an rcu_dereference() on reg->kaddr to avoid reading it
> twice.
> 
> Will


We have seen this particular stack many times in our testing, but not 
any other way. Will's comments are agreeable, I'll send out an updated 
patch.

Thanks,
Elliot