[PATCH v2] iommu/vt-d: fix system hang on reboot -f

Yunhui Cui posted 1 patch 11 months, 2 weeks ago
There is a newer version of this series
drivers/iommu/intel/iommu.c | 4 ----
1 file changed, 4 deletions(-)
[PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Yunhui Cui 11 months, 2 weeks ago
We found that executing the command ./a.out &;reboot -f (where a.out is a
program that only executes a while(1) infinite loop) can probabilistically
cause the system to hang in the intel_iommu_shutdown() function, rendering
it unresponsive. Through analysis, we identified that the factors
contributing to this issue are as follows:

1. The reboot -f command does not prompt the kernel to notify the
application layer to perform cleanup actions, allowing the application to
continue running.

2. When the kernel reaches the intel_iommu_shutdown() function, only the
BSP (Bootstrap Processor) CPU is operational in the system.

3. During the execution of intel_iommu_shutdown(), the function down_write
(&dmar_global_lock) causes the process to sleep and be scheduled out.

4. At this point, though the processor's interrupt flag is not cleared,
 allowing interrupts to be accepted. However, only legacy devices and NMI
(Non-Maskable Interrupt) interrupts could come in, as other interrupts
routing have already been disabled. If no legacy or NMI interrupts occur
at this stage, the scheduler will not be able to run.

5. If the application got scheduled at this time is executing a while(1)-
type loop, it will be unable to be preempted, leading to an infinite loop
and causing the system to become unresponsive.

To resolve this issue, the intel_iommu_shutdown() function should not
execute down_write(), which can potentially cause the process to be
scheduled out. Furthermore, since only the BSP is running during the later
stages of the reboot, there is no need for protection against parallel
access to the DMAR (DMA Remapping) unit. Therefore, the following lines
could be removed:

down_write(&dmar_global_lock);
up_write(&dmar_global_lock);

After testing, the issue has been resolved.

Fixes: 6c3a44ed3c55 ("iommu/vt-d: Turn off translations at shutdown")
Co-developed-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
---
 drivers/iommu/intel/iommu.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cc46098f875b..6d9f2e56ce88 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2871,16 +2871,12 @@ void intel_iommu_shutdown(void)
 	if (no_iommu || dmar_disabled)
 		return;
 
-	down_write(&dmar_global_lock);
-
 	/* Disable PMRs explicitly here. */
 	for_each_iommu(iommu, drhd)
 		iommu_disable_protect_mem_regions(iommu);
 
 	/* Make sure the IOMMUs are switched off */
 	intel_disable_iommus();
-
-	up_write(&dmar_global_lock);
 }
 
 static struct intel_iommu *dev_to_intel_iommu(struct device *dev)
-- 
2.39.2
Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Baolu Lu 11 months, 2 weeks ago
On 2025/2/25 14:48, Yunhui Cui wrote:
> We found that executing the command ./a.out &;reboot -f (where a.out is a
> program that only executes a while(1) infinite loop) can probabilistically
> cause the system to hang in the intel_iommu_shutdown() function, rendering
> it unresponsive. Through analysis, we identified that the factors
> contributing to this issue are as follows:
> 
> 1. The reboot -f command does not prompt the kernel to notify the
> application layer to perform cleanup actions, allowing the application to
> continue running.
> 
> 2. When the kernel reaches the intel_iommu_shutdown() function, only the
> BSP (Bootstrap Processor) CPU is operational in the system.
> 
> 3. During the execution of intel_iommu_shutdown(), the function down_write
> (&dmar_global_lock) causes the process to sleep and be scheduled out.
> 
> 4. At this point, though the processor's interrupt flag is not cleared,
>   allowing interrupts to be accepted. However, only legacy devices and NMI
> (Non-Maskable Interrupt) interrupts could come in, as other interrupts
> routing have already been disabled. If no legacy or NMI interrupts occur
> at this stage, the scheduler will not be able to run.
> 
> 5. If the application got scheduled at this time is executing a while(1)-
> type loop, it will be unable to be preempted, leading to an infinite loop
> and causing the system to become unresponsive.
> 
> To resolve this issue, the intel_iommu_shutdown() function should not
> execute down_write(), which can potentially cause the process to be
> scheduled out. Furthermore, since only the BSP is running during the later
> stages of the reboot, there is no need for protection against parallel
> access to the DMAR (DMA Remapping) unit. Therefore, the following lines
> could be removed:

Good summary! Thank you!

> 
> down_write(&dmar_global_lock);
> up_write(&dmar_global_lock);
> 
> After testing, the issue has been resolved.
> 
> Fixes: 6c3a44ed3c55 ("iommu/vt-d: Turn off translations at shutdown")
> Co-developed-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
> ---
>   drivers/iommu/intel/iommu.c | 4 ----
>   1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index cc46098f875b..6d9f2e56ce88 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2871,16 +2871,12 @@ void intel_iommu_shutdown(void)
>   	if (no_iommu || dmar_disabled)
>   		return;
>   
> -	down_write(&dmar_global_lock);
> -
>   	/* Disable PMRs explicitly here. */
>   	for_each_iommu(iommu, drhd)

Removing the locking for for_each_iommu() will trigger a suspicious RCU
usage splat. You need to replace this helper with a raw
list_for_each_entry() with some comments around it to explain why it is
safe.

>   		iommu_disable_protect_mem_regions(iommu);
>   
>   	/* Make sure the IOMMUs are switched off */
>   	intel_disable_iommus();
> -
> -	up_write(&dmar_global_lock);
>   }
>   
>   static struct intel_iommu *dev_to_intel_iommu(struct device *dev)

Thanks,
baolu
Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Ethan Zhao 11 months, 2 weeks ago
在 2025/2/25 15:01, Baolu Lu 写道:
> On 2025/2/25 14:48, Yunhui Cui wrote:
>> We found that executing the command ./a.out &;reboot -f (where a.out 
>> is a
>> program that only executes a while(1) infinite loop) can 
>> probabilistically
>> cause the system to hang in the intel_iommu_shutdown() function, 
>> rendering
>> it unresponsive. Through analysis, we identified that the factors
>> contributing to this issue are as follows:
>>
>> 1. The reboot -f command does not prompt the kernel to notify the
>> application layer to perform cleanup actions, allowing the 
>> application to
>> continue running.
>>
>> 2. When the kernel reaches the intel_iommu_shutdown() function, only the
>> BSP (Bootstrap Processor) CPU is operational in the system.
>>
>> 3. During the execution of intel_iommu_shutdown(), the function 
>> down_write
>> (&dmar_global_lock) causes the process to sleep and be scheduled out.
>>
>> 4. At this point, though the processor's interrupt flag is not cleared,
>>   allowing interrupts to be accepted. However, only legacy devices 
>> and NMI
>> (Non-Maskable Interrupt) interrupts could come in, as other interrupts
>> routing have already been disabled. If no legacy or NMI interrupts occur
>> at this stage, the scheduler will not be able to run.
>>
>> 5. If the application got scheduled at this time is executing a 
>> while(1)-
>> type loop, it will be unable to be preempted, leading to an infinite 
>> loop
>> and causing the system to become unresponsive.
>>
>> To resolve this issue, the intel_iommu_shutdown() function should not
>> execute down_write(), which can potentially cause the process to be
>> scheduled out. Furthermore, since only the BSP is running during the 
>> later
>> stages of the reboot, there is no need for protection against parallel
>> access to the DMAR (DMA Remapping) unit. Therefore, the following lines
>> could be removed:
>
> Good summary! Thank you!
>
>>
>> down_write(&dmar_global_lock);
>> up_write(&dmar_global_lock);
>>
>> After testing, the issue has been resolved.
>>
>> Fixes: 6c3a44ed3c55 ("iommu/vt-d: Turn off translations at shutdown")
>> Co-developed-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> Signed-off-by: Yunhui Cui <cuiyunhui@bytedance.com>
>> ---
>>   drivers/iommu/intel/iommu.c | 4 ----
>>   1 file changed, 4 deletions(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index cc46098f875b..6d9f2e56ce88 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -2871,16 +2871,12 @@ void intel_iommu_shutdown(void)
>>       if (no_iommu || dmar_disabled)
>>           return;
>>   -    down_write(&dmar_global_lock);
>> -
>>       /* Disable PMRs explicitly here. */
>>       for_each_iommu(iommu, drhd)
>
> Removing the locking for for_each_iommu() will trigger a suspicious RCU
> usage splat. You need to replace this helper with a raw
> list_for_each_entry() with some comments around it to explain why it is
> safe.
>
Oops,  RCU checking hids behind the for_each_iommu() macro.

How about

void intel_iommu_shutdown(void)

{

     struct dmar_drhd_unit *drhd;

     struct intel_iommu *iommu = NULL;

     if (no_iommu || dmar_disabled)

         return;


     /* Here only BSP is running, no RCU cocurrent lock checking needed */

     list_for_each_entry(drhd, &dmar_drhd_units, list) {

         iommu = drhd->iommu;

         /* Disable PMRs explicitly here. */

         iommu_disable_protect_mem_regions(iommu);

         iommu_disable_translation(iommu);

     }

}


Thanks,

Ethan

>> iommu_disable_protect_mem_regions(iommu);
>>         /* Make sure the IOMMUs are switched off */
>>       intel_disable_iommus();
>> -
>> -    up_write(&dmar_global_lock);
>>   }
>>     static struct intel_iommu *dev_to_intel_iommu(struct device *dev)
>
> Thanks,
> baolu

-- 
"firm, enduring, strong, and long-lived"

Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Jason Gunthorpe 11 months, 2 weeks ago
On Tue, Feb 25, 2025 at 04:54:54PM +0800, Ethan Zhao wrote:
> > On 2025/2/25 14:48, Yunhui Cui wrote:
> > > We found that executing the command ./a.out &;reboot -f (where a.out
> > > is a
> > > program that only executes a while(1) infinite loop) can
> > > probabilistically
> > > cause the system to hang in the intel_iommu_shutdown() function,
> > > rendering
> > > it unresponsive. Through analysis, we identified that the factors
> > > contributing to this issue are as follows:
> > > 
> > > 1. The reboot -f command does not prompt the kernel to notify the
> > > application layer to perform cleanup actions, allowing the
> > > application to
> > > continue running.
> > > 
> > > 2. When the kernel reaches the intel_iommu_shutdown() function, only the
> > > BSP (Bootstrap Processor) CPU is operational in the system.
> > > 
> > > 3. During the execution of intel_iommu_shutdown(), the function
> > > down_write
> > > (&dmar_global_lock) causes the process to sleep and be scheduled out.

Why does this happen? If the kernel has shutdown other CPUs then what
thread is holding the other side of this lock and why?

> > > 4. At this point, though the processor's interrupt flag is not cleared,
> > >   allowing interrupts to be accepted. However, only legacy devices
> > > and NMI
> > > (Non-Maskable Interrupt) interrupts could come in, as other interrupts
> > > routing have already been disabled. If no legacy or NMI interrupts occur
> > > at this stage, the scheduler will not be able to run.

> > > 5. If the application got scheduled at this time is executing a
> > > while(1)-
> > > type loop, it will be unable to be preempted, leading to an infinite
> > > loop
> > > and causing the system to become unresponsive.

If the schedular doesn't run how did we get from 4 -> 5?

Maybe the issue is the shutdown handler here is running in the wrong
time and it should not be running after the scheduler has been shut
down.

I don't think removing the lock is a great idea without more
explanation.

Jason
Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Ethan Zhao 11 months, 2 weeks ago
在 2025/2/25 22:26, Jason Gunthorpe 写道:
> On Tue, Feb 25, 2025 at 04:54:54PM +0800, Ethan Zhao wrote:
>>> On 2025/2/25 14:48, Yunhui Cui wrote:
>>>> We found that executing the command ./a.out &;reboot -f (where a.out
>>>> is a
>>>> program that only executes a while(1) infinite loop) can
>>>> probabilistically
>>>> cause the system to hang in the intel_iommu_shutdown() function,
>>>> rendering
>>>> it unresponsive. Through analysis, we identified that the factors
>>>> contributing to this issue are as follows:
>>>>
>>>> 1. The reboot -f command does not prompt the kernel to notify the
>>>> application layer to perform cleanup actions, allowing the
>>>> application to
>>>> continue running.
>>>>
>>>> 2. When the kernel reaches the intel_iommu_shutdown() function, only the
>>>> BSP (Bootstrap Processor) CPU is operational in the system.
>>>>
>>>> 3. During the execution of intel_iommu_shutdown(), the function
>>>> down_write
>>>> (&dmar_global_lock) causes the process to sleep and be scheduled out.
> Why does this happen? If the kernel has shutdown other CPUs then what
> thread is holding the other side of this lock and why?
>
>>>> 4. At this point, though the processor's interrupt flag is not cleared,
>>>>    allowing interrupts to be accepted. However, only legacy devices
>>>> and NMI
>>>> (Non-Maskable Interrupt) interrupts could come in, as other interrupts
>>>> routing have already been disabled. If no legacy or NMI interrupts occur
>>>> at this stage, the scheduler will not be able to run.
>>>> 5. If the application got scheduled at this time is executing a
>>>> while(1)-
>>>> type loop, it will be unable to be preempted, leading to an infinite
>>>> loop
>>>> and causing the system to become unresponsive.
> If the schedular doesn't run how did we get from 4 -> 5?
>
> Maybe the issue is the shutdown handler here is running in the wrong
> time and it should not be running after the scheduler has been shut
> down.
>
> I don't think removing the lock is a great idea without more
> explanation.

Seems it is not so simple job to explain why there is no race window between
this iommu_shutdown() and following dmar_global_lock holders.

1. PCIe hotplug dmar_pci_bus_notifier()

2. mm_core_init detect_intel_iommu()

3. late_initcall dmar_free_unused_resources()

4. acpi attach dmar_device_hotplug()

5. pci_iommu_init intel_iommu_init() init_dmars()

6. rootfs_initcall ir_dev_scope_init()

though here is the last stage of reboot. then how about we turn back to v1

Just repalce with own_write() with down_write_trylock().

Thanks,

Ethan


>
> Jason

-- 
"firm, enduring, strong, and long-lived"

Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Baolu Lu 11 months, 2 weeks ago
On 2/26/25 11:50, Ethan Zhao wrote:
>>>>>
>> If the schedular doesn't run how did we get from 4 -> 5?
>>
>> Maybe the issue is the shutdown handler here is running in the wrong
>> time and it should not be running after the scheduler has been shut
>> down.
>>
>> I don't think removing the lock is a great idea without more
>> explanation.
> 
> Seems it is not so simple job to explain why there is no race window 
> between
> this iommu_shutdown() and following dmar_global_lock holders.
> 
> 1. PCIe hotplug dmar_pci_bus_notifier()
> 
> 2. mm_core_init detect_intel_iommu()
> 
> 3. late_initcall dmar_free_unused_resources()
> 
> 4. acpi attach dmar_device_hotplug()
> 
> 5. pci_iommu_init intel_iommu_init() init_dmars()
> 
> 6. rootfs_initcall ir_dev_scope_init()
> 
> though here is the last stage of reboot. then how about we turn back to v1
> 
> Just repalce with own_write() with down_write_trylock().

I don't think trylock is a reasonable solution. intel_iommu_shutdown()
should not become a no-op simply because it cannot acquire a lock
immediately.

The lock here is to protect the drhd (representation of iommu hardware)
list. It needs protection because this driver supports iommu hot-add and
remove, which is triggered by an ACPI event for I/O board hotplug.
Provided the system does not respond to those events when this function
is called, it's fine to remove the lock.

Thanks,
baolu
Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Ethan Zhao 11 months, 2 weeks ago
在 2025/2/26 13:18, Baolu Lu 写道:
> On 2/26/25 11:50, Ethan Zhao wrote:
>>>>>>
>>> If the schedular doesn't run how did we get from 4 -> 5?
>>>
>>> Maybe the issue is the shutdown handler here is running in the wrong
>>> time and it should not be running after the scheduler has been shut
>>> down.
>>>
>>> I don't think removing the lock is a great idea without more
>>> explanation.
>>
>> Seems it is not so simple job to explain why there is no race window 
>> between
>> this iommu_shutdown() and following dmar_global_lock holders.
>>
>> 1. PCIe hotplug dmar_pci_bus_notifier()
>>
>> 2. mm_core_init detect_intel_iommu()
>>
>> 3. late_initcall dmar_free_unused_resources()
>>
>> 4. acpi attach dmar_device_hotplug()
>>
>> 5. pci_iommu_init intel_iommu_init() init_dmars()
>>
>> 6. rootfs_initcall ir_dev_scope_init()
>>
>> though here is the last stage of reboot. then how about we turn back 
>> to v1
>>
>> Just repalce with own_write() with down_write_trylock().
>
> I don't think trylock is a reasonable solution. intel_iommu_shutdown()
> should not become a no-op simply because it cannot acquire a lock
> immediately.

No other CPUs is holding lock after they were brought down by sync call to

functionnative_stop_other_cpus(1).

So actually it wouldn't fail to acquire a lock.  this is also the reason why we don't

need to down_write() thedmar_global_lock.

>
> The lock here is to protect the drhd (representation of iommu hardware)
> list. It needs protection because this driver supports iommu hot-add and
> remove, which is triggered by an ACPI event for I/O board hotplug.

Yup, the lock is used to protect the global listdmar_drhd_units.

but here all IOAPIC/LAPIC are brought down, hotplug interrupts couldn't 
happend either. (only legacy and NMI are alive).

> Provided the system does not respond to those events when this function
> is called, it's fine to remove the lock.
I agree.
>
> Thanks,
> baolu

-- 
"firm, enduring, strong, and long-lived"

Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Jason Gunthorpe 11 months, 2 weeks ago
On Wed, Feb 26, 2025 at 01:55:28PM +0800, Ethan Zhao wrote:
> > Provided the system does not respond to those events when this function
> > is called, it's fine to remove the lock.
> I agree.

I think it is running the destruction of the iommu far too late in the
process. IMHO it should be done after all the drivers have been
shutdown, before the CPUs go single threaded.

Jason
Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Ethan Zhao 11 months, 2 weeks ago
On 2/26/2025 9:04 PM, Jason Gunthorpe wrote:
> On Wed, Feb 26, 2025 at 01:55:28PM +0800, Ethan Zhao wrote:
>>> Provided the system does not respond to those events when this function
>>> is called, it's fine to remove the lock.
>> I agree.
> I think it is running the destruction of the iommu far too late in the
> process. IMHO it should be done after all the drivers have been
> shutdown, before the CPUs go single threaded.

Hmm... so far it is fine, the iommu_shutdown only has a little work to
do, disable the translation, the PMR disabling is just backward compatible,
was deprecated already. if we move it to one position where all CPUs are
cycling, we don't know what kind of user-land tasks left there (i.e. reboot -f
case), it would be hard to full-fill the requirement of Intel VT-d, no ongoing
transaction there on hardware when issue the translation disabling command.

Of course, once you have clear motivation to re-position it. we would like
to work on it.

Thanks,
Ethan

>
> Jason
>
Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Jason Gunthorpe 11 months, 2 weeks ago
On Thu, Feb 27, 2025 at 08:40:31AM +0800, Ethan Zhao wrote:
> 
> On 2/26/2025 9:04 PM, Jason Gunthorpe wrote:
> > On Wed, Feb 26, 2025 at 01:55:28PM +0800, Ethan Zhao wrote:
> > > > Provided the system does not respond to those events when this function
> > > > is called, it's fine to remove the lock.
> > > I agree.
> > I think it is running the destruction of the iommu far too late in the
> > process. IMHO it should be done after all the drivers have been
> > shutdown, before the CPUs go single threaded.
> 
> Hmm... so far it is fine, the iommu_shutdown only has a little work to
> do, disable the translation, the PMR disabling is just backward compatible,
> was deprecated already. if we move it to one position where all CPUs are
> cycling, we don't know what kind of user-land tasks left there (i.e. reboot -f
> case), it would be hard to full-fill the requirement of Intel VT-d, no ongoing
> transaction there on hardware when issue the translation disabling command.

There is no guarentee device dma is halted anyhow at this point either.

Jason
Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Ethan Zhao 11 months, 2 weeks ago
On 2/28/2025 4:38 AM, Jason Gunthorpe wrote:
> On Thu, Feb 27, 2025 at 08:40:31AM +0800, Ethan Zhao wrote:
>> On 2/26/2025 9:04 PM, Jason Gunthorpe wrote:
>>> On Wed, Feb 26, 2025 at 01:55:28PM +0800, Ethan Zhao wrote:
>>>>> Provided the system does not respond to those events when this function
>>>>> is called, it's fine to remove the lock.
>>>> I agree.
>>> I think it is running the destruction of the iommu far too late in the
>>> process. IMHO it should be done after all the drivers have been
>>> shutdown, before the CPUs go single threaded.
>> Hmm... so far it is fine, the iommu_shutdown only has a little work to
>> do, disable the translation, the PMR disabling is just backward compatible,
>> was deprecated already. if we move it to one position where all CPUs are
>> cycling, we don't know what kind of user-land tasks left there (i.e. reboot -f
>> case), it would be hard to full-fill the requirement of Intel VT-d, no ongoing
>> transaction there on hardware when issue the translation disabling command.
> There is no guarentee device dma is halted anyhow at this point either.

The -f option to reboot command, suggests data corruption possibility.although

the IOMMU strives not to cross the transaction boundaries of its address 
translation.

over all, we should make the reboot -f function works. not to hang 
there. meanwhile

it doesn't break anything else so far.

Thanks,

Ethan

>
> Jason
Re: [External] Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by yunhui cui 11 months, 2 weeks ago
Hi All,

On Fri, Feb 28, 2025 at 8:51 AM Ethan Zhao <etzhao1900@gmail.com> wrote:
>
>
> On 2/28/2025 4:38 AM, Jason Gunthorpe wrote:
> > On Thu, Feb 27, 2025 at 08:40:31AM +0800, Ethan Zhao wrote:
> >> On 2/26/2025 9:04 PM, Jason Gunthorpe wrote:
> >>> On Wed, Feb 26, 2025 at 01:55:28PM +0800, Ethan Zhao wrote:
> >>>>> Provided the system does not respond to those events when this function
> >>>>> is called, it's fine to remove the lock.
> >>>> I agree.
> >>> I think it is running the destruction of the iommu far too late in the
> >>> process. IMHO it should be done after all the drivers have been
> >>> shutdown, before the CPUs go single threaded.
> >> Hmm... so far it is fine, the iommu_shutdown only has a little work to
> >> do, disable the translation, the PMR disabling is just backward compatible,
> >> was deprecated already. if we move it to one position where all CPUs are
> >> cycling, we don't know what kind of user-land tasks left there (i.e. reboot -f
> >> case), it would be hard to full-fill the requirement of Intel VT-d, no ongoing
> >> transaction there on hardware when issue the translation disabling command.
> > There is no guarentee device dma is halted anyhow at this point either.
>
> The -f option to reboot command, suggests data corruption possibility.although
>
> the IOMMU strives not to cross the transaction boundaries of its address
> translation.
>
> over all, we should make the reboot -f function works. not to hang
> there. meanwhile
>
> it doesn't break anything else so far.

patch v1:
if (!down_write_trylock(&dmar_global_lock))
return;

 patch v3:
/* All other CPUs were brought down, hotplug interrupts were disabled,
no lock and RCU checking needed anymore*/
list_for_each_entry(drhd, &dmar_drhd_units, list) {
iommu = drhd->iommu;
/* Disable PMRs explicitly here. */
iommu_disable_protect_mem_regions(iommu);
iommu_disable_translation(iommu);
}

Since we can remove down_write/up_write, it indicates that during the
IOMMU shutdown process, we are not particularly concerned about
whether others are accessing the critical section. Therefore, it can
be understood that Patch v1 can successfully acquire the lock and
proceed with subsequent operations. From this perspective, Patch v1 is
reasonable and can prevent situations where there is an actual owner
of dmar_global_lock, even though it does not perform a real IOMMU
shutdown.

However, if the IOMMU shutdown truly fails, IOMMU_WAIT_OP will trigger
a panic(). Removing down_write/up_write offers better maintainability
than Patch v1 (as we can retrieve the cause from the vmcore). But this
might not be significant, since the reboot could have been completed
successfully, and the occurrence of panic() might instead cause
confusion.

In summary, it is essential to complete the reboot action here rather
than causing the system to hang. So, which do you prefer, v1 or v3?


>
> Thanks,
>
> Ethan
>
> >
> > Jason

Thanks,
Yunhui
Re: [External] Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Ethan Zhao 11 months, 2 weeks ago
在 2025/2/28 10:18, yunhui cui 写道:
> Hi All,
>
> On Fri, Feb 28, 2025 at 8:51 AM Ethan Zhao <etzhao1900@gmail.com> wrote:
>>
>> On 2/28/2025 4:38 AM, Jason Gunthorpe wrote:
>>> On Thu, Feb 27, 2025 at 08:40:31AM +0800, Ethan Zhao wrote:
>>>> On 2/26/2025 9:04 PM, Jason Gunthorpe wrote:
>>>>> On Wed, Feb 26, 2025 at 01:55:28PM +0800, Ethan Zhao wrote:
>>>>>>> Provided the system does not respond to those events when this function
>>>>>>> is called, it's fine to remove the lock.
>>>>>> I agree.
>>>>> I think it is running the destruction of the iommu far too late in the
>>>>> process. IMHO it should be done after all the drivers have been
>>>>> shutdown, before the CPUs go single threaded.
>>>> Hmm... so far it is fine, the iommu_shutdown only has a little work to
>>>> do, disable the translation, the PMR disabling is just backward compatible,
>>>> was deprecated already. if we move it to one position where all CPUs are
>>>> cycling, we don't know what kind of user-land tasks left there (i.e. reboot -f
>>>> case), it would be hard to full-fill the requirement of Intel VT-d, no ongoing
>>>> transaction there on hardware when issue the translation disabling command.
>>> There is no guarentee device dma is halted anyhow at this point either.
>> The -f option to reboot command, suggests data corruption possibility.although
>>
>> the IOMMU strives not to cross the transaction boundaries of its address
>> translation.
>>
>> over all, we should make the reboot -f function works. not to hang
>> there. meanwhile
>>
>> it doesn't break anything else so far.
> patch v1:
> if (!down_write_trylock(&dmar_global_lock))
> return;

We should also think about the corner case of reboot without -f option, there might be something

wrong with one of the devices there on other CPUs they don't release the lock and were shutdown

by NMI, the dmar_global_lock is held by one of them, so we just bail out here at at once ?

leaving the IOMMU translation etc in enabling status. Is it better to loop with 10 seconds timeout

to try the lock then make sure no better choice, then continue the disabling of IOMMU translation & PMR ?.

Even the reboot -f case, leaving the IOMMU translation in enabling status is worse choice.

So seems we shouldn't bail out immediately after one time trylock whatever.

>
>   patch v3:
> /* All other CPUs were brought down, hotplug interrupts were disabled,
> no lock and RCU checking needed anymore*/
> list_for_each_entry(drhd, &dmar_drhd_units, list) {
> iommu = drhd->iommu;
> /* Disable PMRs explicitly here. */
> iommu_disable_protect_mem_regions(iommu);
> iommu_disable_translation(iommu);
> }
>
> Since we can remove down_write/up_write, it indicates that during the
> IOMMU shutdown process, we are not particularly concerned about
> whether others are accessing the critical section. Therefore, it can
> be understood that Patch v1 can successfully acquire the lock and
> proceed with subsequent operations. From this perspective, Patch v1 is
> reasonable and can prevent situations where there is an actual owner
> of dmar_global_lock, even though it does not perform a real IOMMU
> shutdown.
>
> However, if the IOMMU shutdown truly fails, IOMMU_WAIT_OP will trigger
> a panic(). Removing down_write/up_write offers better maintainability
> than Patch v1 (as we can retrieve the cause from the vmcore). But this
> might not be significant, since the reboot could have been completed
> successfully, and the occurrence of panic() might instead cause
> confusion.
>
> In summary, it is essential to complete the reboot action here rather
> than causing the system to hang. So, which do you prefer, v1 or v3?

Would v3 miss something ? please comment.

normal reboot/shutdown without -f case, function works right without any data corrupt or

leaving device in confusion status, good or bad situation.

reboot -f case, functions works, strives to not corrupt data or leave device in confusion status.

...

Thanks,

Ethan

>
>
>> Thanks,
>>
>> Ethan
>>
>>> Jason
> Thanks,
> Yunhui

-- 
"firm, enduring, strong, and long-lived"

Re: [PATCH v2] iommu/vt-d: fix system hang on reboot -f
Posted by Ethan Zhao 11 months, 2 weeks ago
On 2/25/2025 10:26 PM, Jason Gunthorpe wrote:
> On Tue, Feb 25, 2025 at 04:54:54PM +0800, Ethan Zhao wrote:
>>> On 2025/2/25 14:48, Yunhui Cui wrote:
>>>> We found that executing the command ./a.out &;reboot -f (where a.out
>>>> is a
>>>> program that only executes a while(1) infinite loop) can
>>>> probabilistically
>>>> cause the system to hang in the intel_iommu_shutdown() function,
>>>> rendering
>>>> it unresponsive. Through analysis, we identified that the factors
>>>> contributing to this issue are as follows:
>>>>
>>>> 1. The reboot -f command does not prompt the kernel to notify the
>>>> application layer to perform cleanup actions, allowing the
>>>> application to
>>>> continue running.
>>>>
>>>> 2. When the kernel reaches the intel_iommu_shutdown() function, only the
>>>> BSP (Bootstrap Processor) CPU is operational in the system.
>>>>
>>>> 3. During the execution of intel_iommu_shutdown(), the function
>>>> down_write
>>>> (&dmar_global_lock) causes the process to sleep and be scheduled out.
> Why does this happen? If the kernel has shutdown other CPUs then what
> thread is holding the other side of this lock and why?

The down_write() actually executes might_sleep()->might_resched()->__cond_resched()->

__schedule() first before acquiring the lock, thus there is change to got scheduled out.

The caller is scheduled out due to voluntary sleep or because its time slice is exhausted,

not because the lock is held by other processes here.

>>>> 4. At this point, though the processor's interrupt flag is not cleared,
>>>>    allowing interrupts to be accepted. However, only legacy devices
>>>> and NMI
>>>> (Non-Maskable Interrupt) interrupts could come in, as other interrupts
>>>> routing have already been disabled. If no legacy or NMI interrupts occur
>>>> at this stage, the scheduler will not be able to run.
>>>> 5. If the application got scheduled at this time is executing a
>>>> while(1)-
>>>> type loop, it will be unable to be preempted, leading to an infinite
>>>> loop
>>>> and causing the system to become unresponsive.
> If the schedular doesn't run how did we get from 4 -> 5?

We got from 4-->5 because caller thread's voluntary invocation of the scheduler.

>
> Maybe the issue is the shutdown handler here is running in the wrong
> time and it should not be running after the scheduler has been shut
> down.

Move thex86_platform.iommu_shutdown() before hpet_disable() ?

I didn't figure out why we need scheduler is active when we execute

iommu_shutdown(), or the reason to keep this pair of down_write()/

up_write(). could you help to shed light ?

Thanks,
Ethan

>
> I don't think removing the lock is a great idea without more
> explanation.
>
> Jason
>