[PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
Posted by Zhenzhong Duan 1 year ago
When setting up a mapping on a paging domain before the domain is
attached to any device, a NULL pointer dereference triggers as below:

 BUG: kernel NULL pointer dereference, address: 0000000000000200
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
 ...
 Call Trace:
  <TASK>
  ? __die+0x20/0x70
  ? page_fault_oops+0x80/0x150
  ? do_user_addr_fault+0x5f/0x670
  ? pfn_to_dma_pte+0xca/0x280
  ? exc_page_fault+0x78/0x170
  ? asm_exc_page_fault+0x22/0x30
  ? cache_tag_flush_range_np+0x114/0x1f0
  intel_iommu_iotlb_sync_map+0x16/0x20
  iommu_map+0x59/0xd0
  batch_to_domain+0x154/0x1e0
  iopt_area_fill_domains+0x106/0x300
  iopt_map_pages+0x1bc/0x290
  iopt_map_user_pages+0xe8/0x1e0
  ? xas_load+0x9/0xb0
  iommufd_ioas_map+0xc9/0x1c0
  iommufd_fops_ioctl+0xff/0x1b0
  __x64_sys_ioctl+0x87/0xc0
  do_syscall_64+0x50/0x110
  entry_SYSCALL_64_after_hwframe+0x76/0x7e

The qi_batch structure is allocated when the domain is attached to a
device for the first time. When a mapping is set up before that,
qi_batch is dereferenced to do a batched flush, triggering the above
issue.

Fix it by checking the qi_batch pointer and bypassing batched flushing
if no device is attached.
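
For reference, a simplified sketch of the lifecycle described above
(the struct layout is from the Fixes commit as I recall it; the
allocation site shown here is illustrative, not the exact driver code):

  /*
   * Illustrative model: the batch is only allocated at first attach,
   * but the map path flushes through it unconditionally.
   */
  struct qi_batch {
          struct qi_desc descs[QI_MAX_BATCHED_DESC_COUNT];
          unsigned int index;     /* number of descriptors queued */
  };

  /* First device attach allocates the batch: */
  domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_KERNEL);

  /* A map before any attach reaches the flush path with
   * domain->qi_batch == NULL and oopses on batch->index:
   */
  qi_batch_flush_descs(iommu, domain->qi_batch);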

Cc: stable@vger.kernel.org
Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 drivers/iommu/intel/cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
index e5b89f728ad3..bb9dae9a7fba 100644
--- a/drivers/iommu/intel/cache.c
+++ b/drivers/iommu/intel/cache.c
@@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
 
 static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
 {
-	if (!iommu || !batch->index)
+	if (!iommu || !batch || !batch->index)
 		return;
 
 	qi_submit_sync(iommu, batch->descs, batch->index, 0);
-- 
2.34.1
Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
Posted by Yi Liu 1 year ago
On 2024/12/12 15:24, Zhenzhong Duan wrote:
> When setting up a mapping on a paging domain before the domain is
> attached to any device, a NULL pointer dereference triggers as below:
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000200
>   #PF: supervisor read access in kernel mode
>   #PF: error_code(0x0000) - not-present page
>   RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
>   ...
>   Call Trace:
>    <TASK>
>    ? __die+0x20/0x70
>    ? page_fault_oops+0x80/0x150
>    ? do_user_addr_fault+0x5f/0x670
>    ? pfn_to_dma_pte+0xca/0x280
>    ? exc_page_fault+0x78/0x170
>    ? asm_exc_page_fault+0x22/0x30
>    ? cache_tag_flush_range_np+0x114/0x1f0
>    intel_iommu_iotlb_sync_map+0x16/0x20
>    iommu_map+0x59/0xd0
>    batch_to_domain+0x154/0x1e0
>    iopt_area_fill_domains+0x106/0x300
>    iopt_map_pages+0x1bc/0x290
>    iopt_map_user_pages+0xe8/0x1e0
>    ? xas_load+0x9/0xb0
>    iommufd_ioas_map+0xc9/0x1c0
>    iommufd_fops_ioctl+0xff/0x1b0
>    __x64_sys_ioctl+0x87/0xc0
>    do_syscall_64+0x50/0x110
>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
> 
> The qi_batch structure is allocated when the domain is attached to a
> device for the first time. When a mapping is set up before that,
> qi_batch is dereferenced to do a batched flush, triggering the above
> issue.
> 
> Fix it by checking the qi_batch pointer and bypassing batched flushing
> if no device is attached.
> 
> Cc: stable@vger.kernel.org
> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   drivers/iommu/intel/cache.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
> index e5b89f728ad3..bb9dae9a7fba 100644
> --- a/drivers/iommu/intel/cache.c
> +++ b/drivers/iommu/intel/cache.c
> @@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
>   
>   static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
>   {
> -	if (!iommu || !batch->index)
> +	if (!iommu || !batch || !batch->index)

this is the same issue as in the link below. :) And we should fix it by
allocating the qi_batch for parent domains. A nested parent domain is
not going to be attached to a device at all. It acts more as a
background domain, so this fix would miss future cache flushes and lead
to bigger problems. :)

https://lore.kernel.org/linux-iommu/20241210130322.17175-1-yi.l.liu@intel.com/
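
Roughly this direction, i.e. allocate the batch up front for domains
that can never see a direct device attach. A minimal sketch (the helper
name below is made up; the real code is in the patch at the link
above):

  static int domain_qi_batch_alloc(struct dmar_domain *domain)
  {
          if (domain->qi_batch)
                  return 0;

          domain->qi_batch = kzalloc(sizeof(*domain->qi_batch), GFP_KERNEL);
          return domain->qi_batch ? 0 : -ENOMEM;
  }

called when a domain is allocated as a nested parent, not only at first
device attach.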

>   		return;
>   
>   	qi_submit_sync(iommu, batch->descs, batch->index, 0);

-- 
Regards,
Yi Liu
RE: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
Posted by Duan, Zhenzhong 1 year ago

>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Sent: Thursday, December 12, 2024 3:45 PM
>Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>cache_tag_flush_range_np()
>
>On 2024/12/12 15:24, Zhenzhong Duan wrote:
>> When setting up a mapping on a paging domain before the domain is
>> attached to any device, a NULL pointer dereference triggers as below:
>>
>>   BUG: kernel NULL pointer dereference, address: 0000000000000200
>>   #PF: supervisor read access in kernel mode
>>   #PF: error_code(0x0000) - not-present page
>>   RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
>>   ...
>>   Call Trace:
>>    <TASK>
>>    ? __die+0x20/0x70
>>    ? page_fault_oops+0x80/0x150
>>    ? do_user_addr_fault+0x5f/0x670
>>    ? pfn_to_dma_pte+0xca/0x280
>>    ? exc_page_fault+0x78/0x170
>>    ? asm_exc_page_fault+0x22/0x30
>>    ? cache_tag_flush_range_np+0x114/0x1f0
>>    intel_iommu_iotlb_sync_map+0x16/0x20
>>    iommu_map+0x59/0xd0
>>    batch_to_domain+0x154/0x1e0
>>    iopt_area_fill_domains+0x106/0x300
>>    iopt_map_pages+0x1bc/0x290
>>    iopt_map_user_pages+0xe8/0x1e0
>>    ? xas_load+0x9/0xb0
>>    iommufd_ioas_map+0xc9/0x1c0
>>    iommufd_fops_ioctl+0xff/0x1b0
>>    __x64_sys_ioctl+0x87/0xc0
>>    do_syscall_64+0x50/0x110
>>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> The qi_batch structure is allocated when the domain is attached to a
>> device for the first time. When a mapping is set up before that,
>> qi_batch is dereferenced to do a batched flush, triggering the above
>> issue.
>>
>> Fix it by checking the qi_batch pointer and bypassing batched flushing
>> if no device is attached.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   drivers/iommu/intel/cache.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
>> index e5b89f728ad3..bb9dae9a7fba 100644
>> --- a/drivers/iommu/intel/cache.c
>> +++ b/drivers/iommu/intel/cache.c
>> @@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
>>
>>   static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
>>   {
>> -	if (!iommu || !batch->index)
>> +	if (!iommu || !batch || !batch->index)
>
>this is the same issue as in the link below. :) And we should fix it by
>allocating the qi_batch for parent domains. A nested parent domain is
>not going to be attached to a device at all. It acts more as a
>background domain, so this fix would miss future cache flushes and lead
>to bigger problems. :)
>
>https://lore.kernel.org/linux-iommu/20241210130322.17175-1-yi.l.liu@intel.com/

Ah, just saw this 😊
This patch tries to fix an issue where a mapping is set up on a paging
domain before it's attached to a device; your patch fixed an issue
where a nested parent domain's qi_batch is not allocated even though
the nested domain is attached to a device. I think they are different
issues?
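
To spell out the two cases as I understand them (a rough sketch from
memory of the uAPI, so take the details with a grain of salt):

  Case this patch targets:
    1) a paging hwpt is allocated on an IOAS, with no device attached;
    2) IOMMU_IOAS_MAP (or hwpt allocation on an IOAS that already has
       mappings) pushes mappings into the domain;
    3) iotlb_sync_map flushes through domain->qi_batch, which was never
       allocated.

  Case your patch targets:
    1) a hwpt is allocated with IOMMU_HWPT_ALLOC_NEST_PARENT and only
       ever used as the parent of a nested domain;
    2) attaching the nested domain gives the parent cache tags, but its
       qi_batch is never allocated;
    3) a later map/unmap on the parent's IOAS dereferences the NULL
       qi_batch.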

Thanks
Zhenzhong
Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
Posted by Yi Liu 1 year ago
On 2024/12/12 16:19, Duan, Zhenzhong wrote:
> 
> 
>> -----Original Message-----
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Thursday, December 12, 2024 3:45 PM
>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>> cache_tag_flush_range_np()
>>
>> On 2024/12/12 15:24, Zhenzhong Duan wrote:
>>> When setting up a mapping on a paging domain before the domain is
>>> attached to any device, a NULL pointer dereference triggers as below:
>>>
>>>    BUG: kernel NULL pointer dereference, address: 0000000000000200
>>>    #PF: supervisor read access in kernel mode
>>>    #PF: error_code(0x0000) - not-present page
>>>    RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
>>>    ...
>>>    Call Trace:
>>>     <TASK>
>>>     ? __die+0x20/0x70
>>>     ? page_fault_oops+0x80/0x150
>>>     ? do_user_addr_fault+0x5f/0x670
>>>     ? pfn_to_dma_pte+0xca/0x280
>>>     ? exc_page_fault+0x78/0x170
>>>     ? asm_exc_page_fault+0x22/0x30
>>>     ? cache_tag_flush_range_np+0x114/0x1f0
>>>     intel_iommu_iotlb_sync_map+0x16/0x20
>>>     iommu_map+0x59/0xd0
>>>     batch_to_domain+0x154/0x1e0
>>>     iopt_area_fill_domains+0x106/0x300
>>>     iopt_map_pages+0x1bc/0x290
>>>     iopt_map_user_pages+0xe8/0x1e0
>>>     ? xas_load+0x9/0xb0
>>>     iommufd_ioas_map+0xc9/0x1c0
>>>     iommufd_fops_ioctl+0xff/0x1b0
>>>     __x64_sys_ioctl+0x87/0xc0
>>>     do_syscall_64+0x50/0x110
>>>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>
>>> The qi_batch structure is allocated when the domain is attached to a
>>> device for the first time. When a mapping is set up before that,
>>> qi_batch is dereferenced to do a batched flush, triggering the above
>>> issue.
>>>
>>> Fix it by checking the qi_batch pointer and bypassing batched flushing
>>> if no device is attached.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>    drivers/iommu/intel/cache.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
>>> index e5b89f728ad3..bb9dae9a7fba 100644
>>> --- a/drivers/iommu/intel/cache.c
>>> +++ b/drivers/iommu/intel/cache.c
>>> @@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
>>>
>>>    static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
>>>    {
>>> -	if (!iommu || !batch->index)
>>> +	if (!iommu || !batch || !batch->index)
>>
>> this is the same issue as in the link below. :) And we should fix it by
>> allocating the qi_batch for parent domains. A nested parent domain is
>> not going to be attached to a device at all. It acts more as a
>> background domain, so this fix would miss future cache flushes and lead
>> to bigger problems. :)
>>
>> https://lore.kernel.org/linux-iommu/20241210130322.17175-1-yi.l.liu@intel.com/
> 
> Ah, just saw this 😊
> This patch tries to fix an issue where a mapping is set up on a paging
> domain before it's attached to a device; your patch fixed an issue
> where a nested parent domain's qi_batch is not allocated even though
> the nested domain is attached to a device. I think they are different
> issues?

Oops, I see. I think your case is allocating a hwpt based on an IOAS
that already has mappings. When the hwpt is added to it, all the
mappings of this IOAS are pushed to the hwpt. But the hwpt has not been
attached yet, so it hits this issue. I remember there is an
immediate_attach bool to let iommufd_hwpt_paging_alloc() do an attach
before calling iopt_table_add_domain(), which pushes the IOAS mappings
to the domain.

One possible fix is to set immediate_attach to true. But I doubt it
would be accepted, since it was introduced due to a gap on the ARM
side. Once that gap is resolved, this behavior would go away as well.

So back to this issue: with this change, the flush would be skipped. It
looks ok to me to skip the cache flush on the map path. And we should
not expect any unmap on this domain since there is no device attached
(a parent domain is an exception), hence there is nothing to flush even
if there is an unmap in the domain's IOAS. So it appears to be an
acceptable fix. @Baolu, your opinion?
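
For clarity, the ordering I'm referring to is roughly this (simplified
from memory, not the exact iommufd code):

  /* in iommufd_hwpt_paging_alloc(), roughly: */
  if (immediate_attach) {
          /* attach first, so the domain gets its cache tags (and with
           * them, on the VT-d side, its qi_batch)...
           */
          rc = iommufd_hw_pagetable_attach(hwpt, idev);
          if (rc)
                  goto out_abort;
  }

  /* ...before the existing IOAS mappings are pushed into the new
   * domain, which is what triggers the map-path flush.
   */
  rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);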

-- 
Regards,
Yi Liu
RE: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
Posted by Duan, Zhenzhong 1 year ago
Hi Yi,

>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Sent: Thursday, December 12, 2024 5:29 PM
>Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>cache_tag_flush_range_np()
>
>On 2024/12/12 16:19, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>> Sent: Thursday, December 12, 2024 3:45 PM
>>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>>> cache_tag_flush_range_np()
>>>
>>> On 2024/12/12 15:24, Zhenzhong Duan wrote:
>>>> When setting up a mapping on a paging domain before the domain is
>>>> attached to any device, a NULL pointer dereference triggers as below:
>>>>
>>>>    BUG: kernel NULL pointer dereference, address: 0000000000000200
>>>>    #PF: supervisor read access in kernel mode
>>>>    #PF: error_code(0x0000) - not-present page
>>>>    RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
>>>>    ...
>>>>    Call Trace:
>>>>     <TASK>
>>>>     ? __die+0x20/0x70
>>>>     ? page_fault_oops+0x80/0x150
>>>>     ? do_user_addr_fault+0x5f/0x670
>>>>     ? pfn_to_dma_pte+0xca/0x280
>>>>     ? exc_page_fault+0x78/0x170
>>>>     ? asm_exc_page_fault+0x22/0x30
>>>>     ? cache_tag_flush_range_np+0x114/0x1f0
>>>>     intel_iommu_iotlb_sync_map+0x16/0x20
>>>>     iommu_map+0x59/0xd0
>>>>     batch_to_domain+0x154/0x1e0
>>>>     iopt_area_fill_domains+0x106/0x300
>>>>     iopt_map_pages+0x1bc/0x290
>>>>     iopt_map_user_pages+0xe8/0x1e0
>>>>     ? xas_load+0x9/0xb0
>>>>     iommufd_ioas_map+0xc9/0x1c0
>>>>     iommufd_fops_ioctl+0xff/0x1b0
>>>>     __x64_sys_ioctl+0x87/0xc0
>>>>     do_syscall_64+0x50/0x110
>>>>     entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>
>>>> The qi_batch structure is allocated when the domain is attached to a
>>>> device for the first time. When a mapping is set up before that,
>>>> qi_batch is dereferenced to do a batched flush, triggering the above
>>>> issue.
>>>>
>>>> Fix it by checking the qi_batch pointer and bypassing batched flushing
>>>> if no device is attached.
>>>>
>>>> Cc: stable@vger.kernel.org
>>>> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>>    drivers/iommu/intel/cache.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
>>>> index e5b89f728ad3..bb9dae9a7fba 100644
>>>> --- a/drivers/iommu/intel/cache.c
>>>> +++ b/drivers/iommu/intel/cache.c
>>>> @@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
>>>>
>>>>    static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
>>>>    {
>>>> -	if (!iommu || !batch->index)
>>>> +	if (!iommu || !batch || !batch->index)
>>>
>>> this is the same issue as in the link below. :) And we should fix it by
>>> allocating the qi_batch for parent domains. A nested parent domain is
>>> not going to be attached to a device at all. It acts more as a
>>> background domain, so this fix would miss future cache flushes and lead
>>> to bigger problems. :)
>>>
>>> https://lore.kernel.org/linux-iommu/20241210130322.17175-1-yi.l.liu@intel.com/
>>
>> Ah, just saw this 😊
>> This patch tries to fix an issue where a mapping is set up on a paging
>> domain before it's attached to a device; your patch fixed an issue
>> where a nested parent domain's qi_batch is not allocated even though
>> the nested domain is attached to a device. I think they are different
>> issues?
>
>Oops, I see. I think your case is allocating a hwpt based on an IOAS
>that already has mappings. When the hwpt is added to it, all the
>mappings of this IOAS are pushed to the hwpt. But the hwpt has not been
>attached yet, so it hits this issue. I remember there is an
>immediate_attach bool to let iommufd_hwpt_paging_alloc() do an attach
>before calling iopt_table_add_domain(), which pushes the IOAS mappings
>to the domain.
>
>One possible fix is to set immediate_attach to true. But I doubt it
>would be accepted, since it was introduced due to a gap on the ARM
>side. Once that gap is resolved, this behavior would go away as well.
>
>So back to this issue: with this change, the flush would be skipped. It
>looks ok to me to skip the cache flush on the map path. And we should
>not expect any unmap on this domain since there is no device attached
>(a parent domain is an exception), hence there is nothing to flush even
>if there is an unmap in the domain's IOAS. So it appears to be an
>acceptable fix. @Baolu, your opinion?

Hold on, it looks like I was wrong in analyzing the related code,
qi_batch_flush_descs(). The iommu should always be NULL in my suspected
case, so qi_batch_flush_descs() will return early without issue.

I reproduced the backtrace while playing with iommufd QEMU nesting. I
think your previous comment is correct; I misunderstood the root cause.
It's indeed due to the nesting parent domain, not a plain paging domain.
Please ignore this patch.
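
I.e., roughly (simplified, as I read cache.c from memory):

  void cache_tag_flush_range_np(struct dmar_domain *domain,
                                unsigned long start, unsigned long end)
  {
          struct intel_iommu *iommu = NULL;
          struct cache_tag *tag;

          list_for_each_entry(tag, &domain->cache_tags, node) {
                  iommu = tag->iommu;
                  /* ... queue invalidations into domain->qi_batch ... */
          }

          /* With no device attached the tag list is empty, so iommu
           * stays NULL and the !iommu check in qi_batch_flush_descs()
           * returns before batch is ever dereferenced.
           */
          qi_batch_flush_descs(iommu, domain->qi_batch);
  }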

Thanks
Zhenzhong
Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
Posted by Yi Liu 1 year ago

On 2024/12/12 18:01, Duan, Zhenzhong wrote:
> Hi Yi,
> 
>> -----Original Message-----
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Sent: Thursday, December 12, 2024 5:29 PM
>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>> cache_tag_flush_range_np()
>>
>> On 2024/12/12 16:19, Duan, Zhenzhong wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>> Sent: Thursday, December 12, 2024 3:45 PM
>>>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>>>> cache_tag_flush_range_np()
>>>>
>>>> On 2024/12/12 15:24, Zhenzhong Duan wrote:
>>>>> When setting up a mapping on a paging domain before the domain is
>>>>> attached to any device, a NULL pointer dereference triggers as below:
>>>>>
>>>>>     BUG: kernel NULL pointer dereference, address: 0000000000000200
>>>>>     #PF: supervisor read access in kernel mode
>>>>>     #PF: error_code(0x0000) - not-present page
>>>>>     RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
>>>>>     ...
>>>>>     Call Trace:
>>>>>      <TASK>
>>>>>      ? __die+0x20/0x70
>>>>>      ? page_fault_oops+0x80/0x150
>>>>>      ? do_user_addr_fault+0x5f/0x670
>>>>>      ? pfn_to_dma_pte+0xca/0x280
>>>>>      ? exc_page_fault+0x78/0x170
>>>>>      ? asm_exc_page_fault+0x22/0x30
>>>>>      ? cache_tag_flush_range_np+0x114/0x1f0
>>>>>      intel_iommu_iotlb_sync_map+0x16/0x20
>>>>>      iommu_map+0x59/0xd0
>>>>>      batch_to_domain+0x154/0x1e0
>>>>>      iopt_area_fill_domains+0x106/0x300
>>>>>      iopt_map_pages+0x1bc/0x290
>>>>>      iopt_map_user_pages+0xe8/0x1e0
>>>>>      ? xas_load+0x9/0xb0
>>>>>      iommufd_ioas_map+0xc9/0x1c0
>>>>>      iommufd_fops_ioctl+0xff/0x1b0
>>>>>      __x64_sys_ioctl+0x87/0xc0
>>>>>      do_syscall_64+0x50/0x110
>>>>>      entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>
>>>>> The qi_batch structure is allocated when the domain is attached to a
>>>>> device for the first time. When a mapping is set up before that,
>>>>> qi_batch is dereferenced to do a batched flush, triggering the above
>>>>> issue.
>>>>>
>>>>> Fix it by checking the qi_batch pointer and bypassing batched flushing
>>>>> if no device is attached.
>>>>>
>>>>> Cc: stable@vger.kernel.org
>>>>> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>>     drivers/iommu/intel/cache.c | 2 +-
>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
>>>>> index e5b89f728ad3..bb9dae9a7fba 100644
>>>>> --- a/drivers/iommu/intel/cache.c
>>>>> +++ b/drivers/iommu/intel/cache.c
>>>>> @@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
>>>>>
>>>>>     static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
>>>>>     {
>>>>> -	if (!iommu || !batch->index)
>>>>> +	if (!iommu || !batch || !batch->index)
>>>>
>>>> this is the same issue as in the link below. :) And we should fix it by
>>>> allocating the qi_batch for parent domains. A nested parent domain is
>>>> not going to be attached to a device at all. It acts more as a
>>>> background domain, so this fix would miss future cache flushes and lead
>>>> to bigger problems. :)
>>>>
>>>> https://lore.kernel.org/linux-iommu/20241210130322.17175-1-yi.l.liu@intel.com/
>>>
>>> Ah, just saw this 😊
>>> This patch tries to fix an issue where a mapping is set up on a paging
>>> domain before it's attached to a device; your patch fixed an issue
>>> where a nested parent domain's qi_batch is not allocated even though
>>> the nested domain is attached to a device. I think they are different
>>> issues?
>>
>> Oops, I see. I think your case is allocating a hwpt based on an IOAS
>> that already has mappings. When the hwpt is added to it, all the
>> mappings of this IOAS are pushed to the hwpt. But the hwpt has not been
>> attached yet, so it hits this issue. I remember there is an
>> immediate_attach bool to let iommufd_hwpt_paging_alloc() do an attach
>> before calling iopt_table_add_domain(), which pushes the IOAS mappings
>> to the domain.
>>
>> One possible fix is to set immediate_attach to true. But I doubt it
>> would be accepted, since it was introduced due to a gap on the ARM
>> side. Once that gap is resolved, this behavior would go away as well.
>>
>> So back to this issue: with this change, the flush would be skipped. It
>> looks ok to me to skip the cache flush on the map path. And we should
>> not expect any unmap on this domain since there is no device attached
>> (a parent domain is an exception), hence there is nothing to flush even
>> if there is an unmap in the domain's IOAS. So it appears to be an
>> acceptable fix. @Baolu, your opinion?
> 
> Hold on, it looks like I was wrong in analyzing the related code,
> qi_batch_flush_descs(). The iommu should always be NULL in my suspected
> case, so qi_batch_flush_descs() will return early without issue.
> 
> I reproduced the backtrace while playing with iommufd QEMU nesting. I
> think your previous comment is correct; I misunderstood the root cause.
> It's indeed due to the nesting parent domain, not a plain paging
> domain. Please ignore this patch.

Great. I also tried allocating a hwpt with an IOAS that already has a
bunch of mappings; it works as the iommu is NULL.

@Baolu, you may ignore this patch as it's already fixed.

-- 
Regards,
Yi Liu
Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in cache_tag_flush_range_np()
Posted by Baolu Lu 1 year ago
On 2024/12/12 19:00, Yi Liu wrote:
> 
> 
> On 2024/12/12 18:01, Duan, Zhenzhong wrote:
>> Hi Yi,
>>
>>> -----Original Message-----
>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>> Sent: Thursday, December 12, 2024 5:29 PM
>>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer dereference in
>>> cache_tag_flush_range_np()
>>>
>>> On 2024/12/12 16:19, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>>> Sent: Thursday, December 12, 2024 3:45 PM
>>>>> Subject: Re: [PATCH] iommu/vt-d: Fix kernel NULL pointer 
>>>>> dereference in
>>>>> cache_tag_flush_range_np()
>>>>>
>>>>> On 2024/12/12 15:24, Zhenzhong Duan wrote:
>>>>>> When setting up a mapping on a paging domain before the domain is
>>>>>> attached to any device, a NULL pointer dereference triggers as below:
>>>>>>
>>>>>>     BUG: kernel NULL pointer dereference, address: 0000000000000200
>>>>>>     #PF: supervisor read access in kernel mode
>>>>>>     #PF: error_code(0x0000) - not-present page
>>>>>>     RIP: 0010:cache_tag_flush_range_np+0x114/0x1f0
>>>>>>     ...
>>>>>>     Call Trace:
>>>>>>      <TASK>
>>>>>>      ? __die+0x20/0x70
>>>>>>      ? page_fault_oops+0x80/0x150
>>>>>>      ? do_user_addr_fault+0x5f/0x670
>>>>>>      ? pfn_to_dma_pte+0xca/0x280
>>>>>>      ? exc_page_fault+0x78/0x170
>>>>>>      ? asm_exc_page_fault+0x22/0x30
>>>>>>      ? cache_tag_flush_range_np+0x114/0x1f0
>>>>>>      intel_iommu_iotlb_sync_map+0x16/0x20
>>>>>>      iommu_map+0x59/0xd0
>>>>>>      batch_to_domain+0x154/0x1e0
>>>>>>      iopt_area_fill_domains+0x106/0x300
>>>>>>      iopt_map_pages+0x1bc/0x290
>>>>>>      iopt_map_user_pages+0xe8/0x1e0
>>>>>>      ? xas_load+0x9/0xb0
>>>>>>      iommufd_ioas_map+0xc9/0x1c0
>>>>>>      iommufd_fops_ioctl+0xff/0x1b0
>>>>>>      __x64_sys_ioctl+0x87/0xc0
>>>>>>      do_syscall_64+0x50/0x110
>>>>>>      entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>>>>
>>>>>> The qi_batch structure is allocated when the domain is attached to a
>>>>>> device for the first time. When a mapping is set up before that,
>>>>>> qi_batch is dereferenced to do a batched flush, triggering the above
>>>>>> issue.
>>>>>>
>>>>>> Fix it by checking the qi_batch pointer and bypassing batched flushing
>>>>>> if no device is attached.
>>>>>>
>>>>>> Cc: stable@vger.kernel.org
>>>>>> Fixes: 705c1cdf1e73 ("iommu/vt-d: Introduce batched cache invalidation")
>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>> ---
>>>>>>     drivers/iommu/intel/cache.c | 2 +-
>>>>>>     1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/iommu/intel/cache.c b/drivers/iommu/intel/cache.c
>>>>>> index e5b89f728ad3..bb9dae9a7fba 100644
>>>>>> --- a/drivers/iommu/intel/cache.c
>>>>>> +++ b/drivers/iommu/intel/cache.c
>>>>>> @@ -264,7 +264,7 @@ static unsigned long calculate_psi_aligned_address(unsigned long start,
>>>>>>
>>>>>>     static void qi_batch_flush_descs(struct intel_iommu *iommu, struct qi_batch *batch)
>>>>>>     {
>>>>>> -	if (!iommu || !batch->index)
>>>>>> +	if (!iommu || !batch || !batch->index)
>>>>>
>>>>> this is the same issue as in the link below. 🙂 And we should fix it by
>>>>> allocating the qi_batch for parent domains. A nested parent domain is
>>>>> not going to be attached to a device at all. It acts more as a
>>>>> background domain, so this fix would miss future cache flushes and lead
>>>>> to bigger problems. 🙂
>>>>>
>>>>> https://lore.kernel.org/linux-iommu/20241210130322.17175-1-yi.l.liu@intel.com/
>>>>
>>>> Ah, just saw this 😊
>>>> This patch tries to fix an issue where a mapping is set up on a paging
>>>> domain before it's attached to a device; your patch fixed an issue
>>>> where a nested parent domain's qi_batch is not allocated even though
>>>> the nested domain is attached to a device. I think they are different
>>>> issues?
>>>
>>> Oops, I see. I think your case is allocating a hwpt based on an IOAS
>>> that already has mappings. When the hwpt is added to it, all the
>>> mappings of this IOAS are pushed to the hwpt. But the hwpt has not been
>>> attached yet, so it hits this issue. I remember there is an
>>> immediate_attach bool to let iommufd_hwpt_paging_alloc() do an attach
>>> before calling iopt_table_add_domain(), which pushes the IOAS mappings
>>> to the domain.
>>>
>>> One possible fix is to set immediate_attach to true. But I doubt it
>>> would be accepted, since it was introduced due to a gap on the ARM
>>> side. Once that gap is resolved, this behavior would go away as well.
>>>
>>> So back to this issue: with this change, the flush would be skipped. It
>>> looks ok to me to skip the cache flush on the map path. And we should
>>> not expect any unmap on this domain since there is no device attached
>>> (a parent domain is an exception), hence there is nothing to flush even
>>> if there is an unmap in the domain's IOAS. So it appears to be an
>>> acceptable fix. @Baolu, your opinion?
>>
>> Hold on, it looks like I was wrong in analyzing the related code,
>> qi_batch_flush_descs(). The iommu should always be NULL in my suspected
>> case, so qi_batch_flush_descs() will return early without issue.
>>
>> I reproduced the backtrace while playing with iommufd QEMU nesting. I
>> think your previous comment is correct; I misunderstood the root cause.
>> It's indeed due to the nesting parent domain, not a plain paging
>> domain. Please ignore this patch.
> 
> Great. I also tried allocating a hwpt with an IOAS that already has a
> bunch of mappings; it works as the iommu is NULL.
> 
> @Baolu, you may ignore this patch as it's already fixed.

Okay, thank you!