[PATCH v4 5/5] iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
Posted by Will Deacon 1 year, 11 months ago
From: Nicolin Chen <nicolinc@nvidia.com>

The swiotlb does not support a mapping size > swiotlb_max_mapping_size().
However, with a 64KB PAGE_SIZE configuration, an NVMe device has been
observed to request mappings between 300KB and 512KB, which exceed that
limit and so fail, even though the default swiotlb pool has plenty of
free slots:
    systemd[1]: Started Journal Service.
 => nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
    note: journal-offline[392] exited with irqs disabled
    note: journal-offline[392] exited with preempt_count 1

Call trace:
[    3.099918]  swiotlb_tbl_map_single+0x214/0x240
[    3.099921]  iommu_dma_map_page+0x218/0x328
[    3.099928]  dma_map_page_attrs+0x2e8/0x3a0
[    3.101985]  nvme_prep_rq.part.0+0x408/0x878 [nvme]
[    3.102308]  nvme_queue_rqs+0xc0/0x300 [nvme]
[    3.102313]  blk_mq_flush_plug_list.part.0+0x57c/0x600
[    3.102321]  blk_add_rq_to_plug+0x180/0x2a0
[    3.102323]  blk_mq_submit_bio+0x4c8/0x6b8
[    3.103463]  __submit_bio+0x44/0x220
[    3.103468]  submit_bio_noacct_nocheck+0x2b8/0x360
[    3.103470]  submit_bio_noacct+0x180/0x6c8
[    3.103471]  submit_bio+0x34/0x130
[    3.103473]  ext4_bio_write_folio+0x5a4/0x8c8
[    3.104766]  mpage_submit_folio+0xa0/0x100
[    3.104769]  mpage_map_and_submit_buffers+0x1a4/0x400
[    3.104771]  ext4_do_writepages+0x6a0/0xd78
[    3.105615]  ext4_writepages+0x80/0x118
[    3.105616]  do_writepages+0x90/0x1e8
[    3.105619]  filemap_fdatawrite_wbc+0x94/0xe0
[    3.105622]  __filemap_fdatawrite_range+0x68/0xb8
[    3.106656]  file_write_and_wait_range+0x84/0x120
[    3.106658]  ext4_sync_file+0x7c/0x4c0
[    3.106660]  vfs_fsync_range+0x3c/0xa8
[    3.106663]  do_fsync+0x44/0xc0

Since untrusted devices may take the swiotlb bounce-buffer path under
dma-iommu, such devices must not map a size larger than
swiotlb_max_mapping_size().

To fix this bug, add iommu_dma_max_mapping_size(), which returns
swiotlb_max_mapping_size() for untrusted devices, complementing the
iova_rcache_range() limit already reported by iommu_dma_opt_mapping_size().
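
As an aside (illustration only, not part of this patch): drivers consume
this limit via dma_max_mapping_size(), so a caller capping its transfer
size would do something like the hypothetical helper below, which resolves
to swiotlb_max_mapping_size() for an untrusted device behind an IOMMU:

	#include <linux/dma-mapping.h>
	#include <linux/minmax.h>

	/*
	 * Hypothetical helper: clamp a desired transfer size to what the
	 * DMA layer can actually map for this device. With this patch, an
	 * untrusted device reports swiotlb_max_mapping_size() here instead
	 * of SIZE_MAX.
	 */
	static size_t example_max_transfer(struct device *dev, size_t want)
	{
		return min_t(size_t, want, dma_max_mapping_size(dev));
	}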

Fixes: 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers")
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Link: https://lore.kernel.org/r/ee51a3a5c32cf885b18f6416171802669f4a718a.1707851466.git.nicolinc@nvidia.com
Signed-off-by: Will Deacon <will@kernel.org>
---
 drivers/iommu/dma-iommu.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 50ccc4f1ef81..7d1a20da6d94 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1706,6 +1706,13 @@ static size_t iommu_dma_opt_mapping_size(void)
 	return iova_rcache_range();
 }
 
+static size_t iommu_dma_max_mapping_size(struct device *dev)
+{
+	if (is_swiotlb_active(dev) && dev_is_untrusted(dev))
+		return swiotlb_max_mapping_size(dev);
+	return SIZE_MAX;
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
 	.flags			= DMA_F_PCI_P2PDMA_SUPPORTED,
 	.alloc			= iommu_dma_alloc,
@@ -1728,6 +1735,7 @@ static const struct dma_map_ops iommu_dma_ops = {
 	.unmap_resource		= iommu_dma_unmap_resource,
 	.get_merge_boundary	= iommu_dma_get_merge_boundary,
 	.opt_mapping_size	= iommu_dma_opt_mapping_size,
+	.max_mapping_size       = iommu_dma_max_mapping_size,
 };
 
 /*
-- 
2.44.0.rc0.258.g7320e95886-goog
Re: [PATCH v4 5/5] iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
Posted by Christoph Hellwig 1 year, 11 months ago
On Wed, Feb 21, 2024 at 11:35:04AM +0000, Will Deacon wrote:
> +static size_t iommu_dma_max_mapping_size(struct device *dev)
> +{
> +	if (is_swiotlb_active(dev) && dev_is_untrusted(dev))
> +		return swiotlb_max_mapping_size(dev);

Curious: do we really need both checks here?  If swiotlb is active
for a device (for whatever reason), aren't we then always bound
by the max size?  If not please add a comment explaining it.
Re: [PATCH v4 5/5] iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
Posted by Robin Murphy 1 year, 11 months ago
On 27/02/2024 3:40 pm, Christoph Hellwig wrote:
> On Wed, Feb 21, 2024 at 11:35:04AM +0000, Will Deacon wrote:
>> +static size_t iommu_dma_max_mapping_size(struct device *dev)
>> +{
>> +	if (is_swiotlb_active(dev) && dev_is_untrusted(dev))
>> +		return swiotlb_max_mapping_size(dev);
> 
> Curious: do we really need both checks here?  If swiotlb is active
> for a device (for whatever reason), aren't we then always bound
> by the max size?  If not please add a comment explaining it.
> 

Oh, good point - if we have an untrusted device but SWIOTLB isn't 
initialised for whatever reason, then it doesn't matter what 
max_mapping_size returns because iommu_dma_map_page() is going to bail 
out regardless.
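
For reference, the bail-out in question in iommu_dma_map_page() reads
roughly as follows (paraphrased; exact context varies by kernel version):

	if (dev_use_swiotlb(dev, size, dir) &&
	    iova_offset(iovad, phys | size)) {
		/* Unaligned transfer for an untrusted device: must bounce */
		if (!is_swiotlb_active(dev)) {
			dev_warn_once(dev, "DMA bounce buffers are inactive, unable to map unaligned transaction.\n");
			return DMA_MAPPING_ERROR;
		}
		...
	}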

Thanks,
Robin.
Re: [PATCH v4 5/5] iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
Posted by Will Deacon 1 year, 11 months ago
On Tue, Feb 27, 2024 at 03:53:05PM +0000, Robin Murphy wrote:
> On 27/02/2024 3:40 pm, Christoph Hellwig wrote:
> > On Wed, Feb 21, 2024 at 11:35:04AM +0000, Will Deacon wrote:
> > > +static size_t iommu_dma_max_mapping_size(struct device *dev)
> > > +{
> > > +	if (is_swiotlb_active(dev) && dev_is_untrusted(dev))
> > > +		return swiotlb_max_mapping_size(dev);
> > 
> > Curious: do we really need both checks here?  If swiotlb is active
> > for a device (for whatever reason), aren't we then always bound
> > by the max size?  If not please add a comment explaining it.
> > 
> 
> Oh, good point - if we have an untrusted device but SWIOTLB isn't
> initialised for whatever reason, then it doesn't matter what
> max_mapping_size returns because iommu_dma_map_page() is going to bail out
> regardless.

Makes sense. Since this is all internal to the IOMMU DMA code, I can just
drop the first part of the check.
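
i.e. something like this (untested):

	static size_t iommu_dma_max_mapping_size(struct device *dev)
	{
		if (dev_is_untrusted(dev))
			return swiotlb_max_mapping_size(dev);

		return SIZE_MAX;
	}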

I'll get a v5 out shortly.

Will
Re: [PATCH v4 5/5] iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
Posted by Robin Murphy 1 year, 11 months ago
On 21/02/2024 11:35 am, Will Deacon wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> The swiotlb does not support a mapping size > swiotlb_max_mapping_size().
> However, with a 64KB PAGE_SIZE configuration, an NVMe device has been
> observed to request mappings between 300KB and 512KB, which exceed that
> limit and so fail, even though the default swiotlb pool has plenty of
> free slots:
> [...]
> 
> Since untrusted devices may take the swiotlb bounce-buffer path under
> dma-iommu, such devices must not map a size larger than
> swiotlb_max_mapping_size().
> 
> To fix this bug, add iommu_dma_max_mapping_size(), which returns
> swiotlb_max_mapping_size() for untrusted devices, complementing the
> iova_rcache_range() limit already reported by iommu_dma_opt_mapping_size().

On the basis that this is at least far closer to correct than doing nothing,

Acked-by: Robin Murphy <robin.murphy@arm.com>

TBH I'm scared to think about theoretical correctness for all the 
interactions between the IOVA granule and min_align_mask, since just the 
SWIOTLB stuff is bad enough, even before you realise the ways that the 
IOVA allocation isn't necessarily right either. However I reckon as long 
as we don't ever see a granule smaller than IO_TLB_SIZE, and/or a 
min_align_mask larger than a granule, then this should probably work 
well enough as-is.
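
In other words, roughly this hypothetical invariant (sketch only, not
actual kernel code):

	#include <linux/dma-mapping.h>
	#include <linux/iova.h>
	#include <linux/swiotlb.h>

	/* Each IOVA granule covers at least one swiotlb slot, and the
	 * device's min_align_mask fits within a single granule. */
	static bool bounce_assumptions_hold(struct device *dev,
					    struct iova_domain *iovad)
	{
		return iovad->granule >= IO_TLB_SIZE &&
		       dma_get_min_align_mask(dev) < iovad->granule;
	}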

Cheers,
Robin.
