From: Joerg Roedel <jroedel@suse.de>
Add two new kernel command line parameters to limit the page-sizes
used for v1 page-tables:
nohugepages - Limits page-sizes to 4KiB
v2_pgsizes_only - Limits page-sizes to 4KiB/2MiB/1GiB; the
same as the sizes used with v2 page-tables
This is needed for multiple scenarios. When assigning devices to
SEV-SNP guests the IOMMU page-sizes need to match the sizes in the RMP
table, otherwise the device will not be able to access all shared
memory.
Also, some ATS devices do not work properly with arbitrary IO
page-sizes as supported by AMD-Vi, so limiting the sizes used by the
driver is a suitable workaround.
All in all, these parameters are only workarounds until the IOMMU core
and related APIs gain the ability to negotiate the page-sizes in a
better way.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
Documentation/admin-guide/kernel-parameters.txt | 17 +++++++++++------
drivers/iommu/amd/amd_iommu.h | 1 +
drivers/iommu/amd/amd_iommu_types.h | 4 ++++
drivers/iommu/amd/init.c | 8 ++++++++
drivers/iommu/amd/io_pgtable.c | 2 +-
5 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 09126bb8cc9f..6d6630aec46c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -333,12 +333,17 @@
allowed anymore to lift isolation
requirements as needed. This option
does not override iommu=pt
- force_enable - Force enable the IOMMU on platforms known
- to be buggy with IOMMU enabled. Use this
- option with care.
- pgtbl_v1 - Use v1 page table for DMA-API (Default).
- pgtbl_v2 - Use v2 page table for DMA-API.
- irtcachedis - Disable Interrupt Remapping Table (IRT) caching.
+ force_enable - Force enable the IOMMU on platforms known
+ to be buggy with IOMMU enabled. Use this
+ option with care.
+ pgtbl_v1 - Use v1 page table for DMA-API (Default).
+ pgtbl_v2 - Use v2 page table for DMA-API.
+ irtcachedis - Disable Interrupt Remapping Table (IRT) caching.
+ nohugepages - Limit page-sizes used for v1 page-tables
+ to 4 KiB.
+ v2_pgsizes_only - Limit page-sizes used for v1 page-tables
+ to 4KiB/2MiB/1GiB.
+
amd_iommu_dump= [HW,X86-64]
Enable AMD IOMMU driver option to dump the ACPI table
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index 29e6e71f7f9a..6386fa4556d9 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -43,6 +43,7 @@ int amd_iommu_enable_faulting(unsigned int cpu);
extern int amd_iommu_guest_ir;
extern enum io_pgtable_fmt amd_iommu_pgtable;
extern int amd_iommu_gpt_level;
+extern unsigned long amd_iommu_pgsize_bitmap;
/* Protection domain ops */
struct protection_domain *protection_domain_alloc(unsigned int type, int nid);
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 35aa4ff020f5..601fb4ee6900 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -293,6 +293,10 @@
* Page sizes >= the 52 bit max physical address of the CPU are not supported.
*/
#define AMD_IOMMU_PGSIZES (GENMASK_ULL(51, 12) ^ SZ_512G)
+
+/* Special mode where page-sizes are limited to 4 KiB */
+#define AMD_IOMMU_PGSIZES_4K (PAGE_SIZE)
+
/* 4K, 2MB, 1G page sizes are supported */
#define AMD_IOMMU_PGSIZES_V2 (PAGE_SIZE | (1ULL << 21) | (1ULL << 30))
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 6b15ce09e78d..43131c3a2172 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -192,6 +192,8 @@ bool amdr_ivrs_remap_support __read_mostly;
bool amd_iommu_force_isolation __read_mostly;
+unsigned long amd_iommu_pgsize_bitmap __ro_after_init = AMD_IOMMU_PGSIZES;
+
/*
* AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
* to know which ones are already in use.
@@ -3492,6 +3494,12 @@ static int __init parse_amd_iommu_options(char *str)
amd_iommu_pgtable = AMD_IOMMU_V2;
} else if (strncmp(str, "irtcachedis", 11) == 0) {
amd_iommu_irtcachedis = true;
+ } else if (strncmp(str, "nohugepages", 11) == 0) {
+ pr_info("Restricting V1 page-sizes to 4KiB\n");
+ amd_iommu_pgsize_bitmap = AMD_IOMMU_PGSIZES_4K;
+ } else if (strncmp(str, "v2_pgsizes_only", 15) == 0) {
+ pr_info("Restricting V1 page-sizes to 4KiB/2MiB/1GiB\n");
+ amd_iommu_pgsize_bitmap = AMD_IOMMU_PGSIZES_V2;
} else {
pr_notice("Unknown option - '%s'\n", str);
}
diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
index 14f62c420e4a..804b788f3f16 100644
--- a/drivers/iommu/amd/io_pgtable.c
+++ b/drivers/iommu/amd/io_pgtable.c
@@ -548,7 +548,7 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
return NULL;
pgtable->mode = PAGE_MODE_3_LEVEL;
- cfg->pgsize_bitmap = AMD_IOMMU_PGSIZES;
+ cfg->pgsize_bitmap = amd_iommu_pgsize_bitmap;
cfg->ias = IOMMU_IN_ADDR_BIT_SIZE;
cfg->oas = IOMMU_OUT_ADDR_BIT_SIZE;
--
2.46.0
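To make the effect of the three bitmaps concrete, here is a minimal userspace sketch (not part of the patch) that rebuilds them and checks which mapping sizes each one permits. The `SZ_*` constants, the `GENMASK_ULL` re-definition, the `PGSIZES_*` names and both helper functions are stand-ins written for this illustration; it also assumes PAGE_SIZE is 4 KiB, as on x86-64.

```c
#include <stdint.h>

/* Userspace stand-ins for kernel constants (assumed values, x86-64). */
#define SZ_4K    (1ULL << 12)
#define SZ_2M    (1ULL << 21)
#define SZ_1G    (1ULL << 30)
#define SZ_512G  (1ULL << 39)

/* Bits h..l set, inclusive; mirrors what the kernel's GENMASK_ULL(h, l) yields. */
#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (63 - (h))))

/* The three bitmaps from the patch, with PAGE_SIZE taken to be 4 KiB. */
#define PGSIZES_V1_DEFAULT (GENMASK_ULL(51, 12) ^ SZ_512G)
#define PGSIZES_4K         (SZ_4K)
#define PGSIZES_V2         (SZ_4K | SZ_2M | SZ_1G)

/* Count how many distinct page sizes a bitmap advertises. */
int pgsize_count(uint64_t bitmap)
{
	int n = 0;

	/* Clear the lowest set bit until none are left. */
	for (; bitmap; bitmap &= bitmap - 1)
		n++;
	return n;
}

/* Return nonzero if 'size' is a power of two that the bitmap advertises. */
int pgsize_supported(uint64_t bitmap, uint64_t size)
{
	return (size & (size - 1)) == 0 && (bitmap & size) != 0;
}
```

With these definitions the default v1 bitmap advertises every power-of-two size from 4 KiB up to 2^51 bytes except 512 GiB, while booting with amd_iommu=nohugepages or amd_iommu=v2_pgsizes_only would narrow that to one or three sizes respectively.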
On 9/5/2024 12:52 PM, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> Add two new kernel command line parameters to limit the page-sizes
> used for v1 page-tables:
>
> nohugepages - Limits page-sizes to 4KiB
>
> v2_pgsizes_only - Limits page-sizes to 4Kib/2Mib/1GiB; The
> same as the sizes used with v2 page-tables
>
> This is needed for multiple scenarios. When assigning devices to
> SEV-SNP guests the IOMMU page-sizes need to match the sizes in the RMP
> table, otherwise the device will not be able to access all shared
> memory.
>
> Also, some ATS devices do not work properly with arbitrary IO
> page-sizes as supported by AMD-Vi, so limiting the sizes used by the
> driver is a suitable workaround.
>
> All-in-all, these parameters are only workarounds until the IOMMU core
> and related APIs gather the ability to negotiate the page-sizes in a
> better way.
Thanks! Patch looks good to me.
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
-Vasant
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 17 +++++++++++------
> drivers/iommu/amd/amd_iommu.h | 1 +
> drivers/iommu/amd/amd_iommu_types.h | 4 ++++
> drivers/iommu/amd/init.c | 8 ++++++++
> drivers/iommu/amd/io_pgtable.c | 2 +-
> 5 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 09126bb8cc9f..6d6630aec46c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -333,12 +333,17 @@
> allowed anymore to lift isolation
> requirements as needed. This option
> does not override iommu=pt
> - force_enable - Force enable the IOMMU on platforms known
> - to be buggy with IOMMU enabled. Use this
> - option with care.
> - pgtbl_v1 - Use v1 page table for DMA-API (Default).
> - pgtbl_v2 - Use v2 page table for DMA-API.
> - irtcachedis - Disable Interrupt Remapping Table (IRT) caching.
> + force_enable - Force enable the IOMMU on platforms known
> + to be buggy with IOMMU enabled. Use this
> + option with care.
> + pgtbl_v1 - Use v1 page table for DMA-API (Default).
> + pgtbl_v2 - Use v2 page table for DMA-API.
> + irtcachedis - Disable Interrupt Remapping Table (IRT) caching.
> + nohugepages - Limit page-sizes used for v1 page-tables
> + to 4 KiB.
> + v2_pgsizes_only - Limit page-sizes used for v1 page-tables
> + to 4KiB/2Mib/1GiB.
> +
>
> amd_iommu_dump= [HW,X86-64]
> Enable AMD IOMMU driver option to dump the ACPI table
> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index 29e6e71f7f9a..6386fa4556d9 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -43,6 +43,7 @@ int amd_iommu_enable_faulting(unsigned int cpu);
> extern int amd_iommu_guest_ir;
> extern enum io_pgtable_fmt amd_iommu_pgtable;
> extern int amd_iommu_gpt_level;
> +extern unsigned long amd_iommu_pgsize_bitmap;
>
> /* Protection domain ops */
> struct protection_domain *protection_domain_alloc(unsigned int type, int nid);
> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
> index 35aa4ff020f5..601fb4ee6900 100644
> --- a/drivers/iommu/amd/amd_iommu_types.h
> +++ b/drivers/iommu/amd/amd_iommu_types.h
> @@ -293,6 +293,10 @@
> * Page sizes >= the 52 bit max physical address of the CPU are not supported.
> */
> #define AMD_IOMMU_PGSIZES (GENMASK_ULL(51, 12) ^ SZ_512G)
> +
> +/* Special mode where page-sizes are limited to 4 KiB */
> +#define AMD_IOMMU_PGSIZES_4K (PAGE_SIZE)
> +
> /* 4K, 2MB, 1G page sizes are supported */
> #define AMD_IOMMU_PGSIZES_V2 (PAGE_SIZE | (1ULL << 21) | (1ULL << 30))
>
> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
> index 6b15ce09e78d..43131c3a2172 100644
> --- a/drivers/iommu/amd/init.c
> +++ b/drivers/iommu/amd/init.c
> @@ -192,6 +192,8 @@ bool amdr_ivrs_remap_support __read_mostly;
>
> bool amd_iommu_force_isolation __read_mostly;
>
> +unsigned long amd_iommu_pgsize_bitmap __ro_after_init = AMD_IOMMU_PGSIZES;
> +
> /*
> * AMD IOMMU allows up to 2^16 different protection domains. This is a bitmap
> * to know which ones are already in use.
> @@ -3492,6 +3494,12 @@ static int __init parse_amd_iommu_options(char *str)
> amd_iommu_pgtable = AMD_IOMMU_V2;
> } else if (strncmp(str, "irtcachedis", 11) == 0) {
> amd_iommu_irtcachedis = true;
> + } else if (strncmp(str, "nohugepages", 11) == 0) {
> + pr_info("Restricting V1 page-sizes to 4KiB");
> + amd_iommu_pgsize_bitmap = AMD_IOMMU_PGSIZES_4K;
> + } else if (strncmp(str, "v2_pgsizes_only", 15) == 0) {
> + pr_info("Restricting V1 page-sizes to 4KiB/2MiB/1GiB");
> + amd_iommu_pgsize_bitmap = AMD_IOMMU_PGSIZES_V2;
> } else {
> pr_notice("Unknown option - '%s'\n", str);
> }
> diff --git a/drivers/iommu/amd/io_pgtable.c b/drivers/iommu/amd/io_pgtable.c
> index 14f62c420e4a..804b788f3f16 100644
> --- a/drivers/iommu/amd/io_pgtable.c
> +++ b/drivers/iommu/amd/io_pgtable.c
> @@ -548,7 +548,7 @@ static struct io_pgtable *v1_alloc_pgtable(struct io_pgtable_cfg *cfg, void *coo
> return NULL;
> pgtable->mode = PAGE_MODE_3_LEVEL;
>
> - cfg->pgsize_bitmap = AMD_IOMMU_PGSIZES;
> + cfg->pgsize_bitmap = amd_iommu_pgsize_bitmap;
> cfg->ias = IOMMU_IN_ADDR_BIT_SIZE;
> cfg->oas = IOMMU_OUT_ADDR_BIT_SIZE;
>
On 2024/9/5 15:22, Joerg Roedel wrote:
> From: Joerg Roedel<jroedel@suse.de>
>
> Add two new kernel command line parameters to limit the page-sizes
> used for v1 page-tables:
>
> nohugepages - Limits page-sizes to 4KiB
>
> v2_pgsizes_only - Limits page-sizes to 4Kib/2Mib/1GiB; The
> same as the sizes used with v2 page-tables
>
> This is needed for multiple scenarios. When assigning devices to
> SEV-SNP guests the IOMMU page-sizes need to match the sizes in the RMP
> table, otherwise the device will not be able to access all shared
> memory.
>
> Also, some ATS devices do not work properly with arbitrary IO
> page-sizes as supported by AMD-Vi, so limiting the sizes used by the
> driver is a suitable workaround.
>
> All-in-all, these parameters are only workarounds until the IOMMU core
> and related APIs gather the ability to negotiate the page-sizes in a
> better way.
>
> Signed-off-by: Joerg Roedel<jroedel@suse.de>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 17 +++++++++++------
> drivers/iommu/amd/amd_iommu.h | 1 +
> drivers/iommu/amd/amd_iommu_types.h | 4 ++++
> drivers/iommu/amd/init.c | 8 ++++++++
> drivers/iommu/amd/io_pgtable.c | 2 +-
> 5 files changed, 25 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 09126bb8cc9f..6d6630aec46c 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -333,12 +333,17 @@
> allowed anymore to lift isolation
> requirements as needed. This option
> does not override iommu=pt
> - force_enable - Force enable the IOMMU on platforms known
> - to be buggy with IOMMU enabled. Use this
> - option with care.
> - pgtbl_v1 - Use v1 page table for DMA-API (Default).
> - pgtbl_v2 - Use v2 page table for DMA-API.
> - irtcachedis - Disable Interrupt Remapping Table (IRT) caching.
> + force_enable - Force enable the IOMMU on platforms known
> + to be buggy with IOMMU enabled. Use this
> + option with care.
> + pgtbl_v1 - Use v1 page table for DMA-API (Default).
> + pgtbl_v2 - Use v2 page table for DMA-API.
> + irtcachedis - Disable Interrupt Remapping Table (IRT) caching.
> + nohugepages - Limit page-sizes used for v1 page-tables
> + to 4 KiB.
Intel iommu driver has a similar option 'intel_iommu=sp_off'
sp_off [Default Off]
By default, super page will be supported if Intel IOMMU has
the capability. With this option, super page will not be
supported.
Is it possible to consolidate these two into a single
"iommu.nohugepages=1"?
> + v2_pgsizes_only - Limit page-sizes used for v1 page-tables
> + to 4KiB/2Mib/1GiB.
> +
>
> amd_iommu_dump= [HW,X86-64]
> Enable AMD IOMMU driver option to dump the ACPI table
Thanks,
baolu
On Thu, Sep 05, 2024 at 03:31:08PM +0800, Baolu Lu wrote:
> Is it possible to consolidate these two into a single
> "iommu.nohugepages=1"?
Generally yes, but that requires touching all drivers to make the
behavior consistent. We can start this effort on top of this change, if
desired.
Regards,
Joerg
On Thu, Sep 05, 2024 at 09:34:07AM +0200, Joerg Roedel wrote:
> On Thu, Sep 05, 2024 at 03:31:08PM +0800, Baolu Lu wrote:
> > Is it possible to consolidate these two into a single
> > "iommu.nohugepages=1"?
>
> Generally yes, but that requires to touch all drivers to make the
> behavior consistent. We can start this effort on-top of this change, if
> desired.
Let's at least use the same keyword that already exists though??
Jason
On Thu, Sep 05, 2024 at 09:05:31AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 05, 2024 at 09:34:07AM +0200, Joerg Roedel wrote:
> > On Thu, Sep 05, 2024 at 03:31:08PM +0800, Baolu Lu wrote:
> > > "iommu.nohugepages=1"?
> >
> > Generally yes, but that requires to touch all drivers to make the
> > behavior consistent. We can start this effort on-top of this change, if
> > desired.
>
> Let's at least use the same keyword that already exists though??
You mean amd_iommu=sp_off? I am not in favour of that; in the Linux
world the term 'hugepage' is more common than 'superpage', so I
would avoid spreading the use of the latter. We can extend that later to
the iommu.nohugepages parameter suggested by Baolu.
Regards,
Joerg
On Thu, Sep 05, 2024 at 05:13:53PM +0200, Joerg Roedel wrote:
> On Thu, Sep 05, 2024 at 09:05:31AM -0300, Jason Gunthorpe wrote:
> > On Thu, Sep 05, 2024 at 09:34:07AM +0200, Joerg Roedel wrote:
> > > On Thu, Sep 05, 2024 at 03:31:08PM +0800, Baolu Lu wrote:
> > > > "iommu.nohugepages=1"?
> > >
> > > Generally yes, but that requires to touch all drivers to make the
> > > behavior consistent. We can start this effort on-top of this change, if
> > > desired.
> >
> > Let's at least use the same keyword that already exists though??
>
> You mean amd_iommu=sp_off? I am not in favour of that, in the Linux
> world the term 'hugepage' is more common than 'superpage'. So I
> would avoid spreading the use of the later. We can extend that later to
> the iommu.nohugepages parameter suggested by Baolu.
I see, okay, let me check with some people if the mlx5 part is Ok
Thanks,
Jason
On Thu, Sep 05, 2024 at 02:52:06PM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 05, 2024 at 05:13:53PM +0200, Joerg Roedel wrote:
> > On Thu, Sep 05, 2024 at 09:05:31AM -0300, Jason Gunthorpe wrote:
> > > On Thu, Sep 05, 2024 at 09:34:07AM +0200, Joerg Roedel wrote:
> > > > On Thu, Sep 05, 2024 at 03:31:08PM +0800, Baolu Lu wrote:
> > > > > "iommu.nohugepages=1"?
> > > >
> > > > Generally yes, but that requires to touch all drivers to make the
> > > > behavior consistent. We can start this effort on-top of this change, if
> > > > desired.
> > >
> > > Let's at least use the same keyword that already exists though??
> >
> > You mean amd_iommu=sp_off? I am not in favour of that, in the Linux
> > world the term 'hugepage' is more common than 'superpage'. So I
> > would avoid spreading the use of the later. We can extend that later to
> > the iommu.nohugepages parameter suggested by Baolu.
>
> I see, okay, let me check with some people if the mlx5 part is Ok
Apparently we have cases that rely on some other single page size
(e.g. 64G or something), so a bitmap would probably be better.
There was an ask that this apply to Intel as well.
So, I think it would be better to start this as a generic iommu
parameter with a bitmap, and do the pagesize fixing in the core code,
after domains are allocated, instead of in the AMD driver.
Jason
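The core-side restriction proposed above could look roughly like the following userspace sketch. It is purely illustrative: the function name `restrict_pgsizes`, its parameters, and the fallback policy for an empty intersection are assumptions made for this example and do not correspond to any existing kernel interface or to code posted in the thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper: intersect the page sizes a domain advertises with a
 * user-supplied bitmap (for example one parsed from a generic iommu
 * parameter). Name, signature and fallback policy are assumptions.
 */
uint64_t restrict_pgsizes(uint64_t domain_bitmap, uint64_t user_mask)
{
	uint64_t allowed = domain_bitmap & user_mask;

	/*
	 * If the user asked only for sizes the domain cannot provide, nothing
	 * would be mappable. This sketch falls back to the smallest size the
	 * domain supports (its lowest set bit); rejecting the parameter would
	 * be an equally plausible policy.
	 */
	if (!allowed)
		allowed = domain_bitmap & (~domain_bitmap + 1);

	return allowed;
}
```

A bitmap handles the single-size case mentioned above naturally: passing a mask with only bit 36 set against the default AMD v1 bitmap would leave 64 GiB as the only permitted size, which the two fixed keywords from the patch cannot express.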
On 2024/9/5 15:34, Joerg Roedel wrote:
> On Thu, Sep 05, 2024 at 03:31:08PM +0800, Baolu Lu wrote:
>> Is it possible to consolidate these two into a single
>> "iommu.nohugepages=1"?
> Generally yes, but that requires to touch all drivers to make the
> behavior consistent. We can start this effort on-top of this change, if
> desired.
Yeah! That works too.
Thanks,
baolu