> From: Mostafa Saleh <smostafa@google.com>
> Sent: Wednesday, January 8, 2025 8:10 PM
>
> On Thu, Jan 02, 2025 at 04:16:14PM -0400, Jason Gunthorpe wrote:
> > On Fri, Dec 13, 2024 at 07:39:04PM +0000, Mostafa Saleh wrote:
> > > Yeah, SVA is tricky, I guess for that we would have to use nesting,
> > > but tbh, I don’t think it’s a deal breaker for now.
> >
> > Again, it depends what your actual use case for translation is inside
> > the host/guest environments. It would be good to clearly spell this out..
> > There are few drivers that directly manipulate the iommu_domains of a
> > device. a few gpus, ath1x wireless, some tegra stuff, "venus". Which
> > of those are you targeting?
>
> Not sure I understand this point about manipulating domains.
> AFAIK, SVA is not that common, including in mobile spaces, but I can be
> wrong; that’s why it’s not a priority here.

Nested translation is required beyond SVA. A scenario which requires
a vIOMMU and multiple device domains within the guest would like to
embrace nesting. Especially for ARM vSMMU, nesting is a must.

But I'm not sure that I got Jason's point about "there is no way to get
SVA support with para-virtualization." virtio-iommu is a para-virtualized
model and SVA support is in its plan. The main requirement is to pass
the base pointer of the guest CPU page table to the backend and relay PRI
faults/responses back and forth.
On Thu, Jan 16, 2025 at 06:39:31AM +0000, Tian, Kevin wrote:
> > From: Mostafa Saleh <smostafa@google.com>
> > Sent: Wednesday, January 8, 2025 8:10 PM
> >
> > On Thu, Jan 02, 2025 at 04:16:14PM -0400, Jason Gunthorpe wrote:
> > > On Fri, Dec 13, 2024 at 07:39:04PM +0000, Mostafa Saleh wrote:
> > > > Yeah, SVA is tricky, I guess for that we would have to use nesting,
> > > > but tbh, I don’t think it’s a deal breaker for now.
> > >
> > > Again, it depends what your actual use case for translation is inside
> > > the host/guest environments. It would be good to clearly spell this out..
> > > There are few drivers that directly manipulate the iommu_domains of a
> > > device. a few gpus, ath1x wireless, some tegra stuff, "venus". Which
> > > of those are you targeting?
> > >
> >
> > Not sure I understand this point about manipulating domains.
> > AFAIK, SVA is not that common, including in mobile spaces, but I can be
> > wrong; that’s why it’s not a priority here.
>
> Nested translation is required beyond SVA. A scenario which requires
> a vIOMMU and multiple device domains within the guest would like to
> embrace nesting. Especially for ARM vSMMU, nesting is a must.
Right, if you need an iommu domain in the guest there are only three
mainstream ways to get this in Linux:
1) Use the DMA API and have the iommu group be translating. This is
optional in that the DMA API usually supports identity as an option.
2) A driver directly calls iommu_paging_domain_alloc() and manually
attaches it to some device, and does not use the DMA API. My list
above of ath1x/etc are examples doing this
3) Use VFIO
My remark to Mostafa is to be specific, which of the above do you want
to do in your mobile guest (and what driver exactly if #2) and why.
This will help inform what the performance profile looks like and
guide if nesting/para virt is appropriate.
> But I'm not sure that I got Jason's point about "there is no way to get
> SVA support with para-virtualization." virtio-iommu is a para-virtualized
> model and SVA support is in its plan. The main requirement is to pass
> the base pointer of the guest CPU page table to the backend and relay PRI
> faults/responses back and forth.
That's nesting, you have a full page table under the control of the
guest, and the guest needs to have a level of HW-specific
knowledge. It is just an alternative to using the native nesting
vIOMMU.
What I mean by "para-virtualization" is the guest does map/unmap calls
to the hypervisor and has no page table.
Jason
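[Editorial note: the distinction Jason draws above can be sketched as a toy model. All names and the single-page-table granularity are illustrative assumptions, not any real hypervisor or kernel API: in the para-virt scheme the guest owns no page table and every mapping costs a call into the host, while in the nested scheme the guest writes its own stage-1 table and hardware composes it with the host-owned stage-2.]

```python
# Toy model of the two IOMMU virtualization schemes discussed in the
# thread. Hypothetical names; 4 KiB pages for simplicity.

PAGE = 4096

class ParaVirtIommu:
    """Guest has no page table: every map/unmap is a hypercall."""
    def __init__(self):
        self.host_table = {}   # IOVA -> HPA, owned by the hypervisor
        self.hypercalls = 0

    def guest_map(self, iova, hpa):
        self.hypercalls += 1   # each guest mapping traps to the host
        self.host_table[iova] = hpa

    def translate(self, iova):
        return self.host_table[iova & ~(PAGE - 1)] | (iova & (PAGE - 1))

class NestedIommu:
    """Guest owns stage-1 (IOVA -> GPA); host owns stage-2 (GPA -> HPA).
    Hardware walks both stages; guest table updates need no hypercall."""
    def __init__(self, stage2):
        self.stage1 = {}       # guest-owned, written directly
        self.stage2 = stage2   # host-owned GPA -> HPA
        self.hypercalls = 0

    def guest_map(self, iova, gpa):
        self.stage1[iova] = gpa  # plain memory write, no trap

    def translate(self, iova):
        gpa = self.stage1[iova & ~(PAGE - 1)]
        return self.stage2[gpa] | (iova & (PAGE - 1))

# Host stage-2: GPA 0x1000 is backed by HPA 0x80000
s2 = {0x1000: 0x80000}

pv = ParaVirtIommu()
pv.guest_map(0x4000, 0x80000)     # hypervisor installs IOVA -> HPA
nested = NestedIommu(s2)
nested.guest_map(0x4000, 0x1000)  # guest installs IOVA -> GPA itself

assert pv.translate(0x4010) == 0x80010
assert nested.translate(0x4010) == 0x80010
assert (pv.hypercalls, nested.hypercalls) == (1, 0)
```

Both schemes reach the same HPA; they differ in who owns the table and hence in how many hypervisor round trips a mapping-heavy workload generates, which is exactly the performance question raised later in the thread.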
> From: Jason Gunthorpe <jgg@ziepe.ca>
> Sent: Friday, January 17, 2025 3:15 AM
>
> On Thu, Jan 16, 2025 at 06:39:31AM +0000, Tian, Kevin wrote:
> > > From: Mostafa Saleh <smostafa@google.com>
> > > Sent: Wednesday, January 8, 2025 8:10 PM
> > >
> > > On Thu, Jan 02, 2025 at 04:16:14PM -0400, Jason Gunthorpe wrote:
> > > > On Fri, Dec 13, 2024 at 07:39:04PM +0000, Mostafa Saleh wrote:
> > > > > Yeah, SVA is tricky, I guess for that we would have to use nesting,
> > > > > but tbh, I don’t think it’s a deal breaker for now.
> > > >
> > > > Again, it depends what your actual use case for translation is inside
> > > > the host/guest environments. It would be good to clearly spell this out..
> > > > There are few drivers that directly manipulate the iommu_domains of a
> > > > device. a few gpus, ath1x wireless, some tegra stuff, "venus". Which
> > > > of those are you targeting?
> > >
> > > Not sure I understand this point about manipulating domains.
> > > AFAIK, SVA is not that common, including in mobile spaces, but I can be
> > > wrong; that’s why it’s not a priority here.
> >
> > Nested translation is required beyond SVA. A scenario which requires
> > a vIOMMU and multiple device domains within the guest would like to
> > embrace nesting. Especially for ARM vSMMU, nesting is a must.
>
> Right, if you need an iommu domain in the guest there are only three
> mainstream ways to get this in Linux:
> 1) Use the DMA API and have the iommu group be translating. This is
>    optional in that the DMA API usually supports identity as an option.
> 2) A driver directly calls iommu_paging_domain_alloc() and manually
>    attaches it to some device, and does not use the DMA API. My list
>    above of ath1x/etc are examples doing this
> 3) Use VFIO
>
> My remark to Mostafa is to be specific, which of the above do you want
> to do in your mobile guest (and what driver exactly if #2) and why.
>
> This will help inform what the performance profile looks like and
> guide if nesting/para virt is appropriate.

Yeah that part would be critical to help decide which route to pursue
first. Even when all options might be required in the end when pKVM
is scaled to more scenarios, as you mentioned in another mail, a staging
approach would be much preferable to evolve.

The pros/cons between nesting/para-virt are clear: the more static the
mapping is, the more gain from the para approach due to less page-table
walking and a smaller TLB footprint, while vice versa nesting performs
much better by avoiding frequent para calls on page table mgmt. 😊

> > But I'm not sure that I got Jason's point about "there is no way to get
> > SVA support with para-virtualization." virtio-iommu is a para-virtualized
> > model and SVA support is in its plan. The main requirement is to pass
> > the base pointer of the guest CPU page table to the backend and relay PRI
> > faults/responses back and forth.
>
> That's nesting, you have a full page table under the control of the
> guest, and the guest needs to have a level of HW-specific
> knowledge. It is just an alternative to using the native nesting
> vIOMMU.
>
> What I mean by "para-virtualization" is the guest does map/unmap calls
> to the hypervisor and has no page table.

Yes, that should never happen for SVA.
On Fri, Jan 17, 2025 at 06:57:12AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@ziepe.ca>
> > Sent: Friday, January 17, 2025 3:15 AM
> >
> > On Thu, Jan 16, 2025 at 06:39:31AM +0000, Tian, Kevin wrote:
> > > > From: Mostafa Saleh <smostafa@google.com>
> > > > Sent: Wednesday, January 8, 2025 8:10 PM
> > > >
> > > > On Thu, Jan 02, 2025 at 04:16:14PM -0400, Jason Gunthorpe wrote:
> > > > > On Fri, Dec 13, 2024 at 07:39:04PM +0000, Mostafa Saleh wrote:
> > > > > > Yeah, SVA is tricky, I guess for that we would have to use nesting,
> > > > > > but tbh, I don’t think it’s a deal breaker for now.
> > > > >
> > > > > Again, it depends what your actual use case for translation is inside
> > > > > the host/guest environments. It would be good to clearly spell this out..
> > > > > There are few drivers that directly manipulate the iommu_domains of a
> > > > > device. a few gpus, ath1x wireless, some tegra stuff, "venus". Which
> > > > > of those are you targeting?
> > > >
> > > > Not sure I understand this point about manipulating domains.
> > > > AFAIK, SVA is not that common, including in mobile spaces, but I can be
> > > > wrong; that’s why it’s not a priority here.
> > >
> > > Nested translation is required beyond SVA. A scenario which requires
> > > a vIOMMU and multiple device domains within the guest would like to
> > > embrace nesting. Especially for ARM vSMMU, nesting is a must.

We can still do para-virtualization for guests the same way we do for the
host and use a single-stage IOMMU.

> >
> > Right, if you need an iommu domain in the guest there are only three
> > mainstream ways to get this in Linux:
> > 1) Use the DMA API and have the iommu group be translating. This is
> >    optional in that the DMA API usually supports identity as an option.
> > 2) A driver directly calls iommu_paging_domain_alloc() and manually
> >    attaches it to some device, and does not use the DMA API. My list
> >    above of ath1x/etc are examples doing this
> > 3) Use VFIO
> >
> > My remark to Mostafa is to be specific, which of the above do you want
> > to do in your mobile guest (and what driver exactly if #2) and why.
> >
> > This will help inform what the performance profile looks like and
> > guide if nesting/para virt is appropriate.

AFAIK, the most common use cases would be:
- Devices using the DMA API because they require a lot of memory to be
  contiguous in IOVA, which is hard to do with identity
- Devices with security requirements/constraints to be isolated from the
  rest of the system, also using the DMA API
- VFIO is something we are looking into at the moment and have prototyped
  with pKVM, and it should be supported soon in Android (only for platform
  devices for now)

> Yeah that part would be critical to help decide which route to pursue
> first. Even when all options might be required in the end when pKVM
> is scaled to more scenarios, as you mentioned in another mail, a staging
> approach would be much preferable to evolve.

I agree that would probably be the case. I will work on a more staged
approach for v3, mostly without the pv part as Jason suggested.

> The pros/cons between nesting/para-virt are clear: the more static the
> mapping is, the more gain from the para approach due to less page-table
> walking and a smaller TLB footprint, while vice versa nesting performs
> much better by avoiding frequent para calls on page table mgmt. 😊

I am also working to get the numbers for both cases so we know the
order of magnitude of each case, as I guess it won't be as clear for
large systems with many DMA initiators what approach is best.

Thanks,
Mostafa

> > > But I'm not sure that I got Jason's point about "there is no way to get
> > > SVA support with para-virtualization." virtio-iommu is a para-virtualized
> > > model and SVA support is in its plan. The main requirement is to pass
> > > the base pointer of the guest CPU page table to the backend and relay PRI
> > > faults/responses back and forth.
> >
> > That's nesting, you have a full page table under the control of the
> > guest, and the guest needs to have a level of HW-specific
> > knowledge. It is just an alternative to using the native nesting
> > vIOMMU.
> >
> > What I mean by "para-virtualization" is the guest does map/unmap calls
> > to the hypervisor and has no page table.
>
> Yes, that should never happen for SVA.
> From: Mostafa Saleh <smostafa@google.com>
> Sent: Wednesday, January 22, 2025 7:04 PM
>
> On Fri, Jan 17, 2025 at 06:57:12AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > Sent: Friday, January 17, 2025 3:15 AM
> > >
> > > On Thu, Jan 16, 2025 at 06:39:31AM +0000, Tian, Kevin wrote:
> > > > > From: Mostafa Saleh <smostafa@google.com>
> > > > > Sent: Wednesday, January 8, 2025 8:10 PM
> > > > >
> > > > > On Thu, Jan 02, 2025 at 04:16:14PM -0400, Jason Gunthorpe wrote:
> > > > > > On Fri, Dec 13, 2024 at 07:39:04PM +0000, Mostafa Saleh wrote:
> > > > > > > Yeah, SVA is tricky, I guess for that we would have to use nesting,
> > > > > > > but tbh, I don’t think it’s a deal breaker for now.
> > > > > >
> > > > > > Again, it depends what your actual use case for translation is inside
> > > > > > the host/guest environments. It would be good to clearly spell this out..
> > > > > > There are few drivers that directly manipulate the iommu_domains of a
> > > > > > device. a few gpus, ath1x wireless, some tegra stuff, "venus". Which
> > > > > > of those are you targeting?
> > > > >
> > > > > Not sure I understand this point about manipulating domains.
> > > > > AFAIK, SVA is not that common, including in mobile spaces, but I can be
> > > > > wrong; that’s why it’s not a priority here.
> > > >
> > > > Nested translation is required beyond SVA. A scenario which requires
> > > > a vIOMMU and multiple device domains within the guest would like to
> > > > embrace nesting. Especially for ARM vSMMU, nesting is a must.
>
> We can still do para-virtualization for guests the same way we do for the
> host and use a single-stage IOMMU.

Same way, but both require a nested setup.

In concept there are two layers of address translation: GVA->GPA via the
guest page table, and GPA->HPA via the pKVM page table.

The difference between host/guest is just in the GPA mapping. For the host
it's 1:1, with additional hardening for which portions can be mapped and
which cannot. For the guest it's non-identity, with the mapping established
from the host.

A nested translation naturally fits those conceptual layers.

Using a single-stage IOMMU means you need to combine the two layers into
one, i.e. GVA->HPA, by removing the GPA. Then you have to para-virtualize
the guest page table so every guest PTE change is intercepted to replace
the GPA with the HPA.

Doing so completely kills the benefit of SVA, which is why Jason said
it's a no-go.

> > >
> > > Right, if you need an iommu domain in the guest there are only three
> > > mainstream ways to get this in Linux:
> > > 1) Use the DMA API and have the iommu group be translating. This is
> > >    optional in that the DMA API usually supports identity as an option.
> > > 2) A driver directly calls iommu_paging_domain_alloc() and manually
> > >    attaches it to some device, and does not use the DMA API. My list
> > >    above of ath1x/etc are examples doing this
> > > 3) Use VFIO
> > >
> > > My remark to Mostafa is to be specific, which of the above do you want
> > > to do in your mobile guest (and what driver exactly if #2) and why.
> > >
> > > This will help inform what the performance profile looks like and
> > > guide if nesting/para virt is appropriate.
>
> AFAIK, the most common use cases would be:
> - Devices using the DMA API because they require a lot of memory to be
>   contiguous in IOVA, which is hard to do with identity
> - Devices with security requirements/constraints to be isolated from the
>   rest of the system, also using the DMA API
> - VFIO is something we are looking into at the moment and have prototyped
>   with pKVM, and it should be supported soon in Android (only for platform
>   devices for now)

What really matters is the frequency of map/unmap.

> > Yeah that part would be critical to help decide which route to pursue
> > first. Even when all options might be required in the end when pKVM
> > is scaled to more scenarios, as you mentioned in another mail, a staging
> > approach would be much preferable to evolve.
>
> I agree that would probably be the case. I will work on a more staged
> approach for v3, mostly without the pv part as Jason suggested.
>
> > The pros/cons between nesting/para-virt are clear: the more static the
> > mapping is, the more gain from the para approach due to less page-table
> > walking and a smaller TLB footprint, while vice versa nesting performs
> > much better by avoiding frequent para calls on page table mgmt. 😊
>
> I am also working to get the numbers for both cases so we know the
> order of magnitude of each case, as I guess it won't be as clear for
> large systems with many DMA initiators what approach is best.

That'd be great!
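[Editorial note: Kevin's point about the single-stage approach can be sketched as a toy shadow-table model (hypothetical names, not any real pKVM interface): the guest's table maps VA->GPA, but the hardware walks a single combined VA->HPA table, so the hypervisor must trap every guest PTE write and substitute the GPA with the HPA.]

```python
# Toy shadow-table model of combining two translation layers into one
# single-stage IOMMU table. Names and values are illustrative only.

gpa_to_hpa = {0x1000: 0x90000, 0x2000: 0xa0000}  # host-owned GPA -> HPA

shadow = {}   # VA -> HPA: the one table the hardware actually walks
traps = 0

def guest_pte_write(va, gpa):
    """Emulated guest PTE write: each one costs a trap to the hypervisor,
    which substitutes the GPA with the backing HPA in the shadow table."""
    global traps
    traps += 1
    shadow[va] = gpa_to_hpa[gpa]

# An SVA workload updates its CPU page table constantly, so every update
# becomes a hypervisor round trip; this is the cost that kills SVA here.
guest_pte_write(0x7000, 0x1000)
guest_pte_write(0x8000, 0x2000)

assert shadow == {0x7000: 0x90000, 0x8000: 0xa0000}
assert traps == 2   # one trap per guest PTE change
```

With nesting the two `guest_pte_write` calls would be plain memory writes and the hardware would compose the two tables itself, which is why trap frequency (i.e. map/unmap frequency) decides which scheme wins.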
On Thu, Jan 23, 2025 at 08:13:34AM +0000, Tian, Kevin wrote:
> > From: Mostafa Saleh <smostafa@google.com>
> > Sent: Wednesday, January 22, 2025 7:04 PM
> >
> > On Fri, Jan 17, 2025 at 06:57:12AM +0000, Tian, Kevin wrote:
> > > > From: Jason Gunthorpe <jgg@ziepe.ca>
> > > > Sent: Friday, January 17, 2025 3:15 AM
> > > >
> > > > On Thu, Jan 16, 2025 at 06:39:31AM +0000, Tian, Kevin wrote:
> > > > > > From: Mostafa Saleh <smostafa@google.com>
> > > > > > Sent: Wednesday, January 8, 2025 8:10 PM
> > > > > >
> > > > > > On Thu, Jan 02, 2025 at 04:16:14PM -0400, Jason Gunthorpe wrote:
> > > > > > > On Fri, Dec 13, 2024 at 07:39:04PM +0000, Mostafa Saleh wrote:
> > > > > > > > Yeah, SVA is tricky, I guess for that we would have to use nesting,
> > > > > > > > but tbh, I don’t think it’s a deal breaker for now.
> > > > > > >
> > > > > > > Again, it depends what your actual use case for translation is inside
> > > > > > > the host/guest environments. It would be good to clearly spell this out..
> > > > > > > There are few drivers that directly manipulate the iommu_domains of a
> > > > > > > device. a few gpus, ath1x wireless, some tegra stuff, "venus". Which
> > > > > > > of those are you targeting?
> > > > > >
> > > > > > Not sure I understand this point about manipulating domains.
> > > > > > AFAIK, SVA is not that common, including in mobile spaces, but I can be
> > > > > > wrong; that’s why it’s not a priority here.
> > > > >
> > > > > Nested translation is required beyond SVA. A scenario which requires
> > > > > a vIOMMU and multiple device domains within the guest would like to
> > > > > embrace nesting. Especially for ARM vSMMU, nesting is a must.
> >
> > We can still do para-virtualization for guests the same way we do for the
> > host and use a single-stage IOMMU.
>
> Same way, but both require a nested setup.
>
> In concept there are two layers of address translation: GVA->GPA via the
> guest page table, and GPA->HPA via the pKVM page table.
>
> The difference between host/guest is just in the GPA mapping. For the host
> it's 1:1, with additional hardening for which portions can be mapped and
> which cannot. For the guest it's non-identity, with the mapping established
> from the host.
>
> A nested translation naturally fits those conceptual layers.
>
> Using a single-stage IOMMU means you need to combine the two layers into
> one, i.e. GVA->HPA, by removing the GPA. Then you have to para-virtualize
> the guest page table so every guest PTE change is intercepted to replace
> the GPA with the HPA.
>
> Doing so completely kills the benefit of SVA, which is why Jason said
> it's a no-go.

I agree, this can’t work with SVA; in order to make that work we would
need some new para-virt operation to install the stage-1 table, and the
hypervisor has to configure the device in nested translation.

But guests that don’t need SVA can just use single-stage para-virt
(like virtio-iommu).

> > > > Right, if you need an iommu domain in the guest there are only three
> > > > mainstream ways to get this in Linux:
> > > > 1) Use the DMA API and have the iommu group be translating. This is
> > > >    optional in that the DMA API usually supports identity as an option.
> > > > 2) A driver directly calls iommu_paging_domain_alloc() and manually
> > > >    attaches it to some device, and does not use the DMA API. My list
> > > >    above of ath1x/etc are examples doing this
> > > > 3) Use VFIO
> > > >
> > > > My remark to Mostafa is to be specific, which of the above do you want
> > > > to do in your mobile guest (and what driver exactly if #2) and why.
> > > >
> > > > This will help inform what the performance profile looks like and
> > > > guide if nesting/para virt is appropriate.
> >
> > AFAIK, the most common use cases would be:
> > - Devices using the DMA API because they require a lot of memory to be
> >   contiguous in IOVA, which is hard to do with identity
> > - Devices with security requirements/constraints to be isolated from the
> >   rest of the system, also using the DMA API
> > - VFIO is something we are looking into at the moment and have prototyped
> >   with pKVM, and it should be supported soon in Android (only for platform
> >   devices for now)
>
> What really matters is the frequency of map/unmap.

Yes, though it differs between devices/systems :/ that’s why I reckon we
would need both in the long term. However, starting with some benchmarks
for these cases can help us understand the magnitude of both solutions
and prioritise which one is more suitable to start with for upstream.

Thanks,
Mostafa

> > > Yeah that part would be critical to help decide which route to pursue
> > > first. Even when all options might be required in the end when pKVM
> > > is scaled to more scenarios, as you mentioned in another mail, a staging
> > > approach would be much preferable to evolve.
> >
> > I agree that would probably be the case. I will work on a more staged
> > approach for v3, mostly without the pv part as Jason suggested.
> >
> > > The pros/cons between nesting/para-virt are clear: the more static the
> > > mapping is, the more gain from the para approach due to less page-table
> > > walking and a smaller TLB footprint, while vice versa nesting performs
> > > much better by avoiding frequent para calls on page table mgmt. 😊
> >
> > I am also working to get the numbers for both cases so we know the
> > order of magnitude of each case, as I guess it won't be as clear for
> > large systems with many DMA initiators what approach is best.
>
> That'd be great!
On Wed, Jan 22, 2025 at 11:04:24AM +0000, Mostafa Saleh wrote:
> AFAIK, the most common use cases would be:
> - Devices using the DMA API because they require a lot of memory to be
>   contiguous in IOVA, which is hard to do with identity

This is not a feature of the DMA API any driver should rely on.. Are
you aware of one that does?

> - Devices with security requirements/constraints to be isolated from the
>   rest of the system, also using the DMA API

This is real, but again, in a mobile context does this even exist? It isn't
like there are external PCIe ports that need securing on a phone?

> - VFIO is something we are looking into at the moment and have prototyped
>   with pKVM, and it should be supported soon in Android (only for platform
>   devices for now)

Yes, this makes sense

Jason
On Wed, Jan 22, 2025 at 12:20:55PM -0400, Jason Gunthorpe wrote:
> On Wed, Jan 22, 2025 at 11:04:24AM +0000, Mostafa Saleh wrote:
> > AFAIK, the most common use cases would be:
> > - Devices using the DMA API because they require a lot of memory to be
> >   contiguous in IOVA, which is hard to do with identity
>
> This is not a feature of the DMA API any driver should rely on.. Are
> you aware of one that does?

I’d guess one example is media drivers; they usually need large contiguous
buffers and would use, for example, dma_alloc_coherent(). If the IOMMU is
disabled or bypassed, the kernel has to find a contiguous range of that
size in physical memory, which can be impossible on devices with small
memory such as mobile devices.

I will look more into this while working on the patches to identity-map
everything for v3, and I’ll see what kind of issues I hit.

> > - Devices with security requirements/constraints to be isolated from the
> >   rest of the system, also using the DMA API
>
> This is real, but again, in a mobile context does this even exist? It isn't
> like there are external PCIe ports that need securing on a phone?

It’s not just about completely external devices; it’s a defence-in-depth
measure. For example, network devices can be poked externally, and there
have been cases in the past where exploits were found[1], so some vendors
might have a policy to isolate such devices, which I believe is valid.

[1] https://lwn.net/ml/oss-security/20221013101046.GB20615@suse.de/

Thanks,
Mostafa

> > - VFIO is something we are looking into at the moment and have prototyped
> >   with pKVM, and it should be supported soon in Android (only for platform
> >   devices for now)
>
> Yes, this makes sense
>
> Jason
On Wed, Jan 22, 2025 at 05:17:50PM +0000, Mostafa Saleh wrote:
> On Wed, Jan 22, 2025 at 12:20:55PM -0400, Jason Gunthorpe wrote:
> > On Wed, Jan 22, 2025 at 11:04:24AM +0000, Mostafa Saleh wrote:
> > > AFAIK, the most common use cases would be:
> > > - Devices using the DMA API because they require a lot of memory to be
> > >   contiguous in IOVA, which is hard to do with identity
> >
> > This is not a feature of the DMA API any driver should rely on.. Are
> > you aware of one that does?
>
> I’d guess one example is media drivers; they usually need large contiguous
> buffers and would use, for example, dma_alloc_coherent(). If the IOMMU is
> disabled or bypassed, the kernel has to find a contiguous range of that
> size in physical memory, which can be impossible on devices with small
> memory such as mobile devices.

I see, that makes sense

> It’s not just about completely external devices; it’s a defence-in-depth
> measure. For example, network devices can be poked externally, and there
> have been cases in the past where exploits were found[1], so some vendors
> might have a policy to isolate such devices, which I believe is valid.

The performance cost of doing isolation like that with networking is
probably prohibitive with paravirt..

Jason
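[Editorial note: the IOVA-contiguity point discussed above can be sketched as a toy model (illustrative addresses, 4 KiB pages, not a real IOMMU driver): the IOMMU lets four scattered physical pages appear as one linear buffer in device address space, which is what a translating domain buys the media-driver case when physical memory is fragmented.]

```python
# Toy illustration of IOVA contiguity: a device sees one contiguous
# 16 KiB buffer even though the backing physical pages are scattered.

PAGE = 4096
scattered_pages = [0x13000, 0x7a000, 0x02000, 0xc5000]  # non-contiguous PAs

iova_base = 0x100000
# IOMMU page table: consecutive IOVA pages -> scattered physical pages
iommu_table = {iova_base + i * PAGE: pa
               for i, pa in enumerate(scattered_pages)}

def dev_read(iova):
    """Device-side translation through the IOMMU page table."""
    return iommu_table[iova & ~(PAGE - 1)] | (iova & (PAGE - 1))

# The device walks one contiguous IOVA range...
translated = [dev_read(iova_base + off) for off in range(0, 4 * PAGE, PAGE)]
# ...while the backing physical pages remain scattered.
assert translated == scattered_pages
```

Without translation (identity or bypass), the same buffer would need four physically contiguous pages, which is exactly the allocation that can fail on memory-constrained mobile devices.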