RE: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)
Posted by Tian, Kevin 2 months, 2 weeks ago
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, August 28, 2024 1:00 AM
> 
[...]
> On a multi-IOMMU system, the VIOMMU object can be instantiated to match
> the number of vIOMMUs in a guest VM, while holding the same parent HWPT
> to share the

Is there a restriction that multiple vIOMMU objects can only be created
on a multi-IOMMU system?

> stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
> VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:

this reads like 'VMID' is a virtual ID allocated by the vIOMMU. But from
the entire context it actually means the physical 'VMID' allocated on the
associated physical IOMMU, correct?
Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)
Posted by Nicolin Chen 2 months, 2 weeks ago
On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, August 28, 2024 1:00 AM
> >
> [...]
> > On a multi-IOMMU system, the VIOMMU object can be instantiated to match
> > the number of vIOMMUs in a guest VM, while holding the same parent HWPT
> > to share the
> 
> Is there a restriction that multiple vIOMMU objects can only be created
> on a multi-IOMMU system?

I think it should be generally restricted to the number of pIOMMUs,
although likely (not 100% sure) we could do multiple vIOMMUs on a
single-pIOMMU system. Any reason for doing that?

> > stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
> > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> 
> this reads like 'VMID' is a virtual ID allocated by the vIOMMU. But from
> the entire context it actually means the physical 'VMID' allocated on the
> associated physical IOMMU, correct?

Quoting Jason's narrative, a VMID is a "Security namespace for
guest owned ID". The allocation, using SMMU as an example, should
be part of vIOMMU instance allocation in the host SMMU driver.
Then, this VMID will be used to tag the cache entries. So, it is
still a software-allocated ID, while HW would use it too.
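
As a rough sketch of that flow (every name below is invented for
illustration, not taken from the actual driver):

#include <linux/idr.h>
#include <linux/types.h>

struct my_smmu {
	struct ida vmid_map;		/* the pIOMMU's pool of physical VMIDs */
	unsigned int vmid_bits;		/* e.g. 16 on SMMUv3 */
};

struct my_viommu {
	struct my_smmu *smmu;		/* backing physical SMMU */
	u16 vmid;			/* held for the vIOMMU's lifetime */
};

static int my_viommu_init(struct my_viommu *viommu, struct my_smmu *smmu)
{
	/* The VMID comes out of the pIOMMU's pool at vIOMMU allocation */
	int vmid = ida_alloc_range(&smmu->vmid_map, 1,
				   (1 << smmu->vmid_bits) - 1, GFP_KERNEL);

	if (vmid < 0)
		return vmid;
	viommu->smmu = smmu;
	viommu->vmid = vmid;	/* SW-allocated; HW uses it to tag caches */
	return 0;
}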

Thanks
Nicolin
Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)
Posted by Alexey Kardashevskiy 2 months ago
On 11/9/24 17:08, Nicolin Chen wrote:
> On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>> Sent: Wednesday, August 28, 2024 1:00 AM
>>>
>> [...]
>>> On a multi-IOMMU system, the VIOMMU object can be instantiated to match
>>> the number of vIOMMUs in a guest VM, while holding the same parent HWPT
>>> to share the
>>
>> Is there a restriction that multiple vIOMMU objects can only be created
>> on a multi-IOMMU system?
> 
> I think it should be generally restricted to the number of pIOMMUs,
> although likely (not 100% sure) we could do multiple vIOMMUs on a
> single-pIOMMU system. Any reason for doing that?


Just to clarify the terminology here - what are pIOMMU and vIOMMU exactly?

On AMD, the IOMMU is a pretend-pcie device, one per rootport, which
manages a DT (device table), one entry per BDFn; each entry owns a
queue. A slice of that can be passed to a VM (== queues mapped directly
to the VM, and such an IOMMU appears in the VM as a pretend-pcie device
too). So what is [pv]IOMMU here? Thanks,


> 
>>> stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
>>> VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
>>
>> this reads like 'VMID' is a virtual ID allocated by the vIOMMU. But from
>> the entire context it actually means the physical 'VMID' allocated on the
>> associated physical IOMMU, correct?
> 
> Quoting Jason's narrative, a VMID is a "Security namespace for
> guest owned ID". The allocation, using SMMU as an example, should
> be part of vIOMMU instance allocation in the host SMMU driver.
> Then, this VMID will be used to tag the cache entries. So, it is
> still a software-allocated ID, while HW would use it too.
> 
> Thanks
> Nicolin

-- 
Alexey
Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)
Posted by Nicolin Chen 1 month, 4 weeks ago
On Tue, Oct 01, 2024 at 11:55:59AM +1000, Alexey Kardashevskiy wrote:
> On 11/9/24 17:08, Nicolin Chen wrote:
> > On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > > 
> > > [...]
> > > > On a multi-IOMMU system, the VIOMMU object can be instantiated to
> > > > match the number of vIOMMUs in a guest VM, while holding the same
> > > > parent HWPT to share the
> > > 
> > > Is there a restriction that multiple vIOMMU objects can only be
> > > created on a multi-IOMMU system?
> > 
> > I think it should be generally restricted to the number of pIOMMUs,
> > although likely (not 100% sure) we could do multiple vIOMMUs on a
> > single-pIOMMU system. Any reason for doing that?
> 
> 
> Just to clarify the terminology here - what are pIOMMU and vIOMMU exactly?
> 
> On AMD, the IOMMU is a pretend-pcie device, one per rootport, which
> manages a DT (device table), one entry per BDFn; each entry owns a
> queue. A slice of that can be passed to a VM (== queues mapped directly
> to the VM, and such an IOMMU appears in the VM as a pretend-pcie device
> too). So what is [pv]IOMMU here? Thanks,
 
The "p" stands for physical: the entire IOMMU unit/instance. In
the IOMMU subsystem terminology, it's a struct iommu_device. It
sounds like AMD would register one iommu device per rootport?
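
(In code terms: each physical instance registers its own struct
iommu_device with the core, i.e. the driver calls something like

	iommu_device_register(&smmu->iommu, &my_smmu_ops, dev);

once per discovered instance; "my_smmu_ops" is just a placeholder name
here.)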

The "v" stands for virtual: a slice of the pIOMMU that could be
shared or passed through to a VM:
 - Intel IOMMU doesn't have passthrough queues, so it uses a
   shared queue (for invalidation). In this case, vIOMMU will
   be a pure SW structure for HW queue sharing (with the host
   machine and other VMs). That said, I think the channel (or
   the port) that Intel VT-d uses internally for a device to
   do a two-stage translation can be seen as a "passthrough"
   feature, held by a vIOMMU.
 - AMD IOMMU can assign passthrough queues to VMs, in which
   case, vIOMMU will be a structure holding all passthrough
   resources (of the pIOMMU) assigned to a VM. If there is a
   shared resource, it can be packed into the vIOMMU struct
   too. FYI, vQUEUE (future series) on the other hand will
   represent each passthrough queue in a vIOMMU struct. The
   VM then, per that specific pIOMMU (rootport?), will have
   one vIOMMU holding a number of vQUEUEs.
 - ARM SMMU is sort of in the middle, depending on the impls.
   vIOMMU will be a structure holding both passthrough and
   shared resources. It can define vQUEUEs, if the impl has
   passthrough queues like AMD does.

Allowing a vIOMMU to hold shared resources makes it a bit of an
upgraded model for IOMMU virtualization, compared with the existing
HWPT model, which now looks like a subset of the vIOMMU model.
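
In rough code terms, the model might be pictured like this (all names
invented for illustration, not taken from this series):

struct example_viommu {
	struct example_piommu *piommu;	/* backing physical IOMMU instance */
	struct example_hwpt *s2_hwpt;	/* parent HWPT: shared stage-2 table */
	u32 vmid;			/* ID from this pIOMMU's pool */
	/*
	 * Shared resources (e.g. a VT-d style shared invalidation queue)
	 * are pure SW state; passthrough resources (e.g. AMD-style queues
	 * mapped into the guest) would hang off vQUEUE objects:
	 */
	struct example_vqueue **vqueues;
	unsigned int num_vqueues;
};

A guest with N vIOMMUs would then hold N such objects, one per pIOMMU,
all pointing at the same parent s2_hwpt.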

Thanks
Nicolin
Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)
Posted by Alexey Kardashevskiy 1 month, 4 weeks ago

On 1/10/24 13:36, Nicolin Chen wrote:
> On Tue, Oct 01, 2024 at 11:55:59AM +1000, Alexey Kardashevskiy wrote:
>> On 11/9/24 17:08, Nicolin Chen wrote:
>>> On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
>>>>> From: Nicolin Chen <nicolinc@nvidia.com>
>>>>> Sent: Wednesday, August 28, 2024 1:00 AM
>>>>>
>>>> [...]
>>>>> On a multi-IOMMU system, the VIOMMU object can be instantiated to
>>>>> match the number of vIOMMUs in a guest VM, while holding the same
>>>>> parent HWPT to share the
>>>>
>>>> Is there a restriction that multiple vIOMMU objects can only be
>>>> created on a multi-IOMMU system?
>>>
>>> I think it should be generally restricted to the number of pIOMMUs,
>>> although likely (not 100% sure) we could do multiple vIOMMUs on a
>>> single-pIOMMU system. Any reason for doing that?
>>
>>
>> Just to clarify the terminology here - what are pIOMMU and vIOMMU exactly?
>>
>> On AMD, the IOMMU is a pretend-pcie device, one per rootport, which
>> manages a DT (device table), one entry per BDFn; each entry owns a
>> queue. A slice of that can be passed to a VM (== queues mapped directly
>> to the VM, and such an IOMMU appears in the VM as a pretend-pcie device
>> too). So what is [pv]IOMMU here? Thanks,
>   
> The "p" stands for physical: the entire IOMMU unit/instance. In
> the IOMMU subsystem terminology, it's a struct iommu_device. It
> sounds like AMD would register one iommu device per rootport?

Yup, my test machine has 4 of these.


> The "v" stands for virtual: a slice of the pIOMMU that could be
> shared or passed through to a VM:
>   - Intel IOMMU doesn't have passthrough queues, so it uses a
>     shared queue (for invalidation). In this case, vIOMMU will
>     be a pure SW structure for HW queue sharing (with the host
>     machine and other VMs). That said, I think the channel (or
>     the port) that Intel VT-d uses internally for a device to
>     do a two-stage translation can be seen as a "passthrough"
>     feature, held by a vIOMMU.
>   - AMD IOMMU can assign passthrough queues to VMs, in which
>     case, vIOMMU will be a structure holding all passthrough
>     resources (of the pIOMMU) assigned to a VM. If there is a
>     shared resource, it can be packed into the vIOMMU struct
>     too. FYI, vQUEUE (future series) on the other hand will
>     represent each passthrough queue in a vIOMMU struct. The
>     VM then, per that specific pIOMMU (rootport?), will have
>     one vIOMMU holding a number of vQUEUEs.
>   - ARM SMMU is sort of in the middle, depending on the impls.
>     vIOMMU will be a structure holding both passthrough and
>     shared resources. It can define vQUEUEs, if the impl has
>     passthrough queues like AMD does.
> 
> Allowing a vIOMMU to hold shared resources makes it a bit of an
> upgraded model for IOMMU virtualization, compared with the existing
> HWPT model, which now looks like a subset of the vIOMMU model.

Thanks for confirming.

I've just read in this thread that "it should be generally restricted to
the number of pIOMMUs, although likely (not 100% sure) we could do
multiple vIOMMUs on a single-pIOMMU system. Any reason for doing that?",
and thought "we have every reason to do that, unless p means something
different", so I decided to ask :) Thanks,


> 
> Thanks
> Nicolin

-- 
Alexey
Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)
Posted by Jason Gunthorpe 1 month, 4 weeks ago
On Tue, Oct 01, 2024 at 03:06:57PM +1000, Alexey Kardashevskiy wrote:
> I've just read in this thread that "it should be generally restricted to the
> number of pIOMMUs, although likely (not 100% sure) we could do multiple
> vIOMMUs on a single-pIOMMU system. Any reason for doing that?"? thought "we
> have every reason to do that, unless p means something different", so I
> decided to ask :) Thanks,

I think that was intended as "multiple vIOMMUs per pIOMMU within a
single VM".

There would always be multiple vIOMMUs per pIOMMU across VMs/etc.

Jason
RE: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)
Posted by Tian, Kevin 2 months, 2 weeks ago
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, September 11, 2024 3:08 PM
> 
> On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Wednesday, August 28, 2024 1:00 AM
> > >
> > [...]
> > > On a multi-IOMMU system, the VIOMMU object can be instantiated to
> > > match the number of vIOMMUs in a guest VM, while holding the same
> > > parent HWPT to share the
> >
> > Is there a restriction that multiple vIOMMU objects can only be
> > created on a multi-IOMMU system?
> 
> I think it should be generally restricted to the number of pIOMMUs,
> although likely (not 100% sure) we could do multiple vIOMMUs on a
> single-pIOMMU system. Any reason for doing that?

No idea. But if you state that restriction, then there should be code to
enforce it, e.g. failing the attempt to create a vIOMMU object on a pIOMMU
to which another vIOMMU object is already linked?

> 
> > > stage-2 IO pagetable. Each VIOMMU then just needs to allocate its own
> > > VMID to attach the shared stage-2 IO pagetable to the physical IOMMU:
> >
> > this reads like 'VMID' is a virtual ID allocated by the vIOMMU. But from
> > the entire context it actually means the physical 'VMID' allocated on the
> > associated physical IOMMU, correct?
> 
> Quoting Jason's narrative, a VMID is a "Security namespace for
> guest owned ID". The allocation, using SMMU as an example, should

the VMID alone is not a namespace. It's one ID to tag another namespace.

> be part of vIOMMU instance allocation in the host SMMU driver.
> Then, this VMID will be used to tag the cache entries. So, it is
> still a software-allocated ID, while HW would use it too.
> 

VMIDs are a physical resource belonging to the host SMMU driver.

but I got your original point that each vIOMMU gets a unique VMID
from the host SMMU driver, not exactly that each vIOMMU maintains
its own VMID namespace. that'd be a different concept.
Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)
Posted by Nicolin Chen 2 months, 2 weeks ago
On Wed, Sep 11, 2024 at 07:18:10AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, September 11, 2024 3:08 PM
> >
> > On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > >
> > > [...]
> > > > On a multi-IOMMU system, the VIOMMU object can be instantiated to
> > > > match the number of vIOMMUs in a guest VM, while holding the same
> > > > parent HWPT to share the
> > >
> > > Is there a restriction that multiple vIOMMU objects can only be
> > > created on a multi-IOMMU system?
> >
> > I think it should be generally restricted to the number of pIOMMUs,
> > although likely (not 100% sure) we could do multiple vIOMMUs on a
> > single-pIOMMU system. Any reason for doing that?
> 
> No idea. But if you state that restriction, then there should be code to
> enforce it, e.g. failing the attempt to create a vIOMMU object on a pIOMMU
> to which another vIOMMU object is already linked?

Yea, I can do that.
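
Something along these lines, as a purely illustrative sketch (invented
names; and it would have to be scoped per iommufd context/VM, since
across VMs there will always be multiple vIOMMUs per pIOMMU):

static int viommu_bind_piommu(struct my_ictx *ictx,
			      struct my_piommu *piommu,
			      struct my_viommu *viommu)
{
	int ret = 0;

	mutex_lock(&ictx->lock);
	/* Fail a second vIOMMU on the same pIOMMU within this context */
	if (xa_load(&ictx->piommu_viommus, piommu->id))
		ret = -EBUSY;
	else
		ret = xa_err(xa_store(&ictx->piommu_viommus, piommu->id,
				      viommu, GFP_KERNEL));
	mutex_unlock(&ictx->lock);
	return ret;
}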

> > > > stage-2 IO pagetable. Each VIOMMU then just needs to allocate its
> > > > own VMID to attach the shared stage-2 IO pagetable to the physical
> > > > IOMMU:
> > >
> > > this reads like 'VMID' is a virtual ID allocated by the vIOMMU. But
> > > from the entire context it actually means the physical 'VMID' allocated
> > > on the associated physical IOMMU, correct?
> >
> > Quoting Jason's narrative, a VMID is a "Security namespace for
> > guest owned ID". The allocation, using SMMU as an example, should
> 
> the VMID alone is not a namespace. It's one ID to tag another namespace.
> 
> > be part of vIOMMU instance allocation in the host SMMU driver.
> > Then, this VMID will be used to tag the cache entries. So, it is
> > still a software-allocated ID, while HW would use it too.
> >
> 
> VMIDs are a physical resource belonging to the host SMMU driver.

Yes. Just the lifecycle of a VMID is controlled by a vIOMMU, i.e.
the guest.

> but I got your original point that each vIOMMU gets a unique VMID
> from the host SMMU driver, not exactly that each vIOMMU maintains
> its own VMID namespace. that'd be a different concept.

What's a VMID namespace actually? Please educate me :)

Thanks
Nicolin
RE: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)
Posted by Tian, Kevin 2 months, 2 weeks ago
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Wednesday, September 11, 2024 3:41 PM
> 
> On Wed, Sep 11, 2024 at 07:18:10AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Wednesday, September 11, 2024 3:08 PM
> > >
> > > On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > > >
> > > > > stage-2 IO pagetable. Each VIOMMU then just needs to allocate its
> > > > > own VMID to attach the shared stage-2 IO pagetable to the physical
> > > > > IOMMU:
> > > >
> > > > this reads like 'VMID' is a virtual ID allocated by the vIOMMU. But
> > > > from the entire context it actually means the physical 'VMID'
> > > > allocated on the associated physical IOMMU, correct?
> > >
> > > Quoting Jason's narrative, a VMID is a "Security namespace for
> > > guest owned ID". The allocation, using SMMU as an example, should
> >
> > the VMID alone is not a namespace. It's one ID to tag another namespace.
> >
> > > be part of vIOMMU instance allocation in the host SMMU driver.
> > > Then, this VMID will be used to tag the cache entries. So, it is
> > > still a software-allocated ID, while HW would use it too.
> > >
> >
> > VMIDs are a physical resource belonging to the host SMMU driver.
> 
> Yes. Just the lifecycle of a VMID is controlled by a vIOMMU, i.e.
> the guest.
> 
> > but I got your original point that each vIOMMU gets a unique VMID
> > from the host SMMU driver, not exactly that each vIOMMU maintains
> > its own VMID namespace. that'd be a different concept.
> 
> What's a VMID namespace actually? Please educate me :)
> 

I meant the 16-bit VMID pool under each SMMU.
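
I.e. roughly this (invented names; the point being that each SMMU
instance owns an independent 16-bit VMID space, so VMID 5 on smmu0 and
VMID 5 on smmu1 tag two unrelated VMs):

struct smmu_inst {
	struct ida vmid_pool;	/* valid VMIDs: 1..0xFFFF, 0 reserved here */
};

static int vmid_get(struct smmu_inst *smmu)
{
	return ida_alloc_range(&smmu->vmid_pool, 1, 0xFFFF, GFP_KERNEL);
}

static void vmid_put(struct smmu_inst *smmu, int vmid)
{
	ida_free(&smmu->vmid_pool, vmid);
}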
Re: [PATCH v2 00/19] iommufd: Add VIOMMU infrastructure (Part-1)
Posted by Nicolin Chen 2 months, 2 weeks ago
On Wed, Sep 11, 2024 at 08:08:04AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Wednesday, September 11, 2024 3:41 PM
> >
> > On Wed, Sep 11, 2024 at 07:18:10AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Wednesday, September 11, 2024 3:08 PM
> > > >
> > > > On Wed, Sep 11, 2024 at 06:12:21AM +0000, Tian, Kevin wrote:
> > > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > > Sent: Wednesday, August 28, 2024 1:00 AM
> > > > > >
> > > > > > stage-2 IO pagetable. Each VIOMMU then just needs to allocate
> > > > > > its own VMID to attach the shared stage-2 IO pagetable to the
> > > > > > physical IOMMU:
> > > > >
> > > > > this reads like 'VMID' is a virtual ID allocated by the vIOMMU. But
> > > > > from the entire context it actually means the physical 'VMID'
> > > > > allocated on the associated physical IOMMU, correct?
> > > >
> > > > Quoting Jason's narrative, a VMID is a "Security namespace for
> > > > guest owned ID". The allocation, using SMMU as an example, should
> > >
> > > the VMID alone is not a namespace. It's one ID to tag another namespace.
> > >
> > > > be part of vIOMMU instance allocation in the host SMMU driver.
> > > > Then, this VMID will be used to tag the cache entries. So, it is
> > > > still a software-allocated ID, while HW would use it too.
> > > >
> > >
> > > VMIDs are a physical resource belonging to the host SMMU driver.
> >
> > Yes. Just the lifecycle of a VMID is controlled by a vIOMMU, i.e.
> > the guest.
> >
> > > but I got your original point that each vIOMMU gets a unique VMID
> > > from the host SMMU driver, not exactly that each vIOMMU maintains
> > > its own VMID namespace. that'd be a different concept.
> >
> > What's a VMID namespace actually? Please educate me :)
> >
> 
> I meant the 16-bit VMID pool under each SMMU.

I see. Makes sense now.

Thanks
Nicolin