RE: [RFC PATCH 0/6] Enable shared device assignment
Posted by Tian, Kevin 1 month, 1 week ago
> From: David Hildenbrand <david@redhat.com>
> Sent: Thursday, July 25, 2024 10:04 PM
> 
> > Open
> > ====
> > Implementing a RamDiscardManager to notify VFIO of page conversions
> > causes changes in semantics: private memory is treated as discarded (or
> > hot-removed) memory. This isn't aligned with the expectation of current
> > RamDiscardManager users (e.g. VFIO or live migration) who really
> > expect that discarded memory is hot-removed and thus can be skipped when
> > the users are processing guest memory. Treating private memory as
> > discarded won't work in future if VFIO or live migration needs to handle
> > private memory. e.g. VFIO may need to map private memory to support
> > Trusted IO and live migration for confidential VMs needs to migrate
> > private memory.
> 
> "VFIO may need to map private memory to support Trusted IO"
> 
> I've been told that the way we handle shared memory won't be the way
> this is going to work with guest_memfd. KVM will coordinate directly
> with VFIO or $whatever and update the IOMMU tables itself right in the
> kernel; the pages are pinned/owned by guest_memfd, so that will just
> work. So I don't consider that currently a concern. guest_memfd private
> memory is not mapped into user page tables and as it currently seems it
> never will be.

Or could extend MAP_DMA to accept guest_memfd+offset in place of
'vaddr' and have VFIO/IOMMUFD call guest_memfd helpers to retrieve
the pinned pfn.
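
To make this concrete, below is a rough sketch of what such an extension
could look like on the VFIO type1 uAPI. It is purely illustrative: the
structure, flag and field names are assumptions for discussion, not an
existing or proposed interface.

#include <linux/types.h>

/*
 * Hypothetical extension of VFIO_IOMMU_MAP_DMA: instead of a process
 * virtual address, userspace passes a guest_memfd plus an offset, and
 * the kernel resolves the already-pinned pfns via guest_memfd helpers.
 */
#define VFIO_DMA_MAP_FLAG_GUEST_MEMFD	(1 << 3)	/* hypothetical flag */

struct vfio_iommu_type1_dma_map_gmem {
	__u32	argsz;
	__u32	flags;		/* READ/WRITE plus the hypothetical flag above */
	__s32	gmem_fd;	/* guest_memfd, passed in place of 'vaddr' */
	__u32	__reserved;
	__u64	gmem_offset;	/* offset of the range within the guest_memfd */
	__u64	iova;		/* IOVA to map at */
	__u64	size;		/* length of the mapping */
};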

IMHO it's more the TIO arch, rather than the use of guest_memfd, that
decides whether VFIO/IOMMUFD needs to manage the mapping of the private
memory.

e.g. SEV-TIO, iiuc, introduces a new layer of page-ownership tracking (the
RMP) that checks the HPA after the IOMMU walks the existing I/O page tables.
So VFIO/IOMMUFD could reasonably continue to manage those I/O page tables,
covering both private and shared memory, with a hint about where to find
the pfn (host page table or guest_memfd).
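
As a thought experiment, that hint could boil down to something like the
sketch below in the pinning path. The helpers and fields are entirely
hypothetical (guest_memfd exposes no such helper today); the point is only
that the pfn source is selected per range.

#include <linux/fs.h>
#include <linux/types.h>

/* Illustrative prototypes only; these helpers do not exist today. */
int guest_memfd_get_pinned_pfn(struct file *gmem, unsigned long offset,
			       unsigned long *pfn);
int pin_shared_user_page(unsigned long uaddr, unsigned long *pfn);

struct gpa_range {
	struct file *gmem_file;		/* guest_memfd backing the private pages */
	unsigned long gmem_offset;	/* offset of the range in the guest_memfd */
	unsigned long uaddr;		/* userspace address of the shared mapping */
	bool private;			/* the hint: where to find the pfn */
};

static int resolve_backing_pfn(struct gpa_range *r, unsigned long *pfn)
{
	if (r->private)
		/* private: ask guest_memfd for the already-pinned pfn */
		return guest_memfd_get_pinned_pfn(r->gmem_file,
						  r->gmem_offset, pfn);

	/* shared: pin through the user page tables as today */
	return pin_shared_user_page(r->uaddr, pfn);
}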

But TDX Connect introduces a new I/O page table format (the same as secure
EPT) for mapping the private memory, and further requires sharing the
secure EPT between the CPU and the IOMMU for private memory. That appears
to be a different story.
Re: [RFC PATCH 0/6] Enable shared device assignment
Posted by David Hildenbrand 1 month, 1 week ago
On 26.07.24 07:02, Tian, Kevin wrote:
>> From: David Hildenbrand <david@redhat.com>
>> Sent: Thursday, July 25, 2024 10:04 PM
>>
>>> Open
>>> ====
>>> Implementing a RamDiscardManager to notify VFIO of page conversions
>>> causes changes in semantics: private memory is treated as discarded (or
>>> hot-removed) memory. This isn't aligned with the expectation of current
>>> RamDiscardManager users (e.g. VFIO or live migration) who really
>>> expect that discarded memory is hot-removed and thus can be skipped when
>>> the users are processing guest memory. Treating private memory as
>>> discarded won't work in future if VFIO or live migration needs to handle
>>> private memory. e.g. VFIO may need to map private memory to support
>>> Trusted IO and live migration for confidential VMs needs to migrate
>>> private memory.
>>
>> "VFIO may need to map private memory to support Trusted IO"
>>
>> I've been told that the way we handle shared memory won't be the way
>> this is going to work with guest_memfd. KVM will coordinate directly
>> with VFIO or $whatever and update the IOMMU tables itself right in the
>> kernel; the pages are pinned/owned by guest_memfd, so that will just
>> work. So I don't consider that currently a concern. guest_memfd private
>> memory is not mapped into user page tables and as it currently seems it
>> never will be.
> 
> Or could extend MAP_DMA to accept guest_memfd+offset in place of
> 'vaddr' and have VFIO/IOMMUFD call guest_memfd helpers to retrieve
> the pinned pfn.

In theory yes, and I've been thinking the same for a while, until people
told me that it is unlikely to work that way in the future.

> 
> IMHO it's more the TIO arch deciding whether VFIO/IOMMUFD needs
> to manage the mapping of the private memory instead of the use of
> guest_memfd.
> 
> e.g. SEV-TIO, iiuc, introduces a new-layer page ownership tracker (RMP)
> to check the HPA after the IOMMU walks the existing I/O page tables.
> So reasonably VFIO/IOMMUFD could continue to manage those I/O
> page tables including both private and shared memory, with a hint to
> know where to find the pfn (host page table or guest_memfd).
> 
> But TDX Connect introduces a new I/O page table format (same as secure
> EPT) for mapping the private memory and further requires sharing the
> secure-EPT between CPU/IOMMU for private. Then it appears to be
> a different story.

Yes. This seems to be the future and more in line with in-place/in-kernel
conversion as, e.g., pKVM wants to have it. If you want to avoid user space
altogether when doing shared<->private conversions, then letting user space
manage the IOMMUs is not going to work.


If we ever have to go down that path (MAP_DMA of guest_memfd), we could
have two RamDiscardManagers for a RAM region, just like we have two
memory backends: one for shared memory populate/discard (what this
series tries to achieve), one for private memory populate/discard.
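
A minimal sketch of what that pairing could look like on the QEMU side,
assuming such a split were introduced (QEMU today supports only a single
RamDiscardManager per MemoryRegion, so the structure below is hypothetical):

#include "exec/memory.h"    /* RamDiscardManager, MemoryRegion */

/*
 * Hypothetical: track the two populate/discard state machines of a
 * confidential VM's RAM region with two separate managers.
 */
typedef struct GuestMemfdRamDiscardState {
    MemoryRegion *mr;
    RamDiscardManager *shared_rdm;   /* shared populate/discard (this series) */
    RamDiscardManager *private_rdm;  /* private (guest_memfd) populate/discard */
} GuestMemfdRamDiscardState;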

The thing is that private memory will always have to be special-cased
all over the place either way, unfortunately.

-- 
Cheers,

David / dhildenb
Re: [RFC PATCH 0/6] Enable shared device assignment
Posted by Xu Yilun 1 month, 1 week ago
On Fri, Jul 26, 2024 at 09:08:51AM +0200, David Hildenbrand wrote:
> On 26.07.24 07:02, Tian, Kevin wrote:
> > > From: David Hildenbrand <david@redhat.com>
> > > Sent: Thursday, July 25, 2024 10:04 PM
> > > 
> > > > Open
> > > > ====
> > > > Implementing a RamDiscardManager to notify VFIO of page conversions
> > > > causes changes in semantics: private memory is treated as discarded (or
> > > > hot-removed) memory. This isn't aligned with the expectation of current
> > > > RamDiscardManager users (e.g. VFIO or live migration) who really
> > > > expect that discarded memory is hot-removed and thus can be skipped when
> > > > the users are processing guest memory. Treating private memory as
> > > > discarded won't work in future if VFIO or live migration needs to handle
> > > > private memory. e.g. VFIO may need to map private memory to support
> > > > Trusted IO and live migration for confidential VMs needs to migrate
> > > > private memory.
> > > 
> > > "VFIO may need to map private memory to support Trusted IO"
> > > 
> > > I've been told that the way we handle shared memory won't be the way
> > > this is going to work with guest_memfd. KVM will coordinate directly
> > > with VFIO or $whatever and update the IOMMU tables itself right in the
> > > kernel; the pages are pinned/owned by guest_memfd, so that will just
> > > work. So I don't consider that currently a concern. guest_memfd private
> > > memory is not mapped into user page tables and as it currently seems it
> > > never will be.
> > 
> > Or could extend MAP_DMA to accept guest_memfd+offset in place of

With TIO, I can imagine several buffer-sharing requirements: KVM maps
VFIO-owned private MMIO, the IOMMU maps gmem-owned private memory, and the
IOMMU maps VFIO-owned private MMIO. These buffers cannot be found via the
user page tables anymore. I'm wondering whether it would be messy to have a
specific PFN-finding method for each FD type. Is it possible to have a
unified way for buffer sharing and PFN finding? Is dma-buf a candidate?
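
For reference, the generic dma-buf importer flow in the kernel looks roughly
like the sketch below; the importer only ever sees DMA addresses handed out
by the exporter. Whether guest_memfd or VFIO would ever export private
memory / private MMIO as dma-bufs is exactly the open question, so treat the
exporter side as an assumption.

#include <linux/device.h>
#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Import a buffer exported as a dma-buf and get its sg_table for DMA. */
static struct sg_table *import_buffer(int fd, struct device *dev,
				      struct dma_buf **dmabuf_out,
				      struct dma_buf_attachment **att_out)
{
	struct dma_buf *dmabuf = dma_buf_get(fd);	/* fd from the exporter */
	struct dma_buf_attachment *att;
	struct sg_table *sgt;

	if (IS_ERR(dmabuf))
		return ERR_CAST(dmabuf);

	att = dma_buf_attach(dmabuf, dev);		/* attach the importing device */
	if (IS_ERR(att)) {
		dma_buf_put(dmabuf);
		return ERR_CAST(att);
	}

	/* DMA addresses to program into the IOMMU / device */
	sgt = dma_buf_map_attachment(att, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, att);
		dma_buf_put(dmabuf);
		return ERR_CAST(sgt);
	}

	*dmabuf_out = dmabuf;
	*att_out = att;
	return sgt;
}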

> > 'vaddr' and have VFIO/IOMMUFD call guest_memfd helpers to retrieve
> > the pinned pfn.
> 
> In theory yes, and I've been thinking of the same for a while. Until people
> told me that it is unlikely that it will work that way in the future.

Could you clarify why it won't work? As Kevin mentioned below, SEV-TIO
may still allow userspace to manage the IOMMU mapping for private memory.
I'm not sure how they would map private memory for the IOMMU without
touching gmemfd.

Thanks,
Yilun

> 
> > 
> > IMHO it's more the TIO arch deciding whether VFIO/IOMMUFD needs
> > to manage the mapping of the private memory instead of the use of
> > guest_memfd.
> > 
> > e.g. SEV-TIO, iiuc, introduces a new-layer page ownership tracker (RMP)
> > to check the HPA after the IOMMU walks the existing I/O page tables.
> > So reasonably VFIO/IOMMUFD could continue to manage those I/O
> > page tables including both private and shared memory, with a hint to
> > know where to find the pfn (host page table or guest_memfd).
> > 
> > But TDX Connect introduces a new I/O page table format (same as secure
> > EPT) for mapping the private memory and further requires sharing the
> > secure-EPT between CPU/IOMMU for private. Then it appears to be
> > a different story.
> 
> Yes. This seems to be the future and more in-line with in-place/in-kernel
> conversion as e.g., pKVM wants to have it. If you want to avoid user space
> altogether when doing shared<->private conversions, then letting user space
> manage the IOMMUs is not going to work.
> 
> 
> If we ever have to go down that path (MAP_DMA of guest_memfd), we could have
> two RAMDiscardManager for a RAM region, just like we have two memory
> backends: one for shared memory populate/discard (what this series tries to
> achieve), one for private memory populate/discard.
> 
> The thing is, that private memory will always have to be special-cased all
> over the place either way, unfortunately.
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 
>
Re: [RFC PATCH 0/6] Enable shared device assignment
Posted by David Hildenbrand 1 month, 1 week ago
On 31.07.24 09:12, Xu Yilun wrote:
> On Fri, Jul 26, 2024 at 09:08:51AM +0200, David Hildenbrand wrote:
>> On 26.07.24 07:02, Tian, Kevin wrote:
>>>> From: David Hildenbrand <david@redhat.com>
>>>> Sent: Thursday, July 25, 2024 10:04 PM
>>>>
>>>>> Open
>>>>> ====
>>>>> Implementing a RamDiscardManager to notify VFIO of page conversions
>>>>> causes changes in semantics: private memory is treated as discarded (or
>>>>> hot-removed) memory. This isn't aligned with the expectation of current
>>>>> RamDiscardManager users (e.g. VFIO or live migration) who really
>>>>> expect that discarded memory is hot-removed and thus can be skipped when
>>>>> the users are processing guest memory. Treating private memory as
>>>>> discarded won't work in future if VFIO or live migration needs to handle
>>>>> private memory. e.g. VFIO may need to map private memory to support
>>>>> Trusted IO and live migration for confidential VMs needs to migrate
>>>>> private memory.
>>>>
>>>> "VFIO may need to map private memory to support Trusted IO"
>>>>
>>>> I've been told that the way we handle shared memory won't be the way
>>>> this is going to work with guest_memfd. KVM will coordinate directly
>>>> with VFIO or $whatever and update the IOMMU tables itself right in the
>>>> kernel; the pages are pinned/owned by guest_memfd, so that will just
>>>> work. So I don't consider that currently a concern. guest_memfd private
>>>> memory is not mapped into user page tables and as it currently seems it
>>>> never will be.
>>>
>>> Or could extend MAP_DMA to accept guest_memfd+offset in place of
> 
> With TIO, I can imagine several buffer sharing requirements: KVM maps VFIO
> owned private MMIO, IOMMU maps gmem owned private memory, IOMMU maps VFIO
> owned private MMIO. These buffers cannot be found by user page table
> anymore. I'm wondering it would be messy to have specific PFN finding
> methods for each FD type. Is it possible we have a unified way for
> buffer sharing and PFN finding, is dma-buf a candidate?

No expert on that, so I'm afraid I can't help.

> 
>>> 'vaddr' and have VFIO/IOMMUFD call guest_memfd helpers to retrieve
>>> the pinned pfn.
>>
>> In theory yes, and I've been thinking of the same for a while. Until people
>> told me that it is unlikely that it will work that way in the future.
> 
> Could you help specify why it won't work? As Kevin mentioned below, SEV-TIO
> may still allow userspace to manage the IOMMU mapping for private. I'm
> not sure how they map private memory for IOMMU without touching gmemfd.

I raised that question in [1]:

"How would the device be able to grab/access "private memory", if not 
via the user page tables?"

Jason summarized it as "The approaches I'm aware of require the secure 
world to own the IOMMU and generate the IOMMU page tables. So we will 
not use a GUP approach with VFIO today as the kernel will not have any 
reason to generate a page table in the first place. Instead we will say 
"this PCI device translates through the secure world" and walk away."

I think for some cVM approaches it really cannot work without letting 
KVM/secure world handle the IOMMU (e.g., sharing of page tables between 
IOMMU and KVM).

For your use case it *might* work, but I am wondering if this is how it 
should be done, and if there are better alternatives.


[1] https://lkml.org/lkml/2024/6/20/920

-- 
Cheers,

David / dhildenb