[RFC PATCH 27/39] KVM: guest_memfd: Allow mmapping guest_memfd files

Ackerley Tng posted 39 patches 1 year, 4 months ago
There is a newer version of this series
[RFC PATCH 27/39] KVM: guest_memfd: Allow mmapping guest_memfd files
Posted by Ackerley Tng 1 year, 4 months ago
guest_memfd files can always be mmap()ed to userspace, but
faultability is controlled by an attribute on the inode.

Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Co-developed-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>

---
 virt/kvm/guest_memfd.c | 46 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b603518f7b62..fc2483e35876 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -781,7 +781,8 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct list_head *gmem_list = &inode->i_mapping->i_private_list;
 	pgoff_t start = offset >> PAGE_SHIFT;
-	pgoff_t end = (offset + len) >> PAGE_SHIFT;
+	pgoff_t nr = len >> PAGE_SHIFT;
+	pgoff_t end = start + nr;
 	struct kvm_gmem *gmem;
 
 	/*
@@ -790,6 +791,9 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	 */
 	filemap_invalidate_lock(inode->i_mapping);
 
+	/* TODO: Check if even_cows should be 0 or 1 */
+	unmap_mapping_range(inode->i_mapping, start, len, 0);
+
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_begin(gmem, start, end);
 
@@ -946,6 +950,9 @@ static void kvm_gmem_hugetlb_teardown(struct inode *inode)
 {
 	struct kvm_gmem_hugetlb *hgmem;
 
+	/* TODO: Check if even_cows should be 0 or 1 */
+	unmap_mapping_range(inode->i_mapping, 0, LLONG_MAX, 0);
+
 	truncate_inode_pages_final_prepare(inode->i_mapping);
 	kvm_gmem_hugetlb_truncate_folios_range(inode, 0, LLONG_MAX);
 
@@ -1003,11 +1010,46 @@ static void kvm_gmem_init_mount(void)
 	kvm_gmem_mnt = kern_mount(&kvm_gmem_fs);
 	BUG_ON(IS_ERR(kvm_gmem_mnt));
 
-	/* For giggles. Userspace can never map this anyways. */
 	kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
 }
 
+static vm_fault_t kvm_gmem_fault(struct vm_fault *vmf)
+{
+	struct inode *inode;
+	struct folio *folio;
+
+	inode = file_inode(vmf->vma->vm_file);
+	if (!kvm_gmem_is_faultable(inode, vmf->pgoff))
+		return VM_FAULT_SIGBUS;
+
+	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	if (!folio)
+		return VM_FAULT_SIGBUS;
+
+	vmf->page = folio_file_page(folio, vmf->pgoff);
+	return VM_FAULT_LOCKED;
+}
+
+static const struct vm_operations_struct kvm_gmem_vm_ops = {
+	.fault = kvm_gmem_fault,
+};
+
+static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) !=
+	    (VM_SHARED | VM_MAYSHARE)) {
+		return -EINVAL;
+	}
+
+	file_accessed(file);
+	vm_flags_set(vma, VM_DONTDUMP);
+	vma->vm_ops = &kvm_gmem_vm_ops;
+
+	return 0;
+}
+
 static struct file_operations kvm_gmem_fops = {
+	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
-- 
2.46.0.598.g6f2099f65c-goog
Re: [RFC PATCH 27/39] KVM: guest_memfd: Allow mmapping guest_memfd files
Posted by Yan Zhao 10 months, 1 week ago
On Tue, Sep 10, 2024 at 11:43:58PM +0000, Ackerley Tng wrote:
> guest_memfd files can always be mmap()ed to userspace, but
> faultability is controlled by an attribute on the inode.
> 
> Co-developed-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> 
> ---
>  virt/kvm/guest_memfd.c | 46 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index b603518f7b62..fc2483e35876 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -781,7 +781,8 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>  {
Hi Ackerley,

If userspace mmaps a guest_memfd to a VA when a GFN range is shared, it looks
that even after the GFN range has been successfully converted to private,
userspace can still call madvise(mem, size, MADV_REMOVE) on the userspace VA.
This action triggers kvm_gmem_punch_hole() and kvm_gmem_invalidate_begin(),
which can zap the private GFNs in the EPT.

Is this behavior intended for in-place conversion, and could it potentially lead
to private GFN ranges being accidentally zapped from the EPT?

Apologies if I missed any related discussions on this topic.

Thanks
Yan

>  	struct list_head *gmem_list = &inode->i_mapping->i_private_list;
>  	pgoff_t start = offset >> PAGE_SHIFT;
> -	pgoff_t end = (offset + len) >> PAGE_SHIFT;
> +	pgoff_t nr = len >> PAGE_SHIFT;
> +	pgoff_t end = start + nr;
>  	struct kvm_gmem *gmem;
>  
>  	/*
> @@ -790,6 +791,9 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>  	 */
>  	filemap_invalidate_lock(inode->i_mapping);
>  
> +	/* TODO: Check if even_cows should be 0 or 1 */
> +	unmap_mapping_range(inode->i_mapping, start, len, 0);
> +
>  	list_for_each_entry(gmem, gmem_list, entry)
>  		kvm_gmem_invalidate_begin(gmem, start, end);
>
Re: [RFC PATCH 27/39] KVM: guest_memfd: Allow mmapping guest_memfd files
Posted by Ackerley Tng 9 months, 2 weeks ago
Yan Zhao <yan.y.zhao@intel.com> writes:

> On Tue, Sep 10, 2024 at 11:43:58PM +0000, Ackerley Tng wrote:
>> guest_memfd files can always be mmap()ed to userspace, but
>> faultability is controlled by an attribute on the inode.
>> 
>> Co-developed-by: Fuad Tabba <tabba@google.com>
>> Signed-off-by: Fuad Tabba <tabba@google.com>
>> Co-developed-by: Ackerley Tng <ackerleytng@google.com>
>> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
>> 
>> ---
>>  virt/kvm/guest_memfd.c | 46 ++++++++++++++++++++++++++++++++++++++++--
>>  1 file changed, 44 insertions(+), 2 deletions(-)
>> 
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index b603518f7b62..fc2483e35876 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -781,7 +781,8 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>>  {
> Hi Ackerley,
>
> If userspace mmaps a guest_memfd to a VA when a GFN range is shared, it looks
> that even after the GFN range has been successfully converted to private,
> userspace can still call madvise(mem, size, MADV_REMOVE) on the userspace VA.
> This action triggers kvm_gmem_punch_hole() and kvm_gmem_invalidate_begin(),
> which can zap the private GFNs in the EPT.
>
> Is this behavior intended for in-place conversion, and could it potentially lead
> to private GFN ranges being accidentally zapped from the EPT?
>
> Apologies if I missed any related discussions on this topic.

No worries and thank you for your review! The next revision will not be
requiring userspace to do madvise(MADV_REMOVE), because memory could be
mapped in multiple processes, so unmapping from the kernel saves the
trouble of coordination in userspace between multiple processes.

>
> Thanks
> Yan
>
>>  	struct list_head *gmem_list = &inode->i_mapping->i_private_list;
>>  	pgoff_t start = offset >> PAGE_SHIFT;
>> -	pgoff_t end = (offset + len) >> PAGE_SHIFT;
>> +	pgoff_t nr = len >> PAGE_SHIFT;
>> +	pgoff_t end = start + nr;
>>  	struct kvm_gmem *gmem;
>>  
>>  	/*
>> @@ -790,6 +791,9 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>>  	 */
>>  	filemap_invalidate_lock(inode->i_mapping);
>>  
>> +	/* TODO: Check if even_cows should be 0 or 1 */
>> +	unmap_mapping_range(inode->i_mapping, start, len, 0);
>> +
>>  	list_for_each_entry(gmem, gmem_list, entry)
>>  		kvm_gmem_invalidate_begin(gmem, start, end);
>>
Re: [RFC PATCH 27/39] KVM: guest_memfd: Allow mmapping guest_memfd files
Posted by Peter Xu 11 months, 1 week ago
On Tue, Sep 10, 2024 at 11:43:58PM +0000, Ackerley Tng wrote:
> @@ -790,6 +791,9 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>  	 */
>  	filemap_invalidate_lock(inode->i_mapping);
>  
> +	/* TODO: Check if even_cows should be 0 or 1 */
> +	unmap_mapping_range(inode->i_mapping, start, len, 0);

Should be s/start/offset/ here, or should expect some filemap crash assert
on non-zero mapcounts (when it starts to matter).

Btw, it would be nice if the new version would allow kvm to be compiled as
a module.  Currently it uses a lot of mm functions that are not yet
exported, so AFAIU it will only build if kvm is builtin.

Thanks,

-- 
Peter Xu
Re: [RFC PATCH 27/39] KVM: guest_memfd: Allow mmapping guest_memfd files
Posted by Peter Xu 1 year ago
On Tue, Sep 10, 2024 at 11:43:58PM +0000, Ackerley Tng wrote:
> @@ -790,6 +791,9 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>  	 */
>  	filemap_invalidate_lock(inode->i_mapping);
>  
> +	/* TODO: Check if even_cows should be 0 or 1 */
> +	unmap_mapping_range(inode->i_mapping, start, len, 0);
> +
>  	list_for_each_entry(gmem, gmem_list, entry)
>  		kvm_gmem_invalidate_begin(gmem, start, end);
>  
> @@ -946,6 +950,9 @@ static void kvm_gmem_hugetlb_teardown(struct inode *inode)
>  {
>  	struct kvm_gmem_hugetlb *hgmem;
>  
> +	/* TODO: Check if even_cows should be 0 or 1 */
> +	unmap_mapping_range(inode->i_mapping, 0, LLONG_MAX, 0);

Setting to 0 is ok in both places: even_cows only applies to MAP_PRIVATE,
which gmemfd doesn't support.  So feel free to drop the two comment lines.

Thanks,

-- 
Peter Xu
Re: [RFC PATCH 27/39] KVM: guest_memfd: Allow mmapping guest_memfd files
Posted by Ackerley Tng 9 months, 2 weeks ago
Peter Xu <peterx@redhat.com> writes:

> On Tue, Sep 10, 2024 at 11:43:58PM +0000, Ackerley Tng wrote:
>> @@ -790,6 +791,9 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>>  	 */
>>  	filemap_invalidate_lock(inode->i_mapping);
>>  
>> +	/* TODO: Check if even_cows should be 0 or 1 */
>> +	unmap_mapping_range(inode->i_mapping, start, len, 0);
>> +
>>  	list_for_each_entry(gmem, gmem_list, entry)
>>  		kvm_gmem_invalidate_begin(gmem, start, end);
>>  
>> @@ -946,6 +950,9 @@ static void kvm_gmem_hugetlb_teardown(struct inode *inode)
>>  {
>>  	struct kvm_gmem_hugetlb *hgmem;
>>  
>> +	/* TODO: Check if even_cows should be 0 or 1 */
>> +	unmap_mapping_range(inode->i_mapping, 0, LLONG_MAX, 0);
>
> Setting to 0 is ok in both places: even_cows only applies to MAP_PRIVATE,
> which gmemfd doesn't support.  So feel free to drop the two comment lines.
>
> Thanks,
>
> -- 
> Peter Xu

Thank you for reviewing and helping me check on this!