From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: michael.roth@amd.com, seanjc@google.com
Subject: [PATCH 1/3] KVM: gmem: allocate private data for the gmem inode
Date: Fri, 8 Nov 2024 10:50:54 -0500
Message-ID: <20241108155056.332412-2-pbonzini@redhat.com>
In-Reply-To: <20241108155056.332412-1-pbonzini@redhat.com>
References: <20241108155056.332412-1-pbonzini@redhat.com>

In preparation for removing the usage of the uptodate flag, reintroduce
the gmem filesystem type.  We need it in order to free the private inode
information.

Signed-off-by: Paolo Bonzini
---
 include/uapi/linux/magic.h |   1 +
 virt/kvm/guest_memfd.c     | 117 +++++++++++++++++++++++++++++++++----
 virt/kvm/kvm_main.c        |   7 ++-
 virt/kvm/kvm_mm.h          |   8 ++-
 4 files changed, 119 insertions(+), 14 deletions(-)

diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index bb575f3ab45e..d856dd6a7ed9 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -103,5 +103,6 @@
 #define DEVMEM_MAGIC		0x454d444d	/* "DMEM" */
 #define SECRETMEM_MAGIC		0x5345434d	/* "SECM" */
 #define PID_FS_MAGIC		0x50494446	/* "PIDF" */
+#define KVM_GUEST_MEM_MAGIC	0x474d454d	/* "GMEM" */
 
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 8f079a61a56d..3ea5a7597fd4 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,9 +4,74 @@
 #include <linux/kvm_host.h>
 #include <linux/pagemap.h>
 #include <linux/anon_inodes.h>
+#include <linux/pseudo_fs.h>
 
 #include "kvm_mm.h"
 
+/* Do all the filesystem crap just for evict_inode... */
+
+static struct vfsmount *kvm_gmem_mnt __read_mostly;
+
+static void gmem_evict_inode(struct inode *inode)
+{
+	kvfree(inode->i_private);
+	truncate_inode_pages_final(&inode->i_data);
+	clear_inode(inode);
+}
+
+static const struct super_operations gmem_super_operations = {
+	.drop_inode	= generic_delete_inode,
+	.evict_inode	= gmem_evict_inode,
+	.statfs		= simple_statfs,
+};
+
+static int gmem_init_fs_context(struct fs_context *fc)
+{
+	struct pseudo_fs_context *ctx = init_pseudo(fc, KVM_GUEST_MEM_MAGIC);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->ops = &gmem_super_operations;
+	return 0;
+}
+
+static struct file_system_type kvm_gmem_fs_type = {
+	.name		 = "kvm_gmemfs",
+	.init_fs_context = gmem_init_fs_context,
+	.kill_sb	 = kill_anon_super,
+};
+
+static struct file *kvm_gmem_create_file(const char *name, const struct file_operations *fops)
+{
+	struct inode *inode;
+	struct file *file;
+
+	if (fops->owner && !try_module_get(fops->owner))
+		return ERR_PTR(-ENOENT);
+
+	inode = alloc_anon_inode(kvm_gmem_mnt->mnt_sb);
+	if (IS_ERR(inode)) {
+		file = ERR_CAST(inode);
+		goto err;
+	}
+	file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR, fops);
+	if (IS_ERR(file))
+		goto err_iput;
+
+	return file;
+
+err_iput:
+	iput(inode);
+err:
+	module_put(fops->owner);
+	return file;
+}
+
+
+struct kvm_gmem_inode {
+	unsigned long flags;
+};
+
 struct kvm_gmem {
 	struct kvm *kvm;
 	struct xarray bindings;
@@ -308,9 +373,31 @@ static struct file_operations kvm_gmem_fops = {
 	.fallocate	= kvm_gmem_fallocate,
 };
 
-void kvm_gmem_init(struct module *module)
+int kvm_gmem_init(struct module *module)
 {
+	int ret;
+
+	ret = register_filesystem(&kvm_gmem_fs_type);
+	if (ret) {
+		pr_err("kvm-gmem: cannot register file system (%d)\n", ret);
+		return ret;
+	}
+
+	kvm_gmem_mnt = kern_mount(&kvm_gmem_fs_type);
+	if (IS_ERR(kvm_gmem_mnt)) {
+		pr_err("kvm-gmem: kernel mount failed (%ld)\n", PTR_ERR(kvm_gmem_mnt));
+		return PTR_ERR(kvm_gmem_mnt);
+	}
+
 	kvm_gmem_fops.owner = module;
+
+	return 0;
+}
+
+void kvm_gmem_exit(void)
+{
+	kern_unmount(kvm_gmem_mnt);
+	unregister_filesystem(&kvm_gmem_fs_type);
 }
 
 static int kvm_gmem_migrate_folio(struct address_space *mapping,
@@ -394,15 +481,23 @@ static const struct inode_operations kvm_gmem_iops = {
 
 static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 {
-	const char *anon_name = "[kvm-gmem]";
+	const char *gmem_name = "[kvm-gmem]";
+	struct kvm_gmem_inode *i_gmem;
 	struct kvm_gmem *gmem;
 	struct inode *inode;
 	struct file *file;
 	int fd, err;
 
+	i_gmem = kvzalloc(sizeof(struct kvm_gmem_inode), GFP_KERNEL);
+	if (!i_gmem)
+		return -ENOMEM;
+	i_gmem->flags = flags;
+
 	fd = get_unused_fd_flags(0);
-	if (fd < 0)
-		return fd;
+	if (fd < 0) {
+		err = fd;
+		goto err_i_gmem;
+	}
 
 	gmem = kzalloc(sizeof(*gmem), GFP_KERNEL);
 	if (!gmem) {
@@ -410,19 +505,19 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 		goto err_fd;
 	}
 
-	file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, gmem,
-					 O_RDWR, NULL);
+	file = kvm_gmem_create_file(gmem_name, &kvm_gmem_fops);
 	if (IS_ERR(file)) {
 		err = PTR_ERR(file);
 		goto err_gmem;
 	}
 
+	inode = file->f_inode;
+
+	file->f_mapping = inode->i_mapping;
+	file->private_data = gmem;
 	file->f_flags |= O_LARGEFILE;
 
-	inode = file->f_inode;
-	WARN_ON(file->f_mapping != inode->i_mapping);
-
-	inode->i_private = (void *)(unsigned long)flags;
+	inode->i_private = i_gmem;
 	inode->i_op = &kvm_gmem_iops;
 	inode->i_mapping->a_ops = &kvm_gmem_aops;
 	inode->i_mode |= S_IFREG;
@@ -444,6 +539,8 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	kfree(gmem);
 err_fd:
 	put_unused_fd(fd);
+err_i_gmem:
+	kvfree(i_gmem);
 	return err;
 }
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 279e03029ce1..8b7b4e0eb639 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6504,7 +6504,9 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	if (WARN_ON_ONCE(r))
 		goto err_vfio;
 
-	kvm_gmem_init(module);
+	r = kvm_gmem_init(module);
+	if (r)
+		goto err_gmem;
 
 	r = kvm_init_virtualization();
 	if (r)
@@ -6525,6 +6527,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 err_register:
 	kvm_uninit_virtualization();
 err_virt:
+	kvm_gmem_exit();
+err_gmem:
 	kvm_vfio_ops_exit();
 err_vfio:
 	kvm_async_pf_deinit();
@@ -6556,6 +6560,7 @@ void kvm_exit(void)
 	for_each_possible_cpu(cpu)
 		free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
 	kmem_cache_destroy(kvm_vcpu_cache);
+	kvm_gmem_exit();
 	kvm_vfio_ops_exit();
 	kvm_async_pf_deinit();
 	kvm_irqfd_exit();
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index 715f19669d01..91e4202574a8 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -36,15 +36,17 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 #endif /* HAVE_KVM_PFNCACHE */
 
 #ifdef CONFIG_KVM_PRIVATE_MEM
-void kvm_gmem_init(struct module *module);
+int kvm_gmem_init(struct module *module);
+void kvm_gmem_exit(void);
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 		  unsigned int fd, loff_t offset);
 void kvm_gmem_unbind(struct kvm_memory_slot *slot);
 #else
-static inline void kvm_gmem_init(struct module *module)
+static inline void kvm_gmem_exit(void) {}
+static inline int kvm_gmem_init(struct module *module)
 {
-
+	return 0;
 }
 
 static inline int kvm_gmem_bind(struct kvm *kvm,
-- 
2.43.5
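
To summarize the lifetime rules established by the hunks above, here is a
reader-oriented sketch; it is not code from the patch, only a condensed
restatement of what the diff already does:

	/*
	 * Sketch of the resulting object lifetime (illustrative only):
	 *
	 *   kvm_gmem_init()      - register_filesystem() + kern_mount() the
	 *                          kvm_gmemfs pseudo filesystem
	 *   __kvm_gmem_create()  - kvzalloc() a struct kvm_gmem_inode and
	 *                          store it in inode->i_private
	 *   last reference drop  - gmem_evict_inode() kvfree()s i_private,
	 *                          which is the whole reason the filesystem
	 *                          exists
	 *   kvm_gmem_exit()      - kern_unmount() + unregister_filesystem()
	 */
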
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: michael.roth@amd.com, seanjc@google.com
Subject: [PATCH 2/3] KVM: gmem: add a complete set of functions to query page preparedness
Date: Fri, 8 Nov 2024 10:50:55 -0500
Message-ID: <20241108155056.332412-3-pbonzini@redhat.com>
In-Reply-To: <20241108155056.332412-1-pbonzini@redhat.com>
References: <20241108155056.332412-1-pbonzini@redhat.com>

In preparation for moving preparedness out of the folio flags, pass the
struct file* or struct inode* down to kvm_gmem_mark_prepared, as well as
the offset within the gmem file.  Introduce new functions to unprepare
pages on punch-hole and to query the preparedness state.

Signed-off-by: Paolo Bonzini
---
 virt/kvm/guest_memfd.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 3ea5a7597fd4..416e02a00cae 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -107,18 +107,28 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
 	return 0;
 }
 
-static inline void kvm_gmem_mark_prepared(struct folio *folio)
+static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, struct folio *folio)
 {
 	folio_mark_uptodate(folio);
 }
 
+static void kvm_gmem_mark_range_unprepared(struct inode *inode, pgoff_t index, pgoff_t npages)
+{
+}
+
+static bool kvm_gmem_is_prepared(struct file *file, pgoff_t index, struct folio *folio)
+{
+	return folio_test_uptodate(folio);
+}
+
 /*
  * Process @folio, which contains @gfn, so that the guest can use it.
  * The folio must be locked and the gfn must be contained in @slot.
  * On successful return the guest sees a zero page so as to avoid
 * leaking host data and the up-to-date flag is set.
 */
-static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
+static int kvm_gmem_prepare_folio(struct kvm *kvm, struct file *file,
+				  struct kvm_memory_slot *slot,
 				  gfn_t gfn, struct folio *folio)
 {
 	unsigned long nr_pages, i;
@@ -147,7 +157,7 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
 	index = ALIGN_DOWN(index, 1 << folio_order(folio));
 	r = __kvm_gmem_prepare_folio(kvm, slot, index, folio);
 	if (!r)
-		kvm_gmem_mark_prepared(folio);
+		kvm_gmem_mark_prepared(file, index, folio);
 
 	return r;
 }
@@ -231,6 +241,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		kvm_gmem_invalidate_begin(gmem, start, end);
 
 	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
+	kvm_gmem_mark_range_unprepared(inode, start, end - start);
 
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_end(gmem, start, end);
@@ -682,7 +693,7 @@ __kvm_gmem_get_pfn(struct file *file, struct kvm_memory_slot *slot,
 	if (max_order)
 		*max_order = 0;
 
-	*is_prepared = folio_test_uptodate(folio);
+	*is_prepared = kvm_gmem_is_prepared(file, index, folio);
 	return folio;
 }
 
@@ -704,7 +715,7 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	}
 
 	if (!is_prepared)
-		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
+		r = kvm_gmem_prepare_folio(kvm, file, slot, gfn, folio);
 
 	folio_unlock(folio);
 	if (r < 0)
@@ -781,8 +792,10 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 
 		p = src ? src + i * PAGE_SIZE : NULL;
 		ret = post_populate(kvm, gfn, pfn, p, max_order, opaque);
-		if (!ret)
-			kvm_gmem_mark_prepared(folio);
+		if (!ret) {
+			pgoff_t index = gfn - slot->base_gfn + slot->gmem.pgoff;
+			kvm_gmem_mark_prepared(file, index, folio);
+		}
 
 put_folio_and_exit:
 		folio_put(folio);
-- 
2.43.5
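
The index passed to the new helpers is a page offset into the gmem file.
The gfn-to-index arithmetic used above in kvm_gmem_populate() can be read
as a standalone helper; the sketch below is illustrative only and the
function name is not part of the patch:

	/*
	 * Illustration only (not in this patch): the page index within the
	 * gmem file that backs a given gfn of a memslot bound to it.
	 */
	static pgoff_t kvm_gmem_gfn_to_index(struct kvm_memory_slot *slot, gfn_t gfn)
	{
		return gfn - slot->base_gfn + slot->gmem.pgoff;
	}
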
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: michael.roth@amd.com, seanjc@google.com
Subject: [PATCH 3/3] KVM: gmem: track preparedness a page at a time
Date: Fri, 8 Nov 2024 10:50:56 -0500
Message-ID: <20241108155056.332412-4-pbonzini@redhat.com>
In-Reply-To: <20241108155056.332412-1-pbonzini@redhat.com>
References: <20241108155056.332412-1-pbonzini@redhat.com>

With support for large pages in gmem, it may happen that part of the
gmem file is mapped with large pages and part with 4k pages.  For
example, if a conversion happens on a small region within a large page,
the large page has to be smashed into small pages even if it is backed
by a large folio.  Each of the small pages will then have its own
preparedness state, which makes it harder to use the uptodate flag for
preparedness.

Just switch to a bitmap in the inode's i_private data.  This is a bit
gnarly because ordinary bitmap operations in Linux are not atomic, but
otherwise not too hard.

Signed-off-by: Paolo Bonzini
---
 virt/kvm/guest_memfd.c | 103 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 100 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 416e02a00cae..e08503dfdd8a 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -68,8 +68,13 @@ static struct file *kvm_gmem_create_file(const char *name, const struct file_ope
 }
 
 
+#define KVM_GMEM_INODE_SIZE(size)				\
+	struct_size_t(struct kvm_gmem_inode, prepared,		\
+		      DIV_ROUND_UP(size, PAGE_SIZE * BITS_PER_LONG))
+
 struct kvm_gmem_inode {
 	unsigned long flags;
+	unsigned long prepared[];
 };
 
 struct kvm_gmem {
@@ -107,18 +112,110 @@ static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slo
 	return 0;
 }
 
+/*
+ * The bitmap of prepared pages has to be accessed atomically, because
+ * preparation is not protected by any lock.  This unfortunately means
+ * that we cannot use regular bitmap operations.
+ *
+ * The logic becomes a bit simpler for set and test, which operate a
+ * folio at a time and therefore can assume that the range is naturally
+ * aligned (meaning that either it is smaller than a word, or it does
+ * not include fractions of a word).  For punch-hole operations, however,
+ * there is all the complexity.
+ */
+
+static void bitmap_set_atomic_word(unsigned long *p, unsigned long start, unsigned long len)
+{
+	unsigned long mask_to_set =
+		BITMAP_FIRST_WORD_MASK(start) & BITMAP_LAST_WORD_MASK(start + len);
+
+	atomic_long_or(mask_to_set, (atomic_long_t *)p);
+}
+
+static void bitmap_clear_atomic_word(unsigned long *p, unsigned long start, unsigned long len)
+{
+	unsigned long mask_to_set =
+		BITMAP_FIRST_WORD_MASK(start) & BITMAP_LAST_WORD_MASK(start + len);
+
+	atomic_long_andnot(mask_to_set, (atomic_long_t *)p);
+}
+
+static bool bitmap_test_allset_word(unsigned long *p, unsigned long start, unsigned long len)
+{
+	unsigned long mask_to_set =
+		BITMAP_FIRST_WORD_MASK(start) & BITMAP_LAST_WORD_MASK(start + len);
+
+	return (*p & mask_to_set) == mask_to_set;
+}
+
 static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, struct folio *folio)
 {
-	folio_mark_uptodate(folio);
+	struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)file->f_inode->i_private;
+	unsigned long *p = i_gmem->prepared + BIT_WORD(index);
+	unsigned long npages = folio_nr_pages(folio);
+
+	/* Folios must be naturally aligned */
+	WARN_ON_ONCE(index & (npages - 1));
+	index &= ~(npages - 1);
+
+	/* Clear page before updating bitmap. */
+	smp_wmb();
+
+	if (npages < BITS_PER_LONG) {
+		bitmap_set_atomic_word(p, index, npages);
+	} else {
+		BUILD_BUG_ON(BITS_PER_LONG != 64);
+		memset64((u64 *)p, ~0, BITS_TO_LONGS(npages));
+	}
 }
 
 static void kvm_gmem_mark_range_unprepared(struct inode *inode, pgoff_t index, pgoff_t npages)
 {
+	struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)inode->i_private;
+	unsigned long *p = i_gmem->prepared + BIT_WORD(index);
+
+	index &= BITS_PER_LONG - 1;
+	if (index) {
+		int first_word_count = min(npages, BITS_PER_LONG - index);
+		bitmap_clear_atomic_word(p, index, first_word_count);
+		npages -= first_word_count;
+		p++;
+	}
+
+	if (npages > BITS_PER_LONG) {
+		BUILD_BUG_ON(BITS_PER_LONG != 64);
+		memset64((u64 *)p, 0, BITS_TO_LONGS(npages));
+		p += BIT_WORD(npages);
+		npages &= BITS_PER_LONG - 1;
+	}
+
+	if (npages)
+		bitmap_clear_atomic_word(p++, 0, npages);
 }
 
 static bool kvm_gmem_is_prepared(struct file *file, pgoff_t index, struct folio *folio)
 {
-	return folio_test_uptodate(folio);
+	struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)file->f_inode->i_private;
+	unsigned long *p = i_gmem->prepared + BIT_WORD(index);
+	unsigned long npages = folio_nr_pages(folio);
+	bool ret;
+
+	/* Folios must be naturally aligned */
+	WARN_ON_ONCE(index & (npages - 1));
+	index &= ~(npages - 1);
+
+	if (npages < BITS_PER_LONG) {
+		ret = bitmap_test_allset_word(p, index, npages);
+	} else {
+		for (; npages > 0; npages -= BITS_PER_LONG)
+			if (*p++ != ~0)
+				break;
+		ret = (npages == 0);
+	}
+
+	/* Synchronize with kvm_gmem_mark_prepared(). */
+	smp_rmb();
+	return ret;
 }
 
 /*
@@ -499,7 +596,7 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	struct file *file;
 	int fd, err;
 
-	i_gmem = kvzalloc(sizeof(struct kvm_gmem_inode), GFP_KERNEL);
+	i_gmem = kvzalloc(KVM_GMEM_INODE_SIZE(size), GFP_KERNEL);
 	if (!i_gmem)
 		return -ENOMEM;
 	i_gmem->flags = flags;
-- 
2.43.5
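
To make the sizing and word-granularity rules concrete, a worked example;
the numbers below are illustrative only and assume 4 KiB pages and 64-bit
longs, the configuration enforced by the BUILD_BUG_ON above:

	/*
	 * Worked example, not part of the patch:
	 *
	 * - a 1 GiB guest_memfd covers 1 GiB / 4 KiB = 262144 pages, so
	 *   KVM_GMEM_INODE_SIZE() reserves DIV_ROUND_UP(262144, 64) = 4096
	 *   longs (32 KiB) of prepared[] bitmap after the flags field;
	 *
	 * - a 2 MiB folio spans 512 pages, i.e. 8 naturally aligned words,
	 *   so kvm_gmem_mark_prepared() takes the memset64() branch;
	 *
	 * - a 16 KiB folio spans 4 pages inside a single word, so the
	 *   atomic single-word helpers are used instead.
	 */
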
From: Paolo Bonzini
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: michael.roth@amd.com, seanjc@google.com
Subject: [PATCH 2.5/3] KVM: gmem: limit hole-punching to ranges within the file
Date: Fri, 8 Nov 2024 11:32:28 -0500
Message-ID: <20241108163228.374110-1-pbonzini@redhat.com>
In-Reply-To: <20241108155056.332412-1-pbonzini@redhat.com>

Do not pass out-of-bounds values to kvm_gmem_mark_range_unprepared().

Signed-off-by: Paolo Bonzini
---
 virt/kvm/guest_memfd.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Sent separately because I thought this was a bug also in the current
code but, on closer look, it is fine because ksys_fallocate checks
that there is no overflow.

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 412d49c6d491..7dc89ceef782 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -324,10 +324,17 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct list_head *gmem_list = &inode->i_mapping->i_private_list;
-	pgoff_t start = offset >> PAGE_SHIFT;
-	pgoff_t end = (offset + len) >> PAGE_SHIFT;
+	loff_t size = i_size_read(inode);
+	pgoff_t start, end;
 	struct kvm_gmem *gmem;
 
+	if (offset > size)
+		return 0;
+
+	len = min(size - offset, len);
+	start = offset >> PAGE_SHIFT;
+	end = (offset + len) >> PAGE_SHIFT;
+
 	/*
 	 * Bindings must be stable across invalidation to ensure the start+end
 	 * are balanced.
-- 
2.43.5
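
As a concrete illustration of the clamping (numbers chosen only as an
example): with a 1 MiB gmem file and 4 KiB pages, a punch-hole request at
offset 768 KiB with len = 1 MiB is clamped to len = 256 KiB, giving
start = 192 and end = 256; a request whose offset lies beyond end-of-file
now returns 0 before kvm_gmem_mark_range_unprepared() is ever called.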