From: Ackerley Tng <ackerleytng@google.com>
Date: Wed, 14 May 2025 16:41:40 -0700
Subject: [RFC PATCH v2 01/51] KVM: guest_memfd: Make guest mem use guest mem inodes instead of anonymous inodes
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

guest_memfd's inode represents memory the guest_memfd is providing.
guest_memfd's file represents a struct kvm's view of that memory.

Using a custom inode allows customization of the inode teardown process
via callbacks. For example, ->evict_inode() allows customization of the
truncation process on file close, and ->destroy_inode() and
->free_inode() allow customization of the inode freeing process.

Customizing the truncation process allows flexibility in management of
guest_memfd memory, and customization of the inode freeing process
allows proper cleanup of memory metadata stored on the inode.

Memory metadata is more appropriately stored on the inode (as opposed
to the file), since the metadata is for the memory and is not unique to
a specific binding and struct kvm.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I5c23bce8fefe492b40b8042ece1e81448752da99
---
 include/uapi/linux/magic.h |   1 +
 virt/kvm/guest_memfd.c     | 134 +++++++++++++++++++++++++++++++------
 virt/kvm/kvm_main.c        |   7 +-
 virt/kvm/kvm_mm.h          |   9 ++-
 4 files changed, 125 insertions(+), 26 deletions(-)

diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index bb575f3ab45e..638ca21b7a90 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -103,5 +103,6 @@
 #define DEVMEM_MAGIC		0x454d444d	/* "DMEM" */
 #define SECRETMEM_MAGIC		0x5345434d	/* "SECM" */
 #define PID_FS_MAGIC		0x50494446	/* "PIDF" */
+#define GUEST_MEMFD_MAGIC	0x474d454d	/* "GMEM" */
 
 #endif /* __LINUX_MAGIC_H__ */
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index b8e247063b20..239d0f13dcc1 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -1,12 +1,16 @@
 // SPDX-License-Identifier: GPL-2.0
+#include <linux/fs.h>
 #include <linux/backing-dev.h>
 #include <linux/falloc.h>
+#include <linux/mount.h>
 #include <linux/kvm_host.h>
+#include <linux/pseudo_fs.h>
 #include <linux/pagemap.h>
-#include <linux/anon_inodes.h>
 
 #include "kvm_mm.h"
 
+static struct vfsmount *kvm_gmem_mnt;
+
 struct kvm_gmem {
 	struct kvm *kvm;
 	struct xarray bindings;
@@ -416,9 +420,51 @@ static struct file_operations kvm_gmem_fops = {
 	.fallocate	= kvm_gmem_fallocate,
 };
 
-void kvm_gmem_init(struct module *module)
+static const struct super_operations kvm_gmem_super_operations = {
+	.statfs		= simple_statfs,
+};
+
+static int kvm_gmem_init_fs_context(struct fs_context *fc)
+{
+	struct pseudo_fs_context *ctx;
+
+	if (!init_pseudo(fc, GUEST_MEMFD_MAGIC))
+		return -ENOMEM;
+
+	ctx = fc->fs_private;
+	ctx->ops = &kvm_gmem_super_operations;
+
+	return 0;
+}
+
+static struct file_system_type kvm_gmem_fs = {
+	.name		 = "kvm_guest_memory",
+	.init_fs_context = kvm_gmem_init_fs_context,
+	.kill_sb	 = kill_anon_super,
+};
+
+static int kvm_gmem_init_mount(void)
+{
+	kvm_gmem_mnt = kern_mount(&kvm_gmem_fs);
+
+	if (WARN_ON_ONCE(IS_ERR(kvm_gmem_mnt)))
+		return PTR_ERR(kvm_gmem_mnt);
+
+	kvm_gmem_mnt->mnt_flags |= MNT_NOEXEC;
+	return 0;
+}
+
+int kvm_gmem_init(struct module *module)
 {
 	kvm_gmem_fops.owner = module;
+
+	return kvm_gmem_init_mount();
+}
+
+void kvm_gmem_exit(void)
+{
+	kern_unmount(kvm_gmem_mnt);
+	kvm_gmem_mnt = NULL;
 }
 
 static int kvm_gmem_migrate_folio(struct address_space *mapping,
@@ -500,11 +546,71 @@ static const struct inode_operations kvm_gmem_iops = {
 	.setattr	= kvm_gmem_setattr,
 };
 
+static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
+						      loff_t size, u64 flags)
+{
+	struct inode *inode;
+
+	inode = alloc_anon_secure_inode(kvm_gmem_mnt->mnt_sb, name);
+	if (IS_ERR(inode))
+		return inode;
+
+	inode->i_private = (void *)(unsigned long)flags;
+	inode->i_op = &kvm_gmem_iops;
+	inode->i_mapping->a_ops = &kvm_gmem_aops;
+	inode->i_mode |= S_IFREG;
+	inode->i_size = size;
+	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+	mapping_set_inaccessible(inode->i_mapping);
+	/* Unmovable mappings are supposed to be marked unevictable as well. */
+	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
+
+	return inode;
+}
+
+static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size,
+						  u64 flags)
+{
+	static const char *name = "[kvm-gmem]";
+	struct inode *inode;
+	struct file *file;
+	int err;
+
+	err = -ENOENT;
+	if (!try_module_get(kvm_gmem_fops.owner))
+		goto err;
+
+	inode = kvm_gmem_inode_make_secure_inode(name, size, flags);
+	if (IS_ERR(inode)) {
+		err = PTR_ERR(inode);
+		goto err_put_module;
+	}
+
+	file = alloc_file_pseudo(inode, kvm_gmem_mnt, name, O_RDWR,
+				 &kvm_gmem_fops);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto err_put_inode;
+	}
+
+	file->f_flags |= O_LARGEFILE;
+	file->private_data = priv;
+
+out:
+	return file;
+
+err_put_inode:
+	iput(inode);
+err_put_module:
+	module_put(kvm_gmem_fops.owner);
+err:
+	file = ERR_PTR(err);
+	goto out;
+}
+
 static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 {
-	const char *anon_name = "[kvm-gmem]";
 	struct kvm_gmem *gmem;
-	struct inode *inode;
 	struct file *file;
 	int fd, err;
 
@@ -518,32 +624,16 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 		goto err_fd;
 	}
 
-	file = anon_inode_create_getfile(anon_name, &kvm_gmem_fops, gmem,
-					 O_RDWR, NULL);
+	file = kvm_gmem_inode_create_getfile(gmem, size, flags);
 	if (IS_ERR(file)) {
 		err = PTR_ERR(file);
 		goto err_gmem;
 	}
 
-	file->f_flags |= O_LARGEFILE;
-
-	inode = file->f_inode;
-	WARN_ON(file->f_mapping != inode->i_mapping);
-
-	inode->i_private = (void *)(unsigned long)flags;
-	inode->i_op = &kvm_gmem_iops;
-	inode->i_mapping->a_ops = &kvm_gmem_aops;
-	inode->i_mode |= S_IFREG;
-	inode->i_size = size;
-	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
-	mapping_set_inaccessible(inode->i_mapping);
-	/* Unmovable mappings are supposed to be marked unevictable as well. */
-	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
-
 	kvm_get_kvm(kvm);
 	gmem->kvm = kvm;
 	xa_init(&gmem->bindings);
-	list_add(&gmem->entry, &inode->i_mapping->i_private_list);
+	list_add(&gmem->entry, &file_inode(file)->i_mapping->i_private_list);
 
 	fd_install(fd, file);
 	return fd;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6c75f933bfbe..66dfdafbb3b6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -6419,7 +6419,9 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	if (WARN_ON_ONCE(r))
 		goto err_vfio;
 
-	kvm_gmem_init(module);
+	r = kvm_gmem_init(module);
+	if (r)
+		goto err_gmem;
 
 	r = kvm_init_virtualization();
 	if (r)
@@ -6440,6 +6442,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 err_register:
 	kvm_uninit_virtualization();
 err_virt:
+	kvm_gmem_exit();
+err_gmem:
 	kvm_vfio_ops_exit();
 err_vfio:
 	kvm_async_pf_deinit();
@@ -6471,6 +6475,7 @@ void kvm_exit(void)
 	for_each_possible_cpu(cpu)
 		free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
 	kmem_cache_destroy(kvm_vcpu_cache);
+	kvm_gmem_exit();
 	kvm_vfio_ops_exit();
 	kvm_async_pf_deinit();
 	kvm_irqfd_exit();
diff --git a/virt/kvm/kvm_mm.h b/virt/kvm/kvm_mm.h
index ec311c0d6718..be68c29fc4ab 100644
--- a/virt/kvm/kvm_mm.h
+++ b/virt/kvm/kvm_mm.h
@@ -68,17 +68,20 @@ static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
 #endif /* HAVE_KVM_PFNCACHE */
 
 #ifdef CONFIG_KVM_GMEM
-void kvm_gmem_init(struct module *module);
+int kvm_gmem_init(struct module *module);
+void kvm_gmem_exit(void);
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 		  unsigned int fd, loff_t offset);
 void kvm_gmem_unbind(struct kvm_memory_slot *slot);
 #else
-static inline void kvm_gmem_init(struct module *module)
+static inline int kvm_gmem_init(struct module *module)
 {
-
+	return 0;
 }
 
+static inline void kvm_gmem_exit(void) {};
+
 static inline int kvm_gmem_bind(struct kvm *kvm,
 				struct kvm_memory_slot *slot,
 				unsigned int fd, loff_t offset)
-- 
2.49.0.1045.g170613ef41-goog
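
A minimal userspace sketch of the behavior this patch makes observable
(the helper name and error handling are illustrative, not part of the
patch): because guest_memfd files now live on a dedicated
pseudo-filesystem whose ->statfs is simple_statfs(), fstatfs() on a
guest_memfd fd reports the new superblock magic as f_type instead of
the anonymous-inode filesystem's magic.

#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/vfs.h>
#include <linux/kvm.h>

#define GUEST_MEMFD_MAGIC 0x474d454d	/* "GMEM" */

/* Illustrative helper: vm_fd is a KVM VM fd from KVM_CREATE_VM. */
static int create_and_check_gmem(int vm_fd, uint64_t size)
{
	struct kvm_create_guest_memfd args = { .size = size, .flags = 0 };
	struct statfs st;
	int gmem_fd;

	gmem_fd = ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
	if (gmem_fd < 0)
		return gmem_fd;

	/* With this patch, f_type is GUEST_MEMFD_MAGIC. */
	if (fstatfs(gmem_fd, &st) == 0 && st.f_type == GUEST_MEMFD_MAGIC)
		printf("guest_memfd lives on the kvm_guest_memory fs\n");

	return gmem_fd;
}
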

From: Ackerley Tng <ackerleytng@google.com>
Date: Wed, 14 May 2025 16:41:41 -0700
Subject: [RFC PATCH v2 02/51] KVM: guest_memfd: Introduce and use shareability to guard faulting
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Track guest_memfd memory's shareability status within the inode as
opposed to the file, since it is a property of the guest_memfd's memory
contents.

Shareability is a property of the memory and is indexed using the
page's index in the inode. Because shareability is the memory's
property, it is stored within guest_memfd instead of within KVM, like
in kvm->mem_attr_array.

KVM_MEMORY_ATTRIBUTE_PRIVATE in kvm->mem_attr_array must still be
retained to allow VMs to only use guest_memfd for private memory and
some other memory for shared memory.

Not all use cases require guest_memfd() to be shared with the host when
first created. Add a new flag, GUEST_MEMFD_FLAG_INIT_PRIVATE, which
when set on KVM_CREATE_GUEST_MEMFD, initializes the memory as private
to the guest, and therefore not mappable by the host. Otherwise, memory
is shared until explicitly converted to private.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Change-Id: If03609cbab3ad1564685c85bdba6dcbb6b240c0f
---
 Documentation/virt/kvm/api.rst |   5 ++
 include/uapi/linux/kvm.h       |   2 +
 virt/kvm/guest_memfd.c         | 124 ++++++++++++++++++++++++++++++++-
 3 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 86f74ce7f12a..f609337ae1c2 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6408,6 +6408,11 @@ belonging to the slot via its userspace_addr.
 The use of GUEST_MEMFD_FLAG_SUPPORT_SHARED will not be allowed for CoCo VMs.
 This is validated when the guest_memfd instance is bound to the VM.
 
+If the capability KVM_CAP_GMEM_CONVERSIONS is supported, then the 'flags' field
+supports GUEST_MEMFD_FLAG_INIT_PRIVATE. Setting GUEST_MEMFD_FLAG_INIT_PRIVATE
+will initialize the memory for the guest_memfd as guest-only and not faultable
+by the host.
+
 See KVM_SET_USER_MEMORY_REGION2 for additional details.
 
 4.143 KVM_PRE_FAULT_MEMORY
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4cc824a3a7c9..d7df312479aa 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1567,7 +1567,9 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE	(1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+
 #define GUEST_MEMFD_FLAG_SUPPORT_SHARED	(1UL << 0)
+#define GUEST_MEMFD_FLAG_INIT_PRIVATE	(1UL << 1)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 239d0f13dcc1..590932499eba 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,7 @@
 #include <linux/falloc.h>
 #include <linux/mount.h>
 #include <linux/kvm_host.h>
+#include <linux/maple_tree.h>
 #include <linux/pseudo_fs.h>
 #include <linux/pagemap.h>
 
@@ -17,6 +18,24 @@ struct kvm_gmem {
 	struct list_head entry;
 };
 
+struct kvm_gmem_inode_private {
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+	struct maple_tree shareability;
+#endif
+};
+
+enum shareability {
+	SHAREABILITY_GUEST = 1, /* Only the guest can map (fault) folios in this range. */
+	SHAREABILITY_ALL = 2,	/* Both guest and host can fault folios in this range. */
+};
+
+static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index);
+
+static struct kvm_gmem_inode_private *kvm_gmem_private(struct inode *inode)
+{
+	return inode->i_mapping->i_private_data;
+}
+
 /**
  * folio_file_pfn - like folio_file_page, but return a pfn.
  * @folio: The folio which contains this index.
@@ -29,6 +48,58 @@ static inline kvm_pfn_t folio_file_pfn(struct folio *folio, pgoff_t index)
 	return folio_pfn(folio) + (index & (folio_nr_pages(folio) - 1));
 }
 
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+
+static int kvm_gmem_shareability_setup(struct kvm_gmem_inode_private *private,
+				       loff_t size, u64 flags)
+{
+	enum shareability m;
+	pgoff_t last;
+
+	last = (size >> PAGE_SHIFT) - 1;
+	m = flags & GUEST_MEMFD_FLAG_INIT_PRIVATE ? SHAREABILITY_GUEST :
+						    SHAREABILITY_ALL;
+	return mtree_store_range(&private->shareability, 0, last, xa_mk_value(m),
+				 GFP_KERNEL);
+}
+
+static enum shareability kvm_gmem_shareability_get(struct inode *inode,
+						   pgoff_t index)
+{
+	struct maple_tree *mt;
+	void *entry;
+
+	mt = &kvm_gmem_private(inode)->shareability;
+	entry = mtree_load(mt, index);
+	WARN(!entry,
+	     "Shareability should always be defined for all indices in inode.");
+
+	return xa_to_value(entry);
+}
+
+static struct folio *kvm_gmem_get_shared_folio(struct inode *inode, pgoff_t index)
+{
+	if (kvm_gmem_shareability_get(inode, index) != SHAREABILITY_ALL)
+		return ERR_PTR(-EACCES);
+
+	return kvm_gmem_get_folio(inode, index);
+}
+
+#else
+
+static int kvm_gmem_shareability_setup(struct kvm_gmem_inode_private *private, loff_t size, u64 flags)
+{
+	return 0;
+}
+
+static inline struct folio *kvm_gmem_get_shared_folio(struct inode *inode, pgoff_t index)
+{
+	WARN_ONCE(1, "Unexpected call to get shared folio.");
+	return NULL;
+}
+
+#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
+
 static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
 				    pgoff_t index, struct folio *folio)
 {
@@ -333,7 +404,7 @@ static vm_fault_t kvm_gmem_fault_shared(struct vm_fault *vmf)
 
 	filemap_invalidate_lock_shared(inode->i_mapping);
 
-	folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+	folio = kvm_gmem_get_shared_folio(inode, vmf->pgoff);
 	if (IS_ERR(folio)) {
 		int err = PTR_ERR(folio);
 
@@ -420,8 +491,33 @@ static struct file_operations kvm_gmem_fops = {
 	.fallocate	= kvm_gmem_fallocate,
 };
 
+static void kvm_gmem_free_inode(struct inode *inode)
+{
+	struct kvm_gmem_inode_private *private = kvm_gmem_private(inode);
+
+	kfree(private);
+
+	free_inode_nonrcu(inode);
+}
+
+static void kvm_gmem_destroy_inode(struct inode *inode)
+{
+	struct kvm_gmem_inode_private *private = kvm_gmem_private(inode);
+
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+	/*
+	 * mtree_destroy() can't be used within rcu callback, hence can't be
+	 * done in ->free_inode().
+	 */
+	if (private)
+		mtree_destroy(&private->shareability);
+#endif
+}
+
 static const struct super_operations kvm_gmem_super_operations = {
 	.statfs		= simple_statfs,
+	.destroy_inode	= kvm_gmem_destroy_inode,
+	.free_inode	= kvm_gmem_free_inode,
 };
 
 static int kvm_gmem_init_fs_context(struct fs_context *fc)
@@ -549,12 +645,26 @@ static const struct inode_operations kvm_gmem_iops = {
 static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 						      loff_t size, u64 flags)
 {
+	struct kvm_gmem_inode_private *private;
 	struct inode *inode;
+	int err;
 
 	inode = alloc_anon_secure_inode(kvm_gmem_mnt->mnt_sb, name);
 	if (IS_ERR(inode))
		return inode;
 
+	err = -ENOMEM;
+	private = kzalloc(sizeof(*private), GFP_KERNEL);
+	if (!private)
+		goto out;
+
+	mt_init(&private->shareability);
+	inode->i_mapping->i_private_data = private;
+
+	err = kvm_gmem_shareability_setup(private, size, flags);
+	if (err)
+		goto out;
+
 	inode->i_private = (void *)(unsigned long)flags;
 	inode->i_op = &kvm_gmem_iops;
 	inode->i_mapping->a_ops = &kvm_gmem_aops;
@@ -566,6 +676,11 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
 
 	return inode;
+
+out:
+	iput(inode);
+
+	return ERR_PTR(err);
 }
 
 static struct file *kvm_gmem_inode_create_getfile(void *priv, loff_t size,
@@ -654,6 +769,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	if (kvm_arch_vm_supports_gmem_shared_mem(kvm))
 		valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
 
+	if (flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED)
+		valid_flags |= GUEST_MEMFD_FLAG_INIT_PRIVATE;
+
 	if (flags & ~valid_flags)
 		return -EINVAL;
 
@@ -842,6 +960,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 	if (!file)
 		return -EFAULT;
 
+	filemap_invalidate_lock_shared(file_inode(file)->i_mapping);
+
 	folio = __kvm_gmem_get_pfn(file, slot, index, pfn, &is_prepared, max_order);
 	if (IS_ERR(folio)) {
 		r = PTR_ERR(folio);
@@ -857,8 +977,8 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		*page = folio_file_page(folio, index);
 	else
 		folio_put(folio);
-
 out:
+	filemap_invalidate_unlock_shared(file_inode(file)->i_mapping);
 	fput(file);
 	return r;
 }
-- 
2.49.0.1045.g170613ef41-goog
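
A hedged userspace sketch of the new flag (the helper is illustrative;
the behavior notes follow this patch's fault path, in which
kvm_gmem_get_shared_folio() returns -EACCES for ranges that are still
SHAREABILITY_GUEST):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Illustrative helper, assuming a VM type for which
 * kvm_arch_vm_supports_gmem_shared_mem() is true.
 */
static int create_initially_private_gmem(int vm_fd, uint64_t size)
{
	struct kvm_create_guest_memfd args = {
		.size = size,
		/* INIT_PRIVATE is only valid together with SUPPORT_SHARED. */
		.flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED |
			 GUEST_MEMFD_FLAG_INIT_PRIVATE,
	};

	/*
	 * Every offset starts as SHAREABILITY_GUEST: host faults on an
	 * mmap()ed range fail until the range is explicitly converted
	 * to shared.
	 */
	return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
}
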

From: Ackerley Tng <ackerleytng@google.com>
Date: Wed, 14 May 2025 16:41:42 -0700
Subject: [RFC PATCH v2 03/51] KVM: selftests: Update guest_memfd_test for INIT_PRIVATE flag
Message-ID: <65afac3b13851c442c72652904db6d5755299615.1747264138.git.ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Test that GUEST_MEMFD_FLAG_INIT_PRIVATE is only valid when
GUEST_MEMFD_FLAG_SUPPORT_SHARED is set.

Change-Id: I506e236a232047cfaee17bcaed02ee14c8d25bbb
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 36 ++++++++++++-------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 60aaba5808a5..bf2876cbd711 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -401,13 +401,31 @@ static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
 	kvm_vm_release(vm);
 }
 
+static void test_vm_with_gmem_flag(struct kvm_vm *vm, uint64_t flag,
+				   bool expect_valid)
+{
+	size_t page_size = getpagesize();
+	int fd;
+
+	fd = __vm_create_guest_memfd(vm, page_size, flag);
+
+	if (expect_valid) {
+		TEST_ASSERT(fd > 0,
+			    "guest_memfd() with flag '0x%lx' should be valid",
+			    flag);
+		close(fd);
+	} else {
+		TEST_ASSERT(fd == -1 && errno == EINVAL,
+			    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
+			    flag);
+	}
+}
+
 static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
 					    uint64_t expected_valid_flags)
 {
-	size_t page_size = getpagesize();
 	struct kvm_vm *vm;
 	uint64_t flag = 0;
-	int fd;
 
 	if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
 		return;
@@ -415,17 +433,11 @@ static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
 	vm = vm_create_barebones_type(vm_type);
 
 	for (flag = BIT(0); flag; flag <<= 1) {
-		fd = __vm_create_guest_memfd(vm, page_size, flag);
+		test_vm_with_gmem_flag(vm, flag, flag & expected_valid_flags);
 
-		if (flag & expected_valid_flags) {
-			TEST_ASSERT(fd > 0,
-				    "guest_memfd() with flag '0x%lx' should be valid",
-				    flag);
-			close(fd);
-		} else {
-			TEST_ASSERT(fd == -1 && errno == EINVAL,
-				    "guest_memfd() with flag '0x%lx' should fail with EINVAL",
-				    flag);
+		if (flag == GUEST_MEMFD_FLAG_SUPPORT_SHARED) {
+			test_vm_with_gmem_flag(
+				vm, flag | GUEST_MEMFD_FLAG_INIT_PRIVATE, true);
 		}
 	}
 
-- 
2.49.0.1045.g170613ef41-goog

From: Ackerley Tng <ackerleytng@google.com>
Date: Wed, 14 May 2025 16:41:43 -0700
Subject: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

The two new guest_memfd ioctls KVM_GMEM_CONVERT_SHARED and
KVM_GMEM_CONVERT_PRIVATE convert the requested memory ranges to shared
and private respectively.

A guest_memfd ioctl is used because shareability is a property of the
memory, and this property should be modifiable independently of the
attached struct kvm. This allows shareability to be modified even if
the memory is not yet bound using memslots.

For shared to private conversions, if refcounts on any of the folios
within the range are elevated, fail the conversion with -EAGAIN.

At the point of shared to private conversion, all folios in the range
are also unmapped. The filemap_invalidate_lock() is held, so no
faulting can occur. Hence, from that point on, only transient refcounts
can be taken on the folios associated with that guest_memfd, and it is
therefore safe to do the conversion from shared to private.

After conversion is complete, refcounts may become elevated, but that
is fine since users of transient refcounts don't actually access
memory.

For private to shared conversions, there are no refcount checks. Any
transient refcount holders are expected to drop their refcounts soon;
the conversion process will spin waiting for these transient refcounts
to go away.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I3546aaf6c1b795de6dc9ba09e816b64934221918
---
 include/uapi/linux/kvm.h |  11 ++
 virt/kvm/guest_memfd.c   | 357 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 366 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d7df312479aa..5b28e17f6f14 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1577,6 +1577,17 @@ struct kvm_create_guest_memfd {
 	__u64 reserved[6];
 };
 
+#define KVM_GMEM_IO 0xAF
+#define KVM_GMEM_CONVERT_SHARED		_IOWR(KVM_GMEM_IO, 0x41, struct kvm_gmem_convert)
+#define KVM_GMEM_CONVERT_PRIVATE	_IOWR(KVM_GMEM_IO, 0x42, struct kvm_gmem_convert)
+
+struct kvm_gmem_convert {
+	__u64 offset;
+	__u64 size;
+	__u64 error_offset;
+	__u64 reserved[5];
+};
+
 #define KVM_PRE_FAULT_MEMORY	_IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
 
 struct kvm_pre_fault_memory {
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 590932499eba..f802116290ce 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -30,6 +30,10 @@ enum shareability {
 };
 
 static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index);
+static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
+				      pgoff_t end);
+static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
+				    pgoff_t end);
 
 static struct kvm_gmem_inode_private *kvm_gmem_private(struct inode *inode)
 {
@@ -85,6 +89,306 @@ static struct folio *kvm_gmem_get_shared_folio(struct inode *inode, pgoff_t index)
 	return kvm_gmem_get_folio(inode, index);
 }
 
+/**
+ * kvm_gmem_shareability_store() - Sets shareability to @value for range.
+ *
+ * @mt: the shareability maple tree.
+ * @index: the range begins at this index in the inode.
+ * @nr_pages: number of PAGE_SIZE pages in this range.
+ * @value: the shareability value to set for this range.
+ *
+ * Unlike mtree_store_range(), this function also merges adjacent ranges that
+ * have the same values as an optimization. Assumes that all stores to @mt go
+ * through this function, such that adjacent ranges are always merged.
+ *
+ * Return: 0 on success and negative error otherwise.
+ */
+static int kvm_gmem_shareability_store(struct maple_tree *mt, pgoff_t index,
+				       size_t nr_pages, enum shareability value)
+{
+	MA_STATE(mas, mt, 0, 0);
+	unsigned long start;
+	unsigned long last;
+	void *entry;
+	int ret;
+
+	start = index;
+	last = start + nr_pages - 1;
+
+	mas_lock(&mas);
+
+	/* Try extending range. entry is NULL on overflow/wrap-around. */
+	mas_set_range(&mas, last + 1, last + 1);
+	entry = mas_find(&mas, last + 1);
+	if (entry && xa_to_value(entry) == value)
+		last = mas.last;
+
+	mas_set_range(&mas, start - 1, start - 1);
+	entry = mas_find(&mas, start - 1);
+	if (entry && xa_to_value(entry) == value)
+		start = mas.index;
+
+	mas_set_range(&mas, start, last);
+	ret = mas_store_gfp(&mas, xa_mk_value(value), GFP_KERNEL);
+
+	mas_unlock(&mas);
+
+	return ret;
+}
+
+struct conversion_work {
+	struct list_head list;
+	pgoff_t start;
+	size_t nr_pages;
+};
+
+static int add_to_work_list(struct list_head *list, pgoff_t start, pgoff_t last)
+{
+	struct conversion_work *work;
+
+	work = kzalloc(sizeof(*work), GFP_KERNEL);
+	if (!work)
+		return -ENOMEM;
+
+	work->start = start;
+	work->nr_pages = last + 1 - start;
+
+	list_add_tail(&work->list, list);
+
+	return 0;
+}
+
+static bool kvm_gmem_has_safe_refcount(struct address_space *mapping, pgoff_t start,
+				       size_t nr_pages, pgoff_t *error_index)
+{
+	const int filemap_get_folios_refcount = 1;
+	struct folio_batch fbatch;
+	bool refcount_safe;
+	pgoff_t last;
+	int i;
+
+	last = start + nr_pages - 1;
+	refcount_safe = true;
+
+	folio_batch_init(&fbatch);
+	while (refcount_safe &&
+	       filemap_get_folios(mapping, &start, last, &fbatch)) {
+
+		for (i = 0; i < folio_batch_count(&fbatch); ++i) {
+			int filemap_refcount;
+			int safe_refcount;
+			struct folio *f;
+
+			f = fbatch.folios[i];
+			filemap_refcount = folio_nr_pages(f);
+
+			safe_refcount = filemap_refcount + filemap_get_folios_refcount;
+			if (folio_ref_count(f) != safe_refcount) {
+				refcount_safe = false;
+				*error_index = f->index;
+				break;
+			}
+		}
+
+		folio_batch_release(&fbatch);
+	}
+
+	return refcount_safe;
+}
+
+static int kvm_gmem_shareability_apply(struct inode *inode,
+				       struct conversion_work *work,
+				       enum shareability m)
+{
+	struct maple_tree *mt;
+
+	mt = &kvm_gmem_private(inode)->shareability;
+	return kvm_gmem_shareability_store(mt, work->start, work->nr_pages, m);
+}
+
+static int kvm_gmem_convert_compute_work(struct inode *inode, pgoff_t start,
+					 size_t nr_pages, enum shareability m,
+					 struct list_head *work_list)
+{
+	struct maple_tree *mt;
+	struct ma_state mas;
+	pgoff_t last;
+	void *entry;
+	int ret;
+
+	last = start + nr_pages - 1;
+
+	mt = &kvm_gmem_private(inode)->shareability;
+	ret = 0;
+
+	mas_init(&mas, mt, start);
+
+	rcu_read_lock();
+	mas_for_each(&mas, entry, last) {
+		enum shareability current_m;
+		pgoff_t m_range_index;
+		pgoff_t m_range_last;
+
+		m_range_index = max(mas.index, start);
+		m_range_last = min(mas.last, last);
+
+		current_m = xa_to_value(entry);
+		if (m == current_m)
+			continue;
+
+		mas_pause(&mas);
+		rcu_read_unlock();
+		/* Caller will clean this up on error. */
+		ret = add_to_work_list(work_list, m_range_index, m_range_last);
+		rcu_read_lock();
+		if (ret)
+			break;
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static void kvm_gmem_convert_invalidate_begin(struct inode *inode,
+					      struct conversion_work *work)
+{
+	struct list_head *gmem_list;
+	struct kvm_gmem *gmem;
+	pgoff_t end;
+
+	end = work->start + work->nr_pages;
+
+	gmem_list = &inode->i_mapping->i_private_list;
+	list_for_each_entry(gmem, gmem_list, entry)
+		kvm_gmem_invalidate_begin(gmem, work->start, end);
+}
+
+static void kvm_gmem_convert_invalidate_end(struct inode *inode,
+					    struct conversion_work *work)
+{
+	struct list_head *gmem_list;
+	struct kvm_gmem *gmem;
+	pgoff_t end;
+
+	end = work->start + work->nr_pages;
+
+	gmem_list = &inode->i_mapping->i_private_list;
+	list_for_each_entry(gmem, gmem_list, entry)
+		kvm_gmem_invalidate_end(gmem, work->start, end);
+}
+
+static int kvm_gmem_convert_should_proceed(struct inode *inode,
+					   struct conversion_work *work,
+					   bool to_shared, pgoff_t *error_index)
+{
+	if (!to_shared) {
+		unmap_mapping_pages(inode->i_mapping, work->start,
+				    work->nr_pages, false);
+
+		if (!kvm_gmem_has_safe_refcount(inode->i_mapping, work->start,
+						work->nr_pages, error_index)) {
+			return -EAGAIN;
+		}
+	}
+
+	return 0;
+}
+
+static int kvm_gmem_convert_range(struct file *file, pgoff_t start,
+				  size_t nr_pages, bool shared,
+				  pgoff_t *error_index)
+{
+	struct conversion_work *work, *tmp, *rollback_stop_item;
+	LIST_HEAD(work_list);
+	struct inode *inode;
+	enum shareability m;
+	int ret;
+
+	inode = file_inode(file);
+
+	filemap_invalidate_lock(inode->i_mapping);
+
+	m = shared ? SHAREABILITY_ALL : SHAREABILITY_GUEST;
+	ret = kvm_gmem_convert_compute_work(inode, start, nr_pages, m, &work_list);
+	if (ret || list_empty(&work_list))
+		goto out;
+
+	list_for_each_entry(work, &work_list, list)
+		kvm_gmem_convert_invalidate_begin(inode, work);
+
+	list_for_each_entry(work, &work_list, list) {
+		ret = kvm_gmem_convert_should_proceed(inode, work, shared,
+						      error_index);
+		if (ret)
+			goto invalidate_end;
+	}
+
+	list_for_each_entry(work, &work_list, list) {
+		rollback_stop_item = work;
+		ret = kvm_gmem_shareability_apply(inode, work, m);
+		if (ret)
+			break;
+	}
+
+	if (ret) {
+		m = shared ? SHAREABILITY_GUEST : SHAREABILITY_ALL;
+		list_for_each_entry(work, &work_list, list) {
+			if (work == rollback_stop_item)
+				break;
+
+			WARN_ON(kvm_gmem_shareability_apply(inode, work, m));
+		}
+	}
+
+invalidate_end:
+	list_for_each_entry(work, &work_list, list)
+		kvm_gmem_convert_invalidate_end(inode, work);
+out:
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	list_for_each_entry_safe(work, tmp, &work_list, list) {
+		list_del(&work->list);
+		kfree(work);
+	}
+
+	return ret;
+}
+
+static int kvm_gmem_ioctl_convert_range(struct file *file,
+					struct kvm_gmem_convert *param,
+					bool shared)
+{
+	pgoff_t error_index;
+	size_t nr_pages;
+	pgoff_t start;
+	int ret;
+
+	if (param->error_offset)
+		return -EINVAL;
+
+	if (param->size == 0)
+		return 0;
+
+	if (param->offset + param->size < param->offset ||
+	    param->offset > file_inode(file)->i_size ||
+	    param->offset + param->size > file_inode(file)->i_size)
+		return -EINVAL;
+
+	if (!IS_ALIGNED(param->offset, PAGE_SIZE) ||
+	    !IS_ALIGNED(param->size, PAGE_SIZE))
+		return -EINVAL;
+
+	start = param->offset >> PAGE_SHIFT;
+	nr_pages = param->size >> PAGE_SHIFT;
+
+	ret = kvm_gmem_convert_range(file, start, nr_pages, shared, &error_index);
+	if (ret)
+		param->error_offset = error_index << PAGE_SHIFT;
+
+	return ret;
+}
+
 #else
 
 static int kvm_gmem_shareability_setup(struct kvm_gmem_inode_private *private, loff_t size, u64 flags)
@@ -186,15 +490,26 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
 	unsigned long index;
 
 	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
+		enum kvm_gfn_range_filter filter;
 		pgoff_t pgoff = slot->gmem.pgoff;
 
+		filter = KVM_FILTER_PRIVATE;
+		if (kvm_gmem_memslot_supports_shared(slot)) {
+			/*
+			 * Unmapping would also cause invalidation, but cannot
+			 * rely on mmu_notifiers to do invalidation via
+			 * unmapping, since memory may not be mapped to
+			 * userspace.
+			 */
+			filter |= KVM_FILTER_SHARED;
+		}
+
 		struct kvm_gfn_range gfn_range = {
 			.start = slot->base_gfn + max(pgoff, start) - pgoff,
 			.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
 			.slot = slot,
 			.may_block = true,
-			/* guest memfd is relevant to only private mappings. */
-			.attr_filter = KVM_FILTER_PRIVATE,
+			.attr_filter = filter,
 		};
 
 		if (!found_memslot) {
@@ -484,11 +799,49 @@ EXPORT_SYMBOL_GPL(kvm_gmem_memslot_supports_shared);
 #define kvm_gmem_mmap NULL
 #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
 
+static long kvm_gmem_ioctl(struct file *file, unsigned int ioctl,
+			   unsigned long arg)
+{
+	void __user *argp;
+	int r;
+
+	argp = (void __user *)arg;
+
+	switch (ioctl) {
+#ifdef CONFIG_KVM_GMEM_SHARED_MEM
+	case KVM_GMEM_CONVERT_SHARED:
+	case KVM_GMEM_CONVERT_PRIVATE: {
+		struct kvm_gmem_convert param;
+		bool to_shared;
+
+		r = -EFAULT;
+		if (copy_from_user(&param, argp, sizeof(param)))
+			goto out;
+
+		to_shared = ioctl == KVM_GMEM_CONVERT_SHARED;
+		r = kvm_gmem_ioctl_convert_range(file, &param, to_shared);
+		if (r) {
+			if (copy_to_user(argp, &param, sizeof(param))) {
+				r = -EFAULT;
+				goto out;
+			}
+		}
+		break;
+	}
+#endif
+	default:
+		r = -ENOTTY;
+	}
+out:
+	return r;
+}
+
 static struct file_operations kvm_gmem_fops = {
 	.mmap		= kvm_gmem_mmap,
 	.open		= generic_file_open,
 	.release	= kvm_gmem_release,
 	.fallocate	= kvm_gmem_fallocate,
+	.unlocked_ioctl	= kvm_gmem_ioctl,
 };
 
 static void kvm_gmem_free_inode(struct inode *inode)
-- 
2.49.0.1045.g170613ef41-goog
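
A sketch of how userspace might drive the conversion ioctls introduced
above (the retry loop is an illustrative policy, not part of the patch;
per the commit message, -EAGAIN reports an elevated refcount, and on
failure the kernel writes back error_offset to identify the offending
offset):

#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Illustrative helper: gmem_fd is a guest_memfd with SUPPORT_SHARED. */
static int gmem_convert(int gmem_fd, unsigned long req,
			uint64_t offset, uint64_t size)
{
	struct kvm_gmem_convert conv = {
		.offset = offset,	/* must be PAGE_SIZE-aligned */
		.size = size,		/* must be PAGE_SIZE-aligned */
	};
	int ret;

	do {
		conv.error_offset = 0;	/* must be zero on entry */
		ret = ioctl(gmem_fd, req, &conv);
		/* On -EAGAIN, conv.error_offset names the busy page. */
	} while (ret < 0 && errno == EAGAIN);

	return ret;
}

/* Usage: gmem_convert(fd, KVM_GMEM_CONVERT_PRIVATE, 0, len); */
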
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:44 -0700
Message-ID: <37f60bbd7d408cf6d421d0582462488262c720ab.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 05/51] KVM: guest_memfd: Skip LRU for guest_memfd folios
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

filemap_add_folio(), called from filemap_grab_folio(), adds the folio
onto an LRU list, which is unnecessary for guest_memfd, since
guest_memfd folios never participate in any swapping.

This patch reimplements the relevant parts of filemap_add_folio() so
that allocated guest_memfd folios are charged and added to the filemap
without ever being placed on an LRU list.

With shared-to-private conversions dependent on refcounts, staying off
the LRU ensures that LRU lists no longer take refcounts on guest_memfd
folios, and significantly reduces the chance of seeing elevated
refcounts during conversion.
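For reference, mainline filemap_add_folio() does roughly the following
(a simplified sketch of mm/filemap.c for illustration, not part of this
patch; workingset handling is omitted). The folio_add_lru() call at the
end is the step guest_memfd needs to skip, because the per-CPU LRU
batches hold a folio reference:

	static int filemap_add_folio_sketch(struct address_space *mapping,
					    struct folio *folio, pgoff_t index,
					    gfp_t gfp)
	{
		void *shadow = NULL;
		int ret;

		ret = mem_cgroup_charge(folio, NULL, gfp);
		if (ret)
			return ret;

		__folio_set_locked(folio);
		ret = __filemap_add_folio(mapping, folio, index, gfp, &shadow);
		if (ret) {
			mem_cgroup_uncharge(folio);
			__folio_clear_locked(folio);
			return ret;
		}

		/* Takes a reference that the per-CPU LRU batch holds. */
		folio_add_lru(folio);
		return 0;
	}

The reimplementation below therefore keeps the charge and the filemap
insertion but drops the LRU step, marking the folio unevictable instead.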
Signed-off-by: Ackerley Tng
Change-Id: Ia2540d9fc132d46219e6e714fd42bc82a62a27fa
---
 mm/filemap.c           |  1 +
 mm/memcontrol.c        |  2 +
 virt/kvm/guest_memfd.c | 91 ++++++++++++++++++++++++++++++++++++++----
 3 files changed, 86 insertions(+), 8 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 7b90cbeb4a1a..bed7160db214 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -954,6 +954,7 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 	return xas_error(&xas);
 }
 ALLOW_ERROR_INJECTION(__filemap_add_folio, ERRNO);
+EXPORT_SYMBOL_GPL(__filemap_add_folio);
 
 int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 		      pgoff_t index, gfp_t gfp)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c96c1f2b9cf5..1def80570738 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4611,6 +4611,7 @@ int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(__mem_cgroup_charge);
 
 /**
  * mem_cgroup_charge_hugetlb - charge the memcg for a hugetlb folio
@@ -4785,6 +4786,7 @@ void __mem_cgroup_uncharge(struct folio *folio)
 	uncharge_folio(folio, &ug);
 	uncharge_batch(&ug);
 }
+EXPORT_SYMBOL_GPL(__mem_cgroup_uncharge);
 
 void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
 {
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index f802116290ce..6f6c4d298f8f 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -466,6 +466,38 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
 	return r;
 }
 
+static int __kvm_gmem_filemap_add_folio(struct address_space *mapping,
+					struct folio *folio, pgoff_t index)
+{
+	void *shadow = NULL;
+	gfp_t gfp;
+	int ret;
+
+	gfp = mapping_gfp_mask(mapping);
+
+	__folio_set_locked(folio);
+	ret = __filemap_add_folio(mapping, folio, index, gfp, &shadow);
+	__folio_clear_locked(folio);
+
+	return ret;
+}
+
+/*
+ * Adds a folio to the filemap for guest_memfd. Skips adding the folio to any
+ * LRU list.
+ */
+static int kvm_gmem_filemap_add_folio(struct address_space *mapping,
+				      struct folio *folio, pgoff_t index)
+{
+	int ret;
+
+	ret = __kvm_gmem_filemap_add_folio(mapping, folio, index);
+	if (!ret)
+		folio_set_unevictable(folio);
+
+	return ret;
+}
+
 /*
  * Returns a locked folio on success.  The caller is responsible for
  * setting the up-to-date flag before the memory is mapped into the guest.
@@ -477,8 +509,46 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
  */
 static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 {
+	struct folio *folio;
+	gfp_t gfp;
+	int ret;
+
+repeat:
+	folio = filemap_lock_folio(inode->i_mapping, index);
+	if (!IS_ERR(folio))
+		return folio;
+
+	gfp = mapping_gfp_mask(inode->i_mapping);
+
 	/* TODO: Support huge pages. */
-	return filemap_grab_folio(inode->i_mapping, index);
+	folio = filemap_alloc_folio(gfp, 0);
+	if (!folio)
+		return ERR_PTR(-ENOMEM);
+
+	ret = mem_cgroup_charge(folio, NULL, gfp);
+	if (ret) {
+		folio_put(folio);
+		return ERR_PTR(ret);
+	}
+
+	ret = kvm_gmem_filemap_add_folio(inode->i_mapping, folio, index);
+	if (ret) {
+		folio_put(folio);
+
+		/*
+		 * There was a race, two threads tried to get a folio indexing
+		 * to the same location in the filemap. The losing thread should
+		 * free the allocated folio, then lock the folio added to the
+		 * filemap by the winning thread.
+		 */
+		if (ret == -EEXIST)
+			goto repeat;
+
+		return ERR_PTR(ret);
+	}
+
+	__folio_set_locked(folio);
+	return folio;
 }
 
 static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
@@ -956,23 +1026,28 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *fol
 }
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
+static void kvm_gmem_invalidate(struct folio *folio)
+{
+	kvm_pfn_t pfn = folio_pfn(folio);
+
+	kvm_arch_gmem_invalidate(pfn, pfn + folio_nr_pages(folio));
+}
+#else
+static inline void kvm_gmem_invalidate(struct folio *folio) {}
+#endif
+
 static void kvm_gmem_free_folio(struct folio *folio)
 {
-	struct page *page = folio_page(folio, 0);
-	kvm_pfn_t pfn = page_to_pfn(page);
-	int order = folio_order(folio);
+	folio_clear_unevictable(folio);
 
-	kvm_arch_gmem_invalidate(pfn, pfn + (1ul << order));
+	kvm_gmem_invalidate(folio);
 }
-#endif
 
 static const struct address_space_operations kvm_gmem_aops = {
 	.dirty_folio	= noop_dirty_folio,
 	.migrate_folio	= kvm_gmem_migrate_folio,
 	.error_remove_folio = kvm_gmem_error_folio,
-#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 	.free_folio	= kvm_gmem_free_folio,
-#endif
 };
 
 static int kvm_gmem_getattr(struct mnt_idmap *idmap, const struct path *path,
-- 
2.49.0.1045.g170613ef41-goog
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:45 -0700
Message-ID: <237590b163506821120734a0c8aad95d9c7ef299.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 06/51] KVM: Query guest_memfd for private/shared status
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Query guest_memfd for private/shared status if the guest_memfd tracks
private/shared status.

If a Coco VM chooses to use guest_memfd for both shared and private
memory, by creating the guest_memfd with the
GUEST_MEMFD_FLAG_SUPPORT_SHARED flag, guest_memfd, instead of
kvm->mem_attr_array, provides the private/shared status of the memory.
With this patch, Coco VMs can therefore use guest_memfd for both
shared and private memory.
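In other words, the shareability tracked by guest_memfd becomes
authoritative for such files. As a summary sketch (the enum is
introduced earlier in this series; the comments here are inferred from
how the two values are used in this and the preceding patches):

	enum shareability {
		SHAREABILITY_GUEST,	/* private: only the guest may fault this in */
		SHAREABILITY_ALL,	/* shared: both host and guest may fault this in */
	};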
Change-Id: I8f23d7995c12242aa4e09ccf5ec19360e9c9ed83
Signed-off-by: Ackerley Tng
---
 include/linux/kvm_host.h | 19 ++++++++++++-------
 virt/kvm/guest_memfd.c   | 22 ++++++++++++++++++++++
 2 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b317392453a5..91279e05e010 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2508,12 +2508,22 @@ static inline void kvm_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
 }
 
 #ifdef CONFIG_KVM_GMEM_SHARED_MEM
+
 bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot);
+bool kvm_gmem_is_private(struct kvm_memory_slot *slot, gfn_t gfn);
+
 #else
+
 static inline bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
 {
 	return false;
 }
+
+static inline bool kvm_gmem_is_private(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	return false;
+}
+
 #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
 
 #ifdef CONFIG_KVM_GENERIC_MEMORY_ATTRIBUTES
@@ -2544,13 +2554,8 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 		return false;
 
 	slot = gfn_to_memslot(kvm, gfn);
-	if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
-		/*
-		 * For now, memslots only support in-place shared memory if the
-		 * host is allowed to mmap memory (i.e., non-Coco VMs).
-		 */
-		return false;
-	}
+	if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot))
+		return kvm_gmem_is_private(slot, gfn);
 
 	return kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
 }
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 6f6c4d298f8f..853e989bdcb2 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -865,6 +865,28 @@ bool kvm_gmem_memslot_supports_shared(const struct kvm_memory_slot *slot)
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_memslot_supports_shared);
 
+bool kvm_gmem_is_private(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	struct inode *inode;
+	struct file *file;
+	pgoff_t index;
+	bool ret;
+
+	file = kvm_gmem_get_file(slot);
+	if (!file)
+		return false;
+
+	index = kvm_gmem_get_index(slot, gfn);
+	inode = file_inode(file);
+
+	filemap_invalidate_lock_shared(inode->i_mapping);
+	ret = kvm_gmem_shareability_get(inode, index) == SHAREABILITY_GUEST;
+	filemap_invalidate_unlock_shared(inode->i_mapping);
+
+	fput(file);
+	return ret;
+}
+
 #else
 #define kvm_gmem_mmap NULL
 #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
-- 
2.49.0.1045.g170613ef41-goog
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:46 -0700
Message-ID: <59d0c13258bea1caec2d3eeed54bc8cb78783399.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 07/51] KVM: guest_memfd: Add CAP KVM_CAP_GMEM_CONVERSION
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

KVM_CAP_GMEM_CONVERSION indicates that guest_memfd supports
private/shared conversion. With this patch, any guest_memfd that
supports shared memory also supports conversion.

Conversion support brings tracking of private/shared status into
guest_memfd itself, so all VM types can now support shared memory in
guest_memfd. Previously, Coco VMs did not support shared memory
because that would have made private memory accessible to the host.
With private/shared status tracked in guest_memfd, private memory can
never be mapped into the host, so Coco VMs can now support shared
memory as well.
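To illustrate the intended userspace flow (a hedged sketch, not part of
this patch: the struct layout is assumed from its use in
kvm_gmem_ioctl_convert_range(), and the ioctl definitions come from the
UAPI header added earlier in this series):

	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>	/* with this series' UAPI additions */

	/* Convert [offset, offset + size) of a guest_memfd to shared. */
	static int gmem_convert_to_shared(int gmem_fd, __u64 offset, __u64 size)
	{
		struct kvm_gmem_convert param = {
			.offset = offset,	/* must be PAGE_SIZE aligned */
			.size = size,		/* must be PAGE_SIZE aligned */
			.error_offset = 0,	/* must be zero on entry */
		};

		if (ioctl(gmem_fd, KVM_GMEM_CONVERT_SHARED, &param)) {
			/* On failure, error_offset reports where conversion stopped. */
			fprintf(stderr, "conversion failed at offset %llu\n",
				(unsigned long long)param.error_offset);
			return -1;
		}
		return 0;
	}

Note that the ioctl is issued on the guest_memfd fd itself (see the
.unlocked_ioctl hookup earlier in the series), and KVM_GMEM_CONVERT_PRIVATE
follows the same pattern in the opposite direction.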
Change-Id: I057b7bd267dd84a93fdee2e95cceb88cd9dfc647
Signed-off-by: Ackerley Tng
---
 arch/arm64/include/asm/kvm_host.h |  5 -----
 arch/x86/include/asm/kvm_host.h   | 10 ----------
 include/linux/kvm_host.h          | 13 -------------
 include/uapi/linux/kvm.h          |  1 +
 virt/kvm/guest_memfd.c            | 12 ++++--------
 virt/kvm/kvm_main.c               |  3 ++-
 6 files changed, 7 insertions(+), 37 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2514779f5131..7df673a71ade 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1598,9 +1598,4 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
 	return IS_ENABLED(CONFIG_KVM_GMEM);
 }
 
-static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
-{
-	return IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM);
-}
-
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f72722949cae..709cc2a7ba66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2255,18 +2255,8 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 
 #ifdef CONFIG_KVM_GMEM
 #define kvm_arch_supports_gmem(kvm) ((kvm)->arch.supports_gmem)
-
-/*
- * CoCo VMs with hardware support that use guest_memfd only for backing private
- * memory, e.g., TDX, cannot use guest_memfd with userspace mapping enabled.
- */
-#define kvm_arch_vm_supports_gmem_shared_mem(kvm)		\
-	(IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM) &&		\
-	 ((kvm)->arch.vm_type == KVM_X86_SW_PROTECTED_VM ||	\
-	  (kvm)->arch.vm_type == KVM_X86_DEFAULT_VM))
 #else
 #define kvm_arch_supports_gmem(kvm) false
-#define kvm_arch_vm_supports_gmem_shared_mem(kvm) false
 #endif
 
 #define kvm_arch_has_readonly_mem(kvm) (!(kvm)->arch.has_protected_state)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 91279e05e010..d703f291f467 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -729,19 +729,6 @@ static inline bool kvm_arch_supports_gmem(struct kvm *kvm)
 }
 #endif
 
-/*
- * Returns true if this VM supports shared mem in guest_memfd.
- *
- * Arch code must define kvm_arch_vm_supports_gmem_shared_mem if support for
- * guest_memfd is enabled.
- */
-#if !defined(kvm_arch_vm_supports_gmem_shared_mem) && !IS_ENABLED(CONFIG_KVM_GMEM)
-static inline bool kvm_arch_vm_supports_gmem_shared_mem(struct kvm *kvm)
-{
-	return false;
-}
-#endif
-
 #ifndef kvm_arch_has_readonly_mem
 static inline bool kvm_arch_has_readonly_mem(struct kvm *kvm)
 {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5b28e17f6f14..433e184f83ea 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -931,6 +931,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_X86_GUEST_MODE 238
 #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
 #define KVM_CAP_GMEM_SHARED_MEM 240
+#define KVM_CAP_GMEM_CONVERSION 241
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 853e989bdcb2..8c9c9e54616b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -1216,7 +1216,7 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	u64 flags = args->flags;
 	u64 valid_flags = 0;
 
-	if (kvm_arch_vm_supports_gmem_shared_mem(kvm))
+	if (IS_ENABLED(CONFIG_KVM_GMEM_SHARED_MEM))
 		valid_flags |= GUEST_MEMFD_FLAG_SUPPORT_SHARED;
 
 	if (flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED)
@@ -1286,13 +1286,9 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 	    offset + size > i_size_read(inode))
 		goto err;
 
-	if (kvm_gmem_supports_shared(inode)) {
-		if (!kvm_arch_vm_supports_gmem_shared_mem(kvm))
-			goto err;
-
-		if (slot->userspace_addr &&
-		    !kvm_gmem_is_same_range(kvm, slot, file, offset))
-			goto err;
+	if (kvm_gmem_supports_shared(inode) && slot->userspace_addr &&
+	    !kvm_gmem_is_same_range(kvm, slot, file, offset)) {
+		goto err;
 	}
 
 	filemap_invalidate_lock(inode->i_mapping);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 66dfdafbb3b6..92054b1bbd3f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4843,7 +4843,8 @@ static int kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 #endif
 #ifdef CONFIG_KVM_GMEM_SHARED_MEM
 	case KVM_CAP_GMEM_SHARED_MEM:
-		return !kvm || kvm_arch_vm_supports_gmem_shared_mem(kvm);
+	case KVM_CAP_GMEM_CONVERSION:
+		return true;
 #endif
 	default:
 		break;
-- 
2.49.0.1045.g170613ef41-goog
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:47 -0700
Message-ID: <7ae972e602f94cef707ccb19b139638f4266d361.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 08/51] KVM: selftests: Test flag validity after guest_memfd supports conversions
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Until guest_memfd supports conversions, Coco VMs must not allow
GUEST_MEMFD_FLAG_SUPPORT_SHARED. Since this is a platform stability
requirement for hosts supporting Coco VMs, it is an important test to
retain.
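For context, this is how userspace exercises the flags under test
(KVM_CREATE_GUEST_MEMFD and struct kvm_create_guest_memfd are existing
KVM UAPI; the two GUEST_MEMFD_FLAG_* values are introduced by this
series, so this is a sketch against the series' headers):

	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	/* Create a shared-capable guest_memfd whose pages start out private. */
	static int create_gmem_init_private(int vm_fd, __u64 size)
	{
		struct kvm_create_guest_memfd gmem = {
			.size = size,
			.flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED |
				 GUEST_MEMFD_FLAG_INIT_PRIVATE,
		};

		/* Returns the new guest_memfd file descriptor on success. */
		return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &gmem);
	}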
Change-Id: I7a42a7d22e96adf17db3dcaedac6b175a36a0eab
Signed-off-by: Ackerley Tng
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 26 ++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index bf2876cbd711..51d88acdf072 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -435,7 +435,8 @@ static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
 	for (flag = BIT(0); flag; flag <<= 1) {
 		test_vm_with_gmem_flag(vm, flag, flag & expected_valid_flags);
 
-		if (flag == GUEST_MEMFD_FLAG_SUPPORT_SHARED) {
+		if (flag == GUEST_MEMFD_FLAG_SUPPORT_SHARED &&
+		    kvm_has_cap(KVM_CAP_GMEM_CONVERSION)) {
 			test_vm_with_gmem_flag(
 				vm, flag | GUEST_MEMFD_FLAG_INIT_PRIVATE, true);
 		}
@@ -444,7 +445,7 @@ static void test_vm_type_gmem_flag_validity(unsigned long vm_type,
 	kvm_vm_release(vm);
 }
 
-static void test_gmem_flag_validity(void)
+static void test_gmem_flag_validity_without_conversion_cap(void)
 {
 	uint64_t non_coco_vm_valid_flags = 0;
 
@@ -462,11 +463,30 @@ static void test_gmem_flag_validity(void)
 #endif
 }
 
+static void test_gmem_flag_validity(void)
+{
+	/* After conversions are supported, all VM types support shared mem. */
+	uint64_t valid_flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED;
+
+	test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, valid_flags);
+
+#ifdef __x86_64__
+	test_vm_type_gmem_flag_validity(KVM_X86_SW_PROTECTED_VM, valid_flags);
+	test_vm_type_gmem_flag_validity(KVM_X86_SEV_VM, valid_flags);
+	test_vm_type_gmem_flag_validity(KVM_X86_SEV_ES_VM, valid_flags);
+	test_vm_type_gmem_flag_validity(KVM_X86_SNP_VM, valid_flags);
+	test_vm_type_gmem_flag_validity(KVM_X86_TDX_VM, valid_flags);
+#endif
+}
+
 int main(int argc, char *argv[])
 {
 	TEST_REQUIRE(kvm_has_cap(KVM_CAP_GUEST_MEMFD));
 
-	test_gmem_flag_validity();
+	if (kvm_has_cap(KVM_CAP_GMEM_CONVERSION))
+		test_gmem_flag_validity();
+	else
+		test_gmem_flag_validity_without_conversion_cap();
 
 	test_with_type(VM_TYPE_DEFAULT, 0, false);
 	if (kvm_has_cap(KVM_CAP_GMEM_SHARED_MEM)) {
-- 
2.49.0.1045.g170613ef41-goog
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:48 -0700
Message-ID: <3d2f49b409f1d6564eaff49494789908eb9b74e5.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 09/51] KVM: selftests: Test faulting with respect to GUEST_MEMFD_FLAG_INIT_PRIVATE
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Test that faulting is denied when guest_memfd's shareability is
initialized as private with GUEST_MEMFD_FLAG_INIT_PRIVATE, and allowed
when the flag is not specified.
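The behavior under test, in miniature (a hedged sketch; gmem_fd is
assumed to come from KVM_CREATE_GUEST_MEMFD with the flags discussed in
the previous patch):

	#include <assert.h>
	#include <sys/mman.h>

	/*
	 * With GUEST_MEMFD_FLAG_SUPPORT_SHARED alone, the write below succeeds.
	 * With GUEST_MEMFD_FLAG_INIT_PRIVATE also set, mmap() still succeeds,
	 * but touching the mapping faults with SIGBUS because the page is
	 * private (shareability is SHAREABILITY_GUEST).
	 */
	static void touch_gmem(int gmem_fd, size_t size)
	{
		char *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
				 MAP_SHARED, gmem_fd, 0);

		assert(mem != MAP_FAILED);
		mem[0] = 'A';		/* SIGBUS if the page is private */
		munmap(mem, size);
	}

The test below runs the faulting write in a forked child so that the
expected SIGBUS can be observed without killing the test itself.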
Signed-off-by: Ackerley Tng
Co-developed-by: Fuad Tabba
Signed-off-by: Fuad Tabba
Change-Id: Id93d4683b36fc5a9c924458d26f0525baed26435
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 112 +++++++++++++++---
 1 file changed, 97 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 51d88acdf072..1e79382fd830 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -16,6 +16,7 @@
 #include <sys/mman.h>
 #include <sys/types.h>
 #include <sys/stat.h>
+#include <sys/wait.h>
 
 #include "kvm_util.h"
 #include "test_util.h"
@@ -34,7 +35,7 @@ static void test_file_read_write(int fd)
 		    "pwrite on a guest_mem fd should fail");
 }
 
-static void test_mmap_allowed(int fd, size_t page_size, size_t total_size)
+static void test_faulting_allowed(int fd, size_t page_size, size_t total_size)
 {
 	const char val = 0xaa;
 	char *mem;
@@ -65,6 +66,53 @@ static void test_mmap_allowed(int fd, size_t page_size, size_t total_size)
 	TEST_ASSERT(!ret, "munmap should succeed");
 }
 
+static void assert_not_faultable(char *address)
+{
+	pid_t child_pid;
+
+	child_pid = fork();
+	TEST_ASSERT(child_pid != -1, "fork failed");
+
+	if (child_pid == 0) {
+		*address = 'A';
+		TEST_FAIL("Child should have exited with a signal");
+	} else {
+		int status;
+
+		waitpid(child_pid, &status, 0);
+
+		TEST_ASSERT(WIFSIGNALED(status),
+			    "Child should have exited with a signal");
+		TEST_ASSERT_EQ(WTERMSIG(status), SIGBUS);
+	}
+}
+
+static void test_faulting_sigbus(int fd, size_t total_size)
+{
+	char *mem;
+	int ret;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmaping() guest memory should pass.");
+
+	assert_not_faultable(mem);
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+}
+
+static void test_mmap_allowed(int fd, size_t total_size)
+{
+	char *mem;
+	int ret;
+
+	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "mmaping() guest memory should pass.");
+
+	ret = munmap(mem, total_size);
+	TEST_ASSERT(!ret, "munmap should succeed");
+}
+
 static void test_mmap_denied(int fd, size_t page_size, size_t total_size)
 {
 	char *mem;
@@ -364,40 +412,74 @@ static void test_bind_guest_memfd_wrt_userspace_addr(struct kvm_vm *vm)
 	close(fd);
 }
 
-static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
-			   bool expect_mmap_allowed)
+static void test_guest_memfd_features(struct kvm_vm *vm, size_t page_size,
+				      uint64_t guest_memfd_flags,
+				      bool expect_mmap_allowed,
+				      bool expect_faulting_allowed)
 {
-	struct kvm_vm *vm;
 	size_t total_size;
-	size_t page_size;
 	int fd;
 
-	if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
-		return;
-
-	page_size = getpagesize();
 	total_size = page_size * 4;
 
-	vm = vm_create_barebones_type(vm_type);
+	if (expect_faulting_allowed)
+		TEST_REQUIRE(expect_mmap_allowed);
 
-	test_create_guest_memfd_multiple(vm);
-	test_bind_guest_memfd_wrt_userspace_addr(vm);
 	test_create_guest_memfd_invalid_sizes(vm, guest_memfd_flags, page_size);
 
 	fd = vm_create_guest_memfd(vm, total_size, guest_memfd_flags);
 
 	test_file_read_write(fd);
 
-	if (expect_mmap_allowed)
-		test_mmap_allowed(fd, page_size, total_size);
-	else
+	if (expect_mmap_allowed) {
+		test_mmap_allowed(fd, total_size);
+
+		if (expect_faulting_allowed)
+			test_faulting_allowed(fd, page_size, total_size);
+		else
+			test_faulting_sigbus(fd, total_size);
+	} else {
 		test_mmap_denied(fd, page_size, total_size);
+	}
 
 	test_file_size(fd, page_size, total_size);
 	test_fallocate(fd, page_size, total_size);
 	test_invalid_punch_hole(fd, page_size, total_size);
 
 	close(fd);
+}
+
+static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
+			   bool expect_mmap_allowed)
+{
+	struct kvm_vm *vm;
+	size_t page_size;
+
+	if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
+		return;
+
+	vm = vm_create_barebones_type(vm_type);
+
+	test_create_guest_memfd_multiple(vm);
+	test_bind_guest_memfd_wrt_userspace_addr(vm);
+
+	page_size = getpagesize();
+	if (guest_memfd_flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED) {
+		test_guest_memfd_features(vm, page_size, guest_memfd_flags,
+					  expect_mmap_allowed, true);
+
+		if (kvm_has_cap(KVM_CAP_GMEM_CONVERSION)) {
+			uint64_t flags = guest_memfd_flags |
+					 GUEST_MEMFD_FLAG_INIT_PRIVATE;
+
+			test_guest_memfd_features(vm, page_size, flags,
						  expect_mmap_allowed, false);
+		}
+	} else {
+		test_guest_memfd_features(vm, page_size, guest_memfd_flags,
+					  expect_mmap_allowed, false);
+	}
+
 	kvm_vm_release(vm);
 }
 
-- 
2.49.0.1045.g170613ef41-goog
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:49 -0700
Message-ID: <9a9db594cc0e9d059dd30d2415d0346e09065bb6.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 10/51] KVM: selftests: Refactor vm_mem_add to be more flexible
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

enum vm_mem_backing_src_type encodes too many possibilities along
different axes: (1) whether to mmap() from an fd, (2) mapping
granularity for THP, and (3) hugetlb mapping size, and it has yet to be
extended to support guest_memfd. Once guest_memfd supports mmap() and
we also want to test mmap()ing from guest_memfd, the number of
combinations makes enumeration in vm_mem_backing_src_type unwieldy.

This refactor separates vm_mem_backing_src_type from
userspace_mem_region. For now, vm_mem_backing_src_type remains a way
for tests to specify, on the command line, the combination of backing
memory to test. vm_mem_add() is now the last place where
vm_mem_backing_src_type is interpreted, to

1. Check validity of the requested guest_paddr
2. Align mmap_size appropriately based on the mapping's page_size and
   the architecture
3. Install memory appropriately according to the mapping's page size

mmap()ing an alias appears to be specific to userfaultfd tests and
could be refactored out of struct userspace_mem_region and localized in
the userfaultfd tests in future.

This paves the way for replacing vm_mem_backing_src_type with multiple
command-line flags that specify backing memory more flexibly. Future
tests are expected to use vm_mem_region_alloc() to allocate a struct
userspace_mem_region, then use more fundamental functions like
vm_mem_region_mmap(), vm_mem_region_madvise_thp(), kvm_create_memfd(),
vm_create_guest_memfd(), and the other helpers in vm_mem_add() to
flexibly build up a struct userspace_mem_region before finally adding
the region to the VM with vm_mem_region_add(), as sketched below.
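A hedged sketch of that intended composition (helper signatures are
taken from this patch's header changes; the slot and guest physical
address setup is elided and assumed to be filled in via region->region
before the region is added):

	static void add_memfd_backed_region(struct kvm_vm *vm, size_t size)
	{
		struct userspace_mem_region *region;
		int fd;

		region = vm_mem_region_alloc(vm);

		/* fd must be stored in the region before a MAP_SHARED mmap. */
		fd = kvm_create_memfd(size, MFD_CLOEXEC);
		region->fd = fd;

		vm_mem_region_mmap(region, size, MAP_SHARED, fd, 0);
		vm_mem_region_install_memory(region, size, getpagesize());

		/* ... set region->region.slot, guest_phys_addr, etc. ... */
		vm_mem_region_add(vm, region);
	}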
Change-Id: Ibb37af8a1a3bbb6de776426302433c5d9613ee76 Signed-off-by: Ackerley Tng --- .../testing/selftests/kvm/include/kvm_util.h | 29 +- .../testing/selftests/kvm/include/test_util.h | 2 + tools/testing/selftests/kvm/lib/kvm_util.c | 429 +++++++++++------- tools/testing/selftests/kvm/lib/test_util.c | 25 + 4 files changed, 328 insertions(+), 157 deletions(-) diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing= /selftests/kvm/include/kvm_util.h index 373912464fb4..853ab68cff79 100644 --- a/tools/testing/selftests/kvm/include/kvm_util.h +++ b/tools/testing/selftests/kvm/include/kvm_util.h @@ -35,11 +35,26 @@ struct userspace_mem_region { struct sparsebit *protected_phy_pages; int fd; off_t offset; - enum vm_mem_backing_src_type backing_src_type; + /* + * host_mem is mmap_start aligned upwards to an address suitable for the + * architecture. In most cases, host_mem and mmap_start are the same, + * except for s390x, where the host address must be aligned to 1M (due + * to PGSTEs). + */ +#ifdef __s390x__ +#define S390X_HOST_ADDRESS_ALIGNMENT 0x100000 +#endif void *host_mem; + /* host_alias is to mmap_alias as host_mem is to mmap_start */ void *host_alias; void *mmap_start; void *mmap_alias; + /* + * mmap_size is possibly larger than region.memory_size because in some + * cases, host_mem has to be adjusted upwards (see comment for host_mem + * above). In those cases, mmap_size has to be adjusted upwards so that + * enough memory is available in this memslot. + */ size_t mmap_size; struct rb_node gpa_node; struct rb_node hva_node; @@ -582,6 +597,18 @@ int __vm_set_user_memory_region2(struct kvm_vm *vm, ui= nt32_t slot, uint32_t flag uint64_t gpa, uint64_t size, void *hva, uint32_t guest_memfd, uint64_t guest_memfd_offset); =20 +struct userspace_mem_region *vm_mem_region_alloc(struct kvm_vm *vm); +void *vm_mem_region_mmap(struct userspace_mem_region *region, size_t lengt= h, + int flags, int fd, off_t offset); +void vm_mem_region_install_memory(struct userspace_mem_region *region, + size_t memslot_size, size_t alignment); +void vm_mem_region_madvise_thp(struct userspace_mem_region *region, int ad= vice); +int vm_mem_region_install_guest_memfd(struct userspace_mem_region *region, + int guest_memfd); +void *vm_mem_region_mmap_alias(struct userspace_mem_region *region, int fl= ags, + size_t alignment); +void vm_mem_region_add(struct kvm_vm *vm, struct userspace_mem_region *reg= ion); + void vm_userspace_mem_region_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type, uint64_t guest_paddr, uint32_t slot, uint64_t npages, diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testin= g/selftests/kvm/include/test_util.h index 77d13d7920cb..b4a03784ac4f 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -149,6 +149,8 @@ size_t get_trans_hugepagesz(void); size_t get_def_hugetlb_pagesz(void); const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i= ); size_t get_backing_src_pagesz(uint32_t i); +int backing_src_should_madvise(uint32_t i); +int get_backing_src_madvise_advice(uint32_t i); bool is_backing_src_hugetlb(uint32_t i); void backing_src_help(const char *flag); enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name); diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/sel= ftests/kvm/lib/kvm_util.c index 815bc45dd8dc..58a3365f479c 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ 
b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -824,15 +824,12 @@ void kvm_vm_free(struct kvm_vm *vmp) free(vmp); } =20 -int kvm_memfd_alloc(size_t size, bool hugepages) +int kvm_create_memfd(size_t size, unsigned int flags) { - int memfd_flags =3D MFD_CLOEXEC; - int fd, r; + int fd; + int r; =20 - if (hugepages) - memfd_flags |=3D MFD_HUGETLB; - - fd =3D memfd_create("kvm_selftest", memfd_flags); + fd =3D memfd_create("kvm_selftest", flags); TEST_ASSERT(fd !=3D -1, __KVM_SYSCALL_ERROR("memfd_create()", fd)); =20 r =3D ftruncate(fd, size); @@ -844,6 +841,16 @@ int kvm_memfd_alloc(size_t size, bool hugepages) return fd; } =20 +int kvm_memfd_alloc(size_t size, bool hugepages) +{ + int memfd_flags =3D MFD_CLOEXEC; + + if (hugepages) + memfd_flags |=3D MFD_HUGETLB; + + return kvm_create_memfd(size, memfd_flags); +} + static void vm_userspace_mem_region_gpa_insert(struct rb_root *gpa_tree, struct userspace_mem_region *region) { @@ -953,185 +960,295 @@ void vm_set_user_memory_region2(struct kvm_vm *vm, = uint32_t slot, uint32_t flags errno, strerror(errno)); } =20 +/** + * Allocates and returns a struct userspace_mem_region. + */ +struct userspace_mem_region *vm_mem_region_alloc(struct kvm_vm *vm) +{ + struct userspace_mem_region *region; =20 -/* FIXME: This thing needs to be ripped apart and rewritten. */ -void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type, - uint64_t guest_paddr, uint32_t slot, uint64_t npages, - uint32_t flags, int guest_memfd, uint64_t guest_memfd_offset) + /* Allocate and initialize new mem region structure. */ + region =3D calloc(1, sizeof(*region)); + TEST_ASSERT(region !=3D NULL, "Insufficient Memory"); + + region->unused_phy_pages =3D sparsebit_alloc(); + if (vm_arch_has_protected_memory(vm)) + region->protected_phy_pages =3D sparsebit_alloc(); + + region->fd =3D -1; + region->region.guest_memfd =3D -1; + + return region; +} + +static size_t compute_page_size(int mmap_flags, int madvise_advice) +{ + if (mmap_flags & MAP_HUGETLB) { + int size_flags =3D (mmap_flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK; + + if (!size_flags) + return get_def_hugetlb_pagesz(); + + return 1ULL << size_flags; + } + + return madvise_advice =3D=3D MADV_HUGEPAGE ? get_trans_hugepagesz() : get= pagesize(); +} + +/** + * Calls mmap() with @length, @flags, @fd, @offset for @region. + * + * Think of this as the struct userspace_mem_region wrapper for the mmap() + * syscall. + */ +void *vm_mem_region_mmap(struct userspace_mem_region *region, size_t lengt= h, + int flags, int fd, off_t offset) +{ + void *mem; + + if (flags & MAP_SHARED) { + TEST_ASSERT(fd !=3D -1, + "Ensure that fd is provided for shared mappings."); + TEST_ASSERT( + region->fd =3D=3D fd || region->region.guest_memfd =3D=3D fd, + "Ensure that fd is opened before mmap, and is either " + "set up in region->fd or region->region.guest_memfd."); + } + + mem =3D mmap(NULL, length, PROT_READ | PROT_WRITE, flags, fd, offset); + TEST_ASSERT(mem !=3D MAP_FAILED, "Couldn't mmap anonymous memory"); + + region->mmap_start =3D mem; + region->mmap_size =3D length; + region->offset =3D offset; + + return mem; +} + +/** + * Installs mmap()ed memory in @region->mmap_start as @region->host_mem, + * checking constraints. 
+ */ +void vm_mem_region_install_memory(struct userspace_mem_region *region, + size_t memslot_size, size_t alignment) +{ + TEST_ASSERT(region->mmap_size >=3D memslot_size, + "mmap()ed memory insufficient for memslot"); + + region->host_mem =3D align_ptr_up(region->mmap_start, alignment); + region->region.userspace_addr =3D (uint64_t)region->host_mem; + region->region.memory_size =3D memslot_size; +} + + +/** + * Calls madvise with @advice for @region. + * + * Think of this as the struct userspace_mem_region wrapper for the madvis= e() + * syscall. + */ +void vm_mem_region_madvise_thp(struct userspace_mem_region *region, int ad= vice) { int ret; + + TEST_ASSERT( + region->host_mem && region->mmap_size, + "vm_mem_region_madvise_thp() must be called after vm_mem_region_mmap()"); + + ret =3D madvise(region->host_mem, region->mmap_size, advice); + TEST_ASSERT(ret =3D=3D 0, "madvise failed, addr: %p length: 0x%lx", + region->host_mem, region->mmap_size); +} + +/** + * Installs guest_memfd by setting it up in @region. + * + * Returns the guest_memfd that was installed in the @region. + */ +int vm_mem_region_install_guest_memfd(struct userspace_mem_region *region, + int guest_memfd) +{ + /* + * Install a unique fd for each memslot so that the fd can be closed + * when the region is deleted without needing to track if the fd is + * owned by the framework or by the caller. + */ + guest_memfd =3D dup(guest_memfd); + TEST_ASSERT(guest_memfd >=3D 0, __KVM_SYSCALL_ERROR("dup()", guest_memfd)= ); + region->region.guest_memfd =3D guest_memfd; + + return guest_memfd; +} + +/** + * Calls mmap() to create an alias for mmap()ed memory at region->host_mem, + * exactly the same size the was mmap()ed. + * + * This is used mainly for userfaultfd tests. + */ +void *vm_mem_region_mmap_alias(struct userspace_mem_region *region, int fl= ags, + size_t alignment) +{ + region->mmap_alias =3D mmap(NULL, region->mmap_size, + PROT_READ | PROT_WRITE, flags, region->fd, 0); + TEST_ASSERT(region->mmap_alias !=3D MAP_FAILED, + __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED)); + + region->host_alias =3D align_ptr_up(region->mmap_alias, alignment); + + return region->host_alias; +} + +static void vm_mem_region_assert_no_duplicate(struct kvm_vm *vm, uint32_t = slot, + uint64_t gpa, size_t size) +{ struct userspace_mem_region *region; - size_t backing_src_pagesz =3D get_backing_src_pagesz(src_type); - size_t mem_size =3D npages * vm->page_size; - size_t alignment; - - TEST_REQUIRE_SET_USER_MEMORY_REGION2(); - - TEST_ASSERT(vm_adjust_num_guest_pages(vm->mode, npages) =3D=3D npages, - "Number of guest pages is not compatible with the host. " - "Try npages=3D%d", vm_adjust_num_guest_pages(vm->mode, npages)); - - TEST_ASSERT((guest_paddr % vm->page_size) =3D=3D 0, "Guest physical " - "address not on a page boundary.\n" - " guest_paddr: 0x%lx vm->page_size: 0x%x", - guest_paddr, vm->page_size); - TEST_ASSERT((((guest_paddr >> vm->page_shift) + npages) - 1) - <=3D vm->max_gfn, "Physical range beyond maximum " - "supported physical address,\n" - " guest_paddr: 0x%lx npages: 0x%lx\n" - " vm->max_gfn: 0x%lx vm->page_size: 0x%x", - guest_paddr, npages, vm->max_gfn, vm->page_size); =20 /* * Confirm a mem region with an overlapping address doesn't * already exist. 
*/ - region =3D (struct userspace_mem_region *) userspace_mem_region_find( - vm, guest_paddr, (guest_paddr + npages * vm->page_size) - 1); - if (region !=3D NULL) - TEST_FAIL("overlapping userspace_mem_region already " - "exists\n" - " requested guest_paddr: 0x%lx npages: 0x%lx " - "page_size: 0x%x\n" - " existing guest_paddr: 0x%lx size: 0x%lx", - guest_paddr, npages, vm->page_size, - (uint64_t) region->region.guest_phys_addr, - (uint64_t) region->region.memory_size); + region =3D userspace_mem_region_find(vm, gpa, gpa + size - 1); + if (region !=3D NULL) { + TEST_FAIL("overlapping userspace_mem_region already exists\n" + " requested gpa: 0x%lx size: 0x%lx" + " existing gpa: 0x%lx size: 0x%lx", + gpa, size, + (uint64_t) region->region.guest_phys_addr, + (uint64_t) region->region.memory_size); + } =20 /* Confirm no region with the requested slot already exists. */ - hash_for_each_possible(vm->regions.slot_hash, region, slot_node, - slot) { + hash_for_each_possible(vm->regions.slot_hash, region, slot_node, slot) { if (region->region.slot !=3D slot) continue; =20 - TEST_FAIL("A mem region with the requested slot " - "already exists.\n" - " requested slot: %u paddr: 0x%lx npages: 0x%lx\n" - " existing slot: %u paddr: 0x%lx size: 0x%lx", - slot, guest_paddr, npages, - region->region.slot, - (uint64_t) region->region.guest_phys_addr, - (uint64_t) region->region.memory_size); + TEST_FAIL("A mem region with the requested slot already exists.\n" + " requested slot: %u paddr: 0x%lx size: 0x%lx\n" + " existing slot: %u paddr: 0x%lx size: 0x%lx", + slot, gpa, size, + region->region.slot, + (uint64_t) region->region.guest_phys_addr, + (uint64_t) region->region.memory_size); } +} =20 - /* Allocate and initialize new mem region structure. */ - region =3D calloc(1, sizeof(*region)); - TEST_ASSERT(region !=3D NULL, "Insufficient Memory"); - region->mmap_size =3D mem_size; +/** + * Add a @region to @vm. All necessary fields in region->region should alr= eady + * be populated. + * + * Think of this as the struct userspace_mem_region wrapper for the + * KVM_SET_USER_MEMORY_REGION2 ioctl. + */ +void vm_mem_region_add(struct kvm_vm *vm, struct userspace_mem_region *reg= ion) +{ + uint64_t npages; + uint64_t gpa; + int ret; =20 -#ifdef __s390x__ - /* On s390x, the host address must be aligned to 1M (due to PGSTEs) */ - alignment =3D 0x100000; -#else - alignment =3D 1; -#endif + TEST_REQUIRE_SET_USER_MEMORY_REGION2(); =20 - /* - * When using THP mmap is not guaranteed to returned a hugepage aligned - * address so we have to pad the mmap. Padding is not needed for HugeTLB - * because mmap will always return an address aligned to the HugeTLB - * page size. - */ - if (src_type =3D=3D VM_MEM_SRC_ANONYMOUS_THP) - alignment =3D max(backing_src_pagesz, alignment); + npages =3D region->region.memory_size / vm->page_size; + TEST_ASSERT(vm_adjust_num_guest_pages(vm->mode, npages) =3D=3D npages, + "Number of guest pages is not compatible with the host. 
" + "Try npages=3D%d", vm_adjust_num_guest_pages(vm->mode, npages)); =20 - TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, backing_src_pagesz)); + gpa =3D region->region.guest_phys_addr; + TEST_ASSERT((gpa % vm->page_size) =3D=3D 0, + "Guest physical address not on a page boundary.\n" + " gpa: 0x%lx vm->page_size: 0x%x", + gpa, vm->page_size); + TEST_ASSERT((((gpa >> vm->page_shift) + npages) - 1) <=3D vm->max_gfn, + "Physical range beyond maximum supported physical address,\n" + " gpa: 0x%lx npages: 0x%lx\n" + " vm->max_gfn: 0x%lx vm->page_size: 0x%x", + gpa, npages, vm->max_gfn, vm->page_size); =20 - /* Add enough memory to align up if necessary */ - if (alignment > 1) - region->mmap_size +=3D alignment; + vm_mem_region_assert_no_duplicate(vm, region->region.slot, gpa, + region->mmap_size); =20 - region->fd =3D -1; - if (backing_src_is_shared(src_type)) - region->fd =3D kvm_memfd_alloc(region->mmap_size, - src_type =3D=3D VM_MEM_SRC_SHARED_HUGETLB); - - region->mmap_start =3D mmap(NULL, region->mmap_size, - PROT_READ | PROT_WRITE, - vm_mem_backing_src_alias(src_type)->flag, - region->fd, 0); - TEST_ASSERT(region->mmap_start !=3D MAP_FAILED, - __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED)); - - TEST_ASSERT(!is_backing_src_hugetlb(src_type) || - region->mmap_start =3D=3D align_ptr_up(region->mmap_start, backing_s= rc_pagesz), - "mmap_start %p is not aligned to HugeTLB page size 0x%lx", - region->mmap_start, backing_src_pagesz); - - /* Align host address */ - region->host_mem =3D align_ptr_up(region->mmap_start, alignment); - - /* As needed perform madvise */ - if ((src_type =3D=3D VM_MEM_SRC_ANONYMOUS || - src_type =3D=3D VM_MEM_SRC_ANONYMOUS_THP) && thp_configured()) { - ret =3D madvise(region->host_mem, mem_size, - src_type =3D=3D VM_MEM_SRC_ANONYMOUS ? MADV_NOHUGEPAGE : MADV_HUG= EPAGE); - TEST_ASSERT(ret =3D=3D 0, "madvise failed, addr: %p length: 0x%lx src_ty= pe: %s", - region->host_mem, mem_size, - vm_mem_backing_src_alias(src_type)->name); - } - - region->backing_src_type =3D src_type; - - if (flags & KVM_MEM_GUEST_MEMFD) { - if (guest_memfd < 0) { - uint32_t guest_memfd_flags =3D 0; - TEST_ASSERT(!guest_memfd_offset, - "Offset must be zero when creating new guest_memfd"); - guest_memfd =3D vm_create_guest_memfd(vm, mem_size, guest_memfd_flags); - } else { - /* - * Install a unique fd for each memslot so that the fd - * can be closed when the region is deleted without - * needing to track if the fd is owned by the framework - * or by the caller. 
- */ - guest_memfd =3D dup(guest_memfd); - TEST_ASSERT(guest_memfd >=3D 0, __KVM_SYSCALL_ERROR("dup()", guest_memf= d)); - } - - region->region.guest_memfd =3D guest_memfd; - region->region.guest_memfd_offset =3D guest_memfd_offset; - } else { - region->region.guest_memfd =3D -1; - } - - region->unused_phy_pages =3D sparsebit_alloc(); - if (vm_arch_has_protected_memory(vm)) - region->protected_phy_pages =3D sparsebit_alloc(); - sparsebit_set_num(region->unused_phy_pages, - guest_paddr >> vm->page_shift, npages); - region->region.slot =3D slot; - region->region.flags =3D flags; - region->region.guest_phys_addr =3D guest_paddr; - region->region.memory_size =3D npages * vm->page_size; - region->region.userspace_addr =3D (uintptr_t) region->host_mem; ret =3D __vm_ioctl(vm, KVM_SET_USER_MEMORY_REGION2, ®ion->region); TEST_ASSERT(ret =3D=3D 0, "KVM_SET_USER_MEMORY_REGION2 IOCTL failed,\n" - " rc: %i errno: %i\n" - " slot: %u flags: 0x%x\n" - " guest_phys_addr: 0x%lx size: 0x%lx guest_memfd: %d", - ret, errno, slot, flags, - guest_paddr, (uint64_t) region->region.memory_size, - region->region.guest_memfd); + " rc: %i errno: %i\n" + " slot: %u flags: 0x%x\n" + " guest_phys_addr: 0x%lx size: 0x%llx guest_memfd: %d", + ret, errno, region->region.slot, region->region.flags, + gpa, region->region.memory_size, + region->region.guest_memfd); + + sparsebit_set_num(region->unused_phy_pages, gpa >> vm->page_shift, npages= ); =20 /* Add to quick lookup data structures */ vm_userspace_mem_region_gpa_insert(&vm->regions.gpa_tree, region); vm_userspace_mem_region_hva_insert(&vm->regions.hva_tree, region); - hash_add(vm->regions.slot_hash, ®ion->slot_node, slot); + hash_add(vm->regions.slot_hash, ®ion->slot_node, region->region.slot); +} =20 - /* If shared memory, create an alias. 
*/
-	if (region->fd >= 0) {
-		region->mmap_alias = mmap(NULL, region->mmap_size,
-					  PROT_READ | PROT_WRITE,
-					  vm_mem_backing_src_alias(src_type)->flag,
-					  region->fd, 0);
-		TEST_ASSERT(region->mmap_alias != MAP_FAILED,
-			    __KVM_SYSCALL_ERROR("mmap()", (int)(unsigned long)MAP_FAILED));
+void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
+		uint64_t guest_paddr, uint32_t slot, uint64_t npages,
+		uint32_t flags, int guest_memfd, uint64_t guest_memfd_offset)
+{
+	struct userspace_mem_region *region;
+	size_t mapping_page_size;
+	size_t memslot_size;
+	int madvise_advice;
+	size_t mmap_size;
+	size_t alignment;
+	int mmap_flags;
+	int memfd;
 
-	/* Align host alias address */
-	region->host_alias = align_ptr_up(region->mmap_alias, alignment);
+	memslot_size = npages * vm->page_size;
+
+	mmap_flags = vm_mem_backing_src_alias(src_type)->flag;
+	madvise_advice = get_backing_src_madvise_advice(src_type);
+	mapping_page_size = compute_page_size(mmap_flags, madvise_advice);
+
+	TEST_ASSERT_EQ(guest_paddr, align_up(guest_paddr, mapping_page_size));
+
+	alignment = mapping_page_size;
+#ifdef __s390x__
+	alignment = max(alignment, S390X_HOST_ADDRESS_ALIGNMENT);
+#endif
+
+	region = vm_mem_region_alloc(vm);
+
+	memfd = -1;
+	if (backing_src_is_shared(src_type)) {
+		unsigned int memfd_flags = MFD_CLOEXEC;
+
+		if (src_type == VM_MEM_SRC_SHARED_HUGETLB)
+			memfd_flags |= MFD_HUGETLB;
+
+		memfd = kvm_create_memfd(memslot_size, memfd_flags);
+	}
+	region->fd = memfd;
+
+	mmap_size = align_up(memslot_size, alignment);
+	vm_mem_region_mmap(region, mmap_size, mmap_flags, memfd, 0);
+	vm_mem_region_install_memory(region, memslot_size, alignment);
+
+	if (backing_src_should_madvise(src_type))
+		vm_mem_region_madvise_thp(region, madvise_advice);
+
+	if (backing_src_is_shared(src_type))
+		vm_mem_region_mmap_alias(region, mmap_flags, alignment);
+
+	if (flags & KVM_MEM_GUEST_MEMFD) {
+		if (guest_memfd < 0) {
+			TEST_ASSERT(
+				guest_memfd_offset == 0,
+				"Offset must be zero when creating new guest_memfd");
+			guest_memfd = vm_create_guest_memfd(vm, memslot_size, 0);
+		}
+
+		vm_mem_region_install_guest_memfd(region, guest_memfd);
+	}
+
+	region->region.slot = slot;
+	region->region.flags = flags;
+	region->region.guest_phys_addr = guest_paddr;
+	region->region.guest_memfd_offset = guest_memfd_offset;
+	vm_mem_region_add(vm, region);
 }
 
 void vm_userspace_mem_region_add(struct kvm_vm *vm,
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 8ed0b74ae837..24dc90693afd 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -308,6 +308,31 @@ size_t get_backing_src_pagesz(uint32_t i)
 	}
 }
 
+int backing_src_should_madvise(uint32_t i)
+{
+	switch (i) {
+	case VM_MEM_SRC_ANONYMOUS:
+	case VM_MEM_SRC_SHMEM:
+	case VM_MEM_SRC_ANONYMOUS_THP:
+		return true;
+	default:
+		return false;
+	}
+}
+
+int get_backing_src_madvise_advice(uint32_t i)
+{
+	switch (i) {
+	case VM_MEM_SRC_ANONYMOUS:
+	case VM_MEM_SRC_SHMEM:
+		return MADV_NOHUGEPAGE;
+	case VM_MEM_SRC_ANONYMOUS_THP:
+		return MADV_HUGEPAGE;
+	default:
+		return 0;
+	}
+}
+
 bool is_backing_src_hugetlb(uint32_t i)
 {
 	return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB);
--
2.49.0.1045.g170613ef41-goog
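For comparison, and purely as a hypothetical sketch (not part of this
patch), the THP path could be composed by hand as below, mirroring what
vm_mem_add() now does for VM_MEM_SRC_ANONYMOUS_THP. The slot and GPA are
arbitrary example values, and the mapping is padded by a full alignment
rather than aligned up, the conservatively safe choice:

static void example_add_thp_memslot(struct kvm_vm *vm, size_t size)
{
	struct userspace_mem_region *region = vm_mem_region_alloc(vm);
	size_t alignment = get_trans_hugepagesz();

	/* Pad the mapping so host_mem can be aligned up to the THP size. */
	vm_mem_region_mmap(region, size + alignment,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	vm_mem_region_install_memory(region, size, alignment);
	vm_mem_region_madvise_thp(region, MADV_HUGEPAGE);

	region->region.slot = 2;			/* example slot */
	region->region.guest_phys_addr = 0x20000000;	/* example GPA */
	vm_mem_region_add(vm, region);
}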
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:50 -0700
Message-ID: <09e75529c3f844b1bb4dd5a096ed4160905fca7f.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 11/51] KVM: selftests: Allow cleanup of ucall_pool from host
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Many selftests use GUEST_DONE() to signal the end of guest code, which is
handled in userspace. In most tests, the test exits immediately afterwards,
and there is no need to clean up the ucall_pool->in_use bitmap.

If there are many guest code functions using GUEST_DONE(), or if guest code
functions are run many times, the ucall_pool->in_use bitmap will fill up,
causing later runs of the same guest code function to fail.

This patch allows ucall_free() to be called from userspace on uc.hva, which
will clear and free the corresponding struct ucall in the pool, allowing
ucalls to continue being used, as illustrated below.
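As a hypothetical illustration (not part of this patch), a test that reruns
the same guest function could release the pool slot after each run:

static void example_run_guest_once(struct kvm_vcpu *vcpu)
{
	struct ucall uc;

	vcpu_run(vcpu);
	TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_DONE);

	/*
	 * uc.hva points at the struct ucall in the shared pool; freeing it
	 * clears the in_use bit so the slot can be reused on the next run.
	 */
	ucall_free((struct ucall *)uc.hva);
}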
Change-Id: I2cb2aeed4b291b1bfb2bece001d09c509cd10446
Signed-off-by: Ackerley Tng
---
 .../testing/selftests/kvm/include/ucall_common.h |  1 +
 tools/testing/selftests/kvm/lib/ucall_common.c   | 16 ++++++++--------
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/ucall_common.h b/tools/testing/selftests/kvm/include/ucall_common.h
index d9d6581b8d4f..b6b850d0319a 100644
--- a/tools/testing/selftests/kvm/include/ucall_common.h
+++ b/tools/testing/selftests/kvm/include/ucall_common.h
@@ -40,6 +40,7 @@ __printf(5, 6) void ucall_assert(uint64_t cmd, const char *exp, const char *fmt, ...);
 uint64_t get_ucall(struct kvm_vcpu *vcpu, struct ucall *uc);
 void ucall_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa);
+void ucall_free(struct ucall *uc);
 int ucall_nr_pages_required(uint64_t page_size);
 
 /*
diff --git a/tools/testing/selftests/kvm/lib/ucall_common.c b/tools/testing/selftests/kvm/lib/ucall_common.c
index 42151e571953..9b6865c39ea7 100644
--- a/tools/testing/selftests/kvm/lib/ucall_common.c
+++ b/tools/testing/selftests/kvm/lib/ucall_common.c
@@ -21,24 +21,24 @@ int ucall_nr_pages_required(uint64_t page_size)
 
 /*
  * ucall_pool holds per-VM values (global data is duplicated by each VM), it
- * must not be accessed from host code.
+ * should generally not be accessed from host code other than via ucall_free(),
+ * to clean up after using GUEST_DONE().
  */
 static struct ucall_header *ucall_pool;
 
 void ucall_init(struct kvm_vm *vm, vm_paddr_t mmio_gpa)
 {
-	struct ucall_header *hdr;
 	struct ucall *uc;
 	vm_vaddr_t vaddr;
 	int i;
 
-	vaddr = vm_vaddr_alloc_shared(vm, sizeof(*hdr), KVM_UTIL_MIN_VADDR,
-				      MEM_REGION_DATA);
-	hdr = (struct ucall_header *)addr_gva2hva(vm, vaddr);
-	memset(hdr, 0, sizeof(*hdr));
+	vaddr = vm_vaddr_alloc_shared(vm, sizeof(*ucall_pool),
+				      KVM_UTIL_MIN_VADDR, MEM_REGION_DATA);
+	ucall_pool = (struct ucall_header *)addr_gva2hva(vm, vaddr);
+	memset(ucall_pool, 0, sizeof(*ucall_pool));
 
 	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
-		uc = &hdr->ucalls[i];
+		uc = &ucall_pool->ucalls[i];
 		uc->hva = uc;
 	}
 
@@ -73,7 +73,7 @@ static struct ucall *ucall_alloc(void)
 	return NULL;
 }
 
-static void ucall_free(struct ucall *uc)
+void ucall_free(struct ucall *uc)
 {
 	/* Beware, here be pointer arithmetic. */
 	clear_bit(uc - ucall_pool->ucalls, ucall_pool->in_use);
--
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:51 -0700
Subject: [RFC PATCH v2 12/51] KVM: selftests: Test conversion flows for guest_memfd
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Add minimal tests for guest_memfd checking that when memory is marked
shared in a VM, the host can read and write to it via an mmap()ed address,
and the guest can also read and write to it.

Tests added in this patch use refcounts taken via GUP (requiring
CONFIG_GUP_TEST) to simulate unexpected refcounts on guest_memfd pages, and
check that unexpected refcounts cause conversions to fail.
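The core flow these tests exercise is shown below as a hypothetical sketch
(the convert helpers are added to kvm_util.h in this patch; the offset and
size are example values):

static void example_conversion_round_trip(int guest_memfd, char *mem)
{
	/* Converting to private removes the page from the host's view... */
	guest_memfd_convert_private(guest_memfd, 0, PAGE_SIZE);
	/* ...so a host write to mem here would SIGBUS. */

	/* Converting back to shared restores host access. */
	guest_memfd_convert_shared(guest_memfd, 0, PAGE_SIZE);
	*mem = 'A';
}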
Change-Id: I4f8c05aa511bcb9a34921a54fc8315ed89629018 Signed-off-by: Ackerley Tng --- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../kvm/guest_memfd_conversions_test.c | 589 ++++++++++++++++++ .../testing/selftests/kvm/include/kvm_util.h | 74 +++ 3 files changed, 664 insertions(+) create mode 100644 tools/testing/selftests/kvm/guest_memfd_conversions_tes= t.c diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selft= ests/kvm/Makefile.kvm index ccf95ed037c3..bc22a5a23c4c 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -131,6 +131,7 @@ TEST_GEN_PROGS_x86 +=3D access_tracking_perf_test TEST_GEN_PROGS_x86 +=3D coalesced_io_test TEST_GEN_PROGS_x86 +=3D dirty_log_perf_test TEST_GEN_PROGS_x86 +=3D guest_memfd_test +TEST_GEN_PROGS_x86 +=3D guest_memfd_conversions_test TEST_GEN_PROGS_x86 +=3D hardware_disable_test TEST_GEN_PROGS_x86 +=3D memslot_modification_stress_test TEST_GEN_PROGS_x86 +=3D memslot_perf_test diff --git a/tools/testing/selftests/kvm/guest_memfd_conversions_test.c b/t= ools/testing/selftests/kvm/guest_memfd_conversions_test.c new file mode 100644 index 000000000000..34eb6c9a37b1 --- /dev/null +++ b/tools/testing/selftests/kvm/guest_memfd_conversions_test.c @@ -0,0 +1,589 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Test conversion flows for guest_memfd. + * + * Copyright (c) 2024, Google LLC. + */ +#include +#include +#include +#include +#include +#include + +#include "kvm_util.h" +#include "processor.h" +#include "test_util.h" +#include "ucall_common.h" +#include "../../../../mm/gup_test.h" + +#define GUEST_MEMFD_SHARING_TEST_SLOT 10 +/* + * Use high GPA above APIC_DEFAULT_PHYS_BASE to avoid clashing with + * APIC_DEFAULT_PHYS_BASE. + */ +#define GUEST_MEMFD_SHARING_TEST_GPA 0x100000000ULL +#define GUEST_MEMFD_SHARING_TEST_GVA 0x90000000ULL + +static int gup_test_fd; + +static void pin_pages(void *vaddr, uint64_t size) +{ + const struct pin_longterm_test args =3D { + .addr =3D (uint64_t)vaddr, + .size =3D size, + .flags =3D PIN_LONGTERM_TEST_FLAG_USE_WRITE, + }; + + gup_test_fd =3D open("/sys/kernel/debug/gup_test", O_RDWR); + TEST_REQUIRE(gup_test_fd > 0); + + TEST_ASSERT_EQ(ioctl(gup_test_fd, PIN_LONGTERM_TEST_START, &args), 0); +} + +static void unpin_pages(void) +{ + TEST_ASSERT_EQ(ioctl(gup_test_fd, PIN_LONGTERM_TEST_STOP), 0); +} + +static void guest_check_mem(uint64_t gva, char expected_read_value, char w= rite_value) +{ + char *mem =3D (char *)gva; + + if (expected_read_value !=3D 'X') + GUEST_ASSERT_EQ(*mem, expected_read_value); + + if (write_value !=3D 'X') + *mem =3D write_value; + + GUEST_DONE(); +} + +static int vcpu_run_handle_basic_ucalls(struct kvm_vcpu *vcpu) +{ + struct ucall uc; + int rc; + +keep_going: + do { + rc =3D __vcpu_run(vcpu); + } while (rc =3D=3D -1 && errno =3D=3D EINTR); + + switch (get_ucall(vcpu, &uc)) { + case UCALL_PRINTF: + REPORT_GUEST_PRINTF(uc); + goto keep_going; + case UCALL_ABORT: + REPORT_GUEST_ASSERT(uc); + } + + return rc; +} + +/** + * guest_use_memory() - Assert that guest can use memory at @gva. + * + * @vcpu: the vcpu to run this test on. + * @gva: the virtual address in the guest to try to use. + * @expected_read_value: the value that is expected at @gva. Set this to '= X' to + * skip checking current value. + * @write_value: value to write to @gva. Set to 'X' to skip writing value = to + * @address. + * @expected_errno: the expected errno if an error is expected while readi= ng or + * writing @gva. 
Set to 0 if no exception is expected, + * otherwise set it to the expected errno. If @expected_e= rrno + * is set, 'Z' is used instead of @expected_read_value or + * @write_value. + */ +static void guest_use_memory(struct kvm_vcpu *vcpu, uint64_t gva, + char expected_read_value, char write_value, + int expected_errno) +{ + struct kvm_regs original_regs; + int rc; + + if (expected_errno > 0) { + expected_read_value =3D 'Z'; + write_value =3D 'Z'; + } + + /* + * Backup and vCPU state from first run so that guest_check_mem can be + * run again and again. + */ + vcpu_regs_get(vcpu, &original_regs); + + vcpu_args_set(vcpu, 3, gva, expected_read_value, write_value); + vcpu_arch_set_entry_point(vcpu, guest_check_mem); + + rc =3D vcpu_run_handle_basic_ucalls(vcpu); + + if (expected_errno) { + TEST_ASSERT_EQ(rc, -1); + TEST_ASSERT_EQ(errno, expected_errno); + + switch (expected_errno) { + case EFAULT: + TEST_ASSERT_EQ(vcpu->run->exit_reason, 0); + break; + case EACCES: + TEST_ASSERT_EQ(vcpu->run->exit_reason, KVM_EXIT_MEMORY_FAULT); + break; + } + } else { + struct ucall uc; + + TEST_ASSERT_EQ(rc, 0); + TEST_ASSERT_EQ(get_ucall(vcpu, &uc), UCALL_DONE); + + /* + * UCALL_DONE() uses up one struct ucall slot. To reuse the slot + * in another run of guest_check_mem, free up that slot. + */ + ucall_free((struct ucall *)uc.hva); + } + + vcpu_regs_set(vcpu, &original_regs); +} + +/** + * host_use_memory() - Assert that host can fault and use memory at @addre= ss. + * + * @address: the address to be testing. + * @expected_read_value: the value expected to be read from @address. Set = to 'X' + * to skip checking current value at @address. + * @write_value: the value to write to @address. Set to 'X' to skip writing + * value to @address. + */ +static void host_use_memory(char *address, char expected_read_value, + char write_value) +{ + if (expected_read_value !=3D 'X') + TEST_ASSERT_EQ(*address, expected_read_value); + + if (write_value !=3D 'X') + *address =3D write_value; +} + +static void assert_host_cannot_fault(char *address) +{ + pid_t child_pid; + + child_pid =3D fork(); + TEST_ASSERT(child_pid !=3D -1, "fork failed"); + + if (child_pid =3D=3D 0) { + *address =3D 'A'; + TEST_FAIL("Child should have exited with a signal"); + } else { + int status; + + waitpid(child_pid, &status, 0); + + TEST_ASSERT(WIFSIGNALED(status), + "Child should have exited with a signal"); + TEST_ASSERT_EQ(WTERMSIG(status), SIGBUS); + } +} + +static void *add_memslot(struct kvm_vm *vm, size_t memslot_size, int guest= _memfd) +{ + struct userspace_mem_region *region; + void *mem; + + TEST_REQUIRE(guest_memfd > 0); + + region =3D vm_mem_region_alloc(vm); + + guest_memfd =3D vm_mem_region_install_guest_memfd(region, guest_memfd); + mem =3D vm_mem_region_mmap(region, memslot_size, MAP_SHARED, guest_memfd,= 0); + vm_mem_region_install_memory(region, memslot_size, PAGE_SIZE); + + region->region.slot =3D GUEST_MEMFD_SHARING_TEST_SLOT; + region->region.flags =3D KVM_MEM_GUEST_MEMFD; + region->region.guest_phys_addr =3D GUEST_MEMFD_SHARING_TEST_GPA; + region->region.guest_memfd_offset =3D 0; + + vm_mem_region_add(vm, region); + + return mem; +} + +static struct kvm_vm *setup_test(size_t test_page_size, bool init_private, + struct kvm_vcpu **vcpu, int *guest_memfd, + char **mem) +{ + const struct vm_shape shape =3D { + .mode =3D VM_MODE_DEFAULT, + .type =3D KVM_X86_SW_PROTECTED_VM, + }; + size_t test_nr_pages; + struct kvm_vm *vm; + uint64_t flags; + + test_nr_pages =3D test_page_size / PAGE_SIZE; + vm =3D 
__vm_create_shape_with_one_vcpu(shape, vcpu, test_nr_pages, NULL); + + flags =3D GUEST_MEMFD_FLAG_SUPPORT_SHARED; + if (init_private) + flags |=3D GUEST_MEMFD_FLAG_INIT_PRIVATE; + + *guest_memfd =3D vm_create_guest_memfd(vm, test_page_size, flags); + TEST_ASSERT(*guest_memfd > 0, "guest_memfd creation failed"); + + *mem =3D add_memslot(vm, test_page_size, *guest_memfd); + + virt_map(vm, GUEST_MEMFD_SHARING_TEST_GVA, GUEST_MEMFD_SHARING_TEST_GPA, + test_nr_pages); + + return vm; +} + +static void cleanup_test(size_t guest_memfd_size, struct kvm_vm *vm, + int guest_memfd, char *mem) +{ + kvm_vm_free(vm); + TEST_ASSERT_EQ(munmap(mem, guest_memfd_size), 0); + + if (guest_memfd > -1) + TEST_ASSERT_EQ(close(guest_memfd), 0); +} + +static void test_sharing(void) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + int guest_memfd; + char *mem; + + vm =3D setup_test(PAGE_SIZE, /*init_private=3D*/false, &vcpu, &guest_memf= d, &mem); + + host_use_memory(mem, 'X', 'A'); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'A', 'B', 0); + + /* Toggle private flag of memory attributes and run the test again. */ + guest_memfd_convert_private(guest_memfd, 0, PAGE_SIZE); + + assert_host_cannot_fault(mem); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'B', 'C', 0); + + guest_memfd_convert_shared(guest_memfd, 0, PAGE_SIZE); + + host_use_memory(mem, 'C', 'D'); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'D', 'E', 0); + + cleanup_test(PAGE_SIZE, vm, guest_memfd, mem); +} + +static void test_init_mappable_false(void) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + int guest_memfd; + char *mem; + + vm =3D setup_test(PAGE_SIZE, /*init_private=3D*/true, &vcpu, &guest_memfd= , &mem); + + assert_host_cannot_fault(mem); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'X', 'A', 0); + + guest_memfd_convert_shared(guest_memfd, 0, PAGE_SIZE); + + host_use_memory(mem, 'A', 'B'); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'B', 'C', 0); + + cleanup_test(PAGE_SIZE, vm, guest_memfd, mem); +} + +/* + * Test that even if there are no folios yet, conversion requests are reco= rded + * in guest_memfd. + */ +static void test_conversion_before_allocation(void) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + int guest_memfd; + char *mem; + + vm =3D setup_test(PAGE_SIZE, /*init_private=3D*/false, &vcpu, &guest_memf= d, &mem); + + guest_memfd_convert_private(guest_memfd, 0, PAGE_SIZE); + + assert_host_cannot_fault(mem); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'X', 'A', 0); + + guest_memfd_convert_shared(guest_memfd, 0, PAGE_SIZE); + + host_use_memory(mem, 'A', 'B'); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'B', 'C', 0); + + cleanup_test(PAGE_SIZE, vm, guest_memfd, mem); +} + +static void __test_conversion_if_not_all_folios_allocated(int total_nr_pag= es, + int page_to_fault) +{ + const int second_page_to_fault =3D 8; + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + size_t total_size; + int guest_memfd; + char *mem; + int i; + + total_size =3D PAGE_SIZE * total_nr_pages; + vm =3D setup_test(total_size, /*init_private=3D*/false, &vcpu, &guest_mem= fd, &mem); + + /* + * Fault 2 of the pages to test filemap range operations except when + * page_to_fault =3D=3D second_page_to_fault. 
+ */ + host_use_memory(mem + page_to_fault * PAGE_SIZE, 'X', 'A'); + host_use_memory(mem + second_page_to_fault * PAGE_SIZE, 'X', 'A'); + + guest_memfd_convert_private(guest_memfd, 0, total_size); + + for (i =3D 0; i < total_nr_pages; ++i) { + bool is_faulted; + char expected; + + assert_host_cannot_fault(mem + i * PAGE_SIZE); + + is_faulted =3D i =3D=3D page_to_fault || i =3D=3D second_page_to_fault; + expected =3D is_faulted ? 'A' : 'X'; + guest_use_memory(vcpu, + GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + expected, 'B', 0); + } + + guest_memfd_convert_shared(guest_memfd, 0, total_size); + + for (i =3D 0; i < total_nr_pages; ++i) { + host_use_memory(mem + i * PAGE_SIZE, 'B', 'C'); + guest_use_memory(vcpu, + GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + 'C', 'D', 0); + } + + cleanup_test(total_size, vm, guest_memfd, mem); +} + +static void test_conversion_if_not_all_folios_allocated(void) +{ + const int total_nr_pages =3D 16; + int i; + + for (i =3D 0; i < total_nr_pages; ++i) + __test_conversion_if_not_all_folios_allocated(total_nr_pages, i); +} + +static void test_conversions_should_not_affect_surrounding_pages(void) +{ + struct kvm_vcpu *vcpu; + int page_to_convert; + struct kvm_vm *vm; + size_t total_size; + int guest_memfd; + int nr_pages; + char *mem; + int i; + + page_to_convert =3D 2; + nr_pages =3D 4; + total_size =3D PAGE_SIZE * nr_pages; + + vm =3D setup_test(total_size, /*init_private=3D*/false, &vcpu, &guest_mem= fd, &mem); + + for (i =3D 0; i < nr_pages; ++i) { + host_use_memory(mem + i * PAGE_SIZE, 'X', 'A'); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + 'A', 'B', 0); + } + + guest_memfd_convert_private(guest_memfd, PAGE_SIZE * page_to_convert, PAG= E_SIZE); + + + for (i =3D 0; i < nr_pages; ++i) { + char to_check; + + if (i =3D=3D page_to_convert) { + assert_host_cannot_fault(mem + i * PAGE_SIZE); + to_check =3D 'B'; + } else { + host_use_memory(mem + i * PAGE_SIZE, 'B', 'C'); + to_check =3D 'C'; + } + + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + to_check, 'D', 0); + } + + guest_memfd_convert_shared(guest_memfd, PAGE_SIZE * page_to_convert, PAGE= _SIZE); + + + for (i =3D 0; i < nr_pages; ++i) { + host_use_memory(mem + i * PAGE_SIZE, 'D', 'E'); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + 'E', 'F', 0); + } + + cleanup_test(total_size, vm, guest_memfd, mem); +} + +static void __test_conversions_should_fail_if_memory_has_elevated_refcount( + int nr_pages, int page_to_convert) +{ + struct kvm_vcpu *vcpu; + loff_t error_offset; + struct kvm_vm *vm; + size_t total_size; + int guest_memfd; + char *mem; + int ret; + int i; + + total_size =3D PAGE_SIZE * nr_pages; + vm =3D setup_test(total_size, /*init_private=3D*/false, &vcpu, &guest_mem= fd, &mem); + + pin_pages(mem + page_to_convert * PAGE_SIZE, PAGE_SIZE); + + for (i =3D 0; i < nr_pages; i++) { + host_use_memory(mem + i * PAGE_SIZE, 'X', 'A'); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + 'A', 'B', 0); + } + + error_offset =3D 0; + ret =3D __guest_memfd_convert_private(guest_memfd, page_to_convert * PAGE= _SIZE, + PAGE_SIZE, &error_offset); + TEST_ASSERT_EQ(ret, -1); + TEST_ASSERT_EQ(errno, EAGAIN); + TEST_ASSERT_EQ(error_offset, page_to_convert * PAGE_SIZE); + + unpin_pages(); + + guest_memfd_convert_private(guest_memfd, page_to_convert * PAGE_SIZE, PAG= E_SIZE); + + for (i =3D 0; i < nr_pages; i++) { + char expected; + + if (i =3D=3D page_to_convert) + assert_host_cannot_fault(mem + i * PAGE_SIZE); + else + 
host_use_memory(mem + i * PAGE_SIZE, 'B', 'C'); + + expected =3D i =3D=3D page_to_convert ? 'X' : 'C'; + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + expected, 'D', 0); + } + + guest_memfd_convert_shared(guest_memfd, page_to_convert * PAGE_SIZE, PAGE= _SIZE); + + + for (i =3D 0; i < nr_pages; i++) { + char expected =3D i =3D=3D page_to_convert ? 'X' : 'D'; + + host_use_memory(mem + i * PAGE_SIZE, expected, 'E'); + guest_use_memory(vcpu, + GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + 'E', 'F', 0); + } + + cleanup_test(total_size, vm, guest_memfd, mem); +} +/* + * This test depends on CONFIG_GUP_TEST to provide a kernel module that ex= poses + * pin_user_pages() to userspace. + */ +static void test_conversions_should_fail_if_memory_has_elevated_refcount(v= oid) +{ + int i; + + for (i =3D 0; i < 4; i++) + __test_conversions_should_fail_if_memory_has_elevated_refcount(4, i); +} + +static void test_truncate_should_not_change_mappability(void) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + int guest_memfd; + char *mem; + int ret; + + vm =3D setup_test(PAGE_SIZE, /*init_private=3D*/false, &vcpu, &guest_memf= d, &mem); + + host_use_memory(mem, 'X', 'A'); + + ret =3D fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, + 0, PAGE_SIZE); + TEST_ASSERT(!ret, "truncating the first page should succeed"); + + host_use_memory(mem, 'X', 'A'); + + guest_memfd_convert_private(guest_memfd, 0, PAGE_SIZE); + + assert_host_cannot_fault(mem); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'A', 'A', 0); + + ret =3D fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, + 0, PAGE_SIZE); + TEST_ASSERT(!ret, "truncating the first page should succeed"); + + assert_host_cannot_fault(mem); + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'X', 'A', 0); + + cleanup_test(PAGE_SIZE, vm, guest_memfd, mem); +} + +static void test_fault_type_independent_of_mem_attributes(void) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + int guest_memfd; + char *mem; + + vm =3D setup_test(PAGE_SIZE, /*init_private=3D*/true, &vcpu, &guest_memfd= , &mem); + vm_mem_set_shared(vm, GUEST_MEMFD_SHARING_TEST_GPA, PAGE_SIZE); + + /* + * kvm->mem_attr_array set to shared, guest_memfd memory initialized as + * private. + */ + + /* Host cannot use private memory. */ + assert_host_cannot_fault(mem); + + /* Guest can fault and use memory. */ + guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'X', 'A', 0); + + guest_memfd_convert_shared(guest_memfd, 0, PAGE_SIZE); + vm_mem_set_private(vm, GUEST_MEMFD_SHARING_TEST_GPA, PAGE_SIZE); + + /* Host can use shared memory. */ + host_use_memory(mem, 'X', 'A'); + + /* Guest can also use shared memory. 
 */
+	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'X', 'A', 0);
+
+	cleanup_test(PAGE_SIZE, vm, guest_memfd, mem);
+}
+
+int main(int argc, char *argv[])
+{
+	TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
+	TEST_REQUIRE(kvm_check_cap(KVM_CAP_GMEM_SHARED_MEM));
+	TEST_REQUIRE(kvm_check_cap(KVM_CAP_GMEM_CONVERSION));
+
+	test_sharing();
+	test_init_mappable_false();
+	test_conversion_before_allocation();
+	test_conversion_if_not_all_folios_allocated();
+	test_conversions_should_not_affect_surrounding_pages();
+	test_truncate_should_not_change_mappability();
+	test_conversions_should_fail_if_memory_has_elevated_refcount();
+	test_fault_type_independent_of_mem_attributes();
+
+	return 0;
+}
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 853ab68cff79..ffe0625f2d71 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -18,11 +18,13 @@
 #include
 #include
 
+#include
 #include
 
 #include "kvm_util_arch.h"
 #include "kvm_util_types.h"
 #include "sparsebit.h"
+#include
 
 #define KVM_DEV_PATH "/dev/kvm"
 #define KVM_MAX_VCPUS 512
@@ -426,6 +428,78 @@ static inline void vm_mem_set_shared(struct kvm_vm *vm, uint64_t gpa,
 	vm_set_memory_attributes(vm, gpa, size, 0);
 }
 
+static inline int __guest_memfd_convert_private(int guest_memfd, loff_t offset,
+						size_t size, loff_t *error_offset)
+{
+	int ret;
+
+	struct kvm_gmem_convert param = {
+		.offset = offset,
+		.size = size,
+		.error_offset = 0,
+	};
+
+	ret = ioctl(guest_memfd, KVM_GMEM_CONVERT_PRIVATE, &param);
+	if (ret)
+		*error_offset = param.error_offset;
+
+	return ret;
+}
+
+static inline void guest_memfd_convert_private(int guest_memfd, loff_t offset,
+					       size_t size)
+{
+	loff_t error_offset;
+	int retries;
+	int ret;
+
+	retries = 2;
+	do {
+		error_offset = 0;
+		ret = __guest_memfd_convert_private(guest_memfd, offset, size,
+						    &error_offset);
+	} while (ret == -1 && errno == EAGAIN && --retries > 0);
+
+	TEST_ASSERT(!ret, "Unexpected error %s (%m) at offset 0x%lx",
+		    strerrorname_np(errno), error_offset);
+}
+
+static inline int __guest_memfd_convert_shared(int guest_memfd, loff_t offset,
+					       size_t size, loff_t *error_offset)
+{
+	int ret;
+
+	struct kvm_gmem_convert param = {
+		.offset = offset,
+		.size = size,
+		.error_offset = 0,
+	};
+
+	ret = ioctl(guest_memfd, KVM_GMEM_CONVERT_SHARED, &param);
+	if (ret)
+		*error_offset = param.error_offset;
+
+	return ret;
+}
+
+static inline void guest_memfd_convert_shared(int guest_memfd, loff_t offset,
+					      size_t size)
+{
+	loff_t error_offset;
+	int retries;
+	int ret;
+
+	retries = 2;
+	do {
+		error_offset = 0;
+		ret = __guest_memfd_convert_shared(guest_memfd, offset, size,
+						   &error_offset);
+	} while (ret == -1 && errno == EAGAIN && --retries > 0);
+
+	TEST_ASSERT(!ret, "Unexpected error %s (%m) at offset 0x%lx",
+		    strerrorname_np(errno), error_offset);
+}
+
 void vm_guest_mem_fallocate(struct kvm_vm *vm, uint64_t gpa, uint64_t size,
 			    bool punch_hole);
 
--
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:52 -0700
Subject: [RFC PATCH v2 13/51] KVM: selftests: Add script to exercise private_mem_conversions_test
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, linux-fsdevel@vger.kernel.org

Makes testing different combinations of private_mem_conversions_test
flags easier.

Change-Id: I7647e92524baf09eb97e09bdbd95ad57ada44f4b
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 .../kvm/x86/private_mem_conversions_test.sh   | 82 +++++++++++++++++++
 1 file changed, 82 insertions(+)
 create mode 100755 tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh

diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
new file mode 100755
index 000000000000..76efa81114d2
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
@@ -0,0 +1,82 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Wrapper script which runs different test setups of
+# private_mem_conversions_test.
+#
+# Copyright (C) 2024, Google LLC.
+
+set -e
+
+num_vcpus_to_test=4
+num_memslots_to_test=$num_vcpus_to_test
+
+get_default_hugepage_size_in_kB() {
+	grep "Hugepagesize:" /proc/meminfo | grep -o '[[:digit:]]\+'
+}
+
+# Required pages are based on the test setup (see the computation of
+# memfd_size in test_mem_conversions() in private_mem_conversions_test.c).
+
+# These static requirements are set to the maximum required for
+# num_vcpus_to_test, over all the hugetlb-related tests.
+required_num_2m_hugepages=$(( 1024 * num_vcpus_to_test ))
+required_num_1g_hugepages=$(( 2 * num_vcpus_to_test ))
+
+# The other hugetlb sizes are not supported on x86_64.
+[ "$(cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages 2>/dev/null || echo 0)" \
+	-ge "$required_num_2m_hugepages" ] && hugepage_2mb_enabled=1
+[ "$(cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages 2>/dev/null || echo 0)" \
+	-ge "$required_num_1g_hugepages" ] && hugepage_1gb_enabled=1
+
+case $(get_default_hugepage_size_in_kB) in
+	2048)
+		hugepage_default_enabled=$hugepage_2mb_enabled
+		;;
+	1048576)
+		hugepage_default_enabled=$hugepage_1gb_enabled
+		;;
+	*)
+		# Left empty (not 0) so the -n checks below correctly
+		# treat unsupported default sizes as disabled.
+		hugepage_default_enabled=
+		;;
+esac
+
+backing_src_types=( anonymous )
+backing_src_types+=( anonymous_thp )
+[ -n "$hugepage_default_enabled" ] && \
+	backing_src_types+=( anonymous_hugetlb ) || \
+	echo "skipping anonymous_hugetlb backing source type"
+[ -n "$hugepage_2mb_enabled" ] && \
+	backing_src_types+=( anonymous_hugetlb_2mb ) || \
+	echo "skipping anonymous_hugetlb_2mb backing source type"
+[ -n "$hugepage_1gb_enabled" ] && \
+	backing_src_types+=( anonymous_hugetlb_1gb ) || \
+	echo "skipping anonymous_hugetlb_1gb backing source type"
+backing_src_types+=( shmem )
+[ -n "$hugepage_default_enabled" ] && \
+	backing_src_types+=( shared_hugetlb ) || \
+	echo "skipping shared_hugetlb backing source type"
+
+set +e
+
+TEST_EXECUTABLE="$(dirname "$0")/private_mem_conversions_test"
+
+(
+	set -e
+
+	for src_type in "${backing_src_types[@]}"; do
+
+		set -x
+
+		$TEST_EXECUTABLE -s "$src_type" -n $num_vcpus_to_test
+		$TEST_EXECUTABLE -s "$src_type" -n $num_vcpus_to_test -m $num_memslots_to_test
+
+		{ set +x; } 2>/dev/null
+
+		echo
+
+	done
+)
+RET=$?
+
+exit $RET
-- 
2.49.0.1045.g170613ef41-goog
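If the hugetlb-backed configurations are skipped on a given machine, the pools can be provisioned to the requirements computed above before rerunning. A hedged example for the default num_vcpus_to_test=4, using the same sysfs paths the script polls (run as root; on fragmented systems 1G pages may need to be reserved at boot instead):

# 1024 * 4 = 4096 2M pages, 2 * 4 = 8 1G pages
echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
echo 8 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

./private_mem_conversions_test.sh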
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:53 -0700
Message-ID: <45a932753580d21627779ccfc1a2400e17dfdd79.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 14/51] KVM: selftests: Update private_mem_conversions_test to mmap guest_memfd
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, linux-fsdevel@vger.kernel.org

This patch updates private_mem_conversions_test to use guest_memfd for
both private and shared memory. The guest_memfd conversion ioctls are
used to perform conversions.

Specify -g to also back shared memory with memory from guest_memfd.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: Ibc647dc43fbdddac7cc465886bed92c07bbf4f00
---
 .../testing/selftests/kvm/include/kvm_util.h  |   1 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  36 ++++
 .../kvm/x86/private_mem_conversions_test.c    | 163 +++++++++++++++---
 3 files changed, 176 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index ffe0625f2d71..ded65a15abea 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -721,6 +721,7 @@ void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa);
 void *addr_gva2hva(struct kvm_vm *vm, vm_vaddr_t gva);
 vm_paddr_t addr_hva2gpa(struct kvm_vm *vm, void *hva);
 void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa);
+int addr_gpa2guest_memfd(struct kvm_vm *vm, vm_paddr_t gpa, loff_t *offset);

 #ifndef vcpu_arch_put_guest
 #define vcpu_arch_put_guest(mem, val) do { (mem) = (val); } while (0)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 58a3365f479c..253d0c00e2f0 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1734,6 +1734,42 @@ void *addr_gpa2hva(struct kvm_vm *vm, vm_paddr_t gpa)
 		    + (gpa - region->region.guest_phys_addr));
 }

+/*
+ * Address VM Physical to guest_memfd
+ *
+ * Input Args:
+ *   vm - Virtual Machine
+ *   gpa - VM physical address
+ *
+ * Output Args:
+ *   offset - offset in guest_memfd for gpa
+ *
+ * Return:
+ *   guest_memfd for the region containing the requested gpa
+ *
+ * Locates the memory region containing the VM physical address given by gpa,
+ * within the VM given by vm. When found, the guest_memfd providing the memory
+ * to the vm physical address and the offset in the file corresponding to the
+ * requested gpa is returned. A TEST_ASSERT failure occurs if no region
+ * containing gpa exists.
+ */
+int addr_gpa2guest_memfd(struct kvm_vm *vm, vm_paddr_t gpa, loff_t *offset)
+{
+	struct userspace_mem_region *region;
+
+	gpa = vm_untag_gpa(vm, gpa);
+
+	region = userspace_mem_region_find(vm, gpa, gpa);
+	if (!region) {
+		TEST_FAIL("No vm physical memory at 0x%lx", gpa);
+		return -1;
+	}
+
+	*offset = region->region.guest_memfd_offset + gpa - region->region.guest_phys_addr;
+
+	return region->region.guest_memfd;
+}
+
 /*
  * Address Host Virtual to VM Physical
  *
diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
index 82a8d88b5338..ec20bb7e95c8 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include

 #include
 #include
@@ -202,15 +203,19 @@ static void guest_test_explicit_conversion(uint64_t base_gpa, bool do_fallocate)
 	guest_sync_shared(gpa, size, p3, p4);
 	memcmp_g(gpa, p4, size);

-	/* Reset the shared memory back to the initial pattern. */
-	memset((void *)gpa, init_p, size);
-
 	/*
 	 * Free (via PUNCH_HOLE) *all* private memory so that the next
 	 * iteration starts from a clean slate, e.g. with respect to
 	 * whether or not there are pages/folios in guest_mem.
 	 */
 	guest_map_shared(base_gpa, PER_CPU_DATA_SIZE, true);
+
+	/*
+	 * Reset the entire block back to the initial pattern. Do this
+	 * after fallocate(PUNCH_HOLE) because hole-punching zeroes
+	 * memory.
+	 */
+	memset((void *)base_gpa, init_p, PER_CPU_DATA_SIZE);
 	}
 }

@@ -286,7 +291,8 @@ static void guest_code(uint64_t base_gpa)
 	GUEST_DONE();
 }

-static void handle_exit_hypercall(struct kvm_vcpu *vcpu)
+static void handle_exit_hypercall(struct kvm_vcpu *vcpu,
+				  bool back_shared_memory_with_guest_memfd)
 {
 	struct kvm_run *run = vcpu->run;
 	uint64_t gpa = run->hypercall.args[0];
@@ -303,17 +309,81 @@ static void handle_exit_hypercall(struct kvm_vcpu *vcpu)
 	if (do_fallocate)
 		vm_guest_mem_fallocate(vm, gpa, size, map_shared);

-	if (set_attributes)
-		vm_set_memory_attributes(vm, gpa, size,
-					 map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE);
+	if (set_attributes) {
+		if (back_shared_memory_with_guest_memfd) {
+			loff_t offset;
+			int guest_memfd;
+
+			guest_memfd = addr_gpa2guest_memfd(vm, gpa, &offset);
+
+			if (map_shared)
+				guest_memfd_convert_shared(guest_memfd, offset, size);
+			else
+				guest_memfd_convert_private(guest_memfd, offset, size);
+		} else {
+			uint64_t attrs;
+
+			attrs = map_shared ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE;
+			vm_set_memory_attributes(vm, gpa, size, attrs);
+		}
+	}

 	run->hypercall.ret = 0;
 }

+static void assert_not_faultable(uint8_t *address)
+{
+	pid_t child_pid;
+
+	child_pid = fork();
+	TEST_ASSERT(child_pid != -1, "fork failed");
+
+	if (child_pid == 0) {
+		*address = 'A';
+		TEST_FAIL("Child should have exited with a signal");
+	} else {
+		int status;
+
+		waitpid(child_pid, &status, 0);
+
+		TEST_ASSERT(WIFSIGNALED(status),
+			    "Child should have exited with a signal");
+		TEST_ASSERT_EQ(WTERMSIG(status), SIGBUS);
+	}
+}
+
+static void add_memslot(struct kvm_vm *vm, uint64_t gpa, uint32_t slot,
+			uint64_t size, int guest_memfd,
+			uint64_t guest_memfd_offset)
+{
+	struct userspace_mem_region *region;
+
+	region = vm_mem_region_alloc(vm);
+
+	guest_memfd = vm_mem_region_install_guest_memfd(region, guest_memfd);
+
+	vm_mem_region_mmap(region, size, MAP_SHARED, guest_memfd, guest_memfd_offset);
+	vm_mem_region_install_memory(region, size, getpagesize());
+
+	region->region.slot = slot;
+	region->region.flags = KVM_MEM_GUEST_MEMFD;
+	region->region.guest_phys_addr = gpa;
+	region->region.guest_memfd_offset = guest_memfd_offset;
+
+	vm_mem_region_add(vm, region);
+}
+
 static bool run_vcpus;

-static void *__test_mem_conversions(void *__vcpu)
+struct test_thread_args
 {
-	struct kvm_vcpu *vcpu = __vcpu;
+	struct kvm_vcpu *vcpu;
+	bool back_shared_memory_with_guest_memfd;
+};
+
+static void *__test_mem_conversions(void *params)
+{
+	struct test_thread_args *args = params;
+	struct kvm_vcpu *vcpu = args->vcpu;
 	struct kvm_run *run = vcpu->run;
 	struct kvm_vm *vm = vcpu->vm;
 	struct ucall uc;
@@ -325,7 +395,10 @@ static void *__test_mem_conversions(void *__vcpu)
 		vcpu_run(vcpu);

 		if (run->exit_reason == KVM_EXIT_HYPERCALL) {
-			handle_exit_hypercall(vcpu);
+			handle_exit_hypercall(
+				vcpu,
+				args->back_shared_memory_with_guest_memfd);
+
 			continue;
 		}

@@ -349,8 +422,18 @@ static void *__test_mem_conversions(void *__vcpu)
 			size_t nr_bytes = min_t(size_t, vm->page_size, size - i);
 			uint8_t *hva = addr_gpa2hva(vm, gpa + i);

-			/* In all cases, the host should observe the shared data. */
-			memcmp_h(hva, gpa + i, uc.args[3], nr_bytes);
+			/* Check contents of memory */
+			if (args->back_shared_memory_with_guest_memfd &&
+			    uc.args[0] == SYNC_PRIVATE) {
+				assert_not_faultable(hva);
+			} else {
+				/*
+				 * If shared and private memory use
+				 * separate backing memory, the host
+				 * should always observe shared data.
+				 */
+				memcmp_h(hva, gpa + i, uc.args[3], nr_bytes);
+			}

 			/* For shared, write the new pattern to guest memory. */
 			if (uc.args[0] == SYNC_SHARED)
@@ -366,14 +449,16 @@ static void *__test_mem_conversions(void *__vcpu)
 	}
 }

-static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t nr_vcpus,
-				 uint32_t nr_memslots)
+static void test_mem_conversions(enum vm_mem_backing_src_type src_type,
+				 uint32_t nr_vcpus, uint32_t nr_memslots,
+				 bool back_shared_memory_with_guest_memfd)
 {
 	/*
 	 * Allocate enough memory so that each vCPU's chunk of memory can be
 	 * naturally aligned with respect to the size of the backing store.
 	 */
 	const size_t alignment = max_t(size_t, SZ_2M, get_backing_src_pagesz(src_type));
+	struct test_thread_args *thread_args[KVM_MAX_VCPUS];
 	const size_t per_cpu_size = align_up(PER_CPU_DATA_SIZE, alignment);
 	const size_t memfd_size = per_cpu_size * nr_vcpus;
 	const size_t slot_size = memfd_size / nr_memslots;
@@ -381,6 +466,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t
 	pthread_t threads[KVM_MAX_VCPUS];
 	struct kvm_vm *vm;
 	int memfd, i, r;
+	uint64_t flags;

 	const struct vm_shape shape = {
 		.mode = VM_MODE_DEFAULT,
@@ -394,12 +480,23 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t

 	vm_enable_cap(vm, KVM_CAP_EXIT_HYPERCALL, (1 << KVM_HC_MAP_GPA_RANGE));

-	memfd = vm_create_guest_memfd(vm, memfd_size, 0);
+	flags = back_shared_memory_with_guest_memfd ?
+			GUEST_MEMFD_FLAG_SUPPORT_SHARED :
+			0;
+	memfd = vm_create_guest_memfd(vm, memfd_size, flags);

-	for (i = 0; i < nr_memslots; i++)
-		vm_mem_add(vm, src_type, BASE_DATA_GPA + slot_size * i,
-			   BASE_DATA_SLOT + i, slot_size / vm->page_size,
-			   KVM_MEM_GUEST_MEMFD, memfd, slot_size * i);
+	for (i = 0; i < nr_memslots; i++) {
+		if (back_shared_memory_with_guest_memfd) {
+			add_memslot(vm, BASE_DATA_GPA + slot_size * i,
+				    BASE_DATA_SLOT + i, slot_size, memfd,
+				    slot_size * i);
+		} else {
+			vm_mem_add(vm, src_type, BASE_DATA_GPA + slot_size * i,
+				   BASE_DATA_SLOT + i,
+				   slot_size / vm->page_size,
+				   KVM_MEM_GUEST_MEMFD, memfd, slot_size * i);
+		}
+	}

 	for (i = 0; i < nr_vcpus; i++) {
 		uint64_t gpa = BASE_DATA_GPA + i * per_cpu_size;
@@ -412,13 +509,23 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t
 		 */
 		virt_map(vm, gpa, gpa, PER_CPU_DATA_SIZE / vm->page_size);

-		pthread_create(&threads[i], NULL, __test_mem_conversions, vcpus[i]);
+		thread_args[i] = malloc(sizeof(struct test_thread_args));
+		TEST_ASSERT(thread_args[i] != NULL,
+			    "Could not allocate memory for thread parameters");
+		thread_args[i]->vcpu = vcpus[i];
+		thread_args[i]->back_shared_memory_with_guest_memfd =
+			back_shared_memory_with_guest_memfd;
+
+		pthread_create(&threads[i], NULL, __test_mem_conversions,
+			       (void *)thread_args[i]);
 	}

 	WRITE_ONCE(run_vcpus, true);

-	for (i = 0; i < nr_vcpus; i++)
+	for (i = 0; i < nr_vcpus; i++) {
 		pthread_join(threads[i], NULL);
+		free(thread_args[i]);
+	}

 	kvm_vm_free(vm);

@@ -440,7 +547,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type, uint32_t
 static void usage(const char *cmd)
 {
 	puts("");
-	printf("usage: %s [-h] [-m nr_memslots] [-s mem_type] [-n nr_vcpus]\n", cmd);
+	printf("usage: %s [-h] [-g] [-m nr_memslots] [-s mem_type] [-n nr_vcpus]\n", cmd);
 	puts("");
 	backing_src_help("-s");
 	puts("");
@@ -448,18 +555,21 @@ static void usage(const char *cmd)
 	puts("");
 	puts(" -m: specify the number of memslots (default: 1)");
(default: 1)"); puts(""); + puts(" -g: back shared memory with guest_memfd (default: false)"); + puts(""); } =20 int main(int argc, char *argv[]) { enum vm_mem_backing_src_type src_type =3D DEFAULT_VM_MEM_SRC; + bool back_shared_memory_with_guest_memfd =3D false; uint32_t nr_memslots =3D 1; uint32_t nr_vcpus =3D 1; int opt; =20 TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_V= M)); =20 - while ((opt =3D getopt(argc, argv, "hm:s:n:")) !=3D -1) { + while ((opt =3D getopt(argc, argv, "hgm:s:n:")) !=3D -1) { switch (opt) { case 's': src_type =3D parse_backing_src_type(optarg); @@ -470,6 +580,9 @@ int main(int argc, char *argv[]) case 'm': nr_memslots =3D atoi_positive("nr_memslots", optarg); break; + case 'g': + back_shared_memory_with_guest_memfd =3D true; + break; case 'h': default: usage(argv[0]); @@ -477,7 +590,9 @@ int main(int argc, char *argv[]) } } =20 - test_mem_conversions(src_type, nr_vcpus, nr_memslots); + test_mem_conversions(src_type, nr_vcpus, nr_memslots, + back_shared_memory_with_guest_memfd); + =20 return 0; } --=20 2.49.0.1045.g170613ef41-goog From nobody Thu Dec 18 05:16:19 2025 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 46DB524167F for ; Wed, 14 May 2025 23:43:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747266203; cv=none; b=RMvkGXdEag8S9j4PNBQrOM2Cx9FsTGtLdq9n0z8wclscGNwCVcQyGeq310dVDhDAh/3hG2hDD+Vxh0Gjmq2BeiU1+PIe4o4QeXEUMFlMzv7Y0quEcg73CJRBb9jO1NhlIwtQnaC6bDv9ndIsDKZVHnhVKQ0awB5e3eIFfpfrDkU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747266203; c=relaxed/simple; bh=01nWvfWOqpjLSNin0f5MozA8ay58nO5pCUzg1VWRoWA=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=iKzgZc2OFRz6ijZ3KCup97RS3gm3Bz7iTKLay21/1v/K+CTqr9RnewB8szFnUBjWEUfT6niNL6njU6BD7ddZlHh1R+Oz6qHkvNy1MLugP0dtyr4REAXLqKynH532FbK8M49RZiIEPtUNuyAFnmdbQcA+ab/miBr1FHsrkMc62a8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--ackerleytng.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=Lp/9Cau2; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ackerleytng.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="Lp/9Cau2" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-b1442e039eeso156669a12.0 for ; Wed, 14 May 2025 16:43:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747266200; x=1747871000; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=7ANgzEKnFmEYy3Z3vGn+Lctns6AT0DlxBzYIgk/HcKs=; b=Lp/9Cau2UCYtt5sD9cyrTUv+qQ2HXUCcQ/tl9au/S5dARwBut9YAVf2HKDU04WWHwn V1xoIOMDZomvNGBWMG1lYm7JLyBBBqGejceLXPAcjXPjd0VPqKZR/TNYlOr4Kodu9BKM hU096IaHUllkFn6CJUDWJpUxwOffxyQTeKujyweR2cFgEl5cjWQGc37erHihJUVPrrEx 
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:54 -0700
Message-ID: <80cbdc463d3ee89b98e471e1f96f6739c903bc01.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 15/51] KVM: selftests: Update script to map shared memory from guest_memfd
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, linux-fsdevel@vger.kernel.org

Update the private_mem_conversions_test.sh script to use the -g flag to
also test conversions when both private and shared memory are mapped
from guest_memfd.

Change-Id: I16f8f6e4e5c361bbc4daeb66f15e8165db3d98f7
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 .../testing/selftests/kvm/x86/private_mem_conversions_test.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
index 76efa81114d2..5dda6916e071 100755
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
@@ -71,6 +71,9 @@ TEST_EXECUTABLE="$(dirname "$0")/private_mem_conversions_test"
 		$TEST_EXECUTABLE -s "$src_type" -n $num_vcpus_to_test
 		$TEST_EXECUTABLE -s "$src_type" -n $num_vcpus_to_test -m $num_memslots_to_test

+		$TEST_EXECUTABLE -s "$src_type" -n $num_vcpus_to_test -g
+		$TEST_EXECUTABLE -s "$src_type" -n $num_vcpus_to_test -m $num_memslots_to_test -g
+
 		{ set +x; } 2>/dev/null

 		echo
-- 
2.49.0.1045.g170613ef41-goog
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:55 -0700
Message-ID: <8548af334e01401a776aae37a0e9f30f9ffbba8c.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 16/51] mm: hugetlb: Consolidate interpretation of gbl_chg within alloc_hugetlb_folio()
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, linux-fsdevel@vger.kernel.org

Previously, gbl_chg was passed from alloc_hugetlb_folio() into
dequeue_hugetlb_folio_vma(), leaking the concept of gbl_chg into
dequeue_hugetlb_folio_vma().

This patch consolidates the interpretation of gbl_chg into
alloc_hugetlb_folio(), also renaming dequeue_hugetlb_folio_vma() to
dequeue_hugetlb_folio() so dequeue_hugetlb_folio() can just focus on
dequeuing a folio.

Change-Id: I31bf48af2400b6e13b44d03c8be22ce1a9092a9c
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Reviewed-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6ea1be71aa42..b843e869496f 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1364,9 +1364,9 @@ static unsigned long available_huge_pages(struct hstate *h)
 	return h->free_huge_pages - h->resv_huge_pages;
 }

-static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
-				struct vm_area_struct *vma,
-				unsigned long address, long gbl_chg)
+static struct folio *dequeue_hugetlb_folio(struct hstate *h,
+					   struct vm_area_struct *vma,
+					   unsigned long address)
 {
 	struct folio *folio = NULL;
 	struct mempolicy *mpol;
@@ -1374,13 +1374,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
 	nodemask_t *nodemask;
 	int nid;

-	/*
-	 * gbl_chg==1 means the allocation requires a new page that was not
-	 * reserved before. Making sure there's at least one free page.
-	 */
-	if (gbl_chg && !available_huge_pages(h))
-		goto err;
-
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);

@@ -1398,9 +1391,6 @@ static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,

 	mpol_cond_put(mpol);
 	return folio;
-
-err:
-	return NULL;
 }

 /*
@@ -3074,12 +3064,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		goto out_uncharge_cgroup_reservation;

 	spin_lock_irq(&hugetlb_lock);
+
 	/*
-	 * glb_chg is passed to indicate whether or not a page must be taken
-	 * from the global free pool (global change). gbl_chg == 0 indicates
-	 * a reservation exists for the allocation.
+	 * gbl_chg == 0 indicates a reservation exists for the allocation - so
+	 * try dequeuing a page. If there are available_huge_pages(), try using
+	 * them!
 	 */
-	folio = dequeue_hugetlb_folio_vma(h, vma, addr, gbl_chg);
+	folio = NULL;
+	if (!gbl_chg || available_huge_pages(h))
+		folio = dequeue_hugetlb_folio(h, vma, addr);
+
 	if (!folio) {
 		spin_unlock_irq(&hugetlb_lock);
 		folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
-- 
2.49.0.1045.g170613ef41-goog
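Restated outside the kernel context (illustrative only, not part of the patch): after this change the dequeue attempt is gated by a single predicate, where available_huge_pages() is h->free_huge_pages - h->resv_huge_pages as shown in the first hunk:

/*
 * Illustrative-only restatement: dequeue from the free lists when the
 * allocation holds a reservation (gbl_chg == 0), or when free pages
 * remain beyond those earmarked for reservations. Assumes
 * free_huge_pages >= resv_huge_pages, as hugetlb maintains.
 */
static bool should_try_dequeue(long gbl_chg, unsigned long free_huge_pages,
			       unsigned long resv_huge_pages)
{
	return gbl_chg == 0 || free_huge_pages - resv_huge_pages > 0;
}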
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:56 -0700
Subject: [RFC PATCH v2 17/51] mm: hugetlb: Cleanup interpretation of gbl_chg in alloc_hugetlb_folio()
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, linux-fsdevel@vger.kernel.org

The comment before dequeuing a folio explains that if gbl_chg == 0, a
reservation exists for the allocation. In addition, if a vma
reservation exists, there's no need to get a reservation from the
subpool, and gbl_chg was set to 0.
This patch replaces both of that with code: subpool_reservation_exists
defaults to false, and if a vma reservation does not exist, a
reservation is sought from the subpool. Then, the existence of a
reservation, whether in the vma or subpool, is summarized into
reservation_exists, which is then used to determine whether to dequeue
a folio.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I52130a0bf9f33e07d320a446cdb3ebfddd9de658
---
 mm/hugetlb.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b843e869496f..597f2b9f62b5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2999,8 +2999,10 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 {
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
+	bool subpool_reservation_exists;
+	bool reservation_exists;
 	struct folio *folio;
-	long retval, gbl_chg;
+	long retval;
 	map_chg_state map_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
@@ -3036,17 +3038,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 * that the allocation will not exceed the subpool limit.
 	 * Or if it can get one from the pool reservation directly.
 	 */
+	subpool_reservation_exists = false;
 	if (map_chg) {
-		gbl_chg = hugepage_subpool_get_pages(spool, 1);
-		if (gbl_chg < 0)
+		int npages_req = hugepage_subpool_get_pages(spool, 1);
+
+		if (npages_req < 0)
 			goto out_end_reservation;
-	} else {
-		/*
-		 * If we have the vma reservation ready, no need for extra
-		 * global reservation.
-		 */
-		gbl_chg = 0;
+
+		subpool_reservation_exists = npages_req == 0;
 	}
+	reservation_exists = !map_chg || subpool_reservation_exists;

 	/*
 	 * If this allocation is not consuming a per-vma reservation,
@@ -3065,13 +3066,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,

 	spin_lock_irq(&hugetlb_lock);

-	/*
-	 * gbl_chg == 0 indicates a reservation exists for the allocation - so
-	 * try dequeuing a page. If there are available_huge_pages(), try using
-	 * them!
-	 */
 	folio = NULL;
-	if (!gbl_chg || available_huge_pages(h))
+	if (reservation_exists || available_huge_pages(h))
 		folio = dequeue_hugetlb_folio(h, vma, addr);

 	if (!folio) {
@@ -3089,7 +3085,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	 * Either dequeued or buddy-allocated folio needs to add special
 	 * mark to the folio when it consumes a global reservation.
 	 */
-	if (!gbl_chg) {
+	if (reservation_exists) {
 		folio_set_hugetlb_restore_reserve(folio);
 		h->resv_huge_pages--;
 	}
-- 
2.49.0.1045.g170613ef41-goog
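Condensed into one place (illustrative only, not part of the patch): npages_req below stands for the return value of hugepage_subpool_get_pages(spool, 1) in the hunk above.

/*
 * Illustrative-only distillation of the flow above: a reservation
 * exists either in the per-vma resv map (no map charge needed at all)
 * or in the subpool (the subpool grant consumed an existing pool
 * reservation, i.e. hugepage_subpool_get_pages() returned 0).
 */
static bool reservation_exists(bool map_chg, long npages_req)
{
	bool subpool_reservation_exists = map_chg && npages_req == 0;

	return !map_chg || subpool_reservation_exists;
}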
AOJu0Yy5shycbiGQeJvIuTO5ykw+fQr9Q5MXo0xxNOPy9jlxgdWvtWGg IhRSSStAWb6ZWC65z2o3DAk5/7VUdi2hKZ6I+YXGphACa8Qtjwxs/R/lPIEWNj7Zf+oMcg+DDcC yD2pBpe/UWqNJgDSjXDRxUw== X-Google-Smtp-Source: AGHT+IHrutaRsnjMWCkKIi090WWBQzOHOSc5C6RScHY9wSl72p4og14yPX1uNLmn95Ao7YjcjSzx202Z6voCwDWGow== X-Received: from pjbeu14.prod.google.com ([2002:a17:90a:f94e:b0:2fc:2f33:e07d]) (user=ackerleytng job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:2dc3:b0:30a:883a:ea5b with SMTP id 98e67ed59e1d1-30e2e5c84f4mr9725120a91.17.1747266205029; Wed, 14 May 2025 16:43:25 -0700 (PDT) Date: Wed, 14 May 2025 16:41:57 -0700 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.49.0.1045.g170613ef41-goog Message-ID: <782bb82a0d2d62b616daebb77dc3d9e345fb76fa.1747264138.git.ackerleytng@google.com> Subject: [RFC PATCH v2 18/51] mm: hugetlb: Cleanup interpretation of map_chg_state within alloc_hugetlb_folio() From: Ackerley Tng To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org Cc: ackerleytng@google.com, aik@amd.com, ajones@ventanamicro.com, akpm@linux-foundation.org, amoorthy@google.com, anthony.yznaga@oracle.com, anup@brainfault.org, aou@eecs.berkeley.edu, bfoster@redhat.com, binbin.wu@linux.intel.com, brauner@kernel.org, catalin.marinas@arm.com, chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com, david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk, erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com, hch@infradead.org, hughd@google.com, ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com, jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com, jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com, kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev, kirill.shutemov@intel.com, liam.merwick@oracle.com, maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com, quic_cvanscha@quicinc.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com, richard.weiyang@gmail.com, rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk, rppt@kernel.org, seanjc@google.com, shuah@kernel.org, steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com, tabba@google.com, thomas.lendacky@amd.com, usama.arif@bytedance.com, vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk, vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org, willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Interpreting map_chg_state inline, within alloc_hugetlb_folio(), improves readability. 
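For orientation, here is a stand-alone sketch of the boolean
interpretation this patch arrives at. It is an illustration only, not
kernel code: the *_stub helpers and main() are invented for this
sketch, and merely mimic the contracts of vma_needs_reservation() and
hugepage_subpool_get_pages(), which return the number of additional
pages required (0 meaning a reservation already exists, negative
meaning an error).

#include <stdbool.h>
#include <stdio.h>

/* Stub: 0 if the vma resv map already holds a reservation. */
static int vma_needs_reservation_stub(bool vma_has_resv)
{
	return vma_has_resv ? 0 : 1;
}

/* Stub: 0 if the subpool could satisfy the request from its reservation. */
static int hugepage_subpool_get_pages_stub(bool subpool_has_resv)
{
	return subpool_has_resv ? 0 : 1;
}

static bool reservation_exists(bool cow_from_owner, bool vma_has_resv,
			       bool subpool_has_resv)
{
	bool vma_reservation_exists = false;
	bool subpool_reservation_exists = false;

	/* A CoW fault from the owner cannot consume the per-vma reservation. */
	if (!cow_from_owner)
		vma_reservation_exists =
			vma_needs_reservation_stub(vma_has_resv) == 0;

	/* Debit the subpool only when no vma reservation exists. */
	if (!vma_reservation_exists)
		subpool_reservation_exists =
			hugepage_subpool_get_pages_stub(subpool_has_resv) == 0;

	return vma_reservation_exists || subpool_reservation_exists;
}

int main(void)
{
	/* vma reservation present: no subpool debit needed. */
	printf("%d\n", reservation_exists(false, true, false));	/* 1 */
	/* CoW from owner: vma reservation ignored, subpool empty. */
	printf("%d\n", reservation_exists(true, true, false));	/* 0 */
	/* CoW from owner, but the subpool holds a reservation. */
	printf("%d\n", reservation_exists(true, true, true));	/* 1 */
	return 0;
}

Modelling the outcome as plain booleans keeps each later decision
(dequeue, charge, commit, clean up) a direct test, which is the
readability gain described below.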
Instead of having cow_from_owner and the result of
vma_needs_reservation() compute a map_chg_state, and then interpreting
map_chg_state within alloc_hugetlb_folio() to determine whether to

+ Get a page from the subpool or
+ Charge cgroup reservations or
+ Commit vma reservations or
+ Clean up reservations

this refactoring makes those decisions based directly on whether a
vma_reservation_exists.

If a vma_reservation_exists, the subpool had already been debited and
the cgroup had been charged, hence alloc_hugetlb_folio() should not
double-debit or double-charge.

If the vma reservation can't be used (as in cow_from_owner), then the
vma reservation effectively does not exist and vma_reservation_exists
is set to false.

The conditions for committing reservations or cleaning up are also
updated to be paired with the corresponding conditions guarding
reservation creation.

Signed-off-by: Ackerley Tng
Change-Id: I22d72a2cae61fb64dc78e0a870b254811a06a31e
---
 mm/hugetlb.c | 94 ++++++++++++++++++++++------------------------------
 1 file changed, 39 insertions(+), 55 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 597f2b9f62b5..67144af7ab79 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2968,25 +2968,6 @@ void wait_for_freed_hugetlb_folios(void)
 	flush_work(&free_hpage_work);
 }
 
-typedef enum {
-	/*
-	 * For either 0/1: we checked the per-vma resv map, and one resv
-	 * count either can be reused (0), or an extra needed (1).
-	 */
-	MAP_CHG_REUSE = 0,
-	MAP_CHG_NEEDED = 1,
-	/*
-	 * Cannot use per-vma resv count can be used, hence a new resv
-	 * count is enforced.
-	 *
-	 * NOTE: This is mostly identical to MAP_CHG_NEEDED, except
-	 * that currently vma_needs_reservation() has an unwanted side
-	 * effect to either use end() or commit() to complete the
-	 * transaction. Hence it needs to differenciate from NEEDED.
-	 */
-	MAP_CHG_ENFORCED = 2,
-} map_chg_state;
-
 /*
  * NOTE! "cow_from_owner" represents a very hacky usage only used in CoW
  * faults of hugetlb private mappings on top of a non-page-cache folio (in
@@ -3000,46 +2981,45 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	struct hugepage_subpool *spool = subpool_vma(vma);
 	struct hstate *h = hstate_vma(vma);
 	bool subpool_reservation_exists;
+	bool vma_reservation_exists;
 	bool reservation_exists;
+	bool charge_cgroup_rsvd;
 	struct folio *folio;
-	long retval;
-	map_chg_state map_chg;
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
 	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
 
 	idx = hstate_index(h);
 
-	/* Whether we need a separate per-vma reservation? */
 	if (cow_from_owner) {
 		/*
 		 * Special case! Since it's a CoW on top of a reserved
 		 * page, the private resv map doesn't count. So it cannot
 		 * consume the per-vma resv map even if it's reserved.
 		 */
-		map_chg = MAP_CHG_ENFORCED;
+		vma_reservation_exists = false;
 	} else {
 		/*
 		 * Examine the region/reserve map to determine if the process
-		 * has a reservation for the page to be allocated. A return
-		 * code of zero indicates a reservation exists (no change).
+		 * has a reservation for the page to be allocated and debit the
+		 * reservation. If the number of pages required is 0,
+		 * reservation exists.
 		 */
-		retval = vma_needs_reservation(h, vma, addr);
-		if (retval < 0)
+		int npages_req = vma_needs_reservation(h, vma, addr);
+
+		if (npages_req < 0)
 			return ERR_PTR(-ENOMEM);
-		map_chg = retval ? MAP_CHG_NEEDED : MAP_CHG_REUSE;
+
+		vma_reservation_exists = npages_req == 0;
 	}
 
 	/*
-	 * Whether we need a separate global reservation?
-	 *
-	 * Processes that did not create the mapping will have no
-	 * reserves as indicated by the region/reserve map. Check
-	 * that the allocation will not exceed the subpool limit.
-	 * Or if it can get one from the pool reservation directly.
+	 * Debit subpool only if a vma reservation does not exist. If
+	 * vma_reservation_exists, the vma reservation was either moved from the
+	 * subpool or taken directly from hstate in hugetlb_reserve_pages().
 	 */
 	subpool_reservation_exists = false;
-	if (map_chg) {
+	if (!vma_reservation_exists) {
 		int npages_req = hugepage_subpool_get_pages(spool, 1);
 
 		if (npages_req < 0)
@@ -3047,13 +3027,16 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 		subpool_reservation_exists = npages_req == 0;
 	}
-	reservation_exists = !map_chg || subpool_reservation_exists;
+
+	reservation_exists = vma_reservation_exists || subpool_reservation_exists;
 
 	/*
-	 * If this allocation is not consuming a per-vma reservation,
-	 * charge the hugetlb cgroup now.
+	 * If a vma_reservation_exists, we can skip charging hugetlb
+	 * reservations since that was charged in hugetlb_reserve_pages() when
+	 * the reservation was recorded on the resv_map.
 	 */
-	if (map_chg) {
+	charge_cgroup_rsvd = !vma_reservation_exists;
+	if (charge_cgroup_rsvd) {
 		ret = hugetlb_cgroup_charge_cgroup_rsvd(
 			idx, pages_per_huge_page(h), &h_cg);
 		if (ret)
@@ -3091,10 +3074,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	}
 
 	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
-	/* If allocation is not consuming a reservation, also store the
-	 * hugetlb_cgroup pointer on the page.
-	 */
-	if (map_chg) {
+
+	if (charge_cgroup_rsvd) {
 		hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
 						  h_cg, folio);
 	}
@@ -3103,25 +3084,27 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	hugetlb_set_folio_subpool(folio, spool);
 
-	if (map_chg != MAP_CHG_ENFORCED) {
-		/* commit() is only needed if the map_chg is not enforced */
-		retval = vma_commit_reservation(h, vma, addr);
+	/* If vma accounting wasn't bypassed earlier, follow up with commit. */
+	if (!cow_from_owner) {
+		int ret = vma_commit_reservation(h, vma, addr);
 		/*
-		 * Check for possible race conditions. When it happens..
-		 * The page was added to the reservation map between
-		 * vma_needs_reservation and vma_commit_reservation.
-		 * This indicates a race with hugetlb_reserve_pages.
+		 * If there is a discrepancy in reservation status between the
+		 * time of vma_needs_reservation() and vma_commit_reservation(),
+		 * then the page must have been added to the reservation
+		 * map between vma_needs_reservation() and
+		 * vma_commit_reservation().
+		 *
 		 * Adjust for the subpool count incremented above AND
 		 * in hugetlb_reserve_pages for the same page. Also,
 		 * the reservation count added in hugetlb_reserve_pages
 		 * no longer applies.
 		 */
-		if (unlikely(map_chg == MAP_CHG_NEEDED && retval == 0)) {
+		if (unlikely(!vma_reservation_exists && ret == 0)) {
 			long rsv_adjust;
 
 			rsv_adjust = hugepage_subpool_put_pages(spool, 1);
 			hugetlb_acct_memory(h, -rsv_adjust);
-			if (map_chg) {
+			if (charge_cgroup_rsvd) {
 				spin_lock_irq(&hugetlb_lock);
 				hugetlb_cgroup_uncharge_folio_rsvd(
 					hstate_index(h), pages_per_huge_page(h),
@@ -3149,14 +3132,15 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 out_uncharge_cgroup:
 	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
 out_uncharge_cgroup_reservation:
-	if (map_chg)
+	if (charge_cgroup_rsvd)
 		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
 						    h_cg);
 out_subpool_put:
-	if (map_chg)
+	if (!vma_reservation_exists)
 		hugepage_subpool_put_pages(spool, 1);
 out_end_reservation:
-	if (map_chg != MAP_CHG_ENFORCED)
+	/* If vma accounting wasn't bypassed earlier, cleanup. */
+	if (!cow_from_owner)
 		vma_end_reservation(h, vma, addr);
 	return ERR_PTR(-ENOSPC);
 }
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:58 -0700
Message-ID: <66aa28f888e392f7039de1c20ef854fb05a3c839.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 19/51] mm: hugetlb: Rename alloc_surplus_hugetlb_folio
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Rename alloc_surplus_hugetlb_folio to
alloc_surplus_hugetlb_folio_nodemask, and
alloc_buddy_hugetlb_folio_with_mpol to alloc_surplus_hugetlb_folio, to
align with the dequeue_hugetlb_folio vs dequeue_hugetlb_folio_nodemask
naming.

Signed-off-by: Ackerley Tng
Change-Id: I38982497eb70aeb174c386ed71bb896d85939eae
---
 mm/hugetlb.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 67144af7ab79..b822b204e9b3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2236,7 +2236,7 @@ int dissolve_free_hugetlb_folios(unsigned long start_pfn, unsigned long end_pfn)
 /*
  * Allocates a fresh surplus page from the page allocator.
  */
-static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
+static struct folio *alloc_surplus_hugetlb_folio_nodemask(struct hstate *h,
 		gfp_t gfp_mask, int nid, nodemask_t *nmask)
 {
 	struct folio *folio = NULL;
@@ -2312,9 +2312,9 @@ static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mask
 /*
  * Use the VMA's mpolicy to allocate a huge page from the buddy.
  */
-static
-struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
-		struct vm_area_struct *vma, unsigned long addr)
+static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
+						 struct vm_area_struct *vma,
+						 unsigned long addr)
 {
 	struct folio *folio = NULL;
 	struct mempolicy *mpol;
@@ -2326,14 +2326,14 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
 	if (mpol_is_preferred_many(mpol)) {
 		gfp_t gfp = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
 
-		folio = alloc_surplus_hugetlb_folio(h, gfp, nid, nodemask);
+		folio = alloc_surplus_hugetlb_folio_nodemask(h, gfp, nid, nodemask);
 
 		/* Fallback to all nodes if page==NULL */
 		nodemask = NULL;
 	}
 
 	if (!folio)
-		folio = alloc_surplus_hugetlb_folio(h, gfp_mask, nid, nodemask);
+		folio = alloc_surplus_hugetlb_folio_nodemask(h, gfp_mask, nid, nodemask);
 	mpol_cond_put(mpol);
 	return folio;
 }
@@ -2435,14 +2435,14 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 
 	/* Prioritize current node */
 	if (node_isset(numa_mem_id(), alloc_nodemask))
-		folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
+		folio = alloc_surplus_hugetlb_folio_nodemask(h, htlb_alloc_mask(h),
 				numa_mem_id(), NULL);
 
 	if (!folio) {
 		for_each_node_mask(node, alloc_nodemask) {
 			if (node == numa_mem_id())
 				continue;
-			folio = alloc_surplus_hugetlb_folio(h, htlb_alloc_mask(h),
+			folio = alloc_surplus_hugetlb_folio_nodemask(h, htlb_alloc_mask(h),
 					node, NULL);
 			if (folio)
 				break;
@@ -3055,7 +3055,7 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	if (!folio) {
 		spin_unlock_irq(&hugetlb_lock);
-		folio = alloc_buddy_hugetlb_folio_with_mpol(h, vma, addr);
+		folio = alloc_surplus_hugetlb_folio(h, vma, addr);
 		if (!folio)
 			goto out_uncharge_cgroup;
 		spin_lock_irq(&hugetlb_lock);
@@ -3868,11 +3868,12 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	 * First take pages out of surplus state.  Then make up the
 	 * remaining difference by allocating fresh huge pages.
 	 *
-	 * We might race with alloc_surplus_hugetlb_folio() here and be unable
-	 * to convert a surplus huge page to a normal huge page. That is
-	 * not critical, though, it just means the overall size of the
-	 * pool might be one hugepage larger than it needs to be, but
-	 * within all the constraints specified by the sysctls.
+	 * We might race with alloc_surplus_hugetlb_folio_nodemask()
+	 * here and be unable to convert a surplus huge page to a normal
+	 * huge page. That is not critical, though, it just means the
+	 * overall size of the pool might be one hugepage larger than it
+	 * needs to be, but within all the constraints specified by the
+	 * sysctls.
	 */
	while (h->surplus_huge_pages && count > persistent_huge_pages(h)) {
		if (!adjust_pool_surplus(h, nodes_allowed, -1))
@@ -3930,10 +3931,11 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
	 * By placing pages into the surplus state independent of the
	 * overcommit value, we are allowing the surplus pool size to
	 * exceed overcommit. There are few sane options here. Since
-	 * alloc_surplus_hugetlb_folio() is checking the global counter,
-	 * though, we'll note that we're not allowed to exceed surplus
-	 * and won't grow the pool anywhere else. Not until one of the
-	 * sysctls are changed, or the surplus pages go out of use.
+	 * alloc_surplus_hugetlb_folio_nodemask() is checking the global
+	 * counter, though, we'll note that we're not allowed to exceed
+	 * surplus and won't grow the pool anywhere else. Not until one
+	 * of the sysctls are changed, or the surplus pages go out of
+	 * use.
	 *
	 * min_count is the expected number of persistent pages, we
	 * shouldn't calculate min_count by using
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:41:59 -0700
Subject: [RFC PATCH v2 20/51] mm: mempolicy: Refactor out policy_node_nodemask()
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org
This was refactored out of huge_node(). huge_node()'s interpretation of
vma for order assumes the hugetlb-specific storage of the hstate
information in the inode. policy_node_nodemask() does not assume that,
and can be used more generically.

This refactoring also enforces that nid defaults to the current node
id, which was not previously enforced.

alloc_pages_mpol() is the last remaining direct user of
policy_nodemask(). All its callers begin with nid being the current
node id as well. More refactoring is required to simplify that.

Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-kbuild-all/202409140519.DIQST28c-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202409140553.G2RGVWNA-lkp@intel.com/
Reviewed-by: Gregory Price
Signed-off-by: Ackerley Tng
Change-Id: I5774b27d2e718f4d08b59f8d2fedbb34eda7bac3
---
 include/linux/mempolicy.h |  9 +++++++++
 mm/mempolicy.c            | 33 ++++++++++++++++++++++++++-------
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index ce9885e0178a..840c576abcfd 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -138,6 +138,8 @@ extern void numa_policy_init(void);
 extern void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new);
 extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new);
 
+extern int policy_node_nodemask(struct mempolicy *mpol, gfp_t gfp_flags,
+				pgoff_t ilx, nodemask_t **nodemask);
 extern int huge_node(struct vm_area_struct *vma,
 		unsigned long addr, gfp_t gfp_flags,
 		struct mempolicy **mpol, nodemask_t **nodemask);
@@ -251,6 +253,13 @@ static inline void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
 {
 }
 
+static inline int policy_node_nodemask(struct mempolicy *mpol, gfp_t gfp_flags,
+				       pgoff_t ilx, nodemask_t **nodemask)
+{
+	*nodemask = NULL;
+	return 0;
+}
+
 static inline int huge_node(struct vm_area_struct *vma,
 		unsigned long addr, gfp_t gfp_flags,
 		struct mempolicy **mpol, nodemask_t **nodemask)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index b28a1e6ae096..7837158ee5a8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1261,7 +1261,7 @@ static struct folio *alloc_migration_target_by_mpol(struct folio *src,
 
 	h = folio_hstate(src);
 	gfp = htlb_alloc_mask(h);
-	nodemask = policy_nodemask(gfp, pol, ilx, &nid);
+	nid = policy_node_nodemask(pol, gfp, ilx, &nodemask);
 	return alloc_hugetlb_folio_nodemask(h, nid, nodemask, gfp,
 			htlb_allow_alloc_fallback(MR_MEMPOLICY_MBIND));
 }
@@ -2121,6 +2121,29 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *pol,
 	return nodemask;
 }
 
+/**
+ * policy_node_nodemask() - Interpret memory policy to get nodemask and nid.
+ *
+ * @mpol: the memory policy to interpret.
+ * @gfp_flags: gfp flags for this request.
+ * @ilx: interleave index, for use only when MPOL_INTERLEAVE or
+ *       MPOL_WEIGHTED_INTERLEAVE
+ * @nodemask: (output) pointer to nodemask pointer for 'bind' and 'prefer-many'
+ *            policy
+ *
+ * Context: must hold reference on @mpol.
+ * Return: a nid suitable for a page allocation and a pointer. If the effective
+ *         policy is 'bind' or 'prefer-many', returns a pointer to the
+ *         mempolicy's @nodemask for filtering the zonelist.
+ */
+int policy_node_nodemask(struct mempolicy *mpol, gfp_t gfp_flags,
+			 pgoff_t ilx, nodemask_t **nodemask)
+{
+	int nid = numa_node_id();
+	*nodemask = policy_nodemask(gfp_flags, mpol, ilx, &nid);
+	return nid;
+}
+
 #ifdef CONFIG_HUGETLBFS
 /*
  * huge_node(@vma, @addr, @gfp_flags, @mpol)
@@ -2139,12 +2162,9 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
 		struct mempolicy **mpol, nodemask_t **nodemask)
 {
 	pgoff_t ilx;
-	int nid;
 
-	nid = numa_node_id();
 	*mpol = get_vma_policy(vma, addr, hstate_vma(vma)->order, &ilx);
-	*nodemask = policy_nodemask(gfp_flags, *mpol, ilx, &nid);
-	return nid;
+	return policy_node_nodemask(*mpol, gfp_flags, ilx, nodemask);
 }
 
 /*
@@ -2601,8 +2621,7 @@ unsigned long alloc_pages_bulk_mempolicy_noprof(gfp_t gfp,
 		return alloc_pages_bulk_preferred_many(gfp,
 				numa_node_id(), pol, nr_pages, page_array);
 
-	nid = numa_node_id();
-	nodemask = policy_nodemask(gfp, pol, NO_INTERLEAVE_INDEX, &nid);
+	nid = policy_node_nodemask(pol, gfp, NO_INTERLEAVE_INDEX, &nodemask);
 	return alloc_pages_bulk_noprof(gfp, nid, nodemask,
 				       nr_pages, page_array);
 }
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:00 -0700
Subject: [RFC PATCH v2 21/51] mm: hugetlb: Inline huge_node() into callers
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

huge_node()'s role was to read struct mempolicy (mpol) from the vma and
also interpret mpol to get node id and nodemask.

huge_node() can be inlined into its callers since two of its three
callers will be refactored in later patches to take and interpret mpol
without reading mpol from the vma.

Signed-off-by: Ackerley Tng
Change-Id: Ic94b2ed916fd4f89b7d2755288a3a2f6a56051f7
---
 include/linux/mempolicy.h | 12 ------------
 mm/hugetlb.c              | 13 ++++++++++---
 mm/mempolicy.c            | 21 ---------------------
 3 files changed, 10 insertions(+), 36 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 840c576abcfd..41fc53605ef0 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -140,9 +140,6 @@ extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new);
 
 extern int policy_node_nodemask(struct mempolicy *mpol, gfp_t gfp_flags,
 				pgoff_t ilx, nodemask_t **nodemask);
-extern int huge_node(struct vm_area_struct *vma,
-		unsigned long addr, gfp_t gfp_flags,
-		struct mempolicy **mpol, nodemask_t **nodemask);
 extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
 extern bool mempolicy_in_oom_domain(struct task_struct *tsk,
 				const nodemask_t *mask);
@@ -260,15 +257,6 @@ static inline int policy_node_nodemask(struct mempolicy *mpol, gfp_t gfp_flags,
 	return 0;
 }
 
-static inline int huge_node(struct vm_area_struct *vma,
-		unsigned long addr, gfp_t gfp_flags,
-		struct mempolicy **mpol, nodemask_t **nodemask)
-{
-	*mpol = NULL;
-	*nodemask = NULL;
-	return 0;
-}
-
 static inline bool init_nodemask_of_mempolicy(nodemask_t *m)
 {
 	return false;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b822b204e9b3..5cc261b90e39 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1372,10 +1372,12 @@ static struct folio *dequeue_hugetlb_folio(struct hstate *h,
 	struct mempolicy *mpol;
 	gfp_t gfp_mask;
 	nodemask_t *nodemask;
+	pgoff_t ilx;
 	int nid;
 
 	gfp_mask = htlb_alloc_mask(h);
-	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
+	mpol = get_vma_policy(vma, address, h->order, &ilx);
+	nid = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
 
 	if (mpol_is_preferred_many(mpol)) {
 		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
@@ -2321,8 +2323,11 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
 	gfp_t gfp_mask = htlb_alloc_mask(h);
 	int nid;
 	nodemask_t *nodemask;
+	pgoff_t ilx;
+
+	mpol = get_vma_policy(vma, addr, h->order, &ilx);
+	nid = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
 
-	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
 	if (mpol_is_preferred_many(mpol)) {
 		gfp_t gfp = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
 
@@ -6829,10 +6834,12 @@ static struct folio *alloc_hugetlb_folio_vma(struct hstate *h,
 	nodemask_t *nodemask;
 	struct folio *folio;
 	gfp_t gfp_mask;
+	pgoff_t ilx;
 	int node;
 
 	gfp_mask = htlb_alloc_mask(h);
-	node = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
+	mpol = get_vma_policy(vma, address, h->order, &ilx);
+	node = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
 	/*
 	 * This is used to allocate a temporary hugetlb to hold the copied
 	 * content, which will then be copied again to the final hugetlb
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 7837158ee5a8..39d0abc407dc 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2145,27 +2145,6 @@ int policy_node_nodemask(struct mempolicy *mpol, gfp_t gfp_flags,
 }
 
 #ifdef CONFIG_HUGETLBFS
-/*
- * huge_node(@vma, @addr, @gfp_flags, @mpol)
- * @vma: virtual memory area whose policy is sought
- * @addr: address in @vma for shared policy lookup and interleave policy
- * @gfp_flags: for requested zone
- * @mpol: pointer to mempolicy pointer for reference counted mempolicy
- * @nodemask: pointer to nodemask pointer for 'bind' and 'prefer-many' policy
- *
- * Returns a nid suitable for a huge page allocation and a pointer
- * to the struct mempolicy for conditional unref after allocation.
- * If the effective policy is 'bind' or 'prefer-many', returns a pointer
- * to the mempolicy's @nodemask for filtering the zonelist.
- */
-int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
-		struct mempolicy **mpol, nodemask_t **nodemask)
-{
-	pgoff_t ilx;
-
-	*mpol = get_vma_policy(vma, addr, hstate_vma(vma)->order, &ilx);
-	return policy_node_nodemask(*mpol, gfp_flags, ilx, nodemask);
-}
 
 /*
  * init_nodemask_of_mempolicy
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:01 -0700
Message-ID: <1f64e3c7f04fc725f4da4d57de1ea040b7a56952.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 22/51] mm: hugetlb: Refactor hugetlb allocation functions
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Refactor dequeue_hugetlb_folio() and alloc_surplus_hugetlb_folio() to
take mpol, nid and nodemask.
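As a rough stand-alone model of that calling convention (illustration
only; the struct, stubs and main() below are simplified stand-ins, not
the kernel implementation): the caller resolves the memory policy once
at the top and threads mpol, nid and nodemask through the helpers,
instead of each helper re-deriving them from a vma.

#include <stdio.h>

struct mempolicy { int mode; };
typedef unsigned long nodemask_t;

/* Stand-in for get_vma_policy() + policy_node_nodemask() in the real
 * code: produces the policy's preferred nid and nodemask once.
 */
static int resolve_policy(struct mempolicy *mpol, nodemask_t **nodemask)
{
	static nodemask_t mask = 0x3;	/* say, nodes 0-1 */
	(void)mpol;
	*nodemask = &mask;
	return 0;			/* preferred nid */
}

/* The helper no longer touches a vma: it only consumes the policy. */
static void dequeue_folio(struct mempolicy *mpol, int nid,
			  nodemask_t *nodemask)
{
	printf("dequeue from nid %d, mask %lx, mode %d\n",
	       nid, *nodemask, mpol->mode);
}

int main(void)
{
	struct mempolicy mpol = { .mode = 0 };
	nodemask_t *nodemask;
	int nid = resolve_policy(&mpol, &nodemask);

	/* Policy resolved once; threaded through every allocation step. */
	dequeue_folio(&mpol, nid, nodemask);
	return 0;
}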
This decouples allocation of a folio from a vma.

Signed-off-by: Ackerley Tng
Change-Id: I890fb46fe8c6349383d8cf89befc68a4994eb416
---
 mm/hugetlb.c | 64 ++++++++++++++++++++++++----------------------------
 1 file changed, 30 insertions(+), 34 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5cc261b90e39..29d1a3fb10df 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1364,34 +1364,22 @@ static unsigned long available_huge_pages(struct hstate *h)
 	return h->free_huge_pages - h->resv_huge_pages;
 }
 
-static struct folio *dequeue_hugetlb_folio(struct hstate *h,
-					   struct vm_area_struct *vma,
-					   unsigned long address)
+static struct folio *dequeue_hugetlb_folio(struct hstate *h, gfp_t gfp_mask,
+					   struct mempolicy *mpol,
+					   int nid, nodemask_t *nodemask)
 {
 	struct folio *folio = NULL;
-	struct mempolicy *mpol;
-	gfp_t gfp_mask;
-	nodemask_t *nodemask;
-	pgoff_t ilx;
-	int nid;
-
-	gfp_mask = htlb_alloc_mask(h);
-	mpol = get_vma_policy(vma, address, h->order, &ilx);
-	nid = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
 
 	if (mpol_is_preferred_many(mpol)) {
-		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
-						       nid, nodemask);
+		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask, nid, nodemask);
 
 		/* Fallback to all nodes if page==NULL */
 		nodemask = NULL;
 	}
 
 	if (!folio)
-		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask,
-						       nid, nodemask);
+		folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask, nid, nodemask);
 
-	mpol_cond_put(mpol);
 	return folio;
 }
 
@@ -2312,21 +2300,14 @@ static struct folio *alloc_migrate_hugetlb_folio(struct hstate *h, gfp_t gfp_mask
 }
 
 /*
- * Use the VMA's mpolicy to allocate a huge page from the buddy.
+ * Allocate a huge page from the buddy allocator given memory policy and node information.
 */
 static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
-						 struct vm_area_struct *vma,
-						 unsigned long addr)
+						 gfp_t gfp_mask,
+						 struct mempolicy *mpol,
+						 int nid, nodemask_t *nodemask)
 {
 	struct folio *folio = NULL;
-	struct mempolicy *mpol;
-	gfp_t gfp_mask = htlb_alloc_mask(h);
-	int nid;
-	nodemask_t *nodemask;
-	pgoff_t ilx;
-
-	mpol = get_vma_policy(vma, addr, h->order, &ilx);
-	nid = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
 
 	if (mpol_is_preferred_many(mpol)) {
 		gfp_t gfp = gfp_mask & ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL);
@@ -2339,7 +2320,7 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
 
 	if (!folio)
 		folio = alloc_surplus_hugetlb_folio_nodemask(h, gfp_mask, nid, nodemask);
-	mpol_cond_put(mpol);
+
 	return folio;
 }
 
@@ -2993,6 +2974,11 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	int ret, idx;
 	struct hugetlb_cgroup *h_cg = NULL;
 	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
+	struct mempolicy *mpol;
+	nodemask_t *nodemask;
+	gfp_t gfp_mask;
+	pgoff_t ilx;
+	int nid;
 
 	idx = hstate_index(h);
 
@@ -3032,7 +3018,6 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 		subpool_reservation_exists = npages_req == 0;
 	}
-
 	reservation_exists = vma_reservation_exists || subpool_reservation_exists;
 
 	/*
@@ -3048,21 +3033,30 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		goto out_subpool_put;
 	}
 
+	mpol = get_vma_policy(vma, addr, h->order, &ilx);
+
 	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
-	if (ret)
+	if (ret) {
+		mpol_cond_put(mpol);
 		goto out_uncharge_cgroup_reservation;
+	}
+
+	gfp_mask = htlb_alloc_mask(h);
+	nid = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
 
 	spin_lock_irq(&hugetlb_lock);
 
 	folio = NULL;
 	if (reservation_exists || available_huge_pages(h))
-		folio = dequeue_hugetlb_folio(h, vma, addr);
+		folio = dequeue_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
 
 	if (!folio) {
 		spin_unlock_irq(&hugetlb_lock);
-		folio = alloc_surplus_hugetlb_folio(h, vma, addr);
-		if (!folio)
+		folio = alloc_surplus_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
+		if (!folio) {
+			mpol_cond_put(mpol);
 			goto out_uncharge_cgroup;
+		}
 		spin_lock_irq(&hugetlb_lock);
 		list_add(&folio->lru, &h->hugepage_activelist);
 		folio_ref_unfreeze(folio, 1);
@@ -3087,6 +3081,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 
 	spin_unlock_irq(&hugetlb_lock);
 
+	mpol_cond_put(mpol);
+
 	hugetlb_set_folio_subpool(folio, spool);
 
 	/* If vma accounting wasn't bypassed earlier, follow up with commit. */
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:02 -0700
Subject: [RFC PATCH v2 23/51] mm: hugetlb: Refactor out hugetlb_alloc_folio()
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org

Refactor hugetlb_alloc_folio() out of alloc_hugetlb_folio(); the new
helper handles allocation of a folio and cgroup charging. Besides flags
to control charging during allocation, hugetlb_alloc_folio() also takes
memory policy parameters.

This refactoring as a whole decouples hugetlb page allocation from
hugetlbfs, where (1) the subpool is stored at the fs mount,
(2) reservations are made during mmap and stored in the vma, (3) the
mempolicy must be stored at vma->vm_policy, and (4) a vma must be used
for allocation even if the pages are not meant to be used by a host
process.
This decoupling will allow hugetlb_alloc_folio() to be used by
guest_memfd in later patches. In guest_memfd, (1) a subpool is created
per-fd and stored on the inode, (2) no vma-related reservations are
used, and (3) the mempolicy need not be associated with a vma, since
(4) private pages will not be mappable to userspace and hence have no
associated vmas.

This could also open hugetlb up as a more generic source of hugetlb
pages not bound to hugetlbfs, with the complexities of
userspace/mmap/vma-related reservations contained within hugetlbfs.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I60528f246341268acbf0ed5de7752ae2cacbef93
---
 include/linux/hugetlb.h |  12 +++
 mm/hugetlb.c            | 192 ++++++++++++++++++++++------------
 2 files changed, 118 insertions(+), 86 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8f3ac832ee7f..8ba941d88956 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -698,6 +698,9 @@ bool hugetlb_bootmem_page_zones_valid(int nid, struct huge_bootmem_page *m);
 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
 int replace_free_hugepage_folios(unsigned long start_pfn, unsigned long end_pfn);
 void wait_for_freed_hugetlb_folios(void);
+struct folio *hugetlb_alloc_folio(struct hstate *h, struct mempolicy *mpol,
+				  pgoff_t ilx, bool charge_cgroup_rsvd,
+				  bool use_existing_reservation);
 struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 				  unsigned long addr, bool cow_from_owner);
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
@@ -1099,6 +1102,15 @@ static inline void wait_for_freed_hugetlb_folios(void)
 {
 }

+static inline struct folio *hugetlb_alloc_folio(struct hstate *h,
+						struct mempolicy *mpol,
+						pgoff_t ilx,
+						bool charge_cgroup_rsvd,
+						bool use_existing_reservation)
+{
+	return NULL;
+}
+
 static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 						unsigned long addr,
 						bool cow_from_owner)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 29d1a3fb10df..5b088fe002a2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2954,6 +2954,101 @@ void wait_for_freed_hugetlb_folios(void)
 	flush_work(&free_hpage_work);
 }

+/**
+ * hugetlb_alloc_folio() - Allocates a hugetlb folio.
+ *
+ * @h: struct hstate to allocate from.
+ * @mpol: struct mempolicy to apply for this folio allocation.
+ * @ilx: Interleave index for interpretation of @mpol.
+ * @charge_cgroup_rsvd: Set to true to charge the cgroup reservation.
+ * @use_existing_reservation: Set to true if this allocation should use an
+ *                            existing hstate reservation.
+ *
+ * This function handles cgroup and global hstate reservations. VMA-related
+ * reservations and subpool debiting must be handled by the caller if
+ * necessary.
+ *
+ * Return: folio on success, or an ERR_PTR-encoded error otherwise.
+ */
+struct folio *hugetlb_alloc_folio(struct hstate *h, struct mempolicy *mpol,
+				  pgoff_t ilx, bool charge_cgroup_rsvd,
+				  bool use_existing_reservation)
+{
+	unsigned int nr_pages = pages_per_huge_page(h);
+	struct hugetlb_cgroup *h_cg = NULL;
+	struct folio *folio = NULL;
+	nodemask_t *nodemask;
+	gfp_t gfp_mask;
+	int nid;
+	int idx;
+	int ret;
+
+	idx = hstate_index(h);
+
+	if (charge_cgroup_rsvd) {
+		if (hugetlb_cgroup_charge_cgroup_rsvd(idx, nr_pages, &h_cg))
+			goto out;
+	}
+
+	if (hugetlb_cgroup_charge_cgroup(idx, nr_pages, &h_cg))
+		goto out_uncharge_cgroup_reservation;
+
+	gfp_mask = htlb_alloc_mask(h);
+	nid = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
+
+	spin_lock_irq(&hugetlb_lock);
+
+	if (use_existing_reservation || available_huge_pages(h))
+		folio = dequeue_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
+
+	if (!folio) {
+		spin_unlock_irq(&hugetlb_lock);
+		folio = alloc_surplus_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
+		if (!folio)
+			goto out_uncharge_cgroup;
+		spin_lock_irq(&hugetlb_lock);
+		list_add(&folio->lru, &h->hugepage_activelist);
+		folio_ref_unfreeze(folio, 1);
+		/* Fall through */
+	}
+
+	if (use_existing_reservation) {
+		folio_set_hugetlb_restore_reserve(folio);
+		h->resv_huge_pages--;
+	}
+
+	hugetlb_cgroup_commit_charge(idx, nr_pages, h_cg, folio);
+
+	if (charge_cgroup_rsvd)
+		hugetlb_cgroup_commit_charge_rsvd(idx, nr_pages, h_cg, folio);
+
+	spin_unlock_irq(&hugetlb_lock);
+
+	gfp_mask = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
+	ret = mem_cgroup_charge_hugetlb(folio, gfp_mask);
+	/*
+	 * Unconditionally increment NR_HUGETLB here. If it turns out that
+	 * mem_cgroup_charge_hugetlb failed, then immediately free the page and
+	 * decrement NR_HUGETLB.
+	 */
+	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
+
+	if (ret == -ENOMEM) {
+		free_huge_folio(folio);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	return folio;
+
+out_uncharge_cgroup:
+	hugetlb_cgroup_uncharge_cgroup(idx, nr_pages, h_cg);
+out_uncharge_cgroup_reservation:
+	if (charge_cgroup_rsvd)
+		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, nr_pages, h_cg);
+out:
+	folio = ERR_PTR(-ENOSPC);
+	return folio;
+}
+
 /*
  * NOTE! "cow_from_owner" represents a very hacky usage only used in CoW
  * faults of hugetlb private mappings on top of a non-page-cache folio (in
@@ -2971,16 +3066,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	bool reservation_exists;
 	bool charge_cgroup_rsvd;
 	struct folio *folio;
-	int ret, idx;
-	struct hugetlb_cgroup *h_cg = NULL;
-	gfp_t gfp = htlb_alloc_mask(h) | __GFP_RETRY_MAYFAIL;
 	struct mempolicy *mpol;
-	nodemask_t *nodemask;
-	gfp_t gfp_mask;
 	pgoff_t ilx;
-	int nid;
-
-	idx = hstate_index(h);

 	if (cow_from_owner) {
 		/*
@@ -3020,69 +3107,22 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	}
 	reservation_exists = vma_reservation_exists || subpool_reservation_exists;

-	/*
-	 * If a vma_reservation_exists, we can skip charging hugetlb
-	 * reservations since that was charged in hugetlb_reserve_pages() when
-	 * the reservation was recorded on the resv_map.
-	 */
-	charge_cgroup_rsvd = !vma_reservation_exists;
-	if (charge_cgroup_rsvd) {
-		ret = hugetlb_cgroup_charge_cgroup_rsvd(
-			idx, pages_per_huge_page(h), &h_cg);
-		if (ret)
-			goto out_subpool_put;
-	}
-
 	mpol = get_vma_policy(vma, addr, h->order, &ilx);

-	ret = hugetlb_cgroup_charge_cgroup(idx, pages_per_huge_page(h), &h_cg);
-	if (ret) {
-		mpol_cond_put(mpol);
-		goto out_uncharge_cgroup_reservation;
-	}
-
-	gfp_mask = htlb_alloc_mask(h);
-	nid = policy_node_nodemask(mpol, gfp_mask, ilx, &nodemask);
-
-	spin_lock_irq(&hugetlb_lock);
-
-	folio = NULL;
-	if (reservation_exists || available_huge_pages(h))
-		folio = dequeue_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
-
-	if (!folio) {
-		spin_unlock_irq(&hugetlb_lock);
-		folio = alloc_surplus_hugetlb_folio(h, gfp_mask, mpol, nid, nodemask);
-		if (!folio) {
-			mpol_cond_put(mpol);
-			goto out_uncharge_cgroup;
-		}
-		spin_lock_irq(&hugetlb_lock);
-		list_add(&folio->lru, &h->hugepage_activelist);
-		folio_ref_unfreeze(folio, 1);
-		/* Fall through */
-	}
-
 	/*
-	 * Either dequeued or buddy-allocated folio needs to add special
-	 * mark to the folio when it consumes a global reservation.
+	 * If a vma_reservation_exists, we can skip charging cgroup reservations
+	 * since that was charged during vma reservation. Use a reservation as
+	 * long as it exists.
 	 */
-	if (reservation_exists) {
-		folio_set_hugetlb_restore_reserve(folio);
-		h->resv_huge_pages--;
-	}
-
-	hugetlb_cgroup_commit_charge(idx, pages_per_huge_page(h), h_cg, folio);
-
-	if (charge_cgroup_rsvd) {
-		hugetlb_cgroup_commit_charge_rsvd(idx, pages_per_huge_page(h),
-						  h_cg, folio);
-	}
-
-	spin_unlock_irq(&hugetlb_lock);
+	charge_cgroup_rsvd = !vma_reservation_exists;
+	folio = hugetlb_alloc_folio(h, mpol, ilx, charge_cgroup_rsvd,
+				    reservation_exists);

 	mpol_cond_put(mpol);

+	if (IS_ERR_OR_NULL(folio))
+		goto out_subpool_put;
+
 	hugetlb_set_folio_subpool(folio, spool);

 	/* If vma accounting wasn't bypassed earlier, follow up with commit. */
@@ -3091,9 +3131,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 	/*
 	 * If there is a discrepancy in reservation status between the
 	 * time of vma_needs_reservation() and vma_commit_reservation(),
-	 * then there the page must have been added to the reservation
-	 * map between vma_needs_reservation() and
-	 * vma_commit_reservation().
+	 * then the page must have been added to the reservation map
+	 * between vma_needs_reservation() and vma_commit_reservation().
 	 *
 	 * Adjust for the subpool count incremented above AND
 	 * in hugetlb_reserve_pages for the same page. Also,
@@ -3115,27 +3154,8 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 		}
 	}

-	ret = mem_cgroup_charge_hugetlb(folio, gfp);
-	/*
-	 * Unconditionally increment NR_HUGETLB here. If it turns out that
-	 * mem_cgroup_charge_hugetlb failed, then immediately free the page and
-	 * decrement NR_HUGETLB.
-	 */
-	lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
-
-	if (ret == -ENOMEM) {
-		free_huge_folio(folio);
-		return ERR_PTR(-ENOMEM);
-	}
-
 	return folio;

-out_uncharge_cgroup:
-	hugetlb_cgroup_uncharge_cgroup(idx, pages_per_huge_page(h), h_cg);
-out_uncharge_cgroup_reservation:
-	if (charge_cgroup_rsvd)
-		hugetlb_cgroup_uncharge_cgroup_rsvd(idx, pages_per_huge_page(h),
-						    h_cg);
 out_subpool_put:
 	if (!vma_reservation_exists)
 		hugepage_subpool_put_pages(spool, 1);
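To make the new contract concrete, a hypothetical vma-less caller (in the spirit of the guest_memfd patches later in this series) might drive hugetlb_alloc_folio() like this; the flag values and the subpool comment follow the kerneldoc above, and example_vmaless_alloc() is a made-up name.

        /* Illustrative only: allocate a hugetlb folio without any vma. */
        static struct folio *example_vmaless_alloc(struct hstate *h,
                                                   struct mempolicy *mpol,
                                                   pgoff_t ilx)
        {
                struct folio *folio;

                /*
                 * Without a vma there is no resv_map, so charge the cgroup
                 * reservation here and do not consume an hstate reservation.
                 */
                folio = hugetlb_alloc_folio(h, mpol, ilx,
                                            /*charge_cgroup_rsvd=*/true,
                                            /*use_existing_reservation=*/false);
                if (IS_ERR_OR_NULL(folio))
                        return folio;

                /* Per the kerneldoc, subpool debiting is the caller's job. */
                return folio;
        }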
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:03 -0700
Message-ID: <3f2ac9240cd39295e7341d408548719818d5ea91.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 24/51] mm: hugetlb: Add option to create new subpool
 without using surplus
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org
__hugetlb_acct_memory() today does more than just memory accounting:
when there are insufficient HugeTLB pages, it will attempt to gather
surplus pages. This change adds a flag that disables gathering surplus
pages when there are insufficient HugeTLB pages.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: Id79fdeaa236b4fed38fc3c20482b03fff729198f
---
 fs/hugetlbfs/inode.c    |  2 +-
 include/linux/hugetlb.h |  2 +-
 mm/hugetlb.c            | 77 +++++++++++++++++++++++++++++++----------
 3 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index e4de5425838d..609a88950354 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1424,7 +1424,7 @@ hugetlbfs_fill_super(struct super_block *sb, struct fs_context *fc)
 	if (ctx->max_hpages != -1 || ctx->min_hpages != -1) {
 		sbinfo->spool = hugepage_new_subpool(ctx->hstate, ctx->max_hpages,
-						     ctx->min_hpages);
+						     ctx->min_hpages, true);
 		if (!sbinfo->spool)
 			goto out_free;
 	}
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 8ba941d88956..c59264391c33 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -116,7 +116,7 @@ extern int hugetlb_max_hstate __read_mostly;
 	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)

 struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
-					      long min_hpages);
+					      long min_hpages, bool use_surplus);
 void hugepage_put_subpool(struct hugepage_subpool *spool);

 void hugetlb_dup_vma_private(struct vm_area_struct *vma);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5b088fe002a2..d22c5a8fd441 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -115,6 +115,7 @@ static int num_fault_mutexes __ro_after_init;
 struct mutex *hugetlb_fault_mutex_table __ro_after_init;

 /* Forward declaration */
+static int __hugetlb_acct_memory(struct hstate *h, long delta, bool use_surplus);
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
 static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
@@ -162,7 +163,7 @@ static inline void unlock_or_release_subpool(struct hugepage_subpool *spool,
 }

 struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
-					      long min_hpages)
+					      long min_hpages, bool use_surplus)
 {
 	struct hugepage_subpool *spool;

@@ -176,7 +177,8 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
 	spool->hstate = h;
 	spool->min_hpages = min_hpages;

-	if (min_hpages != -1 && hugetlb_acct_memory(h, min_hpages)) {
+	if (min_hpages != -1 &&
+	    __hugetlb_acct_memory(h, min_hpages, use_surplus)) {
 		kfree(spool);
 		return NULL;
 	}
@@ -2382,35 +2384,64 @@ static nodemask_t *policy_mbind_nodemask(gfp_t gfp)
 	return NULL;
 }

-/*
- * Increase the hugetlb pool such that it can accommodate a reservation
- * of size 'delta'.
+/**
+ * hugetlb_hstate_reserve_pages() - Reserve @requested number of hugetlb pages
+ * from hstate @h.
+ *
+ * @h: the hstate to reserve from.
+ * @requested: number of hugetlb pages to reserve.
+ *
+ * If there are insufficient available hugetlb pages, no reservations are made.
+ *
+ * Return: the number of surplus pages required to meet the @requested number
+ * of hugetlb pages.
  */
-static int gather_surplus_pages(struct hstate *h, long delta)
+static int hugetlb_hstate_reserve_pages(struct hstate *h, long requested)
+	__must_hold(&hugetlb_lock)
+{
+	long needed;
+
+	needed = (h->resv_huge_pages + requested) - h->free_huge_pages;
+	if (needed <= 0) {
+		h->resv_huge_pages += requested;
+		return 0;
+	}
+
+	return needed;
+}
+
+/**
+ * gather_surplus_pages() - Increase the hugetlb pool such that it can
+ * accommodate a reservation of size @requested.
+ *
+ * @h: the hstate in question.
+ * @requested: The requested number of hugetlb pages.
+ * @needed: The number of hugetlb pages the pool needs to be increased by,
+ *          based on the current number of reservations and free hugetlb pages.
+ *
+ * Return: 0 if successful or negative error otherwise.
+ */
+static int gather_surplus_pages(struct hstate *h, long requested, long needed)
 	__must_hold(&hugetlb_lock)
 {
 	LIST_HEAD(surplus_list);
 	struct folio *folio, *tmp;
 	int ret;
 	long i;
-	long needed, allocated;
+	long allocated;
 	bool alloc_ok = true;
 	int node;
 	nodemask_t *mbind_nodemask, alloc_nodemask;

+	if (needed == 0)
+		return 0;
+
 	mbind_nodemask = policy_mbind_nodemask(htlb_alloc_mask(h));
 	if (mbind_nodemask)
 		nodes_and(alloc_nodemask, *mbind_nodemask, cpuset_current_mems_allowed);
 	else
 		alloc_nodemask = cpuset_current_mems_allowed;

-	lockdep_assert_held(&hugetlb_lock);
-	needed = (h->resv_huge_pages + delta) - h->free_huge_pages;
-	if (needed <= 0) {
-		h->resv_huge_pages += delta;
-		return 0;
-	}
-
 	allocated = 0;

 	ret = -ENOMEM;
@@ -2448,7 +2479,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	 * because either resv_huge_pages or free_huge_pages may have changed.
 	 */
 	spin_lock_irq(&hugetlb_lock);
-	needed = (h->resv_huge_pages + delta) -
+	needed = (h->resv_huge_pages + requested) -
 		 (h->free_huge_pages + allocated);
 	if (needed > 0) {
 		if (alloc_ok)
@@ -2469,7 +2500,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	 * before they are reserved.
 	 */
 	needed += allocated;
-	h->resv_huge_pages += delta;
+	h->resv_huge_pages += requested;
 	ret = 0;

 	/* Free the needed pages to the hugetlb pool */
@@ -5284,7 +5315,7 @@ unsigned long hugetlb_total_pages(void)
 	return nr_total_pages;
 }

-static int hugetlb_acct_memory(struct hstate *h, long delta)
+static int __hugetlb_acct_memory(struct hstate *h, long delta, bool use_surplus)
 {
 	int ret = -ENOMEM;

@@ -5316,7 +5347,12 @@ static int hugetlb_acct_memory(struct hstate *h, long delta)
 	 * above.
 	 */
 	if (delta > 0) {
-		if (gather_surplus_pages(h, delta) < 0)
+		long needed = hugetlb_hstate_reserve_pages(h, delta);
+
+		if (!use_surplus && needed > 0)
+			goto out;
+
+		if (gather_surplus_pages(h, delta, needed) < 0)
 			goto out;

 		if (delta > allowed_mems_nr(h)) {
@@ -5334,6 +5370,11 @@ static int hugetlb_acct_memory(struct hstate *h, long delta)
 	return ret;
 }

+static int hugetlb_acct_memory(struct hstate *h, long delta)
+{
+	return __hugetlb_acct_memory(h, delta, true);
+}
+
 static void hugetlb_vm_op_open(struct vm_area_struct *vma)
 {
 	struct resv_map *resv = vma_resv_map(vma);
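A short, hypothetical usage sketch of the new flag: hugetlbfs passes true to keep today's behavior, while a user that must not dip into surplus pages passes false and simply fails when the pool is short. The sizes here are made up.

        /* Illustrative only: create a subpool that never uses surplus pages. */
        struct hugepage_subpool *spool;

        spool = hugepage_new_subpool(h, /*max_hpages=*/512, /*min_hpages=*/512,
                                     /*use_surplus=*/false);
        if (!spool)
                return -ENOMEM; /* pool short; no surplus pages were gathered */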
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:04 -0700
Subject: [RFC PATCH v2 25/51] mm: truncate: Expose preparation steps for
 truncate_inode_pages_final
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org
Expose truncate_inode_pages_final_prepare(), the preparation steps of
truncate_inode_pages_final(). This allows the preparation steps to be
shared by code that implements the truncation itself differently.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I83ad5965b8b50283ad930c20c99e3165cb5626c9
---
 include/linux/mm.h |  1 +
 mm/truncate.c      | 26 ++++++++++++++++----------
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf55206935c4..e4e73c231ced 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3514,6 +3514,7 @@ extern unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info);
 extern void truncate_inode_pages(struct address_space *, loff_t);
 extern void truncate_inode_pages_range(struct address_space *,
 				       loff_t lstart, loff_t lend);
+extern void truncate_inode_pages_final_prepare(struct address_space *mapping);
 extern void truncate_inode_pages_final(struct address_space *);

 /* generic vm_area_ops exported for stackable file systems */
diff --git a/mm/truncate.c b/mm/truncate.c
index 5d98054094d1..057e4aa73aa9 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -457,16 +457,7 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
 }
 EXPORT_SYMBOL(truncate_inode_pages);

-/**
- * truncate_inode_pages_final - truncate *all* pages before inode dies
- * @mapping: mapping to truncate
- *
- * Called under (and serialized by) inode->i_rwsem.
- *
- * Filesystems have to use this in the .evict_inode path to inform the
- * VM that this is the final truncate and the inode is going away.
- */
-void truncate_inode_pages_final(struct address_space *mapping)
+void truncate_inode_pages_final_prepare(struct address_space *mapping)
 {
 	/*
 	 * Page reclaim can not participate in regular inode lifetime
@@ -487,6 +478,21 @@ void truncate_inode_pages_final(struct address_space *mapping)
 		xa_lock_irq(&mapping->i_pages);
 		xa_unlock_irq(&mapping->i_pages);
 	}
+}
+EXPORT_SYMBOL(truncate_inode_pages_final_prepare);
+
+/**
+ * truncate_inode_pages_final - truncate *all* pages before inode dies
+ * @mapping: mapping to truncate
+ *
+ * Called under (and serialized by) inode->i_rwsem.
+ *
+ * Filesystems have to use this in the .evict_inode path to inform the
+ * VM that this is the final truncate and the inode is going away.
+ */
+void truncate_inode_pages_final(struct address_space *mapping)
+{
+	truncate_inode_pages_final_prepare(mapping);

 	truncate_inode_pages(mapping, 0);
 }
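For context, a hypothetical ->evict_inode implementation shows what the split enables: run the shared preparation, then substitute a custom teardown for the plain truncate_inode_pages() call. Both example_evict_inode() and example_custom_truncate() are made-up names.

        /* Illustrative only: a custom final truncation built on the new helper. */
        static void example_evict_inode(struct inode *inode)
        {
                struct address_space *mapping = inode->i_mapping;

                /* Same bookkeeping that truncate_inode_pages_final() performs. */
                truncate_inode_pages_final_prepare(mapping);

                /* Custom teardown instead of truncate_inode_pages(mapping, 0). */
                example_custom_truncate(mapping);

                clear_inode(inode);
        }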
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:05 -0700
Message-ID: <3a897dc919d25951816cba95dd53bfeb2ea6e581.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 26/51] mm: Consolidate freeing of typed folios on final
 folio_put()
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org

From: Fuad Tabba <tabba@google.com>

Some folio types, such as hugetlb, handle the freeing of their own
folios. The guestmem_hugetlb folio type, to be introduced in a later
patch, requires extra handling as part of the freeing process. As a
first step towards that, this patch consolidates the freeing of folios
that have a page type. The first user is hugetlb folios.
Later in this patch series, guestmem_hugetlb will become the second user
of this consolidated freeing path.

Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I881dc58ca89603ddd1e8e1ccca8f5dbfc80c43be
---
 include/linux/page-flags.h | 15 +++++++++++++++
 mm/swap.c                  | 23 ++++++++++++++++++-----
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index e6a21b62dcce..9dd60fb8c33f 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -985,6 +985,21 @@ static inline bool page_has_type(const struct page *page)
 	return page_mapcount_is_type(data_race(page->page_type));
 }

+static inline int page_get_type(const struct page *page)
+{
+	return page->page_type >> 24;
+}
+
+static inline bool folio_has_type(const struct folio *folio)
+{
+	return page_has_type(&folio->page);
+}
+
+static inline int folio_get_type(const struct folio *folio)
+{
+	return page_get_type(&folio->page);
+}
+
 #define FOLIO_TYPE_OPS(lname, fname)					\
 static __always_inline bool folio_test_##fname(const struct folio *folio)\
 {									\
diff --git a/mm/swap.c b/mm/swap.c
index 77b2d5997873..d0a5971787c4 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -94,6 +94,19 @@ static void page_cache_release(struct folio *folio)
 		unlock_page_lruvec_irqrestore(lruvec, flags);
 }

+static void free_typed_folio(struct folio *folio)
+{
+	switch (folio_get_type(folio)) {
+#ifdef CONFIG_HUGETLBFS
+	case PGTY_hugetlb:
+		free_huge_folio(folio);
+		return;
+#endif
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+
 void __folio_put(struct folio *folio)
 {
 	if (unlikely(folio_is_zone_device(folio))) {
@@ -101,8 +114,8 @@ void __folio_put(struct folio *folio)
 		return;
 	}

-	if (folio_test_hugetlb(folio)) {
-		free_huge_folio(folio);
+	if (unlikely(folio_has_type(folio))) {
+		free_typed_folio(folio);
 		return;
 	}

@@ -964,13 +977,13 @@ void folios_put_refs(struct folio_batch *folios, unsigned int *refs)
 		if (!folio_ref_sub_and_test(folio, nr_refs))
 			continue;

-		/* hugetlb has its own memcg */
-		if (folio_test_hugetlb(folio)) {
+		if (unlikely(folio_has_type(folio))) {
+			/* typed folios have their own memcg, if any */
 			if (lruvec) {
 				unlock_page_lruvec_irqrestore(lruvec, flags);
 				lruvec = NULL;
 			}
-			free_huge_folio(folio);
+			free_typed_folio(folio);
 			continue;
 		}
 		folio_unqueue_deferred_split(folio);
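As a reading aid, a hypothetical manual expansion of folio_put() shows where the consolidated dispatch now happens for any typed folio, not just hugetlb.

        /* Illustrative only: what folio_put() does, spelled out. */
        static void example_put(struct folio *folio)
        {
                if (folio_put_testzero(folio))  /* last reference dropped */
                        __folio_put(folio);     /* typed folios route through
                                                 * free_typed_folio() */
        }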
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:06 -0700
Message-ID: <779107f1ff8c79095ca0b2d7921e4c54e20861ad.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 27/51] mm: hugetlb: Expose hugetlb_subpool_{get,put}_pages()
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org
This will allow hugetlb subpools to be used by guestmem_hugetlb.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I909355935f2ab342e65e7bfdc106bedd1dc177c9
---
 include/linux/hugetlb.h | 3 +++
 mm/hugetlb.c            | 6 ++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c59264391c33..e6b90e72d46d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -119,6 +119,9 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
 					      long min_hpages, bool use_surplus);
 void hugepage_put_subpool(struct hugepage_subpool *spool);

+long hugepage_subpool_get_pages(struct hugepage_subpool *spool, long delta);
+long hugepage_subpool_put_pages(struct hugepage_subpool *spool, long delta);
+
 void hugetlb_dup_vma_private(struct vm_area_struct *vma);
 void clear_vma_resv_huge_pages(struct vm_area_struct *vma);
 int move_hugetlb_page_tables(struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d22c5a8fd441..816f257680be 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -205,8 +205,7 @@ void hugepage_put_subpool(struct hugepage_subpool *spool)
  * only be different than the passed value (delta) in the case where
  * a subpool minimum size must be maintained.
  */
-static long hugepage_subpool_get_pages(struct hugepage_subpool *spool,
-				       long delta)
+long hugepage_subpool_get_pages(struct hugepage_subpool *spool, long delta)
 {
 	long ret = delta;

@@ -250,8 +249,7 @@ static long hugepage_subpool_get_pages(struct hugepage_subpool *spool,
  * The return value may only be different than the passed value (delta)
 * in the case where a subpool minimum size must be maintained.
 */
-static long hugepage_subpool_put_pages(struct hugepage_subpool *spool,
-				       long delta)
+long hugepage_subpool_put_pages(struct hugepage_subpool *spool, long delta)
 {
 	long ret = delta;
 	unsigned long flags;
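A hypothetical sketch of the intended external caller: debit the subpool before allocating a folio and credit it back when freeing, mirroring what hugetlbfs does internally today. The example_* names are made up, and this assumes the existing convention that a negative return signals failure.

        /* Illustrative only: per-inode subpool accounting outside hugetlbfs. */
        static int example_charge(struct hugepage_subpool *spool, long npages)
        {
                /* Negative return: the subpool (or its maximum) cannot cover this. */
                if (hugepage_subpool_get_pages(spool, npages) < 0)
                        return -ENOSPC;
                return 0;
        }

        static void example_uncharge(struct hugepage_subpool *spool, long npages)
        {
                hugepage_subpool_put_pages(spool, npages);
        }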
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:07 -0700
Message-ID: <8f8b6d6f44cdc6b27db11e1e867dc92efca6d177.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 28/51] mm: Introduce guestmem_hugetlb to support
 folio_put() handling of guestmem pages
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org
The PGTY_guestmem_hugetlb page type is introduced so that folios can be
marked for further cleanup by guestmem_hugetlb. guestmem_hugetlb folios
can have positive mapcounts, which would conflict with the installation
of a page type. Hence, PGTY_guestmem_hugetlb is only installed when a
folio is truncated, after the folio has been unmapped and its mapcount
has dropped to 0.

Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I635f8929e06f73d7899737bd47090b7cbc7222dc
---
 include/linux/page-flags.h | 17 +++++++++++++++++
 mm/Kconfig                 | 10 ++++++++++
 mm/Makefile                |  1 +
 mm/debug.c                 |  1 +
 mm/guestmem_hugetlb.c      | 14 ++++++++++++++
 mm/guestmem_hugetlb.h      |  9 +++++++++
 mm/swap.c                  |  9 +++++++++
 7 files changed, 61 insertions(+)
 create mode 100644 mm/guestmem_hugetlb.c
 create mode 100644 mm/guestmem_hugetlb.h

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 9dd60fb8c33f..543f6481ca60 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -965,6 +965,7 @@ enum pagetype {
 	PGTY_zsmalloc		= 0xf6,
 	PGTY_unaccepted		= 0xf7,
 	PGTY_large_kmalloc	= 0xf8,
+	PGTY_guestmem_hugetlb	= 0xf9,
 
 	PGTY_mapcount_underflow = 0xff
 };
@@ -1114,6 +1115,22 @@ FOLIO_TYPE_OPS(hugetlb, hugetlb)
 FOLIO_TEST_FLAG_FALSE(hugetlb)
 #endif
 
+/*
+ * PGTY_guestmem_hugetlb, for now, is used to mark a folio as requiring
+ * further cleanup by the guestmem_hugetlb allocator. This page type is
+ * installed only at truncation time, by guest_memfd, if further cleanup
+ * is required. It is safe to install this page type at truncation time
+ * because by then mapcount would be 0.
+ *
+ * The plan is to always set this page type for any folios allocated by
+ * guestmem_hugetlb once typed folios can be mapped to userspace cleanly.
+ */
+#ifdef CONFIG_GUESTMEM_HUGETLB
+FOLIO_TYPE_OPS(guestmem_hugetlb, guestmem_hugetlb)
+#else
+FOLIO_TEST_FLAG_FALSE(guestmem_hugetlb)
+#endif
+
 PAGE_TYPE_OPS(Zsmalloc, zsmalloc, zsmalloc)
 
 /*
diff --git a/mm/Kconfig b/mm/Kconfig
index e113f713b493..131adc49f58d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1216,6 +1216,16 @@ config SECRETMEM
 	  memory areas visible only in the context of the owning process and
 	  not mapped to other processes and other kernel page tables.
 
+config GUESTMEM_HUGETLB
+	bool "Enable guestmem_hugetlb allocator for guest_memfd"
+	depends on HUGETLBFS
+	help
+	  Enable this to make HugeTLB folios available to guest_memfd
+	  (KVM virtualization) as backing memory.
+
+	  This feature wraps HugeTLB as a custom allocator that
+	  guest_memfd can use.
+
 config ANON_VMA_NAME
 	bool "Anonymous VMA name support"
 	depends on PROC_FS && ADVISE_SYSCALLS && MMU
diff --git a/mm/Makefile b/mm/Makefile
index e7f6bbf8ae5f..c91c8e8fef71 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -127,6 +127,7 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
 obj-$(CONFIG_PAGE_TABLE_CHECK) += page_table_check.o
 obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
 obj-$(CONFIG_SECRETMEM) += secretmem.o
+obj-$(CONFIG_GUESTMEM_HUGETLB) += guestmem_hugetlb.o
 obj-$(CONFIG_CMA_SYSFS) += cma_sysfs.o
 obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
 obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
diff --git a/mm/debug.c b/mm/debug.c
index db83e381a8ae..439ab128772d 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -56,6 +56,7 @@ static const char *page_type_names[] = {
 	DEF_PAGETYPE_NAME(table),
 	DEF_PAGETYPE_NAME(buddy),
 	DEF_PAGETYPE_NAME(unaccepted),
+	DEF_PAGETYPE_NAME(guestmem_hugetlb),
 };
 
 static const char *page_type_name(unsigned int page_type)
diff --git a/mm/guestmem_hugetlb.c b/mm/guestmem_hugetlb.c
new file mode 100644
index 000000000000..51a724ebcc50
--- /dev/null
+++ b/mm/guestmem_hugetlb.c
@@ -0,0 +1,14 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * guestmem_hugetlb is an allocator for guest_memfd. guest_memfd wraps
+ * HugeTLB as an allocator for guest_memfd.
+ */
+
+#include
+
+#include "guestmem_hugetlb.h"
+
+void guestmem_hugetlb_handle_folio_put(struct folio *folio)
+{
+	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
+}
diff --git a/mm/guestmem_hugetlb.h b/mm/guestmem_hugetlb.h
new file mode 100644
index 000000000000..5c9452b77252
--- /dev/null
+++ b/mm/guestmem_hugetlb.h
@@ -0,0 +1,9 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MM_GUESTMEM_HUGETLB_H
+#define _LINUX_MM_GUESTMEM_HUGETLB_H
+
+#include
+
+void guestmem_hugetlb_handle_folio_put(struct folio *folio);
+
+#endif
diff --git a/mm/swap.c b/mm/swap.c
index d0a5971787c4..2747230ced89 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -40,6 +40,10 @@
 
 #include "internal.h"
 
+#ifdef CONFIG_GUESTMEM_HUGETLB
+#include "guestmem_hugetlb.h"
+#endif
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/pagemap.h>
 
@@ -101,6 +105,11 @@ static void free_typed_folio(struct folio *folio)
 	case PGTY_hugetlb:
 		free_huge_folio(folio);
 		return;
+#endif
+#ifdef CONFIG_GUESTMEM_HUGETLB
+	case PGTY_guestmem_hugetlb:
+		guestmem_hugetlb_handle_folio_put(folio);
+		return;
 #endif
 	default:
 		WARN_ON_ONCE(1);
-- 
2.49.0.1045.g170613ef41-goog
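For orientation: FOLIO_TYPE_OPS(guestmem_hugetlb, guestmem_hugetlb) generates helpers named folio_test_guestmem_hugetlb(), __folio_set_guestmem_hugetlb() and __folio_clear_guestmem_hugetlb(). A hypothetical truncation-side call site, which is not part of this patch, would look roughly like this:

/*
 * Sketch only: mark a fully-unmapped folio for guestmem_hugetlb cleanup.
 */
static void example_mark_guestmem_folio(struct folio *folio)
{
	/*
	 * A page type overlays the mapcount field, so it may only be
	 * installed once the folio is unmapped (mapcount == 0), which is
	 * why this series installs it at truncation time.
	 */
	VM_WARN_ON_ONCE(folio_mapped(folio));

	__folio_set_guestmem_hugetlb(folio);

	/*
	 * The final put now goes through free_typed_folio(), which
	 * dispatches PGTY_guestmem_hugetlb folios to
	 * guestmem_hugetlb_handle_folio_put().
	 */
	folio_put(folio);
}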
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:08 -0700
Subject: [RFC PATCH v2 29/51] mm: guestmem_hugetlb: Wrap HugeTLB as an allocator for guest_memfd
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

guestmem_hugetlb is an allocator for guest_memfd. It wraps HugeTLB to
provide huge folios for guest_memfd.

This patch also introduces guestmem_allocator_operations as a set of
operations that allocators for guest_memfd can provide. In a later
patch, guest_memfd will use these operations to manage pages from an
allocator.

The allocator operations are memory-management specific and are placed
in mm/ so that key mm-specific functions do not have to be exposed
unnecessarily.

Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I3cafe111ea7b3c84755d7112ff8f8c541c11136d
---
 include/linux/guestmem.h      |  20 +++++
 include/uapi/linux/guestmem.h |  29 +++++++
 mm/Kconfig                    |   5 +-
 mm/guestmem_hugetlb.c         | 159 ++++++++++++++++++++++++++++++++++
 4 files changed, 212 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/guestmem.h
 create mode 100644 include/uapi/linux/guestmem.h

diff --git a/include/linux/guestmem.h b/include/linux/guestmem.h
new file mode 100644
index 000000000000..4b2d820274d9
--- /dev/null
+++ b/include/linux/guestmem.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_GUESTMEM_H
+#define _LINUX_GUESTMEM_H
+
+#include
+
+struct guestmem_allocator_operations {
+	void *(*inode_setup)(size_t size, u64 flags);
+	void (*inode_teardown)(void *private, size_t inode_size);
+	struct folio *(*alloc_folio)(void *private);
+	/*
+	 * Returns the number of PAGE_SIZE pages in a folio that this
+	 * guestmem allocator provides.
+	 */
+	size_t (*nr_pages_in_folio)(void *priv);
+};
+
+extern const struct guestmem_allocator_operations guestmem_hugetlb_ops;
+
+#endif
diff --git a/include/uapi/linux/guestmem.h b/include/uapi/linux/guestmem.h
new file mode 100644
index 000000000000..2e518682edd5
--- /dev/null
+++ b/include/uapi/linux/guestmem.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_GUESTMEM_H
+#define _UAPI_LINUX_GUESTMEM_H
+
+/*
+ * Huge page size must be explicitly defined when using the
+ * guestmem_hugetlb allocator for guest_memfd. It is the responsibility
+ * of the application to know which sizes are supported on the running
+ * system. See mmap(2) man page for details.
+ */
+
+#define GUESTMEM_HUGETLB_FLAG_SHIFT	58
+#define GUESTMEM_HUGETLB_FLAG_MASK	0x3fUL
+
+#define GUESTMEM_HUGETLB_FLAG_16KB	(14UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_64KB	(16UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_512KB	(19UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_1MB	(20UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_2MB	(21UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_8MB	(23UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_16MB	(24UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_32MB	(25UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_256MB	(28UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_512MB	(29UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_1GB	(30UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_2GB	(31UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+#define GUESTMEM_HUGETLB_FLAG_16GB	(34UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
+
+#endif /* _UAPI_LINUX_GUESTMEM_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 131adc49f58d..bb6e39e37245 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1218,7 +1218,10 @@ config SECRETMEM
 
 config GUESTMEM_HUGETLB
 	bool "Enable guestmem_hugetlb allocator for guest_memfd"
-	depends on HUGETLBFS
+	select GUESTMEM
+	select HUGETLBFS
+	select HUGETLB_PAGE
+	select HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	help
 	  Enable this to make HugeTLB folios available to guest_memfd
 	  (KVM virtualization) as backing memory.
diff --git a/mm/guestmem_hugetlb.c b/mm/guestmem_hugetlb.c
index 51a724ebcc50..5459ef7eb329 100644
--- a/mm/guestmem_hugetlb.c
+++ b/mm/guestmem_hugetlb.c
@@ -5,6 +5,14 @@
  */
 
 #include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include
 
 #include "guestmem_hugetlb.h"
 
@@ -12,3 +20,154 @@ void guestmem_hugetlb_handle_folio_put(struct folio *folio)
Work in progress."); } + +struct guestmem_hugetlb_private { + struct hstate *h; + struct hugepage_subpool *spool; + struct hugetlb_cgroup *h_cg_rsvd; +}; + +static size_t guestmem_hugetlb_nr_pages_in_folio(void *priv) +{ + struct guestmem_hugetlb_private *private =3D priv; + + return pages_per_huge_page(private->h); +} + +static void *guestmem_hugetlb_setup(size_t size, u64 flags) + +{ + struct guestmem_hugetlb_private *private; + struct hugetlb_cgroup *h_cg_rsvd =3D NULL; + struct hugepage_subpool *spool; + unsigned long nr_pages; + int page_size_log; + struct hstate *h; + long hpages; + int idx; + int ret; + + page_size_log =3D (flags >> GUESTMEM_HUGETLB_FLAG_SHIFT) & + GUESTMEM_HUGETLB_FLAG_MASK; + h =3D hstate_sizelog(page_size_log); + if (!h) + return ERR_PTR(-EINVAL); + + /* + * Check against h because page_size_log could be 0 to request default + * HugeTLB page size. + */ + if (!IS_ALIGNED(size, huge_page_size(h))) + return ERR_PTR(-EINVAL); + + private =3D kzalloc(sizeof(*private), GFP_KERNEL); + if (!private) + return ERR_PTR(-ENOMEM); + + /* Creating a subpool makes reservations, hence charge for them now. */ + idx =3D hstate_index(h); + nr_pages =3D size >> PAGE_SHIFT; + ret =3D hugetlb_cgroup_charge_cgroup_rsvd(idx, nr_pages, &h_cg_rsvd); + if (ret) + goto err_free; + + hpages =3D size >> huge_page_shift(h); + spool =3D hugepage_new_subpool(h, hpages, hpages, false); + if (!spool) + goto err_uncharge; + + private->h =3D h; + private->spool =3D spool; + private->h_cg_rsvd =3D h_cg_rsvd; + + return private; + +err_uncharge: + ret =3D -ENOMEM; + hugetlb_cgroup_uncharge_cgroup_rsvd(idx, nr_pages, h_cg_rsvd); +err_free: + kfree(private); + return ERR_PTR(ret); +} + +static void guestmem_hugetlb_teardown(void *priv, size_t inode_size) +{ + struct guestmem_hugetlb_private *private =3D priv; + unsigned long nr_pages; + int idx; + + hugepage_put_subpool(private->spool); + + idx =3D hstate_index(private->h); + nr_pages =3D inode_size >> PAGE_SHIFT; + hugetlb_cgroup_uncharge_cgroup_rsvd(idx, nr_pages, private->h_cg_rsvd); + + kfree(private); +} + +static struct folio *guestmem_hugetlb_alloc_folio(void *priv) +{ + struct guestmem_hugetlb_private *private =3D priv; + struct mempolicy *mpol; + struct folio *folio; + pgoff_t ilx; + int ret; + + ret =3D hugepage_subpool_get_pages(private->spool, 1); + if (ret =3D=3D -ENOMEM) { + return ERR_PTR(-ENOMEM); + } else if (ret > 0) { + /* guest_memfd will not use surplus pages. */ + goto err_put_pages; + } + + /* + * TODO: mempolicy would probably have to be stored on the inode, use + * task policy for now. + */ + mpol =3D get_task_policy(current); + + /* TODO: ignore interleaving for now. */ + ilx =3D NO_INTERLEAVE_INDEX; + + /* + * charge_cgroup_rsvd is false because we already charged reservations + * when creating the subpool for this + * guest_memfd. use_existing_reservation is true - we're using a + * reservation from the guest_memfd's subpool. + */ + folio =3D hugetlb_alloc_folio(private->h, mpol, ilx, false, true); + mpol_cond_put(mpol); + + if (IS_ERR_OR_NULL(folio)) + goto err_put_pages; + + /* + * Clear restore_reserve here so that when this folio is freed, + * free_huge_folio() will always attempt to return the reservation to + * the subpool. guest_memfd, unlike regular hugetlb, has no resv_map, + * and hence when freeing, the folio needs to be returned to the + * subpool. 
+	 * guest_memfd does not use surplus hugetlb pages, so in
+	 * free_huge_folio(), returning to the subpool will always succeed
+	 * and the hstate reservation will then get restored.
+	 *
+	 * hugetlbfs does this in hugetlb_add_to_page_cache().
+	 */
+	folio_clear_hugetlb_restore_reserve(folio);
+
+	hugetlb_set_folio_subpool(folio, private->spool);
+
+	return folio;
+
+err_put_pages:
+	hugepage_subpool_put_pages(private->spool, 1);
+	return ERR_PTR(-ENOMEM);
+}
+
+const struct guestmem_allocator_operations guestmem_hugetlb_ops = {
+	.inode_setup = guestmem_hugetlb_setup,
+	.inode_teardown = guestmem_hugetlb_teardown,
+	.alloc_folio = guestmem_hugetlb_alloc_folio,
+	.nr_pages_in_folio = guestmem_hugetlb_nr_pages_in_folio,
+};
+EXPORT_SYMBOL_GPL(guestmem_hugetlb_ops);
-- 
2.49.0.1045.g170613ef41-goog
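The uapi flags above pack log2(huge page size) into bits 58-63, in the same style as the MAP_HUGE_* flags for mmap(2). A small standalone illustration of the encode/decode round trip performed by guestmem_hugetlb_setup():

#include <stdio.h>
#include <stdint.h>

#define GUESTMEM_HUGETLB_FLAG_SHIFT	58
#define GUESTMEM_HUGETLB_FLAG_MASK	0x3fUL
#define GUESTMEM_HUGETLB_FLAG_2MB	(21UL << GUESTMEM_HUGETLB_FLAG_SHIFT)
#define GUESTMEM_HUGETLB_FLAG_1GB	(30UL << GUESTMEM_HUGETLB_FLAG_SHIFT)

int main(void)
{
	uint64_t flags = GUESTMEM_HUGETLB_FLAG_1GB;

	/* The decode step used by guestmem_hugetlb_setup(). */
	int page_size_log = (flags >> GUESTMEM_HUGETLB_FLAG_SHIFT) &
			    GUESTMEM_HUGETLB_FLAG_MASK;

	/* Prints 30, i.e. 1 GiB; a log of 0 selects the default size. */
	printf("page_size_log=%d (%llu bytes)\n", page_size_log,
	       (unsigned long long)1 << page_size_log);
	return 0;
}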
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:09 -0700
Subject: [RFC PATCH v2 30/51] mm: truncate: Expose truncate_inode_folio()
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

guest_memfd will be using truncate_inode_folio() to remove folios from
guest_memfd's filemap.

Change-Id: Iab72c6d4138cf19f6efeb38341eabe28ded42fd6
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 include/linux/mm.h    | 1 +
 mm/guestmem_hugetlb.c | 2 +-
 mm/internal.h         | 1 -
 mm/truncate.c         | 1 +
 4 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e4e73c231ced..74ca6b7d1d43 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2530,6 +2530,7 @@ extern void truncate_pagecache(struct inode *inode, loff_t new);
 extern void truncate_setsize(struct inode *inode, loff_t newsize);
 void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
 void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
+int truncate_inode_folio(struct address_space *mapping, struct folio *folio);
 int generic_error_remove_folio(struct address_space *mapping,
 			       struct folio *folio);
 
diff --git a/mm/guestmem_hugetlb.c b/mm/guestmem_hugetlb.c
index 5459ef7eb329..ec5a188ca2a7 100644
--- a/mm/guestmem_hugetlb.c
+++ b/mm/guestmem_hugetlb.c
@@ -4,12 +4,12 @@
  * as an allocator for guest_memfd.
  */
 
-#include
 #include
 #include
 #include
 #include
 #include
+#include
 
 #include
 
diff --git a/mm/internal.h b/mm/internal.h
index 25a29872c634..a1694f030539 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -448,7 +448,6 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
 unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
 		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices);
 void filemap_free_folio(struct address_space *mapping, struct folio *folio);
-int truncate_inode_folio(struct address_space *mapping, struct folio *folio);
 bool truncate_inode_partial_folio(struct folio *folio, loff_t start,
 		loff_t end);
 long mapping_evict_folio(struct address_space *mapping, struct folio *folio);
diff --git a/mm/truncate.c b/mm/truncate.c
index 057e4aa73aa9..4baab1e5d2cf 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -176,6 +176,7 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
 	filemap_remove_folio(folio);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(truncate_inode_folio);
 
 /*
  * Handle partial folios. The folio may be entirely within the
-- 
2.49.0.1045.g170613ef41-goog
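With truncate_inode_folio() exported, a module holding its own filemap can drop folios from it directly. The sketch below condenses how guest_memfd uses it later in this series (kvm_gmem_truncate_indices() in patch 32); the folio must be locked across the call, and batch iteration keeps the loop preemption-friendly:

#include <linux/pagemap.h>
#include <linux/pagevec.h>

/* Sketch: truncate every folio in [start, last] out of a filemap. */
static long example_truncate_range(struct address_space *mapping,
				   pgoff_t start, pgoff_t last)
{
	struct folio_batch fbatch;
	long truncated = 0;

	folio_batch_init(&fbatch);
	while (filemap_get_folios(mapping, &start, last, &fbatch)) {
		unsigned int i;

		for (i = 0; i < folio_batch_count(&fbatch); i++) {
			struct folio *folio = fbatch.folios[i];

			truncated += folio_nr_pages(folio);
			folio_lock(folio);
			truncate_inode_folio(folio->mapping, folio);
			folio_unlock(folio);
		}

		folio_batch_release(&fbatch);
		cond_resched();
	}

	return truncated;
}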
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:10 -0700
Message-ID: <4acb9139318e3ae35d61ed7da9d41db2e328dc40.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 31/51] KVM: x86: Set disallow_lpage on base_gfn and guest_memfd pgoff misalignment
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

When slot->base_gfn and userspace_addr are not aligned wrt each other,
large page support is disabled for the entire memslot.

This patch applies the same logic for when slot->base_gfn and
gmem.pgoff are not aligned wrt each other.
Change-Id: Iab21b8995e77beae6dbadc3b623a1e9e07e6dce6
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 arch/x86/kvm/x86.c | 53 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 44 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 12433b1e755b..ee0e3420cc17 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12950,6 +12950,46 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages)
 	return 0;
 }
 
+static inline bool kvm_is_level_aligned(u64 value, int level)
+{
+	return IS_ALIGNED(value, KVM_PAGES_PER_HPAGE(level));
+}
+
+static inline bool
+kvm_should_allow_lpage_for_slot(struct kvm_memory_slot *slot, int level)
+{
+	bool gfn_and_userspace_addr_aligned;
+	unsigned long ugfn;
+
+	ugfn = slot->userspace_addr >> PAGE_SHIFT;
+
+	/*
+	 * If addresses are not aligned wrt each other, then large page
+	 * mapping cannot be allowed for the slot since page tables only
+	 * allow guest to host translations to function at fixed levels.
+	 */
+	gfn_and_userspace_addr_aligned =
+		kvm_is_level_aligned(slot->base_gfn ^ ugfn, level);
+
+	/*
+	 * If slot->userspace_addr is 0 (disabled), 0 is always aligned so
+	 * the check is deferred to gmem.pgoff.
+	 */
+	if (!gfn_and_userspace_addr_aligned)
+		return false;
+
+	if (kvm_slot_has_gmem(slot)) {
+		bool gfn_and_gmem_pgoff_aligned;
+
+		gfn_and_gmem_pgoff_aligned = kvm_is_level_aligned(
+			slot->base_gfn ^ slot->gmem.pgoff, level);
+
+		return gfn_and_gmem_pgoff_aligned;
+	}
+
+	return true;
+}
+
 static int kvm_alloc_memslot_metadata(struct kvm *kvm,
 				      struct kvm_memory_slot *slot)
 {
@@ -12971,7 +13011,6 @@ static int kvm_alloc_memslot_metadata(struct kvm *kvm,
 
 	for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
 		struct kvm_lpage_info *linfo;
-		unsigned long ugfn;
 		int lpages;
 		int level = i + 1;
 
@@ -12983,16 +13022,12 @@ static int kvm_alloc_memslot_metadata(struct kvm *kvm,
 
 		slot->arch.lpage_info[i - 1] = linfo;
 
-		if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
+		if (!kvm_is_level_aligned(slot->base_gfn, level))
 			linfo[0].disallow_lpage = 1;
-		if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 1))
+		if (!kvm_is_level_aligned(slot->base_gfn + npages, level))
 			linfo[lpages - 1].disallow_lpage = 1;
-		ugfn = slot->userspace_addr >> PAGE_SHIFT;
-		/*
-		 * If the gfn and userspace address are not aligned wrt each
-		 * other, disable large page support for this slot.
-		 */
-		if ((slot->base_gfn ^ ugfn) & (KVM_PAGES_PER_HPAGE(level) - 1)) {
+
+		if (!kvm_should_allow_lpage_for_slot(slot, level)) {
 			unsigned long j;
 
 			for (j = 0; j < lpages; ++j)
-- 
2.49.0.1045.g170613ef41-goog
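The XOR-based check above tests congruence: a large page can translate guest frames to backing offsets at a given level only if the two base offsets agree in all bits below the level's page count. A standalone demonstration at the 2M level (512 base pages):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGES_PER_2M 512ULL	/* 4K base pages per 2M huge page */

static bool is_level_aligned(uint64_t value, uint64_t pages_per_hpage)
{
	return (value & (pages_per_hpage - 1)) == 0;
}

int main(void)
{
	uint64_t base_gfn = 0x400;

	/* Congruent mod 512: a 2M mapping can translate both offsets. */
	printf("%d\n", is_level_aligned(base_gfn ^ 0xa00, PAGES_PER_2M));

	/* Off by one 4K page: disallow_lpage must be set. */
	printf("%d\n", is_level_aligned(base_gfn ^ 0xa01, PAGES_PER_2M));
	return 0;
}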
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:11 -0700
Message-ID: <4d16522293c9a3eacdbe30148b6d6c8ad2eb5908.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 32/51] KVM: guest_memfd: Support guestmem_hugetlb as custom allocator
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

This patch adds support for guestmem_hugetlb as the first custom
allocator in guest_memfd. If requested at guest_memfd creation time,
the custom allocator is used for inode initialization and cleanup.
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Change-Id: I1eb9625dc761ecadcc2aa21480cfdfcf9ab7ce67
---
 include/uapi/linux/kvm.h |   1 +
 virt/kvm/Kconfig         |   5 +
 virt/kvm/guest_memfd.c   | 203 +++++++++++++++++++++++++++++++++++++--
 3 files changed, 199 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 433e184f83ea..af486b2e4862 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1571,6 +1571,7 @@ struct kvm_memory_attributes {
 
 #define GUEST_MEMFD_FLAG_SUPPORT_SHARED	(1UL << 0)
 #define GUEST_MEMFD_FLAG_INIT_PRIVATE	(1UL << 1)
+#define GUEST_MEMFD_FLAG_HUGETLB	(1UL << 2)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 14ffd9c1d480..ff917bb57371 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -133,3 +133,8 @@ config KVM_GMEM_SHARED_MEM
 	select KVM_GMEM
 	bool
 	prompt "Enables in-place shared memory for guest_memfd"
+
+config KVM_GMEM_HUGETLB
+	select KVM_PRIVATE_MEM
+	depends on GUESTMEM_HUGETLB
+	bool "Enables using a custom allocator with guest_memfd, see CONFIG_GUESTMEM_HUGETLB"
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 8c9c9e54616b..c65d93c5a443 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -3,11 +3,14 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 
+#include
+
 #include "kvm_mm.h"
 
 static struct vfsmount *kvm_gmem_mnt;
@@ -22,6 +25,10 @@ struct kvm_gmem_inode_private {
 #ifdef CONFIG_KVM_GMEM_SHARED_MEM
 	struct maple_tree shareability;
 #endif
+#ifdef CONFIG_KVM_GMEM_HUGETLB
+	const struct guestmem_allocator_operations *allocator_ops;
+	void *allocator_private;
+#endif
 };
 
 enum shareability {
@@ -40,6 +47,44 @@ static struct kvm_gmem_inode_private *kvm_gmem_private(struct inode *inode)
 	return inode->i_mapping->i_private_data;
 }
 
+#ifdef CONFIG_KVM_GMEM_HUGETLB
+
+static const struct guestmem_allocator_operations *
+kvm_gmem_allocator_ops(struct inode *inode)
+{
+	return kvm_gmem_private(inode)->allocator_ops;
+}
+
+static void *kvm_gmem_allocator_private(struct inode *inode)
+{
+	return kvm_gmem_private(inode)->allocator_private;
+}
+
+static bool kvm_gmem_has_custom_allocator(struct inode *inode)
+{
+	return kvm_gmem_allocator_ops(inode) != NULL;
+}
+
+#else
+
+static const struct guestmem_allocator_operations *
+kvm_gmem_allocator_ops(struct inode *inode)
+{
+	return NULL;
+}
+
+static void *kvm_gmem_allocator_private(struct inode *inode)
+{
+	return NULL;
+}
+
+static bool kvm_gmem_has_custom_allocator(struct inode *inode)
+{
+	return false;
+}
+
+#endif
+
 /**
  * folio_file_pfn - like folio_file_page, but return a pfn.
  * @folio: The folio which contains this index.
@@ -510,7 +555,6 @@ static int kvm_gmem_filemap_add_folio(struct address_space *mapping,
 static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 {
 	struct folio *folio;
-	gfp_t gfp;
 	int ret;
 
 repeat:
@@ -518,17 +562,24 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 	if (!IS_ERR(folio))
 		return folio;
 
-	gfp = mapping_gfp_mask(inode->i_mapping);
+	if (kvm_gmem_has_custom_allocator(inode)) {
+		void *p = kvm_gmem_allocator_private(inode);
 
-	/* TODO: Support huge pages. */
-	folio = filemap_alloc_folio(gfp, 0);
-	if (!folio)
-		return ERR_PTR(-ENOMEM);
+		folio = kvm_gmem_allocator_ops(inode)->alloc_folio(p);
+		if (IS_ERR(folio))
+			return folio;
+	} else {
+		gfp_t gfp = mapping_gfp_mask(inode->i_mapping);
 
-	ret = mem_cgroup_charge(folio, NULL, gfp);
-	if (ret) {
-		folio_put(folio);
-		return ERR_PTR(ret);
+		folio = filemap_alloc_folio(gfp, 0);
+		if (!folio)
+			return ERR_PTR(-ENOMEM);
+
+		ret = mem_cgroup_charge(folio, NULL, gfp);
+		if (ret) {
+			folio_put(folio);
+			return ERR_PTR(ret);
+		}
 	}
 
 	ret = kvm_gmem_filemap_add_folio(inode->i_mapping, folio, index);
@@ -611,6 +662,80 @@ static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
 	}
 }
 
+/**
+ * kvm_gmem_truncate_indices() - Truncates all folios beginning @index for
+ * @nr_pages.
+ *
+ * @mapping: filemap to truncate pages from.
+ * @index: the index in the filemap to begin truncation.
+ * @nr_pages: number of PAGE_SIZE pages to truncate.
+ *
+ * Return: the number of PAGE_SIZE pages that were actually truncated.
+ */
+static long kvm_gmem_truncate_indices(struct address_space *mapping,
+				      pgoff_t index, size_t nr_pages)
+{
+	struct folio_batch fbatch;
+	long truncated;
+	pgoff_t last;
+
+	last = index + nr_pages - 1;
+
+	truncated = 0;
+	folio_batch_init(&fbatch);
+	while (filemap_get_folios(mapping, &index, last, &fbatch)) {
+		unsigned int i;
+
+		for (i = 0; i < folio_batch_count(&fbatch); ++i) {
+			struct folio *f = fbatch.folios[i];
+
+			truncated += folio_nr_pages(f);
+			folio_lock(f);
+			truncate_inode_folio(f->mapping, f);
+			folio_unlock(f);
+		}
+
+		folio_batch_release(&fbatch);
+		cond_resched();
+	}
+
+	return truncated;
+}
+
+/**
+ * kvm_gmem_truncate_inode_aligned_pages() - Removes entire folios from
+ * filemap in @inode.
+ *
+ * @inode: inode to remove folios from.
+ * @index: start of range to be truncated. Must be hugepage aligned.
+ * @nr_pages: number of PAGE_SIZE pages to be iterated over.
+ *
+ * Removes folios beginning @index for @nr_pages from filemap in @inode,
+ * updates inode metadata.
+ */
+static void kvm_gmem_truncate_inode_aligned_pages(struct inode *inode,
+						  pgoff_t index,
+						  size_t nr_pages)
+{
+	size_t nr_per_huge_page;
+	long num_freed;
+	pgoff_t idx;
+	void *priv;
+
+	priv = kvm_gmem_allocator_private(inode);
+	nr_per_huge_page = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
+
+	num_freed = 0;
+	for (idx = index; idx < index + nr_pages; idx += nr_per_huge_page) {
+		num_freed += kvm_gmem_truncate_indices(
+				inode->i_mapping, idx, nr_per_huge_page);
+	}
+
+	spin_lock(&inode->i_lock);
+	inode->i_blocks -= (num_freed << PAGE_SHIFT) / 512;
+	spin_unlock(&inode->i_lock);
+}
+
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct list_head *gmem_list = &inode->i_mapping->i_private_list;
@@ -940,6 +1065,13 @@ static void kvm_gmem_free_inode(struct inode *inode)
 {
 	struct kvm_gmem_inode_private *private = kvm_gmem_private(inode);
 
+	/* private may be NULL if inode creation process had an error. */
+	if (private && kvm_gmem_has_custom_allocator(inode)) {
+		void *p = kvm_gmem_allocator_private(inode);
+
+		kvm_gmem_allocator_ops(inode)->inode_teardown(p, inode->i_size);
+	}
+
 	kfree(private);
 
 	free_inode_nonrcu(inode);
@@ -959,8 +1091,24 @@ static void kvm_gmem_destroy_inode(struct inode *inode)
 #endif
 }
 
+static void kvm_gmem_evict_inode(struct inode *inode)
+{
+	truncate_inode_pages_final_prepare(inode->i_mapping);
+
+	if (kvm_gmem_has_custom_allocator(inode)) {
+		size_t nr_pages = inode->i_size >> PAGE_SHIFT;
+
+		kvm_gmem_truncate_inode_aligned_pages(inode, 0, nr_pages);
+	} else {
+		truncate_inode_pages(inode->i_mapping, 0);
+	}
+
+	clear_inode(inode);
+}
+
 static const struct super_operations kvm_gmem_super_operations = {
 	.statfs		= simple_statfs,
+	.evict_inode	= kvm_gmem_evict_inode,
 	.destroy_inode	= kvm_gmem_destroy_inode,
 	.free_inode	= kvm_gmem_free_inode,
 };
@@ -1062,6 +1210,12 @@ static void kvm_gmem_free_folio(struct folio *folio)
 {
 	folio_clear_unevictable(folio);
 
+	/*
+	 * No-op for 4K page since the PG_uptodate is cleared as part of
+	 * freeing, but may be required for other allocators to reset page.
+	 */
+	folio_clear_uptodate(folio);
+
 	kvm_gmem_invalidate(folio);
 }
 
@@ -1115,6 +1269,25 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 	if (err)
 		goto out;
 
+#ifdef CONFIG_KVM_GMEM_HUGETLB
+	if (flags & GUEST_MEMFD_FLAG_HUGETLB) {
+		void *allocator_priv;
+		size_t nr_pages;
+
+		allocator_priv = guestmem_hugetlb_ops.inode_setup(size, flags);
+		if (IS_ERR(allocator_priv)) {
+			err = PTR_ERR(allocator_priv);
+			goto out;
+		}
+
+		private->allocator_ops = &guestmem_hugetlb_ops;
+		private->allocator_private = allocator_priv;
+
+		nr_pages = guestmem_hugetlb_ops.nr_pages_in_folio(allocator_priv);
+		inode->i_blkbits = ilog2(nr_pages << PAGE_SHIFT);
+	}
+#endif
+
 	inode->i_private = (void *)(unsigned long)flags;
 	inode->i_op = &kvm_gmem_iops;
 	inode->i_mapping->a_ops = &kvm_gmem_aops;
@@ -1210,6 +1383,10 @@ static int __kvm_gmem_create(struct kvm *kvm, loff_t size, u64 flags)
 	return err;
 }
 
+/* Mask of bits belonging to allocators and are opaque to guest_memfd. */
+#define SUPPORTED_CUSTOM_ALLOCATOR_MASK \
+	(GUESTMEM_HUGETLB_FLAG_MASK << GUESTMEM_HUGETLB_FLAG_SHIFT)
+
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 {
 	loff_t size = args->size;
@@ -1222,6 +1399,12 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 	if (flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED)
 		valid_flags |= GUEST_MEMFD_FLAG_INIT_PRIVATE;
 
+	if (IS_ENABLED(CONFIG_KVM_GMEM_HUGETLB) &&
+	    flags & GUEST_MEMFD_FLAG_HUGETLB) {
+		valid_flags |= GUEST_MEMFD_FLAG_HUGETLB |
+			       SUPPORTED_CUSTOM_ALLOCATOR_MASK;
+	}
+
 	if (flags & ~valid_flags)
 		return -EINVAL;
 
-- 
2.49.0.1045.g170613ef41-goog
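Putting the uapi pieces of this patch and include/uapi/linux/guestmem.h together, userspace would request a HugeTLB-backed guest_memfd roughly as follows. This is a sketch, assuming this series is applied, a valid KVM VM fd, and provisioned 2M HugeTLB pages:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <linux/guestmem.h>

/* Returns a guest_memfd on success, -1 on failure. */
static int example_create_hugetlb_gmem(int vm_fd, uint64_t size)
{
	struct kvm_create_guest_memfd args = {
		/* size must be a multiple of the chosen huge page size */
		.size = size,
		.flags = GUEST_MEMFD_FLAG_HUGETLB | GUESTMEM_HUGETLB_FLAG_2MB,
	};

	return ioctl(vm_fd, KVM_CREATE_GUEST_MEMFD, &args);
}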
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:12 -0700
Subject: [RFC PATCH v2 33/51] KVM: guest_memfd: Allocate and truncate from
 custom allocator
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org
If a custom allocator is requested at guest_memfd creation time, pages
from the custom allocator will be used to back guest_memfd.

Change-Id: I59df960b3273790f42fe5bea54a234f40962eb75
Signed-off-by: Ackerley Tng
---
 mm/memory.c            |   1 +
 virt/kvm/guest_memfd.c | 142 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 132 insertions(+), 11 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index ba3ea0a82f7f..3af45e96913c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -7249,6 +7249,7 @@ void folio_zero_user(struct folio *folio, unsigned long addr_hint)
 	else
 		process_huge_page(addr_hint, nr_pages, clear_subpage, folio);
 }
+EXPORT_SYMBOL_GPL(folio_zero_user);
 
 static int copy_user_gigantic_page(struct folio *dst, struct folio *src,
 				   unsigned long addr_hint,
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index c65d93c5a443..24d270b9b725 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -478,15 +478,13 @@ static inline void kvm_gmem_mark_prepared(struct folio *folio)
  * leaking host data and the up-to-date flag is set.
  */
 static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
-				  gfn_t gfn, struct folio *folio)
+				  gfn_t gfn, struct folio *folio,
+				  unsigned long addr_hint)
 {
-	unsigned long nr_pages, i;
 	pgoff_t index;
 	int r;
 
-	nr_pages = folio_nr_pages(folio);
-	for (i = 0; i < nr_pages; i++)
-		clear_highpage(folio_page(folio, i));
+	folio_zero_user(folio, addr_hint);
 
 	/*
 	 * Preparing huge folios should always be safe, since it should
@@ -554,7 +552,9 @@ static int kvm_gmem_filemap_add_folio(struct address_space *mapping,
  */
 static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 {
+	size_t allocated_size;
 	struct folio *folio;
+	pgoff_t index_floor;
 	int ret;
 
 repeat:
@@ -581,8 +581,10 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 			return ERR_PTR(ret);
 		}
 	}
+	allocated_size = folio_size(folio);
 
-	ret = kvm_gmem_filemap_add_folio(inode->i_mapping, folio, index);
+	index_floor = round_down(index, folio_nr_pages(folio));
+	ret = kvm_gmem_filemap_add_folio(inode->i_mapping, folio, index_floor);
 	if (ret) {
 		folio_put(folio);
 
@@ -598,7 +600,17 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 		return ERR_PTR(ret);
 	}
 
-	__folio_set_locked(folio);
+	spin_lock(&inode->i_lock);
+	inode->i_blocks += allocated_size / 512;
+	spin_unlock(&inode->i_lock);
+
+	/*
+	 * folio is the folio that was allocated; this gets the folio at the
+	 * requested index.
+	 */
+	folio = page_folio(folio_file_page(folio, index));
+	folio_lock(folio);
+
 	return folio;
 }
 
@@ -736,6 +748,92 @@ static void kvm_gmem_truncate_inode_aligned_pages(struct inode *inode,
 	spin_unlock(&inode->i_lock);
 }
 
+/**
+ * kvm_gmem_zero_range() - Zeroes all sub-pages in range [@start, @end).
+ *
+ * @mapping: the filemap holding the pages to zero.
+ * @start: index in filemap for start of range (inclusive).
+ * @end: index in filemap for end of range (exclusive).
+ *
+ * The pages in range may be split. truncate_inode_pages_range() isn't the right
+ * function because it removes pages from the page cache; this function only
+ * zeroes the pages.
+ */
+static void kvm_gmem_zero_range(struct address_space *mapping,
+				pgoff_t start, pgoff_t end)
+{
+	struct folio_batch fbatch;
+
+	folio_batch_init(&fbatch);
+	while (filemap_get_folios(mapping, &start, end - 1, &fbatch)) {
+		unsigned int i;
+
+		for (i = 0; i < folio_batch_count(&fbatch); ++i) {
+			struct folio *f;
+			size_t nr_bytes;
+
+			f = fbatch.folios[i];
+			nr_bytes = offset_in_folio(f, end << PAGE_SHIFT);
+			if (nr_bytes == 0)
+				nr_bytes = folio_size(f);
+
+			folio_zero_segment(f, 0, nr_bytes);
+		}
+
+		folio_batch_release(&fbatch);
+		cond_resched();
+	}
+}
+
+/**
+ * kvm_gmem_truncate_inode_range() - Truncate pages in range [@lstart, @lend).
+ *
+ * @inode: inode to truncate from.
+ * @lstart: offset in inode for start of range (inclusive).
+ * @lend: offset in inode for end of range (exclusive).
+ *
+ * Removes full (huge)pages from the filemap and zeroes incomplete
+ * (huge)pages. The pages in the range may be split.
+ */
+static void kvm_gmem_truncate_inode_range(struct inode *inode, loff_t lstart,
+					  loff_t lend)
+{
+	pgoff_t full_hpage_start;
+	size_t nr_per_huge_page;
+	pgoff_t full_hpage_end;
+	size_t nr_pages;
+	pgoff_t start;
+	pgoff_t end;
+	void *priv;
+
+	priv = kvm_gmem_allocator_private(inode);
+	nr_per_huge_page = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
+
+	start = lstart >> PAGE_SHIFT;
+	end = min(lend, i_size_read(inode)) >> PAGE_SHIFT;
+
+	full_hpage_start = round_up(start, nr_per_huge_page);
+	full_hpage_end = round_down(end, nr_per_huge_page);
+
+	if (start < full_hpage_start) {
+		pgoff_t zero_end = min(full_hpage_start, end);
+
+		kvm_gmem_zero_range(inode->i_mapping, start, zero_end);
+	}
+
+	if (full_hpage_end > full_hpage_start) {
+		nr_pages = full_hpage_end - full_hpage_start;
+		kvm_gmem_truncate_inode_aligned_pages(inode, full_hpage_start,
+						      nr_pages);
+	}
+
+	if (end > full_hpage_end && end > full_hpage_start) {
+		pgoff_t zero_start = max(full_hpage_end, start);
+
+		kvm_gmem_zero_range(inode->i_mapping, zero_start, end);
+	}
+}
+
 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	struct list_head *gmem_list = &inode->i_mapping->i_private_list;
@@ -752,7 +850,12 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_begin(gmem, start, end);
 
-	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
+	if (kvm_gmem_has_custom_allocator(inode)) {
+		kvm_gmem_truncate_inode_range(inode, offset, offset + len);
+	} else {
+		/*
+		 * Page size is PAGE_SIZE, so use the optimized truncation
+		 * function.
+		 */
+		truncate_inode_pages_range(inode->i_mapping, offset,
+					   offset + len - 1);
+	}
 
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_end(gmem, start, end);
@@ -776,6 +879,16 @@ static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
 
 	start = offset >> PAGE_SHIFT;
 	end = (offset + len) >> PAGE_SHIFT;
+	if (kvm_gmem_has_custom_allocator(inode)) {
+		size_t nr_pages;
+		void *p;
+
+		p = kvm_gmem_allocator_private(inode);
+		nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(p);
+
+		start = round_down(start, nr_pages);
+		end = round_down(end, nr_pages);
+	}
 
 	r = 0;
 	for (index = start; index < end; ) {
@@ -1570,7 +1683,7 @@ static struct folio *__kvm_gmem_get_pfn(struct file *file,
 
 	*pfn = folio_file_pfn(folio, index);
 	if (max_order)
-		*max_order = 0;
+		*max_order = folio_order(folio);
 
 	*is_prepared = folio_test_uptodate(folio);
 	return folio;
@@ -1597,8 +1710,15 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		goto out;
 	}
 
-	if (!is_prepared)
-		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio);
+	if (!is_prepared) {
+		/*
+		 * Use the same address as hugetlb for zeroing private pages
+		 * that won't be mapped to userspace anyway.
+		 */
+		unsigned long addr_hint = folio->index << PAGE_SHIFT;
+
+		r = kvm_gmem_prepare_folio(kvm, slot, gfn, folio, addr_hint);
+	}
 
 	folio_unlock(folio);
 
-- 
2.49.0.1045.g170613ef41-goog
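[Editorial sketch, not part of the patch] The head/middle/tail arithmetic in kvm_gmem_truncate_inode_range() is easiest to see with numbers. The standalone C below mirrors its round_up/round_down logic for an assumed 2M huge page size (512 x 4K pages) and a hole punched over pages [100, 1500); all values are made up for illustration.

	#include <stdio.h>

	#define ROUND_UP(x, a)   ((((x) + (a) - 1) / (a)) * (a))
	#define ROUND_DOWN(x, a) (((x) / (a)) * (a))

	int main(void)
	{
		unsigned long nr_per_huge_page = 512;	/* 2M / 4K, assumed */
		unsigned long start = 100, end = 1500;	/* punch [100, 1500) */

		unsigned long full_start = ROUND_UP(start, nr_per_huge_page);	/* 512 */
		unsigned long full_end = ROUND_DOWN(end, nr_per_huge_page);	/* 1024 */

		if (start < full_start)		/* partial head hugepage: zero only */
			printf("zero     [%lu, %lu)\n", start,
			       full_start < end ? full_start : end);
		if (full_end > full_start)	/* whole hugepages: drop from filemap */
			printf("truncate [%lu, %lu)\n", full_start, full_end);
		if (end > full_end && end > full_start)	/* partial tail: zero only */
			printf("zero     [%lu, %lu)\n",
			       full_end > start ? full_end : start, end);
		return 0;
	}

This prints zero [100, 512), truncate [512, 1024), zero [1024, 1500): only fully-covered hugepages leave the page cache, while partially-covered ones stay allocated and are merely zeroed.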
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:13 -0700
Message-ID: <45c797aa925e0d2830978105cdf12d6c39f0bd1f.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 34/51] mm: hugetlb: Add functions to add/delete folio
 from hugetlb lists
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org
These functions are introduced in hugetlb.c so that the private
hugetlb_lock can be accessed.

These functions will be used for splitting and merging pages in a later
patch.

Signed-off-by: Ackerley Tng
Co-developed-by: Vishal Annapurve
Signed-off-by: Vishal Annapurve
Change-Id: I42f8feda40cbd28e5fd02e54fa58145d847a220e
---
 include/linux/hugetlb.h |  2 ++
 mm/hugetlb.c            | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index e6b90e72d46d..e432ccfe3e63 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -156,6 +156,8 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						vm_flags_t vm_flags);
 long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 						long freed);
+void hugetlb_folio_list_add(struct folio *folio, struct list_head *list);
+void hugetlb_folio_list_del(struct folio *folio);
 bool folio_isolate_hugetlb(struct folio *folio, struct list_head *list);
 int get_hwpoison_hugetlb_folio(struct folio *folio, bool *hugetlb, bool unpoison);
 int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 816f257680be..6e326c09c505 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7473,6 +7473,28 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end,
 	return 0;
 }
 
+void hugetlb_folio_list_add(struct folio *folio, struct list_head *list)
+{
+	/*
+	 * hstate's hugepage_activelist is guarded by hugetlb_lock, hence hold
+	 * hugetlb_lock while modifying folio->lru.
+	 */
+	spin_lock_irq(&hugetlb_lock);
+	list_add(&folio->lru, list);
+	spin_unlock_irq(&hugetlb_lock);
+}
+
+void hugetlb_folio_list_del(struct folio *folio)
+{
+	/*
+	 * hstate's hugepage_activelist is guarded by hugetlb_lock, hence hold
+	 * hugetlb_lock while modifying folio->lru.
+	 */
+	spin_lock_irq(&hugetlb_lock);
+	list_del(&folio->lru);
+	spin_unlock_irq(&hugetlb_lock);
+}
+
 #ifdef CONFIG_HUGETLB_PMD_PAGE_TABLE_SHARING
 static unsigned long page_table_shareable(struct vm_area_struct *svma,
 					struct vm_area_struct *vma,
-- 
2.49.0.1045.g170613ef41-goog
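[Editorial sketch, not part of the patch] Because hugetlb_lock is private to mm/hugetlb.c, code outside that file cannot take the lock itself; these two helpers are the only safe way for it to touch the guarded list. The userspace C below shows the same encapsulation pattern, with a pthread mutex standing in for hugetlb_lock; it is illustrative only, not kernel code.

	#include <pthread.h>
	#include <stdio.h>

	struct node { struct node *next; int id; };

	/* Both stay private to this file, like hugetlb_lock and the list. */
	static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
	static struct node *active_list;

	static void list_add_locked(struct node *n)
	{
		pthread_mutex_lock(&list_lock);
		n->next = active_list;
		active_list = n;
		pthread_mutex_unlock(&list_lock);
	}

	static void list_del_locked(struct node *n)
	{
		pthread_mutex_lock(&list_lock);
		for (struct node **p = &active_list; *p; p = &(*p)->next) {
			if (*p == n) {
				*p = n->next;
				break;
			}
		}
		pthread_mutex_unlock(&list_lock);
	}

	int main(void)
	{
		struct node a = { .id = 1 };

		list_add_locked(&a);	/* callers only ever see the accessors */
		list_del_locked(&a);
		printf("list empty: %d\n", active_list == NULL);
		return 0;
	}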
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:14 -0700
Message-ID: <2ae41e0d80339da2b57011622ac2288fed65cd01.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 35/51] mm: guestmem_hugetlb: Add support for splitting
 and merging pages
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org

These functions allow guest_memfd to split and merge HugeTLB pages, and
clean them up on freeing the page.
For merging and splitting pages on conversion, guestmem_hugetlb expects
the refcount on the pages to already be 0; the caller must ensure that.

For conversions, guest_memfd ensures that the refcounts are already 0
by checking that there are no unexpected refcounts, and then freezing
the expected refcounts away. On unexpected refcounts, guest_memfd will
return an error to userspace.

For truncation, on unexpected refcounts, guest_memfd will return an
error to userspace.

For truncation on closing, guest_memfd will just remove its own
refcounts (the filemap refcounts) and mark split pages with
PGTY_guestmem_hugetlb. The presence of PGTY_guestmem_hugetlb will
trigger the folio_put() callback to handle further cleanup. This
cleanup process will merge pages (with refcount 0, since cleanup is
triggered from folio_put()) before returning the pages to HugeTLB.
Because merging is a long process and folio_put() may be called from
atomic context, the merge is deferred to a worker thread.

Change-Id: Ib04a3236f1e7250fd9af827630c334d40fb09d40
Signed-off-by: Ackerley Tng
Co-developed-by: Vishal Annapurve
Signed-off-by: Vishal Annapurve
---
 include/linux/guestmem.h |   3 +
 mm/guestmem_hugetlb.c    | 349 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 347 insertions(+), 5 deletions(-)

diff --git a/include/linux/guestmem.h b/include/linux/guestmem.h
index 4b2d820274d9..3ee816d1dd34 100644
--- a/include/linux/guestmem.h
+++ b/include/linux/guestmem.h
@@ -8,6 +8,9 @@ struct guestmem_allocator_operations {
 	void *(*inode_setup)(size_t size, u64 flags);
 	void (*inode_teardown)(void *private, size_t inode_size);
 	struct folio *(*alloc_folio)(void *private);
+	int (*split_folio)(struct folio *folio);
+	void (*merge_folio)(struct folio *folio);
+	void (*free_folio)(struct folio *folio);
 	/*
 	 * Returns the number of PAGE_SIZE pages in a page that this guestmem
 	 * allocator provides.
diff --git a/mm/guestmem_hugetlb.c b/mm/guestmem_hugetlb.c
index ec5a188ca2a7..8727598cf18e 100644
--- a/mm/guestmem_hugetlb.c
+++ b/mm/guestmem_hugetlb.c
@@ -11,15 +11,12 @@
 #include <...>
 #include <...>
 #include <...>
+#include <...>
 
 #include <...>
 
 #include "guestmem_hugetlb.h"
-
-void guestmem_hugetlb_handle_folio_put(struct folio *folio)
-{
-	WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
-}
+#include "hugetlb_vmemmap.h"
 
 struct guestmem_hugetlb_private {
 	struct hstate *h;
@@ -34,6 +31,339 @@ static size_t guestmem_hugetlb_nr_pages_in_folio(void *priv)
 	return pages_per_huge_page(private->h);
 }
 
+static DEFINE_XARRAY(guestmem_hugetlb_stash);
+
+struct guestmem_hugetlb_metadata {
+	void *_hugetlb_subpool;
+	void *_hugetlb_cgroup;
+	void *_hugetlb_hwpoison;
+	void *private;
+};
+
+struct guestmem_hugetlb_stash_item {
+	struct guestmem_hugetlb_metadata hugetlb_metadata;
+	/* hstate tracks the original size of this folio. */
+	struct hstate *h;
+	/*
+	 * Count of split pages, individually freed, waiting to be merged.
+	 */
+	atomic_t nr_pages_waiting_to_be_merged;
+};
+
+struct workqueue_struct *guestmem_hugetlb_wq __ro_after_init;
+static struct work_struct guestmem_hugetlb_cleanup_work;
+static LLIST_HEAD(guestmem_hugetlb_cleanup_list);
+
+static inline void guestmem_hugetlb_register_folio_put_callback(struct folio *folio)
+{
+	__folio_set_guestmem_hugetlb(folio);
+}
+
+static inline void guestmem_hugetlb_unregister_folio_put_callback(struct folio *folio)
+{
+	__folio_clear_guestmem_hugetlb(folio);
+}
+
+static inline void guestmem_hugetlb_defer_cleanup(struct folio *folio)
+{
+	struct llist_node *node;
+
+	/*
+	 * Reuse the folio->mapping pointer as a struct llist_node, since
+	 * folio->mapping is NULL at this point.
+	 */
+	BUILD_BUG_ON(sizeof(folio->mapping) != sizeof(struct llist_node));
+	node = (struct llist_node *)&folio->mapping;
+
+	/*
+	 * Only schedule work if the list was previously empty. Otherwise,
+	 * schedule_work() had been called but the workfn hasn't retrieved the
+	 * list yet.
+	 */
+	if (llist_add(node, &guestmem_hugetlb_cleanup_list))
+		queue_work(guestmem_hugetlb_wq, &guestmem_hugetlb_cleanup_work);
+}
+
+void guestmem_hugetlb_handle_folio_put(struct folio *folio)
+{
+	guestmem_hugetlb_unregister_folio_put_callback(folio);
+
+	/*
+	 * folio_put() can be called in interrupt context, hence do the work
+	 * outside of interrupt context.
+	 */
+	guestmem_hugetlb_defer_cleanup(folio);
+}
+
+/*
+ * Stash existing hugetlb metadata. Use this function just before splitting a
+ * hugetlb page.
+ */
+static inline void
+__guestmem_hugetlb_stash_metadata(struct guestmem_hugetlb_metadata *metadata,
+				  struct folio *folio)
+{
+	/*
+	 * (folio->page + 1) doesn't have to be stashed since those fields are
+	 * known on split/reconstruct and will be reinitialized anyway.
+	 */
+
+	/*
+	 * subpool is created for every guest_memfd inode, but the folios will
+	 * outlive the inode, hence we store the subpool here.
+	 */
+	metadata->_hugetlb_subpool = folio->_hugetlb_subpool;
+	/*
+	 * _hugetlb_cgroup has to be stored for freeing
+	 * later. _hugetlb_cgroup_rsvd does not, since it is NULL for
+	 * guest_memfd folios anyway. guest_memfd reservations are handled in
+	 * the inode.
+	 */
+	metadata->_hugetlb_cgroup = folio->_hugetlb_cgroup;
+	metadata->_hugetlb_hwpoison = folio->_hugetlb_hwpoison;
+
+	/*
+	 * HugeTLB flags are stored in folio->private. Stash them so that
+	 * ->private can be used by core-mm.
+	 */
+	metadata->private = folio->private;
+}
+
+static int guestmem_hugetlb_stash_metadata(struct folio *folio)
+{
+	XA_STATE(xas, &guestmem_hugetlb_stash, 0);
+	struct guestmem_hugetlb_stash_item *stash;
+	void *entry;
+
+	stash = kzalloc(sizeof(*stash), GFP_KERNEL);
+	if (!stash)
+		return -ENOMEM;
+
+	stash->h = folio_hstate(folio);
+	__guestmem_hugetlb_stash_metadata(&stash->hugetlb_metadata, folio);
+
+	xas_set_order(&xas, folio_pfn(folio), folio_order(folio));
+
+	xas_lock(&xas);
+	entry = xas_store(&xas, stash);
+	xas_unlock(&xas);
+
+	if (xa_is_err(entry)) {
+		kfree(stash);
+		return xa_err(entry);
+	}
+
+	return 0;
+}
+
+static inline void
+__guestmem_hugetlb_unstash_metadata(struct guestmem_hugetlb_metadata *metadata,
+				    struct folio *folio)
+{
+	folio->_hugetlb_subpool = metadata->_hugetlb_subpool;
+	folio->_hugetlb_cgroup = metadata->_hugetlb_cgroup;
+	folio->_hugetlb_cgroup_rsvd = NULL;
+	folio->_hugetlb_hwpoison = metadata->_hugetlb_hwpoison;
+
+	folio_change_private(folio, metadata->private);
+}
+
+static int guestmem_hugetlb_unstash_free_metadata(struct folio *folio)
+{
+	struct guestmem_hugetlb_stash_item *stash;
+	unsigned long pfn;
+
+	pfn = folio_pfn(folio);
+
+	stash = xa_erase(&guestmem_hugetlb_stash, pfn);
+	__guestmem_hugetlb_unstash_metadata(&stash->hugetlb_metadata, folio);
+
+	kfree(stash);
+
+	return 0;
+}
+
+/**
+ * guestmem_hugetlb_split_folio() - Split a HugeTLB @folio to PAGE_SIZE pages.
+ *
+ * @folio: The folio to be split.
+ *
+ * Context: Before splitting, the folio must have a refcount of 0. After
+ *	    splitting, each split folio has a refcount of 0.
+ * Return: 0 on success and negative error otherwise.
+ */
+static int guestmem_hugetlb_split_folio(struct folio *folio)
+{
+	long orig_nr_pages;
+	int ret;
+	int i;
+
+	if (folio_size(folio) == PAGE_SIZE)
+		return 0;
+
+	orig_nr_pages = folio_nr_pages(folio);
+	ret = guestmem_hugetlb_stash_metadata(folio);
+	if (ret)
+		return ret;
+
+	/*
+	 * hugetlb_vmemmap_restore_folio() has to be called ahead of the rest
+	 * because it checks the page type. This doesn't actually split the
+	 * folio, so the first few struct pages are still intact.
+	 */
+	ret = hugetlb_vmemmap_restore_folio(folio_hstate(folio), folio);
+	if (ret)
+		goto err;
+
+	/*
+	 * Can clear without lock because this will not race with the folio
+	 * being mapped. folio's page type is overlaid with mapcount, so in
+	 * other cases it's necessary to take hugetlb_lock to prevent races
+	 * with mapcount increasing.
+	 */
+	__folio_clear_hugetlb(folio);
+
+	/*
+	 * Remove the first folio from h->hugepage_activelist since it is no
+	 * longer a HugeTLB page. The other split pages should not be on any
+	 * lists.
+	 */
+	hugetlb_folio_list_del(folio);
+
+	/* Actually split the page by undoing prep_compound_page() */
+	__folio_clear_head(folio);
+
+#ifdef NR_PAGES_IN_LARGE_FOLIO
+	/*
+	 * Zero out _nr_pages, since otherwise it overlaps with memcg_data,
+	 * resulting in lookups on false memcg_data. _nr_pages doesn't have to
+	 * be set to 1 because folio_nr_pages() relies on the presence of the
+	 * head flag to return 1 for nr_pages.
+	 */
+	folio->_nr_pages = 0;
+#endif
+
+	for (i = 1; i < orig_nr_pages; ++i) {
+		struct page *p = folio_page(folio, i);
+
+		/*
+		 * Copy flags from the first page to the split pages.
+		 */
+		p->flags = folio->flags;
+
+		p->mapping = NULL;
+		clear_compound_head(p);
+	}
+
+	return 0;
+
+err:
+	guestmem_hugetlb_unstash_free_metadata(folio);
+
+	return ret;
+}
+
+/**
+ * guestmem_hugetlb_merge_folio() - Merge a HugeTLB folio from the folio
+ *				    beginning at @first_folio.
+ *
+ * @first_folio: the first folio in a contiguous block of folios to be merged.
+ *
+ * The size of the contiguous block is tracked in guestmem_hugetlb_stash.
+ *
+ * Context: The first folio is checked to have a refcount of 0 before
+ *	    reconstruction. After reconstruction, the reconstructed folio has a
+ *	    refcount of 0.
+ */
+static void guestmem_hugetlb_merge_folio(struct folio *first_folio)
+{
+	struct guestmem_hugetlb_stash_item *stash;
+	struct hstate *h;
+
+	stash = xa_load(&guestmem_hugetlb_stash, folio_pfn(first_folio));
+	h = stash->h;
+
+	/*
+	 * This is the step that does the merge. prep_compound_page() will
+	 * write to pages 1 and 2 as well, so
+	 * guestmem_hugetlb_unstash_free_metadata() has to come after this.
+	 */
+	prep_compound_page(&first_folio->page, huge_page_order(h));
+
+	WARN_ON(guestmem_hugetlb_unstash_free_metadata(first_folio));
+
+	/*
+	 * prep_compound_page() will set up mapping on tail pages. For
+	 * completeness, clear mapping on the head page.
+	 */
+	first_folio->mapping = NULL;
+
+	__folio_set_hugetlb(first_folio);
+
+	hugetlb_folio_list_add(first_folio, &h->hugepage_activelist);
+
+	hugetlb_vmemmap_optimize_folio(h, first_folio);
+}
+
+static struct folio *guestmem_hugetlb_maybe_merge_folio(struct folio *folio)
+{
+	struct guestmem_hugetlb_stash_item *stash;
+	unsigned long first_folio_pfn;
+	struct folio *first_folio;
+	unsigned long pfn;
+	size_t nr_pages;
+
+	pfn = folio_pfn(folio);
+
+	stash = xa_load(&guestmem_hugetlb_stash, pfn);
+	nr_pages = pages_per_huge_page(stash->h);
+	if (atomic_inc_return(&stash->nr_pages_waiting_to_be_merged) < nr_pages)
+		return NULL;
+
+	first_folio_pfn = round_down(pfn, nr_pages);
+	first_folio = pfn_folio(first_folio_pfn);
+
+	guestmem_hugetlb_merge_folio(first_folio);
+
+	return first_folio;
+}
+
+static void guestmem_hugetlb_cleanup_folio(struct folio *folio)
+{
+	struct folio *merged_folio;
+
+	merged_folio = guestmem_hugetlb_maybe_merge_folio(folio);
+	if (merged_folio)
+		__folio_put(merged_folio);
+}
+
+static void guestmem_hugetlb_cleanup_workfn(struct work_struct *work)
+{
+	struct llist_node *node;
+
+	node = llist_del_all(&guestmem_hugetlb_cleanup_list);
+	while (node) {
+		struct folio *folio;
+
+		folio = container_of((struct address_space **)node,
+				     struct folio, mapping);
+
+		node = node->next;
+		folio->mapping = NULL;
+
+		guestmem_hugetlb_cleanup_folio(folio);
+	}
+}
+
+static int __init guestmem_hugetlb_init(void)
+{
+	INIT_WORK(&guestmem_hugetlb_cleanup_work, guestmem_hugetlb_cleanup_workfn);
+
+	guestmem_hugetlb_wq = alloc_workqueue("guestmem_hugetlb",
+					      WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+	if (!guestmem_hugetlb_wq)
+		return -ENOMEM;
+
+	return 0;
+}
+subsys_initcall(guestmem_hugetlb_init);
+
 static void *guestmem_hugetlb_setup(size_t size, u64 flags)
 {
@@ -164,10 +494,19 @@ static struct folio *guestmem_hugetlb_alloc_folio(void *priv)
 	return ERR_PTR(-ENOMEM);
 }
 
+static void guestmem_hugetlb_free_folio(struct folio *folio)
+{
+	if (xa_load(&guestmem_hugetlb_stash, folio_pfn(folio)))
+		guestmem_hugetlb_register_folio_put_callback(folio);
+}
+
 const struct guestmem_allocator_operations guestmem_hugetlb_ops = {
 	.inode_setup = guestmem_hugetlb_setup,
	.inode_teardown = guestmem_hugetlb_teardown,
 	.alloc_folio = guestmem_hugetlb_alloc_folio,
+	.split_folio = guestmem_hugetlb_split_folio,
+	.merge_folio = guestmem_hugetlb_merge_folio,
+	.free_folio = guestmem_hugetlb_free_folio,
 	.nr_pages_in_folio = guestmem_hugetlb_nr_pages_in_folio,
 };
 EXPORT_SYMBOL_GPL(guestmem_hugetlb_ops);
-- 
2.49.0.1045.g170613ef41-goog
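[Editorial sketch, not part of the patch] The deferred-cleanup machinery above rests on two tricks: folio->mapping is NULL by the time a split page is freed, so its storage can carry the llist linkage, and queue_work() only needs to be kicked when the llist goes from empty to non-empty. The single-threaded userspace C below reproduces the idea; names and simplifications are mine, not the patch's.

	#include <stddef.h>
	#include <stdio.h>

	struct llist_node { struct llist_node *next; };

	struct fake_folio {
		void *mapping;	/* NULL once off the filemap; reused as llist_node */
		int id;
	};

	static struct llist_node *cleanup_list;

	/* Returns 1 if the list was empty, i.e. the worker must be kicked. */
	static int llist_add(struct llist_node *n, struct llist_node **head)
	{
		n->next = *head;
		*head = n;
		return n->next == NULL;
	}

	static void defer_cleanup(struct fake_folio *f)
	{
		/* mapping is NULL here, so its storage can carry the linkage. */
		struct llist_node *node = (struct llist_node *)&f->mapping;

		if (llist_add(node, &cleanup_list))
			printf("queue_work()\n");	/* stand-in for the real kick */
	}

	static void cleanup_workfn(void)
	{
		struct llist_node *node = cleanup_list;

		cleanup_list = NULL;
		while (node) {
			/* container_of() equivalent, via offsetof(). */
			struct fake_folio *f = (struct fake_folio *)
				((char *)node - offsetof(struct fake_folio, mapping));

			node = node->next;
			f->mapping = NULL;	/* restore the field before reuse */
			printf("cleaning folio %d\n", f->id);
		}
	}

	int main(void)
	{
		struct fake_folio a = { .id = 1 }, b = { .id = 2 };

		defer_cleanup(&a);	/* prints queue_work() once */
		defer_cleanup(&b);	/* list already non-empty: no second kick */
		cleanup_workfn();
		return 0;
	}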
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:15 -0700
Message-ID: <28d1e564df1b9774611563146afa7da91cdd4dc0.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 36/51] mm: Convert split_folio() macro to function
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org

This prevents the macro from overriding any functions or function calls
named split_folio().
Change-Id: I88a66bd876731b272282a42468c3bf8ac008b7cc
Signed-off-by: Ackerley Tng
---
 include/linux/huge_mm.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e893d546a49f..f392ff49a816 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -99,7 +99,11 @@ extern struct kobj_attribute thpsize_shmem_enabled_attr;
 #define thp_vma_allowable_order(vma, vm_flags, tva_flags, order) \
 	(!!thp_vma_allowable_orders(vma, vm_flags, tva_flags, BIT(order)))
 
-#define split_folio(f) split_folio_to_list(f, NULL)
+int split_folio_to_list(struct folio *folio, struct list_head *list);
+static inline int split_folio(struct folio *folio)
+{
+	return split_folio_to_list(folio, NULL);
+}
 
 #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
 #define HPAGE_PMD_SHIFT PMD_SHIFT
-- 
2.49.0.1045.g170613ef41-goog
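[Editorial sketch, not part of the patch] A compilable illustration of the clash this conversion avoids: a function-like macro rewrites every call-shaped use of its name, including calls through an ops table such as the guestmem_allocator_operations added earlier in this series. Types here are stubs for illustration.

	#include <stdio.h>

	struct folio { int order; };

	/* Function form from the patch (the list parameter is stubbed). */
	static int split_folio_to_list(struct folio *folio, void *list)
	{
		(void)list;
		return folio->order;	/* placeholder work */
	}

	static inline int split_folio(struct folio *folio)
	{
		return split_folio_to_list(folio, NULL);
	}

	/* An ops table with a member of the same name, as guestmem uses. */
	struct ops {
		int (*split_folio)(struct folio *folio);
	};

	int main(void)
	{
		struct folio f = { .order = 9 };
		struct ops ops = { .split_folio = split_folio };

		/*
		 * Under "#define split_folio(f) split_folio_to_list(f, NULL)"
		 * the next line would be preprocessed into
		 * ops.split_folio_to_list(&f, NULL) and fail to compile; with
		 * the inline function it works as written.
		 */
		printf("%d\n", ops.split_folio(&f));
		return 0;
	}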
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:16 -0700
Subject: [RFC PATCH v2 37/51] filemap: Pass address_space mapping to
 ->free_folio()
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org
From: Elliot Berman

The plan is to be able to support multiple allocators for guest_memfd
folios. To allow each allocator to handle release of a folio from a
guest_memfd filemap, ->free_folio() needs to retrieve allocator
information that is stored on the guest_memfd inode.

->free_folio() shouldn't assume that folio->mapping is set or valid, yet
the mapping is well known to callers of ->free_folio(). Hence, pass the
address_space mapping to ->free_folio() so that the callback can
retrieve any information it needs.

Link: https://lore.kernel.org/all/15f665b4-2d33-41ca-ac50-fafe24ade32f@redhat.com/
Suggested-by: David Hildenbrand
Acked-by: David Hildenbrand
Change-Id: I8bac907832a0b2491fa403a6ab72fcef1b4713ee
Signed-off-by: Elliot Berman
Tested-by: Mike Day
Signed-off-by: Ackerley Tng
---
 Documentation/filesystems/locking.rst |  2 +-
 Documentation/filesystems/vfs.rst     | 15 +++++++++------
 fs/nfs/dir.c                          |  9 +++++++--
 fs/orangefs/inode.c                   |  3 ++-
 include/linux/fs.h                    |  2 +-
 mm/filemap.c                          |  9 +++++----
 mm/secretmem.c                        |  3 ++-
 mm/vmscan.c                           |  4 ++--
 virt/kvm/guest_memfd.c                |  3 ++-
 9 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 0ec0bb6eb0fb..c3d7430481ae 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -263,7 +263,7 @@ prototypes::
 	sector_t (*bmap)(struct address_space *, sector_t);
 	void (*invalidate_folio) (struct folio *, size_t start, size_t len);
 	bool (*release_folio)(struct folio *, gfp_t);
-	void (*free_folio)(struct folio *);
+	void (*free_folio)(struct address_space *, struct folio *);
 	int (*direct_IO)(struct kiocb *, struct iov_iter *iter);
 	int (*migrate_folio)(struct address_space *, struct folio *dst,
 			struct folio *src, enum migrate_mode);
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index ae79c30b6c0c..bba1ac848f96 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -833,7 +833,7 @@ cache in your filesystem. The following members are defined:
 	sector_t (*bmap)(struct address_space *, sector_t);
 	void (*invalidate_folio) (struct folio *, size_t start, size_t len);
 	bool (*release_folio)(struct folio *, gfp_t);
-	void (*free_folio)(struct folio *);
+	void (*free_folio)(struct address_space *, struct folio *);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
 	int (*migrate_folio)(struct mapping *, struct folio *dst,
 			struct folio *src, enum migrate_mode);
@@ -1011,11 +1011,14 @@ cache in your filesystem. The following members are defined:
 	clear the uptodate flag if it cannot free private data yet.
 
 ``free_folio``
-	free_folio is called once the folio is no longer visible in the
-	page cache in order to allow the cleanup of any private data.
-	Since it may be called by the memory reclaimer, it should not
-	assume that the original address_space mapping still exists, and
-	it should not block.
+	free_folio is called once the folio is no longer visible in
+	the page cache in order to allow the cleanup of any private
+	data. Since it may be called by the memory reclaimer, it
+	should not assume that the original address_space mapping
+	still exists at folio->mapping. The mapping the folio used to
+	belong to is instead passed for free_folio to read any
+	information it might need from the mapping. free_folio should
+	not block.
 
 ``direct_IO``
 	called by the generic read/write routines to perform direct_IO -
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index bd23fc736b39..148433f6d9d4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -55,7 +55,7 @@ static int nfs_closedir(struct inode *, struct file *);
 static int nfs_readdir(struct file *, struct dir_context *);
 static int nfs_fsync_dir(struct file *, loff_t, loff_t, int);
 static loff_t nfs_llseek_dir(struct file *, loff_t, int);
-static void nfs_readdir_clear_array(struct folio *);
+static void nfs_free_folio(struct address_space *, struct folio *);
 static int nfs_do_create(struct inode *dir, struct dentry *dentry,
 			 umode_t mode, int open_flags);
 
@@ -69,7 +69,7 @@ const struct file_operations nfs_dir_operations = {
 };
 
 const struct address_space_operations nfs_dir_aops = {
-	.free_folio = nfs_readdir_clear_array,
+	.free_folio = nfs_free_folio,
 };
 
 #define NFS_INIT_DTSIZE PAGE_SIZE
@@ -230,6 +230,11 @@ static void nfs_readdir_clear_array(struct folio *folio)
 	kunmap_local(array);
 }
 
+static void nfs_free_folio(struct address_space *mapping, struct folio *folio)
+{
+	nfs_readdir_clear_array(folio);
+}
+
 static void nfs_readdir_folio_reinit_array(struct folio *folio, u64 last_cookie,
 					   u64 change_attr)
 {
diff --git a/fs/orangefs/inode.c b/fs/orangefs/inode.c
index 5ac743c6bc2e..884cc5295f3e 100644
--- a/fs/orangefs/inode.c
+++ b/fs/orangefs/inode.c
@@ -449,7 +449,8 @@ static bool orangefs_release_folio(struct folio *folio, gfp_t foo)
 	return !folio_test_private(folio);
 }
 
-static void orangefs_free_folio(struct folio *folio)
+static void orangefs_free_folio(struct address_space *mapping,
+				struct folio *folio)
 {
 	kfree(folio_detach_private(folio));
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0fded2e3c661..9862ea92a2af 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -455,7 +455,7 @@ struct address_space_operations {
 	sector_t (*bmap)(struct address_space *, sector_t);
 	void (*invalidate_folio) (struct folio *, size_t offset, size_t len);
 	bool (*release_folio)(struct folio *, gfp_t);
-	void (*free_folio)(struct folio *folio);
+	void (*free_folio)(struct address_space *mapping, struct folio *folio);
 	ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter);
 	/*
 	 * migrate the contents of a folio to the specified target. If
	 * If
diff --git a/mm/filemap.c b/mm/filemap.c
index bed7160db214..a02c3d8e00e8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -226,11 +226,11 @@ void __filemap_remove_folio(struct folio *folio, void *shadow)

 void filemap_free_folio(struct address_space *mapping, struct folio *folio)
 {
-	void (*free_folio)(struct folio *);
+	void (*free_folio)(struct address_space *, struct folio *);

 	free_folio = mapping->a_ops->free_folio;
 	if (free_folio)
-		free_folio(folio);
+		free_folio(mapping, folio);

 	folio_put_refs(folio, folio_nr_pages(folio));
 }
@@ -820,7 +820,8 @@ EXPORT_SYMBOL(file_write_and_wait_range);
 void replace_page_cache_folio(struct folio *old, struct folio *new)
 {
 	struct address_space *mapping = old->mapping;
-	void (*free_folio)(struct folio *) = mapping->a_ops->free_folio;
+	void (*free_folio)(struct address_space *, struct folio *) =
+		mapping->a_ops->free_folio;
 	pgoff_t offset = old->index;
 	XA_STATE(xas, &mapping->i_pages, offset);

@@ -849,7 +850,7 @@ void replace_page_cache_folio(struct folio *old, struct folio *new)
 		__lruvec_stat_add_folio(new, NR_SHMEM);
 	xas_unlock_irq(&xas);
 	if (free_folio)
-		free_folio(old);
+		free_folio(mapping, old);
 	folio_put(old);
 }
 EXPORT_SYMBOL_GPL(replace_page_cache_folio);
diff --git a/mm/secretmem.c b/mm/secretmem.c
index c0e459e58cb6..178507c1b900 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -152,7 +152,8 @@ static int secretmem_migrate_folio(struct address_space *mapping,
 	return -EBUSY;
 }

-static void secretmem_free_folio(struct folio *folio)
+static void secretmem_free_folio(struct address_space *mapping,
+				 struct folio *folio)
 {
 	set_direct_map_default_noflush(&folio->page);
 	folio_zero_segment(folio, 0, folio_size(folio));
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3783e45bfc92..b8add4d0cf18 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -788,7 +788,7 @@ static int __remove_mapping(struct address_space *mapping, struct folio *folio,
 		xa_unlock_irq(&mapping->i_pages);
 		put_swap_folio(folio, swap);
 	} else {
-		void (*free_folio)(struct folio *);
+		void (*free_folio)(struct address_space *, struct folio *);

 		free_folio = mapping->a_ops->free_folio;
 		/*
@@ -817,7 +817,7 @@ static int __remove_mapping(struct address_space *mapping, struct folio *folio,
 		spin_unlock(&mapping->host->i_lock);

 	if (free_folio)
-		free_folio(folio);
+		free_folio(mapping, folio);

 	return 1;
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 24d270b9b725..c578d0ebe314 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -1319,7 +1319,8 @@ static void kvm_gmem_invalidate(struct folio *folio)
 static inline void kvm_gmem_invalidate(struct folio *folio) {}
 #endif

-static void kvm_gmem_free_folio(struct folio *folio)
+static void kvm_gmem_free_folio(struct address_space *mapping,
+				struct folio *folio)
 {
 	folio_clear_unevictable(folio);

-- 
2.49.0.1045.g170613ef41-goog
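[Editor's illustration] For a filesystem adopting the new signature, the
extra argument makes inode-level state reachable even when folio->mapping
has already been cleared by reclaim. A minimal sketch of such a callback;
the my_allocator names are invented for illustration, and only the
callback signature comes from this patch:

static void my_free_folio(struct address_space *mapping, struct folio *folio)
{
	/* folio->mapping may already be NULL here; use the passed mapping. */
	struct my_allocator *a = mapping->host->i_private;

	my_allocator_release(a, folio);
}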
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:17 -0700
Message-ID: <7753dc66229663fecea2498cf442a768cb7191ba.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 38/51] KVM: guest_memfd: Split allocator pages for guest_memfd use
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, linux-fsdevel@vger.kernel.org

In this patch, newly allocated pages are split into 4K regular pages
before they are provided to the requester (fallocate() or KVM).

During a private-to-shared conversion, folios are split if not
already split; during a shared-to-private conversion, folios are
merged if not already merged.

When folios are removed from the filemap on truncation, the allocator
is given a chance to do any necessary preparation before the folio is
freed.

When a conversion is requested on a subfolio within a hugepage range,
faulting must be prevented on the whole hugepage range for
correctness.
See related discussion at
https://lore.kernel.org/all/Z__AAB_EFxGFEjDR@google.com/T/

Signed-off-by: Ackerley Tng
Co-developed-by: Vishal Annapurve
Signed-off-by: Vishal Annapurve
Change-Id: Ib5ee22e3dae034c529773048a626ad98d4b10af3
---
 mm/filemap.c           |   2 +
 virt/kvm/guest_memfd.c | 501 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 483 insertions(+), 20 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index a02c3d8e00e8..a052f8e0c41e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -223,6 +223,7 @@ void __filemap_remove_folio(struct folio *folio, void *shadow)
 	filemap_unaccount_folio(mapping, folio);
 	page_cache_delete(mapping, folio, shadow);
 }
+EXPORT_SYMBOL_GPL(__filemap_remove_folio);

 void filemap_free_folio(struct address_space *mapping, struct folio *folio)
 {
@@ -258,6 +259,7 @@ void filemap_remove_folio(struct folio *folio)

 	filemap_free_folio(mapping, folio);
 }
+EXPORT_SYMBOL_GPL(filemap_remove_folio);

 /*
  * page_cache_delete_batch - delete several folios from page cache
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index c578d0ebe314..cb426c1dfef8 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -41,6 +41,11 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
 				      pgoff_t end);
 static void kvm_gmem_invalidate_end(struct kvm_gmem *gmem, pgoff_t start,
 				    pgoff_t end);
+static int __kvm_gmem_filemap_add_folio(struct address_space *mapping,
+					struct folio *folio, pgoff_t index);
+static int kvm_gmem_restructure_folios_in_range(struct inode *inode,
+						pgoff_t start, size_t nr_pages,
+						bool is_split_operation);

 static struct kvm_gmem_inode_private *kvm_gmem_private(struct inode *inode)
 {
@@ -126,6 +131,31 @@ static enum shareability kvm_gmem_shareability_get(struct inode *inode,
 	return xa_to_value(entry);
 }

+static bool kvm_gmem_shareability_in_range(struct inode *inode, pgoff_t start,
+					   size_t nr_pages, enum shareability m)
+{
+	struct maple_tree *mt;
+	pgoff_t last;
+	void *entry;
+
+	mt = &kvm_gmem_private(inode)->shareability;
+
+	last = start + nr_pages - 1;
+	mt_for_each(mt, entry, start, last) {
+		if (xa_to_value(entry) == m)
+			return true;
+	}
+
+	return false;
+}
+
+static inline bool kvm_gmem_has_some_shared(struct inode *inode, pgoff_t start,
+					    size_t nr_pages)
+{
+	return kvm_gmem_shareability_in_range(inode, start, nr_pages,
+					      SHAREABILITY_ALL);
+}
+
 static struct folio *kvm_gmem_get_shared_folio(struct inode *inode, pgoff_t index)
 {
 	if (kvm_gmem_shareability_get(inode, index) != SHAREABILITY_ALL)
@@ -241,6 +271,105 @@ static bool kvm_gmem_has_safe_refcount(struct address_space *mapping, pgoff_t start,
 	return refcount_safe;
 }

+static void kvm_gmem_unmap_private(struct kvm_gmem *gmem, pgoff_t start,
+				   pgoff_t end)
+{
+	struct kvm_memory_slot *slot;
+	struct kvm *kvm = gmem->kvm;
+	unsigned long index;
+	bool locked = false;
+	bool flush = false;
+
+	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
+		pgoff_t pgoff = slot->gmem.pgoff;
+
+		struct kvm_gfn_range gfn_range = {
+			.start = slot->base_gfn + max(pgoff, start) - pgoff,
+			.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
+			.slot = slot,
+			.may_block = true,
+			/* This function is only concerned with private mappings. */
+			.attr_filter = KVM_FILTER_PRIVATE,
+		};
+
+		if (!locked) {
+			KVM_MMU_LOCK(kvm);
+			locked = true;
+		}
+
+		flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
+	}
+
+	if (flush)
+		kvm_flush_remote_tlbs(kvm);
+
+	if (locked)
+		KVM_MMU_UNLOCK(kvm);
+}
+
+static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
+				      pgoff_t end)
+{
+	struct kvm_memory_slot *slot;
+	struct kvm *kvm = gmem->kvm;
+	unsigned long index;
+	bool found_memslot;
+
+	found_memslot = false;
+	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
+		gfn_t gfn_start;
+		gfn_t gfn_end;
+		pgoff_t pgoff;
+
+		pgoff = slot->gmem.pgoff;
+
+		gfn_start = slot->base_gfn + max(pgoff, start) - pgoff;
+		gfn_end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff;
+
+		if (!found_memslot) {
+			found_memslot = true;
+
+			KVM_MMU_LOCK(kvm);
+			kvm_mmu_invalidate_begin(kvm);
+		}
+
+		kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end);
+	}
+
+	if (found_memslot)
+		KVM_MMU_UNLOCK(kvm);
+}
+
+static pgoff_t kvm_gmem_compute_invalidate_bound(struct inode *inode,
+						 pgoff_t bound, bool start)
+{
+	size_t nr_pages;
+	void *priv;
+
+	if (!kvm_gmem_has_custom_allocator(inode))
+		return bound;
+
+	priv = kvm_gmem_allocator_private(inode);
+	nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
+
+	if (start)
+		return round_down(bound, nr_pages);
+	else
+		return round_up(bound, nr_pages);
+}
+
+static pgoff_t kvm_gmem_compute_invalidate_start(struct inode *inode,
+						 pgoff_t bound)
+{
+	return kvm_gmem_compute_invalidate_bound(inode, bound, true);
+}
+
+static pgoff_t kvm_gmem_compute_invalidate_end(struct inode *inode,
+					       pgoff_t bound)
+{
+	return kvm_gmem_compute_invalidate_bound(inode, bound, false);
+}
+
 static int kvm_gmem_shareability_apply(struct inode *inode,
 				       struct conversion_work *work,
 				       enum shareability m)
@@ -299,35 +428,53 @@ static void kvm_gmem_convert_invalidate_begin(struct inode *inode,
 					      struct conversion_work *work)
 {
 	struct list_head *gmem_list;
+	pgoff_t invalidate_start;
+	pgoff_t invalidate_end;
 	struct kvm_gmem *gmem;
-	pgoff_t end;
+	pgoff_t work_end;

-	end = work->start + work->nr_pages;
+	work_end = work->start + work->nr_pages;
+	invalidate_start = kvm_gmem_compute_invalidate_start(inode, work->start);
+	invalidate_end = kvm_gmem_compute_invalidate_end(inode, work_end);

 	gmem_list = &inode->i_mapping->i_private_list;
 	list_for_each_entry(gmem, gmem_list, entry)
-		kvm_gmem_invalidate_begin(gmem, work->start, end);
+		kvm_gmem_invalidate_begin(gmem, invalidate_start, invalidate_end);
 }

 static void kvm_gmem_convert_invalidate_end(struct inode *inode,
 					    struct conversion_work *work)
 {
 	struct list_head *gmem_list;
+	pgoff_t invalidate_start;
+	pgoff_t invalidate_end;
 	struct kvm_gmem *gmem;
-	pgoff_t end;
+	pgoff_t work_end;

-	end = work->start + work->nr_pages;
+	work_end = work->start + work->nr_pages;
+	invalidate_start = kvm_gmem_compute_invalidate_start(inode, work->start);
+	invalidate_end = kvm_gmem_compute_invalidate_end(inode, work_end);

 	gmem_list = &inode->i_mapping->i_private_list;
 	list_for_each_entry(gmem, gmem_list, entry)
-		kvm_gmem_invalidate_end(gmem, work->start, end);
+		kvm_gmem_invalidate_end(gmem, invalidate_start, invalidate_end);
 }

 static int kvm_gmem_convert_should_proceed(struct inode *inode,
 					   struct conversion_work *work,
 					   bool to_shared, pgoff_t *error_index)
 {
-	if (!to_shared) {
+	if (to_shared) {
+		struct list_head *gmem_list;
+		struct kvm_gmem *gmem;
+		pgoff_t work_end;
+
+		work_end = work->start + work->nr_pages;
+
+		gmem_list = &inode->i_mapping->i_private_list;
+		list_for_each_entry(gmem, gmem_list, entry)
+			kvm_gmem_unmap_private(gmem, work->start, work_end);
+	} else {
 		unmap_mapping_pages(inode->i_mapping, work->start,
 				    work->nr_pages, false);

@@ -340,6 +487,27 @@ static int kvm_gmem_convert_should_proceed(struct inode *inode,
 	return 0;
 }

+static int kvm_gmem_convert_execute_work(struct inode *inode,
+					 struct conversion_work *work,
+					 bool to_shared)
+{
+	enum shareability m;
+	int ret;
+
+	m = to_shared ? SHAREABILITY_ALL : SHAREABILITY_GUEST;
+	ret = kvm_gmem_shareability_apply(inode, work, m);
+	if (ret)
+		return ret;
+	/*
+	 * Apply shareability first so split/merge can operate on new
+	 * shareability state.
+	 */
+	ret = kvm_gmem_restructure_folios_in_range(
+		inode, work->start, work->nr_pages, to_shared);
+
+	return ret;
+}
+
 static int kvm_gmem_convert_range(struct file *file, pgoff_t start,
 				  size_t nr_pages, bool shared,
 				  pgoff_t *error_index)
@@ -371,18 +539,21 @@ static int kvm_gmem_convert_range(struct file *file, pgoff_t start,

 	list_for_each_entry(work, &work_list, list) {
 		rollback_stop_item = work;
-		ret = kvm_gmem_shareability_apply(inode, work, m);
+
+		ret = kvm_gmem_convert_execute_work(inode, work, shared);
 		if (ret)
 			break;
 	}

 	if (ret) {
-		m = shared ? SHAREABILITY_GUEST : SHAREABILITY_ALL;
 		list_for_each_entry(work, &work_list, list) {
+			int r;
+
+			r = kvm_gmem_convert_execute_work(inode, work, !shared);
+			WARN_ON(r);
+
 			if (work == rollback_stop_item)
 				break;
-
-			WARN_ON(kvm_gmem_shareability_apply(inode, work, m));
 		}
 	}

@@ -434,6 +605,277 @@ static int kvm_gmem_ioctl_convert_range(struct file *file,
 	return ret;
 }

+#ifdef CONFIG_KVM_GMEM_HUGETLB
+
+static inline void __filemap_remove_folio_for_restructuring(struct folio *folio)
+{
+	struct address_space *mapping = folio->mapping;
+
+	spin_lock(&mapping->host->i_lock);
+	xa_lock_irq(&mapping->i_pages);
+
+	__filemap_remove_folio(folio, NULL);
+
+	xa_unlock_irq(&mapping->i_pages);
+	spin_unlock(&mapping->host->i_lock);
+}
+
+/**
+ * filemap_remove_folio_for_restructuring() - Remove @folio from filemap for
+ * split/merge.
+ *
+ * @folio: the folio to be removed.
+ *
+ * Similar to filemap_remove_folio(), but skips LRU-related calls (meaningless
+ * for guest_memfd), and skips the call to ->free_folio() to maintain folio
+ * flags.
+ *
+ * Context: Expects only the filemap's refcounts to be left on the folio. Will
+ *          freeze these refcounts away so that no other users will interfere
+ *          with restructuring.
+ */
+static inline void filemap_remove_folio_for_restructuring(struct folio *folio)
+{
+	int filemap_refcount;
+
+	filemap_refcount = folio_nr_pages(folio);
+	while (!folio_ref_freeze(folio, filemap_refcount)) {
+		/*
+		 * At this point only filemap refcounts are expected, hence okay
+		 * to spin until speculative refcounts go away.
+		 */
+		WARN_ONCE(1, "Spinning on folio=%p refcount=%d", folio,
+			  folio_ref_count(folio));
+	}
+
+	folio_lock(folio);
+	__filemap_remove_folio_for_restructuring(folio);
+	folio_unlock(folio);
+}
+
+/**
+ * kvm_gmem_split_folio_in_filemap() - Split @folio within filemap in @inode.
+ *
+ * @inode: inode containing the folio.
+ * @folio: folio to be split.
+ *
+ * Split a folio into folios of size PAGE_SIZE. Will clean up folio from
+ * filemap and add back the split folios.
+ *
+ * Context: Expects that before this call, folio's refcount is just the
+ *          filemap's refcounts.
+ *          After this function returns, the split folios'
+ *          refcounts will also be filemap's refcounts.
+ * Return: 0 on success or negative error otherwise.
+ */
+static int kvm_gmem_split_folio_in_filemap(struct inode *inode, struct folio *folio)
+{
+	size_t orig_nr_pages;
+	pgoff_t orig_index;
+	size_t i, j;
+	int ret;
+
+	orig_nr_pages = folio_nr_pages(folio);
+	if (orig_nr_pages == 1)
+		return 0;
+
+	orig_index = folio->index;
+
+	filemap_remove_folio_for_restructuring(folio);
+
+	ret = kvm_gmem_allocator_ops(inode)->split_folio(folio);
+	if (ret)
+		goto err;
+
+	for (i = 0; i < orig_nr_pages; ++i) {
+		struct folio *f = page_folio(folio_page(folio, i));
+
+		ret = __kvm_gmem_filemap_add_folio(inode->i_mapping, f,
+						   orig_index + i);
+		if (ret)
+			goto rollback;
+	}
+
+	return ret;
+
+rollback:
+	for (j = 0; j < i; ++j) {
+		struct folio *f = page_folio(folio_page(folio, j));
+
+		filemap_remove_folio_for_restructuring(f);
+	}
+
+	kvm_gmem_allocator_ops(inode)->merge_folio(folio);
+err:
+	WARN_ON(__kvm_gmem_filemap_add_folio(inode->i_mapping, folio, orig_index));
+
+	return ret;
+}
+
+static inline int kvm_gmem_try_split_folio_in_filemap(struct inode *inode,
+						      struct folio *folio)
+{
+	size_t to_nr_pages;
+	void *priv;
+
+	if (!kvm_gmem_has_custom_allocator(inode))
+		return 0;
+
+	priv = kvm_gmem_allocator_private(inode);
+	to_nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_page(priv);
+
+	if (kvm_gmem_has_some_shared(inode, folio->index, to_nr_pages))
+		return kvm_gmem_split_folio_in_filemap(inode, folio);
+
+	return 0;
+}
+
+/**
+ * kvm_gmem_merge_folio_in_filemap() - Merge @first_folio within filemap in
+ * @inode.
+ *
+ * @inode: inode containing the folio.
+ * @first_folio: first folio among folios to be merged.
+ *
+ * Will clean up subfolios from filemap and add back the merged folio.
+ *
+ * Context: Expects that before this call, all subfolios only have filemap
+ *          refcounts. After this function returns, the merged folio will only
+ *          have filemap refcounts.
+ * Return: 0 on success or negative error otherwise.
+ */
+static int kvm_gmem_merge_folio_in_filemap(struct inode *inode,
+					   struct folio *first_folio)
+{
+	size_t to_nr_pages;
+	pgoff_t index;
+	void *priv;
+	size_t i;
+	int ret;
+
+	index = first_folio->index;
+
+	priv = kvm_gmem_allocator_private(inode);
+	to_nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
+	if (folio_nr_pages(first_folio) == to_nr_pages)
+		return 0;
+
+	for (i = 0; i < to_nr_pages; ++i) {
+		struct folio *f = page_folio(folio_page(first_folio, i));
+
+		filemap_remove_folio_for_restructuring(f);
+	}
+
+	kvm_gmem_allocator_ops(inode)->merge_folio(first_folio);
+
+	ret = __kvm_gmem_filemap_add_folio(inode->i_mapping, first_folio, index);
+	if (ret)
+		goto err_split;
+
+	return ret;
+
+err_split:
+	WARN_ON(kvm_gmem_allocator_ops(inode)->split_folio(first_folio));
+	for (i = 0; i < to_nr_pages; ++i) {
+		struct folio *f = page_folio(folio_page(first_folio, i));
+
+		WARN_ON(__kvm_gmem_filemap_add_folio(inode->i_mapping, f, index + i));
+	}
+
+	return ret;
+}
+
+static inline int kvm_gmem_try_merge_folio_in_filemap(struct inode *inode,
+						      struct folio *first_folio)
+{
+	size_t to_nr_pages;
+	void *priv;
+
+	priv = kvm_gmem_allocator_private(inode);
+	to_nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
+
+	if (kvm_gmem_has_some_shared(inode, first_folio->index, to_nr_pages))
+		return 0;
+
+	return kvm_gmem_merge_folio_in_filemap(inode, first_folio);
+}
+
+static int kvm_gmem_restructure_folios_in_range(struct inode *inode,
+						pgoff_t start, size_t nr_pages,
+						bool is_split_operation)
+{
+	size_t to_nr_pages;
+	pgoff_t index;
+	pgoff_t end;
+	void *priv;
+	int ret;
+
+	if (!kvm_gmem_has_custom_allocator(inode))
+		return 0;
+
+	end = start + nr_pages;
+
+	/* Round to allocator page size, to check all (huge) pages in range. */
+	priv = kvm_gmem_allocator_private(inode);
+	to_nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
+
+	start = round_down(start, to_nr_pages);
+	end = round_up(end, to_nr_pages);
+
+	for (index = start; index < end; index += to_nr_pages) {
+		struct folio *f;
+
+		f = filemap_get_folio(inode->i_mapping, index);
+		if (IS_ERR(f))
+			continue;
+
+		/* Leave just filemap's refcounts on the folio. */
+		folio_put(f);
+
+		if (is_split_operation)
+			ret = kvm_gmem_split_folio_in_filemap(inode, f);
+		else
+			ret = kvm_gmem_try_merge_folio_in_filemap(inode, f);
+
+		if (ret)
+			goto rollback;
+	}
+	return ret;
+
+rollback:
+	for (index -= to_nr_pages; index >= start; index -= to_nr_pages) {
+		struct folio *f;
+
+		f = filemap_get_folio(inode->i_mapping, index);
+		if (IS_ERR(f))
+			continue;
+
+		/* Leave just filemap's refcounts on the folio. */
+		folio_put(f);
+
+		if (is_split_operation)
+			WARN_ON(kvm_gmem_merge_folio_in_filemap(inode, f));
+		else
+			WARN_ON(kvm_gmem_split_folio_in_filemap(inode, f));
+	}
+
+	return ret;
+}
+
+#else
+
+static inline int kvm_gmem_try_split_folio_in_filemap(struct inode *inode,
+						      struct folio *folio)
+{
+	return 0;
+}
+
+static int kvm_gmem_restructure_folios_in_range(struct inode *inode,
+						pgoff_t start, size_t nr_pages,
+						bool is_split_operation)
+{
+	return 0;
+}
+
+#endif
+
 #else

 static int kvm_gmem_shareability_setup(struct maple_tree *mt, loff_t size, u64 flags)
@@ -563,11 +1005,16 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 		return folio;

 	if (kvm_gmem_has_custom_allocator(inode)) {
-		void *p = kvm_gmem_allocator_private(inode);
+		size_t nr_pages;
+		void *p;

+		p = kvm_gmem_allocator_private(inode);
 		folio = kvm_gmem_allocator_ops(inode)->alloc_folio(p);
 		if (IS_ERR(folio))
 			return folio;
+
+		nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(p);
+		index_floor = round_down(index, nr_pages);
 	} else {
 		gfp_t gfp = mapping_gfp_mask(inode->i_mapping);

@@ -580,10 +1027,11 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 			folio_put(folio);
 			return ERR_PTR(ret);
 		}
+
+		index_floor = index;
 	}
 	allocated_size = folio_size(folio);

-	index_floor = round_down(index, folio_nr_pages(folio));
 	ret = kvm_gmem_filemap_add_folio(inode->i_mapping, folio, index_floor);
 	if (ret) {
 		folio_put(folio);
@@ -600,6 +1048,13 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 		return ERR_PTR(ret);
 	}

+	/* Leave just filemap's refcounts on folio. */
+	folio_put(folio);
+
+	ret = kvm_gmem_try_split_folio_in_filemap(inode, folio);
+	if (ret)
+		goto err;
+
 	spin_lock(&inode->i_lock);
 	inode->i_blocks += allocated_size / 512;
 	spin_unlock(&inode->i_lock);
@@ -608,14 +1063,17 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 	 * folio is the one that is allocated, this gets the folio at the
 	 * requested index.
 	 */
-	folio = page_folio(folio_file_page(folio, index));
-	folio_lock(folio);
+	folio = filemap_lock_folio(inode->i_mapping, index);

 	return folio;
+
+err:
+	filemap_remove_folio(folio);
+	return ERR_PTR(ret);
 }

-static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
-				      pgoff_t end)
+static void kvm_gmem_invalidate_begin_and_zap(struct kvm_gmem *gmem,
+					      pgoff_t start, pgoff_t end)
 {
 	bool flush = false, found_memslot = false;
 	struct kvm_memory_slot *slot;
@@ -848,7 +1306,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	filemap_invalidate_lock(inode->i_mapping);

 	list_for_each_entry(gmem, gmem_list, entry)
-		kvm_gmem_invalidate_begin(gmem, start, end);
+		kvm_gmem_invalidate_begin_and_zap(gmem, start, end);

 	if (kvm_gmem_has_custom_allocator(inode)) {
 		kvm_gmem_truncate_inode_range(inode, offset, offset + len);
@@ -978,7 +1436,7 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
 	 * Zap all SPTEs pointed at by this file. Do not free the backing
 	 * memory, as its lifetime is associated with the inode, not the file.
	 */
-	kvm_gmem_invalidate_begin(gmem, 0, -1ul);
+	kvm_gmem_invalidate_begin_and_zap(gmem, 0, -1ul);
 	kvm_gmem_invalidate_end(gmem, 0, -1ul);

 	list_del(&gmem->entry);
@@ -1289,7 +1747,7 @@ static int kvm_gmem_error_folio(struct address_space *mapping, struct folio *folio)
 	end = start + folio_nr_pages(folio);

 	list_for_each_entry(gmem, gmem_list, entry)
-		kvm_gmem_invalidate_begin(gmem, start, end);
+		kvm_gmem_invalidate_begin_and_zap(gmem, start, end);

 	/*
 	 * Do not truncate the range, what action is taken in response to the
@@ -1330,6 +1788,9 @@ static void kvm_gmem_free_folio(struct address_space *mapping,
 	 */
 	folio_clear_uptodate(folio);

+	if (kvm_gmem_has_custom_allocator(mapping->host))
+		kvm_gmem_allocator_ops(mapping->host)->free_folio(folio);
+
 	kvm_gmem_invalidate(folio);
 }

-- 
2.49.0.1045.g170613ef41-goog
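[Editor's illustration] The invalidation bounds in this patch are widened
to whole allocator folios before zapping. A standalone sketch of that
rounding arithmetic, with round_down()/round_up() reimplemented in
userspace and a 2M allocator folio (512 4K pages) assumed:

#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel helpers, for illustration only. */
#define round_down(x, y) ((x) / (y) * (y))
#define round_up(x, y)   (round_down((x) + (y) - 1, (y)))

int main(void)
{
	size_t nr_pages = 512; /* 4K pages per 2M allocator folio */

	/* Converting pages [700, 900) must invalidate whole hugepages. */
	assert(round_down(700, nr_pages) == 512);  /* invalidate_start */
	assert(round_up(900, nr_pages) == 1024);   /* invalidate_end */
	return 0;
}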
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:18 -0700
Message-ID: <625bd9c98ad4fd49d7df678f0186129226f77d7d.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 39/51] KVM: guest_memfd: Merge and truncate on fallocate(PUNCH_HOLE)
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, linux-fsdevel@vger.kernel.org
Merge and truncate folios on fallocate(PUNCH_HOLE). If the file is
being closed, defer merging to the folio_put() callback.

Change-Id: Iae26987756e70c83f3b121edbc0ed0bc105eec0d
Signed-off-by: Ackerley Tng
---
 virt/kvm/guest_memfd.c | 76 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 68 insertions(+), 8 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index cb426c1dfef8..04b1513c2998 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -859,6 +859,35 @@ static int kvm_gmem_restructure_folios_in_range(struct inode *inode,
 	return ret;
 }

+static long kvm_gmem_merge_truncate_indices(struct inode *inode, pgoff_t index,
+					    size_t nr_pages)
+{
+	struct folio *f;
+	pgoff_t unused;
+	long num_freed;
+
+	unmap_mapping_pages(inode->i_mapping, index, nr_pages, false);
+
+	if (!kvm_gmem_has_safe_refcount(inode->i_mapping, index, nr_pages, &unused))
+		return -EAGAIN;
+
+	f = filemap_get_folio(inode->i_mapping, index);
+	if (IS_ERR(f))
+		return 0;
+
+	/* Leave just filemap's refcounts on the folio. */
+	folio_put(f);
+
+	WARN_ON(kvm_gmem_merge_folio_in_filemap(inode, f));
+
+	num_freed = folio_nr_pages(f);
+	folio_lock(f);
+	truncate_inode_folio(inode->i_mapping, f);
+	folio_unlock(f);
+
+	return num_freed;
+}
+
 #else

 static inline int kvm_gmem_try_split_folio_in_filemap(struct inode *inode,
@@ -874,6 +903,12 @@ static int kvm_gmem_restructure_folios_in_range(struct inode *inode,
 	return 0;
 }

+static long kvm_gmem_merge_truncate_indices(struct inode *inode, pgoff_t index,
+					    size_t nr_pages)
+{
+	return 0;
+}
+
 #endif

 #else
@@ -1182,8 +1217,10 @@ static long kvm_gmem_truncate_indices(struct address_space *mapping,
 *
 * Removes folios beginning @index for @nr_pages from filemap in @inode,
 * updates inode metadata.
+ *
+ * Return: 0 on success and negative error otherwise.
 */
-static void kvm_gmem_truncate_inode_aligned_pages(struct inode *inode,
+static long kvm_gmem_truncate_inode_aligned_pages(struct inode *inode,
						  pgoff_t index,
						  size_t nr_pages)
 {
@@ -1191,19 +1228,34 @@ static void kvm_gmem_truncate_inode_aligned_pages(struct inode *inode,
 	long num_freed;
 	pgoff_t idx;
 	void *priv;
+	long ret;

 	priv = kvm_gmem_allocator_private(inode);
 	nr_per_huge_page = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);

+	ret = 0;
 	num_freed = 0;
 	for (idx = index; idx < index + nr_pages; idx += nr_per_huge_page) {
-		num_freed += kvm_gmem_truncate_indices(
-			inode->i_mapping, idx, nr_per_huge_page);
+		if (mapping_exiting(inode->i_mapping) ||
+		    !kvm_gmem_has_some_shared(inode, idx, nr_per_huge_page)) {
+			num_freed += kvm_gmem_truncate_indices(
+				inode->i_mapping, idx, nr_per_huge_page);
+		} else {
+			ret = kvm_gmem_merge_truncate_indices(inode, idx,
+							      nr_per_huge_page);
+			if (ret < 0)
+				break;
+
+			num_freed += ret;
+			ret = 0;
+		}
 	}

 	spin_lock(&inode->i_lock);
 	inode->i_blocks -= (num_freed << PAGE_SHIFT) / 512;
 	spin_unlock(&inode->i_lock);
+
+	return ret;
 }

 /**
@@ -1252,8 +1304,10 @@ static void kvm_gmem_zero_range(struct address_space *mapping,
 *
 * Removes full (huge)pages from the filemap and zeroing incomplete
 * (huge)pages.
 * The pages in the range may be split.
+ *
+ * Return: 0 on success and negative error otherwise.
 */
-static void kvm_gmem_truncate_inode_range(struct inode *inode, loff_t lstart,
+static long kvm_gmem_truncate_inode_range(struct inode *inode, loff_t lstart,
					  loff_t lend)
 {
 	pgoff_t full_hpage_start;
@@ -1263,6 +1317,7 @@ static void kvm_gmem_truncate_inode_range(struct inode *inode, loff_t lstart,
 	pgoff_t start;
 	pgoff_t end;
 	void *priv;
+	long ret;

 	priv = kvm_gmem_allocator_private(inode);
 	nr_per_huge_page = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
@@ -1279,10 +1334,11 @@ static void kvm_gmem_truncate_inode_range(struct inode *inode, loff_t lstart,
 		kvm_gmem_zero_range(inode->i_mapping, start, zero_end);
 	}

+	ret = 0;
 	if (full_hpage_end > full_hpage_start) {
 		nr_pages = full_hpage_end - full_hpage_start;
-		kvm_gmem_truncate_inode_aligned_pages(inode, full_hpage_start,
-						      nr_pages);
+		ret = kvm_gmem_truncate_inode_aligned_pages(
+			inode, full_hpage_start, nr_pages);
 	}

 	if (end > full_hpage_end && end > full_hpage_start) {
@@ -1290,6 +1346,8 @@ static void kvm_gmem_truncate_inode_range(struct inode *inode, loff_t lstart,

 		kvm_gmem_zero_range(inode->i_mapping, zero_start, end);
 	}
+
+	return ret;
 }

 static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
@@ -1298,6 +1356,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	pgoff_t start = offset >> PAGE_SHIFT;
 	pgoff_t end = (offset + len) >> PAGE_SHIFT;
 	struct kvm_gmem *gmem;
+	long ret;

 	/*
 	 * Bindings must be stable across invalidation to ensure the start+end
@@ -1308,8 +1367,9 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 	list_for_each_entry(gmem, gmem_list, entry)
 		kvm_gmem_invalidate_begin_and_zap(gmem, start, end);

+	ret = 0;
 	if (kvm_gmem_has_custom_allocator(inode)) {
-		kvm_gmem_truncate_inode_range(inode, offset, offset + len);
+		ret = kvm_gmem_truncate_inode_range(inode, offset, offset + len);
 	} else {
 		/* Page size is PAGE_SIZE, so use optimized truncation function. */
		truncate_inode_pages_range(inode->i_mapping, offset,
					   offset + len - 1);
@@ -1320,7 +1380,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)

 	filemap_invalidate_unlock(inode->i_mapping);

-	return 0;
+	return ret;
 }

 static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
-- 
2.49.0.1045.g170613ef41-goog
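[Editor's illustration] From userspace, the merge-on-truncate path shows
up only as an -EAGAIN that fallocate() can now return when unexpected
refcounts prevent merging. A hedged sketch of punching a 2M hole in a
guest_memfd, assuming gmem_fd was obtained via the KVM_CREATE_GUEST_MEMFD
ioctl (the 2M size is an assumed allocator folio size, not mandated here):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

int punch_2m_hole(int gmem_fd, off_t offset)
{
	/* PUNCH_HOLE must be combined with KEEP_SIZE. */
	int ret = fallocate(gmem_fd,
			    FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			    offset, 2 * 1024 * 1024);
	if (ret)
		perror("fallocate(PUNCH_HOLE)");
	/* With this patch, -EAGAIN means merging was blocked by
	 * transient refcounts; the caller may simply retry. */
	return ret;
}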
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:19 -0700
Message-ID: <3f48795c0c34f4faf661394e5ad9805f9014ae23.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 40/51] KVM: guest_memfd: Update kvm_gmem_mapping_order to account for page status
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, linux-fsdevel@vger.kernel.org

kvm_gmem_mapping_order() should return the maximum mapping order for
a gfn if a page were to be
faulted in for that gfn. For inodes that support a custom allocator,
the maximum mapping order should be determined by the custom
allocator in conjunction with guest_memfd.

This patch updates kvm_gmem_mapping_order() to take into account that
for the guestmem_hugetlb custom allocator, pages are split if any
page in a huge page range is shared.

Change-Id: I5c061af6cefdcbd708a4334cd58edc340afcf44e
Signed-off-by: Ackerley Tng
---
 virt/kvm/guest_memfd.c | 72 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 62 insertions(+), 10 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 04b1513c2998..8b5fe1360e58 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -709,19 +709,27 @@ static int kvm_gmem_split_folio_in_filemap(struct inode *inode, struct folio *folio)
 	return ret;
 }

+static inline bool kvm_gmem_should_split_at_index(struct inode *inode,
+						  pgoff_t index)
+{
+	pgoff_t index_floor;
+	size_t nr_pages;
+	void *priv;
+
+	priv = kvm_gmem_allocator_private(inode);
+	nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
+	index_floor = round_down(index, nr_pages);
+
+	return kvm_gmem_has_some_shared(inode, index_floor, nr_pages);
+}
+
 static inline int kvm_gmem_try_split_folio_in_filemap(struct inode *inode,
						      struct folio *folio)
 {
-	size_t to_nr_pages;
-	void *priv;
-
 	if (!kvm_gmem_has_custom_allocator(inode))
 		return 0;

-	priv = kvm_gmem_allocator_private(inode);
-	to_nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_page(priv);
-
-	if (kvm_gmem_has_some_shared(inode, folio->index, to_nr_pages))
+	if (kvm_gmem_should_split_at_index(inode, folio->index))
 		return kvm_gmem_split_folio_in_filemap(inode, folio);

 	return 0;
@@ -890,6 +898,12 @@ static long kvm_gmem_merge_truncate_indices(struct inode *inode, pgoff_t index,

 #else

+static inline bool kvm_gmem_should_split_at_index(struct inode *inode,
+						  pgoff_t index)
+{
+	return false;
+}
+
 static inline int kvm_gmem_try_split_folio_in_filemap(struct inode *inode,
						      struct folio *folio)
 {
@@ -1523,7 +1537,7 @@ static inline struct file *kvm_gmem_get_file(struct kvm_memory_slot *slot)
 	return get_file_active(&slot->gmem.file);
 }

-static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot *slot, gfn_t gfn)
+static pgoff_t kvm_gmem_get_index(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	return gfn - slot->base_gfn + slot->gmem.pgoff;
 }
@@ -2256,14 +2270,52 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);

 /**
- * Returns the mapping order for this @gfn in @slot.
+ * kvm_gmem_mapping_order() - Get the mapping order for this @gfn in @slot.
+ *
+ * @slot: the memslot that gfn belongs to.
+ * @gfn: the gfn to look up mapping order for.
 *
 * This is equal to max_order that would be returned if kvm_gmem_get_pfn()
 * were called now.
+ *
+ * Return: the mapping order for this @gfn in @slot.
 */
 int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
 {
-	return 0;
+	struct inode *inode;
+	struct file *file;
+	int ret;
+
+	file = kvm_gmem_get_file((struct kvm_memory_slot *)slot);
+	if (!file)
+		return 0;
+
+	inode = file_inode(file);
+
+	ret = 0;
+	if (kvm_gmem_has_custom_allocator(inode)) {
+		bool should_split;
+		pgoff_t index;
+
+		index = kvm_gmem_get_index(slot, gfn);
+
+		filemap_invalidate_lock_shared(inode->i_mapping);
+		should_split = kvm_gmem_should_split_at_index(inode, index);
+		filemap_invalidate_unlock_shared(inode->i_mapping);
+
+		if (!should_split) {
+			size_t nr_pages;
+			void *priv;
+
+			priv = kvm_gmem_allocator_private(inode);
+			nr_pages = kvm_gmem_allocator_ops(inode)->nr_pages_in_folio(priv);
+
+			ret = ilog2(nr_pages);
+		}
+	}
+
+	fput(file);
+	return ret;
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_mapping_order);

-- 
2.49.0.1045.g170613ef41-goog
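[Editor's illustration] To make the order computation concrete: a 2M
allocator folio of 4K pages has nr_pages_in_folio() of 512, so an unsplit
hugepage reports order ilog2(512) = 9, while a hugepage containing any
shared subpage reports order 0. A userspace sketch with ilog2()
reimplemented for illustration:

#include <assert.h>

/* Integer log2, standing in for the kernel's ilog2(). */
static int ilog2_u(unsigned long v)
{
	int order = 0;

	while (v > 1) {
		v >>= 1;
		order++;
	}
	return order;
}

int main(void)
{
	assert(ilog2_u(512) == 9);      /* 2M hugepage of 4K pages */
	assert(ilog2_u(262144) == 18);  /* 1G hugepage of 4K pages */
	return 0;
}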
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:20 -0700
Message-ID: <147952f80781ebf35446f07c2a36810bce4de032.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 41/51] KVM: Add CAP to indicate support for HugeTLB as custom allocator
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    x86@kernel.org, linux-fsdevel@vger.kernel.org
thomas.lendacky@amd.com, usama.arif@bytedance.com, vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk, vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org, willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" If this CAP returns true, then guestmem_hugetlb can be used as a custom allocator for guest_memfd. Change-Id: I4edef395b5bd5814b70c81788d87aa94823c35d5 Signed-off-by: Ackerley Tng --- include/uapi/linux/kvm.h | 1 + virt/kvm/kvm_main.c | 4 ++++ 2 files changed, 5 insertions(+) diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index af486b2e4862..5012343dc2c5 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -932,6 +932,7 @@ struct kvm_enable_cap { #define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239 #define KVM_CAP_GMEM_SHARED_MEM 240 #define KVM_CAP_GMEM_CONVERSION 241 +#define KVM_CAP_GMEM_HUGETLB 242 =20 struct kvm_irq_routing_irqchip { __u32 irqchip; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 92054b1bbd3f..230bcb853712 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -4845,6 +4845,10 @@ static int kvm_vm_ioctl_check_extension_generic(stru= ct kvm *kvm, long arg) case KVM_CAP_GMEM_SHARED_MEM: case KVM_CAP_GMEM_CONVERSION: return true; +#endif +#ifdef CONFIG_KVM_GMEM_HUGETLB + case KVM_CAP_GMEM_HUGETLB: + return true; #endif default: break; --=20 2.49.0.1045.g170613ef41-goog From nobody Thu Dec 18 05:16:19 2025 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F36B264F8B for ; Wed, 14 May 2025 23:44:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747266246; cv=none; b=KAEA1WL0xi8BbsZV0fu4iseA2QKyR8quadZs37vC6c3CCse6qLC24W6IH3qqRkya9mjwX+8df4jvrtWcoS7bEiCrnA0XuRJfL/opbdxuS2NIudC3GYLsbBT2EgVxlEu4FYv4EwtFIVVlux122DHlq/mxkzkr7Ck0DJ7czREjq/c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747266246; c=relaxed/simple; bh=1rVfuYi1WHYEYjmdy8cIEb5msrK0n7swdg38Pr4wEeg=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=PZjWZGQT2u/Kuq0KOU0e4h/L4eQeEcAB///84d62fS97D6PskxFF0RkgUU83oDcfj/RP3s/vTrtAXXPo1N6eyiD9RkkWbf87XpZfie+oPUzV64Wnul0/ltCAJkDRksddalLYefspPUBSy95f7ZiWQs06mnyUss39D/hqrHX7QyA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--ackerleytng.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=4sXv9Gz1; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--ackerleytng.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="4sXv9Gz1" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-30c4bdd0618so343476a91.1 for ; Wed, 14 May 2025 16:44:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747266243; x=1747871043; 
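Probing the new capability from userspace uses the stock KVM_CHECK_EXTENSION
ioctl on /dev/kvm. A minimal sketch; the only assumption beyond standard KVM
API is the KVM_CAP_GMEM_HUGETLB value defined in the patch above:

	/*
	 * Minimal sketch: check whether this kernel supports the HugeTLB-backed
	 * guest_memfd allocator. KVM_CHECK_EXTENSION returns > 0 when the
	 * capability is available.
	 */
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <linux/kvm.h>

	#ifndef KVM_CAP_GMEM_HUGETLB
	#define KVM_CAP_GMEM_HUGETLB 242	/* value added by this patch */
	#endif

	int main(void)
	{
		int ret;
		int kvm = open("/dev/kvm", O_RDWR);

		if (kvm < 0) {
			perror("open /dev/kvm");
			return 1;
		}

		ret = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_GMEM_HUGETLB);
		printf("KVM_CAP_GMEM_HUGETLB: %ssupported\n", ret > 0 ? "" : "not ");
		return 0;
	}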
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:21 -0700
Message-ID: <322b245180f09076c5cbbac0d68ea27c0a8c878b.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 42/51] KVM: selftests: Add basic selftests for hugetlb-backed guest_memfd
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Add tests for 2MB and 1GB page sizes, and update the invalid flags test
for GUEST_MEMFD_FLAG_HUGETLB.

To keep runtime manageable, the tests touch every page, but not every
byte of every page.

Change-Id: I7d80a12b991a064cfd796e3c6e11f9a95fd16ec1
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 .../testing/selftests/kvm/guest_memfd_test.c  | 94 +++++++++++++------
 1 file changed, 67 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index 1e79382fd830..c8acccaa9e1d 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -13,6 +13,8 @@
 
 #include
 #include
+#include
+#include
 #include
 #include
 #include
@@ -38,6 +40,7 @@ static void test_file_read_write(int fd)
 static void test_faulting_allowed(int fd, size_t page_size, size_t total_size)
 {
 	const char val = 0xaa;
+	size_t increment;
 	char *mem;
 	size_t i;
 	int ret;
@@ -45,21 +48,25 @@ static void test_faulting_allowed(int fd, size_t page_size, size_t total_size)
 	mem = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
 	TEST_ASSERT(mem != MAP_FAILED, "mmaping() guest memory should pass.");
 
-	memset(mem, val, total_size);
-	for (i = 0; i < total_size; i++)
+	increment = page_size >> 1;
+
+	for (i = 0; i < total_size; i += increment)
+		mem[i] = val;
+	for (i = 0; i < total_size; i += increment)
 		TEST_ASSERT_EQ(mem[i], val);
 
 	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
 			page_size);
 	TEST_ASSERT(!ret, "fallocate the first page should succeed");
 
-	for (i = 0; i < page_size; i++)
+	for (i = 0; i < page_size; i += increment)
 		TEST_ASSERT_EQ(mem[i], 0x00);
-	for (; i < total_size; i++)
+	for (; i < total_size; i += increment)
 		TEST_ASSERT_EQ(mem[i], val);
 
-	memset(mem, val, total_size);
-	for (i = 0; i < total_size; i++)
+	for (i = 0; i < total_size; i += increment)
+		mem[i] = val;
+	for (i = 0; i < total_size; i += increment)
 		TEST_ASSERT_EQ(mem[i], val);
 
 	ret = munmap(mem, total_size);
@@ -209,7 +216,7 @@ static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
 	size_t size;
 	int fd;
 
-	for (size = 1; size < page_size; size++) {
+	for (size = 1; size < page_size; size += (page_size >> 1)) {
 		fd = __vm_create_guest_memfd(vm, size, guest_memfd_flags);
 		TEST_ASSERT(fd == -1 && errno == EINVAL,
			    "guest_memfd() with non-page-aligned page size '0x%lx' should fail with EINVAL",
@@ -217,28 +224,33 @@ static void test_create_guest_memfd_invalid_sizes(struct kvm_vm *vm,
 	}
 }
 
-static void test_create_guest_memfd_multiple(struct kvm_vm *vm)
+static void test_create_guest_memfd_multiple(struct kvm_vm *vm,
+					      uint64_t guest_memfd_flags,
+					      size_t page_size)
 {
 	int fd1, fd2, ret;
 	struct stat st1, st2;
 
-	fd1 = __vm_create_guest_memfd(vm, 4096, 0);
+	fd1 = __vm_create_guest_memfd(vm, page_size, guest_memfd_flags);
 	TEST_ASSERT(fd1 != -1, "memfd creation should succeed");
 
 	ret = fstat(fd1, &st1);
 	TEST_ASSERT(ret != -1, "memfd fstat should succeed");
-	TEST_ASSERT(st1.st_size == 4096, "memfd st_size should match requested size");
+	TEST_ASSERT(st1.st_size == page_size, "memfd st_size should match requested size");
 
-	fd2 = __vm_create_guest_memfd(vm, 8192, 0);
+	fd2 = __vm_create_guest_memfd(vm, page_size * 2, guest_memfd_flags);
 	TEST_ASSERT(fd2 != -1, "memfd creation should succeed");
 
 	ret = fstat(fd2, &st2);
 	TEST_ASSERT(ret != -1, "memfd fstat should succeed");
-	TEST_ASSERT(st2.st_size == 8192, "second memfd st_size should match requested size");
+	TEST_ASSERT(st2.st_size == page_size * 2,
+		    "second memfd st_size should match requested size");
+
 
 	ret = fstat(fd1, &st1);
 	TEST_ASSERT(ret != -1, "memfd fstat should succeed");
-	TEST_ASSERT(st1.st_size == 4096, "first memfd st_size should still match requested size");
+	TEST_ASSERT(st1.st_size == page_size,
+		    "first memfd st_size should still match requested size");
 	TEST_ASSERT(st1.st_ino != st2.st_ino, "different memfd should have different inode numbers");
 
 	close(fd2);
@@ -449,21 +461,13 @@ static void test_guest_memfd_features(struct kvm_vm *vm, size_t page_size,
 	close(fd);
 }
 
-static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
-			   bool expect_mmap_allowed)
+static void test_guest_memfd_features_for_page_size(struct kvm_vm *vm,
+						    uint64_t guest_memfd_flags,
+						    size_t page_size,
+						    bool expect_mmap_allowed)
 {
-	struct kvm_vm *vm;
-	size_t page_size;
+	test_create_guest_memfd_multiple(vm, guest_memfd_flags, page_size);
 
-	if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
-		return;
-
-	vm = vm_create_barebones_type(vm_type);
-
-	test_create_guest_memfd_multiple(vm);
-	test_bind_guest_memfd_wrt_userspace_addr(vm);
-
-	page_size = getpagesize();
 	if (guest_memfd_flags & GUEST_MEMFD_FLAG_SUPPORT_SHARED) {
 		test_guest_memfd_features(vm, page_size, guest_memfd_flags,
					  expect_mmap_allowed, true);
@@ -479,6 +483,34 @@ static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
 		test_guest_memfd_features(vm, page_size, guest_memfd_flags,
					  expect_mmap_allowed, false);
 	}
+}
+
+static void test_with_type(unsigned long vm_type, uint64_t base_flags,
+			   bool expect_mmap_allowed)
+{
+	struct kvm_vm *vm;
+	uint64_t flags;
+
+	if (!(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type)))
+		return;
+
+	vm = vm_create_barebones_type(vm_type);
+
+	test_bind_guest_memfd_wrt_userspace_addr(vm);
+
+	printf("Test guest_memfd with 4K pages for vm_type %ld\n", vm_type);
+	test_guest_memfd_features_for_page_size(vm, base_flags, getpagesize(), expect_mmap_allowed);
+	printf("\tPASSED\n");
+
+	printf("Test guest_memfd with 2M pages for vm_type %ld\n", vm_type);
+	flags = base_flags | GUEST_MEMFD_FLAG_HUGETLB | GUESTMEM_HUGETLB_FLAG_2MB;
+	test_guest_memfd_features_for_page_size(vm, flags, SZ_2M, expect_mmap_allowed);
+	printf("\tPASSED\n");
+
+	printf("Test guest_memfd with 1G pages for vm_type %ld\n", vm_type);
+	flags = base_flags | GUEST_MEMFD_FLAG_HUGETLB | GUESTMEM_HUGETLB_FLAG_1GB;
+	test_guest_memfd_features_for_page_size(vm, flags, SZ_1G, expect_mmap_allowed);
+	printf("\tPASSED\n");
 
 	kvm_vm_release(vm);
 }
@@ -486,9 +518,14 @@ static void test_with_type(unsigned long vm_type, uint64_t guest_memfd_flags,
 static void test_vm_with_gmem_flag(struct kvm_vm *vm, uint64_t flag,
				   bool expect_valid)
 {
-	size_t page_size = getpagesize();
+	size_t page_size;
 	int fd;
 
+	if (flag == GUEST_MEMFD_FLAG_HUGETLB)
+		page_size = get_def_hugetlb_pagesz();
+	else
+		page_size = getpagesize();
+
 	fd = __vm_create_guest_memfd(vm, page_size, flag);
 
 	if (expect_valid) {
@@ -550,6 +587,9 @@ static void test_gmem_flag_validity(void)
 	/* After conversions are supported, all VM types support shared mem. */
 	uint64_t valid_flags = GUEST_MEMFD_FLAG_SUPPORT_SHARED;
 
+	if (kvm_has_cap(KVM_CAP_GMEM_HUGETLB))
+		valid_flags |= GUEST_MEMFD_FLAG_HUGETLB;
+
 	test_vm_type_gmem_flag_validity(VM_TYPE_DEFAULT, valid_flags);
 
 #ifdef __x86_64__
-- 
2.49.0.1045.g170613ef41-goog
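For orientation, here is what the new flag combinations look like at a call
site. This is a sketch only, reusing the helper and flag names that appear
in the selftest diff above (vm_create_barebones() and vm_create_guest_memfd()
are existing KVM selftest helpers); the 2M HugeTLB pool must be populated
before this can succeed:

	/*
	 * Sketch: create one 2M HugeTLB-backed guest_memfd, using names from
	 * the selftests in this series.
	 */
	#include <stdint.h>
	#include <linux/sizes.h>

	#include "kvm_util.h"	/* KVM selftest harness */

	static int create_2m_hugetlb_gmem(void)
	{
		struct kvm_vm *vm = vm_create_barebones();
		uint64_t flags = GUEST_MEMFD_FLAG_HUGETLB | GUESTMEM_HUGETLB_FLAG_2MB;

		/* The guest_memfd size must be a multiple of the hugepage size. */
		return vm_create_guest_memfd(vm, SZ_2M, flags);
	}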
From nobody Thu Dec 18 05:16:19 2025
Date: Wed, 14 May 2025 16:42:22 -0700
Subject: [RFC PATCH v2 43/51] KVM: selftests: Update conversion flows test for HugeTLB
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Update the conversion flows test to use GUEST_MEMFD_FLAG_HUGETLB, and run
every flow with the 2MB and 1GB page sizes in addition to the base page
size.

Change-Id: If5d93cb776d6bebd504a80bba553bd534e62be38
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 .../kvm/guest_memfd_conversions_test.c        | 171 ++++++++++--------
 1 file changed, 98 insertions(+), 73 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/guest_memfd_conversions_test.c
index 34eb6c9a37b1..22126454fd6b 100644
--- a/tools/testing/selftests/kvm/guest_memfd_conversions_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_conversions_test.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2024, Google LLC.
  */
 #include
+#include
 #include
 #include
 #include
@@ -228,6 +229,11 @@ static struct kvm_vm *setup_test(size_t test_page_size, bool init_private,
 	if (init_private)
 		flags |= GUEST_MEMFD_FLAG_INIT_PRIVATE;
 
+	if (test_page_size == SZ_2M)
+		flags |= GUEST_MEMFD_FLAG_HUGETLB | GUESTMEM_HUGETLB_FLAG_2MB;
+	else if (test_page_size == SZ_1G)
+		flags |= GUEST_MEMFD_FLAG_HUGETLB | GUESTMEM_HUGETLB_FLAG_1GB;
+
 	*guest_memfd = vm_create_guest_memfd(vm, test_page_size, flags);
 	TEST_ASSERT(*guest_memfd > 0, "guest_memfd creation failed");
 
@@ -249,79 +255,80 @@ static void cleanup_test(size_t guest_memfd_size, struct kvm_vm *vm,
 	TEST_ASSERT_EQ(close(guest_memfd), 0);
 }
 
-static void test_sharing(void)
+static void test_sharing(size_t test_page_size)
 {
 	struct kvm_vcpu *vcpu;
 	struct kvm_vm *vm;
 	int guest_memfd;
 	char *mem;
 
-	vm = setup_test(PAGE_SIZE, /*init_private=*/false, &vcpu, &guest_memfd, &mem);
+	vm = setup_test(test_page_size, /*init_private=*/false, &vcpu, &guest_memfd, &mem);
 
 	host_use_memory(mem, 'X', 'A');
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'A', 'B', 0);
 
 	/* Toggle private flag of memory attributes and run the test again. */
-	guest_memfd_convert_private(guest_memfd, 0, PAGE_SIZE);
+	guest_memfd_convert_private(guest_memfd, 0, test_page_size);
 
 	assert_host_cannot_fault(mem);
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'B', 'C', 0);
 
-	guest_memfd_convert_shared(guest_memfd, 0, PAGE_SIZE);
+	guest_memfd_convert_shared(guest_memfd, 0, test_page_size);
 
 	host_use_memory(mem, 'C', 'D');
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'D', 'E', 0);
 
-	cleanup_test(PAGE_SIZE, vm, guest_memfd, mem);
+	cleanup_test(test_page_size, vm, guest_memfd, mem);
 }
 
-static void test_init_mappable_false(void)
+static void test_init_mappable_false(size_t test_page_size)
 {
 	struct kvm_vcpu *vcpu;
 	struct kvm_vm *vm;
 	int guest_memfd;
 	char *mem;
 
-	vm = setup_test(PAGE_SIZE, /*init_private=*/true, &vcpu, &guest_memfd, &mem);
+	vm = setup_test(test_page_size, /*init_private=*/true, &vcpu, &guest_memfd, &mem);
 
 	assert_host_cannot_fault(mem);
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'X', 'A', 0);
 
-	guest_memfd_convert_shared(guest_memfd, 0, PAGE_SIZE);
+	guest_memfd_convert_shared(guest_memfd, 0, test_page_size);
 
 	host_use_memory(mem, 'A', 'B');
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'B', 'C', 0);
 
-	cleanup_test(PAGE_SIZE, vm, guest_memfd, mem);
+	cleanup_test(test_page_size, vm, guest_memfd, mem);
 }
 
 /*
  * Test that even if there are no folios yet, conversion requests are recorded
  * in guest_memfd.
  */
-static void test_conversion_before_allocation(void)
+static void test_conversion_before_allocation(size_t test_page_size)
 {
 	struct kvm_vcpu *vcpu;
 	struct kvm_vm *vm;
 	int guest_memfd;
 	char *mem;
 
-	vm = setup_test(PAGE_SIZE, /*init_private=*/false, &vcpu, &guest_memfd, &mem);
+	vm = setup_test(test_page_size, /*init_private=*/false, &vcpu, &guest_memfd, &mem);
 
-	guest_memfd_convert_private(guest_memfd, 0, PAGE_SIZE);
+	guest_memfd_convert_private(guest_memfd, 0, test_page_size);
 
 	assert_host_cannot_fault(mem);
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'X', 'A', 0);
 
-	guest_memfd_convert_shared(guest_memfd, 0, PAGE_SIZE);
+	guest_memfd_convert_shared(guest_memfd, 0, test_page_size);
 
 	host_use_memory(mem, 'A', 'B');
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'B', 'C', 0);
 
-	cleanup_test(PAGE_SIZE, vm, guest_memfd, mem);
+	cleanup_test(test_page_size, vm, guest_memfd, mem);
 }
 
-static void __test_conversion_if_not_all_folios_allocated(int total_nr_pages,
+static void __test_conversion_if_not_all_folios_allocated(size_t test_page_size,
+							   int total_nr_pages,
 							   int page_to_fault)
 {
 	const int second_page_to_fault = 8;
@@ -332,15 +339,15 @@ static void __test_conversion_if_not_all_folios_allocated(int total_nr_pages,
 	char *mem;
 	int i;
 
-	total_size = PAGE_SIZE * total_nr_pages;
+	total_size = test_page_size * total_nr_pages;
 	vm = setup_test(total_size, /*init_private=*/false, &vcpu, &guest_memfd, &mem);
 
 	/*
 	 * Fault 2 of the pages to test filemap range operations except when
 	 * page_to_fault == second_page_to_fault.
 	 */
-	host_use_memory(mem + page_to_fault * PAGE_SIZE, 'X', 'A');
-	host_use_memory(mem + second_page_to_fault * PAGE_SIZE, 'X', 'A');
+	host_use_memory(mem + page_to_fault * test_page_size, 'X', 'A');
+	host_use_memory(mem + second_page_to_fault * test_page_size, 'X', 'A');
 
 	guest_memfd_convert_private(guest_memfd, 0, total_size);
 
@@ -348,37 +355,37 @@ static void __test_conversion_if_not_all_folios_allocated(int total_nr_pages,
 		bool is_faulted;
 		char expected;
 
-		assert_host_cannot_fault(mem + i * PAGE_SIZE);
+		assert_host_cannot_fault(mem + i * test_page_size);
 
 		is_faulted = i == page_to_fault || i == second_page_to_fault;
 		expected = is_faulted ? 'A' : 'X';
 		guest_use_memory(vcpu,
-				 GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE,
+				 GUEST_MEMFD_SHARING_TEST_GVA + i * test_page_size,
 				 expected, 'B', 0);
 	}
 
 	guest_memfd_convert_shared(guest_memfd, 0, total_size);
 
 	for (i = 0; i < total_nr_pages; ++i) {
-		host_use_memory(mem + i * PAGE_SIZE, 'B', 'C');
+		host_use_memory(mem + i * test_page_size, 'B', 'C');
 		guest_use_memory(vcpu,
-				 GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE,
+				 GUEST_MEMFD_SHARING_TEST_GVA + i * test_page_size,
 				 'C', 'D', 0);
 	}
 
 	cleanup_test(total_size, vm, guest_memfd, mem);
 }
 
-static void test_conversion_if_not_all_folios_allocated(void)
+static void test_conversion_if_not_all_folios_allocated(size_t test_page_size)
 {
 	const int total_nr_pages = 16;
 	int i;
 
 	for (i = 0; i < total_nr_pages; ++i)
-		__test_conversion_if_not_all_folios_allocated(total_nr_pages, i);
+		__test_conversion_if_not_all_folios_allocated(test_page_size, total_nr_pages, i);
 }
 
-static void test_conversions_should_not_affect_surrounding_pages(void)
+static void test_conversions_should_not_affect_surrounding_pages(size_t test_page_size)
 {
 	struct kvm_vcpu *vcpu;
 	int page_to_convert;
@@ -391,40 +398,40 @@ static void test_conversions_should_not_affect_surrounding_pages(void)
 
 	page_to_convert = 2;
 	nr_pages = 4;
-	total_size = PAGE_SIZE * nr_pages;
+	total_size = test_page_size * nr_pages;
 
 	vm = setup_test(total_size, /*init_private=*/false, &vcpu, &guest_memfd, &mem);
 
 	for (i = 0; i < nr_pages; ++i) {
-		host_use_memory(mem + i * PAGE_SIZE, 'X', 'A');
-		guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE,
+		host_use_memory(mem + i * test_page_size, 'X', 'A');
+		guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * test_page_size,
				 'A', 'B', 0);
 	}
 
-	guest_memfd_convert_private(guest_memfd, PAGE_SIZE * page_to_convert, PAGE_SIZE);
+	guest_memfd_convert_private(guest_memfd, test_page_size * page_to_convert, test_page_size);
 
 
 	for (i = 0; i < nr_pages; ++i) {
 		char to_check;
 
 		if (i == page_to_convert) {
-			assert_host_cannot_fault(mem + i * PAGE_SIZE);
+			assert_host_cannot_fault(mem + i * test_page_size);
 			to_check = 'B';
 		} else {
-			host_use_memory(mem + i * PAGE_SIZE, 'B', 'C');
+			host_use_memory(mem + i * test_page_size, 'B', 'C');
 			to_check = 'C';
 		}
 
-		guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE,
+		guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * test_page_size,
				 to_check, 'D', 0);
 	}
 
-	guest_memfd_convert_shared(guest_memfd, PAGE_SIZE * page_to_convert, PAGE_SIZE);
+	guest_memfd_convert_shared(guest_memfd, test_page_size * page_to_convert, test_page_size);
 
 
 	for (i = 0; i < nr_pages; ++i) {
-		host_use_memory(mem + i * PAGE_SIZE, 'D', 'E');
-		guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE,
+		host_use_memory(mem + i * test_page_size, 'D', 'E');
+		guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * test_page_size,
				 'E', 'F', 0);
 	}
 
@@ -432,7 +439,7 @@ static void test_conversions_should_not_affect_surrounding_pages(void)
 }
 
 static void __test_conversions_should_fail_if_memory_has_elevated_refcount(
-	int nr_pages, int page_to_convert)
+	size_t test_page_size, int nr_pages, int page_to_convert)
 {
 	struct kvm_vcpu *vcpu;
 	loff_t error_offset;
@@ -443,50 +450,50 @@ static void __test_conversions_should_fail_if_memory_has_elevated_refcount(
 	int ret;
 	int i;
 
-	total_size = PAGE_SIZE * nr_pages;
+	total_size = test_page_size * nr_pages;
 	vm = setup_test(total_size, /*init_private=*/false, &vcpu, &guest_memfd, &mem);
 
-	pin_pages(mem + page_to_convert * PAGE_SIZE, PAGE_SIZE);
+	pin_pages(mem + page_to_convert * test_page_size, test_page_size);
 
 	for (i = 0; i < nr_pages; i++) {
-		host_use_memory(mem + i * PAGE_SIZE, 'X', 'A');
-		guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE,
+		host_use_memory(mem + i * test_page_size, 'X', 'A');
+		guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * test_page_size,
				 'A', 'B', 0);
 	}
 
 	error_offset = 0;
-	ret = __guest_memfd_convert_private(guest_memfd, page_to_convert * PAGE_SIZE,
-					    PAGE_SIZE, &error_offset);
+	ret = __guest_memfd_convert_private(guest_memfd, page_to_convert * test_page_size,
+					    test_page_size, &error_offset);
 	TEST_ASSERT_EQ(ret, -1);
 	TEST_ASSERT_EQ(errno, EAGAIN);
-	TEST_ASSERT_EQ(error_offset, page_to_convert * PAGE_SIZE);
+	TEST_ASSERT_EQ(error_offset, page_to_convert * test_page_size);
 
 	unpin_pages();
 
-	guest_memfd_convert_private(guest_memfd, page_to_convert * PAGE_SIZE, PAGE_SIZE);
+	guest_memfd_convert_private(guest_memfd, page_to_convert * test_page_size, test_page_size);
 
 	for (i = 0; i < nr_pages; i++) {
 		char expected;
 
 		if (i == page_to_convert)
-			assert_host_cannot_fault(mem + i * PAGE_SIZE);
+			assert_host_cannot_fault(mem + i * test_page_size);
 		else
-			host_use_memory(mem + i * PAGE_SIZE, 'B', 'C');
+			host_use_memory(mem + i * test_page_size, 'B', 'C');
 
 		expected = i == page_to_convert ? 'X' : 'C';
-		guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE,
+		guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA + i * test_page_size,
				 expected, 'D', 0);
 	}
 
-	guest_memfd_convert_shared(guest_memfd, page_to_convert * PAGE_SIZE, PAGE_SIZE);
+	guest_memfd_convert_shared(guest_memfd, page_to_convert * test_page_size, test_page_size);
 
 
 	for (i = 0; i < nr_pages; i++) {
 		char expected = i == page_to_convert ? 'X' : 'D';
 
-		host_use_memory(mem + i * PAGE_SIZE, expected, 'E');
+		host_use_memory(mem + i * test_page_size, expected, 'E');
 		guest_use_memory(vcpu,
-				 GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE,
+				 GUEST_MEMFD_SHARING_TEST_GVA + i * test_page_size,
				 'E', 'F', 0);
 	}
 
@@ -496,15 +503,18 @@ static void __test_conversions_should_fail_if_memory_has_elevated_refcount(
  * This test depends on CONFIG_GUP_TEST to provide a kernel module that exposes
  * pin_user_pages() to userspace.
  */
-static void test_conversions_should_fail_if_memory_has_elevated_refcount(void)
+static void test_conversions_should_fail_if_memory_has_elevated_refcount(
+	size_t test_page_size)
 {
 	int i;
 
-	for (i = 0; i < 4; i++)
-		__test_conversions_should_fail_if_memory_has_elevated_refcount(4, i);
+	for (i = 0; i < 4; i++) {
+		__test_conversions_should_fail_if_memory_has_elevated_refcount(
+			test_page_size, 4, i);
+	}
 }
 
-static void test_truncate_should_not_change_mappability(void)
+static void test_truncate_should_not_change_mappability(size_t test_page_size)
 {
 	struct kvm_vcpu *vcpu;
 	struct kvm_vm *vm;
@@ -512,40 +522,40 @@ static void test_truncate_should_not_change_mappability(void)
 	char *mem;
 	int ret;
 
-	vm = setup_test(PAGE_SIZE, /*init_private=*/false, &vcpu, &guest_memfd, &mem);
+	vm = setup_test(test_page_size, /*init_private=*/false, &vcpu, &guest_memfd, &mem);
 
 	host_use_memory(mem, 'X', 'A');
 
 	ret = fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
-			0, PAGE_SIZE);
+			0, test_page_size);
 	TEST_ASSERT(!ret, "truncating the first page should succeed");
 
 	host_use_memory(mem, 'X', 'A');
 
-	guest_memfd_convert_private(guest_memfd, 0, PAGE_SIZE);
+	guest_memfd_convert_private(guest_memfd, 0, test_page_size);
 
 	assert_host_cannot_fault(mem);
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'A', 'A', 0);
 
 	ret = fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
-			0, PAGE_SIZE);
+			0, test_page_size);
 	TEST_ASSERT(!ret, "truncating the first page should succeed");
 
 	assert_host_cannot_fault(mem);
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'X', 'A', 0);
 
-	cleanup_test(PAGE_SIZE, vm, guest_memfd, mem);
+	cleanup_test(test_page_size, vm, guest_memfd, mem);
 }
 
-static void test_fault_type_independent_of_mem_attributes(void)
+static void test_fault_type_independent_of_mem_attributes(size_t test_page_size)
 {
 	struct kvm_vcpu *vcpu;
 	struct kvm_vm *vm;
 	int guest_memfd;
 	char *mem;
 
-	vm = setup_test(PAGE_SIZE, /*init_private=*/true, &vcpu, &guest_memfd, &mem);
-	vm_mem_set_shared(vm, GUEST_MEMFD_SHARING_TEST_GPA, PAGE_SIZE);
+	vm = setup_test(test_page_size, /*init_private=*/true, &vcpu, &guest_memfd, &mem);
+	vm_mem_set_shared(vm, GUEST_MEMFD_SHARING_TEST_GPA, test_page_size);
 
 	/*
 	 * kvm->mem_attr_array set to shared, guest_memfd memory initialized as
@@ -558,8 +568,8 @@ static void test_fault_type_independent_of_mem_attributes(void)
 	/* Guest can fault and use memory. */
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'X', 'A', 0);
 
-	guest_memfd_convert_shared(guest_memfd, 0, PAGE_SIZE);
-	vm_mem_set_private(vm, GUEST_MEMFD_SHARING_TEST_GPA, PAGE_SIZE);
+	guest_memfd_convert_shared(guest_memfd, 0, test_page_size);
+	vm_mem_set_private(vm, GUEST_MEMFD_SHARING_TEST_GPA, test_page_size);
 
 	/* Host can use shared memory. */
 	host_use_memory(mem, 'X', 'A');
@@ -567,7 +577,19 @@ static void test_fault_type_independent_of_mem_attributes(void)
 	/* Guest can also use shared memory. */
 	guest_use_memory(vcpu, GUEST_MEMFD_SHARING_TEST_GVA, 'X', 'A', 0);
 
-	cleanup_test(PAGE_SIZE, vm, guest_memfd, mem);
+	cleanup_test(test_page_size, vm, guest_memfd, mem);
+}
+
+static void test_with_size(size_t test_page_size)
+{
+	test_sharing(test_page_size);
+	test_init_mappable_false(test_page_size);
+	test_conversion_before_allocation(test_page_size);
+	test_conversion_if_not_all_folios_allocated(test_page_size);
+	test_conversions_should_not_affect_surrounding_pages(test_page_size);
+	test_truncate_should_not_change_mappability(test_page_size);
+	test_conversions_should_fail_if_memory_has_elevated_refcount(test_page_size);
+	test_fault_type_independent_of_mem_attributes(test_page_size);
 }
 
 int main(int argc, char *argv[])
@@ -576,14 +598,17 @@ int main(int argc, char *argv[])
 	TEST_REQUIRE(kvm_check_cap(KVM_CAP_GMEM_SHARED_MEM));
 	TEST_REQUIRE(kvm_check_cap(KVM_CAP_GMEM_CONVERSION));
 
-	test_sharing();
-	test_init_mappable_false();
-	test_conversion_before_allocation();
-	test_conversion_if_not_all_folios_allocated();
-	test_conversions_should_not_affect_surrounding_pages();
-	test_truncate_should_not_change_mappability();
-	test_conversions_should_fail_if_memory_has_elevated_refcount();
-	test_fault_type_independent_of_mem_attributes();
+	printf("Test guest_memfd with 4K pages\n");
+	test_with_size(PAGE_SIZE);
+	printf("\tPASSED\n");
+
+	printf("Test guest_memfd with 2M pages\n");
+	test_with_size(SZ_2M);
+	printf("\tPASSED\n");
+
+	printf("Test guest_memfd with 1G pages\n");
+	test_with_size(SZ_1G);
+	printf("\tPASSED\n");
 
 	return 0;
 }
-- 
2.49.0.1045.g170613ef41-goog
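One practical note on the 2M and 1G cases added here: they can only pass on
hosts with populated HugeTLB pools. A sketch of reserving pages
programmatically through the standard sysfs knobs (equivalent to
echo 16 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages):

	/* Sketch: populate the HugeTLB pools the 2M/1G test cases draw from. */
	#include <stdio.h>

	static int set_hugepage_pool(const char *knob, long nr)
	{
		FILE *f = fopen(knob, "w");

		if (!f)
			return -1;
		fprintf(f, "%ld\n", nr);
		return fclose(f);
	}

	/*
	 * Example usage (standard kernel paths; 1G pages may need to be
	 * reserved at boot on fragmented systems):
	 *
	 *   set_hugepage_pool("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages", 16);
	 *   set_hugepage_pool("/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages", 2);
	 */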
From nobody Thu Dec 18 05:16:20 2025
Date: Wed, 14 May 2025 16:42:23 -0700
Subject: [RFC PATCH v2 44/51] KVM: selftests: Test truncation paths of guest_memfd
From: Ackerley Tng <ackerleytng@google.com>
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

When guest_memfd folios are truncated, any folios that were split have to
be merged. For explicit truncations, userspace gets an error if there are
unexpected refcounts on the folios. For truncation on file close, the
kernel handles the merging even if there are unexpected refcounts on the
folios.

Test both of these scenarios.

Change-Id: I0f0c619763f575605fab8b3c453858960e43ed71
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 .../kvm/guest_memfd_conversions_test.c        | 95 +++++++++++++++++++
 1 file changed, 95 insertions(+)

diff --git a/tools/testing/selftests/kvm/guest_memfd_conversions_test.c b/tools/testing/selftests/kvm/guest_memfd_conversions_test.c
index 22126454fd6b..435f91424d5f 100644
--- a/tools/testing/selftests/kvm/guest_memfd_conversions_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_conversions_test.c
@@ -4,6 +4,7 @@
  *
  * Copyright (c) 2024, Google LLC.
  */
+#include
 #include
 #include
 #include
@@ -580,6 +581,97 @@ static void test_fault_type_independent_of_mem_attributes(size_t test_page_size)
 	cleanup_test(test_page_size, vm, guest_memfd, mem);
 }
 
+static void test_truncate_shared_while_pinned(size_t test_page_size)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	int guest_memfd;
+	char *mem;
+	int ret;
+
+	vm = setup_test(test_page_size, /*init_private=*/false, &vcpu,
+			&guest_memfd, &mem);
+
+	ret = fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE, 0, test_page_size);
+	TEST_ASSERT(!ret, "fallocate should have succeeded");
+
+	pin_pages(mem, test_page_size);
+
+	ret = fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+			0, test_page_size);
+	if (test_page_size == PAGE_SIZE) {
+		TEST_ASSERT(!ret, "truncate should have succeeded since there is no need to merge");
+	} else {
+		TEST_ASSERT(ret, "truncate should have failed since pages are pinned");
+		TEST_ASSERT_EQ(errno, EAGAIN);
+	}
+
+	unpin_pages();
+
+	ret = fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+			0, test_page_size);
+	TEST_ASSERT(!ret, "truncate should succeed now that pages are unpinned");
+
+	cleanup_test(test_page_size, vm, guest_memfd, mem);
+}
+
+static void test_truncate_private(size_t test_page_size)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	int guest_memfd;
+	char *mem;
+	int ret;
+
+	vm = setup_test(test_page_size, /*init_private=*/true, &vcpu,
+			&guest_memfd, &mem);
+
+	ret = fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE, 0, test_page_size);
+	TEST_ASSERT(!ret, "fallocate should have succeeded");
+
+	ret = fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+			0, test_page_size);
+	TEST_ASSERT(!ret, "truncate should have succeeded since there is no need to merge");
+
+	cleanup_test(test_page_size, vm, guest_memfd, mem);
+}
+
+static void __test_close_with_pinning(size_t test_page_size, bool init_private)
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+	int guest_memfd;
+	char *mem;
+	int ret;
+
+	vm = setup_test(test_page_size, init_private, &vcpu, &guest_memfd, &mem);
+
+	ret = fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE, 0, test_page_size);
+	TEST_ASSERT(!ret, "fallocate should have succeeded");
+
+	if (!init_private)
+		pin_pages(mem, test_page_size);
+
+	cleanup_test(test_page_size, vm, guest_memfd, mem);
+
+	if (!init_private)
+		unpin_pages();
+
+	/*
+	 * Test this with ./guest_memfd_wrap_test_check_hugetlb_reporting.sh to
+	 * check that the HugeTLB page got merged and returned to HugeTLB.
+	 *
+	 * Sleep here to give kernel worker time to do the merge and return.
+	 */
+	sleep(1);
+}
+
+static void test_close_with_pinning(size_t test_page_size)
+{
+	__test_close_with_pinning(test_page_size, true);
+	__test_close_with_pinning(test_page_size, false);
+}
+
 static void test_with_size(size_t test_page_size)
 {
 	test_sharing(test_page_size);
@@ -590,6 +682,9 @@ static void test_with_size(size_t test_page_size)
 	test_truncate_should_not_change_mappability(test_page_size);
 	test_conversions_should_fail_if_memory_has_elevated_refcount(test_page_size);
 	test_fault_type_independent_of_mem_attributes(test_page_size);
+	test_truncate_shared_while_pinned(test_page_size);
+	test_truncate_private(test_page_size);
+	test_close_with_pinning(test_page_size);
 }
 
 int main(int argc, char *argv[])
-- 
2.49.0.1045.g170613ef41-goog
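The userspace-visible contract exercised by test_truncate_shared_while_pinned()
boils down to a single fallocate() call. A hedged sketch: the
EAGAIN-on-unexpected-refcount behavior is specific to this series, while
fallocate() and the FALLOC_FL_* flags are standard:

	/*
	 * Sketch: punch a hole in a guest_memfd. Per this series, this fails
	 * with EAGAIN while the backing folios still have unexpected refcounts
	 * (e.g. pinned pages) and cannot be merged.
	 */
	#define _GNU_SOURCE
	#include <errno.h>
	#include <fcntl.h>
	#include <stddef.h>
	#include <stdio.h>

	static int punch_first_page(int gmem_fd, size_t page_size)
	{
		int ret = fallocate(gmem_fd,
				    FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
				    0, page_size);

		if (ret && errno == EAGAIN)
			fprintf(stderr, "folio busy: unpin pages, then retry truncation\n");
		return ret;
	}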
:date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=8H4kuVjKe4FyXjhH122Og8e4/l1o3DJQcDfSdK/hCao=; b=iY1fd5ervL94KPxm3N2dcWvEMqg1e4a+s2y513+FiL26ajsKd/h5SKu15jSJYUJGSw 7HGvYLHfqnO2rAjkaRBTKSSxaCR/Q8woqgR66AnVjgC6FfJfHKJNlAqV+hylk9Qy4i+X 0LvklJAihJiNyo9+XfobfSNvM7w1ZeGkWdsXuT4J59Z3mlUkBaKAloqcjBjotQb8Ro0f rwIIvDTJI4GW9eJxa27mn/bTWYKkTdByte4+KgB7YKptL1/8jJ6yGZbd4UjIBAshitXP GvH72L7JWTJZIrUaFwKTBxfK5OxGxSKi7S1cOKwk6ginnngqZkqNw1joY5ggfMzcAv5T WIhg== X-Forwarded-Encrypted: i=1; AJvYcCUphn+Y3d2H7EgyEXYeLy5IFrH5xnZO5ijwcJi6N03iUJeCZFIqZC8uXw6beSdplMlGlukKRoOcePfg9Gw=@vger.kernel.org X-Gm-Message-State: AOJu0Yzrhr3x9Bb7VMxSzfLNIXKcqoUjH4CAl40nmf9lD8Jv9yRmeWCm Mn5DgFZvQ5SzkRAKHZcYgxXQ9HIhT1ip6FRIt58YNyLRQob4++qXbU/ZEaldso0Lv5VSrc2bNsh aTkMUa4hvBexRraxYUsxcYg== X-Google-Smtp-Source: AGHT+IGiAe2NB0jbTkTUU7WLr9Mqfo/T69az13j5YN1qfxr++PxmdC5TExfGFvvHcO1pRG98WRl0DEE4eI7872AoZw== X-Received: from pfbbj12.prod.google.com ([2002:a05:6a00:318c:b0:730:76c4:7144]) (user=ackerleytng job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a20:4327:b0:203:c29b:eb6c with SMTP id adf61e73a8af0-215ff0aaf6bmr8757671637.4.1747266247493; Wed, 14 May 2025 16:44:07 -0700 (PDT) Date: Wed, 14 May 2025 16:42:24 -0700 In-Reply-To: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: X-Mailer: git-send-email 2.49.0.1045.g170613ef41-goog Message-ID: Subject: [RFC PATCH v2 45/51] KVM: selftests: Test allocation and conversion of subfolios From: Ackerley Tng To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org Cc: ackerleytng@google.com, aik@amd.com, ajones@ventanamicro.com, akpm@linux-foundation.org, amoorthy@google.com, anthony.yznaga@oracle.com, anup@brainfault.org, aou@eecs.berkeley.edu, bfoster@redhat.com, binbin.wu@linux.intel.com, brauner@kernel.org, catalin.marinas@arm.com, chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com, david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk, erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com, haibo1.xu@intel.com, hch@infradead.org, hughd@google.com, ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz, james.morse@arm.com, jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com, jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com, jun.miao@intel.com, kai.huang@intel.com, keirf@google.com, kent.overstreet@linux.dev, kirill.shutemov@intel.com, liam.merwick@oracle.com, maciej.wieczor-retman@intel.com, mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net, michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev, nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev, palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com, pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com, pgonda@google.com, pvorel@suse.cz, qperret@google.com, quic_cvanscha@quicinc.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com, richard.weiyang@gmail.com, rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk, rppt@kernel.org, seanjc@google.com, shuah@kernel.org, steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com, tabba@google.com, thomas.lendacky@amd.com, usama.arif@bytedance.com, vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk, vkuznets@redhat.com, wei.w.wang@intel.com, will@kernel.org, 
willy@infradead.org, xiaoyao.li@intel.com, yan.y.zhao@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com, zhiquan1.li@intel.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" This patch adds tests for allocation and conversion of subfolios in a large folio. Change-Id: I37035b2c24398e2c83a2ac5a46b4e6ceed2a8b53 Signed-off-by: Ackerley Tng --- .../kvm/guest_memfd_conversions_test.c | 88 +++++++++++++++++++ 1 file changed, 88 insertions(+) diff --git a/tools/testing/selftests/kvm/guest_memfd_conversions_test.c b/t= ools/testing/selftests/kvm/guest_memfd_conversions_test.c index 435f91424d5f..c31d1abd1b93 100644 --- a/tools/testing/selftests/kvm/guest_memfd_conversions_test.c +++ b/tools/testing/selftests/kvm/guest_memfd_conversions_test.c @@ -672,6 +672,92 @@ static void test_close_with_pinning(size_t test_page_s= ize) __test_close_with_pinning(test_page_size, false); } =20 +static void test_allocate_subfolios(size_t test_page_size) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + size_t increment; + int guest_memfd; + size_t nr_pages; + char *mem; + int i; + + if (test_page_size =3D=3D PAGE_SIZE) + return; + + vm =3D setup_test(test_page_size, /*init_private=3D*/false, &vcpu, + &guest_memfd, &mem); + + nr_pages =3D test_page_size / PAGE_SIZE; + + /* + * Loop backwards to check allocation of the correct subfolio within the + * huge folio. If it were allocated wrongly, the second loop would error + * out because one or more of the checks would be wrong. + */ + increment =3D nr_pages >> 1; + for (i =3D nr_pages - 1; i >=3D 0; i -=3D increment) + host_use_memory(mem + i * PAGE_SIZE, 'X', 'A' + i); + for (i =3D nr_pages - 1; i >=3D 0; i -=3D increment) + host_use_memory(mem + i * PAGE_SIZE, 'A' + i, 'A' + i); + + cleanup_test(test_page_size, vm, guest_memfd, mem); +} + +static void test_convert_subfolios(size_t test_page_size) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + size_t increment; + int guest_memfd; + size_t nr_pages; + int to_convert; + char *mem; + int i; + + if (test_page_size =3D=3D PAGE_SIZE) + return; + + vm =3D setup_test(test_page_size, /*init_private=3D*/true, &vcpu, + &guest_memfd, &mem); + + nr_pages =3D test_page_size / PAGE_SIZE; + + increment =3D nr_pages >> 1; + for (i =3D 0; i < nr_pages; i +=3D increment) { + guest_use_memory(vcpu, + GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + 'X', 'A', 0); + assert_host_cannot_fault(mem + i * PAGE_SIZE); + } + + to_convert =3D round_up(nr_pages / 2, increment); + guest_memfd_convert_shared(guest_memfd, to_convert * PAGE_SIZE, PAGE_SIZE= ); + + + for (i =3D 0; i < nr_pages; i +=3D increment) { + if (i =3D=3D to_convert) + host_use_memory(mem + i * PAGE_SIZE, 'A', 'B'); + else + assert_host_cannot_fault(mem + i * PAGE_SIZE); + + guest_use_memory(vcpu, + GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + 'X', 'B', 0); + } + + guest_memfd_convert_private(guest_memfd, to_convert * PAGE_SIZE, PAGE_SIZ= E); + + for (i =3D 0; i < nr_pages; i +=3D increment) { + guest_use_memory(vcpu, + GUEST_MEMFD_SHARING_TEST_GVA + i * PAGE_SIZE, + 'B', 'C', 0); + assert_host_cannot_fault(mem + i * PAGE_SIZE); + } + + cleanup_test(test_page_size, vm, guest_memfd, mem); +} + static void test_with_size(size_t test_page_size) { test_sharing(test_page_size); @@ -685,6 +771,8 @@ static void test_with_size(size_t test_page_size) test_truncate_shared_while_pinned(test_page_size); test_truncate_private(test_page_size); test_close_with_pinning(test_page_size); + test_allocate_subfolios(test_page_size); + 
+	test_convert_subfolios(test_page_size);
 }
 
 int main(int argc, char *argv[])
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:20 2025
From: Ackerley Tng
Date: Wed, 14 May 2025 16:42:25 -0700
Message-ID: <56149cfab1ab08d73618fd3914addd51dd42193a.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 46/51] KVM: selftests: Test that guest_memfd usage is reported via hugetlb
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Using HugeTLB as the huge page allocator for guest_memfd allows
guest_memfd to reuse HugeTLB's reporting mechanism, so HugeTLB
statistics must be kept up to date. Test that they are, across the
lifecycle of a HugeTLB-backed guest_memfd.
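For reference, the statistics the test reads live under sysfs and the
hugetlb cgroup controller. A minimal way to inspect them by hand, using
the same paths the test constructs below (illustrative only):

  # Global HugeTLB pool counters for 2M pages:
  cat /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages
  cat /sys/kernel/mm/hugepages/hugepages-2048kB/resv_hugepages

  # Per-cgroup usage and reservation counters (cgroup v1 hugetlb
  # controller, as mounted by the wrapper script added below):
  cat "$HUGETLB_CGROUP_PATH/hugetlb.2MB.usage_in_bytes"
  cat "$HUGETLB_CGROUP_PATH/hugetlb.2MB.rsvd.usage_in_bytes"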
Signed-off-by: Ackerley Tng
Change-Id: Ida3319b1d40c593d8167a03506c7030e67fc746b
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../kvm/guest_memfd_hugetlb_reporting_test.c  | 384 ++++++++++++++++++
 ...uest_memfd_provide_hugetlb_cgroup_mount.sh |  36 ++
 3 files changed, 421 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/guest_memfd_hugetlb_reporting_test.c
 create mode 100755 tools/testing/selftests/kvm/guest_memfd_provide_hugetlb_cgroup_mount.sh

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index bc22a5a23c4c..2ffe6bc95a68 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -132,6 +132,7 @@ TEST_GEN_PROGS_x86 += coalesced_io_test
 TEST_GEN_PROGS_x86 += dirty_log_perf_test
 TEST_GEN_PROGS_x86 += guest_memfd_test
 TEST_GEN_PROGS_x86 += guest_memfd_conversions_test
+TEST_GEN_PROGS_x86 += guest_memfd_hugetlb_reporting_test
 TEST_GEN_PROGS_x86 += hardware_disable_test
 TEST_GEN_PROGS_x86 += memslot_modification_stress_test
 TEST_GEN_PROGS_x86 += memslot_perf_test
diff --git a/tools/testing/selftests/kvm/guest_memfd_hugetlb_reporting_test.c b/tools/testing/selftests/kvm/guest_memfd_hugetlb_reporting_test.c
new file mode 100644
index 000000000000..8ff1dda3e02f
--- /dev/null
+++ b/tools/testing/selftests/kvm/guest_memfd_hugetlb_reporting_test.c
@@ -0,0 +1,384 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Tests that HugeTLB statistics are correct at various points of the lifecycle
+ * of guest_memfd with 1G page support.
+ *
+ * Providing a HUGETLB_CGROUP_PATH will allow cgroup reservations to be
+ * tested.
+ *
+ * Either use
+ *
+ *   ./guest_memfd_provide_hugetlb_cgroup_mount.sh ./guest_memfd_hugetlb_reporting_test
+ *
+ * or provide the mount with
+ *
+ *   export HUGETLB_CGROUP_PATH=/tmp/hugetlb-cgroup
+ *   mount -t cgroup -o hugetlb none $HUGETLB_CGROUP_PATH
+ *   ./guest_memfd_hugetlb_reporting_test
+ *
+ * Copyright (C) 2025 Google LLC
+ *
+ * Authors:
+ *   Ackerley Tng
+ */
+
+#include <fcntl.h>
+#include <limits.h>
+#include <linux/falloc.h>
+#include <linux/memfd.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include "kvm_util.h"
+#include "test_util.h"
+#include "processor.h"
+
+static unsigned long read_value(const char *file_name)
+{
+	FILE *fp;
+	unsigned long num;
+
+	fp = fopen(file_name, "r");
+	TEST_ASSERT(fp != NULL, "Error opening file %s!\n", file_name);
+
+	TEST_ASSERT_EQ(fscanf(fp, "%lu", &num), 1);
+
+	fclose(fp);
+
+	return num;
+}
+
+enum hugetlb_statistic {
+	FREE_HUGEPAGES,
+	NR_HUGEPAGES,
+	NR_OVERCOMMIT_HUGEPAGES,
+	RESV_HUGEPAGES,
+	SURPLUS_HUGEPAGES,
+	NR_TESTED_HUGETLB_STATISTICS,
+};
+
+enum hugetlb_cgroup_statistic {
+	LIMIT_IN_BYTES,
+	MAX_USAGE_IN_BYTES,
+	USAGE_IN_BYTES,
+	NR_TESTED_HUGETLB_CGROUP_STATISTICS,
+};
+
+enum hugetlb_cgroup_statistic_category {
+	USAGE = 0,
+	RESERVATION,
+	NR_HUGETLB_CGROUP_STATISTIC_CATEGORIES,
+};
+
+static const char *hugetlb_statistics[NR_TESTED_HUGETLB_STATISTICS] = {
+	[FREE_HUGEPAGES] = "free_hugepages",
+	[NR_HUGEPAGES] = "nr_hugepages",
+	[NR_OVERCOMMIT_HUGEPAGES] = "nr_overcommit_hugepages",
+	[RESV_HUGEPAGES] = "resv_hugepages",
+	[SURPLUS_HUGEPAGES] = "surplus_hugepages",
+};
+
+static const char *hugetlb_cgroup_statistics[NR_TESTED_HUGETLB_CGROUP_STATISTICS] = {
+	[LIMIT_IN_BYTES] = "limit_in_bytes",
+	[MAX_USAGE_IN_BYTES] = "max_usage_in_bytes",
+	[USAGE_IN_BYTES] = "usage_in_bytes",
+};
+
+enum test_page_size {
+	TEST_SZ_2M,
+	TEST_SZ_1G,
+	NR_TEST_SIZES,
+};
+
+struct test_param {
+	size_t page_size;
+	int memfd_create_flags;
+	uint64_t guest_memfd_flags;
+	char *hugetlb_size_string;
+	char *hugetlb_cgroup_size_string;
+};
+
+const struct test_param *test_params(enum test_page_size size)
+{
+	static const struct test_param params[] = {
+		[TEST_SZ_2M] = {
+			.page_size = PG_SIZE_2M,
+			.memfd_create_flags = MFD_HUGETLB | MFD_HUGE_2MB,
+			.guest_memfd_flags = GUEST_MEMFD_FLAG_HUGETLB | GUESTMEM_HUGETLB_FLAG_2MB,
+			.hugetlb_size_string = "2048kB",
+			.hugetlb_cgroup_size_string = "2MB",
+		},
+		[TEST_SZ_1G] = {
+			.page_size = PG_SIZE_1G,
+			.memfd_create_flags = MFD_HUGETLB | MFD_HUGE_1GB,
+			.guest_memfd_flags = GUEST_MEMFD_FLAG_HUGETLB | GUESTMEM_HUGETLB_FLAG_1GB,
+			.hugetlb_size_string = "1048576kB",
+			.hugetlb_cgroup_size_string = "1GB",
+		},
+	};
+
+	return &params[size];
+}
+
+static unsigned long read_hugetlb_statistic(enum test_page_size size,
+					    enum hugetlb_statistic statistic)
+{
+	char path[PATH_MAX] = "/sys/kernel/mm/hugepages/hugepages-";
+
+	strcat(path, test_params(size)->hugetlb_size_string);
+	strcat(path, "/");
+	strcat(path, hugetlb_statistics[statistic]);
+
+	return read_value(path);
+}
+
+static unsigned long read_hugetlb_cgroup_statistic(const char *hugetlb_cgroup_path,
+						   enum test_page_size size,
+						   enum hugetlb_cgroup_statistic statistic,
+						   bool reservations)
+{
+	char path[PATH_MAX] = "";
+
+	strcat(path, hugetlb_cgroup_path);
+
+	if (hugetlb_cgroup_path[strlen(hugetlb_cgroup_path) - 1] != '/')
+		strcat(path, "/");
+
+	strcat(path, "hugetlb.");
+	strcat(path, test_params(size)->hugetlb_cgroup_size_string);
+	if (reservations)
+		strcat(path, ".rsvd");
+	strcat(path, ".");
+	strcat(path, hugetlb_cgroup_statistics[statistic]);
+
+	return read_value(path);
+}
+
+static unsigned long hugetlb_baseline[NR_TEST_SIZES]
+				     [NR_TESTED_HUGETLB_STATISTICS];
+
+static unsigned long
+	hugetlb_cgroup_baseline[NR_TEST_SIZES]
+			       [NR_TESTED_HUGETLB_CGROUP_STATISTICS]
+			       [NR_HUGETLB_CGROUP_STATISTIC_CATEGORIES];
+
+static void establish_baseline(const char *hugetlb_cgroup_path)
+{
+	const char *p = hugetlb_cgroup_path;
+	int i, j;
+
+	for (i = 0; i < NR_TEST_SIZES; ++i) {
+		for (j = 0; j < NR_TESTED_HUGETLB_STATISTICS; ++j)
+			hugetlb_baseline[i][j] = read_hugetlb_statistic(i, j);
+
+		if (!hugetlb_cgroup_path)
+			continue;
+
+		for (j = 0; j < NR_TESTED_HUGETLB_CGROUP_STATISTICS; ++j) {
+			hugetlb_cgroup_baseline[i][j][USAGE] =
+				read_hugetlb_cgroup_statistic(p, i, j, USAGE);
+			hugetlb_cgroup_baseline[i][j][RESERVATION] =
+				read_hugetlb_cgroup_statistic(p, i, j, RESERVATION);
+		}
+	}
+}
+
+static void assert_stats_at_baseline(const char *hugetlb_cgroup_path)
+{
+	const char *p = hugetlb_cgroup_path;
+
+	/* Enumerate these for easy assertion reading. */
+	TEST_ASSERT_EQ(read_hugetlb_statistic(TEST_SZ_2M, FREE_HUGEPAGES),
+		       hugetlb_baseline[TEST_SZ_2M][FREE_HUGEPAGES]);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(TEST_SZ_2M, NR_HUGEPAGES),
+		       hugetlb_baseline[TEST_SZ_2M][NR_HUGEPAGES]);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(TEST_SZ_2M, NR_OVERCOMMIT_HUGEPAGES),
+		       hugetlb_baseline[TEST_SZ_2M][NR_OVERCOMMIT_HUGEPAGES]);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(TEST_SZ_2M, RESV_HUGEPAGES),
+		       hugetlb_baseline[TEST_SZ_2M][RESV_HUGEPAGES]);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(TEST_SZ_2M, SURPLUS_HUGEPAGES),
+		       hugetlb_baseline[TEST_SZ_2M][SURPLUS_HUGEPAGES]);
+
+	TEST_ASSERT_EQ(read_hugetlb_statistic(TEST_SZ_1G, FREE_HUGEPAGES),
+		       hugetlb_baseline[TEST_SZ_1G][FREE_HUGEPAGES]);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(TEST_SZ_1G, NR_HUGEPAGES),
+		       hugetlb_baseline[TEST_SZ_1G][NR_HUGEPAGES]);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(TEST_SZ_1G, NR_OVERCOMMIT_HUGEPAGES),
+		       hugetlb_baseline[TEST_SZ_1G][NR_OVERCOMMIT_HUGEPAGES]);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(TEST_SZ_1G, RESV_HUGEPAGES),
+		       hugetlb_baseline[TEST_SZ_1G][RESV_HUGEPAGES]);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(TEST_SZ_1G, SURPLUS_HUGEPAGES),
+		       hugetlb_baseline[TEST_SZ_1G][SURPLUS_HUGEPAGES]);
+
+	if (!hugetlb_cgroup_path)
+		return;
+
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, TEST_SZ_2M, LIMIT_IN_BYTES, USAGE),
+		hugetlb_cgroup_baseline[TEST_SZ_2M][LIMIT_IN_BYTES][USAGE]);
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, TEST_SZ_2M, MAX_USAGE_IN_BYTES, USAGE),
+		hugetlb_cgroup_baseline[TEST_SZ_2M][MAX_USAGE_IN_BYTES][USAGE]);
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, TEST_SZ_2M, USAGE_IN_BYTES, USAGE),
+		hugetlb_cgroup_baseline[TEST_SZ_2M][USAGE_IN_BYTES][USAGE]);
+
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, TEST_SZ_1G, LIMIT_IN_BYTES, RESERVATION),
+		hugetlb_cgroup_baseline[TEST_SZ_1G][LIMIT_IN_BYTES][RESERVATION]);
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, TEST_SZ_1G, MAX_USAGE_IN_BYTES, RESERVATION),
+		hugetlb_cgroup_baseline[TEST_SZ_1G][MAX_USAGE_IN_BYTES][RESERVATION]);
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, TEST_SZ_1G, USAGE_IN_BYTES, RESERVATION),
+		hugetlb_cgroup_baseline[TEST_SZ_1G][USAGE_IN_BYTES][RESERVATION]);
+}
+
+static void assert_stats(const char *hugetlb_cgroup_path,
+			 enum test_page_size size, unsigned long num_reserved,
+			 unsigned long num_faulted)
+{
+	size_t pgsz = test_params(size)->page_size;
+	const char *p = hugetlb_cgroup_path;
+
+	TEST_ASSERT_EQ(read_hugetlb_statistic(size, FREE_HUGEPAGES),
+		       hugetlb_baseline[size][FREE_HUGEPAGES] - num_faulted);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(size, NR_HUGEPAGES),
+		       hugetlb_baseline[size][NR_HUGEPAGES]);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(size, NR_OVERCOMMIT_HUGEPAGES),
+		       hugetlb_baseline[size][NR_OVERCOMMIT_HUGEPAGES]);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(size, RESV_HUGEPAGES),
+		       hugetlb_baseline[size][RESV_HUGEPAGES] + num_reserved - num_faulted);
+	TEST_ASSERT_EQ(read_hugetlb_statistic(size, SURPLUS_HUGEPAGES),
+		       hugetlb_baseline[size][SURPLUS_HUGEPAGES]);
+
+	if (!hugetlb_cgroup_path)
+		return;
+
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, size, LIMIT_IN_BYTES, USAGE),
+		hugetlb_cgroup_baseline[size][LIMIT_IN_BYTES][USAGE]);
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, size, MAX_USAGE_IN_BYTES, USAGE),
+		hugetlb_cgroup_baseline[size][MAX_USAGE_IN_BYTES][USAGE]);
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, size, USAGE_IN_BYTES, USAGE),
+		hugetlb_cgroup_baseline[size][USAGE_IN_BYTES][USAGE] + num_faulted * pgsz);
+
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, size, LIMIT_IN_BYTES, RESERVATION),
+		hugetlb_cgroup_baseline[size][LIMIT_IN_BYTES][RESERVATION]);
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, size, MAX_USAGE_IN_BYTES, RESERVATION),
+		hugetlb_cgroup_baseline[size][MAX_USAGE_IN_BYTES][RESERVATION]);
+	TEST_ASSERT_EQ(
+		read_hugetlb_cgroup_statistic(p, size, USAGE_IN_BYTES, RESERVATION),
+		hugetlb_cgroup_baseline[size][USAGE_IN_BYTES][RESERVATION] + num_reserved * pgsz);
+}
+
+/* Use hugetlb behavior as a baseline. guest_memfd should have comparable behavior. */
+static void test_hugetlb_behavior(const char *hugetlb_cgroup_path, enum test_page_size test_size)
+{
+	const struct test_param *param;
+	char *mem;
+	int memfd;
+
+	param = test_params(test_size);
+
+	assert_stats_at_baseline(hugetlb_cgroup_path);
+
+	memfd = memfd_create("guest_memfd_hugetlb_reporting_test",
+			     param->memfd_create_flags);
+
+	assert_stats(hugetlb_cgroup_path, test_size, 0, 0);
+
+	mem = mmap(NULL, param->page_size, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_HUGETLB, memfd, 0);
+	TEST_ASSERT(mem != MAP_FAILED, "Couldn't mmap()");
+
+	assert_stats(hugetlb_cgroup_path, test_size, 1, 0);
+
+	*mem = 'A';
+
+	assert_stats(hugetlb_cgroup_path, test_size, 1, 1);
+
+	munmap(mem, param->page_size);
+
+	assert_stats(hugetlb_cgroup_path, test_size, 1, 1);
+
+	madvise(mem, param->page_size, MADV_DONTNEED);
+
+	assert_stats(hugetlb_cgroup_path, test_size, 1, 1);
+
+	madvise(mem, param->page_size, MADV_REMOVE);
+
+	assert_stats(hugetlb_cgroup_path, test_size, 1, 1);
+
+	close(memfd);
+
+	assert_stats_at_baseline(hugetlb_cgroup_path);
+}
+
+static void test_guest_memfd_behavior(const char *hugetlb_cgroup_path,
+				      enum test_page_size test_size)
+{
+	const struct test_param *param;
+	struct kvm_vm *vm;
+	int guest_memfd;
+
+	param = test_params(test_size);
+
+	assert_stats_at_baseline(hugetlb_cgroup_path);
+
+	vm = vm_create_barebones_type(KVM_X86_SW_PROTECTED_VM);
+
+	assert_stats(hugetlb_cgroup_path, test_size, 0, 0);
+
+	guest_memfd = vm_create_guest_memfd(vm, param->page_size,
+					    param->guest_memfd_flags);
+
+	/* fd creation reserves pages. */
+	assert_stats(hugetlb_cgroup_path, test_size, 1, 0);
+
+	fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE, 0, param->page_size);
+
+	assert_stats(hugetlb_cgroup_path, test_size, 1, 1);
+
+	fallocate(guest_memfd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0,
+		  param->page_size);
+
+	assert_stats(hugetlb_cgroup_path, test_size, 1, 0);
+
+	close(guest_memfd);
+
+	/*
+	 * Wait a little for stats to be updated in rcu callback. resv_hugepages
+	 * is updated on truncation in ->free_inode(), and ->free_inode() happens
+	 * in an rcu callback.
+	 */
+	usleep(300 * 1000);
+
+	assert_stats_at_baseline(hugetlb_cgroup_path);
+
+	kvm_vm_free(vm);
+}
+
+int main(int argc, char *argv[])
+{
+	char *hugetlb_cgroup_path;
+
+	hugetlb_cgroup_path = getenv("HUGETLB_CGROUP_PATH");
+
+	establish_baseline(hugetlb_cgroup_path);
+
+	test_hugetlb_behavior(hugetlb_cgroup_path, TEST_SZ_2M);
+	test_hugetlb_behavior(hugetlb_cgroup_path, TEST_SZ_1G);
+
+	test_guest_memfd_behavior(hugetlb_cgroup_path, TEST_SZ_2M);
+	test_guest_memfd_behavior(hugetlb_cgroup_path, TEST_SZ_1G);
+}
diff --git a/tools/testing/selftests/kvm/guest_memfd_provide_hugetlb_cgroup_mount.sh b/tools/testing/selftests/kvm/guest_memfd_provide_hugetlb_cgroup_mount.sh
new file mode 100755
index 000000000000..4180d49771c8
--- /dev/null
+++ b/tools/testing/selftests/kvm/guest_memfd_provide_hugetlb_cgroup_mount.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Wrapper that runs a test, providing a hugetlb cgroup mount in environment
+# variable HUGETLB_CGROUP_PATH
+#
+# Example:
+#   ./guest_memfd_provide_hugetlb_cgroup_mount.sh ./guest_memfd_hugetlb_reporting_test
+#
+# Copyright (C) 2025, Google LLC.
+
+script_dir=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+
+temp_dir=$(mktemp -d /tmp/guest_memfd_hugetlb_reporting_test_XXXXXX)
+if [[ -z "$temp_dir" ]]; then
+  echo "Error: Failed to create temporary directory for hugetlb cgroup mount." >&2
+  exit 1
+fi
+
+delete_temp_dir() {
+  rm -rf $temp_dir
+}
+trap delete_temp_dir EXIT
+
+mount -t cgroup -o hugetlb none $temp_dir
+
+# Once the mount exists, this trap replaces the one above so that the
+# mount is also unmounted on exit.
+cleanup() {
+  umount $temp_dir
+  rm -rf $temp_dir
+}
+trap cleanup EXIT
+
+HUGETLB_CGROUP_PATH=$temp_dir $@
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:20 2025
From: Ackerley Tng
Date: Wed, 14 May 2025 16:42:26 -0700
Subject: [RFC PATCH v2 47/51] KVM: selftests: Support various types of backing sources for private memory
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Add support for various types of backing sources for private memory
(private in the confidential-computing sense), mirroring the backing
source types already available for shared memory.
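To illustrate how the new helpers are meant to compose, a hypothetical
usage sketch (not part of the patch; `vm` and `size` are assumed to
exist in the caller):

	/* Map a command-line name to a backing source type... */
	enum vm_private_mem_backing_src_type t =
		parse_private_mem_backing_src_type("private_mem_hugetlb_1gb");
	/* ...derive its page size and guest_memfd flags... */
	size_t pagesz = get_private_mem_backing_src_pagesz(t);
	uint64_t flags = vm_private_mem_backing_src_alias(t)->flag;
	/* ...and create a guest_memfd sized for that backing source. */
	int gmem_fd = vm_create_guest_memfd(vm, align_up(size, pagesz), flags);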
Change-Id: I683b48c90d74f8cb99e416d26c8fb98331df0bab
Signed-off-by: Ackerley Tng
---
 .../testing/selftests/kvm/include/test_util.h | 18 ++++-
 tools/testing/selftests/kvm/lib/test_util.c   | 77 +++++++++++++++++++
 2 files changed, 94 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h
index b4a03784ac4f..bfd9d9a897e3 100644
--- a/tools/testing/selftests/kvm/include/test_util.h
+++ b/tools/testing/selftests/kvm/include/test_util.h
@@ -139,9 +139,19 @@ enum vm_mem_backing_src_type {
 
 struct vm_mem_backing_src_alias {
 	const char *name;
-	uint32_t flag;
+	uint64_t flag;
 };
 
+enum vm_private_mem_backing_src_type {
+	VM_PRIVATE_MEM_SRC_GUEST_MEM,	/* Use default page size */
+	VM_PRIVATE_MEM_SRC_HUGETLB,	/* Use kernel default page size for hugetlb pages */
+	VM_PRIVATE_MEM_SRC_HUGETLB_2MB,
+	VM_PRIVATE_MEM_SRC_HUGETLB_1GB,
+	NUM_PRIVATE_MEM_SRC_TYPES,
+};
+
+#define DEFAULT_VM_PRIVATE_MEM_SRC VM_PRIVATE_MEM_SRC_GUEST_MEM
+
 #define MIN_RUN_DELAY_NS 200000UL
 
 bool thp_configured(void);
@@ -154,6 +164,12 @@ int get_backing_src_madvise_advice(uint32_t i);
 bool is_backing_src_hugetlb(uint32_t i);
 void backing_src_help(const char *flag);
 enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name);
+
+void private_mem_backing_src_help(const char *flag);
+enum vm_private_mem_backing_src_type parse_private_mem_backing_src_type(const char *type_name);
+const struct vm_mem_backing_src_alias *vm_private_mem_backing_src_alias(uint32_t i);
+size_t get_private_mem_backing_src_pagesz(uint32_t i);
+
 long get_run_delay(void);
 
 /*
diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c
index 24dc90693afd..8c4d6ec44c41 100644
--- a/tools/testing/selftests/kvm/lib/test_util.c
+++ b/tools/testing/selftests/kvm/lib/test_util.c
@@ -15,6 +15,8 @@
 #include <sys/syscall.h>
 #include <linux/mman.h>
 #include "linux/kernel.h"
+#include <linux/guestmem.h>
+#include <linux/kvm.h>
 
 #include "test_util.h"
 
@@ -288,6 +290,34 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i)
 	return &aliases[i];
 }
 
+const struct vm_mem_backing_src_alias *vm_private_mem_backing_src_alias(uint32_t i)
+{
+	static const struct vm_mem_backing_src_alias aliases[] = {
+		[VM_PRIVATE_MEM_SRC_GUEST_MEM] = {
+			.name = "private_mem_guest_mem",
+			.flag = 0,
+		},
+		[VM_PRIVATE_MEM_SRC_HUGETLB] = {
+			.name = "private_mem_hugetlb",
+			.flag = GUEST_MEMFD_FLAG_HUGETLB,
+		},
+		[VM_PRIVATE_MEM_SRC_HUGETLB_2MB] = {
+			.name = "private_mem_hugetlb_2mb",
+			.flag = GUEST_MEMFD_FLAG_HUGETLB | GUESTMEM_HUGETLB_FLAG_2MB,
+		},
+		[VM_PRIVATE_MEM_SRC_HUGETLB_1GB] = {
+			.name = "private_mem_hugetlb_1gb",
+			.flag = GUEST_MEMFD_FLAG_HUGETLB | GUESTMEM_HUGETLB_FLAG_1GB,
+		},
+	};
+	_Static_assert(ARRAY_SIZE(aliases) == NUM_PRIVATE_MEM_SRC_TYPES,
+		       "Missing new backing private mem src types?");
+
+	TEST_ASSERT(i < NUM_PRIVATE_MEM_SRC_TYPES, "Private mem backing src type ID %d too big", i);
+
+	return &aliases[i];
+}
+
 #define MAP_HUGE_PAGE_SIZE(x) (1ULL << ((x >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK))
 
 size_t get_backing_src_pagesz(uint32_t i)
@@ -333,6 +363,22 @@ int get_backing_src_madvise_advice(uint32_t i)
 	}
 }
 
+size_t get_private_mem_backing_src_pagesz(uint32_t i)
+{
+	switch (i) {
+	case VM_PRIVATE_MEM_SRC_GUEST_MEM:
+		return getpagesize();
+	case VM_PRIVATE_MEM_SRC_HUGETLB:
+		return get_def_hugetlb_pagesz();
+	default: {
+		uint64_t flag = vm_private_mem_backing_src_alias(i)->flag;
+
+		return 1UL << ((flag >> GUESTMEM_HUGETLB_FLAG_SHIFT) &
+			       GUESTMEM_HUGETLB_FLAG_MASK);
+	}
+	}
+}
+
 bool is_backing_src_hugetlb(uint32_t i)
 {
 	return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB);
@@ -369,6 +415,37 @@ enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name)
 	return -1;
 }
 
+static void print_available_private_mem_backing_src_types(const char *prefix)
+{
+	int i;
+
+	printf("%sAvailable private mem backing src types:\n", prefix);
+
+	for (i = 0; i < NUM_PRIVATE_MEM_SRC_TYPES; i++)
+		printf("%s    %s\n", prefix, vm_private_mem_backing_src_alias(i)->name);
+}
+
+void private_mem_backing_src_help(const char *flag)
+{
+	printf(" %s: specify the type of memory that should be used to\n"
+	       "     back guest private memory. (default: %s)\n",
+	       flag, vm_private_mem_backing_src_alias(DEFAULT_VM_PRIVATE_MEM_SRC)->name);
+	print_available_private_mem_backing_src_types("     ");
+}
+
+enum vm_private_mem_backing_src_type parse_private_mem_backing_src_type(const char *type_name)
+{
+	int i;
+
+	for (i = 0; i < NUM_PRIVATE_MEM_SRC_TYPES; i++)
+		if (!strcmp(type_name, vm_private_mem_backing_src_alias(i)->name))
+			return i;
+
+	print_available_private_mem_backing_src_types("");
+	TEST_FAIL("Unknown private mem backing src type: %s", type_name);
+	return -1;
+}
+
 long get_run_delay(void)
 {
 	char path[64];
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:20 2025
From: Ackerley Tng
Date: Wed, 14 May 2025 16:42:27 -0700
Subject: [RFC PATCH v2 48/51] KVM: selftests: Update test for various private memory backing source types
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org
Update private_mem_conversions_test to take a private memory backing
source type, exercising guest_memfd's HugeTLB support.
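A worked example of the new alignment computation (values chosen for
illustration): with src_type backed by 4K anonymous pages and
private_mem_src_type = private_mem_hugetlb_1gb,

	alignment    = max_t(size_t, SZ_2M, max_t(size_t, 4K, 1G)) = 1G
	per_cpu_size = align_up(PER_CPU_DATA_SIZE, 1G)

so each vCPU's chunk of the memfd starts on a 1G boundary of the
backing store, as the comment in the hunk below requires.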
Change-Id: I50facb166a282f97570591eb331c3f19676b01cc
Signed-off-by: Ackerley Tng
---
 .../kvm/x86/private_mem_conversions_test.c    | 42 +++++++++++++------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
index ec20bb7e95c8..5a0fd9155ce8 100644
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.c
@@ -450,21 +450,18 @@ static void *__test_mem_conversions(void *params)
 }
 
 static void test_mem_conversions(enum vm_mem_backing_src_type src_type,
+				 enum vm_private_mem_backing_src_type private_mem_src_type,
 				 uint32_t nr_vcpus, uint32_t nr_memslots,
 				 bool back_shared_memory_with_guest_memfd)
 {
-	/*
-	 * Allocate enough memory so that each vCPU's chunk of memory can be
-	 * naturally aligned with respect to the size of the backing store.
-	 */
-	const size_t alignment = max_t(size_t, SZ_2M, get_backing_src_pagesz(src_type));
 	struct test_thread_args *thread_args[KVM_MAX_VCPUS];
-	const size_t per_cpu_size = align_up(PER_CPU_DATA_SIZE, alignment);
-	const size_t memfd_size = per_cpu_size * nr_vcpus;
-	const size_t slot_size = memfd_size / nr_memslots;
 	struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
 	pthread_t threads[KVM_MAX_VCPUS];
+	size_t per_cpu_size;
+	size_t memfd_size;
 	struct kvm_vm *vm;
+	size_t alignment;
+	size_t slot_size;
 	int memfd, i, r;
 	uint64_t flags;
 
@@ -473,6 +470,18 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type,
 		.type = KVM_X86_SW_PROTECTED_VM,
 	};
 
+	/*
+	 * Allocate enough memory so that each vCPU's chunk of memory can be
+	 * naturally aligned with respect to the size of the backing store.
+	 */
+	alignment = max_t(size_t, SZ_2M,
+			  max_t(size_t, get_backing_src_pagesz(src_type),
+				get_private_mem_backing_src_pagesz(
+					private_mem_src_type)));
+	per_cpu_size = align_up(PER_CPU_DATA_SIZE, alignment);
+	memfd_size = per_cpu_size * nr_vcpus;
+	slot_size = memfd_size / nr_memslots;
+
 	TEST_ASSERT(slot_size * nr_memslots == memfd_size,
 		    "The memfd size (0x%lx) needs to be cleanly divisible by the number of memslots (%u)",
 		    memfd_size, nr_memslots);
@@ -483,6 +492,7 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type,
 	flags = back_shared_memory_with_guest_memfd ?
			GUEST_MEMFD_FLAG_SUPPORT_SHARED : 0;
+	flags |= vm_private_mem_backing_src_alias(private_mem_src_type)->flag;
 	memfd = vm_create_guest_memfd(vm, memfd_size, flags);
 
 	for (i = 0; i < nr_memslots; i++) {
@@ -547,10 +557,13 @@ static void test_mem_conversions(enum vm_mem_backing_src_type src_type,
 static void usage(const char *cmd)
 {
 	puts("");
-	printf("usage: %s [-h] [-g] [-m nr_memslots] [-s mem_type] [-n nr_vcpus]\n", cmd);
+	printf("usage: %s [-h] [-g] [-m nr_memslots] [-s mem_type] [-p private_mem_type] [-n nr_vcpus]\n",
+	       cmd);
 	puts("");
 	backing_src_help("-s");
 	puts("");
+	private_mem_backing_src_help("-p");
+	puts("");
 	puts(" -n: specify the number of vcpus (default: 1)");
 	puts("");
 	puts(" -m: specify the number of memslots (default: 1)");
@@ -561,6 +574,7 @@ static void usage(const char *cmd)
 
 int main(int argc, char *argv[])
 {
+	enum vm_private_mem_backing_src_type private_mem_src_type = DEFAULT_VM_PRIVATE_MEM_SRC;
 	enum vm_mem_backing_src_type src_type = DEFAULT_VM_MEM_SRC;
 	bool back_shared_memory_with_guest_memfd = false;
 	uint32_t nr_memslots = 1;
@@ -569,11 +583,14 @@ int main(int argc, char *argv[])
 
 	TEST_REQUIRE(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(KVM_X86_SW_PROTECTED_VM));
 
-	while ((opt = getopt(argc, argv, "hgm:s:n:")) != -1) {
+	while ((opt = getopt(argc, argv, "hgm:s:p:n:")) != -1) {
 		switch (opt) {
 		case 's':
 			src_type = parse_backing_src_type(optarg);
 			break;
+		case 'p':
+			private_mem_src_type = parse_private_mem_backing_src_type(optarg);
+			break;
 		case 'n':
 			nr_vcpus = atoi_positive("nr_vcpus", optarg);
 			break;
@@ -590,9 +607,8 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	test_mem_conversions(src_type, nr_vcpus, nr_memslots,
-			     back_shared_memory_with_guest_memfd);
-
+	test_mem_conversions(src_type, private_mem_src_type, nr_vcpus,
+			     nr_memslots, back_shared_memory_with_guest_memfd);
 
 	return 0;
 }
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:20 2025
From: Ackerley Tng
Date: Wed, 14 May 2025 16:42:28 -0700
Subject: [RFC PATCH v2 49/51] KVM: selftests: Update private_mem_conversions_test.sh to test with HugeTLB pages
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Update the test script to also exercise HugeTLB support in guest_memfd.
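With the new inner loop, each run of the script expands into
invocations like the following (a hypothetical pairing; the -s, -p, -n,
-m and -g flags are those defined by the test binary):

  ./private_mem_conversions_test -s shared_hugetlb \
      -p private_mem_hugetlb_1gb -n 4 -m 2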
Signed-off-by: Ackerley Tng
Change-Id: I7c6cc25d6b86e1e0dc74018f46c7e2796fab6357
---
 .../kvm/x86/private_mem_conversions_test.sh   | 29 ++++++++++++++-----
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
index 5dda6916e071..0d2c5fa729fd 100755
--- a/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
+++ b/tools/testing/selftests/kvm/x86/private_mem_conversions_test.sh
@@ -57,6 +57,17 @@ backing_src_types+=( shmem )
 	backing_src_types+=( shared_hugetlb ) || \
 	echo "skipping shared_hugetlb backing source type"
 
+private_mem_backing_src_types=( private_mem_guest_mem )
+[ -n "$hugepage_default_enabled" ] && \
+	private_mem_backing_src_types+=( private_mem_hugetlb ) || \
+	echo "skipping private_mem_hugetlb backing source type"
+[ -n "$hugepage_2mb_enabled" ] && \
+	private_mem_backing_src_types+=( private_mem_hugetlb_2mb ) || \
+	echo "skipping private_mem_hugetlb_2mb backing source type"
+[ -n "$hugepage_1gb_enabled" ] && \
+	private_mem_backing_src_types+=( private_mem_hugetlb_1gb ) || \
+	echo "skipping private_mem_hugetlb_1gb backing source type"
+
 set +e
 
 TEST_EXECUTABLE="$(dirname "$0")/private_mem_conversions_test"
@@ -66,17 +77,21 @@ TEST_EXECUTABLE="$(dirname "$0")/private_mem_conversions_test"
 
 for src_type in "${backing_src_types[@]}"; do
 
-  set -x
+  for private_mem_src_type in "${private_mem_backing_src_types[@]}"; do
 
-  $TEST_EXECUTABLE -s "$src_type" -n $num_vcpus_to_test
-  $TEST_EXECUTABLE -s "$src_type" -n $num_vcpus_to_test -m $num_memslots_to_test
+    set -x
 
-  $TEST_EXECUTABLE -s "$src_type" -n $num_vcpus_to_test -g
-  $TEST_EXECUTABLE -s "$src_type" -n $num_vcpus_to_test -m $num_memslots_to_test -g
+    $TEST_EXECUTABLE -s "$src_type" -p "$private_mem_src_type" -n $num_vcpus_to_test
+    $TEST_EXECUTABLE -s "$src_type" -p "$private_mem_src_type" -n $num_vcpus_to_test -m $num_memslots_to_test
 
-  { set +x; } 2>/dev/null
+    $TEST_EXECUTABLE -s "$src_type" -p "$private_mem_src_type" -n $num_vcpus_to_test -g
+    $TEST_EXECUTABLE -s "$src_type" -p "$private_mem_src_type" -n $num_vcpus_to_test -m $num_memslots_to_test -g
 
-  echo
+    { set +x; } 2>/dev/null
+
+    echo
+
+  done
 
 done
 )
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:20 2025
From: Ackerley Tng
Date: Wed, 14 May 2025 16:42:29 -0700
Message-ID: <9c38fdff026b84f8e4a3e1279d5ed4eed6dce0ba.1747264138.git.ackerleytng@google.com>
Subject: [RFC PATCH v2 50/51] KVM: selftests: Add script to test HugeTLB statistics
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org

Add a script that wraps other tests to check that HugeTLB statistics
are restored to their pre-test values once the wrapped test exits. It
does not account for HugeTLB statistics updated by other, non-test
processes running in the background while the test is running.
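The wrapping technique reduces to snapshotting each counter before the
test and comparing afterwards. A minimal sketch of the idea
(illustrative only; the actual script below also covers cgroup counters
and all tested sizes):

  declare -A baseline
  stat=/sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages
  baseline[$stat]=$(cat "$stat")
  "$@"   # run the wrapped test
  [ "$(cat "$stat")" = "${baseline[$stat]}" ] || \
    echo "$stat changed while the test ran"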
Change-Id: I1d827656ef215fd85e368f4a3629f306e7f33f18
Signed-off-by: Ackerley Tng
---
 ...memfd_wrap_test_check_hugetlb_reporting.sh | 95 +++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100755 tools/testing/selftests/kvm/guest_memfd_wrap_test_check_hugetlb_reporting.sh

diff --git a/tools/testing/selftests/kvm/guest_memfd_wrap_test_check_hugetlb_reporting.sh b/tools/testing/selftests/kvm/guest_memfd_wrap_test_check_hugetlb_reporting.sh
new file mode 100755
index 000000000000..475ec5c4ce1b
--- /dev/null
+++ b/tools/testing/selftests/kvm/guest_memfd_wrap_test_check_hugetlb_reporting.sh
@@ -0,0 +1,95 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Wrapper that runs a test, checking that HugeTLB-related statistics have
+# not changed before and after the test.
+#
+# Example:
+#   ./guest_memfd_wrap_test_check_hugetlb_reporting.sh ./guest_memfd_test
+#
+# Example of combining this with ./guest_memfd_provide_hugetlb_cgroup_mount.sh:
+#   ./guest_memfd_provide_hugetlb_cgroup_mount.sh \
+#     ./guest_memfd_wrap_test_check_hugetlb_reporting.sh \
+#     ./guest_memfd_hugetlb_reporting_test
+#
+# Copyright (C) 2025, Google LLC.
+
+declare -A baseline
+
+hugetlb_sizes=(
+	"2048kB"
+	"1048576kB"
+)
+
+statistics=(
+	"free_hugepages"
+	"nr_hugepages"
+	"nr_overcommit_hugepages"
+	"resv_hugepages"
+	"surplus_hugepages"
+)
+
+cgroup_hugetlb_sizes=(
+	"2MB"
+	"1GB"
+)
+
+cgroup_statistics=(
+	"limit_in_bytes"
+	"max_usage_in_bytes"
+	"usage_in_bytes"
+)
+
+establish_statistics_baseline() {
+	for size in "${hugetlb_sizes[@]}"; do
+		for statistic in "${statistics[@]}"; do
+			local path="/sys/kernel/mm/hugepages/hugepages-${size}/${statistic}"
+			baseline["$path"]=$(cat "$path")
+		done
+	done
+
+	if [ -n "$HUGETLB_CGROUP_PATH" ]; then
+		for size in "${cgroup_hugetlb_sizes[@]}"; do
+			for statistic in "${cgroup_statistics[@]}"; do
+				local rsvd_path="${HUGETLB_CGROUP_PATH}/hugetlb.${size}.rsvd.${statistic}"
+				local path="${HUGETLB_CGROUP_PATH}/hugetlb.${size}.${statistic}"
+
+				baseline["$rsvd_path"]=$(cat "$rsvd_path")
+				baseline["$path"]=$(cat "$path")
+			done
+		done
+	fi
+}
+
+assert_path_at_baseline() {
+	local path=$1
+
+	current=$(cat "$path")
+	expected=${baseline["$path"]}
+	if [ "$current" != "$expected" ]; then
+		echo "$path was $current instead of $expected"
+	fi
+}
+
+assert_statistics_at_baseline() {
+	for path in "${!baseline[@]}"; do
+		assert_path_at_baseline "$path"
+	done
+}
+
+establish_statistics_baseline
+
+"$@"
+
+assert_statistics_at_baseline
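When a statistic has not returned to its baseline,
assert_path_at_baseline reports each mismatching path on its own line; a
hypothetical failure (values invented for illustration) would look like:

  /sys/kernel/mm/hugepages/hugepages-2048kB/free_hugepages was 510 instead of 512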
-- 
2.49.0.1045.g170613ef41-goog

From nobody Thu Dec 18 05:16:20 2025
Date: Thu, 15 May 2025 17:22:21 -0700
Subject: [RFC PATCH v2 51/51] KVM: selftests: Test guest_memfd for accuracy of st_blocks
From: Ackerley Tng
To: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-fsdevel@vger.kernel.org
Test that st_blocks in struct stat (backed by inode->i_blocks) is
updated as guest_memfd memory is allocated and hole-punched.
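As background (an illustration only, not part of the patch): st_blocks
is reported in 512-byte units and can be observed from userspace with
stat(1). On an ordinary file on a filesystem such as ext4, the effect of
allocating blocks typically looks like:

  $ truncate -s 1M /tmp/f                  # sparse file: size set, no blocks
  $ stat -c 'size=%s blocks=%b' /tmp/f
  size=1048576 blocks=0
  $ fallocate -l 1M /tmp/f                 # allocate backing blocks
  $ stat -c 'size=%s blocks=%b' /tmp/f
  size=1048576 blocks=2048                 # 1048576 bytes / 512 bytes per block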
Change-Id: I67d814f130671b6b64b575e6a25fd17b1994c640
Signed-off-by: Ackerley Tng
---
 .../testing/selftests/kvm/guest_memfd_test.c | 55 ++++++++++++++++---
 1 file changed, 46 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/kvm/guest_memfd_test.c b/tools/testing/selftests/kvm/guest_memfd_test.c
index c8acccaa9e1d..f51cd876d7dc 100644
--- a/tools/testing/selftests/kvm/guest_memfd_test.c
+++ b/tools/testing/selftests/kvm/guest_memfd_test.c
@@ -142,41 +142,78 @@ static void test_file_size(int fd, size_t page_size, size_t total_size)
 	TEST_ASSERT_EQ(sb.st_blksize, page_size);
 }
 
-static void test_fallocate(int fd, size_t page_size, size_t total_size)
+static void assert_st_blocks_equals_size(int fd, size_t page_size, size_t expected_size)
 {
+	struct stat sb;
+	int ret;
+
+	/* TODO: st_blocks is not updated for 4K-page guest_memfd. */
+	if (page_size == getpagesize())
+		return;
+
+	ret = fstat(fd, &sb);
+	TEST_ASSERT(!ret, "fstat should succeed");
+	TEST_ASSERT_EQ(sb.st_blocks, expected_size / 512);
+}
+
+static void test_fallocate(int fd, size_t test_page_size, size_t total_size)
+{
+	size_t page_size;
 	int ret;
 
 	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, total_size);
 	TEST_ASSERT(!ret, "fallocate with aligned offset and size should succeed");
+	assert_st_blocks_equals_size(fd, test_page_size, total_size);
 
 	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
-			page_size - 1, page_size);
+			test_page_size - 1, test_page_size);
 	TEST_ASSERT(ret, "fallocate with unaligned offset should fail");
+	assert_st_blocks_equals_size(fd, test_page_size, total_size);
 
-	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, total_size, page_size);
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, total_size, test_page_size);
 	TEST_ASSERT(ret, "fallocate beginning at total_size should fail");
+	assert_st_blocks_equals_size(fd, test_page_size, total_size);
 
-	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, total_size + page_size, page_size);
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, total_size + test_page_size, test_page_size);
 	TEST_ASSERT(ret, "fallocate beginning after total_size should fail");
+	assert_st_blocks_equals_size(fd, test_page_size, total_size);
 
 	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
-			total_size, page_size);
+			total_size, test_page_size);
 	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) at total_size should succeed");
+	assert_st_blocks_equals_size(fd, test_page_size, total_size);
 
 	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
-			total_size + page_size, page_size);
+			total_size + test_page_size, test_page_size);
 	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) after total_size should succeed");
+	assert_st_blocks_equals_size(fd, test_page_size, total_size);
 
 	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
-			page_size, page_size - 1);
+			test_page_size, test_page_size - 1);
 	TEST_ASSERT(ret, "fallocate with unaligned size should fail");
+	assert_st_blocks_equals_size(fd, test_page_size, total_size);
 
 	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
-			page_size, page_size);
+			test_page_size, test_page_size);
 	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) with aligned offset and size should succeed");
+	assert_st_blocks_equals_size(fd, test_page_size, total_size - test_page_size);
 
-	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, page_size, page_size);
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+			test_page_size, test_page_size);
+	TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) in a hole should succeed");
+	assert_st_blocks_equals_size(fd, test_page_size, total_size - test_page_size);
+
+	ret = fallocate(fd, FALLOC_FL_KEEP_SIZE, test_page_size, test_page_size);
 	TEST_ASSERT(!ret, "fallocate to restore punched hole should succeed");
+	assert_st_blocks_equals_size(fd, test_page_size, total_size);
+
+	page_size = getpagesize();
+	if (test_page_size == page_size) {
+		ret = fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
+				test_page_size + page_size, page_size);
+		TEST_ASSERT(!ret, "fallocate(PUNCH_HOLE) of a subfolio should succeed");
+		assert_st_blocks_equals_size(fd, test_page_size, total_size);
+	}
 }
 
 static void test_invalid_punch_hole(int fd, size_t page_size, size_t total_size)
-- 
2.49.0.1045.g170613ef41-goog
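For reference, the assertions above expect sb.st_blocks ==
expected_size / 512 regardless of the backing page size; a quick sanity
check of that arithmetic for a hypothetical guest_memfd with a fully
allocated size of 4M:

  size=$((4 * 1024 * 1024))   # 4M, fully allocated
  echo $((size / 512))        # expected st_blocks: 8192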