Date: Tue, 6 Jun 2023 19:04:00 +0000
X-Mailer: git-send-email 2.41.0.rc0.172.g3f132b7071-goog
Subject: [RFC PATCH 15/19] KVM: guest_mem: hugetlb: initialization and cleanup
From: Ackerley Tng
To: akpm@linux-foundation.org, mike.kravetz@oracle.com, muchun.song@linux.dev,
	pbonzini@redhat.com, seanjc@google.com, shuah@kernel.org,
	willy@infradead.org
Cc: brauner@kernel.org, chao.p.peng@linux.intel.com, coltonlewis@google.com,
	david@redhat.com, dhildenb@redhat.com, dmatlack@google.com,
	erdemaktas@google.com, hughd@google.com, isaku.yamahata@gmail.com,
	jarkko@kernel.org, jmattson@google.com, joro@8bytes.org,
	jthoughton@google.com, jun.nakajima@intel.com,
	kirill.shutemov@linux.intel.com, liam.merwick@oracle.com,
	mail@maciej.szmigiero.name, mhocko@suse.com, michael.roth@amd.com,
	qperret@google.com, rientjes@google.com, rppt@kernel.org,
	steven.price@arm.com, tabba@google.com, vannapurve@google.com,
	vbabka@suse.cz, vipinsh@google.com, vkuznets@redhat.com,
	wei.w.wang@intel.com, yu.c.zhang@linux.intel.com, kvm@vger.kernel.org,
	linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mm@kvack.org, qemu-devel@nongnu.org, x86@kernel.org,
	Ackerley Tng

First stage of hugetlb support: add initialization and cleanup routines

Signed-off-by: Ackerley Tng
---
 include/uapi/linux/kvm.h | 25 ++++++++++++
 virt/kvm/guest_mem.c     | 88 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 108 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 0fa665e8862a..1df0c802c29f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 #define KVM_API_VERSION 12
 
@@ -2280,6 +2281,30 @@ struct kvm_memory_attributes {
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO, 0xd4, struct kvm_create_guest_memfd)
 
 #define KVM_GUEST_MEMFD_HUGE_PMD		(1ULL << 0)
+#define KVM_GUEST_MEMFD_HUGETLB			(1ULL << 1)
+
+/*
+ * Huge page size encoding when KVM_GUEST_MEMFD_HUGETLB is specified, and a huge
+ * page size other than the default is desired. See hugetlb_encode.h. All
+ * known huge page size encodings are provided here. It is the responsibility
+ * of the application to know which sizes are supported on the running system.
+ * See mmap(2) man page for details.
+ */
+#define KVM_GUEST_MEMFD_HUGE_SHIFT	HUGETLB_FLAG_ENCODE_SHIFT
+#define KVM_GUEST_MEMFD_HUGE_MASK	HUGETLB_FLAG_ENCODE_MASK
+
+#define KVM_GUEST_MEMFD_HUGE_64KB	HUGETLB_FLAG_ENCODE_64KB
+#define KVM_GUEST_MEMFD_HUGE_512KB	HUGETLB_FLAG_ENCODE_512KB
+#define KVM_GUEST_MEMFD_HUGE_1MB	HUGETLB_FLAG_ENCODE_1MB
+#define KVM_GUEST_MEMFD_HUGE_2MB	HUGETLB_FLAG_ENCODE_2MB
+#define KVM_GUEST_MEMFD_HUGE_8MB	HUGETLB_FLAG_ENCODE_8MB
+#define KVM_GUEST_MEMFD_HUGE_16MB	HUGETLB_FLAG_ENCODE_16MB
+#define KVM_GUEST_MEMFD_HUGE_32MB	HUGETLB_FLAG_ENCODE_32MB
+#define KVM_GUEST_MEMFD_HUGE_256MB	HUGETLB_FLAG_ENCODE_256MB
+#define KVM_GUEST_MEMFD_HUGE_512MB	HUGETLB_FLAG_ENCODE_512MB
+#define KVM_GUEST_MEMFD_HUGE_1GB	HUGETLB_FLAG_ENCODE_1GB
+#define KVM_GUEST_MEMFD_HUGE_2GB	HUGETLB_FLAG_ENCODE_2GB
+#define KVM_GUEST_MEMFD_HUGE_16GB	HUGETLB_FLAG_ENCODE_16GB
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index 13253af40be6..b533143e2878 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -30,6 +31,11 @@ struct kvm_gmem {
 	struct kvm *kvm;
 	u64 flags;
 	struct xarray bindings;
+	struct {
+		struct hstate *h;
+		struct hugepage_subpool *spool;
+		struct resv_map *resv_map;
+	} hugetlb;
 };
 
 static loff_t kvm_gmem_get_size(struct file *file)
@@ -346,6 +352,46 @@ static const struct inode_operations kvm_gmem_iops = {
 	.setattr	= kvm_gmem_setattr,
 };
 
+static int kvm_gmem_hugetlb_setup(struct inode *inode, struct kvm_gmem *gmem,
+				  loff_t size, u64 flags)
+{
+	int page_size_log;
+	int hstate_idx;
+	long hpages;
+	struct resv_map *resv_map;
+	struct hugepage_subpool *spool;
+	struct hstate *h;
+
+	page_size_log = (flags >> KVM_GUEST_MEMFD_HUGE_SHIFT) & KVM_GUEST_MEMFD_HUGE_MASK;
+	hstate_idx = get_hstate_idx(page_size_log);
+	if (hstate_idx < 0)
+		return -ENOENT;
+
+	h = &hstates[hstate_idx];
+	/* Round up to accommodate size requests that don't align with huge pages */
+	hpages = round_up(size, huge_page_size(h)) >> huge_page_shift(h);
+	spool = hugepage_new_subpool(h, hpages, hpages);
+	if (!spool)
+		goto out;
+
+	resv_map = resv_map_alloc();
+	if (!resv_map)
+		goto out_subpool;
+
+	inode->i_blkbits = huge_page_shift(h);
+
+	gmem->hugetlb.h = h;
+	gmem->hugetlb.spool = spool;
+	gmem->hugetlb.resv_map = resv_map;
+
+	return 0;
+
+out_subpool:
+	kfree(spool);
+out:
+	return -ENOMEM;
+}
+
 static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 flags,
 					   struct vfsmount *mnt)
 {
@@ -368,6 +414,12 @@ static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 fla
 	if (!gmem)
 		goto err_inode;
 
+	if (flags & KVM_GUEST_MEMFD_HUGETLB) {
+		err = kvm_gmem_hugetlb_setup(inode, gmem, size, flags);
+		if (err)
+			goto err_gmem;
+	}
+
 	xa_init(&gmem->bindings);
 
 	kvm_get_kvm(kvm);
@@ -385,6 +437,8 @@ static struct inode *kvm_gmem_create_inode(struct kvm *kvm, loff_t size, u64 fla
 
 	return inode;
 
+err_gmem:
+	kfree(gmem);
 err_inode:
 	iput(inode);
 	return ERR_PTR(err);
@@ -414,6 +468,8 @@ static struct file *kvm_gmem_create_file(struct kvm *kvm, loff_t size, u64 flags
 	return file;
 }
 
+#define KVM_GUEST_MEMFD_ALL_FLAGS (KVM_GUEST_MEMFD_HUGE_PMD | KVM_GUEST_MEMFD_HUGETLB)
+
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *gmem)
 {
 	int fd;
@@ -424,8 +480,15 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *gmem)
 	if (size < 0 || !PAGE_ALIGNED(size))
 		return -EINVAL;
 
-	if (flags & ~KVM_GUEST_MEMFD_HUGE_PMD)
-		return -EINVAL;
+	if (!(flags & KVM_GUEST_MEMFD_HUGETLB)) {
+		if (flags & ~(unsigned int)KVM_GUEST_MEMFD_ALL_FLAGS)
+			return -EINVAL;
+	} else {
+		/* Allow huge page size encoding in flags. */
+		if (flags & ~(unsigned int)(KVM_GUEST_MEMFD_ALL_FLAGS |
+				(KVM_GUEST_MEMFD_HUGE_MASK << KVM_GUEST_MEMFD_HUGE_SHIFT)))
+			return -EINVAL;
+	}
 
 	if (flags & KVM_GUEST_MEMFD_HUGE_PMD) {
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -610,7 +673,17 @@ static void kvm_gmem_evict_inode(struct inode *inode)
 	 * pointed at this file.
 	 */
 	kvm_gmem_invalidate_begin(kvm, gmem, 0, -1ul);
-	truncate_inode_pages_final(inode->i_mapping);
+	if (gmem->flags & KVM_GUEST_MEMFD_HUGETLB) {
+		truncate_inode_pages_final_prepare(inode->i_mapping);
+		remove_mapping_hugepages(
+			inode->i_mapping, gmem->hugetlb.h, gmem->hugetlb.spool,
+			gmem->hugetlb.resv_map, inode, 0, LLONG_MAX);
+
+		resv_map_release(&gmem->hugetlb.resv_map->refs);
+		hugepage_put_subpool(gmem->hugetlb.spool);
+	} else {
+		truncate_inode_pages_final(inode->i_mapping);
+	}
 	kvm_gmem_invalidate_end(kvm, gmem, 0, -1ul);
 
 	mutex_unlock(&kvm->slots_lock);
@@ -688,10 +761,15 @@ bool kvm_gmem_check_alignment(const struct kvm_userspace_memory_region2 *mem)
 {
 	size_t page_size;
 
-	if (mem->flags & KVM_GUEST_MEMFD_HUGE_PMD)
+	if (mem->flags & KVM_GUEST_MEMFD_HUGETLB) {
+		size_t page_size_log = ((mem->flags >> KVM_GUEST_MEMFD_HUGE_SHIFT)
+					& KVM_GUEST_MEMFD_HUGE_MASK);
+		page_size = 1UL << page_size_log;
+	} else if (mem->flags & KVM_GUEST_MEMFD_HUGE_PMD) {
 		page_size = HPAGE_PMD_SIZE;
-	else
+	} else {
 		page_size = PAGE_SIZE;
+	}
 
 	return (IS_ALIGNED(mem->gmem_offset, page_size) &&
 		IS_ALIGNED(mem->memory_size, page_size));
-- 
2.41.0.rc0.172.g3f132b7071-goog
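
For readers following the flag layout, here is a minimal userspace sketch of how a huge page size might be packed into and recovered from the flags word, and how the page count is rounded up as in kvm_gmem_hugetlb_setup(). It is an illustration under stated assumptions, not code from the patch. The names HUGE_SHIFT, HUGE_MASK, FLAG_HUGE_PMD, FLAG_HUGETLB and all helper functions are hypothetical local stand-ins. The shift (26) and mask (0x3f) are assumed to match HUGETLB_FLAG_ENCODE_SHIFT and HUGETLB_FLAG_ENCODE_MASK in hugetlb_encode.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; values assumed to mirror hugetlb_encode.h. */
#define HUGE_SHIFT	26
#define HUGE_MASK	0x3fULL

/* Hypothetical stand-ins for the two flag bits defined in the patch. */
#define FLAG_HUGE_PMD	(1ULL << 0)
#define FLAG_HUGETLB	(1ULL << 1)
#define ALL_FLAGS	(FLAG_HUGE_PMD | FLAG_HUGETLB)

/* Pack log2(page size) into the size-encoding field of the flags word. */
static inline uint64_t encode_huge_size(unsigned int page_size_log)
{
	return ((uint64_t)page_size_log & HUGE_MASK) << HUGE_SHIFT;
}

/* Recover log2(page size) from a flags word. */
static inline unsigned int decode_huge_size_log(uint64_t flags)
{
	return (unsigned int)((flags >> HUGE_SHIFT) & HUGE_MASK);
}

/* Huge pages needed to back `size` bytes, rounding up a partial page. */
static inline uint64_t huge_pages_needed(uint64_t size,
					 unsigned int page_size_log)
{
	uint64_t page_size = 1ULL << page_size_log;

	return (size + page_size - 1) >> page_size_log;
}

/*
 * Return 1 if only known bits are set. The size-encoding field is
 * accepted only together with FLAG_HUGETLB. The mask is kept 64 bits
 * wide here; that is a choice of this sketch.
 */
static inline int flags_valid(uint64_t flags)
{
	uint64_t allowed = ALL_FLAGS;

	if (flags & FLAG_HUGETLB)
		allowed |= HUGE_MASK << HUGE_SHIFT;

	return (flags & ~allowed) == 0;
}
```

With these helpers, a request for 2MB pages would pass `FLAG_HUGETLB | encode_huge_size(21)`, and a 3MB file backed by 2MB pages would need two huge pages. The 64-bit mask in flags_valid() is a choice of this sketch and is not meant to restate the `(unsigned int)` cast used in kvm_gmem_create().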