From nobody Sat Nov 30 10:32:39 2024
Date: Tue, 10 Sep 2024 23:43:57 +0000
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
X-Mailer: git-send-email 2.46.0.598.g6f2099f65c-goog
Subject: [RFC PATCH 26/39] KVM: guest_memfd: Track faultability within a struct kvm_gmem_private
From: Ackerley Tng
To: tabba@google.com, quic_eberman@quicinc.com, roypat@amazon.co.uk,
    jgg@nvidia.com, peterx@redhat.com, david@redhat.com, rientjes@google.com,
    fvdl@google.com, jthoughton@google.com, seanjc@google.com,
    pbonzini@redhat.com, zhiquan1.li@intel.com, fan.du@intel.com,
    jun.miao@intel.com, isaku.yamahata@intel.com, muchun.song@linux.dev,
    mike.kravetz@oracle.com
Cc: erdemaktas@google.com, vannapurve@google.com, ackerleytng@google.com,
    qperret@google.com, jhubbard@nvidia.com, willy@infradead.org,
    shuah@kernel.org, brauner@kernel.org, bfoster@redhat.com,
    kent.overstreet@linux.dev, pvorel@suse.cz, rppt@kernel.org,
    richard.weiyang@gmail.com, anup@brainfault.org, haibo1.xu@intel.com,
    ajones@ventanamicro.com, vkuznets@redhat.com,
    maciej.wieczor-retman@intel.com, pgonda@google.com,
    oliver.upton@linux.dev, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, kvm@vger.kernel.org,
    linux-kselftest@vger.kernel.org, linux-fsdevel@kvack.org
Content-Type: text/plain; charset="utf-8"

The faultability xarray is stored on the inode since faultability is a
property of the guest_memfd's memory contents.

In this RFC, presence of an entry in the xarray indicates that a page is
faultable, but this could be flipped so that presence indicates
unfaultable. For flexibility, a special value "FAULT" is used instead of
a simple boolean.

However, at some stages of a VM's lifecycle there could be more private
pages, and at other stages there could be more shared pages, so neither
convention is always the more compact one. This is likely to be replaced
by a better data structure in a future revision to better support
ranges.

Also store struct kvm_gmem_hugetlb in struct kvm_gmem_inode_private as a
pointer, since inode->i_mapping->i_private_data now points to the struct
kvm_gmem_inode_private.
Co-developed-by: Fuad Tabba
Signed-off-by: Fuad Tabba
Co-developed-by: Ackerley Tng
Signed-off-by: Ackerley Tng
Co-developed-by: Vishal Annapurve
Signed-off-by: Vishal Annapurve
---
 virt/kvm/guest_memfd.c | 105 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 94 insertions(+), 11 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 8151df2c03e5..b603518f7b62 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -26,11 +26,21 @@ struct kvm_gmem_hugetlb {
 	struct hugepage_subpool *spool;
 };
 
-static struct kvm_gmem_hugetlb *kvm_gmem_hgmem(struct inode *inode)
+struct kvm_gmem_inode_private {
+	struct xarray faultability;
+	struct kvm_gmem_hugetlb *hgmem;
+};
+
+static struct kvm_gmem_inode_private *kvm_gmem_private(struct inode *inode)
 {
 	return inode->i_mapping->i_private_data;
 }
 
+static struct kvm_gmem_hugetlb *kvm_gmem_hgmem(struct inode *inode)
+{
+	return kvm_gmem_private(inode)->hgmem;
+}
+
 static bool is_kvm_gmem_hugetlb(struct inode *inode)
 {
 	u64 flags = (u64)inode->i_private;
@@ -38,6 +48,57 @@ static bool is_kvm_gmem_hugetlb(struct inode *inode)
 	return flags & KVM_GUEST_MEMFD_HUGETLB;
 }
 
+#define KVM_GMEM_FAULTABILITY_VALUE 0x4641554c54 /* FAULT */
+
+/**
+ * Set faultability of given range of inode indices [@start, @end) to
+ * @faultable. Return 0 if attributes were successfully updated or negative
+ * errno on error.
+ */
+static int kvm_gmem_set_faultable(struct inode *inode, pgoff_t start, pgoff_t end,
+				  bool faultable)
+{
+	struct xarray *faultability;
+	void *val;
+	pgoff_t i;
+
+	/*
+	 * The expectation is that fewer pages are faultable; hence, to save
+	 * memory, entries are created for faultable pages as opposed to
+	 * creating entries for non-faultable pages.
+	 */
+	val = faultable ? xa_mk_value(KVM_GMEM_FAULTABILITY_VALUE) : NULL;
+	faultability = &kvm_gmem_private(inode)->faultability;
+
+	/*
+	 * TODO replace this with something else (maybe interval
+	 * tree?). store_range doesn't quite do what we expect if overlapping
+	 * ranges are specified: if we store_range(5, 10, val) and then
+	 * store_range(7, 12, NULL), the entire range [5, 12] will be NULL. For
+	 * now, use the slower xa_store() to store individual entries on indices
+	 * to avoid this.
+	 */
+	for (i = start; i < end; i++) {
+		int r;
+
+		r = xa_err(xa_store(faultability, i, val, GFP_KERNEL_ACCOUNT));
+		if (r)
+			return r;
+	}
+
+	return 0;
+}
+
+/**
+ * Return true if the page at @index is allowed to be faulted in.
+ */
+static bool kvm_gmem_is_faultable(struct inode *inode, pgoff_t index)
+{
+	struct xarray *faultability = &kvm_gmem_private(inode)->faultability;
+
+	return xa_to_value(xa_load(faultability, index)) == KVM_GMEM_FAULTABILITY_VALUE;
+}
+
 /**
  * folio_file_pfn - like folio_file_page, but return a pfn.
  * @folio: The folio which contains this index.
@@ -895,11 +956,21 @@ static void kvm_gmem_hugetlb_teardown(struct inode *inode)
 
 static void kvm_gmem_evict_inode(struct inode *inode)
 {
+	struct kvm_gmem_inode_private *private = kvm_gmem_private(inode);
+
+	/*
+	 * .evict_inode can be called before faultability is set up if there are
+	 * issues during inode creation.
+	 */
+	if (private)
+		xa_destroy(&private->faultability);
+
 	if (is_kvm_gmem_hugetlb(inode))
 		kvm_gmem_hugetlb_teardown(inode);
 	else
 		truncate_inode_pages_final(inode->i_mapping);
 
+	kfree(private);
 	clear_inode(inode);
 }
 
@@ -1028,7 +1099,9 @@ static const struct inode_operations kvm_gmem_iops = {
 	.setattr = kvm_gmem_setattr,
 };
 
-static int kvm_gmem_hugetlb_setup(struct inode *inode, loff_t size, u64 flags)
+static int kvm_gmem_hugetlb_setup(struct inode *inode,
+				  struct kvm_gmem_inode_private *private,
+				  loff_t size, u64 flags)
 {
 	struct kvm_gmem_hugetlb *hgmem;
 	struct hugepage_subpool *spool;
@@ -1036,6 +1109,10 @@ static int kvm_gmem_hugetlb_setup(struct inode *inode, loff_t size, u64 flags)
 	struct hstate *h;
 	long hpages;
 
+	hgmem = kzalloc(sizeof(*hgmem), GFP_KERNEL);
+	if (!hgmem)
+		return -ENOMEM;
+
 	page_size_log = (flags >> KVM_GUEST_MEMFD_HUGE_SHIFT) & KVM_GUEST_MEMFD_HUGE_MASK;
 	h = hstate_sizelog(page_size_log);
 
@@ -1046,21 +1123,16 @@ static int kvm_gmem_hugetlb_setup(struct inode *inode, loff_t size, u64 flags)
 	if (!spool)
 		goto err;
 
-	hgmem = kzalloc(sizeof(*hgmem), GFP_KERNEL);
-	if (!hgmem)
-		goto err_subpool;
-
 	inode->i_blkbits = huge_page_shift(h);
 
 	hgmem->h = h;
 	hgmem->spool = spool;
-	inode->i_mapping->i_private_data = hgmem;
 
+	private->hgmem = hgmem;
 	return 0;
 
-err_subpool:
-	kfree(spool);
 err:
+	kfree(hgmem);
 	return -ENOMEM;
 }
 
@@ -1068,6 +1140,7 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 						      loff_t size, u64 flags)
 {
 	const struct qstr qname = QSTR_INIT(name, strlen(name));
+	struct kvm_gmem_inode_private *private;
 	struct inode *inode;
 	int err;
 
@@ -1079,12 +1152,20 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 	if (err)
 		goto out;
 
+	err = -ENOMEM;
+	private = kzalloc(sizeof(*private), GFP_KERNEL);
+	if (!private)
+		goto out;
+
 	if (flags & KVM_GUEST_MEMFD_HUGETLB) {
-		err = kvm_gmem_hugetlb_setup(inode, size, flags);
+		err = kvm_gmem_hugetlb_setup(inode, private, size, flags);
 		if (err)
-			goto out;
+			goto free_private;
 	}
 
+	xa_init(&private->faultability);
+	inode->i_mapping->i_private_data = private;
+
 	inode->i_private = (void *)(unsigned long)flags;
 	inode->i_op = &kvm_gmem_iops;
 	inode->i_mapping->a_ops = &kvm_gmem_aops;
@@ -1097,6 +1178,8 @@ static struct inode *kvm_gmem_inode_make_secure_inode(const char *name,
 
 	return inode;
 
+free_private:
+	kfree(private);
 out:
 	iput(inode);
-- 
2.46.0.598.g6f2099f65c-goog