From nobody Sun Oct 5 09:24:50 2025 Received: from mail-qt1-f176.google.com (mail-qt1-f176.google.com [209.85.160.176]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9FE3E26F478 for ; Thu, 7 Aug 2025 01:45:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.176 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754531136; cv=none; b=ip1QTXjunRHABB2Jhbh23hlZ8u8v+XDzO0salhGKXTlGCEHX+VYOUIyZF/9GqQxOMs1q5zH/l09k3bGES7U7Ke/GGJcFiARyKY9cT9TH/bP2ZwtHN2iXAQrkKFruT80LYd+bFpUKM2uedBQogdDR/zlXNkFANYhhaSJWpep2Zh8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1754531136; c=relaxed/simple; bh=F3LzhC++grM3zTUN87MyrI6xOO9Xu3zZ55hHAozBMgg=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Kobnkdx6Fecw/hbf2nagTRqe8YWHK2+8alhpTfdOMi4ax/5NDuX7bbM1yuecxazJQBtcpven8l7AmMQ/2mmZKdNKv3o/vVsxSxrksBctPa3Yodq5YADgR/wO2YC2WMfZ1VSYYRrY0zp4Zt4MOcal9B4te8Gim75OuVN5F8v0KQE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=soleen.com; spf=pass smtp.mailfrom=soleen.com; dkim=pass (2048-bit key) header.d=soleen.com header.i=@soleen.com header.b=Yt78Oyr+; arc=none smtp.client-ip=209.85.160.176 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=soleen.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=soleen.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=soleen.com header.i=@soleen.com header.b="Yt78Oyr+" Received: by mail-qt1-f176.google.com with SMTP id d75a77b69052e-4af123c6fc4so5511471cf.0 for ; Wed, 06 Aug 2025 18:45:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=soleen.com; s=google; t=1754531130; x=1755135930; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=ehkeScdrauztu8T9he2tGLB8hND8bPPz6tbPB+5xOmw=; b=Yt78Oyr+xb02NN3eyX+87MbAVSTApkgT1R6Wztn83jiNpi4TJBiL0Xa2Yqjd4kw2lW Wz0dSAVjxvUpSIIc4cSUBnjH0EOakV4jAIdlmsrSqKHa0cMN191CbGdrXe3OjQEHMDhw Kg7vm7G50M3q8wIecQrqu07H9V4oSmLbqF9ifewk/aSUG8z1Mpe0ReRjbsnuhMX18ZZF YNUi4OVs8XZMzB846HjhNOq0yMzCo1tpENu2K19umeUsk+y3r+Oy25WHOAB5szOGxiJv AodDRqjKTXkJL7eL7YYd1S7xsVwKwHmJysUuHDFatblEI2ehd3Tx/VggnJSK1GTTiIcO gMDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1754531130; x=1755135930; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ehkeScdrauztu8T9he2tGLB8hND8bPPz6tbPB+5xOmw=; b=Viuzv/tAoaZ6BMaMF7MwsDdewsC2BCZO0bo1VayEPgsbfHiDSFbRcn9RB/Z2VOnvg7 aSplQdfi+t4/0Ng7pDyr74V920hqmiD+AcKDgGW6vsft+X0p5BXrWqOJf5frZ9zybqaV ByW0l4/hCw1VtAsjgtcdcmogq7qHQTLd9Tx+KJQ5krLKnphqjW6+4Xb/q9zmg4jRkQUE 6ffqGNlYO8YG8ZOEiGkINOzUTOxCkOSsXIGa3GfUaRSg1Lq6OI4XNxHp3T6Y8UWwdfxH Ygy6T0UfVndPe3eGSGcp/fhnKI1KPDCGWjJAoBb21Y+LChPuS+SKQ89kvn6vrv/+S0JP KS0g== X-Forwarded-Encrypted: i=1; AJvYcCXTVCsXThPq+yInq21OWKmGf1E7mTBAa4MHgAnN/jz7ilynd3hj0xG6ZNUUNXwktXC6HksowSGexIPZPYg=@vger.kernel.org X-Gm-Message-State: AOJu0Yz8Gl/fRt1RG3LvgXaERQL84k+xsUXW6P6u08X0bF442nEGbdMT VbRejNP0PBHcXuYMVP9VF9TjuqT1HErzYOEwwtXwTcrie4xkz6b8uB6W8JAc0i7NaVk= X-Gm-Gg: ASbGncu7+lG40PpsNzGTF/LaHK1In2uz0uIMetV1+NVoy9S9tVdWObRxz75j24kbo9z +NczqyjJyDJLVLqFPA/SLsgN/Xh/N71eOPXp+sJhfdPXZeKtloZ5m9DOFT7Mopzro4nXwMGkazk 5SJJRFgCw/mJM2RR06T4iovr9NF3GuuzwiFf8YRjRyWlovtLmWgzt3kJG7+LkmTF5ikywDuNcev R0ZTKmoMvaYUCOCcuqAQYHaWif0XuaW9hPo+dlQYXQhFw5f7rtSifarAKJ0x0Dxnhc7U/RbduRC Gyox7Hug/EQkNRoSeZthWovLiPMqboQnAXQxojIqnAv/2w7esLXYJvQDsvA+FqDUHEPK0IqCOB1 /YdUyrKdSt1SJcVtV+K8ERsHNf5yUH2RvCezSj0PMcH+yOYlD5chm8PQV8o30VFmbzK0vw4uyTd COqrvXydUbbLn1 X-Google-Smtp-Source: AGHT+IGqYgaN1YVbRBRIlJ65LL6blzyWcpm1/jVEKqTrwyRimKIgHpEcS08wJGhbX5ipdDSbTol6hA== X-Received: by 2002:a05:622a:1211:b0:4b0:6965:dd97 with SMTP id d75a77b69052e-4b0915b344bmr63499471cf.44.1754531129758; Wed, 06 Aug 2025 18:45:29 -0700 (PDT) Received: from soleen.c.googlers.com.com (235.247.85.34.bc.googleusercontent.com. [34.85.247.235]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-7077cde5a01sm92969046d6.70.2025.08.06.18.45.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 06 Aug 2025 18:45:29 -0700 (PDT) From: Pasha Tatashin To: pratyush@kernel.org, jasonmiu@google.com, graf@amazon.com, changyuanl@google.com, pasha.tatashin@soleen.com, rppt@kernel.org, dmatlack@google.com, rientjes@google.com, corbet@lwn.net, rdunlap@infradead.org, ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com, ojeda@kernel.org, aliceryhl@google.com, masahiroy@kernel.org, akpm@linux-foundation.org, tj@kernel.org, yoann.congal@smile.fr, mmaurer@google.com, roman.gushchin@linux.dev, chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com, jannh@google.com, vincent.guittot@linaro.org, hannes@cmpxchg.org, dan.j.williams@intel.com, david@redhat.com, joel.granados@kernel.org, rostedt@goodmis.org, anna.schumaker@oracle.com, song@kernel.org, zhangguopeng@kylinos.cn, linux@weissschuh.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org, bartosz.golaszewski@linaro.org, cw00.choi@samsung.com, myungjoo.ham@samsung.com, yesanishhere@gmail.com, Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com, aleksander.lobakin@intel.com, ira.weiny@intel.com, andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de, bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com, stuart.w.hayes@gmail.com, ptyadav@amazon.de, lennart@poettering.net, brauner@kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, saeedm@nvidia.com, ajayachandra@nvidia.com, jgg@nvidia.com, parav@nvidia.com, leonro@nvidia.com, witu@nvidia.com Subject: [PATCH v3 29/30] luo: allow preserving memfd Date: Thu, 7 Aug 2025 01:44:35 +0000 Message-ID: <20250807014442.3829950-30-pasha.tatashin@soleen.com> X-Mailer: git-send-email 2.50.1.565.gc32cd1483b-goog In-Reply-To: <20250807014442.3829950-1-pasha.tatashin@soleen.com> References: <20250807014442.3829950-1-pasha.tatashin@soleen.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Pratyush Yadav The ability to preserve a memfd allows userspace to use KHO and LUO to transfer its memory contents to the next kernel. This is useful in many ways. For one, it can be used with IOMMUFD as the backing store for IOMMU page tables. Preserving IOMMUFD is essential for performing a hypervisor live update with passthrough devices. memfd support provides the first building block for making that possible. For another, applications with a large amount of memory that takes time to reconstruct, reboots to consume kernel upgrades can be very expensive. memfd with LUO gives those applications reboot-persistent memory that they can use to quickly save and reconstruct that state. While memfd is backed by either hugetlbfs or shmem, currently only support on shmem is added. To be more precise, support for anonymous shmem files is added. The handover to the next kernel is not transparent. All the properties of the file are not preserved; only its memory contents, position, and size. The recreated file gets the UID and GID of the task doing the restore, and the task's cgroup gets charged with the memory. After LUO is in prepared state, the file cannot grow or shrink, and all its pages are pinned to avoid migrations and swapping. The file can still be read from or written to. Co-developed-by: Changyuan Lyu Signed-off-by: Changyuan Lyu Co-developed-by: Pasha Tatashin Signed-off-by: Pasha Tatashin Signed-off-by: Pratyush Yadav --- MAINTAINERS | 2 + mm/Makefile | 1 + mm/memfd_luo.c | 507 +++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 510 insertions(+) create mode 100644 mm/memfd_luo.c diff --git a/MAINTAINERS b/MAINTAINERS index b88b77977649..7421d21672f3 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -14209,6 +14209,7 @@ F: tools/testing/selftests/livepatch/ =20 LIVE UPDATE M: Pasha Tatashin +R: Pratyush Yadav L: linux-kernel@vger.kernel.org S: Maintained F: Documentation/ABI/testing/sysfs-kernel-liveupdate @@ -14218,6 +14219,7 @@ F: Documentation/userspace-api/liveupdate.rst F: include/linux/liveupdate.h F: include/uapi/linux/liveupdate.h F: kernel/liveupdate/ +F: mm/memfd_luo.c F: tools/testing/selftests/liveupdate/ =20 LLC (802.2) diff --git a/mm/Makefile b/mm/Makefile index ef54aa615d9d..0a9936ffc172 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -100,6 +100,7 @@ obj-$(CONFIG_NUMA) +=3D memory-tiers.o obj-$(CONFIG_DEVICE_MIGRATION) +=3D migrate_device.o obj-$(CONFIG_TRANSPARENT_HUGEPAGE) +=3D huge_memory.o khugepaged.o obj-$(CONFIG_PAGE_COUNTER) +=3D page_counter.o +obj-$(CONFIG_LIVEUPDATE) +=3D memfd_luo.o obj-$(CONFIG_MEMCG_V1) +=3D memcontrol-v1.o obj-$(CONFIG_MEMCG) +=3D memcontrol.o vmpressure.o ifdef CONFIG_SWAP diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c new file mode 100644 index 000000000000..0c91b40a2080 --- /dev/null +++ b/mm/memfd_luo.c @@ -0,0 +1,507 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Copyright (c) 2025, Google LLC. + * Pasha Tatashin + * Changyuan Lyu + * + * Copyright (C) 2025 Amazon.com Inc. or its affiliates. + * Pratyush Yadav + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include "internal.h" + +static const char memfd_luo_compatible[] =3D "memfd-v1"; + +#define PRESERVED_PFN_MASK GENMASK(63, 12) +#define PRESERVED_PFN_SHIFT 12 +#define PRESERVED_FLAG_DIRTY BIT(0) +#define PRESERVED_FLAG_UPTODATE BIT(1) + +#define PRESERVED_FOLIO_PFN(desc) (((desc) & PRESERVED_PFN_MASK) >> PRESER= VED_PFN_SHIFT) +#define PRESERVED_FOLIO_FLAGS(desc) ((desc) & ~PRESERVED_PFN_MASK) +#define PRESERVED_FOLIO_MKDESC(pfn, flags) (((pfn) << PRESERVED_PFN_SHIFT)= | (flags)) + +struct memfd_luo_preserved_folio { + /* + * The folio descriptor is made of 2 parts. The bottom 12 bits are used + * for storing flags, the others for storing the PFN. + */ + u64 foliodesc; + u64 index; +}; + +static int memfd_luo_preserve_folios(struct memfd_luo_preserved_folio *pfo= lios, + struct folio **folios, + unsigned int nr_folios) +{ + unsigned int i; + int err; + + for (i =3D 0; i < nr_folios; i++) { + struct memfd_luo_preserved_folio *pfolio =3D &pfolios[i]; + struct folio *folio =3D folios[i]; + unsigned int flags =3D 0; + unsigned long pfn; + + err =3D kho_preserve_folio(folio); + if (err) + goto err_unpreserve; + + pfn =3D folio_pfn(folio); + if (folio_test_dirty(folio)) + flags |=3D PRESERVED_FLAG_DIRTY; + if (folio_test_uptodate(folio)) + flags |=3D PRESERVED_FLAG_UPTODATE; + + pfolio->foliodesc =3D PRESERVED_FOLIO_MKDESC(pfn, flags); + pfolio->index =3D folio->index; + } + + return 0; + +err_unpreserve: + i--; + for (; i >=3D 0; i--) + WARN_ON_ONCE(kho_unpreserve_folio(folios[i])); + return err; +} + +static void memfd_luo_unpreserve_folios(const struct memfd_luo_preserved_f= olio *pfolios, + unsigned int nr_folios) +{ + unsigned int i; + + for (i =3D 0; i < nr_folios; i++) { + const struct memfd_luo_preserved_folio *pfolio =3D &pfolios[i]; + struct folio *folio; + + if (!pfolio->foliodesc) + continue; + + folio =3D pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc)); + + kho_unpreserve_folio(folio); + unpin_folio(folio); + } +} + +static void *memfd_luo_create_fdt(unsigned long size) +{ + unsigned int order =3D get_order(size); + struct folio *fdt_folio; + int err =3D 0; + void *fdt; + + if (order > MAX_PAGE_ORDER) + return NULL; + + fdt_folio =3D folio_alloc(GFP_KERNEL, order); + if (!fdt_folio) + return NULL; + + fdt =3D folio_address(fdt_folio); + + err |=3D fdt_create(fdt, (1 << (order + PAGE_SHIFT))); + err |=3D fdt_finish_reservemap(fdt); + err |=3D fdt_begin_node(fdt, ""); + if (err) + goto free; + + return fdt; + +free: + folio_put(fdt_folio); + return NULL; +} + +static int memfd_luo_finish_fdt(void *fdt) +{ + int err; + + err =3D fdt_end_node(fdt); + if (err) + return err; + + return fdt_finish(fdt); +} + +static int memfd_luo_prepare(struct liveupdate_file_handler *handler, + struct file *file, u64 *data) +{ + struct memfd_luo_preserved_folio *preserved_folios; + struct inode *inode =3D file_inode(file); + unsigned int max_folios, nr_folios =3D 0; + int err =3D 0, preserved_size; + struct folio **folios; + long size, nr_pinned; + pgoff_t offset; + void *fdt; + u64 pos; + + if (WARN_ON_ONCE(!shmem_file(file))) + return -EINVAL; + + inode_lock(inode); + shmem_i_mapping_freeze(inode, true); + + size =3D i_size_read(inode); + if ((PAGE_ALIGN(size) / PAGE_SIZE) > UINT_MAX) { + err =3D -E2BIG; + goto err_unlock; + } + + /* + * Guess the number of folios based on inode size. Real number might end + * up being smaller if there are higher order folios. + */ + max_folios =3D PAGE_ALIGN(size) / PAGE_SIZE; + folios =3D kvmalloc_array(max_folios, sizeof(*folios), GFP_KERNEL); + if (!folios) { + err =3D -ENOMEM; + goto err_unfreeze; + } + + /* + * Pin the folios so they don't move around behind our back. This also + * ensures none of the folios are in CMA -- which ensures they don't + * fall in KHO scratch memory. It also moves swapped out folios back to + * memory. + * + * A side effect of doing this is that it allocates a folio for all + * indices in the file. This might waste memory on sparse memfds. If + * that is really a problem in the future, we can have a + * memfd_pin_folios() variant that does not allocate a page on empty + * slots. + */ + nr_pinned =3D memfd_pin_folios(file, 0, size - 1, folios, max_folios, + &offset); + if (nr_pinned < 0) { + err =3D nr_pinned; + pr_err("failed to pin folios: %d\n", err); + goto err_free_folios; + } + /* nr_pinned won't be more than max_folios which is also unsigned int. */ + nr_folios =3D (unsigned int)nr_pinned; + + preserved_size =3D sizeof(struct memfd_luo_preserved_folio) * nr_folios; + if (check_mul_overflow(sizeof(struct memfd_luo_preserved_folio), + nr_folios, &preserved_size)) { + err =3D -E2BIG; + goto err_unpin; + } + + /* + * Most of the space should be taken by preserved folios. So take its + * size, plus a page for other properties. + */ + fdt =3D memfd_luo_create_fdt(PAGE_ALIGN(preserved_size) + PAGE_SIZE); + if (!fdt) { + err =3D -ENOMEM; + goto err_unpin; + } + + pos =3D file->f_pos; + err =3D fdt_property(fdt, "pos", &pos, sizeof(pos)); + if (err) + goto err_free_fdt; + + err =3D fdt_property(fdt, "size", &size, sizeof(size)); + if (err) + goto err_free_fdt; + + err =3D fdt_property_placeholder(fdt, "folios", preserved_size, + (void **)&preserved_folios); + if (err) { + pr_err("Failed to reserve folios property in FDT: %s\n", + fdt_strerror(err)); + err =3D -ENOMEM; + goto err_free_fdt; + } + + err =3D memfd_luo_preserve_folios(preserved_folios, folios, nr_folios); + if (err) + goto err_free_fdt; + + err =3D memfd_luo_finish_fdt(fdt); + if (err) + goto err_unpreserve; + + err =3D kho_preserve_folio(virt_to_folio(fdt)); + if (err) + goto err_unpreserve; + + kvfree(folios); + inode_unlock(inode); + + *data =3D virt_to_phys(fdt); + return 0; + +err_unpreserve: + memfd_luo_unpreserve_folios(preserved_folios, nr_folios); +err_free_fdt: + folio_put(virt_to_folio(fdt)); +err_unpin: + unpin_folios(folios, nr_pinned); +err_free_folios: + kvfree(folios); +err_unfreeze: + shmem_i_mapping_freeze(inode, false); +err_unlock: + inode_unlock(inode); + return err; +} + +static int memfd_luo_freeze(struct liveupdate_file_handler *handler, + struct file *file, u64 *data) +{ + u64 pos =3D file->f_pos; + void *fdt; + int err; + + if (WARN_ON_ONCE(!*data)) + return -EINVAL; + + fdt =3D phys_to_virt(*data); + + /* + * The pos or size might have changed since prepare. Everything else + * stays the same. + */ + err =3D fdt_setprop(fdt, 0, "pos", &pos, sizeof(pos)); + if (err) + return err; + + return 0; +} + +static void memfd_luo_cancel(struct liveupdate_file_handler *handler, + struct file *file, u64 data) +{ + const struct memfd_luo_preserved_folio *pfolios; + struct inode *inode =3D file_inode(file); + struct folio *fdt_folio; + void *fdt; + int len; + + if (WARN_ON_ONCE(!data)) + return; + + inode_lock(inode); + shmem_i_mapping_freeze(inode, false); + + fdt =3D phys_to_virt(data); + fdt_folio =3D virt_to_folio(fdt); + pfolios =3D fdt_getprop(fdt, 0, "folios", &len); + if (pfolios) + memfd_luo_unpreserve_folios(pfolios, len / sizeof(*pfolios)); + + kho_unpreserve_folio(fdt_folio); + folio_put(fdt_folio); + inode_unlock(inode); +} + +static struct folio *memfd_luo_get_fdt(u64 data) +{ + return kho_restore_folio((phys_addr_t)data); +} + +static void memfd_luo_finish(struct liveupdate_file_handler *handler, + struct file *file, u64 data, bool reclaimed) +{ + const struct memfd_luo_preserved_folio *pfolios; + struct folio *fdt_folio; + int len; + + if (reclaimed) + return; + + fdt_folio =3D memfd_luo_get_fdt(data); + + pfolios =3D fdt_getprop(folio_address(fdt_folio), 0, "folios", &len); + if (pfolios) + memfd_luo_unpreserve_folios(pfolios, len / sizeof(*pfolios)); + + folio_put(fdt_folio); +} + +static int memfd_luo_retrieve(struct liveupdate_file_handler *handler, u64= data, + struct file **file_p) +{ + const struct memfd_luo_preserved_folio *pfolios; + int nr_pfolios, len, ret =3D 0, i =3D 0; + struct address_space *mapping; + struct folio *folio, *fdt_folio; + const u64 *pos, *size; + struct inode *inode; + struct file *file; + const void *fdt; + + fdt_folio =3D memfd_luo_get_fdt(data); + if (!fdt_folio) + return -ENOENT; + + fdt =3D page_to_virt(folio_page(fdt_folio, 0)); + + pfolios =3D fdt_getprop(fdt, 0, "folios", &len); + if (!pfolios || len % sizeof(*pfolios)) { + pr_err("invalid 'folios' property\n"); + ret =3D -EINVAL; + goto put_fdt; + } + nr_pfolios =3D len / sizeof(*pfolios); + + size =3D fdt_getprop(fdt, 0, "size", &len); + if (!size || len !=3D sizeof(u64)) { + pr_err("invalid 'size' property\n"); + ret =3D -EINVAL; + goto put_folios; + } + + pos =3D fdt_getprop(fdt, 0, "pos", &len); + if (!pos || len !=3D sizeof(u64)) { + pr_err("invalid 'pos' property\n"); + ret =3D -EINVAL; + goto put_folios; + } + + file =3D shmem_file_setup("", 0, VM_NORESERVE); + + if (IS_ERR(file)) { + ret =3D PTR_ERR(file); + pr_err("failed to setup file: %d\n", ret); + goto put_folios; + } + + inode =3D file->f_inode; + mapping =3D inode->i_mapping; + vfs_setpos(file, *pos, MAX_LFS_FILESIZE); + + for (; i < nr_pfolios; i++) { + const struct memfd_luo_preserved_folio *pfolio =3D &pfolios[i]; + phys_addr_t phys; + u64 index; + int flags; + + if (!pfolio->foliodesc) + continue; + + phys =3D PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc)); + folio =3D kho_restore_folio(phys); + if (!folio) { + pr_err("Unable to restore folio at physical address: %llx\n", + phys); + goto put_file; + } + index =3D pfolio->index; + flags =3D PRESERVED_FOLIO_FLAGS(pfolio->foliodesc); + + /* Set up the folio for insertion. */ + /* + * TODO: Should find a way to unify this and + * shmem_alloc_and_add_folio(). + */ + __folio_set_locked(folio); + __folio_set_swapbacked(folio); + + ret =3D mem_cgroup_charge(folio, NULL, mapping_gfp_mask(mapping)); + if (ret) { + pr_err("shmem: failed to charge folio index %d: %d\n", + i, ret); + goto unlock_folio; + } + + ret =3D shmem_add_to_page_cache(folio, mapping, index, NULL, + mapping_gfp_mask(mapping)); + if (ret) { + pr_err("shmem: failed to add to page cache folio index %d: %d\n", + i, ret); + goto unlock_folio; + } + + if (flags & PRESERVED_FLAG_UPTODATE) + folio_mark_uptodate(folio); + if (flags & PRESERVED_FLAG_DIRTY) + folio_mark_dirty(folio); + + ret =3D shmem_inode_acct_blocks(inode, 1); + if (ret) { + pr_err("shmem: failed to account folio index %d: %d\n", + i, ret); + goto unlock_folio; + } + + shmem_recalc_inode(inode, 1, 0); + folio_add_lru(folio); + folio_unlock(folio); + folio_put(folio); + } + + inode->i_size =3D *size; + *file_p =3D file; + folio_put(fdt_folio); + return 0; + +unlock_folio: + folio_unlock(folio); + folio_put(folio); +put_file: + fput(file); + i++; +put_folios: + for (; i < nr_pfolios; i++) { + const struct memfd_luo_preserved_folio *pfolio =3D &pfolios[i]; + + folio =3D kho_restore_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc)); + if (folio) + folio_put(folio); + } + +put_fdt: + folio_put(fdt_folio); + return ret; +} + +static bool memfd_luo_can_preserve(struct liveupdate_file_handler *handler, + struct file *file) +{ + struct inode *inode =3D file_inode(file); + + return shmem_file(file) && !inode->i_nlink; +} + +static const struct liveupdate_file_ops memfd_luo_file_ops =3D { + .prepare =3D memfd_luo_prepare, + .freeze =3D memfd_luo_freeze, + .cancel =3D memfd_luo_cancel, + .finish =3D memfd_luo_finish, + .retrieve =3D memfd_luo_retrieve, + .can_preserve =3D memfd_luo_can_preserve, + .owner =3D THIS_MODULE, +}; + +static struct liveupdate_file_handler memfd_luo_handler =3D { + .ops =3D &memfd_luo_file_ops, + .compatible =3D memfd_luo_compatible, +}; + +static int __init memfd_luo_init(void) +{ + int err; + + err =3D liveupdate_register_file_handler(&memfd_luo_handler); + if (err) + pr_err("Could not register luo filesystem handler: %d\n", err); + + return err; +} +late_initcall(memfd_luo_init); --=20 2.50.1.565.gc32cd1483b-goog