From: Pasha Tatashin
To: pratyush@kernel.org, jasonmiu@google.com, graf@amazon.com, changyuanl@google.com, pasha.tatashin@soleen.com, rppt@kernel.org, dmatlack@google.com, rientjes@google.com, corbet@lwn.net, rdunlap@infradead.org, ilpo.jarvinen@linux.intel.com, kanie@linux.alibaba.com, ojeda@kernel.org, aliceryhl@google.com, masahiroy@kernel.org, akpm@linux-foundation.org, tj@kernel.org, yoann.congal@smile.fr, mmaurer@google.com, roman.gushchin@linux.dev, chenridong@huawei.com, axboe@kernel.dk, mark.rutland@arm.com, jannh@google.com, vincent.guittot@linaro.org, hannes@cmpxchg.org, dan.j.williams@intel.com, david@redhat.com, joel.granados@kernel.org, rostedt@goodmis.org, anna.schumaker@oracle.com, song@kernel.org, zhangguopeng@kylinos.cn, linux@weissschuh.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-mm@kvack.org, gregkh@linuxfoundation.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com, rafael@kernel.org, dakr@kernel.org, bartosz.golaszewski@linaro.org, cw00.choi@samsung.com, myungjoo.ham@samsung.com, yesanishhere@gmail.com, Jonathan.Cameron@huawei.com, quic_zijuhu@quicinc.com, aleksander.lobakin@intel.com, ira.weiny@intel.com, andriy.shevchenko@linux.intel.com, leon@kernel.org, lukas@wunner.de, bhelgaas@google.com, wagi@kernel.org, djeffery@redhat.com, stuart.w.hayes@gmail.com, ptyadav@amazon.de, lennart@poettering.net,
 brauner@kernel.org, linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org, saeedm@nvidia.com, ajayachandra@nvidia.com, jgg@nvidia.com, parav@nvidia.com, leonro@nvidia.com, witu@nvidia.com, hughd@google.com, skhawaja@google.com, chrisl@kernel.org, steven.sistare@oracle.com
Subject: [PATCH v4 24/30] luo: allow preserving memfd
Date: Mon, 29 Sep 2025 01:03:15 +0000
Message-ID: <20250929010321.3462457-25-pasha.tatashin@soleen.com>
In-Reply-To: <20250929010321.3462457-1-pasha.tatashin@soleen.com>
References: <20250929010321.3462457-1-pasha.tatashin@soleen.com>

From: Pratyush Yadav

The ability to preserve a memfd allows userspace to use KHO and LUO to
transfer its memory contents to the next kernel. This is useful in
several ways.

For one, it can be used with IOMMUFD as the backing store for IOMMU page
tables. Preserving IOMMUFD is essential for performing a hypervisor live
update with passthrough devices, and memfd support provides the first
building block for making that possible.

For another, applications with a large amount of memory that takes time
to reconstruct can find it very expensive to reboot in order to pick up
a kernel upgrade. memfd with LUO gives those applications
reboot-persistent memory that they can use to quickly save and
reconstruct that state.

A memfd can be backed by either hugetlbfs or shmem; currently only shmem
is supported. More precisely, support is added for anonymous shmem
files.

The handover to the next kernel is not transparent. Not all properties
of the file are preserved: only its memory contents, position, and size
are. The recreated file gets the UID and GID of the task doing the
restore, and that task's cgroup is charged for the memory.
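As a userspace illustration of the state listed above, here is a
minimal sketch, assuming Linux with glibc 2.27 or later for
memfd_create(). It creates an anonymous shmem-backed memfd (no
MFD_HUGETLB) and reads back its position, size, and link count. It
does not invoke LUO itself, since the userspace preservation interface
is outside this patch; make_demo_memfd(), read_memfd_state(), and
struct memfd_state are hypothetical names. The link count is expected
to be 0 because a memfd is never linked into a directory, which is the
condition memfd_luo_can_preserve() checks.

```c
#define _GNU_SOURCE /* for memfd_create() in glibc >= 2.27 */
#include <assert.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * The memfd state that the commit message says survives the handover.
 * Ownership, mode, and other inode attributes are not carried over.
 */
struct memfd_state {
	off_t pos;     /* file position, recorded as the "pos" property */
	off_t size;    /* file size, recorded as the "size" property */
	nlink_t nlink; /* link count; memfd_luo_can_preserve() requires 0 */
};

/*
 * Hypothetical helper: create an anonymous shmem-backed memfd (no
 * MFD_HUGETLB, since only shmem is supported), fill it with @len bytes of
 * @data, and leave the file position at @pos. Returns the fd or -1.
 */
static int make_demo_memfd(const void *data, size_t len, off_t pos)
{
	int fd;

	assert(pos >= 0);
	fd = memfd_create("luo-demo", MFD_CLOEXEC);
	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len ||
	    lseek(fd, pos, SEEK_SET) != pos) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: read back the fields of struct memfd_state. */
static int read_memfd_state(int fd, struct memfd_state *st)
{
	struct stat sb;

	if (fstat(fd, &sb) < 0)
		return -1;
	st->pos = lseek(fd, 0, SEEK_CUR);
	if (st->pos < 0)
		return -1;
	st->size = sb.st_size;
	st->nlink = sb.st_nlink;
	return 0;
}
```

Per the description above, a file recreated by memfd_luo_retrieve()
should report the same position and size as the original, while its
owner becomes the restoring task.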
Once LUO is in the prepared state, the file cannot grow or shrink, and
all its pages are pinned to avoid migration and swapping. The file can
still be read from and written to.

Co-developed-by: Changyuan Lyu
Signed-off-by: Changyuan Lyu
Co-developed-by: Pasha Tatashin
Signed-off-by: Pasha Tatashin
Signed-off-by: Pratyush Yadav
---
 MAINTAINERS    |   2 +
 mm/Makefile    |   1 +
 mm/memfd_luo.c | 523 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 526 insertions(+)
 create mode 100644 mm/memfd_luo.c

diff --git a/MAINTAINERS b/MAINTAINERS
index e99af6101d3c..a17e4e077174 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14433,6 +14433,7 @@ F:	tools/testing/selftests/livepatch/
 
 LIVE UPDATE
 M:	Pasha Tatashin
+R:	Pratyush Yadav
 L:	linux-kernel@vger.kernel.org
 S:	Maintained
 F:	Documentation/ABI/testing/sysfs-kernel-liveupdate
@@ -14441,6 +14442,7 @@ F:	Documentation/userspace-api/liveupdate.rst
 F:	include/linux/liveupdate.h
 F:	include/uapi/linux/liveupdate.h
 F:	kernel/liveupdate/
+F:	mm/memfd_luo.c
 F:	tools/testing/selftests/liveupdate/
 
 LLC (802.2)
diff --git a/mm/Makefile b/mm/Makefile
index 21abb3353550..7738ec416f00 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -100,6 +100,7 @@ obj-$(CONFIG_NUMA) += memory-tiers.o
 obj-$(CONFIG_DEVICE_MIGRATION) += migrate_device.o
 obj-$(CONFIG_TRANSPARENT_HUGEPAGE) += huge_memory.o khugepaged.o
 obj-$(CONFIG_PAGE_COUNTER) += page_counter.o
+obj-$(CONFIG_LIVEUPDATE) += memfd_luo.o
 obj-$(CONFIG_MEMCG_V1) += memcontrol-v1.o
 obj-$(CONFIG_MEMCG) += memcontrol.o vmpressure.o
 ifdef CONFIG_SWAP
diff --git a/mm/memfd_luo.c b/mm/memfd_luo.c
new file mode 100644
index 000000000000..221e31c1197e
--- /dev/null
+++ b/mm/memfd_luo.c
@@ -0,0 +1,523 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright (c) 2025, Google LLC.
+ * Pasha Tatashin
+ * Changyuan Lyu
+ *
+ * Copyright (C) 2025 Amazon.com Inc. or its affiliates.
+ * Pratyush Yadav
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "internal.h"
+
+#define PRESERVED_PFN_MASK GENMASK(63, 12)
+#define PRESERVED_PFN_SHIFT 12
+#define PRESERVED_FLAG_DIRTY BIT(0)
+#define PRESERVED_FLAG_UPTODATE BIT(1)
+
+#define PRESERVED_FOLIO_PFN(desc) (((desc) & PRESERVED_PFN_MASK) >> PRESERVED_PFN_SHIFT)
+#define PRESERVED_FOLIO_FLAGS(desc) ((desc) & ~PRESERVED_PFN_MASK)
+#define PRESERVED_FOLIO_MKDESC(pfn, flags) (((pfn) << PRESERVED_PFN_SHIFT) | (flags))
+
+struct memfd_luo_preserved_folio {
+	/*
+	 * The folio descriptor is made of 2 parts. The bottom 12 bits are used
+	 * for storing flags, the others for storing the PFN.
+	 */
+	u64 foliodesc;
+	u64 index;
+};
+
+static int memfd_luo_preserve_folios(struct memfd_luo_preserved_folio *pfolios,
+				     struct folio **folios,
+				     unsigned int nr_folios)
+{
+	int err;
+	long i;
+
+	for (i = 0; i < nr_folios; i++) {
+		struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
+		struct folio *folio = folios[i];
+		unsigned int flags = 0;
+		unsigned long pfn;
+
+		err = kho_preserve_folio(folio);
+		if (err)
+			goto err_unpreserve;
+
+		pfn = folio_pfn(folio);
+		if (folio_test_dirty(folio))
+			flags |= PRESERVED_FLAG_DIRTY;
+		if (folio_test_uptodate(folio))
+			flags |= PRESERVED_FLAG_UPTODATE;
+
+		pfolio->foliodesc = PRESERVED_FOLIO_MKDESC(pfn, flags);
+		pfolio->index = folio->index;
+	}
+
+	return 0;
+
+err_unpreserve:
+	i--;
+	for (; i >= 0; i--)
+		WARN_ON_ONCE(kho_unpreserve_folio(folios[i]));
+	return err;
+}
+
+static void memfd_luo_unpreserve_folios(const struct memfd_luo_preserved_folio *pfolios,
+					unsigned int nr_folios)
+{
+	unsigned int i;
+
+	for (i = 0; i < nr_folios; i++) {
+		const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
+		struct folio *folio;
+
+		if (!pfolio->foliodesc)
+			continue;
+
+		folio = pfn_folio(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
+
+		WARN_ON_ONCE(kho_unpreserve_folio(folio));
+		unpin_folio(folio);
+	}
+}
+
+static void *memfd_luo_create_fdt(unsigned long size)
+{
+	unsigned int order = get_order(size);
+	struct folio *fdt_folio;
+	int err = 0;
+	void *fdt;
+
+	if (order > MAX_PAGE_ORDER)
+		return NULL;
+
+	fdt_folio = folio_alloc(GFP_KERNEL | __GFP_ZERO, order);
+	if (!fdt_folio)
+		return NULL;
+
+	fdt = folio_address(fdt_folio);
+
+	err |= fdt_create(fdt, (1 << (order + PAGE_SHIFT)));
+	err |= fdt_finish_reservemap(fdt);
+	err |= fdt_begin_node(fdt, "");
+	if (err)
+		goto free;
+
+	return fdt;
+
+free:
+	folio_put(fdt_folio);
+	return NULL;
+}
+
+static int memfd_luo_finish_fdt(void *fdt)
+{
+	int err;
+
+	err = fdt_end_node(fdt);
+	if (err)
+		return err;
+
+	return fdt_finish(fdt);
+}
+
+static int memfd_luo_prepare(struct liveupdate_file_handler *handler,
+			     struct file *file, u64 *data)
+{
+	struct memfd_luo_preserved_folio *preserved_folios;
+	struct inode *inode = file_inode(file);
+	unsigned int max_folios, nr_folios = 0;
+	int err = 0, preserved_size;
+	struct folio **folios;
+	long size, nr_pinned;
+	pgoff_t offset;
+	void *fdt;
+	u64 pos;
+
+	inode_lock(inode);
+	size = i_size_read(inode);
+	if ((PAGE_ALIGN(size) / PAGE_SIZE) > UINT_MAX) {
+		err = -E2BIG;
+		goto err_unlock;
+	}
+
+	shmem_i_mapping_freeze(inode, true);
+
+	/*
+	 * Guess the number of folios based on inode size. Real number might end
+	 * up being smaller if there are higher order folios.
+	 */
+	max_folios = PAGE_ALIGN(size) / PAGE_SIZE;
+	folios = kvmalloc_array(max_folios, sizeof(*folios), GFP_KERNEL);
+	if (!folios) {
+		err = -ENOMEM;
+		goto err_unfreeze;
+	}
+
+	/*
+	 * Pin the folios so they don't move around behind our back. This also
+	 * ensures none of the folios are in CMA -- which ensures they don't
+	 * fall in KHO scratch memory. It also moves swapped out folios back to
+	 * memory.
+	 *
+	 * A side effect of doing this is that it allocates a folio for all
+	 * indices in the file. This might waste memory on sparse memfds. If
+	 * that is really a problem in the future, we can have a
+	 * memfd_pin_folios() variant that does not allocate a page on empty
+	 * slots.
+	 */
+	nr_pinned = memfd_pin_folios(file, 0, size - 1, folios, max_folios,
+				     &offset);
+	if (nr_pinned < 0) {
+		err = nr_pinned;
+		pr_err("failed to pin folios: %d\n", err);
+		goto err_free_folios;
+	}
+	/* nr_pinned won't be more than max_folios which is also unsigned int. */
+	nr_folios = (unsigned int)nr_pinned;
+
+	preserved_size = sizeof(struct memfd_luo_preserved_folio) * nr_folios;
+	if (check_mul_overflow(sizeof(struct memfd_luo_preserved_folio),
+			       nr_folios, &preserved_size)) {
+		err = -E2BIG;
+		goto err_unpin;
+	}
+
+	/*
+	 * Most of the space should be taken by preserved folios. So take its
+	 * size, plus a page for other properties.
+	 */
+	fdt = memfd_luo_create_fdt(PAGE_ALIGN(preserved_size) + PAGE_SIZE);
+	if (!fdt) {
+		err = -ENOMEM;
+		goto err_unpin;
+	}
+
+	pos = file->f_pos;
+	err = fdt_property(fdt, "pos", &pos, sizeof(pos));
+	if (err)
+		goto err_free_fdt;
+
+	err = fdt_property(fdt, "size", &size, sizeof(size));
+	if (err)
+		goto err_free_fdt;
+
+	err = fdt_property_placeholder(fdt, "folios", preserved_size,
+				       (void **)&preserved_folios);
+	if (err) {
+		pr_err("Failed to reserve folios property in FDT: %s\n",
+		       fdt_strerror(err));
+		err = -ENOMEM;
+		goto err_free_fdt;
+	}
+
+	err = memfd_luo_preserve_folios(preserved_folios, folios, nr_folios);
+	if (err)
+		goto err_free_fdt;
+
+	err = memfd_luo_finish_fdt(fdt);
+	if (err)
+		goto err_unpreserve;
+
+	err = kho_preserve_folio(virt_to_folio(fdt));
+	if (err)
+		goto err_unpreserve;
+
+	kvfree(folios);
+	inode_unlock(inode);
+
+	*data = virt_to_phys(fdt);
+	return 0;
+
+err_unpreserve:
+	memfd_luo_unpreserve_folios(preserved_folios, nr_folios);
+err_free_fdt:
+	folio_put(virt_to_folio(fdt));
+err_unpin:
+	unpin_folios(folios, nr_pinned);
+err_free_folios:
+	kvfree(folios);
+err_unfreeze:
+	shmem_i_mapping_freeze(inode, false);
+err_unlock:
+	inode_unlock(inode);
+	return err;
+}
+
+static int memfd_luo_freeze(struct liveupdate_file_handler *handler,
+			    struct file *file, u64 *data)
+{
+	u64 pos = file->f_pos;
+	void *fdt;
+	int err;
+
+	if (WARN_ON_ONCE(!*data))
+		return -EINVAL;
+
+	fdt = phys_to_virt(*data);
+
+	/*
+	 * The pos might have changed since prepare. Everything else stays the
+	 * same.
+	 */
+	err = fdt_setprop(fdt, 0, "pos", &pos, sizeof(pos));
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void memfd_luo_cancel(struct liveupdate_file_handler *handler,
+			     struct file *file, u64 data)
+{
+	const struct memfd_luo_preserved_folio *pfolios;
+	struct inode *inode = file_inode(file);
+	struct folio *fdt_folio;
+	void *fdt;
+	int len;
+
+	if (WARN_ON_ONCE(!data))
+		return;
+
+	inode_lock(inode);
+	shmem_i_mapping_freeze(inode, false);
+
+	fdt = phys_to_virt(data);
+	fdt_folio = virt_to_folio(fdt);
+	pfolios = fdt_getprop(fdt, 0, "folios", &len);
+	if (pfolios)
+		memfd_luo_unpreserve_folios(pfolios, len / sizeof(*pfolios));
+
+	kho_unpreserve_folio(fdt_folio);
+	folio_put(fdt_folio);
+	inode_unlock(inode);
+}
+
+static struct folio *memfd_luo_get_fdt(u64 data)
+{
+	return kho_restore_folio((phys_addr_t)data);
+}
+
+static void memfd_luo_discard_folios(const struct memfd_luo_preserved_folio *pfolios,
+				     unsigned int nr_folios)
+{
+	unsigned int i;
+
+	for (i = 0; i < nr_folios; i++) {
+		const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
+		struct folio *folio;
+		phys_addr_t phys;
+
+		if (!pfolio->foliodesc)
+			continue;
+
+		phys = PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
+		folio = kho_restore_folio(phys);
+		if (!folio) {
+			pr_warn_ratelimited("Unable to restore folio at physical address: %llx\n",
+					    phys);
+			continue;
+		}
+
+		folio_put(folio);
+	}
+}
+
+static void memfd_luo_finish(struct liveupdate_file_handler *handler,
+			     struct file *file, u64 data, bool reclaimed)
+{
+	const struct memfd_luo_preserved_folio *pfolios;
+	struct folio *fdt_folio;
+	int len;
+
+	if (reclaimed)
+		return;
+
+	fdt_folio = memfd_luo_get_fdt(data);
+
+	pfolios = fdt_getprop(folio_address(fdt_folio), 0, "folios", &len);
+	if (pfolios)
+		memfd_luo_discard_folios(pfolios, len / sizeof(*pfolios));
+
+	folio_put(fdt_folio);
+}
+
+static int memfd_luo_retrieve(struct liveupdate_file_handler *handler, u64 data,
+			      struct file **file_p)
+{
+	const struct memfd_luo_preserved_folio *pfolios;
+	int nr_pfolios, len, ret = 0, i = 0;
+	struct address_space *mapping;
+	struct folio *folio, *fdt_folio;
+	const u64 *pos, *size;
+	struct inode *inode;
+	struct file *file;
+	const void *fdt;
+
+	fdt_folio = memfd_luo_get_fdt(data);
+	if (!fdt_folio)
+		return -ENOENT;
+
+	fdt = page_to_virt(folio_page(fdt_folio, 0));
+
+	pfolios = fdt_getprop(fdt, 0, "folios", &len);
+	if (!pfolios || len % sizeof(*pfolios)) {
+		pr_err("invalid 'folios' property\n");
+		ret = -EINVAL;
+		goto put_fdt;
+	}
+	nr_pfolios = len / sizeof(*pfolios);
+
+	size = fdt_getprop(fdt, 0, "size", &len);
+	if (!size || len != sizeof(u64)) {
+		pr_err("invalid 'size' property\n");
+		ret = -EINVAL;
+		goto put_folios;
+	}
+
+	pos = fdt_getprop(fdt, 0, "pos", &len);
+	if (!pos || len != sizeof(u64)) {
+		pr_err("invalid 'pos' property\n");
+		ret = -EINVAL;
+		goto put_folios;
+	}
+
+	file = shmem_file_setup("", 0, VM_NORESERVE);
+
+	if (IS_ERR(file)) {
+		ret = PTR_ERR(file);
+		pr_err("failed to setup file: %d\n", ret);
+		goto put_folios;
+	}
+
+	inode = file->f_inode;
+	mapping = inode->i_mapping;
+	vfs_setpos(file, *pos, MAX_LFS_FILESIZE);
+
+	for (; i < nr_pfolios; i++) {
+		const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
+		phys_addr_t phys;
+		u64 index;
+		int flags;
+
+		if (!pfolio->foliodesc)
+			continue;
+
+		phys = PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc));
+		folio = kho_restore_folio(phys);
+		if (!folio) {
+			pr_err("Unable to restore folio at physical address: %llx\n", phys);
+			ret = -ENOENT;
+			goto put_file;
+		}
+		index = pfolio->index;
+		flags = PRESERVED_FOLIO_FLAGS(pfolio->foliodesc);
+
+		/* Set up the folio for insertion. */
+		__folio_set_locked(folio);
+		__folio_set_swapbacked(folio);
+
+		ret = mem_cgroup_charge(folio, NULL, mapping_gfp_mask(mapping));
+		if (ret) {
+			pr_err("shmem: failed to charge folio index %d: %d\n",
+			       i, ret);
+			goto unlock_folio;
+		}
+
+		ret = shmem_add_to_page_cache(folio, mapping, index, NULL,
+					      mapping_gfp_mask(mapping));
+		if (ret) {
+			pr_err("shmem: failed to add to page cache folio index %d: %d\n",
+			       i, ret);
+			goto unlock_folio;
+		}
+
+		if (flags & PRESERVED_FLAG_UPTODATE)
+			folio_mark_uptodate(folio);
+		if (flags & PRESERVED_FLAG_DIRTY)
+			folio_mark_dirty(folio);
+
+		ret = shmem_inode_acct_blocks(inode, 1);
+		if (ret) {
+			pr_err("shmem: failed to account folio index %d: %d\n",
+			       i, ret);
+			goto unlock_folio;
+		}
+
+		shmem_recalc_inode(inode, 1, 0);
+		folio_add_lru(folio);
+		folio_unlock(folio);
+		folio_put(folio);
+	}
+
+	inode->i_size = *size;
+	*file_p = file;
+	folio_put(fdt_folio);
+	return 0;
+
+unlock_folio:
+	folio_unlock(folio);
+	folio_put(folio);
+put_file:
+	fput(file);
+	i++;
+put_folios:
+	for (; i < nr_pfolios; i++) {
+		const struct memfd_luo_preserved_folio *pfolio = &pfolios[i];
+
+		folio = kho_restore_folio(PFN_PHYS(PRESERVED_FOLIO_PFN(pfolio->foliodesc)));
+		if (folio)
+			folio_put(folio);
+	}
+
+put_fdt:
+	folio_put(fdt_folio);
+	return ret;
+}
+
+static bool memfd_luo_can_preserve(struct liveupdate_file_handler *handler,
+				   struct file *file)
+{
+	struct inode *inode = file_inode(file);
+
+	return shmem_file(file) && !inode->i_nlink;
+}
+
+static const struct liveupdate_file_ops memfd_luo_file_ops = {
+	.prepare = memfd_luo_prepare,
+	.freeze = memfd_luo_freeze,
+	.cancel = memfd_luo_cancel,
+	.finish = memfd_luo_finish,
+	.retrieve = memfd_luo_retrieve,
+	.can_preserve = memfd_luo_can_preserve,
+	.owner = THIS_MODULE,
+};
+
+static struct liveupdate_file_handler memfd_luo_handler = {
+	.ops = &memfd_luo_file_ops,
+	.compatible = "memfd-v1",
+};
+
+static int __init memfd_luo_init(void)
+{
+	int err;
+
+	err = liveupdate_register_file_handler(&memfd_luo_handler);
+	if (err)
+		pr_err("Could not register luo filesystem handler: %d\n", err);
+
+	return err;
+}
+late_initcall(memfd_luo_init);
-- 
2.51.0.536.g15c5d4f767-goog
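
The folio descriptor packing in mm/memfd_luo.c (PFN in the upper 52
bits, dirty/uptodate flags in the lower 12) and the length check that
memfd_luo_retrieve() applies to the "folios" property can be
sanity-checked outside the kernel. The following is a userspace sketch,
not part of the patch: GENMASK() and BIT() are spelled out by hand, and
the helpers mkdesc(), desc_pfn(), desc_flags(), and
nr_preserved_folios() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace mirror of the encoding in mm/memfd_luo.c. GENMASK(63, 12) and
 * BIT(n) are kernel macros, so they are spelled out here.
 */
#define PRESERVED_PFN_SHIFT     12
#define PRESERVED_PFN_MASK      (~((UINT64_C(1) << PRESERVED_PFN_SHIFT) - 1))
#define PRESERVED_FLAG_DIRTY    (UINT64_C(1) << 0)
#define PRESERVED_FLAG_UPTODATE (UINT64_C(1) << 1)

/* Same layout as struct memfd_luo_preserved_folio: two u64 fields. */
struct preserved_folio {
	uint64_t foliodesc; /* PFN in bits 63..12, flags in bits 11..0 */
	uint64_t index;     /* page-cache index of the folio in the file */
};

static uint64_t mkdesc(uint64_t pfn, uint64_t flags)
{
	/* Flags must stay below bit 12 and the PFN must fit in 52 bits. */
	assert((flags & PRESERVED_PFN_MASK) == 0);
	assert((pfn >> (64 - PRESERVED_PFN_SHIFT)) == 0);
	return (pfn << PRESERVED_PFN_SHIFT) | flags;
}

static uint64_t desc_pfn(uint64_t desc)
{
	return (desc & PRESERVED_PFN_MASK) >> PRESERVED_PFN_SHIFT;
}

static uint64_t desc_flags(uint64_t desc)
{
	return desc & ~PRESERVED_PFN_MASK;
}

/*
 * Entries in a "folios" property of @len bytes, or -1 if @len is not a
 * whole number of entries (the same check memfd_luo_retrieve() makes).
 */
static long nr_preserved_folios(size_t len)
{
	if (len % sizeof(struct preserved_folio))
		return -1;
	return (long)(len / sizeof(struct preserved_folio));
}
```

For example, PFN 0x12345 with both flags set encodes to 0x12345003.
Note that the loops in the patch skip entries whose foliodesc is 0, so
such an entry is treated as an empty slot rather than as PFN 0.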